SPATIO-TEMPORAL FIELD PREDICTION UNDER LOCALIZATION UNCERTAINTY FOR MOBILE SENSOR NETWORKS

By

Mahdi Jadaliha

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mechanical Engineering-Doctor of Philosophy

2013

ABSTRACT

SPATIO-TEMPORAL FIELD PREDICTION UNDER LOCALIZATION UNCERTAINTY FOR MOBILE SENSOR NETWORKS

By Mahdi Jadaliha

Recently, there has been a growing interest in wireless sensor networks due to advances in embedded system and network technologies. Their applications include, but are not limited to, environment monitoring, building comfort control, traffic control, manufacturing and plant automation, and surveillance systems. The conventional inverse problem approach based on physical transport models is computationally prohibitive for resource-constrained, multi-agent systems. In contrast, Gaussian process and Gaussian Markov models have been widely used to draw statistical inference from geostatistical and environmental data. However, the statistical models need to be carefully tailored so that they are practical and usable for mobile sensor networks with limited resources. Additionally, one needs to be careful in coping with the localization uncertainty resulting from low-cost mobile sensor networks. Thus, a fundamental problem in various applications is to correctly fuse the spatially collected data and estimate the process of interest under localization uncertainty. Motivated by the aforementioned issues, in this dissertation, we consider the problem of using mobile sensor networks to estimate and predict environmental fields modeled by spatio-temporal Gaussian processes and Gaussian Markov random fields in the presence of localization uncertainty. In the first part of this dissertation, we formulate Gaussian process regression with observations under localization uncertainty.
In our formulation, the effects of measurement noise, localization uncertainty, and prior distributions are all correctly incorporated in the posterior predictive statistics. The analytically intractable posterior predictive statistics are proposed to be approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. In addition, the localization problem is studied in this part, where the position of the robot is estimated by maximum likelihood estimation (MLE) using vision data. We transform the high-dimensional vision data to a set of uncorrelated feature candidates. A multivariate GP regression with unknown hyperparameters is formulated to connect the set of selected features to their corresponding sampling positions. In order to decrease the computational load and increase the accuracy of localization, a feature reduction approach is developed: a subset of the features is selected to minimize the localization error using cross-validation methods. In the second part of the dissertation, we consider the problem of predicting a spatial (spatio-temporal) Gaussian Markov random field (GMRF) using sequential noisy observations collected by robotic sensors. The random field of interest is modeled by a GMRF instead of a Gaussian process. In this way, we propose iteratively updated predictive inferences. We derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution is not scalable as the number of observations increases. To cope with this exponentially increasing complexity, we propose scalable approximations to the exact solution with a controllable tradeoff between approximation error and complexity.
Finally, we derive an approximate Bayesian solution to the problem of simultaneous localization and predictive inference, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of robotic sensors, and uncertain localization.

ACKNOWLEDGMENTS

I would like to thank everybody who helped me to accomplish this dissertation. First of all, I would like to express my sincere appreciation and gratitude to my adviser, Dr. Jongeun Choi, who continuously encouraged and supported me. This thesis would not have been possible without his invaluable advice and effort to improve the work. In addition, I would like to thank all my committee members, Dr. Steven Shaw, Dr. Guoming Zhu, and Dr. Tapabrata Maiti, for their contributions and suggestions. Last but not least, I cannot thank enough my sister, my parents, and my wife for always being there for me. I would never have been able to accomplish what I have without them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS

Chapter 1 Preliminaries
1.1 Mathematical Notation
1.2 Physical Process Model
1.2.1 Gaussian process
1.2.2 Spatio-temporal Gaussian process
1.2.3 Gaussian Markov random field
1.3 Mobile Sensor Network
1.4 Gaussian Processes for Regression

Chapter 2 Gaussian process regression under localization uncertainty
2.1 Introduction
2.2 Preliminaries
2.2.1 Gaussian Process Regression
2.2.2 Monte Carlo and Importance Sampling
2.2.3 Laplace Approximations
2.3 The Problem Statement
2.4 The Monte Carlo method
2.4.1 Predictive mean
2.4.2 Predictive variance
2.5 Fully Exponential Laplace Approximations
2.5.1 Predictive mean
2.5.2 Predictive variance
2.5.3 Simple Laplace approximations
2.6 Error and Complexity Analysis
2.7 Simulation Results
2.8 Experimental Results
2.8.1 Experiment with simulated noisy sampling positions
2.8.2 Experiment with noisy sampling positions
2.9 Conclusion

Chapter 3 Position estimation using Gaussian process regression
3.1 Introduction
3.2 Image features
3.3 Gaussian process (GP) model
3.3.1 The ρ-th random field
3.4 Localization and feature selection
3.5 Experimental Results
3.5.1 Experimental setups
3.5.2 Learning of GP models in an empirical Bayes approach
3.5.3 Comparison study
3.6 Conclusion and future works

Chapter 4 Bayesian Prediction with Uncertain Localization
4.1 Introduction
4.2 Gaussian Process and Gaussian Markov Random Fields
4.3 Mobile Sensor Networks
4.4 Spatio-Temporal Field Model
4.5 Bayesian Predictive Inference
4.5.1 Uncertain hyperparameters and exact localization
4.5.2 Uncertain hyperparameters and localization
4.5.3 Complexity of Algorithms
4.6 Simulation Results
4.7 Experimental Results
4.7.1 The Static Experiment
4.7.2 The Dynamic Experiment
4.8 Conclusion

Chapter 5 Simultaneous Localization and Spatial Prediction
5.1 Introduction
5.2 Sequential Bayesian inference with a GMRF
5.2.1 Gaussian processes and Gaussian Markov random fields
5.2.2 Multiple robotic sensors
5.2.3 Kinematics of robotic vehicles
5.2.4 Problem formulation and its Bayesian predictive inference
5.3 Simulation Results
5.4 Conclusion

Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work

BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 Error and complexity analysis
Table 2.2 Simulation RMS error
Table 2.3 Experiment parameters
Table 2.4 Experimental RMS error w.r.t. Gaussian estimation
Table 3.1 Computational time
Table 3.2 The localization performance comparison

LIST OF FIGURES

Figure 2.1 For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The prediction results of applying Gaussian process regression on the true and noisy sampling positions are shown. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error (between the predicted and true fields) fields.
The first column shows the result of applying Gaussian process regression on the true sampling positions. The second and third columns show the results of applying Gaussian process regression on the noisy sampling positions with 0.1 and 0.4 noise covariance matrices, respectively. True and noisy sampling positions are shown in circles with agent indices in the second row.

Figure 2.2 A realization of the Gaussian process (ground truth).

Figure 2.3 Gaussian process regression using the true positions x. a) The predictive mean estimation. b) The predictive variance estimation, and the true sampling positions x in aquamarine crosses.

Figure 2.4 The results of the QDS 1 approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.5 The results of the QDS 2 approximations using E(x|x̄). a) The predictive mean estimation. b) The predictive variance estimation, and E(x|x̄) shown with aquamarine crosses.

Figure 2.6 The results of the Monte Carlo approximations with 1000 samples using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.7 The results of the Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.8 The results of the simple Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses. x̂ is the estimate of the true sampling positions, computed as a by-product of both the fully exponential Laplace and simple Laplace approximations.

Figure 2.9 (a) The dye pumping system. (b) An example of normal dye concentration applied to the stream for the pheromone distribution estimation. (c) An example of the highly visible dye plume near the source location.

Figure 2.10 The results of Gaussian process regression using the true positions x. a) The posterior mean estimation. b) The posterior variance estimation, and x shown with aquamarine crosses.

Figure 2.11 The results of the QDS 1 approach using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.12 The results of the Monte Carlo method with 1000 samples using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.13 The results of the Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.14 The results of the simple Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses.

Figure 2.15 a) The developed robotic sensor. b) The experimental environment, a 12 × 6 meter outdoor swimming pool.

Figure 2.16 The results of the QDS 1 approach using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.17 The results of the Monte Carlo method with 1000 samples using the noisy positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.18 The results of the Laplace approximations using x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̄ shown with aquamarine crosses.

Figure 2.19 The results of the simple Laplace method using the noise-augmented positions x̄. a) The predictive mean estimation. b) The predictive variance estimation, and x̂ shown with aquamarine crosses.

Figure 3.1 (a) and (b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. (c) and (d) show the reduced-size gray scale image and the two-dimensional FFT magnitude plot, respectively.

Figure 3.2 (a) A Dot iPhone Panorama lens is used for Case 1. (b) A vision-based robot equipped with a 360 degree omnidirectional camera was built for Case 2.

Figure 3.3 Estimated hyperparameters for each individual scalar field for Case 1.

Figure 3.4 Estimated hyperparameters for each individual scalar field for Case 2.

Figure 3.5 Predictive mean function map for each individual feature zk for Case 1.

Figure 3.6 Prediction variance map for each individual feature zk for Case 1.

Figure 3.7 Predictive mean function map for each individual feature zk for Case 2.

Figure 3.8 Prediction variance map for each individual feature zk for Case 2.

Figure 4.1 Example of localization uncertainty for s[i]. The measured sampling location q̃[i] is indicated by a small red circle, which is the closest point in the discrete support to the measured sampling position in the continuous space.
The small red circle, along with the blue squares and the blue star, shows the possible locations of the true sampling point q[i] according to the prior distribution P(q[i] = s̊(j) | q̃[i]) with a compact support, as shown in the big red circle. The blue star indicates q[i], which is the closest point in the discrete support to the true sampling position in the continuous space.

Figure 4.2 Elements of the precision matrix Qθ related to a given location.

Figure 4.3 Agents 1 and 3 measure their exact sampling positions while agent 2 has a noisy sampling position measured at the star location. The four possible sampling positions for the true sampling position of agent 2 are 2a, 2b, 2c, and 2d with the prior probabilities 6/16, 2/16, 6/16, and 2/16, respectively.

Figure 4.4 (a) The realized spatio-temporal field at time t = 2 is the summation of (b) the smooth mean field and (c) the GMRF as described in (4.13).

Figure 4.5 The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and corners of squares, respectively.

Figure 4.6 (a) The RMS estimation error of βt vs. time and (b) the posterior probability of the true hyperparameter vector vs. time. The results for Cases 1, 2, and 3 are shown in blue circles, green stars, and red squares, respectively.

Figure 4.7 The mean square difference between Case 1 and Case 3 is shown for the different approximation orders m = 1, · · · , 5. On each box, the central mark is the median, and the upper and lower edges of the box are the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points.

Figure 4.8 The number of observations with uncertain sampling positions in the three different cases is shown on the horizontal axis. The field prediction RMS errors for Case 1, Case 2, and Case 3 are shown with white, black, and hatched bars, respectively.

Figure 4.9 A vision-based robot built by the authors, equipped with a 360 degree omnidirectional camera.

Figure 4.10 The experimental setup. The measured position of the robot and 4 possible sampling positions are shown by a dark green star and light green squares, respectively. The spatial sites are marked with aqua blue dots on the ground. The panoramic view of the robot is pictured on the upper left-hand side of the figure.

Figure 4.11 The posterior probability of the hyperparameter vector at t = 2.

Figure 4.12 The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first and second rows correspond to the predictions and the natural logarithm of the prediction error variance.

Figure 4.13 The time-varying field is shown for 4 time iterations. The first column shows the moving object in the field. The second column shows the sampling positions with blue dots and the positioning blind spot with black areas. The third, fourth, and fifth columns show the predicted field for Case 1, Case 3, and Case 4, respectively. The rows correspond to the time intervals t = 1, 2, 3, and 4.

Figure 5.1 The prediction results of Cases 1, 2, and 3 at time t = 100 are shown in the first, second, and third columns, respectively.
The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and dots, respectively. The x and y axes represent the 2-D location, and colors represent the value of the predicted quantity at each location.

Figure 5.2 The trajectories of the true, predicted, and noisy sampling positions of the robot are shown by red diamonds, blue dots, and green stars for time t ∈ {10, 11, · · · , 30}. The blue ellipsoids show confidence regions of about 68% for the estimated sampling positions.

LIST OF ALGORITHMS

Chapter 1

Preliminaries

1.1 Mathematical Notation

Standard notation will be used throughout the dissertation. Let R, R≥0, R>0, Z, Z≥0, and Z>0 denote, respectively, the sets of real, non-negative real, positive real, integer, non-negative integer, and positive integer numbers. In denotes the identity matrix of size n (I for an appropriate dimension). For column vectors va ∈ Ra, vb ∈ Rb, and vc ∈ Rc, col(va, vb, vc) := (vaT, vbT, vcT)T ∈ Ra+b+c stacks all vectors to create one column vector, and ‖va‖ denotes the Euclidean norm (or the vector 2-norm) of va. Let rowi(A) ∈ Rm and colj(A) ∈ Rn denote the i-th row and the j-th column of a matrix A ∈ Rn×m, respectively. |A| (or det(A)) denotes the determinant of a matrix A ∈ Rn×n, and tr(A) denotes the trace of a square matrix A. Let AT ∈ Rm×n be the transpose of a matrix A ∈ Rn×m. The positive definiteness and the positive semi-definiteness of a square matrix A are denoted by A ≻ 0 and A ⪰ 0, respectively. The symbol ⊗ denotes the Kronecker product. Let E, Var, and Cov denote the expectation operator, the variance operator, and the covariance operator, respectively.
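Much of this notation maps directly onto standard array operations. The numpy fragment below (illustrative only, not part of the formal development) mirrors col(·), the vector 2-norm, the determinant, the trace, and the Kronecker product ⊗:

```python
import numpy as np

# col(va, vb) stacks column vectors into one column vector in R^(a+b).
va, vb = np.array([1.0, 2.0]), np.array([3.0])
stacked = np.concatenate([va, vb])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
norm_va = np.linalg.norm(va)   # Euclidean norm ||va||
tr_A = np.trace(A)             # tr(A)
det_A = np.linalg.det(A)       # |A|, the determinant
K = np.kron(np.eye(2), A)      # I_2 (x) A, a 4 x 4 block-diagonal matrix
```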
A random vector z ∈ Rq, which is distributed by a multivariate Gaussian distribution with mean µ ∈ Rq and covariance Σ ∈ Rq×q, is denoted by z ∼ N(µ, Σ).

We define the first and the second derivative operators on h : Rm → R with respect to a vector x ∈ Rm as follows:

\nabla h(x) = \frac{\partial h(x)}{\partial x} = \left( \frac{\partial h(x)}{\partial x_1}, \cdots, \frac{\partial h(x)}{\partial x_m} \right)^T,

\nabla^2 h(x) = \frac{\partial^2 h(x)}{\partial x \, \partial x^T} =
\begin{bmatrix}
\frac{\partial^2 h(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 h(x)}{\partial x_1 \partial x_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 h(x)}{\partial x_m \partial x_1} & \cdots & \frac{\partial^2 h(x)}{\partial x_m \partial x_m}
\end{bmatrix}.

Let ‖x‖ denote the standard Euclidean norm (2-norm) of a vector x. If there exist c and k ∈ R>0 such that the approximation error satisfies ‖x̂ − x‖ ≤ k|ε|, ∀ε < c, we say that the error between x̂ and x is of order O(ε) and also write x̂ − x = O(ε).

The relative complement of a set A in a set B is denoted by B \ A := B ∩ Ac, where Ac is the complement of A. For a set A ⊆ I, we define zA = {zi | i ∈ A}. Let −A denote the set I \ A. For given A = {c, d} and B = {1, 2}, the Cartesian product of two sets is defined as B × A = {(1, c), (1, d), (2, c), (2, d)}. Other notation will be explained in due course.

1.2 Physical Process Model

In this section, we review important notions of the Gaussian process, which will be used to model the physical phenomenon. In particular, we introduce a class of spatio-temporal Gaussian process models with anisotropic covariance functions. The properties of Gaussian Markov random fields (GMRFs) are also briefly reviewed.

1.2.1 Gaussian process

A Gaussian process can be thought of as a generalization of a Gaussian distribution over a finite vector space to a function space of infinite dimension. It is formally defined as follows [83, 84]:

Definition 1.2.1. A Gaussian process (GP) is a collection of random variables, any finite number of which have a consistent¹ joint Gaussian distribution.

A Gaussian process

z(x) \sim \mathcal{GP}\left( \mu(x), K(x, x'; \theta) \right) \quad (1.1)

is completely specified by its mean function µ(x) and covariance function K(x, x′; θ).
Although it need not be, we take the mean function to be zero for notational simplicity², i.e., µ(x) = 0. If the covariance function K(x, x′; θ) is invariant to translations in the input space, i.e., K(x, x′; θ) = K(x − x′; θ), we call it stationary. Furthermore, if the covariance function is a function of only the distance between the inputs, i.e., K(x, x′; θ) = K(‖x − x′‖; θ), then it is called isotropic. In practice, a parametric family of functions is used instead of fixing the covariance function [3]. One common choice of a stationary covariance function is

K(x, x'; \theta) = \sigma_f^2 \exp\left( -\sum_{\ell=1}^{b} \frac{(x_\ell - x'_\ell)^2}{2\sigma_\ell^2} \right), \quad (1.2)

where x_ℓ is the ℓ-th element of x ∈ Rb. From (1.2), it can easily be seen that the correlation between two inputs decreases as the distance between them increases. This decreasing rate depends on the choice of the kernel bandwidths {σℓ}. A very large kernel bandwidth means that the predictions have little bearing on the corresponding input, which is then said to be insignificant. σf² gives the overall vertical scale relative to the mean of the Gaussian process in the output space. These parameters play the role of hyperparameters since they correspond to the hyperparameters in neural networks and in the standard parametric model. Therefore, we define θ = (σf², σ1, · · · , σb)T ∈ Rb+1 as the hyperparameter vector.

¹ This is also known as the marginalization property [83]. It means simply that the random variables obey the usual rules of marginalization.
² This is not a drastic limitation since the mean of the posterior process is not confined to zero [84].

1.2.2 Spatio-temporal Gaussian process

In the first part of this dissertation, spatio-temporal Gaussian processes are of particular interest. Consider a spatio-temporal Gaussian process z(s, t) ∼ GP(µ(s, t), K(s, t, s′, t′; θ)), which is a special case of the Gaussian process defined in (1.1), where x = (sT, t)T ∈ Rι × R≥0.
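For concreteness, the stationary covariance function (1.2) can be evaluated in a few lines of numpy. The hyperparameter values below are assumptions chosen only to illustrate how correlation decays with distance:

```python
import numpy as np

def se_kernel(x, x_prime, sigma_f2, bandwidths):
    """Anisotropic squared exponential covariance K(x, x'; theta) of (1.2).

    sigma_f2 is the signal variance sigma_f^2; bandwidths holds the kernel
    bandwidths (sigma_1, ..., sigma_b), one per input dimension.
    """
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    sigma = np.asarray(bandwidths, dtype=float)
    return sigma_f2 * np.exp(-np.sum((x - x_prime) ** 2 / (2.0 * sigma ** 2)))

# Correlation decays as inputs move apart, at a rate set by the bandwidths.
theta = dict(sigma_f2=1.0, bandwidths=[1.0, 0.5])
k_near = se_kernel([0.0, 0.0], [0.1, 0.1], **theta)
k_far = se_kernel([0.0, 0.0], [2.0, 2.0], **theta)
```

A very large bandwidth σℓ damps the ℓ-th term in the exponent toward zero, so that input becomes nearly insignificant, matching the discussion above.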
We consider the following generalized anisotropic covariance function K(x, x′; θ) with a hyperparameter vector θ := (σf², σ1, · · · , σι, σt)T ∈ Rι+2:

K(x, x'; \theta) = \sigma_f^2 \exp\left( -\sum_{\ell=1}^{\iota} \frac{(s_\ell - s'_\ell)^2}{2\sigma_\ell^2} \right) \exp\left( -\frac{(t - t')^2}{2\sigma_t^2} \right), \quad (1.3)

where s, s′ ∈ S ⊂ Rι and t, t′ ∈ R≥0. {σ1, · · · , σι} and σt are kernel bandwidths for space and time, respectively. (1.3) shows that points close in the measurement space and in time are strongly correlated and produce similar values. In reality, the larger the temporal distance between two measurements, the less correlated they become, which strongly supports our generalized covariance function in (1.3). This may also justify the truncation (or windowing) of the observed time series data to limit the size of the covariance matrix for reducing the computational cost. A spatially isotropic version of the covariance function in (1.3) has been used in [62].

1.2.3 Gaussian Markov random field

In this section, we introduce Gaussian Markov random fields (GMRFs). An undirected graph G = (V, E) is a tuple consisting of a set of vertices V := {1, · · · , n} and a set of edges E ⊂ V × V. The neighbors of i ∈ V in G are denoted by Ni := {j ∈ V | {i, j} ∈ E}. The GMRF is formally defined as follows [88].

Definition 1.2.2. (GMRF, [88, Definition 2.1]) A random vector z = (z[1], · · · , z[n])T ∈ Rn is called a GMRF with respect to a graph G = (V, E) with mean µ and precision matrix Q ≻ 0, if and only if its density has the form

P(z) = \frac{|Q|^{1/2}}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} (z - \mu)^T Q (z - \mu) \right),

and (Q)ij ≠ 0 ⇔ {i, j} ∈ E for all i ≠ j, where the precision matrix (or information matrix) Q = K−1 is the inverse of the covariance matrix K.

1.3 Mobile Sensor Network

In this section, we consider the sensor network formed by multiple mobile sensing agents and present the measurement model used throughout the thesis. Let N be the number of sensing agents distributed over the surveillance region S ⊂ Rι.
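To make Definition 1.2.2 concrete, the sketch below evaluates the GMRF log-density directly from a precision matrix. The 3-node chain graph and its precision matrix are assumed examples for illustration, not fields used in later chapters:

```python
import numpy as np

def gmrf_logpdf(z, mu, Q):
    """Log of the GMRF density in Definition 1.2.2:
    log P(z) = (1/2) log|Q| - (n/2) log(2*pi) - (1/2)(z - mu)^T Q (z - mu).
    """
    z = np.asarray(z, dtype=float)
    r = z - np.asarray(mu, dtype=float)
    sign, logdet = np.linalg.slogdet(Q)
    assert sign > 0.0, "the precision matrix Q must be positive definite"
    return 0.5 * logdet - 0.5 * len(z) * np.log(2.0 * np.pi) - 0.5 * r @ Q @ r

# A GMRF on a 3-node chain graph: (Q)ij is nonzero only for neighbors,
# so z[1] and z[3] are conditionally independent given z[2].
Q_chain = np.array([[2.0, -1.0, 0.0],
                    [-1.0, 2.0, -1.0],
                    [0.0, -1.0, 2.0]])
logp = gmrf_logpdf([0.1, -0.2, 0.3], np.zeros(3), Q_chain)
```

Working with Q rather than K = Q⁻¹ is what makes GMRFs attractive computationally: the sparsity pattern of Q is exactly the graph's edge set.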
The identity of each agent is indexed by I := {1, 2, · · · , N}. Assume that all agents are equipped with identical sensors and take noisy observations at time t ∈ Z>0. At time t, sensing agent i takes a noise-corrupted measurement y[i](t) at its current location s[i](t) ∈ S, i.e.,

y^{[i]}(t) = z(s^{[i]}(t), t) + \epsilon^{[i]}(t), \qquad \epsilon^{[i]}(t) \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),

where the sensor noise ε[i] is considered to be an independent and identically distributed Gaussian random variable. σ² > 0 is the noise level, and we define the signal-to-noise ratio as γ = σf²/σ². Notice that when a static field is considered, we have z(s, t) = z(s).

For notational simplicity, we denote the collection of the positions of all N agents at time t by st, i.e.,

s_t := \left( s^{[1]}(t)^T, \cdots, s^{[N]}(t)^T \right)^T \in \mathcal{S}^N.

The collective measurement from all N mobile sensors at time t is denoted by

y_t := \left( y^{[1]}(t), \cdots, y^{[N]}(t) \right)^T \in \mathbb{R}^N.

The cumulative measurements from time t ∈ Z>0 to time t′ ∈ Z>0 are denoted by

y_{t:t'} := \left( y_t^T, \cdots, y_{t'}^T \right)^T \in \mathbb{R}^{(t'-t+1)N}.

1.4 Gaussian Processes for Regression

Suppose we have a data set D = {(x[i], y[i]) | i = 1, · · · , n} collected by mobile sensing agents, where x[i] denotes an input vector of dimension b and y[i] denotes a scalar value of the noise-corrupted output. The objective of probabilistic regression is to compute the predictive distribution of the function value z∗ := z(x∗) at some test input x∗. For notational simplicity, we define the collective input matrix x of dimension n × b as the aggregation of n input vectors (i.e., x := (x[1], · · · , x[n])T), and the outputs are collected in a vector y := (y[1], · · · , y[n])T. The corresponding vector of noise-free outputs is defined as z := (z(x[1]), · · · , z(x[n]))T. The advantage of the Gaussian process formulation is that the combination of the prior and noise models can be carried out exactly via matrix operations [111].
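The observation model of Section 1.3 is straightforward to simulate. In this sketch the scalar field, the agent positions, and the noise level are all assumptions chosen for illustration:

```python
import numpy as np

def take_measurements(field, positions, t, sigma2, rng):
    """Collective measurement y_t: y[i](t) = z(s[i](t), t) + eps[i](t),
    with eps[i](t) i.i.d. N(0, sigma2)."""
    z_vals = np.array([field(s, t) for s in positions])
    return z_vals + rng.normal(0.0, np.sqrt(sigma2), size=len(positions))

# An assumed static field z(s, t) = z(s), sampled by N = 3 agents.
field = lambda s, t: np.sin(s[0]) * np.cos(s[1])
s_t = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 1.0])]
y_t = take_measurements(field, s_t, t=1, sigma2=0.01, rng=np.random.default_rng(0))
```

With σf² = 1, the choice σ² = 0.01 above corresponds to a signal-to-noise ratio γ = σf²/σ² = 100.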
The idea of Gaussian process regression is to place a GP prior directly on the space of functions without parameterizing the function z(·), i.e., P(z|θ) = N(µ, K), where µ ∈ Rn is the mean vector obtained by µi = µ(x[i]), and K := Cov(z, z|θ) ∈ Rn×n is the covariance matrix obtained by (K)ij = K(x[i], x[j]; θ). Notice that the GP model and all expressions are always conditional on the corresponding inputs. In the following, we will neglect the explicit conditioning on the input matrix x.

The inference in the Gaussian process model is as follows. First, we assume a joint GP prior P(z, z∗|θ) over functions, i.e.,

P(z, z_*|\theta) = \mathcal{N}\left( \begin{bmatrix} \mu \\ \mu(x_*) \end{bmatrix}, \begin{bmatrix} K & k_* \\ k_*^T & K(x_*, x_*; \theta) \end{bmatrix} \right), \quad (1.4)

where k∗ := Cov(z, z∗|θ) ∈ Rn is the covariance between z and z∗ obtained by (k∗)i = K(x[i], x∗; θ). Then, the joint posterior is obtained using Bayes' rule, i.e.,

P(z, z_*|\theta, y) = \frac{P(y|z)\, P(z, z_*|\theta)}{P(y|\theta)},

where we have used P(y|z, z∗) = P(y|z). Finally, the desired predictive distribution P(z∗|θ, y) is obtained by marginalizing out the latent variables z, i.e.,

P(z_*|\theta, y) = \int P(z, z_*|\theta, y)\, dz = \frac{1}{P(y|\theta)} \int P(y|z)\, P(z, z_*|\theta)\, dz. \quad (1.5)

Since we have the joint Gaussian prior given in (1.4) and y|z ∼ N(z, σ²I), the integral in (1.5) can be evaluated in closed form, and the predictive distribution turns out to be Gaussian, i.e.,

z_*|\theta, y \sim \mathcal{N}\left( \mu_{z_*|\theta,y},\ \sigma^2_{z_*|\theta,y} \right), \quad (1.6)

where

\mu_{z_*|\theta,y} = \mu(x_*) + k_*^T (K + \sigma^2 I)^{-1} (y - \mu), \quad (1.7)

and

\sigma^2_{z_*|\theta,y} = K(x_*, x_*; \theta) - k_*^T (K + \sigma^2 I)^{-1} k_*. \quad (1.8)

For notational simplicity, we define the covariance matrix of the noisy observations as C := Cov(y, y|θ) = K + σ²I.

Chapter 2

Gaussian process regression under localization uncertainty

Abstract: In this chapter, we formulate Gaussian process regression with observations under localization uncertainty due to resource-constrained mobile sensor networks.
In our formulation, the effects of observations, measurement noise, localization uncertainty, and prior distributions are all correctly incorporated into the posterior predictive statistics. The analytically intractable posterior predictive statistics are approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. These approximation techniques have been carefully tailored to our problems, and their approximation error and complexity are analyzed. A simulation study demonstrates that the proposed approaches perform significantly better than approaches that do not properly account for localization uncertainty. Finally, we have applied the proposed approaches to experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool, in order to provide proof-of-concept tests and evaluate the proposed schemes in real situations. In both simulation and experimental results, the proposed methods outperform the quick-and-dirty solutions often used in practice.

2.1 Introduction

Recently, there has been a growing interest in wireless sensor networks due to advanced embedded network technology. Their applications include, but are not limited to, environment monitoring, building comfort control, traffic control, manufacturing and plant automation, and surveillance systems [17, 25]. Mobility in a sensor network can increase its sensing coverage both in space and time, as well as its robustness against uncertainties in environments. Mobile sensor networks have been increasingly exploited to collect spatio-temporal data from the environment [10, 13, 64]. Gaussian process regression (or Kriging in geostatistics) has been widely used to draw statistical inference from geostatistical and environmental data [14, 84]. Gaussian process modeling enables us to predict physical values, such as temperature or harmful algal bloom biomass, at any point and time with a predicted uncertainty level.
For example, near-optimal static sensor placements with a mutual information criterion in Gaussian processes were proposed in [56, 57]. A distributed Kriged Kalman filter for spatial estimation based on mobile sensor networks was developed in [13]. Centralized and distributed navigation strategies for mobile sensor networks to move in order to reduce prediction error variances at points of interest were developed in [118]. Localization in sensor networks is a fundamental problem in various applications, since the spatially collected data must be correctly fused to estimate the process of interest. However, obtaining precise localization of robotic networks under limited resources is very challenging [37, 38]. Among the localization schemes, the range-based approach [50, 78] provides higher precision than the range-free approach, which can be more cost-effective. The global positioning system (GPS) has become one of the major absolute positioning systems in robotics and mobile sensor networks. However, most affordable GPS units update their measurements slowly and have a resolution worse than one meter. A GPS is often augmented by an inertial navigation system (INS) for better resolution [102]. In practice, resource-constrained sensor network systems are prone to large localization uncertainty. Most previous works on Gaussian process regression for mobile sensor networks [14, 57, 113, 118] have assumed that the exact sampling positions are available, which is not practical in real situations. Therefore, motivated by the aforementioned issues, we consider the correct (Bayesian) integration of uncertainties in sampling positions and measurement noise for Gaussian process regression, along with its computation error and complexity analysis for sensor network applications. The overall picture of our work is similar to the one in [74], in which an extended Kalman filter (EKF) was used to incorporate robot localization uncertainty and field parameter uncertainty.
However, [74] relies on a parametric model, namely a radial basis function network model with an EKF, while our motivation is to use a more flexible non-parametric approach, viz., Gaussian process regression taking into account localization uncertainty in a Bayesian framework. Gaussian process regression in [29] integrates uncertainty in the test input position for multiple-step-ahead time series forecasting. In [29], however, uncertainty was not considered in the sampling positions of the training data (or observations). The effect of localization uncertainty is potentially significant. For example, Fig. 2.1 shows the effect of noisy sampling positions on the results of Gaussian process regression. Note that adding noise to the sampling positions considerably increases the empirical RMS error, as shown in the third row of Fig. 2.1. A Gaussian approximation to the intractable posterior predictive statistics obtained in [29] has been utilized for predictive control with Gaussian models [54, 73] and Gaussian process dynamic programming [18]. In general, the length of the training data is much longer

Figure 2.1: For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The prediction results of applying Gaussian process regression on the true and noisy sampling positions are shown. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error (between predicted and true fields) fields. The first column shows the result of applying Gaussian process regression on the true sampling positions.
Second and third columns show the results of applying Gaussian process regression on the noisy sampling positions with 0.1 and 0.4 noise covariance matrices, respectively. True and noisy sampling positions are shown as circles with agent indices in the second row.

than that of the test point in sensor network applications; therefore, our problem may involve a high-dimensional uncertainty vector for the sampling positions. Gaussian process prediction with localization uncertainty can be obtained as a posterior predictive distribution using Bayes' rule. The main difficulty is that this distribution has no closed-form analytic solution and has to be approximated either through Monte Carlo sampling [75] or through other approximation techniques such as variational inference [108]. As an important analytical approximation technique, Laplace's method has been known to be useful in many such situations [100, 101]. Different Laplace approximation techniques have been analyzed in terms of approximation error and computational complexity [68, 69, 100, 101]. The contribution of this chapter is as follows. First, we formulate Gaussian process regression with observations under localization uncertainty due to resource-constrained (possibly mobile) sensor networks. Next, approximations are obtained for the analytically intractable predictive mean and predictive variance using Monte Carlo sampling and Laplace's method. These approximation methods have been carefully tailored to our problems. In particular, a modified version of the moment generating function (MGF) approximation [101] has been developed as a part of Laplace's method to reduce the computational complexity. In addition, we have analyzed and compared the approximation error and complexity so that one can choose a tradeoff between the performance requirements and constrained resources for a particular sensor network application.
Another important contribution has been to provide proof-of-concept tests and evaluate the proposed schemes in real situations. We have applied the proposed approaches to experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool. The remainder of this chapter is organized as follows. In Section 2.2, we briefly review Gaussian process regression, Monte Carlo sampling, and Laplace's method. In Section 2.3, Gaussian process prediction in the presence of localization uncertainty is formulated for a proposed sampling scheme as a Bayesian inference problem. We first present the Monte Carlo estimators for the posterior predictive mean and variance in Section 2.4. In Section 2.5, using Laplace's method, we provide approximations for the posterior predictive statistics. Section 2.6 compares the computational cost and approximation accuracy of the proposed estimators based on Monte Carlo sampling and Laplace's method. Finally, the simulation and experimental results are provided in Sections 2.7 and 2.8, respectively.

2.2 Preliminaries

In this section, we review the spatio-temporal Gaussian process, Monte Carlo sampling, and Laplace's method, which will be used throughout the chapter.

2.2.1 Gaussian Process Regression

A Gaussian process defines a distribution over a space of functions and is completely specified by its mean function and covariance function. Let $x \in \mathcal{X} := \mathcal{S} \times \mathcal{T} \subset \mathbb{R}^b$ denote the index vector, where $x := [s^T\ t]^T$ contains the sampling position $s \in \mathcal{S} \subset \mathbb{R}^{\iota}$ and the sampling time $t \in \mathcal{T} \subset \mathbb{R}_{\geq 0}$. A Gaussian process $z(x) \in \mathbb{R}$ is formally defined as below.

Definition 2.2.1. A Gaussian process [84] is a collection of random variables, any finite number of which have a joint Gaussian distribution.

For notational simplicity, we consider a zero-mean Gaussian process¹ $z(x) \sim \mathcal{GP}(0, \mathcal{K}(x, x')) \in \mathbb{R}$.
For example, one may consider a covariance function defined as

$$\mathcal{K}(x, x') = \sigma_f^2 \exp\left( -\frac{\|x - x'\|^2}{2\sigma_0^2} \right), \qquad (2.1)$$

where $x, x' \in \mathbb{R}^b$.

¹A Gaussian process with a nonzero mean can be treated by a change of variables. Even without a change of variables, this is not a drastic limitation, since the mean of the posterior process is not confined to zero [84].

In general, the mean and the covariance functions of a Gaussian process can be estimated a priori by maximizing the likelihood function [113]. Suppose we have $N$ noise-corrupted observations without localization error, $\mathcal{D} = \{(x^{[i]}, y^{[i]}) \mid i = 1, \cdots, N\}$. Assume that $y^{[i]} = z^{[i]} + \epsilon^{[i]} \in \mathbb{R}$, where $\epsilon^{[i]}$ is an independent and identically distributed (i.i.d.) white Gaussian noise with variance $\sigma^2$. $x$ is defined by $x = \mathrm{col}(x^{[1]}, x^{[2]}, \cdots, x^{[N]}) \in \mathbb{R}^{bN}$. The collections of the realizations $z = [z^{[1]} \cdots z^{[N]}]^T \in \mathbb{R}^N$ and the observations $y = [y^{[1]} \cdots y^{[N]}]^T \in \mathbb{R}^N$ have the Gaussian distributions $z \sim \mathcal{N}(0, K(x))$ and $y \sim \mathcal{N}(0, K(x) + \sigma^2 I)$, where $K(x) \in \mathbb{R}^{N \times N}$ is the covariance matrix of $z$ obtained by $(K(x))_{ij} = \mathcal{K}(x^{[i]}, x^{[j]})$, and $I$ is the identity matrix of appropriate size. We can predict the value $z_*$ of the Gaussian process at a point $x_*$ as [84]

$$z_*|\mathcal{D} \sim \mathcal{N}\left(\mu_*(x), \sigma_*^2(x)\right). \qquad (2.2)$$

In (2.2), the predictive mean is

$$\mu_*(x) = \mathrm{E}(z_*|\mathcal{D}) = k(x)^T (K(x) + \sigma^2 I)^{-1} y, \qquad (2.3)$$

and the predictive variance is given by

$$\sigma_*^2(x) = \mathrm{Var}(z_*|\mathcal{D}) = \sigma_f^2 - k(x)^T (K(x) + \sigma^2 I)^{-1} k(x), \qquad (2.4)$$

where $k(x) \in \mathbb{R}^N$ is the covariance between $z$ and $z_*$ obtained by $k_j(x) = \mathcal{K}(x^{[j]}, x_*)$, and $\sigma_f^2 = \mathcal{K}(x_*, x_*) \in \mathbb{R}$ is the variance at $x_*$. (2.3) and (2.4) follow from the fact that

$$\mathrm{col}(z_*, y)\,|\,x_*, x \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma_f^2 & k(x)^T \\ k(x) & K(x) + \sigma^2 I \end{bmatrix}\right). \qquad (2.5)$$

Note that the predictive mean in (2.3) and its prediction error variance in (2.4) require the inversion of the covariance matrix whose size depends on the number of observations $N$.
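As a concrete illustration, the predictive equations (2.3) and (2.4) take only a few lines to implement. The following sketch uses the covariance function (2.1); the data values and hyperparameters below are hypothetical, chosen only for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_0=1.0):
    """Squared-exponential covariance (2.1) between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * sigma_0 ** 2))

def gp_predict(x, y, x_star, sigma=0.1, sigma_f=1.0, sigma_0=1.0):
    """Predictive mean (2.3) and variance (2.4) at the test input x_star."""
    K = sq_exp_kernel(x, x, sigma_f, sigma_0)
    k = sq_exp_kernel(x, x_star[None, :], sigma_f, sigma_0)[:, 0]
    C = K + sigma ** 2 * np.eye(len(y))             # C = K + sigma^2 I
    mu = k @ np.linalg.solve(C, y)                  # Eq. (2.3)
    var = sigma_f ** 2 - k @ np.linalg.solve(C, k)  # Eq. (2.4)
    return mu, var

x = np.array([[0.0], [1.0], [2.0]])                 # hypothetical sampling positions
y = np.array([0.5, -0.2, 0.3])                      # hypothetical observations
mu, var = gp_predict(x, y, np.array([0.0]), sigma=1e-4)
```

With near-zero measurement noise the predictor interpolates the data, so the mean at a training input recovers its observation and the variance there collapses toward zero.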
Hence, a drawback of Gaussian process regression is that its computational complexity and memory requirement increase as more measurements are collected, making the method prohibitive for agents with limited memory and computing power. To overcome this increase in complexity, a number of approximation methods for Gaussian process regression have been proposed. In particular, the sparse greedy approximation method [95], the Nyström method [112], the informative vector machine [58], the likelihood approximation [93], and the Bayesian committee machine [104] have been shown to be effective for many problems. It has also been proposed that spatio-temporal Gaussian process regression can be applied to truncated observations, including only measurements near the target position and time of interest, for agents with limited resources [118]. To justify prediction based on only the most recent observations, a similar argument has been made in [6] in the sense that data from the remote past do not change the predictors significantly under exponentially decaying correlation functions. In this chapter, we also assume that at each iteration the mobile sensor network only needs to fuse a fixed number of truncated observations, which are near the target points of interest, to limit the computational resources.

2.2.2 Monte Carlo and Importance Sampling

In what follows, we briefly review Monte Carlo and importance sampling based on [2]. The idea of Monte Carlo simulation is to draw an i.i.d. set of random samples $\{x^{(i)}\}_{i=1}^r$ from a target density $\mathrm{P}(x)$, where $x \in \mathcal{X}^N = \mathcal{R}$. These $r$ random samples are used to approximate $\mathrm{P}(x)$ by an empirical point-mass function $\mathrm{P}_r(x) = \frac{1}{r}\sum_{i=1}^r \delta(x^{(i)} - x)$, where $\delta(\cdot)$ denotes the Dirac delta. Consequently, the integral $I(f)$ can be approximated by a tractable sum $I_r(f)$ that converges as follows:

$$I_r(f) = \frac{1}{r}\sum_{i=1}^r f(x^{(i)}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} I(f) = \int_{\mathcal{R}} f(x)\,\mathrm{P}(x)\,dx.$$

$I_r(f)$ is an unbiased estimator.
In addition, it almost surely converges to $I(f)$ asymptotically, which can be proved by the strong law of large numbers. Considering $f: \mathcal{R} \to \mathbb{R}$ for simplicity, if $f(\cdot)$ satisfies $\sigma_e^2 = \mathrm{E}(f^2(x)) - I^2(f) < \infty$, then the variance of the estimator $I_r(f)$ converges to $\frac{\sigma_e^2}{r}$ as $r$ increases. In particular, the central limit theorem provides convergence in distribution as follows: $\sqrt{r}\left(I_r(f) - I(f)\right) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}(0, \sigma_e^2)$, where $\Rightarrow$ denotes convergence in distribution.

Importance sampling is a special case of Monte Carlo implementation, in which random samples are generated from an available distribution rather than from the distribution of interest. Let us introduce an arbitrary importance proposal distribution $\rho(x)$ whose support includes the support of $\mathrm{P}(x)$. We can then rewrite $I(f)$ as $I(f) = \int_{\mathcal{R}} f(x)\,\omega(x)\,\rho(x)\,dx$, where $\omega(x) = \frac{\mathrm{P}(x)}{\rho(x)}$ is known as the importance weight. Simulating $r$ i.i.d. samples $\{x^{(i)}\}_{i=1}^r$ according to $\rho(x)$ and evaluating $\omega(x^{(i)})$, a possible Monte Carlo estimate of $I(f)$ is given by $\hat{I}_r(f) = \sum_{i=1}^r f(x^{(i)})\,\omega(x^{(i)}) \big/ \sum_{i=1}^r \omega(x^{(i)})$. Under weak assumptions, the strong law of large numbers applies, yielding $\hat{I}_r(f) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} I(f)$. This integration method can also be interpreted as a sampling method where the posterior density $\mathrm{P}(x)$ is approximated by $\hat{\mathrm{P}}_r(x) = \sum_{i=1}^r \omega(x^{(i)})\,\delta(x - x^{(i)}) \big/ \sum_{i=1}^r \omega(x^{(i)})$, and $\hat{I}_r(f)$ is the integration of $f(x)$ with respect to the empirical measure $\hat{\mathrm{P}}_r(x)$. In this chapter, we shall compute the ratio of two integrals in the form of

$$G(f) = \frac{\int_{\mathcal{R}} f(x)\,\omega(x)\,\rho(x)\,dx}{\int_{\mathcal{R}} \omega(x)\,\rho(x)\,dx}, \qquad (2.6)$$

where $\rho(x)$, $f(x)$, and $\omega(x)$ are defined on a space $\mathcal{R}$. To compute $G(f)$ in (2.6), we use the following approximation, as proposed in [28]:

$$\hat{G}_r(f) = \frac{\sum_{i=1}^r f(x^{(i)})\,\omega(x^{(i)})}{\sum_{i=1}^r \omega(x^{(i)})}, \qquad (2.7)$$

where $\{x^{(i)} \mid i = 1, \cdots, r\}$ is a sequence of i.i.d. random vectors drawn from the distribution $\rho(x)$.
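A minimal numeric sketch of the self-normalized estimator (2.7): the target here is a standard normal, the proposal is $\mathcal{N}(0, 2^2)$, and the weights are the unnormalized ratio $\mathrm{P}(x)/\rho(x)$; all of these choices are illustrative assumptions. The quantity estimated is $\mathrm{E}(x^2) = 1$ under the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def snis(f, weight, proposal_sampler, r=200_000):
    """Self-normalized importance sampling, Eq. (2.7):
    sum_i f(x_i) w(x_i) / sum_i w(x_i), with x_i drawn i.i.d. from rho."""
    xs = proposal_sampler(r)
    w = weight(xs)
    return np.sum(f(xs) * w) / np.sum(w)

# Target P(x) ∝ exp(-x^2/2); proposal rho = N(0, 2^2) with density ∝ exp(-x^2/8).
weight = lambda x: np.exp(-x ** 2 / 2.0 + x ** 2 / 8.0)   # unnormalized P/rho
est = snis(lambda x: x ** 2, weight, lambda r: rng.normal(0.0, 2.0, r))
```

Note that the normalizing constants cancel in the ratio, which is why unnormalized weights suffice; this is the same property exploited later in this chapter, where $\mathrm{P}(y|x)$ plays the role of the weight.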
In the following theorem, which can be shown to be a special case of the results from [28], we state the convergence properties of the approximation in (2.7) as functions of $r$, the number of random samples.

Theorem 2.2.2. (Theorems 1 and 2 from [28]) Consider the approximation $\hat{G}_r(f)$ given in (2.7) to the ratio $G(f)$ given in (2.6). If $\omega(x)\rho(x)$ is proportional to a proper probability density function defined on $\mathcal{R}$, and $G(f)$, $\int_{\mathcal{R}} \omega(x)\rho(x)\,dx$, and $\int_{\mathcal{R}} f^2(x)\,\omega^2(x)\,\rho(x)\,dx$ are finite, we have

$$\hat{G}_r(f) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} G(f), \qquad \text{and} \qquad \sqrt{r}\left(\hat{G}_r(f) - G(f)\right) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}(0, \sigma_g^2),$$

where

$$\sigma_g^2 = \int_{\mathcal{R}} \left(f(x) - G(f)\right)^2 \omega^2(x)\,\rho(x)\,dx.$$

Proof. It is a straightforward application of the results from Theorems 1 and 2 in [28] to (2.6) and (2.7).

2.2.3 Laplace Approximations

The Laplace method is a family of asymptotic approximations of an integral of a function, i.e., $\int_{\mathcal{R}} f(x)\,dx$, where $x \in \mathcal{R} \subset \mathbb{R}^m$ and $m = N \times b$. Let the function $f(x)$ be of the form $f(x) = e^{-nh(x)}$, where $n \in \mathbb{Z}_{>0}$ and $h: \mathcal{R} \to \mathbb{R}$ is a twice continuously differentiable real function on $\mathcal{R}$. Let $\hat{x}$ denote the exact mode of $-h$, i.e.,

$$\hat{x} = \arg\max_{x \in \mathcal{R}}\ {-h(x)}.$$

Then Laplace's method produces the approximation [100]:

$$\int_{\mathcal{R}} f(x)\,dx = \left(\frac{2\pi}{n}\right)^{\frac{m}{2}} |\phi(\hat{x})|^{\frac{1}{2}}\, e^{-nh(\hat{x})} + O(n^{-1}), \qquad (2.8)$$

where $\phi(\hat{x}) = [\nabla^2 h(\hat{x})]^{-1}$. The Laplace approximation in (2.8) produces reasonable results as long as $-h$ is unimodal or at least dominated by a single mode. In practice, it might be difficult to find the exact mode of $-h$. The concept of an asymptotic mode is introduced to gauge the approximation error when the exact mode is not used [68].

Definition 2.2.3. $\breve{x}$ is called an asymptotic mode of order $O(n^{-k})$ for $-h$ if $\breve{x} - \hat{x} \to 0$ as $n \to \infty$ and $\nabla h(\breve{x}) = O(n^{-k})$.

Suppose that $\breve{x}$ is an asymptotic mode of order $O(n^{-1})$ for $-h$ and $\{h, \breve{x}\}$ satisfies the regularity conditions.
Then it follows that we have

$$\int_{\mathcal{R}} f(x)\,dx = \left(\frac{2\pi}{n}\right)^{\frac{m}{2}} |\phi(\breve{x})|^{\frac{1}{2}}\, e^{-nh(\breve{x})}\, C_h(\breve{x}) + O(n^{-1}), \qquad (2.9)$$

where $C_h(\breve{x})$ is given by

$$C_h(\breve{x}) = e^{\frac{n}{2}\,\nabla h(\breve{x})^T \phi(\breve{x})\, \nabla h(\breve{x})}.$$

A more precise form, with an asymptotic mode of order $O(n^{-2})$, is computed for an approximation of order $O(n^{-3})$ in [69]. In many Bayesian inference applications, as in our problem, we need to compute the ratio of two integrals. To this end, fully exponential Laplace approximations have been developed by [100] to compute Laplace approximations of the ratio of two integrals, i.e.,

$$M = \frac{\int_{\mathcal{R}} e^{-nu(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx}. \qquad (2.10)$$

If each of $-u$ and $-h$ has a dominant peak at its maximum, then Laplace's method may be applied directly to the numerator and denominator of (2.10) separately. If the regularity conditions are satisfied, using (2.8) for the denominator approximation and (2.9) for the numerator approximation, Miyata [68] obtained the following approximation and its error order:

$$\hat{M} = \frac{\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}}\, e^{nh(\hat{x})}}{\left|\nabla^2 u(\breve{x})\right|^{\frac{1}{2}}\, e^{nu(\breve{x})}} \times C_u(\breve{x}), \qquad \hat{M} = M + O(n^{-2}), \qquad (2.11)$$

where $\hat{x}$ is the exact mode of $-h$, $\breve{x}$ is the asymptotic mode of $-u$, and

$$C_u(\breve{x}) = e^{\frac{n}{2}\,\nabla u(\breve{x})^T [\nabla^2 u(\breve{x})]^{-1}\, \nabla u(\breve{x})}, \qquad \breve{x} = \hat{x} - [\nabla^2 u(\hat{x})]^{-1}\, \nabla u(\hat{x}). \qquad (2.12)$$

Laplace approximations typically have an error of order $O(n^{-1})$, as shown in (2.8) and (2.9). On the other hand, fully exponential Laplace approximations for the ratio form yield an error of order $O(n^{-2})$, as shown in (2.11), since the error terms of order $O(n^{-1})$ in the numerator and denominator cancel each other [100]. The fully exponential Laplace approximation presented in (2.11) is limited to positive functions. Hence, Tierney et al. [101] proposed a second-order approximation to the expectation of a general function $g(x)$ (not necessarily positive) by applying the fully exponential method to approximate $M(\tau) = \mathrm{E}(e^{\tau g(x)})$ and then differentiating the approximated $M(\tau)$.
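As a quick numerical check of the basic Laplace approximation (2.8) in one dimension ($m = 1$): with the illustrative choice $h(x) = \cosh(x) - 1$, the mode of $-h$ is $\hat{x} = 0$ and $\nabla^2 h(0) = 1$, so the approximation is $\sqrt{2\pi/n}$, and its relative error shrinks like $O(n^{-1})$ (the leading correction here is $1/(8n)$).

```python
import numpy as np

def laplace_integral(h, hess, x_hat, n, m=1):
    """Laplace approximation (2.8) of the integral of exp(-n h(x)) about the
    mode x_hat: (2*pi/n)^{m/2} |phi|^{1/2} exp(-n h(x_hat)),
    with phi = [second derivative of h at x_hat]^{-1} (scalar case)."""
    phi = 1.0 / hess(x_hat)
    return (2.0 * np.pi / n) ** (m / 2) * np.sqrt(abs(phi)) * np.exp(-n * h(x_hat))

h = lambda x: np.cosh(x) - 1.0          # illustrative h with mode at 0
n = 50
approx = laplace_integral(h, lambda x: np.cosh(x), 0.0, n)

# Dense quadrature as ground truth (the integrand is negligible beyond |x| = 3).
xs = np.linspace(-3.0, 3.0, 200_001)
exact = np.exp(-n * h(xs)).sum() * (xs[1] - xs[0])
rel_err = abs(approx / exact - 1.0)
```

At $n = 50$ the relative error is a fraction of a percent, consistent with the $O(n^{-1})$ rate claimed for (2.8).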
Consider $M(\tau)$ defined as follows:

$$M(\tau) = \frac{\int_{\mathcal{R}} e^{\tau g(x)}\, e^{-nh(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx} = \frac{\int_{\mathcal{R}} e^{-nu(x)}\,dx}{\int_{\mathcal{R}} e^{-nh(x)}\,dx},$$

where $u(x) = -\frac{\tau}{n}g(x) + h(x)$; here $e^{\tau g(x)}$ is positive, while $g(x)$ may be positive or negative. In particular, $\frac{d}{d\tau}\hat{M}(\tau)$ evaluated at $\tau = 0$ yields a second-order approximation to $\mathrm{E}(g(x))$, with the following error order:

$$\mathrm{E}(g(x)) = \frac{d}{d\tau}M(\tau)\Big|_{\tau=0} = \frac{d}{d\tau}\hat{M}(\tau)\Big|_{\tau=0} + O(n^{-2}). \qquad (2.13)$$

This method, called the moment generating function (MGF) Laplace approximation, gives a standard-form approximation using the exact mode of $-h(x)$ [101]. Miyata [68, 69] extended the MGF method to one that does not require computing the exact mode of $-h(x)$. Let $\hat{x}$ be an asymptotic mode of order $O(n^{-1})$ for $-h(x)$, and suppose that $\{h, \hat{x}\}$ satisfies the regularity conditions for the asymptotic-mode Laplace method. By Theorem 5 in [68], the approximation of $\mathrm{E}(g(x))$ and its error order are given as

$$\hat{\mathrm{E}}(g(x)) = g(\hat{x}) + \frac{1}{2n}\,\mathrm{tr}\left(\nabla^2 g(\hat{x})\,\phi(\hat{x})\right) - \frac{1}{2n}\sum_{i,j,k=1}^{m} \frac{\partial^3 h(\hat{x})}{\partial x_i\,\partial x_j\,\partial x_k}\left(\nabla g(\hat{x})^T \phi_i\right)\phi_{jk} - \nabla g(\hat{x})^T \phi(\hat{x})\,\nabla h(\hat{x}), \qquad (2.14)$$

$$\hat{\mathrm{E}}(g(x)) = \mathrm{E}(g(x)) + O(n^{-2}),$$

where $\phi_{ij}$ denotes the $i$-th row, $j$-th column element, and $\phi_i$ the $i$-th column, of the matrix $\phi(\hat{x})$. Furthermore, if $\hat{x}$ is the exact mode of $-h(x)$, the approximation takes a simpler form, because the terms that include $\nabla h(\hat{x})$ vanish:

$$\hat{\mathrm{E}}(g(x)) = g(\hat{x}) + \frac{1}{2n}\,\mathrm{tr}\left(\nabla^2 g(\hat{x})\,\phi(\hat{x})\right) - \frac{1}{2n}\sum_{i,j,k=1}^{m} \frac{\partial^3 h(\hat{x})}{\partial x_i\,\partial x_j\,\partial x_k}\left(\nabla g(\hat{x})^T \phi_i\right)\phi_{jk}. \qquad (2.15)$$

2.3 The Problem Statement

In practice, $\mathcal{D}$ (data with perfect localization) is not available due to localization uncertainty; instead, the exact sampling points are replaced with noise-corrupted sampling points. To average out measurement and localization noises, in this chapter we propose a sampling scheme in which multiple measurements are taken repeatedly at a set of sampling points of a sensor network.
For robotic sensors or mobile sensor networks, this sampling strategy can be efficient and inexpensive, since the large energy consumption is usually due to the mobility of the sensor network. Let $N$ sensing agents be indexed by $\mathcal{J} = \{1, \cdots, N\}$. From the proposed sampling scheme, we assume that each agent takes multiple data pairs $\{(\bar{x}^{[i]}, y^{[i]}) \mid i \in \mathcal{I}\}$, indexed by $\mathcal{I} = \{1, \cdots, n\}$, at a set of sampling points of the sensor network $\{x^{[j]} \mid j \in \mathcal{J}\}$. We then define the map $\psi: \mathcal{I} \to \mathcal{J}$ that takes the index of a data pair in $\mathcal{I}$ as the input and returns the index of the sensor that produced the data pair as the output. Consider the following realizations using the sampling scheme and the notation just introduced:

$$\bar{x}^{[i]} = x^{(\psi(i))} + v^{[i]} \in \mathbb{R}^b, \qquad y^{[i]} = z^{(\psi(i))} + \epsilon^{[i]} \in \mathbb{R}, \qquad \forall i \in \mathcal{I},$$

where $\epsilon^{[i]}$ is an i.i.d. white Gaussian noise with zero mean and variance $\sigma^2$, i.e., $\epsilon^{[i]} \sim \mathcal{N}(0, \sigma^2)$, and $v^{[i]}$ is a localization error which has a multivariate normal distribution with zero mean and covariance matrix $\Sigma_v \in \mathbb{R}^{b \times b}$, i.e., $v^{[i]} \sim \mathcal{N}(0, \Sigma_v)$. For instance, the distribution of the localization error may result from the fusion of GPS and INS measurements [102], or from landmark observations and the robot's kinematics [24]. To simplify the notation, $\mathcal{D}$ is introduced to denote the data with measurement noise and localization error as follows:

$$\mathcal{D} = \left\{(\bar{x}^{[i]}, y^{[i]}) \mid i \in \mathcal{I}\right\}. \qquad (2.16)$$

We also define the collective sampling point vectors with and without localization uncertainty, and the cumulative localization noise vector, respectively, by

$$x = \mathrm{col}(x^{[1]}, x^{[2]}, \cdots, x^{[N]}) \in \mathbb{R}^{bN}, \qquad \bar{x} = \mathrm{col}(\bar{x}^{[1]}, \bar{x}^{[2]}, \cdots, \bar{x}^{[n]}) \in \mathbb{R}^{bn}, \qquad v = \mathrm{col}(v^{[1]}, v^{[2]}, \cdots, v^{[n]}) \in \mathbb{R}^{bn}. \qquad (2.17)$$

From the proposed sampling scheme, to average out the measurement and localization uncertainties, the number of measurements $n$ can be increased without increasing the number of sensors $N$, and consequently without increasing the dimension of $x \in \mathbb{R}^{bN}$.
Hence, this approach may be efficient for the resource-constrained (mobile) sensor network at the cost of taking more measurements. Using the collective sampling point vectors in (2.17), we have the following relationship:

$$\bar{x} = Lx + v, \qquad (2.18)$$

where $L = \bar{L} \otimes I_b \in \mathbb{R}^{bn \times bN}$, with $\bar{L} \in \mathbb{R}^{n \times N}$ and $\bar{L}_{ij} = 1$ if $\psi(i) = j$ and $\bar{L}_{ij} = 0$ otherwise. The conditional probabilities $\mathrm{P}(\bar{x}|x)$ and $\mathrm{P}(x|\bar{x})$ can be expressed as

$$\mathrm{P}(\bar{x}|x) = \frac{1}{|2\pi\bar{\Sigma}_v|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(\bar{x}-Lx)^T \bar{\Sigma}_v^{-1}(\bar{x}-Lx)}, \qquad \mathrm{P}(x|\bar{x}) = \frac{\mathrm{P}(\bar{x}|x)\,\mathrm{P}(x)}{\int_{\mathcal{R}} \mathrm{P}(\bar{x}|x)\,\mathrm{P}(x)\,dx},$$

where $\bar{\Sigma}_v = I_n \otimes \Sigma_v \in \mathbb{R}^{bn \times bn}$ is the collective localization-noise covariance. From a Bayesian perspective, we can treat $x$ as a random vector and incorporate a prior distribution on it. For example, if we assign a prior distribution $x \sim \mathcal{N}(0, \Sigma_x)$, then we have

$$\mathrm{P}(x|\bar{x}) = \frac{1}{|2\pi\mathring{\Sigma}|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(x - H\bar{x})^T \mathring{\Sigma}^{-1}(x - H\bar{x})}, \qquad (2.19)$$

where $H = \mathring{\Sigma}L^T\bar{\Sigma}_v^{-1}$ and $\mathring{\Sigma}^{-1} = \Sigma_x^{-1} + L^T\bar{\Sigma}_v^{-1}L$.

Evaluating posterior predictive statistics such as the density, the mean, and the variance is of critical importance for sensor network applications. Therefore, given the data $\mathcal{D}$ in (2.16), our goal is to compute the posterior predictive statistics. In particular, we focus on the following two quantities.

• The predictive mean estimator (PME) is given by

$$\mathrm{E}(z_*|\mathcal{D}) = \int_{\mathcal{Z}} z_*\,\mathrm{P}(z_*|\mathcal{D})\,dz_* = \frac{\int_{\mathcal{Z}}\int_{\mathcal{R}} z_*\,\mathrm{P}(z_*|x,y)\,\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx\,dz_*}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx} = \frac{\int_{\mathcal{R}} \mu_*(x)\,\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}, \qquad (2.20)$$

where $\mu_*(x)$ is given by (2.3).

• The predictive variance estimator (PVE) is obtained similarly. From the equation

$$\mathrm{E}(z_*^2|\mathcal{D}) = \frac{\int_{\mathcal{R}} \left(\int_{\mathcal{Z}} z_*^2\,\mathrm{P}(z_*|x,y)\,dz_*\right)\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx},$$

where $\int_{\mathcal{Z}} z_*^2\,\mathrm{P}(z_*|x,y)\,dz_* = \sigma_*^2(x) + \mu_*^2(x)$, we obtain $\mathrm{Var}(z_*|\mathcal{D})$ as

$$\mathrm{Var}(z_*|\mathcal{D}) = \frac{\int_{\mathcal{R}} \left(\sigma_*^2(x) + \mu_*^2(x)\right)\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx}{\int_{\mathcal{R}} \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\,dx} - \mathrm{E}^2(z_*|\mathcal{D}), \qquad (2.21)$$

where $\sigma_*^2(x)$ is given by (2.4).

The main challenge in our problems is that there are no closed-form formulas for the posterior predictive statistics listed in (2.20) and (2.21).
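The structure of (2.19) and (2.20) can be made concrete with a small numeric sketch. Everything below is hypothetical (one-dimensional positions, $N = n = 2$, $\psi$ the identity so $L = I$): positions are drawn from the Gaussian posterior (2.19), each draw is weighted by the marginal likelihood $\mathrm{P}(y|x)$, and the weighted average of the predictive means (2.3) approximates the ratio of integrals in (2.20).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D setting with N = n = 2 and psi the identity, so L = I.
kern = lambda a, b: 2.0 * np.exp(-np.subtract.outer(a, b) ** 2 / 2.0)
y = np.array([1.0, -0.5]); sigma2 = 0.01        # observations, noise variance
x_bar = np.array([0.0, 2.0])                    # measured (noisy) positions
Sigma_x = 25.0 * np.eye(2)                      # prior covariance of x
Sigma_v = 0.1 * np.eye(2)                       # localization-noise covariance

# Position posterior (2.19): x | x_bar ~ N(H x_bar, Sigma_ring), with L = I.
Sigma_ring = np.linalg.inv(np.linalg.inv(Sigma_x) + np.linalg.inv(Sigma_v))
H = Sigma_ring @ np.linalg.inv(Sigma_v)
chol = np.linalg.cholesky(Sigma_ring)

def mu_star(x, x_star=1.0):
    """GP predictive mean (2.3) for the sampling-position vector x."""
    C = kern(x, x) + sigma2 * np.eye(2)
    return kern(np.atleast_1d(x_star), x)[0] @ np.linalg.solve(C, y)

def lik(x):
    """Marginal likelihood P(y|x) up to a constant: y ~ N(0, K(x) + sigma^2 I)."""
    C = kern(x, x) + sigma2 * np.eye(2)
    return np.exp(-0.5 * y @ np.linalg.solve(C, y)) / np.sqrt(np.linalg.det(C))

xs = H @ x_bar + (chol @ rng.standard_normal((2, 5000))).T  # draws from (2.19)
w = np.array([lik(x) for x in xs])
pme = np.sum(w * np.array([mu_star(x) for x in xs])) / np.sum(w)
```

This is exactly the self-normalized ratio form (2.6)–(2.7) with $\rho = \mathrm{P}(x|\bar{x})$, $\omega = \mathrm{P}(y|x)$, and $f = \mu_*$.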
Therefore, in this chapter, approximation techniques are carefully applied to obtain approximate solutions, and the tradeoffs between computational complexity and precision are investigated for sensor networks with limited resources. From (2.19), one might be tempted to use the best estimate of $x$, e.g., the conditional expectation $\mathrm{E}(x|\bar{x})$ of $x$ given the measured locations $\bar{x}$, directly in Gaussian process regression. Comparisons between this type of quick-and-dirty solution and the proposed Bayesian approaches are evaluated and discussed with simulated and experimental data in Sections 2.7 and 2.8, respectively.

2.4 The Monte Carlo Method

In this section, we propose importance sampling to compute the posterior predictive statistics. We also summarize the convergence results of the MC estimators based on Theorem 2.2.2.

2.4.1 Predictive mean

The predictive mean estimator is given by (2.20). Using importance sampling to approximate (2.20) leads to

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\,\mu_*(x^{(i)})}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})}. \qquad (2.22)$$

Theorem 2.2.2 and (2.22) lead to

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} \mathrm{E}(z_*|\mathcal{D}), \qquad \hat{\mathrm{E}}(z_*|\mathcal{D}) - \mathrm{E}(z_*|\mathcal{D}) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}\!\left(0, \frac{\sigma_m^2}{r}\right),$$

where $\sigma_m^2 = \int_{\mathcal{R}} \left(\mu_*(x) - \mathrm{E}(z_*|\mathcal{D})\right)^2 \mathrm{P}^2(y|x)\,\mathrm{P}(x|\bar{x})\,dx$, $\mu_*(x)$ is calculated from (2.3), and $x^{(i)}$ is sampled from $\mathrm{P}(x|\bar{x})$ given by (2.19).

2.4.2 Predictive variance

For the prediction error variance given by (2.21), we have $\mathrm{Var}(z_*|\mathcal{D}) = \mathrm{E}(\sigma_*^2(x)\,|\,\mathcal{D}) + \mathrm{Var}(\mu_*(x)\,|\,\mathcal{D})$. Thus, we propose the following estimator:

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\,\sigma_*^2(x^{(i)})}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})} + \frac{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})\left(\mu_*(x^{(i)}) - \hat{\mathrm{E}}(z_*|\mathcal{D})\right)^2}{\sum_{i=1}^{r} \mathrm{P}(y|x^{(i)})}, \qquad (2.23)$$

where $\sigma_*^2(x^{(i)}) = \mathrm{Var}(z_*|x^{(i)}, y)$ and $\mu_*(x^{(i)}) = \mathrm{E}(z_*|x^{(i)}, y)$ are obtained from formulas similar to (2.4) and (2.3), and $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is given by (2.22). Applying Theorem 2.2.2 with $f(x) = \sigma_*^2(x) + (\mu_*(x) - \hat{\mathrm{E}}(z_*|\mathcal{D}))^2$, $\omega(x) = \mathrm{P}(y|x)$, and $\rho(x) = \mathrm{P}(x|\bar{x})$, we obtain the following results.
$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) \overset{\text{a.s.}}{\underset{r\to\infty}{\longrightarrow}} \overline{\mathrm{Var}}(z_*|\mathcal{D}),$$

where

$$\overline{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{E}(z_*^2|\mathcal{D}) - 2\,\mathrm{E}(z_*|\mathcal{D})\,\hat{\mathrm{E}}(z_*|\mathcal{D}) + \hat{\mathrm{E}}^2(z_*|\mathcal{D}),$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) - \overline{\mathrm{Var}}(z_*|\mathcal{D}) \underset{r\to\infty}{\Longrightarrow} \mathcal{N}\left(0, \sigma_z^2\, r^{-1}\right),$$

and

$$\sigma_z^2 = \int_{\mathcal{R}} \left(\sigma_*^2(x) + (\mu_*(x) - \hat{\mathrm{E}}(z_*|\mathcal{D}))^2 - \overline{\mathrm{Var}}(z_*|\mathcal{D})\right)^2 \mathrm{P}^2(y|x)\,\mathrm{P}(x|\bar{x})\,dx.$$

Remark 2.4.1. Since $\mathrm{E}(z_*|\mathcal{D})$ is not available in (2.23), $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is used instead. Consequently, the convergence results for $\widehat{\mathrm{Var}}(z_*|\mathcal{D})$ are given with respect to $\overline{\mathrm{Var}}(z_*|\mathcal{D})$, which converges to $\mathrm{Var}(z_*|\mathcal{D})$ as $r \to \infty$ by definition.

2.5 Fully Exponential Laplace Approximations

In this section, we propose fully exponential Laplace approximations to compute the posterior predictive statistics. In the process of applying the Laplace approximations, we also obtain an estimate of the sampling points given $\mathcal{D}$ as a by-product. From the observations $\mathcal{D}$ given by (2.16), we can update the estimates of the sampling points $x$. To this end, we use the maximum a posteriori (MAP) estimate of $x$, given by

$$\hat{x}_{\mathrm{MAP}} = \arg\max_{x\in\mathcal{R}}\ \mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x}). \qquad (2.24)$$

2.5.1 Predictive mean

The predictive mean estimator, given by (2.20), can be approximated using the MGF method (2.14). To compute $\mathrm{E}(z_*|\mathcal{D})$, let $g(x) = \mu_*(x)$. Using

$$h(x) = -\frac{1}{n}\ln\left(\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x})\right), \qquad (2.25)$$

the predictive mean estimate using the MGF method (MGF-PME), $\hat{\mathrm{E}}(z_*|\mathcal{D})$, and its error order are given as

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \hat{\mathrm{E}}(g(x)), \qquad \hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-2}), \qquad (2.26)$$

where $\hat{\mathrm{E}}(g(x))$ is given in (2.14) or (2.15). However, the MGF-PME given by (2.26) requires the computation of the third derivative of $h$, which increases the complexity of the algorithm. In this chapter, another MGF method is developed that avoids the third derivative of $h$. To approximate the derivative of $M(\cdot)$ at a point $\tau$ in (2.13), we utilize a three-point estimation, i.e., the slope of a nearby secant line through the points $(\tau - \zeta, M(\tau - \zeta))$ and $(\tau + \zeta, M(\tau + \zeta))$. Approximating the derivative in (2.13) with the three-point estimation, we avoid the third derivative in (2.14) or (2.15). We summarize our results in the following theorem.
Theorem 2.5.1. Let $\hat{x}$ be the exact mode of $-h(x)$. The three-point predictive mean estimator (TP-PME) and its order of the error are given by

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \frac{n^{3/4}}{2}\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}} e^{nh(\hat{x})} \times \left( \left|\nabla^2\mathring{b}(\hat{x}_{\mathring{b}})\right|^{-\frac{1}{2}} C_{\mathring{b}}(\hat{x}_{\mathring{b}})\, e^{-n\mathring{b}(\hat{x}_{\mathring{b}})} - \left|\nabla^2\mathring{p}(\hat{x}_{\mathring{p}})\right|^{-\frac{1}{2}} C_{\mathring{p}}(\hat{x}_{\mathring{p}})\, e^{-n\mathring{p}(\hat{x}_{\mathring{p}})} \right), \qquad (2.27)$$

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-3/2}),$$

where $h(x)$ is given by (2.25), and we have used the following definitions:

$$\mathring{b}(x) = h(x) - n^{-7/4}\mu_*(x), \qquad \mathring{p}(x) = h(x) + n^{-7/4}\mu_*(x),$$
$$\hat{x}_{\mathring{b}} = \hat{x} - [\nabla^2\mathring{b}(\hat{x})]^{-1}\nabla\mathring{b}(\hat{x}), \qquad \hat{x}_{\mathring{p}} = \hat{x} - [\nabla^2\mathring{p}(\hat{x})]^{-1}\nabla\mathring{p}(\hat{x}),$$
$$C_{\mathring{b}}(\hat{x}_{\mathring{b}}) = e^{\frac{n}{2}\nabla\mathring{b}(\hat{x}_{\mathring{b}})^T[\nabla^2\mathring{b}(\hat{x}_{\mathring{b}})]^{-1}\nabla\mathring{b}(\hat{x}_{\mathring{b}})}, \qquad C_{\mathring{p}}(\hat{x}_{\mathring{p}}) = e^{\frac{n}{2}\nabla\mathring{p}(\hat{x}_{\mathring{p}})^T[\nabla^2\mathring{p}(\hat{x}_{\mathring{p}})]^{-1}\nabla\mathring{p}(\hat{x}_{\mathring{p}})}.$$

Proof. First, we use the three-point method to approximate the derivative:

$$\frac{d}{d\tau}M(\tau)\Big|_{\tau=0} = \frac{M(\zeta) - M(-\zeta)}{2\zeta} + O(\zeta^2).$$

Plugging $\hat{M}(\zeta) = M(\zeta) + O(n^{-2})$ into the above equation and using (2.13), we obtain

$$\mathrm{E}(z_*|\mathcal{D}) = \frac{\hat{M}(\zeta) - \hat{M}(-\zeta)}{2\zeta} + O(n^{-2})\,O(\zeta^{-1}) + O(\zeta^2). \qquad (2.28)$$

By selecting $\zeta = n^{-3/4}$, we recover the order of the error $O(n^{-2})\,O(\zeta^{-1}) + O(\zeta^2) = O(n^{-3/2})$. Computing the estimates $\hat{M}(\zeta)$ and $\hat{M}(-\zeta)$ in (2.28) using (2.11) yields (2.27).

Remark 2.5.2. The complexity of either (2.14) or (2.15) is $O(n^4)$, while the complexity of (2.27) is $O(n^3)$. In return, the error of (2.15) is of order $O(n^{-2})$, while the error of (2.27) is of order $O(n^{-3/2})$.

2.5.2 Predictive variance

We now apply Laplace's method to approximate the prediction error variance in a similar way. The prediction error variance is given by (2.21). In this case, $h(x)$ is given by (2.25) and $u(x) = -\frac{1}{n}\left(\ln(\sigma_*^2(x) + \mu_*^2(x)) + \ln(\mathrm{P}(y|x)\,\mathrm{P}(x|\bar{x}))\right)$. Applying (2.11) to this case, the approximation of $\mathrm{Var}(z_*|\mathcal{D})$ and its order of the error are given by

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \frac{\left|\nabla^2 h(\hat{x})\right|^{\frac{1}{2}} e^{nh(\hat{x})}}{\left|\nabla^2 u(\breve{x})\right|^{\frac{1}{2}} e^{nu(\breve{x})}}\, C_u(\breve{x}) - \hat{\mathrm{E}}^2(z_*|\mathcal{D}), \qquad (2.29)$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{Var}(z_*|\mathcal{D}) + O(n^{-3/2}),$$

where $\hat{x}$ is the exact mode of $-h$ and $\breve{x}$ is the asymptotic mode of $-u$.
$C_u(\breve{x})$ and $\breve{x}$ are given by (2.12), and $\hat{\mathrm{E}}(z_*|\mathcal{D})$ is given by (2.27).

2.5.3 Simple Laplace approximations

To minimize the computational complexity, one may prefer a simpler approximation. In this chapter, we propose such a simple approximation at the cost of precision, which is summarized in the following theorem.

Theorem 2.5.3. Let $\hat{x}$ be an asymptotic mode of order $O(n^{-1})$ for $-h$ given by (2.25), and assume that $\{h, \hat{x}\}$ satisfies the regularity conditions. Consider the following approximations for $\mathrm{E}(z_*|\mathcal{D})$ and $\mathrm{Var}(z_*|\mathcal{D})$:

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = k^T(\hat{x})\,(K(\hat{x}) + \sigma^2 I)^{-1}\,y, \qquad (2.30)$$

$$\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \sigma_f^2 - k^T(\hat{x})\,(K(\hat{x}) + \sigma^2 I)^{-1}\,k(\hat{x}), \qquad (2.31)$$

where $K(\hat{x})$ and $k(\hat{x})$ are the covariance matrices as in (2.3), but evaluated at $\hat{x}$. We then have the following orders of errors:

$$\hat{\mathrm{E}}(z_*|\mathcal{D}) = \mathrm{E}(z_*|\mathcal{D}) + O(n^{-1}), \qquad \widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \mathrm{Var}(z_*|\mathcal{D}) + O(n^{-1}).$$

Proof. The approximation for $\mathrm{E}(z_*|\mathcal{D})$ given by (2.30) is the first term of (2.14), neglecting the higher-order terms. The second and third terms in (2.14) contain a factor $n^{-1}$. Since $\hat{x}$ is an asymptotic mode of order $O(n^{-1})$ for $-h$, by Definition 2.2.3 the last term of (2.14), which contains $\nabla h(\hat{x})$, is $O(n^{-1})$. Hence, without these higher-order terms, we obtain a simpler approximation of order $O(n^{-1})$. The approximation for $\mathrm{Var}(z_*|\mathcal{D})$ given by (2.31) can be obtained by approximating (2.21) with the first term of the MGF approximation. Let $g(x) = \sigma_*^2(x) + \mu_*^2(x)$; then $\mathrm{Var}(z_*|\mathcal{D}) = \mathrm{E}(g(x)|\mathcal{D}) - \mathrm{E}^2(z_*|\mathcal{D})$. Using the first term of the MGF approximation, $\mathrm{E}(g(x)|\mathcal{D}) = g(\hat{x}) + O(n^{-1}) = \sigma_*^2(\hat{x}) + \mu_*^2(\hat{x}) + O(n^{-1})$, and using $\mathrm{E}^2(z_*|\mathcal{D}) = \mu_*^2(\hat{x}) + O(n^{-1})$, we obtain $\widehat{\mathrm{Var}}(z_*|\mathcal{D}) = \sigma_*^2(\hat{x}) + O(n^{-1})$. The procedure for showing the order of the approximation error is the same as that shown for the approximation in (2.30).

Remark 2.5.4.
Note that the simple Laplace predictive mean and variance estimators in (2.30) and (2.31) take the same forms as the original predictive mean and variance without the localization error, respectively given in (2.3) and (2.4), evaluated at the MAP estimator of $x$. As we previously mentioned, $\hat{x}$ is the mode of $-h$ given by (2.25) and is the MAP estimator of $x$, i.e., $\hat{x} = x_{MAP}$ as defined in (2.24). Therefore, the difference between the simple Laplace approximations and a quick-and-dirty solution in which the measured location vector $\bar{x}$ is used is that the simple Laplace approximations use $\hat{x}$ instead of $\bar{x}$.

In applying Laplace's method, using the one-step Newton gradient method to compute asymptotic modes, e.g., $\breve{x}$ required in (2.29) or $\hat{x}_{\mathring{b}}$ and $\hat{x}_{\mathring{p}}$ required in (2.27), may not satisfy the regularity conditions. In this case, one needs to continue the Newton gradient optimization until the regularity conditions are satisfied.

2.6 Error and Complexity Analysis

In this section, we analyze the order of the error and the computational complexity for the proposed approximation methods, which are summarized in Table 2.1. A tradeoff between the approximation error and complexity can be chosen taking into account the performance requirements and constrained resources for a specific sensor network application. For Laplace's method, the order of the error ranges from $O(n^{-1})$ to $O(n^{-2})$ at the cost of complexity from $O(n^3)$ to $O(n^4)$, as shown in Table 2.1.

For the Monte Carlo estimators, we introduce $O(\sigma_r)$, the probabilistic error order, which implies that the estimation error converges to $\mathcal{N}(0, \sigma_r^2)$ in distribution as the number of random samples $r$ increases. Therefore, it is not appropriate to compare the error bounds between Monte Carlo and Laplace's method exactly. Monte Carlo algorithms are used for multivariate integration of dimension $b \times N$.
The probabilistic error order of Monte Carlo algorithms that use $r$ sample evaluations is of order $r^{-1/2}$ for a given $n$. We may assume that the order of the error for Monte Carlo methods does not depend on the number of measurements $n$ for large $n$ [28]. With this assumption, the number of samples needed for Monte Carlo algorithms to reduce the initial probabilistic error by $n^{-2}$ is of order $r = n^4$. The complexity of the Monte Carlo methods, for the problems investigated in this chapter, is $O(n^3 \times r)$. To achieve the probabilistic error order $O(n^{-2})$, the complexity of Monte Carlo methods thus has to be $O(n^7)$. If we want to keep the error order at the level of $O(n^{-2})$ for both Laplace's and Monte Carlo methods, the Monte Carlo methods are considerably more expensive than Laplace's method.

Table 2.1: Error and complexity analysis

Estimator           | Method       | Error                     | Complexity
MC-PME in (2.22)    | Monte Carlo  | $O(\sigma_m/\sqrt{r})$    | $O(n^3 \times r)$
MGF-PME in (2.14)   | Laplace MGF  | $O(n^{-2})$               | $O(n^4)$
TP-PME in (2.27)    | Laplace MGF  | $O(n^{-3/2})$             | $O(n^3)$
S-PME in (2.30)     | Laplace MGF  | $O(n^{-1})$               | $O(n^3)$

2.7 Simulation Results

In this section, we provide simulation results to evaluate the performance of the different estimation methods. To this end, a realization of a Gaussian process that serves as the ground truth is shown in Fig. 2.2. The Gaussian process is generated with $\sigma_f = \sqrt{2}$ and $\sigma_0 = \sqrt{2}$ in (2.1). The measurement noise and the sampling position uncertainty variance are $\sigma = 0.01$ and $\Sigma_v = 0.1 \times I$, respectively. $N = 20$ and $n = 40$ imply that the robot takes measurements twice at each of the 20 sampling positions. In Figs. 2.2-2.7, the predicted fields and predicted error variance fields are shown with color bars. The results from Gaussian process regression using the noiseless positions $x$ and the noisy measurements $y$ are shown in Fig. 2.3.

Figure 2.2: A realization of the Gaussian process (ground truth).
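The structure of the Monte Carlo predictive mean estimator can be made concrete with a minimal sketch. This is an illustration only: the one-dimensional field, the squared-exponential kernel, and sampling the positions from $\mathcal{N}(\bar{x}, \Sigma_v)$ rather than from the full position posterior are all simplifying assumptions of ours, not the chapter's exact estimator (2.22).

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_mean(x, y, x_star, sigma=0.1, sigma_f=np.sqrt(2), sigma_0=np.sqrt(2)):
    # plug-in GP predictive mean as in (2.3) for given sampling positions x
    d2 = (x[:, None] - x[None, :]) ** 2
    K = sigma_f**2 * np.exp(-d2 / (2 * sigma_0**2)) + sigma**2 * np.eye(len(x))
    k = sigma_f**2 * np.exp(-(x - x_star) ** 2 / (2 * sigma_0**2))
    return k @ np.linalg.solve(K, y)

x_bar = np.array([0.0, 1.0, 2.0, 3.0])   # measured (noisy) sampling positions
y = np.array([0.2, 0.9, 1.1, 0.4])       # field measurements
Sigma_v = 0.1                            # position-noise variance (scalar here)

# Monte Carlo predictive mean: average the plug-in prediction over r draws of
# the true positions; each draw perturbs x_bar by the position noise
r = 1000
draws = [gp_mean(x_bar + np.sqrt(Sigma_v) * rng.standard_normal(4), y, 1.5)
         for _ in range(r)]
mc_pme = np.mean(draws)
```

Each of the $r$ draws costs one $O(n^3)$ GP solve, which is where the $O(n^3 \times r)$ complexity in Table 2.1 comes from.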
To compare with typical quick-and-dirty solutions (QDS) for dealing with the noisy locations $\bar{x}$ in practice, we define two approaches: QDS 1 and QDS 2. QDS 1 applies the Gaussian process regression given by (2.3) and (2.4) while simply ignoring the noise in the locations, i.e., taking $\bar{x}$ as the true positions:
$$
\mu(\bar{x}) = k^T(\bar{x})\,(K(\bar{x}) + \sigma^2 I)^{-1}\, y. \quad (2.32)
$$
When the measurements are taken repeatedly as suggested in Section 2.3, QDS 1 can be improved. In this regard, QDS 2 uses the conditional expectation of the sampling positions $x$ given $\bar{x}$ as in (2.19) and the least squares solution of $z$ for given $y$, which are plugged into the Gaussian process regression:
$$
\mu(\bar{H}\bar{x}) = k^T(\bar{H}\bar{x})\,(K(\bar{H}\bar{x}) + \sigma^2 I)^{-1}\big[(L^T L)^{-1} L^T y\big], \quad (2.33)
$$
where $\bar{H}$ is from (2.19) and $L$ is from (2.18). QDS 2 can be an improved version of QDS 1 when there are many repeated measurements for a set of fixed sampling positions. Figs. 2.4 and 2.5 show the results of applying QDS 1 and QDS 2 on $\bar{x}$ and $y$, respectively.

Table 2.2: Simulation RMS error

Estimator & Method                       | Figure   | RMS error
Gaussian process regression in (2.3)     | Fig. 2.3 | 0.1281
QDS 1 in (2.32)                          | Fig. 2.4 | 0.9126
QDS 2 in (2.33)                          | Fig. 2.5 | 0.7374
Monte Carlo method in (2.22)             | Fig. 2.6 | 0.3403
Laplace's method in (2.27)               | Fig. 2.7 | 0.3320
Simple Laplace method in (2.30)          | Fig. 2.8 | 0.3503

The averaging in QDS 2 helps generate smoother predictions, and it shows an improvement with respect to QDS 1. The results of the Monte Carlo method with $r = 1000$ samples are shown in Fig. 2.6. The results of Laplace's method are shown in Fig. 2.7, and they look very similar to those of the Monte Carlo method in Fig. 2.6. Figs. 2.4-2.7 clearly show that both the Monte Carlo and Laplace's methods outperform QDS 1 and 2 with respect to the true field in Fig. 2.3.
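QDS 1 in (2.32) and the simple Laplace estimator (2.30) share the same plug-in form; they differ only in which position estimate ($\bar{x}$ versus $\hat{x}$) is substituted into the standard GP predictor. A minimal sketch of that plug-in computation, assuming a squared-exponential kernel; all function names, variable names, and numeric values below are our illustrative choices.

```python
import numpy as np

def sq_exp_kernel(A, B, sigma_f=1.0, sigma_0=1.0):
    # squared-exponential covariance in the spirit of (2.1)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * sigma_0**2))

def plugin_gp_predict(x_est, y, x_star, sigma=0.1, sigma_f=1.0, sigma_0=1.0):
    """Plug-in GP prediction of the form (2.30)-(2.31) / (2.32).

    x_est : (n, d) position estimates (x_bar for QDS 1, x_hat for simple Laplace),
    y : (n,) measurements, x_star : (d,) prediction location.
    """
    K = sq_exp_kernel(x_est, x_est, sigma_f, sigma_0) + sigma**2 * np.eye(len(y))
    k = sq_exp_kernel(x_est, x_star[None, :], sigma_f, sigma_0).ravel()
    mean = k @ np.linalg.solve(K, y)                 # as in (2.30)/(2.32)
    var = sigma_f**2 - k @ np.linalg.solve(K, k)     # as in (2.31)
    return mean, var

x_est = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.5, 1.0, 0.2])
mean, var = plugin_gp_predict(x_est, y, np.array([1.0]))
```

With the small noise level used here, prediction at a training input stays close to the corresponding measurement, and the predictive variance is nonnegative by construction.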
To numerically quantify the performance of each approach, we have computed the root mean square (RMS) error between the predicted and true fields for all methods, which is summarized in Table 2.2. This RMS error analysis is possible since we know the true realization of the Gaussian process exactly in this simulation study. As expected, Gaussian process regression with the true locations performs best. The proposed approaches, i.e., the Monte Carlo and Laplace's methods, outperform QDS 1 and 2 in terms of RMS error as well.

Figure 2.3: Gaussian process regression using the true positions $x$. a) The predictive mean estimation. b) The predictive variance estimation, and the true sampling positions $x$ in aquamarine crosses.

2.8 Experimental Results

2.8.1 Experiment with simulated noisy sampling positions

In this section, we apply the proposed prediction algorithms to a real experimental data set, provided by the authors of [49], to model the concentration of 7α, 12α, 24-trihydroxy-5α-cholan-3-one 24-sulfate (3kPZS), a synthesized sea lamprey (Petromyzon marinus) mating pheromone, in the Ocqueoc River, MI, USA. The sea lamprey is an ecologically damaging invasive fish of the Laurentian Great Lakes [94], and a sea lamprey management program has been established [11]. A recent study by Johnson et al. [49] showed that synthesized 3kPZS, a component of the male mating pheromone, when released into a stream to reach concentrations of $10^{-14}$-$10^{-10}$ M (molar, or mol/L), triggers robust upstream movement in ovulated females, drawing ≈ 50% into baited traps.

Figure 2.4: The results of the QDS 1 approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.
The ability to predict the 3kPZS concentration at any location and time with a predicted uncertainty level would allow for fine-scale evaluations of hypothesized sea lamprey chemo-orientation mechanisms such as odor-conditioned rheotaxis [49]. Here, 3kPZS was added to the stream to produce a pulsing excitation to sea lampreys by applying 3kPZS at two-minute intervals [48]. To describe the 3kPZS concentration throughout the experimental stream, rhodamine dye (Turner Designs, Rhodamine WT, Sunnyvale, CA, USA) was applied at the pheromone release point (or source location) to reach a final in-stream concentration of 1.0 µg/L (used to measure the concentration of the 3kPZS). The same pulsing strategy was used when 3kPZS was applied. The dye and 3kPZS pumping systems are shown in Fig. 2.9-(a). An example of the highly visible dye plume near the source location is shown in Fig. 2.9-(c). To quantify dye concentrations in the stream, water samples were collected along transects occurring every 5 m from the sea lamprey release cage (0 m) to the application location (73 m upstream of the release cage), as shown in Fig. 2.9-(b). Three sampling positions were fixed along each transect at 1/4, 1/2 and 3/4 of the channel width from the left bank. Water was collected from the three sampling sites along a transect simultaneously (three samplers) every 10 s over a 2-minute interval.

Figure 2.5: The results of the QDS 2 approximations using $E(x|\bar{x})$. a) The predictive mean estimation. b) The predictive variance estimation, and $E(x|\bar{x})$ shown with aquamarine crosses.

Figure 2.6: The results of the Monte Carlo approximations with 1000 samples using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Further, a series of samples over 2 min was collected
exactly 1 meter downstream of the dye application location. Water samples were collected in 5 ml glass vials; subsequently, the fluorescence intensity of each sample, measured at 556 nm, was determined in a luminescence spectrometer (Perkin Elmer LS 55, Downers Grove, IL, USA), and the rhodamine concentration was estimated using a standard curve ($R^2 = 0.9998$). The objective here is to predict the spatio-temporal field of the dye concentration. In fact, the sampling positions are exactly known from the experiment. Therefore, we intentionally inject noise into the sampling positions to evaluate the prediction algorithms proposed in this chapter. Before applying the proposed algorithms, the hyperparameters such as $\sigma_f$ and $\sigma_0$ were identified from the experimental data by a maximum a posteriori probability (MAP) estimator as described in [113]. On the other hand, the value for $\sigma$ was set to $\sigma = 5\times10^{-3}$ µg/L according to the noise level from the data sheet of the sensor.

Figure 2.7: The results of the Laplace approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

We consider an anisotropic covariance function [113] to deal with the case in which the correlations along different directions in $x = [x_1, x_2, x_3]^T \in \mathbb{R}^3$ are different. Next, we performed a change of variables such that, in the transformed space and time, the covariance function given in (2.1) could be used for the proposed approaches. For the case of this experiment, the spatial correlation along the river flow direction is different from the correlation perpendicular to the river flow direction. We consider the

Figure 2.8: The results of the simple Laplace approximations using $\bar{x}$. a) The predictive mean estimation.
b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses. $\hat{x}$ is the estimate of the true sampling positions, computed as a by-product of both the fully exponential Laplace and simple Laplace approximations.

following covariance function in the units of the experimental data:
$$
K(x, x') = \sigma_f^2 \exp\left(-\sum_{j=1}^{2} \frac{(x_j - x'_j)^2}{2\sigma_j^2}\right). \quad (2.34)
$$
Using this covariance function in (2.34) and computing the likelihood function with the true value of $x$, the hyperparameter vector $\theta := [\sigma_f, \sigma_1, \sigma_2]$ can be computed by the MAP estimator as follows:
$$
\theta_{MAP} = \arg\max_{\theta \in \Theta} P(y|\theta)\, P(\theta). \quad (2.35)
$$

Figure 2.9: (a) The dye pumping system. (b) An example of normal dye concentration applied to the stream for the pheromone distribution estimation. (c) An example of the highly visible dye plume near the source location.

Using the optimization in (2.35), we obtained $\sigma_f = \sqrt{0.3}$ µg/L, $\sigma_1 = 1.6$ m, and $\sigma_2 = 10.7$ m. Here $x_1$ and $x_2$ are the coordinates of the vertical and horizontal (flow direction) axes of Fig. 2.10. As we expect, we have $\sigma_1 < \sigma_2$, i.e., the field is more correlated along the river flow direction than perpendicular to it. Next, after finding the MAP estimates of the hyperparameters $\sigma_1$ and $\sigma_2$, we change the scale of each component of $x$ such that we have the same values of the hyperparameters in the transformed coordinates. The new coordinates are given by $\{x_i^{new} = \frac{\sigma_0}{\sigma_i} x_i\}$, and we recover
$$
K(x, x') = \sigma_f^2 \exp\left(-\sum_{i=1}^{b} \frac{\left(\frac{\sigma_0}{\sigma_i} x_i - \frac{\sigma_0}{\sigma_i} x'_i\right)^2}{2\sigma_0^2}\right),
$$
as in (2.1). Note that scaling $x_i$ with $\frac{\sigma_0}{\sigma_i}$ also transforms the covariance matrix $\Sigma_v$ into the normalized space.

Table 2.3: Experiment parameters

Description                       | Parameter              | Value
Number of sensors                 | $N$                    | 15
Number of measurements            | $n$                    | 15
The variability at a fixed point  | $\sigma_f$             | $\sqrt{0.3}$
Bandwidth (vertical)              | $\sigma_1$             | 1.6
Bandwidth (horizontal)            | $\sigma_2$             | 10.7
Noise covariance matrix           | $\Sigma_v$             | $\mathrm{diag}(4,\ 0.089)$
Measurement noise level           | $\sigma$               | 0.005

Table 2.3 shows the parameter values which were estimated and used for the experimental data. Fig.
2.10 shows the predicted field obtained by applying Gaussian process regression to the true positions $x$. To illustrate the advantage of the proposed methods over the QDS approach in dealing with noisy localization, we show the results of applying the QDS 1, Monte Carlo, Laplace, and simple Laplace approximations in Figs. 2.11, 2.12, 2.13, and 2.14, respectively. Since there is only one measurement per position, QDS 2 is the same as QDS 1. As can be seen in these figures, the peaks of the fields predicted by the Monte Carlo and Laplace's methods with noisy $\bar{x}$ are closer to the peak predicted with the true $x$ than that of the QDS approach.

Figure 2.10: The results of Gaussian process regression using the true $x$. a) The posterior mean estimation. b) The posterior variance estimation, and $x$ shown with aquamarine crosses.

Figure 2.11: The results of the QDS 1 approach using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.12: The results of the Monte Carlo method with 1000 samples using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.13: The results of the Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Indeed, QDS 1 has failed in Fig.
2.11 to produce a good estimation of the true field by neglecting the uncertainty in the positions, while the Monte Carlo, Laplace, and simple Laplace methods proposed in this chapter produce good estimations in Figs. 2.12, 2.13, and 2.14, respectively. Gaussian regression using the true $x$ is our best estimate of the true field (Fig. 2.10). Table 2.4 shows the RMS errors of the different estimation approaches using the noisy positions (i.e., $\bar{x}$) with respect to Gaussian regression using the true $x$. As can be seen in Table 2.4, the proposed approaches outperform QDS 1 in terms of the RMS error.

Figure 2.14: The results of the simple Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses.

Table 2.4: Experimental RMS error w.r.t. the Gaussian estimation

Estimator & Method                | Figure    | RMS error
QDS 1 in (2.32)                   | Fig. 2.11 | 0.3955
Monte Carlo method in (2.22)      | Fig. 2.12 | 0.1176
Laplace's method in (2.27)        | Fig. 2.13 | 0.1305
Simple Laplace method in (2.30)   | Fig. 2.14 | 0.1252

2.8.2 Experiment with noisy sampling positions

In Section 2.8.1, we used the true and simulated noisy sampling positions to compare the performance of the proposed approaches. In this section, we present another set of experimental results under real localization errors. The experimental results were obtained using a single aquatic surface robot in an outdoor swimming pool, as shown in Fig. 2.15. The robot is capable of monitoring the water temperature in an autonomous manner while also being remotely supervised by a central station. To measure the location of the robot, we used an Xsens MTi-G sensory unit [1] (shown in the center of Fig. 2.15-(a)), which consists of an accelerometer, a gyroscope, a magnetometer, and a Global Positioning System (GPS) unit.
A Kalman filter implemented inside this sensory unit by the manufacturer produces the localization of the robot with an accuracy of about one meter. More information about this experiment can be found in [39]. In this experiment, we selected and identified the system parameters based on a priori knowledge about the process and the sensor noise characteristics. Therefore, the model hyperparameters $\sigma_0 = 2$, $\Sigma_v = I$, $\sigma = 0.005$, and $\sigma_f^2 = 0.3$ are known. The numbers of sampling positions and sampled measurements are $N = n = 10$. To validate our approaches, we controlled the temperature field by turning a hot water pump on and off. The hot water outlet locations are shown in Fig. 2.15-(b). We turned on the hot water pump for a while. After the pump was turned off and 6 minutes had passed, the robot collected 10 measurements at 10-second intervals. The estimated temperature and its error variance fields obtained by applying the QDS 1, Monte Carlo, fully exponential Laplace, and simple Laplace methods are shown in Figs. 2.16, 2.17, 2.18, and 2.19, respectively. For each method, the estimated temperature and its error variance fields are shown in subfigures (a) and (b), respectively.

Figure 2.15: a) The developed robotic sensor. b) The experimental environment: a 12 × 6 meter outdoor swimming pool.

As can be seen in Figs. 2.17, 2.18, and 2.19, the results from the Monte Carlo, fully exponential Laplace, and simple Laplace methods match well.

Figure 2.16: The results of the QDS 1 approach using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.
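The change of variables used in Section 2.8.1, $x_i^{new} = (\sigma_0/\sigma_i)\, x_i$, which reduces the anisotropic covariance (2.34) to the isotropic form (2.1), can be verified numerically. A sketch using the identified bandwidths from Table 2.3; the choice $\sigma_0 = 1$ and the test points are our illustrative assumptions.

```python
import numpy as np

sigma_f, sigma_1, sigma_2, sigma_0 = np.sqrt(0.3), 1.6, 10.7, 1.0

def k_aniso(x, xp):
    # anisotropic covariance (2.34) with per-axis bandwidths sigma_1, sigma_2
    return sigma_f**2 * np.exp(-((x[0] - xp[0])**2 / (2 * sigma_1**2)
                                 + (x[1] - xp[1])**2 / (2 * sigma_2**2)))

def k_iso(x, xp):
    # isotropic covariance as in (2.1) with common bandwidth sigma_0
    return sigma_f**2 * np.exp(-((x - xp) ** 2).sum() / (2 * sigma_0**2))

def rescale(x):
    # change of variables x_i_new = (sigma_0 / sigma_i) * x_i
    return np.array([sigma_0 / sigma_1 * x[0], sigma_0 / sigma_2 * x[1]])

x, xp = np.array([0.8, 12.0]), np.array([2.1, 30.0])
# k_aniso(x, xp) equals k_iso(rescale(x), rescale(xp)) identically
```

The identity holds term by term in the exponent, since $((\sigma_0/\sigma_i)(x_i - x'_i))^2 / (2\sigma_0^2) = (x_i - x'_i)^2 / (2\sigma_i^2)$.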
2.9 Conclusion

We have formulated Gaussian process regression with observations under localization uncertainty due to (possibly mobile) sensor networks with limited resources. The effects of measurement noise, localization uncertainty, and prior distributions have all been correctly incorporated into the posterior predictive statistics in a Bayesian approach. We have reviewed Monte Carlo sampling and Laplace's method, which have been applied to compute the analytically intractable posterior predictive statistics of Gaussian processes with localization uncertainty. The approximation error and complexity of all the proposed approaches have been analyzed. In particular, we have provided tradeoffs between the error and complexity of Laplace approximations of different degrees such that one can choose a tradeoff taking into account the performance requirements and the computational complexity due to the resource-constrained sensor network. The simulation study demonstrated that the proposed approaches perform much better than approaches that do not properly consider the localization uncertainty. Finally, we applied the proposed approaches to the experimentally collected real data to provide proof-of-concept tests and an evaluation of the proposed schemes in practice. In both the simulation and experimental results, the proposed methods outperformed the quick-and-dirty solutions often used in practice.

Figure 2.17: The results of the Monte Carlo method with 1000 samples using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.18: The results of the Laplace approximations using $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\bar{x}$ shown with aquamarine crosses.

Figure 2.19: The results of the simple Laplace method using the noisy positions $\bar{x}$. a) The predictive mean estimation. b) The predictive variance estimation, and $\hat{x}$ shown with aquamarine crosses.

Chapter 3

Position estimation using Gaussian process regression

Abstract: This chapter presents a position estimation method for a robot using an omnidirectional camera. We present an approach to build a map from optimally selected visual features using Gaussian process (GP) regression. Optimal feature selection for regression is challenging in general. In this chapter, we apply a cross-validation technique to select features in order to improve the quality of the position estimation. The position of the robot is estimated by maximum likelihood estimation (MLE). The collection of selected features over a surveillance region is modeled by a multivariate GP with unknown hyperparameters. The hyperparameters are identified through the learning process by an MLE and are then used for prediction in an empirical Bayes approach. The excellent results of the proposed algorithm are illustrated by an experimental study under two different real-world scenarios, with and without uncertainty in the sampling positions.

3.1 Introduction

Localization of a mobile robot relative to its environment using vision information has received extensive attention over the past few decades from the robotics and computer vision communities [5, 20]. Vision-based robot positioning involves two steps. The first step involves learning some properties of the vision data (features) with respect to the spatial position where the observation is made, the so-called mapping.
The second step is to find the best match for the new spatial position corresponding to the newly observed features, the so-called matching. The mapping from these visual features to the domain of the associated spatial position is highly nonlinear and sensitive to the type of selected features. In most cases, it is very difficult to derive the map analytically. The features shall vary as much as possible over the spatial domain while varying as little as possible, for a given position, over the disturbances. For example, they should be insensitive to changes in illumination and partial obstruction. Motivated by the aforementioned situations, we consider the problem of selecting features from the original feature set in order to improve the localization performance of a robot. The central assumption when using a feature selection technique is that the original feature set contains many redundant or irrelevant features. To facilitate further discussion, let us consider a configuration where the input vector $X$ is the robot position and the output feature vector $Y$ is the collection of extracted features from the vision data. We first build a feature map $F$ at a robot location $X$ such that $F(X) = Y$. In order to reduce the position estimation error, the ideal subset is defined as follows:
$$
\hat{Y}_{opt} = \arg\min_{\hat{Y}} \big\| X - F^{-1}(\hat{Y}) \big\|^2,
$$
where $\hat{Y}$ is a vector that consists of the selected entries of the original vector $Y$. However, with a high cardinality of the original feature set, the optimal solution relies on combinatorial optimization, which is not feasible. On the other hand, using mutual information criteria, $F$ and $F^{-1}$ could be chosen as follows:
$$
F(X) = \arg\max_{Y} I(X, Y), \qquad F^{-1}(Y) = \arg\max_{X} I(X, Y),
$$
where
$$
I(X, Y) = \sum P(X, Y) \log \frac{P(X, Y)}{P(X)\, P(Y)}
$$
is the mutual information of $X$ and $Y$. Note that in the case where $P(X)$ and $P(Y)$ are constant, $F(X)$ is equivalent to the maximized log-likelihood function. Guo et al.
[33] show that by using mutual information one can achieve a recognition rate higher than 90% while using just 0.61% of the feature space for a classification problem. The only issue with mutual information is its computational complexity; to avoid it, one has to avoid higher-order dependencies among variables [81]. In order to make fast and precise estimation, most of the existing localization algorithms extract a small set of important features from the robotic sensor measurements. The features used in different approaches for robotic localization range over C.1: artificial markers such as color tags [45] and barcodes (which need to be installed) [92]; C.2: geometric features such as straight wall segments and corners [46]; and C.3: natural features such as light and color histograms [4]. Most of the landmark-based localization algorithms are classified in C.1 and C.2. It is shown in [87] that autonomous navigation is possible for outdoor environments with the use of a single camera and natural landmarks. In a similar attempt, [22] addressed the challenging problem of indoor place recognition from wearable video recording devices. The localization methods which rely on artificial markers (or static landmarks) have three disadvantages: lack of flexibility, lack of optimality, and lack of autonomy. A method is described in [99] that enables robots to learn landmarks for localization; artificial neural networks are used for extracting the landmarks. However, localization methods which rely on dynamic landmarks [99] have disadvantages such as lack of stability. Furthermore, there are reasons to avoid the geometric model as well, even when a geometric model does exist.
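The mutual information criterion $I(X, Y)$ introduced above can be evaluated directly from a discretized joint distribution; a small self-contained sketch (the toy joint tables below are our assumptions, not data from the chapter).

```python
import numpy as np

def mutual_information(joint):
    # I(X, Y) = sum over the table of P(x, y) * log( P(x, y) / (P(x) P(y)) )
    joint = joint / joint.sum()                 # normalize to a probability table
    px = joint.sum(axis=1, keepdims=True)       # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)       # marginal P(y)
    mask = joint > 0                            # 0 * log 0 is treated as 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# a feature perfectly tied to the position carries maximal information,
# while an unrelated feature carries none
dependent = np.eye(4)         # X determines Y: I = log(4)
independent = np.ones((4, 4)) # X and Y unrelated: I = 0
```

A feature selection scheme of the kind sketched in the text would rank candidate features by such scores and keep the highest-scoring subset.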
Such cases may include: 1) the difficulty of reliably extracting sparse, stable features using geometric models, 2) the ability to use all sensory data directly rather than a relatively small amount of abstracted discrete information obtained from feature extraction algorithms, and 3) the high computational and storage costs of dealing with dense geometric features. In contrast to the localization problem with artificial markers or popular geometric models, there is a growing number of practical scenarios in which global statistical information is used instead. Some works illustrate localization using various spatially distributed (continuous) signals, such as distributed wireless Ethernet signal strength [27] or multidimensional magnetic fields [105]. In [110], a neural network is used to learn the implicit relationship between the pose displacements of a 6-DOF robot and the observed variations in global descriptors of the image, such as geometric moments and Fourier descriptors. In similar studies, gradient orientation histograms [55] and low-dimensional representations of the vision data [103] are used to localize mobile robots. In [80], an algorithm is developed for navigating a mobile robot using a visual potential, computed from the image appearance sequence captured by a camera mounted on the robot. A method for recognizing scene categories by comparing histograms of local features is presented in [59]. Without explicit object models, by using global cues as indirect evidence about the presence of an object, the authors consistently achieve an improvement over an orderless image representation [59]. The recent research efforts that are closely related to our problem are summarized as follows. The location of a set of image features from new observations is inferred by comparing the new features with the calculated map [7, 66, 67]. In [109], a neural network is used to learn the mapping between image features and robot movements.
Similarly, there exists an effort to automatically find the transformation that maximizes the mutual information between two random variables [107]. Using Gaussian process (GP) regression, the authors of [7, 91] present effective approaches to build a map from a sparse set of noisy observations taken from known locations using an omnidirectional camera. While the selection of visual features for such applications determines the ultimate performance of the algorithms, this topic has not been investigated to date. Therefore, building on Brook's approach [7], our work expands on the feature extraction and selection in order to improve the quality of localization. A Bayesian point of view is taken to make the map using a GP framework. The contributions of the chapter are as follows. This chapter provides a position estimation method for a robot using an omnidirectional camera. We present an approach to build a map from optimally selected visual features using GP regression. First, we describe how we can extract robust properties from vision data captured by an omnidirectional camera (Section 3.2). In particular, we describe how the fast Fourier transform (FFT) is applied to the panoramic image to calculate a set of image properties. We then transform the high-dimensional vision data to a set of uncorrelated feature candidates. A multivariate GP regression with unknown hyperparameters is formulated to connect the set of selected features to their corresponding sampling positions (Section 3.3). An empirical Bayes method using a point estimate is used to predict the feature map. Next, a feature reduction approach is developed using a cross-validation method such that an optimal subset of the features is selected to minimize the root mean square error (RMSE) (Section 3.4). The effectiveness of the proposed algorithms is illustrated by excellent experimental results under two different setups (Section 3.5).
Additionally, some comparison results are presented under different image features and feature selection methods.

Figure 3.1: (a) and (b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. (c) and (d) show the reduced-size gray-scale image and the two-dimensional FFT magnitude plot, respectively.

3.2 Image features

Conventional video cameras with projective lenses have restricted fields of view. By adding mirrors of different shapes, 360° panoramic views can be achieved in a single image [26]. In this chapter, to make localization insensitive to the heading angle, a spherical mirror is used to capture a 360° view of the environment of a robot. Before an omnidirectional image is processed, it is first unwrapped. When it comes from the camera, the image is a nonlinear mapping of a 360° panoramic view onto a donut shape. Recovering the panoramic view from the wrapped view requires the reverse mapping of pixels from the wrapped view onto a panoramic view [19, 66]. Figs. 3.1-(a) and 3.1-(b) show the wrapped omnidirectional image and the unwrapped panoramic image, respectively. The fast Fourier transform (FFT) is applied to the panoramic image to calculate a set of image properties $y$. For a square image of size $N \times N$, the two-dimensional FFT is given by
$$
F^{[i]}(\rho, l) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} f^{[i]}(a, b)\, e^{-j2\pi\left(\rho \frac{b}{N} + l \frac{a}{N}\right)},
$$
where $f^{[i]}$ is the $i$-th two-dimensional realized image and $j$ is the imaginary unit. In order to use the FFT, we convert the panoramic color images to gray-scale $128 \times 128$ pixel images, i.e., $[f(a, b)]$. Figs. 3.1-(c) and (d) show the reduced-size gray-scale image and its two-dimensional FFT magnitude plot, respectively. The Fourier transform is an important image processing tool which is used to decompose an image into its periodic components. The output of the transformation represents the image in the Fourier (frequency) domain, while the input image is in the spatial domain.
In image processing, often only the magnitude of the Fourier-transformed image is utilized, as it contains most of the information about the geometric structure of the spatial-domain image [76]. The Fourier transform is used in a wide range of applications such as image analysis, image filtering, image reconstruction, and image compression. In addition, localization methods are proposed in [66, 110] in which every image is represented by a family of components in the Fourier domain. We briefly highlight three properties obtained by applying the Fourier transform to the omnidirectional images. First, the magnitude of the Fourier components can be represented by a continuous function of the robot position. Next, the phase angles of the Fourier components are related to the heading of the robot. Finally, by representing an image with a small number of frequency components in the Fourier domain, data compression can be achieved. In [66], it was shown that the first 15 components of the FFT carry enough information to correctly match a pair of images. In an indoor environment, most of the objects stand vertically. In this planar navigation case, it is reasonable to assume that features in the horizontal direction are more important than those in the vertical direction. For example, the borders between walls, doors, and windows in the horizontal direction will be observed by this planar robot. As expected, it can be seen from the spectrum in Fig. 3.1-(d) that almost all the information is contained near the center, in the low part of the frequency range. It is thus reasonable to expect that the low-frequency components provide the best basis for the localization process. Therefore, the image properties are computed by

y_\rho^{[i]} = \left| F^{[i]}(\rho, 0) \right| = \left| \sum_{a=0}^{127} \sum_{b=0}^{127} f^{[i]}(a, b)\, e^{-j 2\pi \rho b / 128} \right|, \quad (3.1)

where |\cdot| denotes the absolute value of a complex number. Equation (3.1) can be interpreted as the one-dimensional FFT of the image averaged over the vertical direction.
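As a concrete illustration, the property computation in (3.1) reduces to a one-dimensional FFT of the vertically summed image. A minimal sketch in Python is given below; the function name, the number of retained properties, and the exact index conventions are our own illustrative choices, not the original implementation:

```python
import numpy as np

def image_properties(img, num_props=16):
    """Compute FFT-based image properties in the spirit of (3.1).

    img: 2-D grayscale image of shape (N, N), e.g. 128 x 128.
    Returns |F(rho, 0)| for rho = 0..num_props-1, i.e. the magnitude
    of the 1-D FFT of the image summed over the vertical direction.
    """
    img = np.asarray(img, dtype=float)
    # Summing over the vertical direction (rows) leaves a 1-D signal
    # along the horizontal direction; its FFT gives F(rho, 0).
    column_sum = img.sum(axis=0)
    spectrum = np.abs(np.fft.fft(column_sum))
    return spectrum[:num_props]
```

Note that the magnitude spectrum is invariant to horizontal circular shifts of the panoramic image, which is consistent with the heading-insensitivity motivation above.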
In addition to the FFT coefficients, the histogram of the image and wavelet coefficients could be used as image properties [59]. An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. The number of distinct tonal bins in the histogram is specified by the number of properties; the histogram then gives the number of pixels in each tonal bin. A discrete wavelet transform (DWT) is a wavelet transform for which the wavelets are discretely sampled. Wavelet coefficients are also widely used by appearance-based place recognition methods [7, 103]. In Section 3.5, we experimentally evaluate our approach with features obtained using the FFT, histogram, and discrete wavelet transforms.

3.3 Gaussian process (GP) model

We propose a multivariate GP as a model for the collection of image features. A GP defines a distribution over a space of functions, and it is completely specified by its mean function and covariance function. We denote by y_\rho^{[i]} := y_\rho(s^{[i]}) \in \mathbb{R} the i-th realization of the \rho-th image property, and by s^{[i]} \in \mathcal{S} the associated position where the realization occurs. Here \mathcal{S} denotes the surveillance region, which is a compact set. The accumulative image properties then form a random vector defined by y = (y_1^T, \cdots, y_m^T)^T \in \mathbb{R}^{nm}, where y_\rho = (y_\rho^{[1]}, \cdots, y_\rho^{[n]})^T \in \mathbb{R}^n contains the n realizations of the \rho-th image property. We assume that the accumulative image properties can be modeled by a multivariate GP, i.e., y \sim \mathcal{GP}(\Gamma, \Lambda), where \Gamma : \mathcal{S}^n \to \mathbb{R}^{mn} and \Lambda : \mathcal{S}^n \to \mathbb{R}^{mn \times mn} are the mean function and the covariance function, respectively. However, the size and multivariate nature of the data lead to computational challenges in implementing the framework. For models with multivariate output, a common practice is to specify a separable covariance structure for the GP for efficient computation.
For example, Higdon [36] calibrated a GP simulator with high-dimensional multivariate output, using principal components to reduce the dimensionality. Following such model reduction techniques, we transform the vector y to a vector z such that its elements \{z_\rho \mid \rho \in \Omega_m\}, where \Omega_m = \{1, \cdots, m\}, are i.i.d. The statistics of y can be computed from the learning dataset:

\mu_y = \frac{1}{n} \sum_{i=1}^{n} y^{[i]}, \qquad \Sigma_y = \frac{1}{n-1} \sum_{i=1}^{n} (y^{[i]} - \mu_y)(y^{[i]} - \mu_y)^T.

The singular value decomposition (SVD) of \Sigma_y is a factorization of the form \Sigma_y = U S U^T, where U is a real unitary matrix and S is a diagonal matrix with nonnegative real numbers on the diagonal. In summary, the transformation is performed by the following formula:

z^{[i]} = S^{-1/2} U^T (y^{[i]} - \mu_y). \quad (3.2)

From now on, we assume that the transformation given by (3.2) has been applied to the visual data. Hence, we have the zero-mean multivariate GP z(s) \sim \mathcal{GP}(0, K(s, s')), which consists of multiple scalar GPs that are independent of each other.

3.3.1 The \rho-th random field

In this subsection, we consider only the \rho-th random field (visual feature); the other scalar random fields can be treated in the same way. The collection of the n realized values of the \rho-th random field is denoted by z_\rho := (z_\rho^{[1]}, \cdots, z_\rho^{[n]})^T \in \mathbb{R}^n, where z_\rho^{[i]} := z_\rho(s^{[i]}) is the i-th realization of the \rho-th random field and s^{[i]} = (s_1^{[i]}, s_2^{[i]}) \in \mathcal{S} \subset \mathbb{R}^2 is the associated position where the realization occurs. We then have z_\rho \sim \mathcal{N}(0, \Sigma_\rho), where \Sigma_\rho \in \mathbb{R}^{n \times n} is the covariance matrix. The i, j-th element of \Sigma_\rho is defined as \Sigma_\rho^{[ij]} = \mathrm{Cov}(z_\rho^{[i]}, z_\rho^{[j]}). In this chapter, we consider the squared exponential covariance function [85] defined as

\Sigma_\rho^{[ij]} = \sigma_{f,\rho}^2 \exp\left(-\frac{1}{2} \sum_{\ell=1}^{2} \frac{(s_\ell^{[i]} - s_\ell^{[j]})^2}{\sigma_{\ell,\rho}^2}\right). \quad (3.3)

In general, the mean and covariance functions of a GP can be estimated a priori by maximizing the likelihood function [114]. The prior distribution of z_\rho is given by \mathcal{N}(0, \Sigma_\rho).
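The decorrelating transformation (3.2) can be sketched as follows; `whitening_transform` is a hypothetical helper name, and the small floor on the singular values is a numerical safeguard that the original formulation does not require:

```python
import numpy as np

def whitening_transform(Y, eps=1e-12):
    """Learn the transformation (3.2) from a learning set Y of shape
    (n, m): n realizations of m image properties.

    Returns (mu, W) with W = S^{-1/2} U^T, so that z = W (y - mu)
    has identity sample covariance.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)                      # sample mean of the properties
    Sigma = np.cov(Y, rowvar=False)          # (m, m) sample covariance
    U, S, _ = np.linalg.svd(Sigma)           # Sigma = U S U^T (symmetric PSD)
    W = np.diag(1.0 / np.sqrt(np.maximum(S, eps))) @ U.T
    return mu, W
```

Applying z = W (y − μ_y) to every realization yields features whose sample covariance is the identity, which is what justifies modeling the m scalar fields as independent GPs in the sequel.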
A noise-corrupted measurement \tilde{z}_\rho^{[i]} at its corresponding location s^{[i]} is defined as follows:

\tilde{z}_\rho^{[i]} = z_\rho^{[i]} + \epsilon_\rho^{[i]}, \quad (3.4)

where the measurement errors \{\epsilon_\rho^{[i]}\} are assumed to be independent and identically distributed (i.i.d.) Gaussian white noise, i.e., \epsilon_\rho^{[i]} \sim \mathcal{N}(0, \sigma_{\epsilon,\rho}^2). Thus, we have \tilde{z}_\rho \sim \mathcal{N}(0, R_\rho), where R_\rho = \Sigma_\rho + \sigma_{\epsilon,\rho}^2 I. The log-likelihood function is defined by

\mathcal{L}_{\theta,\rho} = -\frac{1}{2} \tilde{z}_\rho^T R_\rho^{-1} \tilde{z}_\rho - \frac{1}{2} \log |R_\rho| - \frac{n}{2} \log 2\pi, \quad (3.5)

where n is the size of \tilde{z}_\rho. The hyperparameter vector of the \rho-th random field is defined as \theta_\rho = (\sigma_{f,\rho}, \sigma_{\epsilon,\rho}, \sigma_{1,\rho}, \sigma_{2,\rho})^T \in \mathbb{R}^4_{>0}. Using the likelihood function in (3.5), the hyperparameter vector can be computed by the ML estimator

\bar{\theta}_\rho = \arg\max_{\theta} \mathcal{L}_{\theta,\rho}, \quad (3.6)

which is plugged into the prediction in an empirical Bayes way. All parameters are learned simultaneously. If no prior information is given, then the maximum a posteriori probability (MAP) estimator is equal to the ML estimator [114]. In a GP, every finite collection of random variables has a multivariate normal distribution. Consider a realized value z_{\rho*} of the \rho-th random field taken at the associated location s_*. The probability distribution P(z_{\rho*} \mid s_*, s, \tilde{z}_\rho) is a normal distribution with the following mean and variance:

\mu_\rho(s_*) = C_\rho R_\rho^{-1} \tilde{z}_\rho, \qquad \sigma_\rho^2(s_*) = \sigma_{f,\rho}^2 - C_\rho R_\rho^{-1} C_\rho^T, \quad (3.7)

where the covariance C_\rho := \mathrm{Cov}(z_{\rho*}, z_\rho) \in \mathbb{R}^{1 \times n} is defined similarly to (3.3). In order to estimate the location s_* using the MAP estimator, we need to compute P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho), where the noisy observation \tilde{z}_{\rho*} is the sum of the realized value z_{\rho*} of the random field and a noise process:

P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho) = \frac{P(\tilde{z}_{\rho*} \mid s_*, s, \tilde{z}_\rho)\, P(s_* \mid s, \tilde{z}_\rho)}{P(\tilde{z}_{\rho*} \mid s, \tilde{z}_\rho)}. \quad (3.8)

The MAP estimator given the collection of observations \tilde{z}_\rho is a mode of the posterior distribution:

\bar{s}_\rho = \arg\max_{s_* \in \mathcal{S}} P(s_* \mid \tilde{z}_{\rho*}, s, \tilde{z}_\rho). \quad (3.9)
If P(s_* \mid s, \tilde{z}_\rho) and P(\tilde{z}_{\rho*} \mid s, \tilde{z}_\rho) are uniform, then the MAP estimator is equal to the ML estimator, given by

\bar{s}_\rho = \arg\max_{s_* \in \mathcal{S}} \mathcal{L}_\rho(s_*), \quad (3.10)

where the \rho-th log-likelihood function \mathcal{L}_\rho(s_*) is defined as follows:

\mathcal{L}_\rho(s_*) = -\frac{1}{2}\left( \frac{(\tilde{z}_{\rho*} - \mu_\rho(s_*))^2}{\sigma_{\epsilon,\rho}^2 + \sigma_\rho^2(s_*)} + \log\left(\sigma_{\epsilon,\rho}^2 + \sigma_\rho^2(s_*)\right) + \log 2\pi \right). \quad (3.11)

3.4 Localization and feature selection

Let \Omega be the collection of indices associated with the multiple scalar random fields (of the multivariate GP). Provided that all scalar random fields (of the multivariate GP) are independent of each other, we obtain a computationally efficient ML estimate of the location given the observations of all scalar random fields \{\tilde{z}_{\rho*} \mid \rho \in \Omega\} as follows:

\bar{s}_\Omega = \arg\max_{s_* \in \mathcal{S}} \sum_{\rho \in \Omega} \mathcal{L}_\rho(s_*), \quad (3.12)

where \mathcal{L}_\rho(s_*) is the \rho-th log-likelihood function given in (3.11). In this chapter, a hold-out cross-validation technique [52] is used for the model selection. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. To this end, we divide the dataset into two segments: one used to learn or train the GP model and the other used to validate the model. The RMSE is used to measure the performance of the GP models. It is defined by

\mathrm{RMSE}(\Omega) = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} \left\| s_c^{[i]} - \bar{s}_\Omega^{[i]} \right\|^2}, \quad (3.13)

where \|\cdot\| is the Euclidean norm of a vector. In the case that \Omega = \emptyset, we define

\mathrm{RMSE}(\emptyset) = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} \left\| s_c^{[i]} - \mathrm{median}(s_c) \right\|^2},

where \mathrm{median}(\cdot) is the median of a random vector. Assume that \Omega_m = \{1, \cdots, m\} is the set of all features. Dupuis et al. [23] reported that backward sequential selection outperforms forward sequential selection.
Thus, we use a backward sequential selection algorithm as follows:

\Omega_{\ell-1} = \Omega_\ell - \arg\min_{\rho \in \Omega_\ell} \mathrm{RMSE}(\Omega_\ell - \rho), \quad \forall \ell \in \Omega_m, \quad (3.14)

where \Omega_\ell - \rho := \{p \mid p \in \Omega_\ell,\ p \neq \rho\}. Finally, a subset of features is selected as follows:

\Omega_{\mathrm{opt}} = \arg\min_{\Omega \in \{\Omega_1, \cdots, \Omega_m\}} \mathrm{RMSE}(\Omega). \quad (3.15)

The optimum subset \Omega_{\mathrm{opt}} has the minimum RMSE among \{\Omega_1, \cdots, \Omega_m\}. The mapping and matching steps of the proposed approaches in this chapter are summarized in Algorithm 1 and Algorithm 2, respectively.

Algorithm 1 Learning maps from a sparse set of panoramic images observed at known locations
Input: (1) a training dataset comprising a set of panoramic images captured from known spatial sites.
Output: (1) a linear transformation from image properties to uncorrelated visual features; (2) the estimated hyperparameters and the estimated mean and variance functions of each independent visual feature.
1: extract the image properties y^{[i]} of the available learning dataset using (3.1)
2: use the SVD to form a set of uncorrelated visual features z^{[i]} using (3.2)
3: for each independent visual feature, estimate the hyperparameters using (3.6)
4: compute the mean and variance functions of each independent feature using (3.7)
5: choose the optimal subset of visual features using (3.15), eliminating the visual features that are worthless for the localization goal

Algorithm 2 Localization predictive inference using the learned map of visual features
Input: (1) a linear transformation from image properties to uncorrelated visual features; (2) the estimated hyperparameters and the estimated mean and variance functions of the selected visual features.
Output: (1) the position of newly captured images.
1: capture a new image and obtain its image properties y using (3.1)
2: compute the selected visual features z using (3.2)
3: compute the likelihood function of the selected features \rho \in \Omega_{\mathrm{opt}} over the possible sampling positions using (3.11)
4: determine the estimated position \bar{s}_{\Omega_{\mathrm{opt}}} using (3.12)

3.5 Experimental Results

In this section, we demonstrate the effectiveness of the proposed localization algorithms using experiments. We report results on two different datasets, Case 1 and Case 2, obtained from two experimental setups.

3.5.1 Experimental setups

In Case 1, the Kogeto panorama lens was used to capture 360-degree images. In total, 207 pairs of exact sampling positions, recorded manually, and corresponding panoramic images, captured on a regular lattice (23 × 9 square feet), were collected. In Case 2, a vision-based robotic sensor was built by the authors to validate a proof of concept for the proposed methods. The wireless mobile robot is equipped with two motorized wheels, a micro-controller, a 360 degree omnidirectional camera, a wireless receiver, and a transmitter. The omnidirectional camera was built from an inexpensive camera (Ai-Ball Treck) and a spherical mirror. The 360 degree images of the environment around the robot produced by the omnidirectional camera were streamed via 802.11b/g Wi-Fi (see Fig. 3.2). In Case 2, we collected 825 pairs of noisy sampling positions, recorded automatically by image processing software (using an exterior camera), and corresponding captured panoramic images. The noise power of the image processing software is 1 square foot, and the samples were taken on a 10 × 26 square feet surface. Figs. 3.2-(a) and 3.2-(b) show the two different hardware setups for Case 1 and Case 2, respectively. The datasets are divided into 80% learning, 10% cross-validation, and 10% testing subsets. The learning subset is used to estimate the hyperparameters, mean functions, and variance functions of the GP models for the scalar fields.
The cross-validation subset is used to select the best features in order to minimize the localization estimation error. After training, the testing subset is used to evaluate the algorithm's performance.

3.5.2 Learning of GP models in an empirical Bayes approach

As illustrative examples, we apply the proposed algorithm to both datasets with 32 computed visual features. Figs. 3.3 and 3.4 show the estimated hyperparameters (k \in \Omega_{32}) for Case 1 and Case 2, respectively. The signal variance \sigma_{f,k}, the spatial bandwidths \sigma_{1,k} and \sigma_{2,k}, and the noise variance \sigma_{\epsilon,k} are estimated for each independent feature individually. Thus, 32 × 4 = 128 hyperparameters have been estimated in total for each experimental setup. In Case 1, the predictive mean and variance of each visual feature \{z_k \mid k \in \Omega_{32}\} are shown in Figs. 3.5 and 3.6, respectively. The predictive mean and variance for Case 2 are shown similarly in Figs. 3.7 and 3.8, respectively. The running time using Matlab R2009b on a PC (3.2 GHz Intel i7 processor) is summarized in Table 3.1 for Case 2 (similar results are observed for Case 1). Localization for each test point takes 0.2 sec, which is fast enough for a real-time implementation.

Figure 3.2: (a) The Dot iPhone panorama lens used for Case 1. (b) A vision-based robot equipped with a 360 degree omnidirectional camera, built for Case 2.

Figure 3.3: Estimated hyperparameters for each individual scalar field for Case 1.

Figure 3.4: Estimated hyperparameters for each individual scalar field for Case 2.

3.5.3 Comparison study

We compare the performance of the proposed approach under different image features, as summarized in Table 3.2.
In this comparison study, three types of appearance-based features are considered: FFT [110], histogram [34], and wavelet [7]. Additionally, a comparison between the proposed cross-validation (CV) method and traditional PCA for feature elimination is investigated. Finally, we compare the localization results for different numbers of features. Table 3.2 shows the dataset (column 1), the appearance-based feature type (column 2), the total number of features (column 3), the optimum number of features determined by the proposed approach (column 4), and the localization RMSE obtained using the total number of features (column 5), the sixteen features selected by PCA (column 6), the sixteen features selected by CV (column 7), and the optimum number of features (column 8).

Figure 3.5: Predictive mean function map for each individual feature z_k for Case 1.
Table 3.1: Computational time

Procedure                    Number of observations   Time (sec)
Pre-processing               825                      35
Feature extraction           825                      2
Learning hyperparameters     659                      286
Learning map and variances   659                      51
Feature selection            83                       21
Localization                 83                       10

Table 3.2: The localization performance comparison

                       number of features        RMSE using subset of features
dataset   feature type   total   optimum     All    PCA 16   CV 16   CV opt
Case 1    FFT              8       7         4.98     -        -      0.63
          FFT             16      13         4.98     -        -      0.53
          FFT             32      10         4.98    4.98     0.24    0.22
          FFT             64      26         4.98    4.98     0.31    0.26
          Histogram        8       5         4.12     -        -      2.10
          Histogram       16      10         3.85     -        -      1.57
          Histogram       32      13         1.44    1.43     1.22    1.18
          Histogram       64      32         1.46    1.56     1.15    1.13
          Wavelet         78      32         6.69    4.43     4.03    2.49
Case 2    FFT              8       8         2.21     -        -      2.21
          FFT             16      11         2.00     -        -      1.58
          FFT             32      21         1.52    2.42     1.37    1.36
          FFT             64      21         1.96    2.54     1.24    1.19
          Histogram        8       5         2.88     -        -      2.38
          Histogram       16       9         2.60     -        -      1.77
          Histogram       32      19         2.03    1.54     1.59    1.47
          Histogram       64      32         2.08    2.23     1.53    1.40
          Wavelet         78      27         5.96    7.94     3.82    3.52

Figure 3.6: Prediction variance map for each individual feature z_k for Case 1.

Note that the sampling positions of Case 1 were recorded exactly, while those of Case 2 were noisy. As expected, the results of Case 1 outperform those of Case 2. This could be due to the fact that noisy sampling positions lead to poorly estimated hyperparameters and predictions of the GP.
Furthermore, the FFT appearance-based features show more vulnerability to sampling position noise. The RMSE results of FFT-based localization are slightly better than those of histogram-based localization and much better than those of wavelet-based localization [7].

Figure 3.7: Predictive mean function map for each individual feature z_k for Case 2.

Figure 3.8: Prediction variance map for each individual feature z_k for Case 2.

Besides the type and total number of features, the CV has
a better performance than PCA. In general, increased cardinality of the original feature set leads to better performance in the 5th column, possibly due to the increased information. Similar trends are observed in the 7th and 8th columns of Table 3.2. Most importantly, we conclude that eliminating some of the features helps to improve the localization error. Therefore, the RMSE of the proposed approach in this chapter is much lower than that of the other methods.

3.6 Conclusion and future works

This chapter has presented a novel approach to using vision data for mobile robot localization. The predictive statistics of the vision data are learned in advance and used to estimate the position of a mobile robot, equipped only with an omnidirectional camera, in a GPS-denied environment. A multivariate GP model with unknown hyperparameters is used to model a collection of selected visual features. The experimental study shows excellent positioning results within a reasonable computational time under two different experimental setups, with and without uncertainty in the sampling positions. A key limitation of the current approach arises from the fact that, after the initial training phase, learning is discontinued. In particular, if the environment changes, it is desirable that the localization routines adapt to the changes in the environment. Thus, a future research direction is to develop a localization scheme that is adaptive to changes in the environment.

Chapter 4

Bayesian Prediction with Uncertain Localization

Abstract: In this chapter, we consider the problem of predicting a spatio-temporal random field using sequential noisy observations collected by robotic sensors. The spatio-temporal field of interest is modeled by a sum of a time-varying mean function and a Gaussian Markov random field (GMRF) with unknown hyperparameters.
We first derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account observations, uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution under uncertain localization is not scalable as the number of observations increases. To cope with this exponentially increasing complexity, and to be usable for mobile sensor networks with limited resources, we propose a scalable approximation with a controllable tradeoff between approximation error and complexity relative to the exact solution. The effectiveness of the proposed algorithms is demonstrated by simulation and experimental results.

4.1 Introduction

In recent years, there has been increasing exploitation of mobile robotic sensors in environmental monitoring [10, 12, 62, 64, 117]. Gaussian processes (or Gaussian random fields), defined by mean and covariance functions over a continuum space [14, 84], have frequently been used in mobile sensor networks to statistically model physical phenomena such as harmful algal blooms, pH, and temperature, e.g., [30, 57, 119]. The significant computational complexity of Gaussian process regression, due to the growing number of observations (and hence the size of the covariance matrix), has been tackled in different ways. In [119], the conditions under which near-optimal prediction can be achieved using truncated observations are analyzed, when the covariance function is known a priori. In [120], a new efficient and scalable inference algorithm for a class of Gaussian processes that builds on a Gaussian Markov random field (GMRF) is developed for known hyperparameters. On the other hand, unknown hyperparameters in the covariance function can be estimated by a maximum likelihood (ML) estimator or a maximum a posteriori (MAP) estimator and then used for the prediction [114].
However, the point estimate itself needs to be identified using a certain amount of measurements, and it does not fully incorporate the uncertainty of the estimated hyperparameters into the prediction in a Bayesian perspective. The advantage of a fully Bayesian approach is the capability of incorporating various uncertainties in the model parameters and measurement processes into the prediction [3]. However, the solution often requires an approximation technique such as Markov chain Monte Carlo (MCMC), Laplace approximation, or variational Bayes methods, which still require a high level of computational complexity [3]. In [115, 117], a sequential Bayesian prediction algorithm and its distributed version are designed to deal with uncertain bandwidths by using a compactly supported kernel and selecting a subset of the collected measurements. Sequential fully Bayesian prediction algorithms for a GMRF with unknown hyperparameters are reported in [116]. Inexpensive wireless/mobile sensor networks [17, 79] are widespread at the cost of precision in localization. Due to their growing usage, there are many practical opportunities where continuously sampled measurements need to be fused for sensors with localization uncertainty. Theoretically correct yet efficient (or scalable) inference algorithms need to be developed to meet such demands. Related works involving uncertain localization in our context are as follows. Gaussian process regression has been used for map building and localization in many practical applications. In [7], Gaussian process regression is used to model geo-referenced sensor measurements (obtained from a camera). After training, an ML estimator is used to find the best match for the locations of newly sampled measurements. However, the training step must be done in advance for a new environment, using a dataset that includes noisy measurements and the exact sampling positions [7].
In [51, 105], Gaussian process regression is used to implement simultaneous localization and mapping (SLAM) using a magnetic field, and its feasibility is shown experimentally. In [77, 106], the problem of using laser range-finder data to probabilistically classify a robot's environment is investigated. They provide a continuous representation of the robot's surroundings by employing a Gaussian process. In [65], a Gaussian process model is presented for training on input points corrupted by independent and identically distributed (i.i.d.) Gaussian noise. To make the computation tractable, a local linear expansion about each sampling position is used for the mean function. The work in [65] assumes that all hyperparameters are trained offline a priori. In [44], the problem of Gaussian process regression with uncertain localization and known hyperparameters is formulated and solved. A key limitation of such an approach with fixed hyperparameters arises from the fact that, after the initial training phase, learning is discontinued. In general, if the environment changes, it is desirable that the localization algorithm adapts to the changes on the fly. A fully Bayesian approach that treats the hyperparameters as random variables can address this issue, at the price of increased computational complexity. The novelty of our work as compared to the previous ones in [115-117] is to fully consider the uncertainty in the sampling positions, along with other uncertainties such as hyperparameters and observation noise, in a fully Bayesian manner. To the best of our knowledge, fully Bayesian prediction algorithms for spatio-temporal random fields that can take into account uncertain localization are scarce to date. Hence, this chapter aims to develop such inference algorithms for robotic sensor networks in practical situations.
With the continuous improvement of computational power in embedded systems, it is important to develop theoretically correct and flexible fully Bayesian approaches to cope with such practical problems. The contributions of the chapter are as follows. First, we model a physical spatio-temporal random field as a GMRF with uncertain hyperparameters and formulate the prediction problems with and without localization uncertainty. Next, we derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account uncertain hyperparameters, measurement noise, and uncertain localization from a fully Bayesian point of view. We show that the exact solution under uncertain localization is not scalable as the number of observations increases. To cope with this increasing complexity, we propose a scalable approximation with a controllable tradeoff between approximation error and complexity relative to the exact solution. The effectiveness of the proposed algorithms is demonstrated by simulation and experimental results. In particular, experimental results are illustrated in both static and dynamic environments. A preliminary version of this work, without the proofs of the theorems and the experimental results, has been reported in [41]. The chapter is organized as follows. In Section 4.2, we explain how a GMRF can be viewed as a sparse and discretized version of a Gaussian process. In Sections 4.3 and 4.4, we introduce a spatio-temporal field model based on a GMRF and the mobile sensor network. In Section 4.5.1, we present a fully Bayesian inference approach to estimate the spatio-temporal field. The Bayesian prediction algorithm is extended to uncertain sampling positions in Section 4.5.2. The complexity of the proposed algorithms is discussed in Section 4.5.3. We demonstrate the effectiveness through a simulation study in Section 4.6. Finally, we evaluate our approach on a real experimental setup in Section 4.7.
4.2 Gaussian Processes and Gaussian Markov Random Fields

Recently, there have been efforts to fit a computationally efficient GMRF on a discrete lattice to a Gaussian random field on a continuum space [15, 35, 90, 116]. It has been demonstrated that GMRFs with small neighborhoods can approximate Gaussian fields surprisingly well [90]. Such an approximating GMRF and its regression are very attractive for resource-constrained mobile sensor networks due to their computational efficiency and scalability [60], as compared to the standard Gaussian process and its regression. Fast kriging of large datasets using a GMRF as an approximation of a Gaussian random field has been proposed in [35]. We now briefly review a GMRF as a discretized Gaussian process on a lattice. Consider a zero-mean Gaussian process z(s) \sim \mathcal{GP}(0, K(s, s')), where K is the covariance function defined on a continuum space \mathcal{S}. We discretize the compact domain \mathcal{S} := [0, \max(s_1)] \times [0, \max(s_2)] into n spatial sites \mathcal{Q} := \{\mathring{s}^{(1)}, \cdots, \mathring{s}^{(n)}\} \subset \mathbb{R}^2, where n = h \max(s_1) \times h \max(s_2) and h is chosen such that n \in \mathbb{Z}_{>0}. Note that n \to \infty as h \to \infty. The collection of realized values of the random field on \mathcal{Q} is denoted by z := (z^{(1)}, \cdots, z^{(n)})^T \in \mathbb{R}^n, where z^{(i)} := z(\mathring{s}^{(i)}). The prior distribution of z is given by z \sim \mathcal{N}(0, K), and so we have

P(z) \propto \exp\left(-\frac{1}{2} z^T K^{-1} z\right), \quad (4.1)

where K \in \mathbb{R}^{n \times n} is the covariance matrix whose i, j-th element is defined as (K)_{ij} = \mathrm{Cov}(z^{(i)}, z^{(j)}) = K(\mathring{s}^{(i)}, \mathring{s}^{(j)}). The prior distribution of z can be written with a precision matrix Q_0 = K^{-1}, i.e., z \sim \mathcal{N}(0, Q_0^{-1}). This can be viewed as a discretized version of the Gaussian process (i.e., a GMRF) with precision matrix Q_0 on \mathcal{Q}. Note that the Q_0 of this GMRF is not sparse. However, a sparse version \hat{Q}_0 with a local neighborhood that can represent the original Gaussian process can be found, for example, by making \hat{Q}_0 close to Q_0 in some norm [15, 35, 90].
This approximate GMRF will be computationally efficient due to the sparsity of \hat{Q}_0. For our main problems, we will use a GMRF with a sparse precision matrix that represents a Gaussian process precisely (see Section 4.4). In this section, we assume that we take N noisy measurements y = (y^{[1]}, \cdots, y^{[N]})^T \in \mathbb{R}^N at the corresponding sampling locations s = (s^{[1]T}, \cdots, s^{[N]T})^T \in \mathcal{S}^N. Here, the measurement model is given by

y^{[i]} := y(s^{[i]}) = z(s^{[i]}) + \epsilon^{[i]}, \quad \forall i = 1, \cdots, N, \quad (4.2)

where \epsilon^{[i]} \sim \mathcal{N}(0, \sigma_\epsilon^2) is the measurement noise, assumed to be independent and identically distributed (i.i.d.). Using Gaussian process regression, the posterior distribution of z \in \mathbb{R}^n is given by

z \mid s, y \sim \mathcal{N}(\mu, \Sigma). \quad (4.3)

The predictive mean \mu \in \mathbb{R}^n and covariance matrix \Sigma \in \mathbb{R}^{n \times n} can be obtained by

\mu = k^T C^{-1} y, \qquad \Sigma = K - k^T C^{-1} k, \quad (4.4)

where the covariance matrices are defined as k := \mathrm{Cov}(y, z) \in \mathbb{R}^{N \times n}, C := \mathrm{Cov}(y, y) \in \mathbb{R}^{N \times N}, and K := \mathrm{Cov}(z, z) \in \mathbb{R}^{n \times n}. In practice, resource-constrained robotic sensors are prone to large localization uncertainty. Most previous works on Gaussian process regression for mobile sensor networks [8, 9, 14, 57] have assumed that the exact sampling positions are available, which is not practical in real situations. The effect of localization uncertainty is potentially significant. For example, Fig. 2.1 shows the effect of noisy sampling positions on the results of Gaussian process regression. Note that adding noise to the sampling positions considerably increases the empirical RMS error, as shown in the third row of Fig. 2.1. This motivates our work in this chapter. The pose of a robot can be estimated by fusing different sensory information to produce an estimate along with its estimation error statistics. In general, Bayes filters such as extended and unscented Kalman filters [47, 96], particle filters [70], and Monte Carlo techniques [98] are used for such localization.
Nevertheless, obtaining precise localization of robotic networks under limited resources is very challenging [37, 38]. Throughout the chapter, we assume that the positions of robotic sensors are estimated by a standard technique that produces position estimates along with estimation error statistics. A discussion regarding a specific localization technique is beyond the scope of this chapter.

Under this assumption, a localization algorithm provides the prior distribution of the sampling location s^[i] as P(s^[i] | s̃^[i]), possibly with a compact support in S. The predictive distribution of z given the measured locations s̃ = (s̃^[1]T, ..., s̃^[N]T)^T is thus given by

P(z | s̃, y) = ∫_{s∈S^N} P(z | s, y) P(s | s̃, y) ds,   (4.5)

where P(z | s, y) is obtained from (4.3). However, the predictive distribution in (4.5) does not have a closed-form solution and needs to be computed either by MCMC methods or by approximation techniques [43].

Now we consider a discretized version of the Gaussian process, i.e., a GMRF with precision matrix Q_0 on Q. Since the sampling points of Gaussian process regression are not necessarily on Q, we use the nearest grid point of a given sampling point s in S:

q^[i] = arg min_{q∈Q} ‖s^[i] − q‖.

The sampling positions for the GMRF are then exactly on the lattice, i.e., q^[i] ∈ Q. The posterior distribution of z ∈ R^n on Q, given the measurements y ∈ R^N and the sampling positions q = (q^[1]T, ..., q^[N]T)^T ∈ Q^N, is then obtained by

z | q, y ~ N(Q^{−1} b, Q^{−1}),   (4.6)

where Q = Q_0 + H P^{−1} H^T and b = H P^{−1} y, with P = σ^2 I ∈ R^{N×N} and H ∈ R^{n×N} defined as

(H)_{ij} = 1 if s̊^(i) = q^[j], and 0 otherwise.   (4.7)

We consider again localization uncertainty for this GMRF. Let the measured noisy location q̃^[i] be the nearest grid point to the measured noisy sampling point s̃^[i] of the Gaussian process.
Now we obtain a set of discretized probabilities on Q induced by the continuous prior distribution defined on S. The discrete prior distribution of the sampling location q^[i] is given by

P(q^[i] = s̊^(j) | s̃^[i]) = ∫_{s∈V_j} P(s | s̃^[i]) ds,   (4.8)

where P(s | s̃^[i]) is the continuous prior as in Gaussian process regression and V_j is the Voronoi cell of the j-th grid point s̊^(j), given by

V_j := {s ∈ S | ‖s − s̊^(j)‖ ≤ ‖s − s̊^(i)‖, ∀i ≠ j}.

The predictive distribution of z given y and q̃ is thus given by

P(z | q̃, y) = Σ_{q∈Q^N} P(z | q, y) P(q | q̃, y),   (4.9)

where P(z | q, y) is obtained from (4.6) and the summation is over all possible locations in Q.

Figure 4.1: Example of localization uncertainty for s^[i], for two resolutions (a) h_1 and (b) h_2. The measured sampling location q̃^[i] is indicated by a small red circle, which is the closest point in the discrete support to the measured sampling position in the continuous space. The small red circle, the blue squares, and the blue star show the possible locations of the true sampling point q^[i] according to the prior distribution P(q^[i] = s̊^(j) | s̃^[i]), whose compact support is shown as the big red circle. The blue star indicates q^[i], the closest point in the discrete support to the true sampling position in the continuous space.

Fig. 4.1 shows two examples of using this approximation approach, with h_1 > h_2, to convert a continuous space into a discrete one. As h → ∞, q̃ → s̃ and the standard Gaussian process regression in a continuum space is recovered from the prediction using the GMRF in a discretized space.

4.3 Mobile Sensor Networks

Suppose that the sampling time t ∈ Z_{>0} is discrete. Let z_t := (z_t^(1), ..., z_t^(n))^T ∈ R^n be the corresponding values of the scalar field at the n spatial sites at time t. Consider N spatially distributed mobile sensing agents, indexed by j ∈ J := {1, ..., N}, sampling at time t ∈ Z_{>0}.
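The chapter leaves the cell integral in (4.8) abstract. On a regular lattice the Voronoi cells are squares, so one practical way to obtain the discrete prior is Monte Carlo: draw from the continuous position prior and count which cell (i.e., nearest grid point) each draw falls in. A minimal sketch, with all sizes and noise levels chosen only for illustration:

```python
import numpy as np

def discrete_position_prior(grid, s_tilde, cov, n_samples=20000, seed=0):
    """Approximate P(q = grid[j] | s_tilde) of (4.8) by Monte Carlo:
    draw from the continuous Gaussian prior P(s | s_tilde) and count the
    draws falling in each Voronoi cell, which on a lattice is simply the
    set of points whose nearest grid point is grid[j]."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(s_tilde, cov, size=n_samples)
    d2 = ((draws[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(grid))
    return counts / n_samples

# 5 x 5 lattice with unit spacing; noisy measured position s_tilde.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])
p = discrete_position_prior(grid, s_tilde=np.array([2.2, 1.8]),
                            cov=0.25 * np.eye(2))
```

The resulting vector p sums to one by construction, and the mode falls on the grid point nearest the measured position, matching the behavior described for Fig. 4.1.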
At time t, agent j takes a noise-corrupted measurement at its current location q_t^[j] = s̊^(i) ∈ Q, i.e.,

y_t^[j] = z_t^(i) + ε_t^[j],   ε_t^[j] ~ N(0, σ^2) i.i.d.,   (4.10)

where the measurement errors {ε_t^[j]} are assumed to be i.i.d. The measurement noise level σ^2 > 0 is assumed to be known. For notational simplicity, we denote all agents' locations at time t by q_t = (q_t^[1]T, ..., q_t^[N]T)^T ∈ Q^N and the observations made by all agents at time t by y_t = (y_t^[1], ..., y_t^[N])^T ∈ R^N. Furthermore, we denote the collection of agents' locations and the collective observations from time 1 to t by q_{1:t} = (q_1^T, ..., q_t^T)^T ∈ Q^{Nt} and y_{1:t} = (y_1^T, ..., y_t^T)^T ∈ R^{Nt}, respectively. In addition, let us define z_t = (z_t^(1), ..., z_t^(n))^T ∈ R^n on Q, and ε_t = (ε_t^[1], ..., ε_t^[N])^T ∈ R^N. We then have the following collective notation:

y_t = H_t^T z_t + ε_t,   (4.11)

where H_τ ∈ R^{n×N} is defined by

(H_τ)_{ij} = 1 if s̊^(i) = q_τ^[j], and 0 otherwise.   (4.12)

4.4 Spatio-Temporal Field Model

The value of the scalar field at s̊^(i), z_t^(i), is modeled as the sum of a time-varying mean function and a GMRF:

z_t^(i) = λ_t^(i) + η_t^(i),   ∀i ∈ {1, ..., n}, t ∈ Z_{>0}.   (4.13)

Here the mean function λ_t^(i) : Q × Z_{>0} → R is defined as

λ_t^(i) = f(s̊^(i))^T β_t,   (4.14)

where f(s̊^(i)) = (f_1(s̊^(i)), ..., f_p(s̊^(i)))^T ∈ R^p is a known regression function and β_t = (β_t^[1], ..., β_t^[p])^T ∈ R^p is an unknown vector of regression coefficients. The time evolution of β_t ∈ R^p is modeled by a linear dynamical system given by

β_{t+1} = A_t β_t + B_t ω_t,   (4.15)

where ω_t ~ N(0, W), β_0 ~ N(μ_{β_0}, Σ_{β_0}), and A_t and B_t are known system parameters.
In addition, we consider a zero-mean GMRF [88], η_t = (η_t^(1), ..., η_t^(n))^T ∈ R^n, whose covariance matrix is given by

E(η_t η_k^T | θ) = Q_θ^{−1} δ(t − k),   (4.16)

where δ(·) is the Kronecker delta defined by

δ(k) = 1 if k = 0, and 0 otherwise,   (4.17)

and the inverse covariance matrix (or precision matrix) Q_θ ∈ R^{n×n} is a function of the hyperparameter vector θ.

There are different parameterizations of the GMRF (i.e., of the precision matrix Q_θ) [88]. Our Bayesian approach does not depend on the choice of parameterization for the precision matrix. However, for a concrete and useful exposition, we describe the specific parameterization used in this chapter. The precision matrix is parameterized through the full conditionals as follows. Let η be a GMRF on a regular two-dimensional lattice. The associated Gaussian full conditional mean is

E(η_t^(i) | η_t^(−i), θ) = −(1/(Q_θ)_{ii}) Σ_{j≠i} (Q_θ)_{ij} η_t^(j),   (4.18)

where (Q_θ)_{ij} is the element in the i-th row and j-th column of Q_θ. Here, η_t^(−i) is the collection of η_t values everywhere except at s̊^(i). The hyperparameter vector is defined as θ = (κ, α)^T ∈ R^2_{>0}, where α = a − 4. Fig. 4.2 graphically shows the values of (κ^{−1}Q_θ)_{ij} for one point along with its neighbors. The value of (κ^{−1}Q_θ)_{ii} is 4 + a^2, as denoted at the center node of the graph in Fig. 4.2. The value of (κ^{−1}Q_θ)_{ij} is −2a if j is one of the four closest neighbors of i in the vector 1-norm sense, as illustrated by the graph in Fig. 4.2. The value of (Q_θ)_{ij} is zero if j is not one of the twelve closest neighbors of i (i.e., the twelve neighbors whose 1-norm distance to the i-th location is less than or equal to 2). Equation (4.18) states that the conditional expectation of η_t^(i), given the value of η_t everywhere else (i.e., η_t^(−i)), is determined by the values of η_t at only the twelve closest neighbors (see more details in [63]).
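The stencil above can be assembled as a sparse matrix. The sketch below builds Q_θ on a small n_1 × n_2 lattice as κ (aI − A)^T (aI − A), where A is the 4-neighbor adjacency; this factored form reproduces the stencil values 4 + a^2 (center), −2a (four nearest neighbors), 2 (diagonal neighbors), and 1 (two-step axial neighbors), and makes the positive definiteness for α = a − 4 > 0 explicit via diagonal dominance of aI − A. The periodic boundary and the overall κ scale are assumptions of this sketch; the chapter's actual boundary handling and scaling convention may differ.

```python
import numpy as np
import scipy.sparse as sp

def gmrf_precision(n1, n2, alpha, kappa):
    """Sparse GMRF precision on an n1 x n2 lattice (periodic boundary,
    an assumption for this sketch).  With a = alpha + 4 and A the
    4-neighbor adjacency, Q = kappa * (a I - A)^T (a I - A) reproduces
    the chapter's stencil up to the kappa scale: center 4 + a^2, nearest
    neighbors -2a, diagonals 2, two-step axial neighbors 1."""
    def ring(n):  # cyclic 1-D adjacency
        e = np.ones(n - 1)
        R = sp.diags([e, e], [1, -1], shape=(n, n), format="lil")
        R[0, n - 1] = R[n - 1, 0] = 1
        return sp.csr_matrix(R)
    A = sp.kron(sp.eye(n1), ring(n2)) + sp.kron(ring(n1), sp.eye(n2))
    D = (alpha + 4) * sp.eye(n1 * n2) - A
    return kappa * (D.T @ D)

Q = gmrf_precision(8, 8, alpha=0.01, kappa=10.0)
```

Because aI − A is strictly diagonally dominant for α > 0, Q is positive definite, consistent with the remark about α below (4.19).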
The resulting GMRF accurately represents a Gaussian random field with the Matérn covariance function, as shown in [63]:

G(r) = σ_f^2 (2^{1−ρ}/Γ(ρ)) (√(2ρ) r/ℓ)^ρ K_ρ(√(2ρ) r/ℓ),   (4.19)

where K_ρ(·) is a modified Bessel function [84], with order ρ = 1, bandwidth ℓ = √2/(h√α), and vertical scale σ_f^2 = 1/(4πακ).

Figure 4.2: Elements of the precision matrix Q_θ related to a given location (center value 4 + a^2; the four nearest neighbors −2a; the four diagonal neighbors 2; the four two-step axial neighbors 1).

The hyperparameter α > 0 guarantees the positive definiteness of the precision matrix Q_θ. In the case where α = 0, the resulting GMRF is a second-order polynomial intrinsic GMRF [88, 89]. From the model presented in (4.13), (4.15), and (4.16), the distribution of z_t given β_t and θ is

z_t | β_t, θ ~ N(F_s β_t, Q_θ^{−1}),   (4.20)

where F_s := (f(s̊^(1)), ..., f(s̊^(n)))^T ∈ R^{n×p}. In other words, z_t | β_t, θ ~ N(F_s β_t, K_θ) ∈ R^n is a Gaussian random vector with non-zero mean. Here, the covariance matrix K_θ is defined as the inverse of the precision matrix (i.e., K_θ = Q_θ^{−1}). Note that the precision matrix is positive definite and invertible, and (K_θ)_{ij} = Cov(z_t^(i), z_t^(j)), where (K_θ)_{ij} is the (i, j)-th element of the covariance matrix.

For simplicity, let us define B_t = {β_t, q_t, y_t, θ}. Using Gaussian process regression, the posterior distribution of z_t | B_t ∈ R^n is given by

μ_{z_t|B_t} = F_s β_t + K_θ H_t (H_t^T K_θ H_t + σ^2 I)^{−1} (y_t − H_t^T F_s β_t),
Σ_{z_t|B_t} = K_θ − K_θ H_t (H_t^T K_θ H_t + σ^2 I)^{−1} H_t^T K_θ.   (4.21)

The basic idea behind the model introduced in (4.13), (4.15), and (4.16) stems from the space-time Kalman filter model proposed in [16]. The advantage of this spatio-temporal model with known hyperparameters is that inferences can be made in a recursive manner as the number of observations increases. The zero-mean Gaussian process represents a spatial structure under the assumption that the difference between the parametric mean function and the dynamical environmental process is governed by a relatively large time scale.
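For reference, the Matérn covariance (4.19) with order ρ = 1 can be evaluated directly; the default bandwidth and vertical scale below are taken from the simulation in Section 4.6 (α = 0.01, κ = 10, grid resolution h = 1), and are illustrative only.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(r, rho=1.0, ell=14.14, sigma_f2=0.8):
    """Matern covariance G(r) of (4.19).  kv is the modified Bessel
    function of the second kind; the r -> 0 limit of G is sigma_f2."""
    x = np.sqrt(2 * rho) * np.asarray(r, dtype=float) / ell
    xs = np.where(x > 0, x, 1.0)  # placeholder avoids kv singularity at 0
    val = sigma_f2 * (2 ** (1 - rho) / gamma(rho)) * xs ** rho * kv(rho, xs)
    return np.where(x > 0, val, sigma_f2)
```

For ρ = 1 the small-argument limit x K_1(x) → 1 recovers the vertical scale σ_f^2 at r = 0, and G(r) decays monotonically with distance, as a covariance function should.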
This formulation in turn makes the optimal prediction recursive in time. In this chapter, however, uncertainties in the precision matrix and the sampling positions are treated in a fully Bayesian manner. In addition, in contrast to [12, 16, 53], a GMRF with a sparse precision matrix is used to increase the computational efficiency.

4.5 Bayesian Predictive Inference

4.5.1 Uncertain hyperparameters and exact localization

In this section, we consider the problem of predicting a spatio-temporal random field using successive noisy measurements sampled by a mobile sensor network. For a known covariance function, the prediction can be shown to be recursive [12] based on Gaussian process regression. The uncertainty in θ of a GMRF has been considered, and sequential prediction algorithms have been derived, in [116]; however, only a static field was considered, i.e., μ_t = μ_0. In this section, we use a Bayesian approach to make predictive inferences of the spatio-temporal random field z_t ∈ R^n for the case with uncertain hyperparameters and exact localization. To this end, we use the following assumptions A.1-A.5.

A.1 The spatio-temporal random field is generated by (4.13), (4.15), and (4.16).
A.2 The precision matrix Q_θ is a given function of an uncertain hyperparameter vector θ.
A.3 The noisy measurements {y_t}, as in (4.11), are continuously collected by robotic sensors over time t.
A.4 The sampling positions {q_t} are measured precisely by robotic sensors at each time t.
A.5 The prior distribution of the hyperparameter vector θ is discrete with support Θ = {θ^(1), ..., θ^(L)}.

Remark 4.5.1. A.1 and A.2 stem from the discretization of a Gaussian process as described in Section 4.2. From the model in A.1, the zero-mean GMRF represents a spatial structure under the assumption that the difference between the parametric mean function and the dynamical environmental process is governed by a relatively large time scale.
However, this assumption could be rather strong for cases where the data are better fitted by temporally correlated GMRFs. A.3 is a standard assumption on the measured observations [53]. A.4 is rather strong, but it will be relaxed to A.6 in Section 4.5.2 in order to deal with localization uncertainty. A.5 comes from the discretization of the hyperparameter vector, which replaces an integration with a summation over possible hyperparameters.

For notational simplicity, we denote the full latent field of dimension n + p by x_t = (z_t^T, β_t^T)^T. Let us define D_{k:r} := {P_{k−1}, q_{k:r}, y_{k:r}}, where

P_k = {μ_{x_k|D_{1:k}}, Σ_{x_k|D_{1:k}}} ∪ {P(θ|D_{1:k}) | θ ∈ Θ},

and P_0 is assumed to be known. We formulate the first problem as follows.

Problem 4.5.2. Consider assumptions A.1-A.5. Our problem is to find the predictive distribution, mean, and variance of x_t conditional on D_{1:t}.

In what follows, we summarize the intermediate steps needed to obtain the solution to Problem 4.5.2.

Lemma 4.5.3. Under assumptions A.1 and A.2, the predictive distribution of x_t conditional on the hyperparameter vector θ and the measurements D_{1:t−1} is Gaussian, with mean and precision matrix

μ_{x_t|θ,D_{1:t−1}} = [ F_s μ_{β_t|θ,D_{1:t−1}} ; μ_{β_t|θ,D_{1:t−1}} ],

Q_{x_t|θ,D_{1:t−1}} = [ Q_θ , −Q_θ F_s ; −F_s^T Q_θ , F_s^T Q_θ F_s + Σ_{β_t|θ,D_{1:t−1}}^{−1} ],   (4.22)

where μ_{β_t|θ,D_{1:t−1}} = A_t μ_{β_{t−1}|θ,D_{1:t−1}} denotes the expectation of β_t conditional on θ and D_{1:t−1}, and Σ_{β_t|θ,D_{1:t−1}} = A_t Σ_{β_{t−1}|θ,D_{1:t−1}} A_t^T + B_t W B_t^T denotes the associated estimation error covariance matrix.

For a given hyperparameter vector θ, (4.22) provides the optimal prediction of the spatio-temporal field at time t using data up to time t − 1. The following lemma is used to compute the posterior distribution of θ recursively.

Lemma 4.5.4.
Under assumptions A.3 and A.4, the posterior distribution of the hyperparameter vector θ can be obtained recursively via

P(θ|D_{1:t}) ∝ P(y_t|θ, D_{1:t−1}, q_t) P(θ|D_{1:t−1}),   (4.23)

where the distribution of y_t given {θ, D_{1:t−1}, q_t} is Gaussian with mean and variance

μ_{y_t|θ,D_{1:t−1},q_t} = M_t^T μ_{x_t|θ,D_{1:t−1}},
Σ_{y_t|θ,D_{1:t−1},q_t} = M_t^T Σ_{x_t|θ,D_{1:t−1}} M_t + σ^2 I,   (4.24)

where M_t^T = [H_t^T 0] ∈ R^{N×(n+p)}.

Proof. The posterior distribution of θ given in (4.23) is computed by applying Bayes' rule to P(θ|y_t, D_{1:t−1}). The predictive statistics of y_t|θ, D_{1:t−1} follow directly from (4.11). Note that P(θ|D_{1:t−1}) is equal to P(θ|D_{1:t−1}, q_t).

Lemma 4.5.5. Under assumptions A.1-A.4, the full conditional distribution of x_t for a given hyperparameter vector and data up to time t is x_t|θ, D_{1:t} ~ N(μ_{x_t|θ,D_{1:t}}, Q_{x_t|θ,D_{1:t}}^{−1}), where

Q_{x_t|θ,D_{1:t}} = Q_{x_t|θ,D_{1:t−1}} + σ^{−2} M_t M_t^T,
μ_{x_t|θ,D_{1:t}} = μ_{x_t|θ,D_{1:t−1}} + σ^{−2} Q_{x_t|θ,D_{1:t}}^{−1} M_t (y_t − M_t^T μ_{x_t|θ,D_{1:t−1}}).   (4.25)

Remark 4.5.6. In order to update the prediction error covariance matrix Q_{x_t|θ,D_{1:t}}^{−1} directly, without explicitly inverting the precision matrix, the Woodbury lemma can be used to reduce the computational load as follows:

Q_{x_t|θ,D_{1:t}}^{−1} = Q_{x_t|θ,D_{1:t−1}}^{−1} − Q_{x_t|θ,D_{1:t−1}}^{−1} M_t (σ^2 I + M_t^T Q_{x_t|θ,D_{1:t−1}}^{−1} M_t)^{−1} M_t^T Q_{x_t|θ,D_{1:t−1}}^{−1},   (4.26)

where Q_{x_t|θ,D_{1:t−1}}^{−1} can be computed by blockwise inversion using (4.22):

Q_{x_t|θ,D_{1:t−1}}^{−1} = [ Q_θ^{−1} + F_s Σ_{β_t|θ,D_{1:t−1}} F_s^T , F_s Σ_{β_t|θ,D_{1:t−1}} ; Σ_{β_t|θ,D_{1:t−1}}^T F_s^T , Σ_{β_t|θ,D_{1:t−1}} ].

The blockwise inversion needs to be updated only through Σ_{β_t|θ,D_{1:t−1}}.

The following theorem explicitly shows how the results of Lemma 4.5.4 and Lemma 4.5.5 lead to the predictive statistics of x_t under assumptions A.1-A.5, which gives the solution to Problem 4.5.2.

Theorem 4.5.7.
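One step of Lemma 4.5.4 together with the information-form measurement update (4.25) can be sketched numerically as follows. The toy precision matrices, dimensions, and the two-element hyperparameter support are illustrative assumptions, not the chapter's field model; the point is the shape of the recursion, i.e., reweighting P(θ|D_{1:t}) by the Gaussian evidence of (4.24) while updating (μ, Q) per θ.

```python
import numpy as np

def measurement_update(mu_pred, Q_pred, M, y, sigma2):
    """Information-form update of (4.25):
    Q+ = Q- + sigma^-2 M M^T,
    mu+ = mu- + sigma^-2 Q+^{-1} M (y - M^T mu-)."""
    Q_post = Q_pred + (M @ M.T) / sigma2
    mu_post = mu_pred + np.linalg.solve(Q_post, M @ (y - M.T @ mu_pred)) / sigma2
    return mu_post, Q_post

def log_evidence(mu_pred, Q_pred, M, y, sigma2):
    """log P(y_t | theta, D_{1:t-1}, q_t) from (4.24), with
    Sigma_x = Q_pred^{-1}."""
    S = M.T @ np.linalg.solve(Q_pred, M) + sigma2 * np.eye(len(y))
    r = y - M.T @ mu_pred
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (logdet + r @ np.linalg.solve(S, r) + len(y) * np.log(2 * np.pi))

# One Bayes step of (4.23) over a toy discrete support Theta.
rng = np.random.default_rng(1)
n, N, sigma2 = 6, 2, 0.1
M = np.zeros((n, N)); M[1, 0] = M[4, 1] = 1.0   # selects the observed sites
y = rng.standard_normal(N)
prior = {0.5: 0.5, 2.0: 0.5}                    # precision scale -> prior weight
logw = np.array([np.log(p) + log_evidence(np.zeros(n), s * np.eye(n), M, y, sigma2)
                 for s, p in prior.items()])
w = np.exp(logw - logw.max()); w /= w.sum()     # normalized P(theta | D_{1:t})
mu_post, Q_post = measurement_update(np.zeros(n), np.eye(n), M, y, sigma2)
```

Working in log space and normalizing after subtracting the maximum keeps the weight update numerically stable as the recursion proceeds.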
Under assumption A.5, the predictive distribution of x_t|D_{1:t} is given by

P(x_t|D_{1:t}) = Σ_{θ∈Θ} P(x_t|θ, D_{1:t}) P(θ|D_{1:t}),   (4.27)

where P(θ|D_{1:t}) and P(x_t|θ, D_{1:t}) are given by Lemmas 4.5.4 and 4.5.5, respectively. The predictive mean and variance follow as

μ_{x_t|D_{1:t}} = Σ_{θ∈Θ} μ_{x_t|θ,D_{1:t}} P(θ|D_{1:t}),
Σ_{x_t|D_{1:t}} = Σ_{θ∈Θ} [ Σ_{x_t|θ,D_{1:t}} + (μ_{x_t|θ,D_{1:t}} − μ_{x_t|D_{1:t}})(μ_{x_t|θ,D_{1:t}} − μ_{x_t|D_{1:t}})^T ] P(θ|D_{1:t}).   (4.28)

Proof. The predictive mean and variance are obtained by marginalizing over the conditional distribution of θ given D_{1:t}. The marginal mean and variance satisfy E(Y) = E(E(Y|X)) and Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). Setting Y := x_t|D_{1:t} and X := θ|D_{1:t} completes the proof of Theorem 4.5.7.

Remark 4.5.8. The optimal prediction of the spatio-temporal field x_t|θ, D_{1:t−1} using the predictive statistics of x_{t−1}|θ, D_{1:t−1} is provided by Lemma 4.5.3. Lemma 4.5.5 provides the optimal estimator of x_t|θ, D_{1:t} using the predictive statistics of x_t|θ, D_{1:t−1} given by Lemma 4.5.3. Using Lemma 4.5.3 and Lemma 4.5.5 sequentially, we can update the predictive statistics of x_t|θ, D_{1:t} for known hyperparameters. Lemma 4.5.4 gives the posterior distribution of θ based on the measured data. Finally, Theorem 4.5.7 provides the optimal Bayesian prediction of the spatio-temporal random field with a time-varying mean function and uncertain hyperparameters, by marginalizing over the conditional distribution of θ given D_{1:t}.

Remark 4.5.9. Under assumption A.5, i.e., a discrete prior on the hyperparameter vector θ with support Θ = {θ^(1), ..., θ^(L)}, the predictive distribution of x_t = (z_t^T, β_t^T)^T (or the predictive mean and variance) conditional on D_{1:t} can be computed explicitly and recursively, in constant time as time t (or the number of observations) increases.
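The moment-matching in (4.28) is exactly the law of total expectation and total variance applied to a finite Gaussian mixture, and the same formulas reappear in (4.31), (4.33), and (4.39). A small self-contained sketch:

```python
import numpy as np

def mixture_moments(mus, Sigmas, weights):
    """Predictive mean/covariance of a finite Gaussian mixture, as in
    (4.28): mean is the weighted average of component means; covariance
    is the weighted average of component covariances plus the spread of
    the component means (law of total variance)."""
    mus = np.asarray(mus); Sigmas = np.asarray(Sigmas)
    w = np.asarray(weights)
    mu = (w[:, None] * mus).sum(axis=0)
    d = mus - mu
    Sigma = (w[:, None, None] * (Sigmas + d[:, :, None] * d[:, None, :])).sum(axis=0)
    return mu, Sigma
```

For two equally weighted scalar components N(0, 1) and N(2, 1), the mixture mean is 1 and the mixture variance is 2: the within-component variance 1 plus the between-component spread 1.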
Algorithm 3 Sequential Bayesian predictive inference
Initialization:
1: initialize F_s
2: for each θ ∈ Θ, initialize Q_θ and compute Q_θ^{−1}
At time t ∈ Z_{>0}, do:
1: obtain new observations y_t collected at the current locations q_t
2: find the map M_t from q_t to the spatial sites Q, and compute the radial basis values F_{q_t} at q_t
3: for θ ∈ Θ do
4:   predict μ_{x_t|θ,D_{1:t−1}} and Q_{x_t|θ,D_{1:t−1}} using measurements up to time t − 1, given by (4.22)
5:   compute μ_{x_t|θ,D_{1:t}} and Q_{x_t|θ,D_{1:t}} given by (4.25)
6:   compute μ_{y_t|θ,D_{1:t−1},q_t} and Σ_{y_t|θ,D_{1:t−1},q_t} given by (4.24)
7:   calculate P(θ|D_{1:t}) given by (4.23)
8: end for
9: compute the predictive mean and variance using (4.28)

For a continuous prior distribution P(θ), the summations in (4.27) and (4.28) would need to be replaced by integrations with respect to θ, which must be approximated. The proposed solution to the formulated problem is summarized in Algorithm 3.

4.5.2 Uncertain hyperparameters and localization

In the previous section, we assumed that the localization data q_{1:t} are exactly known. In practice, however, the positions of sensor networks cannot be measured without noise. Instead, for example, there may be several probable possibilities inferred from the measured position. In this section, the method proposed in the previous section is extended to uncertain localization data. To the best of our knowledge, this is the first attempt to obtain predictive statistics for a case where uncertainties in both hyperparameters and localization are simultaneously included from a fully Bayesian perspective. In what follows, we present our idea as a problem statement and the corresponding solution. In order to take the uncertainty in the sampling positions into account, we replace assumption A.4 with the following assumption A.6.
A.6 The prior distribution P(q_t = q̊_t^(i)) is discrete with support q̊_t^(i) ∈ Ω(t), where

Ω(t) = {(s̊^(k_1), ..., s̊^(k_N)) | (k_1, ..., k_N) ∈ L(t)},

which is given at time t along with the corresponding measurement y_t. Here, γ(t) = |L(t)| denotes the number of probable possibilities for q_t.

A straightforward consequence of assumption A.6 is that the prior distribution P(q_{k:r}) is discrete with support Ω(k:r) := Π_{g=k}^{r} Ω(g). In addition, L(k:r) := Π_{g=k}^{r} L(g) is the index set of the support Ω(k:r), and γ(k:r) := Π_{g=k}^{r} γ(g) is the number of probable possibilities for q_{k:r}. Now we state the problem as follows.

Problem 4.5.10. Consider assumptions A.1, A.2, A.3, A.5, and A.6. Our problem is to find the predictive distribution, mean, and variance of x_t conditional on the prior P_0 and the measurements y_{1:t}.

For the sake of conciseness, let us define R_{r:k} := {P_{r−1}, y_{r:k}}. We then have R_{r:k} ⊂ D_{r:k}, where we recall that D_{r:k} := {P_{r−1}, q_{r:k}, y_{r:k}}. To solve Problem 4.5.10, we first seek a way to compute the posterior distribution of q_t, as summarized in the following lemma.

Lemma 4.5.11. Consider P(y_t|θ, D_{1:t−1}^(n), q̊_t^(k)) given by (4.24) and P(θ|D_{1:t−1}^(n)) given by (4.23), where n ∈ L(1:t−1), k ∈ L(t), and D_{1:t−1}^(n) := {P_0, q̊_{1:t−1}^(n), y_{1:t−1}}. Under assumption A.5, we have

P(y_t|D_{1:t−1}^(n), q̊_t^(k)) = Σ_{θ∈Θ} P(y_t|θ, D_{1:t−1}^(n), q̊_t^(k)) P(θ|D_{1:t−1}^(n)).

Under assumption A.6, the posterior distribution of q_{1:t} = (q̊_{1:t−1}^(n), q̊_t^(k)) can be obtained recursively via

P(q̊_{1:t−1}^(n), q̊_t^(k) | R_{1:t}) ∝ P(q̊_{1:t−1}^(n) | R_{1:t−1}) P(y_t|D_{1:t−1}^(n), q̊_t^(k)) P(q̊_t^(k)).   (4.29)

Proof. The proof is a straightforward result of applying Bayes' rule.

We now give the exact solution to Problem 4.5.10 as follows.

Theorem 4.5.12.
Consider the predictive distribution P(x_t|D_{1:t}^(i)) given by Theorem 4.5.7 and the posterior P(q̊_{1:t}^(i) | R_{1:t}) given by Lemma 4.5.11, where q̊_{1:t}^(i) ∈ Ω(1:t) and D_{1:t}^(i) = {P_0, q̊_{1:t}^(i), y_{1:t}}. Under assumption A.6, the predictive distribution of x_t|R_{1:t} can be obtained as follows:

P(x_t|R_{1:t}) = Σ_{i∈L(1:t)} P(x_t|D_{1:t}^(i)) P(q̊_{1:t}^(i) | R_{1:t}).   (4.30)

Consequently, the predictive mean and variance are given by

μ_{x_t|R_{1:t}} = Σ_{i∈L(1:t)} μ_{x_t|D_{1:t}^(i)} P(q̊_{1:t}^(i) | R_{1:t}),
Σ_{x_t|R_{1:t}} = Σ_{i∈L(1:t)} [ Σ_{x_t|D_{1:t}^(i)} + (μ_{x_t|D_{1:t}^(i)} − μ_{x_t|R_{1:t}})(μ_{x_t|D_{1:t}^(i)} − μ_{x_t|R_{1:t}})^T ] P(q̊_{1:t}^(i) | R_{1:t}).   (4.31)

Proof. The proof is similar to that of Theorem 4.5.7. The predictive mean and variance are obtained by marginalizing over the conditional distribution of q_{1:t}. The marginal mean and variance satisfy E(Y) = E(E(Y|X)) and Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). Setting Y := x_t|R_{1:t} and X := q_{1:t}|R_{1:t} proves Theorem 4.5.12.

Remark 4.5.13. The complexity of the proposed algorithm in Theorem 4.5.12 is proportional to the number of possibilities for q_{1:t}. The result of Lemma 4.5.11 enables us to compute these probable possibilities recursively. However, the number of non-zero probable combinations grows exponentially with time: the number of possibilities for q_{1:t} is the product of the number of possibilities for q_{1:t−1} and the number of possibilities for q_t. This growing computational complexity makes the exact solution to Problem 4.5.10, given in Theorem 4.5.12, prohibitive for sensor networks with limited resources.

In what follows, we propose an approximation to the exact solution given in Theorem 4.5.12, with a controllable tradeoff between approximation error and complexity, by including an option of ignoring uncertainties in past position data. This approximation includes, as a special case, the exact solution with the maximal and original complexity.
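The exponential growth in Remark 4.5.13 can be seen directly: the support of q_{1:t} is the Cartesian product of the per-step supports, so γ(1:t) multiplies at every step. A toy sketch with an assumed constant γ(t) = 4 possibilities per step:

```python
import numpy as np
from itertools import product

# gamma(t) possibilities per step; the joint support of q_{1:t} is the
# Cartesian product of the per-step supports, so its size gamma(1:t) is
# the running product of the per-step counts (Remark 4.5.13).
per_step = [4, 4, 4, 4, 4]                 # toy gamma(t) for t = 1..5
support_sizes = np.cumprod(per_step)       # gamma(1:t) for t = 1..5

# Explicit enumeration of the joint support for the first three steps:
joint = list(product(*[range(g) for g in per_step[:3]]))
```

Already at t = 5 the exact mixture in (4.30) has 1024 components, which is why the windowed approximation of Section 4.5.2 (and the m = 1 case of Corollary 4.5.17) is needed on resource-constrained platforms.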
The idea is based on the fact that the estimation of x_t is more susceptible to uncertainties in recently sampled positions than to those in old ones. To formulate our idea clearly, we first present the following results.

Lemma 4.5.14. Using the prior distribution of x_{t−m} and the measured data y_{t−m+1:t}, where m ∈ Z_{>0}, the posterior distribution of q_{t−m+1:t} can be obtained recursively via

P(q̊_{t−m+1:t−1}^(j), q̊_t^(k) | R_{t−m+1:t}) ∝ P(q̊_{t−m+1:t−1}^(j) | R_{t−m+1:t−1}) P(y_t|D_{t−m+1:t−1}^(j), q̊_t^(k)) P(q̊_t^(k)),   (4.32)

where j ∈ L(t−m+1 : t−1) and k ∈ L(t).

Proof. The proof is a straightforward result of applying Bayes' rule.

Theorem 4.5.15. Consider μ_{x_t|D_{t−m+1:t}^(h)} and Σ_{x_t|D_{t−m+1:t}^(h)} computed by Theorem 4.5.7. Under assumption A.6, the predictive statistics of x_t|R_{t−m+1:t} are as follows:

μ_{x_t|R_{t−m+1:t}} = Σ_{h∈L(t−m+1:t)} μ_{x_t|D_{t−m+1:t}^(h)} P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}),
Σ_{x_t|R_{t−m+1:t}} = Σ_{h∈L(t−m+1:t)} [ Σ_{x_t|D_{t−m+1:t}^(h)} + (μ_{x_t|D_{t−m+1:t}^(h)} − μ_{x_t|R_{t−m+1:t}})(μ_{x_t|D_{t−m+1:t}^(h)} − μ_{x_t|R_{t−m+1:t}})^T ] P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}).   (4.33)

Proof. The proof is similar to that of Theorem 4.5.12.

To implement approximations to the predictive statistics of x_t|R_{1:t} given by Theorem 4.5.12, we consider the following conditions.

C.1 For 1 ≤ m ≤ t, we have

P(x_t|R_{1:t}) ≈ P(x_t|R_{t−m+1:t}).   (4.34)

C.2 For 1 ≤ m ≤ t, P_t can be approximated by

P_t ≈ {μ_{x_t|R_{t−m+1:t}}, Σ_{x_t|R_{t−m+1:t}}} ∪ {P(θ|R_{t−m+1:t}) | θ ∈ Θ}.   (4.35)

Under conditions C.1 and C.2, it is natural to propose the following approximations:

μ_{x_t|R_{1:t}} ≈ μ_{x_t|R_{t−m+1:t}},   Σ_{x_t|R_{1:t}} ≈ Σ_{x_t|R_{t−m+1:t}}.   (4.36)

Remark 4.5.16. In Theorem 4.5.15, the predictive statistics of x_t|D_{t−m+1:t}^(h) are obtained from Algorithm 3, given in Section 4.5.1. The only difference is that we start from time t−m+1 instead of time 1, with P_{t−m} instead of P_0. Note that without condition C.2 we could not use Algorithm 3 to calculate the statistics of x_t|D_{t−m+1:t}^(h).
The proposed approximation for the case of uncertain localization in (4.36) is quite different from a mere truncation of old data, in the sense that past measurements still affect the current estimation through the approximately updated prior information using (4.35). Note that we update the prior information from P_{t−m} to P_t with the cumulative data collected from time t−m+1 up to time t, which is different from using truncated observations alone. The proposed approximation for the formulated problem is summarized in Algorithm 4. To further understand the nature of the proposed approximation, consider the following two extreme special cases.

Corollary 4.5.17. As a special case of Theorem 4.5.15 with m = 1, the posterior distribution of q_t can be obtained via

P(q̊_t^(k) | P_{t−1}, y_t) ∝ P(y_t | P_{t−1}, q̊_t^(k)) P(q̊_t^(k)),   (4.37)

where k ∈ L(t). The predictive distribution of x_t|R_{1:t} can then be approximated, in constant time as time t increases, in a sequential way:

P(x_t|R_{1:t}) ≈ P(x_t | P_{t−1}, y_t),

where

P(x_t | P_{t−1}, y_t) = Σ_{k∈L(t)} P(x_t | P_{t−1}, q̊_t^(k), y_t) P(q̊_t^(k) | P_{t−1}, y_t),   (4.38)

and the posterior P(q̊_t^(k) | P_{t−1}, y_t) is given by (4.37). Consequently, the predictive mean μ_{x_t|R_{1:t}} and variance Σ_{x_t|R_{1:t}} can be approximated by μ_{x_t|P_{t−1},y_t} and Σ_{x_t|P_{t−1},y_t}, respectively, i.e.,

μ_{x_t|P_{t−1},y_t} = Σ_{k∈L(t)} μ_{x_t|P_{t−1},q̊_t^(k),y_t} P(q̊_t^(k) | P_{t−1}, y_t),
Σ_{x_t|P_{t−1},y_t} = Σ_{k∈L(t)} [ Σ_{x_t|P_{t−1},q̊_t^(k),y_t} + (μ_{x_t|P_{t−1},q̊_t^(k),y_t} − μ_{x_t|P_{t−1},y_t})(μ_{x_t|P_{t−1},q̊_t^(k),y_t} − μ_{x_t|P_{t−1},y_t})^T ] P(q̊_t^(k) | P_{t−1}, y_t).   (4.39)

Corollary 4.5.18. For the other special case, with m = t, Theorem 4.5.15 reduces to Theorem 4.5.12.

Remark 4.5.19. For a fixed m ∈ Z_{>0}, Algorithm 4 is scalable as time t (or the number of observations) increases. In our approach, the level of approximation can be controlled by the user by selecting a tradeoff between approximation error and complexity (i.e., by choosing a value for m).
The most simplistic approximation is obtained by choosing m = 1, as in Corollary 4.5.17. The original exact solution, with maximal complexity, is recovered by selecting m = t, as shown in Corollary 4.5.18.

Algorithm 4 Sequential Bayesian predictive inference approximation with uncertain localization
At time t ∈ Z_{>0}, do:
1: obtain new observations y_t along with the probabilities for the locations, P(q_t)
2: for q̊_{t−m+1:t}^(h) ∈ Ω(t−m+1 : t) do
3:   predict μ_{x_t|D_{t−m+1:t}^(h)}, Σ_{x_t|D_{t−m+1:t}^(h)}, and P(y_t|D_{t−m+1:t−1}^(j), q̊_t^(k)) using Algorithm 3
4:   compute P(q̊_{t−m+1:t}^(h) | R_{t−m+1:t}) using Lemma 4.5.14
5: end for
6: compute μ_{x_t|R_{t−m+1:t}} and Σ_{x_t|R_{t−m+1:t}} using Theorem 4.5.15
7: use the following approximations to update the estimates:
   μ_{x_t|R_{1:t}} ≈ μ_{x_t|R_{t−m+1:t}},   Σ_{x_t|R_{1:t}} ≈ Σ_{x_t|R_{t−m+1:t}},
   P_t ≈ {μ_{x_t|R_{t−m+1:t}}, Σ_{x_t|R_{t−m+1:t}}} ∪ {P(θ|R_{t−m+1:t}) | θ ∈ Θ}

4.5.3 Complexity of the Algorithms

In this section, we discuss complexity aspects of the proposed algorithms. For a fixed number of radial basis functions (i.e., p) and a fixed number of spatial sites (i.e., n), the computational complexity of Algorithm 3 is dominated by (4.25). The complexity of Algorithm 3 in each time step is O(LN^2), where L is the number of possible hyperparameter vectors and N is the number of agents. The complexity of Algorithm 4 at time t is O(γ(t−m+1 : t)) times the complexity of Algorithm 3 over m time steps. Hence, the complexity of Algorithm 4 at time t is O(γ(t−m+1 : t) L N^2 m).

The numbers of spatial sites and radial basis functions affect the complexity of Algorithm 3 as well. The complexity is O(n^3) with respect to the number of spatial sites, due to the matrix inversion. Thus, using finer grids increases the number of spatial sites n, and the complexity increases cubically. For fixed L, n, and N, the complexity of Algorithm 3 with respect to p is O(p^3).
4.6 Simulation Results

In this section, we demonstrate the effectiveness of the proposed sequential Bayesian inference algorithms using a numerical experiment. Consider a scenario in which mobile sensing agents can only move on a regular lattice, denoted by G, shown with circles in Fig. 4.3. This reachable set of the robotic sensors is a subset of the spatial sites (i.e., G ⊂ Q). Note that Q is a finer grid graph on which the GMRF is realized. The spatial sites are shown as intersections of dashed lines in Fig. 4.3.

Assume that some agents measure their true positions exactly while others only know a finite set of probable positions with associated probabilities. Fig. 4.3 shows an illustrative configuration in which agents 1 and 3 measure their sampling positions exactly while agent 2 knows only a set of uncertain possibilities (shown as 2a, 2b, 2c, and 2d) for a given measured sampling position (a solid star). The prior probability that the true sampling position of agent 2 is located at 2a is proportional to the rectangular area between the star and 2d, i.e., 2/16. The other prior probabilities, for 2b, 2c, and 2d, are defined in the same way, which assigns a high probability to a node whose location is close to the observed sampling location. This way of assigning priors guarantees that the prior probabilities sum to one. In this simulation, we allow two agents to take measurements at the same location and time. Repeatedly sampled positions are denoted by large circles in Fig. 4.5-(g). In this illustrative example, we use the spatio-temporal field introduced in Section 4.4.

Figure 4.3: Agents 1 and 3 measure their exact sampling positions while agent 2 has a noisy sampling position measured at the star location. The four possible sampling positions for the true sampling position of agent 2 are 2a, 2b, 2c, and 2d, with prior probabilities 2/16, 6/16, 2/16, and 6/16, respectively.
The spatial sites in Q consist of 51 × 51 grid points, i.e., n = 2601, uniformly distributed over the surveillance region [−25, 25] × [−25, 25]. The time-varying mean function λ_t consists of ten radial basis functions (i.e., p = 10), defined as follows:

f_j(q^[i]) = exp( −‖q^[i] − ξ_j‖^2 / (2σ_j^2) ),   j ∈ {1, ..., p},   (4.40)

where σ_j is the bandwidth and ξ_j is the center location of the j-th radial basis function. The first radial basis function has an infinite bandwidth (i.e., σ_1 = ∞) to represent the average of the field, and the others have bandwidth σ_j = 15. The centers of the radial basis functions are {(0, 0)} ∪ ({−15, 0, 15} × {−15, 0, 15}). The prior distribution of β_0 is chosen to be β_0 ~ N(0, 5I). The time evolution of β_t is modeled by (4.15), where the state matrix A_t and the input matrix B_t are given by 0.95 I_{10×10} and 0.5 I_{10×10}, respectively. The input disturbance covariance is known to be W = I_{10×10}.

Figure 4.4: (a) The realized spatio-temporal field at time t = 2 is the summation of (b) the smooth mean field and (c) the GMRF, as described in (4.13).

We choose the precision matrix Q_θ with hyperparameters α = 0.01, equivalent to a bandwidth ℓ = √2/√α ≈ 14.14, and κ = 10, equivalent to σ_f^2 = 1/(4πακ) ≈ 0.8. The measurement noise variance σ^2 = 0.1 is assumed to be known. The numerically simulated field z_t from our model in (4.13) is shown in Fig. 4.4-(a). From the same realization, the mean function λ_t and the GMRF η_t, as in (4.13), are shown in Fig. 4.4-(b) and Fig. 4.4-(c), respectively.

Ten mobile sensing agents, i.e., N = 10, take measurements at times t ∈ {1, 2, ..., 10}, where only three agents are assumed to have uncertain sampling positions. In Figs. 4.5-(g), (h), and (i), the true, noisy, and probable sampling positions are shown as circles, stars, and corners of squares, respectively. The prediction results are summarized for three methods of prediction as follows.

• Case 1: Figs.
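The radial basis construction in (4.40), including the constant feature obtained from the infinite bandwidth, can be sketched directly; the center layout and bandwidths below mirror the values stated in the text.

```python
import numpy as np

def rbf_features(q, centers, bandwidths):
    """Radial basis values f_j(q) of (4.40).  An infinite bandwidth
    yields exp(0) = 1 for every q, i.e., the constant feature that
    represents the average of the field."""
    q = np.atleast_2d(q)
    d2 = ((q[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * np.asarray(bandwidths) ** 2))

# p = 10 features: one constant plus a 3 x 3 grid of centers, as in the text.
centers = np.array([(0.0, 0.0)]
                   + [(x, y) for x in (-15.0, 0.0, 15.0) for y in (-15.0, 0.0, 15.0)])
bandwidths = np.array([np.inf] + [15.0] * 9)
F = rbf_features(np.array([[0.0, 0.0], [10.0, -5.0]]), centers, bandwidths)
```

Each row of F is the vector f(q)^T of (4.14) at one query location, so stacking the rows over all spatial sites yields the regression matrix F_s used in (4.20).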
4.5-(a), (d), and (g) show the prediction, prediction error variance, and squared (empirical) error fields obtained using Algorithm 3 with the exact sampling positions. With the true sampling positions, the best prediction quality is expected for this case.

• Case 2: Figs. 4.5-(b), (e), and (h) show the resulting fields obtained by naively applying Algorithm 3 to the measured sampling positions, including the noisy locations. The results clearly illustrate that naively applying Algorithm 3 to noisy sampling positions can distort the prediction significantly, as shown in Figs. 4.5-(b) and (h).

• Case 3: Figs. 4.5-(c), (f), and (i) show the resulting fields obtained by applying Algorithm 4 with m = 1. In particular, we use the result from Corollary 4.5.17 (i.e., m = 1) to evaluate the effectiveness of the proposed approximation in marginalizing the uncertainty in the sampling positions. The resulting prediction quality is much improved compared to Case 2 and is even comparable to the result of Case 1.

Cases 1, 2, and 3 will also be applied to realistic experimental data in Section 4.7. Fig. 4.6-(a) shows the RMS errors of the estimated β_t over time for Cases 1, 2, and 3. The RMS error is defined as

RMS(t) = ( (1/p) Σ_{i=1}^{p} (μ_{β_t}^[i] − β_t^[i])^2 )^{1/2}.   (4.41)

Fig. 4.6-(a) clearly illustrates that the RMS errors of Cases 1 and 3 are similar and are lower than that of Case 2 for all time.

Now we examine the convergence of the hyperparameters. The prior distribution of the hyperparameter vector θ is discrete with support

Θ = {(κ, α), (0.1κ, α), (10κ, α), (κ, 0.1α), (κ, 10α)},

with corresponding uniform probabilities {0.2, 0.2, 0.2, 0.2, 0.2}. As shown in Fig. 4.6-(b), the updated posterior probability of the true hyperparameter vector (κ, α) converges to one for Case 1 (blue circles) and Case 3 (red squares) as the number of observations increases. However, Case 2 (green stars) does not converge to the true hyperparameters.
Figure 4.5: The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown as circles, stars, and corners of squares, respectively.

Figure 4.6: (a) The RMS estimation error of β_t vs. time and (b) the posterior probability of the true hyperparameter vector vs. time. The results for Cases 1, 2, and 3 are shown as blue circles, green stars, and red squares, respectively. Note that the results in Figs. 4.4 and 4.5 are obtained at time t = 2. The corresponding values for t = 2 in Fig. 4.6 are highlighted by a gray area.

Fig. 4.7 shows the controllable tradeoff between approximation error and complexity. The complexity of Algorithm 4 increases exponentially with respect to m. The predicted field is compared between Case 1, for which the best prediction quality is expected, and Case 3. Clearly, by increasing m, the mean square difference between Case 1 and Case 3 decreases. However, this improvement costs exponentially increasing computational load. Fig. 4.8 shows the effect of an increasing number of observations with uncertain sampling positions on the three cases. Here we assume that we have seven observations with known true sampling positions plus a few observations with uncertain sampling positions. Clearly, the RMS error decreases in Case 1 and Case 3 by adding new observations with true and uncertain sampling positions, respectively. On the other hand, as shown in Fig. 4.8, adding new observations with noisy sampling positions could increase the RMS error for Case 2.

Figure 4.7: The mean square difference between Case 1 and Case 3 is shown for the different approximation orders m = 1, · · · , 5. On each box, the central mark is the median, and the upper and lower edges of the box are the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points.

In this example, the fixed running time using Matlab R2009b (MathWorks) on a PC (2.4 GHz Intel Core 2 Duo processor) is about 3 minutes for the proposed approximation (m = 1), which is fast enough for real-world implementation. In addition, the quality of prediction by Algorithm 4 with a finite m can be as good as that of Algorithm 3 with the true sampling positions.

4.7 Experimental Results

Similarly to [82], a vision-based robotic sensor was built by the authors to validate the proof of concept for the proposed methods. The wireless mobile robot is equipped with two motorized wheels, a micro-controller, a 360-degree omnidirectional camera, a motion sensor, and a wireless receiver and transmitter. The omnidirectional camera is homemade from an inexpensive Wi-Fi remote CCD camera (Ai-Ball Treck®) and a globe mirror. The vision images of the 360-degree environment around the robot produced by the omnidirectional camera are streamed via an 802.11 b/g Wi-Fi interface to a laptop for image processing (see Fig. 4.9).

Figure 4.8: The number of observations with uncertain sampling positions in the three different cases is shown on the horizontal axis. The field prediction RMS errors for Case 1, Case 2, and Case 3 are shown with white, black, and hatched bars, respectively.

In this section, we apply the proposed prediction algorithms to real experimental data. Fig.
4.10 shows our experimental setup, in which a redness intensity field is sampled via the images captured by the CCD camera on top of the mobile robot. The CCD camera captures 360-degree images. The redness intensity is computed by simply averaging the red component of the RGB picture. Noisy measurements are sampled at random sampling positions by our robot. The position of the robot is measured by image processing software built by the authors, and the true sampling positions are obtained manually. The objective is to predict the redness intensity field using the scalar field model proposed in Section 4.4. Note that each image carries a great deal of information; however, we only use the redness of each image as the scalar value of interest. In the future, we plan to extend this experiment to multivariate random fields with multiple features from each image.

Figure 4.9: A vision-based robot built by the authors, equipped with a 360-degree omnidirectional camera.

Figure 4.10: The experimental setup. The measured position of the robot and 4 possible sampling positions are shown as a dark green star and light green squares, respectively. The spatial sites are marked with aqua blue dots on the ground. The panoramic view of the robot is pictured in the upper left-hand side of the figure.

Here we consider two different scenarios. First, we consider a static field, and then a time-varying field with a moving person in the surveillance region.

4.7.1 The Static Experiment

In this study, the spatial sites in Q are considered to be 10 × 26 grid points, i.e., n = 260. The grid points are shown in Fig. 4.10 with aqua-blue dots. The mean function µ_t consists of only one radial basis function that keeps a moving average of the field. Note that this basis function can model the changes in the brightness of the images caused by slow changes in the environment lighting.
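The redness measurement described above reduces each captured RGB image to a single scalar. A minimal sketch of that reduction (ours, not the thesis code; pixel values assumed normalized to [0, 1]):

```python
import numpy as np

def redness_intensity(rgb):
    """Return the mean of the red channel of an H x W x 3 RGB image.

    This plays the role of the scalar measurement taken at the robot's
    current sampling position.
    """
    rgb = np.asarray(rgb, dtype=float)
    return float(rgb[..., 0].mean())

# A synthetic 4 x 4 image: pure red on the left half, black on the right.
img = np.zeros((4, 4, 3))
img[:, :2, 0] = 1.0
print(redness_intensity(img))  # -> 0.5
```

Any other scalar image statistic (e.g., mean brightness) could be substituted without changing the field model, since the GMRF framework only sees the resulting scalar observations.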
The center of the radial basis function is (5, 13), and its bandwidth σ_1 is ∞. The prior distribution of the hyperparameter vector θ is chosen to be discrete with a support Θ = {25, 50, 100, 200, 400} × {0.1, 0.2, 0.4, 0.8, 1.6} and the associated uniform probabilities. The measurement noise variance σ^2 = 0.01 is estimated. To demonstrate the usefulness of our model in (4.13), (4.15), and (4.16), and of our prediction algorithm, the mobile robot takes eighty samples, i.e., N = 80, where only nine sampling positions are uncertain, with four possibilities each. Thus there are 4^9 possible combinations for this set of sampling positions. After two sets of observations, the resulting posterior probabilities of the hyperparameters for Case 1 are shown in Fig. 4.11. Fig. 4.11 shows that the estimated hyperparameters converge to κ = 100 and α = 0.8, which is equivalent to ℓ = √2/√α ≈ 1.58 and σ_f = 1/√(4πακ) ≈ 0.0315. Similarly to Section 4.6, the prediction and prediction error variance are computed for Cases 1, 2, and 3. However, since this is a realistic experiment, the ground truth of the field of interest is not available, and so we are not able to compute the empirical error. The prediction and the prediction error variance using the true sampling positions (Case 1) are shown in Figs. 4.12-(a) and (d), respectively.

Figure 4.11: The posterior probability of the hyperparameter vector at t = 2.

Figs. 4.12-(b) and (e) show the resulting fields obtained by applying Algorithm 3 naively to the measured sampling positions, including noisy locations (Case 2). Figs. 4.12-(c) and (f) show the results of applying Algorithm 4 with m = 1 (Case 3). The results confirm that the quality of the prediction in Case 3 is not much compromised as compared to Case 1, and demonstrate the capability of our proposed algorithm to deal with uncertain sampling positions.

4.7.2 The Dynamic Experiment

In this scenario, the spatial sites are the same as in the previous experiment.
The mean function µ_t consists of twenty-nine radial basis functions. The centers of the radial basis functions are {(5, 13)} ∪ {1, 4, 7, 10} × {1, 5, 9, 13, 17, 21, 25}. The time evolution of β_t is modeled by (4.15), where the state matrix A_t and the input matrix B_t are given by I and 0.05I, respectively. The first radial basis function has an infinite bandwidth (i.e., σ_1 = ∞) to represent the average of the field, and the others have a bandwidth equal to σ_j = 4.

Figure 4.12: The prediction results of Cases 1, 2, and 3 at time t = 2 are shown in the first, second, and third columns, respectively. The first and second rows correspond to the predictions and the natural logarithm of the prediction error variance.

In this experiment, a person moves in the surveillance region while the robot is collecting observations. The robot collects 57, 40, 31, and 45 observations corresponding to times t = 1, 2, 3, and 4, respectively. The first column in Fig. 4.13 shows the positions of the moving robot and person in the domain. As mentioned, the position of the robot is measured with a fixed camera and the help of image processing software. Sometimes, the robot is not visible to the positioning system, since the moving person blocks the visual contact between the robot and the positioning system. The black areas in the second column of Fig. 4.13 show the blind spots of the positioning system. Note that other positioning systems, such as GPS, also have blind spots, e.g., GPS-denied areas. Therefore, when the robot moves into a blind spot, its position cannot be determined precisely. In this case, we assign different probabilities to multiple sampling positions. True sampling positions at each time step are shown with blue dots in the second column of Fig.
4.13, while only the blue dots in the white area are measured through the positioning system; the positions in the blind spots are recorded manually for comparison purposes. The prediction results of Case 1 and Case 3 are compared with a trivial method of prediction defined as follows.

• Case 4: The fifth column of Fig. 4.13 shows the resulting fields obtained by applying Algorithm 3 to only the observed sampling positions. Here, all the observations whose sampling positions are uncertain are discarded. In particular, 12, 3, 2, and 4 observations have been discarded for times t = 1, 2, 3, and 4, respectively.

The predicted fields using the true sampling positions (Case 1) and the uncertain sampling positions (Case 3) are shown in the third and fourth columns of Fig. 4.13, respectively. Since in the fully automated experiment the true sampling positions in the blind spots are not available, we used the algorithm proposed in this chapter to deal with uncertain sampling positions in the blind spot areas. The predicted field obtained by simply discarding uncertain sampling positions is shown in the fifth column of Fig. 4.13. Clearly, the result obtained for Case 3 with m = 1 is comparable to the result for Case 1, which has the best prediction quality expected among all cases. Thus, the experimental results demonstrate the effectiveness of the algorithm proposed in this chapter.

Figure 4.13: The time-varying field is shown for 4 time iterations. The first column shows the moving object in the field. The second column shows the sampling positions with blue dots and the positioning blind spots with black areas.
The third, fourth, and fifth columns show the predicted fields for Case 1, Case 3, and Case 4, respectively. The rows correspond to the time steps t = 1, 2, 3, and 4.

4.8 Conclusion

We have tackled the problem of predicting a spatio-temporal field using successive noisy measurements obtained by robotic sensors, some of which have uncertain localization. We modeled the spatio-temporal field of interest by a GMRF and designed sequential prediction algorithms for computing the exact and approximated predictive inference from a Bayesian point of view. The most important contribution is that the computation times for Algorithm 3 and Algorithm 4 with a finite m at each time step do not grow as the number of measurements increases. Two different experimental results provide a solid proof of concept for the proposed scheme. The proposed algorithm will be useful for robotics applications such as environmental monitoring by autonomous aquatic robots [40], decentralized environmental modeling by mobile sensor networks [64], and gradient climbing of environmental fields [10, 62], with or without localization uncertainty. Furthermore, our approach can provide a fully Bayesian way to automatically tune the hyperparameters in Gaussian process-based SLAM [7, 51, 77, 105], at the cost of increased computational complexity.

Chapter 5

Simultaneous Localization and Spatial Prediction

Abstract: This chapter investigates a fully Bayesian way to solve the simultaneous localization and spatial prediction (SLAP) problem using a Gaussian Markov random field (GMRF) model. The objective is to simultaneously localize robotic sensors and predict a spatial field of interest using sequentially obtained noisy observations collected by the robotic sensors. The set of observations consists of the observed uncertain poses of the robotic sensing vehicles and noisy measurements of a spatial field. To be flexible, the spatial field of interest is modeled by a GMRF with uncertain hyperparameters.
We derive an approximate Bayesian solution to the problem of computing the predictive inferences of the GMRF and the localization, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of the robotic sensors, and uncertain localization. The effectiveness of the proposed algorithm is illustrated by simulation results.

5.1 Introduction

The simultaneous localization and mapping (SLAM) problem must be solved for a robot to explore an unknown environment under localization uncertainty [21]. The variations of the SLAM problem are surveyed and categorized from different perspectives in [97]. In general, most SLAM problems have strong geometric models [21, 32, 61, 71, 72, 97]. For example, a robot learns the locations of landmarks while localizing itself using triangulation algorithms. Such geometric models can be classified into two groups, viz., a sparse set of features that can be individually identified, often used in Kalman filtering methods [21], and a dense representation such as an occupancy grid, often used in particle filtering methods [31]. In contrast to the SLAM problem with popular geometric models, there is a growing number of practical scenarios in which no such geometric model exists. Consider localization using various spatially distributed (continuous) signals, such as distributed wireless Ethernet signal strength [27] or multi-dimensional magnetic fields [105]. Underwater autonomous gliders for ocean sampling cannot find the usual geometric models from measurements of environmental variables such as pH, salinity, and temperature [62]. Furthermore, there are reasons to avoid the geometric model even when one does exist.
Such cases may include: 1) the difficulty of reliably extracting sparse, stable features; 2) the ability to use all sensory data directly rather than a relatively small amount of abstracted discrete information obtained from feature extraction algorithms; and 3) the high computational and storage costs of dealing with dense features. Motivated by the aforementioned situations, in this chapter, we consider scenarios without geometric models and tackle the problem of simultaneous localization and prediction (SLAP) of a spatial field.

Nonparametric modeling and prediction techniques for random fields have been exploited for mobile robotic sensors [10, 12, 62, 64, 117]. Random fields such as Gaussian processes and Gaussian Markov random fields (GMRFs) [14, 84] have frequently been used by mobile sensor networks to statistically model physical phenomena such as harmful algal blooms, pH, salinity, temperature, and wireless signal strength, e.g., [30, 57, 119]. The recent research efforts that are closely related to our problem are summarized as follows. In [43], the authors formulated Gaussian process regression under uncertain localization. In [42], the authors used a GMRF with uncertain hyperparameters and tackled the problem of predicting the random field under localization uncertainty. However, the kinematics or dynamics of the sensor vehicles were not incorporated in [42, 43]. In [7], Gaussian process regression was used to model geo-referenced sensor measurements (obtained from a camera). After training with data consisting of noisy measurements and their exact sampling positions, a maximum likelihood estimator was used to find the best match for the location of each newly sampled measurement. However, this was not SLAM, since the training step has to be performed a priori for a given environment [7]. In [51, 105], Gaussian process regression was also used to implement SLAM using a magnetic field, and the feasibility of such approaches was shown experimentally.
The work in [77] used laser range-finder data to probabilistically classify the robot's environment into regions of occupancy. It provides a continuous representation of the robot's surroundings by employing a Gaussian process. In [27], the so-called WiFi-SLAM problem was solved using a Gaussian process latent variable model (GP-LVM). However, accurately known training data and independence across the dimensions and instantiations of the data were assumed in [27], which may not be practical. To the best of our knowledge, most work related to our SLAP problem did not address uncertainties in the hyperparameters of the Gaussian process in a fully Bayesian way. In most of the previous work, the hyperparameters of the model were estimated offline a priori.

In this chapter, we formulate the simultaneous localization and spatial prediction (SLAP) problem, in order to simultaneously localize robotic sensors and predict a spatial random field of interest using sequentially obtained noisy observations collected by the robotic sensors. The set of observations consists of the observed uncertain poses of the robotic sensing vehicles and noisy measurements of a spatial field. To be flexible, the spatial field of interest is modeled by a GMRF with uncertain hyperparameters. We then derive an approximate Bayesian solution to the problem of computing the predictive inferences of the GMRF and the localization, taking into account observations, uncertain hyperparameters, measurement noise, the kinematics of the robotic sensors, and uncertain localization. The effectiveness of the proposed algorithm is illustrated by simulation results.

5.2 Sequential Bayesian inference with a GMRF

In this section, we define a GMRF model in detail, formulate the problem, and provide its solution.

5.2.1 Gaussian processes and Gaussian Markov random fields

In this section, we briefly introduce a GMRF as a discretized Gaussian process on a lattice.
Consider a Gaussian process: z ∼ GP(µ, K), where µ is the mean vector and K ∈ R^{n×n} is the covariance matrix. We discretize the compact domain S := [0, max(s_1)] × [0, max(s_2)] into n spatial sites Q := { s̊^{(1)}, · · · , s̊^{(n)} }, where n = h max(s_1) × h max(s_2), and h is chosen such that n ∈ Z_{>0}. Note that n → ∞ as h → ∞. The collection of realized values of the random field on Q is denoted by z := (z^{(1)}, · · · , z^{(n)})^T ∈ R^n, where z^{(i)} := z(s̊^{(i)}). The prior distribution of z is given by N(µ, K). We then have

P(z) ∝ \exp\left( -\frac{1}{2} (z - µ)^T K^{-1} (z - µ) \right). (5.1)

The (i, j)-th element of K is defined as (K)_{ij} = Cov(z^{(i)}, z^{(j)}). The prior distribution of z can equivalently be written via a precision matrix Q = K^{-1}, i.e., z ∼ N(µ, Q^{-1}). This can be viewed as a discretized version of the Gaussian process (or a GMRF) with a precision matrix Q on Q. Note that the Q of this GMRF is not sparse. However, a sparse version of Q, i.e., a Q̂ with local neighborhoods that can represent the original Gaussian process, can be found, for example, by making Q̂ close to Q in some norm [15, 35, 90]. This approximate GMRF will be computationally efficient due to the sparsity of Q̂. In our simulation study, we will use a GMRF with a sparse precision matrix that represents a Gaussian process precisely, as shown in [42, 63]. However, any parameterization of µ_θ and Q_θ, where θ is the hyperparameter vector, can be used.

5.2.2 Multiple robotic sensors

Consider N spatially distributed robots with sensors indexed by j ∈ J := {1, · · · , N}, sampling at times t ∈ Z_{>0}. Suppose that the sampling time t ∈ Z_{>0} is discrete. Recall that the surveillance region is discretized as a lattice that consists of n spatial sites, whose set is denoted by Q. Let the n spatial sites in Q be indexed by I := {1, · · · , n}, and let z := col(z^{(1)}, · · · , z^{(n)}) ∈ R^n be the corresponding static values of the scalar field at the n spatial sites.
We denote all robots' locations at time t by q_t = col(q_t^{[1]}, · · · , q_t^{[N]}) ∈ Q^N, the observations made by all robots at time t by y_t = col(y_t^{[1]}, · · · , y_t^{[N]}) ∈ R^N, and the observed states of all robots at time t by s̃_t = col(s̃_t^{[1]}, · · · , s̃_t^{[N]}) ∈ S^N. Here s̃_t and y_t are noisy observations of s_t and z, respectively. At time t, robot j takes a noise-corrupted measurement at its current location q_t^{[j]} = s̊^{(i)} ∈ Q, ∀j ∈ J, i ∈ I, viz.,

y_t^{[j]} = z^{(i)} + ε_t^{[j]}, (5.2)

where the measurement errors {ε_t^{[j]}} are assumed to be independent and identically distributed (i.i.d.) Gaussian white noise, i.e., ε_t^{[j]} ~ i.i.d. N(0, σ^2). The measurement noise level σ^2 > 0 is assumed to be known, and we define ε_t := col(ε_t^{[1]}, · · · , ε_t^{[N]}) ∈ R^N. In addition, at time t, robot j takes a noisy observation of its own vehicle position,

s̃_t^{[j]} = q_t^{[j]} + e_t^{[j]}, (5.3)

where the observation errors {e_t^{[j]}} are distributed as e_t^{[j]} ~ i.i.d. N(0, σ_e^2 I). The observation noise level σ_e^2 > 0 is assumed to be known, and we define e_t := col(e_t^{[1]}, · · · , e_t^{[N]}) ∈ R^{2N}. Our models can be represented in the concise collective notation

y_t = H_t^T z + ε_t, s̃_t = L_t q_t + e_t, (5.4)

where L_t is the observation matrix for the vehicle states, and H_τ ∈ R^{n×N} is defined by (H_τ)_{ij} = 1 if q_τ^{[j]} = s̊^{(i)}, and 0 otherwise.

5.2.3 Kinematics of robotic vehicles

In this section, we introduce a specific model for the motion of the robotic vehicles. Each robotic sensor is modeled by a nonholonomic, differentially driven vehicle in a two-dimensional domain, i.e., S ⊂ R^2. In this case, an equation of motion for robot i [86] may be given by

\begin{pmatrix} \dot{s}_{1,t}^{[i]} \\ \dot{s}_{2,t}^{[i]} \end{pmatrix} = \begin{pmatrix} u_t^{[i]} \cos ψ_t^{[i]} \\ u_t^{[i]} \sin ψ_t^{[i]} \end{pmatrix} + β_t^{[i]}, (5.5)

where s_{1,t}^{[i]}, s_{2,t}^{[i]}, ψ_t^{[i]}, u_t^{[i]}, and β_t^{[i]} denote the inertial position, the orientation, the linear speed, and the system noise of robot i at time t, respectively.
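The collective measurement model in (5.4) can be illustrated with a small sketch (our own, not the dissertation's code), where H_t is applied as H_t^T z so that the dimensions are consistent with H_t ∈ R^{n×N}:

```python
import numpy as np

def selection_matrix(robot_sites, n):
    """Build H_t in R^{n x N}: (H_t)_{ij} = 1 iff robot j sits at site i."""
    H = np.zeros((n, len(robot_sites)))
    for j, i in enumerate(robot_sites):
        H[i, j] = 1.0
    return H

rng = np.random.default_rng(0)
n, sigma = 6, 0.1
z = rng.normal(size=n)                    # field values at the n lattice sites
H = selection_matrix([2, 5], n)           # two robots sampling at sites 2 and 5
y = H.T @ z + sigma * rng.normal(size=2)  # noisy observations, as in (5.2)/(5.4)
```

Here H_t simply picks out the field values at the occupied lattice sites; with sigma = 0, y would equal (z^{(3)}, z^{(6)})^T in the document's one-based indexing.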
In this case, the kinematics of the vehicle network can be further described as follows:

s_{t+1} = s_t + F_t u_t + v_t, (5.6)

where u_t is a known control input and v_t is i.i.d. white noise drawn from a known normal distribution N(0, Σ_{v_t}). The evolution of the location of the robot can be detailed further as follows:

q_{t+1} = Q(q_t + F_t u_t + v_t) = q_t + F_t u_t + w_t, (5.7)

where Q : S → Q is the nearest-neighbor-rule quantizer that takes an input and returns its projected value on the lattice. Here v_t is the process noise and w_t absorbs both the process noise and the quantization error between the continuous and discretized states, i.e., w_t = Q(q_t + F_t u_t + v_t) − (q_t + F_t u_t). As the cardinality of Q increases, we have that w_t → v_t. A special case of (5.7) arises when F_t u_t is controlled and w_t is chosen such that the next location q_{t+1} is on a grid point; in this case, we have v_t = w_t. Assuming that {ψ_t^{[i]} | ∀i ∈ J} in (5.5) can be measured precisely, F_t ∈ R^{2N×N} in (5.6) is obtained as the block-diagonal matrix

F_t = Δt · blkdiag\left( \begin{pmatrix} \cos ψ_t^{[1]} \\ \sin ψ_t^{[1]} \end{pmatrix}, · · · , \begin{pmatrix} \cos ψ_t^{[N]} \\ \sin ψ_t^{[N]} \end{pmatrix} \right),

where Δt = 1 is the sampling time. We denote the collections of cumulative robot locations, cumulative observations, and cumulative control inputs from time 1 to time t by s̃_{1:t} := col(s̃_1, · · · , s̃_t) ∈ S^{Nt}, y_{1:t} := col(y_1, · · · , y_t) ∈ R^{Nt}, and u_{1:t} := col(u_1, · · · , u_t) ∈ R^{Nt}, respectively.

5.2.4 Problem formulation and its Bayesian predictive inference

In this section, we formulate the SLAP problem and provide its Bayesian solution. To be precise, we present the following assumptions A.1-A.5 for the problem formulation.

A.1 The scalar random field z is generated by a GMRF model, which is given by z ∼ N(µ_θ, Q_θ^{-1}), where µ_θ and Q_θ are given functions of a hyperparameter vector θ.
A.2 The noisy measurements {y_t} and the noisy sampling positions {s̃_t}, as in (5.4), are collected by the robotic sensors at times t = 1, 2, · · · .

A.3 The control input u_t is a known deterministic vector at each time t.

A.4 The prior distribution of the hyperparameter vector θ is discrete with a support Θ = {θ^{(1)}, · · · , θ^{(L)}}.

A.5 The prior distribution P(q_t = q̊_t^{(i)}) is discrete with a support q̊_t^{(i)} ∈ Ω(t), where Ω(t) = { (s̊^{(k_1)}, · · · , s̊^{(k_N)}) | (k_1, · · · , k_N) ∈ L(t) }, which is given at time t. Here, γ(t) = |L(t)| denotes the number of probable possibilities for q_t.

Problem 5.2.1. Consider assumptions A.1-A.5. Our problem is to simultaneously find the predictive distributions, means, and variances for both z and q_t conditional on D_t := {y_{1:t}, s̃_{1:t}}.

The solution to Problem 5.2.1 is derived as follows. The distribution of the GMRF is given by P(z | θ, D_{t−1}) = N(µ_{z|θ,D_{t−1}}, Σ_{z|θ,D_{t−1}}). Recall that the evolution of q_t is given by (5.7) and that the input u_t is a known deterministic vector at time t. Therefore, P(q_t | D_{t−1}) can be updated by the Gaussian approximation of P(q_{t−1} | D_{t−1}):

P(q_t | D_{t−1}) ∝ \exp\left( -\frac{1}{2} (q_t − µ_{q_{t−1}|D_{t−1}} − F_{t−1} u_{t−1})^T (Σ_{q_{t−1}|D_{t−1}} + Σ_{w_{t−1}})^{-1} (q_t − µ_{q_{t−1}|D_{t−1}} − F_{t−1} u_{t−1}) \right). (5.8)

Similarly, P(y_t | θ, D_{t−1}, q_t) is updated by the Gaussian approximation of P(z | θ, D_{t−1}) as follows:

P(y_t | θ, D_{t−1}, q_t) ≈ N( H_t^T µ_{z|θ,D_{t−1}}, Σ_{ε_t} + H_t^T Σ_{z|θ,D_{t−1}} H_t ). (5.9)

Remark 5.2.2. For the sake of reducing memory usage and complexity, the distributions of q_t | D_{t−1} and y_t | θ, D_{t−1}, q_t are approximated by normal distributions.

The joint distribution of z, q_t, θ | D_{t−1} is obtained as follows:

P(z, q_t, θ | D_{t−1}) = P(z | θ, q_t, D_{t−1}) P(θ | q_t, D_{t−1}) P(q_t | D_{t−1}). (5.10)

The observation model is given by (5.4); thus the probabilities of the observed data are P(y_t | z, q_t) = N(H_t^T z, Σ_{ε_t}) and P(s̃_t | q_t) = N(L_t q_t, Σ_{e_t}).
The measured random variables have the following conditional joint distribution:

P(y_t, s̃_t | z, q_t, θ, D_{t−1}) = P(y_t | z, θ, q_t, D_{t−1}) P(s̃_t | q_t). (5.11)

From Bayes' rule, the posterior joint distribution of the scalar field values, the sampling positions, and the hyperparameter vector is given as follows:

P(z, q_t, θ | D_t) = \frac{ P(y_t, s̃_t | z, q_t, θ, D_{t−1}) P(z, q_t, θ | D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) }. (5.12)

In addition, P(q_t, θ | D_t) = ∫ P(z, q_t, θ | D_t) dz is given as follows:

∫ P(z, q_t, θ | D_t) dz = \frac{ P(s̃_t | q_t) P(θ | q_t, D_{t−1}) P(q_t | D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) } × ∫ P(z | θ, q_t, D_{t−1}) P(y_t | z, θ, q_t, D_{t−1}) dz, (5.13)

where ∫ P(z | θ, q_t, D_{t−1}) P(y_t | z, θ, q_t, D_{t−1}) dz = P(y_t | q_t, θ, D_{t−1}), and P(y_t | q_t, θ, D_{t−1}) is given by (5.9).

Remark 5.2.3. From Bayes' rule, P(θ | q_t, D_{t−1}) is given by P(θ, q_t | D_{t−1}) / P(q_t | D_{t−1}). We can compute P(θ, q_t | D_{t−1}) for all the possible combinations of q_t in the previous iteration using (5.13). However, for the sake of reducing the computational cost, we approximate P(θ | q_t, D_{t−1}) by P(θ | D_{t−1}). Therefore, we have

P(q_t, θ | D_t) ≈ \frac{ P(s̃_t | q_t) P(θ | D_{t−1}) P(q_t | D_{t−1}) P(y_t | q_t, θ, D_{t−1}) }{ P(y_t, s̃_t | D_{t−1}) }.

Marginalizing out the uncertainties over the possible q_t and θ, we obtain

P(z, θ | D_t) = \sum_{q_t ∈ Ω(t)} P(z, q_t, θ | D_t), \quad P(z | D_t) = \sum_{θ ∈ Θ} P(z, θ | D_t). (5.14)

Our estimates of q_t and θ can be corrected using the measured data up to time t as follows:

P(q_t | D_t) = \sum_{θ ∈ Θ} P(q_t, θ | D_t), \quad P(θ | D_t) = \sum_{q_t ∈ Ω(t)} P(q_t, θ | D_t). (5.15)

The predictive probability and the mean value of z | θ, D_t are obtained as follows:

P(z | θ, D_t) = \frac{ P(z, θ | D_t) }{ P(θ | D_t) }, \quad µ_{z|θ,D_t} = \frac{1}{ P(θ | D_t) } \sum_{q_t ∈ Ω(t)} µ_{z|q_t,θ,D_t} P(q_t, θ | D_t). (5.16)

The predictive covariance matrix of z | θ, D_t can be obtained using the law of total variance, Σ_{z|θ,D_t} = E[ Σ_{z|q_t,θ,D_t} ] + Cov[ µ_{z|q_t,θ,D_t} ], where E and Cov are computed over the random variable q_t. These terms are obtained as follows.
E[ Σ_{z|q_t,θ,D_t} ] = \sum_{q_t ∈ Ω(t)} Σ_{z|q_t,θ,D_t} P(q_t | θ, D_t),

Cov[ µ_{z|q_t,θ,D_t} ] = \sum_{q_t ∈ Ω(t)} ( µ_{z|q_t,θ,D_t} − µ_{z|θ,D_t} ) ( µ_{z|q_t,θ,D_t} − µ_{z|θ,D_t} )^T P(q_t | θ, D_t), (5.17)

where the predictive mean and covariance of z | q_t, θ, D_t are calculated using Gaussian process regression as follows:

µ_{z|q_t,θ,D_t} = µ_{z|θ,D_{t−1}} + Σ_{z|θ,D_{t−1}} H_{q_t} Σ^{-1}_{y_t|θ,D_{t−1},q_t} ( y_t − µ_{y_t|θ,D_{t−1},q_t} ),

Σ_{z|q_t,θ,D_t} = Σ_{z|θ,D_{t−1}} − Σ_{z|θ,D_{t−1}} H_{q_t} Σ^{-1}_{y_t|θ,D_{t−1},q_t} H_{q_t}^T Σ_{z|θ,D_{t−1}}. (5.18)

Finally, the first and second moments of q_t | D_t are obtained as follows:

µ_{q_t|D_t} = \sum_{q_t ∈ Ω(t)} q_t P(q_t | D_t), \quad Σ_{q_t|D_t} = \sum_{q_t ∈ Ω(t)} ( q_t − µ_{q_t|D_t} )^2 P(q_t | D_t).

5.3 Simulation Results

In this section, we demonstrate the effectiveness of the proposed sequential Bayesian inference algorithm using a numerical experiment. Consider a robot moving in a discretized surveillance region Q. The spatial sites in Q consist of 31 × 31 grid points, i.e., n = 961, uniformly distributed over the surveillance region S := [−15, 15] × [−15, 15]. In this illustrative example, we realize the spatial field developed in [42], which is a GMRF wrapped around a torus structure. Thus the top edge (respectively, the left edge) and the bottom edge (respectively, the right edge) are neighbors of each other. The parameters of the model in [42] are selected as follows. The mean vector µ_θ is chosen to be zero, and the precision matrix Q_θ is chosen with hyperparameters α = 0.1, equivalent to a bandwidth ℓ = √2/√α ≈ 4.47, and κ = 50, equivalent to σ_f^2 = 1/(4πακ) ≈ 0.016. The prior distribution of the hyperparameter vector θ is discrete with a support Θ = {(κ, α), (0.1κ, α), (10κ, α), (κ, 0.1α), (κ, 10α)}, along with the corresponding uniform probabilities {0.2, 0.2, 0.2, 0.2, 0.2}. The measurement noise variance in (5.2) is given by σ^2 = 0.1. A robot takes measurements at times t ∈ {1, 2, · · · , 100} with localization uncertainty. In Figs.
5.1-(d), (e), and (f), true, noisy, and probable sampling positions are shown as circles, stars, and dots, respectively, at time t = 100. In this simulation, the standard deviation of the noise in the observed sampling positions is given by σ_e = 10 in (5.3). The probable sampling positions that form the support Ω(t) are selected within the confidence region of Pr( ‖ s̊_t^{(i)} − q̃_t^{[j]} ‖ ≤ σ_e ). The results of the simultaneous localization and spatial prediction are summarized for three methods as follows.

• Case 1: Figs. 5.1-(a), (d), and (g) show the prediction, prediction error variance, and squared (empirical) error fields using the exact sampling positions. With the true sampling positions, the best prediction quality is expected for this case.

• Case 2: Figs. 5.1-(b), (e), and (h) show the results obtained by using the sampled noisy positions. The results clearly illustrate that naively applying GMRF regression to noisy sampling positions can distort the prediction at a significant level. Fig. 5.1-(h) shows that the squared error in this case is considerably higher than that of Case 1.

• Case 3: Figs. 5.1-(c), (f), and (i) show the results obtained by applying the proposed approach of Section 5.2.4. The resulting prediction quality is much improved as compared to Case 2 and is even comparable to that of Case 1.

The averaged squared errors over time and space, using the true sampling positions (Case 1), the noisy sampling positions (Case 2), and the uncertain sampling positions (Case 3), are 0.0837 × 10^{−3}, 0.1664 × 10^{−3}, and 0.0989 × 10^{−3}, respectively. This shows the effectiveness of our solution to Problem 5.2.1. The true positions of the robot for times t ∈ T := {10, 11, · · · , 30} are shown in Fig. 5.2 by red diamonds and lines. The estimated sampling positions of the robot, E(q_t | D_t) for t ∈ T, are shown as blue dots with estimated confidence regions. Fig.
5.2 clearly shows that the proposed approach in this chapter significantly reduces the localization uncertainty as compared to the noise level of the sampled positions (denoted by green stars). In this example, the fixed running time using Matlab R2009b (MathWorks) on a PC (3.2 GHz Intel i7 Processor) is about 40 seconds for each time iteration, which is fast enough for real-world implementation.

5.4 Conclusion

In this chapter, we provide an approximate Bayesian solution to the problem of simultaneous localization and spatial prediction (SLAP), taking into account the kinematics of robots and uncertainties in the precision matrix, the sampling positions, and the measurements of a GMRF in a fully Bayesian manner. In contrast to [42], the kinematics of the robotic vehicles are integrated into the inference algorithm.

Figure 5.1: The prediction results of Cases 1, 2, and 3 at time t = 100 are shown in the first, second, and third columns, respectively. The first, second, and third rows correspond to the prediction, prediction error variance, and squared empirical error fields between the prediction and true fields. True, noisy, and probable sampling positions are shown in circles, stars, and dots, respectively. The x and y axes represent the 2-D location, and colors represent the value of the desired quantity at each location.

The simulation results show that the proposed approach successfully estimates the sampling positions and predicts the spatial field along with their prediction error variances, in a fixed computational time.
The simulation study suggests that the complexity of the proposed scalable inference algorithm is affordable for a robot to operate in real-world situations.

Figure 5.2: The trajectories of the true, predicted, and noisy sampling positions of the robot are shown by red diamonds, blue dots, and green stars for time t ∈ {10, 11, · · · , 30}. The blue ellipsoids show the confidence regions of about 68% for the estimated sampling positions.

Chapter 6

Conclusion and Future Work

In this chapter, we briefly summarize the key contributions presented in this dissertation and propose some directions for future work.

6.1 Conclusion

In Chapter 2, we have formulated Gaussian process regression with observations under localization uncertainty. Effects of measurement noise, localization uncertainty, and prior distributions have all been correctly incorporated in the posterior predictive statistics in a Bayesian approach. We have reviewed Monte Carlo sampling and Laplace's method, which have been applied to compute the analytically intractable posterior predictive statistics of Gaussian processes with localization uncertainty. The approximation error and complexity of all the proposed approaches have been analyzed. In particular, we have provided tradeoffs between the error and complexity of Laplace approximations of different degrees, such that one can choose a tradeoff taking into account the performance requirements and the computational complexity imposed by the resource-constrained sensor network. The simulation study demonstrated that the proposed approaches perform much better than approaches that do not properly account for the localization uncertainty.

In Chapter 3, the multivariate Gaussian process model is utilized to exploit vision data for mobile robot localization.
The predictive statistics of the vision data are learned in advance and used to estimate the position of a mobile robot equipped only with an omnidirectional camera. For the sake of flexibility, the multivariate GP model with unknown hyperparameters is used to model the vision data. The experiments show good positioning results within a reasonable computational time.

In Chapter 4, the problem of predicting a spatio-temporal field using successive noisy measurements is tackled. We modeled the spatio-temporal field of interest as a GMRF and designed sequential prediction algorithms for computing the exact and approximated predictive inference from a Bayesian point of view. The most important contribution is that the computation times for the proposed algorithms do not grow as the number of measurements increases.

In Chapter 5, we provide an approximate Bayesian solution to the problem of simultaneous localization and spatial prediction (SLAP), taking into account the kinematics of robots and uncertainties in the precision matrix, the sampling positions, and the measurements of a GMRF in a fully Bayesian manner. In contrast to Chapter 4, the kinematics of the robotic vehicles are integrated into the inference algorithm. The simulation results show that the proposed approach successfully estimates the sampling positions and predicts the spatial field along with their prediction error variances, in a fixed computational time. The simulation study suggests that the complexity of the proposed scalable inference algorithm is affordable for a robot to operate in real-world situations.

6.2 Future Work

In the long term, we plan to expand our current work and explore the following directions.

Although optimal sampling was not pursued in this dissertation, it will be interesting to consider optimal sampling strategies for mobile sensing agents subject to complicated vehicle dynamics.
The distributed implementation of the proposed algorithms is practically important, especially for a sensor network with limited computational power. To this end, we are interested in investigating efficient implementations and emerging technologies such as GPU computing to speed up the current algorithms for real-time applications.

Expanding the work on spatial modeling using GMRFs to deal with a general spatio-temporal process is an important direction to follow in the future.

Implementing the developed algorithms in more challenging and more realistic experiments will play a key role in expanding the scope of applications of the methods proposed in this dissertation.

BIBLIOGRAPHY

[1] Xsens company, MTi-G, miniature AHRS with integrated GPS, 2009.
[2] C. Andrieu, N. De Freitas, A. Doucet, and M.I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5–43, 2003.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[4] P. Blaer and P. Allen. Topological mobile robot localization using fast vision techniques. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 1031–1036, 2002.
[5] Francisco Bonin-Font, Alberto Ortiz, and Gabriel Oliver. Visual navigation for mobile robots: A survey. Journal of Intelligent and Robotic Systems, 53(3):263–296, 2008.
[6] A. Brix and P.J. Diggle. Spatiotemporal prediction for log-Gaussian Cox processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4):823–841, 2001.
[7] A. Brooks, A. Makarenko, and B. Upcroft. Gaussian process models for indoor and outdoor sensor-centric robot localization. IEEE Transactions on Robotics, 24(6):1341–1351, 2008.
[8] J. Choi, J. Lee, and S. Oh. Biologically-inspired navigation strategies for swarm intelligence using spatial Gaussian processes.
In Proceedings of the 17th International Federation of Automatic Control World Congress, pages 593–598, July 2008.
[9] J. Choi, J. Lee, and S. Oh. Swarm intelligence for achieving the global maximum using spatio-temporal Gaussian processes. In Proceedings of the 27th American Control Conference, pages 135–140, June 2008.
[10] J. Choi, S. Oh, and R. Horowitz. Distributed learning and cooperative control for multi-agent systems. Automatica, 45(12):2802–2814, December 2009.
[11] G.C. Christie and C.I. Goddard. Sea Lamprey International Symposium (SLIS II): advances in the integrated management of sea lamprey in the Great Lakes. Journal of Great Lakes Research, 29:1–14, 2003.
[12] J. Cortés. Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827, December 2009.
[13] J. Cortés. Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827, 2010.
[14] N. Cressie. Kriging nonstationary data. Journal of the American Statistical Association, 81(395):625–634, September 1986.
[15] N. Cressie and N. Verzelen. Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields. Computational Statistics & Data Analysis, 52(5):2794–2807, January 2008.
[16] N. Cressie and C.K. Wikle. Space–time Kalman filter. Encyclopedia of Environmetrics, 4:2045–2049, 2002.
[17] D. Culler, D. Estrin, and M. Srivastava. Guest editors' introduction: overview of sensor networks. Computer, 37(8):41–49, August 2004.
[18] M.P. Deisenroth, C.E. Rasmussen, and J. Peters. Gaussian process dynamic programming. Neurocomputing, 72(7-9):1508–1524, 2009.
[19] Cédric Demonceaux, Pascal Vasseur, and Yohan Fougerolle. Central catadioptric image processing with geodesic metric. Image and Vision Computing, 29(12):840–849, 2011.
[20] Guilherme N. DeSouza and Avinash C. Kak. Vision for mobile robot navigation: A survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):237–267, 2002.
[21] M.W.M.G. Dissanayake, P. Newman, S. Clark, H.F. Durrant-Whyte, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3):229–241, 2001.
[22] Vladislavs Dovgalecs, Rémi Mégret, and Yannick Berthoumieu. Multiple feature fusion based on co-training approach and time regularization for place classification in wearable video. Advances in Multimedia, 1, 2013.
[23] Y. Dupuis, X. Savatier, and P. Vasseur. Feature subset selection applied to model-free gait recognition. Image and Vision Computing, 31(8):580–591, 2013.
[24] H. Durrant-Whyte and T. Bailey. Simultaneous localisation and mapping (SLAM): Part I the essential algorithms. Robotics and Automation Magazine, 13(2):99–110, 2006.
[25] D. Estrin, D. Culler, K. Pister, and G. Sukhatme. Connecting the physical world with pervasive networks. Pervasive Computing, 1(1):59–69, 2002.
[26] João Carlos Aparício Fernandes and José Alberto B Campos Neves. Using conical and spherical mirrors with conventional cameras for 360 panorama views in a single image. In Proceedings of the IEEE International Conference on Mechatronics, pages 157–160, 2006.
[27] B. Ferris, D. Fox, and N. Lawrence. WiFi-SLAM using Gaussian process latent variable models. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2480–2485, 2007.
[28] J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica: Journal of the Econometric Society, 57(6):1317–1339, 1989.
[29] A. Girard, C.E. Rasmussen, J.Q. Candela, and R. Murray-Smith. Gaussian process priors with uncertain inputs-application to multiple-step ahead time series forecasting. Advances in Neural Information Processing Systems, pages 545–552, 2003.
[30] R. Graham and J. Cortés. Cooperative adaptive sampling of random fields with partially known covariance. International Journal of Robust and Nonlinear Control, 22(5):504–534, March 2012.
[31] G. Grisetti, C. Stachniss, and W. Burgard. Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 2432–2437. IEEE, 2005.
[32] J.E. Guivant and E.M. Nebot. Optimization of the simultaneous localization and map-building algorithm for real-time implementation. IEEE Transactions on Robotics and Automation, 17(3):242–257, 2001.
[33] Baofeng Guo and Mark S Nixon. Gait feature subset selection by mutual information. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 39(1):36–46, 2009.
[34] Efstathios Hadjidemetriou, Michael D. Grossberg, and Shree K. Nayar. Multiresolution histograms and their use for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7):831–847, 2004.
[35] L. Hartman and O. Hössjer. Fast kriging of large data sets with Gaussian Markov random fields. Computational Statistics & Data Analysis, 52(5):2331–2349, January 2008.
[36] Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.
[37] KC Ho. Alleviating sensor position error in source localization using calibration emitters at inaccurate locations. IEEE Transactions on Signal Processing, 58(1):67–83, 2010.
[38] L. Hu and D. Evans. Localization for mobile sensor networks. In Proceedings of the 10th Annual International Conference on Mobile Computing and Networking, pages 45–57. ACM, 2004.
[39] M. Jadaliha and J. Choi. Environmental monitoring using autonomous aquatic robots: Sampling algorithms and experiments. IEEE Transactions on Control Systems Technology, in press, 2012.
[40] M.
Jadaliha and J. Choi. Environmental monitoring using autonomous aquatic robots: Sampling algorithms and experiments. IEEE Transactions on Control Systems Technology, 21(3):899–905, 2013.
[41] M. Jadaliha, Y. Xu, and J. Choi. Efficient Spatial Prediction Using Gaussian Markov Random Fields under Uncertain Localization. In Proceedings of the ASME Dynamic Systems and Control Conference, October 17–19, 2012, Ft. Lauderdale, FL, USA, pages 1–6, 2012.
[42] M. Jadaliha, Y. Xu, and J. Choi. Efficient Spatial Prediction Using Gaussian Markov Random Fields Under Uncertain Localization. In Proceedings of the ASME Dynamic Systems and Control Conference, October 2012.
[43] M. Jadaliha, Y. Xu, and J. Choi. Gaussian process regression using Laplace approximations under localization uncertainty. In Proceedings of the American Control Conference, June 2012.
[44] Mahdi Jadaliha, Yunfei Xu, Jongeun Choi, N Johnson, and Weiming Li. Gaussian process regression for sensor networks under localization uncertainty. IEEE Transactions on Signal Processing, 61(2):223–237, 2013.
[45] Gijeong Jang, Sungho Lee, and Inso Kweon. Color landmark based self-localization for indoor mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 1037–1042, 2002.
[46] Woo Yeon Jeong and Kyoung Mu Lee. Visual SLAM with line and corner features. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2570–2575. IEEE, 2006.
[47] L. Jetto, S. Longhi, and G. Venturini. Development and experimental validation of an adaptive extended Kalman filter for the localization of mobile robots. IEEE Transactions on Robotics and Automation, 15(2):219–229, April 1999.
[48] N.S. Johnson, M.A. Luehring, M.J. Siefkes, and W. Li. Mating pheromone reception and induced behavior in ovulating female sea lampreys. North American Journal of Fisheries Management, 26(1):88–96, 2006.
[49] N.S. Johnson, S.S. Yun, H.T. Thompson, C.O.
Brant, and W. Li. A synthesized pheromone induces upstream movement in female sea lamprey and summons them into traps. Proceedings of the National Academy of Sciences, 106(4):1021, 2009.
[50] R. Karlsson and F. Gustafsson. Bayesian surface and underwater navigation. IEEE Transactions on Signal Processing, 54(11):4204–4213, 2006.
[51] A. Kemppainen, J. Haverinen, I. Vallivaara, and J. Röning. Near-optimal SLAM exploration in Gaussian processes. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 7–13, September 2010.
[52] Ji-Hyun Kim. Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics & Data Analysis, 53(11):3735–3745, 2009.
[53] J. Ko and D. Fox. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90, 2009.
[54] J. Kocijan, R. Murray-Smith, C.E. Rasmussen, and B. Likar. Predictive control with Gaussian process models. In Proceedings of EUROCON 2003, volume 1. IEEE, 2003.
[55] J. Kosecka, Liang Zhou, P. Barber, and Z. Duric. Qualitative image based localization in indoors environments. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages II-3–II-8, 2003.
[56] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. Near-optimal sensor placements: maximizing information while minimizing communication cost. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, pages 2–10. ACM, 2006.
[57] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9:235–284, 2008.
[58] N. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine.
Advances in Neural Information Processing Systems, pages 625–632, 2003.
[59] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 2169–2178, 2006.
[60] J. Le Ny and G. J. Pappas. On trajectory optimization for active sensing in Gaussian process models. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 6286–6292, December 2009.
[61] J.J. Leonard and H.F. Durrant-Whyte. Simultaneous map building and localization for an autonomous mobile robot. In Proceedings of Intelligent Robots and Systems (IROS), pages 1442–1447. IEEE, 1991.
[62] N.E. Leonard, D.A. Paley, F. Lekien, R. Sepulchre, D.M. Fratantoni, and R.E. Davis. Collective motion, sensor networks, and ocean sampling. Proceedings of the IEEE, 95(1):48–74, 2007.
[63] F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B, 73(4):423–498, September 2011.
[64] K. M. Lynch, I. B. Schwartz, P. Yang, and R. A. Freeman. Decentralized environmental modeling by mobile sensor networks. IEEE Transactions on Robotics, 24(3):710–724, June 2008.
[65] A. McHutchon and C.E. Rasmussen. Gaussian process training with input noise. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Granada, Spain, pages 1–9, December 2011.
[66] Emanuele Menegatti, Takeshi Maeda, and Hiroshi Ishiguro. Image-based memory for robot navigation using properties of omnidirectional images. Robotics and Autonomous Systems, 47(4):251–267, 2004.
[67] Emanuele Menegatti, Mauro Zoccarato, Enrico Pagello, and Hiroshi Ishiguro. Image-based Monte Carlo localisation with omnidirectional images.
Robotics and Autonomous Systems, 48(1):17–30, 2004.
[68] Y. Miyata. Fully exponential Laplace approximations using asymptotic modes. Journal of the American Statistical Association, 99(468):1037–1049, 2004.
[69] Y. Miyata. Laplace approximations to means and variances with asymptotic modes. Journal of Statistical Planning and Inference, 140(2):382–392, 2010.
[70] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the National Conference on Artificial Intelligence, pages 593–598. AAAI, 2002.
[71] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the International Joint Conference on Artificial Intelligence, volume 18, pages 1151–1156, 2003.
[72] K. Murphy. Bayesian map learning in dynamic environments. Advances in Neural Information Processing Systems (NIPS), 12:1015–1021, 1999.
[73] R. Murray-Smith, D. Sbarbaro, C.E. Rasmussen, and A. Girard. Adaptive, cautious, predictive control with Gaussian process priors. In Proceedings of the 13th IFAC Symposium on System Identification, Rotterdam, Netherlands, pages 1195–1200, 2003.
[74] M.F. Mysorewala, D.O. Popa, and F.L. Lewis. Multi-scale adaptive sampling with mobile agents for mapping of forest fires. Journal of Intelligent and Robotic Systems, 54(4):535–565, 2009.
[75] D. Nash and M. Hannah. Using Monte-Carlo simulations and Bayesian Networks to quantify and demonstrate the impact of fertiliser best management practices. Environmental Modelling & Software, 26(9):1079–1088, 2011.
[76] Mark Nixon and Alberto S Aguado. Feature Extraction & Image Processing. Academic Press, 2008.
[77] S. O'Callaghan, F.T. Ramos, and H. Durrant-Whyte. Contextual occupancy maps using Gaussian processes. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1054–1060, May 2009.
[78] P.
Oguz-Ekim, JP Gomes, J. Xavier, and P. Oliveira. Robust localization of nodes and time-recursive tracking in sensor networks using noisy range measurements. IEEE Transactions on Signal Processing, 59(99):3930–3942, 2011.
[79] S. Oh, L. Schenato, P. Chen, and S. Sastry. Tracking and coordination of multiple agents using sensor networks: System design, algorithms and experiments. Proceedings of the IEEE, 95(1):234–254, January 2007.
[80] Naoya Ohnishi and Atsushi Imiya. Appearance-based navigation and homing for autonomous mobile robot. Image and Vision Computing, 31(6):511–532, 2013.
[81] Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226–1238, 2005.
[82] Christian Plagemann, Cyrill Stachniss, Jürgen Hess, Felix Endres, and Nathan Franklin. A nonparametric learning approach to range sensing from omnidirectional vision. Robotics and Autonomous Systems, 58(6):762–772, 2010.
[83] J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, 6:1939–1959, 2005.
[84] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts, London, England, 2006.
[85] Carl Edward Rasmussen. Gaussian Processes for Machine Learning. MIT Press, 2006.
[86] W. Ren. Consensus strategies for cooperative control of vehicle formations. Control Theory & Applications, IET, 1(2):505–512, 2007.
[87] Eric Royer, Maxime Lhuillier, Michel Dhome, and Jean-Marc Lavest. Monocular vision for mobile robot localization and autonomous navigation. International Journal of Computer Vision, 74(3):237–260, 2007.
[88] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall, 2005.
[89] H. Rue, S. Martino, and N. Chopin.
Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392, April 2009.
[90] H. Rue and H. Tjelmeland. Fitting Gaussian Markov random fields to Gaussian fields. Scandinavian Journal of Statistics, 29(1):31–49, March 2002.
[91] Timo Schairer, Benjamin Huhle, Philipp Vorst, Andreas Schilling, and Wolfgang Straßer. Visual mapping with uncertainty for correspondence-free localization using Gaussian process regression. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4229–4235, 2011.
[92] D. Scharstein and A.J. Briggs. Real-time recognition of self-similar landmarks. Image and Vision Computing, 19(11):763–772, 2001.
[93] Matthias Seeger. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations. PhD thesis, School of Informatics, University of Edinburgh, 2003.
[94] BR Smith and JJ Tibbles. Sea lamprey (Petromyzon marinus) in Lakes Huron, Michigan, and Superior: history of invasion and control, 1936–78. Canadian Journal of Fisheries and Aquatic Sciences, 37(11):1780–1801, 1980.
[95] A. J. Smola and P. L. Bartlett. Sparse greedy Gaussian process regression. In Proceedings of the Advances in Neural Information Processing Systems 13, pages 619–625, 2001.
[96] M. St-Pierre and D. Gingras. Comparison between the unscented Kalman filter and the extended Kalman filter for the position estimation module of an integrated navigation information system. In Proceedings of the IEEE Intelligent Vehicles Symposium, pages 831–835. IEEE, June 2004.
[97] S. Thrun. Simultaneous localization and mapping. Robotics and Cognitive Approaches to Spatial Mapping, pages 13–41, 2008.
[98] S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1):99–141, May 2001.
[99] Sebastian Thrun.
Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1):41–76, 1998.
[100] L. Tierney and J.B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86, 1986.
[101] L. Tierney, R.E. Kass, and J.B. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407):710–716, 1989.
[102] D.H. Titterton and J.L. Weston. Strapdown Inertial Navigation Technology. Peter Peregrinus Ltd, 2004.
[103] A. Torralba, K.P. Murphy, W.T. Freeman, and M.A. Rubin. Context-based vision system for place and object recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 273–280, 2003.
[104] Volker Tresp. A Bayesian committee machine. Neural Computation, 12(11):2719–2741, 2000.
[105] I. Vallivaara, J. Haverinen, A. Kemppainen, and J. Röning. Simultaneous localization and mapping using ambient magnetic field. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 14–19, September 2010.
[106] Shrihari Vasudevan. Data fusion with Gaussian processes. Robotics and Autonomous Systems, 60(12):1528–1544, 2012.
[107] N. Vlassis, R. Bunschoten, and B. Krose. Learning task-relevant features from robot data. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 1, pages 499–504, 2001.
[108] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
[109] Gordon Wells and Carme Torras. Assessing image features for vision-based robot positioning. Journal of Intelligent and Robotic Systems, 30(1):95–118, 2001.
[110] Gordon Wells, Christophe Venaille, and Carme Torras. Vision-based robot positioning using neural networks.
Image and Vision Computing, 14(10):715–732, 1996.
[111] C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. Advances in Neural Information Processing Systems, 8:514–520, 1996.
[112] Christopher Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Proceedings of the Advances in Neural Information Processing Systems 13, pages 682–688. MIT Press, 2001.
[113] Y. Xu and J. Choi. Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors, 11(3):3051–3066, 2011.
[114] Y. Xu and J. Choi. Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors, 11(3):3051–3066, March 2011.
[115] Y. Xu, J. Choi, S. Dass, and T. Maiti. Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. In Proceedings of the 2011 American Control Conference, pages 4195–4200, June 2011.
[116] Y. Xu, J. Choi, S. Dass, and T. Maiti. Efficient Bayesian spatial prediction with mobile sensor networks using Gaussian Markov random fields. Submitted to Automatica, 2012.
[117] Y. Xu, J. Choi, S. Dass, and T. Maiti. Sequential Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. IEEE Transactions on Automatic Control, 57(8):2078–2084, 2012.
[118] Y. Xu, J. Choi, and S. Oh. Mobile sensor network coordination using Gaussian processes with truncated observations. IEEE Transactions on Robotics, 27(6):1118–1131, 2011.
[119] Y. Xu, J. Choi, and S. Oh. Mobile sensor network navigation using Gaussian processes with truncated observations. IEEE Transactions on Robotics, 27(6):1118–1131, December 2011.
[120] Yunfei Xu and Jongeun Choi. Spatial prediction with mobile sensor networks using Gaussian processes with built-in Gaussian Markov random fields. Automatica, 48(8):1735–1740, 2012.