This is to certify that the dissertation entitled "Filtering For Some Stochastic Processes With Discrete Observations" presented by Oleg Makhnin has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Co-Major Professor: Anatoli Skorokhod
Co-Major Professor: Habib Salehi

Date: July 22, 2002

FILTERING FOR SOME STOCHASTIC PROCESSES WITH DISCRETE OBSERVATIONS

By Oleg V. Makhnin

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Statistics and Probability, 2002

ABSTRACT

FILTERING FOR SOME STOCHASTIC PROCESSES WITH DISCRETE OBSERVATIONS

By Oleg V. Makhnin

The processes in question are jump processes and processes with jumping velocity. We estimate the current position of the stochastic process based on past discrete-time observations (the non-linear discrete filtering problem). We obtain asymptotic rates for the expected square error of the filter when observations become frequent. These rates are better than those of a linear Kalman filter. For the jump process, our method is asymptotically free of the process parameters. Estimation of the process parameters is also addressed.

Copyright by OLEG V. MAKHNIN 2002

Acknowledgments

I would like to thank my advisor, Prof. Skorokhod, for setting up the problem and working with me patiently, providing guidance and insight into the problem. I also thank my committee members Prof.
Gilliland, Prof. LePage and Prof. Salehi, whose contributions have been invaluable. Special thanks to Prof. LePage for his help with the manuscript and to Prof. Salehi for his advice and supervision. I would also like to express my gratitude to Prof. Koul, Prof. Mandrekar, Prof. Page and all the other professors at Michigan State University from whom I have learned a lot. Last, but not least, my gratitude goes to my wife, who supported me throughout this endeavor.

List of Tables

Table 1
Table 2

List of Figures

Figure 1
Figure 2
Figure 3
Figure 4

Contents

List of Tables
List of Figures
1 Introduction
  1.1 Review of past results
  1.2 Hyper-efficiency
  1.3 Formulation of the problem
  1.4 Main results
2 Filtering of a jump process
  2.1 Recursive formulation
  2.2 Single-block upper bound for expected square error
  2.3 Multiple-block upper bound
  2.4 Proofs of inequalities used in Theorem 3
  2.5 Lower bound for expected square error
3 Estimation of parameters of a jump process
  3.1 Errors in jump detection: Lemmas
  3.2 Asymptotic behavior of parameter estimates
4 Piecewise-linear process
  4.1 Problem formulation
  4.2 Recursive filtering equations
  4.3 Asymptotics: special case
5 Comparison to linear filters
  5.1 Jump process
  5.2 Piecewise linear process
6 Simulation
7 Open problems
References
Notation

1 Introduction

This work deals with estimating the position of a stochastic process based on past observations (filtering). With respect to square error, the optimal non-linear filter is the conditional expectation of the current state of the process given the observations. The observations are discrete, and we are interested in the asymptotic behavior of the non-linear filter as these observations become more frequent.

Several factors may affect the asymptotic behavior of a non-linear filter. One of them is the nature of the process itself: the more irregular a process is, the harder it is to filter. Another factor is the distribution of the observation errors. The simplest example of such results is the estimation of the mean of a sequence of i.i.d. variables. One can think of this mean as a "process" that remains constant over time. Assume that the variables have a density with respect to Lebesgue measure on $\mathbb{R}$. As pointed out in the book by Ibragimov and Khas'minskii [13], the quality of the estimate depends on whether or not this density is continuous. In the case of a density with discontinuities, the phenomenon of "hyper-efficiency" occurs: one gets different results, for example, for the normal distribution and for the uniform distribution.

1.1 Review of past results

Filtering is a major area of stochastic process theory. It has been progressing rapidly over the last 40+ years, starting with Kolmogorov and Wiener. A great deal of attention has been paid to filtering with continuous-time observations, which typically involves stochastic differential equations. Among the major contributions here are R. Kalman and R. Bucy (1961) [18], A. Shiryaev (1966) [25], T. Kailath (1968) [14], M. Zakai (1969) [27], G. Kallianpur and C. Striebel (1969) [16], G. Kallianpur (1980) [15], and B. Rozovskii (1990) [23].
In most of these works, the observation noise is a Wiener process or, more generally, the observation process satisfies a stochastic differential equation driven by a Wiener process.

Filtering with discrete-time observations was considered by Kalman (1960) [17] and continued in multiple works, including Brémaud (1981) [3]. After the pioneering work by Kalman, a lot of attention was paid to linear filters, which are linear combinations of observed values. Many works have been devoted to the theory of the Kalman filter, for example, Anderson and Moore (1979) [1]. Lately, as computing facilities have improved greatly, the focus has shifted to non-linear filters, which typically perform much better. Comparison with linear filters is one of the topics of this work.

Yashin (1970) [26] derived the optimal non-linear filter for the situation where the process $X(t)$ is Markov taking values 0 and 1, and the observations are also 0 or 1. This setting is expanded in the book by Elliott, Aggoun and Moore (1995) [10] in the context of Hidden Markov Models. Their approach has become popular recently; it involves a change of measure, rendering the observations independent of the process in question. One then arrives at a discrete-time version of the Zakai equation, which provides a recursive way to compute the optimal filter. This was used, for example, in Dufour and Kannan (1999) [9] and Kannan (2000) [19]. Recent papers, to mention a few, are Portenko, Salehi and Skorokhod (1997) [21], Ceci, Gerardi and Tardelli (2001) [5], and Del Moral, Jacod and Protter (2001) [7]. The latter deals with Monte-Carlo methods for estimating the optimal filter, even in the case when no explicit expressions for the filter are available. Monte-Carlo methods are also discussed by Gordon et al. (1993) [12] and Doucet, de Freitas and Gordon (2001) [8].

Relatively little is known about the asymptotic behavior of filtered estimates as observations become frequent. Some results on this are given in [22].
This work considers asymptotics for certain classes of stochastic processes, including compound Poisson processes and piecewise-linear processes.

Discrete observations are natural in target-tracking, where the process in question is the position of a target and our observations come from a radar. A special case (with uniform errors) was considered by Portenko, Salehi and Skorokhod (1998) [22], although they introduce many extra features useful for target-tracking, such as multiple targets and false targets. A general exposition of the filtering techniques employed can be found in [21]. A detailed overview of target-tracking from an engineer's perspective is given by Bar-Shalom et al. [2].

1.2 Hyper-efficiency

The results for parameter estimation in the i.i.d. case are well described in [13]. They can be summarized as follows. Suppose that $\{Y_k\}_{k=1}^n$ is a sequence of i.i.d. random variables with density $f_\theta$, depending on a parameter $\theta$.

a) Suppose that $f_\theta$ is continuously differentiable, with several additional regularity conditions, including local asymptotic normality for the family $\{f_\theta\}_{\theta \in \Theta}$. Then both Bayesian and maximum likelihood estimates are asymptotically normal with the rate $E(\hat\theta - \theta)^2 = C/n + o(1/n)$.

b) Suppose that the densities $f_\theta$ possess jumps at a finite number of points $x_1(\theta), \dots, x_k(\theta)$ and are continuously differentiable elsewhere, plus some identifiability and regularity conditions. The earliest treatment of such a problem known to the author is Chernoff and Rubin [6]. In this case both Bayesian and maximum likelihood estimates have the rate $E(\hat\theta - \theta)^2 = C/n^2 + o(1/n^2)$; that is, the estimates are "hyper-efficient". An important special case is estimation of a location parameter, that is, when $f_\theta(x) \equiv f(x - \theta)$.

The difference between the above two cases can be illustrated using the normal distribution in case (a) and the uniform distribution in case (b).
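As an illustrative numerical aside (not part of the original development), the two rates can be contrasted by simulation: for Uniform$(\theta - a, \theta + a)$ data, the midrange estimator's squared error decays like $1/n^2$, while the sample mean's decays like $1/n$.

```python
import numpy as np

# Illustrative sketch: rate contrast between the sample mean (rate 1/n)
# and the midrange estimator (rate 1/n^2) for Uniform(theta-a, theta+a) data.
rng = np.random.default_rng(0)
theta, a, n, reps = 5.0, 1.0, 1000, 2000

Y = rng.uniform(theta - a, theta + a, size=(reps, n))
mse_mean = np.mean((Y.mean(axis=1) - theta) ** 2)
mse_midrange = np.mean(((Y.max(axis=1) + Y.min(axis=1)) / 2 - theta) ** 2)

print(mse_mean, mse_midrange)  # the midrange error is orders of magnitude smaller
```

At $n = 1000$ the theoretical values are roughly $a^2/(3n) \approx 3.3\cdot 10^{-4}$ for the mean versus $2a^2/((n+1)(n+2)) \approx 2\cdot 10^{-6}$ for the midrange.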
For the normal distribution, the mean of the observed values is a natural estimator, with expected square error $O(1/n)$. For the uniform distribution on the interval $[\theta - a, \theta + a]$, the estimator $[\max_k(Y_k) + \min_k(Y_k)]/2$, with expected square error $O(1/n^2)$, is a better estimator of $\theta$ than the mean. Thus, for a density with jumps, the best location estimator is a function of the observations near a discontinuity point.

A generalization of these results to multi-dimensional variables and a vector parameter $\theta$ is published by Ermakov [11]. (This problem also traces back to Rubin [24].) When the error density $f_\theta$ has discontinuities along a smooth manifold, both Bayesian and maximum likelihood estimates of $\theta$ have asymptotic square error of order $1/n^2$. He also has some results on the sequential estimation of such parameters.

The way our problem differs lies in the stochastic-process perspective: we are estimating not a stable, unchanging parameter, but the value of some stochastic process in time.

1.3 Formulation of the problem

The process $X(t)$ is a real-valued stochastic process on the interval $[0, T]$ or $[0, \infty)$, depending on the context. In general, suppose that the a priori distribution of the entire process $X(t, \omega)$ has a density $\pi(X(\cdot))$ with respect to a measure $\nu$ in a suitable function space $\mathcal{F}[0, T]$. Also let the distribution of observations $\mathcal{L}(\text{observations} \mid X(\cdot))$ have a density $f$ in some space of observations. Keep the notation $X(\cdot)$ for the entire trajectory of the process $X$.
In this work, we use two approaches:

a) when estimating the current position of the process $X(t)$ at a given time $t$, use the information obtained up to this instant ("filtering"), regardless of whether or not the trajectory comprised of the resulting estimates $\hat{X}(t)$ belongs to the specific family $\mathcal{F}$, and

b) when estimating the parameters of the process itself, based on observations in the interval $[0, T]$, try to produce a process that belongs to the family $\mathcal{F}$ ("jump process" below) and is reasonably "close" to $X(\cdot)$.

Under certain conditions, the prior distribution $\pi_t$ of $X(t)$, consistent with $\pi(X(\cdot))$, will have a density with respect to Lebesgue measure. As an estimate of the process' position, we use the posterior (with respect to $\pi_t$) conditional expectation of $X(t)$ given all the observations up to time $t$. It is well known that such an estimator minimizes the squared error of estimation.

Consider two varieties of the process $X$:

- "Jump process". Consider a compound Poisson process
$$X(t) = X_0 + \sum_{i:\ s_i \le t} \xi_i$$
where $(s_i, \xi_i)$ are the events of a 2-dimensional Poisson process on $[0, T] \times \mathbb{R}$. The intensity of this process is given by $\lambda(s, y) = \lambda h_\theta(y)$, where $\lambda > 0$ is a constant "time intensity" describing how frequently the jumps of $X$ occur, and $h_\theta(y)$ is the "jump density" describing the magnitudes of the jumps $\xi_i$ of the process $X$. Here $\theta \in \Theta$ is a parameter (possibly unknown) defining the distribution of jumps. In the Bayesian formulation, the parameters $\theta$ and $\lambda$ will have a prior density $\pi(\theta, \lambda)$ with respect to Lebesgue measure. Assume that for each $\theta \in \Theta$, $E_\theta \xi_i^2 < \infty$. Also assume that the starting value $X_0$ has some prior density $\pi_{X_0}(\cdot)$.

- "Piecewise Linear Process". This is a process with jumping velocity $V$:
$$X(t) = X_0 + \int_0^t V(s)\, ds$$
with $V$ being a compound Poisson process as described above. Assume that the prior distribution of $X_0, V_0$ is known.

Observations. Our observations $\{Y_j\}$ are always going to be "process + noise" over a finite grid of values:
$$Y_j = X(j/n) + \epsilon_j$$
where the noise variables $\{\epsilon_j\}$ are i.i.d.
with some density $\phi_\theta$, possibly depending on a parameter $\theta$, and independent of the process $X$. Assume that for each $\theta \in \Theta$, $E\epsilon_j = 0$ and $E\epsilon_j^2 < \infty$.

A sample path of a jump process and its observations are shown in Figure 1 (a plot of $X(t)$ and the observations $Y_j$).

1.4 Main results

We will consider asymptotics when $n \to \infty$: the observations become frequent, but the process changes slowly (the rate of change $\lambda$ is bounded from above). When the process $X(\cdot)$ is changing fast, its diffusion approximations become appropriate, but these are not discussed here. We establish the following results:

- Recursive formulas for the conditional density of the process position given the observations.
- Asymptotic rates for the estimate of the position of a jump (compound Poisson) process.
- Asymptotic behavior of the parameter estimates of a jump process, as both the observation frequency and the total time spent observing become large.
- Asymptotic rates for a piecewise-linear process.
- Comparison with linear filters.
- Simulation results in a small-sample setting.

2 Filtering of a jump process

The filtering and Bayesian estimation problem can be formulated as follows: find the conditional distribution of the states of the process $X$ and the unknown parameters $\lambda, \theta$ given the observations $(Y_k)$, the initial distribution of $X(0) = X_0$ and some prior distribution on the unknown parameters.

2.1 Recursive formulation

The results in this section are in the spirit of Elliott et al. [10]. In what follows, let $\tau = 1/n$ be the time between observations. Denoting $X_k := X(\tau k)$, we have
$$X_{k+1} = X_k + \zeta_{k+1}, \qquad Y_k = X_k + \epsilon_k, \qquad (1)$$
where $\zeta_k$ is the sum of the jumps of $X$ on the interval $(\tau(k-1), \tau k]$:
$$\zeta_k = \sum_{i:\ \tau(k-1) < s_i \le \tau k} \xi_i.$$
The distribution of $\zeta_k$ is a mixture of a point mass $e^{-\lambda\tau}$ at $0$ (no jumps on the interval) and a continuous part. Denote by $\psi_{\theta,\lambda}$ the resulting density of $\zeta_k$ and by $\tilde\psi_{\theta,\lambda}$ its continuous part, so that for a test function $g$,
$$\int g\, d\mathcal{L}(\zeta_k) = e^{-\lambda\tau} g(0) + \int g(x)\, \tilde\psi_{\theta,\lambda}(x)\, dx.$$
The subscripts $\theta, \lambda$ in $\phi_\theta, \psi_{\theta,\lambda}$ will often be omitted.

Suppose the priors are given: $\theta, \lambda$ have density $\pi(\cdot, \cdot)$, and $X_0$ has density $\pi_{X_0}(\cdot)$. Our goal is to find the posterior conditional distribution
$$\mathcal{L}_\pi(X_k \mid Y_1, \dots, Y_k).$$
From (1), we obtain the densities
$$p_{Y_1, \dots, Y_k}(y_1, \dots, y_k \mid X_0, \dots, X_k, \theta) = \prod_{j=1}^k \phi(y_j - x_j),$$
$$p_{X_0, \dots, X_k}(x_0, \dots, x_k \mid \theta, \lambda) = \pi_{X_0}(x_0) \prod_{j=1}^k \psi_{\theta,\lambda}(x_j - x_{j-1}).$$
For brevity, denote $\mathbf{x}_k := (x_0, \dots, x_k)$, $\mathbf{X}_k := (X_0, \dots, X_k)$, $\mathbf{Y}_k := (Y_1, \dots, Y_k)$. Now, applying Bayes' theorem, the joint density of $\mathbf{X}_k, \mathbf{Y}_k, \theta$ and $\lambda$ is
$$p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda) = p(\mathbf{y}_k \mid \mathbf{x}_k, \theta, \lambda) \cdot p(\mathbf{x}_k \mid \theta, \lambda) \cdot \pi(\theta, \lambda) = \pi(\theta, \lambda)\, \pi_{X_0}(x_0) \prod_{j=1}^k \phi(Y_j - x_j) \prod_{j=1}^k \psi_{\theta,\lambda}(x_j - x_{j-1}) \qquad (2)$$
and the conditional density given the observations is
$$p(\mathbf{x}_k, \theta, \lambda \mid \mathbf{Y}_k) = \frac{p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)}{\int_\Lambda \int_\Theta \int_{\mathbb{R}^{k+1}} p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)\, d\mathbf{x}_k\, d\theta\, d\lambda}.$$
Introduce
$$q_k(x, \theta, \lambda) := \int_{\mathbb{R}^k} p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)\Big|_{x_k = x}\, dx_0 \cdots dx_{k-1}.$$
It is an unnormalized density of the latest state $X_k$ and the parameters $\theta, \lambda$ given the observations $\mathbf{Y}_k$. The normalized density $p_k(x, \theta, \lambda)$ is then given by
$$p_k(x, \theta, \lambda) := \int p(\mathbf{x}_k, \theta, \lambda \mid \mathbf{Y}_k)\, dx_0 \cdots dx_{k-1} = \frac{q_k(x, \theta, \lambda)}{\int_\Lambda \int_\Theta \int_{\mathbb{R}} q_k(x, \theta, \lambda)\, dx\, d\theta\, d\lambda}.$$
The reason we use this density in an unnormalized form is the recursive relation:

Theorem 1.
$$q_0(x, \theta, \lambda) = \pi_{X_0}(x) \cdot \pi(\theta, \lambda),$$
$$q_k(x, \theta, \lambda) = \phi_\theta(Y_k - x) \cdot \int_{\mathbb{R}} \psi_{\theta,\lambda}(x - z)\, q_{k-1}(z, \theta, \lambda)\, dz = \phi_\theta(Y_k - x) \cdot \Big[ e^{-\lambda\tau} q_{k-1}(x, \theta, \lambda) + \int_{\mathbb{R}} \tilde\psi_{\theta,\lambda}(x - z)\, q_{k-1}(z, \theta, \lambda)\, dz \Big]. \qquad (3)$$

Proof: Straightforward; follows from integrating (2). $\square$

Remarks. 1. In order to use Theorem 1 for the estimation of the state $X_k$, we compute $q_j(x, \theta, \lambda)$, $j \le k$, consecutively, then compute the marginal unnormalized density $q_k(x) := \int\int q_k(x, \theta, \lambda)\, d\theta\, d\lambda$, and then find
$$\hat{X}_k := E(X_k \mid \mathbf{Y}_k) = \frac{\int_{\mathbb{R}} x\, q_k(x)\, dx}{\int_{\mathbb{R}} q_k(x)\, dx}. \qquad (4)$$
2. Although not derived explicitly here, the unnormalized density $q$ corresponds to a change of the original probability measure to, say, $Q$, which makes the observations $Y_1, \dots, Y_k$ independent of the process $X(t)$. The prior distributions on $(\theta, \lambda)$ and $X(0)$ ensure that the two measures are absolutely continuous with respect to each other. The change-of-measure approach is used extensively in non-linear filtering. The recursive formulas for the densities can be used to compute "on-line" updates as new observations come in.

2.2 Single-block upper bound for expected square error

Next, we investigate the asymptotic properties of the above filtering estimator $\hat{X}(T) := \hat{X}_{nT}$ as the observations become frequent ($n \to \infty$).
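Before turning to asymptotics, note that the recursion (3) is directly implementable on a grid. The following is a minimal numerical sketch (not from the dissertation), with $\theta$ and $\lambda$ fixed and known, Gaussian noise and Gaussian jump density as illustrative assumptions, and the continuous part of $\psi$ approximated by its one-jump term $\lambda\tau e^{-\lambda\tau} h(x)$.

```python
import numpy as np

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def filter_estimates(Y, lam, tau, sigma_e, sigma_j, x_grid):
    """Grid sketch of recursion (3), theta and lambda known.
    Assumptions (illustrative): Gaussian phi and h; the continuous part of
    psi is approximated by its one-jump term lam*tau*exp(-lam*tau)*h(x)."""
    dx = x_grid[1] - x_grid[0]
    q = normal_pdf(x_grid, 10.0)                  # wide prior pi_{X_0} on the grid
    h = normal_pdf(x_grid[:, None] - x_grid[None, :], sigma_j)  # h(x - z)
    estimates = []
    for y in Y:
        conv = lam * tau * np.exp(-lam * tau) * (h @ q) * dx    # integral term of (3)
        q = normal_pdf(y - x_grid, sigma_e) * (np.exp(-lam * tau) * q + conv)
        q /= q.sum() * dx                         # renormalize (only ratios matter)
        estimates.append(float((x_grid * q).sum() * dx))        # posterior mean, cf. (4)
    return estimates

x_grid = np.linspace(-10.0, 10.0, 401)
rng = np.random.default_rng(1)
Y = 2.0 + rng.normal(0.0, 0.3, size=50)           # constant process at 2, noisy obs
est = filter_estimates(Y, lam=1.0, tau=0.02, sigma_e=0.3, sigma_j=1.0, x_grid=x_grid)
```

The renormalization at each step does not affect the estimate (4), since only ratios of $q_k$ enter it, and it keeps the recursion numerically stable.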
First, we will produce a sub-optimal estimator of $X(T)$ based on a single "block" of observations at time points immediately preceding $T$. Assume that the last observation is obtained exactly at the moment $T$, and denote
$$e_T(\tau) := E(\hat{X}(T) - X(T))^2.$$
The following discussion is based on the well-known fact (e.g. see [3, p. 84]):

Lemma 1. For a square-integrable random variable $X$, a sigma-algebra $\mathcal{F}$ and an $\mathcal{F}$-measurable random variable $U$,
$$E[X - E(X \mid \mathcal{F})]^2 \le E(X - U)^2. \qquad \square$$

Setting $\mathcal{F} := \sigma\{Y_1, \dots, Y_k\}$, we see that the filtered estimator $\hat{X}_k$ introduced in (4) has the smallest expected square loss among all possible estimators of $X_k$ based on the observations $\mathbf{Y}_k$. To produce an upper bound on $e_T(\tau)$, consider the following sub-optimal estimator of $X(T)$:
$$\gamma_k(\Delta) := \sum_{k - n\Delta < j \le k} Y_j / (n\Delta),$$
the average of the observations over the last $n\Delta$ time points.

Theorem 2. For $\tau \to 0$,
$$e_T(\tau) \le (\lambda E\xi_1^2 + E\epsilon_1^2)\sqrt{\tau} + o(\sqrt{\tau}). \qquad (5)$$

Proof: Consider the estimate $\gamma_k(\Delta)$ introduced above. By Lemma 1, it is no better than $\hat{X}(T)$, that is, $e_T(\tau) \le E|X(T) - \gamma_k(\Delta)|^2$. Suppose that the process $X$ has $m$ jumps on the interval $(T - \Delta, T]$, with jump locations $s_1, \dots, s_m$ and jump heights $\xi_1, \dots, \xi_m$. Denote by $S_0, S_1, \dots, S_m$ the consecutive values taken by $X(t)$ for $t \in (T - \Delta, T]$, so that $S_m = X(T)$. Note that, given $m$,
$$E \max_j (S_m - S_j)^2 \le E \sum_{j=0}^{m-1} (S_m - S_j)^2 = E\xi_1^2\, \frac{m(m+1)}{2}.$$
Therefore,
$$E|X(T) - \gamma_k(\Delta)|^2 \le \frac{E\epsilon_1^2}{n\Delta} + \sum_{m \ge 1} e^{-\lambda\Delta}\frac{(\lambda\Delta)^m}{m!}\; E\xi_1^2\, \frac{m(m+1)}{2} = \frac{E\epsilon_1^2}{n\Delta} + E\xi_1^2\, [\lambda\Delta + o(\Delta)].$$
Setting $\Delta = \tau^b$ for some $0 < b < 1$, and using $n = 1/\tau$, the above becomes
$$E\epsilon_1^2\, \tau^{1-b} + E\xi_1^2\, [\lambda\tau^b + o(\tau^b)].$$
Setting $b = 1/2$, we obtain the statement of the Theorem. $\square$

Remark. Since the estimating procedure we used does not depend on $\theta$, the above Theorem remains true when the parameter $\theta$ is unknown. In that case, one needs to consider the Bayesian loss
$$e_T^\pi(\tau) = \int_\Theta E_\theta(\hat{X}(T) - X(T))^2\, \pi(\theta)\, d\theta$$
and, integrating (5), obtain the bound
$$e_T^\pi(\tau) \le \sqrt{\tau} \int_\Theta (\lambda E_\theta\xi_1^2 + E_\theta\epsilon_1^2)\, \pi(\theta)\, d\theta + o(\sqrt{\tau}).$$
To produce finer approximations, we have to assume knowledge of the error distribution.

2.3 Multiple-block upper bound

Next, we modify our estimating procedure.
Starting at time $T$, we will probe one block of observations after another, stopping whenever we believe that a jump has occurred. The following results are obtained when the error distribution is considered known. Denote $\sigma_\epsilon^2 := E\epsilon_1^2$.

We use the same idea as before: produce a sub-optimal estimate of $X(T)$ based on an average $\gamma$ over a suitable interval. The difficulty lies in not knowing where exactly the last jump of the process $X$ occurred. Consider the intervals (blocks) $(T_1, T_0], (T_2, T_1], \dots, (T_N, T_{N-1}]$, where
$$T_0 := T, \qquad T_j := T_{j-1} - (\ln n)^j / n, \quad j = 1, \dots, N, \qquad T_{N+1} := 0.$$
There is a total of
$$N \sim \frac{\ln n}{\ln \ln n}$$
blocks; the $j$-th block has length $(\ln n)^j / n$ and contains $n_j := (\ln n)^j$ observations. The last block has length $1/\ln n$.

Let $X_j$ be the value of the process at the end of the $j$-th block, that is, $X_j := X(T_{j-1})$. Let $\gamma_j$ be the average of the observations on block $j$, that is,
$$\gamma_j := n_j^{-1} \sum_k Y_k\, I(T_j < k\tau \le T_{j-1}).$$

Assumption 1. Let $\chi_m := \sum_{i=1}^m \epsilon_i / (\sigma_\epsilon \sqrt{m})$ be the normalized sum of $m$ errors. Assume that for the distribution of the errors $\epsilon_i$ the following is true: there exist constants $C_1, C_2, C_3$ and $K > 0$ such that for all sufficiently large $m$ and all integers $j$,
$$E\big[\chi_m^2\, I(|\chi_m| > C_1 m^{1/(Kj)})\big] < C_2 \exp(-C_3\, m^{1/j}).$$
This assumption is satisfied for Normal errors with $K = 2$; in general, it requires $\epsilon_k$ to have small tails. The following is simpler-looking but more restrictive than Assumption 1:

Assumption 1'. For $\chi_m$ given above, there exist constants $G, \gamma > 0$ such that for all sufficiently large $m$,
$$E \exp(\gamma |\chi_m|) \le G.$$

Proposition 1. Assumption 1' implies Assumption 1 with $K = 1$.

Proof: Suppose that Assumption 1' is satisfied. Let $F_m(\cdot)$ be the distribution function of $\chi_m$. Pick $C_1$ such that $x^2 < \exp(\gamma|x|/2)$ for $|x| > C_1$. Then for any $j$,
$$\int_{\mathbb{R}} I\{|x| > C_1 m^{1/j}\}\, x^2\, dF_m(x) \le \int_{\mathbb{R}} I\{|x| > C_1 m^{1/j}\}\, e^{\gamma|x|/2}\, dF_m(x) \le \exp(-\gamma C_1 m^{1/j}/2) \int_{\mathbb{R}} e^{\gamma|x|}\, dF_m(x) \le \exp(-\gamma C_1 m^{1/j}/2) \cdot G. \qquad \square$$

Theorem 3.
(Tighter upper bound for $E(\hat{X}_n(T) - X(T))^2$.) Suppose that the error density $\phi$ is known and does not depend on the parameter $\theta$, and there exists a constant $\Lambda_0$ such that $\lambda \le \Lambda_0$. Then, under Assumption 1, there exists a constant $C$ such that for $n \to \infty$,
$$E(\hat{X}_n(T) - X(T))^2 \le C\, \frac{\ln^M n}{n}, \qquad M = (1 + 2/K) \vee (3 - 2/K).$$

Proof: Consider the $N - 1$ blocks described above. Denote by $T^*$ the point of the last jump of $X$:
$$T^* = \sup\{0 \le t \le T : X(t) - X(t-) \ne 0\}.$$
The idea is to approximate $T^*$, then take the average of all observations from that moment up to $T$. Construct an estimate of $X(T)$ as follows. Define $j_0$ as
$$j_0 := \inf\big\{ j > 0 : \sqrt{n_j}\, |\gamma_{j+1} - \gamma_j| > 2\sigma_\epsilon C_1\, n_j^{1/(jK)} \big\} \wedge N. \qquad (6)$$
Then, as our estimate of $X(T)$, take
$$\hat{X}(T) := \gamma_{j_0}.$$
We will find an upper bound for the average risk of this estimate, $e := E(\hat{X}(T) - X(T))^2$. For this we will need several inequalities, with proofs to follow in the next section.

Case 1 (Block 1 jump). On the event $F$ that the last jump of $X$ occurred on Block 1, $F = \{T_1 < T^*\}$,
$$e_F := E[(\hat{X}(T) - X(T))^2 I_F] \le C_3\, \frac{\ln n}{n}. \qquad (7.1)$$

Case 2 (Correct stopping). On the event $S$ that the last jump of $X$ occurred just before Block $j_0$, $S = \{T_{j_0+1} < T^* \le T_{j_0}\}$,
$$e_S := E[(\hat{X}(T) - X(T))^2 I_S] \le C_4\, \frac{\ln^2 n}{n}. \qquad (7.2)$$

Case 3 (Late stopping). On the event $L$ that the last jump of $X$ occurred in Block $j$, $1 < j \le j_0$, $L = \{T_{j_0} < T^*\}$,
$$e_L := E[(\hat{X}(T) - X(T))^2 I_L] \le C_5\, \frac{\ln^{1+2/K} n}{n}. \qquad (7.3)$$

Case 4 (Early stopping). On the event $\mathcal{E}$ that we stopped on Block $j_0$ although there was no jump of $X$ after $T_{j_0+1}$, $\mathcal{E} = \{T^* \le T_{j_0+1}\}$,
$$e_{\mathcal{E}} := E[(\hat{X}(T) - X(T))^2 I_{\mathcal{E}}] \le C_6\, \frac{(\ln n)^{3 - 2/K}}{n}. \qquad (7.4)$$

Now note that $P(F \cup S \cup L \cup \mathcal{E}) = 1$. Thus, $e = e_F + e_S + e_L + e_{\mathcal{E}}$. Also, the estimator $\hat{X}$ does not depend on $\lambda$ or the particular form of the jump density $h_\theta$, as long as the frequency of jumps $\lambda$ is bounded. By Lemma 1, the risk of the optimal filtered estimate does not exceed the risk of $\hat{X}$. Combining (7.1) through (7.4), we obtain the proof of the Theorem.
$\square$

2.4 Proofs of inequalities used in Theorem 3

Proof of (7.1): The probability of a jump on the first block, which has length $\ln n / n$, is $P(F) = \lambda \ln n / n + o(\ln n / n)$, and the probability of more than one jump on the first block is $o(\ln n / n)$. Therefore,
$$E[(\hat{X}(T) - X(T))^2 I_F] \le (\sigma_\epsilon^2 / \ln n + E\xi_1^2)\, \lambda \ln n / n \le C_3\, \frac{\ln n}{n}.$$

Proof of (7.2): Let $j_0(\omega)$ be, as before, the last block included in the computation of $\hat{X}(T)$. First, consider the special case $j_0 = N$. Then
$$E[(\hat{X}(T) - X(T))^2 I_S I(j_0 = N)] \le \frac{\sigma_\epsilon^2}{n / \ln n} \cdot \Big(\frac{1}{\ln n} + o\Big(\frac{1}{\ln n}\Big)\Big) \le \mathrm{const}/n.$$
Now let $j_0 < N$. Suppose that the last jump $T^*(\omega)$ occurred on Block $j_0 + 1$, that is, $\omega \in S$. Then $X(t) = X(T)$ for $T_{j_0} < t \le T$, and the squared loss from estimating $X(T)$ equals the variance of $\gamma_{j_0}$, so that
$$E[(\hat{X}(T) - X(T))^2 I_S I(j_0 < N)] \le \sum_j E[(\gamma_j - X(T))^2 I_S] \le \sum_j \sigma_\epsilon^2\, n_j^{-1}\, P(\text{jump on Block } j+1)$$
by the independence of $\{\epsilon_k\}$ and the process $X$. Thus,
$$e_S \le \mathrm{const}/n + \sum_{j=1}^N \Big(\frac{\ln^{j+1} n}{n} + o\Big(\frac{\ln^{j+1} n}{n}\Big)\Big) \cdot \sigma_\epsilon^2 \ln^{-j} n \le C_4\, \frac{\ln^2 n}{n}.$$

Proof of (7.3): Thanks to (7.1), we can exclude the case when the last jump $T^*$ happens on Block 1. Therefore, suppose that the last jump happens on Block $J$, $J > 1$, but we stop the summation only at Block $j_0$, $j_0 \ge J$. Denote $N_J := \{\text{last jump happens on Block } J\}$. Our stopping rule (6) implies that for $J \le j \le j_0$,
$$|\gamma_j - \gamma_{j-1}| \le 2\sigma_\epsilon C_1\, n_j^{-1/2 + 1/(jK)}.$$
Thus,
$$E[(\hat{X}(T) - X(T))^2 I_{N_J}] \le E\Big(|X(T) - \gamma_{J-1}|^2 + \sum_{j=J}^{j_0} |\gamma_j - \gamma_{j-1}|^2\Big) \cdot P(\text{jump on Block } J) \le$$
$$\le \Big[\frac{\lambda \ln^J n}{n} + o\Big(\frac{\ln^J n}{n}\Big)\Big] \times \Big[\sigma_\epsilon^2\, n_{J-1}^{-1} + C_7 \sum_{j \ge J} n_j^{-1 + 2/(jK)}\Big],$$
and, summing over $J$, $e_L \le C_5\, \ln^{1+2/K} n / n$.

Proof of (7.4): If the stopping occurred too early, then $X(t) = X(T)$ for $T_{j_0+1} < t \le T$. Also, the stopping rule (6) implies that at least one of
$$|\gamma_{j_0+1} - X(T)| > C_1 \sigma_\epsilon\, n_{j_0}^{-1/2 + 1/(j_0 K)}, \qquad |\gamma_{j_0} - X(T)| > C_1 \sigma_\epsilon\, n_{j_0}^{-1/2 + 1/(j_0 K)}$$
holds. Thus,
$$E[(\hat{X}(T) - X(T))^2 I_{\mathcal{E}}] \le E|\gamma_{j_0} - X(T)|^2 \cdot P\big(|\gamma_{j_0+1} - X(T)| > C_1 \sigma_\epsilon n_{j_0}^{-1/2 + 1/(j_0 K)}\big) + E\big[|\gamma_{j_0} - X(T)|^2\, I\big(|\gamma_{j_0} - X(T)| > C_1 \sigma_\epsilon n_{j_0}^{-1/2 + 1/(j_0 K)}\big)\big] =: E_1 + E_2.$$
By Assumption 1,
$$E_2 \le \sum_j \frac{\sigma_\epsilon^2}{n_j}\, C_2 \exp(-C_3\, n_j^{1/j}) \le C_2\, \frac{\ln n}{n}.$$
To estimate $E_1$, consider the Chebyshev-type inequality
$$P\big(|\gamma_j - X(T)| > C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big) \le \frac{E\big[|\gamma_j - X(T)|^2\, I\big(|\gamma_j - X(T)| > C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big)\big]}{\big(C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big)^2} \le C_2\, \frac{\ln n}{n}$$
by Assumption 1. Therefore,
$$E_1 \le \sum_j \frac{\sigma_\epsilon^2}{n_j} \cdot C_2\, \frac{\ln n}{n} \cdot \big(C_1 \sigma_\epsilon\, n_j^{-1/2 + 1/(jK)}\big)^{-2},$$
and, summing over $j \le N \le \ln n$, $E_1 \le C_6\, (\ln n)^{3 - 2/K}/n$. $\square$
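The multiple-block scheme and stopping rule (6) can be sketched numerically. The following is an illustrative implementation (not from the dissertation): the threshold uses a plain cutoff $c\,\sigma_\epsilon$ in place of the constants $C_1, K$ of Assumption 1, and block sizes grow geometrically backwards from the last observation.

```python
import numpy as np

def block_estimate(Y, sigma_e, c=3.0):
    """Sketch of the estimator of Section 2.3: average over blocks of
    geometrically growing size, counted backwards from the last observation,
    stopping once two consecutive block averages disagree (rule (6) with a
    placeholder threshold c*sigma_e)."""
    n = len(Y)
    L = max(2, int(round(np.log(n))))
    bounds, right, j = [], n, 1
    while right > 0:
        n_j = min(right, L ** j)             # block j holds about (ln n)^j obs
        bounds.append((right - n_j, right))
        right -= n_j
        j += 1
    gammas = [Y[a:b].mean() for a, b in bounds]
    for j in range(len(gammas) - 1):
        n_j = bounds[j][1] - bounds[j][0]
        if np.sqrt(n_j) * abs(gammas[j + 1] - gammas[j]) > c * sigma_e:
            return gammas[j]                 # jump suspected beyond block j: stop
    return gammas[-1]                        # no jump detected: use all blocks

rng = np.random.default_rng(2)
n = 2000
X = np.where(np.arange(n) < n - 300, 1.0, 4.0)   # one jump, 300 obs before T
Y = X + rng.normal(0.0, 0.5, size=n)
est = block_estimate(Y, sigma_e=0.5)
```

With the jump sitting 300 observations before $T$, the first two blocks lie entirely after the jump, the third straddles it, and the rule stops before averaging over the pre-jump segment.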
To estimate E1, consider the Chebyshov-type inequality -— '2 1 'K 2 —, 2 K (cloaks/2° ) P(Ih..l— XT< >I>Ca.n,, ”+100 )< < E 07.0.1 — X(T>I21 Claens¥f+‘/j°")) 3 C20" n>/n by Assumption 1. Therefore N— _ - —2 <2: o2/nj C2(lnn)/n- (CloeanQH/JK) _<_ 02 112.0,) n)(1-2/jK)(j+1)< |/\ 062 (ln n)2"n 32/“ (ln n)3‘2/K 062—— < c —— j ann _ 6 n 23 2.5 Lower bound for expected square error. Let us, as before, have exactly n observations on the interval [0,T] and the last observation is made at the moment T. Then X (T) = X". To estimate the expected squared loss of the Bayesian estimate Xn from below, consider the estimator V Xn : E(XnIYla "'a Ym Xn—17 In), where 1,, = I (Xn 7Q Xn_1) is indicator of the event that some jumps occurred on the last observed interval. It’s easy to see that X" = E(XnIYn,X,,_1, In) and that E(X” — X,,)2 < E(X" — Xn)2, since the estimator X" is based on a finer sigma-algebra. Proposition 2 . The expected square error for X", E(X" — X,,)2 = C/n + 0(1/n), where C > 0 is some constant not depending on n. Proof: Consider random variables Zn ~ lib so that Xn = Xn-1+ InZn, and W" 2 Z, + e,,. 24 Joint distribution of Zn, Wn does not depend on n and P(Zn 6 dz, Wu 6 dy) = 6(T)¢(y — T)- Also note that on the event {In 2 0}, Xn = Xn_1 and on the event {In 2 1}, Yn = Xn_1 + W". Therefore, U Xn = E(XnIYnaXn—laln) = Xn—l + InE(Zn I Wn) Let 2,, :2 E(Zn | W"). Then E(X, — X,,)2 = P(I, = 1)E(Zn — 2,)? Clearly, E(Zn —- Z")2 > 0 and P(In = 1) = 1 — e‘V" = A/n + 0(n‘1). This gives us the statement of Proposition with C = A - E(Zn — Zn)2. El This proposition shows us that the hyper-efficiency observed in case of estimating a constant mean (different rates for different error distributions) here does not exist, because there’s always a possibility of a last-second change in the process. The following informal argument shows us what one can hope for with different error distributions. Suppose that the number of observations J since the last jump is known. 
Set $\tilde{X}_n = E(X_n \mid Y_1, \dots, Y_n, J)$. Just as before, $E(\tilde{X}_n - X_n)^2 \le E(\hat{X}_n - X_n)^2$. The optimal strategy is to use the latest $J$ observations. If the error density $\phi$ has jumps (e.g. uniform errors), then this strategy yields
$$E(\tilde{X}_n - X_n)^2 \sim n^{-1}\Big(1 + \frac{1}{2^2} + \dots + \frac{1}{J^2}\Big) \sim \frac{C}{n}.$$
On the other hand, for a continuous error density (e.g. normal errors),
$$E(\tilde{X}_n - X_n)^2 \sim n^{-1}\Big(1 + \frac{1}{2} + \dots + \frac{1}{J}\Big) \sim \frac{\ln n}{n}.$$

3 Estimation of parameters of a jump process

Next, our goal is to estimate the parameters of the process itself, that is, the time-intensity $\lambda$ and the parameter $\theta$ describing the jump density $h_\theta$, based on observations $Y(t)$, $0 \le t \le T$. The recursive formula (Theorem 1) would allow us to do it; the question is: how efficient are these estimates?

Assume, as before, that the error density $\phi$ is known. Without loss of generality, let $\sigma_\epsilon = 1$. Also, assume that $\lambda$ is bounded by some constants: $\lambda_1 \le \lambda \le \lambda_2$. When the entire trajectory of the process $X(t, \omega)$ is known, that is, when we know the exact times $t_1, t_2, \dots$ at which jumps happened and the exact magnitudes $\xi_i = X(t_i) - X(t_i-)$, $i \ge 1$, the answer is trivial. For example, to estimate the intensity, we can just take
$$\hat\lambda := \sum_{i \ge 1} I(t_i \le T)/T.$$
Likewise, inference about $h_\theta$ will be based on the jump magnitudes $\xi_i$. It is clear that these estimates will be consistent only when we observe long enough, that is, $T \to \infty$. In fact, we will consider the limiting behavior of the estimates as both $n$ and $T$ become large.

Now, when the only observations we have are the noisy $Y_j$, we can try to estimate the locations and magnitudes of the jumps of the process $X$. Let $n$ be the number of observations on the interval $[0, 1]$. Split the time interval $[0, T]$ into blocks of $m = n^\beta$ observations each. Let $Z_k$ be the average of the observations over Block $k$,
$$Z_k = \frac{1}{m} \sum_{j=1}^m Y_{m(k-1)+j}.$$
Consider several cases (see Figure 1). Let $\alpha > 0$ and $\beta > 0$ be specified later.

Case 1. $\sqrt{m}\, |Z_{k+1} - Z_k| \le m^\alpha$. In this case we conclude that no jump occurred on either Block $k$ or Block $k+1$.

Case 2.
$\sqrt{m}\, |Z_{k+1} - Z_k| > m^\alpha$, $\sqrt{m}\, |Z_k - Z_{k-1}| \le m^\alpha$, $\sqrt{m}\, |Z_{k+2} - Z_{k+1}| \le m^\alpha$. In this case we conclude that a jump occurred exactly between Block $k$ and Block $k+1$, that is, at time $t = mk/n$. Here, we estimate the magnitude of this jump as $\xi^* = Z_{k+1} - Z_k$. Note: accumulation of errors does not occur when estimating $\xi$ because the estimates are based on non-overlapping intervals.

Case 3. $\sqrt{m}(Z_{k+1} - Z_k) > m^\alpha$ and $\sqrt{m}(Z_k - Z_{k-1}) > m^\alpha$, or $\sqrt{m}(Z_{k+1} - Z_k) < -m^\alpha$ and $\sqrt{m}(Z_k - Z_{k-1}) < -m^\alpha$. In this case we conclude that a jump occurred in the middle of Block $k$, that is, at time $t = m(k - 0.5)/n$. We estimate the magnitude of this jump as $\xi^* = Z_{k+1} - Z_{k-1}$.

Case 4. Jumps occur on the same Block, or on two neighboring Blocks. The probability that at least two such jumps occur on an interval of fixed length is asymptotically equivalent to $(m/n)^2 n = m^2/n$. Picking $\beta < 0.5$, we can make this probability small.

Of course, there are errors associated with this kind of detection. We can classify them as follows:

- Type I Error: we determined that a jump occurred when in reality there was none (this involves Cases 2 and 3).
- Type II Error: we determined that no jump occurred when in reality it did (this involves Cases 1 and 4).
- Placement Error: we determined the location of a jump within a Block or two neighboring Blocks incorrectly.
- Magnitude Error: the error made when estimating the value of $\xi_i$ (the jump magnitude).

Note that the placement error is small: it is of order $m/n$. The magnitude error is based on averaging $m$ i.i.d. values, and is therefore of order $m^{-1/2}$.

3.1 Errors in jump detection: Lemmas

Let us estimate the effect of Type I and II errors. Here, as in Section 2.3, we require that Assumption 1 hold.

Type I errors. Assume that there are no jumps over the Blocks $k$ and $k+1$, but we detected one according to Case 2 or 3.
Further, P(IXm,k+l — Xm,kI > m“) S 2 - P(IXchI 2 0.5 m“)- From Assumption 1, for any integer j > 0 Eleni IUXchI > 01 -m’/"")I < C2 (firm—Ca ml”), and the application of Chebyshov’s inequality yields P([mec] > C1 -m’/Kj) < Cexp(—C3 ml/j) -m_2/Kj Picking j such that 1 / K j > a > 1 / (2K j) and for m large enough, summing up over Tn m“1 blocks, we obtain Lemma 2 . As it —> 00, provided that T grows no faster than some power of n, 2K0) P(Type I error) < C - Tn m‘lexp(—C3m ——> 0 30 Type II errors. Suppose that a jump occurred on Block k, but it was not detected (Case 1), that is IZlc—l -‘ ZkIVIZk+1— ZkI S ”la—0'5 Of Blocks k — 1, k + 1, pick the closest to the true moment of jump. Without loss of generality, let it be Block k — 1. Let 5 be the size of the jump. Then averages of X (t) on Blocks k and k — 1 are different by at least f / 2 and P(IZk—l — 2k] S m°_°'5) S 2P(2IXm,kI > lélx/Tfi/2 - m“) < < C - Tn m_1exp(—C3m2K€) (8) as n —> 00, as long as [E] > m‘0'5+°‘+5, for an arbitrary e > 0 (use Assumption 1 in a way similar to Lemma 2). Consider separately P(U{I€.-| > m—0'5+“+E}) g C (AT + o(T))m’°'5+°+5, using the assumption that density of E,- is bounded in a neighborhood of O and the total number of jumps is AT + 0(T). Finally, take into account Case 4 which yields an upper bound CAT m2/n. Summing up, we obtain Lemma 3 P(Type 11 error) < C. ,\T(n(—0.5+a+e);3 V 7123—1) 31 3.2 Asymptotic behavior of parameter estimates. For simplicity, determine the behavior of estimates separately, that is consider first an estimate of A, and then an estimate of 6. Let 6 E O, with 9 being bounded subset of R. Let true values of parameters be A0 and 60. Let t‘; be consecutive jumps of X () determined by Cases 2, 3. Estimate the intensity A by 11(3‘Z:1 0 (b) h9(x) is bounded in a neighborhood of 0, uniformly in 6. 
(c) there is a constant $\gamma$, $0 \le \gamma \le 1/4$, such that for large enough $N$ there exists $A$ such that $P(|\xi_i| > A) = o(N^{-1})$ and

$b_A := \sup_{\theta \in \Theta}\, \sup_{|x| \le A} |\theta B'(x)| = O(N^{\gamma}).$

Define the log-likelihood function based on the estimated jumps:

$L^*(\theta) = \sum_{i=1}^{N^*} \ln h_\theta(\xi_i^*).$

Theorem 4. Let Assumptions 1-3 hold. Then the maximum likelihood estimate $\theta^* = \arg\max_{\theta \in \Theta} L^*(\theta)$ is asymptotically normal, that is,

$\sqrt{\lambda_0 T}\, (\theta^* - \theta_0) \to N[0,\, I(\theta_0)^{-1}]$

in distribution as $T \to \infty$ no faster than $T = n^{k}$, where $k < (1/5) \wedge (1 - 4\gamma)$.

Proof: Pick $\beta$, $k$, $\alpha$ and $\varepsilon$ such that

$k + 2\beta - 1 < 0,$
$k + \beta(-0.5 + \alpha + \varepsilon) < 0,$
$\gamma - \beta/2 < 0.$

With $\alpha$ and $\varepsilon$ arbitrarily small, this is achieved when $2k < \beta < (1-k)/2$, so that $k < (1/5) \wedge (1 - 4\gamma)$.

According to Cases 1-4, the estimated jump magnitudes are

$\xi_i^* = (\xi_i + \delta_i^e)\, I_{E^c} + \tilde\delta_i\, I_E,$

where $E$ is the exceptional set on which Type I and II errors occurred, $\tilde\delta_i$ are the estimates of $\xi$ resulting from these errors, and $\delta_i^e$ are the "magnitude errors" discussed at the beginning of this Section. From Lemmas 2 and 3, $P(E) \to 0$ as $n \to \infty$. Therefore we can disregard this set and consider only

$\xi^* = \xi + \delta^e.$

Let $N$ be the total number of jumps on $[0, T]$, and consider the log-likelihood function

$L(\theta) = \sum_{i=1}^{N} \ln h_\theta(\xi_i).$

Under the conditions of the Theorem, the maximum likelihood estimate $\hat\theta = \arg\max_{\theta \in \Theta} L(\theta)$ is asymptotically normal with the stated variance. Next, we would like to show that the estimate $\theta^*$ based on $\{\xi_i^*\}_{i=1}^{N^*}$ is close to $\hat\theta$, which is based on the true values of $\xi_i$. Note that both maxima exist because $L''(\theta) = -N A''(\theta) < 0$, so that $L(\theta)$ is a concave function; the same is true for $L^*(\theta)$. Furthermore, for any $\theta$ in a neighborhood of $\hat\theta$,

$L(\theta) = L(\hat\theta) - \frac{(\theta - \hat\theta)^2}{2}\, N A''(\hat\theta) + o\big((\theta - \hat\theta)^2\big).$

Thus, if $L(\theta) - L(\hat\theta) = O(1)$, then $(\theta - \hat\theta)^2\, N A''(\hat\theta) = O(1)$ and therefore $\theta - \hat\theta = O(N^{-1/2})$. According to Lemma 4, $|L(\theta) - L^*(\theta)| = o(1)$, and also $|L(\hat\theta) - L^*(\hat\theta)| = o(1)$. Therefore $\theta^* - \hat\theta = O(N^{-1/2})$, and the statement of the Theorem follows from the corresponding statement for $\hat\theta$. □

Lemma 4.
Under the conditions of Theorem 4,

$|L(\theta) - L^*(\theta)| = o(1)$

uniformly in $\theta$, outside some exceptional set $E_1$ with $P(E_1) = o(1)$.

Proof: According to Assumption 3(c), the probability of the event $E_1 := \bigcup_i \{|\xi_i| > A\}$ is $o(1)$. Thus, excluding $E_1$,

$|L(\theta) - L^*(\theta)| \le \sum_{i=1}^{N} \big|\ln h_\theta(\xi_i) - \ln h_\theta(\xi_i + \delta_i^e)\big| \le \sum_{i=1}^{N} |\theta B'(\tilde\xi_i)| \cdot |\delta_i^e| \le C\, b_A\, n^{-\beta/2}.$

The statement of the Lemma follows, as $\gamma < \beta/2$. □

Examples. Assumption 3(c) is the hardest to verify. It holds, e.g., in the following cases:

a) when $B'(x)$ is bounded; for example, the exponential distribution with $h_\theta(x) = \exp(-\theta x + A(\theta))$, $x \ge 0$;

b) the normal distribution with $h_\theta(x) = \exp(-\theta x^2 + A(\theta))$. Here one can pick $A := \ln N$, and then $b_A = 2\theta \ln N = o(N^{\gamma})$ for arbitrarily small $\gamma$.

4 Piecewise-linear process

4.1 Problem formulation

Let us consider the following simplified version of the problem. Suppose that the velocity of the target is piecewise constant, with jumps belonging to the set $\{t_k = k/n;\ k = 1, \dots, n\}$; the probability of a jump $p_n$ and the distribution of the height of a jump are known. Note that this restriction is not important, since for $n \to \infty$ the difference between this process and our original process $X$ (with jumps at arbitrary locations) becomes of order $O(n^{-2})$. We will be interested in the case $p_n = \lambda/n + o(1/n)$. Let $\tau = 1/n$. Denote by $V_k$ the velocity of the target on the interval $[k/n, (k+1)/n]$ and by $X_k$ the position of the target at the point $k/n$. Suppose the observations $Y_k$ are made at the points $1/n, 2/n, \dots, (n-1)/n, 1$. Overall, we have the model

$V_k = V_{k-1} + \zeta_k$
$X_k = X_{k-1} + \tau V_{k-1}$      (9)
$Y_k = X_k + e_k$,

where $e_k$ are i.i.d. observation errors with a known density $\phi_e$ (possibly also depending on $n$),

$\zeta_k = 0$ with probability $1 - p_n$,  $\zeta_k = \xi_k$ with probability $p_n$,

and $\{\xi_k\}$ are i.i.d. with a known density $h_\theta$ depending on some parameter $\theta$. The initial values $X_0$, $V_0$ and the parameter $\theta$ have some prior density $\pi(x_0, v_0, \theta)$. The random variables $X_0$, $V_0$, $\{\zeta_k\}$ and $\{e_k\}$ are jointly independent.
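The dynamics (9) are easy to simulate directly. The following sketch assumes uniform observation errors and a uniform jump-height distribution, which are illustrative stand-ins for the generic densities $\phi_e$ and $h_\theta$:

```python
import random

def simulate_pw_linear(n, lam=1.0, gamma=0.1,
                       jump_height=lambda: random.uniform(-2.5, 2.5),
                       x0=0.0, v0=0.0):
    """Simulate model (9) on [0, 1]: the velocity V is piecewise constant and
    jumps with probability p_n = lam/n per step; the position X integrates V;
    the observations Y are X plus i.i.d. noise.  Uniform errors on
    [-gamma, gamma] and the uniform jump-height sampler are illustrative."""
    tau, p = 1.0 / n, lam / n
    X, V, Y = [x0], [v0], []
    for _ in range(n):
        zeta = jump_height() if random.random() < p else 0.0  # zeta_k
        V.append(V[-1] + zeta)                            # V_k = V_{k-1} + zeta_k
        X.append(X[-1] + tau * V[-2])                     # X_k = X_{k-1} + tau V_{k-1}
        Y.append(X[-1] + random.uniform(-gamma, gamma))   # Y_k = X_k + e_k
    return X, V, Y
```

A run of `simulate_pw_linear(n)` returns the trajectories $X_0, \dots, X_n$ and $V_0, \dots, V_n$ together with the observations $Y_1, \dots, Y_n$.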
The problem of filtering is to find the posterior (conditional) distribution of the position $X_n$ of the process and of the parameter $\theta$ given the observations $Y_1, \dots, Y_n$.

4.2 Recursive filtering equations

We will obtain filtering equations analogous to those of Section 2.1. But in this case we have to consider an unobservable variable (the velocity), and the resulting filtered density will be two-dimensional. To simplify notation, in the sequel we will write that $\zeta_k$ has "density" $\psi \equiv \psi_\theta$. Consider the joint density of $V$, $\theta$ and $Y$:

$p_k^v(x_0, v_0, \dots, v_k;\ y_1, \dots, y_k;\ \theta) = \pi(x_0, v_0, \theta) \times \prod_{j=1}^{k} \phi_e(y_j - x_j) \prod_{j=1}^{k} \psi_\theta(v_j - v_{j-1}),$

where the $x_k$ are uniquely determined from (9) as $x_k = x_0 + \tau \sum_{j=0}^{k-1} v_j$. By changing the variables $\{x_0, v_0, v_1, \dots, v_k\}$ to $\{x_0, x_1, \dots, x_k, v_k\}$, we obtain the joint density of $X$, $\theta$, $v_k$ and $Y$:

$p_k(x_0, x_1, \dots, x_k, v_k;\ y_1, \dots, y_k;\ \theta) = \tilde\pi(x_0, x_1, \theta) \times \prod_{j=1}^{k} \phi_e(y_j - x_j) \times \tau^{1-k} \prod_{j=1}^{k-1} \psi_\theta\Big(\frac{x_{j+1} - 2x_j + x_{j-1}}{\tau}\Big) \times \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big),$   (10)

where the prior density $\tilde\pi(x_0, x_1, \theta)$ can be determined from $\pi(v_0, x_0, \theta)$. The density is "predictive" in its $v_k$ argument: although $V_k$ also affects $Y_{k+1}$, we do not include $Y_{k+1}$ in the equation. Introduce $q_k^\tau(x_k, v_k, \theta)$, the unnormalized density of $X_k$, $V_k$ (the position and velocity of the target at the moment $k/n$) and $\theta$ given the observations $Y_1, \dots, Y_k$:

$q_k^\tau(x_k, v_k, \theta) = \int p_k(x_0, x_1, \dots, x_k, v_k;\ y_1, \dots, y_k;\ \theta)\, dx_0 \dots dx_{k-1}.$   (11)

Then the following recursive relation holds.

Theorem 5.

$q_0^\tau(x, v, \theta) = \pi(x, v, \theta),$

$q_k^\tau(x, v, \theta) = \phi_e(Y_k - x) \cdot \int \psi_\theta(v - v_{k-1})\, q_{k-1}^\tau(x - v_{k-1}\tau,\ v_{k-1},\ \theta)\, dv_{k-1}$
$\qquad = \phi_e(Y_k - x)\Big[e^{-\lambda\tau}\, q_{k-1}^\tau(x - v\tau,\ v,\ \theta) + (1 - e^{-\lambda\tau}) \int h_\theta(v - v_{k-1})\, q_{k-1}^\tau(x - v_{k-1}\tau,\ v_{k-1},\ \theta)\, dv_{k-1}\Big].$   (12)

Proof: Replace $v_{k-1} = (x_k - x_{k-1})/\tau$ and rewrite (11) as

$q_k^\tau(x_k, v_k, \theta) = \int p_{k-1}\Big(x_0, x_1, \dots, x_{k-1}, \frac{x_k - x_{k-1}}{\tau};\ y_1, \dots, y_{k-1};\ \theta\Big) \times \phi_e(Y_k - x_k)\, \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big)\, dx_0 \dots dx_{k-1}$
$= \phi_e(Y_k - x_k) \int q_{k-1}^\tau\Big(x_{k-1}, \frac{x_k - x_{k-1}}{\tau}, \theta\Big)\, \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big)\, dx_{k-1}.$

Changing the variable $x_{k-1}$ into $x_k - v_{k-1}\tau$, the above yields the statement of the Theorem.
□

4.3 Asymptotics: special case

The question of interest is whether the asymptotic rate of the optimal Bayesian estimate depends on the smoothness of the error distribution, that is, whether "hyper-efficiency" takes place. In [22], the asymptotic rate for estimating a linear function $X(t) = X_0 + V_0 t$ was found for uniform errors and was equal to $O(n^{-2})$. Will this rate change when we shift to a piecewise-linear process?

Consider a special case. Let $s$ have the (prior) uniform distribution on $[0, 1]$, and consider the following process instead of $X$:

$W(t) = (t - s)^+ = (t - s) \vee 0, \qquad 0 \le t \le 1,$

with observation errors $e_i$ having the uniform distribution on the interval $[-\gamma, \gamma]$ ($\gamma > 0$ is known). The process $W$ in this case is a piecewise-linear function with all parameters known except the turning point $s$, and we need to estimate $s$ based on the observations. Note that estimating the final position $W(1)$ is equivalent to estimating $s$. It is clear that the asymptotic rate of $E(\hat W(1) - W(1))^2$ is no worse than the asymptotic rate for the process $X$ from the previous section.

Let us write the likelihood function (the density of the distribution $\mathcal{L}(s \mid Y_1, \dots, Y_n)$):

$f(s) = f(s \mid Y_1, \dots, Y_n) = \prod_{0 \le i/n \le 1} \frac{1}{2\gamma}\, I\{e_i \ge (i/n - s)^+ - \gamma\}.$

Let the true value of $s$ be $1$. Then the likelihood equals a constant on the interval $[s^*, 1]$, and $0$ otherwise, where $s^* = \min\{u : (i/n - u)^+ - e_i \le \gamma \text{ for all } i\}$. Then the Bayesian estimate of $s$ (given its uniform prior) is

$\hat s = E(s \mid Y_1, \dots, Y_n) = (s^* + 1)/2.$

Consider the step function (see Figure 2)

$D_u(t) = \frac{1-u}{2}\, I\Big(t \ge \frac{u+1}{2}\Big).$

Figure 2: the process $W(t)$ bounded between the step functions $D_s(t)$ and $D_{2s-1}(t)$.

Note that $D_s(t) \le W(t) \le D_{2s-1}(t)$. Now consider $\tilde s = \min\{u : D_u(i/n) - e_i \le \gamma \text{ for all } i\}$; it is clear that $\tilde s \le s^*$. Find the asymptotic rate of the estimate $\tilde s_1 = (\tilde s + 1)/2$. We have

$P(\tilde s \le t) = \prod_{(t+1)/2 \le i/n \le 1} P\Big(e_i \ge \frac{1-t}{2} - \gamma\Big) = \Big(1 - \frac{1-t}{4\gamma}\Big)^{n(1-t)/2}.$

In the case $1 - t = C n^{-1/2}$, $P(\tilde s \le t) \approx \exp(-C^2/8\gamma) \to 0$ as $C \to \infty$. Thus, for $s$ close to $1$, $\tilde s$ is $1 - O(n^{-1/2})$. The same statement can be obtained for $\bar s = \min\{u : D_{2u-1}(i/n) - e_i \le \gamma \text{ for all } i\}$.
As a result, we would have $\tilde s \le s^* \le \bar s$, and it follows that for $s$ close to $1$, $s^*$ is $1 - O(n^{-1/2})$. The case $s = 1$ is the worst; for $s < 1$ the estimate $\hat s$ is $s - O(n^{-1})$. Finally, this yields the following for the expected square loss (with uniform errors).

Proposition 3. $E(\hat W(1) - W(1))^2 = O(n^{-3/2})$.

Keeping in mind that the expected square loss for normal errors cannot be lower than $O(n^{-1})$, we obtain some "hyper-efficiency" in this case, although not as good as when estimating a constant [rate $O(n^{-2})$].

5 Comparison to linear filters

The optimal linear filter for our problem is the well-known Kalman filter. Let us take a look at its asymptotics.

5.1 Jump process

In the case of the jump process, the model can be re-written as a state-space model (see, for example, Brockwell and Davis [4]):

$Y_t = X_t + e_t, \qquad t = 1, 2, \dots$
$X_t = X_{t-1} + \zeta_t,$

and subsequently the optimal linear predictor (the estimator of $X_t$ based on all observations up to time $t-1$) is based on the second moments of $e_t$ and $\zeta_t$ and can be written as

$\hat X_{t+1} = \hat X_t + \frac{Q_t}{Q_t + \sigma_e^2}\,(Y_t - \hat X_t),$

where $Q_t = E(X_t - \hat X_t)^2$ is defined by

$Q_{t+1} = \sigma_\zeta^2 + \frac{\sigma_e^2\, Q_t}{Q_t + \sigma_e^2}$

(here $\sigma_\zeta^2 = \mathrm{Var}\,\zeta_t = \lambda \sigma_\xi^2/n$, where $\sigma_\xi^2$ is the variance of a jump), and $Q_0$ depends on the prior distribution of $X_0$. Furthermore, the optimal linear filter is

$\tilde X_t = \hat X_t + \frac{Q_t}{Q_t + \sigma_e^2}\,(Y_t - \hat X_t)$   (13)

with the error

$E(X_t - \tilde X_t)^2 = \frac{\sigma_e^2\, Q_t}{Q_t + \sigma_e^2}.$

Note that when the observations become frequent ($n \to \infty$), the optimal predictor and the optimal filter become close. It can be shown that the filter (13) is asymptotically (as $t \to \infty$) equivalent to the exponential filter

$\tilde X_{t+1} = \tilde X_t + b\,(Y_t - \tilde X_t),$

where $b = \lim_{t \to \infty} \frac{Q_t}{Q_t + \sigma_e^2}$, and when $n \to \infty$,

$b = \sqrt{\lambda/n}\; \frac{\sigma_\xi}{\sigma_e} + o(1/\sqrt{n}).$

Thus, the asymptotic rate of the best linear filter is of order $n^{-1/2}$:

$E(X_t - \tilde X_t)^2 = \sqrt{\lambda/n}\; \sigma_\xi \sigma_e + o(1/\sqrt{n}).$

See Figure 3 for a graphical comparison of the linear (exponential) and optimal non-linear filters.
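The $n^{-1/2}$ rate above can be checked numerically by iterating the Riccati recursion $Q_{t+1} = \sigma_\zeta^2 + \sigma_e^2 Q_t/(Q_t + \sigma_e^2)$ to its fixed point; with $\sigma_\zeta^2 = \lambda\sigma_\xi^2/n$, the fixed point approaches $\sigma_e\sigma_\xi\sqrt{\lambda/n}$. A minimal sketch (the parameter values are illustrative):

```python
def steady_state_Q(sigma_zeta2, sigma_e2, q0=1.0, iters=100000):
    """Iterate Q_{t+1} = sigma_zeta^2 + sigma_e^2 * Q_t / (Q_t + sigma_e^2)
    to its fixed point (the map is a contraction for Q > 0)."""
    q = q0
    for _ in range(iters):
        q = sigma_zeta2 + sigma_e2 * q / (q + sigma_e2)
    return q

# Illustrative check of Q ~ sigma_e * sigma_xi * sqrt(lam / n):
lam, sigma_xi, sigma_e = 1.0, 1.0, 1.0
for n in (10**2, 10**4, 10**6):
    q = steady_state_Q(lam * sigma_xi**2 / n, sigma_e**2)
    print(n, q, sigma_e * sigma_xi * (lam / n) ** 0.5)
```

As $n$ grows, the computed fixed point and the predicted $\sigma_e\sigma_\xi\sqrt{\lambda/n}$ agree to a relative error that vanishes with $n$.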
Figure 3: process position over time for a sample path of the jump process, with the observations, the linear filter and the non-linear filter.

5.2 Piecewise-linear process

In the case of the piecewise-linear process, we can reformulate (1) as a system of state-space equations

$Y_t = X_t + e_t,$

$\begin{pmatrix} X_{t+1} \\ V_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X_t \\ V_t \end{pmatrix} + \begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix},$

where $X_t$ is the current position of the process and $V_t$ is the current velocity (note that only the position is observed, not the velocity); as before, $\zeta_t$ is the jump in velocity over the interval $[t/n, (t+1)/n]$, and $\eta_t$ is the change in $X$ caused by this jump within the interval, which is negligible (of order $O(n^{-2})$) when $n$ is large. Applying the recursive equations for the Kalman filter (see [4]), we obtain the following recursion for the best linear predictor:

$\begin{pmatrix} \hat X_{t+1} \\ \hat V_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \hat X_t \\ \hat V_t \end{pmatrix} + \frac{Y_t - \hat X_t}{\omega_{11} + \sigma_e^2} \begin{pmatrix} \omega_{11} + \omega_{21}/n \\ \omega_{21} \end{pmatrix},$

with the error matrix

$Q_t = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} = E\left[ \begin{pmatrix} X_t - \hat X_t \\ V_t - \hat V_t \end{pmatrix} \begin{pmatrix} X_t - \hat X_t \\ V_t - \hat V_t \end{pmatrix}^T \right],$

defined by the recursive relation

$Q_{t+1} = F\, Q_t\, F^T - \frac{1}{\omega_{11} + \sigma_e^2} \begin{pmatrix} (\omega_{11} + \omega_{21}/n)^2 & (\omega_{11} + \omega_{21}/n)\,\omega_{21} \\ (\omega_{11} + \omega_{21}/n)\,\omega_{21} & \omega_{21}^2 \end{pmatrix} + Q_\zeta,$

where $F = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix}$ and $Q_\zeta = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_\zeta^2/n \end{pmatrix}$, with $\sigma_\zeta^2 := \lambda \sigma_\xi^2$. It follows that the elements of the matrix $Q = \lim_{t\to\infty} Q_t$ are

$\omega_{11} = \sqrt{2}\, \sigma_e^{3/2} \sigma_\zeta^{1/2}\, n^{-3/4} + o(n^{-3/4}),$
$\omega_{12} = \sigma_e \sigma_\zeta\, n^{-1/2} + o(n^{-1/2}),$
$\omega_{22} = \sqrt{2}\, \sigma_\zeta^{3/2} \sigma_e^{1/2}\, n^{-1/4} + o(n^{-1/4}).$

Furthermore, as $n \to \infty$, the optimal predictor $\hat X_t$ and the optimal filter $\tilde X_t$ are close. Thus, the asymptotic efficiency of the optimal linear filter $E(X_t - \tilde X_t)^2$ is of order $n^{-3/4}$.

A summary of the asymptotic behavior of $\sigma^2 := E(\tilde X_n - X_n)^2$ for the jump and piecewise-linear processes is given in Table 1.

                                          smooth errors   discontinuous errors   Linear filter
 Jump process                             ≥ ln M_n / n    ≤ ln M_n / n           n^{-1/2}
 Piecewise-linear process (special case)  ≥ n^{-1}        n^{-3/2}               n^{-3/4}
 Constant location parameter              n^{-1}          n^{-2}                 n^{-1}

 Table 1. Summary of asymptotic results

One can see that increasing the "smoothness" of the process improves the asymptotics. The behavior of the optimal linear and non-linear filters for a piecewise-linear process is shown in Figure 4. The optimal non-linear filter was evaluated using the sequential Monte-Carlo method described in [8].
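The steady-state entries $\omega_{11}, \omega_{12}, \omega_{22}$ can likewise be verified numerically by iterating the two-dimensional Riccati recursion. The sketch below uses $F = [[1, 1/n], [0, 1]]$, state noise $\mathrm{diag}(0, \lambda\sigma_\xi^2/n)$, and unit parameters; these values, and the explicit $2 \times 2$ arithmetic, are illustrative choices:

```python
def riccati_2x2(n, lam=1.0, sigma_xi2=1.0, sigma_e2=1.0, iters=200000):
    """Iterate the predictor Riccati recursion for the model of Section 5.2:
    F = [[1, 1/n], [0, 1]], state noise diag(0, lam*sigma_xi2/n), and
    observation Y = X + e with variance sigma_e2.  Returns the steady-state
    error-matrix entries (w11, w12, w22)."""
    tau = 1.0 / n
    p11, p12, p22 = 1.0, 0.0, 1.0
    for _ in range(iters):
        s = p11 + sigma_e2
        # update (filter) step: P - P h' h P / s, with h = (1, 0)
        f11 = p11 - p11 * p11 / s
        f12 = p12 - p11 * p12 / s
        f22 = p22 - p12 * p12 / s
        # prediction step: F P_f F' + Q_zeta
        p11 = f11 + 2 * tau * f12 + tau * tau * f22
        p12 = f12 + tau * f22
        p22 = f22 + lam * sigma_xi2 / n
    return p11, p12, p22

# omega_11 should behave like sqrt(2) * n^(-3/4) for unit parameters:
for n in (10**2, 10**3, 10**4):
    w11, w12, w22 = riccati_2x2(n)
    print(n, w11, 2 ** 0.5 * n ** -0.75)
```

For moderate $n$ the computed $\omega_{11}$ already tracks the $\sqrt{2}\, n^{-3/4}$ asymptote closely, and $\omega_{12}$, $\omega_{22}$ track $n^{-1/2}$ and $\sqrt{2}\, n^{-1/4}$.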
Figure 4: process position over time for a sample path of the piecewise-linear process, with the observations, the linear filter and the non-linear filter.

6 Simulation

The author has simulated the optimal filter for the jump process based on Theorem 1. There, the densities $q_k(x)$ are found recursively using the relation (3). We consider the case when the parameters $\theta$, $\lambda$ and the initial state $X_0$ are known, and the error density is uniform on the interval $[-\gamma, \gamma]$. This causes $q_k(x)$ to be confined to the interval $[Y_k - \gamma,\ Y_k + \gamma]$, and we keep a discretized version of $q_k(x)$ in memory. The integration required for evaluating (3) was performed numerically. The results of the simulation are given in Table 2. In the case considered, the distribution of the jumps is uniform on the interval $[-2.5, 2.5]$, with $\gamma = 1$, on the time interval $t \in [0, 9]$. Two cases, $\tau = 0.02$ and $\tau = 0.04$, were considered. For each Monte-Carlo sample of the process, the ratio of the effectiveness of the non-linear filter to that of the linear one was found:

$R := \left[ \frac{\sum_{t} (\bar X_t - X_t)^2}{\sum_{t} (\tilde X_t - X_t)^2} \right]^{1/2},$

where $\tilde X_t$ is the optimal non-linear filter at time $t$ and $\bar X_t$ is the optimal linear filter. Also, the mean square error $MS_{nl}$ of the optimal non-linear filter is given. In each case, $N = 100$ Monte-Carlo samples were generated.

 τ      mean of R   st. dev. of R   MS_nl
 0.02   2.141       0.535           0.0249
 0.04   1.749       0.407           0.0479

 Table 2. Simulation results

This and other simulations lead us to believe that the optimal filter becomes more effective relative to the linear filter when:

a) $\tau \to 0$ (we know that from the asymptotics, as well as from Table 2);

b) the jump magnitudes increase, making the process more "ragged".

Intuitively, a linear filter has to compromise between the periods where the process stays constant (for the filter to perform better on those intervals, past observations need to carry a greater weight) and the times when jumps happen (to cope with jumps, we need to forget past observations quickly). As a result, greater jumps will upset the performance of a linear filter.
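The discretized recursion described at the beginning of this section can be sketched as follows. It is assumed here that relation (3) has the same structure as (12) with the velocity component removed, i.e. $q_k(x) = \phi_e(Y_k - x)\,[\,e^{-\lambda\tau} q_{k-1}(x) + (1 - e^{-\lambda\tau})\int h_\theta(x-u)\, q_{k-1}(u)\, du\,]$; the grid and the renormalization step are implementation choices:

```python
import math

def grid_filter_step(q_prev, grid, y, lam, tau, gamma, jump_density):
    """One step of the discretized recursion, assuming (3) has the form
        q_k(x) = phi_e(Y_k - x) * [ exp(-lam*tau) * q_{k-1}(x)
                 + (1 - exp(-lam*tau)) * integral h(x - u) q_{k-1}(u) du ].
    phi_e is uniform on [-gamma, gamma], so q_k vanishes outside
    [Y_k - gamma, Y_k + gamma].  The result is renormalized to integrate to 1."""
    dx = grid[1] - grid[0]
    stay = math.exp(-lam * tau)               # probability of no jump
    q_new = []
    for i, x in enumerate(grid):
        if abs(y - x) > gamma:                # outside the observation window
            q_new.append(0.0)
            continue
        conv = sum(jump_density(x - u) * q_prev[j]
                   for j, u in enumerate(grid)) * dx
        q_new.append(stay * q_prev[i] + (1.0 - stay) * conv)
    total = sum(q_new) * dx
    return [v / total for v in q_new] if total > 0 else q_new
```

The non-linear filter estimate $\tilde X_k$ is then the posterior mean $\sum_x x\, q_k(x)\, dx$ over the grid.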
Some adaptive filters, for example the IMM filter described in [2], might be more competitive.

7 Open problems

Naturally, it makes sense to extend the results to the piecewise-linear process (of which only a special case is treated in Section 4). This is considerably more difficult than the jump-process case, partly due to hyper-efficiency. Another interesting task is to cover an unknown error distribution (in most of the asymptotic results above, the latter was assumed known). The results on parameter estimation (Section 3) might also be improved.

Also, in view of possible applications, other forms of the stochastic process $X(\cdot)$ deserve to be considered. First, two- and three-dimensional processes are obviously of interest. Second, some other types of processes, rather than jump and piecewise-linear, might be more useful. Finally, one needs to consider the situation when the parameters of the process are changing themselves, albeit slowly ("parameter tracking"). Such a situation is considered in Elliott et al. [10], and no doubt recursive formulas similar to Theorem 1 could be derived in this case, too.

References

[1] Anderson, B.D.O. and Moore, J.B., Optimal Filtering, Prentice Hall, Englewood Cliffs, N.J., 1979.

[2] Bar-Shalom, Y., Rong Li, X., and Kirubarajan, T., Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software, Wiley, New York, 2001.

[3] Brémaud, P., Point Processes and Queues: Martingale Dynamics, Springer Series in Statistics, Springer-Verlag, New York, 1981.

[4] Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, Springer-Verlag, New York, 1996.

[5] Ceci, C., Gerardi, A., and Tardelli, P., An estimate of the approximation error in the filtering of a discrete jump process, Math. Models Methods Appl. Sci., 11 (2001), pp. 181–198.
[6] Chernoff, H. and Rubin, H., The estimation of the location of a discontinuity in density, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (1954–1955), vol. I, pp. 19–37.

[7] Del Moral, P., Jacod, J., and Protter, P., The Monte-Carlo method for filtering with discrete-time observations, Probab. Theory Related Fields, 120 (2001), pp. 346–368.

[8] Doucet, A., de Freitas, N., and Gordon, N. (eds.), Sequential Monte-Carlo Methods in Practice, Springer-Verlag, New York, 2001.

[9] Dufour, F. and Kannan, D., Discrete time nonlinear filtering with marked point process observations, Stochastic Anal. Appl., 17 (1999), pp. 99–115.

[10] Elliott, R.J., Aggoun, L., and Moore, J.B., Hidden Markov Models: Estimation and Control, Springer-Verlag, New York, 1995.

[11] Ermakov, M.S., Asymptotic behavior of statistical estimates of parameters for a discontinuous multivariate density (Russian), Studies in statistical estimation theory, I, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) (1977), pp. 83–107.

[12] Gordon, N., Salmond, D., and Smith, A., A novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F: Radar and Signal Processing, 140 (1993), pp. 107–113.

[13] Ibragimov, I.A. and Has'minskii, R.Z., Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York–Berlin, 1981.

[14] Kailath, T., An innovations approach to least-squares estimation, parts I, II, IEEE Trans. Automatic Control, AC-13 (1968), pp. 646–660.

[15] Kallianpur, G., Stochastic Filtering Theory, Springer-Verlag, New York, 1980.

[16] Kallianpur, G. and Striebel, C., Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes, Teor. Verojatnost. i Primenen., 14 (1969), pp. 597–622.

[17] Kalman, R.E., A new approach to linear filtering and prediction problems, J. Basic Engrg. (1960), pp. 35–45.

[18] Kalman, R.E. and Bucy, R.S., New results in linear filtering and prediction theory, Trans. ASME Ser. D. J. Basic Engrg., 83 (1961), pp. 95–108.

[19] Kannan, D., Discrete-time nonlinear filtering with marked point process observations. II. Risk-sensitive filters, Stochastic Anal. Appl., 18 (2000), pp. 261–287.

[20] Lehmann, E.L., Theory of Point Estimation, Wiley, New York, 1983.

[21] Portenko, N., Salehi, H., and Skorokhod, A., On optimal filtering of multitarget tracking systems based on point process observations, Random Oper. Stochastic Equations, 5 (1997), no. 1, pp. 1–34.

[22] Portenko, N., Salehi, H., and Skorokhod, A., Optimal filtering in target tracking in presence of uniformly distributed errors and false targets, Random Oper. Stochastic Equations, 6 (1998), no. 3, pp. 213–240.

[23] Rozovskii, B.L., Stochastic Evolution Systems: Linear Theory and Applications to Nonlinear Filtering, Kluwer Academic Publishers Group, Dordrecht, 1990. Translated from the Russian by A. Yarkho.

[24] Rubin, H., The estimation of discontinuities in multivariate densities, and related problems in stochastic processes, Proc. Fourth Berkeley Sympos. Math. Statist. and Prob. (1961), vol. I, pp. 563–574.

[25] Shiryayev, A.N., Stochastic equations of non-linear filtering of jump-like Markov processes, Problemy Peredachi Informacii, 2 (1966), pp. 3–22.

[26] Yashin, A.I., Filtering of jump processes, Avtomat. i Telemeh. (1970), pp. 52–58.

[27] Zakai, M., On the optimal filtering of diffusion processes, Z. Wahrsch. Verw. Gebiete, 11 (1969), pp. 230–243.

Notation

Some special symbols used in this work.

$E(X)$  expected value of $X$
$s \vee t$  $\max(s, t)$
$s \wedge t$  $\min(s, t)$
$\mathbb{R}$  the set of real numbers
$C$, const  some constant (often, the exact value is not specified but can be easily obtained)
$I(A)$ or $I_A$  indicator of the set/event $A$
$A^c$  complement of the set/event $A$
$A^T$  the matrix $A$ transposed
$A \asymp B$  asymptotic equivalence, that is, $A/B = \mathrm{const} + o(1)$
$A := B$  definition of expression $A$ in terms of expression $B$
$\mathcal{L}(X)$, $\mathcal{L}(X \mid \dots)$
distribution / conditional distribution of $X$
$\arg\min_Y f(Y)$  the value $z$ such that $f(z) = \min_Y f(Y)$
$N(\mu, \sigma^2)$  Normal distribution with the specified mean and variance.