This is to certify that the dissertation entitled "Filtering For Some Stochastic Processes With Discrete Observations" presented by Oleg Makhnin has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Co-Major Professor: Anatoli Skorokhod
Co-Major Professor: Habib Salehi

Date: July 22, 2002

FILTERING FOR SOME STOCHASTIC PROCESSES WITH DISCRETE OBSERVATIONS

By Oleg V. Makhnin

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Statistics and Probability, 2002

ABSTRACT

FILTERING FOR SOME STOCHASTIC PROCESSES WITH DISCRETE OBSERVATIONS

By Oleg V. Makhnin

The processes in question are jump processes and processes with jumping velocity. We estimate the current position of the stochastic process based on past discrete-time observations (the non-linear discrete filtering problem). We obtain asymptotic rates for the expected square error of the filter when observations become frequent. These rates are better than those of a linear Kalman filter. For the jump process, our method is asymptotically free of the process parameters. Estimation of the process parameters is also addressed.

Copyright by OLEG V. MAKHNIN 2002

Acknowledgments

I would like to thank my advisor, Prof. Skorokhod, for setting up the problem and working with me patiently, providing guidance and insight into the problem. I also thank my committee members Prof.
Gilliland, Prof. LePage and Prof. Salehi, whose contributions have been invaluable. Special thanks to Prof. LePage for his help with the manuscript and to Prof. Salehi for his advice and supervision. I would also like to express my gratitude to Prof. Koul, Prof. Mandrekar, Prof. Page and all the other professors at Michigan State University from whom I have learned a lot. Last, but not least, my gratitude goes to my wife, who supported me throughout this endeavor.

List of Tables

Table 1
Table 2

List of Figures

Figure 1
Figure 2
Figure 3
Figure 4

Contents

List of Tables
List of Figures
1 Introduction
  1.1 Review of past results
  1.2 Hyper-efficiency
  1.3 Formulation of the problem
  1.4 Main results
2 Filtering of a jump process
  2.1 Recursive formulation
  2.2 Single-block upper bound for expected square error
  2.3 Multiple-block upper bound
  2.4 Proofs of inequalities used in Theorem 3
  2.5 Lower bound for expected square error
3 Estimation of parameters of a jump process
  3.1 Errors in jump detection: Lemmas
  3.2 Asymptotic behavior of parameter estimates
4 Piecewise-linear process
  4.1 Problem formulation
  4.2 Recursive filtering equations
  4.3 Asymptotics: special case
5 Comparison to linear filters
  5.1 Jump process
  5.2 Piecewise linear process
6 Simulation
7 Open problems
References
Notation

1 Introduction

This work deals with estimating the position of a stochastic process based on past observations (filtering). With respect to square error, the optimal non-linear filter is the conditional expectation of the current state of the process given the observations. The observations are discrete, and we are interested in the asymptotic behavior of the non-linear filter as these observations become more frequent.

Several factors may affect the asymptotic behavior of a non-linear filter. One of them is the nature of the process itself: the more irregular a process is, the harder it is to filter. Another factor is the distribution of the observation errors. The simplest example of such results is the estimation of the mean of a sequence of i.i.d. variables. One can think of this mean as a "process" that remains constant over time. Assume that the variables have a density with respect to Lebesgue measure on $\mathbb{R}$. As pointed out in the book by Ibragimov and Khas'minskii [13], the quality of the estimate depends on whether or not this density is continuous. In the case of a density with discontinuities, the phenomenon of "hyper-efficiency" occurs: one gets different results, for example, for the normal distribution and for the uniform distribution.

1.1 Review of past results

Filtering is a major area of stochastic process theory. It has been progressing rapidly over the last 40+ years, starting with Kolmogorov and Wiener. A great deal of attention has been paid to filtering with continuous-time observations, which typically involves stochastic differential equations. Among the major contributions here are R. Kalman and R. Bucy (1961) [18], A. Shiryaev (1966) [25], T. Kailath (1968) [14], M. Zakai (1969) [27], G. Kallianpur and C. Striebel (1969) [16], G. Kallianpur (1980) [15], and B. Rozovskii (1990) [23].
In most of these works, the observation noise is a Wiener process or, more generally, the observation process satisfies a stochastic differential equation driven by a Wiener process.

Filtering with discrete-time observations was considered by Kalman (1960) [17] and continued in multiple works, including Brémaud (1981) [3]. After the pioneering work by Kalman, a lot of attention was paid to linear filters, which are linear combinations of observed values. Many works have been devoted to the theory of the Kalman filter, for example, Anderson and Moore (1979) [1]. Lately, as computing facilities have improved greatly, the focus has shifted to non-linear filters, which typically perform much better. Comparison with linear filters is one of the topics of this work.

Yashin (1970) [26] derived the optimal non-linear filter for the situation where the process $X(t)$ is Markov taking values 0 and 1, and the observations are also 0 or 1. This setting is expanded in the book by Elliott, Aggoun and Moore (1995) [10] in the context of Hidden Markov Models. Their approach has become popular recently; it involves a change of measure, rendering the observations independent of the process in question. One then arrives at a discrete-time version of the Zakai equation, which provides a recursive way to compute the optimal filter. This was used, for example, in Dufour and Kannan (1999) [9] and Kannan (2000) [19]. Recent papers, to mention a few, are Portenko, Salehi and Skorokhod (1997) [21], Ceci, Gerardi and Tardelli (2001) [5], and Del Moral, Jacod and Protter (2001) [7]. The latter deals with Monte-Carlo methods for estimating the optimal filter, even in the case when no explicit expressions for the filter are available. Monte-Carlo methods are also discussed by Gordon et al. (1993) [12] and Doucet, de Freitas and Gordon (2001) [8].

Relatively little is known about the asymptotic behavior of filtered estimates as observations become frequent. Some results on this are given in [22].
This work considers asymptotics for certain classes of stochastic processes, including compound Poisson processes and piecewise-linear processes.

Discrete observations are natural in target-tracking, where the process in question is the position of a target and our observations come from a radar. A special case (with uniform errors) was considered by Portenko, Salehi and Skorokhod (1998) [22], although they introduce many extra features useful for target-tracking, such as multiple targets and false targets. A general exposition of the filtering techniques employed can be found in [21]. A detailed overview of target-tracking from an engineer's perspective is given by Bar-Shalom et al. [2].

1.2 Hyper-efficiency

The results for parameter estimation in the i.i.d. case are well described in [13]. They can be summarized as follows. Suppose that $\{Y_k\}_{k=1}^n$ is a sequence of i.i.d. random variables with density $f_\theta$, depending on a parameter $\theta$.

a) Suppose that $f_\theta$ is continuously differentiable, with several additional regularity conditions, including local asymptotic normality for the family $\{f_\theta\}_{\theta \in \Theta}$. Then both Bayesian and maximum likelihood estimates are asymptotically normal with the rate $E(\hat\theta - \theta)^2 = C/n + o(1/n)$.

b) Suppose that the densities $f_\theta$ possess jumps at a finite number of points $x_1(\theta), \dots, x_k(\theta)$ and are continuously differentiable elsewhere, plus some identifiability and regularity conditions. The earliest treatment of such a problem known to the author is Chernoff and Rubin [6]. In this case both Bayesian and maximum likelihood estimates have the rate $E(\hat\theta - \theta)^2 = C/n^2 + o(1/n^2)$; that is, the estimates are "hyper-efficient". An important special case is estimation of a location parameter, that is, when $f_\theta(x) \equiv f(x - \theta)$.

The difference between the above two cases can be illustrated using the normal distribution in case (a) and the uniform distribution in case (b).
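As an illustrative numerical aside (not part of the original development), the two rates can be contrasted by simulation: for Uniform$(\theta - a, \theta + a)$ data, the midrange estimator's squared error decays like $1/n^2$, while the sample mean's decays like $1/n$.

```python
import numpy as np

# Illustrative sketch: rate contrast between the sample mean (rate 1/n)
# and the midrange estimator (rate 1/n^2) for Uniform(theta-a, theta+a) data.
rng = np.random.default_rng(0)
theta, a, n, reps = 5.0, 1.0, 1000, 2000

Y = rng.uniform(theta - a, theta + a, size=(reps, n))
mse_mean = np.mean((Y.mean(axis=1) - theta) ** 2)
mse_midrange = np.mean(((Y.max(axis=1) + Y.min(axis=1)) / 2 - theta) ** 2)

print(mse_mean, mse_midrange)  # the midrange error is orders of magnitude smaller
```

At $n = 1000$ the theoretical values are roughly $a^2/(3n) \approx 3.3\cdot 10^{-4}$ for the mean versus $2a^2/((n+1)(n+2)) \approx 2\cdot 10^{-6}$ for the midrange.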
For the normal distribution, the mean of the observed values is a natural estimator, with expected square error $O(1/n)$. For the uniform distribution on the interval $[\theta - a, \theta + a]$, the estimator $[\max_k(Y_k) + \min_k(Y_k)]/2$, with expected square error $O(1/n^2)$, is a better estimator of $\theta$ than the mean. Thus, for a density with jumps, the best location estimator is a function of the observations near a discontinuity point.

A generalization of these results to multi-dimensional variables and a vector parameter $\theta$ is published by Ermakov [11]. (This problem also traces back to Rubin [24].) When the error density $f_\theta$ has discontinuities along a smooth manifold, both Bayesian and maximum likelihood estimates of $\theta$ have asymptotic square error of order $1/n^2$. He also has some results on the sequential estimation of such parameters.

The way our problem differs lies in the stochastic-process perspective: we are estimating not a stable, unchanging parameter, but the value of some stochastic process in time.

1.3 Formulation of the problem

The process $X(t)$ is a real-valued stochastic process on the interval $[0, T]$ or $[0, \infty)$, depending on the context. In general, suppose that the a priori distribution of the entire process $X(t, \omega)$ has a density $\pi(X(\cdot))$ with respect to a measure $\nu$ in a suitable function space $\mathcal{F}[0, T]$. Also let the distribution of observations $\mathcal{L}(\text{observations} \mid X(\cdot))$ have a density $f$ in some space of observations. Keep the notation $X(\cdot)$ for the entire trajectory of the process $X$.
In this work, we use two approaches:

a) when estimating the current position of the process $X(t)$ at a given time $t$, use the information obtained up to this instant ("filtering"), regardless of whether or not the trajectory comprised of the resulting estimates $\hat{X}(t)$ belongs to the specific family $\mathcal{F}$, and

b) when estimating the parameters of the process itself, based on observations in the interval $[0, T]$, try to produce a process that belongs to the family $\mathcal{F}$ ("jump process" below) and is reasonably "close" to $X(\cdot)$.

Under certain conditions, the prior distribution $\pi_t$ of $X(t)$, consistent with $\pi(X(\cdot))$, will have a density with respect to Lebesgue measure. As an estimate of the process' position, we use the posterior (with respect to $\pi_t$) conditional expectation of $X(t)$ given all the observations up to time $t$. It is well known that such an estimator minimizes the squared error of estimation.

Consider two varieties of the process $X$:

- "Jump process". Consider a compound Poisson process
$$X(t) = X_0 + \sum_{i:\ s_i \le t} \xi_i$$
where $(s_i, \xi_i)$ are the events of a 2-dimensional Poisson process on $[0, T] \times \mathbb{R}$. The intensity of this process is given by $\lambda(s, y) = \lambda h_\theta(y)$, where $\lambda > 0$ is a constant "time intensity" describing how frequently the jumps of $X$ occur, and $h_\theta(y)$ is the "jump density" describing the magnitudes of the jumps $\xi_i$ of the process $X$. Here $\theta \in \Theta$ is a parameter (possibly unknown) defining the distribution of jumps. In the Bayesian formulation, the parameters $\theta$ and $\lambda$ will have a prior density $\pi(\theta, \lambda)$ with respect to Lebesgue measure. Assume that for each $\theta \in \Theta$, $E_\theta \xi_i^2 < \infty$. Also assume that the starting value $X_0$ has some prior density $\pi_{X_0}(\cdot)$.

- "Piecewise Linear Process". This is a process with jumping velocity $V$:
$$X(t) = X_0 + \int_0^t V(s)\, ds$$
with $V$ being a compound Poisson process as described above. Assume that the prior distribution of $X_0, V_0$ is known.

Observations. Our observations $\{Y_j\}$ are always going to be "process + noise" over a finite grid of values:
$$Y_j = X(j/n) + \epsilon_j$$
where the noise variables $\{\epsilon_j\}$ are i.i.d.
with some density $\phi_\theta$, possibly depending on a parameter $\theta$, and independent of the process $X$. Assume that for each $\theta \in \Theta$, $E\epsilon_j = 0$ and $E\epsilon_j^2 < \infty$.

A sample path of a jump process and its observations are shown in Figure 1 (a plot of $X(t)$ and the observations $Y_j$).

1.4 Main results

We will consider asymptotics when $n \to \infty$: the observations become frequent, but the process changes slowly (the rate of change $\lambda$ is bounded from above). When the process $X(\cdot)$ is changing fast, its diffusion approximations become appropriate, but these are not discussed here. We establish the following results:

- Recursive formulas for the conditional density of the process position given the observations.
- Asymptotic rates for the estimate of the position of a jump (compound Poisson) process.
- Asymptotic behavior of the parameter estimates of a jump process, as both the observation frequency and the total time spent observing become large.
- Asymptotic rates for a piecewise-linear process.
- Comparison with linear filters.
- Simulation results in a small-sample setting.

2 Filtering of a jump process

The filtering and Bayesian estimation problem can be formulated as follows: find the conditional distribution of the states of the process $X$ and the unknown parameters $\lambda, \theta$ given the observations $(Y_k)$, the initial distribution of $X(0) = X_0$ and some prior distribution on the unknown parameters.

2.1 Recursive formulation

The results in this section are in the spirit of Elliott et al. [10]. In what follows, let $\tau = 1/n$ be the time between observations. Denoting $X_k := X(\tau k)$, we have
$$X_{k+1} = X_k + \zeta_{k+1}, \qquad Y_k = X_k + \epsilon_k, \qquad (1)$$
where $\zeta_k$ is the sum of the jumps of $X$ on the interval $(\tau(k-1), \tau k]$:
$$\zeta_k = \sum_{i:\ \tau(k-1) < s_i \le \tau k} \xi_i.$$
The distribution of $\zeta_k$ is a mixture of a point mass $e^{-\lambda\tau}$ at $0$ (no jumps on the interval) and a continuous part. Denote by $\psi_{\theta,\lambda}$ the resulting density of $\zeta_k$ and by $\tilde\psi_{\theta,\lambda}$ its continuous part, so that for a test function $g$,
$$\int g\, d\mathcal{L}(\zeta_k) = e^{-\lambda\tau} g(0) + \int g(x)\, \tilde\psi_{\theta,\lambda}(x)\, dx.$$
The subscripts $\theta, \lambda$ in $\phi_\theta, \psi_{\theta,\lambda}$ will often be omitted.

Suppose the priors are given: $\theta, \lambda$ have density $\pi(\cdot, \cdot)$, and $X_0$ has density $\pi_{X_0}(\cdot)$. Our goal is to find the posterior conditional distribution
$$\mathcal{L}_\pi(X_k \mid Y_1, \dots, Y_k).$$
From (1), we obtain the densities
$$p_{Y_1, \dots, Y_k}(y_1, \dots, y_k \mid X_0, \dots, X_k, \theta) = \prod_{j=1}^k \phi(y_j - x_j),$$
$$p_{X_0, \dots, X_k}(x_0, \dots, x_k \mid \theta, \lambda) = \pi_{X_0}(x_0) \prod_{j=1}^k \psi_{\theta,\lambda}(x_j - x_{j-1}).$$
For brevity, denote $\mathbf{x}_k := (x_0, \dots, x_k)$, $\mathbf{X}_k := (X_0, \dots, X_k)$, $\mathbf{Y}_k := (Y_1, \dots, Y_k)$. Now, applying Bayes' theorem, the joint density of $\mathbf{X}_k, \mathbf{Y}_k, \theta$ and $\lambda$ is
$$p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda) = p(\mathbf{y}_k \mid \mathbf{x}_k, \theta, \lambda) \cdot p(\mathbf{x}_k \mid \theta, \lambda) \cdot \pi(\theta, \lambda) = \pi(\theta, \lambda)\, \pi_{X_0}(x_0) \prod_{j=1}^k \phi(Y_j - x_j) \prod_{j=1}^k \psi_{\theta,\lambda}(x_j - x_{j-1}) \qquad (2)$$
and the conditional density given the observations is
$$p(\mathbf{x}_k, \theta, \lambda \mid \mathbf{Y}_k) = \frac{p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)}{\int_\Lambda \int_\Theta \int_{\mathbb{R}^{k+1}} p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)\, d\mathbf{x}_k\, d\theta\, d\lambda}.$$
Introduce
$$q_k(x, \theta, \lambda) := \int_{\mathbb{R}^k} p(\mathbf{x}_k, \mathbf{Y}_k, \theta, \lambda)\Big|_{x_k = x}\, dx_0 \cdots dx_{k-1}.$$
It is an unnormalized density of the latest state $X_k$ and the parameters $\theta, \lambda$ given the observations $\mathbf{Y}_k$. The normalized density $p_k(x, \theta, \lambda)$ is then given by
$$p_k(x, \theta, \lambda) := \int p(\mathbf{x}_k, \theta, \lambda \mid \mathbf{Y}_k)\, dx_0 \cdots dx_{k-1} = \frac{q_k(x, \theta, \lambda)}{\int_\Lambda \int_\Theta \int_{\mathbb{R}} q_k(x, \theta, \lambda)\, dx\, d\theta\, d\lambda}.$$
The reason we use this density in an unnormalized form is the recursive relation:

Theorem 1.
$$q_0(x, \theta, \lambda) = \pi_{X_0}(x) \cdot \pi(\theta, \lambda),$$
$$q_k(x, \theta, \lambda) = \phi_\theta(Y_k - x) \cdot \int_{\mathbb{R}} \psi_{\theta,\lambda}(x - z)\, q_{k-1}(z, \theta, \lambda)\, dz = \phi_\theta(Y_k - x) \cdot \Big[ e^{-\lambda\tau} q_{k-1}(x, \theta, \lambda) + \int_{\mathbb{R}} \tilde\psi_{\theta,\lambda}(x - z)\, q_{k-1}(z, \theta, \lambda)\, dz \Big]. \qquad (3)$$

Proof: Straightforward; follows from integrating (2). $\square$

Remarks. 1. In order to use Theorem 1 for the estimation of the state $X_k$, we compute $q_j(x, \theta, \lambda)$, $j \le k$, consecutively, then compute the marginal unnormalized density $q_k(x) := \int\int q_k(x, \theta, \lambda)\, d\theta\, d\lambda$, and then find
$$\hat{X}_k := E(X_k \mid \mathbf{Y}_k) = \frac{\int_{\mathbb{R}} x\, q_k(x)\, dx}{\int_{\mathbb{R}} q_k(x)\, dx}. \qquad (4)$$
2. Although not derived explicitly here, the unnormalized density $q$ corresponds to a change of the original probability measure to, say, $Q$, which makes the observations $Y_1, \dots, Y_k$ independent of the process $X(t)$. The prior distributions on $(\theta, \lambda)$ and $X(0)$ ensure that the two measures are absolutely continuous with respect to each other. The change-of-measure approach is used extensively in non-linear filtering. The recursive formulas for the densities can be used to compute "on-line" updates as new observations come in.

2.2 Single-block upper bound for expected square error

Next, we investigate the asymptotic properties of the above filtering estimator $\hat{X}(T) := \hat{X}_{nT}$ as the observations become frequent ($n \to \infty$).
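Before turning to asymptotics, note that the recursion (3) is directly implementable on a grid. The following is a minimal numerical sketch (not from the dissertation), with $\theta$ and $\lambda$ fixed and known, Gaussian noise and Gaussian jump density as illustrative assumptions, and the continuous part of $\psi$ approximated by its one-jump term $\lambda\tau e^{-\lambda\tau} h(x)$.

```python
import numpy as np

def normal_pdf(x, sd):
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def filter_estimates(Y, lam, tau, sigma_e, sigma_j, x_grid):
    """Grid sketch of recursion (3), theta and lambda known.
    Assumptions (illustrative): Gaussian phi and h; the continuous part of
    psi is approximated by its one-jump term lam*tau*exp(-lam*tau)*h(x)."""
    dx = x_grid[1] - x_grid[0]
    q = normal_pdf(x_grid, 10.0)                  # wide prior pi_{X_0} on the grid
    h = normal_pdf(x_grid[:, None] - x_grid[None, :], sigma_j)  # h(x - z)
    estimates = []
    for y in Y:
        conv = lam * tau * np.exp(-lam * tau) * (h @ q) * dx    # integral term of (3)
        q = normal_pdf(y - x_grid, sigma_e) * (np.exp(-lam * tau) * q + conv)
        q /= q.sum() * dx                         # renormalize (only ratios matter)
        estimates.append(float((x_grid * q).sum() * dx))        # posterior mean, cf. (4)
    return estimates

x_grid = np.linspace(-10.0, 10.0, 401)
rng = np.random.default_rng(1)
Y = 2.0 + rng.normal(0.0, 0.3, size=50)           # constant process at 2, noisy obs
est = filter_estimates(Y, lam=1.0, tau=0.02, sigma_e=0.3, sigma_j=1.0, x_grid=x_grid)
```

The renormalization at each step does not affect the estimate (4), since only ratios of $q_k$ enter it, and it keeps the recursion numerically stable.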
First, we will produce a sub-optimal estimator of $X(T)$ based on a single "block" of observations at time points immediately preceding $T$. Assume that the last observation is obtained exactly at the moment $T$, and denote
$$e_T(\tau) := E(\hat{X}(T) - X(T))^2.$$
The following discussion is based on the well-known fact (e.g. see [3, p. 84]):

Lemma 1. For a square-integrable random variable $X$, a sigma-algebra $\mathcal{F}$ and an $\mathcal{F}$-measurable random variable $U$,
$$E[X - E(X \mid \mathcal{F})]^2 \le E(X - U)^2. \qquad \square$$

Setting $\mathcal{F} := \sigma\{Y_1, \dots, Y_k\}$, we see that the filtered estimator $\hat{X}_k$ introduced in (4) has the smallest expected square loss among all possible estimators of $X_k$ based on the observations $\mathbf{Y}_k$. To produce an upper bound on $e_T(\tau)$, consider the following sub-optimal estimator of $X(T)$:
$$\gamma_k(\Delta) := \sum_{k - n\Delta < j \le k} Y_j / (n\Delta),$$
the average of the observations over the last $n\Delta$ time points.

Theorem 2. For $\tau \to 0$,
$$e_T(\tau) \le (\lambda E\xi_1^2 + E\epsilon_1^2)\sqrt{\tau} + o(\sqrt{\tau}). \qquad (5)$$

Proof: Consider the estimate $\gamma_k(\Delta)$ introduced above. By Lemma 1, it is no better than $\hat{X}(T)$, that is, $e_T(\tau) \le E|X(T) - \gamma_k(\Delta)|^2$. Suppose that the process $X$ has $m$ jumps on the interval $(T - \Delta, T]$, with jump locations $s_1, \dots, s_m$ and jump heights $\xi_1, \dots, \xi_m$. Denote by $S_0, S_1, \dots, S_m$ the consecutive values taken by $X(t)$ for $t \in (T - \Delta, T]$, so that $S_m = X(T)$. Note that, given $m$,
$$E \max_j (S_m - S_j)^2 \le E \sum_{j=0}^{m-1} (S_m - S_j)^2 = E\xi_1^2\, \frac{m(m+1)}{2}.$$
Therefore,
$$E|X(T) - \gamma_k(\Delta)|^2 \le \frac{E\epsilon_1^2}{n\Delta} + \sum_{m \ge 1} e^{-\lambda\Delta}\frac{(\lambda\Delta)^m}{m!}\; E\xi_1^2\, \frac{m(m+1)}{2} = \frac{E\epsilon_1^2}{n\Delta} + E\xi_1^2\, [\lambda\Delta + o(\Delta)].$$
Setting $\Delta = \tau^b$ for some $0 < b < 1$, and using $n = 1/\tau$, the above becomes
$$E\epsilon_1^2\, \tau^{1-b} + E\xi_1^2\, [\lambda\tau^b + o(\tau^b)].$$
Setting $b = 1/2$, we obtain the statement of the Theorem. $\square$

Remark. Since the estimating procedure we used does not depend on $\theta$, the above Theorem remains true when the parameter $\theta$ is unknown. In that case, one needs to consider the Bayesian loss
$$e_T^\pi(\tau) = \int_\Theta E_\theta(\hat{X}(T) - X(T))^2\, \pi(\theta)\, d\theta$$
and, integrating (5), obtain the bound
$$e_T^\pi(\tau) \le \sqrt{\tau} \int_\Theta (\lambda E_\theta\xi_1^2 + E_\theta\epsilon_1^2)\, \pi(\theta)\, d\theta + o(\sqrt{\tau}).$$
To produce finer approximations, we have to assume knowledge of the error distribution.

2.3 Multiple-block upper bound

Next, we modify our estimating procedure.
Starting at time $T$, we will probe one block of observations after another, stopping whenever we believe that a jump has occurred. The following results are obtained when the error distribution is considered known. Denote $\sigma_\epsilon^2 := E\epsilon_1^2$.

We use the same idea as before: produce a sub-optimal estimate of $X(T)$ based on an average $\gamma$ over a suitable interval. The difficulty lies in not knowing where exactly the last jump of the process $X$ occurred. Consider the intervals (blocks) $(T_1, T_0], (T_2, T_1], \dots, (T_N, T_{N-1}]$, where
$$T_0 := T, \qquad T_j := T_{j-1} - (\ln n)^j / n, \quad j = 1, \dots, N, \qquad T_{N+1} := 0.$$
There is a total of
$$N \sim \frac{\ln n}{\ln \ln n}$$
blocks; the $j$-th block has length $(\ln n)^j / n$ and contains $n_j := (\ln n)^j$ observations. The last block has length $1/\ln n$.

Let $X_j$ be the value of the process at the end of the $j$-th block, that is, $X_j := X(T_{j-1})$. Let $\gamma_j$ be the average of the observations on block $j$, that is,
$$\gamma_j := n_j^{-1} \sum_k Y_k\, I(T_j < k\tau \le T_{j-1}).$$

Assumption 1. Let $\chi_m := \sum_{i=1}^m \epsilon_i / (\sigma_\epsilon \sqrt{m})$ be the normalized sum of $m$ errors. Assume that for the distribution of the errors $\epsilon_i$ the following is true: there exist constants $C_1, C_2, C_3$ and $K > 0$ such that for all sufficiently large $m$ and all integers $j$,
$$E\big[\chi_m^2\, I(|\chi_m| > C_1 m^{1/(Kj)})\big] < C_2 \exp(-C_3\, m^{1/j}).$$
This assumption is satisfied for Normal errors with $K = 2$; in general, it requires $\epsilon_k$ to have small tails. The following is simpler-looking but more restrictive than Assumption 1:

Assumption 1'. For $\chi_m$ given above, there exist constants $G, \gamma > 0$ such that for all sufficiently large $m$,
$$E \exp(\gamma |\chi_m|) \le G.$$

Proposition 1. Assumption 1' implies Assumption 1 with $K = 1$.

Proof: Suppose that Assumption 1' is satisfied. Let $F_m(\cdot)$ be the distribution function of $\chi_m$. Pick $C_1$ such that $x^2 < \exp(\gamma|x|/2)$ for $|x| > C_1$. Then for any $j$,
$$\int_{\mathbb{R}} I\{|x| > C_1 m^{1/j}\}\, x^2\, dF_m(x) \le \int_{\mathbb{R}} I\{|x| > C_1 m^{1/j}\}\, e^{\gamma|x|/2}\, dF_m(x) \le \exp(-\gamma C_1 m^{1/j}/2) \int_{\mathbb{R}} e^{\gamma|x|}\, dF_m(x) \le \exp(-\gamma C_1 m^{1/j}/2) \cdot G. \qquad \square$$

Theorem 3.
(Tighter upper bound for $E(\hat{X}_n(T) - X(T))^2$.) Suppose that the error density $\phi$ is known and does not depend on the parameter $\theta$, and there exists a constant $\Lambda_0$ such that $\lambda \le \Lambda_0$. Then, under Assumption 1, there exists a constant $C$ such that for $n \to \infty$,
$$E(\hat{X}_n(T) - X(T))^2 \le C\, \frac{\ln^M n}{n}, \qquad M = (1 + 2/K) \vee (3 - 2/K).$$

Proof: Consider the $N - 1$ blocks described above. Denote by $T^*$ the point of the last jump of $X$:
$$T^* = \sup\{0 \le t \le T : X(t) - X(t-) \ne 0\}.$$
The idea is to approximate $T^*$, then take the average of all observations from that moment up to $T$. Construct an estimate of $X(T)$ as follows. Define $j_0$ as
$$j_0 := \inf\big\{ j > 0 : \sqrt{n_j}\, |\gamma_{j+1} - \gamma_j| > 2\sigma_\epsilon C_1\, n_j^{1/(jK)} \big\} \wedge N. \qquad (6)$$
Then, as our estimate of $X(T)$, take
$$\hat{X}(T) := \gamma_{j_0}.$$
We will find an upper bound for the average risk of this estimate, $e := E(\hat{X}(T) - X(T))^2$. For this we will need several inequalities, with proofs to follow in the next section.

Case 1 (Block 1 jump). On the event $F$ that the last jump of $X$ occurred on Block 1, $F = \{T_1 < T^*\}$,
$$e_F := E[(\hat{X}(T) - X(T))^2 I_F] \le C_3\, \frac{\ln n}{n}. \qquad (7.1)$$

Case 2 (Correct stopping). On the event $S$ that the last jump of $X$ occurred just before Block $j_0$, $S = \{T_{j_0+1} < T^* \le T_{j_0}\}$,
$$e_S := E[(\hat{X}(T) - X(T))^2 I_S] \le C_4\, \frac{\ln^2 n}{n}. \qquad (7.2)$$

Case 3 (Late stopping). On the event $L$ that the last jump of $X$ occurred in Block $j$, $1 < j \le j_0$, $L = \{T_{j_0} < T^*\}$,
$$e_L := E[(\hat{X}(T) - X(T))^2 I_L] \le C_5\, \frac{\ln^{1+2/K} n}{n}. \qquad (7.3)$$

Case 4 (Early stopping). On the event $\mathcal{E}$ that we stopped on Block $j_0$ although there was no jump of $X$ after $T_{j_0+1}$, $\mathcal{E} = \{T^* \le T_{j_0+1}\}$,
$$e_{\mathcal{E}} := E[(\hat{X}(T) - X(T))^2 I_{\mathcal{E}}] \le C_6\, \frac{(\ln n)^{3 - 2/K}}{n}. \qquad (7.4)$$

Now note that $P(F \cup S \cup L \cup \mathcal{E}) = 1$. Thus, $e = e_F + e_S + e_L + e_{\mathcal{E}}$. Also, the estimator $\hat{X}$ does not depend on $\lambda$ or the particular form of the jump density $h_\theta$, as long as the frequency of jumps $\lambda$ is bounded. By Lemma 1, the risk of the optimal filtered estimate does not exceed the risk of $\hat{X}$. Combining (7.1) through (7.4), we obtain the proof of the Theorem.
$\square$

2.4 Proofs of inequalities used in Theorem 3

Proof of (7.1): The probability of a jump on the first block, which has length $\ln n / n$, is $P(F) = \lambda \ln n / n + o(\ln n / n)$, and the probability of more than one jump on the first block is $o(\ln n / n)$. Therefore,
$$E[(\hat{X}(T) - X(T))^2 I_F] \le (\sigma_\epsilon^2 / \ln n + E\xi_1^2)\, \lambda \ln n / n \le C_3\, \frac{\ln n}{n}.$$

Proof of (7.2): Let $j_0(\omega)$ be, as before, the last block included in the computation of $\hat{X}(T)$. First, consider the special case $j_0 = N$. Then
$$E[(\hat{X}(T) - X(T))^2 I_S I(j_0 = N)] \le \frac{\sigma_\epsilon^2}{n / \ln n} \cdot \Big(\frac{1}{\ln n} + o\Big(\frac{1}{\ln n}\Big)\Big) \le \mathrm{const}/n.$$
Now let $j_0 < N$. Suppose that the last jump $T^*(\omega)$ occurred on Block $j_0 + 1$, that is, $\omega \in S$. Then $X(t) = X(T)$ for $T_{j_0} < t \le T$, and the squared loss from estimating $X(T)$ equals the variance of $\gamma_{j_0}$, so that
$$E[(\hat{X}(T) - X(T))^2 I_S I(j_0 < N)] \le \sum_j E[(\gamma_j - X(T))^2 I_S] \le \sum_j \sigma_\epsilon^2\, n_j^{-1}\, P(\text{jump on Block } j+1)$$
by the independence of $\{\epsilon_k\}$ and the process $X$. Thus,
$$e_S \le \mathrm{const}/n + \sum_{j=1}^N \Big(\frac{\ln^{j+1} n}{n} + o\Big(\frac{\ln^{j+1} n}{n}\Big)\Big) \cdot \sigma_\epsilon^2 \ln^{-j} n \le C_4\, \frac{\ln^2 n}{n}.$$

Proof of (7.3): Thanks to (7.1), we can exclude the case when the last jump $T^*$ happens on Block 1. Therefore, suppose that the last jump happens on Block $J$, $J > 1$, but we stop the summation only at Block $j_0$, $j_0 \ge J$. Denote $N_J := \{\text{last jump happens on Block } J\}$. Our stopping rule (6) implies that for $J \le j \le j_0$,
$$|\gamma_j - \gamma_{j-1}| \le 2\sigma_\epsilon C_1\, n_j^{-1/2 + 1/(jK)}.$$
Thus,
$$E[(\hat{X}(T) - X(T))^2 I_{N_J}] \le E\Big(|X(T) - \gamma_{J-1}|^2 + \sum_{j=J}^{j_0} |\gamma_j - \gamma_{j-1}|^2\Big) \cdot P(\text{jump on Block } J) \le$$
$$\le \Big[\frac{\lambda \ln^J n}{n} + o\Big(\frac{\ln^J n}{n}\Big)\Big] \times \Big[\sigma_\epsilon^2\, n_{J-1}^{-1} + C_7 \sum_{j \ge J} n_j^{-1 + 2/(jK)}\Big],$$
and, summing over $J$, $e_L \le C_5\, \ln^{1+2/K} n / n$.

Proof of (7.4): If the stopping occurred too early, then $X(t) = X(T)$ for $T_{j_0+1} < t \le T$. Also, the stopping rule (6) implies that at least one of
$$|\gamma_{j_0+1} - X(T)| > C_1 \sigma_\epsilon\, n_{j_0}^{-1/2 + 1/(j_0 K)}, \qquad |\gamma_{j_0} - X(T)| > C_1 \sigma_\epsilon\, n_{j_0}^{-1/2 + 1/(j_0 K)}$$
holds. Thus,
$$E[(\hat{X}(T) - X(T))^2 I_{\mathcal{E}}] \le E|\gamma_{j_0} - X(T)|^2 \cdot P\big(|\gamma_{j_0+1} - X(T)| > C_1 \sigma_\epsilon n_{j_0}^{-1/2 + 1/(j_0 K)}\big) + E\big[|\gamma_{j_0} - X(T)|^2\, I\big(|\gamma_{j_0} - X(T)| > C_1 \sigma_\epsilon n_{j_0}^{-1/2 + 1/(j_0 K)}\big)\big] =: E_1 + E_2.$$
By Assumption 1,
$$E_2 \le \sum_j \frac{\sigma_\epsilon^2}{n_j}\, C_2 \exp(-C_3\, n_j^{1/j}) \le C_2\, \frac{\ln n}{n}.$$
To estimate $E_1$, consider the Chebyshev-type inequality
$$P\big(|\gamma_j - X(T)| > C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big) \le \frac{E\big[|\gamma_j - X(T)|^2\, I\big(|\gamma_j - X(T)| > C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big)\big]}{\big(C_1 \sigma_\epsilon n_j^{-1/2 + 1/(jK)}\big)^2} \le C_2\, \frac{\ln n}{n}$$
by Assumption 1. Therefore,
$$E_1 \le \sum_j \frac{\sigma_\epsilon^2}{n_j} \cdot C_2\, \frac{\ln n}{n} \cdot \big(C_1 \sigma_\epsilon\, n_j^{-1/2 + 1/(jK)}\big)^{-2},$$
and, summing over $j \le N \le \ln n$, $E_1 \le C_6\, (\ln n)^{3 - 2/K}/n$. $\square$
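The multiple-block scheme and stopping rule (6) can be sketched numerically. The following is an illustrative implementation (not from the dissertation): the threshold uses a plain cutoff $c\,\sigma_\epsilon$ in place of the constants $C_1, K$ of Assumption 1, and block sizes grow geometrically backwards from the last observation.

```python
import numpy as np

def block_estimate(Y, sigma_e, c=3.0):
    """Sketch of the estimator of Section 2.3: average over blocks of
    geometrically growing size, counted backwards from the last observation,
    stopping once two consecutive block averages disagree (rule (6) with a
    placeholder threshold c*sigma_e)."""
    n = len(Y)
    L = max(2, int(round(np.log(n))))
    bounds, right, j = [], n, 1
    while right > 0:
        n_j = min(right, L ** j)             # block j holds about (ln n)^j obs
        bounds.append((right - n_j, right))
        right -= n_j
        j += 1
    gammas = [Y[a:b].mean() for a, b in bounds]
    for j in range(len(gammas) - 1):
        n_j = bounds[j][1] - bounds[j][0]
        if np.sqrt(n_j) * abs(gammas[j + 1] - gammas[j]) > c * sigma_e:
            return gammas[j]                 # jump suspected beyond block j: stop
    return gammas[-1]                        # no jump detected: use all blocks

rng = np.random.default_rng(2)
n = 2000
X = np.where(np.arange(n) < n - 300, 1.0, 4.0)   # one jump, 300 obs before T
Y = X + rng.normal(0.0, 0.5, size=n)
est = block_estimate(Y, sigma_e=0.5)
```

With the jump sitting 300 observations before $T$, the first two blocks lie entirely after the jump, the third straddles it, and the rule stops before averaging over the pre-jump segment.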
To estimate E1, consider the Chebyshov-type inequality -— '2 1 'K 2 —, 2 K (cloaks/2° ) P(Ih..l— XT< >I>Ca.n,, ”+100 )< < E 07.0.1 — X(T>I21 Claens¥f+‘/j°")) 3 C20" n>/n by Assumption 1. Therefore N— _ - —2 <2: o2/nj C2(lnn)/n- (CloeanQH/JK) _<_ 02 112.0,) n)(1-2/jK)(j+1)< |/\ 062 (ln n)2"n 32/“ (ln n)3‘2/K 062—— < c —— j ann _ 6 n 23 2.5 Lower bound for expected square error. Let us, as before, have exactly n observations on the interval [0,T] and the last observation is made at the moment T. Then X (T) = X". To estimate the expected squared loss of the Bayesian estimate Xn from below, consider the estimator V Xn : E(XnIYla "'a Ym Xn—17 In), where 1,, = I (Xn 7Q Xn_1) is indicator of the event that some jumps occurred on the last observed interval. It’s easy to see that X" = E(XnIYn,X,,_1, In) and that E(X” — X,,)2 < E(X" — Xn)2, since the estimator X" is based on a finer sigma-algebra. Proposition 2 . The expected square error for X", E(X" — X,,)2 = C/n + 0(1/n), where C > 0 is some constant not depending on n. Proof: Consider random variables Zn ~ lib so that Xn = Xn-1+ InZn, and W" 2 Z, + e,,. 24 Joint distribution of Zn, Wn does not depend on n and P(Zn 6 dz, Wu 6 dy) = 6(T)¢(y — T)- Also note that on the event {In 2 0}, Xn = Xn_1 and on the event {In 2 1}, Yn = Xn_1 + W". Therefore, U Xn = E(XnIYnaXn—laln) = Xn—l + InE(Zn I Wn) Let 2,, :2 E(Zn | W"). Then E(X, — X,,)2 = P(I, = 1)E(Zn — 2,)? Clearly, E(Zn —- Z")2 > 0 and P(In = 1) = 1 — e‘V" = A/n + 0(n‘1). This gives us the statement of Proposition with C = A - E(Zn — Zn)2. El This proposition shows us that the hyper-efficiency observed in case of estimating a constant mean (different rates for different error distributions) here does not exist, because there’s always a possibility of a last-second change in the process. The following informal argument shows us what one can hope for with different error distributions. Suppose that the number of observations J since the last jump is known. 
Set $\tilde{X}_n = E(X_n \mid Y_1, \dots, Y_n, J)$. Just as before, $E(\tilde{X}_n - X_n)^2 \le E(\hat{X}_n - X_n)^2$. The optimal strategy is to use the latest $J$ observations. If the error density $\phi$ has jumps (e.g. uniform errors), then this strategy yields
$$E(\tilde{X}_n - X_n)^2 \sim n^{-1}\Big(1 + \frac{1}{2^2} + \dots + \frac{1}{J^2}\Big) \sim \frac{C}{n}.$$
On the other hand, for a continuous error density (e.g. normal errors),
$$E(\tilde{X}_n - X_n)^2 \sim n^{-1}\Big(1 + \frac{1}{2} + \dots + \frac{1}{J}\Big) \sim \frac{\ln n}{n}.$$

3 Estimation of parameters of a jump process

Next, our goal is to estimate the parameters of the process itself, that is, the time-intensity $\lambda$ and the parameter $\theta$ describing the jump density $h_\theta$, based on observations $Y(t)$, $0 \le t \le T$. The recursive formula (Theorem 1) would allow us to do it; the question is: how efficient are these estimates?

Assume, as before, that the error density $\phi$ is known. Without loss of generality, let $\sigma_\epsilon = 1$. Also, assume that $\lambda$ is bounded by some constants: $\lambda_1 \le \lambda \le \lambda_2$. When the entire trajectory of the process $X(t, \omega)$ is known, that is, when we know the exact times $t_1, t_2, \dots$ at which jumps happened and the exact magnitudes $\xi_i = X(t_i) - X(t_i-)$, $i \ge 1$, the answer is trivial. For example, to estimate the intensity, we can just take
$$\hat\lambda := \sum_{i \ge 1} I(t_i \le T)/T.$$
Likewise, inference about $h_\theta$ will be based on the jump magnitudes $\xi_i$. It is clear that these estimates will be consistent only when we observe long enough, that is, $T \to \infty$. In fact, we will consider the limiting behavior of the estimates as both $n$ and $T$ become large.

Now, when the only observations we have are the noisy $Y_j$, we can try to estimate the locations and magnitudes of the jumps of the process $X$. Let $n$ be the number of observations on the interval $[0, 1]$. Split the time interval $[0, T]$ into blocks of $m = n^\beta$ observations each. Let $Z_k$ be the average of the observations over Block $k$,
$$Z_k = \frac{1}{m} \sum_{j=1}^m Y_{m(k-1)+j}.$$
Consider several cases (see Figure 1). Let $\alpha > 0$ and $\beta > 0$ be specified later.

Case 1. $\sqrt{m}\, |Z_{k+1} - Z_k| \le m^\alpha$. In this case we conclude that no jump occurred on either Block $k$ or Block $k+1$.

Case 2.
$\sqrt{m}\, |Z_{k+1} - Z_k| > m^\alpha$, $\sqrt{m}\, |Z_k - Z_{k-1}| \le m^\alpha$, $\sqrt{m}\, |Z_{k+2} - Z_{k+1}| \le m^\alpha$. In this case we conclude that a jump occurred exactly between Block $k$ and Block $k+1$, that is, at time $t = mk/n$. Here, we estimate the magnitude of this jump as $\xi^* = Z_{k+1} - Z_k$. Note: accumulation of errors does not occur when estimating $\xi$ because the estimates are based on non-overlapping intervals.

Case 3. $\sqrt{m}(Z_{k+1} - Z_k) > m^\alpha$ and $\sqrt{m}(Z_k - Z_{k-1}) > m^\alpha$, or $\sqrt{m}(Z_{k+1} - Z_k) < -m^\alpha$ and $\sqrt{m}(Z_k - Z_{k-1}) < -m^\alpha$. In this case we conclude that a jump occurred in the middle of Block $k$, that is, at time $t = m(k - 0.5)/n$. We estimate the magnitude of this jump as $\xi^* = Z_{k+1} - Z_{k-1}$.

Case 4. Jumps occur on the same Block, or on two neighboring Blocks. The probability that at least two such jumps occur on an interval of fixed length is asymptotically equivalent to $(m/n)^2 n = m^2/n$. Picking $\beta < 0.5$, we can make this probability small.

Of course, there are errors associated with this kind of detection. We can classify them as follows:

- Type I Error: we determined that a jump occurred when in reality there was none (this involves Cases 2 and 3).
- Type II Error: we determined that no jump occurred when in reality it did (this involves Cases 1 and 4).
- Placement Error: we determined the location of a jump within a Block or two neighboring Blocks incorrectly.
- Magnitude Error: the error made when estimating the value of $\xi_i$ (the jump magnitude).

Note that the placement error is small: it is of order $m/n$. The magnitude error is based on averaging $m$ i.i.d. values, and is therefore of order $m^{-1/2}$.

3.1 Errors in jump detection: Lemmas

Let us estimate the effect of Type I and II errors. Here, as in Section 2.3, we require that Assumption 1 hold.

Type I errors. Assume that there are no jumps over the Blocks $k$ and $k+1$, but we detected one according to Case 2 or 3.
Further, P(IXm,k+l — Xm,kI > m“) S 2 - P(IXchI 2 0.5 m“)- From Assumption 1, for any integer j > 0 Eleni IUXchI > 01 -m’/"")I < C2 (firm—Ca ml”), and the application of Chebyshov’s inequality yields P([mec] > C1 -m’/Kj) < Cexp(—C3 ml/j) -m_2/Kj Picking j such that 1 / K j > a > 1 / (2K j) and for m large enough, summing up over Tn m“1 blocks, we obtain Lemma 2 . As it —> 00, provided that T grows no faster than some power of n, 2K0) P(Type I error) < C - Tn m‘lexp(—C3m ——> 0 30 Type II errors. Suppose that a jump occurred on Block k, but it was not detected (Case 1), that is IZlc—l -‘ ZkIVIZk+1— ZkI S ”la—0'5 Of Blocks k — 1, k + 1, pick the closest to the true moment of jump. Without loss of generality, let it be Block k — 1. Let 5 be the size of the jump. Then averages of X (t) on Blocks k and k — 1 are different by at least f / 2 and P(IZk—l — 2k] S m°_°'5) S 2P(2IXm,kI > lélx/Tfi/2 - m“) < < C - Tn m_1exp(—C3m2K€) (8) as n —> 00, as long as [E] > m‘0'5+°‘+5, for an arbitrary e > 0 (use Assumption 1 in a way similar to Lemma 2). Consider separately P(U{I€.-| > m—0'5+“+E}) g C (AT + o(T))m’°'5+°+5, using the assumption that density of E,- is bounded in a neighborhood of O and the total number of jumps is AT + 0(T). Finally, take into account Case 4 which yields an upper bound CAT m2/n. Summing up, we obtain Lemma 3 P(Type 11 error) < C. ,\T(n(—0.5+a+e);3 V 7123—1) 31 3.2 Asymptotic behavior of parameter estimates. For simplicity, determine the behavior of estimates separately, that is consider first an estimate of A, and then an estimate of 6. Let 6 E O, with 9 being bounded subset of R. Let true values of parameters be A0 and 60. Let t‘; be consecutive jumps of X () determined by Cases 2, 3. Estimate the intensity A by 11(3‘Z:1 0 (b) h9(x) is bounded in a neighborhood of 0, uniformly in 6. 
(c) there is a constant $\gamma$, $0 \le \gamma \le 1/4$, such that for large enough $N$ there exists $A$ such that $P(|\xi_i| > A) = o(N^{-1})$ and

$b_A := \sup_{\theta \in \Theta}\, \sup_{|x| \le A} |\theta B'(x)| = O(N^{\gamma}).$

Define the log-likelihood function based on the estimated jumps:

$L^*(\theta) = \sum_{i=1}^{N^*} \ln h_\theta(\xi_i^*).$

Theorem 4. Let Assumptions 1-3 hold. Then the maximum likelihood estimate $\theta^* = \arg\max_{\theta \in \Theta} L^*(\theta)$ is asymptotically normal, that is,

$\sqrt{\lambda_0 T}\, (\theta^* - \theta_0) \to N[0,\, I(\theta_0)^{-1}]$

in distribution as $T \to \infty$ no faster than $T = n^{k}$, where $k < (1/5) \wedge (1 - 4\gamma)$.

Proof: Pick $\beta$, $k$, $\alpha$ and $\varepsilon$ such that

$k + 2\beta - 1 < 0,$
$k + \beta(-0.5 + \alpha + \varepsilon) < 0,$
$\gamma - \beta/2 < 0.$

With $\alpha$ and $\varepsilon$ arbitrarily small, this is achieved when $2k < \beta < (1-k)/2$, so that $k < (1/5) \wedge (1 - 4\gamma)$.

According to Cases 1-4, the estimated jump magnitudes are

$\xi_i^* = (\xi_i + \delta_i^e)\, I_{E^c} + \tilde\delta_i\, I_E,$

where $E$ is the exceptional set on which Type I and II errors occurred, $\tilde\delta_i$ are the estimates of $\xi$ resulting from these errors, and $\delta_i^e$ are the "magnitude errors" discussed at the beginning of this Section. From Lemmas 2 and 3, $P(E) \to 0$ as $n \to \infty$. Therefore we can disregard this set and consider only

$\xi^* = \xi + \delta^e.$

Let $N$ be the total number of jumps on $[0, T]$, and consider the log-likelihood function

$L(\theta) = \sum_{i=1}^{N} \ln h_\theta(\xi_i).$

Under the conditions of the Theorem, the maximum likelihood estimate $\hat\theta = \arg\max_{\theta \in \Theta} L(\theta)$ is asymptotically normal with the stated variance. Next, we would like to show that the estimate $\theta^*$ based on $\{\xi_i^*\}_{i=1}^{N^*}$ is close to $\hat\theta$, which is based on the true values of $\xi_i$. Note that both maxima exist because $L''(\theta) = -N A''(\theta) < 0$, so that $L(\theta)$ is a concave function; the same is true for $L^*(\theta)$. Furthermore, for any $\theta$ in a neighborhood of $\hat\theta$,

$L(\theta) = L(\hat\theta) - \frac{(\theta - \hat\theta)^2}{2}\, N A''(\hat\theta) + o\big((\theta - \hat\theta)^2\big).$

Thus, if $L(\theta) - L(\hat\theta) = O(1)$, then $(\theta - \hat\theta)^2\, N A''(\hat\theta) = O(1)$ and therefore $\theta - \hat\theta = O(N^{-1/2})$. According to Lemma 4, $|L(\theta) - L^*(\theta)| = o(1)$, and also $|L(\hat\theta) - L^*(\hat\theta)| = o(1)$. Therefore $\theta^* - \hat\theta = O(N^{-1/2})$, and the statement of the Theorem follows from the corresponding statement for $\hat\theta$. □

Lemma 4.
Under the conditions of Theorem 4,

$|L(\theta) - L^*(\theta)| = o(1)$

uniformly in $\theta$, outside some exceptional set $E_1$ with $P(E_1) = o(1)$.

Proof: According to Assumption 3(c), the probability of the event $E_1 := \bigcup_i \{|\xi_i| > A\}$ is $o(1)$. Thus, excluding $E_1$,

$|L(\theta) - L^*(\theta)| \le \sum_{i=1}^{N} \big|\ln h_\theta(\xi_i) - \ln h_\theta(\xi_i + \delta_i^e)\big| \le \sum_{i=1}^{N} |\theta B'(\tilde\xi_i)| \cdot |\delta_i^e| \le C\, b_A\, n^{-\beta/2}.$

The statement of the Lemma follows, as $\gamma < \beta/2$. □

Examples. Assumption 3(c) is the hardest to verify. It holds, e.g., in the following cases:

a) when $B'(x)$ is bounded; for example, the exponential distribution with $h_\theta(x) = \exp(-\theta x + A(\theta))$, $x \ge 0$;

b) the normal distribution with $h_\theta(x) = \exp(-\theta x^2 + A(\theta))$. Here one can pick $A := \ln N$, and then $b_A = 2\theta \ln N = o(N^{\gamma})$ for arbitrarily small $\gamma$.

4 Piecewise-linear process

4.1 Problem formulation

Let us consider the following simplified version of the problem. Suppose that the velocity of the target is piecewise constant, with jumps belonging to the set $\{t_k = k/n;\ k = 1, \dots, n\}$; the probability of a jump $p_n$ and the distribution of the height of a jump are known. Note that this restriction is not important, since for $n \to \infty$ the difference between this process and our original process $X$ (with jumps at arbitrary locations) becomes of order $O(n^{-2})$. We will be interested in the case $p_n = \lambda/n + o(1/n)$. Let $\tau = 1/n$. Denote by $V_k$ the velocity of the target on the interval $[k/n, (k+1)/n]$ and by $X_k$ the position of the target at the point $k/n$. Suppose the observations $Y_k$ are made at the points $1/n, 2/n, \dots, (n-1)/n, 1$. Overall, we have the model

$V_k = V_{k-1} + \zeta_k$
$X_k = X_{k-1} + \tau V_{k-1}$      (9)
$Y_k = X_k + e_k$,

where $e_k$ are i.i.d. observation errors with a known density $\phi_e$ (possibly also depending on $n$),

$\zeta_k = 0$ with probability $1 - p_n$,  $\zeta_k = \xi_k$ with probability $p_n$,

and $\{\xi_k\}$ are i.i.d. with a known density $h_\theta$ depending on some parameter $\theta$. The initial values $X_0$, $V_0$ and the parameter $\theta$ have some prior density $\pi(x_0, v_0, \theta)$. The random variables $X_0$, $V_0$, $\{\zeta_k\}$ and $\{e_k\}$ are jointly independent.
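The dynamics (9) are easy to simulate directly. The following sketch assumes uniform observation errors and a uniform jump-height distribution, which are illustrative stand-ins for the generic densities $\phi_e$ and $h_\theta$:

```python
import random

def simulate_pw_linear(n, lam=1.0, gamma=0.1,
                       jump_height=lambda: random.uniform(-2.5, 2.5),
                       x0=0.0, v0=0.0):
    """Simulate model (9) on [0, 1]: the velocity V is piecewise constant and
    jumps with probability p_n = lam/n per step; the position X integrates V;
    the observations Y are X plus i.i.d. noise.  Uniform errors on
    [-gamma, gamma] and the uniform jump-height sampler are illustrative."""
    tau, p = 1.0 / n, lam / n
    X, V, Y = [x0], [v0], []
    for _ in range(n):
        zeta = jump_height() if random.random() < p else 0.0  # zeta_k
        V.append(V[-1] + zeta)                            # V_k = V_{k-1} + zeta_k
        X.append(X[-1] + tau * V[-2])                     # X_k = X_{k-1} + tau V_{k-1}
        Y.append(X[-1] + random.uniform(-gamma, gamma))   # Y_k = X_k + e_k
    return X, V, Y
```

A run of `simulate_pw_linear(n)` returns the trajectories $X_0, \dots, X_n$ and $V_0, \dots, V_n$ together with the observations $Y_1, \dots, Y_n$.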
The problem of filtering is to find the posterior (conditional) distribution of the position $X_n$ of the process and of the parameter $\theta$ given the observations $Y_1, \dots, Y_n$.

4.2 Recursive filtering equations

We will obtain filtering equations analogous to those of Section 2.1. But in this case we have to consider an unobservable variable (the velocity), and the resulting filtered density will be two-dimensional. To simplify notation, in the sequel we will write that $\zeta_k$ has "density" $\psi \equiv \psi_\theta$. Consider the joint density of $V$, $\theta$ and $Y$:

$p_k^v(x_0, v_0, \dots, v_k;\ y_1, \dots, y_k;\ \theta) = \pi(x_0, v_0, \theta) \times \prod_{j=1}^{k} \phi_e(y_j - x_j) \prod_{j=1}^{k} \psi_\theta(v_j - v_{j-1}),$

where the $x_k$ are uniquely determined from (9) as $x_k = x_0 + \tau \sum_{j=0}^{k-1} v_j$. By changing the variables $\{x_0, v_0, v_1, \dots, v_k\}$ to $\{x_0, x_1, \dots, x_k, v_k\}$, we obtain the joint density of $X$, $\theta$, $v_k$ and $Y$:

$p_k(x_0, x_1, \dots, x_k, v_k;\ y_1, \dots, y_k;\ \theta) = \tilde\pi(x_0, x_1, \theta) \times \prod_{j=1}^{k} \phi_e(y_j - x_j) \times \tau^{1-k} \prod_{j=1}^{k-1} \psi_\theta\Big(\frac{x_{j+1} - 2x_j + x_{j-1}}{\tau}\Big) \times \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big),$   (10)

where the prior density $\tilde\pi(x_0, x_1, \theta)$ can be determined from $\pi(v_0, x_0, \theta)$. The density is "predictive" in its $v_k$ argument: although $V_k$ also affects $Y_{k+1}$, we do not include $Y_{k+1}$ in the equation. Introduce $q_k^\tau(x_k, v_k, \theta)$, the unnormalized density of $X_k$, $V_k$ (the position and velocity of the target at the moment $k/n$) and $\theta$ given the observations $Y_1, \dots, Y_k$:

$q_k^\tau(x_k, v_k, \theta) = \int p_k(x_0, x_1, \dots, x_k, v_k;\ y_1, \dots, y_k;\ \theta)\, dx_0 \dots dx_{k-1}.$   (11)

Then the following recursive relation holds.

Theorem 5.

$q_0^\tau(x, v, \theta) = \pi(x, v, \theta),$

$q_k^\tau(x, v, \theta) = \phi_e(Y_k - x) \cdot \int \psi_\theta(v - v_{k-1})\, q_{k-1}^\tau(x - v_{k-1}\tau,\ v_{k-1},\ \theta)\, dv_{k-1}$
$\qquad = \phi_e(Y_k - x)\Big[e^{-\lambda\tau}\, q_{k-1}^\tau(x - v\tau,\ v,\ \theta) + (1 - e^{-\lambda\tau}) \int h_\theta(v - v_{k-1})\, q_{k-1}^\tau(x - v_{k-1}\tau,\ v_{k-1},\ \theta)\, dv_{k-1}\Big].$   (12)

Proof: Replace $v_{k-1} = (x_k - x_{k-1})/\tau$ and rewrite (11) as

$q_k^\tau(x_k, v_k, \theta) = \int p_{k-1}\Big(x_0, x_1, \dots, x_{k-1}, \frac{x_k - x_{k-1}}{\tau};\ y_1, \dots, y_{k-1};\ \theta\Big) \times \phi_e(Y_k - x_k)\, \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big)\, dx_0 \dots dx_{k-1}$
$= \phi_e(Y_k - x_k) \int q_{k-1}^\tau\Big(x_{k-1}, \frac{x_k - x_{k-1}}{\tau}, \theta\Big)\, \psi_\theta\Big(v_k - \frac{x_k - x_{k-1}}{\tau}\Big)\, dx_{k-1}.$

Changing the variable $x_{k-1}$ into $x_k - v_{k-1}\tau$, the above yields the statement of the Theorem.
□

4.3 Asymptotics: special case

The question of interest is whether the asymptotic rate of the optimal Bayesian estimate depends on the smoothness of the error distribution, that is, whether "hyper-efficiency" takes place. In [22], the asymptotic rate for estimating a linear function $X(t) = X_0 + V_0 t$ was found for uniform errors and was equal to $O(n^{-2})$. Will this rate change when we shift to a piecewise-linear process?

Consider a special case. Let $s$ have the (prior) uniform distribution on $[0, 1]$, and consider the following process instead of $X$:

$W(t) = (t - s)^+ = (t - s) \vee 0, \qquad 0 \le t \le 1,$

with observation errors $e_i$ having the uniform distribution on the interval $[-\gamma, \gamma]$ ($\gamma > 0$ is known). The process $W$ in this case is a piecewise-linear function with all parameters known except the turning point $s$, and we need to estimate $s$ based on the observations. Note that estimating the final position $W(1)$ is equivalent to estimating $s$. It is clear that the asymptotic rate of $E(\hat W(1) - W(1))^2$ is no worse than the asymptotic rate for the process $X$ from the previous section.

Let us write the likelihood function (the density of the distribution $\mathcal{L}(s \mid Y_1, \dots, Y_n)$):

$f(s) = f(s \mid Y_1, \dots, Y_n) = \prod_{0 \le i/n \le 1} \frac{1}{2\gamma}\, I\{e_i \ge (i/n - s)^+ - \gamma\}.$

Let the true value of $s$ be $1$. Then the likelihood equals a constant on the interval $[s^*, 1]$, and $0$ otherwise, where $s^* = \min\{u : (i/n - u)^+ - e_i \le \gamma \text{ for all } i\}$. Then the Bayesian estimate of $s$ (given its uniform prior) is

$\hat s = E(s \mid Y_1, \dots, Y_n) = (s^* + 1)/2.$

Consider the step function (see Figure 2)

$D_u(t) = \frac{1-u}{2}\, I\Big(t \ge \frac{u+1}{2}\Big).$

Figure 2: the process $W(t)$ bounded between the step functions $D_s(t)$ and $D_{2s-1}(t)$.

Note that $D_s(t) \le W(t) \le D_{2s-1}(t)$. Now consider $\tilde s = \min\{u : D_u(i/n) - e_i \le \gamma \text{ for all } i\}$; it is clear that $\tilde s \le s^*$. Find the asymptotic rate of the estimate $\tilde s_1 = (\tilde s + 1)/2$. We have

$P(\tilde s \le t) = \prod_{(t+1)/2 \le i/n \le 1} P\Big(e_i \ge \frac{1-t}{2} - \gamma\Big) = \Big(1 - \frac{1-t}{4\gamma}\Big)^{n(1-t)/2}.$

In the case $1 - t = C n^{-1/2}$, $P(\tilde s \le t) \approx \exp(-C^2/8\gamma) \to 0$ as $C \to \infty$. Thus, for $s$ close to $1$, $\tilde s$ is $1 - O(n^{-1/2})$. The same statement can be obtained for $\bar s = \min\{u : D_{2u-1}(i/n) - e_i \le \gamma \text{ for all } i\}$.
As a result, we would have $\tilde s \le s^* \le \bar s$, and it follows that for $s$ close to $1$, $s^*$ is $1 - O(n^{-1/2})$. The case $s = 1$ is the worst; for $s < 1$ the estimate $\hat s$ is $s - O(n^{-1})$. Finally, this yields the following for the expected square loss (with uniform errors).

Proposition 3. $E(\hat W(1) - W(1))^2 = O(n^{-3/2})$.

Keeping in mind that the expected square loss for normal errors cannot be lower than $O(n^{-1})$, we obtain some "hyper-efficiency" in this case, although not as good as when estimating a constant [rate $O(n^{-2})$].

5 Comparison to linear filters

The optimal linear filter for our problem is the well-known Kalman filter. Let us take a look at its asymptotics.

5.1 Jump process

In the case of the jump process, the model can be re-written as a state-space model (see, for example, Brockwell and Davis [4]):

$Y_t = X_t + e_t, \qquad t = 1, 2, \dots$
$X_t = X_{t-1} + \zeta_t,$

and subsequently the optimal linear predictor (the estimator of $X_t$ based on all observations up to time $t-1$) is based on the second moments of $e_t$ and $\zeta_t$ and can be written as

$\hat X_{t+1} = \hat X_t + \frac{Q_t}{Q_t + \sigma_e^2}\,(Y_t - \hat X_t),$

where $Q_t = E(X_t - \hat X_t)^2$ is defined by

$Q_{t+1} = \sigma_\zeta^2 + \frac{\sigma_e^2\, Q_t}{Q_t + \sigma_e^2}$

(here $\sigma_\zeta^2 = \mathrm{Var}\,\zeta_t = \lambda \sigma_\xi^2/n$, where $\sigma_\xi^2$ is the variance of a jump), and $Q_0$ depends on the prior distribution of $X_0$. Furthermore, the optimal linear filter is

$\tilde X_t = \hat X_t + \frac{Q_t}{Q_t + \sigma_e^2}\,(Y_t - \hat X_t)$   (13)

with the error

$E(X_t - \tilde X_t)^2 = \frac{\sigma_e^2\, Q_t}{Q_t + \sigma_e^2}.$

Note that when the observations become frequent ($n \to \infty$), the optimal predictor and the optimal filter become close. It can be shown that the filter (13) is asymptotically (as $t \to \infty$) equivalent to the exponential filter

$\tilde X_{t+1} = \tilde X_t + b\,(Y_t - \tilde X_t),$

where $b = \lim_{t \to \infty} \frac{Q_t}{Q_t + \sigma_e^2}$, and when $n \to \infty$,

$b = \sqrt{\lambda/n}\; \frac{\sigma_\xi}{\sigma_e} + o(1/\sqrt{n}).$

Thus, the asymptotic rate of the best linear filter is of order $n^{-1/2}$:

$E(X_t - \tilde X_t)^2 = \sqrt{\lambda/n}\; \sigma_\xi \sigma_e + o(1/\sqrt{n}).$

See Figure 3 for a graphical comparison of the linear (exponential) and optimal non-linear filters.
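The $n^{-1/2}$ rate above can be checked numerically by iterating the Riccati recursion $Q_{t+1} = \sigma_\zeta^2 + \sigma_e^2 Q_t/(Q_t + \sigma_e^2)$ to its fixed point; with $\sigma_\zeta^2 = \lambda\sigma_\xi^2/n$, the fixed point approaches $\sigma_e\sigma_\xi\sqrt{\lambda/n}$. A minimal sketch (the parameter values are illustrative):

```python
def steady_state_Q(sigma_zeta2, sigma_e2, q0=1.0, iters=100000):
    """Iterate Q_{t+1} = sigma_zeta^2 + sigma_e^2 * Q_t / (Q_t + sigma_e^2)
    to its fixed point (the map is a contraction for Q > 0)."""
    q = q0
    for _ in range(iters):
        q = sigma_zeta2 + sigma_e2 * q / (q + sigma_e2)
    return q

# Illustrative check of Q ~ sigma_e * sigma_xi * sqrt(lam / n):
lam, sigma_xi, sigma_e = 1.0, 1.0, 1.0
for n in (10**2, 10**4, 10**6):
    q = steady_state_Q(lam * sigma_xi**2 / n, sigma_e**2)
    print(n, q, sigma_e * sigma_xi * (lam / n) ** 0.5)
```

As $n$ grows, the computed fixed point and the predicted $\sigma_e\sigma_\xi\sqrt{\lambda/n}$ agree to a relative error that vanishes with $n$.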
Figure 3: process position over time for a sample path of the jump process, with the observations, the linear filter and the non-linear filter.

5.2 Piecewise-linear process

In the case of the piecewise-linear process, we can reformulate (1) as a system of state-space equations

$Y_t = X_t + e_t,$

$\begin{pmatrix} X_{t+1} \\ V_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X_t \\ V_t \end{pmatrix} + \begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix},$

where $X_t$ is the current position of the process and $V_t$ is the current velocity (note that only the position is observed, not the velocity); as before, $\zeta_t$ is the jump in velocity over the interval $[t/n, (t+1)/n]$, and $\eta_t$ is the change in $X$ caused by this jump within the interval, which is negligible (of order $O(n^{-2})$) when $n$ is large. Applying the recursive equations for the Kalman filter (see [4]), we obtain the following recursion for the best linear predictor:

$\begin{pmatrix} \hat X_{t+1} \\ \hat V_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \hat X_t \\ \hat V_t \end{pmatrix} + \frac{Y_t - \hat X_t}{\omega_{11} + \sigma_e^2} \begin{pmatrix} \omega_{11} + \omega_{21}/n \\ \omega_{21} \end{pmatrix},$

with the error matrix

$Q_t = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} = E\left[ \begin{pmatrix} X_t - \hat X_t \\ V_t - \hat V_t \end{pmatrix} \begin{pmatrix} X_t - \hat X_t \\ V_t - \hat V_t \end{pmatrix}^T \right],$

defined by the recursive relation

$Q_{t+1} = F\, Q_t\, F^T - \frac{1}{\omega_{11} + \sigma_e^2} \begin{pmatrix} (\omega_{11} + \omega_{21}/n)^2 & (\omega_{11} + \omega_{21}/n)\,\omega_{21} \\ (\omega_{11} + \omega_{21}/n)\,\omega_{21} & \omega_{21}^2 \end{pmatrix} + Q_\zeta,$

where $F = \begin{pmatrix} 1 & 1/n \\ 0 & 1 \end{pmatrix}$ and $Q_\zeta = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_\zeta^2/n \end{pmatrix}$, with $\sigma_\zeta^2 := \lambda \sigma_\xi^2$. It follows that the elements of the matrix $Q = \lim_{t\to\infty} Q_t$ are

$\omega_{11} = \sqrt{2}\, \sigma_e^{3/2} \sigma_\zeta^{1/2}\, n^{-3/4} + o(n^{-3/4}),$
$\omega_{12} = \sigma_e \sigma_\zeta\, n^{-1/2} + o(n^{-1/2}),$
$\omega_{22} = \sqrt{2}\, \sigma_\zeta^{3/2} \sigma_e^{1/2}\, n^{-1/4} + o(n^{-1/4}).$

Furthermore, as $n \to \infty$, the optimal predictor $\hat X_t$ and the optimal filter $\tilde X_t$ are close. Thus, the asymptotic efficiency of the optimal linear filter $E(X_t - \tilde X_t)^2$ is of order $n^{-3/4}$.

A summary of the asymptotic behavior of $\sigma^2 := E(\tilde X_n - X_n)^2$ for the jump and piecewise-linear processes is given in Table 1.

                                          smooth errors   discontinuous errors   Linear filter
 Jump process                             ≥ ln M_n / n    ≤ ln M_n / n           n^{-1/2}
 Piecewise-linear process (special case)  ≥ n^{-1}        n^{-3/2}               n^{-3/4}
 Constant location parameter              n^{-1}          n^{-2}                 n^{-1}

 Table 1. Summary of asymptotic results

One can see that increasing the "smoothness" of the process improves the asymptotics. The behavior of the optimal linear and non-linear filters for a piecewise-linear process is shown in Figure 4. The optimal non-linear filter was evaluated using the sequential Monte-Carlo method described in [8].
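The steady-state entries $\omega_{11}, \omega_{12}, \omega_{22}$ can likewise be verified numerically by iterating the two-dimensional Riccati recursion. The sketch below uses $F = [[1, 1/n], [0, 1]]$, state noise $\mathrm{diag}(0, \lambda\sigma_\xi^2/n)$, and unit parameters; these values, and the explicit $2 \times 2$ arithmetic, are illustrative choices:

```python
def riccati_2x2(n, lam=1.0, sigma_xi2=1.0, sigma_e2=1.0, iters=200000):
    """Iterate the predictor Riccati recursion for the model of Section 5.2:
    F = [[1, 1/n], [0, 1]], state noise diag(0, lam*sigma_xi2/n), and
    observation Y = X + e with variance sigma_e2.  Returns the steady-state
    error-matrix entries (w11, w12, w22)."""
    tau = 1.0 / n
    p11, p12, p22 = 1.0, 0.0, 1.0
    for _ in range(iters):
        s = p11 + sigma_e2
        # update (filter) step: P - P h' h P / s, with h = (1, 0)
        f11 = p11 - p11 * p11 / s
        f12 = p12 - p11 * p12 / s
        f22 = p22 - p12 * p12 / s
        # prediction step: F P_f F' + Q_zeta
        p11 = f11 + 2 * tau * f12 + tau * tau * f22
        p12 = f12 + tau * f22
        p22 = f22 + lam * sigma_xi2 / n
    return p11, p12, p22

# omega_11 should behave like sqrt(2) * n^(-3/4) for unit parameters:
for n in (10**2, 10**3, 10**4):
    w11, w12, w22 = riccati_2x2(n)
    print(n, w11, 2 ** 0.5 * n ** -0.75)
```

For moderate $n$ the computed $\omega_{11}$ already tracks the $\sqrt{2}\, n^{-3/4}$ asymptote closely, and $\omega_{12}$, $\omega_{22}$ track $n^{-1/2}$ and $\sqrt{2}\, n^{-1/4}$.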
Figure 4: process position over time for a sample path of the piecewise-linear process, with the observations, the linear filter and the non-linear filter.

6 Simulation

The author has simulated the optimal filter for the jump process based on Theorem 1. There, the densities $q_k(x)$ are found recursively using the relation (3). We consider the case when the parameters $\theta$, $\lambda$ and the initial state $X_0$ are known, and the error density is uniform on the interval $[-\gamma, \gamma]$. This causes $q_k(x)$ to be confined to the interval $[Y_k - \gamma,\ Y_k + \gamma]$, and we keep a discretized version of $q_k(x)$ in memory. The integration required for evaluating (3) was performed numerically. The results of the simulation are given in Table 2. In the case considered, the distribution of the jumps is uniform on the interval $[-2.5, 2.5]$, with $\gamma = 1$, on the time interval $t \in [0, 9]$. Two cases, $\tau = 0.02$ and $\tau = 0.04$, were considered. For each Monte-Carlo sample of the process, the ratio of the effectiveness of the non-linear filter to that of the linear one was found:

$R := \left[ \frac{\sum_{t} (\bar X_t - X_t)^2}{\sum_{t} (\tilde X_t - X_t)^2} \right]^{1/2},$

where $\tilde X_t$ is the optimal non-linear filter at time $t$ and $\bar X_t$ is the optimal linear filter. Also, the mean square error $MS_{nl}$ of the optimal non-linear filter is given. In each case, $N = 100$ Monte-Carlo samples were generated.

 τ      mean of R   st. dev. of R   MS_nl
 0.02   2.141       0.535           0.0249
 0.04   1.749       0.407           0.0479

 Table 2. Simulation results

This and other simulations lead us to believe that the optimal filter becomes more effective relative to the linear filter when:

a) $\tau \to 0$ (we know that from the asymptotics, as well as from Table 2);

b) the jump magnitudes increase, making the process more "ragged".

Intuitively, a linear filter has to compromise between the periods where the process stays constant (for the filter to perform better on those intervals, past observations need to carry a greater weight) and the times when jumps happen (to cope with jumps, we need to forget past observations quickly). As a result, greater jumps will upset the performance of a linear filter.
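The discretized recursion described at the beginning of this section can be sketched as follows. It is assumed here that relation (3) has the same structure as (12) with the velocity component removed, i.e. $q_k(x) = \phi_e(Y_k - x)\,[\,e^{-\lambda\tau} q_{k-1}(x) + (1 - e^{-\lambda\tau})\int h_\theta(x-u)\, q_{k-1}(u)\, du\,]$; the grid and the renormalization step are implementation choices:

```python
import math

def grid_filter_step(q_prev, grid, y, lam, tau, gamma, jump_density):
    """One step of the discretized recursion, assuming (3) has the form
        q_k(x) = phi_e(Y_k - x) * [ exp(-lam*tau) * q_{k-1}(x)
                 + (1 - exp(-lam*tau)) * integral h(x - u) q_{k-1}(u) du ].
    phi_e is uniform on [-gamma, gamma], so q_k vanishes outside
    [Y_k - gamma, Y_k + gamma].  The result is renormalized to integrate to 1."""
    dx = grid[1] - grid[0]
    stay = math.exp(-lam * tau)               # probability of no jump
    q_new = []
    for i, x in enumerate(grid):
        if abs(y - x) > gamma:                # outside the observation window
            q_new.append(0.0)
            continue
        conv = sum(jump_density(x - u) * q_prev[j]
                   for j, u in enumerate(grid)) * dx
        q_new.append(stay * q_prev[i] + (1.0 - stay) * conv)
    total = sum(q_new) * dx
    return [v / total for v in q_new] if total > 0 else q_new
```

The non-linear filter estimate $\tilde X_k$ is then the posterior mean $\sum_x x\, q_k(x)\, dx$ over the grid.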
Some adaptive filters, for example the IMM filter described in [2], might be more competitive.

7 Open problems

Naturally, it makes sense to extend the results to the piecewise-linear process (of which only a special case is treated in Section 4). This is considerably more difficult than the jump-process case, partly due to hyper-efficiency. Another interesting task is to cover an unknown error distribution (in most of the asymptotic results above, the latter was assumed known). The results on parameter estimation (Section 3) might also be improved.

Also, in view of possible applications, other forms of the stochastic process $X(\cdot)$ deserve to be considered. First, two- and three-dimensional processes are obviously of interest. Second, some other types of processes, rather than jump and piecewise-linear, might be more useful. Finally, one needs to consider the situation when the parameters of the process are changing themselves, albeit slowly ("parameter tracking"). Such a situation is considered in Elliott et al. [10], and no doubt recursive formulas similar to Theorem 1 could be derived in this case, too.

References

[1] Anderson, B.D.O. and Moore, J.B., Optimal Filtering, Prentice Hall, Englewood Cliffs, N.J., 1979.

[2] Bar-Shalom, Y., Rong Li, X., and Kirubarajan, T., Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software, Wiley, New York, 2001.

[3] Brémaud, P., Point Processes and Queues: Martingale Dynamics, Springer Series in Statistics, Springer-Verlag, New York, 1981.

[4] Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, Springer-Verlag, New York, 1996.

[5] Ceci, C., Gerardi, A., and Tardelli, P., An estimate of the approximation error in the filtering of a discrete jump process, Math. Models Methods Appl. Sci., 11 (2001), pp. 181–198.
[6] Chernoff, H. and Rubin, H., The estimation of the location of a discontinuity in density, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (1954–1955), vol. I, pp. 19–37.

[7] Del Moral, P., Jacod, J., and Protter, P., The Monte-Carlo method for filtering with discrete-time observations, Probab. Theory Related Fields, 120 (2001), pp. 346–368.

[8] Doucet, A., de Freitas, N., and Gordon, N. (eds.), Sequential Monte-Carlo Methods in Practice, Springer-Verlag, New York, 2001.

[9] Dufour, F. and Kannan, D., Discrete time nonlinear filtering with marked point process observations, Stochastic Anal. Appl., 17 (1999), pp. 99–115.

[10] Elliott, R.J., Aggoun, L., and Moore, J.B., Hidden Markov Models: Estimation and Control, Springer-Verlag, New York, 1995.

[11] Ermakov, M.S., Asymptotic behavior of statistical estimates of parameters for a discontinuous multivariate density (Russian), Studies in statistical estimation theory, I, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) (1977), pp. 83–107.

[12] Gordon, N., Salmond, D., and Smith, A., A novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings F: Radar and Signal Processing, 140 (1993), pp. 107–113.

[13] Ibragimov, I.A. and Has'minskii, R.Z., Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York–Berlin, 1981.

[14] Kailath, T., An innovations approach to least-squares estimation, parts I, II, IEEE Trans. Automatic Control, AC-13 (1968), pp. 646–660.

[15] Kallianpur, G., Stochastic Filtering Theory, Springer-Verlag, New York, 1980.

[16] Kallianpur, G. and Striebel, C., Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes, Teor. Verojatnost. i Primenen., 14 (1969), pp. 597–622.

[17] Kalman, R.E., A new approach to linear filtering and prediction problems, J. Basic Engrg. (1960), pp. 35–45.

[18] Kalman, R.E. and Bucy, R.S., New results in linear filtering and prediction theory, Trans. ASME Ser. D. J. Basic Engrg., 83 (1961), pp. 95–108.

[19] Kannan, D., Discrete-time nonlinear filtering with marked point process observations. II. Risk-sensitive filters, Stochastic Anal. Appl., 18 (2000), pp. 261–287.

[20] Lehmann, E.L., Theory of Point Estimation, Wiley, New York, 1983.

[21] Portenko, N., Salehi, H., and Skorokhod, A., On optimal filtering of multitarget tracking systems based on point process observations, Random Oper. Stochastic Equations, 5 (1997), no. 1, pp. 1–34.

[22] Portenko, N., Salehi, H., and Skorokhod, A., Optimal filtering in target tracking in presence of uniformly distributed errors and false targets, Random Oper. Stochastic Equations, 6 (1998), no. 3, pp. 213–240.

[23] Rozovskii, B.L., Stochastic Evolution Systems: Linear Theory and Applications to Nonlinear Filtering, Kluwer Academic Publishers Group, Dordrecht, 1990. Translated from the Russian by A. Yarkho.

[24] Rubin, H., The estimation of discontinuities in multivariate densities, and related problems in stochastic processes, Proc. Fourth Berkeley Sympos. Math. Statist. and Prob. (1961), vol. I, pp. 563–574.

[25] Shiryayev, A.N., Stochastic equations of non-linear filtering of jump-like Markov processes, Problemy Peredachi Informacii, 2 (1966), pp. 3–22.

[26] Yashin, A.I., Filtering of jump processes, Avtomat. i Telemeh. (1970), pp. 52–58.

[27] Zakai, M., On the optimal filtering of diffusion processes, Z. Wahrsch. Verw. Gebiete, 11 (1969), pp. 230–243.

Notation

Some special symbols used in this work.

$E(X)$  expected value of $X$
$s \vee t$  $\max(s, t)$
$s \wedge t$  $\min(s, t)$
$\mathbb{R}$  the set of real numbers
$C$, const  some constant (often, the exact value is not specified but can be easily obtained)
$I(A)$ or $I_A$  indicator of the set/event $A$
$A^c$  complement of the set/event $A$
$A^T$  the matrix $A$ transposed
$A \asymp B$  asymptotic equivalence, that is, $A/B = \mathrm{const} + o(1)$
$A := B$  definition of expression $A$ in terms of expression $B$
$\mathcal{L}(X)$, $\mathcal{L}(X \mid \dots)$
distribution / conditional distribution of $X$
$\arg\min_Y f(Y)$  the value $z$ such that $f(z) = \min_Y f(Y)$
$N(\mu, \sigma^2)$  Normal distribution with the specified mean and variance.