ASYMPTOTIC CONVERGENCE 0F NON-LINEAR, CONTINUOUS-TIME FILTERS» Thesis for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY MICHAEL WESLEY BIRD 1969 ‘ __ —_.‘_ LIBRA R Y ‘ Michigan Start- University This is to certify that the thesis entitled ASYMPTOTIC CONVERGENCE OF NON-LINEAR, CONTINUOUS-TIME FILTERS presented by Michael Wesley Bird has been accepted towards fulfillment of the requirements for O \ ,/’-\ M degree mi; xii ) \ /. Major professor 0-169 // ‘ I (r/ 1/5,- -’, - I , .z ___;’/_‘,.t‘// ‘1 ’ ./ ’ aunx mom ma BINDING BY HMS & SONS'I‘ ‘ LIBRARY BINDERS ABSTRACT ASYMPTOTIC CONVERGENCE 0F NON-LINEAR, CONTINUOUS-TIME FIETERS by Michael Wesley Bird Modern non-linear, continuous-time filtering theory, char- acterized by the use of state variable techniques, is potentially applicable to many real-world problems. However, the filters suggested by the theory as solutions to the time-domain filtering problem have not been sufficiently analyzed to provide confidence in their behavior. One aspect of non-linear filter performance which has received no attention is asymptotic behavior. This thesis suggests a method for demonstrating the asymptotic performance of non-linear filters and applies this method to a class of continuous- time, non-linear filters in the scalar case. For this investigation, the time-domain filtering problem consists of a message process, represented as the solution to a first-order differential equation with unknown initial condition, and an observation process, modeled as a signal containing the message to which white noise is added. Filters considered in this thesis are sequential, being represented by first-order differential equations which are identical to the model of the message process plus a correction term. The correction term consists of a gain function, which depends on the past of the filter output, multiplied by the difference between an estimate of the observation and the Michael Wesley Bird observation itself. This structure is similar to the filter made popular by Kalman and Bucy for a related linear filtering problem. The non-linear, continuous-time filters considered here are not postulated to have any optimal properties; however, when the message process has a probability distribution, the gain function can be selected to make the filter approximately-optimal, providing nearly- minimum-variance estimates of the message. In the discrete-time case, the stochastic approximation methods developed for parameter estimation led to asymptotic convergence theorems for sequential filters. These theorems show that the difference between the output of certain filters and the message pro- cess converges to zero, with probability one and in the mean square sense, as time increases. This thesis applies the same strategy to the continuous-time problem. Stochastic approximation methods are developed for the con- tinuous-time parameter estimation problem. The procedure developed uses sequential algorithms for estimating a parameter of a signal when the signal is observed in the presence of white Gaussian noise. Two theorems are proved which show that, when the signal models and the gain functions in theestimators satisfy certain conditions, the estimators converge asymptotically to the true parameter value. The proofs of these theorems rely on the properties of Ito calculus and super-martingales. Relying on the concepts developed for continuous-time para- meter estimation, two asymptotic convergence theorems are proved for a solution to the filtering problem. In these theorems, con- ditions are placed on the message process, observation process, and Michael Wesley Bird gain functions in the filters that guarantee asymptotic convergence. For purposes of evaluation, these results are compared to a theorem which specifies the asymptotic behavior of the Kalman-Bucy filter which is a solution to a linear filtering problem. The convergence theorems developed for filtering show the asymptotic convergence of a particular sequential filter which is an approximately-optimal filter. A filter is thus displayed whose out- put, under certain conditions, provides nearly-minimum-variance estimates of the message throughout the observation time interval and converges to the message process as time increases. Computer- simulated results show the behavior of this filter when applied to a non-linear filtering example. ASYMPTOTI C CONVERGEN CE 0F NON-LINEAR, CONTINUOUS-TIME FILTERS By Michael Wesley Bird A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Electrical Engineering 1969 ACKNOWIEDGEMENTS For consistent confidence in the preliminary work and guidance in the actual writing of this thesis, the author wishes to thank Professor Richard Dubes, his teacher and advisor. His counsel and understanding are greatly appreciated. For their consideration of this thesis, the author wishes to acknowledge the other members of his doctoral committee: H. G. Hedges, C. L. Park, R. B. Zemach, and H. Salehi. ii TABLE OF CONTENTS Page ABSTRACT ACKNOWLEDGEMENTS ii LIST OF FIGURES v LIST OF APPENDICES vi Chapter 1. INTRODUCTION ....................................... 1 1.1 Performance Criteria for Solutions of the Modern Filtering Problem ........ .. .......... 2 1.2 Literature Review ........................... 7 1.2.1 Linear Filtering Theory - the Kalman- Bucy Filter ............... . .......... 8 1.2.2 The Optimal Non-Linear Filter Solution .... .................. . ...... 9 1.2.3 Sub-Optimal Sequential Filters with Known Asymptotic Behavior ............ 11 1.2.4 The Discrete-Time Filtering Problem .. 11 1.2.5 Optimum Discrete-Time Filtering Theory .............................. 13 1.2.6 The Method of Stochastic Approximation in Parameter Estimation .............. 14 1.2.7 Recursive Filters .................... 15 1.2.8 Continuous-Time Stochastic Approx- imation Methods ...................... 16 1.3 Thesis Objectives ........................... 18 1.4 Thesis Outline ........ . ..................... 20 2. A FILTERING PROBLEM AND A CLASS OF SEQUENTIAL SOLUTIONS ......... . ............. . ...... . ........... 22 2.1 The Non-Linear Filtering Formulation ........ 22 2.2 An Approximately-Optimal Filter ............ . 24 2.3 Sub-Optimal, Sequential Filters ............. 28 2.4 Summary .......... . ..... . ............... ..... 29 iii CONVERGENT ALGORITHMS FOR PARAMETER ESTIMATION ..... 3.1 Preliminary Considerations .................. 3.2 The Main Convergence Theorem ................ 3.3 An Alternate Convergence Proof .............. 3.4 Particular Gain Functions ................... 3.5 Summary ..................................... CONVERGENCE INVESTIGATIONS FOR THE FILTERING PROB IEM OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO 4.1 Assumptions and Preliminary Derivations ..... 4.2 The Convergence Theorems .................... 4.3 A Convergence Theorem for the Linear Kalman- Bucy Filter .......... . ...................... 4.4 A Class of Gain Functions Guaranteeing Convergence ................................. 4.5 Summary ..................................... AN EXAMPLE OF A FILTERING PROBLEM .................. 5.1 The Example and Computer Simulation ......... 5.2 Summary ..................................... CONCLUSIONS ........................................ 6.1 Conclusions and Results ..................... 6.2 Extensions .................................. BIBLIOGRAPHY ....................................... iv 31 33 37 41 44 46 47 48 51 55 61 64 67 67 7O 73 73 75 77 LIST OF FIGURES Figure Page 1. Summary of Literature Review ..................... 19 2. Filter Behavior for Two Runs ..................... 7O 3. Statistical Analysis of Filter Performance ....... 71 LIST OF APPENDICES Append ix Page A STOCHASTIC DIFFERENTIAL EQUATIONS AND THE ITO CALCULUS ......................................... 81 B a-ALGEBRAS, ITO CALCULUS, AND MARTINGALES ........ 87 C THE INFINITE DIMENSIONAL REPRESENTATION FOR THE OPTIMAL FILTER ................................... 91 D IMPORTANT PROPERTIES OF STOCHASTIC DIFFERENTIAL EQUATIONS SIMILAR TO EQUATION 2.15 ............... 93 vi CHAPTER 1 INTRODUCTION In the process of transmitting or gathering information, the signal that contains the information frequently becomes distorted. It is usually necessary to modify this signal in order to remove the distortion and recover the original message. This act of modifying the signal is called filtering. In other words, the essential purpose of a filter is to remove the distortion or noise from an observed signal. The design of a filter requires mathematical models that represent the signal and observation processes. Wiener [1] was the first to treat these processes as random phenomena and described them in statistical terms. His treatment allows filters to be designed from criteria based on the statistics of these pro- cesses; he points out that his method of developing filters com- bines the techniques of random time-series and conventional elec- trical filter theory. More recently, Kushner [2], Bucy [3], Kalman [4], and Deutsch [5], still using the random descriptions, have developed filters based on fundamental statistical methods other than those of Wiener. The approaches of Wiener and Kalman are the most popular methods for designing filters. The investigations in this thesis are based on the filtering formulation proposed by Kalman [4] and Kushner [2]. Section 1.1 outlines this formulation and discusses extensively criteria for designing filters. A survey of many solutions to the filtering 1 problem is given in Sec. 1.2 pointing out some limitations of the filtering results presently available and indicating the need for studies into the asymptotic behavior of statistical filters. In Sec. 1.3, the objectives of the thesis are discussed and a brief summary of the main results of the thesis is given in Sec. 1.4. 1.1 Performance Criteria for Solutions of the Modern Filtering Emblm During the last eight years, a special form of the statistical filtering problem has received a vast amount of attention in engi- eering literature. This form of the problem uses modern modeling techniques and is distinguished from the traditional by its for- mulation, the message process being described by a stochastic differential equation. With E. denoting the n-dimensional message process, or signal process, the message representation is: |x° = _f.(t,z<_) + 9(t)g(t> ; 50:0) = _c_ (1.1) where .£(t,x) is a set of n known functions of the n components of g_ and t, RAP) is a known n X n matrix, gflt) is an n-dimen- sional, zero-mean, Gaussian, white noise process,1 and g. is an unknown constant vector. Equation (1.1) represents a Markov process and completely determines the probability distribution of x(t). This distribution determines all moments and correlation prOperties of the process. 1 An r-dimensional white noise process, say wflt), is a process with the prOperty that Ewi(t)wj(s) = 5(t-s)6ij for l s i,j, s r, where is the Kronecker delta and 5(t-s) is the generalized bij delta function. The remainder of the filtering formulation defines the m- dimensional observation process, y(t), which is assumed to have the conventional model: fit) = mayo) + raffle) (1.2) where hfit,§) is a set of m known functions of the n vector .5 and t, Rflt)% is the square root of the positive definite m X m matrix ‘R(t), and gfit) is an m-dimensional, zero-mean, Gaussian, white-noise process. The problem is to utilize this formulation to develop useful filters. At this time, no restrictions will be placed on the form of the filter, other than that its input must be the finite past of the observation process and its output must approximate the message process x(t). The type of approximation provided by the output depends on the performance criteria used in selecting the filtering solution. The performance criteria or, stated in different words, the requirements for a good solution,are selected according to the requirements of each particular filtering application. A few such applications and standards for judging the quality of a filter will be discussed to enumerate features of useful filters. The communication model, for example, has recently been placed into the framework of the modern filtering formulation described above [16]. The solutions to the filtering problem are used as receivers at the output of noisy channels and are intended to demodulate the message which has been contaminated by channel noise. This modern approach to filtering is applicable to the communication problem because the resulting filters can give reasonably accurate z. estimates during the transient phase of the filter's response. The transient reSponse has long been a difficulty with classical Wiener filters since they are designed from steady-state error criteria [1]. However, even though the modern filter has this advantage, its performance as a communication receiver is still judged on the quality of its steady-state behavior. A second example, the aerospace guidance problem, has been one of the most studied applications of filtering theory [7]. The time-varying parameters describing the position of a space vehicle are designated as the message. Noisy measurements of these para- meters are filtered to give estimates of the parameters from which the vehicle's location can be established. The filter must be analyzed in terms of three design criteria: 1) the asymptotic behavior of the error variance; 2) the amount of time required to converge into the asymptotic state; 3) the computational feasibility of the algorithm used in synthesizing the filter. Since guidance systems are Operated on a real-time basis for extended periods, the third criteria becomes very important in the implementation of the filter. Intuitively, it appears that the first two criteria can be satisfied if the filter is designed to give an accurate estimate of the message during the transient period, in which case both the asymptotic error variance and the length of the transient period should be reduced. However, excessively stringent accuracy requirements increase the complexity of the filter, so a trade-off may be necessary to satisfy all three criteria. A final example, system identification, may be placed in the filtering formulation [8], [9]. The typical identification problem demands a filter which produces estimates of system parameters from noisy measurements of the system's input and output. The unknown parameters are either constants or varyslowly with time. For any reasonable identification procedure, the filter's output must con- verge in some probabilistic sense to the true parameter as time increases. A secondary, but still important, consideration is the amount of time required for convergence. The convergence time may be minimized if the estimates are as accurate as possible during the entire time interval. The identification scheme is frequently a small segment of a very large and complicated system so constraints must be placed on the complexity of the algorithm. Consequently, a trade-off between complexity and accuracy may be necessary. These three applications demonstrate the type of real-world problems which may be formulated using the time-domain representations defined in (1.1) and (1.2). Since the filter characteristics cited in these examples are typical of real problems, the following three requirements for a good and useful filter are proposed. R.l The output of a good filter should accurately estimate the message throughout the observation interval. For the filtering problem in this thesis, there are a few statistically-based methods which provide a filter satisfying this requirement. The most commonly used method selects the filter which minimizes the mean-square difference between the message process and filter output for each time instant in the observation interval. The filter minimizing this performance index is represented by the conditional mean functional. This procedure is widely used and, since the filter is a solution to an optimization problem, it is commonly called the optimal filter. R.2 Analytical results should be available that describe the asymptotic behavior of a good filter as t a m. Examining the solution to time-domain filtering problems can bring out the type of analysis needed. The earlier discussion indicates that the filter has two phases of operation, the transient-response phase and a type of asymptotic phase. For a wide variety of real problems, this second phase will determine the filter's useful- ness. For these problems, a good transient response is, of course, ideal, but not at the expense of asymptotic behavior. Any filter- ing algorithm that guarantees the filter's large-time behavior will place restrictions on the message and observation models. These restrictions assure that tfmzfluctuations of the estimate of the message will be confined by a calculable bound. R.3 A useful filter must not be too complex, so a sequential algorithm is required. An algorithm is sequential when the filter's output is defined by the solution to a finite-dimen- sional system of first order differential equations, where the derivatives are functions of the present value of the observation process, present value of the system solution and, possibly, a specified functional of the past solution. This might seem to be an unreasonable limitation for an optimal filter. However, the time-domain approach to filtering yields Optimal or nearly- optimal solutions which are sequential. In fact, many present- day problems are being formulated in terms of this modern approach in order to obtain sequential algorithms with nearly-optimal properties. Optimal sequential filters, plus the computer technology for efficiently implementing them, are providing use- ful solutions to many previously-unmanageable problems. Any filter should attempt, in some sense, to satisfy these three criteria. However, the approaches to filtering in the literature are concentrated on deriving approximately-optimal filters, with little or no study of their asymptotic behavior. Analytical investigations of asymptotic behavior are very difficult and appear to be a major barrier in the utilization of useful filters. Considering the difficulties inherent in the analysis of the optimal filter, it is logical to study other filters, without stringent specifications on transient reSponse, but which are more amenable to mathematical treatment than the optimal algorithm. These algorithms will be sequential and, along with well-analyzed asymptotic behavior, attempt to satisfy the requirements enumerated in R.2 and R.3. This thesis will be primarily concerned with initiating an investigation into this type of filter. 1.2 Literature Review To better understand the state of filtering theory and to further reinforce the previous discussions, a summary of existing results is presented in this section. This synopsis will not only include available solutions to the filtering problem discussed in Sec. 1.1, but will also summarize the publications on the discrete- time filtering problem. At this point, it is necessary to make the distinction be- tween continuous-time and discrete-time processes. Equations (1.1) and (1.2) define waveforms specified on the entire time axis, called continuous-time processes. In contrast are the discrete-time processes which are sequences of variables defined on only a countable number of time instants. The following continuous-time investigations are surveyed in Secs. 1.2.1, 1.2.2, and 1.2.3: the well-established linear theory, the optimal approach to non-linear filtering, and the completely un- explored realm of asymptotically stable, sequential filters. A review of the optimal and sub-optimal approaches to discrete-time filtering are contained in Secs. 1.2.4 through 1.2.7. All these discussions place a heavy emphasis on reporting studies, or the need for studies, into the asymptotic behavior of filters. The discrete-time survey will suggest exploring the method of stochastic approximation as an approach to investigating the large-time performance of continuous- time filtering solutions, so this section will conclude with an out- line of the few publications on continuous-time stochastic approx- imation. 1.2.1 Linear Filtering Theory - the Kalman-Bucy Filter Assuming .£(t:§) = Fflt)x. and hflt,x) = Hflt)§, the filtering formulation defined in (1.1) and (1.2) becomes linear. It is also assumed that g_ is Gaussian with zero mean and variance 2, Kalman and Bucy orginated this formulation in 1961, and developed a very effective and useful solution to the filtering problem [4]. They derived an optimal filter which produced the conditional mean, the minimum-variance estimator. Their solution has been shown to have the following properties: A. The filter's output is determined from the solution to two differential equations. Denoting xfit) as the n-dimensional output at time t, the equations are written as: 9 lx’. II at): + 2(t)§(t)T[z(t) - sum : goo) = o (1.3) i: = at): + no? - mTgg + ggT : 20:0) = L (1.4) This optimal algorithm is sequential. B. The solution to (1.4), commonly called a Riccati equation, is the error variance, since .£(t) = E(§(t) - xfit))(gfit) - xflt))T. Equation (1.4) does not depend on observations and can be solved prior to actual implementation of the filter. C. Kalman and Bucy formulated observability and controllability conditions which insure adequate asymptotic behavior of this optimal filter [10]. The optimal solution (1.3), commonly called the Kalman-Bucy filter, satisfies R.1, R.2, and R.3, the postulates for useful filters. This optimal, sequential, asymptotically stable filter found immediate use in the fields of guidance and control. More recently, its popularity has Spread to the fields of system iden- tification, pattern recognition, and differential encoding of television signals [11]. 1.2.2 The Optimal Non-linear Filter Solution Significant work on optimal non-linear filtering solutions began in 1964, with important contributions from Kushner [2], Bucy [3], Stratonovich [12], and Kallianpur [13]. Their primary aim was establishing the conditional mean functional and, because of the non- linear nature of the problem, their derivations required a rigorous treatment of detail. Initially, the mathematical model for the Gaussian white noise disturbance was interpreted as the formal derivative of an independent-increment process, Brownian motion, 10 which requires an understanding Of Ito's stochastic differential equations [3]. Next, the message process, observation process, and conditional mean process were represented as elements in an abstract functional Space. Finally, a stochastic differential equation was derived for the conditional mean using Ito's special differentiation rule. The stochastic differential equation representing the solution for the conditional mean appears initially to be neat and concise. However, careful examination shows that the optimal filter is de- scribed by an infinite-dimensional system of first-order stochastic differential equations, each with the observation process as a driving term. Furthermore, the entire system requires a simultaneous solution. These two facts make the optimal filter impossible to realize. The present emphasis in optimal, non-linear filtering research is centered upon deriving algorithms that approximate the optimal filter. The standard approach is to represent the non-linear functions in (1.1) and (1.2) as series expansions and to retain the first few terms. The infinite system of differential equations de- scribing the optimal filter then reduces tO a finite set and a realizable algorithm is possible. The approximations made on the non-linear functions restrict the applications of the filter so a filtering solution must be developed for each class of problems. Kushner [14] discusses the shortcomings of a number of such approx- imation schemes. Almost all approximately-optimal filters are sequential. An examination of the system of stochastic differential equations de- fining the Optimal filter shows that most finite approximations will ll be sequential. The demand for a filter with a sequential structure instigated the now-prodigious amount Of research on the time-domain approach. While many investigators have been successful in deriving approximately-optimal filters, analytical studies of the error char- acteristics, or asymptotic behavior, for such filters are not avail- able. All work devoted to the properties of such filters involve lengthy computer simulations for very specific problems. A recent survey paper has pointed out both this void in the literature and the reluctance of users to apply the present non-linear filter theory [15]. 1.2.3 Sub-Optimal Sequential Filters with Known Asymptotic Behavior Without general analysis of the behavior Of Optimal, non-linear filters, the real-world user does not have confidence in their be- havior. However, the demand for sequential, non-linear filters exhibits a need for sub-optimal schemes whose performances are under- stood and analyzed. As Sec. 1.1 indicated, there are numerous applications where large strings of data must be processed using a minimal amount Of equipment; these applications require filters with little storage and good large-time behavior. The present state Of optimal filtering theory cannot supply algorithms with satisfactory properties for such tasks. 1.2.4 The Discrete-Time Filtering Problem Presently, there are no published investigations Of the asymptotic performance of continuous-time filters. Insight into this area may be gained from a review of the discrete-time filter— ing results. A brief outline of the discrete-time filtering 12 approaches is presented next, emphasizing investigations into asymptotic behavior. The structure of the discrete-time filtering problem is similar tO that of the continuous-time problem in Sec. 1.1. The n-dimensional message process is: k = o,1,2,... (1.5) X —k+l = £15331.) + Qkflk’ where {2k} isan.r-dimensiona1, zero-mean noise sequence with ij .. E u u = ,, 1 s 1 s r f is a known n-vector Of fu ' k m 6km613’ ,J , _k(xk) nct1ons, and 9n is an n X n matrix. The m-dimensional observation equation is 1k = 91511.) + Rink (1.6) where {3k} is an m-dimensional, zero-mean noise sequence, E viVj = 6 6 1 s i j S m E viuj = 0 and R25 is the s uare root 1, k Lk ij’ ’ ’ 4, k ’ —k q of a positive definite matrix, Bk' The filtering problem is to determine a function of the past Observations which will furnish a good estimate of the present message value. The function transforming the Observed data into a sequence of estimates represents a filter. The discrete-time filtering problem has received a great deal more attention than its continuous-time counterpart. This may be attributed to the availability Of results in classical statistics which concern random phenomenon described by sequences of random variables and which have been applied directly to the discrete-time problem. Investigations into the filtering problem are Split into two groups. When the noises in (1.5) and (1.6) are Gaussian, the 13 Bayes-Optimal approach concentrates on finding a function representing a conditional mean. The other group of investigations are studies of asymptotically-stable, recursive, but non-optimal algorithms. An algorithm is recursive when its output is the solution to a difference equation with the (n+l)st output depending on the nth output and nth or (n+l)st Observation. 1.2.5 Optimum Discrete—Time Filtering Theory Many investigators have suggested a representation for the conditional mean of the message, given the observations, as a solution to the discrete-time filtering problem [16], [17], [18], [19]. This Optimal solution is only employed when the noises {2k} and {2k} are Gaussian. As could be expected from the non-linear nature of the problem, a closed-form solution for the general optimal filter does not exist. The research effort has been channeled into derivations of approximate algorithms. Aoki [16] discussed the following recursive algorithm for an approximately-Optimal filter solution: 3n+1 = £main) + 5n+1[1n+1 ' hn(£n(fin))] (1‘7) A where g“ is the filter's output and [Kn], the gain sequence, is determined from a set of recursive equations. This filter has had numerous applications and is included here for reference. The development of discrete-time, non-linear filters is hindered by the lack of asymptotic (large n) results. Albert and Gardner [20] and Pearson [21] both emphasize the need for work in this area and indicate the necessity Of studying convergent, recur- sive filtering schemes. These authors rely heavily on the methods of stochastic approximation as means for generating sub-Optimum l4 filtering algorithms as outlined in the next section. 1.2.6 The Method of Stochastic Approximation in Parameter Estimation Stochastic approximation algorithms are used for estimating parameters of Signals which are observed in the presence of noise. Albert and Gardner [20] have applied the method of stochastic approximation to problems where the Observations have the form = h + . yn “(9) vn (1 8) where e is an unknown parameter, {vn} is a zero-mean process with E v v = 5 k n and hn(e) is a known function of n and e. The nk’ noise distribution need not be known. If e is considered a message, the problem of estimating 9 can be considered a special case of the filtering problem. The scalar case is treated here to simplify the notation. In parameter estimation, as in filtering theory, the problem is to process the Observation sequence and estimate the value of the unknown constant. The method of stochastic approximation attempts to find a zero Of a regression function and results in the following recursive algorithm which solve the estimation problem. xn+1 = xn + an[yn - hn(xn)] : x0 15 arbitrary (1.9) where xn is the nth estimate Of e. The next step is the formulation Of conditions which guarantee that ;g - e converges to zero as n a m with probability one or in the mean-square sense. These conditions are illustrated in the state— ment Of the following stochastic approximation convergence theorem [20]. The hypotheses are: 15 T1. hn(x) is assumed monotone in x and differentiable; T2. For each n, the sign of an is equal to the sign Of h (x) where h (x) = 2'"h (x)' n n 3x n ’ m . T3. E. b a ==m with bn = inf \hn(x)I; n 1 n n x m 2 T4. 2 a < an. n n=1 Then xn A e as n a m with probability one and in mean-square. Stochastic approximation algorithms are traditionally associated with parameter estimation problems where noise models are unknown, large amounts of data are processed, and simple computations are required. 1.2.7 Recursive Filters Generalizing on the method of stochastic approximation, Pearson [21] and Albert and Gardner have suggested the recursive algorithms §h+1 = fn(;h) + an[yn - hn(;h)] (1.10) with ;g the filter output as solutions to the general filtering problem defined in Sec. 1.2.4 with m = n = r = 1. Equation (1.10) is similar to the approximately-optimal filter in (1.7). These authors reported on the reSults of initial investigations into the application of stochastic approximation methods to the determination of recursive filters with desirable asymptotic characteristics. Pearson considered a Special case of the problem, assuming un = 0 and hn(x) = hx. Then, with the hypotheses 0° co 2 [f(x) ' f(Y)‘ 5 ‘x ‘ Y‘a 2 an = ”a E a < w, a Stochastic n=1 n=1 “ approximation convergence theorem was applied showing that E(;] - xn)2 a O as n a m. The last two hypotheses are identical with T3 and T4 of Sec. 1.2.6. The Lipschitz condition on f(x) 16 was a sufficient restriction on the message dynamics to allow use of the convergence theorem from stochastic approximation. Recently Wolverton [22], using the same hypotheses, proved convergence with probability one. Albert and Gardner [20] considered the general filtering pro- blem. Their hypotheses are lengthy and provide no motivation for this discussion. The important point is the complete reliance of their convergence proofs on techniques developed in stochastic approximation convergence proofs. Their main conclusion was: lim sup E(;$ - Xn)2 3 known bound. Two recent theses, one in pattern recognition, the other in system identification, illustrate the possibilities of using stochastic approximation theory in developing approximately-optimal convergent algorithms. Blaydon [23] is concerned with a pattern recognition problem in which a probability density function, specified in terms of unknown parameters,is to be learned. He derives a recursive estimation algorithm via a minimum—mean-Square criteria and shows convergence to the true parameter value with a stochastic approx- imation theorem. In system identification, Donoghue [24] attempts to formulate an algorithm for learning unknown system parameters from noisy observations Of the input and Output. He approaches the estimation problem from a Bayes-Optimal point of view and eventually derives an approximate solution. A stochastic approximation result is applied to show convergence Of the algorithm. 1.2.8 Continuous-Time Stochastic Approximation Methods The stochastic approximation method has proved useful in studying the asymptotic behavior of discrete~time filters. The 17 extensions of the method to continuous-time processes available in the literature are examined in this section. Only a few investi- gations have been made into convergent, sequential algorithms for continuous-time parameter estimation problems. The major difference between the formulation of the observation process in Sec. 1.1 and the models available in the literature is that Sec. 1.1 places more’ stringent restrictions on the correlation properties of the obser- vation noise than the models in the literature. Driml and Nedoma [25] and others [26] only assume the noise is stationary and ergodic, while Sakrison [27] adds a complicated condition on the spread of the correlation function. In contrast is the white noise assumption in (1.2). Initially, the investigations in the literature appear appli- cable tO the parameter estimation study needed for the continuous- time filtering problem. However, detailed analyses of the theorems indicate that the proposed algorithms and methods Of proof are very specialized and not applicable to the general filtering problem. Sakrison [27] and others [26] postulate a structure for the Obser- vation process which does not resemble (1.2) and their entire de- velopment requires this special structure. Only Driml and Nedoma have a formulation compatible with (1.2). They assume very re- strictive estimation algorithms and these Special forms fit neatly into a convergence proof relying on the Law of Large Numbers. This method of proof cannot be extended to the algorithms needed for filtering. All the continuous-time, stochastic approximation papers mentioned above rely principally on the Law of Large Numbers for 18 their convergence proofs. This appears to be the only method avail- able for treating general, non-white, observation noise; to extend this approach to more meaningful estimation algorithms may require generalizations or extensions of ergodic theory. Figure 1 summarizes approaches to discrete-time and con- tinuous-time filtering. 1.3 Thesis Objectives The emphasis of the research reported in this thesis is on formulating an approach to the non-linear filtering problem of (1.2) by which asymptotic convergence of filtering algorithms can be exhibited and on demonstrating useful analysis techniques. In a sense, the discrete-time approach to asymptotically-stable, recur- sive algorithms will be generalized to the continuous-time filtering problem. Investigators Of discrete-time problems relied upon stochastic approximation methods to prove their convergence results but sto- chastic approximation methods have not been developed for the con- tinuous-time filtering problem. The first major objective Of this thesis will be to state and prove convergence theorems analogous to theorems from the discrete-time stochastic approximation literature. After developing an approach to continuous-time stochastic approximation, the second objective will be to examine a class Of sequential solutions for the filtering problem defined in Sec. 1.1. Since the general problem is very difficult to study, this thesis will concentrate on the scalar, noiseless-message case; i.e. 3(t) = O, m = n = l in (1.1) and (1.2). Under these assumptions, the message may be considered to be a time-varying parameter Bow>om manumuoufiq mo kumessm .H ouswwm l9 maoaaoua wcfiuouafim Homoeow ocoz oeoz ocoz ocoz you muouaww unowuo>eoo Hoawua01>aoumawxoumm< mu .em umwom coHumEfiumo amuoswuwm pow manaawm>o mssuwuowfio ucowuo>aoo ocoz ocoz ocoz mowvsum sow < HwawumOuhaoumswxoumm< maoanoum Hm .om "mmmm wawumuawm kuocom now .oanwaflm>m manmowaaam ammo mowuuomoua oquumexmm ocoz ocoz memosum 30w < ammo: unoocoamoaH nufls manuwuowa< om .mm .mu Homom AcoHumBonuaaa owumOLOOOmv .oaanwm>a mm .om "mama oHnoowaaam ammo cowuwefiumo Houosaumm now nuanmou 36w < mcoz oowvsum HHo3 ammo: uOOOOOQOOOH manufiuowam ucowum>cou ma .om .m~ SH .mH «coz .SH .NH .m "mama ma Mama .GH .NH "mama mumufiaa mOHSmou xcmz .>w:um Ono .muasmou has: I HoEHum0uhaoumaonuaa< cwwmmsoo cmflmmsau cwwmmsww chHumwwumo>cH oufinzuaoz use ouwna ucoc .unovcoaoocH can ucoocoaoocH "oowoz "omwoz "omfioz "omwoz Boanoum Osweumsoaafiucoo oaHHuououumfin waaumuaam 20 specified by the solution to a deterministic differential equation excited by an unknown initial condition. The ideas developed for the stochastic approximation section will be utilized to suggest filtering algorithms whose error variances converge to zero. 1.4 Thesis Outline The discrete-time results for recursive filters with known asymptotic prOperties cited in Secs. 1.2.6 and 1.2.7 were de— veloped for problems where the noise distributions were unknown. The analogous situation in the continuous-time filtering problem is to assume 2211 white noise in (1.1) and (1.2). However, the more general non-Gaussian case Of filtering cannot be treated because the mathematical techniques available for handling non- linear stochastic differential equations are restricted to pro- blems involving Gaussian, white noise. This restriction has not been effectively explained in engineering literature, so Appendix A is devoted to representations of white noise and their relation to stochastic differential equations. The non-linear filtering problem treated in this thesis is defined rigorously in Chapter 2. To supplement later discussions in Chapters 3 and 4, an approximately-optimal filter is derived based on a representation for an Optimum filter. The class of sequential filters, whose asymptotic convergence will be in- vestigated in later chapters, is also defined and discussed. The original contributions of this thesis begin with the statement and proof of two theorems in Chapter 3 similar to theorems from discrete-time Stochastic approximation. The filtering problem is attacked in Chapter 4 and the techniques develOped in Chapter 3 21 are utilized to demonstrate convergence for the class of sequential filters defined in Chapter 2. This thesis marks the first time that stochastic approximation ideas have been applied to the problem of estimating a time-varying parameter described by a differential equation. Accordingly, every aspect of this work is original. The last section of Chapter 4 specifies conditions on (1.1) and (1.2) which guarantee the asymptotic convergence of the approx- imately-optimal filter defined in Chapter 2. These conditions indicate the existence, under certain conditions, of nearly- optimal filters satisfying the requirements of good and useful filters. Chapter 5 examines an example Of a non-linear filtering problem. The performance of the approximately-Optimal filter documented in Chapter 2 is displayed by means of error pro- files generated from a digital computer simulation. The results of the thesis are reviewed in Chapter 6 and conclusions are drawn concerning the application of this approach in demonstrating asymptotic convergence. A number of extensions of these investigations are proposed. CHAPTER 2 A FILTERING PROBLEM AND A CLASS OF SEQUENTIAL SOLUTIONS Chapter 1 outlined the general time-domain filtering problem. This chapter is devoted to rigorously defining the particular pro- blem of concern in this thesis. In subsequent chapters, the emphasis will be on investigations into the asymptotic convergence of a class of sub-optimal, sequential filters; this class is defined and dis- cussed in Sec. 2.3. Chapters 3 and 4 will indicate that this class of sub-optimal filters satisfies two of the three goodness criteria discussed in Chapter 1. In hopes of displaying filters meeting all three performance criteria, Sec. 2.2 derives a nearly-optimal filter to which the convergence results will be applicable. 2.1 The Non-linear Filtering Formulation This thesis examines solutions for a particular case of the general filtering problem defined in Sec. 1.1. Although the signal and Observation processes were defined using conventional engineering models, the type of analysis performed in later chapters necessitates a rigorous mathematical representation for these processes. The reason for the special care being taken to define the models of these processes is to avoid the technical controversies which have plagued investigations of Optimal, continuous-time, non-linear filters [31], [32]. 22 23 The following definition of the scalar "noiseless-message- filtering" problem is a specialization of the general definition given in Sec. 1.1. The signal process is assumed to be the solution to a first-order differential equation: 95351 = f(t,x> dt x(to) = b, an unknown constant (2.1) and f(-,~) is a specified function. The absence Of a noise driver in this equation explains the "noiseless" adjective. The function f(t,-) is assumed to satisfy all sufficient conditions which guar- antee the existence, uniqueness, and continuity of the solution to (2.1). A complete discussion of these conditions is given in Coddington and Levinson [33]. The observed process y(t) retains the structure of Sec. 1.1; y(t) = h(t.X(t)) +V(t) (2.2) where h(°,°) is a known function and v(t) is a zero-mean,Gaussian, white noise process with correlation function Ev(t)v(s) = 5(t-s). The structure of (2.2) is general enough to make y(t) a model of many physical processes. Kailath [34] discusses when (2.2) is a reasonable representation for real-world processes. Appendix A discusses the mathematical complications of de- fining a white noise representation which will provide valid solu- tions for non-linear stochastic differential equations involving this white noise. The appendix indicates the necessity for re- stating the observed process, (2.2), in terms of an independent- increment, Brownian motion process B(t). Applying Ito's stochastic integral when manipulating equations having the form (2.2) eliminates the use of delta functions; Ito calculus provides consistent meanings 24 for all equations. Specifically, z(t) is defined by (2.3) and (2.2) is taken as equivalent to (2.4). z(t) = f:0y(s)ds (2.3) dz(t) = h(t,x(t))dt + dB(t) (2.4) Equation (2.4) is equivalent to (2.2) if (2.4) is divided formally by dt and ggiEl is interpreted as the Gaussian white noise, v(t). Equation (2.4) is interpreted as being equivalent to (2.5), a random integral equation. I: t z(t) - z(to) +~jtoh(s,x(s))ds + [todB(s) (2.5) The first integral in (2.5) is the ordinary Riemann integral, while the second is Ito's stochastic integral. In addition to Appendix A, a recent paper by Wonham [35] briefly surveys the properties of (2.4). Equations (2.1) and (2.4) define models of the signal and observation processes. The formulation of the noiseless message pro- blem may be completed by summarizing the discussion of filter solu- tions in Sec. 1.1. The observed waveform is denoted by {mpg—(il- .0....}=. ds , A filter may be defined as a t t' 0, functional mapping of yt t into a real variable for each t 2 to 3 o The filtering problem is considered solved when a particular mapping has characteristics which satisfy the three goodness criteria of Optimality, asymptotic convergence, and sequential structure. 2.2 An Approximately-Optimal Filter This section develops a filter which is approximately-Optimal and sequential. A proper evaluation Of any approximation requires 25 the optimal solution to the filtering problem. It is important to note that, when considering an optimal solution to the filtering problem, a probability distribution will be assigned to the initial condition on the message process, x(to) = b. Since optimal filters are concerned, at least in part, with accurate estimates of the message during the transient portion of the filteré' time responses, it is mandatory that all moments of the initial condition be known. A precise development of an optimal filter is contained in the papers of Kushner [2] and Kallianpur [13]. The stochastic differential equation for the expectation of any function, g(t,x(t)), of the message, x(t), and the time, t, conditioned on the Observed waveform z is developed in these papers; conditioning on 2 t,t t,t 0 o is equivalent to conditioning on y by (2.3). Letting t,t 0 L8(t.X(t)) = f(t.X(t))é(t,X(t)), where g(t.X) = g;’8(t.X). this stochastic differential equation is d§ €§dc + [éh]tdz - fitt>dt1 (2.6) where s(t) EIg(t.x(t))\zt ,t] 0 {g(t) = E[f(t,X(t))é(t,X(t))\zt ,t] O éi = Etgtt.x)h>|zt ,.1 O B(t) Ethcc,x>lzt ,.T 0 fi E[x(t)|zt ’t]. 0 When g(t,x) = x, (2.6) becomes da(t) = f(t)dt + [§h(c) - R(t)h(t)][dz(t) - h(t)dt] (2.7) 26 with 2(t) the conditional mean. This thesis will follow the con- vention of calling the differential equation (2.7) the representation of the optimal filter, even though there are other legitimate choices for an "optimal" filter. In general, the optimal filter (2.7) can never be realized since an infinite-dimensional system of Stochastic differential equations must be solved to determine the random functions f(t), 4h(t), and h(t). Appendix C displays this infinite-dimensional system. Approximations must be made to produce a finite set of equa- tions. An approximation scheme is now described. The structure of the resulting filter is very similar to that of the linear Kalman- Bucy filter (Sec. 1.2.1). The conditional mean R(t) is assumed to be in the neighborhood of x(t) so that f(t.x(t)) f(t.%(t)) + f(t,%(t))[x(t) - s1 (2 8) h(t.x(t)) was» +fitt,s(t>>tx(t> - stun . (2.9) It is also assumed that the third conditional moment, E([R(t) - x(t)]3]zt ,t) é 0. (2.10) 0 An examination of these assumptions should provide insight into how well the approximately-optimal filter described in (2.13) and (2.14) will satisfy the goodness criteria R.1, optimality. This approximate filter, to be studied in Chapter 4, is now derived using (2.6) - (2.10). Equation (2.8) shows that f(t) é f(t,s(t)) (2.11) so that (2.7) becomes 27 dx(t) é f(t,x(t))dt + P(t)h(t,%(t))[dz(t) - h(t,x(t))dt] (2.12) where P(t) =1: (t) - x(t)2. The function P(t) can be approximated from (2.6) and Ito's Lemma (Appendix B). Letting g(t,x) = x2 and using (2.6) along with (2.8) - (2.10), /3~ . dx (t) E 2 R(t)f(t,x(t))dt + 2 f(t,fi(t))P(t)dt . . /§ ’72 . + h(t,X(t))[X (t) - X (t)X(t)][dZ(t) - h(t,fi(t))dt]. Applying Ito's lemma, Appendix B, to (2.12) gives dx(t)2 e 2 x(t)f(t,x(t))dt + 2 x(t)h(t,fi(t))P(t)[dz(t) - h(t,x(t))dt] + [h(t,x(t))P(t)]2dt. The differential of P(t) is now approximated by dP(t) = dE:>(t) - s(t)21 s 2 f(t.%(t))P(t>dt - [h(t,R(t))P(t)]2dt A o + h(t,§(t))[x3(t) - x (t)x(t) - 2 R(t)P(t)][dz(t)-h(t,§(t))dt The condition (2.10) can be rewritten as x - 3 x a + 2 a = x - x a - 2 a? 5 0. The simultaneous integration of the following stochastic differential equations provides an approximately-optimal filter; x(t) is the filter output. dfi(t) = f(t,s(c>>dt + P>tdz - h(t.s>dt1; 5Ht.) = E (2.13) dpgt) = dt 2 f(t,x(t))P(t) - [h(t,x(t))P(t)]2 ; P(to) = Var (b) (2.14) V‘ 28 Recalling the equivalence of (2.2) and (2.4) and dividing by dt permits (2.13) to be interpreted as $2451 = mam) + P>[y - h(tamn Retaining (2.13) in its special differential form will emphasis the necessity of using Ito calculus for its solution. Equation (2.14) does not contain any white noise terms and may be treated as an ordinary differential equation. This development is a simplified version of work performed by Bass, Norum, and Schwartz [29]. The approximately-optimal filter (2.13) - (2.14) was derived from a solution which minimized the error variance at each instant in the observation interval. Bellman obtained the same equations by using a least-squares criteria and the theory of invariant imbedding [30]. Friedland and Bernstein also derived this filter as an approx- imate solution to the filtering problem analyzed from the maximum likelihood approach [18]. This filter, besides being a common academic filtering solution, has been applied to numerous real-world problems [19], [36]. Its performance was analyzed in a recent paper by Stear and Schwartz; their computer simulation compared several approximately-optimal filters and showed no distinction between this approximation scheme and other, more nearly optimum filters [37]. It may thus be concluded that this approximate filter is a realistic solution to the optimal filtering problem. 2.3 Sub-Optimal, Sequential Filters The following sequential filters are suggested as solutions to the filtering problem defined in Sec. 2.1: WI‘J is V8 vi NO 9): th in th St ma a1 29 d§(c) = f(t,;(t))dt + a(c,§t t)[dz(t) - h(t,x(t))dt] (2.15) 0, where x(t) is the filter output with an arbitrary initial value x(to). The gain term, a(t,;' ), is a time-varying function of the t ,t o t t = [x(s): to s s s t]. The structure of (2.15) O, is very similar to that of the approximately-optimal filter in (2.13) - output waveform x (2.14). In fact, if a(t,;£ ,t) = P(t)h(t,;(t)) and the initial con- ditions are set equal, the filters are identical. The filtering solutions proposed in (2.15) provide filters which will fulfill two of the requirements for a useful filter; R.2, asymptotic convergence, and R.3, sequential structure. These require- ments can be met even if x(to) is not specified so no prior knowledge is required Of the initial condition x(to) in (2.1). The outstanding feature of (2.15) is the random gain term, a(t,-). This gain term must satisfy conditions guaranteeing con- vergence but, otherwise, has an arbitrary form. It may be possible, within these convergence limitations, to Specify a gain function which would allow the filter to fulfill the optimality requirement R.l. An example of this idea would be to set a(t,;£ t) = P(t)h(t,;(t)). On 0’ the other hand, the discussions in Secs. 1.1 and 1.2.3 indicate that in many applications one may prefer that a(t,-) be deterministic, thus providing the filter with a simple, asymptotically convergent structure . 2.4 Summary Equation (2.15) introduces a class of sequential filters which may include an approximately-Optimal filter. The convergence of algorithms from this class will be the main concern of the remaining 30 chapters. This chapter also introduces the concepts of Ito calculus and stochastic differential equations into the non-linear filtering pro- blem. An understanding of these concepts has enabled investigators to develop an acceptable optimal filter theory. This knowledge and a familiarity with the discrete—time stochastic approximation methods lead to asymptotic convergence studies. CHAPTER 3 CONVERGENT ALGORITHMS FOR PARAMETER ESTIMATION Many of the statistical methods which have been developed for the classic problem of estimating unknown parameters have provided approaches to filtering problems. The literature review in Chapter 1 showed that stochastic approximation schemes, developed for dis- crete-time parameter estimation, suggest ideas for analyzing the asymptotic behavior of the continuous-time filtering algorithms prOposed in Chapter 2. This chapter concentrates on the problem of continuous-time parameter estimation by developing two con- vergence theorems analogous to theorems from discrete-time stochastic approximation. The filtering problem discussed in Sec. 1.1 may be con- sidered equivalent to a problem in estimation theory where the filter's output provides an estimate of the present message value. For this reason, the problem of determining, or estimating, para- meters is a special case of the filtering problem with f(t,x) = 0 in (2.1). Optimum solutions for both filtering and parameter estimation problems are based on similar theorems and procedures [12]. There is, however, a conceptual difference between the two problems. A filter's output attempts to follow the time variations of the message process; when the message is random, the output can never follow the message exactly. The parameter estimation problem, where the message is time-invariant, requires an estimation algorithm 31 32 which eventually determines the SEES value of the unknown constant. It is for this reason that any study of a practical parameter estimation scheme must include an analysis of asymptotic behavior. This chapter analyzes the large-time behavior of the estimator represented by (2.15) with f(t,x) = O. This sub-Optimal estimator is shown to converge asymptotically to the true value of the para- meter. The chapter is broken into three parts. Section 3.1 contains a list of assumptions regulating the behavior of the observations, (2.4), and the gain function in (2.15). Also, two lemmas are de- veloped for later use. In Sec. 3.2 and 3.3, the assumptions are used as hypotheses for two convergence theorems. These theorems show that certain estimators converge to the true parameter with probability one and in the mean-square sense. The theorems are a major result and indicate that the sequential estimators defined in (2.15) behave asymptotically as practical estimators. The convergence theorems do not require any structure for the gain functions, so in the last part of this chapter a specific class of gain functions is prOposed. Corollary 3.3 shows that this class of functions satisfies the assumptions of the main conver- gence theorems and that the approximately-optimal filter derived in Chapter 2 and specialized in this chapter has a gain function which is a member of this class of functions. SO, under certain assumptions on the observation process, an approximately-optimal algorithm is shown to converge to the true parameter value. This is a major result since it implies an estimation scheme which satisfies the three requirements of a useful filter. 33 All aspects of this chapter are new and original. Convergence of the estimator algorithm (2.15), in which the information supplied by the past observations is contained in a(t,xt t), has never been 0, proved for a continuous-time problem. The assumptions given in Sec. 3.1 are directly analogous to the discrete-time assumptions of Albert and Gardner [20] but the proofs of the convergence theorems (NT not ,, follow in a straight-forward manner. 3.1 Preliminary Considerations This section provides the groundwork necessary for deriving the convergence theorems in Sec. 3.2. This basic material includes the formulation of the parameter estimation problem, a description of the estimation schemes being investigated in this chapter, two useful lemmas, and the hypotheses for the convergence theorems. The parameter estimation problem and possible solutions con- sidered in this chapter are defined using equations equivalent to those in Chapter 2. The parameter to be estimates is e and if x(t) = x(to) = e, (2.4) represents noisy observations of this un- known parameter. This observed process is modeled by dz(t) = h(t,e)dt +~dB(t) (2.4) with giSEl = y(t) the observed process and B(t) a Brownian motion process. In the first three sections of this chapter, 9 is not assigned a prior distribution. The class of sub-optimal estimation algorithms proposed in (2.15) with f(t,x) = 0 are considered as the possible estimators for 9: d;(t) = a(r,§{t t)[dz(t) - h(t,x(t))dt] : £(ro) arbitrary (3.1) O, 34 where x(t) is the estimate of e at time t. The estimator error is denoted as e(t) = x(t) - e and, according to (2.4) and (3.1), has the following differential re- presentation. de(t) = a(r,§£t ,t)[h(t,e) - h(r,§(r))]dr + a(t,;t ,t)dB(t) (3.2) o 0 An examination of (3.2) brings out many of the difficulties inherent in a study of theasymptotic convergence of e(t). First of all, (3.2) is a stochastic differential equation containing the differential of Brownian motion. Strict attention must be paid to the properties of Ito calculus in all further derivations. Since (3.2) is non-linear and since a(t,;;o,t), the gain function, is unspecified, an analytical solution for e(t) is impossible. Some description of the behavior of the gain function is mandatory and assumptions must be placed on h(t,x) before any inferences can be made about e(t) and/or e(t)2. The development of convergence theorems for equations such as (3.2) is not obvious. In a survey of the discrete-time stochastic approximation literature, Martingale theory was often used to obtain convergence with probability one [38], [39], [40]. Martingale theory and Ito calculus seem very compatible and both are used in later proofs. A survey of stochastic approximation methods does suggest certain assumptions about h(t,x) and a(t,;[ ,t) which can be used effectively to Specify the asymptotic behavior of the esti- mation error. This background of discrete-time investigations pro- vides a fruitful source of ideas for analyzing (3.2). Before listing the assumptions sufficient for convergence, two lemmas are stated and proved. These lemmas are applied in later 35 proofs both in this chapter and in Chapter 4. Lemma 3.1 Let w(t) be a stochastic process satisfying t w(t) s w(r) +'Irg(s)dB(s) for t,r 2 to _(3.3) where B(s) is a Browian motion process. Assume that At is a c-algebra, that w(t) is measurable on At’ and that At(: F{[to,t], 3(5)}, the c-algebra generated by B(s), tO s s s t. Also assume that g(t), a random function, is measurable on F[[to,t], 3(3)]. Then w(t) is a positive super-martingale.2 Proof. Apply the expectation relative to Ar to each side of (3.3). E{w(t)|Ar] s w(r) + E[I:g(s)dB(s)‘Ar] (3.4) For w(t) to be a super-martingale the second term on the right side of the inequality must be zero. A property of conditional expecta- tions3 shows that Eif:g(s)dB(s)|Ar] = r(r(§:g(s)do(s)|r([ro,r], B11IArI Since g(s) is measurable on F{[to,s], B(u)] the conditional expectation satisfies property B.2; i.e. t . EtfrgdnIFttto.r1, B(u)}] = o (3.4a) This completes Lemma 3.1. t Lemma 3.2 If m(t) >'0 and M(t) = c + [t m(s)ds with c < m arbitrary and if M(t) « m as t a m then It m(s) ds diverges if a s 1 t0 M(s)d converges if a > 1 Consult Appendix B for discussion of c-algebras and martingales. Doob [41], p. 37. 36 Proof. Assume a s 1; there exists a T0 > tO such that for t > T M(t) z 1. Now for a 2 TO and b 2 O and because of the monotone behavior of M(t), Ia+b m(s)ds 2 l ra+b l a+b a m(s)” M(a-I-b)a a m(swssza m(sws The right hand side can be integrated to obtain a+b m(s)ds 2 M(a-I-b) - M(a) = 1 _ M(a) fa M(s)d M(a+b) M(a+b) The monotone property of M(t) shows that for every 3 2 To, there exists a T 2 T such that for a+b > T , 1 - EXEZ--—'> % o M(a+b) This contradicts the necessary Cauchy condition [42] for the con- vergence of I: m(s)ds O M(S)oz Assume a > 1, let a = l + 5 where 6 > O, examine the integral J’EO 53—8- [M(s)]-6ds = M(t)-6 - M(tO)-6: - M(ro)’5 < co Evaluate the derivative in the integral. d -6 d 1 ds d8 [[c +-I: m(u)du]g] o . 6[c +]‘: m(o)do]5'lm(s) O [c +-I: m(u)du]26 O _ 5 m(s) = _ 6 m(s) [c +]‘: minnow)”6 M(s)a O O 3 37 So '6 f: fl§-qus~§£§3— as t-oao. o M(s) Lemma 3.2 is complete. The following assumptions are made for (2.4) and the gain function in (3.1). These assumptions are the hypotheses for the main convergence theorems proved in Secs. 3.2 and 3.3 and are analogous to the assumptions Albert and Gardner [20] formulated for their dis- crete-time observation equation and gain sequences. Assumption 1: For each value of t, h(t,x) is monotone and differ- entiable with respect to x. Assumption 2: The function a(t,x; t) is measurable on F[[to,t], ;(s)], o, the a-algebra generated by x(s), where to S s s t. The space of all real continuous functions defined on the time interval [to,t] is denoted by CE; h(t,x) and a(t,x[ t) are assumed to satisfy all 0 0’ _ conditions necessary to guarantee the continuity of x(t). (Appendix D). t Assumption 3: For each t, the Sign of a(t,X) for all X 6 Ct is o constant and equal to the sign of h(t,x) = §;'h(t,x). Assumption 4: I: b(t)inf|a(t)]dt = m where b(t) = inf]h(t,x)\ o x and iniIa(r)\ = inf \a(r,X)I t XEC to Assumption 5: I: SUp]a(t)]2dt < m where sup‘a(t)| = supt ]a(t,X)‘ o XGCt o 3.2 The Main Convergence Theorem The theorem to be Stated and proved in this section shows that the output of the estimation algorithm, x(t) in (3.1), converges, as t increases, to the value of the unknown parameter. A basic 38 martingale theorem provides both convergence with probability one and in the mean-square sense. Before the actual statement of Theorem 3.1, an equation is developed for the square of the estimation error using some of the assumptions listed in Sec. 3.1. The following simplified notation will be used for the remainder of this chapter: a(t,;; t) = a(t) 0’ Equation (3.2) exhibits the estimation error. Assumption 2 and the development in Appendix D show that a(t) is measurable on F[[to,t], B(s)], so, if Z(t,e) = e2, Ito's lemma (Theorem B.l) can be applied to (3.2) producing a differential equation for e(t)2. de(t)2 = 2 e(t)a(t)[h(t,9) - h(t,;(t))]dt + a(t)2dt + 2 e(t)a(t)dB(t) This equation is equivalent to (3.5). e(t)2 = e(r)2 +1]: 2 e(s)a(s)[h(s,e) - h(s,§(s))jds +-]:a(s)2ds +‘f: 2 e(s)a(s)dB(S) ; t,r 2 t0 (3.5) The mean value theorem [42] gives: h(s,§') = We) +hts,o(s>)t§(s> - e] e s ¢(s) s §(s) if §(s) 2 e where _ _ x(s) s ¢(s) s e if x(s) s e Assumption 3 implies that: a(s)h) = IaIIfi(s.o>I >. o Combining these results with (3.5) gives 39 e(t)2 s e(r)2 - 2 j:\a(s)I|h(x,¢(s))|e(s)2ds an]: sup|a(s)|2ds + 2 f:e(s)a(s)d3(s) (3.6) Equation (3.6) will be a basic equation both in the following proof and in the proof of Theorem 3.2 in Sec. 3.3. Theorem 3.1 Under Assumptions 1-5, the process x(t) defined by (3.1) converges to 9 with probability one and in the mean square sense: lim ;(t) = e w.P.l. and 1im E(x(t) - e)2 = O. t-co t—uan Proof. Equation (3.6) provides an inequality for the squared error. The second term on the right hand side of (3.6) is negative, thus establishing a simple and very convenient inequality for e(t)2. e(t)2 s e(r)2 +j‘t sop|a(s)|2os + 2 jte(s)a(s)d3(s) : r t 2 t (3.7) r r ’ 0 Defining w(t) = e(t)2 +-f: sup‘a(s)]2ds (3.8) and substituting into (3.7), w(t) s w(r) + 2 f:e(s)a(s)d3(s) (3.9) A review of the proof up to this point emphasizes the importance of Assumptions 1 through 3 in obtaining a transformation of e(t)2 which satisfies an equation of the form (3.9). It has already been pointed out that a(t) is measurable on FI[to,t], B(s)] and Since ;(t) is measurable on this o-algebra, e(t)a(t) is also. The error squared is measurable on F{[to,t], E(s)} and it follows directly that F{[to,t], e(s)2]<: F[[to,t], B(s)]. SO when At is defined equal to F{[to,t], e(s)2], Lemma 3.1 indicates that w(t) is a 40 positive super-martingale. With the process w(t) being a super-martingale, the function Ew(t) is a non-increasing function of t and since it is bounded below it follows that4 lim Ew(t) = Eg = K.< m (3.10) 64: A combination of (3.8), Assumption 5, and (3.10) shows that , 2 11m Ee(t) = EC (3-11) t—m Returning to (3.6) and noting that inf|a(s)]b(s) s ‘a(s)‘lh(s,¢))| allows another weakening of the inequality on e(t)2. e(t)2 s e(r)2 - 2 I: inf‘a(s)\b(s)e(s)2ds + [: sup|a(s)|2ds + 2 [Ee(s)a(s)ds(s) (3.12) The expectation of each side of (3.12) is taken and Fubini's theorem is applied to the second term on the right hand side. Ee(t)2 s Ee(r)2 - 2 I: inf]a(s)]b(s)Ee(s)2ds +-f: sup‘a(s)‘2ds + 2 s[f:e(s)a(s)d3(s)] (3.13) The proof Of Lemma 3.1, specifically the part deriving (3.4a), t can be reapplied to Show E[Ire(s)a(s)dB(s)] = 0- (3-14) Applying (3.14), (3.13) becomes 2 2 t . 2 t 2 Ee(t) s Ee(r) - 2 fr inr|a(s)\h(s)Ee(s) ds +-jr sop\a(s)\ ds (3.15) The application Of (3.11) and Assumption 5 to (3.15) indicates the necessity of (3.16). Doob [41], chapter 7. 41 2 I: inf‘a(s)]b(s)Ee(s)2ds < m for t 2 to (3.16) If (3.16), (3.11), and Assumption 4 are all to be satisfied, it is necessary that 2 lim Ee(t) = 0, so the mean-square convergence is assured. t—m The super-martingale inequality (Theorem B.3) shows conver- gence with probability one. The inequality gives Prob{ sup w(u) > e] < [Ee(r)2 +‘r: sup|a(s)‘2ds]%' for each t 2 r. t2u2r (3. 17) The definition in (3.8) provides the following relationship Prob[ sup e(u)2 > e} < Probf sup w(u) > e]. (3.18) t2u2r t2u2r Mean-square convergence and Assumption 5 are now combined with (3.17) and (3.18). Given an e > O and 5 > 0, there exists a R > 0 such that for every t > R Prob[e(u)2 > e : t 2 u 2 R] < 6 thus lim e(t) = 0 W.P.l.. t—am This completes the proof of Theorem 3.1. 3.3 An Alternate Convergence Proof The proof of Theorem 3.1 was structured with convergence with probability one in mind; mean-square convergence was a by-product. The mean-square result was unexpected since, in all known stochastic approximation theories, demonstrations of mean-square and of prob- ability-one convergence have required separate theorems. The theorem 42 in this section demonstrates another way to prove the mean-square pro- perty. The techniques used in its proof will be applicable to part of the filtering investigation in Chapter 4. Theorem 3.2 Under Assumptions 1-5 the process x(t) defined in (3.1) converges in the mean-square sense to 6; lim E(;(t) - 9)2 = 0. .nggf, The inequality in (3.6) is weakeneth: applying the fact that infla(s)‘b(s) s |a(s)“h(s,¢(s))‘. The expectation of each side is taken to give 2 2 t . 2 t 2 Be (t) g Ee (r) - 2 fr 1nf‘a(s)\b(s)Ee (s)ds + fr sup‘a(s)] ds. (3.19) The expectation of the last term in (3.6) is zero for the reasons given in Theorem 3.1. The remainder of this proof relys on 2 . the construction of a function which dominates Ee (t). The funct1on e(t) is defined as the solution to .%§$£l = - 2 b(t)iof|a(t)|e(t) + sop|a(t)|2 : C(to) = Ee2(to) (3.20) Equation (3.20) is a linear equation having the following solution: e(t) = D(t t )Ee2(t ) +-It D(t s)sup|a(s)]2ds (3.21) ’ o O to ’ In (3.21), D(t,s) = exp[- 2 f:b(u)int\a(o)]doj (3.22) Equation (3.20) is now written as an integral equation. t t 2 = - ’ d + d ° . e(t) c(r) 2 [rb(s)1nf|a(s)]c(s) s Irsup‘a(s)] s, t,r 2 to (3 23) Substracting (3.23) from (3.19), [Ee2(t) - e(t)] + 2 f:b(s)infla(s)I[Eez(s) - c(s)]ds s [Ee2(r) - C1. (3.24) 43 2 Equation (3.24) indicates that c(t) 2 Ee (t). To verify this, suppose it is not true. Then there exists a time T1 such that Ee2(T1) - c(T1) = 0. If r = T1 and t = T1 + 5 where 6 is the amount of time Ee2(t) - e(t) is positive, then 5 > 0 since Ee2(t) and e(t) are continuous and (3.24) provides a contradiction. Equation (3.21) shows that Ee2(t) s D(t,to)Ee2(to) +-I:0D(t,s)sup]a(s)\2ds (3.25) The estimator defined in (3.1) converges to 9 in the mean- square sense when the two terms on the right hand side of (3.25) con- verge to zero. The first term is easily handled by an application of Assumption 4 to D(t,to) so that D(t,to) a O as t a m. (3.26) Assumption 5 and (3.26) must both be used on the second term. Arbitrarily selecting e > 0, a T > 0 can be found so that t IT sup‘a(s)‘2ds < 3/2 and D(t,to) s 1 for t > T. Writing the integral term under investigation as a sum Shows that I: D(t,S)suP‘a(S)‘2ds s fésupia(s)|2ds +-IEOD(t,s)sup‘a(s)‘2ds for t > T- o . d . d = e The constant 32 1s ef1ne as 32 ZIT su ‘a(s)‘2ds top There exists a T2 > T such that D(t,s) < 32 for t > T2 and s < T. Therefore, for every e > 0 there exists a T2 > 0 such that for t > T2 , e I: sup‘a(s)|2ds t 2 a 0 ._ D(t,s)sup a(s) ds 5 + - , Ito ' ' 2 2 I: Sup‘a(s)‘2ds e o 44 which shows that lim Ee2(t) = O. t-m This completes the proof. 3.4 Particular Gain Functions The hypotheses in Sec. 3.1 guarantee the convergence to e of the estimator defined in (3.1). However, they are very general. This section lists a Specific class of gain functions which satisfy the five assumptions of Sec. 3.1. Before discussing the gains, a few assumptions about h(t,x) are listed which are a combination of one of the assumptions in Sec. 3.1 and two new ones. These new assumptions can replace those of Sec. 3.1 when the gains of this section are used in (3.1). Assumption 81: The function h(t,x) is monotone and differentiable in x for each t. t O Assumption 82: Sup ‘§%E% S Kt< m where B(t) = sup]h(tsx)i Cost/w X , , 2 , 2 t 2 Assumpt1on S3: 11m B(t) =tm Wlth B(t) = It b(s) ds b4: 0 Qggollary 3.3 With Assumptions Sl - S3, gain functions a(t,;L t) 0’ having the form of (3.27) satisfy Assumptions 1-5 of Sec. 3.1. _. 2 - _. a(t,Xt ,t) sgn h(t,x(m -' o a(t,X ) = 3 (3'27) t ,t - t ‘- 2 o o‘(t,xt ’t)[c +'ft y(s,xt ,5) ds] 0 o o where the three functions g(t,xto,t), a(t,xt0,t), and y(t,xto,t) are less than g(t), are greater than, b(t)' and are measurable on F[[to,t], x(s)]. Proof. Assumptions 1-3 are satisfied by inspection. The integrand of Assumption 4 is now examined. 45 3 3 b(t)inf‘a(t)‘ 2 b(tlt 2 2 3 b(t) t 2 g(t)[c + ft g(s) ds] K h(t)[o'+j‘t b(s) ds] 0 O This shows that 2 I: b(s)inf\a(s)|ds 2 15-]: 13(8) 8 2 ds. 0 K o [c +-It b(u) du] 0 Lemma 3.2 and Assumption S3 indicate that Assumption 4 is satisfied. Similarily, 2 g(t)2 2 K4b(t)4 sup‘a(t)‘ S t 2 2 t 2 2 b(t)[c + ft b(s) ds] b(t) [c +J‘t b(s) ds] 0 o which shows that t 2 4 t b(s)2 Itosup‘a(s)‘ ds 5 K I ds . to [c + f: b(u)2du]2 O Assumption 5 also follows from Assumption 83 and Lemma 3.2. One gain function which satisfies (3.27) is of unusual inter- est. This function is m; )= h(t,x(tn , _ (3.28) to" W? +j': h(s.x)2ds1 0 where P = var e. For the discussion of this gain and the correspond- ing estimator, e, the unknown constant, is assigned a prior distribu- tion and the requirement x(to) = E(e) is made for (3.1). To better understand the significance of (3.28) P(t) is introduced. 1 P(t) = _ . _ [P 1 +-J: h(s,x(s))2ds] o Differentiating, 46 dP(t) _ g_ 1 dt dt [Pu1 +-I: h(s,;(s))2ds] o b (mignz - - 2 _ . _. = ' [h(t,x(t))P(t) - (3-29) [P 1 +-I: h(s,x(s))2ds]2 ] 0 Now, (3.29) is the same as (2.14) when f(t,x) = 0; i.e.,when para- meter estimation is considered as a filtering problem. The algorithm specified when the gain function (3.28) is placed in (3.1) defines an approximately-Optimal estimator of 9. Theorem 3.1 and Corollary 3.3 show that this estimator asymptotically converges to 9 with prob- ability one. Thus, the algorithm provides an estimator that qualifies as a useful algorithm by satisfying in some sense, all three perfor- mance criteria in Chapter 1. 3.5 Summary The continuous-time, parameter estimation problem has been considered in this chapter. The main results, Theorems 3.1 and 3.2, show that the sub-optimal estimators defined by (3.1) converge to the true value Of the unknown parameter with probability one and in the mean-square sense. A secondary, but still Significant, result was obtained for the class of gain functions in (3.27). Corollary 3.3 demonstrates that this class of gains satisfies the hypotheses of Theorems 3.1 and 3.2. In an interesting result, this class is shown to contain function (3.28). When algorithm (3.1) employs the gain in (3.28), the algorithm is identical to the approximately-optimal scheme in (2.13)-(2.14). In other words, the approximately-optimal filter (2.13)-(2.14), if used for parameter estimation, qualifies as a useful algorithm satisfying requirements R.l, R.2, and R.3. CHAPTER 4 CONVERGENCE INVESTIGATIONS FOR THE FILTERING PROBLEM In Chapter 3, convergence theorems were developed for a class of sub-optimal estimators used for solving the continuous-time para- meter estimation problem. The techniques used to prove these theorems can be applied in analyzing the behavior of certain sequential, sub- optimal, continuous-time filters. The main objective of this chapter is to formulate convergence theorems for the class of sub-optimal filters represented mathematically by (2.15). Section 4.1 contains a list of the assumptions which are sufficient for convergence; basically, they allow the analytical tools and ideas described in Chapter 3 to be applied to the task of demonstrating the asymptotic convergence of time-domain filters. The two convergence theorems are found in Sec. 4.2 and show that the output of the sub-optimal filter, (2.15), converges to the message as time becomes infinite. The convergence is both in the mean-Square sense and with probability one. Section 4.3 shows that, except for the most general assumptions on the message and observation models, the convergence theorems de- veloped for the non-linear filtering problem, can be applied to the .1— During the remainder of the thesis the phrases "asymptotic con- vergence of filters," "filter is shown to converge", etc. means that the difference between the filter output and message process converges to zero, in some probabilistic sense, as time increases indefinitely. 47 48 linear filtering problem and that the Kalman-Bucy filter is asymp- totically convergent. This chapter concludes with Sec. 4.4 where properties of the message and observations models which guarantee the convergence of the approximately-optimal filtering solution developed in Chapter 2 are listed. The convergence ideas presented in this thesis have demonstrated that a non-linear filter presently being applied to real-world problems [30], [37], [7] is asymptotically convergent. This filter meets the three requirements enumerated in Chapter 1. The demonstration of a convergent, approximately-optimal filter illustrates only one type of result which can be realized by using the mathematical techniques considered in this chapter. The convergence theorems exhibit the applicability of Ito calculus and martingale theory to the error analysis of sequential filtering algorithms. The approach, theoretical proofs, and conclusions have originated with this thesis investigation. 4.1 Assumptions and Preliminagy Derivations This section lists the assumptions for the convergence theorems to be proved in Sec. 4.2 and makes preliminary calculations on the error equation. Section 3.1 dealt with the parameter estimation problem. Assumptions 1 through 5 concerned the behavior of the non-linear function in the observation process, h(t,x) in (2.4), and the behavior of the gain function in the sub-optimal estimator, (3.1). Assumptions of this type are retained in the filtering solutions suggested in this chapter. In addition, conditions are placed on the non-linear function defining the message process, f(t,x) in 49 (2.1). Assumptions are also made involving a combination of the gain function, (2.15), and both h(t,x) and f(t,x). All assumptions used in Theorems 4.1 and 4.2 are listed below. The first two assumptions are the same as 1, 2 and 3 in Chapter 3. Assumption B.1: h(t,x) is monotone and differentiable in x for every t. Assumption B.2; a(t,x; ) has the same sign as h(t,x) for every ,t O t and is measurable on F{[to,t], x(s)}, the o-algebra generated by t;(s). tO s s s t]. The remainder of the assumptions are given in terms of the following Simplified notation. C(t) = SUp f(t,x) X b(t) = in£\h(t,x)| X ¢(t,to) = exptfiocds1 a(t) = a(t,;; ) ,t o inf|a(t)] = inft |a(t,X)‘ XEC to sup‘a(t)] = supt |a(t,X)] XECt o t . . . where C 13 the space of cont1nuous functions on [to,t]. o Assumption B.3: f(t,x) is differentiable in x for every t. 2 Assumption 3.4: a) lim sup ¢ (t,to) = K.< m Tam tZT b) I: b(S)inf‘a(s)‘ds = m o c) I: 52(to,t)sup‘a(t)\2dt < a o 50 ¢(t t )2 Assumption B.4': a) 1im ‘ 9 t t—m 8Xp[2 It = 0 b(s)inf|a(s)‘ds] o b) f:osup‘a(t)]2dt < a Before the two convergence theorems are stated, a basic equa- tion involving the filter error is developed from (2.1) and (2.4). The error at time t is denoted by e(t) = x(t) - x(t). The follow- ing differential equation can be written from (2.1), (2.4), and (2.15). de(t) = [f(t,;(t)) - f(t.x]dc + a(c>th(t.x(t)) - h>1dc + a(t)dB(t) 2 If Z(t,e) = e , then Ito's lemma (Appendix B) gives an equa- tion for e(t)2: de(t)2 = 2 e(t)[f(t,;(t)) - f(t,x(t))]dt + 2 e(t)a(t)[h(t,x(t)) - h(t,;(t))]dt + a(t)2dt + 2 e(t)a(t)dB(t) The mean-value theorem gives f(t,xm) = f(c.§ + f(t,a(t))[x(t) - §3 h(t.§(t>> + h(t,e>[x - Em) h(t,X(t)) where a(t) and e(t) are both in between x(t) and x(t). The equation for the error Squared becomes: 2 2 ' . 2 de(t) = 2 e(t) [f(t,g(t)) - a(t)h(t,9(t))]dt + a(t) + 2e(t)a(t)dB(t) (4.1) Introducing a new variable y(t) = ¢(to,t)2e(t)2, (4.1) leads to the following differential equation. 51 dy(t) e(t:o.t)2de(t:)2 + e(t)2 g; e(to.t)2dt 2 e2o2[f(c.o) - e(t)]dc - 2 o2a)e2dt + o2a2dt + 2 otto.c>2e(t>adB supIaI ds +fro ea(s>dB(s> Letting w(t) = y(t) +-f:¢(to,8)zsup‘a(s)]2ds gives t 2 w(t) s w(r) + frz ¢(to,s) e(s)a(s)dB(s) (4.3) Equation (4.3) satisfies all of the hypotheses of Lemma 3.1, so w(t) is a positive super-martingale. The previous chapter stated that the expected value of a super-martingale is non-increas- ing so that lim Ew(t) = E; = Kl< m t-cco and lim Ey(t) = Eg t—m Returning to (4.2) and using the inequality inf]a(t)‘b(t) s ‘a(t)“h(t,9(t))‘ provides another inequality for y(t)- y(t) s y(r) - 2 f:inf]a(s)|b(s)y(s)ds +-I:¢(to,s)zsup‘a(s)|2ds +-I:¢(to,s)2e(s)a(s)dB(s) (4.4) 53 The expectation operator is applied to (4.4) and the fact, derived in Lemma 3.1, that E I:¢(to,s)2e(s)a(s)dB(s) = 0 is employed to obtain an equation for Ey(t). Ey(t) s Ey(r) - 2 Itinf|a(s)‘b(s)Ey(s)ds +.[t¢(t S>23up‘a(s)‘2ds r r o’ Combining Assumption B.4(c) and the convergence of Ey(t), fjin£|a(s)|b(s)sy(s)ds < o for every r 2 to. (4.5) If (4.5) is to be consistent with Assumption B.4(b), . . 2 2 11m Ey(t) = 11m ¢(t ,t) e(t) = O. t-a t—aco o Since,from Assumption B.4(a), lim sup ¢(t,to)2 = K.< m, the function 2 ¢(to,t)2 = 1/¢(t,to) must be non-zero for large t; thus , 2 11m E e(t) = 0. t—uoo The basic super-martingale inequality (Appendix B) demonstrates probability-one convergence. The process w(t) has been shown to be a positive super-martingale so Prob[ sup w(u) > e}‘< EEX£2 r2u2t e The process w(t) depends on y(t) in such a way that Prob{ sup y(u) > e] < Prob[ sup w(u) > e] r2u2t r2u2t Combining Assumption B.4(c) and the mean-square convergence result, lim y(t) = O w.P.l. t-too 54 It has already been pointed out that ¢(to,t)2 > 0 for large t so that lim e(t) = 0 w.P.l. t-Ieo The proof of Theorem 4.1 relied on the reasonable assumption that lim sup ¢(t,to)2 = Kw< a. In Theorem 4.2, ¢(t,to)2 is allowed to grow large as t approaches a. This theorem differs from Theorem 4.1 both in its list of assumptions (B.4 is replaced by B.4') and in its conclusions since only mean-square convergence is demonstrated. Theorem 4;; Under Assumption B.l through B.3 and B.4', the output x(t) of the filter represented by (2.15) converges in the mean-square sense to x(t): 1im E(x(t) - x(t))2 = o. t-a Proof. The inequality inf|a(s)|b(s) s Ia(s)||h(s,e(s))\ and the expectation of (4.2) show that Ey(t) s Ey(r) - [:2 ianaIbEyds +J§o to o o it follows from the hypotheses that 2 Mme) 2 I:0¢(s,to)2H(s) ds OsP(t)s —.0 as t-ooo- This theorem is analogous to the multi-dimensional discrete-time linear filtering result by Aoki [l6] and Sorenson [43]. The object of this section is to see if the non-linear con- vergence Theorems 4.1 and 4.2 imply the convergence of the Kalman- Bucy filter from the hypotheses of Theorem 4.3. The function ¢(t,to)2 = (y(t,to)2 (since C(t) = P(t)) may become unbounded so Theorem 4.2 is checked to see if the gain of the Kalman-Bucy filter satisfies its hypotheses. Assumption 8.1 through B.3 are satisfied by inspection, so only Assumption B.4' need be examined. The gain function for the Kalman-Bucy filter is 58 o(t.to)2H(t) [P-1 +-I:O¢(s,to)2H(s)2ds] a(t) = P(t)H(t) = Define the function 2 e(t.to) tn(t) = t exp[2 ft b(s)inf|a(s)|ds] o where H2o2 b(t)inf a(t) = _ . [P 1 +‘[: ¢(s,to)2H(s)2ds] O Letting the denominator of m(t) be Hts>2o2 ds 1 t u(t) = - exp 2 I P t "1 S 2 2 O[P +‘ft ¢(u,to) H(u) du] 0 produces the differential equation duct) _ 2 o dt [P-1 +-I:o¢(s,to)2H(s)2ds] . -1. . u P The function w(t) = [P-1 +‘[: ¢(S,to)2H(s)2ds]2 satisfies the o differential equation, dw(t) = 2 ¢(tato)2H(t)2w(t) dt [P-1 +.[: ¢(s,to)2H(s)2ds] o "Olr—I ; W(to) = Since w(t) and u(t) satisfy the same linear differential equation with identical initial conditions, they are equal; i.e., w(t) = u(t). Now, to examine Assumption B.4', this equality is used in m(t). 2 2 o(t.to) ¢(t.to) m(t) = = _. exp[2 f: b(s)inf|a(s)‘ds] P[P 1 +‘[: ¢(s,to)2H(S)2ds]2 o o 59 The hypothesis in Theorem 4.3 is sufficient for o4H2 2 ’ O sup]a(t)‘ dt =. _ Ito J":0 [P 1 +.[: ¢(s,to)2H(s)2ds]2 o l 4 2 s _2 I: ¢(t,to) H(t) dt P o , 2 and w1th ¢(t,to) < l for t > some T > t0 2 1 2 2 1 T 4 2 I: sup\a(t)| dt s gfg‘fi;¢(t,to) H(t) dt + ;:2«fto¢(t’to) H(t) dt 0 l T 4 2 +—-——. 5 K1 P_2 Ito¢(t,to) H(t) dt. Theorem 4.1 handles case 1. First, Assumption B.4(c) requires I: ¢(to,s)2sup‘a(s)|2ds be integrable. This follows directly when pazt (b) of case 1 and Lemma 3.2 are applied to the integral, o(t.to)2H(t)2 dt. -1 t 2 2 2 o [P +«fto¢(s’to) H(s) ds] co 2 2 [to¢(to,t) sup\a(t)\ dt = f: Furthermore, for Assumption B.4(b), part b of case 1 and Lemma 3.2 show that 2 2 m(8.t0) H(S) o [2'1 +‘f: ¢(u,to)2H(u)2ds] O G) f:Ob(s)inf|a(s)\ds =‘ft dS =mo 61 Consequently, Theorem 4.1 and 4.2 Show that the optimal, Kalman-Bucy solution to the linear filtering problem is asymptotically convergent in the mean-square sense if (4.9) and (4.10) satisfy the hypotheses of Theorem 4.3 and lim sup ¢(t,to)2 < m. This is the first time that restrictions have been placed on (4.9) and (4.10) which guarantee the convergence of the optimal linear filter with prob- ability one. 4.4 A Class of Gain Functions Guaranteeing Convergence This section returns to the filters represented by (2.15) which were proposed as solutions to the non-linear filtering problem. In Sec. 4.2, these sub-optimal filters were shown to converge when the models of the message and observation processes, (2.1) and (2.4), and the gain function, a(t,; ,t), satisfied Assumptions 13.1 - B.3 and B.4 or B.4'. The gain fugction does not have any fixed Structure and only has to fulfill conditions such as: I: b(t)inf]a(t)‘dt = m and I: ¢(to,t)zsup|a(t)]2dt < m. In this section, a class of gain functiogs is formulated and more assumptions are made about (2.1) and (2.4). The assumptions and Theorem 4.1 imply that the sub-optimal filters converge with probability one and in the mean-square sense. The following assumptions are made about (2.1) and (2.4): Assumption B.l, B.2, B.3, and Assumption C.l: sup §%%%-< m where g(t) = sup]h(t,x)‘ t x Assumption C.2: Let C(t) 2 f(t,x) 2 D(t) for t 2 t0 and for every x and define ¢(t,t0) exp[I:0G(s)ds] w(t,to) exp[f:0D(s)ds]; ‘3’! _ '2 “1‘qu k. ' ". 62 2 o(t.t0) then sup ._______§.< m and lim sup ¢(t,t )2 = K.< m . t N(t,to) O Assumption C.3: I: ¢(s,to)2b(s)2ds = m 0 Corollary 4.4 If (2.1) and (2.4) satisfy Assumptions B.l - B.3 and C.1 - C.3 and if the gain function a(t,x; t) has the form 0, gi(t’;ro,r>°1> ]a(t,;£ ,t)] - (4.13) _ t ._ '— o [a +Ift g2(s,xt ,S)c2(s,x(s))ds] o o where _ 2 . ,t), g2 _) O O -2- 2 ._ I(t.to) s g1(t.xt b(t) s .1(r,;(t)), cztc,§(c>> s g(t) then a(t,;£ t), (2.1) and (2.4) satisfy Assumption B.4. 3 Proof. Assumption B.4(a) is satisfied by inspection. Examining the integrand of the integral in Assumption B.4(b), y(c.to>2b2 [a +‘[: ¢(s,to)2g(s)2ds] O b(t)inf'a(t)] 2 By Assumption 0.2, there exists a K and K such that for 1 2 every t ¢(tt)2 ’ o t ls—jskl and lsmb(t)sx2 I These inequalities weaken the lower bound on the integrand to e(t.to)2b(t)2 b(t)iana(t)| 2 2 t 2 2 K1K2[a' +‘ft0¢(8.to) b(S) d8] Using Assumption C.3 and Lemma 3.2, f:0b(t)inf|a(t)]dt = a . 63 Assumption B.4(c) is now investigated. ¢(to,t)2¢(t,to)4g(t)2 r t 2 2 2 a + ft ¢(s,to) b(s) ds] 0 ¢(t,to)28up‘a(t)]2 s K4 K: ¢(t, t o) 22b(t) [a" +Ijt ¢(s,to) 2b(s)2ds]2 o Again, Assumption C.3 and Lemma 3.2 Show that I:O¢(to,t)zsup]a(t)‘2dt < m, completing the proof. w Except for the discussions of the linear Kalman-Bucy filter in Sec. 4.3, this chapter has concentrated on demonstrating the asymptotic convergence of general sub-optimal, sequential filters. One reason this investigation analyzed sub-optimal filters such as (2.15) was to obtain theorems which provide analytical properties for Optimal, non-linear filters. With this motivation, the ideas of stochastic approximation exhibited by Theorems 4.1 and 4.2 were specialized to produce the corollary in this section, Corollary 4.4. This corollary is now used to show that the approximately-optimal filter represented by (2.13)-(2.14) is asymptotically convergent. Define the gain term in (2.15) as a(t,§t t) = P(t)h(t,§(t)) (4.14) O, with §(xt ,t) P(t)= ° [v—l—_ar(b) +:j‘ grotto S)h(s,;(s))2ds] 64 and 6(xt ,t o > = exp[2 I: f(s,;(s))dsj . O The function P(t) satisfies the differential equation (2.14) since #1:: t = 2 i(t.§(t))P(t) - [h(t,;(t))P(t)]2 ; 150:0) = varw). The gain term a(t,;; t) is the gain term in (2.13) and also satisfies 3 o (4.13) in Corollary 4.4. Therefore, Theorem 4.1 combined with Corollary 4.4 Show that the approximately-optimal filter represented by (2.13) and (2.14) converges asymptotically with probability one and in the mean-square sense if (2.1) and (2.4) satisfy Assumptions B.l, B.3, C.1, ,J C.2, and C.3 and if the initial condition on (2.1), x(to) = b, has a prior distribution. The convergence theorems of this chapter have provided the means for displaying, under certain conditions, a filter- ing solution which meets the three requirements of a useful filter; optimality, convergence, and sequential structure. 4.5 Summary The first major contribution of this chapter is the statement and proof, in Sec. 4.2, of two theorems demonstrating the asymptotic convergence of the non-linear filters defined by (2.15). Theorem 4.1 shows that the filtering error goes to zero both with probability one and in the mean-square sense, with limitations placed on (2.1) and (2.4) and on the gains a(t,;£ ,t)' One prime restriction in the hypotheses of Theorem 4.1 is that the message process is bounded; i.e., lim sup ¢(t,to)2 < m. Theorem 4.2 places other conditions on the gain functions and allows the message process to become unbounded yet shows that the output of the sub-Optimal filter converges in the 65 mean-square sense to the message. To compare Theorems 4.1 and 4.2 to known convergence results, a convergence theorem for the optimal, Kalman-Bucy solution to the linear filtering problem is given in Sec. 4.3 as a special case of these theorems. Theorem 4.1 and 4.2 prove the convergence of the Kalman-Bucy filter when the message process is bounded and when the «as functions used in the mathematical models of the message and observation processes, (2.1) and (2.4), guarantee the divergence of a certain integral. Thus, Theorems 4.1 and 4.2 imply convergence of the optimal linear solution but under restricted hypotheses. Section 4.3 produces a minor original result in the linear theory; Theorem 4.1 demonstrates convergence of the Kalman-Bucy filter with probability one, a type of convergence that has not been proved in the literature. In Sec. 4.4, additional assumptions were made about (2.1) and (2.4). Corollary 4.4 showed that Assumptions B.l - B.3 and C.1 - C.3, when combined with any of the gain functions specified by (4.13), satisfy the hypotheses of Theorem 4.1. If any of the gain functions given in (4.13) is used in (2.15) and if (2.1) and (2.4) fulfill the corollary's assumptions, the filter defined by (2.15) converges, as time increases, to the message process. The second major contribution of this chapter concerns the convergence of an approximately-optimal non-linear filter. The gain function given in (4.14) satisfies the conditions in Corollary 4.4 and, when placed in (2.15), defines a filter identical to the approx- imately-optimal filter derived in Chapter 2. The combination of Assumptions B.1 - B.3 and C.1 - C.3, Theorem 4.1, and Corollary 4.4 shows that an approximately-Optimal filter is asymptotically con- vergent. Under the Specified restrictions, this filter, given by 66 (2.13)-(2.14), provides nearly-minimum—variance estimates of the message throughout the period of observation and, furthermore, is guaranteed to converge to the message after lengthy processing of the observations. The filter satisfies all three requirements of a useful filtering solution as discussed in Chapter 1. This thesis provides the first study into the asymptotic convergence of filters for the continuous-time, non-linear filter- ing problem. The style of analysis and its associated theoretical tools used to prove Theorems 4.1 and 4.2 appear to provide logical means for probing deeper into the behavior of non-linear, sequential filters. CHAPTER 5 AN EXAMPLE OF A FILTERING PROBLEM This chapter considers an example of the filtering problem defined in Sec. 2.1. The filter uses the approximately-optimal algorithm discussed in Sec. 4.4 and is shown to satisfy the hypotheses of Corollary 4.4 so that the filter is asymptotically convergent. This convergence is verified when the message process, observation process, and filter are simulated on a digital computer. 5.1 The Example and Computer Simulation The filtering problem discussed in this chapter is defined as: x(t) = f(t,x) ; x(0) = b; to = 0 (2.1) y(t) = h(t.X) + V(t) (2»2) where f(t,x) = :33??? '9 x + 3.0.: 2 (5.1) [1 + {——-—-21 1 (t + l) h(t,x) = 0.5 x + 5 tan-1x (5.2) and b is a Gaussian random variable with a mean = -l.5 and a variance = 1.0. The approximately-optimal filter derived in Sec. 2.2 and re- presented by (2.l3)-(2.l4) is used to process y(t). This filter is asymptotically convergent if (5.1) and (5.2) satisfy Assumptions B.l, B.3, C.1, C.2, and C.3. These assumptions are now examined. With 67 68 h(t,x) being monotone in x, Assumptions 3.1 and B.3 are verified directly. Assumption C.1 is satisfied as shown below. Also f(t.X) = which implies that and ¢(t,to) = Mute) = Therefore e(t. sup t v(t. h(t,x) = 0.5 + 5 2 l + x g(t) = sup‘h(t,x)| = 5.5 and x b(t) = inf|h(t,x)| = 0.5 x sin(1000 t)__ 6.0 x 100 n t (t+l)4[l + { x 2}2]2 (t+1) sin(1000 t) + 3.0 G(t) = 100 n t (t+1)2 _ sin(1000 t) _ 3.0 D(t) ' 100 n t 2 (t+1) t sin(1000 8) ds + t 3.0 ds explio 100 11- s O (S+1)2 :1 exp[ft sin(1000 3) d3 _ I; __§,g_ ds] 0 100 n S (8+1)2 , 2 11m ¢(t,to) =K - B(t.) [B(t )- B(t)] = lim{2 1+1 2 1 +2 i+12 1 2 These conflicts indicate the need for carefully formulating stochastic differential equations such as (A.l) and for strictly defining the integration procedure to be utilized in the solution of such equations. The formulation presently being applied in the literature and being relied upon in this thesis has a number of important theoretical advantages (Kailath). This formulation, which applies The Levy oscillation property is shown in Doob [41], p. 395. 84 d when g(t) is Gaussian, formally makes [g(t) = agi£l , where B(t) is a vector Brownian motion process, and states that (A.1) is equi- valent to (A.11). _ t t g(t) - _}_t_(to) + It £(S)ds + ft p(s)dp(s) (11.11) 0 o The solution to (A.11) is symbolically represented as: d)_(_(t) = £(t)dt + p(t)dp(t), (A.12) The first integral in (A.11) is treated as a Reimann integral, while the second is defined as the Ito integral, first defined by Ito [45]. Since the Ito calculus is used throughout the thesis, its pro- perties are listed in Appendix B. One important property of this calculus is the chain-rule for differentiation (Theorem 3.1, Ito's Lemma), which is used when determining the differential equation for a function of g(t). To illustrate Ito's lemma and to Show the difference between Ito and non-Ito calculus, the stochastic differential equation in (A.13), is considered. dz(t) = % z(t)dt + z(t)dB(t) ; z(O) = l (A.13) Theorem B.l shows that the solution in the Ito sense is: z(t) = exp[B(t)]. (A.14) Using the notation of Theorem B.l, let a(t) = O and b(t) = 1. Then, x(t) = B(t). Now, let Z(t,x) = exp[x]. Theorem 8.1 says that 2 dz(t) % a-—g-exp[x(t)]dt +-§;-exp[x(t)]dB(t) BX 35 z(t)dt + z(t)dB(t). 85 Thus, when Ito calculus is used to analyze (A.13), (A.14) is the correct solution. The solution to (A.13) when applying the rules of ordinary calculus to (A.13) divided by dt is: z(t) = exp[Btt) + t/2J. Other ways of treating stochastic differential equations have been documented. All assume (A.1) is equivalent to (A.11) and specify different definitions for the Stieltjes-type integral in (A.11). Stratonovich [32] has provided the most popular alternative to Ito's t + t 1+1 i 2 solutions to (A.11) which would be obtained by formal integration, at definition; his method selects pi = in (A.8) and provides least in the scalar case. Unfortunately, his integral doesn't possess analytical properties as convenient as those listed in Appendix B for the Ito integral. This discussion has concentrated on stochastic differential equations containing Gaussian white noise. The remainder of this appendix describes the available literature concerning definitions for stochastic integrals, such as the one in (A.8), when the noise in (A.1) is non-Gaussian. Portions of the following comments are also found in Kailath [34] and Fisher [8]. The correct way of representing white noise is to define it as a generalized random process, such as discussed in Gel'fand and Vilenkin [46]. Part of Gel'fand's develOpment shows that the gen- eralized derivative of Brownian motion has a delta function for its correlation function. More generally, the random variables comprising the formal derivative of any process with independent increments are shown to be independent. In other words, the formal derivative of an independent increment process is white. This fact has been used 86 in formulating stochastic differential equations Since both the Ito and the Stratonovich integral have been defined when B(t) is replaced by any independent increment process [13], [45]. However, this more general representation of white noise cannot be applied when analyzing observation processes such as (1.2) and (2.2). Independent increment processes can be shown to be either impulsive (Poisson) processes or Brownian motion processes or a combination of the two. Kailath and Fisher indicate that it is not possible to define a filtering problem which has an observation process containing impulsive noise. In summary, if the white noise in the models of the observation processes used for the filtering problem is considered equivalent to the formal derivative of an independent increment process, the white noise must be Gaussian. There is at least one other way to interpret white noise. Assume it is equivalent to the formal derivative of any process with orthogonal increments. Results on the derivative of independent in- crement processes indicate that the random variables comprising these processes are independent when all that is needed is orthogonality. Thus, the derivatives of orthogonal-increment processes do seem logical as representations for white noise. There does not seem to be any published work on the formulation of stochastic differential equations using this interpretation of white noise. Because of these limitations in the theory of stochastic differential equations, this thesis treats only Gaussian, white noise in the observed process (2.2). APPENDIX B O-ALGEBRAS, ITO CALCULUS, AND MARTINGALES This Appendix defines and discusses three important concepts in stochastic processes. 3.1 a-Algebras Generated by Stochastic Processes Let m(t,w) be a stochastic process defined on the probability space {Q,H,P] where w E Q and t E I, the index set; 0 is a sample space, H is a a-algebra of measurable sets and P is a measure assigned to the sets in H. For the remainder of this appendix, and for the entire thesis, m(t,w) is denoted as m(t). The G-algebra generated by the random process [m(s), t0 5 s s t] is defined as the minimal a-algebra of events that contains all events of the form: {w : w E Q and m(u,w) E A] where u 6 [to,t] and A E R is any Borel set. This O-algebra is denoted by F{[to,t],m(u)]. l Theorems from measure theory [41], [47] Show that P[[to,t],m(u)}<: H, that P[[to,s],m(u)] c F[[to,t],m(u)] for t 2 s, and that m(t) is measurable on F{[to,t],m(u)}.7 b.2 Properties of the Ito Integral Let B(t) be a Brownian motion process defined on I = [tO,T]. This section lists the properties of the Ito calculus that follow from Ito's definition of the integral I: g(s)dB(s). This integral 0 cannot be defined in the ordinary Riemann-Stietjes sense Since B(t) is of unbounded variation with probability one. For a discussion of O-algebras (O-fields) consult Loeve [47] (Chap. 1). Doob [41] (pp. 599-602) has a discussion of a-algebras generated by random processes. 87 88 Ito calculus is thoroughly discussed in Skorokhod [45]. The o-algebra generated by B(s), where tO S s s t, is F{[to,t],B(s)]. Let M2 (F t) be the class of functions g(t) which are measurable on F{[t0 ,t] ,B(s)] and satisfy Prob[IT oIg(s)]2 dt < a} = l. The Ito integral I: g(s)dB(s) is defined for all g(t) 6 M2 (Ft ) and has the following properties. Property 3.1: E I: g(s)dB(s) = 0 t E I 0 Property B.2: E[I:g(u)dB(u)]F{[to,s],B(r)]] = O for t 2 s and t,s 6 18 Property B.3: E]f: og(s)dB(s)]2 =Iig E|g(s)|2ds for t E I Property B.4: The process w(t) =-IE g(s)dB(s) is a martingale. Property B.5: The process w(t) is t: continuous process with prob- ability one. Appendix A indicated that the Ito integral has been extremely useful in the formulation and analysis of stochastic differential equations. Special care must be taken when applying the chain-rule for differentiation to these equations. Theorem B.l provides the proper differentiation formula. Theorem 3.1 (Ito's Lemma) Let x(t) be a process satisfying dx(t) = a(t)dt + b(t)dB(t) w.P.1 for t e [tO,T] where B(t) is Brownian motion and a(t), b(t) and b(t)2 belong to M2 (F t). If Z(t, x) is continuous and has continuous derivatives %—-Z(t, x), g—-Z(t, x), and §——'Z(t, x) for t E [to,T], then the pro- 3x2 cess z(t) = Z(t,x(t)) satisfies the relation For a discussion of conditional expectations refer to Doob_[4l], p. 37. 9 All integrals not of the Ito type are treated as Riemann integrals. 89 2 dz(t) = [3; Z(t,x(t)) + a(t) 51X- Z(t,x(t)) + 3sb2 L5 Z(t,x(t))Jdt 3x + b(t) g; Z(t,x(t))dB(t). The notation gE'Z(t,x(t)) has been substituted for g; Z(s,x(t))‘s=t. B.3 Martingale Theory A random process w(t) defined on a time interval I is called a martingale if for every t E I there corresponds a O-algebra At relative to which the random variables w(s) are measurable for s s t, s E I, and which possesses the prOperty that for t1 S t2 where t1,t2 E I E[w(t2)‘At1] = w(tl) w.P.1. (3.1) If the equality in (B.l) is replaced by S (2), a super-martingale (sub-martingale) is defined. Some of the martingale theorems given later require that w(t) be a separable process. A process w(t) is separable on I if there is a denumerable, everywhere dense subset D of I such that for any a < b in I sup w(t) = sup w(t) and inf w(t) = inf w(t) w.P.1. tE(a,b) t6(a,b)flD tE(a,b) t€(a,b)nD The separability condition is satisfied if the process w(t) is right- continuous in t (w.P.1) on the interval I. In this thesis, all processes are continuous (Appendix D), so separability is automatically satisfied and need not be mentioned. The following three theorems on martingales are used in this thesis. Theorem B.2: If w(t) is a non-negative, super-martingale, 1im w(t) = W a tam 90 exists with probability one and is finite. Theorem B.3: If w(t) is a (separable) non-negative super martingale on any interval 1, then for any t E I and any constant c, Prob{ sup w(u) 2 c} S EEXEl . u2t,u€I Theorem B.4: If w(t) is a (separable) sub-martingale on some interval I, then for any t E I and any constant c, Prob{ sup w(u) 2 c} s ELEXELL - uSt,u€I The proofs of these properties can be found in Loeve [47] (sub- sections 29.3 and 36.1) and Doob [41] (chapter 7). APPENDIX C THE INFINITE DIMENSIONAL REPRESENTATION FOR THE OPTIMAL FIETER This short exposition develops the system of random differ- ential equations which must be solved to provide the minimum-variance E estimate of the message process x(t) in (2.1). This system re- presents the filter which is an optimal solution to the filtering problem discussed in Sec. 2.2. In Sec. 2.2, the minimum-variance (conditional mean) estimator t is represented by the following stochastic differential equation. d$c(t) = Ema: +5310» - x(t)h(t)][dz(t) - Emu} (2.7) In (2.7), x(t) is the conditional mean at time t. The other functions are defined in Sec. 2.2. To solve (2.7), where the observed process dz(t) is an input driver, the functions f(t), xh(t), h(t) must be found. The function h(t) is examined first. Equation (2.6) re- presents h(t), the conditional expectation of h(t,x(t)), when g(t,x) = h(t,x). Then dB(t) = {Elma + [g(t) - h(t)2][;dz(t) - h(t)dt] /\ /\ where h1 = 5X h(t,x). Now, fh1(t) and h2(t) need to be deter- mined so (2.6) must be consulted. /\ A /\ .. .. dfh1(t) = @1(t)dt + [f2h2(t)dt + fhh1(t) - phl(t)h(t)][dz(t) - b(t)dt] /\ dh (c) = 2 find)“ + [h3(t) -?(t)fi(t)][dz(t) - b(t)dtl 91 92 A AA To continue this procedure,‘ff:h1(t), f2h2(t), fhh1(t), h3(t) must be determined. In fact, an endless string of differential equations need to be solved to determine all needed conditional expectations. The same situation occurs when determining f(t) and 4km. This endless string of differential equations, which must be solved simultaneously with (2.7), provides the infinite dimensional representation for the optimal filter. APPENDIX D IMPORTANT PROPERTIES OF STOCHASTIC DIFFERENTIAL EQUATIONS SIMILIAR TO EQUATION 2.15 The purpose of this appendix is to indicate conditions on h(t,x) and a(t,;; ,t) which are sufficient for providing x(t) in (2.15) with two important properties: (1) Continuity of x(t) w.P.1 and (2) F{[to,t],;(s)} C:F{[t0,t],B(s)}. The following derivations are based on Doob [41] (chapter 6) and Skorokhod [45] (chapters 2 and 3). This appendix does not deal directly with (2.15) but with an equation containing the same features as (2.15) and fewer terms. Let w(t) be the solution to dw(t) = g(t,w(t))dt + a(t,wt t)dB(t) for t e [to,T] (C.1) 9 where B(t) is a Brownian motion process and wt t = {w(s) : t s s S t}. s O The following conditions on g(t,w(t)) and a(t,wt t) are postulated 9 for the remainder of this appendix. 0 H1: The function g(t,w) is a continuous function in the pair (t,w) and a(t,wt ,t) is measurable on F{£to,t],w(s)}. H : The fugction g(t,w) satisfies the uniform Lipschitz condition ~‘g(t’x) " g(t:Y)| 5 K‘x - y‘ for x,y s R and t 6 [tO,T]; K is a fixed constant. 1 The function a(-:-) satisfies the following function-space condition ~‘a(t’xto,t) - a(t,yto,t)\ 5 KIIE6lX(s) - y(s)|2ds]% 93 94 where xt ,t and yt ,t E C: , the space of continuous functions on the time interval [t:,t]. The constant K may be selected equal to the Lipschitz constant for g(-,°) with no loss of generality. The derivations in the remainder of this appendix are con- centrated on demonstrating the following properties of the solution to (C.1). P1: The function w(t) is continuous in t with probability one. P2: F{[to,t],w(s)} C:F{[to,t],B(s)} for every t E [tO,T]. These properties are demonstrated by a proof constructed similar to Doob_[4l] (pp. 277-281). A by-product of this proof is the existence of the solution to (C.1). The derivation begins with the following lemma. Lemma: If a process w(s) has properties P and P and if g(-,~) 1 2 and a(-,-) have properties H1 and H2 then any process y(t) defined by y(t) = ft g(s w(s))ds +ft a(s w )dB(s) (c.2) t ’ t ’ t ,s o o 0 has property P2 and the Ito interpretation of the second integral in (0.2) provides y(t) with property P1. Proof. According to H1, H2, and P1, the first integrand in (0.2) is a bounded, continuous function of s for almost all sample functions. The first integral is a continuous function of t with probability indicates that a(t,w ) is one. Condition H1, comb1ned With P2, to’t measurable on F{[to,t],B(s)} so the second integral may be defined as an Ito integral (Appendix B). This integral is a martingale and, in addition, is continuous in t with probability one. Continuity for almost all sample functions y(t) follows. It is obvious 95 that y(t) is measurable on F{[to,t],B(s)} and, since for all s S t, y(s) is measurable on F{[t0,s],B(u)}<: F{[to,t],B(u)}, the process [y(s) : t0 S s S t] is measurable on F{[to,t],B(u)}. The definition of F{[to,t],y(u)} shows that F{[to,t],y(u)}<: F{[to,t],B(u)}. This finishes the proof of the lemma. A solution can be found for (C.1) by successive approximations that have the properties P1 and P2 shown above. Let wo(t) be any process having properties P1 and P2. According to the lemma, it is now possible to define wn(t) in such a way that every wn(t) has properties P1 and P2. wn(t) = I:0g(s,wn_1(s))ds +-f:oa(s,wt )dB(s) (C.3) ,s,n-l o with w (u) : tO S u S s}. The following derivation t ,S,n"1 = {Wm-1 0 will show that lim,wn(t) = w(t) w.P.1 (C.4) n—G uniformly in t, thus defining a process w(t) with properties P1 and P2. Also, it will be shown that . t t 11m I g(s,w (s))ds = I g(s,w(s))ds w.P.1 t n t nah o o (0.5) 9 t 1im It a (8 ,wt nHub o S,n)dB(S) = f:0a(s,wto,s)dB(s) w P.1 uniformly in t. The process w(t) will be the solution to (C.1). In proving these facts, the following notation will be con- venient. (t) AnW(t) = Wn(t) - Wn_1 Ang(t) = g(t,wn(t)) - g(t,wn_1(t)) 96 Ana(t) = a(t’wto,t,n) - a(t’wto,t,n-1) From H2, lang(t)| s K\Anw(t)l ‘AnS(t)‘2 S K.f:o‘Anw(s)‘2ds. Then, from (C.3) and a property of the Ito integral (Appendix B), 2 t 2 t 2 E{[Anw(t)] } S 2 E{‘ft An_1g(s)ds‘ } + 2 E{|It An_1a(s)dB(s)‘ } o o S 2 K2(T-to) I: EEAn_IW(s)]2dS +-2 f: E‘An_la(s)‘2ds 0 0 S 2 K2(T - to) I: E[An_lw(s)]2ds + o t 2 t 2 + 2 K ft Is EEAn_1w(u)] duds 0 O s 2 K2(T - to) f:0E[An_lw(s)]2ds + + 2 K2 I:O(t-S)E[An-1w(s)]2ds s 4 K2(T - to) j:oE[An_1w(s)]2ds. Hence, for some constant c, EEAnw(t)2] S [4 K2(T - to)]n“1 I: (£52)? E[A1w(s)]2ds o n c S “' for t S t S T. n! 0 Using this inequality and Chebyshev's inequality gives t -n T -n Prob{ sup”,t Ang(s)ds| 2 2 } S Prob[K rt ‘Adw(s)‘ds 2 2 } toStST o o S 4n E([K.IT ‘A w(s)\ds]2} to n ‘ n 2 cn S 4 K (T-to) ET . 97 Since the last term is the general term of a convergent series, (C.6) holds for sufficiently large n with probability one from the Borel- Cantelli lemma (Doob [41], p. 104). sup [ft A g(s)dsl S 2-n (C.6) t n t StST o o The process I: Ana(s)dB(s) is a martingale for the reasons 0 given in the lemma. The square of this process is a sub-martingale and the basic martingale inequality for sub-martingales (Theorem B.4) shows that Pr°b{ SUP 1f: A a(s)dB| 2 2'“) s 4“ Eilff A a(s)dB| < 2 (C.7) t StST O 0 According to (C.6) and (C.7), the integrals on the right hand side of (C.3) converge uniformly in t when n a a, w.P.1. Hence, the limit in (C.4) exists uniformly in t, w.P.l and w(t) has pro- perties P1 and P2. The validity of (C.5) follows from two facts: (1) the integrands converge uniformly w.P.l, in the first limit equation; (2) the sub-martingale inequality applied as above, shows 98 that Prob{ sup If: [a(s,wt S n) - a(s,wt S)1dB(s)l 2 ll toStST O O, ) O, n 2 2 T 2 s n K It (I - s)E[wn