SYSTEM IDENTIF!CAT!0N BY BAYESIAN LEARNING . Thesis for the Degree of Ph. D.’ MICHIGAN STATE UNWERSITY PATRICK J. DONOGHUE 1968 b .,, .t 1‘1 . 'I'hcblb 0-169 IIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIL 31293 00995 8731 This is to certify that the thesis entitled System Identification by Bayesian Learning presented by Patrick J. Donoghue has been accepted towards fulfillment of the requirements for 2% degree in —EE__ .. /) ' " J _.‘ l 'I x r . ’3! ..//4 "1:74;”! gun/l “1/"- V Major professor ‘ ,l‘ ’1’“ -- 1’1 "x l _ 1‘ / '- ' Date -' " g/U,/.-’,/< ”9 I" N ABSTRACT SYSTEM IDENTIFICATION BY BAYESIAN LEARNING by Patrick J. Donoghue The problem of system identification is of fundamental importance both from a practical and system-theoretic point of View. As such it has received wide attention in the literature. However, much of the work in this area suffers from the lack of a unifying structure. This is especially true of those identifica- tion techniques that treat problems concerned with noise-obscured measurements and unobservable random disturbances. In this thesis the approach taken to system identification is Bayesian learning. The systems considered are described by a finite set of difference equations relating the system states and inputs, and by a finite set of algebraic equations relating the sys- tem states, inputs, and observations. The measurement of all. , states and all inputs is assumed to be obscured by additive noise. Further, the system states are assumed to be influenced by un- observable additive random disturbances. The object of the identification is to determine the matrices or constants that specify the system. PATRICK J. DONOGHUE Using Bayesian learning for identification allows the deriva- tion of a general identification algorithm which is practical, includes many previous results as special cases, and provides a framework for solving new problems. The identification algorithm is iterative and has the form of a general stochastic approximation algorithm. Thus the algorithm operates on the data as it becomes available and produces a sequence of estimates for the parameters which specify the system. These estimates are Bayes-optimal in some cases con- sidered and are sub-optimal in others. The specific identification algorithms for each class of systems considered have one simple structure and are computationally feasible. The introduction of the concepts of identifiability and strong identifiability provides a workable basis from which convergence of the general identification algorithm can be established. Three theorems on the convergence of the identification algorithm to the true system parameters as the number of observations becomes infinite are proved using these concepts. Two of these theorems are new results in identification. A new proof for the third result follows from a strengthening of the hypotheses of the other two theorems. The general identification algorithm is used to derive Specific algorithms for important classes of systems. Algorithms PATRICK J. DONOGHUE for both stationary and time-varying linear systems are obtained, and the derivation of algorithms for nonlinear systems is indicated. Other formulations are given, including systems with state distur- bances having an unknown mean and systems with generated inputs. To demonstrate that implementation of the proposed algorithm is not difficult and to demonstrate that the algorithm does indeed con- verge, a fourth-order digital control system with eight unknown parameters is identified. Computer-sirnulated results show the behavior of the algorithm under different initial estimates, noise conditions, and a-priori uncertainty. SYSTEM IDENTIFICATION BY BAYESIAN LEARNING BY Patrick J. Donoghue A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Electrical Engineering 1968 5‘\ ACKNOWLEDGEMENTS The author wishes to thank his thesis advisor, Dr. R. C. Dubes, for his continued assistance and inspiration during the preparation of this work and for his invaluable guidance as a friend and educator. Thanks are also due to Dr. H. E. Koenig and Dr. G. L. Park for their assistance as academic advisors and to Dr. E. A. Nordhaus and Dr. R. B. Zemach for their interest in this work. Finally the author owes a great debt of love and gratitude to his wife, Sharon, for her unselfish patience and understanding during his graduate studies and the preparation of this thesis. ii TABLE OF CONTENTS ABSTRACT ACKNOWLEDGEMENTS ............ . . . . . . LISTOFFIGURES..................... LIST OF APPENDICES .................. . I INTRODUCTION .......... . . . . ...... Modeling and Induction .......... . . . . II III System Theory and the Identification Problem . Applications of System Identification . . . . . . General Identification Techniques ..... . . . ObjectoftheThesis. . . . . . . . . . . . . . . D—ll—lD—ID—‘l—l £11900an FORMULATION OF THE IDENTIFICATION PROBLEM AS BAYESIAN LEARNING. . . . . . . . . 2.1 Identification by Estimation and Learning. . . . 2. 2 Previous Results in Identification by Estimation and Learning. . . . . . . . . . . . . 2. 3 Basic Equations and. the A-posteriori Distribution.................... 2.4 Alternate Forms for the A-posteriori Distribution .............. . . . . . . 2.5 Summary ....... ...... SOLUTION OF THE BAYESIAN IDENTIFICATION PROBLEM ....... ...... . Derivation of the Identification Algorithm. . . . Convergence Theorems and Assumptions. . . . Derivation of Previous Results as Special Cases Summary ..... . ............... wwww o. O Jimmi— iii Page ii vi 00¢me 10 10 l4 17 19 24 2.6 26 32 36 39 IV SPECIAL FORMS OF THE ALGORITHM . parse-94:4: O‘U‘II-hUONH Summary V AN EXAMPLE UTU1UI Summary VI CONCLUSION . . . ...... . . l . 2 Computer Simulation Results . 3 Linear Stationary Systems . . . . . Linear Time-Varying Systems . ..... . . . . Stationary Nonlinear Systems. . . . Time-Varying Nonlinear Systems . . . Miscellaneous. . . . . . Problem Statement and Formulation 6.1 Results of the Thesis ........ 6.2 Possible Extensions . . . BIB LIOGRAPHY . . iv 40 41 44 50 51 53 56 57 57 63 68 7O 70 72 LIST OF FIGURES Figure Page 3.1.1 General system configuration . . ...... 27 5. 1. 1 Digital control system with noisy transmission and feedback paths . ...... 59 5. 2.1 Behavior of a typical estimator ....... 66 5. 2. 2 Effect of noise power on convergence . . . . 66 5. 2. 3 Effect of initial estimates on convergence . . 67 5. 2. 4 Effect of uncertainty on convergence ..... 67 LIST OF APPENDIC ES Page APPENDIX I CONVERGENCE PROOFS FOR THE IDENTIFICATION ALGORITHM. . . . . . . 74 II THE REPRODUCING NORMAL AND ITS QUADRATIC FORM ............. 88 vi I INTRODUCTION This chapter provides a foundation for the results in this thesis. Section 1. 1 indicates in a general way how induction and modeling are related. The role of identification in the general structure of system theory is discussed in section 1. 2, and the wide range of practical applications for identification techniques is indi- cated in section 1. 3. Various general identification techniques are listed. in section 1. 4 to show how Bayesian learning is related to other methods. Section 1. 5 discusses the basic results that will be derived in the thesis. 1. 1 Modeling and Induction The fundamental goal of virtually all scientific investigation is to acquire knowledge about the real world (except perhaps for the mathematical formalist's investigation [SCH] ). This real world may take on diverse appearances depending upon a particular investiga— tor's background and individual goals. Information about the real world. usually appears in the form of causality relations, and one medium through which these causality relations are di8p1ayed is mathematics. In fact, mathematics is practically the only universally accepted tool for handling the rela- tionships which arise in scientific studies. Mathematics provides a systematic procedure for predicting and analyzing the observable characteristics of an investigator's environment. This is parti- cularly true in those systems that are of practical interest. This prediction and analysis is realized by development of a mathematical model, i. e. , a finite set of mathematical relationships which inter- relate certain observable or measurable quantities of the system under consideration. The derivation or synthesis of the relation- ships forming the model is accomplished in terms of the structural features of the system. These features are the characteristics of the individual members of the aggregate and their manner of inter- action or interconnection. The individual characteristics are obtained by assumption, a-priori information, or sets of funda- mental measurements. The test of the mathematical model's validity rests upon the connection between the variables of the mathematical structure and their physical counterpart. Hopefully, the relation between these entities is isomorphic [AHL] . That is, the values assumed by the variables in the mathematical model are in a one-to-one pr0perty correSpondence with the values that are measured. An alternative way of arriving at mathematical models of physical systems is to employ inductive inferential relationships [CAR][AY] . That is, a partially specified mathematical model is assumed to be an accurate counterpart of the physical system and appropriate measurements are taken to confirm or deny the hypo- thesis. If all variations of the original model fail to accurately pre- dict or account for the observations, then the model is changed or discarded. If, however, suitable manipulation or further specifica- tion of the model results in accurate prediction of the observations for some finite time, then the model is retained. Since the phenomena are only observed for a finite time it is impossible to verify the hypothesis completely and only confirmational support for it is ob- tained [HE] [ POP] . This is the nature of inductive logic. The model is retained only as long as it is able to account for all the observations. Even if no contradictory evidence occurs for a very long time, the original hypothesis is not necessarily verified. The approach taken in this thesis, called identification by Bayesian learning,is based on inductive inference. 1. 2 System Theory and the Identification Problem In the foregoing discussion the nature of the problem to be considered. here was indicated only in a very general way. In this section the problem will be defined in the context of contemporary system theory. In one of his treatments of general system theory, Zadeh [Z-l] has given a comprehensive listing of the problems that are included in this theoretic structure. Among these problems are system analysis, synthesis, control, stability, reliability, learning, and signal theory. In addition three distinct aSpects of the theory are singled out for Special consideration because of their fundamental importance. These are the problems of system characterization, classification and identification. The first of these problems deals with the representation of input-output relations. These representations may be expressed as solutions of differential equations, state functions, transfer functions, integral operators or in any other convenient form and may change according to whether the system is time-continuous or discrete, sto- chastic, causal or finite state among other considerations. The second problem, classification, is concerned with determining class membership when a system is assumed to belong to one of a family of classes. The classification relies on observations of system input and output. Two problems that are in this category are: (1) determination of the order of a system's differential or difference equation, and. (2) determination of whether a time-varying system is linear or nonlinear. Lastly, and most important for this discussion and often for practical situations, is the problem of system identi- fication. Generally, the identification problem considers means for determining the Specific characteristics of a system through observations of the input and output. A precise definition of identification is now stated [Z-l]: Given a class of systems S where each member of the class is completely Specified, the identification of a system A consists in finding a system s 6 S that is input-output equivalent to A. It is important to note that the definition requires input-output equivalence and does not require 3 c S and A to be identical. For a given input- output relation, there is generally no unique system representation [K-l][ARN][Z-2] . In this thesis systems are assumed to be com- pletely Specified within a parameter set and the purpose of identifi- cation is to determine these parameters. There are three major complications which normally enter into any real identification problem. The first of these is the absence of knowledge concerning the system's initial condition;-_ the second is the presence of random noises obscuring the observations of inputs and outputs; last is the difficulty in establishing a meaningful and convenient method for estimating the system parameters as a function of the observations . l. 3 Applications of System Identification Probably the most common use of identification techniques is found in the area of automatic control system design. Here a device such as a servomechanism is identified using sinusoidal or step response data. The identification consists of determining poles and zeros or time constants. Knowledge of these system character- istics aids in the design of devices which may improve total system performance. In process control situations, such as found in oil refineries or batch production industries, the input-output relations may be very complicated or essentially unknown. Only by identifying the characteristics of the process from operating records can any kind of satisfactory control policy be developed. Many other systems of interest are describable by sets of equations which have slowly varying parameters. Through continuous monitoring, the system can be approximately identified at each instant in time and effectively controlled. Controllers which utilize con- tinuous on-line identification are called (parameter) adaptive con- trols [AO-1][ SK] . Still other systems may vary in a random manner. By using a system identifier in a learning loop, effective learning control systems can be developed [SK]. Identification techniques can also be applied to communica- tion problems. For example a communication channel may some- times be characterized by a slowly-varying linear system. Con- tinuous identification of the channel can make more efficient communication pos sible [ KAI] [ HAN] [ DAL] . Two more areas where identification techniques are of interest are pattern recognition and reliability. The object in pattern recognition problems is to distinguish one probability distribution from another. If each distribution is assumed to be generated. by a different system, then an identification scheme may be effective in discriminating among these patterns. In reli- ability problems a system reliability index may depend upon the system parameters which can vary. By identifying the system parameters, a check on the reliability of the system can be main- tained . 1.4 General Identification Techniques A wide variety of techniques have been developed to solve some aSpect of the identification problem. Many of these techniques were developed to study specific processes of a practical interest [IFAC] . Others have been deveIOped to attack the problem in a very general way [BAL- 1, Z] [ HE] . Most identification techniques may be classified either as statistical or analytic, depending upon whether or not the problem formulation accounts for random effects. These general areas may be subdivided as follows: A. Statistical Methods [EY] 1. Parameter Estimation a. Maximum Liklihood i) conditional [ SMI] [ RAU] ii) unconditional [EY] b. Regression Analysis and Least Squares 1) linear [ALB] [ STI] ii) nonlinear [ALB] c. Stochastic Approximation [ HO- 1] [ LEE] [SAR] [ SAK— 1, 2] d. Bayesian Estimation [FK—3] [ MA][AO-l] 2. Learning Model Techniques [SK][ TSY] [ FK-l] [FU-l] 3. Spectral Analysis (Wiener-Kolmogorov Theory) [DA] [ LAN][ LEV] [ AND] [ WE-l, 2][ LE- 1, 2] B. Analytic Methods 1. Functional State Representations [HE] [ BAL-l, 2] 2. Gradient Techniques [ BAL-Z] [ MG] 3. Numerical Methods and Approximations [ BX][ CU-Z] [BEL-1, 2] [ 30-2] 4. Frequency Domain Techniques [SG-l] [ PU] [ BEL-1] [CH] [ AH] . Most identification techniques in the literature apply only to stationary linear systems. Some work has been done on nonlinear problems, but it is a relatively untouched area even in those pro- blems that are of important practical interest. Relatively few studies are based upon mathematical statistics and probability theory. However, work in this area is significant and treats a wide variety of problems. 1. 5 Object of the Thesis In this thesis system identification, as defined in section 1. 2, is approached by a method called Bayesian learning. The systems considered are represented by a finite set of difference equations relating the system states and inputs, and by a finite set of alge- braic equations relating the system states, inputs and observations. Identification of such systems when the states are subjected to unobservable additive random disturbances and when observations are corrupted by additive noise will be examined in this study. Most of the work in this area lacks a unifying approach. The purpose of this study is to derive a general identification algorithm which is convenient and practical, includes many previous results, extends the applicability of some of them, and allows the derivation of new results. The identification algorithm is iterative. That is, it operates on the data as it becomes available and produce a sequence of estimates for the parameters which Specify the system. These estimates are approximations to estimates which are optimal in a sense to be defined, by virtue of an approximation to Bayes' rule. The specific algorithms for each class of systems have one simple structure and are computationally feasible. Furthermore, the algorithms converge (in various senses) to the true parameter values as the number of observations becomes infinite. The main contribution of the thesis consists in the formu- lation and solution of a general identification problem in the structure of Bayesian learning. This formulation allows the derivation of algorithms for important classes of systems; namely linear statio- nary, linear time-varying and nonlinear systems. Three important convergence theorems are proved, two of which are new results in identification. A detailed examination of the properties of the identification algorithm for a Specific system is carried. out by digital computer simulation. II FORMULATION OF THE IDENTIFICATION PROBLEM AS BAYESIAN LEARNING Identification by Bayesian learning is a Special method that comes under the general heading of statistical estimation. After the structure of estimators that are useful in identification is indicated in section 2.1, the scope of previous results in this area is given in section 2. 2. The basic relations used to derive algorithms are developed in section 2. 3. lrnportant implications of using these particular relations are discussed in section 2.4. 2.1 Identification by Estimation and Learning System identification by methods that are based. on statis- tical estimation have f*und wide application. These methods can account for the effects of random observation noises as well as unknown random inputs or disturbances. Further, they may be applied to nonlinear as well as linear systems and are applicable to both on-line and off-line problems. One way of formulating the identification problem of section 1. 2 follows. Let the system state equations be 10 x : f 2. . k+l k( ) 1 1 E Xk' uk, “km for k = O, 1, . . . , where x is an n-vector called the system state, k uk is an r-vector of inputs, ék is a Sample from an n-dimensional >:< random process , fk(.) is a known vector function of its arguments at each time tk and a iS a p—vector of fixed but unknown parameters. The system observation equations are 1 Yk : gklxk! “R: B) 2010 2 - + Z 2 l 3 Vk ’ ”k ”k ' ' where yk is an m-vector of observations, vk is an r-vector of 2. input observations, n1 and nk are samples from m and r-dimen- k Sional independent random processes, gk(.) is a known function of its arguments for each k and {3 is a q-vector of fixed but unknown parameters. Identification of the system represented by 2.. l. 1, Z. 1. Z, 2. 1. 3, consists in the determination of the parameters a and [3. Since random disturbances enter into the system equations, \ o. and B will in general b— imp Q ssihle to find exactl . Rather, suc- cessive estimates of these parameters that converge (in some sense) to the true values of a and [3 will be sought. In this context an identification algorithm will learn the true parameter values. pln this thesis a random variable and the values it takes on will be denoted by the same letter. The meaning will be clear from the context. 12 Knowledge of f k(. ) ar: gk (. i is required 8 ane identificati ion and not classification or characterization is being considered. Success ve estimates of a and B will be derived in the frame- work of Bayesian estimation. Ln particular, a sequence of esti- mators {LIDk} of the comp sLte vector LlJ = (a, .3) is sought. This se- quence of estimators is to depend upon the observations yk and vk, k = 0,1,. . . . The loss function m3? or) represents the loss incurred k, when the estimator $k based on the first k observations is used and q; is the true parameter value. This loss function is assumed to be non-negative and to have a relative minimum for $k = LP. De- . k . .., k noting by y the set oi observatiens (yk, yk 1, . . . , yo), and by v the set of observations (vk. . . v0}, the “'rstrnnar risk or expected loss in choosing 13k as an estimatr ref L11 5 defined by the conditional expectation, A . ,A , k k k k E{L(¢ Alt/w} = Law .thay .V /¢)d(y .V) k yk Vk k where p(yk, vk/qJ) re ptresen s the grim, pr'bab rLty densi y of the J; observations conditisned an L9. Letting p(LL') denote the a-priori . . . . . A den81ty of the parameters, the ex pested risk of choosrng up 18 k defined by >:< For notational convenience, in all of teh fo -13wing development p(. ) will be used to dencte the density functi 3n .rf a random variable and the argument of the den nsity will Speeif the random variable. l3 R($) E{E{L($k.¢)/¢ }} S 5 k k “$18 ‘l’lpwk: Yk/¢)p(¢)d(yk, vk, 4;) 4’ Y . V II That function L’Dk which minimize ROTJk) is called the minimum risk or Bayes estimator of 41 for each k. Since p(vkmk/wpm = p(t/vk,yk>p(vk.yk) then Rm?) = C p(vk,yk) L<¢p¢)p(¢/vk.yk>d(yk.vkuv) k Uyk vk ¢ where p(LlJ/yk, vk) is the a-posteriori density of the parameters evaluated at KIJ conditioned on yk, vk and k k k k p(v .y > = Sptv .y /¢>p<¢> at, LP Since p(vk, yk) Z 0, R($k) can be minimized at each k by minimizing . . k k . . . the inner integral for each y , v . This is often done by solv1ng the gradient equation associated with R($k). To form recursive estimates of LIJ as the observations (vk, yk) become available, the a-priori density p(qJ) is replaced by p(LlJ/ vk-l, - .. k- yk 1) and a relation between p(LlJ/ vk, yk) and p(Lp/vk 1, y l) is es- tablished by Bayes' rule. Furthermore, it is possible to show under conditions of convex loss functions and symmetric density functions that this minimum risk is realized by taking {bk to be the conditional expectation [ DE] , l4 tk = ElLP/vk.yk}. These conditions hold for the very important case of gaussian random variables and squared error loss functions. This particular case is used extensively in the development of the system identifica- tion algorithms . Z. 2 Previous Results in Identification by Estimation and Learning A large number of publications deal with system identification when random inputs and/ or observation noises are present. Only a relative few of these utilize estimation theory and learning proce- dures to derive recursive estimators for the system parameters. Some recent results in this area are discussed in this section. The basic system structure and conditions for convergence of the identification scheme are considered. to be the important factors of the individual investigations. Only time domain techniques for identi- fication of discrete-time systems are considered. A relatively simple problem in identification is studied by Ho and Whalen [HO-1] . It is given that the system equations are x : Ax k=0,1,... 2.2.1 2 222 yk Xk+nk " which represent the state and observation equations, where xk is an n-vector, A is a constant nxn matrix to be found and 11k is a 15 gaussian random vector with zero mean and covariance T E : : ° 0 otherwise. The system is stationary, linear, homogenous and all states are observed with additive white noise. The authors give a recursive algorithm for finding the A matrix that converges if detlxr: xk+1"°" xk+n-l] %’ 0 and there exist constants X1, X0 > 0 such that x > ll[xk,..., x THZ >>\ >0. 1 o k+n-1][xk’°°°'xk+n-l] The identification algorithm converges to the true value of A with probability one (WPl or A. E. , [PAP] ). The proof follows from an application of Dvoretzky's theorem on stochastic approximation [DV][ WOL]. Another type of identification problem can be formulated as a least squares filtering problem. The solution can be derived either as a least squares or Bayesian estimate. The system equa- tions are 2. 2.6 where 6k is a random n-vector with zero mean and known covariance, and A is an unknown matrix. 16 As shown by Mayne [MA] the problem of forming unbiased least-squares estimates of A can be solved using known results [K-Z][K-3] . Specifically, the problem can be formulated, as a linear state estimation problem and solved using Kalman filtering techniques. Mayne has also shown how estimates of a randomly or deterministi- cally varying A matrix can be found, and can account for an unknown input-gain matrix B. A serious restriction to this formulation is that all states and any control inputs must be observed without noise. No proof of convergence is given by Mayne. Another method of solution to this problem is given by Fukao using a Bayesian approach [FK-Z, 3]. In his original formulation all states are observed without noise and the input noise 5k has an un- known mean. Fukao‘s proof that his algorithm converges WPl relies on results from stochastic approximation theory. An alternate proof of his result with appropriate assumptions is given in Appendix I. By intuitively extending his algorithm to account for observation noises, Fukao has also shown how to handle this important case under stated assumptions. In fact he is able to apply the algorithm to some nonlinear systems if the assumptions can be shown to hold. For identification of nonlinear systems with all states ob- scured by noise, but having no input noise process, Kirvaitis [KI] [FU-Z] has indicated how stochastic approximation may be used to develop convergent algorithms . l7 2.3 Basic Equations and the A-Posteriori Distribution For the results described in section 2. Z, a different approach is taken for each problem. Although each of the algorithms has a similar form, their derivations, where any are given, are directed. solely to the problem at hand. By adopting a Bayesian approach to system identification many of the above results may be combined into a single structure and furthermore, new results can be derived. Consider the system of equations : 2 Xk+1 fk(xk’ uk’ gk’ ‘1) ' 3'1 and the observation equations - (x 1 B‘ Z 3 Z yk _ gk k, “R, } o o _ u + 7- 2 3 3 Vk ‘ k nk ' ° where each of the variables is defined in section 2.1. Identification of the system consists of finding a sequence of estimates for o. and B that converge to the true values based on observations of v and yk. k As indicated in section 2. l, the conditional eXpectation k k ._ . . . ENJ/y , v ), where 41 : (o, B), 18 the optimal Bayes estimator for LP based on k observations. Thus, if it is possible to obtain a recursive relation, either approximate or exact, for this estimator as a func- tion of observations and previous estimates, then a solution to the identification problem will have been found. Of course if an approxi- mation is used then convergence of the estimates to the true value must be e stablished . 18 To obtain the desired conditional mean, the a-posteriori den- sity of the unknown parameter vector Ll» is needed. This can be found as follows. k+1 k+1 p(¢.Y .v ) (' k k v ) pw,y ’V’yk+1’ k+1 k k k k k k = p(yk+1.vk+l/¢.y .v )PNJ/Y .v )p(y .V) Also k+1 k+1 k+1 k+1 k+1 k+1 p(¢,y :V ) : p(qJ/Y 3V )p(y ,V ) Thus k k k k k k (W k+1 k+1) _ p(yk+1.vk+1/¢.Y .v )PNJ/Y .v )P(Y .V) p V ’v ' k+1 k+1 p(y .v ) Since k+1 k+1 k k k k p(y .v )- p(kaNkH/y .v Mr .V) then k k k k PW/Y .v )= k k 2.3.4 £363“vathr .v ) where k k k k k k p(ka’VkH/Y .V ) = S¢P(¢/Y .v )p(Yk+1.vk+l/LP.Y ,v )dqa. Equation 2.. 3.4 is a statement of Bayes rule and is a recursive k+1 k+1 , v relation between the probability densities p(¢[y ) and p(LIJ/Yk, vk). The form of this relation indicates that as observations yk+l' vk+1 become available the conditional density for up can be up- dated. Since the true value of up is a constant, if certain conditions 19 on the densities hold,then [SP] it is possible to show that the condi- tional density p(¢/yk, vk) approaches an impulse function centered at the value L11. The density p(y v /4J yk vk) is required in the relation k+1’ k+1 ’ ’ 2. 3.4. This is obtained from the state and observation equations 2 2.. 3.1, 2. 3. Z and 2. 3. 3 and from the distributions of 11]: and 11k. However, this is generally a very difficult task except in Special cases. 2. 4 Alternate Forms for the A-Posteriori Distribution There are two alternatives to the direct use of equation 2. 3. 4. These alternatives will be developed here and shown to be of a less tenable form than equation 2. 3.4. One iterative expression for the a-posteriori density of 4; can be derived as follows [AO-l] , for a simpler problem having k k, k k k k = p(¢.xk/Y )p(Yk+1/¢.xk.y .xk+1)p(xk+1/¢.xk.y ). Also k k+1 k p(¢.xk+1.Yk+I/Y )=p(¢.xk+l/Y )P(Yk+1/Y ) k I - d - S p(w.xk.xk+l.yk+l/Y ) xk k 20 Combining these relations gives k k k S pdxk+1 4 X k+1 the relations 2.4.1, 2.4. 2 give a recursive relation for the a-posteriori distribution of 4;. The indicated density functions in 2.4.1 simplify because, from 2. 3.1 and. 2. 3. 2 assuming Ek's and ni's are independent, k p(vk+1/¢.xk.y .xk+1) = P(Yk+1/f3: ka) and from 2. 3.1 k P(Xk+l/LP, xk. Y )= p(xk+l/a. xk) . DeSpite the apparent usefulness of 2. 4.1 there is a serious drawback to its application. This is the difficulty of performing the required integration. Even with simple gaussian random variables and a linear system it is not clear that the integration can be carried out in closed form. Even if it were possible to do so, there is no reason to susPect that a recursive relation between conditional means for up, which are the optimal Bayes estimators, would be obtained . 21 Another form for the a-po steriori distribution of LIJ can be ob- tained using 2. 3. 4, and noting that (again assuming 11k = vk) k k k p(YkH/tbm )- 5X p(yk+1/Lb,xk+1.y )p(xk+l/Ll4.y )dxk+1 2.4.3. k+1 Since PW /¢ x Yk) = p(y /Bx ) k+1 ’ k+1’ k+1 ’ k+1 this would appear to be a useful relation. However, establishing the density p(ka/kp, yk) presents in effect an optimal nonlinear filter- ing problem. Even in the linear case with gaussian random variables the problem would. be very difficult since it requires the solution of an optimal (Kalman) filtering problem for the mean Qk+1 as a function of up and all the past observations yk. The covariance matrix is given by the solution to a Riccati equation. To establish that the exact form of p(ka/xp, Xk+l' yk) is indeed difficult to work with and furthermore to establish a basis for approximation, the foliowing linear example is considered. Let X n k+1 “ xk + gk Yk : Xk + ”k where all variables are scalars, while gk and nk are zero-mean, white, gaussian processes. To utiiize 2. 4. 3, the density /¢, yk) is required. From well known results [K- 2, 3], 1’ (xk+1 this density is normal with mean 32 and variance P , where k+1 k+1 22 P A z A _ _ 9 2 Xk+1 ‘1 Xk + r (Yk k) “4'4 and 2 1Dk : Z — ——--— 2. Pk+l a Pk+ q r . 4. 5 The terms q and. r are the variances of Ek and 71k, respectively, (assuming stationarity). The observation equation yields k , * p(vk+1/¢,xk+l.y )NN(xk+1 . r). Using these relations in equation 2. 4. 3 it can be shown that the exponent of p(ka/tp, yk) equals (except for the -l/ 2 factor) —— { — 9 r" r + P yk+1 k+1 ° k+1 Therefore . k A . p(YkH/W’Y ) N("1<+1° r + Pk+1) . . A ,. k . U81ng 2. 4.4 to write x as a funC';;cn of y gives k+1 A _ _ \( _ p _ A xk+1 — (a Pk/rho. Pk_l/r)....(a PO/r)xO P x, . _ _2 +(a-Pk/r)(a-Pk_1/I)....(a Pl/r) r Yo Pl +(a - Pk/r)....(a - 132/1“)? Y1 P I k-l + (a - Pk/r) —— yk_1 .i r yk * The symbols ~N(a : b) are used to denote the normal distribution with first argument as mean, and second argument as covariance. 23 where the Pj's satisfy 2.4. 5. This is a rather involved expression for ka, and is inconvenient for two reasons. First, to obtain a recursive relation for a from 2. 3. 4 would require storing all observations, which is undesirable for practical reasons. Second, the expression in the numerator of 2. 3.4 must be maximized as a function of a to find the conditional mean. This would be a difficult task because of the form of Qk+l° A simple approximation to the density p(yk+1/¢, yk) can be obtained from 2. 4. 4. By taking 32 2. 4. 4 becomes kzyk! A .— Xk+1 - ”k 2.4.6 and the variance may be approximated as Z P +r=q+r+ZaP-Pk/rf_q+r+s}:=s k+1 k k for some 31:) O. This is eSpecially convenient because if p(a/yk) . , _ k k has a normal form in 2. 3.4 then the product p(a/y )p(yk+1/u, y ) . . . . ,. , k+1 1s ea31ly max1mized With reSpect to a and p(u/y ) has a normal form also. That is, the density p(a/yk) reproduces [SP] and only one observation needs to be retained. The relation 2.4.6 is sometimes exact. If there is no system observation noise (r 5 0) then 2.4.6 is exact since yk : xk and the variance for p(yk+l/a,yk) is s = q. This is the case studied by Mayne [MA] and one of the cases considered by Fukao [FK-Z, 3]. With these results in mind it would appear that using 2. 3.4 directly is a more logical way to proceed than either of the indirect ways indicated. Equation 2. 3.4 gives a very fundamental relationship between the density functions. More detailed information about the density functions can be supplied by the form of the system being con- sidered. As was seen in the simple scalar example, some approxi- mation may be necessary in the utilization of 2. 3.4 but the effect of these approximations is more clearly seen in this relationship than in the others described. Furthermore, the algorithms which result from this relation are quite simple and can be realized by conventional computing te chnique s . 2. 5 Summary Under the general structure of Bayes estimation it has been shown that a general class of identification problems can be formu- lated. In examining previous results it was seen that there is little uniformity of approach. Ho and Whalen's result for the linear stationary autonomous system is derived ad—hoc and the convergence follows from stochastic approximation. Mayne's work is a refor- mulation of linear filtering and does not allow any observation noises. Fukao's results apply only to stationary systems with no input observation noise. Further, no derivation of results is given or motivated and only a single restrictive convergence theorem is given. A single point of reference in attacking these problems is 25 provided by Bayes' rule, which also allows computationally feasible algorithms to be developed. Because of the difficulty in forming the exact density functions required by Bayes' rule, an approximation was seen to be desirable. III SOLUTION OF THE BAYESIAN IDENTIFICATION PROBLEM The general form of the problem to be solved has been given in Chap. II. In Sec. 3. 1 an iterative equation will be derived that relates the estimates at time k-l to the kth estimate and observa- tions. The conditions under which this algorithm converges to the true value of the parameters and the sense of convergence are given as convergence theorems I, II and III in Sec. 3. 2. This result is related to those given by other authors in Sec. 3. 3. 3. l Derivation of the identificaticn Algorithm In this section, the identification algorithm for a class of systems is derived using 2. 3.4. In particular, for the system of Figure 3.1.1 the equations are now assumed to be of the form : ' \ : .l ‘l \ xk+1 fk(xk, uk, gk, (1, Dk‘xk' akja + ék 3 . 1.1 - x + 1 3 1 2 YR ’ k “k ' ‘ - u + Z 3 1 3 Vk _ k nk . 0 where Dk(.) is an n x p matrix, a is a fixed but unknown p-vector 26 Z7 and E(§)-E(1)-O E(Z)-O 314 k - nk " ! nk " . . a me, tT)=Q k2) 3.1.4b kj k’ :0 , kij 3.1.4C 1 ’ék m .< u k+1 - x k Delay k I l l > Dk(.)a A ‘ a“ 3k 2 “k k Figure 3.1.1 General System Configuration 1 Z T . E(nk[nj] ):O for all k,J 3.1.4d E(i[ i]T)-Ri fork-"i-IZ 3l4e nk nj _ k — J 9 _ l 0 . = O forkyij‘,i:1,2 3.1.4f i T . . and E(§k[nj] )2 O for all R, J', 121, 2 3.1.4g Since I3 is known in 3.1. 2, 2. 3.4 becomes k k k k H vk+1 p(a/Y ’V )p(yk+1’vk+l/a’y ’V ) k p(G/Y 9 ) Z k k 3.1.5 (V v /Y v ) p k+1’ k+1 ' 28 In 3.1.1 - 3.1. 4 assuming v are independent and that d k+1 3“ yk+1 vk+1 is independent of a, vk and yk gives k k k k p(yk+1"’k+1/“’y 'V ) ‘ p(ykn/a’y 'V )p(vk+l) ’ Thus p(ka) can be canceled in the numerator and denominator of 3.1. 5. This gives the recurrence relation k k k k k+1 k+1 Phi/y .v )P(Yk+1/a,y ,v ) P(G/Y ,V )2 k k 3.1.6 where the denominator equals the integral of the numerator over a. The vector u is assumed to have a gaussian a—priori d is- tribution, with den 3 ity -1 A - % (a -80)TPO (Ct-(10) o) e O V(Zn)pdet(PO) where det(.) denotes determinant. p(u/YO. V O O i, ’A o p.a/Y .V )~N\a0. P0) The form of the density function po(a) and the mean and co- variance matrices are assumed to be given and reflect the a-priori uncertainty concerning the parameter vector u. If th e a-priori in- formation concerning a is small, then the Po matrix is large in norm (c.f. Sec. 3.2). If the true value of o. is known with some certainty then the matrix P is small in norm. The initial mean 0 7-9 value 60 is taken to be the best available estimate of the tr ue value of a. The density function p(yk+1/a, yk, vk) is now needed . From 3.1.1, 3.1.2, 3.1.3 1 yk+1 ‘ XL<+1Jr T'1<+1 - D (x u)u+§ + 1 ' k k’ k k “k+1 1 Z 1 ’ Dk(yk- T‘k’ Vk‘ T‘k)‘1 +§k +quk+1' This equaticn gives a basis for approximating the function p(yk+1/u,yk, vk) as discussed in Sec. 2.4. Taking k k E(Yk+1/Q:Y 9V ) — Dk(yka Vk). 0- _ k k and cov {yk+l/a,y ,v } = Sk where Sk is larger (in norm) than Qk + Rk’ the density is approxi- mated as normal. Thus k k ‘ ~ I . . 3 o 0 At the kth step suppose k - p(a/Y ,Vk)~N(ak', Pk) . 3.1.8 o t q - u o k Since p (a) 15 normal, if a recurswe relation between p(a/y , v ) o and p(a/yk+l, vk+l) is obtained having the a-posteriori as normal then a complete algorithm will be established. The a-posteriori mean and covariance will define the algorithm, with the mean being taken as the estimator of a. 30 Using 3.1.7 and 3.1.8 in 3.1.6 gives k+1 k+1 p(G/Y s V ) : 1 T -1 kexp-EUa-fik) Pk (a—Gk)+(yk+l-Dk(y k,vk )(1) Tskl (Yk'l'l- D k(Yk. Vk)a)} C [numerator] do “a 3 . l. 9 wh ere k is a constant. Since it is known that the exponential family reproduces [SP], the density p(a/yk+1, Vk+l) is normal. Thus it is only necessary to find the maximum value of the numerator to find the conditional mean. The exponent of the numerator (except for the constant - 1/2) equals -1 T -1 Pk + Dk (yk,vk)Sk Dk(yk,vk)}a T -l A T -1 .. 2 o o 0. {Pk Gk + Dk (yk, Vk)Sk yk+1} 3 l 10 T —1 -1 A + {Gk Pk“ k +yk+lsk yk+1}' To maximize the numerator cf 3.1. 9, the quadratic form 3 .1. 10 must be minimized. This is accomplished by taking (1, and thus the a-posteriori mean Gk“, to be T A — a _ {PkA k+13k (yk,v )s } 3.1.11 k+1 P+l< k k yk+1 where The matrix Pk+1 is the a-posteriori covariance matrix ass ociated with p(a/yk+l, vk+1). The details of the argument are found in Appendix II. 31 Equation 3.1.11 and 3.1.12 can be written in a mor e con- venient form A "lA T -l = P + D , s C1k+1 pk+1 k “k pk+1 ka Vk)1< yk+1 A ‘1 '1 A T '1 = +P p - +p D , s “k k+1( k Pk+1 )ak k+1 k (yk Vk) k yk+1 From 3.1.12 13'1 P ’1 — D ( )s"1D( v) ‘ k+1 " " k Yk’Vk k kyk’ k so that G :6 +13 {-DT( v)S-1D ( v we +p DT( v )s’1 k+1 k k+1 k Yk’ k k kyk' k k k+1 k yk’ k kyk+1 =6+P DT(y VIS-l[y -D(y v)8] 3113 k k+1k k'k’k k+1 kk’ kk " Using a matrix inversion lemma [AO-l][HOU] , pk+l can be written as, from 3.1.12 )P T - T -1 k+1 k k k 3 .1. 14 Equations 3.1. 3 and 3. 1.14 represent the identification algorithm for the system 3.1.1, 3.1. 2, 3.1. 3. This algorithm gives a Bayes-optimal sequence of estimates for the vector (1 when~ ever the system is linear and no observaticn noises are present. Otherwise, the estimates are sub-optimal. The form of 3.1.13 is of interest and has intuitive appeal. From 3. 1. 1, the te rm D , v )u can be interpreted as a prediction of the next obser- ka k k vation yk+l' based on the present observations yk and vk and on 32 . . A . the present parameter escimate (1k. Therefore the differe nce D ( (ka' k )Gk) represents an error term, and the present yk! Vk estimate 3k is updated by an amount proportional to this er ror. The algorithm 3.1.13, 3.1.14 is actually a generalized stochastic approximation algorithm [DV] [ BL] [ KIE] [ CHU] [ROB] with a gain sequence that depends upon the observations rather than being specified a-priori. 3.2 Convergence Theorems and Assumptions Convergence of the general algorithm 3.1.13, 3.1.14 will be considered here. Conditions under which this algorithm con- verges to the true value a, and in what senses will be given. The proofs for the three theorems to be stated are found in App endix 1. Equation 3.1.13 can be written in error form by defining _ A _ 6k — uk (1 and writing A A T -l T -1 A _ = .. + D S - P D “k+1 ‘1 “k “ Pk+1 k k yk+1 k+1 kSk Dkuk 01' T «1 T -1 T -1 z D ‘ - p D c — p D Ek+1 €15" pk+1 ksk yk+1 k+1 ksk Dk‘k k+1 k Sk Dk“ T ..1 T -1 = - D ‘ + D - D Ek+1 (I Pk+l ksk Dk)Ek Pk+1 ksk(yk+1 k“) where Dk is assumed to represent Dk(yk, vk) for convenience. From 3.1.12 33 -1 T -l : P D o Pk+1k +Pk+1kSk Dk Substituting this into the equation for €k+1 e — P P’16+P DTS'1( D0) 3’21 k+1—k+lkkk+1kkyk+1k° " Equation 3. 2.1 relates the estimation error at the (k+1)St step to the previous error and the true value of the parameter (1. Conditions under which 6k approaches zero and in what sense are now given. The concepts of system identifiability and strong identifiability are introduced. These concepts are fundamental to the convergence theorems, and indicate whether sufficient observa- tional information about the system is available. Other assumptions required for convergence are also stated. The norm of a vector X and a square matrix A as used in this discussion are defined by 2 H XHZ = 2 'x I i and >:< H All = x (A A) max where (’1‘) denotes conjugate transpose and X ax(.) denotes the m largest eigenvalue of the matrix argument. Definition 3.2.1: The system 3.1.1, 3.1.2, 3.1.3 is said to be identifiable if there exists a finite positive integer q such that the matrix sum 34 k+q 2 D.T 371D. jzk 1 J J is positive definite with probability one (WPl) for any k. Assumption 3.2.A1 . ”30- all < 00 and O < HPOH < oo. Assumption 3.2.A2 : E{(y, _ Dja)/’yJ, v3} = 0 ‘J+1 k k . T -1 T Assumptlon 3.2.A3: HE{ E D, S, (y,+l- Dja) Z (y.+l- D.a) j=k-q '1 J J j=k-—q J J SZIHI _<_ M < oo .5 Convergnce Theorem I: If the system 3.1.1, 3.1. 2, 3.1. 3 is identifiable and assumptions 3. 2.Al, 3. 2. A2, 3. 2. A3 hold then the algorithm 3.1.13, 3.1.14 converges to the true value of the parameter vector u in the sense that Hcov(8k- a)“ : Hcov EkH»O as k -+oo . i. e., the norm of the error covariance matrix converges to zero. Assumption 3. 2.A4: k -1 E P ' +P >3 D. . . ”Pk“ k-q e1<.q k+1 k S; (Y -1 2 5 EHPk+1Pk_q ek_qll + EHP Assumption 3. 2. A5: k 2 Ell 2 D. (y. Djafll 5 M< oo . :k- J 35 Convergence Theorem 11: If the system 3.1.1, 3.1. 2, 3.1. 3 is identifiable and assumptions 3. 2.A1, 3. 2. A4, 3. 2.A5 hold then the algorithm 3.1.13, 3.1.14 converges to the true value of the para— meter vector u in the sense that 2 A 2 EUIEkH } = EHlak- all }-> 0 as k -> 00. i.e., the error norm converges to zero in mean square [PAP]. Definition 3. 2. 2: The system 3.1.1, 3.1. 2, 3.1. 3 is said to be k strongly identifiable if the minimum eigenvalue of E D.TS-.lD. satisfies k x . {2 D.Ts.'1D.} > ckp min ij J J J — with probability one (WPl) where 1 < p < 00 and O < c < 00 are constants . k Assumption 3.2.A62 H E D.T5:l(y. . , +1 J=k~q ‘ J probability one (WPl‘j, where M is a constant. - 13:0)” < M < 00 with J __ Convergence Theorem 111: If the system 3.1.1, 3.1. 2, 3.1. 3 is strongly identifiable and if assumptions 3. 2.A1, 3. 2.A6 hold then the algorithm 3.1.13, 3.1.14 converges to the true value of the parameter vector (1 in the sense that ”6k” 5 ”316 uH-e O as k —» co with probability one (WPl). 36 This theorem is equivalent to one given by Fu kao [FK-3], but the proof given in Appendix I is distinct. Assumption 3. 2.A1 is common to the three converg ence theorems and is very reasonable. Assumptions 3. 2.A3, 3. 2. A5 and 3. 2.A6 are constraints on the magnitude of the random variables and their moments, and are easily satisfied. Assumption 3. 2. A6 cbes not hold for gaussian random variables. This is not a stringent con - dition because in practical situations all variables are bounded. Assumption 3. 2.A2, as required by Theorem I, is true for linear systems, but is not true in general for nonlinear systems. Theorem II is similar to Theorem I. In the former, assumption 3. 2. A4 replaces 3. 2.A2 and allows convergence in mean square of the error norm as opposed to convergence of the error covariance norm. Theorem 11 applies to linear systems, but in general will apply to nonlinear systems only if 3. 2. A4 can be verified. The condition of identifiability is required by both Theorem I and Theorem II. Conditions under which a system is identifiable ar e discussed in Chapter V. The strong identifiability of the system as required by Theorem III is sufficient for identifiability, as shown "‘ in Appendix 1. Theorem III may be applied to nonlinear systems. 3. 3 Derivation of Previous Results as Special Cases The form of the system equations 3.1.1, 3.1. 2, 3.1. 3 are sufficiently general to permit derivation of known results as Spe- cial cases. 37 The problem studied by Mayne [MA] is considered first. Equation 2. 2. 5 may be rewritten as T Xk+1 ‘ Xka+§k and yk : Xk where r- T — ‘1 xk 0 al T T Xk : Xk , O. — ] 3.2 3 3.1 o I o 0 “J g k T n and aiT is the ith row of the matrix A. In the algorithm 3. 1.13, 3.1.14, the D .) matrix is taken as k( Dk(y'k, Vk) = Dk(yk) = D(Xk) = X 3 . 3. Z k Since there are no observation noises, p(yk+1/a, yk, v ) need not be approximated since k k k p(yk+l/a3y ,V )- p(yk+1/a9x) and. y :x =Ax +§ =XTQ+§ k+1 k+1 k k k k Therefore w1th 5k: Qk k T , p(yk+1/a.x )’VN(Xk a , Qk) 38 where 0k is the covariance matrix of the gaussian random variable gk. The definitions for a and Dk in 3. 3.1 and 3. 3. 2 are then used with the algorithm 3.1.13, 3.1.14 to solve the problem. If the matrix A varies with k then Dk is changed at each step accord- ingly. If inputs are also applied then D (.) must include these noise- k free observations. The results of Fukao [FK—Z, 3], for a linear system with no observation noise, are the same as those given by Mayne and can be derived accordingly. When observation noises are present at the output, then uk: v and Dk is taken to be k T T I o Yk O “k Dk(yk, Vk) : D(Yk» uk) : o . . . T T 0 yk 0 uk L .1 The nonlinear problems studied by Fukao are also included in the formulation 3.1. 1, 3.1. 2, 3.1. 3. For his formulation there is no observation noise on the inputs so that v = u Also, time-vary- k k. ing systems are not considered so that the D .) matrix remains k( fixed, Dk(yk, vk) = D(yk, uk) . The identification of the system studied by Ho and Whalen [HO-1] is easily handled. There is no input noise, so Qk’ the covariance of ék, is identically zero. There are no inputs so 39 uk 2 vk = 0, and the matrix A is fixed. All states are observed. Therefore in 3.1.13, 3.1.14 Dkwk. vk) = D(Vk) = '. and this type of system can be identified accordingly. Kirvaitis [KI] [ FU-Z] considers identification of nonlinear systems with all states obscured by noise and with no state dis- turbances. If the system is taken to be time-discrete then 3.1. 1, 3.1. 2, 3.1. 3 are sufficiently general to include this case, taking gk = 0, and. vk 5 uk. 3. 4 Summary An identification algorithm 3.1. 13, 3.1.14 has been derived for the general system 3.1.1, 3.1.2, 3.1. 3. This was accom- plished using a reproducing gaussian distribution for the para- meters which specify the system. The convergence of this algorithm to the true parameter values is established by convergence Theorems 1, II and III in Sec. 3. 2. Theorems 1 and II are new results in identification, and the algorithm 3. 1. 13, 3. 1. 14 is more general than any given previously. The algorithm can be Specialized to give results of other authors as indicated in Sec. 3.3. IV SPECIAL FORMS OF THE ALGORITHM System equations of the form xk+1 : Dk(xk’uk)° +§k 1 Yk ‘ Xk+nk Vk : uk+nk have been considered in Sec. 3.1. The identification algor ithm for these systems has the form 3.1.13, 3.1.14. For a given system structure, implementation of the equations requires that the Dk(yk, vk) matrix be found. Special forms of Dk(.) for some important classes of systems are given in this chapter. The convergence theorems of Sec. 3. 2 are discussed in relation to each system. The formulations for linear, stationary, and time-varying systems are given in Sec. 4.1 and 4. 2, respectively. Nonlinear stationary, and time-varying system formulations are given in Sec. 4. 3 and 4. 4. Section 4. 5 discusses the formulation 40 41 for systems with generated inputs, for systems with state distur- bances having unknown mean, and systems with indirectly ob- served states. 4.1 Linear Stationary Systems When all states are observed the equations for a linear, stationary system have the form xk+1= Axk + Buk + gk -x + 1 Yk‘ k “k Vk‘uk T'k where A is an n x n matrix with rows aiT and B is an n x r matrix with rows biT. The entries of the matrices A and B are to be identified. The state equations may be written as xk+l : [A B] xk + gk “k _ .1 _ T T F al b1 xk = : + ék a T b T u n n k _ _J ._ _. Rearranging into the form 3.1.1 42 I T T xk 0 : uk 0 a1 T T a Z 2 o xk+1 xk : uk + gk 4 1 l O I . O O x TI 0 u T an k k b _ I a l b n T T xk+1 ‘ [xk Uk 1°+§k ' D(xk’uk)a+gk 4'1'2 Thus for the linear, stationary system, the Dk(.) matrix is defined by 4. l. 2. Conditions under which such a system is identifiable can q now be given. For 2: D,Ts.'1D. to be positive definite (WPl) j=1 J J J xT(‘Z D.TS.'1D.)x 2' 23 xTD.TS.-1D.x = 2(D.X)TS.-1(D.X) > 0 j J J J j J J J J. J J J for all x 7! 0, where Dj denotes Dj(yk, vj). Since Si"1 is positive definite, no x 7! O can exist for which 29 N n >4 II o where K) has nq rows and n(n+r) columns. This equation has only 43 the trivial solution x = 0 if the rank of E is equal to n(n+r) so that q 2 n+r is required. The 9 matrix has the form I I l 0 O O Y1y I Y2 I I yq l I I I T _ O ’ l O ‘ | l 0 3 Y1 | Y2 I ' ' 'I Y<21 v1 . 0 : v2 0 : : vq 0 0 v1 | 0 v2 I | O vq L I I I _ so that the rows of 9 (columns of QT) are linearly independent if any (n+r) of the pairs [yiTvi ], i = 1, 2, . , ,q, are linearly inde- pendent. This is satisfied (WPl) if both yi and vi contain additive noise components. The noise process gk guarantees (WPl) the linear independence of the yi vectors, which is sufficient for the linear independence. To apply convergence Theorem I of Sec. 3. Z to linear stationary systems, assumptions 3. 2.A1, 3. 2.A2, 3. 2. A3 must also be satisfied. Assumptions Al and A3 are very reasonable. Assumption A2 is satisfied since, with all random variables having zero mean E(v. -D.a) = E(. -A .- Bv.) J+1 J y yJ J 3+1 : E(x, 1 2 - An. - Bn.) J+1 J J 1 - Ax.- Bu.) + E(n, J J J+1 1 1 2 = Eé.)+E . -A .-B. (J (TIJ+1 TIJ TIJ) = O 44 Therefore Convergence Theorem I applies to the linear stationary case and the algorithm 3. 1.13, 3. 1. l4 converges accordingly. Con- vergence Theorem II may also be applied if assumptions 3. 2.A4 and 3. 2.A5 hold. 4.2 Linear, Time-Varying Systems When all states are observed the equations for a linear, time- varying system have the form N II k+1 Akxk + Bkuk + gk vk = uk +nk Using 7 T _ F T — _ '— al (k) Io1 (k) al(k) a (k) T T 2 Ak= a2 (k) . Bk = b2 (k) . wk). a.(k) T T n an (k) bn (k) b (k) __ _ _ .J l bn(k) where aiT (k) and biT(k) are the rows of Ak and ER respectively, the variation of 7(k), and thus the variation of Ak and Bk’ is assumed to be governed by 45 7(k+1) = Auk“) 4.2.1 Z(k+l) = GZ(k) 4.2.2 The matrix LA is unknown, Z(k) is a q-vector, and 7(k) is an n(n+r)- vector. The matrix G and the initial value ZO are known. From 4.2.1 and 4.2.2 7(k+l) = 942(k+1) = cAG Z(k)=cAGk+1Zo Then ._ I .— T T xk O : uk 0 Akxk+ Bkuk = .. l .. 700 TI ° T O xk I 0 uk T T k _ [xk Uk 194020, k . The vectorc/iG Z0 can be written LA sz : (A h : H T a. o k k where T T T T a' = [a' ..o.' ],,_A = [(1.02 0'] 4.2.3 and hkT o T T k k-l : : Z . 020 Hk hk , hk G Z0 G G Z0 4 4 0 hkT L J 46 Therefore _ T T T ,_ Axk+Buk— [Xk Uk ]Hk a — Dk(xk’uk a 4.2.5 where a' generally contains both known components and unknown components a which are to be determined- Equations 4. 2. 3, 4. 2. 4, o 20 . g o 4 5 define Dk(xk uk) Example 4. 2.1. The use of 4. 2. 3, 4. 2.4, and 4. 2. 5, is studied. in this example of a time-varying linear system of second order. - qr _ — — ‘P — —l — - xk+1 aLo+al°k a2 xk b1 g1k : + u + 2 k yk+1 a3k a4k yk b2k ng _ .. .__ -— — _I. L d. b _I The a1 and bi are unknown parameters. The vector '7 is then given by 2 1(, a k,13, b 4 1 2k] k, a2, a 3 T 7 (k) = [ao+ad 01' 7T(k) : [ao+a k”), a , a k(2) 1 Z +ak 3 3 (l) a karb (1) .4 k1 , b 1 2 where the superscript (. ) denotes a factorial polynomial an d is defined by [HAM] (n) x : x(x-l)(x-2)...(x-n+l), n_>_1 (0) and x =1. The terms in the second. expression for 7T(k) are obtained from Sterling numbers of the second kind. 47 Equation 4. 2.1 is written as 0 a1 a0 0 0 a2 'k(2)— wk) = vimk) = a3 a3 0 k“) 0 a4 0 km) 0 0 bl '— " LO b2 0 and 4. 2. 2 is written as Z(k+1) II 0 N E — q _ -— (k+l)(z) 1 2 o k (k+l)(l) = o 1 1 k( (k+l)(0) o o 1 k — — I-— _u~ and Z0T = [O 0 l] . These relations are used with 4.2. 3, 4. 2.4, ,u). and 4. 2. 5 to determine D xk k k( From the cA matrix, the (1' vector to be found has the form T l ._ a — [Oa1a000a2a3a300a4000b1ObZO] . Since some of the components of a' are known, further reduction of the problem is possible. For some k, Then the product H To ', where H k written as, eliminating the zeros of a' Re-ordering the column vector and eliminating the redundant a gives Multiplying H k row '0!” 0 0‘0.- r (02a) 3 w O‘Cl"!u NHIFUND-‘O L _ T on the left by [XkTUkT] gives T is defined by 4. 2.4can be 3 49 [XTUT]HTQ,_ Xkyk O O “k O (”302 0 8'0 k k k - o 0 o ‘03 8L1 Xkyk “k a _ ((01+w2) a2 002 a3 w 4 o 3 b1 so that, performing the indicated multiplication, ka3 kaz ykw3 O O u (1) O a D (x ,u )-o. = k k k 0 0 O ( + ) Xk “’1 “2 3'sz k 2 L. .J When the vector 0. is found, 7(k) can be calculated, and from this, AR and Bk. Assumptions 3. 2.A1 and 3. 2.A3 as required by convergence Theorem I of Sec. 3. 2 are easily satisfied. Identifiability will be satisfied generally. An indication of why this is true is given by the example of this section. Assumption A2 is also satisfied so that the algorithm will converge. Assumptions A4 and A5 may als 0 hold so that the algorithm would also converge in the sense of Theorem 11. 50 4. 3 Stationary Nonlinear Systems The system equations for a stationary nonlinear system with all states observed are .. f( xk+1 ’ Xk’ ”16“) + gk 1 Yk " ka‘k 2 k k k v :u+'r] where f(.) is a vector valued function. Its components satisfy fi(xka uk: 0.) = [fl]. (xk’ uk), {12(xk, uk): 0 o - : flip (xks “kn ° 3 for i : 1, 2, . . .n. Each of the n- p functions fij(' ) is assumed known. Therefore, the required Dk(°) matrix is r _ f11(xk,uk) . . . f1p(xk’ uk) Dk(xk’uk) = I Z 4°3'1 fn1(xk’uk) ° ° ° fnp(xk'uk)J Example 4. 3.1 : To illustrate, the third order nonlinear sta- tionary system _ _ F g _ l _ _, Xk+1 ao+ a‘1"15r azxkyk O glk 2 yk+1 ' a33‘1: + a‘4.”4. + ID1 “k + ng zk+1 assm Zk b2 g3k is consideredwhere all variables are scalars. 51 This may be written as _ _ _,,_ i _ _ ka+1 1 xk xkyk 0 O O O 0 a0 glk — o o o x2 0 u 0 a1 + g yk+l ‘ k yk k 2k aL2 zk+1 o o o 0 o Slnzk 0 uk a §3k L _ L _ 3 _ _ a4 a5 b 1 sz 01' - xk+1 yk+1 : D(Xk’yk’ zk’ “13° ‘1 + gk zk+1 Convergence Theorem I is not generally applicable to non- linear systems since assumption A2 may not hold. If assumption A4 can be justified, then Theorem II can be applied. Generally, however, Theorem III will be required for nonlinear systems. Assumptions Al and. A6 hold easily. Identifiability generally holds, as can be conjectured from the form of Dk(° ) in the pre- ceding example. Assuming strong identifability gives convergence in the sense of Theorem III. 4. 4 Time-Varying Nonlinear Systems The system equations for time-varying nonlinear systems with all states observed have the general form 52 = f Xk+1 k(xk’ uk’a) + ié‘k - x + l yk ‘ k "k Vk : uk+nk - By analogy with the stationary nonlinear case, the Dk(°) matrix has the form — — fll(k,x u) . . . f1p(k,xk,uk) k’k Dk(xk’ uk) 2 . . fn1(k. Xk’ 11k) . . . fnp(k.xk. 11k: where all the fij (k, x , uk) are known. k Example 4.4.1 2 To illustrate, the second order system x a kx 3 + a y k+1 : o k 1 k + 1k , 2 yk+1 azyk51n k §2k where all variables are scalars is considered. This may be writ- ten as kx 3 0 F a _ g xk+1 k yk 0 1k : Z a + ka o o yksmk 1 §2k aL2 53 As in the stationary case, if assumptions A1, A6 are satis- fied then convergence Theorem III can be applied. 4. 5 Miscellaneous Simple generalizations of the above results are considered. in this section. Sec. 4. 5a considers systems with generated in- puts. Sec. 4. 5b considers systems having state disturbances with unknown means, while 4. 5c considers systems with indirectly observed states. 4. 5a. If the input process u is generated by the equation k ULk+1 : cuk where C is an unknown matrix the function uk can be identified by the matrix C. The observations of uk are obscured by additiv e noise. Vk : uk+nk Then, for example, if the system is linear Xk+1 _ A B xk + gk uk+1 O C uk 0 and x 1 Yk _ k + "k ‘ 2 V u 54 To identify this system, having unknown matrices A, B and C the results of Sec. 4. 1 can be applied and the input generating mechanism can be identified. Convergence Theorem I applies to this case. If the input mechanism is a random process generated by 2 uk+1_ C “k +gk and. the observations are 2 Vk = uk+nk . 2 . . where C 18 unknown and ék is a zero mean white process, then with a linear system fi+1_ABxkg uk+1 O C “k g The Dk(.) matrix is formed as in Sec. 4.1 and the algorithm identifies the input noise process by its matrix C, as well as A and B. By adjoining the input generating mechanism to the state equations, similar generalizations are possible for the other cases considered in this chapter. The corresponding conver- gence theorems will still apply. 4. 5b. If the state disturbance process ék has a non- zero unknown mean m, then the process 55 £11: ék-m has zero mean. For a linear system xk+1 = Axk-tBuk+€.k _ I —Axk+Buk+m+§k _ '- ‘ i — [A B m] xk + gk “k l Th e corresponding observation equation is — — — 1 — - Vk xk "k Vk ‘ “k TIk l J l 0 This formulation allows the unknown mean m to be considered as a system parameter and estimated accordingly. In a similar manner, nonlinear or time-varying systems having unknown mean input processes §k can be identified. 4. 5c. If the observation equations have the form Yk z ka +111: where H is unknown and H"1 exists, then for a linear system, a new basis in the state Space may be found. Letting 56 gives -1 k+1 HAH Zk+HBuk+H§k N II and. Yk : Zk+ "k Thus the system characterized by the matrices HAH”1 and HB may be identified, even though the covariance matrix of the pro- cess Hgk is unknown. 4. 6 Summary The generality of the algorithm 3. 1.13, 3.1.14 has been clearly established by the formulations of this chapter. Equa- tions 4. 1. 1 and 4. l. 2 define the algorithm for the stationary linear case, while 4. 2. 3, 4. 2. 4, and 4. 2. 5 complete the time - varying case. The relations between the convergence theorems of Sec. 3. 2 and these systems is also established. For non- linear systems, 4. 3. l and 4.4. 1 define the algorithm and are used. to establish the applicability of the convergence theorems. Sec. 4. 5 considers other simple generalizations. V AN EXAMP LE In order to show how a specific identification problem may be formulated, and to show that implementation of the pro- posed algorithm is not always difficult, the identification of a fourth- order digital control system with eight unknown parameters is considered in this chapter. Formulation of the problem is given in Sec. 5.1. Computer simulation results for the behavior of the algorithm under different initial estimates, noise conditions, and. a-priori uncertainty are shown in Sec. 5. Z. 5. 1 Problem Statement and Formulation An interesting example of a situation where identifica- tion techniques can be useful is depicted in Fig. 5.1. 1. The system consists of a digital controller C2. and an unknown plant: Q. The plant is assumed to be describable by a second-order, linear differential equation with constant coefficients. The digital controller, which is completely Specified, is described by a pair of first-order linear difference equations with constant coef- ficients. The analog-digital (A/ D) converter transforms the 57 58 (t) into digital inputs for the con- continuous signals e1(t) and eZ troller. The D/A converter accepts the controller's output and transforms it to a zero-order-hold (ZOH) signal. The and controller outputs x and x are obser ved inputs 111 and u 3 4 2 without noise. The plant 63 receives inputs from the digital con- troller c that are corrupted by additive noise €(t). Observations of the plant outputs are corrupted by the additive noise n(t) and. are used as feedback signals. This situation may arise when the plant is located at a remote site. For example, the plant may be a moon probe while the controller is on the earth. The identifi- cation of this system consists in determining the coefficients of the plant's differential equation. The identifier I operates on the observations ul, uz, yl, YZ’ x3, and x4 and calculates the plant parameters. Knowledge of these plant parameters may then be us ed to alter the digital con- troller so that system performance is improved. For example, the controller may be altered so that the overall system is asymptotically stable . The digital controller C, is described by 5.1. 1. d k x3[ (k+1)T] c11 C12 x3(kT) + (111 12 e1( T) x4[(k+1)T] c2.1 c22 x4(kT) d21 d22 e2(kT) 5.1.1 where cij and dij are known for i,j : 1, 2. The constant T repre- sents the length of time between samples. 59 sofimmflhmcmnh >302 5M3 Swamiwm 33:00 fimfiwflm 9.3mm xomnpmmh pad 4.1m enema Hanna 6 3.9 en G mg“ g. 4X 5? «\Q 2:, me 4 H Hmfioufluoo Seaman U 3 __ .— $235.3 H 60 The plant 63 is described by 5.1. 2 , d xl(t) a a x (t) b b v (t) dt II + U1 .l.2 x2“) a21 a22 x2“) b21 bzz V where aij and. bij are unknown for i, j = 1, 2, and all variables are scalars . The gaussian processes €(t) and n(t) are white, have zero mean and are independent. The processes €(t) is assumed to behave as a ZOH signal. From the plant equations and. system configuration x (t) X (t) X (t) d 1 T 3 + T3 €(t) x.Z (t) x2 (t) x4(t) where A and B have entries :51], and bij. Since x3(t), x4(t) and €(t) are assumed to be ZOH signals, at the sampling times xl(k+l) X k) X3(k) = A + B + B§(k) 5.1.3 x2(k+l) x2(k) x4(k) where A and B have entries a? and bij' T : 1 is used for con- venience and [KOE] A = eAT, B = S eA(T"’)1‘3d7 5.1.4 0 If A and B can be found from the identification algorithm then identification of the continuous system is possible using X : ~l-T In A 5 .1 T _ -A - .. and B = [ S 6 7dr] 1 A 1B 5 .l 0 Writing the direct sum of 5.1.1 and 5.1. 3 _ T L ._ x1(k+1) x3(k) x2(k+l) A O 0 x (k) B €(k) 4 : Xk+l : Xk + + x3(k+1) 0 C D e1(k) O x4(k+1) e2(k) _ _ _ J Using the relations Fx3(k)_ r- o 0 1 0 o _x1(k)— P o 1 x (k) 0 o o 1 0 X2(k) + o 4 = x3(k) e (k) -1 o o 0 1 1100 1 x4(k) _ n e2(k) O -1 O O O 111 (k) L d - 112(k) the above becomes A B 0 B gk X = X + u(k) + 5 .1 k+1 -D C k D _an where C and D have entries Cij and d., as in 5.1.1. The observation equations are _ L Y1 (k) Y2(k) Y3(k) v4(k) — b 62 x1(k) x2(k) x3(k) X4(k) ‘ 11(k) .1 Equations 5.1. 7 and 5.1. 8 are of the general form discussed in Sec. 4.1, and the corresponding algorithm can be applied for identification. and x and indicate more clearly how the algorithm can be applied. then 4 x1(k+1) x2(k+1) Defining Y1(k) Yz(k) [a11a12a21a22 are observed, the equations xl (k) xzfld x1(k) szk) bb 11 12 Since the matrices C and D are known, and x x (k) + B 3 X4(k) + TI(1<) b x3(k) x4(k) O 21‘022:I O 3 + B §(k) O x3(k) 25,00 0 O. 63 From Sec. 4.1, lek) Y2(k) 0 0 x3(k) x4(k) o o D (Y ,X ) k k k 0 0 lek) yz(k) o o x3(l<) 35,00 The identification algorithm takes the form T 1 y1(k+1) A _ A " .A ak+1 — ak + Pk+1Dk Sk - Dk 0k} 5.1.9 Y2(k+1) and P =P-PDT(S+DPDT)-1DP . 5110 k+1 k k k k k k k k k ' ° The Sk matrix may be taken as _ x Sk _ Qk +Rk +Sk where Skis some positive definite symmetric matrix and Qk Rk cov[ n(k)] cov[§(k)] 5. 2 Computer Simulation Results The particular example considered in this section uses the matrices 64 -2 0 40/9 0 The plant is unstable since A has eigenvalues of l and 2. Also, the closed 100p system in the absence of a digital controller is unstable since (A-B) = has an eigenvalue greater the unity. However, the complete sys- tem with matrix —D C is stable, with eigenvalues of o, o, +1/«f2 , -1/'\/2. It is also completely controllable since is nonsingular. The example was programmed on the Control Data 3600 digital computer. Ordinary matrix routines were used to simu- late the system. The gaussian noises were simulated using the sum of nine samples from a uniform distribution [SC] . 65 Figure 5. 2.1 shows the simulation results for a typical para- A meter. The average value of b over ten runs is plotted against the 22 number of samples. In this figure, P0 = 51, S = 31, Q = R = I and the initial estimate of all parameters was taken as zero. It can be seen that the initial response of the estimator is good, while as the number of estimates increases, the convergence becomes slow. This is typical of stochastic approximation algorithms. The effect of different noise levels is depicted. in Fig. 5. 2. 2. Here, the normalized error HEkHZ/HEOHZ averaged over ten runs is plotted against the number of samples. In both cases indicated, P0 = 51 and 60 = O. In the lower curve, Q = R = I and S = 31, while for the upper curve Q : R = 251 and S = 501. With the increased noise level, Q = R = 251, it can be seen that the convergence rate is slowed considerably. The final value for this case is 1. 44 x 10"2 as opposed to 4. 34 x 10'”4 for the lower noise level. Figure 5. 2. 3 shows the effect of different initial estimates. Here, P0 = 51, Q : R : I and S : 31. The error norm HEkHZ aver- aged over ten runs is plotted against the number of samples. For the curve having A points, [IEOHZ = l, 300, while for the 0-point curve, HEOHZ = l. 00. As the number of estimates increases, the effect of initial errors disappears and the two estimators have similar asymptotic properties. This indicates that the algorithm's convergence is unaffected by initial errors. The effect of au-priori uncertainty is depicted. in Fig. 5. 2. 4. In both cases indicated, Q = R = I, S : 31, and 30 = O. A ten run 66 7 ’13 22 /\ 1. O »- A 0 —0-———0-——0 ten run average Q = R = I P0 = 51 0. 5 _ S = 31 N0. of samples/2 ‘j\,._l_ l 1 l l l l l l l g 1 2 4 8 16 32 64 128 256 512 Figure 5. 2.1. Behavior of a Typical Estimator 2 2 116k” /||€OH 1 0T 0 ° ”"= “: ten run average P 2 SI . 0 1 eo==0.o 10' _ . ‘ . O O . . . lo‘zt ‘ 10‘3. . A A ‘ 0: Q=R=251, 82501 —4 A: Q = R = I, S : 31 N0. of samples/2 10 W—l l 1 l l 1 1 1 l \ 8 16 32 64 128 256 512 Figure 5. 2. 2. Effect of Noise Power on Convergence 67 2 3'1. Hekll 2 10 [ ‘- A; HE H = 1,300 o 2 102- 0: Neon =1.0 ten run average 1 Q = R = I 10 ' S = 31 P0 = 51 10 dy’ 10'1_ -2 10 - '3 1 I l , No. of samples/2 I 10 l m J A _AV_3 2 4 8 16 32 64 128 255 512 Figure 5. 2. 3. Effect of Initial Estimates on Convergence 2 2 HekH /|I€OH ’t l 0;: 0 P0 = 0.1 I A: P = 101 0 1 ten run average 10‘ _ (2::R:=1 S a -2 10 - 10’3. -4 N0. of samples/2 10 _/\/ J l ' I 1 1 1 1+ 1 l > 1 2 4 8 16 32 64 128 256 512 Figure 5. 2.4. Effect of Uncertainty on Convergence 68 average of the normalized error HEkHZ/HEOH Z is plotted against the number of estimates. The uncertainty associated with initial estimates is reflected in the a-priori covariance matrix Po. If the initial estimates are thought to be close to the true values, then PO will be small (in norm). If initial uncertainty is large, the PC will be large (in norm). In the curve with points A, P0 = 101 and, for the 0-point curve P0 = 0. 11. In both cases the initial error norm is small, having a value of 2.61. As indicated by Fig. 5. 2.4 the effect of initial uncertainty tends to produce large changes in the initial estimates which are close to the true values. This results in errors that are large relative to those produced by the same initial estimate with a smalier Po. As more observa- tions become available, the effects of the a—priori uncertainty become small and both estimates have similar asymptotic pro- perties. 5. 3 Summary The identification of a fourth-order digital control system with eight unknown parameters has been studied in this chapter. The formulation of the identification algorithm in Sec. 5. 1 was shown to be a simple application of the results of Sec. 4. 1. Various properties of the algorithm were demonstrated by a com- puter simulation in Sec. 5. 2. It was shown that a-priori uncertainty 69 and initial estimate errors do not affect the asymptotic properties of the algorithm. It was also shown that increased noise power slows the algorithm's rate of convergence considerably. VI C ONCLUSION The major results of the thesis are listed in Sec. 6. 1 and possible extensions of this work are discussed in Sec. 6. 2. 6.1 Results of the Thesis The formulation of a general system identification problem as a problem in Bayesian learning is carried. out in Chap. 11. This formulation is significant because it provides a unifying structure through which a wide variety‘of system identification problems can be solved. In particular, Sec. 2. 3 shows the basic relations which exist for the general system 2. 3.1, 2. 3. 2. , 2. 3. 3. Chapter 111 gives the solution of the Bayesian identification problem for the general system 3.1.1, 3.1. 2, 3.1. 3. The general algorithm 3.1.13, 3.1.14 is derived under gaussian assumptions with an approximation to the optimal Bayesian estimator, and is optimal for linear systems having no observation noises. This algorithm is more general than any previous results, and can be Specialized to many important cases as shown in Sec. 3. 3. 7O 71 The proofs of the three convergence theorems of Sec. 3. 2 are new, and are found in Appendix 1. Convergence Theorems 1 and II are new and can be more readily applied than those given by other authors. These theorems are important to the thesis because they establish that the algorithm does indeed yield the true system parameters. Chapter IV shows how the algorithm 3.1.13, 3.1.14 can be specialized to important classes of systems. New results or generalizations are obtained for linear and nonlinear systems in each section. The formulation for stationary linear systems is new because of the presence of input observation noises. The time-varying linear formulation is significant because observa- tion noises have not been considered previously in this case. The nonlinear formulations are new because of the input observation noises and non-stationarity of the system. The generalizations of Sec. 4. 5 also contribute to the resuits of this thesis. The feasibility of the identification algorithm is demon - strated by an example of a fourth-order system with eight un- known parameters in Chap. V. A computer simulation shows that increased noise power slows the algorithm’s rate of con- vergence and that the effect of initial estimates becomes negli- gible as the number of observations becomes large. Also, when a—priori uncertainty is large, the effect of initial observations is weighted heavily, resulting in initial transients in the algorithm. 72 6. 2 Possible Extensions The class of systems to which the identification algorithm is applicable is very general. However, the formulation in this thesis applies only to discrete-time systems. A very natural extension of the results given in Chaps. II and III is to continuous- time systems. Conceptually, this is not a difficult problem, al- though the details of the limiting arguments may be difficult. In the continuous case, an equation analgous to 3. l. 13 would be £0”) = P(t)DT(t)S(t)'l{§E y(t) - D(t)0(t)} 6-2-1 and for 3.1. 14 % pm = .. meTm {Sm-1+ D(t)P(t)DT(t)}'1D(t)P(t) 6. 2.2a or :19; p'lu) = DTIt)S(t)'1D(t) 6-Z-Zb Implementation of these equations for analog computer simulations would be difficult, although hybrid techniques would reduce these difficulties considerably. Another useful extension of the results studied here would be the elimination of 3. 1. 14. A deterministic sequence of matrices Pk would reduce the computational requirements. Furthermore, the convergence proofs would be simpler, although the sequence of estimates would be less Optimal. For example, (l/k) I might replace P Other more general forms for P might be con- k° k sidered and the rate of convergence optimized over a class of I n I- 73 The identification algorithm could also be used to study adaptive controller systems. As indicated in Chap. V, the identifier could be us ed to track system parameters and the digital controller could be modified accordingly. Other adap- tive learning loops could be studied. This same philosophy can be applied to communication problems. By utilizing a channel identifier, an adaptive detector could be designed. The convergence proofs of Appendix 1 could be extended. Some of the techniques used in these proofs may be useful in establishing multi-dimensional stochastic approximation algorithms . Finally, the study of canonical forms would be very useful when all system states are not observable. APPENDIX I CONVERGENCE PROOFS FOR THE IDENTIFICATION ALGORITHM The three convergence theorems for the identification algor- ithm 3.1.13, 3.1.14 were stated with assumptions in Sec. 3. 2. In this appendix these theorems will be proved. For convenience of reference the algorithm equations are re- stated here. A A I -l A = P D _ D . u k+1 ak+ k+1 k (yk' Vk)Sk [ yk+1 k(yk' vk)a k] A1 1 T T -1 P k+1 ‘ Pk‘ Pka (yk’ vk)[sk+ Dkwk’ vk)PkD k (3’16ka Dk(yk'vk)Pk° A1.2 With 6k = Gk -1 T -1 : - D O Ek+1 Pk+lpk €13 Pk+1Dk (Yk’ vk)Sk ”k+1 k(yk’ Vk) “1 A1. 3 In the following discussion D will denote Dk(yk, vk) as opposed to k Dk(xk, uk). Since Pi is a covariance matrix it is real, positive definite and symmetric . From the definition of Sec. 3. 2, (P12) = x (P.) ||P.|l =./x (P.*P.) = /x 1 max 1 1 max 1 max 74 75 Also, xmaxuai) : -1 Lemma A1. 1: The product of two positive definite symmetric matrices has positive eigenvalues. Proof: For any square matrix F and any nonsingular G, (FG)TFG and FTF GGT have the same eigenvalues since by the similarity transform GT [ROS] GT[FTFGGT] (GT)-l = GTFTFG . Also, for any nonsingular H, HTH is positive definite sincer 7! 0 xT(HTH)x : (Hx)T(Hx) > 0 since Hx 7! 0. 1 For two positive definite symmetric matrices P and Q, P‘2 and i Q2 are positive definite and symmetric. Thus l .1. l l l T l l l T P0 = PZPZQZQZ = (P2) P2Q2(Qz) , 1 T _1_ _1_ has the same eigenvalues as (PZQZ) (PZQZ). But (PZQZ) is non- _1_ singular since P and Q are nonsingular, so that (PZQ‘2 )T(PZ Q2) is positive definite and has positive eigenvalues. Therefore the pr‘cr-r duct PQ has positive eigenvalues. Lemma A1. 22 If the system is identifiable then the norm of -1 . . Pk+1Pk-q satisfies HPk+1Pk:<11H < 1 76 and the norm of P _1P satisfies k-q k+1 -1 < 1 Hpk-quHH with probability one (WPl). Proof: Only the first inequality will be proved. The second follows from a similar development. By lemma A1.1, Pk+lPk q has posi- tive eigenvalues since Pj is positive definite and symmetric for k any j. By identifiability, Z DjI‘Slej is positive definite and :k-q J symmetric (WPl ), so k Pk+1 2 015711). j:k_q J J J has positive eigenvalues (WPl). From A1. 2 P '1 = P '1 + 2: DTS-ID” k+1 k-q . J=k-q Therefore, pre-multiplying by Pk“ k -1 T -l : - D . 1Dk+lpk-q I Pk+l j—E q j Sj Dj Both product terms have positive eigenvalues (WPl) so that the Jordan form with eigenvalues on the diagonal shows that the eigenvalues of J or those of Pk+1Pkncl1 are strictly less than one (WPl). Thus the maximum eigenvalue is less than one, and 77 -1 < 1|Pk+lpk_q ll 1 (WPl) . An interesting and important property of the matrix P will k+1 now be derived through a sequence of lemmas. Lemma A1. 3. The minimum eigenvalue of a symmetric matrix F satisfies xTFx A . (F) = inf 0 min x7’0 xTx Proof: It is first necessary to show that xTxk . (F) < xTFx min = or that xT[F-x (F)-I]x> 0 min = 1.e., [F - kmin(F). 1] is pOSitive semi-definite. Letting JF denote the Jordan form of F, and J the Jordan form of the difference gives J = JF- Amin(F)' I . Since all eigenvalues of F are on the diagonal of J and Amin(F) E F MF) ’0'- X(F) where M.) denotes an eigenvalue, then the eigenvalues of J satisfy MJ) 2 0. Therefore F - Xmin(F)- I is positive semi- definite. Thus xe T xx k in(F) 5- m VX7-(0. Furthermore, since xminu?) is an eigenvalue there exists x' 3! 0 such that Fx' = x . (F)x' min 78 so x'TFx' = x'Tk . (F)x’ 1n x'TFx' < xTFx and X . (F) = min _ VX;(O. x'x' xx Therefore by definition of infimum [ROY] T x ,n(F) = inf x TF" . m1 xfO x x Lemma A1. 4. A11 eigenvalues of Pk+l approach infinity as k -. 00, WPl, and do so at a rate greater or equal to 0° k, for some con- stant c > 0. Proof: Iterating equation A1. 2 gives _1 1 k/q q.i-1 T 1 P :13” + 2: 2: D.S. D.,k=q,2q,... k+1 o . . . J 121 j:q(1-1) By identifiability, 2 DjTSj-IDj is positive definite (WPl) and has all J' positive eigenvalues. Therefore there exists a constant c' > 0 such that for all eigenvalues k x>c’>0 Using this, Lemma A1. 3, and the fact that for all realizations of {21} ianziZE infzi i i it follows that k -'- T -1 k/q q 1 1 T -1 xmin[ 2 D. s. D.] = "min[ 2 2 D. s. D.] 79 xTz (2D.Ts.‘1D.)x i . J J J . J : inf T xf’O x x T 1 xT(2 D. 5.- D.)x j J J J E inf T i x710 x x l V > z x .(2D.Ts.'ID.) - mm. J J J i J z E C' : €11.18 C”>0. i Therefore 1 1 k T 1 - > .. - min[ Pk+1:l — Xmin[ 1Do ] + >\min[j230]:)j Sj Dj] > > _ co+ clk, co, c1 0 and all eigenvalues of P approach infinity at least as fast as k k+1 (WPl). Lemma Al. 5. The norm of Pk+l satisfies < H PkHH _ cZ/(k + c.) . (WPl) where 02, c3 > O are constants. -1 Pr°°f' HPk+1H ' xmax(Pk+1) ‘ l/xmin(Pk+l) _<_ 1/(co+ clk) 5 cz/(k+c3) by lemma A1. 4. From lemmas A1. 2 and A1. 5 80 ||Pk+1Pk:(11 H < 1 (WPl) and HP 5 c2/ (k + c3) (WPl). k+1H Only system identifiability was used to establish these properties. These results will be useful in establishing convergence of the identi- fication algorithm. Equation A1. 3 may be rewritten as k -1 -1 = + P - D Ek+1 Pk+lpk-q El:<.q k+1j—E qu Sj (yj+1 j“) by recursive substitution. From this equation the product 6 E T is written as k+1 k+1 T 1 T 1 k T 1 E = - - +2 D - 6k+1 k+1 Pk+1Pk- -q k q k qu qu+1 Pk+1j_123 q j Sj ‘(Yj+1 T 1 k T 1 _ E ' D ' .. Dj“) k-q P-kqpk+l 11+1?" '5' (yj+l Dj“) J=k-q k T -1 . 2: (y. -D.<1) s. P A1.5 . +1 k+1 J=k_q J J J Utilizing assumption 3. 2.A2, the expectation of the cross product .term in A1. 5 vanishes since T -1 TP -1 2 E{ Pk+l? Dj Sj (Yj+l Dj-MEk q k qu+1} T -1 _ 2E{Pk+lE[§:Dj sj (Hjl-Djk-Ta)e:qk/Y’VR]P1<+1} = 2E{P 2E{DTS'1E[(y -Da)/yJ v316P'1}P } k+1 j j j+1 k-q P--kq k+1 5 =0. 81 Therefore T _1 T "l E{GkHEkH} ' E{Pk+1Pk-qu-qk-qu-qpk+1} T -1 T ‘1 - D - + E{Pk+l :2 DJ' SJ (yj+1 5a)?(yj+1 Din) $5 PHI} Taking the norm of both sides of this equation and using the triangle inequality T -l T -l ”E(Ek+1€ k+1)H E HE{Pk+1Pk-q Ek-qu-qu-qpki-l}H T -1 T -1 HE{Pk+l ?Dj Sj (Yj+1'Dj°)§3(yj+1'Dj°) Sj Pk+l}” Applying this result, Lemma Al. 2 and Lemma Al. 5, 2 C2 T T>H+ —HE{ZD 51(r (k+c3z) j j j J“ T HE_0thenxi->Oasi->00. Proof: Recursively substituting into the inequality for xi +1 gives 82 X. < a.x.+ . 1+1— 1 1 B1 < — aiai-lxi-l+ “1614+ pi x. < (1.11. ...ax+a.a. ...a +... 1+1 - 11-1 0 0 11-1 lflo + “1‘11..1‘31.2+ “1‘31 1+ ‘3' 1 1 i-l i x, < 11 a.x0+ E H (1 [3“? 131 j=0 J j:0 k=j+1 00 Since 11 (ii: 0 and all partial products are bounded, for any m > 0 m-l i m-l i E H akfij _<_ flj max H Gk j=0 k=j+1 j=0 OfijEm-l k=j+1 and i i i i E I'[ akfl. E Z [3. max 11 o. j=m k=j+1 J j=m J m:j_<_i k:j+l Given any 6 > 0 there exists m sufficiently large so that since 2 [3. < 00. Thus for m sufficiently large j i i m-l i E II fink < 6/3 + E 5. max l'I ok-l-fi. j=.0 k:j+l J j=0 J O:j_<_m-1 k=j+l For fixed m, m-l i Z [3. max 11 (1k < 6/3 j=0 J o_<_j_<_m-1 k=j+l 83 for i sufficiently large, since Haj = 0. Also Bi < 6/3 since 23 [31 <.oo. Therefore 1 1-1 1 n «1.x + 2: n o. p.+p. < E/3+E/3+E/3=Efor largei J o . . RJ 1 :0 3:0 k=3+1 1 implies x. -> O as i ->oo since x. > O. [DV][WOL]. 1 1- Assumption 3. 2.A3 and the previous results give the first conifer gence theor em. Convergence Theorem I: If the system is identifiable and assump- tions 3. 2.A1, 3. 2.A2, 3. 2.A3 hold, then the algorithm A1. 1, A1. 2 converges to the true value of the parameter vector u in the sense that Hcov(€k)H = Hcov(&k- a” -> O as k -> 00. P f.F A16 d3ZA3HEE eTH< 2HEE 6TH r00 - mm ' an ° ° (k+1 k+1) - C1 (k-q k-q) 2 +c'2/(k+c3) where0oo k+1 ’ A V 2 —> -> E{Hak+l-aH } 0 ask oo . Theorem III is proved using assumptions 3. 2.A1, 3. 2.A6, and strong identifiability. This latter condition is stronger than identi- fiability and follows from lemma Al. 4, which states that the minimum k eigenvalue of 2‘, D.TS.-1D. approaches infinity at a rate greater than i=0 or equal to c-k. Strong identifiability requires that this rate be strictly greater than c- k, or at least equal to c- kp, p > 1. Consider- ing the inequalities used in the proof of lemma A1. 4 and the random nature of the system variables, this is not a stringent requirement. Lemma A1. 7: If the system is strongly identifiable, IIP II 5 c4/(kp+c3>, (W131) k+1 where 0‘min(pk+l) 1 k T 1 <1/[1, EV y+1 .(2:D.&'DJ] — mm 0 mln j=0 J J J _<_1/[c1+c2kp] E c4/(kp+c3), 1 < p < oo , O < c , c < 00 with probability one, by assumption. 3 4 00 ° < < < : . Lemma Al. 8 . If xk+1_ akxk+ [3k and 0 _ x0 00, II Gk 0 With 00 k=0 all partial products uniformly bounded, 2 pk < 00, 13k _>_ 0, all k=0 with probability one, then xk -> O as k -> oo WPl. Proof: This proof follows that of lemma Al. 6 except that statements hold with probability one (WPl ). This lemma is stated. by Fukao [FK- 3] without proof. Convergence Theorem 111: If the system is strongly identifiable and if assumptions 3. 2.A1, 3. 2.A6 hold then the algorithm Al.1, Al. 2 converges to the true value of the parameter vector u in the sense that ”E -aH—>O as k—»oo (WPl). k+1H : Hek+1 Proof: Using the triangle inequality for norms in A1. 3 -1 T II:IIP P II-II6k_qII+IIP H Ek+1 k+1 k-q k k+1II -II 2 DJ. -1 S. (y. - Du)” . +1 J:k_q J J 87 Lemma A1. 2, strong identifiability, and 3. 2.A6 give IIEkHII 5 clllemII + cs/(kp+ c3) where00 (WPl) as k-> oo . k+1H : HGk+l This theorem is equivalent to one given by Fukao [FK-3] but the proof is distinct. APPENDIX 11 THE REPRODUCING NORMAL AND ITS QUADRATIC FORM In S ec. 3.1, the a—posteriori distribution of the para- meter vector u wa 3 given as gaussian with mean value and co- variance given by 3.1 .11 and 3.1 .12, reSpectively, and followed from the reproducing property of the gaussian distribution and minimization of a quadratic form. The details of this derivation are given in this appendix. From 3.1. 9 and 3.1 .10 the exponent of the numerator equals (within a- l/ 2 factor) T -1 T -1 T -1 T -1 _ 2 A 0' {Pk +Dk Sk D13“ ‘1 {Pk+1ak+ Dk Sk yk+1} {A T "IA T -1 4. A2. “‘kPk “U yk+1Sk yk+l} 1 Since the integration in the denominator 3.1. 9 is with reSpect to o, the third term in this quadratic form cancels. Completing the square of the remaining part of AZ.1 gives 88 89 T -1 T -1 T -1. T -1 - Z : ‘1 {Pk +Dk Sk Dk}“ ‘1 {Pk+1“k+ Dk Sk Yk+1} -1 T —1 -1 -1,\ T —1 T -1 T -1 ((1 -[Pk +Dk Sk Dk] [Pk ak+Dksk yk+l]) [Pk +Dksk Dk] -1 T -1 1,. T -1 -1 - x(o. - [Pk +Dk Sk Dk] [Pk (1k+Dk Sk yk+1] -1 T -l -l -1 '1/\ T -1 -[P (1 +13k Sk Dk] [Pk +DkSk Dk] [Pk c1k+Dk Sk yk+l] A2. 2 which is the exponent of the numerator and the exponent under the integral sign of 3.1. 9. Integrating the perfect square in (1 leaves the second term of A2. 2, which cancels with an equal term in the numerator. The resulting exponent of the a—posteriori distribution is the quadratic form -1 T -l --1 '1/\ T -1 (a - [Pk +Dk Sk Dk] [Pk c1k+Dk Sk yk+1 -l T -1 -1 -1A T -1 x (a - [Pk +Dk Sk Dk] [Pk uk+ Dk Sk yk+l]) and is a perfect square. Therefore the a-posterior distribution is gaus sian with mean A -1 T -1 -1 1A T 1 : , Z “k+1 [Pk + 13k Sk Dk] [P k+ Pk Sk yk 1] A . 3 and covariance -l 1 - T -1 = 2 P Pk+DkSkD A.4 90 Equation AZ. 3 is the same as 3.1.11 and can be written in the form 3.1.13. Using a matrix inversion lemma [HOU][AO-1] , A2.4 can be written in the form 3.1.14. The mean 3k“ can also be obtained directly from A2. 1 by minimization of that quadratic form with respect to o, once the above gaussian reproducing property is established. [AH] [AHL] [ AND] [AO-l] [AC-2] [ALB] [ARN] [AT] [AY] BIBLIOGRAPHY Ahmed, N. and S. Karni, "On Obtaining Transfer Functions from Gain Function Derivatives, " IEEE Trans. on Automatic Control, AC-12, No. 2, p. 229, April 1967. Ahlfors, L. V., Complex Analysis, " 2nd Ed., McGraw-Hill Book Co. , Inc. , New York, 196 6. Anderson, G. W., R. N. Buland, and G. R. Cooper, "Use of Cross Correlation in an Adaptive Contro 1 System, ”Proc. Nat'l. Elec. Conf., Vol. 15, Oct. 1959. Aoki, M., Citimization of Stochastic Systems, Academic Press, New York, 1967. Aoki, M. , "On Some Convergence Questions in Bayesian Optimization Problems, ” IEEE Trans. on Automatic Control, Vol. 10, No. 2, pp. 180- 182, April 1965. Albert, A. , ”Nonlinear Regression and Stochastic Approximation, " IEEE Conv. Rec'd, 1967. Arnold, C. R. and K. Narendra, "The Characteri- zation and Identification of Systems, " Tech. Rept. No. 471, Cruft Lab., Harvard Univ., Cambridge, Mass., June 18, 1965. Athans, M. and P. Falb, OJLtimal Control, McGraw- Hill Book Co., Inc., New York, 1966. Ayer, Alfred J. , "The A PRIORI, " in Philosophy of Mathematics; Selected Readings, p. 289-301, Eds. P. Benecerraf and H. Putman, Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1964. 91 [BAL-H [BAIPZ] [BEL-l] [BEL-2] [ BLU] [BX] [CAR] [ CH] [ CHU] [cu-u [CU-Z] 92 Balakrishnan, A. V., "Identification of Control Systems from Input-Output Data, " Proc. IFAC Conf. on Identi- fication in Automatic Control Systems, Prague, Czecho- slovakia, June 1967. Also in SIAM J. Control, Vol. 5, No. 3, Aug. 1967. Balakrishnan, A. V. , "Determination of Nonlinear Sys- tems from Input-Output Data, " Proc. Princeton Conf. on Identification Problems in Communication and Control, Mar. 1963. Bellman, R., B. Gluss, and R. Roth, "Identification of Differential Systems with Time Varying Coefficients, " RAND Corp., RM-4288-PR, Nov. 1964. Bellman, R., R. Kalaba, and R. Sridhar, "Adaptive Control via Quasilinearization and Differential Approxi- mation, " RAND Corp. , RM-39Z8-PR, Nov. 1963. Blum, J. R. , "Approximation Methods which Converge with Probability One, " Annals. Math. Stat. , Vol. 25, pp. 382-386, 1954. Boxer, R. and S. Thaler, "A Simplified Method of Solving Linear and Nonlinear Systems, " Proc. IRE, Jan. 1956. Carnap, R., Logical Foundations of Probability, Chi- cago, Univ. of Chicago Press, 1950. Chen, C. F. and B. L. Philips, "Accurate Determi- nation of Complex Root Transfer Functions from Frequency Response Data, " IEEE Trans. on Automatic Control, AC-lO, No. 3, p. 356, July 1965. Chung, K. L. , "On a Stochastic Approximation Method, ” Annals of Math. Stat., Vol. 25, pp. 463-483, 1954. Cuenod, M. and A. P. Sage, HComparison of Some Methods Used for Process Identification in Automatic Control Systems, Proc. IFAC Symposium on Identi- fication in Automatic Control Systems, Prague, June 1967. Cuenod, M. and A. E. Durling, An Introduction to hnpulse Analysis, Academic Press, New York 1967. [DA] [DAL] [DE] [DU] [DV] [EY] [FU-l] [FU-Z] [FK-l] [FK-Z] [PK-3] [FR] [HANU 93 Davenport, W. B. and W. L. Root, An Introduction to the Theory of Random Signals and Noise, McGraw- Hill Book Co. , Inc. , New York, 1958. Daly, R. F. , "The Adaptive Binary Detection Problem on the Real Line, " Stanford Electron Lab. Rept. , TR 2003-3, Feb. 1962. Deutsch, R., Estimation Theory, Prentice-Hall, Engle- wood Cliffs, New Jersey, 1965. Dubes, R. C., Theory of Applied Probability, Prentice- Hall, Englewood Cliffs, New Jersey, 1968. Dvoretzky, A. , "On Stochastic Approximation, " Proc. Third Berkeley Symposium on Mathematical Statistics and Probability I, pp. 39-55, Univ. of Cal. Press., Berkeley, Cal. , 1956. Eykhoff, P. , ”Process Parameter and State Estimation, " Proc. IFAC Symmsium on Identification in Automatic Control Systems, Prague, 1967. Fu, K. S. , "Learning Control Systems, " Computer and Information Sciences, pp. 318-343, J. T. Tou and R. H. Wilcox, eds., Spartan Books, Washington, D. C., 1964. Fu, K. S. and K. Kirvaitis, "Identification of Nonlinear Systems by Stochastic Approximation, " Proc. 1966 JACC, pp. 255-264. Fukao, T. , "Some Fundamental Properties of Adaptive Control Processes (1), "Bull. of Electrotech. Lab., Vol. 28, No. 1, pp. 1-19, Jan. 1964. Fukao, T. , "System Identification by Bayesian Learning, " I, 11, Bull. Electrotech. Lab. 29, No. 5, p. 364, Tokyo, 1965. Fukao, T. , "System Identification by Bayesian Learning, " Proc. Tokyo IFAC Symposium, pp. 137-146, 1965. Frame, J. S. , "Matrix Functions and Applications, " IEEE Spectrum, vol. 1, March-July 1964. Hamming, R. W., Numerical Methods for Scientists and Engineers, McGraw-Hill Book Company, Inc., New York, 1962. [ HAN] [HE] [ HS] [110-1] [110-2] [110-3] [Ho—4] [HOU] [ IFAC] [K-1] [K-Z] 94 Hancock, J. C. and P. A. Wintz, Signal Detection Theory, McGraw-Hill Book Co. , Inc. , New York, 1966. Hempel, C. G. , Fundamentals of Concept Formation in Empirical Science, Chicago: Univ. of Chicago Press, 1952. Hsieh, H. C. , "Least Squares Estimation of Linear and Nonlinear System Weighting Function Matrices, ”J. Info. and Control, Vol. 7, March 1964. Also in Computing Methods in Optimal Control, A. Balakrishnan, and L. Neustadt, eds., Academic Press, New York, 1966. Ho, Y. C. and B. H. Whalen, "An Approach to the Identification and Control of Linear Dynamic Systems with Unknown Parameters, " IEEE Trans. on Automatic Control, Vol. 8, No. 3, pp. 225-256, July 1963. Ho, Y. C. and R. C. K.Lee, "Identification of Linear Dynamic Systems, " Jrnl. of Information and Control, Vol. 8, pp. 93-110, Feb. 1965. Ho, Y. C. and R. C. K. Lee, "Identification of Linear Dynamic Systems, " Proc. N.E.C. , 1964. Ho, Y. C. , "On Optimal Filtering and Stochastic Approxi- mation, " J. Math. Analysis and Applications, Vol. 6, No. 1, pp. 152-154, 1963. Householder, A. S. , The Theory of Matrices in Nurneri- cal Analysis, Blaisdell Publishing Co. , New York, 1964. Proc. of Prague IFAC Conference on Identification in Automatic Control Systems, Prague, Czechoslovakia, June 1967. Kalman, R. , "Mathematical Description of Linear Dynamical Systems," S.I.A.M. Control, Vol. 1, No. 2, 19630 Kalman, R . , "A New Approach to Linear Filtering and Prediction Problems, " Trans. ASME, ser. D, J. Basic En ., 82, pp. 35-45, 1960. __8___ [K-31 [ KAI] [KIE] [KIR] [ LAN] [LE-1] [LE—2] [LEE] [LEV] [MA] [MG] 95 Kalman, R. and R. S. Bucy, ”New Results in Linear Filtering and Prediction Theory, ” Trans. ASME ser. D, J. Basic Eng. 83, pp. 95-108, Mar. 1961. Kailath, T., ”Correlation Detection of Signals Perturbed by a Random Channel, IRE Trans., Vol. IT-6, No. 3, pp. 361-366, June 1960. Kiefer, J. and J. Wolfowitz, "Stochastic Estimation of the Maximum of a Regression Function, " Annals. Math. Stat. , Vol. 23, pp. 462-466, 1952. Kirvaitis, K. , “Identification of Nonlinear Systems by Stochastic Approximation, " Ph. D. Thesis, Purdue Univ. , Aug. 196 5. Lanning, J. H. and R. H. Battin, Random Processes in Automatic Control, McGraw-Hill Book Co., Inc., New York, 1956. Lee, Y. W. and M. Schetzen, ”Measurement of the Kernels of a Nonlinear System by Cross Correlation, " Int. Jrnl. of Control, 1965. Lee, Y. W. and M. Schetzen, "Some Aspects of the Wiener Theory of Nonlinear Systems, " Proc.N. E. C. , 1965. Lee, R. C. K., Optimal Estimation, Identification and Control, Research Monograph, No. 28, M. I. T. Press, Cambridge, Mass., 1964. Levin, M. J. , ”Estimation of a System Pulse Transfer Function in the Presence of Noise, " IEEE Trans.“ on Automatic Control, AC-9, No. 3, July 1964. Mayne, D. Q. , ”Optimal Non-Stationary Estimation of the Parameters of a Linear System with Gaussian Inputs, " J. Electr. Control, pp. 101-112, Jan. 1963. McGhee, R. B. , "Identification of Non-Linear Dynamic Systems by Regression Analysis Methods, " Ph.D. Thesis, Univ. of So. Cal., June 1963. [ POP] [ PAP] [PU] [Q] [RAU] [ROB] [ROS] [SAK-l] [SAK-Z] [SAR] [SC] 96 Popper, K. R. , The Loac of Scientific Discovery, Routeledge and Kegan Paul, London, 1951. Papoulis, A. , Probability, Random Variables and Sto- chastic Processes, McGraw-Hill Book Co. , Inc. , New York, 1965. Puri, N. W. and C. N. Weygandt, "Transfer Function Tracking of a Linear Time-Varying System by Means of Auxiliary Simple Lag Networks, " IEEE Trans. on Appl. and Ind., vol. 83, pp. 70-72, Jan. 1964. Quine, W. V. , "On What There Is, " Philosophy of Mathematics: Selected Readings, pp. 183-196, P. Benecerraf and H. Putnam eds. , Prentice-Hall, Englewood Cliffs, New Jersey, 1964. Rauch, Tung, and Striebel, "On Maximum Likelihood Estimation of Linear Systems, " Lockheed Missile and Space Co. Tech. Rpt., June 5, 1963. Robbins, H. and S. Monro, "A Stochastic Approxima- tion Method, " Annals. of Math. Stat., Vol. 22, pp. 400- 407, 1951. Rosen, J. B. , "Stability and Bounds for Nonlinear Systems of Difference and Differential Equations, " Jrnl. Math. Anal. and App1., Vol. 2, pp. 370-393, 1961. Sakrison, D. J. , ”The Use of Stochastic Approximation to Solve the System Identification Problem, " IEEE Trans. on Automatic Control, AC-12, No. 5, pp. 563- 567, Oct. 1967. Sakrison, D. J. , "Stochastic Approximations: A Recursive Method for Solving Regression Problems, " Advances in Communication Systems, Vol. 2, 1966. , Academic Press, New York, 1966. Saridis, G. N. and G. Stein, "Stochastic Approxima- tion Algorithms for Linear Discrete-Time System Identification, " Proc. NEC, 1967. Schreider, Methods of Statistical Testing, Elsevier Publishing Co. , New York, 1964. [SCH] [SG-l] [so—z] [SK] [3%U [SP] [STU [TSY] [WE—1] [WE-2] PW] [WOL] 97 Scheffler, I. , The Anatomy of Inquiry: Philosophical Studies in the Theory of Science, A. Knopf Publ. Co., New York, 1963. Sage, A. B. and W. C. Choate, "Minimum Time Identification of Non-Stationary Dynamic Processes, " Proc. N.E.C., 1965. Sage, A. P. and B. R. Eisenberg, "Experiments in Nonlinear and Non-Stationary System Identification via Quasilinearization and Differential Approximation, " Proc. J.A.C.C., 1966. Sklansky, J. , "Learning Systems for Automatic Control, " IEEE Trans. on Automatic Control, Vol. AC-11, No. 1, Jan. 1966. Smith. B. W. , "Parameter Estimation in the Presence of Measurement Noise, " Int. Jrnl. of Control, Vol. 3, NO. 4, pp. 297-312, 19660 Spragins, J. D. , "A Note on the Iterative Application of Bayes' Rule, " LEEE Trans. on Info. Theory, IT-ll, No. 4, pp. 544-549, Oct. 1965. Steiglitz, and Mcbridge, "A Technique for Identification of Linear Systems, " IEEE Trans. Automatic Control, AC-lO, Oct. 1965. Tsypkin, Y. Z., "Adaptation, Learning and Selflearning in Control Systems, " Third IFAC Conjress, Buttenworths, London, June 1966. Wiener, N. , The Extrapolation, Interpolation and Smooth- i_r1g of Time Series, John Wiley and Sons, New York, 1949. Wiener, N. , Nonlinear Problems in Random Theory, John Wiley and Sons, New York, 1958. Wilde, D. J., Optimum Seeking Methods, Prentice- Hall, Englewood Cliffs, New Jersey, 1964. Wolverton, C. T., and J. T. Rawgen, " A Counter- example to Dvortetzky's Stochastic Approximation Theoreml' I.E.E.E. Trans., IT-14, No. 1, pp. 157- 158, Jan. 1968. 98 [Z-l] Zadeh, L. A. , ”From Circuit Theory to System Theory, " Proc. IRE, 50, pp. 856-865, May 1962. [Z-Z] Zadeh, L. A. and C. A. Desoer, Linear System Theory: The State Space Approach, McGraw-Hill Book Co. , Inc. , New York, 1963. nICHIan STATE UNIV. LIBRARIES IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 31293009958731