IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
MARTIN W. SCHWARTZ
1972

This is to certify that the thesis entitled IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER presented by Martin W. Schwartz has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

Major professor

ABSTRACT

IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

By Martin W. Schwartz

The primary purpose of this work is to determine the order of a discrete-time system from input and output observations. Several tests, based on the properties of dynamic systems, are given. The best of them is based on the determinants of the models' controllability matrices normalized by the product of their input coefficients. This and the other new tests compare favorably with existing tests, based on the properties of the least squares estimator, on simulated examples and observed data taken from an electric power system.

Theoretical properties of the least squares estimator when the order of the model is not equal to the minimal order of the system are also developed. When the model order is higher than the system order in a noise-free system, extra poles and zeros are added to the transfer function; all of these are on the same side of the unit circle. When noise is present, the estimator still converges, but to an unknown realization. It is also shown that the small sample estimates might theoretically differ greatly from the noise-free estimates in such situations. However, numerical results are quite close.

Variants of the standard least squares estimator are developed and compared. A starting algorithm for on-line estimation, based on matrix pseudo-inverses, is also developed.

IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

By
Martin W. Schwartz

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering and Systems Science

1972

To Joyce
who makes it all worthwhile

ACKNOWLEDGEMENTS

The author wishes to express his sincere gratitude to Dr. G.L. Park, his thesis advisor, for his continued assistance and encouragement during the course of this research. Thanks are also due to Dr. R.O. Barr, Dr. J.S. Frame, Dr. A. Bacopolous, and Dr. J.B. Kreer for their interest and critical reviews of the thesis and their excellent teaching. The author also appreciates the financial support awarded through the Consumers Power Company-Michigan State University Cooperative Research Agreement.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

I   INTRODUCTION
    1.1 Motivation
    1.2 Background
        1.2.1 Equivalent Systems and Identifiability
        1.2.2 Identification of Linear Systems

II  LEAST SQUARES ALGORITHMS
    2.1 Statement of the Problem
    2.2 Large-sample Properties
        2.2.1 Uncorrelated Noise
        2.2.2 Correlated Noise
    2.3 Small-sample Properties
    2.4 Systems of Unknown Order
        2.4.1 No Noise
        2.4.2 Noise
        2.4.3 Tests for the Order of the System
    2.5 Normalized Least Squares
    2.6 Summary

III NUMERICAL COMPUTATIONS
    3.1 Description of the Algorithms Used
    3.2 Description of the Examples
    3.3 Computed Results

IV  CONCLUSIONS
    4.1 Thesis Results
    4.2 Future Development

APPENDIX
BIBLIOGRAPHY

LIST OF FIGURES
1  Norm of theta-hat vs. Extra Pole
2  Determinants of Controllability Matrices
3  Ratio of Controllability Determinants (full matrix lse)
4  Order Tests
5  Mean Square Residuals
6  Mean Squared Residual for Example 2
7  Normalized Controllability Determinant for Example 2
8  Mean Squared Residual -- Example 3
9  Controllability Determinants -- Example 3

LIST OF TABLES

1   Equivalent Realizations of a System
2   Order Tests for Example 1
3   Critical Values for the F-test
4   F-test for Example 1
5   Estimates of the System Order
6   Order Test Summary for Example 2
7   F-tests for Example 2
8   Order Tests for Example 2
9   Poles and Zeros for Example 2
10  Order Test Summary for Example 3
11  Order Tests for Example 3
12  F-test for Example 3
13  Poles and Zeros for Example 3

CHAPTER I
INTRODUCTION

1.1 Motivation

The following problem arose in a study of an electric power system: find a simple equation which represents the behavior of a high-voltage tieline connecting two areas. Such a mathematical model would be incorporated in simulation models used in design of the system and would be studied to possibly provide better control of the system. The standard theoretical treatment of these tielines (E2) requires knowledge of parameters which are often not available and some simplifying assumptions which were questioned. Thus, it was decided to measure the power and frequency (the relevant variables) on a tieline and form a model using these measurements.

This is an example of a common problem we will call the identification problem; namely, to obtain information about a complicated real-world dynamic system from discrete-time or sampled measurements of the relevant input and output variables. Using known information and enough simplifying assumptions to make the problem computationally feasible, a mathematical model with unknown parameters is constructed, and the parameters are estimated. If the model is accurate, some properties of the system can be determined within certain accuracies.

Specifically, consider a single-input, single-output (SISO) dynamic system which is linear and time-invariant. If the order n is known, the problem is to estimate the coefficients of the difference equation

(1.1.1)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i}

where y is the output, u the input, and y_t = y(t). For example, a model for one tieline could be

(1.1.2)   f_t = .595 f_{t-1} + .405 f_{t-2} + .003 p_{t-1} + .002 p_{t-2}

where f is frequency and p is power.

Sometimes the input is a test signal, sometimes not; the system may be initially at rest, or may not; and we may have knowledge or can make assumptions about the measurement errors and plant noise. In each situation some method such as least squares, Gauss-Markov (L1), maximum likelihood (A4), Kalman-Ho (K2), or stochastic approximation (M3) is available to generate the estimates.

Going back to the specific problem of modeling the load-frequency behavior of an electric power tieline, the measurements must be taken during normal operation, and noise is present as measurement error in both input (power) and output (frequency) and as an inherent stochastic element of the system (the system loads and configuration are partly random) (P1). While the model can be taken as linear and time-invariant during a short period under normal ranges of the variables, several factors arise that violate the usual assumptions of existing methods. First, the stochastic assumptions necessary to insure unbiased estimates are violated.
Second, we do not have sufficient information about the system to obtain the order of the model or relations between the parameters. Third, we have very limited information about the characteristics of the noise. Finally, the output and, to a lesser extent, the input are almost constant.

The fourth factor is important. The usual methods for testing the fit of a model calculate the residuals; if they are small, the model is considered good. But here the residuals alone do not constitute a sufficient test for the validity of the model because any reasonable identification scheme will yield small residuals. For example, if a constant model equal to the nominal value of 60.000 Hz is taken, one set of data has an rms residual of .02% of the estimate. But our goal is to derive some information about the structure of the system, and not just to estimate the output. Thus, the almost constant data implies that we must look for additional criteria to use in testing the model fit.

The constancy causes additional problems. Many identification methods involve solution of a set of equations of the form Y = XA derived from (1.1.1). Several columns of X will contain output values (which in the above example vary from 59.980 to 60.017 Hz), and these columns will be almost linearly dependent. Therefore, the solution will be very sensitive to small perturbations in the data and to round-off errors in the numerical technique. Another aspect of the problem is that the high intercorrelation of the columns of X makes it very difficult to determine empirically a reasonable order for the model. This is important because, as shown in Section 1.2.1, different models may exhibit identical input-output behavior and one might be stable and the other unstable, one controllable and the other uncontrollable, or have other divergent properties.

The following example from numerical analysis (I2) shows this problem in a slightly different context. When solving the differential equation dy/dt = -y, y(0) = y_0, with a third-order difference approximation (of a specified form) for the derivative, the difference equation

   y_t + (3/2 + 3h) y_{t-1} - 3 y_{t-2} + (1/2) y_{t-3} = 0

is obtained, where h is the sampling interval. Three distinct solutions of the form y_t = alpha^t are obtained. All of the alpha's satisfy the equation

   alpha^3 + (3/2 + 3h) alpha^2 - 3 alpha + 1/2 = 0 .

For sufficiently small h, one eigenvalue is less than -2. Thus the difference solution becomes unbounded while the true solution approaches zero. The problem is due solely to the addition of the extraneous eigenvalues caused by too large a model order.

Our primary motivation is solving the practical problem of how to choose a model. In doing so we will not always be rigorous from a probabilistic interpretation. This is deliberate and is common in developing numerical techniques; Powell (P2), when discussing methods for finding the maximum of a function F(x_1, x_2, ..., x_n), says:

    The lack of definiteness in stating the conditions on F(x_1, x_2, ..., x_n) is deliberate, because we are describing the current state of optimization, and it happens that the current state is not a logical structure of theorems. Instead it has developed from an assortment of numerical methods which have been devised because real problems had to be solved, and at present the actual success of the algorithms is far ahead of any theoretical predictions.

The model-order problem is much less developed than the maximization problem; numerical, rather than theoretical, results for the model order problem are emphasized.
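The extraneous-eigenvalue behavior in the difference-approximation example above is easy to reproduce numerically. The following short sketch is ours, not from the thesis; the step size h = 0.01 is an assumed, purely illustrative value.

```python
import numpy as np

# Iterate the third-order difference approximation to dy/dt = -y from
# the example above, starting from exact samples of the true solution.
h = 0.01
y = [1.0, np.exp(-h), np.exp(-2 * h)]
for t in range(3, 200):
    # y_t = -(3/2 + 3h) y_{t-1} + 3 y_{t-2} - (1/2) y_{t-3}
    y.append(-(1.5 + 3 * h) * y[-1] + 3.0 * y[-2] - 0.5 * y[-3])

print(np.roots([1.0, 1.5 + 3 * h, -3.0, 0.5]))  # one root lies below -2
print(y[-1])   # enormous in magnitude, while the true solution decays to 0
```

The characteristic root below -2 dominates after a few dozen steps, exactly as described above.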
There are some new theoretical results in this thesis: Theorem 2 on the asymptotic convergence of least-squares estimators with correlated noise; Theorem 3 on small-sample properties with white noise; and Theorem 4 on the location of the additional poles with too high a model order and no noise. The algorithm in Chapter III for the pseudo-inverse of the sum of matrices of a certain form is also new, and a variant of the least squares algorithm which we call the reduced least squares algorithm has not been explicitly discussed in the literature. However, for the solution of the practical problem, the main contribution is the application of the theory of equivalent representations to the study of the identification problem. The nature of the problem precludes a logical structure of theorems at the present time.

The rest of this chapter and most of the next develop theory necessary to the understanding of the tests for model order and their numerical comparison. Relevant literature is reviewed when the appropriate subject is discussed and is often juxtaposed with the new material; this is true, in particular, for the section on the model-order problem.

1.2 Background

In this section we will define and discuss several concepts necessary to define the problem more clearly and to compare each method with others. The first part distinguishes the different ways we can represent a system and discusses related questions; although the discussion will be for linear, discrete, time-invariant SISO systems, most of the ideas are valid for multivariate and time-varying ones. The next part describes the different loss or error criteria we can use for identification. The last part is a brief outline of several significant identification algorithms and how they become more elegant and efficient as we have more information and more control over how the information is obtained.

It is necessary in what follows to distinguish between a real-world system, a mathematical system, and a mathematical model of a system. A real-world system is, of course, what we find in nature and almost always lacks desirable properties (such as true time-invariance) that mathematical systems can have. In Section 1.2.1 we will distinguish between a mathematical system and a model or representation of it. This distinction is more subtle and is more important because, although we are developing algorithms with the intent that they will be applied to real-world problems, much of the time we will have to talk about models of mathematical systems.

1.2.1 Equivalent realizations and identifiability

In the remaining chapters we talk about "the order of the system" and "the order of the model" as if they were well-defined concepts, and it is important that they be well defined. The most obvious definition (for either concept) would be the largest lag in the difference equation, i.e. the parameter n in equation (1.1.1), but this is not sufficient because it does not distinguish between difference equations which have identical input-output behavior.

This problem can be thought of (C1, K1) as pole-zero cancellation of the transfer function T(z) relating input to output. The discrete transfer function is the z-transform (O1) of the difference equation representation of the system. The z-transform is the discrete version of the Laplace transform and has many of the same properties. An example of the use of the z-transform and the corresponding derivation using the difference equation is in the Appendix.
If we derive T(z) from theoretical considerations and get a pole-zero pair, then it is well known (G1) that if this pole is unstable, the real system will act as though it were unstable, even though classical z- (or Laplace) transform theory tells us that we may cancel that term in the mathematical model and look at just the other poles. The reasons for this deviant behavior are that real components are not linear over all possible input-output values (C2) and that they do not exhibit exact pole-zero cancellation (K1). A simple example (C2), using a continuous time dynamic model, demonstrates this. Let us set up a system with total transfer function T(s) = 1/(s+1) using two components in series such that the first has T1(s) = 1/(s-1) and the second T2(s) = (s-1)/(s+1), so that T(s) = T1(s)T2(s). With zero initial conditions and unit step input, theoretically y(t) = 1 - exp(-t), the same as if we had used a single component with transfer function T(s). Real components, however, are linear for only a limited range of their input and output. Thus, after a while, the output of the first component or the input of the second would be overloaded, and the observed behavior would deviate markedly from the theoretical. This is true even if all the components are perfect; if the pole and zero do not match exactly, or if there are slight perturbations in the system, the effect is increased.

However, we are deriving T(z) from observed data, not theoretical considerations. As explained later in the section, the input-output data gives us information only about the controllable, observable part of the system, and this part has no pole-zero pairs. If everything is exact, and we take the model order to be larger than the system order, the model transfer function will have pole-zero pairs (see Section 2.4.1), and cancelling these will yield the transfer function of the (controllable, observable part of the) system; here we cancel the pole-zero pairs because any occurrence of such pairs from input-output data is due to a flaw in the identification algorithm or an error we have made, because we cannot identify those pairs which do cancel. If there are stochastic elements, then we do not get exact pole-zero pairs because of the identification errors, but we would like to guarantee that the behavior is not much different than the behavior of the noise-free system.

We now have two reasons for examining concepts related to equivalent representations. First, we want to have an unambiguous statement of what "the order of the system" means. Second, we must be able to compare two systems and/or models and be able to say when their behavior is identical or close.

There are three different ways of representing the relation between the input and output which we shall use: difference equation, transfer function, and state-space model. Each of these has advantages and disadvantages, some of which are discussed below. But first let us explicitly state them. The difference equation representation is

(1.2.1)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i}

where the a_i and b_i are constants and n is the maximum lag.
Its z-transform is (O1)

(1.2.2)   T(z) = (b_1 z^{n-1} + b_2 z^{n-2} + ... + b_n) / (z^n - a_1 z^{n-1} - ... - a_n) .

There are several standard state-space representations of the difference equation (1.2.1) (O1), one of which is as follows: let

(1.2.3)   h_1 = b_1
          h_2 = b_2 + a_1 h_1
          ...
          h_n = b_n + a_1 h_{n-1} + ... + a_{n-1} h_1 .

Now define the state variables:

(1.2.4)   x_1(t) = y_t
          x_2(t) = x_1(t+1) - h_1 u_t
          ...
          x_n(t) = x_{n-1}(t+1) - h_{n-1} u_t .

Then the state-space model consists of state- and output-equations

(1.2.5)   X(t+1) = A X(t) + B u_t
          y_t = C X(t)

where X(t) = col(x_1(t), ..., x_n(t)),

(1.2.6)   A = [ 0    1        0   ...  0   ]
              [ 0    0        1   ...  0   ]
              [ ...                        ]
              [ 0    0        0   ...  1   ]
              [ a_n  a_{n-1}  ...  a_2  a_1 ]

          B = col(h_1, ..., h_n)
          C = (1, 0, ..., 0) .

Two important concepts derived from state-space representations that we will need for the model-order problem are controllability and observability (G1, O1, K1). Formally, a state-space realization is controllable (observable) if its controllability matrix Q_c (observability matrix Q_o)

(1.2.7)   Q_c = (B, AB, ..., A^{n-1} B)
          Q_o = (C^T, A^T C^T, ..., (A^T)^{n-1} C^T)

has full rank. Intuitively, it is observable if we can see what is happening internally (in the state vector) by looking at the output. It is controllable if we can manipulate the state using the input. The difference equation representation contains information only about those parts of the system which are both controllable and observable. In the above example, the difference equation does not show the output of the first component or the input to the second, and it does not allow us to manipulate them so that the desired input-output relation holds for all values.

The state-space representation (1.2.3)-(1.2.6) has observability matrix equal to the identity. This can readily be seen: C^T is the first column of the identity, and (A^T)^k C^T is the (k+1)-th column of the identity, because each of the first n-1 rows of A is the succeeding unit row vector.

To illustrate several ideas in this section we will use Example 1, Section 3.2, whose difference equation form is

(1.2.8)   y_t = .8 y_{t-1} + .39 y_{t-2} - .27 y_{t-3} - .5 u_{t-1} + .5 u_{t-2} + .1 u_{t-3} .

Its z-transform is

(1.2.9)   T(z) = (-.5 z^2 + .5 z + .1) / (z^3 - .8 z^2 - .39 z + .27)
              = -.5 (z - 1.17082)(z + .17082) / [(z - .9)(z + .6)(z - .5)] .

And for the state-space representation we are using

(1.2.10)  A = [ 0     1    0  ]     B = col(-.5, .1, -.015)     C = (1, 0, 0) .
              [ 0     0    1  ]
              [ -.27  .39  .8 ]

We are now ready to show that (1.2.8) and the other representations are not unique, i.e. there are other difference equations which have identical input-output behavior. The material is partially contained in several references (A3, B5, R2, Z1, Z2) which differ among themselves in definitions; ours will also differ, but will generally follow the later sources.

We shall define a system as an input sequence and its corresponding output sequence. The triplet (A, B, C), defined above, is called a (state-) realization of the system. Any system obviously has many realizations, such as (A, B, C) and its similar realizations (T^{-1}AT, T^{-1}B, CT), where T is any nonsingular matrix of appropriate size. We shall say that two realizations, S and S', are equivalent if, given an initial state x_0 of S, there is an initial state x_0' of S' such that y(t; x_0, u) = y'(t; x_0', u) for every input u. More simply, two realizations are equivalent if the same input string applied to both yields the same output string. Obviously, two similar realizations are equivalent, but not all equivalent realizations are similar; while all similar realizations have the same dimension state vector, not all equivalent realizations do. If there is no realization with a smaller-dimension state vector, then the realization is called minimal. All minimal realizations are similar.
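As a numerical check of the construction (1.2.3)-(1.2.6), the following sketch (ours, not from the thesis; zero-based indexing and numpy assumed) builds the realization of Example 1 and verifies that it is minimal:

```python
import numpy as np

def realization(a, b):
    """Build the state-space realization (1.2.3)-(1.2.6), zero-based."""
    n = len(a)
    h = np.zeros(n)
    for j in range(n):                  # the h recursion of (1.2.3)
        h[j] = b[j] + sum(a[i] * h[j - 1 - i] for i in range(j))
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)          # ones on the superdiagonal
    A[-1, :] = a[::-1]                  # last row: a_n, ..., a_1
    return A, h, np.eye(n)[0]           # (A, B, C)

# Example 1 of Section 3.2:
A, B, C = realization([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1])
Qc = np.column_stack([np.linalg.matrix_power(A, k) @ B for k in range(3)])
print(B)                                 # (-.5, .1, -.015), as in (1.2.10)
print(np.linalg.matrix_rank(Qc))         # 3: the realization is minimal
```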
Finally, the order of the system is the dimension of the minimal realization.

We will also call the coefficient vector of the difference equation form (1.2.1) a realization, keeping in mind that formally we mean the one obtained from (1.2.1) by using (1.2.3)-(1.2.6). If the order of the system is m, then there is no difference equation with fewer than m lags which realizes the system. If we do not know m and try models with other maximum lags, n, then the models will also realize the system. Thus we will use the following terminology to differentiate between the maximum lag and the order of the system: by "the order of the model" we will mean the largest lag of the difference equation in the realization being used. Table 1 illustrates the difference between "the order of the system" and "the order of the model" by showing several equivalent realizations of Example 1, where the order of the system is 3.

Table 1. Equivalent Realizations of a System

order of the model    a1      a2      a3      a4       b1     b2      b3     b4
        3             .8      .39    -.27     ---     -.5     .5      .1     ---
        4             .8      .39    -.27      0      -.5     .5      .1      0
        4            1.8     -.41    -.66     .27     -.5    1.0     -.4    -.1
        4             .466    .657   -.140   -.090    -.5     .332    .267   .033

The following facts about minimal and non-minimal realizations will be useful:

1) There is no pole-zero cancellation in the transfer function form (1.2.2) of the minimal realization; non-minimal realizations have pole-zero cancellation. (1.2.9) shows that the model of order 3 has no cancellation; the models of order 4 in Table 1 have additional factors of z, (z - 1), and (z + .334021), respectively. This is the main advantage of form (1.2.2), since factoring the numerator and denominator of T(z) is the easiest way to determine if a realization is minimal. The main disadvantage is that T(z) does not account for the initial conditions unless they are zero.

2) The difference equation form (1.2.1) of a minimal realization is unique; this is not true of non-minimal realizations of fixed order, as Table 1 illustrates.

3) Given any realization of a system and its controllability and observability matrices Q_c and Q_o, the order of the system is rank(Q_o^T Q_c). Realization (1.2.3)-(1.2.6) has the advantage that the order of the system it represents is equal to the rank of Q_c, making calculations easier.

We will finish the discussion of equivalent realizations in Section 2.4, after some other relevant material has been presented.

The last question in this section concerns identifiability (B2, S2). A model is identifiable if its coefficients can be determined uniquely from the input-output data. This requirement of uniqueness is desirable to inhibit the estimator from oscillating between two correct coefficient vectors (i.e. between two realizations). If m is less than n, there are many possible solutions and the uniqueness property is not satisfied. But there can be convergence without uniqueness (B1); for example, the algorithm can converge to the realization with minimum norm. This is the case here (see Section 2.4.2), and we do not care very much which realization is approached. Thus we will proceed to identify the parameters even if they are not identifiable in the formal sense. They will be identifiable in practice.

1.2.2 Identification of linear systems

If we were extremely fortunate, we would be able to use a test signal for the input and observe the output without any noise. For such situations, the Ho-Kalman algorithm (K2) provides an elegant method for determining the order of the system and the parameter vector.
Let the input be a pulse and let

(1.2.11)  H = [ y_1  y_2  y_3  ... ]
              [ y_2  y_3  y_4  ... ]
              [ y_3  y_4  y_5  ... ]
              [ ...                ]

H is a Hankel matrix, i.e. its (i,j)-th entry depends only on (i+j). Let its principal leading submatrix of dimension n be denoted by H_n. If the minimal order of the system is m, then

(1.2.12)  det(H_n) = 0 if and only if n > m.

This gives a method for determining the order of the system and a realization of it:

(1.2.13)  A = H_m^{-1} (sH_m), where s is the shift operator,
          B = col(1, 0, ..., 0)
          C = (y_1, y_2, ..., y_m) .

The algorithm generalizes easily to the situation where there are r inputs and p outputs. Then H is a block Hankel matrix with r x p blocks, and the above expressions for A, B, and C are pre- and post-multiplied by "editing" matrices, composed of the identity and zeros, which are easy to compute. If there is no noise, but a pulse input cannot be used, the method does not work because it is impossible to set up a matrix which has the properties of H which are needed. But there is still no problem, because we can set up a system of linear equations of appropriate size and solve for the parameters. On the other hand, whenever noise is present there are problems which have not been completely solved, no matter what the input.

If there is noise, in general the equations are not consistent, so it is necessary to choose some criterion in making the estimates. The most common criterion is least squares, where the sum of the squared differences between the model estimates and the observed values is to be minimized. This method, elaborated in Chapter II, is the one we have chosen because it has several advantages. It is easy to compute the estimates, and it is essentially distribution free, i.e. it is not necessary to know the distribution of the stochastic elements a priori; it is unbiased as long as the noise is uncorrelated. Other methods, such as maximum-likelihood and the Bayes method, which maximize probability density functions, require known density functions of the residuals.

A set of variants of the least squares method is weighted least squares. Regular least squares solves a set of equations of the form Y = XA derived from (1.1.1). Then weighted least squares solves WY = WXA, where W is a positive definite matrix. A well-known example of a weighting matrix is the inverse of the covariance matrix of the residuals. This variant is called the Gauss-Markov or Best Linear Unbiased (BLUE) estimator. Another variant is the instrumental variables method (A3), and a third is discussed in Section 2.5.

These are the most common criteria in identification algorithms. There are some other criteria which cannot be used with a single estimate, but which can be used to compare different estimates. For example, we might want the parameter estimates to be near their true values. Since we do not know the true values, parameter errors are not a basis for an identification algorithm, but we can compare the parameter estimates from different estimators. For dynamic systems, one way to compare them (which is independent of model order) is to compare the steady-state response to a unit-step input. As shown in the Appendix,

(1.2.14)  y_ss = (SUM b_i) / (1 - SUM a_i)

where the a_i and b_i are the parameters in the difference equation (1.1.1). We can also compare the estimates of the poles and zeros of the transfer function of a dynamic system. This topic is discussed in detail in Section 2.4.3.
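To make the Ho-Kalman recipe (1.2.11)-(1.2.13) concrete, here is a small sketch of ours, assuming exact pulse-response data from Example 1 of Section 1.2.1: det(H_n) vanishes once n exceeds the true order, and H_m^{-1}(sH_m) recovers the poles.

```python
import numpy as np

# Pulse response of Example 1 (u_0 = 1, zero initial conditions).
a, b = [0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]
y = [0.0] * 12
for t in range(1, 12):
    y[t] = sum(a[i] * y[t - 1 - i] for i in range(3) if t - 1 - i >= 0)
    if t <= 3:
        y[t] += b[t - 1]

def hankel(n, shift=0):
    return np.array([[y[i + j + 1 + shift] for j in range(n)] for i in range(n)])

for n in range(1, 5):
    print(n, np.linalg.det(hankel(n)))           # ~0 once n > m = 3

A = np.linalg.inv(hankel(3)) @ hankel(3, shift=1)  # A = H_m^{-1}(s H_m)
print(np.linalg.eigvals(A))                      # .9, -.6, .5 -- the poles
```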
CHAPTER II
LEAST SQUARES ALGORITHMS

In this chapter we will discuss least-squares estimation of discrete, single-input, single-output systems using different assumptions about the noise, when the order of the system is known and when it is not. Two variations of the standard technique are given: a new one which has improved small sample properties, and a weighted one. We will also discuss tests for the order of the system.

2.1 Statement of the Problem

A deterministic system of order n can be represented in discrete time as the difference equation

(2.1.1)   x_t = SUM_{i=1}^{n} a_i x_{t-i} + SUM_{i=1}^{n} b_i v_{t-i}

where input v and output x are given at equally spaced times. If stochastic elements are present, we shall write the equation to include modeling errors as

(2.1.2)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i} + e_t

where e_t will be explicitly defined as appropriate for each theorem. After N observations, (2.1.2) is usually written in matrix form as

(2.1.3)   Y_N = PHI_N theta + E_N ,

where

   Y_N = col(y_{n+1}, y_{n+2}, ..., y_{n+N}), an N x 1 output vector,
   E_N = col(e_{n+1}, e_{n+2}, ..., e_{n+N}), an N x 1 noise vector,
   theta = col(a_1, ..., a_n, b_1, ..., b_n), the 2n x 1 parameter vector, and

   PHI_N = [ y_n        ...  y_1   u_n        ...  u_1 ]
           [ ...                                       ]
           [ y_{n+N-1}  ...  y_N   u_{n+N-1}  ...  u_N ]

an N x 2n observation matrix. The least squares estimator (lse) theta-hat_N of theta is given by

(2.1.4)   theta-hat_N = PHI_N^+ Y_N

where + denotes the Moore-Penrose generalized inverse. If N >= 2n and PHI_N has full (column) rank, then

(2.1.5)   theta-hat_N = (PHI_N^T PHI_N)^{-1} PHI_N^T Y_N .

The proof is straightforward using Lagrange multipliers; the errors E_N drop out of the solution because their derivatives with respect to the parameters are zero. The large and small sample properties of the lse are given in the next two sections; there it is shown that it is biased for a finite number of samples even when it is asymptotically unbiased. The variation we call the reduced lse, which is unbiased for finite samples, can be obtained by deleting entries in Y_N, E_N, and PHI_N as follows: let

   Y_K = col(y_{n+1}, y_{2(n+1)}, ..., y_{K(n+1)})

and make the corresponding changes in E_N and PHI_N so that

(2.1.6)   Y_K = PHI_K theta + E_K .

When this is done, any y_t, t = 1, ..., N = K(n+1), will appear just once in (2.1.6), while it usually appears (n+1) times in (2.1.3). For example, let n = 2 and N = 7. Then

   Y_N = col(y_3, ..., y_9),   E_N = col(e_3, ..., e_9),   theta = col(a_1, a_2, b_1, b_2),

   PHI_N = [ y_2  y_1  u_2  u_1 ]
           [ y_3  y_2  u_3  u_2 ]
           [ y_4  y_3  u_4  u_3 ]
           [ y_5  y_4  u_5  u_4 ]
           [ y_6  y_5  u_6  u_5 ]
           [ y_7  y_6  u_7  u_6 ]
           [ y_8  y_7  u_8  u_7 ]

for the full matrix lse, and

   Y_K = col(y_3, y_6, y_9),   E_K = col(e_3, e_6, e_9),

   PHI_K = [ y_2  y_1  u_2  u_1 ]
           [ y_5  y_4  u_5  u_4 ]
           [ y_8  y_7  u_8  u_7 ]

for the reduced matrix lse. The reduced estimator is

(2.1.7)   theta-hat_K = PHI_K^+ Y_K .

Section 2.4 discusses the lse's when the true order of the system is not known and studies the practical problem of choosing a model order; Section 2.5 gives another variation -- normalized least squares.
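The difference between the two estimators is only in which rows of (2.1.3) are kept. The sketch below is ours, not from the thesis; it uses an assumed first-order test system (coefficients 0.7 and 0.5 chosen arbitrarily) and solves both formulations by least squares:

```python
import numpy as np

def phi_full(y, u, n):
    """Observation matrix of (2.1.3): one row for every available y_t."""
    rows = [[y[t - 1 - i] for i in range(n)] + [u[t - 1 - i] for i in range(n)]
            for t in range(n, len(y))]
    return np.array(rows), np.array(y[n:])

def phi_reduced(y, u, n):
    """Keep every (n+1)-th equation, so each y_t appears only once (2.1.6)."""
    phi, Y = phi_full(y, u, n)
    return phi[::n + 1], Y[::n + 1]

rng = np.random.default_rng(0)
u = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):                  # assumed first-order test system
    y[t] = 0.7 * y[t - 1] + 0.5 * u[t - 1] + 0.05 * rng.standard_normal()

for builder in (phi_full, phi_reduced):
    phi, Y = builder(y, u, 1)
    print(builder.__name__, np.linalg.lstsq(phi, Y, rcond=None)[0])
```

Both estimates come out near (0.7, 0.5); the reduced version simply uses a third of the equations.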
2.2 Large-sample Properties

In this section the asymptotic properties of the lse of system (2.1.2) are derived under uncorrelated and correlated noise. The asymptotic properties of the reduced matrix form are the same as for the full matrix form.

2.2.1 Uncorrelated noise

The simplest stochastic case is independent system noise, i.e. where the random variables e_t are independent and identically distributed (iid) and are independent of the input and output. This means that there are no observation errors, but only an inherent stochastic behavior in the system. Astrom (A2) states and sketches the proof (with minor errors) that in this case the least squares estimator converges in the mean square. The proof is sketched below because some intermediate results are used in the rest of the chapter.

Theorem 1. (Astrom) Let the system be given by (2.1.2) and assume:

1) the residuals e_t are iid with zero mean, have all moments finite, and are independent of the input and output;
2) the system is stable;
3) lim_{N->inf} (1/N) SUM u_t is finite and lim_{N->inf} (1/N) SUM u_t u_{t-s} = R_u(s) is finite for s = 0, 1, ...;
4) the matrix whose (i,j)-th element is R_u(i-j) is positive definite;
5) the order of the system is known.

Then the least squares estimator (2.1.4) converges to theta in mean square.

The assumptions of this theorem are common in identification: the first characterizes the errors and will be generalized in Theorem 2; Assumption 2 guarantees that the output is bounded (bounded output should be sufficient for the proof); Assumptions 3 and 4 insure that the input is well behaved (for example, essentially constant inputs such as a unit step or impulse will not work for least squares identification); and the last assumption guarantees uniqueness. The finite difference representation is used for identification because it gives the unknown parameters in terms of directly measurable variables. When the (minimal) order is known, this representation is unique.

Outline of proof (for more details see the proof of Theorem 2 in the appendix): Let

(2.2.1)   B_N = (1/N) PHI_N^T E_N   and   C_N = (1/N) PHI_N^T PHI_N .

Assumptions 2, 3, and 4 imply that lim E(B_N) = 0 and that lim C_N = C exists and is positive definite. Define

(2.2.2)   Z_N = C_N theta-tilde_N

where theta-tilde_N = theta-hat_N - theta is the error of the N-th estimate. Then

(2.2.3)   Z_N = (1/N) PHI_N^T E_N = B_N .

It is now possible to show that

   E(Z_N) = 0   and   E(Z_N Z_N^T) = (s/N) C ,

where s = var(e_t). Now lim E(theta-tilde_N) = C^{-1} lim E(Z_N) = 0, so the lse is asymptotically unbiased. Finally,

   lim E(theta-tilde_N^T theta-tilde_N) = lim E(Z_N^T C^{-2} Z_N) = lim (s/N) tr(C^{-1}) = 0 ,

so the estimate converges in mean square.

2.2.2 Correlated noise

We will now generalize the results to allow e_t to be system noise correlated to observation noise.

Theorem 2. Let the system be given by (2.1.2) and assume:

1) The residuals e_t are identically distributed with zero mean, all moments finite, E(e_t e_{t+s}) = 0 for s > n, and are independent of the noise-free input and output.
2) - 5) same as in Theorem 1.

Then the lse (2.1.4) converges to (theta + C^{-1}B) in mean square, where C is defined by (2.2.1) and

(2.2.4)   B = col(E(y_t e_{t+1}), ..., E(y_t e_{t+n}), E(u_t e_{t+1}), ..., E(u_t e_{t+n})) .

The proof is given in the appendix. It is similar to the proof of Theorem 1, but now E(Z_N) = B, which is not zero, and E(Z_N Z_N^T) is (1/N) times a "weighted C". The term C^{-1}B is the asymptotic bias, which is unknown in most practical situations.

We conclude this section with two comments. First, the assumption that E(e_t e_{t+s}) = 0 for s > n is necessary. If it did not hold, the actual order of the system would be greater than n, as it would be if the degree of the numerator of the transfer function were higher than the degree of the denominator. Second, if there is independent observation noise, but no plant noise, then (2.2.4) can be replaced by

(2.2.5)   B = col(-a_1 s_y, ..., -a_n s_y, -b_1 s_u, ..., -b_n s_u)

where s_u and s_y are the variances of the input and output noise respectively.
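A Monte Carlo check of Theorem 2's bias formula is straightforward. The sketch below is ours; it assumes a first-order system with output observation noise only (coefficients and noise variance chosen arbitrarily), and compares the lse with the predicted limit theta + C^{-1}B from (2.2.5):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, sy = 0.7, 0.5, 0.04            # sy: variance of the output noise
N = 100_000
u = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):                # noise-free first-order system
    x[t] = a * x[t - 1] + b * u[t - 1]
y = x + np.sqrt(sy) * rng.standard_normal(N)   # observation noise only

phi = np.column_stack([y[:-1], u[:-1]])
theta = np.linalg.lstsq(phi, y[1:], rcond=None)[0]

C = phi.T @ phi / (N - 1)            # sample version of C in (2.2.1)
B = np.array([-a * sy, 0.0])         # (2.2.5) with s_u = 0
print("lse:      ", theta)
print("theorem 2:", np.array([a, b]) + np.linalg.solve(C, B))
```

The two printed vectors agree closely; the output coefficient is attenuated toward zero, as the sign of B suggests.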
2.3 Small Sample Properties

While it is theoretically important to have desirable properties when an arbitrary amount of data is available, in practical situations it is more important to know the properties when only a limited amount of data is available. But small-sample properties are more difficult to obtain because they often involve non-linear transformations. Thus, while the proofs of Theorems 1 and 2 (for large samples) did not involve the distribution of e_t, the negative results of the following theorem (for small samples using the full matrix lse) are proved explicitly only for an example involving a particular distribution and specific values for N and n.

Theorem 3. Under the assumptions of Theorem 1, the full matrix lse (2.1.5) may be biased for finite samples, while the reduced matrix lse (2.1.7) is unbiased.

Proof: From (2.2.2) and (2.2.3) we see that

(2.3.2)   theta-tilde_N = PHI_N^+ E_N .

e_t is independent of all u_s, of all e_s, s != t, and of all y_s, s < t. So in the reduced matrix formulation, each element of E_K is independent of each element of PHI_K^+, which implies that E(PHI_K^+ E_K) = 0, and theta-hat_K is unbiased. On the other hand, for the full matrix formulation, E_N and PHI_N^+ are generally dependent. Let us take the simplest example, n = 1 and N = 2. After some algebra, we obtain

(2.3.3)   theta-tilde_N = 1/(y_1 u_2 - y_2 u_1) * col(u_2 e_2 - u_1 e_3, y_1 e_3 - y_2 e_2),
          y_1 u_2 - y_2 u_1 = y_1 u_2 - u_1 (a y_1 + b u_1) - u_1 e_2 .

Thus,

(2.3.4)   E(a') = INT [c_3 e / (c_1 + c_2 e)] f(e) de

where c_1, c_2, and c_3, obtained from (2.3.3), are independent of e_2, and f(e) is the density function of the error. The expected value of b' is similar, but has an additional term, quadratic in e, in the numerator. We will not attempt to find the class of density functions which give unbiased results; but if e_t is uniformly distributed on (-d, d), then

(2.3.5)   E(a') = (c_3/c_2) [1 - (c_1/(2 d c_2)) log((c_1 + c_2 d)/(c_1 - c_2 d))] != 0, almost surely.

This is the only bias we will explicitly find, but if either n or N is increased, (2.3.4) becomes the multiple integral of the product of a joint density function and a rational function in several variables, so the full matrix lse will give biased small sample estimates most of the time.

Of course, there is a price to pay for the unbiased reduced matrix lse: the convergence factor for the full matrix estimator is 1/N, while it is 1/K = (n+1)/N for the reduced formulation.

For correlated noise, the situation is less clear. First, it is impossible to eliminate either small or large sample bias, because E_K and PHI_K^+ are dependent for both estimators. Second, it is more difficult to compare the standard estimates, because each estimator has a different weighting matrix. However, numerical results in Chapter III often indicate better results using the reduced lse. It is possible that either the small sample bias or the variance of the estimator is reduced.

The only other results on small-sample estimates are in Hurwicz (H2). He shows that the least squares estimates are biased for a first order free response system when a finite number of samples are available and gives specific values of the bias under very particular circumstances.

2.4 Systems of Unknown Order

The discussion so far has assumed that the order of the system is known; if the system is simple it is usually possible to determine the order from physical, economic, or other relevant theoretical considerations. But if the system is complex, it may not be possible to determine the order from theory. One reason is the problem of aggregation -- even when it is possible to model each component of a system, the total may not act like the sum of the components. This phenomenon has been noted in engineering (C1) and economics (T1). Another reason is that the order of a "complete" model (if it is possible to construct one) may be in the thousands, while the dominant behavior is of low order.
Thirdly, the order of the observable, controllable part may be less than the sum of the orders of the subsystems (G1). Thus we are motivated to study parameter estimation when the order of the model is not that of the system. First, least squares estimation when the order of the model is higher than the order of the system is presented. Then tests for the order are presented.

While not completely general, we will assume that the system has the same form as the model, but has a different order; this assumption will allow us to use several tools to solve the model order problem. Let the order of the system be m, so that it may be represented as

(2.4.1)   y_t = SUM_{i=1}^{m} c_i y_{t-i} + SUM_{i=1}^{m} d_i u_{t-i} + e'_t ,

and let the order of the model be n, so that

(2.4.2)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i} + e_t .

Further, we will assume that n is larger than m unless otherwise specified.

2.4.1 No noise

Using the material from Section 1.2.1, we see that there is not a unique difference equation if n > m. If n - m = p, then multiplying both the numerator and denominator of the transfer function T(z) by the same arbitrary p-th degree polynomial yields an exact n-th order realization of the system. Two such realizations are

(2.4.3)   theta = col(c_1, c_2, ..., c_m, 0, ..., 0, d_1, ..., d_m, 0, ..., 0),

where the c_i and d_i are the coefficients in the minimal realization (2.4.1), and the least squares realization described below, which is due to Soderstrom (S1). Table 1 shows examples of fourth order realizations of a third order system; the second row in the table is realization (2.4.3), and the last row is the least squares model.

With no noise, the rank of PHI is m+n, so PHI^T PHI is not invertible. The lse becomes

(2.4.4)   theta-hat_N = PHI_N^+ Y_N = (PHI_N^T PHI_N)^+ PHI_N^T Y_N .

The least squares realization has the smallest norm of any n-th order realization. To characterize the lse we need some new notation. First, rearrange the columns of PHI and the elements of theta so that the input and output at the same time are adjacent. Then

(2.4.5)   PHI_0 = [ y_n        u_n        y_{n-1}    u_{n-1}    ...  y_1  u_1 ]
                  [ ...                                                      ]
                  [ y_{n+N-1}  u_{n+N-1}  y_{n+N-2}  u_{n+N-2}  ...  y_N  u_N ]

(2.4.6)   theta_0 = col(x, 0_{2p})

where x = (c_1, d_1, ..., c_m, d_m) and 0_k is the null vector with k elements. Now define

(2.4.7)   theta_i = col(0_{2i-2}, -1, 0, x, 0_{2p-2i}),  i = 1, ..., p
          theta = col(a_1, b_1, ..., a_n, b_n)
          H = col(theta_1^T, ..., theta_p^T)
          G = col(g_1, ..., g_p)

where G is obtained from the factorization of T(z):

(2.4.8)   T(z) = (b_1 z^{n-1} + b_2 z^{n-2} + ... + b_n) / (z^n - a_1 z^{n-1} - ... - a_n)
              = [(d_1 z^{m-1} + ... + d_m)(z^p + g_1 z^{p-1} + ... + g_p)]
                / [(z^m - c_1 z^{m-1} - ... - c_m)(z^p + g_1 z^{p-1} + ... + g_p)] .

theta can be written as

(2.4.9)   theta = theta_0 + H^T G .

From (2.4.1) it is clear that H PHI_0^T = 0, so the theta_i span the null space of PHI_0; since the minimum-norm solution (2.4.4) is orthogonal to that null space, premultiplying (2.4.9) by H and setting H theta-hat = 0 yields the solution

(2.4.10)  G = -(H H^T)^{-1} H theta_0 .

Numerical examples of this computation are given in Section 3.2.
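Anticipating those examples, the following sketch (ours, not from the thesis; numpy assumed) assembles theta_0 and the theta_i of (2.4.6)-(2.4.7) for Example 1 and evaluates (2.4.10):

```python
import numpy as np

def extra_factor(c, d, p):
    """Minimum-norm G of (2.4.10) for a model p orders above minimal."""
    m = len(c)
    x = np.ravel(np.column_stack([c, d]))     # (c1, d1, ..., cm, dm)
    theta0 = np.r_[x, np.zeros(2 * p)]        # (2.4.6)
    H = np.zeros((p, 2 * (m + p)))
    for i in range(1, p + 1):                 # theta_i of (2.4.7)
        H[i - 1, 2 * i - 2] = -1.0
        H[i - 1, 2 * i:2 * i + 2 * m] = x
    return -np.linalg.solve(H @ H.T, H @ theta0)

c, d = [0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]   # Example 1
print(extra_factor(c, d, 1))   # [0.334...]: extra factor (z + .334)
print(extra_factor(c, d, 2))   # [0.47982  0.43648], as in Section 3.2
```

The outputs reproduce the extra factors computed by hand in Section 3.2.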
One property of the real system that the model should reflect is stability or instability. The model will be unstable if one of the extra poles of T(z) lies outside the unit circle, even though the actual system is stable. We want to insure that the algorithm used to estimate the coefficients does not introduce instability into the model; this is especially important if there are stochastic elements present, because then the best we can expect is that the pole-zero pairs will be approximately, but not exactly, equal. Obviously, (2.4.3) yields a stable realization because G = 0_p. The situation is not as neat for the lse, as the following theorem shows.

Theorem 4. Let a system be given by (2.4.1) and its model by (2.4.2), where n - m = p > 0, and assume that e_t and e'_t are identically zero. Then the lse (2.4.4) introduces p additional pole-zero pairs to the transfer function (2.4.8). Either all these pairs are located inside the unit circle or all are located outside the unit circle. In particular, if p = 1, the extra pair is stable.

The proof of Theorem 4 is in the appendix.

Our attention has been focused on the case when n is greater than m because there does not seem to be any practical problem when n is smaller. The relevant theoretical work, called the reduction of dynamic systems (A1, D1, M2), has assumed that the system is known and all that is desired is a lower order approximation to it. Our situation is somewhat different because all we have are input and output values, but it does not appear to be much different. We get a projection from m dimensional space to n dimensional space which does not seem to affect stability or controllability. In fact, it is sometimes difficult to distinguish between the reduced and unreduced models both in our identification problem (see Section 3.3) and in a similar method for the reduction of known systems (A1).

2.4.2 Noise

There is a great gap between the theoretical development and the practical results when there is noise and n > m. The practical results are amazingly good, while the limited theory says that we have no right to expect anything very close. The basic problem is that the noise-free PHI_N matrix does not have full rank, while the observed, noisy PHI_N has full rank almost surely (a.s.). Thus, if C = lim (1/N) PHI_N^T PHI_N is invertible (this limit of a sequence of non-singular matrices is not necessarily non-singular), then Theorems 1 and 2 are still valid (with minor changes). C does have full rank a.s. if there are only independent observation errors. To show this, let

(2.4.11)  PHI_N = D_N + P_N

where D_N is the deterministic part and P_N are the observation errors. Then

(2.4.12)  E(C_N) = E((1/N) D_N^T D_N) + E((1/N) P_N^T P_N) = D + P .

D has rank (n+m), but P is a diagonal matrix of rank 2n,

(2.4.13)  P = diag(s_y, ..., s_y, s_u, ..., s_u),

so C has rank 2n a.s. If the stochastic part is just independent plant noise, as in Theorem 1, then P = diag(s, ..., s, 0, ..., 0), so C is still invertible if 2n <= 2m + n, and Theorem 1 is still valid. The small sample properties of Theorem 3 are also correct if n <= 2m, but the full matrix formulation is also unbiased if n > 2m.

It is not clear exactly what happens asymptotically (when the number of samples becomes large). The estimates do converge to some estimate; if there is just independent plant noise, the estimate is an exact realization of the system. It would be very difficult to decide which realization is approached, because it would require a very detailed probabilistic argument on the errors in (2.1.3) and knowledge of the relation between (D+P)^{-1} and P^{-1} and D^{-1} in (2.4.12) which is not available. It would be nice to say that the answer was either the noise-free least squares estimate or realization (2.4.3). In fact, numerical simulations in Chapter III and in (A2) have sometimes indicated one and sometimes the other of these. The reason for this lies in the fact that we are trying to minimize the norm of the parameter vector when there is more than one realization. Figure 1 shows the norm of the estimate for Example 1 when n = 4; while there is a minimum at the lse, the curve is very flat. Soderstrom (S1) indicates that machine roundoff error alone can cause a large error in the estimate, so it is not surprising that simulations have not generated consistent results.

[Figure 1. Norm of theta-hat vs. extra pole]

If we cannot say much theoretically about the asymptotic estimate, we can say even less about the small sample estimates. The natural approach is to let P, the stochastic part of E(C_N), approach zero, but then the rank of E(C_N) changes from 2n to, at most, (n+m). When the rank of a matrix changes, the (pseudo-)inverse is no longer a continuous function of its elements, as the following examples show:

   [ eps  0 ]+   [ 1/eps  0 ]          [ 0  0 ]+   [ 0  0 ]
   [ 0    0 ]  = [ 0      0 ]    but   [ 0  0 ]  = [ 0  0 ]

   [ 1  1     ]+       1  [ 1+eps  -1 ]          [ 1  1 ]+    1  [ 1  1 ]
   [ 1  1+eps ]   =  ---- [ -1      1 ]    but   [ 1  1 ]  =  -  [ 1  1 ]
                      eps                                     4

Thus, the noisy lse might, theoretically, be quite different from the no-noise lse for small samples. But the numerical results in Chapter III are much different than the pessimistic theory. Even with noise, the computed poles and zeros are often quite close to the noise-free values.
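This rank-change discontinuity is easy to see numerically. A minimal sketch of ours, reproducing the two examples above with numpy's pseudo-inverse:

```python
import numpy as np

# The pseudo-inverse jumps where the rank changes.
for eps in (1e-2, 1e-6, 0.0):
    M = np.array([[eps, 0.0], [0.0, 0.0]])
    print(eps, np.linalg.pinv(M)[0, 0])      # 1/eps, then 0 at eps = 0

for eps in (1e-2, 1e-6, 0.0):
    M = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
    print(eps, np.linalg.pinv(M)[0, 0])      # (1+eps)/eps, then 1/4 at eps = 0
```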
2.4.3 Tests for the order of the system

The literature contains very few papers on the model-order problem per se (A2, G2, W3), but there has been work on choosing a regression model from a set of possible models. All of the existing work on choosing the order of a linear, dynamic system has just modified the general regression tests to fit the particular problem. These existing tests fit into two categories. We can say that PHI has full rank a.s. if and only if n <= m, and that the problem is to determine the largest n so that the columns of PHI are sufficiently independent. On the other hand, we can look at the residuals and say that they should have zero mean and approximately a Gaussian distribution if the correct order is chosen. None of these methods make use of the considerable body of knowledge of linear systems that is available. After a rather complete review of the existing tests of model order, with brief indications of the numerical results in Chapter III, we shall present some new tests which do make use of the theory of linear systems, in particular the theory of equivalent representations.

The second point of view, residuals, is more common in the engineering literature (A2, J1, W3). There are several non-rigorous tests based on the residuals. Draper and Smith (D1) outline several ways to examine the residuals of a given model, such as plots of the residual versus time, the input, and the output, examination of outliers (those residuals with large magnitudes), and run tests; none of them seem very useful except at a very preliminary stage of investigation. A much better one, originally proposed by Forsythe (F2) and since used by several others (B4, J1, W3), is often useful. Forsythe considers the problem of determining the degree of a polynomial when only noisy values are available and gives the following heuristic algorithm: Add one degree at a time to the approximating polynomial; the sum of the squared residuals (or the RMS residual) should decrease rapidly until the correct degree is reached and then remain fairly constant. Boling (B4) has modified the algorithm to include a stopping rule based on the relative decrease. Numerical examples indicate that Forsythe's method works well in determining the order of a dynamic system when the variables have a wide enough range, but does not work well when the output is relatively constant. Other residual norms, such as the largest absolute one, give similar results.
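Forsythe's plateau heuristic, transplanted to dynamic systems, looks like the following sketch (ours; data simulated from Example 1 with an assumed plant-noise level of 0.1):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(3, N):               # Example 1 plus assumed plant noise
    y[t] = (0.8 * y[t - 1] + 0.39 * y[t - 2] - 0.27 * y[t - 3]
            - 0.5 * u[t - 1] + 0.5 * u[t - 2] + 0.1 * u[t - 3]
            + 0.1 * rng.standard_normal())

for n in range(1, 6):
    rows = [[y[t - 1 - i] for i in range(n)] + [u[t - 1 - i] for i in range(n)]
            for t in range(n, N)]
    res = np.linalg.lstsq(np.array(rows), y[n:], rcond=None)[1]
    print(n, res[0] / (N - n))      # mean squared residual levels off at n = 3
```

The mean squared residual drops sharply up to n = 3 and then stays near the noise variance, which is the plateau the test looks for.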
This restriction may be more important theoretically than practically, at least in part because of the central limit theorems. The most common of these is an F-test (Al, El, Bl). Let V1 and V2 be the sum of the squared residuals for models of order n1 and n2 > n1, respectively, over the same N observa- tions. Then V - V N - 2n (2.4.14) F = V—1———2 . 2( _2n) 2 I“2 1 has an F-distribution with 2(n2 - n ) and (N - n - 1) degrees 1 2 of freedom. The F-test says to accept the hypothesis that n >11 2m 2 1 if F is smaller than an apprOpriate value of the F-distribution and reject the hypothesis otherwise. This test also works well when the variables have a wide enough range. Another test on the residuals (J1) is to compute the sample spectrum of the residuals and test the integrated Spectrum using a Kolmogorov-Smirnov statistic. The first point of view, the rank of ¢, is more common in the econometrics literature (Fl, Hl, W2), where it is called the multicollinearity problem (the word identification being reserved for a different problem). ¢ has full column rank if and 36 only if ¢T¢, a square matrix, is invertible, so tests based on det(¢T¢) seem reasonable. Both Farrar and Glauber (F1) and Haitovsky (Hl) do so. However, they use the correlation matrix, whose de- terminant is between zero and one and is based on a zero-mean pro- cess, instead of the unstandardized ¢T¢. Haitovsky's test seems more reasonable: Let Rn be the Zn X 2n correlation matrix. Then, as in (H1), (2.4.15) X = -(N + 2n/3 + 11/6) ln(l - det Rn) has a chi-squared distribution with n(2n-l) degrees of freedom. A.small X value implies that n > m. For a dynamic system, it is easy to Show that det Rn+1 s det Rn so that the test gives a stopping criterion. Of course, if the input or output is relatively constant, then Rn will be small for any n > 1. The other problem is that by subtracting the means we may have changed the rank of the matrix under investigation, i.e. Rn and ¢T¢ may not have the same rank. ¢T¢ can be normalized so that its determinant is between zero and one, but the chi-squared test will still not apply. Tests based on det(Rn) or det(¢T¢) do not seem to work as well as the residual tests. Goodman and Hiller (CZ) give a test based on the deter- minant of a square ¢ and estimates of the maximum errors. But, again, the determinants will be small if a variable is almost con- stant and will be very sensitive to the particular values of the variables. Some sort of average determinant might be useful to reduce this dependency, but the algorithm is not too promising. 37 As mentioned above, all of these tests apply to the gen- eral problem of choosing a regression model. The only applica- tions to dynamic systems have been the Goodman and Hiller paper (G2), an application of the F-test by Astrom (A2), and a paper by Woodside (W3). Woodside compares three tests: the mean squared residual, the ratio of the determinants of unnormalized $T¢ matrices, and a likelihood ratio test which is fairly complicated computationally. He concludes that the mean squared residual test is the best of these. He also gives a method for improving the results if the characteristics of the (uncorrelated) noise are available. All of the above tests have the disadvantage that they do not work well in practice when the input or output is almost constant. This is expected in the general regression situation. But we can expect better results if we can use knowledge of dynamic systems. 
Our basic problem is to find a model which behaves in the same manner as the actual system, at least in those aspects of interest. Ideally, the model might have the same order, coefficients, poles, and zeros as the (minimal representation of the) system, but this might not be possible or even necessary. If we have several models which behave similarly to each other, then any of them should be acceptable. In particular, we can use knowledge of equivalent representations and reduction of dynamic systems.

Our procedure will have four steps. First, we compute the parameters for a number of models; for example, we may get models of orders one through M for the full matrix, reduced matrix, and normalized least squares estimators (more will be said about the selection of the algorithm in Chapter III). It might be possible to select the order on the basis of the mean squared residual or other such test. If not, these tests will allow us to eliminate some possibilities. This is the second step and reduces our selection space to, say, orders m' through M'. The third step is to examine the poles and zeros of the estimated transfer functions. The numerator and denominator of an estimate should have approximately common factors if its order is larger than the minimal order of the system. We can also look at the poles and zeros of different models; they should be approximately constant with respect to changes of model order. The theoretical problems with the third step are that the roots of a polynomial are notoriously sensitive to small changes or errors in the coefficients, and that the standard error of the coefficients increases as the order does and as the variables become more constant. However, the numerical results are very good. The final step is to choose the simplest acceptable model based on steps two and three.

There is a test based on the theory of dynamic systems which has shown better numerical results than any of the others. In Section 1.2.1, we stated that, given the controllability matrix Q_c and the observability matrix Q_o of any realization of a system, the minimal order of the system is rank(Q_o^T Q_c). Since the observability matrix of realization (1.2.3)-(1.2.6) is the identity, the minimal order is rank(Q_c). Since the stochastic elements will cause Q_c to always have maximum rank, it seems reasonable to compute det(Q_c) and decide when the determinant becomes small enough as the rank changes.

It is easy to set up Q_c and compute its determinant because it is a Hankel matrix, i.e.

(2.4.16)  Q_c(i,j) = s_{i+j-1} .

To see this, recall that

(2.4.17)  Q_c = (B, AB, ..., A^{n-1} B) .

It is now easy to see that

(2.4.18)  s_j = h_j ,  j = 1, 2, ..., n ,

where h_j is defined in (1.2.3), and a little computation shows that

(2.4.19)  s_j = SUM_{i=1}^{n} a_i s_{j-i} ,  j = (n+1), ..., (2n-1) .
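The controllability-determinant test is then a few lines given estimated coefficients. The sketch below is ours (the normalization by the product of the input coefficients mentioned in the abstract is omitted for brevity); it follows (2.4.16)-(2.4.19) and contrasts a third- and a fourth-order exact realization of Example 1:

```python
import numpy as np

def qc_det(a, b):
    """det(Q_c) via the Hankel recipe (2.4.16)-(2.4.19)."""
    n = len(a)
    s = [0.0] * (2 * n - 1)
    for j in range(n):                       # s_j = h_j, from (1.2.3)
        s[j] = b[j] + sum(a[i] * s[j - 1 - i] for i in range(j))
    for j in range(n, 2 * n - 1):            # recursion (2.4.19)
        s[j] = sum(a[i] * s[j - 1 - i] for i in range(n))
    Qc = np.array([[s[i + j] for j in range(n)] for i in range(n)])
    return np.linalg.det(Qc)

# Two exact realizations of Example 1 (rows 1 and 2 of Table 1):
print(qc_det([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]))            # ~1.2e-2
print(qc_det([0.8, 0.39, -0.27, 0.0], [-0.5, 0.5, 0.1, 0.0]))  # ~0
```

The determinant collapses as soon as the model order exceeds the minimal order, which is the drop the test watches for.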
In the second Sequence, the last residual is the largest. Werther (W1) has prOposed a weighting scheme which eliminates this inequity. He divides the elements in each row by the rms value of the row; this correSponds to premultiplying by a diagonal weighting matrix. He states that the results are improved sometimes by this "normalization" process. The non-uniform effect of the observation magnitudes is also important if the input and output have different magnitudes. In Examples 2 and 3, Chapter III, the output is about twice the input. Not only does the output have more influence on the para- meter estimates, but the variances of the input coefficients are larger than the variances of the output coefficients. This is because the variance of a coefficient equals the appropriate diagonal term of (¢T¢)-l. For a first order system, the variance 41 of the output coefficient is proportional to the sum of squares of the input, and the variance of the input coefficient is propor- tional to the sum of squares of the output. In the examples the input coefficient has four times the variance of the output co- efficient even though it is three orders of magnitude smaller! Attempts we have made to correct this problem have not been too successful. More work needs to be done on the area of choosing a good weighting matrix. 2.6 Summary This chapter contains the theoretical results on least squares and the model order problem. After explicitly defining the full and reduced matrix least squares algorithms, it was shown that they converge in mean square with correlated errors. The small sample bias of the full matrix algorithm and unbiased- ness of the reduced one were shown. Several tests for the order of the system were given in Section 2.4.3 including some new ones based on systems realization theory. The last section contained some material on the selection of weighting schemes to normalize the effect of the observations. CHAPTER III NUMERICAL COMPUTATIONS In this chapter we will compare the algorithms and order tests defined in Chapter II. First, we will describe the numerical algorithms used in the computations. Then the examples will be described in some detail. Finally, the numerical comparisons will be made using both simulated data and observed data from a real, but unknown, system. 3.1 Description of the Algorithms Used There are several ways to compute the least-squares estimate off-line. The most obvious, and the most common, method is to pre- multiply both sides of (2.1.3) by ¢T and solve the system of equations T T (3-1-1) (e ¢)9 = ¢ Y for e by the Gaussian algorithm. But this method is numerically unstable (C3). A much better method is to determine a unitary matrix Q such that Q¢, is triangular. Then the solution of (3.1-2) (Q¢)e = QY is the same as the solution of (3.1.1) and the least squares solu- tion of (2.1.3), but it is much less prone to numerical errors. 42 43 This algorithm is the one which we have used most often because of its stable numerical properties; the actual computer subroutine was obtained from the IBM Scientific Subroutine Package (11). Least squares can also be done on-line as follows (G3): write the n-th order difference equation representation as (3.1.3) yn+i 1 where (3.1.4) 21 = col(y , ’yn-l+i’ 1,. "un-l+i) and let _ T UN ' Eli yn+izi _ T TN 2: zizi Then (3 1.5) = u T"1 ‘ 9N N N This algorithm can be easily modified to handle multi-input multi-output systems (G3). 
The on-line algorithm also requires less storage than the off-line algorithms since it is necessary to keep only the latest U_N vector and the latest T_N matrix, in addition to one z_i vector. However, some computational effort can be saved at the expense of a little additional storage by finding the inverses recursively. To do this, we divide the algorithm into three stages. T_N is invertible a.s. if N ≥ 2n, so we can use the well known (G3, P1) equation (3.1.7) to compute T_{N+1}^{-1} from T_N^{-1}. If N < 2n, we use the first two stages to get T_{N+1}^{+} from T_N^{+}.

Step (a). N = 1. Then

(3.1.6)    T_1^{+} = (z_1^T z_1)^{-2} T_1

Proof: By definition, T^{#} = T^{+} if TT^{#}T = T, T^{#}TT^{#} = T^{#}, and TT^{#} and T^{#}T are hermitian. Dropping the subscript, T^2 = z(z^T z)z^T = λT, where λ = z^T z. Thus, TT^{+}T = λ^{-2}T^3 = T. Similarly, T^{+}TT^{+} = T^{+}. TT^{+} and T^{+}T are hermitian because T is.

Step (b). For 1 < N < 2n, when T_N is not invertible, we can use the new algorithm given below.

Step (c). If T_N is nonsingular, then

(3.1.7)    T_{N+1}^{-1} = T_N^{-1} - (T_N^{-1} z_{N+1})(T_N^{-1} z_{N+1})^T / (1 + z_{N+1}^T T_N^{-1} z_{N+1})

The algorithm for step (b) requires that we write T_N as

(3.1.8)    T_N = Σ_{i=1}^{N} z_i z_i^T = ZZ^T

where

(3.1.9)    Z = (z_1, ..., z_N)

Then, if W = Z^T Z is nonsingular,

(3.1.10)    T_N^{+} = ZW^{-2}Z^T

Proof: TT^{+} = ZZ^T ZW^{-2}Z^T = ZWW^{-2}Z^T = ZW^{-1}Z^T. Thus, TT^{+}T = ZW^{-1}Z^T ZZ^T = ZW^{-1}WZ^T = ZZ^T = T. The rest of the proof is similar.

While the algorithm holds when the z_i are matrices, that is, in the multi-input, multi-output case, if they are vectors we do not have to invert W at each step. First note that W_{N+1} and W_N differ only by the addition of a new column and row,

(3.1.12)    W_{N+1} = [ W_N    b ]
                      [ b^T    c ]

where b is N × 1 and c is a scalar. Thus (F3)

(3.1.13)    W_{N+1}^{-1} = (1/d) [ dW_N^{-1} + ee^T   -e ]
                                 [ -e^T                1 ]

where

(3.1.14)    d = c - b^T W_N^{-1} b
            e = W_N^{-1} b

W_{N+1} is nonsingular if W_N is and d is non-zero.

Another on-line method is stochastic approximation (M3),

(3.1.15)    θ_{N+1} = θ_N + P_{N+1} z_N (y_{n+N} - z_N^T θ_N)

where

(3.1.16)    P_{N+1} = P_N - (P_N z_N)(P_N z_N)^T / (1 + z_N^T P_N z_N)

The initial matrix P_0 may be any positive definite symmetric matrix of appropriate dimensions, but is usually taken as a multiple of the identity. The stochastic approximation estimates approach the least squares estimates as N becomes large.

Stochastic approximation has the advantage over on-line least squares that it is self-correcting; the factor (y_{n+N} - z_N^T θ_N) is the error of the last observation. It will also automatically correct for slowly-varying parameters. Accumulated roundoff error was a problem for on-line least squares when the stiff example, Example 2, was used; double precision arithmetic corrected this problem.
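The recursion (3.1.15)-(3.1.16) translates directly into code. The following sketch is illustrative only, with P_0 taken as a multiple of the identity as suggested above:

```python
# Sketch of the stochastic approximation recursion (3.1.15)-(3.1.16).
# Each step corrects theta by the prediction error of the newest
# observation; p0 is an assumed (illustrative) initial scaling.
import numpy as np

def stoch_approx(y, u, n, p0=100.0):
    dim = 2 * n
    theta = np.zeros(dim)
    P = p0 * np.eye(dim)
    for t in range(n, len(y)):
        z = np.concatenate((y[t-n:t][::-1], u[t-n:t][::-1]))
        Pz = P @ z
        P -= np.outer(Pz, Pz) / (1.0 + z @ Pz)      # (3.1.16)
        theta += (P @ z) * (y[t] - z @ theta)       # (3.1.15)
    return theta
```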
While the on-line algorithms are useful for control of an operating system, the off-line algorithm seems preferable for the model order problem because it allows the least amount of overall error.

The determinants were taken by the pivotal condensation method (F3). Some reduction in computation could have been obtained by using elementary row and column operations on the original matrices, but the necessary bookkeeping outweighed the improvement. Both the correlation and controllability matrices have determinants that are relatively insensitive to round-off errors in most situations.

3.2 Description of the Examples

In the next section, we will use three examples to numerically compare the algorithms and tests. The first is a regular simulated system; the second, an ill-conditioned simulated system; and the third is actual data observed on an electric power network.

Example 1. Regular system

(3.2.1)    y_t = .8y_{t-1} + .39y_{t-2} - .27y_{t-3} - .5u_{t-1} + .5u_{t-2} + .1u_{t-3}

To illustrate the procedure of Section 2.4.1 for the calculation of the extra zeros when n > m, we will do it for p = n-m = 1 and for p = 2. For p = 1, (2.4.6) and (2.4.7) become

(3.2.2)    θ_1^T = (-1, 0, .8, -.5, .39, .5, -.27, .1)
           θ_0^T = (.8, -.5, .39, .5, -.27, .1, 0, 0)

Thus

(3.2.3)    G = -(θ_1^T θ_1)^{-1}(θ_1^T θ_0) = .7933/2.375 ≈ 1/3

The additional pole of T(z) is about -1/3. For p = 2 we have

           θ_2^T = (0, 0, -1, 0, .8, -.5, .39, .5, -.27, .1)
           θ_1^T = (-1, 0, .8, -.5, .39, .5, -.27, .1, 0, 0)
           θ_0^T = (.8, -.5, .39, .5, -.27, .1, 0, 0, 0, 0)

Now

(3.2.4)    G = - [ 2.375   -.7933 ]^{-1} [ -.7933 ]   =   [ .47982 ]
                 [ -.7933   2.375 ]      [ -.656  ]       [ .43648 ]

Thus, the additional factor is (z^2 + .47982z + .43648) and the additional poles are (-.2399 ± .6156i).

Example 2. Ill-conditioned system

(3.2.7)    y_t = .1998y_{t-1} + .3998y_{t-2} + .20792y_{t-3} + .1035616y_{t-4} + .0883232y_{t-5}
                 - .55E-4 u_{t-1} + .1595E-3 u_{t-2} - .14245E-3 u_{t-3} + .45925E-4 u_{t-4} + .88195E-3 u_{t-5}

The poles of the transfer function are 0.9998, (-.5 ± .3i), and (.1 ± .5i); the zeros are approximately 2.8079, -1.34194, and (.717 ± 1.934i). This fifth-order example was designed to approximate the power system of Example 3; an average input of 33.4 should yield an average output of 60.0. The input used in the simulations was approximately uniformly distributed in the interval (30.4, 36.4); the output had a mean of about 59.4 and a standard deviation of about .31. The range of the output, although much larger than that of Example 1, was relatively small, about 2.5% of the mean.

Example 3. Observed data

The last example is a set of load-frequency data taken from an actual operating electric power system. An attempt to model the data was made using stochastic approximation (P1); several "good" models were made, and the problem of selecting the best of them motivated the present work.

3.3 Computed Results

In this section we examine the output of simulated runs of Examples 1 and 2, where the noise has different properties on each run, and the estimates for the real data of Example 3. We will see that the model tests alone are sufficient to identify the order of the regular system, Example 1, but are not sufficient for the power example and its approximate simulation, Example 2.

There was no difficulty choosing the order of Example 1. With even a small number of sample points, N = 100, the controllability matrix test worked with every noise configuration that was used; the mean squared residual worked except at high noise levels; the F-test worked except when there was only observation noise; the correlation matrix, and its chi-squared variant, did not work. There seemed to be little difference between the full and reduced matrix results.

Figures 2, 3, 4 and 5 and Tables 2, 4 and 5 compare the results for three different simulations of Example 1. All three had relatively high noise levels (one had system noise only, one observation noise only, and one had both), and the full matrix tests were more definite than the reduced ones for all three. Figure 2 shows the absolute values of the controllability determinants; the numerical data for two of the runs is in Table 2. As shown in Figure 3, there is at least an order of magnitude change when going from n = 3, the correct order, to n = 4, except in two instances, and the drop was often two orders of magnitude. In the two exceptions, the large drop occurred from n = 4 to n = 5, giving an estimated order of four.
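The simulated data for such runs can be generated directly from (3.2.1). In the sketch below, the input distribution and the noise standard deviations are illustrative assumptions, not the values used in the thesis runs:

```python
# Sketch: generate data from the regular system (3.2.1) with optional
# system (plant) and observation noise.
import numpy as np

def simulate_example1(N=100, sys_sd=0.1, obs_sd=0.1, seed=0):
    rng = np.random.default_rng(seed)
    a = np.array([0.8, 0.39, -0.27])
    b = np.array([-0.5, 0.5, 0.1])
    u = rng.uniform(-1.0, 1.0, N)          # assumed input distribution
    y = np.zeros(N)
    for t in range(3, N):
        y[t] = (a @ y[t-3:t][::-1] + b @ u[t-3:t][::-1]
                + sys_sd * rng.standard_normal())     # system noise
    return y + obs_sd * rng.standard_normal(N), u    # observation noise
```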
Figure 4 contains a graph of the full matrix mean squared residuals, listed in Table 2. The three runs shown there were the only three where this test did not work well (more typical runs are in Figure 5); while there is very little decline after n = 3, there is also very little from n = 2 to n = 3. This test gives an estimated system order of two. Figure 4 also shows the determinants of the correlation matrices for one of the runs; the straight line behavior was typical, indicating that the test is not very good. The chi-squared test based on this determinant was almost a complete failure.

Table 2. Order Tests for Example 1

              Mean Square Residual          Det(Q)
    order      Full       Reduced       Full      Reduced
      1       .0794       .0824        .0846      .1103
      2       .0277       .0235        .0660      .1390
      3       .0247       .0216        .0401      .1106
      4       .0245       .0213        .0007      .0262
      5       .0244       .0212        .0001      .0036
                    System and observation noise

      1       .0770       .0790        .0899      .1062
      2       .0528       .0215        .0799      .1194
      3       .0224       .0194        .0462      .0944
      4       .0223       .0189        .0006      .0158
      5       .0221       .0187        .0000      .0003
                    System noise only

      1       .0498       .0480        .1871      .2461
      2       .0041       .0030        .0967      .1722
      3       .0020       .0017        .0312      .0456
      4       .0017       .0014        .0015      .0003
      5       .0014       .0011        .0003      .0002
                    Observation noise only

    y_t = .8y_{t-1} + .39y_{t-2} - .27y_{t-3} - .5u_{t-1} + .5u_{t-2} + .1u_{t-3}

Figure 2. Determinants of Controllability Matrices (full matrix and reduced matrix).

Figure 3. Ratio of Controllability Determinants (full matrix lse).

Figure 4. Order Tests (full matrix determinants of correlation matrices; mean squared residual).

Figure 5. Mean Square Residuals.

Table 4 shows the F-test arrays for these runs. The critical values at 90% confidence are in Table 3.

Table 3. Critical Values for the F-test

    n_2 - n_1     1       2       3       4
    value        2.30    1.94    1.77    1.67

If a value in the array is lower than the corresponding one in Table 3, then n_2, n_1 ≥ m, the minimal order. The first pair of arrays in Table 4 are for observation noise only, and both indicate that the system order is at least five. The test does work for the other two noise configurations shown, although the last array does not have completely consistent results.

Table 5 summarizes the results of the tests on the noise configurations shown for Example 1. We can conclude that, even for these relatively high noise levels, the order of the system is three and we need not do anything further. We can also conclude that the determinant of the controllability matrix is the test that works most often.

The problem of choosing the system order is more difficult for the second example, as the summary Table 6 shows. The F-test, detailed in Table 7, is not very helpful. The mean squared residual, Figure 6 and Table 8, becomes flatter as n increases, but there is no one point where there is a large change of slope as there was in Figure 5. But the controllability determinant does work in most cases.

Table 4. F-test for Example 1

    Full matrix -- observation noise only
    order     1       2       3       4
      2     2745.
      3     2929.    259.
      4     2373.    182.    52.0
      5     2148.    162.    56.2    50.1

    Reduced matrix -- observation noise only
      2      732.
      3      635.    34.1
      4      528.    27.2    12.1
      5      487.    25.9    13.0    11.2

    Full matrix -- system noise only
      2      504.
      3      301.    33.0
      4      201.    17.0    1.11
      5      152.    11.9    1.37    1.62

    Reduced matrix -- system noise only
      2      129.
      3      7.21    4.90
      4      48.7    3.06    1.20
      5      36.3    2.22    .886    .584

    Full matrix -- system and observation noise
      2      464.
      3      274.    30.0
      4      184.    15.9    1.46
      5      138.    11.0    1.30

    Reduced matrix -- system and observation noise
      2      120.
      3      66.0    4.11
      4      43.9    2.37    .652
      5      32.4    1.62    .429    .218
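Reading an array such as Table 4 against the critical values of Table 3 is easy to mechanize. The sketch below uses the standard nested-regression form of the F statistic, with two parameters per order; the statistic actually defined in Section 2.4.3 may differ in detail:

```python
# Sketch: F statistic for nested least squares models of orders
# n1 < n2, given their residual sums of squares S1 and S2 and the
# number of observations N.
from scipy import stats

def order_f_test(S1, n1, S2, n2, N, conf=0.90):
    df1 = 2 * (n2 - n1)            # parameters added by raising the order
    df2 = N - 2 * n2               # residual degrees of freedom
    F = ((S1 - S2) / df1) / (S2 / df2)
    return F, stats.f.ppf(conf, df1, df2)

# For large df2, stats.f.ppf(0.90, 2, 10**6) is about 2.30, which
# matches the first entry of Table 3.
```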
Table 5. Estimates of the System Order

    Noise     Full     Reduced    Full      Reduced    Full      Reduced
    type      MSR      MSR        F-test    F-test     det(Q)    det(Q)
    System    2 or 3   2 or 3     3         3          3         3 or 4
    Obser.    2 or 3   2 or 3     ≥ 5       ≥ 5        3         3
    Both      2 or 3   2 or 3     3         3          3         3

Table 6. Order Test Summary for Example 2

                    Observation noise      System noise         Both
    Test            Full     Reduced       Full     Reduced     Full     Reduced
    MSR             X        5 or 7        5        6           4 or 5   X
    F-test          ≥ 7      4             5 or 6   3 or 5      5        4 or 5
    Det(Q_c)        5        4 or 5        5        5           5        5
    Det(Q_c)/Πb_i   5        5             5        5           5        5

Note that the determinant (Table 8) decreases by about three orders of magnitude in almost all situations and that the relative decrease is approximately equal to the input coefficients, b_i. This observation led to a heuristic modification which has given very satisfactory results in every instance of Examples 1 and 2: "normalize" the determinant by dividing by the product of the b_i. Then

(3.3.1)    det(Q_{m-1}) / Π_{i=1}^{m-1} b_{i,m-1}   ≪   det(Q_m) / Π_{i=1}^{m} b_{i,m}

where the subscript denotes the order of the model, and m is the minimal order of the system.

Table 7. F-tests for Example 2

    Full matrix -- observation noise only
    order     1       2       3       4       5       6
      2      91.1
      3      78.0    50.0
      4      62.8    37.5    21.6
      5      53.4    31.6    19.3    16.0
      6      45.2    26.0    15.6    11.8    13.4
      7      42.1    24.9    16.1    7.29    11.6    15.5

    Reduced matrix -- observation noise only
      2      14.2
      3      9.76    4.06
      4      8.42    4.22    4.02
      5      7.09    3.65    3.19    2.21
      6      5.60    2.75    2.17    1.22    1.36
      7      5.02    2.56    2.06    .275    .938    1.60

    Full matrix -- system and observation noise
      2      138.
      3      102.    46.3
      4      78.6    33.9    18.7
      5      65.2    28.3    16.9    14.2
      6      52.4    21.6    11.7    7.77    5.84
      7      44.2    17.7    9.28    1.30    1.61    1.93

    Reduced matrix -- system and observation noise
      2      16.9
      3      11.5    4.47
      4      8.80    3.52    2.39
      5      7.37    3.15    2.33    2.16
      6      6.43    2.90    2.22    2.05    1.36
      7      5.22    2.27    1.63    1.89    .958    .084

    Full matrix -- system noise only
      2      156.
      3      99.8    28.7
      4      70.2    18.2    6.98
      5      63.4    21.6    16.6    25.6
      6      51.7    17.1    12.2    14.4    9.67
      7      43.0    13.7    9.15    3.11    1.64    .181

    Reduced matrix -- system noise only
      2      14.9
      3      9.76    3.55
      4      6.55    1.97    .449
      5      5.81    2.25    1.55    2.62
      6      4.93    1.98    1.41    1.88    1.27
      7      3.99    1.57    1.07    1.14    .628    .152

Table 8. Order Tests for Example 2

              Mean Sq. Residual          Det(Q)                      Normalized Det(Q)
    order      Full       Reduced       Full         Reduced         Full      Reduced

    Observation noise only
      1       .773E-4    .778E-4      .231E-3      .158E-3          1.0       1.0
      2       .554E-4    .555E-4      .498E-7      .114E-6          .703      1.422
      3       .474E-4    .497E-4      .605E-10     .186E-10         2.36      .468
      4       .442E-4    .444E-4      .661E-14     .236E-12         2.71      6.83
      5       .41E-4     .415E-4      .514E-15     .187E-14         443.      134.
      6       .40E-4     .412E-4      .45E-18      .118E-17         2941.     264.
      7       .388E-4    .391E-4      .10E-22      .127E-20         3425.     887.

    System and observation noise
      1       .136E-3    .120E-3      .138E-3      .145E-3          1.0       1.0
      2       .927E-4    .812E-4      .241E-6      .475E-7          1.65
      3       .802E-4    .71E-4       .223E-9      .171E-9          4.06
      4       .757E-4    .671E-4      .161E-14     .62E-13          .502
      5       .720E-4    .62E-4       .103E-14     .31E-14          654.
      6       .717E-4    .594E-4      .182E-18     .482E-18         366.
      7       .712E-4    .592E-4      .921E-21     .303E-21         8033.

    System noise only
      1       .64E-4     .575E-4      .110E-3      .152E-3
      2       .426E-4    .405E-4      .140E-6      .147E-7
      3       .388E-4    .367E-4      .613E-10     .143E-10
      4       .37E-4     .362E-4      .677E-14     .301E-14
      5       .34E-4     .335E-4      .106E-14     .269E-15
      6       .345E-4    .323E-4      .352E-18     .861E-20
      7       .345E-4    .322E-4      .523E-21     .614E-24

    y_t = .1998y_{t-1} + .3998y_{t-2} + .2079y_{t-3} + .1035616y_{t-4} + .0883232y_{t-5}
          - .55E-4 u_{t-1} + .1595E-3 u_{t-2} - .14245E-3 u_{t-3} + .45925E-4 u_{t-4} + .88195E-3 u_{t-5}

Figure 6. Mean Squared Residual for Example 2.

Figure 7. Normalized Controllability Determinant for Example 2.
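The determinant test itself is inexpensive to compute. The sketch below builds the Hankel matrix of (2.4.16) from the estimated coefficients, taking the s_j to be the model's impulse response as in (2.4.18)-(2.4.19), and applies the normalization of (3.3.1). It is an illustration, and the function names are ours, not the thesis's program:

```python
# Sketch of the controllability-determinant test and its normalization.
import numpy as np

def q_c_det(a, b):
    """det(Q_c) for the model with output coefficients a and input
    coefficients b, using the Hankel structure (2.4.16)."""
    n = len(a)
    s = np.zeros(2 * n)                  # s[1..2n-1] are used; s[0] = 0
    for j in range(1, 2 * n):
        bj = b[j - 1] if j <= n else 0.0
        s[j] = bj + sum(a[i - 1] * s[j - i] for i in range(1, min(j, n) + 1))
    Qc = np.array([[s[i + j + 1] for j in range(n)] for i in range(n)])
    return np.linalg.det(Qc)

def normalized_det(a, b):
    return q_c_det(a, b) / np.prod(b)    # divide by the product of the b_i
```

Note that at order one the normalized determinant is b_1/b_1 = 1 exactly, which is why the first rows of Tables 8 and 11 read 1.0.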
Table 8 also shows these normalized determinants. On the basis of them, we can conclude that the minimal system order is five, the correct one, in every instance.

If we are not satisfied with the conclusion from the controllability test, then we can look at the poles and zeros of the transfer functions. Table 9 shows one set of poles and zeros. There is very good agreement, with respect to change of model order, on the dominant pole. Starting with the third order there is a complex pair with negative real part and norm around .6; these will behave very similarly. Another pair with about the same norm enters at n = 5. Looking at the zeros, we see that starting with n = 5 there are two real zeros (around -1.3 and -3) and one complex pair (with positive real part and norm 1.1). While the fourth-order poles are similar to the fifth-order ones, the fourth and fifth order zeros are not similar. There is no approximate pole-zero cancellation to help decide. But we can get some information from the steady-state response to a unit-step input, also shown in Table 9. There is very good agreement starting with n = 5, and less agreement with lower order values. Thus we can conclude from examination of the transfer functions, as well as from the normalized controllability test, that the system is fifth order.

Table 9. Poles and Zeros for Example 2
(a, b) denotes the complex pair (a ± bi)

    order    Poles
      2      .99969    -.561
      3      .99978    (-.380, .466)
      4      .99975    (-.120, .622)    -.607
      5      .99962    (-.498, .284)    (.057, .627)
      6      .99967    (-.322, .309)    (.105, .588)    -.442
      7      .99965    (-.629, .289)    (.322, .489)    (-.131, .661)
     True    .9998     (-.5, .3)        (.1, .5)

                                                            Steady
    order    Zeros                                          state
      2      -1.38                                          1.653
      3      -2.21     .315                                 1.600
      4      -2.24     (.106, .435)                         1.621
      5      -1.13     -3.16    (.591, .915)                1.675
      6      -1.29     -2.81    (.504, .941)    .285        1.661
      7      -1.31     -2.79    (.543, .881)    (.134, .382)    1.665
     True    -1.34     2.81     (.717, 1.93)                1.603

The order tests yield more consistent answers for Example 3.

Table 10. Order Test Summary for Example 3

    Test                     Full lse    Reduced lse
    Mean Squared Residual    X           4
    Controllability det      4           4
    Normalized det(Q_c)      4 or 6      4 or 6
    F-test                   5           4

The tests are detailed in Tables 11 and 12 and in Figures 8 and 9. From them we draw the tentative conclusion that the system can be approximated fairly well by a fourth order model. The poles and zeros for both the full and reduced algorithms are in Table 13. As in Example 2, the poles and zeros are relatively constant as the model order changes, but there is no pole-zero cancellation. The steady-state response to the unit-step input is virtually constant for the full lse and, as expected, varies slightly for the reduced lse. The examination of the poles and zeros gives no reason to reject the tentative conclusion that m = 4.

Table 11. Order Tests for Example 3

              Mean Sq. Residual         Det(Q)                      Normalized det
    order      Full       Reduced      Full         Reduced        Full      Reduced
      1       .482E-5    .560E-5      .262E-3      .312E-3         1.0       1.0
      2       .300E-5    .336E-5      .180E-6      .236E-6         .98       .80
      3       .293E-5    .333E-5      .233E-9      .131E-9         3.8       2.7
      4       .287E-5    .273E-5      .199E-11     .152E-10        25.       30.
      5       .280E-5    .260E-5      .208E-16     .134E-13        .3        307.
      6       .280E-5    .252E-5      .167E-16     .581E-16        1.7E4     1.1E4

In summary, the normalized det(Q_c) test is the most effective, followed by the det(Q_c), F-, and mean squared residual tests; the poles and zeros are useful as checks on the other tests.
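The third step of the procedure -- examining poles, zeros, and the steady-state response -- can be sketched as follows, using the steady-state formula B/(1 - A) derived in Appendix A.3:

```python
# Sketch: poles, zeros, and unit-step steady state of an estimated
# model with output coefficients a and input coefficients b.
import numpy as np

def poles_zeros_ss(a, b):
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    zeros = np.roots(b)                   # numerator b_1 z^{n-1} + ... + b_n
    ss = np.sum(b) / (1.0 - np.sum(a))    # B/(1 - A), from (A.3.1)
    return poles, zeros, ss

# Example 1's true model: poles .9, .5, -.6; steady state .1/.08 = 1.25.
print(poles_zeros_ss([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]))
```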
It is best to use all the tests and to use them on several sets of parameter estimates because consistent test results imply confidence about both the chosen order and the chosen model type. Also, the computations are very inexpensive: the entire computation time for all tests and parameter estimates for the power example was under 8 seconds on a CDC 6500 computer.

Figure 8. Mean Squared Residual -- Example 3 (full and reduced).

Figure 9. Controllability Determinants -- Example 3 (determinant ratios; normalized determinant).

Table 12. F-test for Example 3

    Full matrix
    order     1       2       3       4       5
      2      116.
      3      61.6    4.65
      4      43.1    4.39    4.06
      5      34.3    4.67    4.60    5.05
      6      27.3    3.52    3.09    2.58    .124

    Reduced matrix
      2      17.0
      3      8.36    .248
      4      8.21    2.70    5.10
      5      6.46    2.17    3.11    1.10
      6      5.24    1.79    2.29    .899    .708

Table 13. Poles and Zeros for Example 3

    Poles -- full matrix
    order
      2     1.000135    -.601
      3     1.000106    .052     -.649
      4     1.000132    .335     (-.450, .037)
      5     1.000110    .416     (-.180, .269)    -.618
      6     1.000110    .514     (-.534, .143)    (-.004, .403)

                                                            Steady
    Zeros -- full matrix                                    state
      2     .404                                            1.858
      3     -2.12     .682                                  1.858
      4     -2.63     (.555, .223)                          1.858
      5     -1.87     (.438, .622)    .780                  1.857
      6     -1.83     (.442, .623)    .794     -.033        1.857

    Poles -- reduced matrix
      2     1.000158    -.630
      3     1.000142    -.002    -.613
      4     1.000240    .718     (-.663, .417)
      5     1.000196    (-.747, .462)    (.444, .287)
      6     1.000209    .685     (-.811, .450)    (.157, .548)

                                                            Steady
    Zeros -- reduced matrix                                 state
      2     .432                                            1.810
      3     -1.065    .630                                  1.830
      4     -1.992    (.666, .207)                          1.827
      5     -1.719    (.588, .248)    .185                  1.831
      6     -1.325    (.497, .433)    .736     -.514        1.834

CHAPTER IV

CONCLUSIONS

4.1 Thesis Results

Choosing a model type is always a difficult and important problem in modeling. Even if it is possible to use the simplest type of dynamic system -- linear, time-invariant, single-input, single-output -- there is still the problem of selecting an order of the model. We have discussed the model order problem for such systems. There are two aspects of this problem: the method of parameter identification and tests to determine the order of the system on the basis of the parameter estimates.

The basic method used in the parameter identification phase was least squares. Least squares was chosen because it requires less knowledge and fewer assumptions than other methods. After introducing the reduced variant of the algorithm, large and small sample properties were derived assuming that the order of the system was known. Some theoretical results when the model order was greater than the system order were given. While reduced least squares often gave better estimates than the full matrix version, sometimes they were worse. It was definitely useful to have both sets of estimates when estimating the system order.
The normalized determinants are relatively constant when the model order is less than the system order; when they are equal, there is a large jump in the determinant value; and the behavior is erratic when the model order is more than the system order. It was helpful to also use an F-test on the residuals. 4.2 Future Development Any new area contains many voids between the already known facts. In identification, much work can be done on the theoretical prOperties when the model order is more than the system order. This will be a very difficult problem because uniqueness is absent. Selection of weighting schemes is also not well developed. Additional model order tests must be developed. For example, other state-space realizations might generate determinants which have more useful prOperties than the one we used. Finding useful properties of the determinants is also an area for research. And the general problem of selecting model properties is wide open. APPENDIX APPENDIX A.l Proof of Theorem 2 C = lim C as defined by The assumptions imply that N (2.2.1) exists and is positive definite (see (M1) for details) Intuitively, this makes sense because C is the correlation matrix for the input and output. 1 N N zlyn-i-i-len+i E(yt-iet) 1 N N zlyien-l-i E(yt-net) (A.1.l) lim BN = lim = = B l’iNu e E(u e ) N 1 n+i-1 n+i t-l t 1 N ztluien-i-i E(ut-net) B exists by assumptions 1) - 3). Let ._ l T _ = l T _ T (A.l.2) 2N N ¢N®N(8N e) N((1;quN (¢N¢N(YN END) 1 T " N ¢NEN 7' BN Also, Using (A.1.l) and (A.1.2) E(ZN) = E(BN) = B. T 1 1 T T . . E(ZNZN) — N E(N ¢NENEN¢N), a moment matrix which ex1sts. We now look at the limits. Obviously, (A.1.3) lim E(CNBW) = linIE(ZN) = B 70 71 Although E(CNefi) 1‘ E(CN)E(9‘\;), still (A.1.4) 1im‘E(CNe§) = lim E(CN) x lim E(efi) = c lim E(eé) because the stochastic matrices CN approach the deterministic matrix C. Since C is invertible, (A-lo5) lim E(eé) = In Theorem 1, B = 0, so the lse is unbiased. To show mean square convergence, we must show that (A.l.6) lim E(efiTefi) = 0 Since ' = C-12 and since C is symmetric 9N N N N ’ IT I) -2 (A.1.7) E(eN 9N) =E(ZN TNc z N) (A.1.8) lim E(ZTZ ) = lim E(tr(Z 21)) N N N N . 1 . = 11m fi'tr(moment matrix) = O . . . . . -2 . . . Since C is pOSltlve definite, CN acts as a weighting matrix for N sufficiently large. Thus, (A.1.9) lim E(eNT 9N) = lim l'tr(weighted moment matrix)= 0 and mean square convergence is proved. 72 A.2 Proof of Theorem 4 Lep p = 1. We will show that -l s -g S 1. From (4.2. ) 8T8 1 T 9191 O (A.2.1) -g = From (4.2.6) and (4.2.7) T _ T (A-2.2) 9191 — 1 + 9090 T T T Let 9190 2 0. If 9190 2 8191, then T T T O. ‘ + (A 2 3) 29190 a 9191 9090 or (A24) 02( -e)T( -) '° 91 O 91 60 T which is impossible. The proof is similar for 9160 < 0. Now assume that for p = po we have _ T - _ -l (A'2°S) G0 -(HoHo) 11“090 L0 Jo We will express Gl’ correSponding to p = po + 1, in terms of . _ T = . . Go. Notice that (Lo)i,j eiej f‘i-j" allow1ng us to wrlte L K (A.2.6) L = 1 KT f o where .2. = (A 7) K col(fp_1,fp 2, ,fl) and (A.2.8) J = col(Jo, fp) 73 Lil can be expressed in terms of L;1 as (F1) _1 I -L;1K L;1 o 1 o (A.2.9) L = _ _ _ 1 o 1 o (f - KTL 1K) 1 -KTL 1 1 0 O 0 Finally, we get -1 "O o - dL + W Go 0 K 81 dgp-l (A.2.10) G = = O o 1 d g2 + dgp_2 0 + d gp_1 g c d J where g: is the i-th element of Go and T -1 -1 T -l .2. = - - - (A 11) d (£0 K LO K) (fp K LO Go) To show that the roots of (2.4. ) are either all within or all outside the unit circle, we use the Lehmer-Schur method (R1). 
A.2 Proof of Theorem 4

Let p = 1. We will show that -1 ≤ -g ≤ 1. From (2.4.6) and (2.4.7),

(A.2.1)    -g = (θ_1^T θ_0) / (θ_1^T θ_1)

and

(A.2.2)    θ_1^T θ_1 = 1 + θ_0^T θ_0

Let θ_1^T θ_0 ≥ 0. If θ_1^T θ_0 ≥ θ_1^T θ_1, then

(A.2.3)    2θ_1^T θ_0 ≥ θ_1^T θ_1 + θ_0^T θ_0

or

(A.2.4)    0 ≥ (θ_1 - θ_0)^T (θ_1 - θ_0)

which is impossible. The proof is similar for θ_1^T θ_0 < 0.

Now assume that for p = p_0 we have

(A.2.5)    G_0 = -(H_0^T H_0)^{-1} H_0^T θ_0 = -L_0^{-1} J_0

We will express G_1, corresponding to p = p_0 + 1, in terms of G_0. Notice that (L_0)_{i,j} = θ_i^T θ_j = f_{|i-j|}, allowing us to write

(A.2.6)    L_1 = [ L_0    K  ]
                 [ K^T   f_0 ]

where

(A.2.7)    K = col(f_{p-1}, f_{p-2}, ..., f_1)

and

(A.2.8)    J_1 = col(J_0, f_p)

L_1^{-1} can be expressed in terms of L_0^{-1} as (F3)

(A.2.9)    L_1^{-1} = [ I   -L_0^{-1}K ] [ L_0^{-1}    0                          ] [ I              0 ]
                      [ 0    1         ] [ 0     (f_0 - K^T L_0^{-1} K)^{-1}      ] [ -K^T L_0^{-1}   1 ]

Finally, we get

(A.2.10)    G_1 = col( g_1^o + d g_{p-1}^o, g_2^o + d g_{p-2}^o, ..., g_{p-1}^o + d g_1^o, d )

where g_i^o is the i-th element of G_0 and

(A.2.11)    d = -(f_0 - K^T L_0^{-1} K)^{-1} (f_p + K^T G_0)

To show that the roots of (2.4. ) are either all within or all outside the unit circle, we use the Lehmer-Schur method (R1). Define

(A.2.12)    P_0(z) = z^{p-1} + g_1^o z^{p-2} + ... + g_{p-1}^o

and

(A.2.13)    P_0^*(z) = g_{p-1}^o z^{p-1} + ... + 1

The roots of P^* are the inverse conjugates of the roots of P, i.e., if all the roots of P are inside the unit circle, then all the roots of P^* are outside. In a similar manner P_1(z) and P_1^*(z) are defined from G_1. We now form

(A.2.14)    T(P_1(z)) = dP_1(z) - P_1^*(z) = (d^2 - 1)g_{p-1}^o z^{p-1} + ... + (d^2 - 1) = (d^2 - 1)P_0^*(z)

The Lehmer-Schur method states that if T(P_1(0)) < 0, then T(P_1(z)) = constant × P_0^*(z) and P_1^*(z) have the same number of zeros within the unit circle; if T(P_1(0)) > 0, then P_0^*(z) and P_1(z) have the same number of zeros inside. In summary, if |d| ≤ 1 and G_0 defines a stable system, then G_1 will also define a stable system; if |d| > 1 and G_0 is stable, then every mode of G_1 will be unstable; etc., and the theorem is proved.

It would be more interesting if we could prove that |d| < 1. It is not too difficult to show that d > -1, but the other inequality is more difficult.

A.3 Steady-state Response to a Unit-step Input

We will show that the steady-state response of the system (1.1.1) to a unit-step input is

(A.3.1)    B/(1 - A)

where

(A.3.2)    A = Σ_{i=1}^{n} a_i ,    B = Σ_{i=1}^{n} b_i

We will show this in both the time domain (difference equation representation) and the z-domain (transfer function representation) to illustrate the differences between the two domains.

In the time domain, u_t = 1 for all t = 0, 1, ... . Then

    y_1 = b_1
    y_2 = a_1 y_1 + (b_1 + b_2)
    ...
    y_{n+1} = a_1 y_n + ... + a_n y_1 + B
    y_{n+k} = Σ_{i=1}^{n} a_i y_{n+k-i} + B

Taking limits,

(A.3.3)    y_ss = Σ_{i=1}^{n} a_i y_ss + B = A y_ss + B

and the result is shown.

In the z-domain, the transfer function is

(A.3.4)    T(z) = Y(z)/U(z)

and the unit-step response is (O1)

(A.3.5)    Y(z) = T(z) · z/(z - 1)

The final value theorem states

(A.3.6)    y_ss = lim_{t→∞} y_t = lim_{z→1} ((z - 1)Y(z))

Thus,

(A.3.7)    y_ss = lim_{z→1} T(z) = B/(1 - A)
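A quick numerical check of this result on Example 1 follows (a sketch; the horizon N is an arbitrary choice):

```python
# Simulate the unit-step response of Example 1 from rest and compare
# with B/(1 - A) of (A.3.1); both values are approximately 1.25.
import numpy as np

a = np.array([0.8, 0.39, -0.27])
b = np.array([-0.5, 0.5, 0.1])
N = 500
y = np.zeros(N)                       # system initially at rest, u_t = 1
for t in range(1, N):
    ypast = y[max(t - 3, 0):t][::-1]  # (y_{t-1}, y_{t-2}, y_{t-3})
    y[t] = a[:len(ypast)] @ ypast + b[:min(t, 3)].sum()
print(y[-1], b.sum() / (1 - a.sum()))
```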
BIBLIOGRAPHY

A1  Anderson, J.B., "Geometrical Approach to Reduction of Dynamical Systems", Proc. IEEE, Vol. 114, No. 7, July 1967.

A2  Astrom, K.J., Lectures on the Identification Problem -- The Least Squares Method, Lund Inst. of Tech., Div. of Automatic Control, Report 6906, 1968.

A3  Astrom, K.J. and Eykhoff, P., "System Identification -- A Survey", Automatica, Vol. 7, 1971, pp. 123-162.

A4  Astrom, K.J., Bohlin, T. and Wensmark, S., Automatic Construction of Linear Stochastic Dynamic Models for Stationary Industrial Processes with Random Disturbances Using Operating Records, IBM Nordic Laboratory Technical Paper TP 18.150, 1965.

A5  Athans, M. and Falb, P.L., Optimal Control, McGraw-Hill, NY, 1965.

B1  Bacopoulos, A. and Gaff, B., "On the Reduction of a Problem of Minimization of n+1 Variables to a Problem of One Variable", SIAM J. on Numer. Anal., Vol. 8, No. 1, pp. 97-103.

B2  Bellman, R. and Astrom, K.J., "On Structural Identifiability", Mathematical Biosciences, Vol. 7, 1969, pp. 329-339.

B3  Bevington, P.R., Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, NY, 1969.

B4  Boling, R., personal communication.

B5  Brockett, R.W., Finite Dimensional Linear Systems, Wiley, NY, 1970.

C1  Central Electricity Generating Board, "Demand Response to Changes of Voltage", Report PL-ST/1/70, London, 1970.

C2  Chen, C.T., "A New Look at Transfer Function Design", Proc. IEEE, Vol. 59, No. 11, November 1971, pp. 1580-1585.

C3  Cheney, R.W., "Numerical Problems of Linear Algebra", mimeographed lecture notes, Michigan State University, 1969.

D1  Davison, E.J., "A Method for Simplifying Linear Dynamic Systems", IEEE Auto. Cont., Vol. 11, No. 1, January 1966, pp. 93-101.

D2  Draper, N.R. and Smith, H., Applied Regression Analysis, Wiley, NY, 1966.

E1  Efroymson, M.A., "Multiple Regression Analysis", in Mathematical Methods for Digital Computers, A. Ralston and H.S. Wilf, eds., Wiley, NY, 1960.

E2  Elgerd, O.I., Electric Energy Systems Theory: An Introduction, McGraw-Hill, NY, 1971.

F1  Farrar, D.E. and Glauber, R.R., "Multicollinearity in Regression Analysis: The Problem Revisited", Rev. Econ. and Stat., Vol. 49, February 1967.

F2  Forsythe, G.E., "Generation and Use of Orthogonal Polynomials for Data-fitting on a Digital Computer", J. SIAM, Vol. 5, 1957, pp. 74-88.

F3  Frame, J.S., "Matrix Operations and Generalized Inverses", IEEE Spectrum, March 1964.

G1  Gilbert, E.G., "Controllability and Observability in Multivariable Control Systems", SIAM J. on Control, Vol. 1, No. 2, 1963, pp. 128-151.

G2  Goodman, D. and Hiller, J., "Determination of System Order", First Hawaii International Conference on Systems Science, 1968, pp. 527-530.

G3  Gopinath, B., "On the Identification and Control of Linear Systems", Ph.D. thesis, Stanford University, 1968.

H1  Haitovsky, Y., "Multicollinearity in Regression Analysis: Comment", Rev. Econ. and Stat., Vol. 51, No. 4, 1969, pp. 486-490.

H2  Hurwicz, L., "Least Squares Bias in Time Series", in Statistical Inference in Dynamic Economic Models, T.C. Koopmans, ed., Cowles Commission Monograph No. 10, Wiley, NY, 1950.

I1  IBM, "System/360 Scientific Subroutine Package Programmer's Manual", 1966.

I2  Isaacson, E. and Keller, H.B., Analysis of Numerical Methods, Wiley, NY, 1966.

J1  Jenkins, G.M. and Watts, D.G., Spectral Analysis and its Applications, Holden-Day, San Francisco, 1968.

K1  Kalman, R.E., "Mathematical Description of Linear Dynamical Systems", SIAM J. on Control, Vol. 1, No. 2, 1963, pp. 152-192.

K2  Kalman, R.E., Falb, P.L. and Arbib, M.A., Topics in Mathematical Systems Theory, McGraw-Hill, NY, 1970.

L1  Lewis, T.O. and Odell, P.L., Estimation in Linear Models, Prentice-Hall, Englewood Cliffs, NJ, 1971.

M1  Mann, H.B. and Wald, A., "On the Statistical Treatment of Linear Stochastic Difference Equations", Econometrica, Vol. 11, 1943, pp. 173-220.

M2  Marshall, S.A., "An Approximate Method for Reducing the Order of a Linear System", Control, December 1966, pp. 642-643.

M3  Monahan, D.D., System Identification by Stochastic Approximation, Ph.D. thesis, Michigan State University, 1970.

O1  Ogata, K., Modern Control Engineering, Prentice-Hall, NY, 1970.

P1  Park, G.L., Monahan, D.D. and Schwartz, M.W., "An Application of a Stochastic-Approximation, System-Identification Algorithm to Interconnected Electric Power Systems", Midwest Power Symposium, Ames, Iowa, 1970.

P2  Powell, M.J.D., "Numerical Methods for Unconstrained Optimization", SIAM Review, Vol. 12, No. 1, January 1970.

R1  Ralston, A., A First Course in Numerical Analysis, McGraw-Hill, NY, 1965.

R2  Rosenbrock, H.H., State Space and Multivariable Theory, Wiley, NY, 1970.

S1  Soderstrom, T., Notes on Pseudoinverses: Applications to Identification, Lund Inst. of Tech., Div. of Automatic Control, Report 7003, 1970.

S2  Staley, R.M. and Yue, P.C., "On System Parameter Identifiability", Information Sciences, Vol. 2, pp. 127-138.

T1  Theil, H., Economic Forecasts and Policy, North-Holland, Amsterdam, 1961.

T2  Tintner, G., Econometrics, Wiley, NY, 1952.

W1  Werther, M.H.F., Recursive Algorithm for the Best Approximate Solution of Linear Equations, Management Information Services, Detroit, 1970.

W2  Wold, H. and Jureen, L., Demand Analysis, Wiley, NY, 1953.
W3  Woodside, C.M., "Estimation of the Order of Linear Systems", Automatica, Vol. 7, No. 6, 1971.

Z1  Zadeh, L.A. and Desoer, C.A., Linear System Theory: The State Space Approach, McGraw-Hill, NY, 1963.

Z2  Zadeh, L.A. and Polak, E., eds., System Theory, McGraw-Hill, NY, 1969.