IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
MARTIN W. SCHWARTZ
1972

This is to certify that the thesis entitled IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER presented by Martin W. Schwartz has been accepted towards fulfillment of the requirements for the Ph.D. degree in Systems Science.

Major professor

ABSTRACT

IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

By Martin W. Schwartz

The primary purpose of this work is to determine the order of a discrete-time system from input and output observations. Several tests, based on the properties of dynamic systems, are given. The best of them is based on the determinants of the models' controllability matrices normalized by the product of their input coefficients. This and the other new tests compare favorably with existing tests, based on the properties of the least squares estimator, on simulated examples and observed data taken from an electric power system.

Theoretical properties of the least squares estimator when the order of the model is not equal to the minimal order of the system are also developed. When the model order is higher than the system order in a noise-free system, extra poles and zeros are added to the transfer function; all of these are on the same side of the unit circle. When noise is present, the estimator still converges, but to an unknown realization. It is also shown that the small sample estimates might theoretically differ greatly from the noise-free estimates in such situations. However, numerical results are quite close.

Variants of the standard least squares estimator are developed and compared. A starting algorithm for on-line estimation, based on matrix pseudo-inverses, is also developed.

IDENTIFICATION OF SYSTEMS OF UNKNOWN ORDER

By
Martin W. Schwartz

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical Engineering and Systems Science

1972

To Joyce
who makes it all worthwhile

ACKNOWLEDGEMENTS

The author wishes to express his sincere gratitude to Dr. G.L. Park, his thesis advisor, for his continued assistance and encouragement during the course of this research. Thanks are also due to Dr. R.O. Barr, Dr. J.S. Frame, Dr. A. Bacopolous, and Dr. J.B. Kreer for their interest and critical reviews of the thesis and their excellent teaching. The author also appreciates the financial support awarded through the Consumers Power Company-Michigan State University Cooperative Research Agreement.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES

I   INTRODUCTION
    1.1 Motivation
    1.2 Background
        1.2.1 Equivalent Systems and Identifiability
        1.2.2 Identification of Linear Systems

II  LEAST SQUARES ALGORITHMS
    2.1 Statement of the Problem
    2.2 Large-sample Properties
        2.2.1 Uncorrelated Noise
        2.2.2 Correlated Noise
    2.3 Small-sample Properties
    2.4 Systems of Unknown Order
        2.4.1 No Noise
        2.4.2 Noise
        2.4.3 Tests for the Order of the System
    2.5 Normalized Least Squares
    2.6 Summary

III NUMERICAL COMPUTATIONS
    3.1 Description of the Algorithms Used
    3.2 Description of the Examples
    3.3 Computed Results

IV  CONCLUSIONS
    4.1 Thesis Results
    4.2 Future Development

APPENDIX
BIBLIOGRAPHY

LIST OF FIGURES
1  Norm of theta-hat vs. Extra Pole
2  Determinants of Controllability Matrices
3  Ratio of Controllability Determinants (full matrix lse)
4  Order Tests
5  Mean Square Residuals
6  Mean Squared Residual for Example 2
7  Normalized Controllability Determinant for Example 2
8  Mean Squared Residual -- Example 3
9  Controllability Determinants -- Example 3

LIST OF TABLES

1   Equivalent Realizations of a System
2   Order Tests for Example 1
3   Critical Values for the F-test
4   F-test for Example 1
5   Estimates of the System Order
6   Order Test Summary for Example 2
7   F-tests for Example 2
8   Order Tests for Example 2
9   Poles and Zeros for Example 2
10  Order Test Summary for Example 3
11  Order Tests for Example 3
12  F-test for Example 3
13  Poles and Zeros for Example 3

CHAPTER I
INTRODUCTION

1.1 Motivation

The following problem arose in a study of an electric power system: find a simple equation which represents the behavior of a high-voltage tieline connecting two areas. Such a mathematical model would be incorporated in simulation models used in design of the system and would be studied to possibly provide better control of the system. The standard theoretical treatment of these tielines (E2) requires knowledge of parameters which are often not available and some simplifying assumptions which were questioned. Thus, it was decided to measure the power and frequency (the relevant variables) on a tieline and form a model using these measurements.

This is an example of a common problem we will call the identification problem; namely, to obtain information about a complicated real-world dynamic system from discrete-time or sampled measurements of the relevant input and output variables. Using known information and enough simplifying assumptions to make the problem computationally feasible, a mathematical model with unknown parameters is constructed, and the parameters are estimated. If the model is accurate, some properties of the system can be determined within certain accuracies.

Specifically, consider a single-input, single-output (SISO) dynamic system which is linear and time-invariant. If the order n is known, the problem is to estimate the coefficients of the difference equation

(1.1.1)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i}

where y is the output, u the input, and y_t = y(t). For example, a model for one tieline could be

(1.1.2)   f_t = .595 f_{t-1} + .405 f_{t-2} + .003 p_{t-1} + .002 p_{t-2}

where f is frequency and p is power.

Sometimes the input is a test signal, sometimes not; the system may be initially at rest, or may not; and we may have knowledge or can make assumptions about the measurement errors and plant noise. In each situation some method such as least squares, Gauss-Markov (L1), maximum likelihood (A4), Kalman-Ho (K2), or stochastic approximation (M3) is available to generate the estimates.

Going back to the specific problem of modeling the load-frequency behavior of an electric power tieline, the measurements must be taken during normal operation, and noise is present as measurement error in both input (power) and output (frequency) and as an inherent stochastic element of the system (the system loads and configuration are partly random) (P1). While the model can be taken as linear and time-invariant during a short period under normal ranges of the variables, several factors arise that violate the usual assumptions of existing methods. First, the stochastic assumptions necessary to insure unbiased estimates are violated.
Second, we do not have sufficient information about the system to obtain the order of the model or relations between the parameters. Third, we have very limited information about the characteristics of the noise. Finally, the output and, to a lesser extent, the input are almost constant.

The fourth factor is important. The usual methods for testing the fit of a model calculate the residuals; if they are small, the model is considered good. But here the residuals alone do not constitute a sufficient test for the validity of the model because any reasonable identification scheme will yield small residuals. For example, if a constant model equal to the nominal value of 60.000 Hz is taken, one set of data has an rms residual of .02% of the estimate. But our goal is to derive some information about the structure of the system, and not just to estimate the output. Thus, the almost constant data implies that we must look for additional criteria to use in testing the model fit.

The constancy causes additional problems. Many identification methods involve solution of a set of equations of the form Y = XA derived from (1.1.1). Several columns of X will contain output values (which in the above example vary from 59.980 to 60.017 Hz), and these columns will be almost linearly dependent. Therefore, the solution will be very sensitive to small perturbations in the data and to round-off errors in the numerical technique. Another aspect of the problem is that the high intercorrelation of the columns of X makes it very difficult to determine empirically a reasonable order for the model. This is important because, as shown in Section 1.2.1, different models may exhibit identical input-output behavior and one might be stable and the other unstable, one controllable and the other uncontrollable, or have other divergent properties.

The following example from numerical analysis (I2) shows this problem in a slightly different context. When solving the differential equation dy/dt = -y, y(0) = y_0, with a third-order difference approximation (of a specified form) for the derivative, the difference equation

   y_t + (3/2 + 3h) y_{t-1} - 3 y_{t-2} + (1/2) y_{t-3} = 0

is obtained, where h is the sampling interval. Three distinct solutions of the form y_t = alpha^t are obtained. All of the alpha's satisfy the equation

   alpha^3 + (3/2 + 3h) alpha^2 - 3 alpha + 1/2 = 0 .

For sufficiently small h, one eigenvalue is less than -2. Thus the difference solution becomes unbounded while the true solution approaches zero. The problem is due solely to the addition of the extraneous eigenvalues caused by too large a model order.

Our primary motivation is solving the practical problem of how to choose a model. In doing so we will not always be rigorous from a probabilistic interpretation. This is deliberate and is common in developing numerical techniques; Powell (P2), when discussing methods for finding the maximum of a function F(x_1, x_2, ..., x_n), says:

    The lack of definiteness in stating the conditions on F(x_1, x_2, ..., x_n) is deliberate, because we are describing the current state of optimization, and it happens that the current state is not a logical structure of theorems. Instead it has developed from an assortment of numerical methods which have been devised because real problems had to be solved, and at present the actual success of the algorithms is far ahead of any theoretical predictions.

The model-order problem is much less developed than the maximization problem; numerical, rather than theoretical, results for the model order problem are emphasized.
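The extraneous-eigenvalue behavior in the difference-approximation example above is easy to reproduce numerically. The following short sketch is ours, not from the thesis; the step size h = 0.01 is an assumed, purely illustrative value.

```python
import numpy as np

# Iterate the third-order difference approximation to dy/dt = -y from
# the example above, starting from exact samples of the true solution.
h = 0.01
y = [1.0, np.exp(-h), np.exp(-2 * h)]
for t in range(3, 200):
    # y_t = -(3/2 + 3h) y_{t-1} + 3 y_{t-2} - (1/2) y_{t-3}
    y.append(-(1.5 + 3 * h) * y[-1] + 3.0 * y[-2] - 0.5 * y[-3])

print(np.roots([1.0, 1.5 + 3 * h, -3.0, 0.5]))  # one root lies below -2
print(y[-1])   # enormous in magnitude, while the true solution decays to 0
```

The characteristic root below -2 dominates after a few dozen steps, exactly as described above.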
There are some new theoretical results in this thesis: Theorem 2 on the asymptotic convergence of least-squares estimators with correlated noise; Theorem 3 on small-sample properties with white noise; and Theorem 4 on the location of the additional poles with too high a model order and no noise. The algorithm in Chapter III for the pseudo-inverse of the sum of matrices of a certain form is also new, and a variant of the least squares algorithm which we call the reduced least squares algorithm has not been explicitly discussed in the literature. However, for the solution of the practical problem, the main contribution is the application of the theory of equivalent representations to the study of the identification problem. The nature of the problem precludes a logical structure of theorems at the present time.

The rest of this chapter and most of the next develop theory necessary to the understanding of the tests for model order and their numerical comparison. Relevant literature is reviewed when the appropriate subject is discussed and is often juxtaposed with the new material; this is true, in particular, for the section on the model-order problem.

1.2 Background

In this section we will define and discuss several concepts necessary to define the problem more clearly and to compare each method with others. The first part distinguishes the different ways we can represent a system and discusses related questions; although the discussion will be for linear, discrete, time-invariant SISO systems, most of the ideas are valid for multivariate and time-varying ones. The next part describes the different loss or error criteria we can use for identification. The last part is a brief outline of several significant identification algorithms and how they become more elegant and efficient as we have more information and more control over how the information is obtained.

It is necessary in what follows to distinguish between a real-world system, a mathematical system, and a mathematical model of a system. A real-world system is, of course, what we find in nature and almost always lacks desirable properties (such as true time-invariance) that mathematical systems can have. In Section 1.2.1 we will distinguish between a mathematical system and a model or representation of it. This distinction is more subtle and is more important because, although we are developing algorithms with the intent that they will be applied to real-world problems, much of the time we will have to talk about models of mathematical systems.

1.2.1 Equivalent realizations and identifiability

In the remaining chapters we talk about "the order of the system" and "the order of the model" as if they were well-defined concepts, and it is important that they be well defined. The most obvious definition (for either concept) would be the largest lag in the difference equation, i.e. the parameter n in equation (1.1.1), but this is not sufficient because it does not distinguish between difference equations which have identical input-output behavior.

This problem can be thought of (C1, K1) as pole-zero cancellation of the transfer function T(z) relating input to output. The discrete transfer function is the z-transform (O1) of the difference equation representation of the system. The z-transform is the discrete version of the Laplace transform and has many of the same properties. An example of the use of the z-transform and the corresponding derivation using the difference equation is in the Appendix.
If we derive T(z) from theoretical considerations and get a pole-zero pair, then it is well known (G1) that if this pole is unstable, the real system will act as though it were unstable, even though classical z- (or Laplace) transform theory tells us that we may cancel that term in the mathematical model and look at just the other poles. The reasons for this deviant behavior are that real components are not linear over all possible input-output values (C2) and that they do not exhibit exact pole-zero cancellation (K1). A simple example (C2), using a continuous time dynamic model, demonstrates this. Let us set up a system with total transfer function T(s) = 1/(s+1) using two components in series such that the first has T1(s) = 1/(s-1) and the second T2(s) = (s-1)/(s+1), so that T(s) = T1(s)T2(s). With zero initial conditions and unit step input, theoretically y(t) = 1 - exp(-t), the same as if we had used a single component with transfer function T(s). Real components, however, are linear for only a limited range of their input and output. Thus, after a while, the output of the first component or the input of the second would be overloaded, and the observed behavior would deviate markedly from the theoretical. This is true even if all the components are perfect; if the pole and zero do not match exactly, or if there are slight perturbations in the system, the effect is increased.

However, we are deriving T(z) from observed data, not theoretical considerations. As explained later in the section, the input-output data gives us information only about the controllable, observable part of the system, and this part has no pole-zero pairs. If everything is exact, and we take the model order to be larger than the system order, the model transfer function will have pole-zero pairs (see Section 2.4.1), and cancelling these will yield the transfer function of the (controllable, observable part of the) system; here we cancel the pole-zero pairs because any occurrence of such pairs from input-output data is due to a flaw in the identification algorithm or an error we have made, because we cannot identify those pairs which do cancel. If there are stochastic elements, then we do not get exact pole-zero pairs because of the identification errors, but we would like to guarantee that the behavior is not much different than the behavior of the noise-free system.

We now have two reasons for examining concepts related to equivalent representations. First, we want to have an unambiguous statement of what "the order of the system" means. Second, we must be able to compare two systems and/or models and be able to say when their behavior is identical or close.

There are three different ways of representing the relation between the input and output which we shall use: difference equation, transfer function, and state-space model. Each of these has advantages and disadvantages, some of which are discussed below. But first let us explicitly state them. The difference equation representation is

(1.2.1)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i}

where the a_i and b_i are constants and n is the maximum lag.
Its z-transform is (O1)

(1.2.2)   T(z) = (b_1 z^{n-1} + b_2 z^{n-2} + ... + b_n) / (z^n - a_1 z^{n-1} - ... - a_n) .

There are several standard state-space representations of the difference equation (1.2.1) (O1), one of which is as follows: let

(1.2.3)   h_1 = b_1
          h_2 = b_2 + a_1 h_1
          ...
          h_n = b_n + a_1 h_{n-1} + ... + a_{n-1} h_1 .

Now define the state variables:

(1.2.4)   x_1(t) = y_t
          x_2(t) = x_1(t+1) - h_1 u_t
          ...
          x_n(t) = x_{n-1}(t+1) - h_{n-1} u_t .

Then the state-space model consists of state- and output-equations

(1.2.5)   X(t+1) = A X(t) + B u_t
          y_t = C X(t)

where X(t) = col(x_1(t), ..., x_n(t)),

(1.2.6)   A = [ 0    1        0   ...  0   ]
              [ 0    0        1   ...  0   ]
              [ ...                        ]
              [ 0    0        0   ...  1   ]
              [ a_n  a_{n-1}  ...  a_2  a_1 ]

          B = col(h_1, ..., h_n)
          C = (1, 0, ..., 0) .

Two important concepts derived from state-space representations that we will need for the model-order problem are controllability and observability (G1, O1, K1). Formally, a state-space realization is controllable (observable) if its controllability matrix Q_c (observability matrix Q_o)

(1.2.7)   Q_c = (B, AB, ..., A^{n-1} B)
          Q_o = (C^T, A^T C^T, ..., (A^T)^{n-1} C^T)

has full rank. Intuitively, it is observable if we can see what is happening internally (in the state vector) by looking at the output. It is controllable if we can manipulate the state using the input. The difference equation representation contains information only about those parts of the system which are both controllable and observable. In the above example, the difference equation does not show the output of the first component or the input to the second, and it does not allow us to manipulate them so that the desired input-output relation holds for all values.

The state-space representation (1.2.3)-(1.2.6) has observability matrix equal to the identity. This can readily be seen: C^T is the first column of the identity, and (A^T)^k C^T is the (k+1)-th column of the identity, because each of the first n-1 rows of A is the succeeding unit row vector.

To illustrate several ideas in this section we will use Example 1, Section 3.2, whose difference equation form is

(1.2.8)   y_t = .8 y_{t-1} + .39 y_{t-2} - .27 y_{t-3} - .5 u_{t-1} + .5 u_{t-2} + .1 u_{t-3} .

Its z-transform is

(1.2.9)   T(z) = (-.5 z^2 + .5 z + .1) / (z^3 - .8 z^2 - .39 z + .27)
              = -.5 (z - 1.17082)(z + .17082) / [(z - .9)(z + .6)(z - .5)] .

And for the state-space representation we are using

(1.2.10)  A = [ 0     1    0  ]     B = col(-.5, .1, -.015)     C = (1, 0, 0) .
              [ 0     0    1  ]
              [ -.27  .39  .8 ]

We are now ready to show that (1.2.8) and the other representations are not unique, i.e. there are other difference equations which have identical input-output behavior. The material is partially contained in several references (A3, B5, R2, Z1, Z2) which differ among themselves in definitions; ours will also differ, but will generally follow the later sources.

We shall define a system as an input sequence and its corresponding output sequence. The triplet (A, B, C), defined above, is called a (state-) realization of the system. Any system obviously has many realizations, such as (A, B, C) and its similar realizations (T^{-1}AT, T^{-1}B, CT), where T is any nonsingular matrix of appropriate size. We shall say that two realizations, S and S', are equivalent if, given an initial state x_0 of S, there is an initial state x_0' of S' such that y(t; x_0, u) = y'(t; x_0', u) for every input u. More simply, two realizations are equivalent if the same input string applied to both yields the same output string. Obviously, two similar realizations are equivalent, but not all equivalent realizations are similar; while all similar realizations have the same dimension state vector, not all equivalent realizations do. If there is no realization with a smaller-dimension state vector, then the realization is called minimal. All minimal realizations are similar.
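As a numerical check of the construction (1.2.3)-(1.2.6), the following sketch (ours, not from the thesis; zero-based indexing and numpy assumed) builds the realization of Example 1 and verifies that it is minimal:

```python
import numpy as np

def realization(a, b):
    """Build the state-space realization (1.2.3)-(1.2.6), zero-based."""
    n = len(a)
    h = np.zeros(n)
    for j in range(n):                  # the h recursion of (1.2.3)
        h[j] = b[j] + sum(a[i] * h[j - 1 - i] for i in range(j))
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)          # ones on the superdiagonal
    A[-1, :] = a[::-1]                  # last row: a_n, ..., a_1
    return A, h, np.eye(n)[0]           # (A, B, C)

# Example 1 of Section 3.2:
A, B, C = realization([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1])
Qc = np.column_stack([np.linalg.matrix_power(A, k) @ B for k in range(3)])
print(B)                                 # (-.5, .1, -.015), as in (1.2.10)
print(np.linalg.matrix_rank(Qc))         # 3: the realization is minimal
```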
Finally, the order of the system is the dimension of the minimal realization.

We will also call the coefficient vector of the difference equation form (1.2.1) a realization, keeping in mind that formally we mean the one obtained from (1.2.1) by using (1.2.3)-(1.2.6). If the order of the system is m, then there is no difference equation with fewer than m lags which realizes the system. If we do not know m and try models with other maximum lags, n, then the models will also realize the system. Thus we will use the following terminology to differentiate between the maximum lag and the order of the system: by "the order of the model" we will mean the largest lag of the difference equation in the realization being used. Table 1 illustrates the difference between "the order of the system" and "the order of the model" by showing several equivalent realizations of Example 1, where the order of the system is 3.

Table 1. Equivalent Realizations of a System

order of the model    a1      a2      a3      a4       b1     b2      b3     b4
        3             .8      .39    -.27     ---     -.5     .5      .1     ---
        4             .8      .39    -.27      0      -.5     .5      .1      0
        4            1.8     -.41    -.66     .27     -.5    1.0     -.4    -.1
        4             .466    .657   -.140   -.090    -.5     .332    .267   .033

The following facts about minimal and non-minimal realizations will be useful:

1) There is no pole-zero cancellation in the transfer function form (1.2.2) of the minimal realization; non-minimal realizations have pole-zero cancellation. (1.2.9) shows that the model of order 3 has no cancellation; the models of order 4 in Table 1 have additional factors of z, (z - 1), and (z + .334021), respectively. This is the main advantage of form (1.2.2), since factoring the numerator and denominator of T(z) is the easiest way to determine if a realization is minimal. The main disadvantage is that T(z) does not account for the initial conditions unless they are zero.

2) The difference equation form (1.2.1) of a minimal realization is unique; this is not true of non-minimal realizations of fixed order, as Table 1 illustrates.

3) Given any realization of a system and its controllability and observability matrices Q_c and Q_o, the order of the system is rank(Q_o^T Q_c). Realization (1.2.3)-(1.2.6) has the advantage that the order of the system it represents is equal to the rank of Q_c, making calculations easier.

We will finish the discussion of equivalent realizations in Section 2.4, after some other relevant material has been presented.

The last question in this section concerns identifiability (B2, S2). A model is identifiable if its coefficients can be determined uniquely from the input-output data. This requirement of uniqueness is desirable to inhibit the estimator from oscillating between two correct coefficient vectors (i.e. between two realizations). If m is less than n, there are many possible solutions and the uniqueness property is not satisfied. But there can be convergence without uniqueness (B1); for example, the algorithm can converge to the realization with minimum norm. This is the case here (see Section 2.4.2), and we do not care very much which realization is approached. Thus we will proceed to identify the parameters even if they are not identifiable in the formal sense. They will be identifiable in practice.

1.2.2 Identification of linear systems

If we were extremely fortunate, we would be able to use a test signal for the input and observe the output without any noise. For such situations, the Ho-Kalman algorithm (K2) provides an elegant method for determining the order of the system and the parameter vector.
Let the input be a pulse and let

(1.2.11)  H = [ y_1  y_2  y_3  ... ]
              [ y_2  y_3  y_4  ... ]
              [ y_3  y_4  y_5  ... ]
              [ ...                ]

H is a Hankel matrix, i.e. its (i,j)-th entry depends only on (i+j). Let its principal leading submatrix of dimension n be denoted by H_n. If the minimal order of the system is m, then

(1.2.12)  det(H_n) = 0 if and only if n > m.

This gives a method for determining the order of the system and a realization of it:

(1.2.13)  A = H_m^{-1} (sH_m), where s is the shift operator,
          B = col(1, 0, ..., 0)
          C = (y_1, y_2, ..., y_m) .

The algorithm generalizes easily to the situation where there are r inputs and p outputs. Then H is a block Hankel matrix with r x p blocks, and the above expressions for A, B, and C are pre- and post-multiplied by "editing" matrices, composed of the identity and zeros, which are easy to compute. If there is no noise, but a pulse input cannot be used, the method does not work because it is impossible to set up a matrix which has the properties of H which are needed. But there is still no problem, because we can set up a system of linear equations of appropriate size and solve for the parameters. On the other hand, whenever noise is present there are problems which have not been completely solved, no matter what the input.

If there is noise, in general the equations are not consistent, so it is necessary to choose some criterion in making the estimates. The most common criterion is least squares, where the sum of the squared differences between the model estimates and the observed values is to be minimized. This method, elaborated in Chapter II, is the one we have chosen because it has several advantages. It is easy to compute the estimates, and it is essentially distribution free, i.e. it is not necessary to know the distribution of the stochastic elements a priori; it is unbiased as long as the noise is uncorrelated. Other methods, such as maximum-likelihood and the Bayes method, which maximize probability density functions, require known density functions of the residuals.

A set of variants of the least squares method is weighted least squares. Regular least squares solves a set of equations of the form Y = XA derived from (1.1.1). Then weighted least squares solves WY = WXA, where W is a positive definite matrix. A well-known example of a weighting matrix is the inverse of the covariance matrix of the residuals. This variant is called the Gauss-Markov or Best Linear Unbiased (BLUE) estimator. Another variant is the instrumental variables method (A3), and a third is discussed in Section 2.5.

These are the most common criteria in identification algorithms. There are some other criteria which cannot be used with a single estimate, but which can be used to compare different estimates. For example, we might want the parameter estimates to be near their true values. Since we do not know the true values, parameter errors are not a basis for an identification algorithm, but we can compare the parameter estimates from different estimators. For dynamic systems, one way to compare them (which is independent of model order) is to compare the steady-state response to a unit-step input. As shown in the Appendix,

(1.2.14)  y_ss = (SUM b_i) / (1 - SUM a_i)

where the a_i and b_i are the parameters in the difference equation (1.1.1). We can also compare the estimates of the poles and zeros of the transfer function of a dynamic system. This topic is discussed in detail in Section 2.4.3.
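To make the Ho-Kalman recipe (1.2.11)-(1.2.13) concrete, here is a small sketch of ours, assuming exact pulse-response data from Example 1 of Section 1.2.1: det(H_n) vanishes once n exceeds the true order, and H_m^{-1}(sH_m) recovers the poles.

```python
import numpy as np

# Pulse response of Example 1 (u_0 = 1, zero initial conditions).
a, b = [0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]
y = [0.0] * 12
for t in range(1, 12):
    y[t] = sum(a[i] * y[t - 1 - i] for i in range(3) if t - 1 - i >= 0)
    if t <= 3:
        y[t] += b[t - 1]

def hankel(n, shift=0):
    return np.array([[y[i + j + 1 + shift] for j in range(n)] for i in range(n)])

for n in range(1, 5):
    print(n, np.linalg.det(hankel(n)))           # ~0 once n > m = 3

A = np.linalg.inv(hankel(3)) @ hankel(3, shift=1)  # A = H_m^{-1}(s H_m)
print(np.linalg.eigvals(A))                      # .9, -.6, .5 -- the poles
```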
CHAPTER II
LEAST SQUARES ALGORITHMS

In this chapter we will discuss least-squares estimation of discrete, single-input, single-output systems using different assumptions about the noise, when the order of the system is known and when it is not. Two variations of the standard technique are given: a new one which has improved small sample properties, and a weighted one. We will also discuss tests for the order of the system.

2.1 Statement of the Problem

A deterministic system of order n can be represented in discrete time as the difference equation

(2.1.1)   x_t = SUM_{i=1}^{n} a_i x_{t-i} + SUM_{i=1}^{n} b_i v_{t-i}

where input v and output x are given at equally spaced times. If stochastic elements are present, we shall write the equation to include modeling errors as

(2.1.2)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i} + e_t

where e_t will be explicitly defined as appropriate for each theorem. After N observations, (2.1.2) is usually written in matrix form as

(2.1.3)   Y_N = PHI_N theta + E_N ,

where

   Y_N = col(y_{n+1}, y_{n+2}, ..., y_{n+N}), an N x 1 output vector,
   E_N = col(e_{n+1}, e_{n+2}, ..., e_{n+N}), an N x 1 noise vector,
   theta = col(a_1, ..., a_n, b_1, ..., b_n), the 2n x 1 parameter vector, and

   PHI_N = [ y_n        ...  y_1   u_n        ...  u_1 ]
           [ ...                                       ]
           [ y_{n+N-1}  ...  y_N   u_{n+N-1}  ...  u_N ]

an N x 2n observation matrix. The least squares estimator (lse) theta-hat_N of theta is given by

(2.1.4)   theta-hat_N = PHI_N^+ Y_N

where + denotes the Moore-Penrose generalized inverse. If N >= 2n and PHI_N has full (column) rank, then

(2.1.5)   theta-hat_N = (PHI_N^T PHI_N)^{-1} PHI_N^T Y_N .

The proof is straightforward using Lagrange multipliers; the errors E_N drop out of the solution because their derivatives with respect to the parameters are zero. The large and small sample properties of the lse are given in the next two sections; there it is shown that it is biased for a finite number of samples even when it is asymptotically unbiased. The variation we call the reduced lse, which is unbiased for finite samples, can be obtained by deleting entries in Y_N, E_N, and PHI_N as follows: let

   Y_K = col(y_{n+1}, y_{2(n+1)}, ..., y_{K(n+1)})

and make the corresponding changes in E_N and PHI_N so that

(2.1.6)   Y_K = PHI_K theta + E_K .

When this is done, any y_t, t = 1, ..., N = K(n+1), will appear just once in (2.1.6), while it usually appears (n+1) times in (2.1.3). For example, let n = 2 and N = 7. Then

   Y_N = col(y_3, ..., y_9),   E_N = col(e_3, ..., e_9),   theta = col(a_1, a_2, b_1, b_2),

   PHI_N = [ y_2  y_1  u_2  u_1 ]
           [ y_3  y_2  u_3  u_2 ]
           [ y_4  y_3  u_4  u_3 ]
           [ y_5  y_4  u_5  u_4 ]
           [ y_6  y_5  u_6  u_5 ]
           [ y_7  y_6  u_7  u_6 ]
           [ y_8  y_7  u_8  u_7 ]

for the full matrix lse, and

   Y_K = col(y_3, y_6, y_9),   E_K = col(e_3, e_6, e_9),

   PHI_K = [ y_2  y_1  u_2  u_1 ]
           [ y_5  y_4  u_5  u_4 ]
           [ y_8  y_7  u_8  u_7 ]

for the reduced matrix lse. The reduced estimator is

(2.1.7)   theta-hat_K = PHI_K^+ Y_K .

Section 2.4 discusses the lse's when the true order of the system is not known and studies the practical problem of choosing a model order; Section 2.5 gives another variation -- normalized least squares.
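The difference between the two estimators is only in which rows of (2.1.3) are kept. The sketch below is ours, not from the thesis; it uses an assumed first-order test system (coefficients 0.7 and 0.5 chosen arbitrarily) and solves both formulations by least squares:

```python
import numpy as np

def phi_full(y, u, n):
    """Observation matrix of (2.1.3): one row for every available y_t."""
    rows = [[y[t - 1 - i] for i in range(n)] + [u[t - 1 - i] for i in range(n)]
            for t in range(n, len(y))]
    return np.array(rows), np.array(y[n:])

def phi_reduced(y, u, n):
    """Keep every (n+1)-th equation, so each y_t appears only once (2.1.6)."""
    phi, Y = phi_full(y, u, n)
    return phi[::n + 1], Y[::n + 1]

rng = np.random.default_rng(0)
u = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):                  # assumed first-order test system
    y[t] = 0.7 * y[t - 1] + 0.5 * u[t - 1] + 0.05 * rng.standard_normal()

for builder in (phi_full, phi_reduced):
    phi, Y = builder(y, u, 1)
    print(builder.__name__, np.linalg.lstsq(phi, Y, rcond=None)[0])
```

Both estimates come out near (0.7, 0.5); the reduced version simply uses a third of the equations.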
2.2 Large-sample Properties

In this section the asymptotic properties of the lse of system (2.1.2) are derived under uncorrelated and correlated noise. The asymptotic properties of the reduced matrix form are the same as for the full matrix form.

2.2.1 Uncorrelated noise

The simplest stochastic case is independent system noise, i.e. where the random variables e_t are independent and identically distributed (iid) and are independent of the input and output. This means that there are no observation errors, but only an inherent stochastic behavior in the system. Astrom (A2) states and sketches the proof (with minor errors) that in this case the least squares estimator converges in the mean square. The proof is sketched below because some intermediate results are used in the rest of the chapter.

Theorem 1. (Astrom) Let the system be given by (2.1.2) and assume:

1) the residuals e_t are iid with zero mean, have all moments finite, and are independent of the input and output;
2) the system is stable;
3) lim_{N->inf} (1/N) SUM u_t is finite and lim_{N->inf} (1/N) SUM u_t u_{t-s} = R_u(s) is finite for s = 0, 1, ...;
4) the matrix whose (i,j)-th element is R_u(i-j) is positive definite;
5) the order of the system is known.

Then the least squares estimator (2.1.4) converges to theta in mean square.

The assumptions of this theorem are common in identification: the first characterizes the errors and will be generalized in Theorem 2; Assumption 2 guarantees that the output is bounded (bounded output should be sufficient for the proof); Assumptions 3 and 4 insure that the input is well behaved (for example, essentially constant inputs such as a unit step or impulse will not work for least squares identification); and the last assumption guarantees uniqueness. The finite difference representation is used for identification because it gives the unknown parameters in terms of directly measurable variables. When the (minimal) order is known, this representation is unique.

Outline of proof (for more details see the proof of Theorem 2 in the appendix): Let

(2.2.1)   B_N = (1/N) PHI_N^T E_N   and   C_N = (1/N) PHI_N^T PHI_N .

Assumptions 2, 3, and 4 imply that lim E(B_N) = 0 and that lim C_N = C exists and is positive definite. Define

(2.2.2)   Z_N = C_N theta-tilde_N

where theta-tilde_N = theta-hat_N - theta is the error of the N-th estimate. Then

(2.2.3)   Z_N = (1/N) PHI_N^T E_N = B_N .

It is now possible to show that

   E(Z_N) = 0   and   E(Z_N Z_N^T) = (s/N) C ,

where s = var(e_t). Now lim E(theta-tilde_N) = C^{-1} lim E(Z_N) = 0, so the lse is asymptotically unbiased. Finally,

   lim E(theta-tilde_N^T theta-tilde_N) = lim E(Z_N^T C^{-2} Z_N) = lim (s/N) tr(C^{-1}) = 0 ,

so the estimate converges in mean square.

2.2.2 Correlated noise

We will now generalize the results to allow e_t to be system noise correlated to observation noise.

Theorem 2. Let the system be given by (2.1.2) and assume:

1) The residuals e_t are identically distributed with zero mean, all moments finite, E(e_t e_{t+s}) = 0 for s > n, and are independent of the noise-free input and output.
2) - 5) same as in Theorem 1.

Then the lse (2.1.4) converges to (theta + C^{-1}B) in mean square, where C is defined by (2.2.1) and

(2.2.4)   B = col(E(y_t e_{t+1}), ..., E(y_t e_{t+n}), E(u_t e_{t+1}), ..., E(u_t e_{t+n})) .

The proof is given in the appendix. It is similar to the proof of Theorem 1, but now E(Z_N) = B, which is not zero, and E(Z_N Z_N^T) is (1/N) times a "weighted C". The term C^{-1}B is the asymptotic bias, which is unknown in most practical situations.

We conclude this section with two comments. First, the assumption that E(e_t e_{t+s}) = 0 for s > n is necessary. If it did not hold, the actual order of the system would be greater than n, as it would be if the degree of the numerator of the transfer function were higher than the degree of the denominator. Second, if there is independent observation noise, but no plant noise, then (2.2.4) can be replaced by

(2.2.5)   B = col(-a_1 s_y, ..., -a_n s_y, -b_1 s_u, ..., -b_n s_u)

where s_u and s_y are the variances of the input and output noise respectively.
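A Monte Carlo check of Theorem 2's bias formula is straightforward. The sketch below is ours; it assumes a first-order system with output observation noise only (coefficients and noise variance chosen arbitrarily), and compares the lse with the predicted limit theta + C^{-1}B from (2.2.5):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, sy = 0.7, 0.5, 0.04            # sy: variance of the output noise
N = 100_000
u = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):                # noise-free first-order system
    x[t] = a * x[t - 1] + b * u[t - 1]
y = x + np.sqrt(sy) * rng.standard_normal(N)   # observation noise only

phi = np.column_stack([y[:-1], u[:-1]])
theta = np.linalg.lstsq(phi, y[1:], rcond=None)[0]

C = phi.T @ phi / (N - 1)            # sample version of C in (2.2.1)
B = np.array([-a * sy, 0.0])         # (2.2.5) with s_u = 0
print("lse:      ", theta)
print("theorem 2:", np.array([a, b]) + np.linalg.solve(C, B))
```

The two printed vectors agree closely; the output coefficient is attenuated toward zero, as the sign of B suggests.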
2.3 Small Sample Properties

While it is theoretically important to have desirable properties when an arbitrary amount of data is available, in practical situations it is more important to know the properties when only a limited amount of data is available. But small-sample properties are more difficult to obtain because they often involve non-linear transformations. Thus, while the proofs of Theorems 1 and 2 (for large samples) did not involve the distribution of e_t, the negative results of the following theorem (for small samples using the full matrix lse) are proved explicitly only for an example involving a particular distribution and specific values for N and n.

Theorem 3. Under the assumptions of Theorem 1, the full matrix lse (2.1.5) may be biased for finite samples, while the reduced matrix lse (2.1.7) is unbiased.

Proof: From (2.2.2) and (2.2.3) we see that

(2.3.2)   theta-tilde_N = PHI_N^+ E_N .

e_t is independent of all u_s, of all e_s, s != t, and of all y_s, s < t. So in the reduced matrix formulation, each element of E_K is independent of each element of PHI_K^+, which implies that E(PHI_K^+ E_K) = 0, and theta-hat_K is unbiased. On the other hand, for the full matrix formulation, E_N and PHI_N^+ are generally dependent. Let us take the simplest example, n = 1 and N = 2. After some algebra, we obtain

(2.3.3)   theta-tilde_N = 1/(y_1 u_2 - y_2 u_1) * col(u_2 e_2 - u_1 e_3, y_1 e_3 - y_2 e_2),
          y_1 u_2 - y_2 u_1 = y_1 u_2 - u_1 (a y_1 + b u_1) - u_1 e_2 .

Thus,

(2.3.4)   E(a') = INT [c_3 e / (c_1 + c_2 e)] f(e) de

where c_1, c_2, and c_3, obtained from (2.3.3), are independent of e_2, and f(e) is the density function of the error. The expected value of b' is similar, but has an additional term, quadratic in e, in the numerator. We will not attempt to find the class of density functions which give unbiased results; but if e_t is uniformly distributed on (-d, d), then

(2.3.5)   E(a') = (c_3/c_2) [1 - (c_1/(2 d c_2)) log((c_1 + c_2 d)/(c_1 - c_2 d))] != 0, almost surely.

This is the only bias we will explicitly find, but if either n or N is increased, (2.3.4) becomes the multiple integral of the product of a joint density function and a rational function in several variables, so the full matrix lse will give biased small sample estimates most of the time.

Of course, there is a price to pay for the unbiased reduced matrix lse: the convergence factor for the full matrix estimator is 1/N, while it is 1/K = (n+1)/N for the reduced formulation.

For correlated noise, the situation is less clear. First, it is impossible to eliminate either small or large sample bias, because E_K and PHI_K^+ are dependent for both estimators. Second, it is more difficult to compare the standard estimates, because each estimator has a different weighting matrix. However, numerical results in Chapter III often indicate better results using the reduced lse. It is possible that either the small sample bias or the variance of the estimator is reduced.

The only other results on small-sample estimates are in Hurwicz (H2). He shows that the least squares estimates are biased for a first order free response system when a finite number of samples are available and gives specific values of the bias under very particular circumstances.

2.4 Systems of Unknown Order

The discussion so far has assumed that the order of the system is known; if the system is simple it is usually possible to determine the order from physical, economic, or other relevant theoretical considerations. But if the system is complex, it may not be possible to determine the order from theory. One reason is the problem of aggregation -- even when it is possible to model each component of a system, the total may not act like the sum of the components. This phenomenon has been noted in engineering (C1) and economics (T1). Another reason is that the order of a "complete" model (if it is possible to construct one) may be in the thousands, while the dominant behavior is of low order.
Thirdly, the order of the observable, controllable part may be less than the sum of the orders of the subsystems (G1). Thus we are motivated to study parameter estimation when the order of the model is not that of the system. First, least squares estimation when the order of the model is higher than the order of the system is presented. Then tests for the order are presented.

While not completely general, we will assume that the system has the same form as the model, but has a different order; this assumption will allow us to use several tools to solve the model order problem. Let the order of the system be m, so that it may be represented as

(2.4.1)   y_t = SUM_{i=1}^{m} c_i y_{t-i} + SUM_{i=1}^{m} d_i u_{t-i} + e'_t ,

and let the order of the model be n, so that

(2.4.2)   y_t = SUM_{i=1}^{n} a_i y_{t-i} + SUM_{i=1}^{n} b_i u_{t-i} + e_t .

Further, we will assume that n is larger than m unless otherwise specified.

2.4.1 No noise

Using the material from Section 1.2.1, we see that there is not a unique difference equation if n > m. If n - m = p, then multiplying both the numerator and denominator of the transfer function T(z) by the same arbitrary p-th degree polynomial yields an exact n-th order realization of the system. Two such realizations are

(2.4.3)   theta = col(c_1, c_2, ..., c_m, 0, ..., 0, d_1, ..., d_m, 0, ..., 0),

where the c_i and d_i are the coefficients in the minimal realization (2.4.1), and the least squares realization described below, which is due to Soderstrom (S1). Table 1 shows examples of fourth order realizations of a third order system; the second row in the table is realization (2.4.3), and the last row is the least squares model.

With no noise, the rank of PHI is m+n, so PHI^T PHI is not invertible. The lse becomes

(2.4.4)   theta-hat_N = PHI_N^+ Y_N = (PHI_N^T PHI_N)^+ PHI_N^T Y_N .

The least squares realization has the smallest norm of any n-th order realization. To characterize the lse we need some new notation. First, rearrange the columns of PHI and the elements of theta so that the input and output at the same time are adjacent. Then

(2.4.5)   PHI_0 = [ y_n        u_n        y_{n-1}    u_{n-1}    ...  y_1  u_1 ]
                  [ ...                                                      ]
                  [ y_{n+N-1}  u_{n+N-1}  y_{n+N-2}  u_{n+N-2}  ...  y_N  u_N ]

(2.4.6)   theta_0 = col(x, 0_{2p})

where x = (c_1, d_1, ..., c_m, d_m) and 0_k is the null vector with k elements. Now define

(2.4.7)   theta_i = col(0_{2i-2}, -1, 0, x, 0_{2p-2i}),  i = 1, ..., p
          theta = col(a_1, b_1, ..., a_n, b_n)
          H = col(theta_1^T, ..., theta_p^T)
          G = col(g_1, ..., g_p)

where G is obtained from the factorization of T(z):

(2.4.8)   T(z) = (b_1 z^{n-1} + b_2 z^{n-2} + ... + b_n) / (z^n - a_1 z^{n-1} - ... - a_n)
              = [(d_1 z^{m-1} + ... + d_m)(z^p + g_1 z^{p-1} + ... + g_p)]
                / [(z^m - c_1 z^{m-1} - ... - c_m)(z^p + g_1 z^{p-1} + ... + g_p)] .

theta can be written as

(2.4.9)   theta = theta_0 + H^T G .

From (2.4.1) it is clear that H PHI_0^T = 0, so the theta_i span the null space of PHI_0; since the minimum-norm solution (2.4.4) is orthogonal to that null space, premultiplying (2.4.9) by H and setting H theta-hat = 0 yields the solution

(2.4.10)  G = -(H H^T)^{-1} H theta_0 .

Numerical examples of this computation are given in Section 3.2.
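Anticipating those examples, the following sketch (ours, not from the thesis; numpy assumed) assembles theta_0 and the theta_i of (2.4.6)-(2.4.7) for Example 1 and evaluates (2.4.10):

```python
import numpy as np

def extra_factor(c, d, p):
    """Minimum-norm G of (2.4.10) for a model p orders above minimal."""
    m = len(c)
    x = np.ravel(np.column_stack([c, d]))     # (c1, d1, ..., cm, dm)
    theta0 = np.r_[x, np.zeros(2 * p)]        # (2.4.6)
    H = np.zeros((p, 2 * (m + p)))
    for i in range(1, p + 1):                 # theta_i of (2.4.7)
        H[i - 1, 2 * i - 2] = -1.0
        H[i - 1, 2 * i:2 * i + 2 * m] = x
    return -np.linalg.solve(H @ H.T, H @ theta0)

c, d = [0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]   # Example 1
print(extra_factor(c, d, 1))   # [0.334...]: extra factor (z + .334)
print(extra_factor(c, d, 2))   # [0.47982  0.43648], as in Section 3.2
```

The outputs reproduce the extra factors computed by hand in Section 3.2.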
One property of the real system that the model should reflect is stability or instability. The model will be unstable if one of the extra poles of T(z) lies outside the unit circle, even though the actual system is stable. We want to insure that the algorithm used to estimate the coefficients does not introduce instability into the model; this is especially important if there are stochastic elements present, because then the best we can expect is that the pole-zero pairs will be approximately, but not exactly, equal. Obviously, (2.4.3) yields a stable realization because G = 0_p. The situation is not as neat for the lse, as the following theorem shows.

Theorem 4. Let a system be given by (2.4.1) and its model by (2.4.2), where n - m = p > 0, and assume that e_t and e'_t are identically zero. Then the lse (2.4.4) introduces p additional pole-zero pairs to the transfer function (2.4.8). Either all these pairs are located inside the unit circle or all are located outside the unit circle. In particular, if p = 1, the extra pair is stable.

The proof of Theorem 4 is in the appendix.

Our attention has been focused on the case when n is greater than m because there does not seem to be any practical problem when n is smaller. The relevant theoretical work, called the reduction of dynamic systems (A1, D1, M2), has assumed that the system is known and all that is desired is a lower order approximation to it. Our situation is somewhat different because all we have are input and output values, but it does not appear to be much different. We get a projection from m dimensional space to n dimensional space which does not seem to affect stability or controllability. In fact, it is sometimes difficult to distinguish between the reduced and unreduced models both in our identification problem (see Section 3.3) and in a similar method for the reduction of known systems (A1).

2.4.2 Noise

There is a great gap between the theoretical development and the practical results when there is noise and n > m. The practical results are amazingly good, while the limited theory says that we have no right to expect anything very close. The basic problem is that the noise-free PHI_N matrix does not have full rank, while the observed, noisy PHI_N has full rank almost surely (a.s.). Thus, if C = lim (1/N) PHI_N^T PHI_N is invertible (this limit of a sequence of non-singular matrices is not necessarily non-singular), then Theorems 1 and 2 are still valid (with minor changes). C does have full rank a.s. if there are only independent observation errors. To show this, let

(2.4.11)  PHI_N = D_N + P_N

where D_N is the deterministic part and P_N are the observation errors. Then

(2.4.12)  E(C_N) = E((1/N) D_N^T D_N) + E((1/N) P_N^T P_N) = D + P .

D has rank (n+m), but P is a diagonal matrix of rank 2n,

(2.4.13)  P = diag(s_y, ..., s_y, s_u, ..., s_u),

so C has rank 2n a.s. If the stochastic part is just independent plant noise, as in Theorem 1, then P = diag(s, ..., s, 0, ..., 0), so C is still invertible if 2n <= 2m + n, and Theorem 1 is still valid. The small sample properties of Theorem 3 are also correct if n <= 2m, but the full matrix formulation is also unbiased if n > 2m.

It is not clear exactly what happens asymptotically (when the number of samples becomes large). The estimates do converge to some estimate; if there is just independent plant noise, the estimate is an exact realization of the system. It would be very difficult to decide which realization is approached, because it would require a very detailed probabilistic argument on the errors in (2.1.3) and knowledge of the relation between (D+P)^{-1} and P^{-1} and D^{-1} in (2.4.12) which is not available. It would be nice to say that the answer was either the noise-free least squares estimate or realization (2.4.3). In fact, numerical simulations in Chapter III and in (A2) have sometimes indicated one and sometimes the other of these. The reason for this lies in the fact that we are trying to minimize the norm of the parameter vector when there is more than one realization. Figure 1 shows the norm of the estimate for Example 1 when n = 4; while there is a minimum at the lse, the curve is very flat. Soderstrom (S1) indicates that machine roundoff error alone can cause a large error in the estimate, so it is not surprising that simulations have not generated consistent results.

[Figure 1. Norm of theta-hat vs. extra pole]

If we cannot say much theoretically about the asymptotic estimate, we can say even less about the small sample estimates. The natural approach is to let P, the stochastic part of E(C_N), approach zero, but then the rank of E(C_N) changes from 2n to, at most, (n+m). When the rank of a matrix changes, the (pseudo-)inverse is no longer a continuous function of its elements, as the following examples show:

   [ eps  0 ]+   [ 1/eps  0 ]          [ 0  0 ]+   [ 0  0 ]
   [ 0    0 ]  = [ 0      0 ]    but   [ 0  0 ]  = [ 0  0 ]

   [ 1  1     ]+       1  [ 1+eps  -1 ]          [ 1  1 ]+    1  [ 1  1 ]
   [ 1  1+eps ]   =  ---- [ -1      1 ]    but   [ 1  1 ]  =  -  [ 1  1 ]
                      eps                                     4

Thus, the noisy lse might, theoretically, be quite different from the no-noise lse for small samples. But the numerical results in Chapter III are much different than the pessimistic theory. Even with noise, the computed poles and zeros are often quite close to the noise-free values.
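This rank-change discontinuity is easy to see numerically. A minimal sketch of ours, reproducing the two examples above with numpy's pseudo-inverse:

```python
import numpy as np

# The pseudo-inverse jumps where the rank changes.
for eps in (1e-2, 1e-6, 0.0):
    M = np.array([[eps, 0.0], [0.0, 0.0]])
    print(eps, np.linalg.pinv(M)[0, 0])      # 1/eps, then 0 at eps = 0

for eps in (1e-2, 1e-6, 0.0):
    M = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
    print(eps, np.linalg.pinv(M)[0, 0])      # (1+eps)/eps, then 1/4 at eps = 0
```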
2.4.3 Tests for the order of the system

The literature contains very few papers on the model-order problem per se (A2, G2, W3), but there has been work on choosing a regression model from a set of possible models. All of the existing work on choosing the order of a linear, dynamic system has just modified the general regression tests to fit the particular problem. These existing tests fit into two categories. We can say that PHI has full rank a.s. if and only if n <= m, and that the problem is to determine the largest n so that the columns of PHI are sufficiently independent. On the other hand, we can look at the residuals and say that they should have zero mean and approximately a Gaussian distribution if the correct order is chosen. None of these methods make use of the considerable body of knowledge of linear systems that is available. After a rather complete review of the existing tests of model order, with brief indications of the numerical results in Chapter III, we shall present some new tests which do make use of the theory of linear systems, in particular the theory of equivalent representations.

The second point of view, residuals, is more common in the engineering literature (A2, J1, W3). There are several non-rigorous tests based on the residuals. Draper and Smith (D1) outline several ways to examine the residuals of a given model, such as plots of the residual versus time, the input, and the output, examination of outliers (those residuals with large magnitudes), and run tests; none of them seem very useful except at a very preliminary stage of investigation. A much better one, originally proposed by Forsythe (F2) and since used by several others (B4, J1, W3), is often useful. Forsythe considers the problem of determining the degree of a polynomial when only noisy values are available and gives the following heuristic algorithm: Add one degree at a time to the approximating polynomial; the sum of the squared residuals (or the RMS residual) should decrease rapidly until the correct degree is reached and then remain fairly constant. Boling (B4) has modified the algorithm to include a stopping rule based on the relative decrease. Numerical examples indicate that Forsythe's method works well in determining the order of a dynamic system when the variables have a wide enough range, but does not work well when the output is relatively constant. Other residual norms, such as the largest absolute one, give similar results.
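Forsythe's plateau heuristic, transplanted to dynamic systems, looks like the following sketch (ours; data simulated from Example 1 with an assumed plant-noise level of 0.1):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(3, N):               # Example 1 plus assumed plant noise
    y[t] = (0.8 * y[t - 1] + 0.39 * y[t - 2] - 0.27 * y[t - 3]
            - 0.5 * u[t - 1] + 0.5 * u[t - 2] + 0.1 * u[t - 3]
            + 0.1 * rng.standard_normal())

for n in range(1, 6):
    rows = [[y[t - 1 - i] for i in range(n)] + [u[t - 1 - i] for i in range(n)]
            for t in range(n, N)]
    res = np.linalg.lstsq(np.array(rows), y[n:], rcond=None)[1]
    print(n, res[0] / (N - n))      # mean squared residual levels off at n = 3
```

The mean squared residual drops sharply up to n = 3 and then stays near the noise variance, which is the plateau the test looks for.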
This restriction may be more important theoretically than practically, at least in part because of the central limit theorems. The most common of these is an F-test (Al, El, Bl). Let V1 and V2 be the sum of the squared residuals for models of order n1 and n2 > n1, respectively, over the same N observa- tions. Then V - V N - 2n (2.4.14) F = V—1———2 . 2( _2n) 2 I“2 1 has an F-distribution with 2(n2 - n ) and (N - n - 1) degrees 1 2 of freedom. The F-test says to accept the hypothesis that n >11 2m 2 1 if F is smaller than an apprOpriate value of the F-distribution and reject the hypothesis otherwise. This test also works well when the variables have a wide enough range. Another test on the residuals (J1) is to compute the sample spectrum of the residuals and test the integrated Spectrum using a Kolmogorov-Smirnov statistic. The first point of view, the rank of ¢, is more common in the econometrics literature (Fl, Hl, W2), where it is called the multicollinearity problem (the word identification being reserved for a different problem). ¢ has full column rank if and 36 only if ¢T¢, a square matrix, is invertible, so tests based on det(¢T¢) seem reasonable. Both Farrar and Glauber (F1) and Haitovsky (Hl) do so. However, they use the correlation matrix, whose de- terminant is between zero and one and is based on a zero-mean pro- cess, instead of the unstandardized ¢T¢. Haitovsky's test seems more reasonable: Let Rn be the Zn X 2n correlation matrix. Then, as in (H1), (2.4.15) X = -(N + 2n/3 + 11/6) ln(l - det Rn) has a chi-squared distribution with n(2n-l) degrees of freedom. A.small X value implies that n > m. For a dynamic system, it is easy to Show that det Rn+1 s det Rn so that the test gives a stopping criterion. Of course, if the input or output is relatively constant, then Rn will be small for any n > 1. The other problem is that by subtracting the means we may have changed the rank of the matrix under investigation, i.e. Rn and ¢T¢ may not have the same rank. ¢T¢ can be normalized so that its determinant is between zero and one, but the chi-squared test will still not apply. Tests based on det(Rn) or det(¢T¢) do not seem to work as well as the residual tests. Goodman and Hiller (CZ) give a test based on the deter- minant of a square ¢ and estimates of the maximum errors. But, again, the determinants will be small if a variable is almost con- stant and will be very sensitive to the particular values of the variables. Some sort of average determinant might be useful to reduce this dependency, but the algorithm is not too promising. 37 As mentioned above, all of these tests apply to the gen- eral problem of choosing a regression model. The only applica- tions to dynamic systems have been the Goodman and Hiller paper (G2), an application of the F-test by Astrom (A2), and a paper by Woodside (W3). Woodside compares three tests: the mean squared residual, the ratio of the determinants of unnormalized $T¢ matrices, and a likelihood ratio test which is fairly complicated computationally. He concludes that the mean squared residual test is the best of these. He also gives a method for improving the results if the characteristics of the (uncorrelated) noise are available. All of the above tests have the disadvantage that they do not work well in practice when the input or output is almost constant. This is expected in the general regression situation. But we can expect better results if we can use knowledge of dynamic systems. 
Our basic problem is to find a model which behaves in the same manner as the actual system, at least in those aspects of interest. Ideally, the model might have the same order, coefficients, poles, and zeros as the (minimal representation of the) system, but this might not be possible or even necessary. If we have several models which behave similarly to each other, then any of them should be acceptable. In particular, we can use knowledge of equivalent representations and reduction of dynamic systems.

Our procedure will have four steps. First, we compute the parameters for a number of models; for example, we may get models of orders one through M for the full matrix, reduced matrix, and normalized least squares estimators (more will be said about the selection of the algorithm in Chapter III). It might be possible to select the order on the basis of the mean squared residual or other such test. If not, these tests will allow us to eliminate some possibilities. This is the second step and reduces our selection space to, say, orders m' through M'. The third step is to examine the poles and zeros of the estimated transfer functions. The numerator and denominator of an estimate should have approximately common factors if its order is larger than the minimal order of the system. We can also look at the poles and zeros of different models; they should be approximately constant with respect to changes of model order. The theoretical problems with the third step are that the roots of a polynomial are notoriously sensitive to small changes or errors in the coefficients, and that the standard error of the coefficients increases as the order does and as the variables become more constant. However, the numerical results are very good. The final step is to choose the simplest acceptable model based on steps two and three.

There is a test based on the theory of dynamic systems which has shown better numerical results than any of the others. In Section 1.2.1, we stated that, given the controllability matrix Q_c and the observability matrix Q_o of any realization of a system, the minimal order of the system is rank(Q_o^T Q_c). Since the observability matrix of realization (1.2.3)-(1.2.6) is the identity, the minimal order is rank(Q_c). Since the stochastic elements will cause Q_c to always have maximum rank, it seems reasonable to compute det(Q_c) and decide when the determinant becomes small enough as the rank changes.

It is easy to set up Q_c and compute its determinant because it is a Hankel matrix, i.e.

(2.4.16)  Q_c(i,j) = s_{i+j-1} .

To see this, recall that

(2.4.17)  Q_c = (B, AB, ..., A^{n-1} B) .

It is now easy to see that

(2.4.18)  s_j = h_j ,  j = 1, 2, ..., n ,

where h_j is defined in (1.2.3), and a little computation shows that

(2.4.19)  s_j = SUM_{i=1}^{n} a_i s_{j-i} ,  j = (n+1), ..., (2n-1) .
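The controllability-determinant test is then a few lines given estimated coefficients. The sketch below is ours (the normalization by the product of the input coefficients mentioned in the abstract is omitted for brevity); it follows (2.4.16)-(2.4.19) and contrasts a third- and a fourth-order exact realization of Example 1:

```python
import numpy as np

def qc_det(a, b):
    """det(Q_c) via the Hankel recipe (2.4.16)-(2.4.19)."""
    n = len(a)
    s = [0.0] * (2 * n - 1)
    for j in range(n):                       # s_j = h_j, from (1.2.3)
        s[j] = b[j] + sum(a[i] * s[j - 1 - i] for i in range(j))
    for j in range(n, 2 * n - 1):            # recursion (2.4.19)
        s[j] = sum(a[i] * s[j - 1 - i] for i in range(n))
    Qc = np.array([[s[i + j] for j in range(n)] for i in range(n)])
    return np.linalg.det(Qc)

# Two exact realizations of Example 1 (rows 1 and 2 of Table 1):
print(qc_det([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]))            # ~1.2e-2
print(qc_det([0.8, 0.39, -0.27, 0.0], [-0.5, 0.5, 0.1, 0.0]))  # ~0
```

The determinant collapses as soon as the model order exceeds the minimal order, which is the drop the test watches for.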
In the second Sequence, the last residual is the largest. Werther (W1) has prOposed a weighting scheme which eliminates this inequity. He divides the elements in each row by the rms value of the row; this correSponds to premultiplying by a diagonal weighting matrix. He states that the results are improved sometimes by this "normalization" process. The non-uniform effect of the observation magnitudes is also important if the input and output have different magnitudes. In Examples 2 and 3, Chapter III, the output is about twice the input. Not only does the output have more influence on the para- meter estimates, but the variances of the input coefficients are larger than the variances of the output coefficients. This is because the variance of a coefficient equals the appropriate diagonal term of (¢T¢)-l. For a first order system, the variance 41 of the output coefficient is proportional to the sum of squares of the input, and the variance of the input coefficient is propor- tional to the sum of squares of the output. In the examples the input coefficient has four times the variance of the output co- efficient even though it is three orders of magnitude smaller! Attempts we have made to correct this problem have not been too successful. More work needs to be done on the area of choosing a good weighting matrix. 2.6 Summary This chapter contains the theoretical results on least squares and the model order problem. After explicitly defining the full and reduced matrix least squares algorithms, it was shown that they converge in mean square with correlated errors. The small sample bias of the full matrix algorithm and unbiased- ness of the reduced one were shown. Several tests for the order of the system were given in Section 2.4.3 including some new ones based on systems realization theory. The last section contained some material on the selection of weighting schemes to normalize the effect of the observations. CHAPTER III NUMERICAL COMPUTATIONS In this chapter we will compare the algorithms and order tests defined in Chapter II. First, we will describe the numerical algorithms used in the computations. Then the examples will be described in some detail. Finally, the numerical comparisons will be made using both simulated data and observed data from a real, but unknown, system. 3.1 Description of the Algorithms Used There are several ways to compute the least-squares estimate off-line. The most obvious, and the most common, method is to pre- multiply both sides of (2.1.3) by ¢T and solve the system of equations T T (3-1-1) (e ¢)9 = ¢ Y for e by the Gaussian algorithm. But this method is numerically unstable (C3). A much better method is to determine a unitary matrix Q such that Q¢, is triangular. Then the solution of (3.1-2) (Q¢)e = QY is the same as the solution of (3.1.1) and the least squares solu- tion of (2.1.3), but it is much less prone to numerical errors. 42 43 This algorithm is the one which we have used most often because of its stable numerical properties; the actual computer subroutine was obtained from the IBM Scientific Subroutine Package (11). Least squares can also be done on-line as follows (G3): write the n-th order difference equation representation as (3.1.3) yn+i 1 where (3.1.4) 21 = col(y , ’yn-l+i’ 1,. "un-l+i) and let _ T UN ' Eli yn+izi _ T TN 2: zizi Then (3 1.5) = u T"1 ‘ 9N N N This algorithm can be easily modified to handle multi-input multi-output systems (G3). 
The on-line algorithm also requires less storage than the off-line algorithms since it is necessary to keep only the latest U_N vector and the latest T_N matrix, in addition to one z_i vector. However, some computational effort can be saved at the expense of a little additional storage by finding the inverses recursively. To do this, we divide the algorithm into three stages. T_N is invertible a.s. if N ≥ 2n, so we can use the well known (G3, P1) equation (3.1.7) to compute T_{N+1}^{-1} from T_N^{-1}. If N < 2n, we use the first two stages to get T_{N+1}^{+} from T_N^{+}.

Step (a). N = 1. Then

(3.1.6)    T_1^{+} = (z_1^T z_1)^{-2} T_1

Proof: By definition, T^{#} = T^{+} if TT^{#}T = T, T^{#}TT^{#} = T^{#}, and TT^{#} and T^{#}T are hermitian. Dropping the subscript, T^2 = z(z^T z)z^T = λT, where λ = z^T z. Thus, TT^{+}T = λ^{-2}T^3 = T. Similarly, T^{+}TT^{+} = T^{+}. TT^{+} and T^{+}T are hermitian because T is.

Step (b). For 1 < N < 2n, when T_N is not invertible, we can use the new algorithm given below.

Step (c). If T_N is nonsingular, then

(3.1.7)    T_{N+1}^{-1} = T_N^{-1} - (T_N^{-1} z_{N+1})(T_N^{-1} z_{N+1})^T / (1 + z_{N+1}^T T_N^{-1} z_{N+1})

The algorithm for step (b) requires that we write T_N as

(3.1.8)    T_N = Σ_{i=1}^{N} z_i z_i^T = ZZ^T

where

(3.1.9)    Z = (z_1, ..., z_N)

Then, if W = Z^T Z is nonsingular,

(3.1.10)    T_N^{+} = ZW^{-2}Z^T

Proof: TT^{+} = ZZ^T ZW^{-2}Z^T = ZWW^{-2}Z^T = ZW^{-1}Z^T. Thus, TT^{+}T = ZW^{-1}Z^T ZZ^T = ZW^{-1}WZ^T = ZZ^T = T. The rest of the proof is similar.

While the algorithm holds when the z_i are matrices, that is, in the multi-input, multi-output case, if they are vectors we do not have to invert W at each step. First note that W_{N+1} and W_N differ only by the addition of a new column and row,

(3.1.12)    W_{N+1} = [ W_N    b ]
                      [ b^T    c ]

where b is N × 1 and c is a scalar. Thus (F3)

(3.1.13)    W_{N+1}^{-1} = (1/d) [ dW_N^{-1} + ee^T   -e ]
                                 [ -e^T                1 ]

where

(3.1.14)    d = c - b^T W_N^{-1} b
            e = W_N^{-1} b

W_{N+1} is nonsingular if W_N is and d is non-zero.

Another on-line method is stochastic approximation (M3),

(3.1.15)    θ_{N+1} = θ_N + P_{N+1} z_N (y_{n+N} - z_N^T θ_N)

where

(3.1.16)    P_{N+1} = P_N - (P_N z_N)(P_N z_N)^T / (1 + z_N^T P_N z_N)

The initial matrix P_0 may be any positive definite symmetric matrix of appropriate dimensions, but is usually taken as a multiple of the identity. The stochastic approximation estimates approach the least squares estimates as N becomes large.

Stochastic approximation has the advantage over on-line least squares that it is self-correcting; the factor (y_{n+N} - z_N^T θ_N) is the error of the last observation. It will also automatically correct for slowly-varying parameters. Accumulated roundoff error was a problem for on-line least squares when the stiff example, Example 2, was used; double precision arithmetic corrected this problem.
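The recursion (3.1.15)-(3.1.16) translates directly into code. The following sketch is illustrative only, with P_0 taken as a multiple of the identity as suggested above:

```python
# Sketch of the stochastic approximation recursion (3.1.15)-(3.1.16).
# Each step corrects theta by the prediction error of the newest
# observation; p0 is an assumed (illustrative) initial scaling.
import numpy as np

def stoch_approx(y, u, n, p0=100.0):
    dim = 2 * n
    theta = np.zeros(dim)
    P = p0 * np.eye(dim)
    for t in range(n, len(y)):
        z = np.concatenate((y[t-n:t][::-1], u[t-n:t][::-1]))
        Pz = P @ z
        P -= np.outer(Pz, Pz) / (1.0 + z @ Pz)      # (3.1.16)
        theta += (P @ z) * (y[t] - z @ theta)       # (3.1.15)
    return theta
```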
While the on-line algorithms are useful for control of an operating system, the off-line algorithm seems preferable for the model order problem because it allows the least amount of overall error.

The determinants were taken by the pivotal condensation method (F3). Some reduction in computation could have been obtained by using elementary row and column operations on the original matrices, but the necessary bookkeeping outweighed the improvement. Both the correlation and controllability matrices have determinants that are relatively insensitive to round-off errors in most situations.

3.2 Description of the Examples

In the next section, we will use three examples to numerically compare the algorithms and tests. The first is a regular simulated system; the second, an ill-conditioned simulated system; and the third is actual data observed on an electric power network.

Example 1. Regular system

(3.2.1)    y_t = .8y_{t-1} + .39y_{t-2} - .27y_{t-3} - .5u_{t-1} + .5u_{t-2} + .1u_{t-3}

To illustrate the procedure of Section 2.4.1 for the calculation of the extra zeros when n > m, we will do it for p = n-m = 1 and for p = 2. For p = 1, (2.4.6) and (2.4.7) become

(3.2.2)    θ_1^T = (-1, 0, .8, -.5, .39, .5, -.27, .1)
           θ_0^T = (.8, -.5, .39, .5, -.27, .1, 0, 0)

Thus

(3.2.3)    G = -(θ_1^T θ_1)^{-1}(θ_1^T θ_0) = .7933/2.375 ≈ 1/3

The additional pole of T(z) is about -1/3. For p = 2 we have

           θ_2^T = (0, 0, -1, 0, .8, -.5, .39, .5, -.27, .1)
           θ_1^T = (-1, 0, .8, -.5, .39, .5, -.27, .1, 0, 0)
           θ_0^T = (.8, -.5, .39, .5, -.27, .1, 0, 0, 0, 0)

Now

(3.2.4)    G = - [ 2.375   -.7933 ]^{-1} [ -.7933 ]   =   [ .47982 ]
                 [ -.7933   2.375 ]      [ -.656  ]       [ .43648 ]

Thus, the additional factor is (z^2 + .47982z + .43648) and the additional poles are (-.2399 ± .6156i).

Example 2. Ill-conditioned system

(3.2.7)    y_t = .1998y_{t-1} + .3998y_{t-2} + .20792y_{t-3} + .1035616y_{t-4} + .0883232y_{t-5}
                 - .55E-4 u_{t-1} + .1595E-3 u_{t-2} - .14245E-3 u_{t-3} + .45925E-4 u_{t-4} + .88195E-3 u_{t-5}

The poles of the transfer function are 0.9998, (-.5 ± .3i), and (.1 ± .5i); the zeros are approximately 2.8079, -1.34194, and (.717 ± 1.934i). This fifth-order example was designed to approximate the power system of Example 3; an average input of 33.4 should yield an average output of 60.0. The input used in the simulations was approximately uniformly distributed in the interval (30.4, 36.4); the output had a mean of about 59.4 and a standard deviation of about .31. The range of the output, although much larger than that of Example 1, was relatively small, about 2.5% of the mean.

Example 3. Observed data

The last example is a set of load-frequency data taken from an actual operating electric power system. An attempt to model the data was made using stochastic approximation (P1); several "good" models were made, and the problem of selecting the best of them motivated the present work.

3.3 Computed Results

In this section we examine the output of simulated runs of Examples 1 and 2, where the noise has different properties on each run, and the estimates for the real data of Example 3. We will see that the model tests alone are sufficient to identify the order of the regular system, Example 1, but are not sufficient for the power example and its approximate simulation, Example 2.

There was no difficulty choosing the order of Example 1. With even a small number of sample points, N = 100, the controllability matrix test worked with every noise configuration that was used; the mean squared residual worked except at high noise levels; the F-test worked except when there was only observation noise; the correlation matrix, and its chi-squared variant, did not work. There seemed to be little difference between the full and reduced matrix results.

Figures 2, 3, 4 and 5 and Tables 2, 4 and 5 compare the results for three different simulations of Example 1. All three had relatively high noise levels (one had system noise only, one observation noise only, and one had both), and the full matrix tests were more definite than the reduced ones for all three. Figure 2 shows the absolute values of the controllability determinants; the numerical data for two of the runs is in Table 2. As shown in Figure 3, there is at least an order of magnitude change when going from n = 3, the correct order, to n = 4, except in two instances, and the drop was often two orders of magnitude. In the two exceptions, the large drop occurred from n = 4 to n = 5, giving an estimated order of four.
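The simulated data for such runs can be generated directly from (3.2.1). In the sketch below, the input distribution and the noise standard deviations are illustrative assumptions, not the values used in the thesis runs:

```python
# Sketch: generate data from the regular system (3.2.1) with optional
# system (plant) and observation noise.
import numpy as np

def simulate_example1(N=100, sys_sd=0.1, obs_sd=0.1, seed=0):
    rng = np.random.default_rng(seed)
    a = np.array([0.8, 0.39, -0.27])
    b = np.array([-0.5, 0.5, 0.1])
    u = rng.uniform(-1.0, 1.0, N)          # assumed input distribution
    y = np.zeros(N)
    for t in range(3, N):
        y[t] = (a @ y[t-3:t][::-1] + b @ u[t-3:t][::-1]
                + sys_sd * rng.standard_normal())     # system noise
    return y + obs_sd * rng.standard_normal(N), u    # observation noise
```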
Figure 4 contains a graph of the full matrix mean squared residuals, listed in Table 2. The three runs shown there were the only three where this test did not work well (more typical runs are in Figure 5); while there is very little decline after n = 3, there is also very little from n = 2 to n = 3. This test gives an estimated system order of two. Figure 4 also shows the determinants of the correlation matrices for one of the runs; the straight line behavior was typical, indicating that the test is not very good. The chi-squared test based on this determinant was almost a complete failure.

Table 2. Order Tests for Example 1

              Mean Square Residual          Det(Q)
    order      Full       Reduced       Full      Reduced
      1       .0794       .0824        .0846      .1103
      2       .0277       .0235        .0660      .1390
      3       .0247       .0216        .0401      .1106
      4       .0245       .0213        .0007      .0262
      5       .0244       .0212        .0001      .0036
                    System and observation noise

      1       .0770       .0790        .0899      .1062
      2       .0528       .0215        .0799      .1194
      3       .0224       .0194        .0462      .0944
      4       .0223       .0189        .0006      .0158
      5       .0221       .0187        .0000      .0003
                    System noise only

      1       .0498       .0480        .1871      .2461
      2       .0041       .0030        .0967      .1722
      3       .0020       .0017        .0312      .0456
      4       .0017       .0014        .0015      .0003
      5       .0014       .0011        .0003      .0002
                    Observation noise only

    y_t = .8y_{t-1} + .39y_{t-2} - .27y_{t-3} - .5u_{t-1} + .5u_{t-2} + .1u_{t-3}

Figure 2. Determinants of Controllability Matrices (full matrix and reduced matrix).

Figure 3. Ratio of Controllability Determinants (full matrix lse).

Figure 4. Order Tests (full matrix determinants of correlation matrices; mean squared residual).

Figure 5. Mean Square Residuals.

Table 4 shows the F-test arrays for these runs. The critical values at 90% confidence are in Table 3.

Table 3. Critical Values for the F-test

    n_2 - n_1     1       2       3       4
    value        2.30    1.94    1.77    1.67

If a value in the array is lower than the corresponding one in Table 3, then n_2, n_1 ≥ m, the minimal order. The first pair of arrays in Table 4 are for observation noise only, and both indicate that the system order is at least five. The test does work for the other two noise configurations shown, although the last array does not have completely consistent results.

Table 5 summarizes the results of the tests on the noise configurations shown for Example 1. We can conclude that, even for these relatively high noise levels, the order of the system is three and we need not do anything further. We can also conclude that the determinant of the controllability matrix is the test that works most often.

The problem of choosing the system order is more difficult for the second example, as the summary Table 6 shows. The F-test, detailed in Table 7, is not very helpful. The mean squared residual, Figure 6 and Table 8, becomes flatter as n increases, but there is no one point where there is a large change of slope as there was in Figure 5. But the controllability determinant does work in most cases.

Table 4. F-test for Example 1

    Full matrix -- observation noise only
    order     1       2       3       4
      2     2745.
      3     2929.    259.
      4     2373.    182.    52.0
      5     2148.    162.    56.2    50.1

    Reduced matrix -- observation noise only
      2      732.
      3      635.    34.1
      4      528.    27.2    12.1
      5      487.    25.9    13.0    11.2

    Full matrix -- system noise only
      2      504.
      3      301.    33.0
      4      201.    17.0    1.11
      5      152.    11.9    1.37    1.62

    Reduced matrix -- system noise only
      2      129.
      3      7.21    4.90
      4      48.7    3.06    1.20
      5      36.3    2.22    .886    .584

    Full matrix -- system and observation noise
      2      464.
      3      274.    30.0
      4      184.    15.9    1.46
      5      138.    11.0    1.30

    Reduced matrix -- system and observation noise
      2      120.
      3      66.0    4.11
      4      43.9    2.37    .652
      5      32.4    1.62    .429    .218
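Reading an array such as Table 4 against the critical values of Table 3 is easy to mechanize. The sketch below uses the standard nested-regression form of the F statistic, with two parameters per order; the statistic actually defined in Section 2.4.3 may differ in detail:

```python
# Sketch: F statistic for nested least squares models of orders
# n1 < n2, given their residual sums of squares S1 and S2 and the
# number of observations N.
from scipy import stats

def order_f_test(S1, n1, S2, n2, N, conf=0.90):
    df1 = 2 * (n2 - n1)            # parameters added by raising the order
    df2 = N - 2 * n2               # residual degrees of freedom
    F = ((S1 - S2) / df1) / (S2 / df2)
    return F, stats.f.ppf(conf, df1, df2)

# For large df2, stats.f.ppf(0.90, 2, 10**6) is about 2.30, which
# matches the first entry of Table 3.
```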
Table 5. Estimates of the System Order

    Noise     Full     Reduced    Full      Reduced    Full      Reduced
    type      MSR      MSR        F-test    F-test     det(Q)    det(Q)
    System    2 or 3   2 or 3     3         3          3         3 or 4
    Obser.    2 or 3   2 or 3     ≥ 5       ≥ 5        3         3
    Both      2 or 3   2 or 3     3         3          3         3

Table 6. Order Test Summary for Example 2

                    Observation noise      System noise         Both
    Test            Full     Reduced       Full     Reduced     Full     Reduced
    MSR             X        5 or 7        5        6           4 or 5   X
    F-test          ≥ 7      4             5 or 6   3 or 5      5        4 or 5
    Det(Q_c)        5        4 or 5        5        5           5        5
    Det(Q_c)/Πb_i   5        5             5        5           5        5

Note that the determinant (Table 8) decreases by about three orders of magnitude in almost all situations and that the relative decrease is approximately equal to the input coefficients, b_i. This observation led to a heuristic modification which has given very satisfactory results in every instance of Examples 1 and 2: "normalize" the determinant by dividing by the product of the b_i. Then

(3.3.1)    det(Q_{m-1}) / Π_{i=1}^{m-1} b_{i,m-1}   ≪   det(Q_m) / Π_{i=1}^{m} b_{i,m}

where the subscript denotes the order of the model, and m is the minimal order of the system.

Table 7. F-tests for Example 2

    Full matrix -- observation noise only
    order     1       2       3       4       5       6
      2      91.1
      3      78.0    50.0
      4      62.8    37.5    21.6
      5      53.4    31.6    19.3    16.0
      6      45.2    26.0    15.6    11.8    13.4
      7      42.1    24.9    16.1    7.29    11.6    15.5

    Reduced matrix -- observation noise only
      2      14.2
      3      9.76    4.06
      4      8.42    4.22    4.02
      5      7.09    3.65    3.19    2.21
      6      5.60    2.75    2.17    1.22    1.36
      7      5.02    2.56    2.06    .275    .938    1.60

    Full matrix -- system and observation noise
      2      138.
      3      102.    46.3
      4      78.6    33.9    18.7
      5      65.2    28.3    16.9    14.2
      6      52.4    21.6    11.7    7.77    5.84
      7      44.2    17.7    9.28    1.30    1.61    1.93

    Reduced matrix -- system and observation noise
      2      16.9
      3      11.5    4.47
      4      8.80    3.52    2.39
      5      7.37    3.15    2.33    2.16
      6      6.43    2.90    2.22    2.05    1.36
      7      5.22    2.27    1.63    1.89    .958    .084

    Full matrix -- system noise only
      2      156.
      3      99.8    28.7
      4      70.2    18.2    6.98
      5      63.4    21.6    16.6    25.6
      6      51.7    17.1    12.2    14.4    9.67
      7      43.0    13.7    9.15    3.11    1.64    .181

    Reduced matrix -- system noise only
      2      14.9
      3      9.76    3.55
      4      6.55    1.97    .449
      5      5.81    2.25    1.55    2.62
      6      4.93    1.98    1.41    1.88    1.27
      7      3.99    1.57    1.07    1.14    .628    .152

Table 8. Order Tests for Example 2

              Mean Sq. Residual          Det(Q)                      Normalized Det(Q)
    order      Full       Reduced       Full         Reduced         Full      Reduced

    Observation noise only
      1       .773E-4    .778E-4      .231E-3      .158E-3          1.0       1.0
      2       .554E-4    .555E-4      .498E-7      .114E-6          .703      1.422
      3       .474E-4    .497E-4      .605E-10     .186E-10         2.36      .468
      4       .442E-4    .444E-4      .661E-14     .236E-12         2.71      6.83
      5       .41E-4     .415E-4      .514E-15     .187E-14         443.      134.
      6       .40E-4     .412E-4      .45E-18      .118E-17         2941.     264.
      7       .388E-4    .391E-4      .10E-22      .127E-20         3425.     887.

    System and observation noise
      1       .136E-3    .120E-3      .138E-3      .145E-3          1.0       1.0
      2       .927E-4    .812E-4      .241E-6      .475E-7          1.65
      3       .802E-4    .71E-4       .223E-9      .171E-9          4.06
      4       .757E-4    .671E-4      .161E-14     .62E-13          .502
      5       .720E-4    .62E-4       .103E-14     .31E-14          654.
      6       .717E-4    .594E-4      .182E-18     .482E-18         366.
      7       .712E-4    .592E-4      .921E-21     .303E-21         8033.

    System noise only
      1       .64E-4     .575E-4      .110E-3      .152E-3
      2       .426E-4    .405E-4      .140E-6      .147E-7
      3       .388E-4    .367E-4      .613E-10     .143E-10
      4       .37E-4     .362E-4      .677E-14     .301E-14
      5       .34E-4     .335E-4      .106E-14     .269E-15
      6       .345E-4    .323E-4      .352E-18     .861E-20
      7       .345E-4    .322E-4      .523E-21     .614E-24

    y_t = .1998y_{t-1} + .3998y_{t-2} + .2079y_{t-3} + .1035616y_{t-4} + .0883232y_{t-5}
          - .55E-4 u_{t-1} + .1595E-3 u_{t-2} - .14245E-3 u_{t-3} + .45925E-4 u_{t-4} + .88195E-3 u_{t-5}

Figure 6. Mean Squared Residual for Example 2.

Figure 7. Normalized Controllability Determinant for Example 2.
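The determinant test itself is inexpensive to compute. The sketch below builds the Hankel matrix of (2.4.16) from the estimated coefficients, taking the s_j to be the model's impulse response as in (2.4.18)-(2.4.19), and applies the normalization of (3.3.1). It is an illustration, and the function names are ours, not the thesis's program:

```python
# Sketch of the controllability-determinant test and its normalization.
import numpy as np

def q_c_det(a, b):
    """det(Q_c) for the model with output coefficients a and input
    coefficients b, using the Hankel structure (2.4.16)."""
    n = len(a)
    s = np.zeros(2 * n)                  # s[1..2n-1] are used; s[0] = 0
    for j in range(1, 2 * n):
        bj = b[j - 1] if j <= n else 0.0
        s[j] = bj + sum(a[i - 1] * s[j - i] for i in range(1, min(j, n) + 1))
    Qc = np.array([[s[i + j + 1] for j in range(n)] for i in range(n)])
    return np.linalg.det(Qc)

def normalized_det(a, b):
    return q_c_det(a, b) / np.prod(b)    # divide by the product of the b_i
```

Note that at order one the normalized determinant is b_1/b_1 = 1 exactly, which is why the first rows of Tables 8 and 11 read 1.0.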
Table 8 also shows these normalized determinants. On the basis of them, we can conclude that the minimal system order is five, the correct one, in every instance.

If we are not satisfied with the conclusion from the controllability test, then we can look at the poles and zeros of the transfer functions. Table 9 shows one set of poles and zeros. There is very good agreement, with respect to change of model order, on the dominant pole. Starting with the third order there is a complex pair with negative real part and norm around .6; these will behave very similarly. Another pair with about the same norm enters at n = 5. Looking at the zeros, we see that starting with n = 5 there are two real zeros (around -1.3 and -3) and one complex pair (with positive real part and norm 1.1). While the fourth-order poles are similar to the fifth-order ones, the fourth and fifth order zeros are not similar. There is no approximate pole-zero cancellation to help decide. But we can get some information from the steady-state response to a unit-step input, also shown in Table 9. There is very good agreement starting with n = 5, and less agreement with lower order values. Thus we can conclude from examination of the transfer functions, as well as from the normalized controllability test, that the system is fifth order.

Table 9. Poles and Zeros for Example 2
(a, b) denotes the complex pair (a ± bi)

    order    Poles
      2      .99969    -.561
      3      .99978    (-.380, .466)
      4      .99975    (-.120, .622)    -.607
      5      .99962    (-.498, .284)    (.057, .627)
      6      .99967    (-.322, .309)    (.105, .588)    -.442
      7      .99965    (-.629, .289)    (.322, .489)    (-.131, .661)
     True    .9998     (-.5, .3)        (.1, .5)

                                                            Steady
    order    Zeros                                          state
      2      -1.38                                          1.653
      3      -2.21     .315                                 1.600
      4      -2.24     (.106, .435)                         1.621
      5      -1.13     -3.16    (.591, .915)                1.675
      6      -1.29     -2.81    (.504, .941)    .285        1.661
      7      -1.31     -2.79    (.543, .881)    (.134, .382)    1.665
     True    -1.34     2.81     (.717, 1.93)                1.603

The order tests yield more consistent answers for Example 3.

Table 10. Order Test Summary for Example 3

    Test                     Full lse    Reduced lse
    Mean Squared Residual    X           4
    Controllability det      4           4
    Normalized det(Q_c)      4 or 6      4 or 6
    F-test                   5           4

The tests are detailed in Tables 11 and 12 and in Figures 8 and 9. From them we draw the tentative conclusion that the system can be approximated fairly well by a fourth order model. The poles and zeros for both the full and reduced algorithms are in Table 13. As in Example 2, the poles and zeros are relatively constant as the model order changes, but there is no pole-zero cancellation. The steady-state response to the unit-step input is virtually constant for the full lse and, as expected, varies slightly for the reduced lse. The examination of the poles and zeros gives no reason to reject the tentative conclusion that m = 4.

Table 11. Order Tests for Example 3

              Mean Sq. Residual         Det(Q)                      Normalized det
    order      Full       Reduced      Full         Reduced        Full      Reduced
      1       .482E-5    .560E-5      .262E-3      .312E-3         1.0       1.0
      2       .300E-5    .336E-5      .180E-6      .236E-6         .98       .80
      3       .293E-5    .333E-5      .233E-9      .131E-9         3.8       2.7
      4       .287E-5    .273E-5      .199E-11     .152E-10        25.       30.
      5       .280E-5    .260E-5      .208E-16     .134E-13        .3        307.
      6       .280E-5    .252E-5      .167E-16     .581E-16        1.7E4     1.1E4

In summary, the normalized det(Q_c) test is the most effective, followed by the det(Q_c), F-, and mean squared residual tests; the poles and zeros are useful as checks on the other tests.
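The third step of the procedure -- examining poles, zeros, and the steady-state response -- can be sketched as follows, using the steady-state formula B/(1 - A) derived in Appendix A.3:

```python
# Sketch: poles, zeros, and unit-step steady state of an estimated
# model with output coefficients a and input coefficients b.
import numpy as np

def poles_zeros_ss(a, b):
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    zeros = np.roots(b)                   # numerator b_1 z^{n-1} + ... + b_n
    ss = np.sum(b) / (1.0 - np.sum(a))    # B/(1 - A), from (A.3.1)
    return poles, zeros, ss

# Example 1's true model: poles .9, .5, -.6; steady state .1/.08 = 1.25.
print(poles_zeros_ss([0.8, 0.39, -0.27], [-0.5, 0.5, 0.1]))
```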
It is best to use all the tests and to use them on several sets of parameter estimates because consistent test results imply confidence about both the chosen order and the chosen model type. Also, the computations are very inexpensive: the entire computation time for all tests and parameter estimates for the power example was under 8 seconds on a CDC 6500 computer.

Figure 8. Mean Squared Residual -- Example 3 (full and reduced).

Figure 9. Controllability Determinants -- Example 3 (determinant ratios; normalized determinant).

Table 12. F-test for Example 3

    Full matrix
    order     1       2       3       4       5
      2      116.
      3      61.6    4.65
      4      43.1    4.39    4.06
      5      34.3    4.67    4.60    5.05
      6      27.3    3.52    3.09    2.58    .124

    Reduced matrix
      2      17.0
      3      8.36    .248
      4      8.21    2.70    5.10
      5      6.46    2.17    3.11    1.10
      6      5.24    1.79    2.29    .899    .708

Table 13. Poles and Zeros for Example 3

    Poles -- full matrix
    order
      2     1.000135    -.601
      3     1.000106    .052     -.649
      4     1.000132    .335     (-.450, .037)
      5     1.000110    .416     (-.180, .269)    -.618
      6     1.000110    .514     (-.534, .143)    (-.004, .403)

                                                            Steady
    Zeros -- full matrix                                    state
      2     .404                                            1.858
      3     -2.12     .682                                  1.858
      4     -2.63     (.555, .223)                          1.858
      5     -1.87     (.438, .622)    .780                  1.857
      6     -1.83     (.442, .623)    .794     -.033        1.857

    Poles -- reduced matrix
      2     1.000158    -.630
      3     1.000142    -.002    -.613
      4     1.000240    .718     (-.663, .417)
      5     1.000196    (-.747, .462)    (.444, .287)
      6     1.000209    .685     (-.811, .450)    (.157, .548)

                                                            Steady
    Zeros -- reduced matrix                                 state
      2     .432                                            1.810
      3     -1.065    .630                                  1.830
      4     -1.992    (.666, .207)                          1.827
      5     -1.719    (.588, .248)    .185                  1.831
      6     -1.325    (.497, .433)    .736     -.514        1.834

CHAPTER IV

CONCLUSIONS

4.1 Thesis Results

Choosing a model type is always a difficult and important problem in modeling. Even if it is possible to use the simplest type of dynamic system -- linear, time-invariant, single-input, single-output -- there is still the problem of selecting an order of the model. We have discussed the model order problem for such systems. There are two aspects of this problem: the method of parameter identification and tests to determine the order of the system on the basis of the parameter estimates.

The basic method used in the parameter identification phase was least squares. Least squares was chosen because it requires less knowledge and fewer assumptions than other methods. After introducing the reduced variant of the algorithm, large and small sample properties were derived assuming that the order of the system was known. Some theoretical results when the model order was greater than the system order were given. While reduced least squares often gave better estimates than the full matrix version, sometimes they were worse. It was definitely useful to have both sets of estimates when estimating the system order.
The normalized determinants are relatively constant when the model order is less than the system order; when they are equal, there is a large jump in the determinant value; and the behavior is erratic when the model order is more than the system order. It was helpful to also use an F-test on the residuals. 4.2 Future Development Any new area contains many voids between the already known facts. In identification, much work can be done on the theoretical prOperties when the model order is more than the system order. This will be a very difficult problem because uniqueness is absent. Selection of weighting schemes is also not well developed. Additional model order tests must be developed. For example, other state-space realizations might generate determinants which have more useful prOperties than the one we used. Finding useful properties of the determinants is also an area for research. And the general problem of selecting model properties is wide open. APPENDIX APPENDIX A.l Proof of Theorem 2 C = lim C as defined by The assumptions imply that N (2.2.1) exists and is positive definite (see (M1) for details) Intuitively, this makes sense because C is the correlation matrix for the input and output. 1 N N zlyn-i-i-len+i E(yt-iet) 1 N N zlyien-l-i E(yt-net) (A.1.l) lim BN = lim = = B l’iNu e E(u e ) N 1 n+i-1 n+i t-l t 1 N ztluien-i-i E(ut-net) B exists by assumptions 1) - 3). Let ._ l T _ = l T _ T (A.l.2) 2N N ¢N®N(8N e) N((1;quN (¢N¢N(YN END) 1 T " N ¢NEN 7' BN Also, Using (A.1.l) and (A.1.2) E(ZN) = E(BN) = B. T 1 1 T T . . E(ZNZN) — N E(N ¢NENEN¢N), a moment matrix which ex1sts. We now look at the limits. Obviously, (A.1.3) lim E(CNBW) = linIE(ZN) = B 70 71 Although E(CNefi) 1‘ E(CN)E(9‘\;), still (A.1.4) 1im‘E(CNe§) = lim E(CN) x lim E(efi) = c lim E(eé) because the stochastic matrices CN approach the deterministic matrix C. Since C is invertible, (A-lo5) lim E(eé) = In Theorem 1, B = 0, so the lse is unbiased. To show mean square convergence, we must show that (A.l.6) lim E(efiTefi) = 0 Since ' = C-12 and since C is symmetric 9N N N N ’ IT I) -2 (A.1.7) E(eN 9N) =E(ZN TNc z N) (A.1.8) lim E(ZTZ ) = lim E(tr(Z 21)) N N N N . 1 . = 11m fi'tr(moment matrix) = O . . . . . -2 . . . Since C is pOSltlve definite, CN acts as a weighting matrix for N sufficiently large. Thus, (A.1.9) lim E(eNT 9N) = lim l'tr(weighted moment matrix)= 0 and mean square convergence is proved. 72 A.2 Proof of Theorem 4 Lep p = 1. We will show that -l s -g S 1. From (4.2. ) 8T8 1 T 9191 O (A.2.1) -g = From (4.2.6) and (4.2.7) T _ T (A-2.2) 9191 — 1 + 9090 T T T Let 9190 2 0. If 9190 2 8191, then T T T O. ‘ + (A 2 3) 29190 a 9191 9090 or (A24) 02( -e)T( -) '° 91 O 91 60 T which is impossible. The proof is similar for 9160 < 0. Now assume that for p = po we have _ T - _ -l (A'2°S) G0 -(HoHo) 11“090 L0 Jo We will express Gl’ correSponding to p = po + 1, in terms of . _ T = . . Go. Notice that (Lo)i,j eiej f‘i-j" allow1ng us to wrlte L K (A.2.6) L = 1 KT f o where .2. = (A 7) K col(fp_1,fp 2, ,fl) and (A.2.8) J = col(Jo, fp) 73 Lil can be expressed in terms of L;1 as (F1) _1 I -L;1K L;1 o 1 o (A.2.9) L = _ _ _ 1 o 1 o (f - KTL 1K) 1 -KTL 1 1 0 O 0 Finally, we get -1 "O o - dL + W Go 0 K 81 dgp-l (A.2.10) G = = O o 1 d g2 + dgp_2 0 + d gp_1 g c d J where g: is the i-th element of Go and T -1 -1 T -l .2. = - - - (A 11) d (£0 K LO K) (fp K LO Go) To show that the roots of (2.4. ) are either all within or all outside the unit circle, we use the Lehmer-Schur method (R1). 
A.2 Proof of Theorem 4

Let p = 1. We will show that -1 ≤ -g ≤ 1. From (2.4.6) and (2.4.7),

(A.2.1)    -g = (θ_1^T θ_0) / (θ_1^T θ_1)

and

(A.2.2)    θ_1^T θ_1 = 1 + θ_0^T θ_0

Let θ_1^T θ_0 ≥ 0. If θ_1^T θ_0 ≥ θ_1^T θ_1, then

(A.2.3)    2θ_1^T θ_0 ≥ θ_1^T θ_1 + θ_0^T θ_0

or

(A.2.4)    0 ≥ (θ_1 - θ_0)^T (θ_1 - θ_0)

which is impossible. The proof is similar for θ_1^T θ_0 < 0.

Now assume that for p = p_0 we have

(A.2.5)    G_0 = -(H_0^T H_0)^{-1} H_0^T θ_0 = -L_0^{-1} J_0

We will express G_1, corresponding to p = p_0 + 1, in terms of G_0. Notice that (L_0)_{i,j} = θ_i^T θ_j = f_{|i-j|}, allowing us to write

(A.2.6)    L_1 = [ L_0    K  ]
                 [ K^T   f_0 ]

where

(A.2.7)    K = col(f_{p-1}, f_{p-2}, ..., f_1)

and

(A.2.8)    J_1 = col(J_0, f_p)

L_1^{-1} can be expressed in terms of L_0^{-1} as (F3)

(A.2.9)    L_1^{-1} = [ I   -L_0^{-1}K ] [ L_0^{-1}    0                          ] [ I              0 ]
                      [ 0    1         ] [ 0     (f_0 - K^T L_0^{-1} K)^{-1}      ] [ -K^T L_0^{-1}   1 ]

Finally, we get

(A.2.10)    G_1 = col( g_1^o + d g_{p-1}^o, g_2^o + d g_{p-2}^o, ..., g_{p-1}^o + d g_1^o, d )

where g_i^o is the i-th element of G_0 and

(A.2.11)    d = -(f_0 - K^T L_0^{-1} K)^{-1} (f_p + K^T G_0)

To show that the roots of (2.4. ) are either all within or all outside the unit circle, we use the Lehmer-Schur method (R1). Define

(A.2.12)    P_0(z) = z^{p-1} + g_1^o z^{p-2} + ... + g_{p-1}^o

and

(A.2.13)    P_0^*(z) = g_{p-1}^o z^{p-1} + ... + 1

The roots of P^* are the inverse conjugates of the roots of P, i.e., if all the roots of P are inside the unit circle, then all the roots of P^* are outside. In a similar manner P_1(z) and P_1^*(z) are defined from G_1. We now form

(A.2.14)    T(P_1(z)) = dP_1(z) - P_1^*(z) = (d^2 - 1)g_{p-1}^o z^{p-1} + ... + (d^2 - 1) = (d^2 - 1)P_0^*(z)

The Lehmer-Schur method states that if T(P_1(0)) < 0, then T(P_1(z)) = constant × P_0^*(z) and P_1^*(z) have the same number of zeros within the unit circle; if T(P_1(0)) > 0, then P_0^*(z) and P_1(z) have the same number of zeros inside. In summary, if |d| ≤ 1 and G_0 defines a stable system, then G_1 will also define a stable system; if |d| > 1 and G_0 is stable, then every mode of G_1 will be unstable; etc., and the theorem is proved.

It would be more interesting if we could prove that |d| < 1. It is not too difficult to show that d > -1, but the other inequality is more difficult.

A.3 Steady-state Response to a Unit-step Input

We will show that the steady-state response of the system (1.1.1) to a unit-step input is

(A.3.1)    B/(1 - A)

where

(A.3.2)    A = Σ_{i=1}^{n} a_i ,    B = Σ_{i=1}^{n} b_i

We will show this in both the time domain (difference equation representation) and the z-domain (transfer function representation) to illustrate the differences between the two domains.

In the time domain, u_t = 1 for all t = 0, 1, ... . Then

    y_1 = b_1
    y_2 = a_1 y_1 + (b_1 + b_2)
    ...
    y_{n+1} = a_1 y_n + ... + a_n y_1 + B
    y_{n+k} = Σ_{i=1}^{n} a_i y_{n+k-i} + B

Taking limits,

(A.3.3)    y_ss = Σ_{i=1}^{n} a_i y_ss + B = A y_ss + B

and the result is shown.

In the z-domain, the transfer function is

(A.3.4)    T(z) = Y(z)/U(z)

and the unit-step response is (O1)

(A.3.5)    Y(z) = T(z) · z/(z - 1)

The final value theorem states

(A.3.6)    y_ss = lim_{t→∞} y_t = lim_{z→1} ((z - 1)Y(z))

Thus,

(A.3.7)    y_ss = lim_{z→1} T(z) = B/(1 - A)
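A quick numerical check of this result on Example 1 follows (a sketch; the horizon N is an arbitrary choice):

```python
# Simulate the unit-step response of Example 1 from rest and compare
# with B/(1 - A) of (A.3.1); both values are approximately 1.25.
import numpy as np

a = np.array([0.8, 0.39, -0.27])
b = np.array([-0.5, 0.5, 0.1])
N = 500
y = np.zeros(N)                       # system initially at rest, u_t = 1
for t in range(1, N):
    ypast = y[max(t - 3, 0):t][::-1]  # (y_{t-1}, y_{t-2}, y_{t-3})
    y[t] = a[:len(ypast)] @ ypast + b[:min(t, 3)].sum()
print(y[-1], b.sum() / (1 - a.sum()))
```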
BIBLIOGRAPHY

A1  Anderson, J.B., "Geometrical Approach to Reduction of Dynamical Systems", Proc. IEEE, Vol. 114, No. 7, July 1967.

A2  Astrom, K.J., Lectures on the Identification Problem -- The Least Squares Method, Lund Inst. of Tech., Div. of Automatic Control, Report 6906, 1968.

A3  Astrom, K.J. and Eykhoff, P., "System Identification -- A Survey", Automatica, Vol. 7, 1971, pp. 123-162.

A4  Astrom, K.J., Bohlin, T. and Wensmark, S., Automatic Construction of Linear Stochastic Dynamic Models for Stationary Industrial Processes with Random Disturbances Using Operating Records, IBM Nordic Laboratory Technical Paper TP 18.150, 1965.

A5  Athans, M. and Falb, P.L., Optimal Control, McGraw-Hill, NY, 1965.

B1  Bacopoulos, A. and Gaff, B., "On the Reduction of a Problem of Minimization of n+1 Variables to a Problem of One Variable", SIAM J. on Numer. Anal., Vol. 8, No. 1, pp. 97-103.

B2  Bellman, R. and Astrom, K.J., "On Structural Identifiability", Mathematical Biosciences, Vol. 7, 1969, pp. 329-339.

B3  Bevington, P.R., Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, NY, 1969.

B4  Boling, R., personal communication.

B5  Brockett, R.W., Finite Dimensional Linear Systems, Wiley, NY, 1970.

C1  Central Electricity Generating Board, "Demand Response to Changes of Voltage", Report PL-ST/1/70, London, 1970.

C2  Chen, C.T., "A New Look at Transfer Function Design", Proc. IEEE, Vol. 59, No. 11, November 1971, pp. 1580-1585.

C3  Cheney, R.W., "Numerical Problems of Linear Algebra", mimeographed lecture notes, Michigan State University, 1969.

D1  Davison, E.J., "A Method for Simplifying Linear Dynamic Systems", IEEE Auto. Cont., Vol. 11, No. 1, January 1966, pp. 93-101.

D2  Draper, N.R. and Smith, H., Applied Regression Analysis, Wiley, NY, 1966.

E1  Efroymson, M.A., "Multiple Regression Analysis", in Mathematical Methods for Digital Computers, A. Ralston and H.S. Wilf, eds., Wiley, NY, 1960.

E2  Elgerd, O.I., Electric Energy Systems Theory: An Introduction, McGraw-Hill, NY, 1971.

F1  Farrar, D.E. and Glauber, R.R., "Multicollinearity in Regression Analysis: The Problem Revisited", Rev. Econ. and Stat., Vol. 49, February 1967.

F2  Forsythe, G.E., "Generation and Use of Orthogonal Polynomials for Data-fitting on a Digital Computer", J. SIAM, Vol. 5, 1957, pp. 74-88.

F3  Frame, J.S., "Matrix Operations and Generalized Inverses", IEEE Spectrum, March 1964.

G1  Gilbert, E.G., "Controllability and Observability in Multivariable Control Systems", SIAM J. on Control, Vol. 1, No. 2, 1963, pp. 128-151.

G2  Goodman, D. and Hiller, J., "Determination of System Order", First Hawaii International Conference on Systems Science, 1968, pp. 527-530.

G3  Gopinath, B., "On the Identification and Control of Linear Systems", Ph.D. thesis, Stanford University, 1968.

H1  Haitovsky, Y., "Multicollinearity in Regression Analysis: Comment", Rev. Econ. and Stat., Vol. 51, No. 4, 1969, pp. 486-490.

H2  Hurwicz, L., "Least Squares Bias in Time Series", in Statistical Inference in Dynamic Economic Models, T.C. Koopmans, ed., Cowles Commission Monograph No. 10, Wiley, NY, 1950.

I1  IBM, "System/360 Scientific Subroutine Package Programmer's Manual", 1966.

I2  Isaacson, E. and Keller, H.B., Analysis of Numerical Methods, Wiley, NY, 1966.

J1  Jenkins, G.M. and Watts, D.G., Spectral Analysis and its Applications, Holden-Day, San Francisco, 1968.

K1  Kalman, R.E., "Mathematical Description of Linear Dynamical Systems", SIAM J. on Control, Vol. 1, No. 2, 1963, pp. 152-192.

K2  Kalman, R.E., Falb, P.L. and Arbib, M.A., Topics in Mathematical Systems Theory, McGraw-Hill, NY, 1970.

L1  Lewis, T.O. and Odell, P.L., Estimation in Linear Models, Prentice-Hall, Englewood Cliffs, NJ, 1971.

M1  Mann, H.B. and Wald, A., "On the Statistical Treatment of Linear Stochastic Difference Equations", Econometrica, Vol. 11, 1943, pp. 173-220.

M2  Marshall, S.A., "An Approximate Method for Reducing the Order of a Linear System", Control, December 1966, pp. 642-643.

M3  Monahan, D.D., System Identification by Stochastic Approximation, Ph.D. thesis, Michigan State University, 1970.

O1  Ogata, K., Modern Control Engineering, Prentice-Hall, NY, 1970.

P1  Park, G.L., Monahan, D.D. and Schwartz, M.W., "An Application of a Stochastic-Approximation, System-Identification Algorithm to Interconnected Electric Power Systems", Midwest Power Symposium, Ames, Iowa, 1970.

P2  Powell, M.J.D., "Numerical Methods for Unconstrained Optimization", SIAM Review, Vol. 12, No. 1, January 1970.

R1  Ralston, A., A First Course in Numerical Analysis, McGraw-Hill, NY, 1965.

R2  Rosenbrock, H.H., State Space and Multivariable Theory, Wiley, NY, 1970.

S1  Soderstrom, T., Notes on Pseudoinverses: Applications to Identification, Lund Inst. of Tech., Div. of Automatic Control, Report 7003, 1970.

S2  Staley, R.M. and Yue, P.C., "On System Parameter Identifiability", Information Sciences, Vol. 2, pp. 127-138.

T1  Theil, H., Economic Forecasts and Policy, North-Holland, Amsterdam, 1961.

T2  Tintner, G., Econometrics, Wiley, NY, 1952.

W1  Werther, M.H.F., Recursive Algorithm for the Best Approximate Solution of Linear Equations, Management Information Services, Detroit, 1970.

W2  Wold, H. and Jureen, L., Demand Analysis, Wiley, NY, 1953.
W3  Woodside, C.M., "Estimation of the Order of Linear Systems", Automatica, Vol. 7, No. 6, 1971.

Z1  Zadeh, L.A. and Desoer, C.A., Linear System Theory: The State Space Approach, McGraw-Hill, NY, 1963.

Z2  Zadeh, L.A. and Polak, E., eds., System Theory, McGraw-Hill, NY, 1969.