
This is to certify that the

dissertation entitled

THE VALUE OF IMPERFECT SAMPLE SEPARATION
INFORMATION IN SWITCHING REGRESSION MODELS

presented by

Edwina A. Masson

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Economics

Major professor

 

Peter J. Schmidt
Date July 3'; 1985

MSU is an Affirmative Action/Equal Opportunity Institution

 

 

 

 


 

 

 

THE VALUE OF IMPERFECT SAMPLE SEPARATION INFORMATION
IN SWITCHING REGRESSION MODELS

By

Edwina A. Masson

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1985

ABSTRACT

THE VALUE OF IMPERFECT SAMPLE SEPARATION INFORMATION
IN SWITCHING REGRESSION MODELS

By

Edwina A. Masson

The purpose of this study is to determine the value,
in terms of efficiency gains, of using imperfect sample
separation information in switching regression models. The
imperfect information appears in the model as a regime
classification, which is correct only with some probability.
The importance of this study lies in the fact that knowledge
of improvements in the efficiency of parameter estimation
can guide one in determining whether to use sample separation
information, even if it is unreliable.

We determine the value of sample separation information
by comparing the asymptotic variances of the parameter
estimates, under different assumptions about the available
information. These assumptions range from perfect sample
separation information, at the one extreme, to no such
information whatever, at the other extreme. The asymptotic
variances of the parameter estimates are obtained from the
relevant information matrices, which are calculated by simulation
over a very large sample size.

Among our findings, the following are most important.
(1) There are efficiency gains when using imperfect information
as compared to no information at all, and these can be
substantial in some cases. (2) Efficiency gains when using
imperfect sample separation information are greatest when
such information is highly reliable, and when the samples
are difficult to disentangle from each other. (3) There
are additional efficiency gains when the switching probabilities
are modelled as probit functions of the explanatory
variables. These gains occur in cases when they are most
needed; specifically, when the samples are hardly distinct
from each other, and when the imperfect sample separation
information is not very informative.

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my
adviser, Professor Peter Schmidt, for all the guidance and
encouragement he gave me throughout the course of this thesis.
He was always available with helpful suggestions and was very
patient with me, particularly when the thesis problem had
not yet been explicitly defined. I am also grateful to the
other members of my dissertation committee -- Professors
Christine Amsler, T.C. Anant, and Stephen Martin.

Most of all, I want to thank my family, especially my
parents and my husband, for their support and encouragement

during my years of study at Michigan State.


TABLE OF CONTENTS

LIST OF TABLES

CHAPTER

ONE    INTRODUCTION
       1.1 Definition of the Problem
       1.2 Formal Discussion of Switching Regression Models
       1.3 Review of the Literature
       1.4 Plan of the Study

TWO    THE CASE OF CONSTANT REGIME
       CLASSIFICATION PROBABILITIES
       2.1 The Model
       2.2 Derivation of Asymptotic Variances
       2.3 The Value of Imperfect Information
       2.4 Summary

THREE  THE CASE OF NON-CONSTANT REGIME
       CLASSIFICATION PROBABILITIES
       3.1 Introduction
       3.2 The Model
       3.3 Derivation of Asymptotic Variances
       3.4 The Value of Imperfect Information
       3.5 Summary

FOUR   THE CASE OF NON-CONSTANT REGIME
       CLASSIFICATION PROBABILITIES AND NON-CONSTANT
       SWITCHING PROBABILITIES
       4.1 Introduction
       4.2 The Model
       4.3 Derivation of Asymptotic Variances
       4.4 The Value of Imperfect Information
       4.5 Summary

FIVE   CONCLUSIONS

APPENDIX A. The Second Derivative Components
            of the Information Matrix in the
            Case of Non-Constant Classification
            Probabilities

APPENDIX B. The Second Derivative Components
            of the Information Matrix in the
            Case of Non-Constant Classification
            Probabilities and Non-Constant
            Switching Probabilities

BIBLIOGRAPHY

LIST OF TABLES

Tables on Ratios of Asymptotic Variances

Table

1   Varying $p_{11}$ and $p_{00}$ when $\mu_1 = 0$, $\mu_2 = 2$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$

2   Varying $\mu_2$ when $\mu_1 = 0$, $\sigma_1 = \sigma_2 = 1$,
    $\lambda = .5$, $p_{11} = p_{00} = .8$

3   Varying $\lambda$ when $\mu_1 = 0$, $\mu_2 = 2$,
    $\sigma_1 = \sigma_2 = 1$, $p_{11} = p_{00} = .8$

4   Varying $\sigma_2$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = 1$,
    $\lambda = .5$, $p_{11} = p_{00} = .8$

5   Varying $h$ ($\beta_2 = h\beta_1$) when $\beta_1 = (1, 1)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$

6   Varying $\beta_{22}$ when $\beta = (0, 0, 0, \beta_{22})'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$

7   Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$

8   Varying $\gamma_0$ when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma_1 = (1, -1)'$

9   Varying $\gamma_{12}$ and $\gamma_{02}$ when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$,
    $\gamma = (1, \gamma_{12}, -1, \gamma_{02})'$

10  Varying $\gamma$ ($\gamma_1 = \gamma_0$) when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$

11  Comparison of $F(x'\gamma_1) = F(x'\gamma_0) = .8$
    and $p_{11} = p_{00} = .8$ when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$

12  Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $Q = (0, 0)'$,
    $\gamma = (1, -1, 1, 1)'$

13  Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $Q = (1, -1)'$,
    $\gamma = (1, -1, 1, 1)'$

14  Varying $\gamma$ ($\gamma_1 \neq \gamma_0$) when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $Q = (0, 0)'$

15  Varying $\gamma$ ($\gamma_1 \neq \gamma_0$) when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $Q = (1, -1)'$

16  Varying $Q$ when $\beta = (0, 0, 2, 0)'$,
    $\sigma_1 = \sigma_2 = 1$, $\gamma = (1, -1, 1, 1)'$

CHAPTER ONE

INTRODUCTION

1.1 Definition of the Problem

Switching regression models, normal mixture models,
and disequilibrium models are systems characterized by dis-
continuous shifts in regression regimes at unknown points in
the data series. The most common formulation hypothesizes
that the system may switch numerous times back and forth
between two particular regimes, or to successive new regimes.
For the sake of simplicity, we shall restrict our discussion
to the case in which it is known a priori that the number of
regimes is two. These models are primarily designed to deal
with samples in which sample separation information is miss-
ing. That is, we do not know whether an observed random var-
iable is generated by one regime (which corresponds to a
distinct regression model) or by another regime (which cor-
responds to another regression model).

An interesting issue here is the loss, measured in
terms of the efficiency of parameter estimation, when sample
separation (alternatively, regime classification) is unknown
or is not observed. A number of papers (Goldfeld and Quandt,
1975; Kiefer, 1978; Schmidt, 1981) have addressed this
question in the context of disequilibrium models and normal
mixture models. All these studies found that sample separation
information does have a positive value, in that estimates
derived are more efficient when there is a priori knowledge
as to which regime each observation belongs to. This
confirms the need to obtain reliable information about sample
separation, when it is available.

The purpose of this paper is to extend the issue one
more step. Sample separation information may exist, but may
not be entirely reliable. Such a situation may conceivably
arise in models with outliers, or when the available data are
simply not entirely accurate. By how much is efficiency
improved when imperfect regime classification information is
used? This paper attempts to answer that question, and is
therefore an extension of Schmidt's paper, with the
additional use of imperfect sample separation information. We
will address the issue strictly in the context of switching
regression models.

The importance of this extension lies in the fact that
knowledge of improvements in efficiency of parameter estima-
tion can guide one in deciding whether to use sample separa-
tion information, even if it is known that such information
is imperfect or unreliable. In addition, even if imperfect
information is not readily available, knowledge of efficien-
cy gains will aid in determining whether such additional in-
formation is worth obtaining at all.

Before we proceed any further, a formal discussion of

switching regression models is warranted at this point.

1.2 Formal Discussion of Switching Regression Models

The simplest possible formulation is a normal mixture
model (actually, a switching regression model with only a
constant term), where a sample of observations $y_1, y_2, \ldots, y_n$
is given on a random variable $y$. It is known that nature
chooses between regimes with probabilities $\lambda$ and $1 - \lambda$.
That is,

\[
\begin{aligned}
y &\sim N(\mu_1, \sigma_1^2) \quad \text{with probability } \lambda && \text{(regime 1)} \\
y &\sim N(\mu_2, \sigma_2^2) \quad \text{with probability } (1 - \lambda) && \text{(regime 2)}
\end{aligned}
\tag{1.1}
\]

where the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$ are unknown.
A more complicated case arises in the switching regression
model in which observations are given on a random variable $y$
and on a vector of nonstochastic regressors $x$. Nature is
assumed to generate each $y_j$ from $x_j$ by regime 1 with
probability $\lambda$, and by regime 2 with probability $(1 - \lambda)$.
Therefore, we have:

\[
\begin{aligned}
y_j &= x_{1j}'\beta_1 + u_{1j} \quad \text{with probability } \lambda && \text{(regime 1)} \\
y_j &= x_{2j}'\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda) && \text{(regime 2)}
\end{aligned}
\tag{1.2}
\]

where $u_{1j} \sim N(0, \sigma_1^2)$, $u_{2j} \sim N(0, \sigma_2^2)$, and the parameters
$\beta_1$, $\beta_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$ are unknown. There are also
so-called disequilibrium models (which we will not discuss in
this paper), in the context of demand and supply equations.
Such models are characterized by a minimum condition, as in
$q_j = \min(D_j, S_j)$ for an ordinary demand-supply model,
where the observed quantity $q_j$ is the smaller of demand and
supply. They are similar to switching regression models,
since observations can come from two regimes (supply or
demand equations), but the probability of an observation coming
from a given regime varies over observations.
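The data-generating process in (1.2) can be sketched as a short simulation. This is only an illustration: the function name, the parameter values, and the use of a common regressor row for both regimes are assumptions made here and do not come from the study. With a constant as the only regressor, the sketch reduces to the normal mixture (1.1).

```python
import random


def simulate_switching_regression(x, beta1, beta2, sigma1, sigma2, lam, seed=0):
    """Draw (y_j, regime_j) pairs following equation (1.2).

    x is a list of regressor rows shared by both regimes (an assumption made
    for brevity; the text allows x_1j and x_2j to differ).
    """
    rng = random.Random(seed)
    ys, regimes = [], []
    for row in x:
        in_regime1 = rng.random() < lam               # nature's choice, unobserved in practice
        beta, sigma = (beta1, sigma1) if in_regime1 else (beta2, sigma2)
        mean = sum(b * v for b, v in zip(beta, row))  # x_j' beta
        ys.append(rng.gauss(mean, sigma))             # add the normal disturbance u_j
        regimes.append(1 if in_regime1 else 2)
    return ys, regimes


# Constant-only regressors: (1.2) collapses to the normal mixture (1.1).
# Parameter values below are illustrative only.
x = [[1.0] for _ in range(20000)]
y, regime = simulate_switching_regression(x, [0.0], [2.0], 1.0, 1.0, lam=0.5)
share_regime1 = regime.count(1) / len(regime)
sample_mean = sum(y) / len(y)
```

In an actual estimation problem only `y` (and `x`) would be observed; the `regime` list is the missing sample separation information.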

In an economic context, applications of such models
are plentiful. Hamermesh (1970) used a switching regression
model to examine the determination of wage bargains from
observations on wage changes, changes in the consumer price
index and unemployment. The dependent variable is the wage
change, w, and he hypothesized that the effect of cost of
living changes, c, on wage changes is significantly positive
only when cost of living changes exceed some critical figure,
which has been selected a priori. There are two wage bargain
equations, one for when c is less than this predetermined
critical figure and one for when c is greater than or equal
to it. This is a case where regime classification is
known.

Quandt and Ramsey (1978) re-estimated Hamermesh's model
where there is no prior information as to the critical
value of c below and above which different regression regimes
are at work. They assumed that nature chooses between the two
regressions for any observation, by comparing c to a critical
value (known only to nature). If this critical value is $\bar{c}$,
and the fraction of observations with $c \leq \bar{c}$ is equal to $\lambda$,
then nature chooses one regime with probability $\lambda$, and the
other regime where $c > \bar{c}$ with probability $(1 - \lambda)$. This is
a case of no sample separation information and the regimes
are unknown.

Lee and Porter (1984) used switching regression
techniques to model a supply function for a railroad cartel.
This supply function identifies periods in which firms are
behaving non-cooperatively as opposed to cooperatively, i.e.
whether price wars were occurring or not. The dependent
variable is the market price for grain, so that price wars
within the cartel shift the supply curve to signal reversions from
collusive (higher prices) to non-collusive (lower prices)
behavior. They assumed that sample separation information was
available, though not perfectly reliable.

Examples of disequilibrium models can be found in the
watermelon market (Suits, 1955); the market for housing
starts (Fair and Jaffee, 1972); the market for chartered
banks' loans to business firms (Laffont and Garcia, 1977);
the U.S. labor market (Rosen and Quandt, 1978); and credit
rationing in international lending (Eaton and Gersovitz,
1980).

If information on sample separation is known for
switching regression models, then estimation of the
parameters in the respective regimes is straightforward and is done
by least squares. If information on sample separation is
unknown, then we are confronted with the problem of regime
classification, and estimation of the parameters is done by
either maximum likelihood, method of moments, moment
generating function, or modified moment generating function. The
choice of the appropriate estimation technique, however, does
not concern us here, and so we will only provide a brief
overview of the issues involved. A more detailed discussion
of the issues may be obtained from the references cited.

We shall restrict ourselves to the basic normal mix-
ture case of equation (1.1), since the extension to equation
(1.2) is fairly straightforward. It should, first of all,
be noted that parameters of finite mixtures of normal densi-
ties are identified, and that there exists no sufficient sta-
tistic for the parameters of a normal mixture (Quandt and
Ramsey, 1978).

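Under the assumptions of (1.1), the probability
density function for $y_j$ ($j = 1, \ldots, n$ observations) is:

\[
\begin{aligned}
f_j &= f(y_j; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda) \\
    &= \lambda f_1(y_j) + (1 - \lambda) f_2(y_j) \\
    &= \frac{\lambda}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(y_j - \mu_1)^2}{2\sigma_1^2}\right]
     + \frac{1 - \lambda}{\sqrt{2\pi}\,\sigma_2} \exp\left[-\frac{(y_j - \mu_2)^2}{2\sigma_2^2}\right]
\end{aligned}
\tag{1.3}
\]

$f_1(y_j)$ and $f_2(y_j)$ are the normal probability density
functions for observations from regime 1 and regime 2,
respectively. The likelihood function for the unknown parameters is:

\[
L = \prod_{j=1}^{n} f_j
\]
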
The natural procedure for estimating the parameters using
maximum likelihood is to maximize the likelihood function with
respect to the parameters. This, however, runs into
difficulties since, with $\mu_1$ (or $\mu_2$) set equal to one of the
observations $y_j$, $f_j$ increases without bound as $\sigma_1$ (or $\sigma_2$)
goes to zero. It follows that the likelihood function $L$
is unbounded, and the unboundedness of the likelihood
function means that any attempt to find a global maximum will
produce inconsistent estimates. To avoid this, it is
possible to specify a priori knowledge of the ratio of the
variances $\sigma_1^2$, $\sigma_2^2$ and to set $\sigma_1^2 = h\sigma_2^2$, or alternatively, to
specify that $\sigma_2^2 \geq h\sigma_1^2$, where $h$ is known (Goldfeld and
Quandt, 1975; Kiefer, 1978). Another problem with maximum
likelihood estimation is the potential singularity of the
matrix of second partials of the log likelihood function, which
is equivalent to a vanishing Jacobian for the set of normal
equations derived from the maximum likelihood approach
(Quandt and Ramsey, 1978; Hartley, 1978).
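The unboundedness of the likelihood can be seen numerically. The sketch below evaluates the log of the likelihood built from (1.3) on a small made-up data set (the observations and parameter values are illustrative assumptions, not data from the study), placing $\mu_1$ exactly on the first observation and shrinking $\sigma_1$.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density; math.exp underflows quietly to 0.0 far in the tails."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_loglik(ys, mu1, mu2, sigma1, sigma2, lam):
    """Log likelihood of the two-regime normal mixture, i.e. the sum of ln f_j."""
    return sum(
        math.log(lam * normal_pdf(y, mu1, sigma1) + (1.0 - lam) * normal_pdf(y, mu2, sigma2))
        for y in ys
    )


# Made-up observations, spaced well apart so only ys[0] sits on the spike.
ys = [-1.0, -0.5, 0.0, 0.5, 1.5, 2.0, 2.5, 3.0]

# Set mu1 equal to the first observation and let sigma1 shrink toward zero.
sigmas = [1e-2, 1e-4, 1e-6]
logliks = [mixture_loglik(ys, ys[0], 2.0, s, 1.0, 0.5) for s in sigmas]
```

Each hundredfold reduction in $\sigma_1$ raises the log likelihood by roughly $\ln 100$, with no upper limit, even though the resulting "estimates" are meaningless. This is why the discussion that follows looks for a local maximum in the interior of the parameter space.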

Kiefer (1978) argues that although the likelihood
function is known to be unbounded at some points on the edge
of the parameter space, the likelihood equations have a root
which is consistent and asymptotically normally distributed.
Therefore, computation of the maximum likelihood estimates
should attempt to find a local maximum in the interior of the
parameter space of the likelihood function. However, the at-
tainment of such a maximum may be difficult in practice so
that alternative estimators may need to be considered.

Quandt and Ramsey (1978) propose using either the
method of moments or the method of the sample moment
generating function (MGF). Under the method of moments, the sample
mean is equated to the theoretical first moment of equation
(1.3) and the second, third, fourth and fifth sample moments
about the mean to the corresponding theoretical central
moments (if there are five parameters). From this, we obtain
five equations from which it is possible to solve for
consistent estimates of the five parameters. However, if there are
$K$ (where $K > 1$) independent variables in the switching
regression model (in the normal mixture model, $K = 1$), then the
number of parameters is $2K + 3$. It follows that moments of
order even higher than five need to be employed, and the
results are likely to be fairly unstable. While no estimates
of the sampling variances are provided by this technique, it
is well-known that, as a general rule, the sample variances
of higher-order moments are quite large (Kendall and Stuart,
1963). For these reasons, the MGF technique is preferred
over the method of moments as an estimating procedure.

The MGF method solves for the values of the parameters
by minimizing a sum of squared differences between the
empirical and theoretical values of the moment generating
function. Define the following expression:

\[
S_n(\alpha, \theta) = \bar{e}'\bar{e} = \sum_{t=1}^{T} \bar{e}_t^2
= \sum_{t=1}^{T} \left(\bar{z}_n(\alpha_t) - G(\theta, \alpha_t)\right)^2
\tag{1.4}
\]

where:

\[
\begin{aligned}
\bar{e}' &= (\bar{e}_1, \ldots, \bar{e}_T) \\
\bar{e}_t &= \frac{1}{n} \sum_{j=1}^{n} e_{jt} \\
\bar{z}_n(\alpha_t) &= \frac{1}{n} \sum_{j=1}^{n} \exp(\alpha_t y_j) \\
G(\theta, \alpha_t) &= \lambda \exp\left[\mu_1 \alpha_t + \frac{\alpha_t^2 \sigma_1^2}{2}\right]
 + (1 - \lambda) \exp\left[\mu_2 \alpha_t + \frac{\alpha_t^2 \sigma_2^2}{2}\right] \\
t &= 1, \ldots, T; \quad j = 1, \ldots, n
\end{aligned}
\]

$T$ different values of $\alpha$ are picked (where $T \geq$ the number of
parameters, i.e. 5 in this case) and $S_n(\alpha, \theta)$ is minimized,
bringing the $T$ estimated MGF values as close as possible to
their theoretical counterparts. The $\alpha_t$ ($t = 1, \ldots, T$) are
chosen so as to ensure that the corresponding normal equations
derived from the minimization of $S_n(\alpha, \theta)$ with respect to
$\theta$ are nonsingular. The solution to the five normal equations
defines the MGF estimate, which is consistent and asymptotically
normally distributed. In choosing $\alpha_t$, the values which need
to be avoided are those which are either very close to zero or
those which are large enough so that $G(\theta, \alpha_t)$ becomes
computationally intractable.
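As a rough sketch of the criterion in (1.4), the code below computes the empirical and theoretical moment generating functions and their summed squared differences. The simulated sample, the five $\alpha_t$ values, and the function names are illustrative assumptions; no minimizer is included, and the criterion is only evaluated at the true parameter vector and at a deliberately wrong one.

```python
import math
import random


def theoretical_mgf(theta, alpha):
    """G(theta, alpha) for theta = (mu1, mu2, sigma1^2, sigma2^2, lambda)."""
    mu1, mu2, var1, var2, lam = theta
    return (lam * math.exp(mu1 * alpha + 0.5 * alpha ** 2 * var1)
            + (1.0 - lam) * math.exp(mu2 * alpha + 0.5 * alpha ** 2 * var2))


def empirical_mgf(ys, alpha):
    """Sample analogue: (1/n) * sum_j exp(alpha * y_j)."""
    return sum(math.exp(alpha * y) for y in ys) / len(ys)


def mgf_criterion(theta, alphas, ys):
    """Ordinary sum of squared differences, as in (1.4)."""
    return sum((empirical_mgf(ys, a) - theoretical_mgf(theta, a)) ** 2 for a in alphas)


# Simulated normal mixture sample; parameter values are illustrative only.
rng = random.Random(0)
true_theta = (0.0, 2.0, 1.0, 1.0, 0.5)
ys = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
      for _ in range(50000)]

# T = 5 hypothetical alpha values: small, distinct, and away from zero.
alphas = [-0.4, -0.2, 0.1, 0.3, 0.5]

s_true = mgf_criterion(true_theta, alphas, ys)
s_wrong = mgf_criterion((0.0, 3.0, 1.0, 1.0, 0.5), alphas, ys)
```

The criterion is close to zero at the true parameters and much larger when $\mu_2$ is misstated, which is the property the MGF estimator exploits. In practice `mgf_criterion` would be handed to a numerical optimizer.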

Schmidt (1982) improves on this method by postulating
a modified MGF estimator, which is also consistent and where
a generalized sum of squares is minimized rather than an
ordinary sum of squares. The criterion in (1.4) needs to be
re-written as:

\[
S_n'(\alpha, \theta) = \bar{e}'\Omega^{-1}\bar{e}
\tag{1.5}
\]

where:

\[
\begin{aligned}
\bar{e}' &= (\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_T) \\
\bar{e}_t &= \frac{1}{n} \sum_{j=1}^{n} e_{jt} \\
\Omega_{st} &= G(\theta, \alpha_s + \alpha_t) - G(\theta, \alpha_s)\,G(\theta, \alpha_t) \\
s, t &= 1, \ldots, T; \quad j = 1, \ldots, n
\end{aligned}
\]

The matrix $\Omega$ (of order $T \times T$) has its $(s, t)$th element defined
as above. It comes from the covariance matrix of the $e_{jt}$, and is
proportional to the covariance matrix of the $\bar{e}_t$.

The rationale behind this approach is that the $\bar{e}_t$
($t = 1, \ldots, T$) are correlated and have unequal variances, so
that a generalized least squares criterion should be minimized,
by analogy to the ordinary least squares and generalized
least squares regression. When $T$ is equal to the number of
parameters (i.e. 5), the distinction between (1.4) and (1.5)
does not apply because either sum of squares is minimized at
zero, so that either minimization yields the same estimates.
However, when $T$ is greater than five, the estimates obtained
by minimizing the generalized sum of squares are
asymptotically efficient relative to those obtained by minimizing the
simple sum of squares.

In comparison, the asymptotic covariance matrix of the
MGF estimator can be expressed as:

\[
\Psi_1 = (A'A)^{-1} A'\Omega A\,(A'A)^{-1}
\]

where $A$ is the $T \times 5$ matrix defined by

\[
A_{ti} = \frac{\partial G(\theta, \alpha_t)}{\partial \theta_i}
\]

The asymptotic covariance matrix of the modified MGF
estimator is of the form

\[
\Psi_2 = (A'\Omega^{-1}A)^{-1}
\]

where the matrix $A$ is defined as above. When $T$ is equal to
five, therefore,

\[
\Psi_1 = \Psi_2 = A^{-1}\,\Omega\,(A')^{-1}
\]

so that the modified MGF and MGF estimators are identical.
However, when $T$ is greater than five, the difference
$(\Psi_1 - \Psi_2)$ is a positive semi-definite matrix, which implies that
the modified MGF estimator is asymptotically efficient
relative to the MGF estimator.
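The efficiency ranking can be checked in a deliberately reduced setting. The sketch below assumes that only $\mu_1$ is unknown (so $A$ is a $T \times 1$ vector rather than $T \times 5$) and uses $T = 2$ hypothetical $\alpha$ values, which keeps the matrix algebra small enough to write out by hand. The parameter values are illustrative, and this simplification is an assumption made here, not the five-parameter case discussed in the text.

```python
import math

# Illustrative parameter values; only mu1 is treated as unknown in this sketch.
mu1, mu2, var1, var2, lam = 0.0, 2.0, 1.0, 1.0, 0.5
a1, a2 = 0.2, 0.5   # T = 2 hypothetical alpha values, more than the one unknown


def G(alpha):
    """Theoretical MGF of the two-regime normal mixture."""
    return (lam * math.exp(mu1 * alpha + 0.5 * alpha ** 2 * var1)
            + (1.0 - lam) * math.exp(mu2 * alpha + 0.5 * alpha ** 2 * var2))


def dG_dmu1(alpha):
    """A_t = dG(theta, alpha_t) / d mu1."""
    return lam * alpha * math.exp(mu1 * alpha + 0.5 * alpha ** 2 * var1)


# Omega_st = G(alpha_s + alpha_t) - G(alpha_s) * G(alpha_t)
w11 = G(a1 + a1) - G(a1) * G(a1)
w12 = G(a1 + a2) - G(a1) * G(a2)
w22 = G(a2 + a2) - G(a2) * G(a2)
det = w11 * w22 - w12 * w12          # determinant of the 2 x 2 matrix Omega

A1, A2 = dG_dmu1(a1), dG_dmu1(a2)

# Psi_1 = (A'A)^{-1} A' Omega A (A'A)^{-1}: ordinary sum of squares (MGF)
AtA = A1 * A1 + A2 * A2
AtOmegaA = w11 * A1 * A1 + 2.0 * w12 * A1 * A2 + w22 * A2 * A2
psi1 = AtOmegaA / AtA ** 2

# Psi_2 = (A' Omega^{-1} A)^{-1}: generalized sum of squares (modified MGF),
# using the closed-form inverse of a 2 x 2 matrix.
AtOmegaInvA = (w22 * A1 * A1 - 2.0 * w12 * A1 * A2 + w11 * A2 * A2) / det
psi2 = 1.0 / AtOmegaInvA
```

With one unknown parameter the two matrices are scalars, and `psi1 - psi2` being non-negative is the scalar counterpart of the positive semi-definiteness result stated above.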

There still remains, however, the problem of the appropriate
choice of $T$ (since asymptotically, more values are preferable
to fewer). The choice of the values of $\alpha_t$ ($t = 1, \ldots, T$),
given $T$, may be guided by the asymptotic covariance matrix
of the resulting estimates. A useful criterion would be to
choose the $\alpha$'s which minimize some measure of the size of the
asymptotic covariance matrix, e.g. its determinant. In
addition, the $\alpha_t$ values need to be small and need to assume
different values -- this latter requirement puts some limit on
how small they all can be.

 


1.3 Review of the Literature

Let us now turn our attention to a survey of articles
which are related to this one. It has been previously
stated that some studies have tried to evaluate the value of
sample separation information in disequilibrium models and
normal mixture models.

Goldfeld and Quandt (1975) worked on a disequilibrium
model of the watermelon market (derived from Suits, 1955)
and did a small sample Monte Carlo experiment based on a set
of estimated parameter values. Their model is of the form:

\[
\begin{aligned}
Q_j &= f(\text{predetermined variables}) \\
X_j &= g(P_j, Q_j, \text{predetermined variables}) \\
P_j &= h(Y_j, \text{predetermined variables}) \\
Y_j &= \min(Q_j, X_j)
\end{aligned}
\]

where $Q_j$, $X_j$, $P_j$ and $Y_j$ are equal to the crop of watermelons,
the ex-ante or intended harvest, the price, and the actual
harvest of watermelons, respectively. Two specifications
were postulated -- first, where $Q_j$ is not observed and sample
separation information is therefore unknown, and second, where
$Q_j$ is observed and sample separation is known.

For their experiments, parameter values were chosen so
as to reproduce approximately the levels of the dependent
variables observed in the actual data. Since the first
specification has less information than the second, some parameter
values in the former were varied to examine the effect of the
variations on the value of additional information, but the
fraction of sample points was kept constant (i.e. for
$X_j \geq Q_j$ and for $X_j < Q_j$, which determines regime
classification) by a compensating variation in another parameter.
Over a number of cases, Goldfeld and Quandt derived the root
mean-squared error ratios for the parameters (where the
mean-squared error ratios can be interpreted as consistent
estimates of the variance ratios) when sample separation was
known relative to when it was unknown. All these ratios were
less than 1.0, although the ratios were naturally larger for
the parameters of the $P_j$ equation where $Q_j$ does not come in.
A larger ratio simply means that the effect of not knowing
sample separation information is minimal on the efficiency of
the parameter estimates. On the whole though, knowledge of
sample separation leads to smaller variances, implying that
using data on $Q_j$ has a positive value in terms of more
efficient estimates.

When there is no information on $Q_j$, then the
coefficient of $Q_j$ in the $X_j$ equation is zero, or the variable just
drops out. However, when this coefficient should not be
zero (meaning there is a significant relationship between $Q_j$
and $X_j$), then there is the additional complication of the
unobservable $Q_j$ entering the $X_j$ equation. The larger the
absolute value of this coefficient, the more valuable is
information on the $Q_j$ data for estimating the $X_j$ equation. It
follows that the larger this coefficient, the larger are the root
mean-squared errors for the parameters in the first specification
with no sample separation information. Experiments were
conducted in this regard, where the coefficient
of $Q_j$ was allowed to vary, and the a priori expectations
were confirmed. Therefore, the superiority of the second
specification rises with the value of this coefficient, since
ratios of the root mean-squared errors when sample separation
is known relative to when it is unknown, decline. These
experiments show that efficiencies of the estimates are
improved when additional information is increasingly provided
to the model.

Kiefer (1979) extended the Goldfeld and Quandt results
by using a large sample for a normal mixture model. The
procedure involves measuring the asymptotic precision of
estimates based on a marginal density (limited information
estimation) and comparing it with the asymptotic precision of
those based on a joint density (full information estimation).
He uses the dummy variable $D_j$ to denote regime classification
information, so that the $D_j$ variable indicates the regime
which generated the $j$th observation. Therefore, the $D_j$
variable only appears in the full information joint density
function, which can be written as:

\[
f(y_j, D_j; \theta) = \lambda D_j f_1(y_j) + (1 - \lambda)(1 - D_j) f_2(y_j)
\]

where $\theta$ is the vector of parameters.

The precision of a maximum likelihood estimate based
on the joint and marginal density is defined, respectively,
as (the subscript $j$ was dropped for simplicity):

\[
-E\,\frac{\partial^2 \ln f(y, D)}{\partial\theta\,\partial\theta'}
\quad \text{and} \quad
-E\,\frac{\partial^2 \ln f(y)}{\partial\theta\,\partial\theta'}
\]

By definition, $\ln f(y, D) = \ln f(y) + \ln f(D \mid y)$, so that it
follows that:

\[
-E\,\frac{\partial^2 \ln f(y, D)}{\partial\theta\,\partial\theta'}
= -E\,\frac{\partial^2 \ln f(y)}{\partial\theta\,\partial\theta'}
 - E\,\frac{\partial^2 \ln f(D \mid y)}{\partial\theta\,\partial\theta'}
\]

The precision of the maximum likelihood estimator based on
the joint density is equal to the precision of the maximum
likelihood estimator based on the marginal density (here,
$f(y)$ corresponds to the formulation in equation (1.3)), plus
a positive definite matrix. It follows that the precision of
the estimates based on the former is always greater than that
for the latter. Estimates are naturally more precise when
there is more information.

To confirm this relationship, Monte Carlo experiments
were conducted on a normal mixture model where the only para-
meters being estimated are the means. Precision ratios were
then taken for the full information and limited information
models and converted into asymptotic variance ratios (to fa-
cilitate a comparison with the Goldfeld and Quandt results)
by inversion of the information matrix. Note that the pre-
cisions of the estimates are derived from the information mat-
rix, and that the inverse of the information matrix is a con-
sistent estimate of the asymptotic variance-covariance matrix
of the parameter estimates.

Two types of experiments were conducted -- first, when
only one mean had to be estimated, and second, when two mean
values had to be estimated. Given fixed variances and the
mixing parameter, the values of the means were allowed to
vary. Over a series of cases, the asymptotic variance
ratios for regime known relative to regime unknown were
computed, and were found to be all less than 1.0, consistent with
the results of Goldfeld and Quandt. Efficiency loss from
using the marginal rather than the joint density could be
considerable.

When the means of the samples are close together,
ratios tend to be small, so the effects of implicit
misclassification are serious and estimates suffer. As the means
become farther apart, the probability of misclassification
becomes so small that estimates become almost as efficient
(the ratios approach a value of 1.0) as estimates based on
known sample separation.
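The pattern described above can be reproduced for the simplest case, in which only the mean of regime 1 is estimated and the variances and mixing parameter are known. The sketch below is an illustration under stated assumptions: it replaces the Monte Carlo experiments with a trapezoid-rule numerical integration of the information for $\mu_1$ under the marginal density (1.3), and the parameter values and function names are hypothetical. With regime known, the information per observation is $\lambda / \sigma_1^2$.

```python
import math


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def variance_ratio_mu1(mu1, mu2, sigma1, sigma2, lam, step=0.01):
    """Asymptotic variance of the mu1 estimate, regime known relative to unknown.

    Equals (information with regime unknown) / (information with regime known),
    because each asymptotic variance is the inverse of its information.
    """
    info_known = lam / sigma1 ** 2
    lo = min(mu1 - 10.0 * sigma1, mu2 - 10.0 * sigma2)
    hi = max(mu1 + 10.0 * sigma1, mu2 + 10.0 * sigma2)
    n_steps = int(round((hi - lo) / step))
    info_unknown = 0.0
    for i in range(n_steps + 1):
        y = lo + i * step
        f1 = normal_pdf(y, mu1, sigma1)
        f = lam * f1 + (1.0 - lam) * normal_pdf(y, mu2, sigma2)
        score = lam * f1 * (y - mu1) / sigma1 ** 2 / f    # d ln f / d mu1
        weight = 0.5 if i in (0, n_steps) else 1.0        # trapezoid end points
        info_unknown += weight * score ** 2 * f * step    # E[score^2] under f
    return info_unknown / info_known


# Illustrative separations between the two means (mu1 = 0, unit variances).
separations = [0.5, 2.0, 4.0, 8.0]
ratios = [variance_ratio_mu1(0.0, d, 1.0, 1.0, 0.5) for d in separations]
```

For nearly overlapping regimes the ratio stays close to $\lambda$, so much of the information is lost when the regime is unobserved; for widely separated means the ratio approaches 1.0.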

These numbers are generally a little higher than those
obtained by Goldfeld and Quandt, indicating that the value of
information in more complicated models (i.e. disequilibrium
models) is greater than that in simpler models, as seems
plausible (although this must be qualified since the Gold-
feld and Quandt results are for small samples). At any rate,
these results supplement the Monte Carlo evidence of the ear-
lier study by showing that efficiency losses from not observ-
ing sample separation, found in small samples by Goldfeld and
Quandt, persist and can be substantial asymptotically.

In his work, though, Kiefer assumed that the variances
and the mixing parameter are known and only the means in the
normal mixture model have to be estimated. Schmidt (1981)
extended Kiefer's results by also working on a normal
mixture model and he derived asymptotic variance ratios (again,
from the inverse of the information matrix), this time
assuming that all parameters have to be estimated. The
rationale behind this is that Kiefer's results understate the
true value of sample separation information for the following
reason. In the unknown regime case, the information matrix
is not diagonal and estimates of the means are improved by
knowledge of the variances and the mixing parameter, so that
sample separation information is less valuable when some of
the parameters are known than when all the parameters have to
be estimated.

A series of experiments were conducted, each done with
100,000 replications. The values of the parameters were
varied in each experiment and asymptotic variance ratios of
regime unknown relative to regime known were derived. All the
ratios are greater than 1.0, so the importance of having
sample separation information is again verified. Among the
conclusions in this study are the following: (1) the value of
sample separation information depends strongly on the natural
separation of the two samples, so that as the two
distributions become far apart, the value of sample separation
information goes to zero (ratios go to 1.0); (2) the value of
sample separation information is higher for the parameters of
the regime which is sampled with the lower probability; and
(3) the value of sample separation information is higher
when all the parameters have to be estimated, which is why
the results here show a larger value of information than in
Kiefer's study, where only the means had to be estimated.

Lee and Porter (1984) also tried to evaluate the
importance of sample separation information in a switching
regression model. Their econometric model is different from the
usual switching models in the literature in that there is
additional imperfect sample separation information available
and this is used as the regime indicator. Lee and Porter
worked on a two-equation model with an application to cartel
stability using a sample size of 328.

The model is composed of demand and supply functions
for a railroad cartel, where an attempt is made to identify
periods in which firms are behaving collusively, as opposed
to non-cooperatively. These different behavioral rules are
reflected by differing supply functions, where the supply
curve can be drawn from one of two possible regimes. The
cartel arrangements take the form of market share allotments.
Firms then set their rates individually and the actual
market share of any particular firm would depend on both the
prices charged by all firms as well as on unpredictable
stochastic forces. But the index of listed prices (which is
the price variable in the model) is imperfect, so that
member firms could not know with certainty whether secret price
cutting was occurring. It is in this context that an
imperfect indicator is needed to determine whether the observed
price wars represent a switch from collusive to non-cooperative
behavior.

Their model consists of two equations:

\[
\begin{aligned}
P_j &= f(I_j, \text{predetermined variables}) \\
Q_j &= g(P_j, \text{predetermined variables})
\end{aligned}
\]

where $P_j$, $Q_j$, and $I_j$ are, respectively, the price of grain;
the total quantity of grain shipped; and a latent dichotomous
variable which equals 1, when the industry is in a
cooperative regime, and equals 0, otherwise. With no reliable
information on $I_j$, it is measured possibly with error by $W_j$, a
regime classification indicator. $W_j = 1$, when a trade
magazine reports collusion; and $W_j = 0$, when this same trade
magazine reports that a price war is occurring. This data series
may not be accurate at all, but in the absence of any other
information, this extra information may still help to reduce
the estimated standard errors. After all, a little
information (even if not entirely accurate) may be better than not
having any information at all to guide in determining regime
classification.

Their model was estimated twice: first, using the
partial information provided by $w_j$, and second, using no
information on $w_j$. The estimated standard errors are smaller
for the former than for the latter. However, for this
particular data set, the gains in asymptotic efficiency from
using the imperfect indicator are small due to the clear
separation of the two underlying distributions. This is evident
from the fact that the two distributions of $\ln P_j$ are far
apart, since the difference of the means is 0.48 and the
variance is only 0.01. This result complements the Monte Carlo
simulation results of Kiefer and Schmidt that the value of
any information on regime classification becomes smaller
(ratios of asymptotic variances approach 1.0) as the
distributions become clearly distinct.

1.4 Plan of the Study

 

Our objective in this paper is to determine the value,
in terms of efficiency gains, of using imperfect sample
separation information, given different assumptions about the
parameters and different specifications of switching
regression models.

We will integrate into our study the framework of Lee
and Porter regarding the use of imperfect sample separation
information in switching regression models. We will also
use the approach of Schmidt (1981) where all the parameters
in the model have to be estimated, so as not to understate
the true value of imperfect sample separation information.
Similar to Kiefer's and Schmidt's procedures, we will conduct
several experiments over a number of scenarios with different
parameter values, each time deriving ratios of asymptotic
variances, where these variances can be obtained from
the corresponding information matrices. Asymptotic variance
ratios will be derived twice for each experiment: the first,
showing the loss in efficiency when we have no sample
separation information at all relative to full information, and the
second, showing the loss in efficiency when we have partial
sample separation information (provided by an imperfect or
unreliable indicator) relative to full information. A comparison
of both results will show the extent of the advantages
of using information even if it is inaccurate, as compared
to using no regime classification information at all.

In the Lee and Porter paper, the imperfect information
indicator $w_j$ was incorporated into the switching regression
model through the use of classification probabilities,
that is, the probabilities that the regime classification is
right or wrong, given the true regime that the observation
really belongs to. In their model, these classification
probabilities were assumed to be constant for all observations.

In Chapter 2, we will deal with the simplest formula-
tion of a switching regression model -- that of the normal
mixture model. We will adopt Lee and Porter's approach of
using constant probabilities of correct regime classification
by our imperfect sample separation information.

In Chapter 3, we extend the previous chapter to the
case where we have two explanatory variables in our switching
regression model. In addition, we consider the case when
the probabilities of regime classification are non-constant
and, in fact, can be modelled as probit functions of the
exogenous variables.

In Chapter 4, we keep the assumptions of the previous
chapter but we also postulate that the mixing parameter is
non-constant, so that we have varying switching probabilities.
The mixing parameter will also be modelled as a probit
function of the explanatory variables.

Chapter 5 summarizes the findings of the preceding
three chapters and presents the conclusions we have derived
based on the series of experiments conducted.

CHAPTER TWO

 

THE CASE OF CONSTANT

REGIME CLASSIFICATION PROBABILITIES

2.1 The Model

 

The first specification which we consider is the simple
normal mixture model, in which a random variable $y_j$ is
drawn from $N(\mu_1, \sigma_1^2)$ with probability $\lambda$, and from
$N(\mu_2, \sigma_2^2)$ with probability $(1 - \lambda)$. It can also be
expressed as a switching regression model, where the only
explanatory variable corresponds to the constant term.
Therefore, we have the following:

$$y_j = x_{1j}\beta_1 + u_{1j} \quad \text{with probability } \lambda \text{ (regime 1)} \tag{2.1}$$

$$y_j = x_{2j}\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda) \text{ (regime 2)}$$

For the normal mixture case, $x_{1j} = x_{2j} = 1$, and $\beta_1 = \mu_1$
and $\beta_2 = \mu_2$ are scalars. We assume that $u_{1j}$ and $u_{2j}$ are
independently distributed, where $u_{1j} \sim N(0, \sigma_1^2)$ and
$u_{2j} \sim N(0, \sigma_2^2)$. The vector of parameters
$\theta' = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda)$ needs to be estimated from a
sample of observations on $y_j$. There are $n$ observations with
$n_i$ from regime $i$ ($i = 1, 2$; $j = 1, \ldots, n$).
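
The data-generating process in (2.1) can be sketched in a few lines of Python. This is only an illustration under assumed names (`draw_mixture` and its arguments are not from the original study); it draws the latent regime first and then the observation from the corresponding normal distribution.

```python
import numpy as np

def draw_mixture(n, mu1, mu2, sigma1, sigma2, lam, seed=0):
    """Draw n observations from the two-regime normal mixture in (2.1).

    sigma1 and sigma2 are standard deviations; lam is the probability
    of regime 1. Returns the observations and the latent regime I_j.
    """
    rng = np.random.default_rng(seed)
    in_regime1 = rng.random(n) < lam              # I_j = 1 with probability lam
    y = np.where(in_regime1,
                 rng.normal(mu1, sigma1, n),      # regime 1 draw
                 rng.normal(mu2, sigma2, n))      # regime 2 draw
    return y, in_regime1.astype(int)

# Illustrative values: mu1 = 0, mu2 = 2, sigma1 = sigma2 = 1, lambda = .5
y, I = draw_mixture(200_000, mu1=0.0, mu2=2.0, sigma1=1.0, sigma2=1.0, lam=0.5)
```

With these values the sample share of regime 1 should be close to .5 and the overall sample mean close to 1.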

Suppose that there is an observed dichotomous indicator
$w_j$ for each $j$, which provides sample separation information.
In addition, for each observation $j$, we define a latent
dichotomous variable $I_j$ where:

$$I_j = \begin{cases} 1 & \text{if } y_j \text{ is generated from regime 1} \\ 0 & \text{otherwise} \end{cases}$$

Therefore, $w_j$ is a measure of $I_j$, possibly with error. The
relationship between $w_j$ and $I_j$ can be described by the
transition probability matrix given by:

$$\begin{array}{c|cc}
 & w_j = 1 & w_j = 0 \\ \hline
I_j = 1 & p_{11} & p_{10} \\
I_j = 0 & p_{01} & p_{00}
\end{array}$$

That is,

$$\begin{aligned}
p_{11} &= \text{Prob}(w_j = 1 \mid I_j = 1) \\
p_{01} &= \text{Prob}(w_j = 1 \mid I_j = 0) \\
p_{10} &= \text{Prob}(w_j = 0 \mid I_j = 1) \\
p_{00} &= \text{Prob}(w_j = 0 \mid I_j = 0)
\end{aligned}$$
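It follows that $p_{10} = 1 - p_{11}$ and $p_{00} = 1 - p_{01}$. Now, let
$p = \text{Prob}(w_j = 1)$. Since $\lambda = \text{Prob}(I_j = 1)$ and
$(1 - \lambda) = \text{Prob}(I_j = 0)$, then:

$$\begin{aligned}
p &= \text{Prob}(I_j = 1)\,\text{Prob}(w_j = 1 \mid I_j = 1) + \text{Prob}(I_j = 0)\,\text{Prob}(w_j = 1 \mid I_j = 0) \\
  &= \lambda p_{11} + (1 - \lambda) p_{01}
\end{aligned}$$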
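
As a small numerical check of the expression for $p$, the sketch below draws the latent regime, generates the imperfect indicator from the transition probabilities, and compares the sample frequency of $w_j = 1$ with $\lambda p_{11} + (1 - \lambda)p_{01}$. The function names and the parameter values are illustrative assumptions, not part of the original study.

```python
import numpy as np

def indicator_prob(lam, p11, p01):
    """Marginal probability that the indicator reports regime 1."""
    return lam * p11 + (1.0 - lam) * p01

def draw_indicator(I, p11, p01, rng):
    """Draw w_j given the latent regime I_j and the transition probabilities."""
    prob_w1 = np.where(I == 1, p11, p01)    # Prob(w_j = 1 | I_j)
    return (rng.random(I.shape[0]) < prob_w1).astype(int)

rng = np.random.default_rng(1)
lam, p11, p01 = 0.5, 0.8, 0.2               # assumed illustrative values
I = (rng.random(200_000) < lam).astype(int)
w = draw_indicator(I, p11, p01, rng)
p = indicator_prob(lam, p11, p01)
```

The sample mean of `w` should be close to `p`, and the mean of `w` among observations with `I == 1` should be close to `p11`.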

The density function $f(y_j)$ for $y_j$ when we have a mixture of
two normal distributions is given in equation (1.3) as:

$$\begin{aligned}
f(y_j) &= \text{Prob}(I_j = 1) f_1(y_j) + \text{Prob}(I_j = 0) f_2(y_j) \\
       &= \lambda f_1(y_j) + (1 - \lambda) f_2(y_j)
\end{aligned} \tag{2.2}$$

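When imperfect sample separation information using the
observed indicator, $w_j$, is incorporated into the model, then
the joint density function for $y_j$ and $w_j$ is:

$$\begin{aligned}
f(y_j, w_j) &= f_1(y_j)\,\text{Prob}(w_j, I_j = 1) + f_2(y_j)\,\text{Prob}(w_j, I_j = 0) \\
 &= f_1(y_j)\big(w_j \lambda p_{11} + (1 - w_j)\lambda(1 - p_{11})\big) + f_2(y_j)\big(w_j(1 - \lambda)p_{01} + (1 - w_j)(1 - \lambda)(1 - p_{01})\big) \\
 &= \lambda f_1(y_j)\big(w_j p_{11} + (1 - w_j)(1 - p_{11})\big) + (1 - \lambda) f_2(y_j)\big(w_j p_{01} + (1 - w_j)(1 - p_{01})\big) \\
 &= \lambda f_1(y_j)\big(w_j p_{11} + (1 - w_j)(1 - p_{11})\big) + (1 - \lambda) f_2(y_j)\big(w_j(1 - p_{00}) + (1 - w_j)p_{00}\big)
\end{aligned} \tag{2.3}$$

where:

$$f_i(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(y_j - \mu_i)^2}{2\sigma_i^2}\right], \qquad i = 1, 2;\; j = 1, \ldots, n$$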

The regime classification indicator $w_j$ contains some
information on sample separation if $p_{11}$ is not equal to $p_{01}$.
When $p_{11} = p_{01}$, or alternatively, when $p_{11} = 1 - p_{00}$, then
$\text{Prob}(w_j \mid I_j) = \text{Prob}(w_j)$ and the joint density function is:

$$f(y_j, w_j) = \big(\lambda f_1(y_j) + (1 - \lambda) f_2(y_j)\big) \times \big(w_j p + (1 - w_j)(1 - p)\big)$$

so that the indicator $w_j$ does not contain any information on
sample separation. This is equivalent to having no information
at all, as in equation (2.2). This is, in fact, Schmidt's
model and also Kiefer's marginal density function (limited
information model). On the other hand, when $p_{11} = 1$ and
$p_{01} = 0$ (alternatively, $p_{00} = 1$), the indicator $w_j$ provides
perfect sample separation information and the joint density
function is expressed as:

$$f(y_j, w_j) = \lambda f_1(y_j) w_j + (1 - \lambda) f_2(y_j)(1 - w_j)$$

This is equivalent to Kiefer's joint density or full information
model, where our $w_j$ is his $D_j$, the indicator of perfect
information on regime classification.
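
The joint density (2.3) and its two limiting cases can be checked numerically. The sketch below is an illustration only; the function names are assumptions, and the variances are passed as $\sigma_i^2$.

```python
import math

def normal_pdf(y, mu, sigma_sq):
    """Normal density f_i(y) with mean mu and variance sigma_sq."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma_sq)) / math.sqrt(2.0 * math.pi * sigma_sq)

def joint_density(y, w, mu1, mu2, sigma1_sq, sigma2_sq, lam, p11, p01):
    """Joint density of (y, w) in (2.3); w is 0 or 1."""
    q1 = w * p11 + (1 - w) * (1.0 - p11)    # Prob(w | I = 1)
    q2 = w * p01 + (1 - w) * (1.0 - p01)    # Prob(w | I = 0)
    return (lam * normal_pdf(y, mu1, sigma1_sq) * q1
            + (1.0 - lam) * normal_pdf(y, mu2, sigma2_sq) * q2)

def mixture_density(y, mu1, mu2, sigma1_sq, sigma2_sq, lam):
    """Marginal density of y in (2.2)."""
    return (lam * normal_pdf(y, mu1, sigma1_sq)
            + (1.0 - lam) * normal_pdf(y, mu2, sigma2_sq))

base = dict(mu1=0.0, mu2=2.0, sigma1_sq=1.0, sigma2_sq=1.0, lam=0.5)
y0 = 0.7

# Uninformative indicator: p11 = p01 = p, so the density factors.
uninformative = joint_density(y0, 1, p11=0.3, p01=0.3, **base)

# Perfect indicator: p11 = 1, p01 = 0.
perfect_w1 = joint_density(y0, 1, p11=1.0, p01=0.0, **base)
perfect_w0 = joint_density(y0, 0, p11=1.0, p01=0.0, **base)
```

Here `uninformative` should equal `0.3 * mixture_density(y0, **base)`, while `perfect_w1` and `perfect_w0` should reduce to $\lambda f_1(y)$ and $(1 - \lambda) f_2(y)$.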

2.2 Derivation of Asymptotic Variances

 

We adopt the approach of Schmidt here. When the regime
is known or when perfect sample separation information
is available, the asymptotic variances of $\sqrt{n}(\hat{\mu}_1 - \mu_1)$,
$\sqrt{n}(\hat{\mu}_2 - \mu_2)$, $\sqrt{n}(\hat{\sigma}_1^2 - \sigma_1^2)$, and $\sqrt{n}(\hat{\sigma}_2^2 - \sigma_2^2)$ are,
respectively:

$$\frac{\sigma_1^2}{\lambda}; \qquad \frac{\sigma_2^2}{1 - \lambda}; \qquad \frac{2\sigma_1^4}{\lambda}; \qquad \text{and} \qquad \frac{2\sigma_2^4}{1 - \lambda}.$$

They are derived from the diagonal elements of the inverse
of the information matrices from the corresponding likelihood
functions of the respective known densities, $f_1(y_j)$
and $f_2(y_j)$. The terms $\lambda$ and $(1 - \lambda)$ appear in the above
expressions since they adjust for the correct sample size
in each regime ($n_1$ or $n_2$), relative to the total number of
observations $n$. This follows from the implicit relationship
that $n_1 = \lambda n$ and $n_2 = (1 - \lambda)n$. Since $n\hat{\lambda}$ has a binomial
distribution, the asymptotic variance of $\sqrt{n}(\hat{\lambda} - \lambda)$ is
$\lambda(1 - \lambda)$.
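
The known-regime benchmarks are simple enough to tabulate directly. The helper below is a hypothetical convenience function (its name and dictionary keys are assumptions); it returns the five asymptotic variances that serve as the denominators of the ratios reported later.

```python
def known_regime_avar(sigma1_sq, sigma2_sq, lam):
    """Asymptotic variances of sqrt(n) * (estimate - true value), regime known."""
    return {
        "mu1": sigma1_sq / lam,
        "mu2": sigma2_sq / (1.0 - lam),
        "sigma1_sq": 2.0 * sigma1_sq ** 2 / lam,
        "sigma2_sq": 2.0 * sigma2_sq ** 2 / (1.0 - lam),
        "lam": lam * (1.0 - lam),
    }

# Base case used in the experiments: sigma1 = sigma2 = 1, lambda = .5
benchmark = known_regime_avar(1.0, 1.0, 0.5)
```

For the base case this gives 2 for each mean, 4 for each variance parameter, and .25 for the mixing parameter.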

When the regime is either completely unknown or is
partly known (due to the partial sample separation information
available), the asymptotic variances are derived in the
same manner. Therefore, the asymptotic variances of $\sqrt{n}(\hat{\theta} - \theta)$
come from the diagonal elements of the inverse of the
corresponding information matrices. That is, $\sqrt{n}(\hat{\theta} - \theta)$ approaches
the distribution specified by $N\left(0, \lim\left(\frac{1}{n}\mathcal{I}\right)^{-1}\right)$.

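The Fisher information matrix is defined as:

$$\mathcal{I} = -E\left[\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right]$$

where:

$$L = \prod_{j=1}^{n} f_j, \qquad \ln L = \sum_{j=1}^{n} \ln f_j$$

When the regime is completely unknown, $f$ corresponds to the
density function laid out in (2.2) as:

$$f(y_j; \theta) = \lambda f_1(y_j) + (1 - \lambda) f_2(y_j)$$

$\theta$ is a $(5 \times 1)$ vector of parameters, defined by
$\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda)'$. When the regime is partly known, $f$
corresponds to the density function in (2.3):

$$f(y_j, w_j; \theta) = \lambda f_1(y_j)\big(w_j p_{11} + (1 - w_j)(1 - p_{11})\big) + (1 - \lambda) f_2(y_j)\big(w_j p_{01} + (1 - w_j)(1 - p_{01})\big)$$

$\theta$ is a $(7 \times 1)$ vector of parameters, defined by
$\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda, p_{11}, p_{01})'$.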
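
The expected value of the expression that denotes the
information matrix was intractable analytically, so that we
calculate the information matrix instead by simulation
techniques 1/ using the following expansion:

$$\begin{aligned}
\mathcal{I} &= -E \sum_{j=1}^{n} \left[\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}\right] \\
 &= -E \sum_{j=1}^{n} \left[\frac{1}{f_j}\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2}\frac{\partial f_j}{\partial\theta}\frac{\partial f_j}{\partial\theta'}\right]
\end{aligned} \tag{2.4}$$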

The model we have here when there is no sample separation
information is actually Schmidt's model, so we do not
need to simulate the information matrix corresponding to the
density function of (2.2), since that was done in his work;
we will just adopt his results. All we need to simulate is
the information matrix when we have imperfect sample separation
information. In order to facilitate a comparison between
the results, we use Schmidt's approach in our experiments.

1/ From the definition of the information matrix, we
know that $(1/n)\mathcal{I}$ has a limit. We therefore simulate
$\lim (1/n)\mathcal{I}$ by calculating $(1/n)\mathcal{I}$ for some finite though
large $n$.

The information matrix was evaluated by a simulation
of 100,000 trials derived from a normal or Gaussian random
variable generator. For any set of values assigned to $\theta$,
draws were made from the appropriate normal mixture
distribution. The first and second derivatives were calculated in
accordance with the expression in (2.4). The resulting 100,000
matrices were then averaged to obtain the information matrix,
and the asymptotic variances are the corresponding diagonal
elements of the inverse of the information matrix.
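
The simulation procedure can be sketched as follows. This is a hedged illustration, not the original program: the function name, the seed, and the use of NumPy are assumptions, and for brevity the sketch averages only the outer product of the first derivatives of $\ln f$, which has the same expectation as the full expression in (2.4), rather than evaluating the second derivatives as well.

```python
import numpy as np

def simulate_ratios(mu1, mu2, s1, s2, lam, p11, p01, n=100_000, seed=0):
    """Simulated asymptotic variance ratios, regime partly known / regime known.

    s1 and s2 are the variances sigma_1^2 and sigma_2^2. The information
    matrix is approximated by averaging score outer products over n draws
    (an assumed simplification of the expansion in (2.4)).
    """
    rng = np.random.default_rng(seed)
    in1 = rng.random(n) < lam                                  # latent regime
    y = np.where(in1, rng.normal(mu1, np.sqrt(s1), n),
                      rng.normal(mu2, np.sqrt(s2), n))
    w = (rng.random(n) < np.where(in1, p11, p01)).astype(float)

    f1 = np.exp(-(y - mu1) ** 2 / (2 * s1)) / np.sqrt(2 * np.pi * s1)
    f2 = np.exp(-(y - mu2) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    q1 = w * p11 + (1 - w) * (1 - p11)
    q2 = w * p01 + (1 - w) * (1 - p01)
    f = lam * f1 * q1 + (1 - lam) * f2 * q2                    # density (2.3)
    sign = w - (1 - w)

    # First derivatives of f with respect to
    # (mu1, mu2, sigma1^2, sigma2^2, lambda, p11, p01).
    grad = np.column_stack([
        lam * q1 * f1 * (y - mu1) / s1,
        (1 - lam) * q2 * f2 * (y - mu2) / s2,
        lam * q1 * f1 / (2 * s1) * (-1 + (y - mu1) ** 2 / s1),
        (1 - lam) * q2 * f2 / (2 * s2) * (-1 + (y - mu2) ** 2 / s2),
        f1 * q1 - f2 * q2,
        lam * f1 * sign,
        (1 - lam) * f2 * sign,
    ])
    score = grad / f[:, None]                                  # d ln f / d theta
    info = score.T @ score / n                                 # estimate of (1/n) I
    avar = np.diag(np.linalg.inv(info))[:5]
    known = np.array([s1 / lam, s2 / (1 - lam),
                      2 * s1 ** 2 / lam, 2 * s2 ** 2 / (1 - lam),
                      lam * (1 - lam)])
    return avar / known

# Fairly reliable indicator versus a weak one, base-case parameters.
ratios = simulate_ratios(0.0, 2.0, 1.0, 1.0, 0.5, p11=0.8, p01=0.2)
weak = simulate_ratios(0.0, 2.0, 1.0, 1.0, 0.5, p11=0.6, p01=0.4)
```

Because the averages are simulated, the ratios carry Monte Carlo noise; the qualitative pattern to expect is that the ratios exceed one and are larger for the weaker indicator.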

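When we have imperfect sample separation information,
the expressions in (2.4) are laid out below, where $f$ comes
from the density function defined by (2.3). The first
derivatives of $f(y, w; \theta)$ with respect to $\theta$ are (we drop the
subscript $j$ for simplicity):

$$\begin{aligned}
\frac{\partial f}{\partial \mu_1} &= \lambda Q_1 \frac{\partial f_1}{\partial \mu_1} \\
\frac{\partial f}{\partial \mu_2} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial \mu_2} \\
\frac{\partial f}{\partial \sigma_1^2} &= \lambda Q_1 \frac{\partial f_1}{\partial \sigma_1^2} \\
\frac{\partial f}{\partial \sigma_2^2} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial \sigma_2^2} \\
\frac{\partial f}{\partial \lambda} &= f_1 Q_1 - f_2 Q_2 \\
\frac{\partial f}{\partial p_{11}} &= \lambda f_1 \big(w - (1 - w)\big) \\
\frac{\partial f}{\partial p_{01}} &= (1 - \lambda) f_2 \big(w - (1 - w)\big)
\end{aligned}$$

where:

$$\begin{aligned}
Q_1 &= w p_{11} + (1 - w)(1 - p_{11}) \\
Q_2 &= w p_{01} + (1 - w)(1 - p_{01}) \\
\frac{\partial f_i}{\partial \mu_i} &= f_i \, \frac{(y - \mu_i)}{\sigma_i^2} \\
\frac{\partial f_i}{\partial \sigma_i^2} &= \frac{f_i}{2\sigma_i^2}\left[-1 + \frac{(y - \mu_i)^2}{\sigma_i^2}\right]
\end{aligned}$$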

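The non-zero second derivatives of $f(y, w; \theta)$ with respect to
$\theta$ are:

$$\begin{aligned}
\frac{\partial^2 f}{\partial \mu_1^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial \mu_1^2} &
\frac{\partial^2 f}{\partial \mu_2^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial \mu_2^2} \\
\frac{\partial^2 f}{\partial \mu_1 \partial \sigma_1^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial \mu_1 \partial \sigma_1^2} &
\frac{\partial^2 f}{\partial \mu_2 \partial \sigma_2^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial \mu_2 \partial \sigma_2^2} \\
\frac{\partial^2 f}{\partial \mu_1 \partial \lambda} &= Q_1 \frac{\partial f_1}{\partial \mu_1} &
\frac{\partial^2 f}{\partial \mu_2 \partial \lambda} &= -Q_2 \frac{\partial f_2}{\partial \mu_2} \\
\frac{\partial^2 f}{\partial \mu_1 \partial p_{11}} &= \lambda \big(w - (1 - w)\big) \frac{\partial f_1}{\partial \mu_1} &
\frac{\partial^2 f}{\partial \mu_2 \partial p_{01}} &= (1 - \lambda)\big(w - (1 - w)\big) \frac{\partial f_2}{\partial \mu_2} \\
\frac{\partial^2 f}{\partial (\sigma_1^2)^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial (\sigma_1^2)^2} &
\frac{\partial^2 f}{\partial (\sigma_2^2)^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial (\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial \sigma_1^2 \partial \lambda} &= Q_1 \frac{\partial f_1}{\partial \sigma_1^2} &
\frac{\partial^2 f}{\partial \sigma_2^2 \partial \lambda} &= -Q_2 \frac{\partial f_2}{\partial \sigma_2^2} \\
\frac{\partial^2 f}{\partial \sigma_1^2 \partial p_{11}} &= \lambda \big(w - (1 - w)\big) \frac{\partial f_1}{\partial \sigma_1^2} &
\frac{\partial^2 f}{\partial \sigma_2^2 \partial p_{01}} &= (1 - \lambda)\big(w - (1 - w)\big) \frac{\partial f_2}{\partial \sigma_2^2} \\
\frac{\partial^2 f}{\partial \lambda \partial p_{11}} &= \big(w - (1 - w)\big) f_1 &
\frac{\partial^2 f}{\partial \lambda \partial p_{01}} &= -\big(w - (1 - w)\big) f_2
\end{aligned}$$

where:

$$\begin{aligned}
\frac{\partial^2 f_i}{\partial \mu_i^2} &= \frac{f_i}{\sigma_i^2}\left[-1 + \frac{(y - \mu_i)^2}{\sigma_i^2}\right] \\
\frac{\partial^2 f_i}{\partial (\sigma_i^2)^2} &= \frac{\partial f_i}{\partial \sigma_i^2}\left[\frac{(y - \mu_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2}\right] + f_i\left[\frac{1}{2\sigma_i^4} - \frac{(y - \mu_i)^2}{\sigma_i^6}\right] \\
\frac{\partial^2 f_i}{\partial \mu_i \partial \sigma_i^2} &= \frac{\partial f_i}{\partial \sigma_i^2}\,\frac{(y - \mu_i)}{\sigma_i^2} - f_i\,\frac{(y - \mu_i)}{\sigma_i^4}
\end{aligned}$$

$$i = 1, 2$$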

When there is no sample separation information, such
that regime is unknown, $\theta$ is of dimension $(5 \times 1)$ or $(2K + 3) \times 1$,
where $K$ is the total number of explanatory variables
(i.e. $K = 1$ in this case). It follows that $\mathcal{I}$ is of
dimension $(2K + 3) \times (2K + 3)$. When there is partial sample
separation information, such that regime is partly known, $\theta$
is of dimension $(7 \times 1)$ or $(4K + 3) \times 1$, where $K$ is still
equal to 1. Therefore, $\mathcal{I}$ is of dimension $(4K + 3) \times (4K + 3)$.

2.3 The Value of Imperfect Information

We derive here one set of ratios: asymptotic variances
with regime partly known relative to regime known. We
also need the asymptotic variance ratios with regime unknown
relative to regime known, but these will be simply adopted
from Schmidt's study. The results are comparable, since the
same simulation techniques have been employed in evaluating
the information matrix.

All the results are presented in Tables 1 through 4,
for a variety of cases. It is to be noted that asymptotic
variance ratios are not given for the parameters $p_{11}$ and
$p_{01}$, since these parameters are not estimated at all when
the regime is known. All the figures in the tables are greater
than or equal to one, and the extent to which they differ
from one measures the value of imperfect sample separation
information, or just sample separation information, as the case
may be. That is, they measure how much we lose on efficiency
grounds in parameter estimation when we use imperfect
information, or no information at all, as compared to perfect
information when assigning regime classification.

The main interest here concerns the effects of the
parameters $p_{11}$ and $p_{01}$, which represent the level of reliability
or accuracy of the available information. In Schmidt's study,
when there is no sample separation information at all, only
the first five parameters were used. With the partial
information provided by our additional parameters $p_{11}$ and $p_{01}$, we
expect our asymptotic variance ratios to be less than or
equal to his asymptotic variance ratios. After all, any piece
of information on sample separability, even if not entirely
accurate, may facilitate identification of regime membership
for the observations, and thereby improve the efficiency of
the parameter estimates, as compared to when no information
is used at all. For purposes of comparison, we present
Schmidt's figures in parentheses underneath the figures we
derived.

We conduct four types of experiments here. First, for
a given set of parameter values, we vary our probabilities
of regime classification 2/. The different values assigned
to $p_{11}$ and $p_{01}$ represent the range from highly imperfect
information to almost perfect information on regime
classification. In the other three types of experiments, we
choose a particular $p_{11}$ and $p_{01}$ mix, and allow the following
to vary: the difference between the means, the variances,
and the mixing parameter.

Table 1 presents the results when we couple different
regime classification probabilities with fixed values of the
other parameters: $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, and
$\lambda = .5$. The values assigned to the means and variances are
not as restrictive as they might seem, in the sense that the
results are invariant to translation ($\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$
give the same results as $\mu_1 = -6$, $\mu_2 = -4$, $\sigma_1 = \sigma_2 = 1$)
and to scale (multiplying the means and the standard deviations
by a common positive constant gives the same results).
When $p_{11}$ is equal to $p_{01}$, then we have the special case
of there being no sample separation information at all, and
the ratios derived here should be the same as Schmidt's.
The difference (i.e. 41.7 versus 41.1 for $\mu_1$) is presumably
due to randomness in the simulation of the information matrix.
When $p_{11}$ is equal to $p_{00}$ (where $p_{00} = 1 - p_{01}$), that is, when
there are equal probabilities of correct classification into
each regime, then the ratios diminish considerably, with the
figures being lowest (efficiency is highest) when there is
greater certainty about rightly or wrongly assigning the
observation into each regime. The ratios approach one when
$p_{11}$ goes to zero and $p_{01}$ goes to one (alternatively, when
$p_{11}$ goes to one and $p_{01}$ goes to zero); that is, when there
is almost perfect sample separation information. In this
sense, the use of imperfect sample separation information
leads to estimates which are almost as efficient as those
derived when regime classification is completely known.

2/ These regime classification probabilities are assumed
constant for all observations in each case, and can be
estimated (as Lee and Porter did) by maximum likelihood.

Table 1. Ratios of Asymptotic Variances

Varying $p_{11}$ and $p_{01}$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$

Columns: $p_{11}$, $p_{01}$, and the parameter ratios, Partly Known
(Unknown)/Known, for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$. The table
has a main panel followed by a panel headed "Additional Results".

[The numerical entries of this table are not legible in the scanned source.]

n = 100,000. Figures in parentheses are the ratios of asymptotic
variances when regime is unknown relative to when regime is known.
This is true for the other tables in this chapter.

An interesting observation here is that the value of
information is unchanged when $p_{11}$ and $p_{01}$ are symmetric
(i.e. $p_{11} = .2$, $p_{01} = .8$ give the same results as $p_{11} = .8$,
$p_{01} = .2$; alternatively, $p_{11} = p_{00} = .2$ give the same results
as $p_{11} = p_{00} = .8$). This is a consequence of the identification
issue referred to in Lee and Porter, such that when
$x_{1j} = x_{2j}$ for all $j$, as they are here (they are both equal to
one), then the names of the two regimes can simply be
interchanged, and this holds true when there is no sample
separation information and even when there is imperfect sample
separation information. This does not really come as a
surprise since in the normal mixture model, the only parameters
being estimated in a regression sense are the means;
therefore, it makes no difference at all about having the same
probabilities for right or wrong regime classification, since
we can merely switch the names of the regimes.
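
The interchange of regime names can be illustrated numerically. In the sketch below (function names are assumptions), relabeling regime 1 as regime 2 swaps the means and variances, replaces $\lambda$ by $1 - \lambda$, and swaps $p_{11}$ with $p_{01}$; the joint density (2.3) is unchanged at every point, which is why the pair $(.2, .8)$ and the pair $(.8, .2)$ carry the same information.

```python
import math

def normal_pdf(y, mu, sigma_sq):
    """Normal density with mean mu and variance sigma_sq."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma_sq)) / math.sqrt(2.0 * math.pi * sigma_sq)

def joint_density(y, w, mu1, mu2, s1, s2, lam, p11, p01):
    """Joint density (2.3) of the observation y and the indicator w (0 or 1)."""
    q1 = w * p11 + (1 - w) * (1.0 - p11)
    q2 = w * p01 + (1 - w) * (1.0 - p01)
    return lam * normal_pdf(y, mu1, s1) * q1 + (1.0 - lam) * normal_pdf(y, mu2, s2) * q2

# Largest difference between the original and relabeled densities on a grid.
max_gap = 0.0
for k in range(-40, 61):
    y = k / 10.0
    for w in (0, 1):
        original = joint_density(y, w, 0.0, 2.0, 1.0, 1.0, 0.5, p11=0.2, p01=0.8)
        relabeled = joint_density(y, w, 2.0, 0.0, 1.0, 1.0, 0.5, p11=0.8, p01=0.2)
        max_gap = max(max_gap, abs(original - relabeled))
```

The value of `max_gap` should be zero up to floating-point rounding.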


Additional results show different $p_{11}$ and $p_{01}$ values
paired together. When $p_{11}$ and $p_{01}$ are close together but
are in the intermediate range (.4 to .6), then the ratios
are highest. This occurs when uncertainty about regime
classification is at its peak, since the imperfect information
indicates that there are almost equal chances of
misclassification into both regimes ($p_{11}$ and $p_{01}$ are close to .5).
Note that at the extreme, when $p_{11} = p_{01} = .5$, we have no
information at all. When $p_{11}$ and $p_{01}$ are close together but
are out of the intermediate range, then the ratios go down.
This means that when there is greater certainty of correct
regime classification into the two regimes, or when the
partial sample separation information is quite reliable for both
regimes, then the ratios decline and efficiency improves.

Tables 2, 3 and 4 illustrate the case of a particular
$p_{11}$, $p_{01}$ mix: we choose $p_{11} = .8$, $p_{01} = .2$. In Table 2,
$\mu_2$ is allowed to vary. The results are similar to Schmidt's
findings that the value of sample separation information
depends on the natural separation of the two regimes. As the
distributions become far apart ($\mu_2$ increases while $\mu_1$ is
constant), the ratios diminish and tend to approach one.
When the means are very close together, the resulting ratios
show the substantial gains in efficiency when information is
quite accurate as compared to using no information at all.

Table 3 takes the cases where $\lambda = .2$ and $\lambda = .5$.
The results are again similar to the earlier findings that
when the distributions are fairly close to each other
($\mu_1 = 0$, $\mu_2 = 2$) and the imperfect information is quite
reliable ($p_{11} = p_{00} = .8$), then there are large efficiency
gains in using partial information relative to using no
information at all. In addition, we observe that the value of
sample separation information is higher for the parameters
of the regime which is observed with the lower probability,
in this case, regime 1.

Table 2. Ratios of Asymptotic Variances

Varying $\mu_2$ when $\mu_1 = 0$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$

Columns: $\mu_2$, and the parameter ratios, Partly Known
(Unknown)/Known, for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$.

[The numerical entries of this table are not legible in the scanned source.]

Table 3. Ratios of Asymptotic Variances

Varying $\lambda$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $p_{11} = p_{00} = .8$

Columns: $\lambda$, and the parameter ratios, Partly Known
(Unknown)/Known, for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$.

[The numerical entries of this table are not legible in the scanned source.]

Table 4. Ratios of Asymptotic Variances

Varying $\sigma_2$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$

Columns: $\sigma_2$, and the parameter ratios, Partly Known
(Unknown)/Known, for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$.

[The numerical entries of this table are not legible in the scanned source.]

Table 4 gives results when the variances are not
equal. The larger the difference between the variances of
the two samples, the lower the ratios become. It is apparent
that not only does mean disparity between the regimes
contribute to distinct sample separation, but also disparity of the
variances. Another observation here is that the ratios are
higher for the variance parameter of the sample which has
the smaller variance, and the reason behind this is fairly
intuitive. A surprising finding here, though, is that the
decline in the ratios as the difference between the variances
widens is not monotonic for the mixing parameter when partial
information is available, and the reason for this is not
clear.

2.4 Summary

 

We have studied the value of imperfect sample separation
information in a simple normal mixture model, where all
the parameters have to be estimated. This was done under
different values for the probabilities of correct regime
classification. The ratios of asymptotic variances for
regime partly known relative to the asymptotic variances for
regime known were computed. These ratios are highest when
there is greater uncertainty about regime classification ($p_{11}$
and $p_{00}$ are in the intermediate range) and the ratios are
lowest when there is almost perfect certainty about right or
wrong classification for both regimes ($p_{11}$ and $p_{00}$ are in
the extreme range). In between is a continuum of values
depending on the reliability of the sample separation
information for each regime.

A variety of experiments were also conducted and these
show that the value of sample separation information largely
depends on how much alike the two samples are. When the sam-
ples are hard to distinguish from one another, then the value
of information is highest. At any rate, the presence of the
partial sample separation information tends to diminish the
value of any other additional information, since the figures
derived are considerably lower than those when there is no
sample separation information at all.

These results suggest that any information should be
used, even if there is uncertainty about its reliability or
accuracy, since even imperfect sample separation information
improves the efficiency of the estimates. Of course, the
more reliable the imperfect sample separation information,
the greater the gains in efficiency.

CHAPTER THREE

 

THE CASE OF NON-CONSTANT

REGIME CLASSIFICATION PROBABILITIES

3.1 Introduction

 

In the previous chapter, we considered and evaluated
the value of imperfect sample separation information in a
normal mixture model, where the imperfect information is re-
flected through constant probabilities of regime classifica-
tion. We concluded that the more reliable the imperfect in-
formation, the greater the gains in efficiency, since there
is greater certainty of right or wrong regime classification.

We now extend that model to a switching regression
case where there are at least two independent variables -- a
constant term and one or more other explanatory variables.
In addition, we consider the case when the classification
probabilities are non-constant, and in fact, can be modelled
as probit functions of the exogenous variables. The rationale
behind this is that the values of the explanatory variables
are highly likely to affect the regime classification of the
dependent variable, increasing the reliability of the imper-
fect information indicator. Consequently, treatment of the
probabilities as non-constant for each observation adds more
reliable information to the model and will hopefully improve
the efficiency of estimation.

The framework for the use of imperfect sample separation
information was derived from Lee and Porter, who used
switching regression techniques to model a supply function
for a railroad cartel. In their model, the observed regime
classifications were obtained from data from a trade magazine
(presumably reported with error) on whether there were price
wars or not. The probabilities that these regime classifications
were, in fact, correct were assumed constant, and therefore
independent of the exogenous variables.

Their model can be improved upon by postulating that
the classification probabilities are dependent on the exoge-
nous variables and will differ for each time period. Taking
the Lee and Porter application as a case in point, we note
that their explanatory variables include a Great Lakes dummy
variable and several dummy variables on structural changes.
The Great Lakes dummy variable documents when the Great Lakes
were made open to navigation so that the cartel faced its main
source of competition. The structural changes dummy variables
are used to proxy changes caused by the entry, acquisitions
or additions to existing networks in the railroad industry.
When the Great Lakes were made open to navigation, or when
there were instances of entry and new acquisitions, we expect
that there will be price cutting or non-cooperative behavior
among the firms in the cartel, due to the presence of other
competitors in the industry, and this will be reflected in the
imperfect indicators of information -- data from the trade
magazine.

Using this information during each time period adds to
the certainty on regime classification, as to whether there
were indeed price wars or not. This raises the probabilities
of correct classification and also leads to higher efficiencies
of parameter estimation, as opposed to the case when
constant probabilities are applied for each time period as
Lee and Porter did. Our suggested treatment of the derivation
of classification probabilities as non-constant seems to be a
plausible alternative to theirs in the sense that we use more
information (at no extra cost of obtaining this information)
in solving for these probabilities, which presumably improves
efficiency. Also, their model is a special case of ours, so
we can test the adequacy of their model against the alternative
of our model.

3.2 The Model

 

We extend the model of the previous chapter to the case
when there are at least two explanatory variables, and when
the probabilities of regime classification (i.e. $p_{11}$ and $p_{00}$)
are not fixed. Suppose for simplicity that $x_{1j} = x_{2j}$; we
call it $x_j$, so the basic switching regression model is:

$$y_j = x_j'\beta_1 + u_{1j} \quad \text{with probability } \lambda \text{ (regime 1)} \tag{3.1}$$

$$y_j = x_j'\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda) \text{ (regime 2)}$$

$\beta_1$ and $\beta_2$ are vectors of parameters. The error terms $u_{1j}$
and $u_{2j}$ are assumed to be independently and normally
distributed with means 0 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.

When there is imperfect sample separation information
or the regime is partly known, we can then consider an
observability model on probability classification like:

$$\begin{aligned}
p_{11j} &= F(x_j'\gamma_1) \quad \text{where } p_{11j} = \text{Prob}(w_j = 1 \mid I_j = 1) \text{ for each } j \\
p_{00j} &= F(x_j'\gamma_0) \quad \text{where } p_{00j} = \text{Prob}(w_j = 0 \mid I_j = 0) \text{ for each } j \\
p_{10j} &= 1 - p_{11j} \\
p_{01j} &= 1 - p_{00j}
\end{aligned}$$

where $F(\,\cdot\,)$ is a standard normal cumulative distribution
function, and $\gamma_1$ and $\gamma_0$ are vectors of parameters. $w_j$ is
the observed dichotomous indicator which provides sample
separation information, while $I_j$ is the latent dichotomous
indicator of the actual regime classification. In essence,
the regime classification probabilities are probit models
of observability. This contains the Lee and Porter model as
a special case, that is, all the elements of $\gamma_1$ and $\gamma_0$ are
zero, except for those corresponding to the constant term.

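The joint density function for $y_j$ and $w_j$ is then re-written
from (2.3) and given as:

$$\begin{aligned}
f_j &= f(y_j, w_j; \theta) \\
 &= \lambda f_1(y_j)\big(w_j p_{11j} + (1 - w_j)(1 - p_{11j})\big) + (1 - \lambda) f_2(y_j)\big(w_j p_{01j} + (1 - w_j)(1 - p_{01j})\big) \\
 &= \lambda f_1(y_j)\big(w_j F(x_j'\gamma_1) + (1 - w_j) F(-x_j'\gamma_1)\big) + (1 - \lambda) f_2(y_j)\big(w_j F(-x_j'\gamma_0) + (1 - w_j) F(x_j'\gamma_0)\big)
\end{aligned} \tag{3.2}$$

where:

$$f_i(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(y_j - x_j'\beta_i)^2}{2\sigma_i^2}\right]$$

$$p_{11j} = \int_{-\infty}^{x_j'\gamma_1} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{v^2}{2}\right] dv$$

$$p_{00j} = \int_{-\infty}^{x_j'\gamma_0} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{v^2}{2}\right] dv$$

$$i = 1, 2; \quad j = 1, \ldots, n$$

That is, $f_1(y_j)$ and $f_2(y_j)$ are normal probability density
functions with means and variances given by $N(x_j'\beta_1, \sigma_1^2)$
and $N(x_j'\beta_2, \sigma_2^2)$, respectively, and $p_{11j}$ and $p_{00j}$ are
probabilities of correct regime classification denoted as
probit models.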
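
The density (3.2) can be written out as a short function. This is a sketch under assumed names (`joint_density_probit` and its argument order are illustrative); the standard normal distribution function $F$ is computed from the error function in the Python standard library.

```python
import math

def std_normal_cdf(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(y, mean, sigma_sq):
    """Normal density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma_sq)) / math.sqrt(2.0 * math.pi * sigma_sq)

def inner(a, b):
    """Inner product x'b for two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def joint_density_probit(y, w, x, beta1, beta2, sigma1_sq, sigma2_sq,
                         lam, gamma1, gamma0):
    """Joint density (3.2) with probit classification probabilities."""
    p11 = std_normal_cdf(inner(x, gamma1))      # Prob(w = 1 | I = 1)
    p00 = std_normal_cdf(inner(x, gamma0))      # Prob(w = 0 | I = 0)
    q1 = w * p11 + (1 - w) * (1.0 - p11)
    q2 = w * (1.0 - p00) + (1 - w) * p00
    f1 = normal_pdf(y, inner(x, beta1), sigma1_sq)
    f2 = normal_pdf(y, inner(x, beta2), sigma2_sq)
    return lam * f1 * q1 + (1.0 - lam) * f2 * q2

# Illustrative values: a constant term and one other explanatory variable.
x = (1.0, 0.3)
args = dict(x=x, beta1=(0.0, 1.0), beta2=(2.0, -1.0),
            sigma1_sq=1.0, sigma2_sq=1.0, lam=0.5)
d1 = joint_density_probit(0.5, 1, gamma1=(0.8, 0.5), gamma0=(0.8, -0.5), **args)
d0 = joint_density_probit(0.5, 0, gamma1=(0.8, 0.5), gamma0=(0.8, -0.5), **args)
```

Summing `d1` and `d0` recovers the mixture density of $y_j$ alone, and setting the slope elements of $\gamma_1$ and $\gamma_0$ to zero gives constant classification probabilities, as in the special case described above.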

3.3 Derivation of Asymptotic Variances

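When the regime is known, the asymptotic variances
of $\sqrt{n}(\hat{\beta}_1 - \beta_1)$, $\sqrt{n}(\hat{\beta}_2 - \beta_2)$, $\sqrt{n}(\hat{\sigma}_1^2 - \sigma_1^2)$,
$\sqrt{n}(\hat{\sigma}_2^2 - \sigma_2^2)$, and $\sqrt{n}(\hat{\lambda} - \lambda)$ are, respectively:

$$\frac{\sigma_1^2}{\lambda}\left(\lim_{n \to \infty} \frac{1}{n}\sum_{j=1}^{n} x_j x_j'\right)^{-1};$$

$$\frac{\sigma_2^2}{1 - \lambda}\left(\lim_{n \to \infty} \frac{1}{n}\sum_{j=1}^{n} x_j x_j'\right)^{-1};$$

$$\frac{2\sigma_1^4}{\lambda}; \qquad \frac{2\sigma_2^4}{1 - \lambda}; \qquad \text{and} \qquad \lambda(1 - \lambda).$$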
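
For a finite sample, the limit in the first two expressions can be approximated by the sample moment matrix. The helper below is a hypothetical sketch (its name and arguments are assumptions); with a constant term as the only regressor it reproduces the value $\sigma_1^2 / \lambda$ from the previous chapter.

```python
import numpy as np

def known_regime_avar_beta(X, sigma_sq, regime_prob):
    """Approximate asymptotic variance of sqrt(n) * (beta_hat - beta), regime known.

    X is the n-by-K matrix of explanatory variables; the limit of
    (1/n) * sum of x_j x_j' is replaced by its sample value.
    """
    n = X.shape[0]
    moment = X.T @ X / n
    return sigma_sq / regime_prob * np.linalg.inv(moment)

# Special case of Chapter 2: the only regressor is the constant term.
ones = np.ones((50, 1))
avar_mu1 = known_regime_avar_beta(ones, sigma_sq=1.0, regime_prob=0.5)

# Constant term plus one standardized regressor (illustrative data).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10_000), rng.normal(size=10_000)])
avar_beta1 = known_regime_avar_beta(X, sigma_sq=1.0, regime_prob=0.5)
```

In the second case the moment matrix is close to the identity, so each diagonal element should be close to 2.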


As in the previous chapter, there are no asymptotic
variances for the parameters $\gamma_1$ and $\gamma_0$ (which enter the $p_{11j}$
and $p_{00j}$ probability functions of regime classification) since
these parameters are irrelevant when regimes are completely
known. The above expressions for the asymptotic variances
of $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are derived from the inverse of
the information matrices from the corresponding likelihood
functions of the known densities associated with the
respective regimes. The asymptotic variance of $\hat{\lambda}$ comes from that
of the binomial distribution.

For cases when the regime is either completely unknown
or is partly known, the asymptotic variances of $\sqrt{n}(\hat{\theta} - \theta)$
are derived in the same way: from the diagonal elements of
the inverse of the Fisher information matrix. Therefore,
$\sqrt{n}(\hat{\theta} - \theta)$ approaches the distribution specified by the
following expression: $N\left(0, \lim\left(\frac{1}{n}\mathcal{I}\right)^{-1}\right)$.

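The information matrix is defined as:

$$\mathcal{I} = -E\left[\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right]$$

where:

$$L = \prod_{j=1}^{n} f_j, \qquad \ln L = \sum_{j=1}^{n} \ln f_j$$

Therefore, when the regime is completely unknown, $f$
corresponds to the density function of (2.2) given as:

$$f(y_j; \theta) = \lambda f_1(y_j) + (1 - \lambda) f_2(y_j) \tag{3.3}$$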

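$\theta$ is a (2K + 3) x 1 vector of parameters, where K is the
number of explanatory variables. In particular, $\theta = (\beta_1',
\beta_2', \sigma_1^2, \sigma_2^2, \lambda)'$ where $\beta_1 = (\beta_{11}, \beta_{12}, \ldots, \beta_{1K})'$
and $\beta_2 = (\beta_{21}, \beta_{22}, \ldots, \beta_{2K})'$. When the regime is
partly known, $f$ corresponds to the density function in (3.2):

$$f(y_j, w_j; \theta) = \lambda f_1(y_j) \left( w_j F(x_j'\gamma_1) + (1 - w_j)(1 - F(x_j'\gamma_1)) \right) + (1 - \lambda) f_2(y_j) \left( w_j (1 - F(x_j'\gamma_0)) + (1 - w_j) F(x_j'\gamma_0) \right)$$

$\theta$ is a (4K + 3) x 1 vector of parameters given by $(\beta_1',
\beta_2', \sigma_1^2, \sigma_2^2, \lambda, \gamma_1', \gamma_0')'$ where $\beta_1$ and $\beta_2$ are
defined as previously; and $\gamma_1 = (\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1K})'$ and
$\gamma_0 = (\gamma_{01}, \gamma_{02}, \ldots, \gamma_{0K})'$ are additional parameters.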

As in the earlier chapter, the expected value of the
expression that represents the information matrix was analy-
tically intractable so that we instead calculate the informa-
tion matrix by simulation techniques in either of two ways:

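$$(1) \quad \mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right] = E \sum_j \left[ \frac{1}{f_j^2} \frac{\partial f_j}{\partial\theta} \frac{\partial f_j}{\partial\theta'} - \frac{1}{f_j} \frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} \right]$$

$$(2) \quad \mathcal{I} = E \sum_j \left[ \frac{1}{f_j^2} \frac{\partial f_j}{\partial\theta} \frac{\partial f_j}{\partial\theta'} \right]$$
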

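The second method of calculation follows from the first
method, in the sense that, in the limit, the expression

$$-E \sum_j \left[ \frac{1}{f_j} \frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} \right]$$

goes to zero. In addition, the second method has the added
advantage of being positive definite always, and not just in
the limit. For this reason, we choose the second method of
calculation, and throughout the experiments we will be
conducting, the information matrix 3/ is to be calculated as
follows:

$$\mathcal{I} = E \sum_j \left[ \frac{1}{f_j^2} \frac{\partial f_j}{\partial\theta} \frac{\partial f_j}{\partial\theta'} \right]$$

3/ The expressions for the elements of the information
matrix when the first method of calculation is used are also
derived and are given in Appendix A.

For the case where regime classification is completely
unknown, or when there is no sample separation information
at all, the first derivatives of $f(y; \theta)$ (we drop the
subscript $j$ for simplicity) with respect to $\theta$ are:

$$\frac{\partial f}{\partial\beta_{1k}} = \lambda \frac{\partial f_1}{\partial\beta_{1k}}$$

$$\frac{\partial f}{\partial\beta_{2k}} = (1 - \lambda) \frac{\partial f_2}{\partial\beta_{2k}}$$

$$\frac{\partial f}{\partial\sigma_1^2} = \lambda \frac{\partial f_1}{\partial\sigma_1^2}$$

$$\frac{\partial f}{\partial\sigma_2^2} = (1 - \lambda) \frac{\partial f_2}{\partial\sigma_2^2}$$

$$\frac{\partial f}{\partial\lambda} = f_1 - f_2$$

where:

$$\frac{\partial f_i}{\partial\beta_{ik}} = f_i \frac{(y - x'\beta_i)\, x_k}{\sigma_i^2}$$

$$\frac{\partial f_i}{\partial\sigma_i^2} = \frac{f_i}{2\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1 \right]$$

$$i = 1, 2; \quad k = 1, 2, \ldots, K$$

Since $\theta$ is of dimension (2K + 3) x 1, it follows that
$\mathcal{I}$ is of dimension (2K + 3) x (2K + 3).

For the case where regime classification is partly
known, or when there is imperfect sample separation information,
the first derivatives of $f(y, w; \theta)$ with respect to $\theta$
are: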

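$$\frac{\partial f}{\partial\beta_{1k}} = \lambda Q_1 \frac{\partial f_1}{\partial\beta_{1k}}$$

$$\frac{\partial f}{\partial\beta_{2k}} = (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\beta_{2k}}$$

$$\frac{\partial f}{\partial\sigma_1^2} = \lambda Q_1 \frac{\partial f_1}{\partial\sigma_1^2}$$

$$\frac{\partial f}{\partial\sigma_2^2} = (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\sigma_2^2}$$

$$\frac{\partial f}{\partial\lambda} = f_1 Q_1 - f_2 Q_2$$

$$\frac{\partial f}{\partial\gamma_{1k}} = \lambda f_1 \left( w - (1 - w) \right) \frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}}$$

$$\frac{\partial f}{\partial\gamma_{0k}} = -(1 - \lambda) f_2 \left( w - (1 - w) \right) \frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}}$$

where:

$$Q_1 = w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1))$$

$$Q_2 = w (1 - F(x'\gamma_0)) + (1 - w) F(x'\gamma_0)$$

$$\frac{\partial f_i}{\partial\beta_{ik}} = f_i \frac{(y - x'\beta_i)\, x_k}{\sigma_i^2}$$

$$\frac{\partial f_i}{\partial\sigma_i^2} = \frac{f_i}{2\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1 \right]$$

$$\frac{\partial F(x'\gamma_s)}{\partial\gamma_{sk}} = \phi(x'\gamma_s)\, x_k$$

$$i = 1, 2; \quad k = 1, 2, \ldots, K; \quad s = 0, 1$$

where $\phi(x'\gamma_s)$ is a standard normal probability density
function. Since $\theta$ here is of dimension (4K + 3) x 1, it
follows that $\mathcal{I}$ is of dimension (4K + 3) x (4K + 3).
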
The simulation involves a large number of trials derived
from a normal random variable generator. For any set
of $\theta$ values, draws were made from the switching regression
model and the information matrix was obtained by averaging
the expressions derived from the first derivative components
of the density function over the number of replications,
given that the regime is either unknown or partly known.
The asymptotic variances are the corresponding diagonal
elements of the inverse of the information matrix.
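
The simulation just described can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program used in the study: it covers the partly-known case with two regressors (a constant and $\exp(-x_3)$ with $x_3$ standard normal), averages the outer product of the score (the second method), and inverts the result by Gauss-Jordan elimination. All function names, the seed and the sample size are hypothetical.

```python
import math
import random

def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def score_terms(y, w, x, b1, b2, v1, v2, lam, g1, g0):
    """Return f(y, w; theta) and its gradient for the partly-known regime."""
    r1, r2 = y - dot(x, b1), y - dot(x, b2)
    f1 = pdf(r1 / math.sqrt(v1)) / math.sqrt(v1)
    f2 = pdf(r2 / math.sqrt(v2)) / math.sqrt(v2)
    z1, z0 = dot(x, g1), dot(x, g0)
    q1 = w * cdf(z1) + (1 - w) * (1.0 - cdf(z1))
    q2 = w * (1.0 - cdf(z0)) + (1 - w) * cdf(z0)
    f = lam * f1 * q1 + (1.0 - lam) * f2 * q2
    sign = w - (1 - w)
    grad = (
        [lam * q1 * f1 * r1 * xk / v1 for xk in x]                      # beta_1
        + [(1.0 - lam) * q2 * f2 * r2 * xk / v2 for xk in x]            # beta_2
        + [lam * q1 * f1 * (r1 * r1 / v1 - 1.0) / (2.0 * v1)]           # sigma_1^2
        + [(1.0 - lam) * q2 * f2 * (r2 * r2 / v2 - 1.0) / (2.0 * v2)]   # sigma_2^2
        + [f1 * q1 - f2 * q2]                                           # lambda
        + [lam * f1 * sign * pdf(z1) * xk for xk in x]                  # gamma_1
        + [-(1.0 - lam) * f2 * sign * pdf(z0) * xk for xk in x]         # gamma_0
    )
    return f, grad

def simulate_information(n, b1, b2, v1, v2, lam, g1, g0, seed=0):
    """Average the outer product of the score over n simulated draws."""
    rng = random.Random(seed)
    dim = 4 * len(b1) + 3
    info = [[0.0] * dim for _ in range(dim)]
    for _ in range(n):
        x = [1.0, math.exp(-rng.gauss(0.0, 1.0))]
        if rng.random() < lam:                          # regime 1 generates y
            y = dot(x, b1) + rng.gauss(0.0, math.sqrt(v1))
            w = 1 if rng.random() < cdf(dot(x, g1)) else 0
        else:                                           # regime 2 generates y
            y = dot(x, b2) + rng.gauss(0.0, math.sqrt(v2))
            w = 0 if rng.random() < cdf(dot(x, g0)) else 1
        f, grad = score_terms(y, w, x, b1, b2, v1, v2, lam, g1, g0)
        s = [gk / f for gk in grad]                     # score of ln f
        for a in range(dim):
            for b in range(dim):
                info[a][b] += s[a] * s[b] / n
    return info

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

info = simulate_information(2000, [1.0, 1.0], [2.0, 2.0], 1.0, 1.0, 0.5,
                            [1.0, -1.0], [1.0, 1.0])
avar = [row[i] for i, row in enumerate(invert(info))]
print([round(v, 2) for v in avar])
```

Dividing each simulated variance by its known-regime counterpart would give ratios of the kind reported in this chapter. The outer-product form keeps the simulated matrix symmetric and positive semi-definite at any sample size, which is the property that motivates the second method.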

3.4 The Value of Imperfect Information

 

We derive here two sets of ratios -- asymptotic variances
with regime unknown relative to regime known; and
asymptotic variances with regime partly known relative to
regime known. The ratios in the former are greater than or
equal to those in the latter, since the presence of informa-
tion, even if imperfect, improves the efficiency of parameter
estimation. In addition, all the figures we will be deriving
are greater than or equal to one, and the extent to which
they differ from one measures the value of information, or
imperfect information, as the case may be. These ratios il-
lustrate how close to full information efficiency our esti-
mates will be when we are faced either with no information
at all or with unreliable information.

We assume we have two exogenous variables $x_1$ and $x_2$,
where $x_1$ is a unit vector and $x_2$ is defined as $\exp(-x_3)$,
where $x_3$ is a standard normal random variable, which we also
derive from the normal random variable generator. We
essentially conduct experiments of two types here. First, for a
given set of $\gamma$ values which denote some information, we
vary the $\beta$ parameters to find out the effects of the same
amount of information on the estimation efficiencies of
different regime distributions. Second, for a given set of $\beta$
values, we vary the $\gamma$ parameters to find out the effects of
different levels of information observability about regime
classification on the estimation efficiencies of a particular
sample distribution.

Given arbitrary values for $\theta = (\beta_{11}, \beta_{12}, \beta_{21},
\beta_{22}, \sigma_1^2, \sigma_2^2, \lambda, \gamma_{11}, \gamma_{12}, \gamma_{01}, \gamma_{02})'$ (we chose
$\theta = (1, 1, 2, 2, 1, 1, .5, 1, -1, 1, 1)'$ for regime partly
known), we initially compare results for the ratios of asymptotic
variances when we have n = 5000 and n = 20000. Although
there are differences in the absolute magnitudes of the
figures which range from .1 to .7 for both regime partly known
and unknown, the difference in computer costs makes us opt for
the smaller sample size, since the relationship among the
relative magnitudes prevails.4/ We therefore made use of a
sample size of 5000 for all our experiments.

4/ For the same $\theta$ values, we also compare results under
both methods of solving for the information matrix. There
are differences in absolute magnitudes that become smaller
as the sample size increases from 5000 to 20000. Under the
first method, the differences in the ratios between the two
sample sizes range from .1 to .4, while under the second
method, it is from .1 to .7. It is expected that as the sample
size increases some more, the absolute difference between
the two methods will decline. Although the absolute magnitude
differences persist, the relationships among the relative
magnitudes are fairly constant. This at least partially
justifies our choice of method 2 for calculating the information
matrix and a sample size of 5000.

We first need to establish the non-informative case.
In the previous chapter, we had discussed an implicit
"informativeness" condition in the model. When $p_{11j} = 1$ and
$p_{00j} = 1$, then the indicator $w_j$ provides perfect sample
separation information. Partial sample separation information
is given by $w_j$ when $p_{11j}$ is not equal to $p_{01j}$, which is
equivalent to the condition that $p_{11j} \neq 1 - p_{00j}$, or that
$p_{11j} + p_{00j} \neq 1$. This implies that $w_j$ provides no sample
separation information at all when $p_{11j} = 1 - p_{00j}$, or
$p_{11j} + p_{00j} = 1$.

In terms of our model, where $p_{11j}$ and $p_{00j}$ are denoted
as probit functions, the "informativeness" condition can
be expressed as a simple restriction on the parameters $\gamma_1$
and $\gamma_0$, which enter our probit models of information
observability. Information is not provided when $\gamma_1 = -\gamma_0$, since
then $p_{11j} + p_{00j} = 1$; that is, $F(x_j'\gamma_1) + F(x_j'\gamma_0) =
F(x_j'\gamma_1) + F(-x_j'\gamma_1) = 1$ for any $x_j$, where $F(\cdot)$ is a
standard normal cumulative distribution function. Combinations
of parameter values where $\gamma_1 = -\gamma_0$ can be illustrated
by any number of examples. A case in point where no
information is provided is when $\gamma_1 = \gamma_0 = 0$. This implies
that $p_{11j} = F(x_j'\gamma_1) = F(0) = .5$ and $p_{00j} = F(x_j'\gamma_0) =
F(0) = .5$. This was the non-informative case we had in the
previous chapter where the probabilities of regime classification
were assumed constant, i.e. $p_{11} = p_{00} = .5$.
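
The non-informative restriction can be checked numerically. The sketch below is illustrative only (the helper `classification_weights` and the chosen values are hypothetical): it verifies that when $\gamma_0 = -\gamma_1$ the probability of observing a given $w$ is the same under both regimes, so the density factors as $Q \cdot (\lambda f_1 + (1 - \lambda) f_2)$ and $w$ carries no information about $\beta$, $\sigma^2$ or $\lambda$.

```python
import math
import random

def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def classification_weights(w, x, gamma1, gamma0):
    """Return (Q1, Q2): the probability of observing w under regime 1 and regime 2."""
    p11 = cdf(sum(a * b for a, b in zip(x, gamma1)))
    p00 = cdf(sum(a * b for a, b in zip(x, gamma0)))
    q1 = w * p11 + (1 - w) * (1.0 - p11)
    q2 = w * (1.0 - p00) + (1 - w) * p00
    return q1, q2

rng = random.Random(0)
gamma1 = [1.0, -1.0]
gamma0 = [-g for g in gamma1]        # the non-informative restriction
max_gap = 0.0
for _ in range(1000):
    x = [1.0, math.exp(-rng.gauss(0.0, 1.0))]
    for w in (0, 1):
        q1, q2 = classification_weights(w, x, gamma1, gamma0)
        max_gap = max(max_gap, abs(q1 - q2))
print(max_gap < 1e-12)
```

With an informative pair such as $\gamma_1 = (1, -1)'$, $\gamma_0 = (1, 1)'$, the two weights differ, which is what allows the indicator to help separate the samples.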

We now proceed with the first type of experiments we
have to conduct, where for given $\gamma$ values, we vary our $\beta$
parameters. We choose $\gamma = (1, -1, 1, 1)'$ where there is some
information provided, i.e. $\gamma_1 \neq -\gamma_0$. The first case is
when we allow the $\beta$ parameters of regime 2 to deviate
uniformly from regime 1, such that $\beta_2 = h\beta_1$ ($h = 1, 2, 4$).
We hold $\sigma_1^2 = \sigma_2^2 = 1$ and $\lambda = .5$. The results are
presented in Table 5. Figures in parentheses are the ratios
when the regime is unknown relative to when regime is known.

When $\beta_1 = \beta_2$ so that the regression equations are
the same for both regimes, the presence of information given
by $\gamma_1$ and $\gamma_0$ greatly improves the efficiency of the
estimates. With no information at all, the ratios go to $\infty$,
since the samples are impossible to disentangle, while the
ratios are finite with some information available. An interesting
observation here is that the value of sample separation
information is much greater for the slopes than for the
intercepts when the regime is partly known. For the case of
the estimated mixing parameter, $\lambda$, the value of information
for regime partly known is $\infty$, and for regime unknown is 0.
There is no meaning that can be attached to this parameter
in this instance, since the samples are difficult to distinguish
from each other anyway.

The choices for $\beta_1$ and $\beta_2$ are of course restrictive.
However, note that the results are invariant with regards to
location and scale, as long as $\beta_1 = \beta_2$ and $\sigma_1^2 = \sigma_2^2$.
$\beta_1 = (1, 0)'$, $\beta_2 = (1, 0)'$ gives the same results as
$\beta_1 = (1, 1)'$, $\beta_2 = (1, 1)'$.

When $\beta_2 = h\beta_1$ ($h \neq 1$), so that the intercept and
slope of one equation move away from the intercept and slope
of the other equation by the same proportion, then the value
of sample separation information decreases monotonically as
h increases. Note the large decline in the ratios of variances
for the estimated slopes as soon as the samples become
distinct from each other. When the intercepts and
slopes of the two equations are sufficiently far apart, the
ratios when the regime is partly known or is completely
unknown tend to approach one. In addition, the ratios tend
to equal each other in both cases of observability so there
is very little value in obtaining sample separation information
or using imperfect information (when available), when
the regression equations are clearly distinguishable.

Table 5. Ratios of Asymptotic Variances
Varying $h$ ($\beta_2 = h\beta_1$) when $\beta_1 = (1, 1)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
Columns: parameters ($\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$); ratios, Partly Known (Unknown)/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
Note: n = 5000. Figures in parentheses are the ratios of asymptotic variances when regime is unknown relative to when regime is known.
[Table entries are not legible in the source scan.]

The second case we consider is when the regression
equations are made distinct from each other by moving the
slopes away, but keeping the intercepts constant. The results
are presented in Table 6. The value of sample separation
information decreases monotonically, as $\beta_{22}$ increases
with $\beta_{12} = 0$. Again, the decline in the ratios is very
steep as soon as the samples are made distinguishable, i.e.
$\beta = (0, 0, 0, 0)'$ and $\beta = (0, 0, 0, 1)'$. As the slopes
move farther away, the decline in the ratios is not very
great, or is rather slow. As before, there is very little
value in obtaining sample separation information or using
imperfect information when the samples are clearly distinct,
since the ratios with partial information and with no
information at all tend to equalize.

Table 6. Ratios of Asymptotic Variances
Varying $\beta_{22}$ when $\beta = (0, 0, 0, \beta_{22})'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
Columns: parameters ($\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$); ratios, Partly Known (Unknown)/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
[Table entries are not legible in the source scan.]

The third case is when the regression equations are
made distinct by moving the intercepts farther away, but
keeping the slopes constant. The results are in Table 7.
Here, the value of sample separation information declines
but the decline is not monotonic for the estimated intercepts
and variances; the decline is monotonic for the slopes
though. This same observation was also found in Kiefer's
study (1979) of a normal mixture model. When the intercepts
are close together (in this case, they are equal), wrong
classification does not seriously affect the quality of the
estimates. Then, as the intercepts move farther away, the
effects of misclassification become more serious and the
estimates suffer. When the intercepts become still farther
apart, the probability of misclassification becomes so small
that the estimates become almost as efficient as estimates
based on known sample separation. As the intercepts move
away from each other, the decline in the ratios is more
substantial, or faster as compared to the case when the
intercepts are held constant, but the slopes are moved farther
apart. Again, there is very little value to obtaining
information when the regression equations are clearly distinct,
since the ratios with partial information and with no
information at all tend to equalize.

Table 7. Ratios of Asymptotic Variances
Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
Columns: parameters ($\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$); ratios, Partly Known (Unknown)/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
[Table entries are not legible in the source scan.]

The very large values of the variance ratios for the
estimated intercepts, variances and mixing parameter, when
the samples are sufficiently close and when there is no
information at all, seem to suggest that the intercept is a more
important component of the regression equation in determining
separability of the two distributions, as compared to the
slope. It is more difficult to distinguish one sample from
the other when the intercepts are close together rather than
when the slopes are. Compare the cases of $\beta = (0, 0, 0, 1)'$
and $\beta = (0, 0, 0, 2)'$ in Table 6 as against the cases of
$\beta = (0, 0, 1, 0)'$ and $\beta = (0, 0, 2, 0)'$ in Table 7.

Note that our values for $\beta_1$ and $\beta_2$ are restrictive,
but they are invariant with regards to translation, as long
as the other parameters in $\theta$ are not changed. $\beta_1 = (1, 1)'$,
$\beta_2 = (2, 1)'$ gives the same results as $\beta_1 = (0, 0)'$, $\beta_2 =
(1, 0)'$. A related observation is that $\beta_1 = (0, 0)'$, $\beta_2 =
(2, 0)'$ gives almost the same results as $\beta_1 = (0)$, $\beta_2 = (2)$
where the latter comes from a normal mixture model. The
ratios in the former are slightly bigger than the ratios in the
latter, since there are more parameters to estimate in the
former, even if $\beta_{12} = \beta_{22} = 0$.5/ When we estimate a normal
mixture model, the ratios corresponding to $\hat\beta_{11}$, $\hat\beta_{21}$,
$\hat\sigma_1^2$, $\hat\sigma_2^2$ and $\hat\lambda$ are 14.1 (50.4), 4.9 (50.5), 2.6 (116.9),
3.7 (16.2) and 7.0 (98.9), respectively. When regime is
unknown, the ratios Schmidt (1981) derived in an earlier paper
are very similar to the above figures in parentheses. The
only difference is that Schmidt's ratios are smaller (i.e.
41.1, 40.4, 12.7, 12.6 and 78.8 as presented in Table 1 of
the previous chapter) presumably due to a larger sample size
(n = 100000) so that the results are much tighter.

5/ Note that when a row and column corresponding to a
certain parameter is deleted, this implies that either the
model does not contain this parameter, or that the parameter
is part of the model but is known a priori and need not be
estimated at all. In the former case, the value of information
is more important when the model is more complicated,
or when $\theta$ has more parameters even if both models are
presented with the same amount of information in $\gamma_1$ and $\gamma_0$.
In the latter case, when some parameters are known a priori
and need not be estimated, resulting ratios are lower since
they understate the true value of information.

We now turn to the second type of experiments we
will be conducting,6/ that of varying the $\gamma$ parameters given
particular $\beta$ values to find out the effects of different
levels of observability on the efficiency of parameter estimation
with fixed regression parameters.

6/ This is the extent of our experimentation in this
chapter. We will not attempt to change the variances nor
the mixing parameter, since the earlier chapter had already
established the results for these cases; that is, the value
of sample separation information is higher for the parameters
of the regime which is observed with the lower probability,
and higher for the variance parameter of the sample which
has the smaller variance.

We had earlier established that the intercept terms
are more important than the slope coefficients in determining
regime classification since ratios tend to be higher (the
value of sample separation information is more important) when
the intercepts are moved farther away, rather than when the
slopes are moved apart. For this reason, we choose a $\beta$ set
equal to $(0, 0, 2, 0)'$ where the slopes are equal but the
intercepts are different. Note that $\beta = (0, 0, 2, 0)'$ is
invariant with regards to transformation to some other $\beta$ forms,
i.e. $\beta = (2, 2, 4, 2)'$ and $\beta = (2, 0, 4, 0)'$.

Our first case is presented in Table 8. Given $\gamma_1$,
the $\gamma_0$ combinations are arranged from highest ratios (least
information so most inefficient) to lowest ratios (most
information so most efficient). When $\gamma_1 = -\gamma_0$, this is
the case of no information and when $\gamma_1 = \gamma_0$, this is the
case of the most information.

Table 8. Ratios of Asymptotic Variances
Varying $\gamma_0$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma_1 = (1, -1)'$
Columns: parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); ratios, Partly Known/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
[Table entries are not legible in the source scan.]

Given $\gamma_1$, the value of information is higher when
we change the slope of the probit model, $\gamma_{02}$ (keeping $\gamma_{01}
= 0$) rather than the intercept of the probit model, $\gamma_{01}$
(keeping $\gamma_{02} = 0$). This implies that the intercept term in
the probit function is more important in increasing the
efficiency of the parameter estimates given the available
information. Given $\gamma_{02}$ and $\gamma_1$, ratios are lower when $\gamma_{01}$
is higher so that there is more efficiency here, and ratios
are higher when $\gamma_{01}$ is lower so that there is less efficiency.
The transition from least information (smaller $\gamma_{01}$) to
most information (larger $\gamma_{01}$) improves efficiency when $\gamma_0$
is closer to $\gamma_1$ values. The most efficient estimates occur
when $\gamma_1 = \gamma_0$. This implies that the quality of estimates
is best when there is equal certainty for the sample separation
to be correct for both regimes.

$\gamma_1 = (1, -1)'$, $\gamma_0 = (1, -1)'$ is invariant to $\gamma_1 =
(-1, 1)'$, $\gamma_0 = (-1, 1)'$. This reflects the fact that $\gamma_1^*
= -\gamma_1$ and $\gamma_0^* = -\gamma_0$ result in the same value of sample
separation information as did $\gamma_1$ and $\gamma_0$. This follows from
the "non-informativeness" condition on $\gamma_1$ and $\gamma_0$ when $\gamma_1
= -\gamma_0$. By the same reasoning, the information reflected
in $\gamma_1$ is no different from that in $\gamma_1^*$ (and likewise for
$\gamma_0$ and $\gamma_0^*$) when $\gamma_1^* = -\gamma_1$ and $\gamma_0^* = -\gamma_0$, when $\gamma_1 =
\gamma_0$. This follows from the relationship that $p_{11}^* + p_{11} = 1$
and $p_{00}^* + p_{00} = 1$. On the other hand, $\gamma_1 = (1, -1)'$,
$\gamma_0 = (1, 1)'$ is not invariant to $\gamma_1 = (1, 1)'$, $\gamma_0 = (1,
-1)'$. That is, $\gamma_1^* = \gamma_0$ and $\gamma_0^* = \gamma_1$ do not imply the
same value of sample separation information, when $\gamma_1 \neq \gamma_0$.

We examine next the case when $\gamma_{11} = -\gamma_{01}$, so that
the intercept terms imply no information, and we vary the
slope terms. The results are presented in Table 9. The
ratios are again arranged from highest (no information) to
lowest (most information). The classification probabilities
implied by the $\gamma_1$ and $\gamma_0$ parameters become higher (so that
ratios become lower and efficiency improves) as $\gamma_{12}$ and $\gamma_{02}$
assume non-zero values. Lower ratios result when $\gamma_{02}$ is
non-zero (keeping $\gamma_{12} = 0$) than when $\gamma_{12}$ is non-zero
(keeping $\gamma_{02} = 0$), since the probabilities implied by $\gamma_1 =
(1, 1)'$, $\gamma_0 = (-1, 0)'$ represent a wider divergence in
probabilities $p_{11}$ and $p_{00}$ than that given by the combination
$\gamma_1 = (1, 0)'$, $\gamma_0 = (-1, 1)'$ due to the fact that the
probability associated with $\gamma_1 = (1, 1)'$ is higher than that of
$\gamma_0 = (-1, 1)'$. It is to be noted that the wider the difference
in probabilities $p_{11}$ and $p_{00}$ (particularly in the
intermediate range of probability values), the less certainty
there is about the regime classification information, and it
follows that the estimates will be less efficient. The
exception here is the non-informative case of $p_{11} = p_{00} = .5$,
where there is no difference in the probabilities but
efficiency is lowest (since it is non-informative).

The additional information provided by the non-zero
$\gamma_{12}$ and $\gamma_{02}$ parameters improves the efficiency of the
estimates as opposed to the case when only $\gamma_{11}$ and $\gamma_{01}$ are
assigned non-zero values (and $\gamma_{12} = \gamma_{02} = 0$). In effect,
this implies that modelling the classification probabilities
in such a way that they are not constant for every observation
increases the quality of the estimates, compared to the
case in which these classification probabilities are fixed
for all observations ($\gamma_{12} = \gamma_{02} = 0$).

Table 9. Ratios of Asymptotic Variances
Varying $\gamma_{12}$, $\gamma_{02}$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, \gamma_{12}, -1, \gamma_{02})'$
Columns: parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); ratios, Partly Known/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
[Table entries are not legible in the source scan.]

We have earlier shown that when $\gamma_1 = \gamma_0$, so that
the classification probabilities are equal, the ratios of
asymptotic variances are lowest. This is the next case we
consider, the results of which are shown in Table 10. Again,
we start with the non-informative case, where $\gamma_1 = \gamma_0 = 0$.
As the $\gamma_1$ and $\gamma_0$ values increase in magnitude, the
classification probabilities associated with them increase too, and
there is more information as the implied probabilities get
higher (i.e. $p_{11} = p_{00}$ approach one). The ratios decline
monotonically as the implied probabilities rise, and when these
probabilities are sufficiently high, the quality of the
estimates approximates that when there is perfect information,
and the corresponding regimes are fully known.

Table 10. Ratios of Asymptotic Variances
Varying $\gamma_1 = \gamma_0$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$
Columns: parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); ratios, Partly Known/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
[Table entries are not legible in the source scan.]

The last case we evaluate is when we try to approximate
the $\gamma_1$ and $\gamma_0$ values that will duplicate our results in the
previous chapter, where classification probabilities were
fixed. We test our model with non-constant $p_{11}$ and $p_{00}$
against the alternative of constant $p_{11}$ and $p_{00}$, which is
actually a special case of our specification. The results are
in Table 11. In particular, we have $F(.8416) = p_{11} = p_{00}
= .8$; in terms of our probit models, $\gamma_{12} = \gamma_{02} = 0$, so
that $p_{11}$ and $p_{00}$ are now constant for all observations.

Table 11. Ratios of Asymptotic Variances
Comparisons of $F(x'\gamma_1) = F(x'\gamma_0) = .8$ and $p_{11} = p_{00} = .8$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$
Columns: parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); ratios, Partly Known/Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\lambda$
Panels: n = 5000; n = 100000. n.a. = not applicable.
[Table entries are not legible in the source scan.]

We have two basic experiments here -- when we delete
and do not delete $\gamma_{12}$ and $\gamma_{02}$ from the model. When they
are not deleted, they are set equal to zero, but implicitly
still estimated. Both results as well as the non-informative
case are reported here, and we compare these figures to our
earlier results patterned after the Lee and Porter model
where p11 and p00 are fixed at .8 using a sample size of
n I 100000.

As Table 11 shows, when $\gamma_{12}$ and $\gamma_{02}$ are not deleted,
the resulting figures are slightly larger due most probably
to the fact that we estimate more parameters in the model so
that efficiencies may suffer. When we compare our model with
the deleted $\gamma$ parameters to our fixed probabilities
specification of the earlier chapter, we observe that the ratios we
derive now are larger than those we derived before. This
could be due to a number of reasons. First, we now have more
parameters to estimate in $\beta$, i.e. $\beta = (\beta_{11}, \beta_{12}, \beta_{21},
\beta_{22})'$ as against $\beta = (\beta_{11}, \beta_{21})'$ in the earlier chapter.
Second, we now use a smaller sample size so that the resulting
figures may be less tight. Lastly, we employed different
methods of evaluating the information matrix in both
cases. All these reasons could account for the differences
in the absolute magnitudes of our ratios, although the
relationships among the relative magnitudes are quite similar.

This second set of experiments we have just conducted
on varying $\gamma$ for a given set of $\beta$ has highlighted two main
observations. First, invariance in the ratios occurs when
$\gamma_1^* = -\gamma_1$ and $\gamma_0^* = -\gamma_0$ for $\gamma_1 = \gamma_0$; and when $\gamma_1^*
= -\gamma_0$ and $\gamma_0^* = -\gamma_1$ for $\gamma_1 \neq \gamma_0$. There does not seem
to exist any form of multiplicative or additive transformation
for $\gamma$ where invariance may result in the derived ratios,
since any other change introduced to the $\gamma_1$ and $\gamma_0$
parameters will lead to probability changes reflected in $F(x'\gamma_1)$
and $F(x'\gamma_0)$. Second, when evaluating the $\gamma_1$ and $\gamma_0$
parameters, it is to be remembered that when $\gamma_1$ and $\gamma_0$ are
closer to each other, it follows that the probabilities
$F(x'\gamma_1)$ and $F(x'\gamma_0)$ are also closer. This means that there
is almost equal certainty of proper sample separation into
the two regimes, so that the information is quite reliable
and efficiency improves. At the extreme, $\gamma_1 = \gamma_0$ and
efficiency gains are highest, particularly when the probabilities
implied by these parameters belong to the extreme range.
At the other extreme, when $\gamma_1 = -\gamma_0$ there is no information
at all in the regime classification information.

3.5 Summary

 

We have improved our earlier model on the value of
imperfect sample separation information by allowing more
exogenous variables in the switching regression model and by
postulating that the classification probabilities are
non-constant. As in the earlier model, all the parameters have
to be estimated. The latter extension where the classification
probabilities can be modelled as probit functions is aimed
at providing more information and flexibility to the model
since the probabilities of regime classification are
dependent on the exogenous variables at each observation.

Two basic types of experiments were conducted using
simulation techniques applied on a large sample size. First,
we vary the $\beta$ parameters for a given information level
(denoted by the $\gamma$ parameters) to find out the effects on
efficiency of estimation of varying the degree to which the
samples are separate. Second, we vary the $\gamma$ parameters for a
given set of $\beta$ parameters to evaluate the effects of
different levels of information observability, given a particular
sample distribution.

Among our findings, the following two are most important.
(1) The use of information, even if imperfect, still
presents large gains relative to when there is no information
at all. Naturally, the more reliable the imperfect sample
separation information, the greater the gains in efficiency,
where the reliability of the information can be evaluated by
the $\gamma_1$ and $\gamma_0$ parameters. (2) The value of imperfect
sample separation information largely depends on how much alike
the two samples are. When the samples are hard to distinguish
from one another, then the value of any information is
highest. If we consider the $\beta$ parameters as denoting sample
separability, the intercept parameters are more important than
the slope parameters in determining how distinct the samples
are from each other.

CHAPTER FOUR
THE CASE OF NON-CONSTANT
REGIME CLASSIFICATION PROBABILITIES
AND NON-CONSTANT SWITCHING PROBABILITIES

4.1 Introduction

In the preceding chapter, we evaluated the value of
imperfect sample separation information in a switching reg-
ression model with two exogenous variables, where the pro-
babilities of regime classification are non-constant. We
argued that such a specification has its merits in the fact
that more reliable information on sample separation is pro-
vided at each observation. This implies that the values of
the exogenous variables do affect the chances of proper
regime membership given the actual regime, so that the observed
imperfect indicator of sample separation is a more accurate
measure of the latent perfect indicator at each observation,
when the regime classification probabilities are non-constant.

However, we assumed then that the switching probabi-
lities were constant for all observations. That is, the pro-
bability that each observation is generated by a particular
regime is fixed. We now re-formulate this assumption to take
into account that the switching probabilities are non-con-
stant, and can also be modelled as probit functions of the
exogenous variables. The rationale behind this is fairly
intuitive -- certain values of the exogenous variables have
higher chances of being associated with observations which
are generated by a particular regime, while other values of
the exogenous variables are better associated with observations
generated by another regime. Therefore, the values
of the explanatory variables affect the probabilities of
actual regime classification ($\lambda$), and not just the
probabilities of presumed regime classification given the actual
regimes ($p_{11}$ and $p_{00}$). While the preceding chapter explored
the latter approach, we now deal with the former possibility
as well as the latter.

In terms of the Lee and Porter railroad cartel stabi-
lity model, the explanatory variables include: (1) a Great
Lakes dummy variable which represents when the Great Lakes
were made open to navigation so that the cartel faced its
chief source of competition; and (2) several structural
changes dummy variables which represent the entry, acquisi-
tions and additions to existing networks in the railroad in-
dustry. When the cartel faced its main source of competition
or when there were significant structural changes in the
industry, we expect these events to affect the occurrence of
either collusive or non-collusive behavior within the cartel.
This implies that these explanatory variables affect not only
the probabilities of proper regime classification given the
true regime (i.e. whether price wars were probably occurring
or not), but also the probabilities of actual regime
classification (i.e. whether price wars were really occurring or
not).

We postulate here that switching probabilities, or
probabilities of actual regime membership, assume non-constant
values across observations, which introduces more flexibility
to the model and improves the model's ability to classify
observations based on the values of the explanatory
variables. Our model of non-constant switching probabilities
also contains our past models with a constant mixing parameter
as a special case, so we can compare the performance of those
models against the alternative of our present model.

4.2 The Model

We still maintain the basic switching regression model
of the previous chapter but we now designate the switching
probabilities as non-constant. Therefore, our model can be

expressed as:

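$$
\begin{aligned}
y_j &= x_j'\beta_1 + u_{1j} && \text{with probability } \lambda_j \text{ for observation } j \text{ (regime 1)} \\
y_j &= x_j'\beta_2 + u_{2j} && \text{with probability } (1 - \lambda_j) \text{ for observation } j \text{ (regime 2)}
\end{aligned}
\tag{4.1}
$$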

$\beta_1$ and $\beta_2$ are (K x 1) vectors of parameters corresponding
to the explanatory variables of the (K x n) matrix x. The
error terms $u_{1j}$ and $u_{2j}$ are assumed to be independently and
normally distributed with means 0 and variances $\sigma_1^2$ and
$\sigma_2^2$, respectively. The non-constant switching probabilities
can be modelled as probit functions of the exogenous
variables. That is,

$$
\begin{aligned}
\lambda_j &= F(x_j'\delta) \\
1 - \lambda_j &= 1 - F(x_j'\delta) = F(-x_j'\delta)
\end{aligned}
$$

where $F(\ )$ is a standard normal cumulative distribution
function, and $\delta$ is a (K x 1) vector of parameters. This
contains the constant switching probabilities model as a special
case where all the elements of $\delta$ are zero, except for that
corresponding to the constant term. In the present model, we
still retain the assumption of the previous chapter that the
regime classification probabilities are non-constant.
Therefore, we have the following probit models for the
classification probabilities:

$$
\begin{aligned}
p_{11j} &= F(x_j'\gamma_1) \quad \text{where } p_{11j} = \text{Prob}(w_j = 1 \mid I_j = 1) \text{ for each observation } j \\
p_{00j} &= F(x_j'\gamma_0) \quad \text{where } p_{00j} = \text{Prob}(w_j = 0 \mid I_j = 0) \text{ for each observation } j
\end{aligned}
$$

where $F(\ )$ is again the standard normal cumulative
distribution function, and $\gamma_1$ and $\gamma_0$ are (K x 1) vectors of
parameters. $w_j$ is the observed dichotomous indicator which
provides sample separation information, while $I_j$ is the
unobserved dichotomous indicator of actual regime classification.
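To make the data-generating process of this section concrete, the following Python sketch draws $(y_j, w_j, I_j)$ from the model. It is an illustration only. The helper names `norm_cdf` and `simulate_switching`, the (n x K) layout of `X` (the transpose of the (K x n) matrix in the text), and the use of NumPy's random generator are assumptions of the sketch, and $I_j = 1$ is taken to mean that regime 1 generated observation j, as the definition of $p_{11j}$ suggests.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF F(z), applied elementwise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def simulate_switching(X, beta1, beta2, sig1_sq, sig2_sq,
                       delta, gamma1, gamma0, seed=0):
    """Draw (y, w, I, lambda) for an (n x K) design matrix X (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    lam = norm_cdf(X @ delta)                  # lambda_j = F(x_j' delta)
    I = (rng.random(n) < lam).astype(int)      # I_j = 1: regime 1 generated y_j
    u1 = rng.normal(0.0, math.sqrt(sig1_sq), n)
    u2 = rng.normal(0.0, math.sqrt(sig2_sq), n)
    y = np.where(I == 1, X @ beta1 + u1, X @ beta2 + u2)
    p11 = norm_cdf(X @ gamma1)                 # Prob(w_j = 1 | I_j = 1)
    p00 = norm_cdf(X @ gamma0)                 # Prob(w_j = 0 | I_j = 0)
    v = rng.random(n)
    # In regime 1, w = 1 with probability p11; in regime 2, w = 1 with probability 1 - p00.
    w = np.where(I == 1, v < p11, v >= p00).astype(int)
    return y, w, I, lam

# Illustrative call with an arbitrary two-column design (intercept plus one regressor).
X = np.column_stack([np.ones(1000), np.linspace(0.0, 2.0, 1000)])
y, w, I, lam = simulate_switching(
    X, np.array([0.0, 0.0]), np.array([2.0, 0.0]), 1.0, 1.0,
    np.array([0.0, 0.0]), np.array([1.0, -1.0]), np.array([1.0, 1.0]))
```

With `delta` set to zero, as in the call above, every $\lambda_j$ equals .5, which is the constant switching probability special case.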
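When there is imperfect sample separation information,
the joint density function for $y_j$ and $w_j$ can be re-written
from (3.2) as:

$$
\begin{aligned}
f_j = f(y_j, w_j; \theta)
&= \lambda_j f_1(y_j)\,(w_j p_{11j} + (1 - w_j)(1 - p_{11j})) \\
&\quad + (1 - \lambda_j) f_2(y_j)\,(w_j (1 - p_{00j}) + (1 - w_j) p_{00j}) \\
&= F(x_j'\delta) f_1(y_j)\,(w_j F(x_j'\gamma_1) + (1 - w_j) F(-x_j'\gamma_1)) \\
&\quad + F(-x_j'\delta) f_2(y_j)\,(w_j F(-x_j'\gamma_0) + (1 - w_j) F(x_j'\gamma_0))
\end{aligned}
\tag{4.2}
$$

where

$$
f_i(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(y_j - x_j'\beta_i)^2}{2\sigma_i^2}\right]
$$

$$
F(x_j'\gamma_s) = \int_{-\infty}^{x_j'\gamma_s} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{v^2}{2}\right] dv
$$

$i = 1, 2$; $s = 0, 1$; $j = 1, \ldots, n$

$f_1(y_j)$ and $f_2(y_j)$ are normal probability density functions
with means and variances given by $N(x_j'\beta_1, \sigma_1^2)$ and
$N(x_j'\beta_2, \sigma_2^2)$, respectively. $\lambda_j$ is represented by a
probit model of the actual switching probabilities; and $p_{11j}$
and $p_{00j}$ are represented by probit models of the presumed
classification probabilities given the actual regimes.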
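The joint density (4.2) can be written directly as a function. The sketch below is a minimal version under stated assumptions: the names `norm_cdf`, `normal_pdf` and `joint_density` are hypothetical, `X` is stored as (n x K), and the variances (not the standard deviations) are passed in. As a sanity check it sums the joint density over $w = 0, 1$, which should return the mixture density of $y_j$ alone, because the two classification weights attached to each regime add to one.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF F(z), applied elementwise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * math.pi * var)

def joint_density(y, w, X, beta1, beta2, sig1_sq, sig2_sq, delta, gamma1, gamma0):
    """f(y_j, w_j; theta) of equation (4.2), one value per observation."""
    lam = norm_cdf(X @ delta)
    f1 = normal_pdf(y, X @ beta1, sig1_sq)
    f2 = normal_pdf(y, X @ beta2, sig2_sq)
    q1 = w * norm_cdf(X @ gamma1) + (1 - w) * norm_cdf(-(X @ gamma1))
    q2 = w * norm_cdf(-(X @ gamma0)) + (1 - w) * norm_cdf(X @ gamma0)
    return lam * f1 * q1 + (1.0 - lam) * f2 * q2

# Arbitrary illustrative values: two observations, K = 2.
X = np.array([[1.0, 0.5], [1.0, 2.0]])
y = np.array([0.3, 1.7])
args = (np.array([0.0, 0.0]), np.array([2.0, 0.0]), 1.0, 1.0,
        np.array([1.0, -1.0]), np.array([1.0, -1.0]), np.array([1.0, 1.0]))

# Summing over w = 0 and w = 1 integrates the indicator out of the joint density.
total = joint_density(y, np.zeros(2), X, *args) + joint_density(y, np.ones(2), X, *args)
lam = norm_cdf(X @ args[4])
mixture = lam * normal_pdf(y, X @ args[0], 1.0) + (1 - lam) * normal_pdf(y, X @ args[1], 1.0)
```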


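4.3 Derivation of Asymptotic Variances

When the regime is known, the asymptotic variances
of $\sqrt{n}(\hat\beta_1 - \beta_1)$, $\sqrt{n}(\hat\beta_2 - \beta_2)$, $\sqrt{n}(\hat\sigma_1^2 - \sigma_1^2)$,
$\sqrt{n}(\hat\sigma_2^2 - \sigma_2^2)$, and $\sqrt{n}(\hat\delta - \delta)$ are, respectively:

$$
\sigma_1^2 \left( \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \lambda_j x_j x_j' \right)^{-1};
\qquad
\sigma_2^2 \left( \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} (1 - \lambda_j) x_j x_j' \right)^{-1};
$$

$$
\frac{2\sigma_1^4}{\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \lambda_j};
\qquad
\frac{2\sigma_2^4}{\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} (1 - \lambda_j)};
\quad \text{and}
$$

$$
\left( \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n}
\frac{\phi(x_j'\delta)^2\, x_j x_j'}{F(x_j'\delta)\,(1 - F(x_j'\delta))} \right)^{-1}.
$$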

The asymptotic variances for $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma_1^2$, and $\hat\sigma_2^2$ will
reduce to the corresponding asymptotic variances given in the
previous chapter if $\lambda$ were constant. However, our switching
probabilities are no longer constant in our present
specification, so that we have different values of $\lambda_j$ for each
observation. Since $\lambda_j = F(x_j'\delta)$ is a probit model,
the asymptotic variance of $\hat\delta$ corresponds to the asymptotic
variance of the parameters in a standard probit model.
The above expression was derived from Judge et al. (1980),
and Ashford and Sowden (1970), where $\phi(\ )$ is a standard
normal probability density function and $F(\ )$ is a standard
normal cumulative distribution function. There are no
asymptotic variances for the estimated parameters $\hat\gamma_1$ and
$\hat\gamma_0$ since these parameters are not relevant at all when the
regimes are fully known. As in the previous chapter, the
above expressions for the asymptotic variances of $\hat\beta_1$, $\hat\beta_2$,
$\hat\sigma_1^2$, and $\hat\sigma_2^2$ are derived from the inverse of the
information matrix, where this information matrix corresponds to
the likelihood function for the case of known regimes.
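The known-regime expressions can be evaluated on a finite design by replacing each limit with a sample average. The sketch below does that. The function name `known_regime_avar`, the (n x K) layout of `X`, and the randomly drawn regressor are assumptions made for illustration, not part of the text. With $\delta = 0$ every $\lambda_j$ equals .5, so the results can be checked by hand: the variance of $\hat\sigma_1^2$ is $2\sigma_1^4/.5$, and the probit block is $(\pi/2)$ times the inverse of the average of $x_j x_j'$.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal CDF F(z), applied elementwise."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density, applied elementwise."""
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def known_regime_avar(X, sig1_sq, sig2_sq, delta):
    """Sample analogues of the known-regime asymptotic variances (hypothetical helper)."""
    n = X.shape[0]
    z = X @ delta
    lam = norm_cdf(z)
    avar_b1 = sig1_sq * np.linalg.inv((X * lam[:, None]).T @ X / n)
    avar_b2 = sig2_sq * np.linalg.inv((X * (1.0 - lam)[:, None]).T @ X / n)
    avar_s1 = 2.0 * sig1_sq ** 2 / lam.mean()
    avar_s2 = 2.0 * sig2_sq ** 2 / (1.0 - lam).mean()
    weight = norm_pdf(z) ** 2 / (lam * (1.0 - lam))      # probit information weight
    avar_delta = np.linalg.inv((X * weight[:, None]).T @ X / n)
    return avar_b1, avar_b2, avar_s1, avar_s2, avar_delta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5000), rng.standard_normal(5000)])
avar_b1, avar_b2, avar_s1, avar_s2, avar_delta = known_regime_avar(
    X, 1.0, 1.0, np.array([0.0, 0.0]))
xx_inv = np.linalg.inv(X.T @ X / X.shape[0])
```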

For the models where the regime is either completely
unknown or is partly known, the asymptotic variances of
$\sqrt{n}(\hat\theta - \theta)$ are derived in the same manner, from the
diagonal elements of the inverse of the information matrix. It
follows that $\sqrt{n}(\hat\theta - \theta)$ approaches the distribution
designated by the expression $N\left(0, \lim_{n\to\infty} (\mathcal{I}/n)^{-1}\right)$.

The information matrix is defined by the following
expression:

$$
\mathcal{I} = -E\left[\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right]
$$

where:

$$
\ln L = \sum_{j=1}^{n} \ln f_j
$$

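Therefore, when the regime is completely unknown, f
corresponds to the density function of (3.3) given as:

$$
f(y_j; \theta) = \lambda_j f_1(y_j) + (1 - \lambda_j) f_2(y_j) \tag{4.3}
$$

$\theta$ is a (3K + 2) x 1 vector of parameters, where K is the
number of explanatory variables. Therefore,
$\theta = (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta')'$, where
$\beta_1 = (\beta_{11}, \beta_{12}, \ldots, \beta_{1K})'$,
$\beta_2 = (\beta_{21}, \beta_{22}, \ldots, \beta_{2K})'$ and
$\delta = (\delta_1, \delta_2, \ldots, \delta_K)'$.
When the regime is partly known, f corresponds to the
density function in (4.2) given as:

$$
\begin{aligned}
f(y_j, w_j; \theta) &= F(x_j'\delta) f_1(y_j)\,(w_j F(x_j'\gamma_1) + (1 - w_j)(1 - F(x_j'\gamma_1))) \\
&\quad + (1 - F(x_j'\delta)) f_2(y_j)\,(w_j (1 - F(x_j'\gamma_0)) + (1 - w_j) F(x_j'\gamma_0))
\end{aligned}
$$

$\theta$ is a (5K + 2) x 1 vector of parameters given by
$(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta', \gamma_1', \gamma_0')'$, where the vectors
$\beta_1$, $\beta_2$, and $\delta$ have been defined as previously; and
$\gamma_1 = (\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1K})'$ and
$\gamma_0 = (\gamma_{01}, \gamma_{02}, \ldots, \gamma_{0K})'$.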

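To facilitate comparison of the results here with those
of the preceding chapter, we calculate the information matrix
in the same manner using similar simulation techniques.
The information matrix will be evaluated in the following way:

$$
\mathcal{I}
= -E \sum_{j=1}^{n} \left[\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}\right]
= E \sum_{j=1}^{n} \left[\frac{1}{f_j^2}
\left(\frac{\partial f_j}{\partial\theta}\right)
\left(\frac{\partial f_j}{\partial\theta'}\right)\right]
$$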
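The two forms of the information matrix agree because of the standard information matrix equality. A short derivation for a single observation follows; it assumes the usual regularity conditions that allow differentiation under the integral sign, which the text does not spell out. For the partly known regime the expectation is taken over both $y$ and $w$; for the unknown regime the sum over $w$ is simply dropped.

```latex
\frac{\partial^2 \ln f}{\partial\theta\,\partial\theta'}
  = \frac{1}{f}\,\frac{\partial^2 f}{\partial\theta\,\partial\theta'}
  - \frac{1}{f^2}\,\frac{\partial f}{\partial\theta}\,\frac{\partial f}{\partial\theta'}
\\[6pt]
E\left[\frac{1}{f}\,\frac{\partial^2 f}{\partial\theta\,\partial\theta'}\right]
  = \sum_{w=0}^{1} \int \frac{\partial^2 f(y, w; \theta)}{\partial\theta\,\partial\theta'}\,dy
  = \frac{\partial^2}{\partial\theta\,\partial\theta'}
    \left( \sum_{w=0}^{1} \int f(y, w; \theta)\,dy \right)
  = \frac{\partial^2 (1)}{\partial\theta\,\partial\theta'} = 0
\\[6pt]
\Longrightarrow\quad
-E\left[\frac{\partial^2 \ln f}{\partial\theta\,\partial\theta'}\right]
  = E\left[\frac{1}{f^2}\,\frac{\partial f}{\partial\theta}\,\frac{\partial f}{\partial\theta'}\right]
```

Only first derivatives of $f$ are therefore needed to evaluate the matrix by simulation.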

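We therefore need to derive the first derivative
expressions of f when regime is either unknown or partly known.
For the case when regime classification is completely
unknown (there is no sample separation information at all),
the first derivatives of $f(y; \theta)$ (we omit the subscript j
for simplicity) with respect to $\theta$ are:

$$
\begin{aligned}
\frac{\partial f}{\partial \beta_{1k}} &= F(x'\delta)\,\frac{\partial f_1}{\partial \beta_{1k}} \\
\frac{\partial f}{\partial \beta_{2k}} &= (1 - F(x'\delta))\,\frac{\partial f_2}{\partial \beta_{2k}} \\
\frac{\partial f}{\partial \sigma_1^2} &= F(x'\delta)\,\frac{\partial f_1}{\partial \sigma_1^2} \\
\frac{\partial f}{\partial \sigma_2^2} &= (1 - F(x'\delta))\,\frac{\partial f_2}{\partial \sigma_2^2} \\
\frac{\partial f}{\partial \delta_k} &= (f_1 - f_2)\,\frac{\partial F(x'\delta)}{\partial \delta_k}
\end{aligned}
$$

where:

$$
\begin{aligned}
\frac{\partial f_i}{\partial \beta_{ik}} &= \frac{f_i\,(y - x'\beta_i)\,x_k}{\sigma_i^2} \\
\frac{\partial f_i}{\partial \sigma_i^2} &= \frac{f_i}{2\sigma_i^2}\left[\frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1\right] \\
\frac{\partial F(x'\delta)}{\partial \delta_k} &= \phi(x'\delta)\,x_k
\end{aligned}
$$

$i = 1, 2$; $k = 1, \ldots, K$

where $\phi(\ )$ is a standard normal probability density function.
Since $\theta$ is of dimension (3K + 2) x 1, then $\mathcal{I}$ is of
dimension (3K + 2) x (3K + 2).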
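The first derivatives for the unknown-regime case can be checked numerically. The sketch below evaluates them for a single observation and compares them with central finite differences of the mixture density. The function names, the ordering of the parameter vector as $(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta')'$ in a flat array, and the test point are assumptions chosen for illustration.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal CDF for a scalar argument."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density for a scalar argument."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def unpack(theta, K):
    """Split theta = (beta1', beta2', sig1^2, sig2^2, delta')'."""
    return theta[:K], theta[K:2 * K], theta[2 * K], theta[2 * K + 1], theta[2 * K + 2:]

def density(theta, y, x):
    """Mixture density f(y; theta) for one observation, plus its components."""
    b1, b2, s1, s2, d = unpack(theta, len(x))
    lam = Phi(x @ d)
    f1 = phi((y - x @ b1) / math.sqrt(s1)) / math.sqrt(s1)
    f2 = phi((y - x @ b2) / math.sqrt(s2)) / math.sqrt(s2)
    return lam * f1 + (1.0 - lam) * f2, lam, f1, f2

def gradient(theta, y, x):
    """Analytic df/dtheta following the expressions in the text."""
    b1, b2, s1, s2, d = unpack(theta, len(x))
    _, lam, f1, f2 = density(theta, y, x)
    e1, e2 = y - x @ b1, y - x @ b2
    return np.concatenate([
        lam * f1 * e1 / s1 * x,
        (1.0 - lam) * f2 * e2 / s2 * x,
        [lam * f1 / (2.0 * s1) * (e1 ** 2 / s1 - 1.0)],
        [(1.0 - lam) * f2 / (2.0 * s2) * (e2 ** 2 / s2 - 1.0)],
        (f1 - f2) * phi(x @ d) * x,
    ])

theta = np.array([0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, -1.0])   # K = 2, so 3K + 2 = 8
x, y, h = np.array([1.0, 0.7]), 0.9, 1e-6
analytic = gradient(theta, y, x)
numeric = np.array([
    (density(theta + h * e, y, x)[0] - density(theta - h * e, y, x)[0]) / (2.0 * h)
    for e in np.eye(len(theta))
])
```

The same finite-difference comparison can be reused as a guard whenever the derivative expressions are retyped or extended.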

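For the case when regime classification is partly
known due to the presence of imperfect sample separation
information, the first derivatives of $f(y, w; \theta)$ with respect
to $\theta$ are:

$$
\begin{aligned}
\frac{\partial f}{\partial \beta_{1k}} &= F(x'\delta)\,Q_1\,\frac{\partial f_1}{\partial \beta_{1k}} \\
\frac{\partial f}{\partial \beta_{2k}} &= (1 - F(x'\delta))\,Q_2\,\frac{\partial f_2}{\partial \beta_{2k}} \\
\frac{\partial f}{\partial \sigma_1^2} &= F(x'\delta)\,Q_1\,\frac{\partial f_1}{\partial \sigma_1^2} \\
\frac{\partial f}{\partial \sigma_2^2} &= (1 - F(x'\delta))\,Q_2\,\frac{\partial f_2}{\partial \sigma_2^2} \\
\frac{\partial f}{\partial \delta_k} &= (f_1 Q_1 - f_2 Q_2)\,\frac{\partial F(x'\delta)}{\partial \delta_k} \\
\frac{\partial f}{\partial \gamma_{1k}} &= F(x'\delta)\,f_1\,(w - (1 - w))\,\frac{\partial F(x'\gamma_1)}{\partial \gamma_{1k}} \\
\frac{\partial f}{\partial \gamma_{0k}} &= -(1 - F(x'\delta))\,f_2\,(w - (1 - w))\,\frac{\partial F(x'\gamma_0)}{\partial \gamma_{0k}}
\end{aligned}
$$

where:

$$
\begin{aligned}
Q_1 &= w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1)) \\
Q_2 &= w (1 - F(x'\gamma_0)) + (1 - w) F(x'\gamma_0) \\
\frac{\partial f_i}{\partial \beta_{ik}} &= \frac{f_i\,(y - x'\beta_i)\,x_k}{\sigma_i^2} \\
\frac{\partial f_i}{\partial \sigma_i^2} &= \frac{f_i}{2\sigma_i^2}\left[\frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1\right] \\
\frac{\partial F(x'\delta)}{\partial \delta_k} &= \phi(x'\delta)\,x_k \\
\frac{\partial F(x'\gamma_s)}{\partial \gamma_{sk}} &= \phi(x'\gamma_s)\,x_k
\end{aligned}
$$

$i = 1, 2$; $k = 1, \ldots, K$; $s = 0, 1$

where $\phi(\ )$ denotes a standard normal probability density
function. $\theta$ is of dimension (5K + 2) x 1, so that it
follows that $\mathcal{I}$ is of dimension (5K + 2) x (5K + 2).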
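The opposite signs of the two derivatives with respect to the classification parameters follow from differentiating the weights $Q_1$ and $Q_2$. The step is written out below as a worked note using only the definitions above.

```latex
\frac{\partial Q_1}{\partial \gamma_{1k}}
  = \big[\,w - (1 - w)\,\big]\,\phi(x'\gamma_1)\,x_k
  = (2w - 1)\,\phi(x'\gamma_1)\,x_k
\\[6pt]
\frac{\partial Q_2}{\partial \gamma_{0k}}
  = \big[\,-w + (1 - w)\,\big]\,\phi(x'\gamma_0)\,x_k
  = -(2w - 1)\,\phi(x'\gamma_0)\,x_k
\\[6pt]
\frac{\partial f}{\partial \gamma_{1k}} = F(x'\delta)\,f_1\,\frac{\partial Q_1}{\partial \gamma_{1k}},
\qquad
\frac{\partial f}{\partial \gamma_{0k}} = (1 - F(x'\delta))\,f_2\,\frac{\partial Q_2}{\partial \gamma_{0k}}
```

Since $2w - 1$ equals $+1$ when $w = 1$ and $-1$ when $w = 0$, raising $x'\gamma_1$ increases the density of observations classified into regime 1 and lowers it for those classified into regime 2, while raising $x'\gamma_0$ does the reverse.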

We follow the simulation techniques of the preceding
chapter.7/ Using a sample size of n = 5000, and faced with
specific parameter values, we draw observations from the
switching regression model using a normal random variable
generator. We evaluate the information matrix by averaging
the expressions derived from the first derivative components
of the density functions, when regime is either unknown or
is partly known. The asymptotic variances are the corres-
ponding diagonal elements of the inverse of the information

matrix.
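The simulation procedure described above can be sketched end to end. The code below is a minimal illustration, not the program used for the reported tables: the function names `scores` and `asy_var`, the seed, and the particular parameter values are assumptions, and both error variances are set to 1 so that a single standard normal draw serves either regime. It averages the outer products of the first-derivative scores, inverts the result, and reads the asymptotic variances off the diagonal, for the regime unknown and the regime partly known.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def phi(z):
    """Standard normal density, elementwise."""
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def scores(y, w, X, b1, b2, s1, s2, d, g1=None, g0=None):
    """Rows hold (1/f) df/dtheta' per observation; g1 = g0 = None means regime unknown."""
    F, e1, e2 = Phi(X @ d), y - X @ b1, y - X @ b2
    f1 = phi(e1 / math.sqrt(s1)) / math.sqrt(s1)
    f2 = phi(e2 / math.sqrt(s2)) / math.sqrt(s2)
    if g1 is None:
        Q1 = Q2 = np.ones_like(y)
    else:
        P1, P0 = Phi(X @ g1), Phi(X @ g0)
        Q1 = w * P1 + (1 - w) * (1 - P1)
        Q2 = w * (1 - P0) + (1 - w) * P0
    f = F * f1 * Q1 + (1 - F) * f2 * Q2
    col = lambda v: v[:, None]
    parts = [col(F * Q1 * f1 * e1 / s1) * X,
             col((1 - F) * Q2 * f2 * e2 / s2) * X,
             col(F * Q1 * f1 / (2 * s1) * (e1 ** 2 / s1 - 1)),
             col((1 - F) * Q2 * f2 / (2 * s2) * (e2 ** 2 / s2 - 1)),
             col((f1 * Q1 - f2 * Q2) * phi(X @ d)) * X]
    if g1 is not None:
        sign = 2 * w - 1                               # equals w - (1 - w)
        parts += [col(F * f1 * sign * phi(X @ g1)) * X,
                  col(-(1 - F) * f2 * sign * phi(X @ g0)) * X]
    return np.hstack(parts) / col(f)

def asy_var(S):
    """Diagonal of the inverse of the averaged outer product of the scores."""
    return np.diag(np.linalg.inv(S.T @ S / S.shape[0]))

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), np.exp(-rng.standard_normal(n))])
b1, b2, s1, s2 = np.array([0.0, 0.0]), np.array([2.0, 0.0]), 1.0, 1.0
d, g1, g0 = np.array([1.0, -1.0]), np.array([1.0, -1.0]), np.array([1.0, 1.0])
lam = Phi(X @ d)
I = rng.random(n) < lam                                # True: regime 1
y = np.where(I, X @ b1, X @ b2) + rng.standard_normal(n)
v = rng.random(n)
w = np.where(I, v < Phi(X @ g1), v >= Phi(X @ g0)).astype(float)

avar_unknown = asy_var(scores(y, w, X, b1, b2, s1, s2, d))          # 3K + 2 = 8 entries
avar_partly = asy_var(scores(y, w, X, b1, b2, s1, s2, d, g1, g0))   # 5K + 2 = 12 entries
avar_known_b1 = s1 * np.diag(np.linalg.inv((X * lam[:, None]).T @ X / n))
ratio_unknown_b1 = avar_unknown[:2] / avar_known_b1
ratio_partly_b1 = avar_partly[:2] / avar_known_b1
```

The last two lines form the variance ratios for the regime 1 coefficients only; the ratios for the remaining parameters follow in the same way from the corresponding known-regime expressions. The numbers produced depend on the seed and are not meant to reproduce the tables in this chapter.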

4.4 The Value of Imperfect Information
We again derive here two sets of asymptotic variance
ratios for each experiment: one with regime partly known
relative to regime known, and another with regime unknown
relative to regime known. A comparison of these two ratios
will show how much more efficient our estimates will be when
we use partial information as compared to no information at
all in determining sample separability. Understandably, the
ratios in the former case will all be less than or equal to
those in the latter case (they are equal when the partial
information is not informative at all). All the ratios will,
however, be greater than or equal to one (they are equal to
one when the estimates derived are as efficient as full
information estimates), and the extent to which they differ
from one indicates the value of information or imperfect
information, as the case may be.

7/ In the preceding chapter, we showed that the information
matrix can be evaluated in two ways. We evaluated it by the
second method, using first derivative components of the
appropriate density functions. In our present model, we
followed the same method in order to facilitate a comparison
of the simulation results. However, we can also evaluate the
information matrix using the second derivative components
(although we did not do this) as we did in Chapter 2. For the
reader's interest, the expressions are shown in Appendix B.

We maintain the use of two exogenous variables $x_1$
and $x_2$, where $x_1$ is a unit vector and $x_2$ is equal to
$\exp(-x_3)$, where components of $x_3$ are distributed as N(0, 1).
$x_3$, like our dependent variable y, is derived from the
normal random variable generator. The sample size is set at
n = 5000. We retain the experimental conditions of the
preceding chapter, in order to make comparisons later with the
resulting ratios.

We conduct three sets of experiments. In the first
set, for given $\sigma_1^2$, $\sigma_2^2$, and $\gamma$ parameters which denote some
information, we vary our $\beta$ parameters in order to make
sample separation more distinct. We do this twice: first,
using $\delta$ values which imply constant switching probabilities
(i.e. $\delta_2 = 0$ but estimated) and second, using $\delta$ values
which indicate non-constant switching probabilities (i.e.
$\delta_2 \neq 0$ and also estimated). In the second set of
experiments, for given $\sigma_1^2$, $\sigma_2^2$, and $\beta$ parameters which denote
a distinct sample distribution, we vary our $\gamma$ parameters to
show different levels of information observability. Again,
we do this twice, where our chosen $\delta$ values imply both
constant and non-constant switching probabilities. In the
last set of experiments, we vary our $\delta$ parameters given
fixed $\beta$, $\sigma_1^2$, $\sigma_2^2$, and $\gamma$ values. The purpose of this
last set of experiments is to find out the effects of
different $\delta$ values (all of which imply non-constant switching
probabilities) on parameter estimation efficiencies.
For the first experiment, we vary the $\beta$ values given
$\sigma_1^2 = \sigma_2^2 = 1$, and $\gamma = (1, -1, 1, 1)'$ which is
informative. Since we had earlier established that the intercept
term is more important in determining sample separability
than the slope term, we vary our intercept term $\beta_{21}$,
holding the other intercept term fixed; therefore, we have $\beta =
(0, 0, \beta_{21}, 0)'$ where the two distributions are made
increasingly distinct from each other as $\beta_{21}$ increases. We
choose $\delta = (0, 0)'$ which essentially implies constant
switching probabilities of .5, even if $\lambda$ is modelled as a
probit. This particular choice of $\delta$ values enables us to
test our model with non-constant switching probabilities,
$\lambda = F(x'\delta) = .5$ where $\delta = (0, 0)'$ but estimated, against
the alternative of constant $\lambda$ (implicitly, $\lambda = F(x'\delta) =
.5$, where $\delta = (0, 0)'$ but not estimated), which is actually
a special case of our present specification. The results
are presented in Table 12, and the ratios here can be
compared with the ratios in Table 7 of the previous chapter,
where, for fixed $\lambda$, we perform the same experiments. We
call this Case 1.

$\gamma$ has some information, i.e. $\gamma_1 \neq -\gamma_0$, so that the
variance ratios when regime is partly known are less than or
equal to the variance ratios when regime is unknown. They
become more nearly equal as the two regimes become distinctly
separate, meaning that there is little value in obtaining more
information on sample separation when the two distributions
are clearly far apart.

Compared to fixed $\lambda$, where $\delta$ is not estimated (as in
Table 7), the ratios we derive now are slightly larger
(particularly for $\hat\beta_{12}$, $\hat\beta_{22}$, and $\hat\lambda$), probably due to the fact
that more parameters are estimated here, or maybe simply due
to randomness. But as the regimes become clearly separate,
i.e. $\beta = (0, 0, 4, 0)'$, the ratios now are almost equal to
those derived when $\lambda$ was fixed.

The $\hat\delta_1$ and $\hat\delta_2$ variance ratios are higher than the
$\hat\lambda$ ratios even when both imply that $\delta = 0$. The reason behind
this is that parameter values for $\delta$ now have to be estimated,
thereby introducing more randomness in the process, compared
to the case when $\delta = 0$, but not estimated.

When regimes are quite close to each other, i.e. $\beta =
(0, 0, 1, 0)'$, the $\hat\beta_{12}$ and $\hat\beta_{22}$ variance ratios are larger
than the $\hat\beta_{11}$ and $\hat\beta_{21}$ variance ratios, a pattern very unlike
that when $\lambda$ was fixed. However, this observation only holds
when regime is partly known. When regimes are completely
unknown, the $\hat\beta_{12}$ and $\hat\beta_{22}$ ratios are much smaller than the
$\hat\beta_{11}$ and $\hat\beta_{21}$ ratios, a pattern evident when $\lambda$ was fixed.
The same pattern holds, but on a smaller scale, when $\beta =
(0, 0, 2, 0)'$. This implies that when regimes are very
close, and there is partly known information on regime
classification, then there is a larger value of sample separation
information when $\lambda$ is not fixed, as compared to when $\lambda$ is
fixed.

Table 12. Ratios of Asymptotic Variances

Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$,
$\delta = (0, 0)'$, $\gamma = (1, -1, 1, 1)'$

Columns: Parameters ($\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$); Ratios, Partly
Known (Unknown) / Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$,
$\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\delta_1$, $\hat\delta_2$. Rows: $\beta_{21}$ = 1, 2, 4.

[Table entries are illegible in the source.]

n = 5000. Figures in parentheses are the ratios of asymptotic
variances when regime is unknown relative to when regime is
known. This is true for the other tables in this chapter.

We now repeat the previous experiment, this time
choosing non-zero $\delta$ parameters; call this Case 2. The
results are presented in Table 13. We set $\delta = (1, -1)'$
so that the implied probability for each observation is no
longer constant at .5. The mean value of the (different)
$\lambda_j$ is .438. This implies that there is a slightly larger
probability that an observation is generated from regime 2
rather than regime 1. Consistent with the findings of past
experiments, the ratios which reflect the value of sample
separation information are larger for the regime which is
sampled with the lower probability. Since regime 1 is
sampled with the lower probability on the average, the
variance ratios of $\hat\beta_{11}$, $\hat\beta_{12}$, and $\hat\sigma_1^2$ are all larger than
the corresponding estimated parameter ratios of regime 2.
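The average switching probability implied by $\delta = (1, -1)'$ can be checked by Monte Carlo, using the design described earlier ($x_1 = 1$, $x_2 = \exp(-x_3)$ with $x_3$ standard normal). The sketch below uses only the Python standard library. The seed and the number of draws are arbitrary choices, so the result is an approximation to the population mean rather than a reproduction of the sample figure.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(12345)
n_draws = 200_000
delta = (1.0, -1.0)

total = 0.0
for _ in range(n_draws):
    x3 = random.gauss(0.0, 1.0)
    x2 = math.exp(-x3)                       # second regressor
    total += norm_cdf(delta[0] * 1.0 + delta[1] * x2)   # lambda_j = F(x_j' delta)

mean_lambda = total / n_draws
print(round(mean_lambda, 3))
```

The printed average should land close to the .438 quoted in the text, which confirms that regime 2 is sampled slightly more often than regime 1 under this choice of $\delta$.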

Table 13. Ratios of Asymptotic Variances

Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$,
$\delta = (1, -1)'$, $\gamma = (1, -1, 1, 1)'$

Columns: Parameters ($\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$); Ratios, Partly
Known (Unknown) / Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$,
$\hat\sigma_1^2$, $\hat\sigma_2^2$, $\hat\delta_1$, $\hat\delta_2$.

[Table entries are illegible in the source.]

The only difference between Case 1 and Case 2 is in
the value of the $\delta$ parameters. In Case 1, the choice of
the $\delta$ values assures that for each observation, $\lambda = .5$;
in Case 2, the choice of the $\delta$ values assures that for each
observation, $\lambda$ assumes different values, depending on the
magnitude of the independent variables x that determine the
value of $\lambda$, i.e. $\lambda = F(x'\delta)$. A comparison of the ratios
between Case 1 and Case 2 shows that, in the latter, the
value of sample separation information varies less. That
is, when the two regimes are fairly close and the value of
sample separation information is important (or the ratios
are high) in Case 1, the value of sample separation
information is less important (or the ratios are lower) in Case
2. On the other hand, when the regimes become farther apart,
the value of sample separation information in Case 1 becomes
lower or the ratios of asymptotic variances tend to approach
one as they should. For Case 2, the decline in the ratios
is slower, so that ratios in Case 2 are higher than those
obtained in Case 1, when regimes are distinctly separate. To
illustrate, take the ratios for $\hat\beta_{11}$. In Case 1, they range
from 17.5 to 2.0 (as the distributions become farther apart)
when the regimes are partly known, and from 6831." to 2.3
when the regimes are unknown. In Case 2, they range from
14.8 to 3.5 when the regimes are partly known, and from 25.6
to 3.7 when the regimes are unknown. The same pattern holds
for all the other variance ratios of the estimated parameters.
There does seem to be an advantage in postulating that
the switching probabilities be non-constant rather than
constant (even if the $\delta$ parameters have to be estimated in both
cases), so that the probability that an observation is
generated by a particular distribution depends on the values of
the exogenous variables. However, this advantage only holds
when $\delta_2 \neq 0$.8/ This is supported by the observation that
the efficiency of the estimates does not suffer as much
(variance ratios are lower) when $\lambda$ is non-constant ($\delta \neq 0$),
as compared to when $\lambda$ is constant ($\delta = 0$), when the
regimes are very close to each other and are hardly distinct,
i.e. $\beta = (0, 0, 1, 0)'$. It is evident when regime is either
partly known or unknown. As the regimes become separate,
the decline in the ratios is quite slow, so that the variance
ratios are actually lower when $\lambda$ is constant.

Among all the ratios, the highest values belong to the
estimated $\delta$ parameters, just as in Case 1. This means that
among all the parameters to be estimated, the largest
efficiency losses originate from the parameters that determine
the switching probabilities. This is fairly intuitive, since
the efficiency of the estimates for the parameters in the two
regimes is affected by the initial probability of switching
regimes or of correctly matching the observations with the
proper regimes; therefore, the greater burden of efficiency
losses corresponds to the $\delta$ parameters, which enter the
switching probability probit function. These remarks are
applicable only when regimes are difficult to distinguish from
one another. When the regimes are sufficiently apart, then the
ratios of the $\delta$ parameters are comparable in magnitude to
the other ratios. Consistent with the observation in Case
1, the decline in the variance ratios is monotonic for all
estimated parameters as the distributions become far apart.

8/ $\delta_2 \neq 0$ basically implies that $\lambda = F(x'\delta)$ is
non-constant for all observations, while $\delta_2 = 0$ implies that
$\lambda = F(x'\delta)$ is constant, since the effects of the variable
x are wiped out and are not reflected in the resulting values
of $\lambda$. In our experiments, we adopted the special case of
$\lambda = F(x'\delta) = .5$, where $\delta = (0, 0)'$ but estimated.
In our second set of experiments, we fix $\sigma_1^2 = \sigma_2^2
= 1$ and also choose a particular sample mix, i.e. $\beta = (0,
0, 2, 0)'$. We vary our $\gamma$ parameters to reflect different
observability levels. We do this set of experiments twice:
Case 1, where $\delta = (0, 0)'$ and Case 2, where $\delta = (1, -1)'$.
The results are presented in Tables 14 and 15, respectively.

Let us start with Case 1. In the first experiment,
$\gamma_1 = -\gamma_0$, that is, the $\gamma$ parameters imply that no
information is provided at all, and the ratios derived here are
very similar to those derived when $\lambda$ is fixed for all
observations, but $\delta = 0$ is not estimated (as seen in Table 8 of
the previous chapter). Ratios when regime is partly known
are exactly equal to those derived when regime is unknown.
The only difference between the ratios derived here and those
derived when $\delta = 0$ but not estimated is that the $\hat\beta_{12}$, $\hat\beta_{22}$,
and $\hat\delta$ ratios are much higher when the regimes are close
together, i.e. $\beta = (0, 0, 2, 0)'$.
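The equality between the partly known and unknown ratios in the non-informative case can be seen directly from the joint density. The following worked note uses only the definitions of $Q_1$ and $Q_2$ given earlier and is a population argument; simulated averages will reproduce it only approximately.

```latex
\gamma_1 = -\gamma_0
\;\Longrightarrow\;
F(x'\gamma_1) = F(-x'\gamma_0) = 1 - F(x'\gamma_0)
\;\Longrightarrow\;
Q_2 = w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1)) = Q_1 \equiv Q
\\[6pt]
f(y, w; \theta) = Q\,\big[\lambda f_1(y) + (1 - \lambda) f_2(y)\big]
\\[6pt]
\frac{1}{f}\frac{\partial f}{\partial \beta_{ik}},\;
\frac{1}{f}\frac{\partial f}{\partial \sigma_i^2},\;
\frac{1}{f}\frac{\partial f}{\partial \delta_k}
\;\text{ do not involve } w, \qquad
\frac{1}{f}\frac{\partial f}{\partial \gamma_{sk}} = \frac{2w - 1}{Q}\,h_s(y)
\\[6pt]
E_w\!\left[\frac{2w - 1}{Q}\right] = \sum_{w=0}^{1} Q\,\frac{2w - 1}{Q} = (-1) + (+1) = 0
```

Here $h_s(y)$ collects the factors that depend on $y$ but not on $w$. Because $w$ and $y$ are independent in this case, the expected cross-products between the $\gamma$ scores and the remaining scores vanish, the information matrix is block diagonal, and the block for $(\beta_1, \beta_2, \sigma_1^2, \sigma_2^2, \delta)$ coincides with the information matrix for the regime unknown.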

When information is now introduced into the $\gamma$
parameters ($\gamma_1 \neq -\gamma_0$), as in $\gamma = (1, -1, 1, 1)'$ and
$\gamma = (1, 1, -1, 1)'$, then the ratios when regime is partly
known are less than the ratios when regime is completely
unknown. That is, the presence of sample separation information
presents efficiency gains as reflected in the decline of the
ratios as compared to when there is no information at all.
The results here can be compared to the ratios in Table 8
and Table 9 for the same parameter values of $\sigma_1^2$, $\sigma_2^2$, $\beta$,
$\gamma$, and $\lambda = .5$ where $\delta = 0$ but not estimated.

Table 14. Ratios of Asymptotic Variances

Varying $\gamma = (\gamma_1', \gamma_0')'$ when $\beta = (0, 0, 2, 0)'$,
$\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (0, 0)'$

Columns: Parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); Ratios, Partly
Known / Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$,
$\hat\sigma_2^2$, $\hat\delta_1$, $\hat\delta_2$. Rows: $\gamma$ = (1, -1, -1, 1)', (1, -1, 1, 1)',
(1, 1, -1, 1)'.

[Table entries are illegible in the source.]

$\gamma = (1, -1, 1, 1)'$ presents a wider divergence in
regime classification probabilities $p_{11}$ and $p_{00}$, as compared to
$\gamma = (1, 1, -1, 1)'$. That is why ratios are lower, or
efficiency gains are higher, when $p_{11}$ is close to $p_{00}$ as in $\gamma =
(1, 1, -1, 1)'$. The observation of the previous experiment
also applies here. That is, the ratios derived when $\lambda$ is
fixed and $\delta = 0$ but not estimated are close to, but slightly
less than, the ratios derived here where $\lambda$ is also fixed and
$\delta = 0$, but estimated. Again, the difference may be due to
randomness or to the fact that more parameters have to be
estimated this time.
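The divergence between $p_{11}$ and $p_{00}$ under the two informative choices of $\gamma$ can be illustrated by averaging the probit probabilities over the design ($x_1 = 1$, $x_2 = \exp(-x_3)$). Averaging over the regressor is just one convenient summary and is an assumption of this sketch, as are the function name `mean_probs`, the seed, and the number of draws. $\gamma$ is ordered as $(\gamma_{11}, \gamma_{12}, \gamma_{01}, \gamma_{02})'$.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_probs(gamma, n_draws=100_000, seed=1):
    """Average p11 = F(x'gamma_1) and p00 = F(x'gamma_0) over simulated x."""
    g11, g12, g01, g02 = gamma
    rng = random.Random(seed)
    sum11 = sum00 = 0.0
    for _ in range(n_draws):
        x2 = math.exp(-rng.gauss(0.0, 1.0))
        sum11 += norm_cdf(g11 + g12 * x2)
        sum00 += norm_cdf(g01 + g02 * x2)
    return sum11 / n_draws, sum00 / n_draws

p11_a, p00_a = mean_probs((1.0, -1.0, 1.0, 1.0))    # gamma = (1, -1, 1, 1)'
p11_b, p00_b = mean_probs((1.0, 1.0, -1.0, 1.0))    # gamma = (1, 1, -1, 1)'
gap_a = abs(p11_a - p00_a)
gap_b = abs(p11_b - p00_b)
print(round(gap_a, 2), round(gap_b, 2))
```

The first gap comes out larger than the second, in line with the statement that $\gamma = (1, -1, 1, 1)'$ implies the wider divergence between the two classification probabilities.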

We now explore Case 2, which is shown in Table 15.
The $\delta$ values are set differently, where $\delta = (1, -1)'$ so
that the switching probabilities vary for all observations.
This results in an average value of $\lambda = .438$, meaning that
there is a slightly larger probability on the average that
an observation is generated from regime 2 rather than regime
1. Consequently, the ratios are larger for the estimated
parameters of the regime which is observed with the lower
probability.

The first experiment illustrates a non-informative
case, where $\gamma = (1, -1, -1, 1)'$. Therefore, ratios when
regime is partly known are equal to the ratios when regime
is completely unknown. The next two experiments provide
informative $\gamma$ choices, which essentially duplicate those in
Case 1, so that the variance ratios decline when information
is supplied to the model.

Table 15. Ratios of Asymptotic Variances

Varying $\gamma = (\gamma_1', \gamma_0')'$ when $\beta = (0, 0, 2, 0)'$,
$\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (1, -1)'$

Columns: Parameters ($\gamma_{11}$, $\gamma_{12}$, $\gamma_{01}$, $\gamma_{02}$); Ratios, Partly
Known / Known, for $\hat\beta_{11}$, $\hat\beta_{12}$, $\hat\beta_{21}$, $\hat\beta_{22}$, $\hat\sigma_1^2$,
$\hat\sigma_2^2$, $\hat\delta_1$, $\hat\delta_2$. Rows: $\gamma$ = (1, -1, -1, 1)', (1, -1, 1, 1)',
(1, 1, -1, 1)'.

[Table entries are illegible in the source.]

When the $\delta$ values ensure that the switching
probabilities are non-constant for all observations, the ratios in
Case 2 vary less than those in Case 1. Even when the $\gamma$
parameters are non-informative, variance ratios in Case 2 are
lower than the corresponding variance ratios of Case 1.
This reinforces our earlier findings in the first set of
experiments that there are efficiency advantages when we
postulate that the switching probabilities be modelled as
non-constant ($\delta_2 \neq 0$). However, as information is provided on
sample separation, the decline in the variance ratios is
very slow or is quite minimal in Case 2. To illustrate this
point, consider the $\hat\beta_{11}$ ratios: in Case 1, the decline in
the values ranges from 52.4 to 4.6 when information is
provided, while in Case 2, the decline in the values ranges from
7.7 to 6.8 when the same $\gamma$ information is provided. A
similar pattern is evident for the ratios of the other
parameters. Therefore, the advantages of improved efficiency
associated with non-constant switching probabilities seem to
occur only within that range of parameter values where
information is very valuable in determining sample separability,
in this instance, when the $\gamma$ parameters are non-informative.

Since $\gamma = (1, 1, -1, 1)'$ provides less divergence
in the $p_{11}$ and $p_{00}$ regime classification probabilities as
compared to $\gamma = (1, -1, 1, 1)'$, we would expect that the
ratios in the latter should be consistently higher than the
ratios in the former. However, this does not hold,
particularly in the case of the $\hat\beta_{11}$, $\hat\beta_{12}$, and $\hat\delta_1$ variance
ratios, where the decline in the ratios is not monotonic as we
vary the $\gamma$ values from the least informative to the more
informative.

Another effect of the non-constant $\lambda$ values is seen
in the fact that among all the derived ratios of asymptotic
variances, it is the $\hat\delta$ ratios which are always the highest.
This implies that as information is provided on regime
classification, efficiency losses associated with the $\delta$
parameters remain quite substantial when $\lambda$ is not constant for
all observations. When $\lambda$ is constant for all observations,
but $\delta$ parameters still have to be estimated (as in Case 1),
then the ratios are much lower (when information is provided
to the model) and the decline in the values of the asymptotic
variance ratios is monotonic as more information is provided
on sample separation.

The last set of experiments we conduct involves
varying the values assumed by the $\delta$ parameters given fixed
values for $\sigma_1^2$, $\sigma_2^2$, $\beta$, and $\gamma$. The results are presented
in Table 16. For these experiments, $\gamma_1 \neq -\gamma_0$, so we have
informative cases. Ratios when information is partly
available are less than ratios derived when there is no information
available at all.

Table 16. Ratios of Asymptotic Variances

Varying $\delta$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$,
$\gamma = (1, -1, 1, 1)'$

Columns: Mean Values ($\lambda$, $1 - \lambda$); Parameters ($\delta_1$, $\delta_2$);
Ratios, Partly Known (Unknown) / Known, for the estimated
parameters.

[Table entries are illegible in the source.]

The different $\delta$ values suggest different average
values for $\lambda$ and $(1 - \lambda)$. The resulting variance ratios
are consistent with the expectation that the ratios
associated with the parameters of the regime observed with the
lower probability assume higher values. Therefore, as the
average value of $\lambda$ goes up, the ratios associated with
regime 1, i.e. $\hat\beta_{11}$, $\hat\beta_{12}$, and $\hat\sigma_1^2$, all go down.

Regarding the ratios corresponding to the $\delta$
parameters, the lowest ratios occur when $\lambda = 1 - \lambda = .5$; this
means that the efficiency of the estimates of the parameters
of the $\lambda$ model is highest when there are equal probabilities
for an observation to be generated by either regime. As the
switching probabilities increase for any one regime, i.e. as
the $\delta$ parameter values increase absolutely, the $\hat\delta$ ratios
also increase monotonically, implying that the efficiency of
the estimates declines substantially when the switching
probabilities become biased in favor of any one regime.

When $\delta = (1, -1)'$ and $\delta^* = (-1, 1)'$, then $\delta^* = -\delta$,
so that $F(x'\delta) = 1 - F(-x'\delta) = 1 - F(x'\delta^*)$. Therefore,
$\lambda = 1 - \lambda^*$. This transformation is a similar action to
simply interchanging the names of the regimes. Consequently,
$\hat\beta_1$ ratios for $\lambda$ are simply equal to $\hat\beta_2$ ratios for $1 - \lambda^*$;
$\hat\sigma_1^2$ ratios are similar to $\hat\sigma_2^2$ ratios when $-\delta$ is equal to
$\delta^*$; and so on. Any difference in the values of the variance
ratios may be attributed to randomness, and to differences in
the information provided by the $\gamma$ parameters to both regimes.
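The relabelling identity $\lambda = 1 - \lambda^*$ is easy to confirm numerically. The sketch below evaluates both switching probabilities at a few arbitrary values of $x_3$ (chosen here only for illustration) and checks that they sum to one.

```python
import math

def norm_cdf(z):
    """Standard normal CDF F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

delta = (1.0, -1.0)
delta_star = (-1.0, 1.0)                     # delta* = -delta

# Rows of x = (x1, x2) with x1 = 1 and x2 = exp(-x3).
rows = [(1.0, math.exp(-x3)) for x3 in (-2.0, -0.5, 0.0, 0.7, 1.5)]

lam = [norm_cdf(sum(d * x for d, x in zip(delta, row))) for row in rows]
lam_star = [norm_cdf(sum(d * x for d, x in zip(delta_star, row))) for row in rows]
sums = [a + b for a, b in zip(lam, lam_star)]
```

Every entry of `sums` equals one up to rounding error, so switching from $\delta$ to $\delta^*$ simply exchanges the probabilities attached to the two regimes.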


4.5 Summary

This chapter has focused on the possibility of model-
ling the switching probabilities as probit functions of the
exogenous variables in a switching regression model. It
has, however, retained the other features of the preceding
chapter -- two exogenous variables, and modelling the regime
classification probabilities given the true regime also as
probit functions of the explanatory variables. In addition,
all the parameters will have to be estimated. This expanded
model is aimed at improving on the previous specification
since using all the available observations on the dependent
and independent variables may increase the chances of correct
switching between regimes. It also serves as a better indi-
cation of the model's ability to classify observations based
on the values of the exogenous variables.

Different types of experiments were conducted here.
In the first two sets (vary $\beta$ given $\gamma$, and vary $\gamma$ given
$\beta$) we apply both constant ($\delta_2 = 0$) and non-constant ($\delta_2 \neq
0$) switching probabilities, where the $\delta$ parameters are
estimated in both instances. When $\delta = 0$, we have the special
case of our former model with fixed $\lambda$ (implicitly, $\delta = 0$
but not estimated) and the ratios we derived previously can
be compared with our present results. When $\delta \neq 0$, we can
evaluate the merits of our probit model when the resulting
switching probabilities are either constant ($\delta_2 = 0$) or
non-constant ($\delta_2 \neq 0$). In the last set of experiments, we
vary our $\delta$ parameters, all $\delta \neq 0$, to find out the effects
of such an action on the resulting parameter efficiencies.

We come up with the following important findings.
First, there are advantages when the switching probabilities
are modelled as non-constant (éz r‘ 0) as compared to con-
stant switching probabilities (42 = 0 but still estimated).
These advantages are in terms of greatly improved efficien—
cy of the estimates of the parameters. However, these gains
only occur during instances where information is most valu-
able -- when samples are hardly distinct from each other,
and when the information provided by the X parameters is
not informative at all. Under these circumstances, we get
smaller variance ratios when the switching probabilities are
not fixed for all observations. Second, since there are more
parameters to estimate in this model, a lot of randomness and
variability is introduced. This may account for the fact
that the ratios we derive here are slightly larger than those
derived when h was fixed.(‘Q = 0 but not estimated). In ad-
dition, the slope variance ratios in the regression model are
now larger than the intercept variance ratios in instances
when the value of information is most important (as mentioned
above) and for the sample which is observed with the lower
probability on the average. This was not evident at all
when we had a constant mixing parameter X.. Third, when we
vary the Q parameters to yield various average levels of
switching probabilities, the variance ratios of the estimated
parameters which correspond to the sample observed with the

A
lower average probability are generally higher. The Q

103

variance ratios also increase as the probability of ob-
serving a particular regime diverges from .5. Last, the
value of imperfect sample separation information is still
largely dependent on the natural separation of the two
samples. Variance ratios are higher when the samples are
more difficult to distinguish from each other, and they
are lower when samples are far apart. Also, the use of
imperfect information improves parameter estimates as compared
to when no information is used at all. Naturally, the more
reliable the imperfect information (as evident from the $\gamma$
parameters), the better our estimates will be.

CHAPTER FIVE

CONCLUSIONS

 

We set out in this study with the purpose of assessing
the value or importance of imperfect sample separation
information in a switching regression model, where all the
parameters have to be estimated, so as not to understate
the true value of such information. We accomplished this
by evaluating information matrices using simulation experi-
ments over a large sample size (i.e. 100000 and 5000) in or-
der to derive the asymptotic variances of the estimated pa-
rameters when regime is either unknown (no available sample
separation information) or partly known (the available in-
formation is imperfect). These asymptotic variances are
simply the corresponding diagonal elements of the inverse of
the information matrix. We then solved for asymptotic var-
iance ratios when regime is either partly known or completely
unknown, relative to when regime is completely known (full
sample separation information). A comparison of these two
sets of ratios shows the advantage of using imperfect regime
classification information relative to no information at all.

All these ratios are greater than or equal to one, and
the extent to which they differ from one measures the value
of information, or imperfect information, as the case may be.
The higher these variance ratios, the greater is the value
of regime classification information. On the other hand,
variance ratios which approach the lower bound of 1.0 imply
that information is not very valuable to the model.
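
The comparison described above can be sketched numerically. The
following Python fragment is only an illustration under simplifying
assumptions, not the procedure used in this study: it takes a
two-component normal mixture without regressors, approximates each
information matrix by the average outer product of numerically
differentiated scores (instead of the analytic second derivatives of
the appendices), and compares only the regime-unknown and regime-known
cases. The function names and parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, var):
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def loglik_unknown(theta, y, d):
    # Regime unknown: log of the mixture density; the indicator d is ignored.
    mu1, mu2, v1, v2, lam = theta
    return np.log(lam * normal_pdf(y, mu1, v1) + (1 - lam) * normal_pdf(y, mu2, v2))

def loglik_known(theta, y, d):
    # Regime known: d = 1 marks observations drawn from regime 1.
    mu1, mu2, v1, v2, lam = theta
    return (d * np.log(lam * normal_pdf(y, mu1, v1))
            + (1 - d) * np.log((1 - lam) * normal_pdf(y, mu2, v2)))

def information(loglik, theta, y, d, h=1e-5):
    # Average outer product of per-observation scores (central differences).
    theta = np.asarray(theta, dtype=float)
    scores = np.empty((y.size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        scores[:, k] = (loglik(theta + step, y, d) - loglik(theta - step, y, d)) / (2 * h)
    return scores.T @ scores / y.size

def variance_ratios(theta, n=20000, seed=0):
    # Ratio of asymptotic variances: regime unknown relative to regime known.
    rng = np.random.default_rng(seed)
    mu1, mu2, v1, v2, lam = theta
    d = (rng.random(n) < lam).astype(float)
    y = np.where(d == 1.0,
                 rng.normal(mu1, np.sqrt(v1), n),
                 rng.normal(mu2, np.sqrt(v2), n))
    var_unknown = np.diag(np.linalg.inv(information(loglik_unknown, theta, y, d)))
    var_known = np.diag(np.linalg.inv(information(loglik_known, theta, y, d)))
    return var_unknown / var_known

# Parameter order: (mu1, mu2, sigma1^2, sigma2^2, lambda).
ratios_close = variance_ratios([0.0, 1.0, 1.0, 1.0, 0.5])  # samples hard to separate
ratios_far = variance_ratios([0.0, 6.0, 1.0, 1.0, 0.5])    # samples far apart
print(ratios_close)
print(ratios_far)
```

With the means one standard deviation apart the ratios should be well
above one, while with the means six standard deviations apart they
should be close to one, which is the pattern the text describes.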

In the past three chapters, where we evaluated the
value of imperfect sample separation information, we made
variations on the basic switching regression model by pos-
tulating different assumptions about the parameter values.
In Chapter 2, we examined a normal mixture model with im-
perfect regime classification information, where the proba-
bilities of correct regime classification (given actual re-
gime classification) are constant over observations. This
is a straightforward extension of Schmidt's work to the Lee
and Porter model with constant regime classification pro-
babilities, p11 and p00. Our experiments consisted of vary-
ing the regime classification probabilities, the difference
between the means of the two samples, the difference between
the variances of the two samples, and the mixing parameter
-- each time holding the other parameter values fixed.

In Chapter 3, we added another explanatory variable
into our switching regression model and further assumed that
the presumed regime classification probabilities (p11 and
p00) are non-constant over observations, but are in fact,
probit functions of the exogenous variables. This extension
is aimed at improving the flexibility of the model and is
plausible since it is highly likely that the imperfect re-
gime classification probabilities vary from one observation
to another, and that their values are affected by the
exogenous variables. Our experiments consisted of varying the
probit parameters of the regime classification probabilities
for a particular sample mix, and varying the sample mixes
for a particular set of imperfect indicators -- each time
holding the other parameter values constant.
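
As a minimal sketch of the Chapter 3 specification, the fragment
below evaluates the joint density of the dependent variable y and the
imperfect indicator w when the classification probabilities are probit
functions of the exogenous variables. The function names and all
numerical values are hypothetical and chosen only for illustration.
The final lines check that summing the joint density over w = 0, 1
returns the regime-unknown mixture density, as it must.

```python
import math

def norm_cdf(z):
    # Standard normal cumulative distribution function F( ).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def joint_density(y, w, x, beta1, beta2, var1, var2, lam, gamma1, gamma0):
    # p11 and p00 vary across observations through x.
    p11 = norm_cdf(dot(x, gamma1))
    p00 = norm_cdf(dot(x, gamma0))
    q1 = w * p11 + (1 - w) * (1 - p11)   # P(w | regime 1)
    q2 = w * (1 - p00) + (1 - w) * p00   # P(w | regime 2)
    return (lam * normal_pdf(y, dot(x, beta1), var1) * q1
            + (1 - lam) * normal_pdf(y, dot(x, beta2), var2) * q2)

x = [1.0, 0.5]  # intercept and one explanatory variable
args = dict(x=x, beta1=[0.0, 1.0], beta2=[1.0, 2.0], var1=1.0, var2=2.0,
            lam=0.4, gamma1=[1.0, 0.5], gamma0=[0.8, -0.3])

marginal = joint_density(0.7, 1, **args) + joint_density(0.7, 0, **args)
mixture = (0.4 * normal_pdf(0.7, dot(x, [0.0, 1.0]), 1.0)
           + 0.6 * normal_pdf(0.7, dot(x, [1.0, 2.0]), 2.0))
print(marginal, mixture)
```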

In Chapter 4, we maintained the features of Chapter
3 but added another assumption, namely, that the switching
probabilities, formerly assumed to be a constant mixing pa-
rameter for all observations, are now non-constant and can
be modelled as a probit function of the explanatory varia-
bles. This extension is aimed at providing the model with a
better ability to classify observations into the two regimes,
by using as much information as possible at each observation.
Therefore, actual regime classification probabilities as
well as imperfect regime classification probabilities are
modelled here as probit functions of the explanatory varia-
bles. There are three sets of experiments here: varying the
probit parameters of the imperfect regime classification
probabilities for a particular sample mix, and varying the
sample mixes for a particular set of imperfect regime clas-
sification probabilities, each time using non-constant
switching probabilities; and varying the parameters in the
switching probabilities probit model given fixed values of
the other parameters.
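
To illustrate how the Chapter 4 specification uses the information
available at each observation, the sketch below computes the
posterior probability that an observation belongs to regime 1, given
y, the imperfect indicator w and the explanatory variables x, when
both the switching probability and the classification probabilities
are probit functions. This posterior is our own illustrative
construct (Bayes' rule applied to the joint density), and the names
and parameter values are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def posterior_regime1(y, w, x, beta1, beta2, var1, var2, delta, gamma1, gamma0):
    lam = norm_cdf(dot(x, delta))    # non-constant switching probability
    p11 = norm_cdf(dot(x, gamma1))   # P(w = 1 | regime 1)
    p00 = norm_cdf(dot(x, gamma0))   # P(w = 0 | regime 2)
    part1 = lam * normal_pdf(y, dot(x, beta1), var1) * (w * p11 + (1 - w) * (1 - p11))
    part2 = (1 - lam) * normal_pdf(y, dot(x, beta2), var2) * (w * (1 - p00) + (1 - w) * p00)
    return part1 / (part1 + part2)

params = dict(x=[1.0, 0.5], beta1=[0.0, 1.0], beta2=[0.5, 1.0],
              var1=1.0, var2=1.0, delta=[0.2, 0.6],
              gamma1=[1.0, 0.2], gamma0=[1.0, -0.2])

post_w1 = posterior_regime1(0.6, 1, **params)  # indicator points to regime 1
post_w0 = posterior_regime1(0.6, 0, **params)  # indicator points to regime 2
print(post_w1, post_w0)
```

Because both classification probabilities exceed one half at this x,
observing w = 1 raises the posterior probability of regime 1 relative
to observing w = 0, even though the two regression lines are close.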

We have discussed the results of our experiments
in detail already, so here we will discuss only a few of
the more important findings. First, there are advantages
in terms of efficiency gains when using imperfect sample
separation information, as compared to no information at
all. These efficiency gains can be substantial in some cases.
This is especially so when the two samples are not very dis-
tinct, so that there is not much sample separation informa-
tion in the sample itself.

Our second important finding follows from the first
one. There are two cases in which imperfect sample separa-
tion information does not improve efficiency of estimation:

(1) The imperfect sample separation information is
not informative. This occurs when the probability of a par-
ticular observed regime classification does not depend on
the true regime classification; that is, when $p_{11} = 1 - p_{00}$.
In terms of the model of Chapter 3, where these probabilities
are modelled as probit functions, this occurs when
$\gamma_1 = -\gamma_0$ (since $1 - F(z) = F(-z)$).

(2) The samples are very distinct. The two distrib-
utions are sufficiently far apart so that there is a very
small probability of misclassification for any observation.
Therefore, there is hardly any need for information (im-
perfect or otherwise) in determining sample separability.
This occurs when the means of the two distributions in the
sample are clearly separate ($\mu_1$ distinct from $\mu_2$; $\beta_1$
distinct from $\beta_2$).
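
The first case can be made explicit with a short derivation. This is
our own gloss, written in the notation of the joint density with
constant classification probabilities. If $p_{11} = 1 - p_{00} = p$,
both regimes assign the same probability to the observed
classification w, and the joint density factors:

```latex
\begin{aligned}
f(y, w) &= \lambda f_1(y)\,[\,w p_{11} + (1-w)(1-p_{11})\,]
         + (1-\lambda) f_2(y)\,[\,w(1-p_{00}) + (1-w)p_{00}\,] \\
        &= [\,w p + (1-w)(1-p)\,]\,[\,\lambda f_1(y) + (1-\lambda) f_2(y)\,].
\end{aligned}
```

The first factor does not involve the regression parameters, the
variances or the mixing parameter, so the indicator w contributes
nothing to the information about them, and the second factor is
exactly the density for the case where the regime is unknown.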

Our third conclusion again follows from the first.
The value of imperfect sample separation information is highest,
or the gains in efficiency in using unreliable information
are greatest, under the following circumstances:

(1) The imperfect sample separation information is
highly informative. In the extreme case, $p_{11} = p_{00} = 1$,
so that the imperfect indicators are perfect indicators, and
the regime is fully identifiable based on the available
information. The more reliable the imperfect indicators, the
more efficient the estimates are. This occurs when $p_{11} = p_{00}$
or $p_{11}$ is near $p_{00}$ in the extreme ranges of probability,
where there is great certainty and confidence that both
regime classifications are right.

(2) The samples are not very distinct. It is here
where information (even if imperfect) is most helpful in
determining sample separability and improving the efficiency
of the estimates. This agrees with the findings of previous
studies (Kiefer, 1979; Schmidt, 1981; Lee and Porter,
1984) that the value of sample separation information is
largely dependent on the natural separation of the two samples.
The closer the distributions in the sample ($\mu_1$
close to $\mu_2$; $\beta_1$ close to $\beta_2$) and the closer the variances
are, the more important is information in assigning regime
membership.

Fourth, it is the intercept term rather than the
slope term in a switching regression model which mostly de-
termines sample separability. It is more difficult to dis-
tinguish one sample from the other when the intercepts are
close together rather than when the slopes are. Therefore,
the efficiency losses in using no information or using partial
information are far greater for the parameter estimates
when the intercepts are hardly distinct from each other as
compared to when the slopes are hardly distinct from one
another.

Fifth, the value of sample separation information is
highest for the estimates corresponding to the mixing para-
meter or to the parameters of the switching probabilities.

Sixth, there are definite efficiency gains when we
model our switching probabilities as non-constant probit
functions of the explanatory variables. These gains occur
in circumstances where information is most valuable; that
is, when samples are hardly distinct from each other and
when the imperfect regime classification information is not
very informative.

Seventh, as we continually expand on our basic switch-
ing regression model, we find that regime classification
information becomes more valuable. The value of sample se-
paration information is more important for complicated models,
as Kiefer (1979) suggested. This is due to the fact that as
we try to estimate more parameters, more variability is in-
troduced to the estimates, which is naturally reflected in
larger variances. This notion of more variability in the
model is also evident in other situations: when the $\gamma$
parameters are not very informative, when samples are difficult
to disentangle, and when a particular regime is observed
with a lower probability.

Eighth, in accordance with the findings of Schmidt
(1981), the value of information, imperfect or otherwise,
is higher for the regime which is observed with the lower
probability.

In light of these findings, a final word is warranted.
Sample separation information, even if imperfect or unrelia-
ble, can be used to improve the efficiency of parameter es-
timates in switching regression models. Its use is most
valuable when the samples are hard to disentangle from each
other, and when the imperfect information is informative and
fairly reliable. Under these conditions, it may also be
advisable to model the switching probabilities as non-constant,
since this action can further increase the efficiency of the
estimates, particularly when the samples are difficult to
distinguish from one another. Presumed regime classification
probabilities given the actual regimes may also be modelled
as non-constant to further improve the model's flexibility.
However, when the imperfect information is highly unreliable
or when the samples are clearly separate, there is little
point in using imperfect information, since only small effi-
ciency gains are possible. In addition, one must consider
the trade-off implied when adding more parameters to the model
(like imperfect regime indicator functions with probit para-
meters) since such an action gives the model more variability
and tends to increase the variance estimates. Therefore,
gains achieved by improving the model's plausibility may be
lost or at least partially offset by introducing more varia-
bility into the model when additional parameters have to be
estimated.

APPENDICES

 

APPENDIX A

THE SECOND DERIVATIVE COMPONENTS OF THE
INFORMATION MATRIX IN THE CASE OF

NON-CONSTANT CLASSIFICATION PROBABILITIES

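The density function (we drop the subscript j for
simplicity) when the regime is unknown is:

\[
f(y; \theta) = \lambda f_1(y) + (1 - \lambda) f_2(y)
\]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda)' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i}
          \exp\left[-\frac{(y - x'\beta_i)^2}{2\sigma_i^2}\right],
          \qquad i = 1, 2
\end{aligned}
\]

The information matrix is given by:

\[
\mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right]
\]

where:

\[
\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}
  = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'}
  - \frac{1}{f_j^2}\left(\frac{\partial f_j}{\partial\theta}\right)
    \left(\frac{\partial f_j}{\partial\theta}\right)'
\]

The first derivatives of f with respect to $\theta$ are given in
the text of Chapter 3. The non-zero second derivatives of
f with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= \lambda\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= \lambda\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\lambda} &= \frac{\partial f_1}{\partial\beta_{1k}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1-\lambda)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1-\lambda)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\lambda} &= -\frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= \lambda\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\lambda} &= \frac{\partial f_1}{\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1-\lambda)\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\lambda} &= -\frac{\partial f_2}{\partial\sigma_2^2}
\end{aligned}
\]

where:

\[
\begin{aligned}
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}}
  &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m)
   + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2}
  &= (y - x'\beta_i)\,x_k \left[\frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2}
   - \frac{f_i}{\sigma_i^4}\right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2}
  &= \frac{\partial f_i}{\partial\sigma_i^2}
     \left[\frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2}\right]
   + f_i \left[\frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6}\right]
\end{aligned}
\]

i = 1,2; k,m = 1,2,...,K

The joint density function (we omit the subscript j
for simplicity) when the regime is partly known is: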

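\[
f(y, w; \theta) = \lambda f_1(y)\bigl(w p_{11} + (1 - w)(1 - p_{11})\bigr)
                + (1 - \lambda) f_2(y)\bigl(w(1 - p_{00}) + (1 - w)p_{00}\bigr)
\]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda, \gamma_1', \gamma_0')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\gamma_s &= (\gamma_{s1}, \gamma_{s2}, \ldots, \gamma_{sK})' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i}
          \exp\left[-\frac{(y - x'\beta_i)^2}{2\sigma_i^2}\right] \\
p_{11} &= F(x'\gamma_1) = \int_{-\infty}^{x'\gamma_1} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv \\
p_{00} &= F(x'\gamma_0) = \int_{-\infty}^{x'\gamma_0} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv
\end{aligned}
\]

i = 1,2; s = 0,1

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[
\mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right]
\]

where:

\[
\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}
  = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'}
  - \frac{1}{f_j^2}\left(\frac{\partial f_j}{\partial\theta}\right)
    \left(\frac{\partial f_j}{\partial\theta}\right)'
\]

The first derivatives of f with respect to $\theta$ are given in
the text of Chapter 3. The non-zero second derivatives of
f with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= \lambda Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= \lambda Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\lambda} &= Q_1\,\frac{\partial f_1}{\partial\beta_{1k}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\gamma_{1m}} &= \lambda\,(w - (1-w))\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1-\lambda) Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1-\lambda) Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\lambda} &= -Q_2\,\frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\gamma_{0m}} &= -(1-\lambda)\,(w - (1-w))\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= \lambda Q_1\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\lambda} &= Q_1\,\frac{\partial f_1}{\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\gamma_{1k}} &= \lambda\,(w - (1-w))\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1-\lambda) Q_2\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\lambda} &= -Q_2\,\frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\gamma_{0k}} &= -(1-\lambda)\,(w - (1-w))\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\lambda\,\partial\gamma_{1k}} &= f_1\,(w - (1-w))\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial\lambda\,\partial\gamma_{0k}} &= f_2\,(w - (1-w))\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\gamma_{1k}\,\partial\gamma_{1m}} &= \lambda f_1\,(w - (1-w))\,\frac{\partial^2 F(x'\gamma_1)}{\partial\gamma_{1k}\,\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\gamma_{0k}\,\partial\gamma_{0m}} &= -(1-\lambda) f_2\,(w - (1-w))\,\frac{\partial^2 F(x'\gamma_0)}{\partial\gamma_{0k}\,\partial\gamma_{0m}}
\end{aligned}
\]

where:

\[
\begin{aligned}
Q_1 &= w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1)) \\
Q_2 &= w(1 - F(x'\gamma_0)) + (1 - w) F(x'\gamma_0) \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}}
  &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m)
   + \frac{(y - x'\beta_i)\,x_k}{\sigma_i^2}\,\frac{\partial f_i}{\partial\beta_{im}} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2}
  &= (y - x'\beta_i)\,x_k \left[\frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2}
   - \frac{f_i}{\sigma_i^4}\right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2}
  &= \frac{\partial f_i}{\partial\sigma_i^2}
     \left[\frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2}\right]
   + f_i \left[\frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6}\right] \\
\frac{\partial^2 F(x'\gamma_s)}{\partial\gamma_{sk}\,\partial\gamma_{sm}}
  &= \phi(x'\gamma_s)\,(-x'\gamma_s)\,x_k x_m
\end{aligned}
\]

i = 1,2; s = 0,1; k,m = 1,2,...,K

F( ) = standard normal cumulative distribution function

$\phi$( ) = standard normal probability density function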
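
The last component above, the second derivative of the probit
function, can be checked numerically. The sketch below is only a
sanity check with hypothetical values of x and $\gamma$, not part of
the original computations: it compares the analytic expression
(standard normal density at x'$\gamma$, times -x'$\gamma$, times
x_k x_m) with a central finite-difference approximation.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_density(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def analytic_second_derivative(x, gamma, k, m):
    # density(x'gamma) * (-x'gamma) * x_k * x_m
    z = dot(x, gamma)
    return norm_density(z) * (-z) * x[k] * x[m]

def numeric_second_derivative(x, gamma, k, m, h=1e-4):
    # Central finite difference of F(x'gamma) in gamma_k and gamma_m.
    def shifted(step_k, step_m):
        g = list(gamma)
        g[k] += step_k
        g[m] += step_m
        return norm_cdf(dot(x, g))
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)

x = [1.0, 0.5, -2.0]       # hypothetical regressors (first is the constant)
gamma = [0.8, -0.3, 0.25]  # hypothetical probit parameters

checks = [(analytic_second_derivative(x, gamma, k, m),
           numeric_second_derivative(x, gamma, k, m))
          for k in range(3) for m in range(3)]
for analytic, numeric in checks:
    print(analytic, numeric)
```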

APPENDIX B

THE SECOND DERIVATIVE COMPONENTS OF THE
INFORMATION MATRIX IN THE CASE OF
NON-CONSTANT CLASSIFICATION PROBABILITIES
AND NON-CONSTANT SWITCHING PROBABILITIES

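The density function (we drop the subscript j for
simplicity) when the regime is unknown is:

\[
f(y; \theta) = \lambda f_1(y) + (1 - \lambda) f_2(y)
\]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\delta &= (\delta_1, \delta_2, \ldots, \delta_K)' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i}
          \exp\left[-\frac{(y - x'\beta_i)^2}{2\sigma_i^2}\right] \\
\lambda &= F(x'\delta) = \int_{-\infty}^{x'\delta} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv
\end{aligned}
\]

i = 1,2

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[
\mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right]
\]

where:

\[
\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}
  = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'}
  - \frac{1}{f_j^2}\left(\frac{\partial f_j}{\partial\theta}\right)
    \left(\frac{\partial f_j}{\partial\theta}\right)'
\]

The first derivatives of f with respect to $\theta$ are given in
the text of Chapter 4. The non-zero second derivatives of
f with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= F(x'\delta)\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= F(x'\delta)\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\delta_m} &= \frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\delta)}{\partial\delta_m} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1 - F(x'\delta))\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1 - F(x'\delta))\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\delta_m} &= -\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\delta)}{\partial\delta_m} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= F(x'\delta)\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\delta_k} &= \frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\delta)}{\partial\delta_k} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1 - F(x'\delta))\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\delta_k} &= -\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\delta)}{\partial\delta_k} \\
\frac{\partial^2 f}{\partial\delta_k\,\partial\delta_m} &= (f_1 - f_2)\,\frac{\partial^2 F(x'\delta)}{\partial\delta_k\,\partial\delta_m}
\end{aligned}
\]

where:

\[
\begin{aligned}
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}}
  &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m)
   + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2}
  &= (y - x'\beta_i)\,x_k \left[\frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2}
   - \frac{f_i}{\sigma_i^4}\right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2}
  &= \frac{\partial f_i}{\partial\sigma_i^2}
     \left[\frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2}\right]
   + f_i \left[\frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6}\right] \\
\frac{\partial^2 F(x'\delta)}{\partial\delta_k\,\partial\delta_m}
  &= \phi(x'\delta)\,(-x'\delta)\,x_k x_m
\end{aligned}
\]

i = 1,2; k,m = 1,2,...,K

F( ) = standard normal cumulative distribution function

$\phi$( ) = standard normal probability density function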
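The joint density function (we omit the subscript j
for simplicity) when the regime is partly known is:

\[
f(y, w; \theta) = \lambda f_1(y)\bigl(w p_{11} + (1 - w)(1 - p_{11})\bigr)
                + (1 - \lambda) f_2(y)\bigl(w(1 - p_{00}) + (1 - w)p_{00}\bigr)
\]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta', \gamma_1', \gamma_0')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\gamma_s &= (\gamma_{s1}, \gamma_{s2}, \ldots, \gamma_{sK})' \\
\delta &= (\delta_1, \delta_2, \ldots, \delta_K)' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i}
          \exp\left[-\frac{(y - x'\beta_i)^2}{2\sigma_i^2}\right] \\
\lambda &= F(x'\delta) = \int_{-\infty}^{x'\delta} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv \\
p_{11} &= F(x'\gamma_1) = \int_{-\infty}^{x'\gamma_1} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv \\
p_{00} &= F(x'\gamma_0) = \int_{-\infty}^{x'\gamma_0} \frac{1}{\sqrt{2\pi}}
          \exp\left[-\frac{v^2}{2}\right] dv
\end{aligned}
\]

i = 1,2; s = 0,1

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[
\mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right]
\]

where:

\[
\frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'}
  = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'}
  - \frac{1}{f_j^2}\left(\frac{\partial f_j}{\partial\theta}\right)
    \left(\frac{\partial f_j}{\partial\theta}\right)'
\]

The first derivatives of f with respect to $\theta$ are given in the
text of Chapter 4. The non-zero second derivatives of f with
respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= F(x'\delta)\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= F(x'\delta)\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\delta_m} &= Q_1\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\delta)}{\partial\delta_m} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\gamma_{1m}} &= F(x'\delta)\,(w - (1-w))\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1 - F(x'\delta))\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1 - F(x'\delta))\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\delta_m} &= -Q_2\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\delta)}{\partial\delta_m} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\gamma_{0m}} &= -(1 - F(x'\delta))\,(w - (1-w))\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= F(x'\delta)\,Q_1\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\delta_k} &= Q_1\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\delta)}{\partial\delta_k} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\gamma_{1k}} &= F(x'\delta)\,(w - (1-w))\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1 - F(x'\delta))\,Q_2\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\delta_k} &= -Q_2\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\delta)}{\partial\delta_k} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\gamma_{0k}} &= -(1 - F(x'\delta))\,(w - (1-w))\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\delta_k\,\partial\delta_m} &= (f_1 Q_1 - f_2 Q_2)\,\frac{\partial^2 F(x'\delta)}{\partial\delta_k\,\partial\delta_m} \\
\frac{\partial^2 f}{\partial\delta_k\,\partial\gamma_{1m}} &= f_1\,(w - (1-w))\,\frac{\partial F(x'\delta)}{\partial\delta_k}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\delta_k\,\partial\gamma_{0m}} &= f_2\,(w - (1-w))\,\frac{\partial F(x'\delta)}{\partial\delta_k}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial\gamma_{1k}\,\partial\gamma_{1m}} &= F(x'\delta)\,f_1\,(w - (1-w))\,\frac{\partial^2 F(x'\gamma_1)}{\partial\gamma_{1k}\,\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\gamma_{0k}\,\partial\gamma_{0m}} &= -(1 - F(x'\delta))\,f_2\,(w - (1-w))\,\frac{\partial^2 F(x'\gamma_0)}{\partial\gamma_{0k}\,\partial\gamma_{0m}}
\end{aligned}
\]

where:

\[
\begin{aligned}
Q_1 &= w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1)) \\
Q_2 &= w(1 - F(x'\gamma_0)) + (1 - w) F(x'\gamma_0) \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}}
  &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m)
   + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2}
  &= (y - x'\beta_i)\,x_k \left[\frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2}
   - \frac{f_i}{\sigma_i^4}\right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2}
  &= \frac{\partial f_i}{\partial\sigma_i^2}
     \left[\frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2}\right]
   + f_i \left[\frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6}\right] \\
\frac{\partial^2 F(x'\delta)}{\partial\delta_k\,\partial\delta_m}
  &= \phi(x'\delta)\,(-x'\delta)\,x_k x_m \\
\frac{\partial^2 F(x'\gamma_s)}{\partial\gamma_{sk}\,\partial\gamma_{sm}}
  &= \phi(x'\gamma_s)\,(-x'\gamma_s)\,x_k x_m
\end{aligned}
\]

i = 1,2; s = 0,1; k,m = 1,2,...,K

F( ) = standard normal cumulative distribution function

$\phi$( ) = standard normal probability density function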
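
The signs of the cross derivatives between the switching-probability
parameters and the classification parameters are easy to get wrong,
so a numerical check may be useful. The sketch below is our own
verification aid, with hypothetical data and parameter values: it
codes the joint density of this appendix and compares two of the
cross derivatives (with respect to a switching-probability parameter
and a classification parameter of each regime) with central
finite differences.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical observation and parameter values.
Y, W, X = 0.7, 1, [1.0, 0.5]
BETA1, BETA2, VAR1, VAR2 = [0.0, 1.0], [1.0, 2.0], 1.0, 2.0
DELTA, GAMMA1, GAMMA0 = [0.2, 0.6], [1.0, 0.2], [0.8, -0.3]

F1 = normal_pdf(Y, dot(X, BETA1), VAR1)
F2 = normal_pdf(Y, dot(X, BETA2), VAR2)

def joint_density(delta, gamma1, gamma0):
    lam = Phi(dot(X, delta))
    p11, p00 = Phi(dot(X, gamma1)), Phi(dot(X, gamma0))
    q1 = W * p11 + (1 - W) * (1 - p11)
    q2 = W * (1 - p00) + (1 - W) * p00
    return lam * F1 * q1 + (1 - lam) * F2 * q2

def cross_difference(func, a, k, b, m, h=1e-4):
    # Central finite-difference estimate of d^2 func / d a[k] d b[m].
    def shifted(step_k, step_m):
        a2, b2 = list(a), list(b)
        a2[k] += step_k
        b2[m] += step_m
        return func(a2, b2)
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)

k, m = 0, 1
sign = W - (1 - W)
dF_delta = phi(dot(X, DELTA)) * X[k]

numeric_g1 = cross_difference(lambda d, g: joint_density(d, g, GAMMA0), DELTA, k, GAMMA1, m)
analytic_g1 = F1 * sign * dF_delta * phi(dot(X, GAMMA1)) * X[m]

numeric_g0 = cross_difference(lambda d, g: joint_density(d, GAMMA1, g), DELTA, k, GAMMA0, m)
analytic_g0 = F2 * sign * dF_delta * phi(dot(X, GAMMA0)) * X[m]

print(numeric_g1, analytic_g1)
print(numeric_g0, analytic_g0)
```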

BIBLIOGRAPHY

Ashford, J.R. and Sowden, R.R. "Multivariate Probit Analysis."
Biometrics, 1970, Volume 26, 535-546.

Eaton, Jonathan and Gersovitz, Mark. "LDC Participation in
International Financial Markets: Debt and Reserves."
Journal of Development Economics, 1980, Volume 7,

-10

Fair, R.C. and Jaffee, D.M. "Methods of Estimation for Markets
in Disequilibrium." Econometrica, 1972, Volume
40, 497-514.

Gersovitz, Mark. "Classification Probabilities for the
Disequilibrium Model." Journal of Econometrics, 1980,
Volume 14, 239-246.

Goldfeld, Stephen and Quandt, Richard. "Estimation in a
Disequilibrium Model and the Value of Information."
Journal of Econometrics, 1975, Volume 3, 325-348.

Hamermesh, Daniel. "Wage Bargains, Threshold Effects, and
the Phillips Curve." Quarterly Journal of Economics,
1970, Volume 84, 501-517.

Hartley, Michael. "Comment on 'Estimating Mixtures of Normal
Distributions and Switching Regressions'." Journal of
the American Statistical Association, 1978, Volume 73,

Judge, George G., Griffiths, William E., Hill, R. Carter, and
Lee, Tsoung-Chao. The Theory and Practice of Econometrics.
1980, New York: John Wiley and Sons, Inc.

Kendall, Maurice and Stuart, Alan. The Advanced Theory of
Statistics, Volume 1. 1963, New York: Hafner.

Kiefer, Nicholas. "Discrete Parameter Variation: Efficient
Estimation of a Switching Regression Model." Econometrica,
1978, Volume 46, 427-434.

________. "On the Value of Sample Separation Information."
Econometrica, 1979, Volume 47, 997-1003.

________. "A Note on Regime Classification in Disequilibrium
Models." Review of Economic Studies, 1980, Volume 47,
637-639.

Laffont, Jean-Jacques and Garcia, Rene. "Disequilibrium
Econometrics for Business Loans." Econometrica, 1977,
Volume 45, 1187-1204.

Lee, Lung-Fei and Porter, Richard. "Switching Regression
Models with Imperfect Sample Separation Information --
with an Application on Cartel Stability." Econometrica,
1984, Volume 52, 391-418.

 

Quandt, Richard. "A New Approach to Estimating Switching
Regression Models." Journal of the American Statistical
Association, 19 , Volume , - .

________ and Ramsey, James. "Estimating Mixtures of Normal
Distributions and Switching Regressions." Journal of
the American Statistical Association, 1978, Volume 73,
 

Rosen, Harvey and Quandt, Richard. "Estimation of a Disequi-
librium Aggregate Labor Market." Review of Economic
Statistics, 1978, Volume 60, 371-379.

Schmidt, Peter. "Further Results on the Value of Sample
Separation Information." Econometrica, 1981, Volume 49,
1339-1343.

________. "An Improved Version of the Quandt-Ramsey MGF
Estimator for Mixtures of Normal Distributions and Switching
Regressions." Econometrica, 1982, Volume 50,
501-516.

Suits, Daniel. "An Econometric Model of the Watermelon Market."
Journal of Farm Economics, 1955, Volume 37, 237-251.