This is to certify that the dissertation entitled

THE VALUE OF IMPERFECT SAMPLE SEPARATION INFORMATION IN SWITCHING REGRESSION MODELS

presented by Edwina A. Masson has been accepted towards fulfillment of the requirements for the Ph.D. degree in Economics.

Major professor: Peter J. Schmidt
Date: July 1985

THE VALUE OF IMPERFECT SAMPLE SEPARATION INFORMATION IN SWITCHING REGRESSION MODELS

By Edwina A. Masson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Economics

1985

ABSTRACT

The purpose of this study is to determine the value, in terms of efficiency gains, of using imperfect sample separation information in switching regression models. The imperfect information appears in the model as a regime classification, which is correct only with some probability. The importance of this study lies in the fact that knowledge of improvements in the efficiency of parameter estimation can guide one in determining whether to use sample separation information, even if it is unreliable.

We determine the value of sample separation information by comparing the asymptotic variances of the parameter estimates under different assumptions about the available information. These assumptions range from perfect sample separation information, at the one extreme, to no such information whatever, at the other extreme.
The asymptotic variances of the parameter estimates are obtained from the relevant information matrices, which are calculated by simulation over a very large sample size.

Among our findings, the following are most important. (1) There are efficiency gains when using imperfect information as compared to no information at all, and these can be substantial in some cases. (2) Efficiency gains when using imperfect sample separation information are greatest when such information is highly reliable and when the samples are difficult to disentangle from each other. (3) There are additional efficiency gains when the switching probabilities are modelled as probit functions of the explanatory variables. These gains occur in cases when they are most needed; specifically, when the samples are hardly distinct from each other, and when the imperfect sample separation information is not very informative.

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my adviser, Professor Peter Schmidt, for all the guidance and encouragement he gave me throughout the course of this thesis. He was always available with helpful suggestions and was very patient with me, particularly when the thesis problem had not yet been explicitly defined.

I am also grateful to the other members of my dissertation committee: Professors Christine Amsler, T.C. Anant, and Stephen Martin.

Most of all, I want to thank my family, especially my parents and my husband, for their support and encouragement during my years of study at Michigan State.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER ONE  INTRODUCTION
1.1 Definition of the Problem
1.2 Formal Discussion of Switching Regression Models
1.3 Review of the Literature
1.4 Plan of the Study
CHAPTER TWO  THE CASE OF CONSTANT REGIME CLASSIFICATION PROBABILITIES
2.1 The Model
2.2 Derivation of Asymptotic Variances
2.3 The Value of Imperfect Information
2.4 Summary

CHAPTER THREE  THE CASE OF NON-CONSTANT REGIME CLASSIFICATION PROBABILITIES
3.1 Introduction
3.2 The Model
3.3 Derivation of Asymptotic Variances
3.4 The Value of Imperfect Information
3.5 Summary

CHAPTER FOUR  THE CASE OF NON-CONSTANT REGIME CLASSIFICATION PROBABILITIES AND NON-CONSTANT SWITCHING PROBABILITIES
4.1 Introduction
4.2 The Model
4.3 Derivation of Asymptotic Variances
4.4 The Value of Imperfect Information
4.5 Summary

CHAPTER FIVE  CONCLUSIONS

APPENDIX A. The Second Derivative Components of the Information Matrix in the Case of Non-Constant Classification Probabilities

APPENDIX B. The Second Derivative Components of the Information Matrix in the Case of Non-Constant Classification Probabilities and Non-Constant Switching Probabilities

BIBLIOGRAPHY

LIST OF TABLES

Tables on Ratios of Asymptotic Variances

1. Varying $p_{11}$ and $p_{00}$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$
2. Varying $\mu_2$ when $\mu_1 = 0$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$
3. Varying $\lambda$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $p_{11} = p_{00} = .8$
4. Varying $\sigma_2$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$
5. Varying $h$ ($\beta_2 = h\beta_1$) when $\beta_1 = (1, 1)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
6. Varying $\beta_{22}$ when $\beta = (0, 0, 0, \beta_{22})'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
7. Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
8. Varying $\gamma_0$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma_1 = (1, -1)'$
9. Varying $\gamma_{12}$ and $\gamma_{02}$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, \gamma_{12}, -1, \gamma_{02})'$
10. Varying $\gamma$ ($\gamma_1 = \gamma_0$) when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$
11. Comparison of $F(x'\gamma_1) = F(x'\gamma_0) = .8$ and $p_{11} = p_{00} = .8$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$
12. Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1 = \sigma_2 = 1$, $Q = (0, 0)'$, $\gamma = (1, -1, 1, 1)'$
13. Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1 = \sigma_2 = 1$, $Q = (1, -1)'$, $\gamma = (1, -1, 1, 1)'$
14. Varying $\gamma$ ($\gamma_1 \neq \gamma_0$) when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $Q = (0, 0)'$
15. Varying $\gamma$ ($\gamma_1 \neq \gamma_0$) when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $Q = (1, -1)'$
16. Varying $Q$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1 = \sigma_2 = 1$, $\gamma = (1, -1, 1, 1)'$

CHAPTER ONE

INTRODUCTION

1.1 Definition of the Problem

Switching regression models, normal mixture models, and disequilibrium models are systems characterized by discontinuous shifts in regression regimes at unknown points in the data series. The most common formulation hypothesizes that the system may switch numerous times back and forth between two particular regimes, or to successive new regimes. For the sake of simplicity, we shall restrict our discussion to the case in which it is known a priori that the number of regimes is two. These models are primarily designed to deal with samples in which sample separation information is missing. That is, we do not know whether an observed random variable is generated by one regime (which corresponds to a distinct regression model) or by another regime (which corresponds to another regression model).
An interesting issue here is the loss, measured in terms of the efficiency of parameter estimation, when sample separation (alternatively, regime classification) is unknown or is not observed. A number of papers (Goldfeld and Quandt, 1975; Kiefer, 1978; Schmidt, 1981) have addressed this question in the context of disequilibrium models and normal mixture models. All these studies found that sample separation information does have a positive value, in that the estimates derived are more efficient when there is a priori knowledge as to which regime each observation belongs to. This confirms the need to obtain reliable information about sample separation, when it is available.

The purpose of this paper is to extend the issue one more step. Sample separation information may exist, but may not be entirely reliable. Such a situation may conceivably arise in models with outliers, or when the available data is simply not entirely accurate. By how much is efficiency improved when imperfect regime classification information is used? This paper attempts to answer that question, and is therefore an extension of Schmidt's paper, with the additional use of imperfect sample separation information. We will address the issue strictly in the context of switching regression models.

The importance of this extension lies in the fact that knowledge of improvements in the efficiency of parameter estimation can guide one in deciding whether to use sample separation information, even if it is known that such information is imperfect or unreliable. In addition, even if imperfect information is not readily available, knowledge of efficiency gains will aid in determining whether such additional information is worth obtaining at all.

Before we proceed any further, a formal discussion of switching regression models is warranted.
1.2 Formal Discussion of Switching Regression Models

The simplest possible formulation is a normal mixture model (actually, a switching regression model with only a constant term), where a sample of observations $y_1, y_2, \ldots, y_n$ is given on a random variable $y$. It is known that nature chooses between regimes with probabilities $\lambda$ and $1 - \lambda$. That is,

$$
\begin{aligned}
y &\sim N(\mu_1, \sigma_1^2) \quad \text{with probability } \lambda \quad \text{(regime 1)} \\
y &\sim N(\mu_2, \sigma_2^2) \quad \text{with probability } (1 - \lambda) \quad \text{(regime 2)}
\end{aligned}
\tag{1.1}
$$

where the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$ are unknown.

A more complicated case arises in the switching regression model, in which observations are given on a random variable $y$ and on a vector of nonstochastic regressors $x$. Nature is assumed to generate each $y_j$ from $x_j$ by regime 1 with probability $\lambda$, and by regime 2 with probability $(1 - \lambda)$. Therefore, we have:

$$
\begin{aligned}
y_j &= x_{1j}'\beta_1 + u_{1j} \quad \text{with probability } \lambda \quad \text{(regime 1)} \\
y_j &= x_{2j}'\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda) \quad \text{(regime 2)}
\end{aligned}
\tag{1.2}
$$

where $u_{1j} \sim N(0, \sigma_1^2)$, $u_{2j} \sim N(0, \sigma_2^2)$, and the parameters $\beta_1$, $\beta_2$, $\sigma_1^2$, $\sigma_2^2$, and $\lambda$ are unknown.

There are also so-called disequilibrium models (which we will not discuss in this paper), in the context of demand and supply equations. Such models are characterized by a minimum condition, as in $q_j = \min(D_j, S_j)$ for an ordinary demand-supply model, where the observed quantity $q_j$ is the smaller of demand and supply. They are similar to switching regression models, since observations can come from two regimes (supply or demand equations), but the probability of an observation coming from a given regime varies over observations.

In an economic context, applications of such models are plentiful. Hamermesh (1970) used a switching regression model to examine the determination of wage bargains from observations on wage changes, changes in the consumer price index and unemployment.
The dependent variable is the wage change, $w$, and he hypothesized that the effect of cost of living changes, $c$, on wage changes is significantly positive only when cost of living changes exceed some critical figure, which has been selected a priori. There are two wage bargain equations, one for $c$ below this predetermined critical figure and one for $c$ greater than or equal to it. This is a case where regime classification is known.

Quandt and Ramsey (1978) re-estimated Hamermesh's model where there is no prior information as to the critical value of $c$ below and above which different regression regimes are at work. They assumed that nature chooses between the two regressions for any observation by comparing $c$ to a critical value (known only to nature). If this critical value is $\bar{c}$, and the fraction of observations with $c \le \bar{c}$ is equal to $\lambda$, then nature chooses one regime with probability $\lambda$, and the other regime, where $c > \bar{c}$, with probability $(1 - \lambda)$. This is a case of no sample separation information, and the regimes are unknown.

Lee and Porter (1984) used switching regression techniques to model a supply function for a railroad cartel. This supply function identifies periods in which firms are behaving non-cooperatively as opposed to cooperatively, i.e. whether price wars were occurring or not. The dependent variable is the market price for grain, so that price wars within the cartel shift the supply curve to signal reversions from collusive (higher prices) to non-collusive (lower prices) behavior. They assumed that sample separation information was available, though not perfectly reliable.

Examples of disequilibrium models can be found in the watermelon market (Suits, 1955); the market for housing starts (Fair and Jaffee, 1972); the market for chartered banks' loans to business firms (Laffont and Garcia, 1977); the U.S.
labor market (Rosen and Quandt, 1978); and credit rationing in international lending (Eaton and Gersovitz, 1980).

If information on sample separation is known for switching regression models, then estimation of the parameters in the respective regimes is straightforward and is done by least squares. If information on sample separation is unknown, then we are confronted with the problem of regime classification, and estimation of the parameters is done by either maximum likelihood, the method of moments, the moment generating function, or the modified moment generating function. The choice of the appropriate estimation technique, however, does not concern us here, and so we will only provide a brief overview of the issues involved. A more detailed discussion of the issues may be obtained from the references cited.

We shall restrict ourselves to the basic normal mixture case of equation (1.1), since the extension to equation (1.2) is fairly straightforward. It should, first of all, be noted that parameters of finite mixtures of normal densities are identified, and that there exists no sufficient statistic for the parameters of a normal mixture (Quandt and Ramsey, 1978).

Under the assumptions of (1.1), the probability density function for $y_j$ ($j = 1, \ldots, n$ observations) is:

$$
\begin{aligned}
f_j &= f(y_j; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda) \\
&= \lambda f_1(y_j) + (1 - \lambda) f_2(y_j) \\
&= \frac{\lambda}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(y_j - \mu_1)^2}{2\sigma_1^2}\right] + \frac{(1 - \lambda)}{\sqrt{2\pi}\,\sigma_2} \exp\left[-\frac{(y_j - \mu_2)^2}{2\sigma_2^2}\right]
\end{aligned}
\tag{1.3}
$$

$f_1(y_j)$ and $f_2(y_j)$ are the normal probability density functions for observations from regime 1 and regime 2, respectively. The likelihood function for the unknown parameters is:

$$
L = \prod_{j=1}^{n} f_j
$$

The natural procedure for estimating the parameters using maximum likelihood is to maximize the likelihood function with respect to the parameters. This, however, runs into difficulties, since as either $\sigma_1$ or $\sigma_2$ goes to zero with the corresponding mean set equal to one of the observations, $f_j$ for that observation increases without bound.
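The behavior just described can be checked numerically. The following is a minimal sketch, not part of the original study: the function names, the evenly spaced illustrative sample, and the chosen values of $\sigma_1$ are all assumptions made for the illustration. It evaluates the log of the likelihood built from the density in (1.3) while pinning $\mu_1$ on the first observation and shrinking $\sigma_1$.

```python
import numpy as np

def mixture_pdf(y, mu1, mu2, sigma1, sigma2, lam):
    """Two-regime normal mixture density of equation (1.3)."""
    f1 = np.exp(-(y - mu1) ** 2 / (2 * sigma1 ** 2)) / (np.sqrt(2 * np.pi) * sigma1)
    f2 = np.exp(-(y - mu2) ** 2 / (2 * sigma2 ** 2)) / (np.sqrt(2 * np.pi) * sigma2)
    return lam * f1 + (1 - lam) * f2

def log_likelihood(y, mu1, mu2, sigma1, sigma2, lam):
    """Log of the likelihood L, i.e. the sum of ln f_j over the sample."""
    return np.sum(np.log(mixture_pdf(y, mu1, mu2, sigma1, sigma2, lam)))

# Hypothetical fixed sample (an evenly spaced grid, chosen only so that
# the example is deterministic; any sample shows the same effect).
y = np.linspace(-2.0, 4.0, 61)

# Set mu1 equal to the first observation and let sigma1 shrink toward zero.
sigmas = (1e-2, 1e-4, 1e-6)
loglik_path = [log_likelihood(y, y[0], 2.0, s, 1.0, 0.5) for s in sigmas]

for s, ll in zip(sigmas, loglik_path):
    print(f"sigma1 = {s:g}  log-likelihood = {ll:.3f}")
```

In this sketch the contribution of the first observation grows like $-\ln \sigma_1$, while the other observations are still covered by the second regime, so the total keeps rising as $\sigma_1$ shrinks.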
It follows that the likelihood function $L$ is unbounded, and the unboundedness of the likelihood function means that any attempt to find a global maximum will produce inconsistent estimates. To avoid this, it is possible to specify a priori knowledge of the ratio of the variances $\sigma_1^2$, $\sigma_2^2$ and to set $\sigma_1^2 = h\sigma_2^2$, or alternatively, to specify that $\sigma_2^2 \ge h\sigma_1^2$, where $h$ is known (Goldfeld and Quandt, 1975; Kiefer, 1978). Another problem with maximum likelihood estimation is the potential singularity of the matrix of second partials of the log likelihood function, which is equivalent to a vanishing Jacobian for the set of normal equations derived from the maximum likelihood approach (Quandt and Ramsey, 1978; Hartley, 1978).

Kiefer (1978) argues that although the likelihood function is known to be unbounded at some points on the edge of the parameter space, the likelihood equations have a root which is consistent and asymptotically normally distributed. Therefore, computation of the maximum likelihood estimates should attempt to find a local maximum in the interior of the parameter space of the likelihood function. However, the attainment of such a maximum may be difficult in practice, so that alternative estimators may need to be considered.

Quandt and Ramsey (1978) propose using either the method of moments or the method of the sample moment generating function (MGF). Under the method of moments, the sample mean is equated to the theoretical first moment of equation (1.3), and the second, third, fourth and fifth sample moments about the mean to the corresponding theoretical central moments (if there are five parameters). From this, we obtain five equations from which it is possible to solve for consistent estimates of the five parameters. However, if there are $K$ (where $K > 1$) independent variables in the switching regression model (in the normal mixture model, $K = 1$), then the number of parameters is $2K + 3$.
It follows that moments of order even higher than five need to be employed, and the results are likely to be fairly unstable. While no estimates of the sampling variances are provided by this technique, it is well-known that, as a general rule, the sample variances of higher-order moments are quite large (Kendall and Stuart, 1963). For these reasons, the MGF technique is preferred over the method of moments as an estimating procedure.

The MGF method solves for the values of the parameters by minimizing a sum of squared differences between the empirical and theoretical values of the moment generating function. Define the following expression:

$$
S_n(\alpha, \theta) = \sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left( z_n(\alpha_t) - G(\theta, \alpha_t) \right)^2
\tag{1.4}
$$

where:

$$
\begin{aligned}
\varepsilon &= (\varepsilon_1, \ldots, \varepsilon_T)' \\
\varepsilon_t &= \frac{1}{n} \sum_{j=1}^{n} \varepsilon_{jt} \\
z_n(\alpha_t) &= \frac{1}{n} \sum_{j=1}^{n} \exp(\alpha_t y_j) \\
G(\theta, \alpha_t) &= \lambda \exp\left[\mu_1 \alpha_t + \frac{\alpha_t^2 \sigma_1^2}{2}\right] + (1 - \lambda) \exp\left[\mu_2 \alpha_t + \frac{\alpha_t^2 \sigma_2^2}{2}\right] \\
t &= 1, \ldots, T; \quad j = 1, \ldots, n
\end{aligned}
$$

$T$ different values of $\alpha$ are picked (where $T \ge$ the number of parameters, i.e. 5 in this case) and $S_n(\alpha, \theta)$ is minimized, that is, the sum of squared differences between the $T$ estimated MGF values and their theoretical counterparts. The $\alpha_t$ ($t = 1, \ldots, T$) are chosen so as to ensure that the corresponding normal equations derived from the minimization of $S_n(\alpha, \theta)$ [...]

[...] $(1 - D_j) f_2(y_j)$

where $\theta$ is the vector of parameters. The precision of a maximum likelihood estimate based on the joint and marginal density is defined, respectively, as (the subscript $j$ was dropped for simplicity):

$$
-E\left[\frac{\partial^2 \ln f(y, D)}{\partial\theta\,\partial\theta'}\right] \quad \text{and} \quad -E\left[\frac{\partial^2 \ln f(y)}{\partial\theta\,\partial\theta'}\right]
$$

By definition, $\ln f(y, D) = \ln f(y) + \ln f(D \mid y)$, so that it follows that:

$$
-E\left[\frac{\partial^2 \ln f(y, D)}{\partial\theta\,\partial\theta'}\right] = -E\left[\frac{\partial^2 \ln f(y)}{\partial\theta\,\partial\theta'}\right] - E\left[\frac{\partial^2 \ln f(D \mid y)}{\partial\theta\,\partial\theta'}\right]
$$

The precision of the maximum likelihood estimator based on the joint density is equal to the precision of the maximum likelihood estimator based on the marginal density (here, $f(y)$ corresponds to the formulation in equation (1.3)), plus a positive definite matrix. It follows that the precision of the estimates based on the former is always greater than that for the latter.
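One way to see why the additional term is non-negative is the information matrix equality applied to the conditional density of $D$ given $y$. The short derivation below is a sketch under the usual regularity conditions (in particular, that differentiation with respect to $\theta$ can be interchanged with summation over $D$); these conditions are an assumption here and are not spelled out in the text above.

```latex
% Conditional information matrix equality for f(D | y):
\[
-E\left[\frac{\partial^2 \ln f(D \mid y)}{\partial\theta\,\partial\theta'}\right]
= E\left[\frac{\partial \ln f(D \mid y)}{\partial\theta}\,
         \frac{\partial \ln f(D \mid y)}{\partial\theta'}\right]
\]
% The right-hand side is the expectation of an outer product s s',
% with s the conditional score, so for any fixed vector a:
\[
a' \, E[s\,s'] \, a = E\left[(a's)^2\right] \ge 0 .
\]
```

The extra term is therefore at least positive semidefinite, and it is positive definite whenever no non-trivial linear combination of the conditional score components is identically zero.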
Estimates are naturally more precise when there is more information.

To confirm this relationship, Monte Carlo experiments were conducted on a normal mixture model where the only parameters being estimated are the means. Precision ratios were then taken for the full information and limited information models and converted into asymptotic variance ratios (to facilitate a comparison with the Goldfeld and Quandt results) by inversion of the information matrix. Note that the precisions of the estimates are derived from the information matrix, and that the inverse of the information matrix is a consistent estimate of the asymptotic variance-covariance matrix of the parameter estimates.

Two types of experiments were conducted: first, when only one mean had to be estimated, and second, when two mean values had to be estimated. Given fixed variances and the mixing parameter, the values of the means were allowed to vary. Over a series of cases, the asymptotic variance ratios for regime known relative to regime unknown were computed, and were found to be all less than 1.0, consistent with the results of Goldfeld and Quandt. The efficiency loss from using the marginal rather than the joint density could be considerable.

When the means of the samples are close together, the ratios tend to be small, so the effects of implicit misclassification are serious and estimates suffer. As the means become farther apart, the probability of misclassification becomes so small that estimates become almost as efficient (the ratios approach a value of 1.0) as estimates based on known sample separation.

These numbers are generally a little higher than those obtained by Goldfeld and Quandt, indicating that the value of information in more complicated models (i.e. disequilibrium models) is greater than that in simpler models, as seems plausible (although this must be qualified, since the Goldfeld and Quandt results are for small samples).
At any rate, these results supplement the Monte Carlo evidence of the earlier study by showing that efficiency losses from not observing sample separation, found in small samples by Goldfeld and Quandt, persist and can be substantial asymptotically.

In his work, though, Kiefer assumed that the variances and the mixing parameter are known and that only the means in the normal mixture model have to be estimated. Schmidt (1981) extended Kiefer's results by also working on a normal mixture model, and he derived asymptotic variance ratios (again, from the inverse of the information matrix), this time assuming that all parameters have to be estimated. The rationale behind this is that Kiefer's results understate the true value of sample separation information, for the following reason. In the unknown regime case, the information matrix is not diagonal and estimates of the means are improved by knowledge of the variances and the mixing parameter, so that sample separation information is less valuable when some of the parameters are known than when all the parameters have to be estimated.

A series of experiments were conducted, each done with 100,000 replications. The values of the parameters were varied in each experiment, and asymptotic variance ratios of regime unknown relative to regime known were derived. All the ratios are greater than 1.0, so the importance of having sample separation information is again verified.
Among the conclusions in this study are the following: (1) the value of sample separation information depends strongly on the natural separation of the two samples, so that as the two distributions become far apart, the value of sample separation information goes to zero (ratios go to 1.0); (2) the value of sample separation information is higher for the parameters of the regime which is sampled with the lower probability; and (3) the value of sample separation information is higher when all the parameters have to be estimated, which is why the results here show a larger value of information than in Kiefer's study, where only the means had to be estimated.

Lee and Porter (1984) also tried to evaluate the importance of sample separation information in a switching regression model. Their econometric model is different from the usual switching models in the literature in that there is additional imperfect sample separation information available, and this is used as the regime indicator.

Lee and Porter worked on a two-equation model with an application to cartel stability, using a sample size of 328. The model is composed of demand and supply functions for a railroad cartel, where an attempt is made to identify periods in which firms are behaving collusively, as opposed to non-cooperatively. These different behavioral rules are reflected by differing supply functions, where the supply curve can be drawn from one of two possible regimes. The cartel arrangements take the form of market share allotments. Firms then set their rates individually, and the actual market share of any particular firm would depend both on the prices charged by all firms and on unpredictable stochastic forces. But the index of listed prices (which is the price variable in the model) is imperfect, so that member firms could not know with certainty whether secret price cutting was occurring.
It is in this context that an imperfect indicator is needed to determine whether the observed price wars represent a switch from collusive to non-cooperative behavior.

Their model consists of two equations:

$$
\begin{aligned}
P_j &= f(I_j, \text{predetermined variables}) \\
Q_j &= g(P_j, \text{predetermined variables})
\end{aligned}
$$

where $P_j$, $Q_j$, and $I_j$ are, respectively, the price of grain; the total quantity of grain shipped; and a latent dichotomous variable which equals 1 when the industry is in a cooperative regime, and equals 0 otherwise. With no reliable information on $I_j$, it is measured, possibly with error, by $W_j$, a regime classification indicator. $W_j = 1$ when a trade magazine reports collusion, and $W_j = 0$ when this same trade magazine reports that a price war is occurring. This data series may not be accurate at all, but in the absence of any other information, this extra information may still help to reduce the estimated standard errors. After all, a little information (even if not entirely accurate) may be better than not having any information at all to guide in determining regime classification.

Their model was estimated twice: first, using the partial information provided by the $W_j$, and second, using no information on $W_j$. The estimated standard errors are smaller for the former compared to the latter. However, for this particular data set, the gains in asymptotic efficiency from using the imperfect indicator are small, due to the clear separation of the two underlying distributions. This is evident from the fact that the two distributions of $\ln P_j$ are far apart, since the difference of the means is 0.48 and the variance is only 0.01. This result complements the Monte Carlo simulation results of Kiefer and Schmidt that the value of any information on regime classification becomes smaller (ratios of asymptotic variances approach 1.0) as the distributions become clearly distinct.
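The degree of separation implied by those two figures can be put in rough numerical terms. The sketch below is only a back-of-the-envelope illustration: it assumes that the reported variance of 0.01 is common to both regimes, that the two regimes are equally likely, and that observations are classified by a cutoff at the midpoint between the means. None of these assumptions, nor the variable names, come from Lee and Porter.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean_gap = 0.48   # difference of the means of ln P_j, as quoted in the text
variance = 0.01   # variance of ln P_j, as quoted in the text

# Assumption: the same variance applies within each regime.
std_dev = math.sqrt(variance)
gap_in_sd = mean_gap / std_dev

# Assumption: equal regime probabilities and a midpoint cutoff, so an
# observation is misclassified when it falls more than half the gap
# away from its own mean, on the side of the other regime.
misclass_prob = normal_cdf(-gap_in_sd / 2.0)

print(f"means are {gap_in_sd:.1f} standard deviations apart")
print(f"implied misclassification probability: {misclass_prob:.4f}")
```

Under these assumptions the means are nearly five standard deviations apart and fewer than one observation in a hundred would be misclassified from the price alone, which is in line with the small gains from the imperfect indicator reported for this data set.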
1.4 Plan of the Study

Our objective in this paper is to determine the value, in terms of efficiency gains, of using imperfect sample separation information, given different assumptions about the parameters and different specifications of switching regression models.

We will integrate into our study the framework of Lee and Porter regarding the use of imperfect sample separation information in switching regression models. We will also use the approach of Schmidt (1981), where all the parameters in the model have to be estimated, so as not to understate the true value of imperfect sample separation information. Similar to Kiefer's and Schmidt's procedures, we will conduct several experiments over a number of scenarios with different parameter values, each time deriving ratios of asymptotic variances, where these variances can be obtained from the corresponding information matrices. Asymptotic variance ratios will be derived twice for each experiment: the first showing the loss in efficiency when we have no sample separation information at all relative to full information, and the second showing the loss in efficiency when we have partial sample separation information (provided by an imperfect or unreliable indicator) relative to full information. A comparison of both results will show the extent of the advantages of using information even if it is inaccurate, as compared to using no regime classification information at all.

In the Lee and Porter paper, the imperfect information indicator $W_j$ was incorporated into the switching regression model through the use of classification probabilities, that is, the probabilities that the regime classification is right or wrong, given the true regime that the observation really belongs to. In their model, these classification probabilities were assumed to be constant for all observations.
In Chapter 2, we will deal with the simplest formulation of a switching regression model, that of the normal mixture model. We will adopt Lee and Porter's approach of using constant probabilities of correct regime classification by our imperfect sample separation information.

In Chapter 3, we extend the previous chapter to the case where we have two explanatory variables in our switching regression model. In addition, we consider the case when the probabilities of regime classification are non-constant and, in fact, can be modelled as probit functions of the exogenous variables.

In Chapter 4, we keep the assumptions of the previous chapter, but we also postulate that the mixing parameter is non-constant, so that we have varying switching probabilities. The mixing parameter will also be modelled as a probit function of the explanatory variables.

Chapter 5 summarizes the findings of the preceding three chapters and presents the conclusions we have derived based on the series of experiments conducted.

CHAPTER TWO

THE CASE OF CONSTANT REGIME CLASSIFICATION PROBABILITIES

2.1 The Model

The first specification which we consider is the simple normal mixture model, in which a random variable $y_j$ is drawn from $N(\mu_1, \sigma_1^2)$ with probability $\lambda$, and from $N(\mu_2, \sigma_2^2)$ with probability $(1 - \lambda)$. It can also be expressed as a switching regression model, where the only explanatory variable corresponds to the constant term. Therefore, we have the following:

$$
\begin{aligned}
y_j &= x_{1j}\beta_1 + u_{1j} \quad \text{with probability } \lambda \quad \text{(regime 1)} \\
y_j &= x_{2j}\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda) \quad \text{(regime 2)}
\end{aligned}
\tag{2.1}
$$

For the normal mixture case, $x_{1j} = x_{2j} = 1$, and $\beta_1 = \mu_1$ and $\beta_2 = \mu_2$ are scalars. We assume that $u_{1j}$ and $u_{2j}$ are independently distributed, where $u_{1j} \sim N(0, \sigma_1^2)$ and $u_{2j} \sim N(0, \sigma_2^2)$. The vector of parameters $\theta' = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda)$ needs to be estimated from a sample of observations on $y_j$. There are $n$ observations, with $n_i$ from regime $i$ ($i = 1, 2$; $j = 1, \ldots, n$).
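The data-generating process in (2.1) can be sketched in a few lines. This is an illustration only: the function name `draw_mixture`, the seed, and the sample size are hypothetical, and the parameter values ($\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$) are borrowed from the settings named in the List of Tables rather than from this section.

```python
import numpy as np

def draw_mixture(n, mu1, mu2, sigma1, sigma2, lam, rng):
    """Draw n observations from the two-regime model (2.1).

    Returns y together with the true regime of each draw
    (1 for regime 1, 0 for regime 2), which is normally unobserved.
    """
    in_regime1 = rng.random(n) < lam          # regime 1 with probability lam
    y = np.where(in_regime1,
                 rng.normal(mu1, sigma1, n),  # y_j = mu1 + u_1j
                 rng.normal(mu2, sigma2, n))  # y_j = mu2 + u_2j
    return y, in_regime1.astype(int)

rng = np.random.default_rng(1985)             # arbitrary seed
y, regime = draw_mixture(100_000, 0.0, 2.0, 1.0, 1.0, 0.5, rng)

n1 = regime.sum()                             # observations from regime 1
n2 = regime.size - n1                         # observations from regime 2
print(f"n1 / n = {n1 / regime.size:.3f}")
print(f"regime 1 sample mean = {y[regime == 1].mean():.3f}")
print(f"regime 2 sample mean = {y[regime == 0].mean():.3f}")
```

With a large sample, the share $n_1 / n$ settles near $\lambda$ and the regime means near $\mu_1$ and $\mu_2$; these quantities are only recoverable directly here because the sketch keeps the true regime labels.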
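Suppose that there is an observed dichotomous indicator $w_j$ for each $j$, which provides sample separation information. In addition, for each observation $j$, we define a latent dichotomous variable $I_j$ where:

$$
\begin{aligned}
I_j &= 1 \quad \text{if } y_j \text{ is generated from regime 1} \\
I_j &= 0 \quad \text{otherwise}
\end{aligned}
$$

Therefore, $w_j$ is a measure of $I_j$, possibly with error. The relationship between $w_j$ and $I_j$ can be described by the transition probability matrix given by:

$$
\begin{array}{c|cc}
 & w_j = 1 & w_j = 0 \\
\hline
I_j = 1 & p_{11} & p_{10} \\
I_j = 0 & p_{01} & p_{00}
\end{array}
$$

That is,

$$
\begin{aligned}
p_{11} &= \text{Prob}(w_j = 1 \mid I_j = 1) \\
p_{01} &= \text{Prob}(w_j = 1 \mid I_j = 0) \\
p_{10} &= \text{Prob}(w_j = 0 \mid I_j = 1) \\
p_{00} &= \text{Prob}(w_j = 0 \mid I_j = 0)
\end{aligned}
$$

It follows that $p_{10} = 1 - p_{11}$ and $p_{00} = 1 - p_{01}$. Now, let $p = \text{Prob}(w_j = 1)$. Since $\lambda = \text{Prob}(I_j = 1)$ and $(1 - \lambda) = \text{Prob}(I_j = 0)$, then:

$$
p = \text{Prob}(I_j = 1)\,\text{Prob}(w_j = 1 \mid I_j = 1) + \text{Prob}(I_j = 0)\,\text{Prob}(w_j = 1 \mid I_j = 0) = \lambda p_{11} + (1 - \lambda) p_{01}
$$

The density function $f(y_j)$ for $y_j$ when we have a mixture of two normal distributions is given in equation (1.3) as:

$$
\begin{aligned}
f(y_j) &= \text{Prob}(I_j = 1) f_1(y_j) + \text{Prob}(I_j = 0) f_2(y_j) \\
&= \lambda f_1(y_j) + (1 - \lambda) f_2(y_j)
\end{aligned}
\tag{2.2}
$$

When imperfect sample separation information using the observed indicator, $w_j$, is incorporated into the model, then the joint density function for $y_j$ and $w_j$ is:

$$
\begin{aligned}
f(y_j, w_j) &= f_1(y_j)\,\text{Prob}(w_j, I_j = 1) + f_2(y_j)\,\text{Prob}(w_j, I_j = 0) \\
&= f_1(y_j)\left(w_j \lambda p_{11} + (1 - w_j)\lambda(1 - p_{11})\right) + f_2(y_j)\left(w_j (1 - \lambda) p_{01} + (1 - w_j)(1 - \lambda)(1 - p_{01})\right) \\
&= \lambda f_1(y_j)\left(w_j p_{11} + (1 - w_j)(1 - p_{11})\right) + (1 - \lambda) f_2(y_j)\left(w_j p_{01} + (1 - w_j)(1 - p_{01})\right) \\
&= \lambda f_1(y_j)\left(w_j p_{11} + (1 - w_j)(1 - p_{11})\right) + (1 - \lambda) f_2(y_j)\left(w_j (1 - p_{00}) + (1 - w_j) p_{00}\right)
\end{aligned}
\tag{2.3}
$$

where:

$$
f_i(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(y_j - \mu_i)^2}{2\sigma_i^2}\right], \quad i = 1, 2; \quad j = 1, \ldots, n
$$

The regime classification indicator $w_j$ contains some information on sample separation if $p_{11}$ is not equal to $p_{01}$. When $p_{11} = p_{01}$, or alternatively, when $p_{11} = 1 - p_{00}$, then $\text{Prob}(w_j \mid I_j) = \text{Prob}(w_j)$ and the joint density function is:

$$
f(y_j, w_j) = \left(\lambda f_1(y_j) + (1 - \lambda) f_2(y_j)\right) \times \left(w_j p + (1 - w_j)(1 - p)\right)
$$

so that the indicator $w_j$ does not contain any information on sample separation. This is equivalent to having no information at all, as in equation (2.2).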
This is, in fact, Schmidt's model and also Kiefer's marginal density function (limited information model). On the other hand, when $p_{11} = 1$ and $p_{01} = 0$ (alternatively, $p_{00} = 1$), the indicator $w_j$ provides perfect sample separation information and the joint density function is expressed as:

\[
f(y_j, w_j) = \lambda f_1(y_j) w_j + (1 - \lambda) f_2(y_j)(1 - w_j)
\]

This is equivalent to Kiefer's joint density or full information model, where our $w_j$ is his $D_j$, the indicator of perfect information on regime classification.

2.2 Derivation of Asymptotic Variances

We adopt the approach of Schmidt here. When the regime is known or when perfect sample separation information is available, the asymptotic variances of $\sqrt{n}(\hat\mu_1 - \mu_1)$, $\sqrt{n}(\hat\mu_2 - \mu_2)$, $\sqrt{n}(\hat\sigma_1^2 - \sigma_1^2)$ and $\sqrt{n}(\hat\sigma_2^2 - \sigma_2^2)$ are, respectively:

\[
\frac{\sigma_1^2}{\lambda}, \qquad \frac{\sigma_2^2}{1 - \lambda}, \qquad \frac{2\sigma_1^4}{\lambda}, \qquad \text{and} \qquad \frac{2\sigma_2^4}{1 - \lambda}.
\]

They are derived from the diagonal elements of the inverse of the information matrices from the corresponding likelihood functions of the respective known densities, $f_1(y_j)$ and $f_2(y_j)$. The terms $\lambda$ and $(1 - \lambda)$ appear in the above expressions since they adjust for the correct sample size in each regime ($n_1$ or $n_2$), relative to the total number of observations $n$. This follows from the implicit relationship that $n_1 = \lambda n$ and $n_2 = (1 - \lambda) n$. $\hat\lambda$ has a binomial distribution, so the asymptotic variance of $\sqrt{n}(\hat\lambda - \lambda)$ is $\lambda(1 - \lambda)$.

When the regime is either completely unknown or is partly known (due to the partial sample separation information available), the asymptotic variances are derived in the same manner. Therefore, the asymptotic variances of $\sqrt{n}(\hat\theta - \theta)$ come from the diagonal elements of the inverse of the corresponding information matrices. That is, $\sqrt{n}(\hat\theta - \theta)$ approaches the distribution specified by $N\bigl(0, \lim (\tfrac{1}{n}\mathcal{I})^{-1}\bigr)$. The Fisher information matrix is defined as:

\[
\mathcal{I} = -E\left[ \frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'} \right]
\qquad \text{where:} \qquad
\ln L = \sum_{j=1}^{n} \ln f_j
\]

When the regime is completely unknown, $f$ corresponds to the density function laid out in (2.2) as:

\[
f(y_j; \theta) = \lambda f_1(y_j) + (1 - \lambda) f_2(y_j)
\]

$\theta$ is a (5 x 1) vector of parameters, defined by $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda)'$. When the regime is partly known, $f$ corresponds to the density function in (2.3):

\[
f(y_j, w_j; \theta) = \lambda f_1(y_j)\bigl(w_j p_{11} + (1 - w_j)(1 - p_{11})\bigr) + (1 - \lambda) f_2(y_j)\bigl(w_j p_{01} + (1 - w_j)(1 - p_{01})\bigr)
\]

$\theta$ is a (7 x 1) vector of parameters, defined by $\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda, p_{11}, p_{01})'$.

The expected value of the expression that denotes the information matrix was intractable analytically, so that we calculate the information matrix instead by simulation techniques (1/) using the following expansion:

\[
\mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right]
 = -E \sum_j \left[ \frac{1}{f_j} \frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left(\frac{\partial f_j}{\partial\theta}\right)\left(\frac{\partial f_j}{\partial\theta'}\right) \right]
\tag{2.4}
\]

1/ From the definition of the information matrix, we know that $(\tfrac{1}{n}\mathcal{I})$ has a limit. We therefore simulate $\lim (\tfrac{1}{n}\mathcal{I})$ by calculating $(\tfrac{1}{n}\mathcal{I})$ for some finite though large $n$.

The model we have here when there is no sample separation information is actually Schmidt's model, so we do not need to simulate the information matrix corresponding to the density function of (2.2) since that was done in his work; we will just adopt his results. All we need to simulate is the information matrix when we have imperfect sample separation information. In order to facilitate a comparison between the results, we use Schmidt's approach in our experiments.

The information matrix was evaluated by a simulation of 100,000 trials derived from a normal or Gaussian random variable generator. For any set of values assigned to $\theta$, draws were made from the appropriate normal mixture distribution. The first and second derivatives were calculated in accordance with the expression in (2.4). The resulting 100,000 matrices were then averaged to obtain the information matrix, and the asymptotic variances are the corresponding diagonal elements of the inverse of the information matrix.
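The simulation just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: it averages the outer product of the scores of (2.3) (which has the same expectation as (2.4) but needs only first derivatives), uses 20,000 draws instead of 100,000, and relies on Python's standard generator. The function names `score`, `draw`, `invert` and `avg_information` are hypothetical, and `s1`, `s2` denote variances.

```python
import math
import random


def normal_pdf(y, mu, s2):
    return math.exp(-(y - mu) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)


def score(y, w, th):
    """Gradient of ln f(y, w; theta) for density (2.3).

    theta = (mu1, mu2, sigma1^2, sigma2^2, lambda, p11, p01).
    """
    mu1, mu2, s1, s2, lam, p11, p01 = th
    f1, f2 = normal_pdf(y, mu1, s1), normal_pdf(y, mu2, s2)
    q1 = w * p11 + (1 - w) * (1 - p11)
    q2 = w * p01 + (1 - w) * (1 - p01)
    f = lam * f1 * q1 + (1 - lam) * f2 * q2
    d = 2 * w - 1  # equals w - (1 - w)
    grad = [
        lam * q1 * f1 * (y - mu1) / s1,
        (1 - lam) * q2 * f2 * (y - mu2) / s2,
        lam * q1 * f1 * ((y - mu1) ** 2 / s1 - 1) / (2 * s1),
        (1 - lam) * q2 * f2 * ((y - mu2) ** 2 / s2 - 1) / (2 * s2),
        f1 * q1 - f2 * q2,
        lam * f1 * d,
        (1 - lam) * f2 * d,
    ]
    return [g / f for g in grad]


def draw(th, rng):
    """One (y, w) pair: pick the regime, then y, then the noisy indicator w."""
    mu1, mu2, s1, s2, lam, p11, p01 = th
    if rng.random() < lam:
        return rng.gauss(mu1, math.sqrt(s1)), (1 if rng.random() < p11 else 0)
    return rng.gauss(mu2, math.sqrt(s2)), (1 if rng.random() < p01 else 0)


def invert(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                fac = m[r][c]
                m[r] = [vr - fac * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def avg_information(th, n, seed=0):
    """Average of score outer products over n simulated draws, i.e. (1/n) * I."""
    rng = random.Random(seed)
    k = len(th)
    info = [[0.0] * k for _ in range(k)]
    for _ in range(n):
        s = score(*draw(th, rng), th)
        for i in range(k):
            for j in range(k):
                info[i][j] += s[i] * s[j] / n
    return info


theta = (0.0, 2.0, 1.0, 1.0, 0.5, 0.8, 0.2)
avar = invert(avg_information(theta, n=20000))
# Known-regime variances: sigma1^2/lam, sigma2^2/(1-lam), 2 sigma1^4/lam, 2 sigma2^4/(1-lam), lam(1-lam)
known = [1.0 / 0.5, 1.0 / 0.5, 2.0 / 0.5, 2.0 / 0.5, 0.25]
ratios = [avar[i][i] / known[i] for i in range(5)]
```

The list `ratios` holds the simulated variance ratios (regime partly known relative to regime known) for the first five parameters; the exact figures depend on the seed and on the number of draws.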
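When we have imperfect sample separation information, the expressions in (2.4) are laid out below, where $f$ comes from the density function defined by (2.3). The first derivatives of $f(y, w; \theta)$ with respect to $\theta$ are (we drop the subscript $j$ for simplicity):

\[
\begin{aligned}
\frac{\partial f}{\partial\mu_1} &= \lambda Q_1 \frac{\partial f_1}{\partial\mu_1} &
\frac{\partial f}{\partial\mu_2} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\mu_2} \\
\frac{\partial f}{\partial\sigma_1^2} &= \lambda Q_1 \frac{\partial f_1}{\partial\sigma_1^2} &
\frac{\partial f}{\partial\sigma_2^2} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial f}{\partial\lambda} &= f_1 Q_1 - f_2 Q_2 \\
\frac{\partial f}{\partial p_{11}} &= \lambda f_1 \bigl(w - (1 - w)\bigr) &
\frac{\partial f}{\partial p_{01}} &= (1 - \lambda) f_2 \bigl(w - (1 - w)\bigr)
\end{aligned}
\]

where:

\[
\begin{aligned}
Q_1 &= w p_{11} + (1 - w)(1 - p_{11}) \\
Q_2 &= w p_{01} + (1 - w)(1 - p_{01}) \\
\frac{\partial f_i}{\partial\mu_i} &= f_i \frac{(y - \mu_i)}{\sigma_i^2} \\
\frac{\partial f_i}{\partial\sigma_i^2} &= \frac{f_i}{2\sigma_i^2} \left[ -1 + \frac{(y - \mu_i)^2}{\sigma_i^2} \right]
\end{aligned}
\]

The non-zero second derivatives of $f(y, w; \theta)$ with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\mu_1^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial\mu_1^2} &
\frac{\partial^2 f}{\partial\mu_2^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial\mu_2^2} \\
\frac{\partial^2 f}{\partial\mu_1 \partial\sigma_1^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial\mu_1 \partial\sigma_1^2} &
\frac{\partial^2 f}{\partial\mu_2 \partial\sigma_2^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial\mu_2 \partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\mu_1 \partial\lambda} &= Q_1 \frac{\partial f_1}{\partial\mu_1} &
\frac{\partial^2 f}{\partial\mu_2 \partial\lambda} &= -Q_2 \frac{\partial f_2}{\partial\mu_2} \\
\frac{\partial^2 f}{\partial\mu_1 \partial p_{11}} &= \lambda \bigl(w - (1 - w)\bigr) \frac{\partial f_1}{\partial\mu_1} &
\frac{\partial^2 f}{\partial\mu_2 \partial p_{01}} &= (1 - \lambda) \bigl(w - (1 - w)\bigr) \frac{\partial f_2}{\partial\mu_2} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= \lambda Q_1 \frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} &
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1 - \lambda) Q_2 \frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2 \partial\lambda} &= Q_1 \frac{\partial f_1}{\partial\sigma_1^2} &
\frac{\partial^2 f}{\partial\sigma_2^2 \partial\lambda} &= -Q_2 \frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2 \partial p_{11}} &= \lambda \bigl(w - (1 - w)\bigr) \frac{\partial f_1}{\partial\sigma_1^2} &
\frac{\partial^2 f}{\partial\sigma_2^2 \partial p_{01}} &= (1 - \lambda) \bigl(w - (1 - w)\bigr) \frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\lambda\,\partial p_{11}} &= \bigl(w - (1 - w)\bigr) f_1 &
\frac{\partial^2 f}{\partial\lambda\,\partial p_{01}} &= -\bigl(w - (1 - w)\bigr) f_2
\end{aligned}
\]

where:

\[
\begin{aligned}
\frac{\partial^2 f_i}{\partial\mu_i^2} &= \frac{f_i}{\sigma_i^2} \left[ -1 + \frac{(y - \mu_i)^2}{\sigma_i^2} \right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \left[ \frac{(y - \mu_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2} \right] + f_i \left[ \frac{1}{2\sigma_i^4} - \frac{(y - \mu_i)^2}{\sigma_i^6} \right] \\
\frac{\partial^2 f_i}{\partial\mu_i \partial\sigma_i^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \frac{(y - \mu_i)}{\sigma_i^2} - f_i \frac{(y - \mu_i)}{\sigma_i^4}
\end{aligned}
\qquad i = 1, 2
\]

When there is no sample separation information, such that regime is unknown, $\theta$ is of dimension (5 x 1) or (2K + 3) x 1 where K is the total number of explanatory variables (i.e. K = 1 in this case). It follows that $\mathcal{I}$ is of dimension (2K + 3) x (2K + 3). When there is partial sample separation information, such that regime is partly known, $\theta$ is of dimension (7 x 1) or (4K + 3) x 1, where K is still equal to 1. Therefore, $\mathcal{I}$ is of dimension (4K + 3) x (4K + 3).

2.3 The Value of Imperfect Information

We derive here one set of ratios -- asymptotic variances with regime partly known relative to regime known. We also need the asymptotic variance ratios with regime unknown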
relative to regime known, but these will be simply adopted from Schmidt's study. The results are comparable, since the same simulation techniques have been employed in evaluating the information matrix.

All the results are presented in Tables 1 through 4, for a variety of cases. It is to be noted that asymptotic variance ratios are not given for the parameters $p_{11}$ and $p_{01}$, since these parameters are not estimated at all when the regime is known. All the figures in the tables are greater than or equal to one, and the extent to which they differ from one measures the value of imperfect sample separation information or just sample separation information as the case may be. That is, they measure how much we lose on efficiency grounds in parameter estimation, when we use imperfect information, or no information at all, as compared to perfect information when assigning regime classification.

The main interest here concerns the effects of the parameters $p_{11}$ and $p_{01}$, which represent the level of reliability or accuracy of the available information. In Schmidt's study, when there is no sample separation information at all, only the first five parameters were used. With the partial information provided by our additional parameters $p_{11}$ and $p_{01}$, we expect our asymptotic variance ratios to be less than or equal to his asymptotic variance ratios. After all, any piece of information on sample separability, even if not entirely accurate, may facilitate identification of regime membership for the observations, and thereby improve the efficiency of the parameter estimates, as compared to when no information is used at all. For purposes of comparison, we present Schmidt's figures in parentheses underneath the figures we derived.

We conduct four types of experiments here. First, for a given set of parameter values, we vary our probabilities of regime classification (2/).
The different values assigned to $p_{11}$ and $p_{01}$ represent the range from highly imperfect information to almost perfect information on regime classification. In the other three types of experiments, we choose a particular $p_{11}$ and $p_{01}$ mix, and allow the following to vary -- the difference between the means, the variances, and the mixing parameter.

2/ These regime classification probabilities are assumed constant for all observations in each case, and can be estimated (as Lee and Porter did) by maximum likelihood.

Table 1 presents the results when we couple different regime classification probabilities with fixed values of the other parameters -- $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, and $\lambda = .5$. The values assigned to the means and variances are not as restrictive as they might seem, in the sense that they are invariant to translation ($\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$ give the same results as $\mu_1 = -6$, $\mu_2 = -4$, $\sigma_1 = \sigma_2 = 1$) and to scale (rescaling the means and standard deviations by a common factor gives the same results).

Table 1. Ratios of Asymptotic Variances, Varying $p_{11}$ and $p_{01}$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$
[Rows: pairs ($p_{11}$, $p_{01}$), followed by a block of additional results. Columns: ratios of regime partly known (unknown) to regime known for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, $\lambda$. The numerical entries are illegible in the source scan.]
Note: n = 100,000. Figures in parentheses are the ratios of asymptotic variances when regime is unknown relative to when regime is known. This is true for the other tables in this chapter.

When $p_{11}$ is equal to $p_{01}$, then we have the special case of there being no sample separation information at all, and the ratios derived here should be the same as Schmidt's. The difference (e.g. 41.7 versus 41.1 for $\mu_1$) is presumably due to randomness in the simulation of the information matrix. When $p_{11}$ is equal to $p_{00}$ (where $p_{00} = 1 - p_{01}$), that is, when there are equal probabilities of correct classification into each regime, then the ratios diminish considerably, with the figures being lowest (efficiency is highest) when there is greater certainty about rightly or wrongly assigning the observation into each regime. The ratios approach one when $p_{11}$ goes to zero, and $p_{01}$ goes to one (alternatively, when $p_{11}$ goes to one, and $p_{01}$ goes to zero); that is, when there is almost perfect sample separation information. In this sense, the use of imperfect sample separation information leads to estimates which are almost as efficient as those derived when regime classification is completely known.

An interesting observation here is that the value of information is unchanged when $p_{11}$ and $p_{01}$ are symmetric (i.e. $p_{11} = .2$, $p_{01} = .8$ give the same results as $p_{11} = .8$, $p_{01} = .2$; alternatively, $p_{11} = p_{00} = .2$ give the same results as $p_{11} = p_{00} = .8$). This is a consequence of the identification issue referred to in Lee and Porter, such that when $x_{1j} = x_{2j}$ for all $j$, as they are here (they are both equal to one), then the names of the two regimes can simply be interchanged, and this holds true when there is no sample separation information and even when there is imperfect sample separation information. This does not really come as a surprise since in the normal mixture model, the only parameters being estimated in a regression sense are the means; therefore, it makes no difference at all about having the same probabilities for right or wrong regime classification, since we can merely switch the names of the regimes.

Additional results show different $p_{11}$ and $p_{01}$ values paired together.
When $p_{11}$ and $p_{01}$ are close together but are in the intermediate range (.4 to .6), then the ratios are highest. This occurs when uncertainty about regime classification is at its peak, since the imperfect information indicates that there are almost equal chances of misclassification into both regimes ($p_{11}$ and $p_{01}$ are close to .5). Note that at the extreme, when $p_{11} = p_{01} = .5$, we have no information at all. When $p_{11}$ and $p_{01}$ are close together but are out of the intermediate range, then the ratios go down. This means that when there is greater certainty of correct regime classification into the two regimes, or when the partial sample separation information is quite reliable for both regimes, then the ratios decline and efficiency improves.

Tables 2, 3 and 4 illustrate the case of a particular $p_{11}$, $p_{01}$ mix -- we choose $p_{11} = .8$, $p_{01} = .2$. In Table 2, $\mu_2$ is allowed to vary. The results are similar to Schmidt's findings that the value of sample separation information depends on the natural separation of the two regimes. As the distributions become far apart ($\mu_2$ increases while $\mu_1$ is constant), the ratios diminish and tend to approach one. When the means are very close together, the resulting ratios show the substantial gains in efficiency when information is quite accurate as compared to using no information at all.

Table 2. Ratios of Asymptotic Variances, Varying $\mu_2$ when $\mu_1 = 0$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$
[Rows: values of $\mu_2$. Columns: ratios of regime partly known (unknown) to regime known for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, $\lambda$. The numerical entries are illegible in the source scan.]

Table 3. Ratios of Asymptotic Variances, Varying $\lambda$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = \sigma_2 = 1$, $p_{11} = p_{00} = .8$
[Rows: values of $\lambda$. Columns as in Table 2. The numerical entries are illegible in the source scan.]

Table 4. Ratios of Asymptotic Variances, Varying $\sigma_2$ when $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_1 = 1$, $\lambda = .5$, $p_{11} = p_{00} = .8$
[Rows: values of $\sigma_2$. Columns as in Table 2. The numerical entries are illegible in the source scan.]

Table 3 takes the cases where $\lambda = .2$ and $\lambda = .5$. The results are again similar to the earlier findings that when the distributions are fairly close to each other ($\mu_1 = 0$, $\mu_2 = 2$) and the imperfect information is quite reliable ($p_{11} = p_{00} = .8$), then there are large efficiency gains in using partial information relative to using no information at all. In addition, we observe that the value of sample separation information is higher for the parameters of the regime which is observed with the lower probability, in this case, regime 1.

Table 4 gives results when the variances are not equal. The larger the difference between the variances of the two samples, the lower the ratios become. It is apparent that not only does mean disparity between the regimes contribute to distinct sample separation, but also disparity of the variances. Another observation here is that the ratios are higher for the variance parameter of the sample which has the smaller variance and the reason behind this is fairly intuitive. A surprising finding here though, is that the decline in the ratios as the difference between the variances widens is not monotonic for the mixing parameter when partial information is available, and the reason for this is not clear.

2.4 Summary

We have studied the value of imperfect sample separation information in a simple normal mixture model, where all the parameters have to be estimated.
This was done under different values for the probabilities of correct regime classification. The ratios of asymptotic variances for regime partly known relative to the asymptotic variances for regime known were computed. These ratios are highest when there is greater uncertainty about regime classification ($p_{11}$ and $p_{00}$ are in the intermediate range) and the ratios are lowest when there is almost perfect certainty about right or wrong classification for both regimes ($p_{11}$ and $p_{00}$ are in the extreme range). In between is a continuum of values depending on the reliability of the sample separation information for each regime.

A variety of experiments were also conducted and these show that the value of sample separation information largely depends on how much alike the two samples are. When the samples are hard to distinguish from one another, then the value of information is highest. At any rate, the presence of the partial sample separation information tends to diminish the value of any other additional information, since the figures derived are considerably lower than those when there is no sample separation information at all.

These results suggest that any information should be used, even if there is uncertainty about its reliability or accuracy, since even imperfect sample separation information improves the efficiency of the estimates. Of course, the more reliable the imperfect sample separation information, the greater the gains in efficiency.

CHAPTER THREE

THE CASE OF NON-CONSTANT REGIME CLASSIFICATION PROBABILITIES

3.1 Introduction

In the previous chapter, we considered and evaluated the value of imperfect sample separation information in a normal mixture model, where the imperfect information is reflected through constant probabilities of regime classification. We concluded that the more reliable the imperfect information, the greater the gains in efficiency, since there is greater certainty of right or wrong regime classification.
We now extend that model to a switching regression case where there are at least two independent variables -- a constant term and one or more other explanatory variables. In addition, we consider the case when the classification probabilities are non-constant, and in fact, can be modelled as probit functions of the exogenous variables. The rationale behind this is that the values of the explanatory variables are highly likely to affect the regime classification of the dependent variable, increasing the reliability of the imperfect information indicator. Consequently, treatment of the probabilities as non-constant for each observation adds more reliable information to the model and will hopefully improve the efficiency of estimation.

The framework for the use of imperfect sample separation information was derived from Lee and Porter who used switching regression techniques to model a supply function for a railroad cartel. In their model, the observed regime classifications were obtained from data from a trade magazine (presumably reported with error) on whether there were price wars or not. The probabilities that these regime classifications were in fact correct were assumed constant, and therefore independent of the exogenous variables.

Their model can be improved upon by postulating that the classification probabilities are dependent on the exogenous variables and will differ for each time period. Taking the Lee and Porter application as a case in point, we note that their explanatory variables include a Great Lakes dummy variable and several dummy variables on structural changes. The Great Lakes dummy variable documents when the Great Lakes were made open to navigation so that the cartel faced its main source of competition. The structural changes dummy variables are used to proxy changes caused by the entry, acquisitions or additions to existing networks in the railroad industry.
When the Great Lakes were made open to navigation, or when there were instances of entry and new acquisitions, we expect that there will be price cutting or non-cooperative behavior among the firms in the cartel, due to the presence of other competitors in the industry, and this will be reflected in the imperfect indicators of information -- data from the trade magazine. Using this information during each time period adds to the certainty on regime classification, as to whether there were indeed price wars or not. This raises the probabilities of correct classification and also leads to higher efficiencies of parameter estimation, as opposed to the case when constant probabilities are applied for each time period as Lee and Porter did.

Our suggested treatment of the derivation of classification probabilities as non-constant seems to be a plausible alternative to theirs in the sense that we use more information (at no extra cost of obtaining this information) in solving for these probabilities, which presumably improves efficiency. Also, their model is a special case of ours, so we can test the adequacy of their model against the alternative of our model.

3.2 The Model

We extend the model of the previous chapter to the case when there are at least two explanatory variables, and when the probabilities of regime classification (i.e. $p_{11}$ and $p_{00}$) are not fixed. Suppose for simplicity that $x_{1j} = x_{2j}$; we call it $x_j$ so the basic switching regression model is:

\[
\begin{aligned}
y_j &= x_j'\beta_1 + u_{1j} && \text{with probability } \lambda && \text{(regime 1)} \\
y_j &= x_j'\beta_2 + u_{2j} && \text{with probability } (1 - \lambda) && \text{(regime 2)}
\end{aligned}
\tag{3.1}
\]

$\beta_1$ and $\beta_2$ are vectors of parameters. The error terms $u_{1j}$ and $u_{2j}$ are assumed to be independently and normally distributed with means 0 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
When there is imperfect sample separation information or the regime is partly known, we can then consider an observability model on probability classification like:

\[
\begin{aligned}
p_{11j} &= F(x_j'\gamma_1) && \text{where } p_{11j} = \text{Prob}(w_j = 1 \mid I_j = 1) \text{ for each } j \\
p_{00j} &= F(x_j'\gamma_0) && \text{where } p_{00j} = \text{Prob}(w_j = 0 \mid I_j = 0) \text{ for each } j \\
p_{10j} &= 1 - p_{11j} \\
p_{01j} &= 1 - p_{00j}
\end{aligned}
\]

where $F(\,)$ is a standard normal cumulative distribution function, and $\gamma_1$ and $\gamma_0$ are vectors of parameters. $w_j$ is the observed dichotomous indicator which provides sample separation information, while $I_j$ is the latent dichotomous indicator of the actual regime classification.

In essence, the regime classification probabilities are probit models of observability. This contains the Lee and Porter model as a special case, that is, all the elements of $\gamma_1$ and $\gamma_0$ are zero, except for those corresponding to the constant term. The joint density function for $y_j$ and $w_j$ is then re-written from (2.3) and given as:

\[
\begin{aligned}
f_j = f(y_j, w_j; \theta)
 &= \lambda f_1(y_j)\bigl(w_j p_{11j} + (1 - w_j)(1 - p_{11j})\bigr) + (1 - \lambda) f_2(y_j)\bigl(w_j p_{01j} + (1 - w_j)(1 - p_{01j})\bigr) \\
 &= \lambda f_1(y_j)\bigl(w_j F(x_j'\gamma_1) + (1 - w_j) F(-x_j'\gamma_1)\bigr) + (1 - \lambda) f_2(y_j)\bigl(w_j F(-x_j'\gamma_0) + (1 - w_j) F(x_j'\gamma_0)\bigr)
\end{aligned}
\tag{3.2}
\]

where:

\[
\begin{aligned}
f_i(y_j) &= \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y_j - x_j'\beta_i)^2}{2\sigma_i^2} \right] \\
p_{11j} &= \int_{-\infty}^{x_j'\gamma_1} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv \\
p_{00j} &= \int_{-\infty}^{x_j'\gamma_0} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv
\end{aligned}
\qquad i = 1, 2; \; j = 1, \ldots, n
\]

That is, $f_1(y_j)$ and $f_2(y_j)$ are normal probability density functions with means and variances given by $N(x_j'\beta_1, \sigma_1^2)$ and $N(x_j'\beta_2, \sigma_2^2)$, respectively, and $p_{11j}$ and $p_{00j}$ are probabilities of correct regime classification denoted as probit models.

3.3 Derivation of Asymptotic Variances

When the regime is known, the asymptotic variances of $\sqrt{n}(\hat\beta_1 - \beta_1)$, $\sqrt{n}(\hat\beta_2 - \beta_2)$, $\sqrt{n}(\hat\sigma_1^2 - \sigma_1^2)$, $\sqrt{n}(\hat\sigma_2^2 - \sigma_2^2)$, and $\sqrt{n}(\hat\lambda - \lambda)$ are, respectively:

\[
\frac{\sigma_1^2}{\lambda} \Bigl( \lim_{n \to \infty} \frac{1}{n} \sum_j x_j x_j' \Bigr)^{-1}; \quad
\frac{\sigma_2^2}{1 - \lambda} \Bigl( \lim_{n \to \infty} \frac{1}{n} \sum_j x_j x_j' \Bigr)^{-1}; \quad
\frac{2\sigma_1^4}{\lambda}; \quad
\frac{2\sigma_2^4}{1 - \lambda}; \quad \text{and} \quad
\lambda(1 - \lambda).
\]
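As a worked illustration of these known-regime expressions, the sketch below evaluates them for a two-variable design (a constant and one regressor). The design itself is an assumption made only for this example: the regressor is drawn as standard normal, so that $\frac{1}{n}\sum_j x_j x_j'$ is close to the identity matrix. The function names are hypothetical.

```python
import random


def mean_outer(xs):
    """(1/n) * sum over j of x_j x_j' for two-element regressor vectors."""
    n = len(xs)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        for i in range(2):
            for j in range(2):
                m[i][j] += x[i] * x[j] / n
    return m


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def known_regime_avar(xs, sigma2_1, sigma2_2, lam):
    """Asymptotic variances when the regime of every observation is known."""
    sxx_inv = inv2(mean_outer(xs))

    def scaled(c):
        return [[c * v for v in row] for row in sxx_inv]

    return {
        "beta1": scaled(sigma2_1 / lam),
        "beta2": scaled(sigma2_2 / (1.0 - lam)),
        "sigma2_1": 2.0 * sigma2_1 ** 2 / lam,
        "sigma2_2": 2.0 * sigma2_2 ** 2 / (1.0 - lam),
        "lam": lam * (1.0 - lam),
    }


rng = random.Random(1)
# Assumed design: x_j = (1, z_j) with z_j standard normal.
xs = [(1.0, rng.gauss(0.0, 1.0)) for _ in range(5000)]
avar = known_regime_avar(xs, 1.0, 1.0, 0.5)
```

These quantities are the denominators of the variance ratios; with $\sigma_1^2 = \sigma_2^2 = 1$ and $\lambda = .5$ the variance terms are 4 for each $\sigma_i^2$ and .25 for $\lambda$.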
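As in the previous chapter, there are no asymptotic variances for the parameters $\gamma_1$ and $\gamma_0$ (which enter the $p_{11j}$ and $p_{00j}$ probability functions of regime classification) since these parameters are irrelevant when regimes are completely known. The above expressions for the asymptotic variances of $\hat\beta_1$, $\hat\beta_2$, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ are derived from the inverse of the information matrices from the corresponding likelihood functions of the known densities associated with the respective regimes. The asymptotic variance of $\hat\lambda$ comes from that of the binomial distribution.

For cases when the regime is either completely unknown or is partly known, the asymptotic variances of $\sqrt{n}(\hat\theta - \theta)$ are derived in the same way -- from the diagonal elements of the inverse of the Fisher information matrix. Therefore, $\sqrt{n}(\hat\theta - \theta)$ approaches the distribution specified by the following expression -- $N\bigl(0, \lim (\tfrac{1}{n}\mathcal{I})^{-1}\bigr)$. The information matrix is defined as:

\[
\mathcal{I} = -E\left[ \frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'} \right]
\qquad \text{where:} \qquad
\ln L = \sum_{j=1}^{n} \ln f_j
\]

Therefore, when the regime is completely unknown, $f$ corresponds to the density function of (2.2) given as:

\[
f(y_j; \theta) = \lambda f_1(y_j) + (1 - \lambda) f_2(y_j)
\]

$\theta$ is a (2K + 3) x 1 vector of parameters given by $(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda)'$. When the regime is partly known, $f$ corresponds to the density function of (3.2):

\[
f(y_j, w_j; \theta) = \lambda f_1(y_j)\bigl(w_j F(x_j'\gamma_1) + (1 - w_j)(1 - F(x_j'\gamma_1))\bigr) + (1 - \lambda) f_2(y_j)\bigl(w_j (1 - F(x_j'\gamma_0)) + (1 - w_j) F(x_j'\gamma_0)\bigr)
\]

$\theta$ is a (4K + 3) x 1 vector of parameters given by $(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda, \gamma_1', \gamma_0')'$ where $\beta_1$ and $\beta_2$ are defined as previously; and $\gamma_1 = (\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1K})'$ and $\gamma_0 = (\gamma_{01}, \gamma_{02}, \ldots, \gamma_{0K})'$ are additional parameters.

As in the earlier chapter, the expected value of the expression that represents the information matrix was analytically intractable so that we instead calculate the information matrix by simulation techniques in either of two ways:

\[
\text{(1)} \quad \mathcal{I} = -E \sum_j \left[ \frac{1}{f_j} \frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left(\frac{\partial f_j}{\partial\theta}\right)\left(\frac{\partial f_j}{\partial\theta'}\right) \right]
\]

\[
\text{(2)} \quad \mathcal{I} = E \sum_j \left[ \frac{1}{f_j^2} \left(\frac{\partial f_j}{\partial\theta}\right)\left(\frac{\partial f_j}{\partial\theta'}\right) \right]
\]

The second method of calculation follows from the first method, in the sense that, in the limit, the expression $-E \sum_j \left[ \frac{1}{f_j} \frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} \right]$ goes to zero.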
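The vanishing of the second-derivative term can be seen in a small Monte Carlo check. The sketch below is an illustration only: for simplicity it uses the normal mixture density (2.2), i.e. a single constant regressor, and looks at one element, the $(\mu_1, \mu_1)$ entry of $\frac{1}{f}\frac{\partial^2 f}{\partial\theta\,\partial\theta'}$. The parameter values, the seed and the function names are assumptions of this example.

```python
import math
import random


def normal_pdf(y, mu, s2):
    return math.exp(-(y - mu) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)


def hessian_term_mu1(y, mu1, mu2, s1, s2, lam):
    """(1/f) * d^2 f / d mu1^2 for the mixture density (2.2); s1, s2 are variances."""
    f1 = normal_pdf(y, mu1, s1)
    f = lam * f1 + (1.0 - lam) * normal_pdf(y, mu2, s2)
    d2f = lam * f1 * ((y - mu1) ** 2 / s1 - 1.0) / s1
    return d2f / f


rng = random.Random(0)
mu1, mu2, s1, s2, lam = 0.0, 2.0, 1.0, 1.0, 0.5
n = 50000
total = 0.0
for _ in range(n):
    # Draw y from the mixture: regime 1 with probability lam, else regime 2.
    if rng.random() < lam:
        y = rng.gauss(mu1, math.sqrt(s1))
    else:
        y = rng.gauss(mu2, math.sqrt(s2))
    total += hessian_term_mu1(y, mu1, mu2, s1, s2, lam)

# Sample average of the second-derivative term; it should be close to zero.
mean_term = total / n
```

The average shrinks toward zero as the number of draws grows, which is why the two methods agree in large samples while differing in any finite simulation.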
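In addition, the second method has the added advantage of being positive definite always, and not just in the limit. For this reason, we choose the second method of calculation, and throughout the experiments we will be conducting, the information matrix (3/) is to be calculated as follows:

\[
\mathcal{I} = E \sum_j \left[ \frac{1}{f_j^2} \left(\frac{\partial f_j}{\partial\theta}\right)\left(\frac{\partial f_j}{\partial\theta'}\right) \right]
\]

3/ The expressions for the elements of the information matrix when the first method of calculation is used are also derived and are given in Appendix A.

For the case where regime classification is completely unknown or when there is no sample separation information at all, the first derivatives of $f(y; \theta)$ (we drop the subscript $j$ for simplicity) with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial f}{\partial\beta_{1k}} &= \lambda \frac{\partial f_1}{\partial\beta_{1k}} &
\frac{\partial f}{\partial\beta_{2k}} &= (1 - \lambda) \frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial f}{\partial\sigma_1^2} &= \lambda \frac{\partial f_1}{\partial\sigma_1^2} &
\frac{\partial f}{\partial\sigma_2^2} &= (1 - \lambda) \frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial f}{\partial\lambda} &= f_1 - f_2
\end{aligned}
\]

where:

\[
\frac{\partial f_i}{\partial\beta_{ik}} = f_i \frac{(y - x'\beta_i)\, x_k}{\sigma_i^2}, \qquad
\frac{\partial f_i}{\partial\sigma_i^2} = \frac{f_i}{2\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1 \right], \qquad i = 1, 2; \; k = 1, 2, \ldots, K
\]

Since $\theta$ is of dimension (2K + 3) x 1, then it follows that $\mathcal{I}$ is of dimension (2K + 3) x (2K + 3).

For the case where regime classification is partly known or when there is imperfect sample separation information, the first derivatives of $f(y, w; \theta)$ with respect to $\theta$ are:

\[
\begin{aligned}
\frac{\partial f}{\partial\beta_{1k}} &= \lambda Q_1 \frac{\partial f_1}{\partial\beta_{1k}} &
\frac{\partial f}{\partial\beta_{2k}} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial f}{\partial\sigma_1^2} &= \lambda Q_1 \frac{\partial f_1}{\partial\sigma_1^2} &
\frac{\partial f}{\partial\sigma_2^2} &= (1 - \lambda) Q_2 \frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial f}{\partial\lambda} &= f_1 Q_1 - f_2 Q_2 \\
\frac{\partial f}{\partial\gamma_{1k}} &= \lambda f_1 \bigl(w - (1 - w)\bigr) \frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} &
\frac{\partial f}{\partial\gamma_{0k}} &= -(1 - \lambda) f_2 \bigl(w - (1 - w)\bigr) \frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}}
\end{aligned}
\]

where, from (3.2), $Q_1 = w F(x'\gamma_1) + (1 - w) F(-x'\gamma_1)$ and $Q_2 = w F(-x'\gamma_0) + (1 - w) F(x'\gamma_0)$, and where $\partial F(x'\gamma)/\partial\gamma_k$ is the standard normal density evaluated at $x'\gamma$, multiplied by $x_k$.

[...]

For one set of parameter values (we chose $\theta = (1, 1, 2, 2, 1, 1, .5, 1, -1, 1, 1)'$ for regime partly known), we initially compare results for the ratios of asymptotic variances when we have n = 5000 and n = 20000. Although there are differences in the absolute magnitudes of the figures which range from .1 to .7 for both regime partly known and unknown, the difference in computer costs makes us opt for the smaller sample size, since the relationship among the relative magnitudes prevails (4/). We therefore made use of a sample size of 5000 for all our experiments.

We first need to establish the non-informative case.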
In the previous chapter, we had discussed an implicit "informativeness" condition in the model. When $p_{11j} = 1$ and $p_{00j} = 1$, then the indicator $w_j$ provides perfect sample separation information. Partial sample separation information is given by $w_j$ when $p_{11j}$ is not equal to $p_{01j}$, which is equivalent to the condition that $p_{11j} \neq 1 - p_{00j}$, or that $p_{11j} + p_{00j} \neq 1$. This implies that $w_j$ provides no sample separation information at all when $p_{11j} = 1 - p_{00j}$, or $p_{11j} + p_{00j} = 1$.

4/ For the same $\theta$ values, we also compare results under both methods of solving for the information matrix. There are differences in absolute magnitudes that become smaller as the sample size increases from 5000 to 20000. Under the first method, the differences in the ratios between the two sample sizes range from .1 to .4, while under the second method, it is from .1 to .7. It is expected that as the sample size increases some more, the absolute difference between the two methods will decline. Although the absolute magnitude differences persist, the relationships among the relative magnitudes are fairly constant. This at least partially justifies our choice of method 2 for calculating the information matrix and a sample size of 5000.

In terms of our model, where $p_{11j}$ and $p_{00j}$ are denoted as probit functions, then the "informativeness" condition can be expressed as a simple restriction on the parameters $\gamma_1$ and $\gamma_0$, which enter our probit models of information observability. Information is not provided when $\gamma_1 = -\gamma_0$ since then, $p_{11j} + p_{00j} = 1$, that is, $F(x_j'\gamma_1) + F(x_j'\gamma_0) = F(x_j'\gamma_1) + F(-x_j'\gamma_1) = 1$, for any $x_j$, where $F(\,)$ is a standard normal cumulative distribution function. Combinations of parameter values where $\gamma_1 = -\gamma_0$ can be illustrated by any number of examples. A case in point where no information is provided is when $\gamma_1 = \gamma_0 = 0$. This implies that $p_{11j} = F(x_j'\gamma_1) = F(0) = .5$ and $p_{00j} = F(x_j'\gamma_0) = F(0) = .5$.
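The restriction $\gamma_1 = -\gamma_0$ is easy to verify numerically. The sketch below is a minimal illustration with hypothetical function names and an arbitrary regressor vector; it computes the probit classification probabilities with the error function from the standard library and tests whether $p_{11j} + p_{00j}$ differs from one.

```python
import math


def std_normal_cdf(v):
    """F(v): standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def classification_probs(x, gamma1, gamma0):
    """Return (p11, p00) = (F(x'gamma1), F(x'gamma0)) for one observation."""
    a1 = sum(xi * g for xi, g in zip(x, gamma1))
    a0 = sum(xi * g for xi, g in zip(x, gamma0))
    return std_normal_cdf(a1), std_normal_cdf(a0)


def is_informative(x, gamma1, gamma0, tol=1e-12):
    """The indicator w carries information only if p11 + p00 differs from 1."""
    p11, p00 = classification_probs(x, gamma1, gamma0)
    return abs(p11 + p00 - 1.0) > tol


x = (1.0, 0.3)  # constant term and one regressor value (arbitrary)

informative = is_informative(x, (1.0, -1.0), (1.0, 1.0))     # gamma1 != -gamma0
opposite = is_informative(x, (1.0, -1.0), (-1.0, 1.0))       # gamma1 == -gamma0
all_zero = classification_probs(x, (0.0, 0.0), (0.0, 0.0))   # both probabilities .5
```

Here `informative` is true while `opposite` is false for any choice of `x`, since $F(a) + F(-a) = 1$ for every $a$.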
The case $\gamma_1 = \gamma_0 = 0$ was the non-informative case we had in the previous chapter where the probabilities of regime classification were assumed constant, i.e. $p_{11} = p_{00} = .5$.

We now proceed with the first type of experiments we have to conduct, where for given $\gamma$ values, we vary our $\beta$ parameters. We choose $\gamma = (1, -1, 1, 1)'$ where there is some information provided, i.e. $\gamma_1 \neq -\gamma_0$. The first case is when we allow the $\beta$ parameters of regime 2 to deviate uniformly from regime 1, such that $\beta_2 = h \beta_1$ ($h = 1, 2, \ldots$). We hold $\sigma_1^2 = \sigma_2^2 = 1$ and $\lambda = .5$. The results are presented in Table 5. Figures in parentheses are the ratios when the regime is unknown relative to when regime is known.

When $\beta_1 = \beta_2$ so that the regression equations are the same for both regimes, the presence of information given by $\gamma_1$ and $\gamma_0$ greatly improves the efficiency of the estimates. With no information at all, the ratios go to $\infty$, since the samples are impossible to disentangle, while the ratios are finite with some information available. An interesting observation here is that the value of sample separation information is much greater for the slopes than for the intercepts when the regime is partly known. For the case of the estimated mixing parameter, $\lambda$, the value of information for regime partly known is $\infty$, and for regime unknown is 0. There is no meaning that can be attached to this parameter in this instance, since the samples are difficult to distinguish from each other anyway.

The choices for $\beta_1$ and $\beta_2$ are of course restrictive. However, note that the results are invariant with regards to location and scale, as long as $\beta_1 = \beta_2$ and $\sigma_1^2 = \sigma_2^2$. $\beta_1 = (1, 0)'$, $\beta_2 = (1, 0)'$ gives the same results as $\beta_1 = (1, 1)'$, $\beta_2 = (1, 1)'$.
When $\beta_2 = h \beta_1$ ($h \neq 1$), so that the intercept and slope of one equation move away from the intercept and slope of the other equation by the same proportion, then the value of sample separation information decreases monotonically as $h$ increases. Note the large decline in the ratios of variances for the estimated slopes as soon as the samples become distinct from each other. When the intercepts and slopes of the two equations are sufficiently far apart, the ratios when the regime is partly known or is completely unknown tend to approach one. In addition, the ratios tend to equal each other in both cases of observability so there is very little value in obtaining sample separation information or using imperfect information (when available), when the regression equations are clearly distinguishable.

Table 5. Ratios of Asymptotic Variances, Varying $h$ ($\beta_2 = h\beta_1$) when $\beta_1 = (1, 1)'$, $\sigma_1 = \sigma_2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$
[Rows: values of $\beta = (\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22})$. Columns: ratios of regime partly known (unknown) to regime known for $\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$, $\sigma_1^2$, $\sigma_2^2$, $\lambda$. The numerical entries are illegible in the source scan.]
Note: n = 5000. Figures in parentheses are the ratios of asymptotic variances when regime is unknown relative to when regime is known. This is true for the other tables in this chapter.

The second case we consider is when the regression equations are made distinct from each other by moving the slopes away, but keeping the intercepts constant. The results are presented in Table 6. The value of sample separation information decreases monotonically, as $\beta_{22}$ increases with $\beta_{12} = 0$. Again, the decline in the ratios is very steep as soon as the samples are made distinguishable, i.e. $\beta = (0, 0, 0, 0)'$ and $\beta = (0, 0, 0, 1)'$.
As the slopes move farther away, the decline in the ratios is rather slow. As before, there is very little value in obtaining sample separation information or using imperfect information when the samples are clearly distinct, since the ratios with partial information and with no information at all tend to equalize.

[Table 6. Ratios of Asymptotic Variances Varying $\beta_{22}$ when $\beta = (0, 0, 0, \beta_{22})'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$. Columns: parameters; ratios partly known/known (unknown/known).]

The third case is when the regression equations are made distinct by moving the intercepts farther away, but keeping the slopes constant. The results are in Table 7.

[Table 7. Ratios of Asymptotic Variances Varying $\beta_{21}$ when $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, -1, 1, 1)'$. Columns: parameters; ratios partly known/known (unknown/known).]

Here, the value of sample separation information declines, but the decline is not monotonic for the estimated intercepts and variances; it is monotonic for the slopes, though. The same observation was made in Kiefer's study (1979) of a normal mixture model.
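One way to gauge how quickly two regimes separate as the intercept gap widens is to look at how often an observation would be misclassified even by the best possible rule. The sketch below is an added illustration, not a computation from the study. It assumes two equally likely regimes ($\lambda = .5$) with equal slopes and unit variances, so that the regimes differ only by the intercept gap; the helper name `bayes_error` is hypothetical. Under these assumptions, assigning each observation to the nearer regression line misclassifies it with probability $F(-\text{gap}/2)$, where F( ) is the standard normal cumulative distribution function.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error(gap, sigma=1.0):
    """Misclassification rate of the nearest-line rule.

    Assumes two equally likely normal regimes with a common standard
    deviation `sigma` whose means differ by `gap` (illustrative setup).
    """
    return std_normal_cdf(-abs(gap) / (2.0 * sigma))


# Intercept gaps 0, 1, 2 and 4, holding the slopes equal.
errors = {gap: bayes_error(gap) for gap in (0.0, 1.0, 2.0, 4.0)}
for gap, err in errors.items():
    print(f"gap = {gap:.0f}: error rate = {err:.4f}")
```

Under these assumptions the error rate falls from .5 at a gap of 0 to about .31 at a gap of 1, .16 at a gap of 2 and .02 at a gap of 4, which is one way to read the statement that misclassification becomes negligible once the intercepts are far apart.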
When the intercepts are close together (in this case, they are equal), wrong classification does not seriously affect the quality of the estimates. Then, as the intercepts move farther away, the effects of misclassification become more serious and the estimates suffer. When the intercepts become still farther apart, the probability of misclassification becomes so small that the estimates become almost as efficient as estimates based on known sample separation. As the intercepts move away from each other, the decline in the ratios is more substantial, or faster, than in the case when the intercepts are held constant but the slopes are moved farther apart. Again, there is very little value in obtaining information when the regression equations are clearly distinct, since the ratios with partial information and with no information at all tend to equalize.

The very large values of the variance ratios for the estimated intercepts, variances and mixing parameter, when the samples are sufficiently close and when there is no information at all, seem to suggest that the intercept is a more important component of the regression equation than the slope in determining separability of the two distributions. It is more difficult to distinguish one sample from the other when the intercepts are close together than when the slopes are. Compare the cases of $\beta = (0, 0, 0, 1)'$ and $\beta = (0, 0, 0, 2)'$ in Table 6 with the cases of $\beta = (0, 0, 1, 0)'$ and $\beta = (0, 0, 2, 0)'$ in Table 7.

Note that our values for $\beta_1$ and $\beta_2$ are restrictive, but the results are invariant with regard to translation, as long as the other parameters in $\theta$ are not changed. $\beta_1 = (1, 1)'$, $\beta_2 = (2, 1)'$ gives the same results as $\beta_1 = (0, 0)'$, $\beta_2 = (1, 0)'$. A related observation is that $\beta_1 = (0, 0)'$, $\beta_2 = (2, 0)'$ gives almost the same results as $\beta_1 = (0)$, $\beta_2 = (2)$, where the latter comes from a normal mixture model.
The ratios in the former are slightly bigger than the ratios in the latter, since there are more parameters to estimate in the former, even if $\beta_{12} = \beta_{22} = 0$.5/ When we estimate a normal mixture model, the ratios corresponding to $\beta_{11}$, $\beta_{21}$, $\sigma_1^2$, $\sigma_2^2$ and $\lambda$ are 14.1 (50.4), 4.9 (50.5), 2.6 (116.9), 3.7 (16.2) and 7.0 (98.9), respectively. When regime is unknown, the ratios Schmidt (1981) derived in an earlier paper are very similar to the above figures in parentheses. The only difference is that Schmidt's ratios are smaller (i.e. 41.1, 40.4, 12.7, 12.6 and 78.8, as presented in Table 1 of the previous chapter), presumably due to a larger sample size (n = 100000), so that the results are much tighter.

5/ Note that when a row and column corresponding to a certain parameter is deleted, this implies that either the model does not contain this parameter, or that the parameter is part of the model but is known a priori and need not be estimated at all. In the former case, the value of information is more important when the model is more complicated, or when $\theta$ has more parameters, even if both models are presented with the same amount of information in $\gamma_1$ and $\gamma_0$. In the latter case, when some parameters are known a priori and need not be estimated, resulting ratios are lower since they understate the true value of information.

We now turn to the second type of experiment we will be conducting,6/ that of varying the $\gamma$ parameters given particular $\beta$ values, to find out the effects of different levels of observability on the efficiency of parameter estimation with fixed regression parameters.

We had earlier established that the intercept terms are more important than the slope coefficients in determining regime classification, since ratios tend to be higher (the value of sample separation information is more important) when the intercepts are moved farther away than when the slopes are moved apart.
For this reason, we choose $\beta = (0, 0, 2, 0)'$, where the slopes are equal but the intercepts are different. Note that $\beta = (0, 0, 2, 0)'$ is invariant with regard to transformation to some other $\beta$ forms, e.g. $\beta = (2, 2, 4, 2)'$ and $\beta = (2, 0, 4, 0)'$.

Our first case is presented in Table 8. Given $\gamma_1$, the $\gamma_0$ combinations are arranged from highest ratios (least information, so most inefficient) to lowest ratios (most information, so most efficient). When $\gamma_1 = -\gamma_0$, this is the case of no information, and when $\gamma_1 = \gamma_0$, this is the case of the most information.

6/ This is the extent of our experimentation in this chapter. We will not attempt to change the variances nor the mixing parameter, since the earlier chapter had already established the results for these cases; that is, the value of sample separation information is higher for the parameters of the regime which is observed with the lower probability, and higher for the variance parameter of the sample which has the smaller variance.

[Table 8. Ratios of Asymptotic Variances Varying $\gamma_0$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma_1 = (1, -1)'$. Columns: parameters; ratios partly known/known.]

Given $\gamma_1$, the value of information is higher when we change the slope of the probit model, $\gamma_{02}$ (keeping $\gamma_{01} = 0$), rather than the intercept of the probit model, $\gamma_{01}$ (keeping $\gamma_{02} = 0$). This implies that the intercept term in the probit function is more important in increasing the efficiency of the parameter estimates given the available information. Given $\gamma_{02}$ and $\gamma_1$, ratios are lower when $\gamma_{01}$ is higher, so that there is more efficiency, and ratios are higher when $\gamma_{01}$ is lower, so that there is less efficiency.
The transition from least information (smaller $\gamma_{01}$) to most information (larger $\gamma_{01}$) improves efficiency as $\gamma_0$ moves closer to the $\gamma_1$ values. The most efficient estimates occur when $\gamma_1 = \gamma_0$. This implies that the quality of estimates is best when there is equal certainty for the sample separation to be correct for both regimes.

$\gamma_1 = (1, -1)'$, $\gamma_0 = (1, -1)'$ is invariant to $\gamma_1 = (-1, 1)'$, $\gamma_0 = (-1, 1)'$. This reflects the fact that $\gamma_1^* = -\gamma_1$ and $\gamma_0^* = -\gamma_0$ result in the same value of sample separation information as did $\gamma_1$ and $\gamma_0$. This follows from the "non-informativeness" condition on $\gamma_1$ and $\gamma_0$ when $\gamma_1 = -\gamma_0$. By the same reasoning, the information reflected in $\gamma_1$ is no different from that in $\gamma_1^*$ (and likewise for $\gamma_0$ and $\gamma_0^*$) when $\gamma_1^* = -\gamma_1$ and $\gamma_0^* = -\gamma_0$, when $\gamma_1 = \gamma_0$. This follows from the relationship that $p_{11}^* + p_{11} = 1$ and $p_{00}^* + p_{00} = 1$. On the other hand, $\gamma_1 = (1, -1)'$, $\gamma_0 = (1, 1)'$ is not invariant to $\gamma_1 = (1, 1)'$, $\gamma_0 = (1, -1)'$. That is, $\gamma_1^* = \gamma_0$ and $\gamma_0^* = \gamma_1$ do not imply the same value of sample separation information when $\gamma_1 \neq \gamma_0$.

We examine next the case when $\gamma_{11} = -\gamma_{01}$, so that the intercept terms imply no information, and we vary the slope terms. The results are presented in Table 9. The ratios are again arranged from highest (no information) to lowest (most information). The classification probabilities implied by the $\gamma_1$ and $\gamma_0$ parameters become higher (so that ratios become lower and efficiency improves) as $\gamma_{12}$ and $\gamma_{02}$ assume non-zero values. Lower ratios result when $\gamma_{02}$ is non-zero (keeping $\gamma_{12} = 0$) than when $\gamma_{12}$ is non-zero (keeping $\gamma_{02} = 0$), since the probabilities implied by $\gamma_1 = (1, 1)'$, $\gamma_0 = (-1, 0)'$ represent a wider divergence in probabilities $p_{11}$ and $p_{00}$ than that given by the combination $\gamma_1 = (1, 0)'$, $\gamma_0 = (-1, 1)'$, due to the fact that the probability associated with $\gamma_1 = (1, 1)'$ is higher than that of $\gamma_0 = (-1, 1)'$.
It is to be noted that the wider the difference in probabilities $p_{11}$ and $p_{00}$ (particularly in the intermediate range of probability values), the less certainty there is in the information about regime classification, and it follows that the estimates will be less efficient. The exception here is the non-informative case of $p_{11} = p_{00} = .5$, where there is no difference in the probabilities but efficiency is lowest (since it is non-informative).

The additional information provided by the non-zero $\gamma_{12}$ and $\gamma_{02}$ parameters improves the efficiency of the estimates as opposed to the case when only $\gamma_{11}$ and $\gamma_{01}$ are assigned non-zero values (and $\gamma_{12} = \gamma_{02} = 0$). In effect, this implies that modelling the classification probabilities in such a way that they are not constant for every observation increases the quality of the estimates, compared to the case in which these classification probabilities are fixed for all observations ($\gamma_{12} = \gamma_{02} = 0$).

[Table 9. Ratios of Asymptotic Variances Varying $\gamma_{12}$, $\gamma_{02}$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$, $\gamma = (1, \gamma_{12}, -1, \gamma_{02})'$. Columns: parameters; ratios partly known/known.]

We have earlier shown that when $\gamma_1 = \gamma_0$, so that the classification probabilities are equal, the ratios of asymptotic variances are lowest. This is the next case we consider, the results of which are shown in Table 10. Again, we start with the non-informative case, where $\gamma_1 = \gamma_0 = 0$. As the $\gamma_1$ and $\gamma_0$ values increase in magnitude, the classification probabilities associated with them increase too, and there is more information as the implied probabilities get higher (i.e. $p_{11} = p_{00}$ approach one).
The ratios decline monotonically as the implied probabilities rise, and when these probabilities are sufficiently high, the quality of the estimates approximates that obtained when there is perfect information and the corresponding regimes are fully known.

[Table 10. Ratios of Asymptotic Variances Varying $\gamma$ ($\gamma_1 = \gamma_0$) when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$. Columns: parameters; ratios partly known/known.]

The last case we evaluate is when we try to approximate the $\gamma_1$ and $\gamma_0$ values that will duplicate our results in the previous chapter, where classification probabilities were fixed. We test our model with non-constant $p_{11}$ and $p_{00}$ against the alternative of constant $p_{11}$ and $p_{00}$, which is actually a special case of our specification. The results are in Table 11. In particular, we have $F(.8416) = p_{11} = p_{00} = .8$; in terms of our probit models, $\gamma_{12} = \gamma_{02} = 0$, so that $p_{11}$ and $p_{00}$ are now constant for all observations. We have two basic experiments here -- when we delete and when we do not delete $\gamma_{12}$ and $\gamma_{02}$ from the model. When they are not deleted, they are set equal to zero, but implicitly still estimated. Both results, as well as the non-informative case, are reported here, and we compare these figures to our earlier results patterned after the Lee and Porter model, where $p_{11}$ and $p_{00}$ are fixed at .8, using a sample size of n = 100000.

[Table 11. Ratios of Asymptotic Variances: Comparison of $F(x'\gamma_1) = F(x'\gamma_0) = .8$ and $p_{11} = p_{00} = .8$ when $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\lambda = .5$. Panels: n = 5000 and n = 100000. Columns: parameters; ratios partly known/known. n.a. = not applicable.]
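The link between the probit intercept and the fixed classification probability can be checked numerically. The sketch below is an added illustration; the function name `std_normal_cdf` and the two sample values of $x_2$ are assumptions made for the example. It evaluates the standard normal distribution function at the index $x'\gamma_1$ with $\gamma_1 = (.8416, 0)'$, so that the slope term drops out and the probability is the same at every observation.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


gamma1 = (0.8416, 0.0)  # probit intercept and slope; the slope is set to zero


def p11(x2):
    """Classification probability F(x'gamma1) for x = (1, x2)'."""
    return std_normal_cdf(gamma1[0] * 1.0 + gamma1[1] * x2)


# With a zero slope the probability does not depend on x2.
p_low, p_high = p11(0.1), p11(5.0)
print(round(p_low, 4), round(p_high, 4))
```

Both evaluations agree with .8 to four decimal places, which is the sense in which the probit specification contains the fixed-probability model as a special case.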
As Table 11 shows, when $\gamma_{12}$ and $\gamma_{02}$ are not deleted, the resulting figures are slightly larger, most probably because we estimate more parameters in the model, so that efficiencies may suffer. When we compare our model with the deleted $\gamma$ parameters to our fixed probabilities specification of the earlier chapter, we observe that the ratios we derive now are larger than those we derived before. This could be due to a number of reasons. First, we now have more parameters to estimate in $\beta$, i.e. $\beta = (\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22})'$ as against $\beta = (\beta_{11}, \beta_{21})'$ in the earlier chapter. Second, we now use a smaller sample size, so that the resulting figures may be less tight. Lastly, we employed different methods of evaluating the information matrix in the two cases. All these reasons could account for the differences in the absolute magnitudes of our ratios, although the relationships among the relative magnitudes are quite similar.

This second set of experiments on varying $\gamma$ for a given set of $\beta$ has highlighted two main observations. First, invariance in the ratios occurs when $\gamma_1^* = -\gamma_1$ and $\gamma_0^* = -\gamma_0$ for $\gamma_1 = \gamma_0$; and when $\gamma_1^* = -\gamma_0$ and $\gamma_0^* = -\gamma_1$ for $\gamma_1 \neq \gamma_0$. There does not seem to exist any form of multiplicative or additive transformation for $\gamma$ where invariance may result in the derived ratios, since any other change introduced to the $\gamma_1$ and $\gamma_0$ parameters will lead to probability changes reflected in $F(x'\gamma_1)$ and $F(x'\gamma_0)$. Second, when evaluating the $\gamma_1$ and $\gamma_0$ parameters, it is to be remembered that when $\gamma_1$ and $\gamma_0$ are closer to each other, the probabilities $F(x'\gamma_1)$ and $F(x'\gamma_0)$ are also closer. This means that there is almost equal certainty of proper sample separation into the two regimes, so that the information is quite reliable and efficiency improves.
At the extreme, $\gamma_1 = \gamma_0$ and efficiency gains are highest, particularly when the probabilities implied by these parameters belong to the extreme range. At the other extreme, when $\gamma_1 = -\gamma_0$, there is no information at all in the regime classification information.

3.5 Summary

We have improved our earlier model on the value of imperfect sample separation information by allowing more exogenous variables in the switching regression model and by postulating that the classification probabilities are non-constant. As in the earlier model, all the parameters have to be estimated. The latter extension, where the classification probabilities can be modelled as probit functions, is aimed at providing more information and flexibility to the model, since the probabilities of regime classification are dependent on the exogenous variables at each observation.

Two basic types of experiments were conducted using simulation techniques applied on a large sample size. First, we vary the $\beta$ parameters for a given information level (denoted by the $\gamma$ parameters) to find out the effects on efficiency of estimation of varying the degree to which the samples are separate. Second, we vary the $\gamma$ parameters for a given set of $\beta$ parameters to evaluate the effects of different levels of information observability, given a particular sample distribution.

Among our findings, the following two are most important. (1) The use of information, even if imperfect, still presents large gains relative to when there is no information at all. Naturally, the more reliable the imperfect sample separation information, the greater the gains in efficiency, where the reliability of the information can be evaluated by the $\gamma_1$ and $\gamma_0$ parameters. (2) The value of imperfect sample separation information largely depends on how much alike the two samples are. When the samples are hard to distinguish from one another, then the value of any information is highest.
If we consider the $\beta$ parameters as denoting sample separability, the intercept parameters are more important than the slope parameters in determining how distinct the samples are from each other.

CHAPTER FOUR

THE CASE OF NON-CONSTANT REGIME CLASSIFICATION PROBABILITIES AND NON-CONSTANT SWITCHING PROBABILITIES

4.1 Introduction

In the preceding chapter, we evaluated the value of imperfect sample separation information in a switching regression model with two exogenous variables, where the probabilities of regime classification are non-constant. We argued that such a specification has its merits in the fact that more reliable information on sample separation is provided at each observation. This implies that the values of the exogenous variables do affect the chances of proper regime membership given the actual regime, so that the observed imperfect indicator of sample separation is a more accurate measure of the latent perfect indicator at each observation when the regime classification probabilities are non-constant.

However, we assumed then that the switching probabilities were constant for all observations. That is, the probability that each observation is generated by a particular regime is fixed. We now re-formulate this assumption to take into account that the switching probabilities are non-constant, and can also be modelled as probit functions of the exogenous variables. The rationale behind this is fairly intuitive -- certain values of the exogenous variables have higher chances of being associated with observations which are generated by a particular regime, while other values of the exogenous variables are better associated with observations generated by another regime. Therefore, the values of the explanatory variables affect the probabilities of actual regime classification ($\lambda$), and not just the probabilities of presumed regime classification given the actual regimes ($p_{11}$ and $p_{00}$).
While the preceding chapter explored the latter approach, we now deal with the former possibility as well as the latter.

In terms of the Lee and Porter railroad cartel stability model, the explanatory variables include: (1) a Great Lakes dummy variable which represents when the Great Lakes were open to navigation, so that the cartel faced its chief source of competition; and (2) several structural change dummy variables which represent the entry, acquisitions and additions to existing networks in the railroad industry. When the cartel faced its main source of competition or when there were significant structural changes in the industry, we expect these events to affect the occurrence of either collusive or non-collusive behavior within the cartel. This implies that these explanatory variables affect not only the probabilities of proper regime classification given the true regime (i.e. whether price wars were probably occurring or not), but also the probabilities of actual regime classification (i.e. whether price wars were really occurring or not).

We postulate here that switching probabilities, or probabilities of actual regime membership, assume non-constant values for all observations, which introduces more flexibility to the model and improves the model's ability to classify observations based on the values of the explanatory variables. Our model here on non-constant switching probabilities can also be extended to consider our past models with a constant mixing parameter as a special case, so we can compare the performance of those models against the alternative of our present model.

4.2 The Model

We still maintain the basic switching regression model of the previous chapter, but we now designate the switching probabilities as non-constant.
Therefore, our model can be expressed as:

$$y_j = x_j'\beta_1 + u_{1j} \quad \text{with probability } \lambda_j \text{ for observation } j \text{ (regime 1)} \qquad (4.1)$$

$$y_j = x_j'\beta_2 + u_{2j} \quad \text{with probability } (1 - \lambda_j) \text{ for observation } j \text{ (regime 2)}$$

$\beta_1$ and $\beta_2$ are (K x 1) vectors of parameters corresponding to the explanatory variables of the (K x n) matrix x. The error terms $u_{1j}$ and $u_{2j}$ are assumed to be independently and normally distributed with means 0 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. The non-constant switching probabilities can be modelled as probit functions of the exogenous variables. That is,

$$\lambda_j = F(x_j'\delta), \qquad 1 - \lambda_j = 1 - F(x_j'\delta) = F(-x_j'\delta)$$

where F( ) is a standard normal cumulative distribution function, and $\delta$ is a (K x 1) vector of parameters. This contains the constant switching probabilities model as a special case where all the elements of $\delta$ are zero, except for that corresponding to the constant term.

In the present model, we still retain the assumption of the previous chapter regarding the treatment of the regime classification probabilities as non-constant. Therefore, we have the following probit models on probability classification:

$$p_{11j} = F(x_j'\gamma_1) \quad \text{where } p_{11j} = \text{Prob}(w_j = 1 / I_j = 1) \text{ for each observation } j$$

$$p_{00j} = F(x_j'\gamma_0) \quad \text{where } p_{00j} = \text{Prob}(w_j = 0 / I_j = 0) \text{ for each observation } j$$

where F( ) is again the standard normal cumulative distribution function, and $\gamma_1$ and $\gamma_0$ are (K x 1) vectors of parameters. $w_j$ is the observed dichotomous indicator which provides sample separation information, while $I_j$ is the unobserved dichotomous indicator of actual regime classification.

When there is imperfect sample separation information, the joint density function for $y_j$ and $w_j$ can be re-written from (3.2) as:

$$f_j = f(y_j, w_j; \theta) \qquad (4.2)$$

$$= \lambda_j f_1(y_j)\left(w_j p_{11j} + (1 - w_j)(1 - p_{11j})\right) + (1 - \lambda_j) f_2(y_j)\left(w_j (1 - p_{00j}) + (1 - w_j) p_{00j}\right)$$

$$= F(x_j'\delta) f_1(y_j)\left(w_j F(x_j'\gamma_1) + (1 - w_j) F(-x_j'\gamma_1)\right) + F(-x_j'\delta) f_2(y_j)\left(w_j F(-x_j'\gamma_0) + (1 - w_j) F(x_j'\gamma_0)\right)$$

where

$$f_i(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(y_j - x_j'\beta_i)^2}{2\sigma_i^2}\right]$$

$$F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{v^2}{2}\right] dv$$

$$i = 1, 2; \quad j = 1, \ldots, n$$

$f_1(y_j)$ and $f_2(y_j)$ are normal probability density functions with means and variances given by $N(x_j'\beta_1, \sigma_1^2)$ and $N(x_j'\beta_2, \sigma_2^2)$, respectively. $\lambda_j$ is represented by a probit model of the actual switching probabilities; and $p_{11j}$ and $p_{00j}$ are represented by probit models of the presumed classification probabilities given the actual regimes.

4.3 Derivation of Asymptotic Variances

When the regime is known, the asymptotic variances of $\sqrt{n}(\hat{\beta}_1 - \beta_1)$, $\sqrt{n}(\hat{\beta}_2 - \beta_2)$, $\sqrt{n}(\hat{\sigma}_1^2 - \sigma_1^2)$, $\sqrt{n}(\hat{\sigma}_2^2 - \sigma_2^2)$, and $\sqrt{n}(\hat{\delta} - \delta)$ are, respectively:

$$\sigma_1^2 \left(\text{plim}\, \frac{1}{n} \sum_{j=1}^{n} \lambda_j x_j x_j'\right)^{-1}; \qquad \sigma_2^2 \left(\text{plim}\, \frac{1}{n} \sum_{j=1}^{n} (1 - \lambda_j) x_j x_j'\right)^{-1};$$

$$2\sigma_1^4 \left(\text{plim}\, \frac{1}{n} \sum_{j=1}^{n} \lambda_j\right)^{-1}; \qquad 2\sigma_2^4 \left(\text{plim}\, \frac{1}{n} \sum_{j=1}^{n} (1 - \lambda_j)\right)^{-1}; \text{ and}$$

$$\left(\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \frac{\phi(x_j'\delta)^2}{F(x_j'\delta)(1 - F(x_j'\delta))}\, x_j x_j'\right)^{-1}.$$

The asymptotic variances for $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\sigma}_1^2$, and $\hat{\sigma}_2^2$ will reduce to the corresponding asymptotic variances given in the previous chapter if $\lambda$ were constant. However, our switching probabilities are no longer constant in our present specification, so that we have different values of $\lambda_j$ for each observation. Since $\lambda_j = F(x_j'\delta)$, which is a probit model, the asymptotic variance of $\hat{\delta}$ corresponds to the asymptotic variance of the parameters in a standard probit model. The above expression was derived from Judge et al. (1980), and Ashford and Sowden (1970), where $\phi$( ) is a standard normal probability density function and F( ) is a standard normal cumulative distribution function. There are no asymptotic variances for the estimated parameters $\hat{\gamma}_1$ and $\hat{\gamma}_0$, since these parameters are not relevant at all when the regimes are fully known. As in the previous chapter, the above expressions for the asymptotic variances of $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\sigma}_1^2$, and $\hat{\sigma}_2^2$ are derived from the inverse of the information matrix, where this information matrix corresponds to the likelihood function for the case of known regimes.
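The known-regime variances can be evaluated numerically for a given design. The sketch below is an added illustration under stated assumptions, not the procedure used in the study: it borrows the two-variable design of the experiments in Section 4.4 (a constant and $x_2 = \exp(-x_3)$ with $x_3 \sim N(0, 1)$), replaces the probability limits by sample averages over n = 5000 draws, and reads the variances of the two variance parameters as $2\sigma_i^4$ divided by the average probability of the regime. The function names, the seed and the clipping guard are all hypothetical choices made for the example.

```python
import math

import numpy as np


def norm_cdf(z):
    """Standard normal cumulative distribution function, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function, elementwise."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def known_regime_variances(x, sigma1_sq, sigma2_sq, delta):
    """Asymptotic variances when the regime is known (sample-average version)."""
    n = x.shape[0]
    index = x @ delta
    lam = norm_cdf(index)  # switching probabilities lambda_j = F(x_j' delta)
    v_beta1 = sigma1_sq * np.linalg.inv((x * lam[:, None]).T @ x / n)
    v_beta2 = sigma2_sq * np.linalg.inv((x * (1.0 - lam)[:, None]).T @ x / n)
    v_sig1 = 2.0 * sigma1_sq**2 / lam.mean()
    v_sig2 = 2.0 * sigma2_sq**2 / (1.0 - lam).mean()
    # Probit weights phi^2 / (F (1 - F)); clipping avoids 0/0 at extreme indices.
    safe = np.clip(lam * (1.0 - lam), 1e-300, None)
    weights = norm_pdf(index) ** 2 / safe
    v_delta = np.linalg.inv((x * weights[:, None]).T @ x / n)
    return v_beta1, v_beta2, v_sig1, v_sig2, v_delta


rng = np.random.default_rng(0)  # arbitrary seed
n = 5000
x = np.column_stack([np.ones(n), np.exp(-rng.standard_normal(n))])
delta = np.array([0.0, 0.0])  # implies lambda_j = .5 for every observation
v_beta1, v_beta2, v_sig1, v_sig2, v_delta = known_regime_variances(x, 1.0, 1.0, delta)
print(v_sig1, np.diag(v_beta1), np.diag(v_delta))
```

With $\delta = (0, 0)'$ every $\lambda_j$ equals .5, so the variance for $\hat{\sigma}_1^2$ is $2/.5 = 4$ and the variance matrix for $\hat{\beta}_1$ is twice the inverse of the average of $x_j x_j'$, as in the constant-$\lambda$ case.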
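For the models where the regime is either completely unknown or is partly known, the asymptotic variances of $\sqrt{n}(\hat{\theta} - \theta)$ are derived in the same manner -- from the diagonal elements of the inverse of the information matrix. It follows that $\sqrt{n}(\hat{\theta} - \theta)$ approaches the distribution designated by the expression

$$N\left(0, \lim_{n \to \infty} \left(\frac{\mathcal{I}}{n}\right)^{-1}\right).$$

The information matrix is defined by the following expression:

$$\mathcal{I} = -E\left[\frac{\partial^2 \ln L}{\partial\theta\, \partial\theta'}\right] \quad \text{where} \quad \ln L = \sum_{j=1}^{n} \ln f_j$$

Therefore, when the regime is completely unknown, f corresponds to the density function of (3.3) given as:

$$f(y_j; \theta) = \lambda_j f_1(y_j) + (1 - \lambda_j) f_2(y_j) \qquad (4.3)$$

$\theta$ is a (3K + 2) x 1 vector of parameters, where K is the number of explanatory variables. Therefore, $\theta = (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta')'$ where $\beta_1 = (\beta_{11}, \beta_{12}, \ldots, \beta_{1K})'$, $\beta_2 = (\beta_{21}, \beta_{22}, \ldots, \beta_{2K})'$ and $\delta = (\delta_1, \delta_2, \ldots, \delta_K)'$.

When the regime is partly known, f corresponds to the density function in (4.2) given as:

$$f(y_j, w_j; \theta) = F(x_j'\delta) f_1(y_j)\left(w_j F(x_j'\gamma_1) + (1 - w_j)(1 - F(x_j'\gamma_1))\right) + (1 - F(x_j'\delta)) f_2(y_j)\left(w_j (1 - F(x_j'\gamma_0)) + (1 - w_j) F(x_j'\gamma_0)\right)$$

$\theta$ is a (5K + 2) x 1 vector of parameters given by $(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta', \gamma_1', \gamma_0')'$, where the vectors $\beta_1$, $\beta_2$, and $\delta$ have been defined as previously; and $\gamma_1 = (\gamma_{11}, \gamma_{12}, \ldots, \gamma_{1K})'$ and $\gamma_0 = (\gamma_{01}, \gamma_{02}, \ldots, \gamma_{0K})'$.

To facilitate comparison of the results here with those of the preceding chapter, we calculate the information matrix in the same manner using similar simulation techniques. The information matrix will be evaluated in the following way:

$$\mathcal{I} = -E \sum_{j=1}^{n} \left[\frac{\partial^2 \ln f_j}{\partial\theta\, \partial\theta'}\right] = E \sum_{j=1}^{n} \left[\frac{1}{f_j^2} \left(\frac{\partial f_j}{\partial\theta}\right) \left(\frac{\partial f_j}{\partial\theta'}\right)\right]$$

We therefore need to derive the first derivative expressions of f when regime is either unknown or partly known. For the case when regime classification is completely unknown (there is no sample separation information at all), the first derivatives of $f(y; \theta)$ (we omit the subscript j for simplicity) with respect to $\theta$ are:

$$\frac{\partial f}{\partial\beta_{1k}} = F(x'\delta)\, \frac{\partial f_1}{\partial\beta_{1k}}, \qquad \frac{\partial f}{\partial\beta_{2k}} = (1 - F(x'\delta))\, \frac{\partial f_2}{\partial\beta_{2k}}$$

$$\frac{\partial f}{\partial\sigma_1^2} = F(x'\delta)\, \frac{\partial f_1}{\partial\sigma_1^2}, \qquad \frac{\partial f}{\partial\sigma_2^2} = (1 - F(x'\delta))\, \frac{\partial f_2}{\partial\sigma_2^2}$$

$$\frac{\partial f}{\partial\delta_k} = (f_1 - f_2)\, \frac{\partial F(x'\delta)}{\partial\delta_k}$$

where:

$$\frac{\partial f_i}{\partial\beta_{ik}} = \frac{f_i}{\sigma_i^2} (y - x'\beta_i)\, x_k, \qquad \frac{\partial f_i}{\partial\sigma_i^2} = \frac{f_i}{2\sigma_i^2} \left[\frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1\right], \qquad \frac{\partial F(x'\delta)}{\partial\delta_k} = \phi(x'\delta)\, x_k$$

$$i = 1, 2; \quad k = 1, \ldots, K$$

where $\phi$( ) is a standard normal probability density function. Since $\theta$ is of dimension (3K + 2) x 1, $\mathcal{I}$ is of dimension (3K + 2) x (3K + 2).

For the case when regime classification is partly known due to the presence of imperfect sample separation information, the first derivatives of $f(y, w; \theta)$ with respect to $\theta$ are:

$$\frac{\partial f}{\partial\beta_{1k}} = F(x'\delta)\, Q_1\, \frac{\partial f_1}{\partial\beta_{1k}}, \qquad \frac{\partial f}{\partial\beta_{2k}} = (1 - F(x'\delta))\, Q_2\, \frac{\partial f_2}{\partial\beta_{2k}}$$

$$\frac{\partial f}{\partial\sigma_1^2} = F(x'\delta)\, Q_1\, \frac{\partial f_1}{\partial\sigma_1^2}, \qquad \frac{\partial f}{\partial\sigma_2^2} = (1 - F(x'\delta))\, Q_2\, \frac{\partial f_2}{\partial\sigma_2^2}$$

$$\frac{\partial f}{\partial\delta_k} = (f_1 Q_1 - f_2 Q_2)\, \frac{\partial F(x'\delta)}{\partial\delta_k}$$

$$\frac{\partial f}{\partial\gamma_{1k}} = F(x'\delta)\, f_1\, (w - (1 - w))\, \frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}}$$

$$\frac{\partial f}{\partial\gamma_{0k}} = -(1 - F(x'\delta))\, f_2\, (w - (1 - w))\, \frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}}$$

where:

$$Q_1 = w F(x'\gamma_1) + (1 - w)(1 - F(x'\gamma_1)), \qquad Q_2 = w (1 - F(x'\gamma_0)) + (1 - w) F(x'\gamma_0)$$

$$\frac{\partial f_i}{\partial\beta_{ik}} = \frac{f_i}{\sigma_i^2} (y - x'\beta_i)\, x_k, \qquad \frac{\partial f_i}{\partial\sigma_i^2} = \frac{f_i}{2\sigma_i^2} \left[\frac{(y - x'\beta_i)^2}{\sigma_i^2} - 1\right]$$

$$\frac{\partial F(x'\delta)}{\partial\delta_k} = \phi(x'\delta)\, x_k, \qquad \frac{\partial F(x'\gamma_s)}{\partial\gamma_{sk}} = \phi(x'\gamma_s)\, x_k$$

$$i = 1, 2; \quad k = 1, \ldots, K; \quad s = 0, 1$$

where $\phi$( ) denotes a standard normal probability density function. $\theta$ is of dimension (5K + 2) x 1, so that $\mathcal{I}$ is of dimension (5K + 2) x (5K + 2).

We follow the simulation techniques of the preceding chapter.7/ Using a sample size of n = 5000, and faced with specific parameter values, we draw observations from the switching regression model using a normal random variable generator. We evaluate the information matrix by averaging the expressions derived from the first derivative components of the density functions, when regime is either unknown or is partly known. The asymptotic variances are the corresponding diagonal elements of the inverse of the information matrix.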
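The averaging of first-derivative products can be sketched in code for the case where the regime is completely unknown. This is a minimal illustration under stated assumptions, not the program used in the study: the function names `simulate` and `info_unknown`, the seed, and the use of NumPy are choices made for the example, while the design ($x_1 = 1$, $x_2 = \exp(-x_3)$), the sample size of 5000 and the parameter values $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (0, 0)'$ follow configurations used in this chapter. The parameter order is $(\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \delta')'$, so with K = 2 the matrix is 8 x 8.

```python
import math

import numpy as np


def norm_cdf(z):
    """Standard normal cumulative distribution function, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function, elementwise."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def simulate(n, beta1, beta2, s1, s2, delta, rng):
    """Draw (x, y) from the switching regression with probit switching."""
    x = np.column_stack([np.ones(n), np.exp(-rng.standard_normal(n))])
    in_regime1 = rng.random(n) < norm_cdf(x @ delta)
    mean = np.where(in_regime1, x @ beta1, x @ beta2)
    sd = np.where(in_regime1, math.sqrt(s1), math.sqrt(s2))
    return x, mean + sd * rng.standard_normal(n)


def info_unknown(x, y, beta1, beta2, s1, s2, delta):
    """Average of (1/f^2)(df/dtheta)(df/dtheta') over the sample."""
    e1, e2 = y - x @ beta1, y - x @ beta2
    f1 = norm_pdf(e1 / math.sqrt(s1)) / math.sqrt(s1)
    f2 = norm_pdf(e2 / math.sqrt(s2)) / math.sqrt(s2)
    lam = norm_cdf(x @ delta)
    f = lam * f1 + (1.0 - lam) * f2
    grad = np.column_stack([
        (lam * f1 * e1 / s1)[:, None] * x,                   # d f / d beta1
        ((1.0 - lam) * f2 * e2 / s2)[:, None] * x,           # d f / d beta2
        lam * f1 * (e1**2 / s1 - 1.0) / (2.0 * s1),          # d f / d sigma1^2
        (1.0 - lam) * f2 * (e2**2 / s2 - 1.0) / (2.0 * s2),  # d f / d sigma2^2
        ((f1 - f2) * norm_pdf(x @ delta))[:, None] * x,      # d f / d delta
    ])
    score = grad / f[:, None]
    return score.T @ score / len(y)


rng = np.random.default_rng(12345)  # arbitrary seed
beta1, beta2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
delta = np.array([0.0, 0.0])
x, y = simulate(5000, beta1, beta2, 1.0, 1.0, delta, rng)

info = info_unknown(x, y, beta1, beta2, 1.0, 1.0, delta)
var_unknown = np.diag(np.linalg.inv(info))

# Known-regime variances of the beta1 estimates (sigma1^2 = 1).
lam = norm_cdf(x @ delta)
var_known_beta1 = np.diag(np.linalg.inv((x * lam[:, None]).T @ x / len(y)))
ratios_beta1 = var_unknown[:2] / var_known_beta1
print(ratios_beta1)
```

The printed ratios compare the unknown-regime variances of the regime 1 intercept and slope with their known-regime counterparts. Because they come from one simulated sample with an arbitrary seed, they should be read as indicative only and need not match the tabulated figures.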
4.4 The Value of Imperfect Information

We again derive here two sets of asymptotic variance ratios for each experiment -- one, with regime partly known relative to regime known; and two, with regime unknown relative to regime known. A comparison of these two ratios will show how much more efficient our estimates will be when we use partial information as compared to no information at all in determining sample separability. Understandably, the ratios in the former case will all be less than or equal to those in the latter case (they are equal when the partial information is not informative at all). All the ratios will, however, be greater than or equal to one (they are equal to one when the estimates derived are as efficient as full information estimates), and the extent to which they differ from one indicates the value of information or imperfect information, as the case may be.

7/ In the preceding chapter, we showed that the information matrix can be evaluated in two ways. We evaluated it by the second method, using first derivative components of the appropriate density functions. In our present model, we followed the same method in order to facilitate a comparison of the simulation results. However, we can also evaluate the information matrix using the second derivative components (although we did not do this), as we did in Chapter 2. For the reader's interest, the expressions are shown in Appendix B.

We maintain the use of two exogenous variables $x_1$ and $x_2$, where $x_1$ is a unit vector and $x_2$ is equal to $\exp(-x_3)$, where components of $x_3$ are distributed as N(0, 1). $x_3$, like our dependent variable y, is derived from the normal random variable generator. The sample size is set at n = 5000. We retain the experimental conditions of the preceding chapter, in order to make comparisons later with the resulting ratios.

We conduct three sets of experiments.
In the first set, for given $\sigma_1^2$, $\sigma_2^2$, and $\gamma$ parameters which denote some information, we vary our $\beta$ parameters in order to make sample separation more distinct. We do this twice -- first, using $\delta$ values which imply constant switching probabilities (i.e. $\delta_2 = 0$ but estimated) and second, using $\delta$ values which indicate non-constant switching probabilities (i.e. $\delta_2 \neq 0$ and also estimated). In the second set of experiments, for given $\sigma_1^2$, $\sigma_2^2$, and $\beta$ parameters which denote a distinct sample distribution, we vary our $\gamma$ parameters to show different levels of information observability. Again, we do this twice -- where our chosen $\delta$ values imply both constant and non-constant switching probabilities. In the last set of experiments, we vary our $\delta$ parameters given fixed $\beta$, $\sigma_1^2$, $\sigma_2^2$, and $\gamma$ values. The purpose of this last set of experiments is to find out the effects of different $\delta$ values (all of which imply non-constant switching probabilities) on parameter estimation efficiencies.

For the first experiment, we vary the $\beta$ values given $\sigma_1^2 = \sigma_2^2 = 1$, and $\gamma = (1, -1, 1, 1)'$, which is informative. Since we had earlier established that the intercept term is more important in determining sample separability than the slope term, we vary our intercept term $\beta_{21}$, holding the other intercept term fixed; therefore, we have $\beta = (0, 0, \beta_{21}, 0)'$, where the two distributions are made increasingly distinct from each other as $\beta_{21}$ increases. We choose $\delta = (0, 0)'$, which essentially implies constant switching probabilities of .5, even if $\lambda$ is modelled as a probit. This particular choice of $\delta$ values enables us to test our model with non-constant switching probabilities, $\lambda = F(x'\delta) = .5$ where $\delta = (0, 0)'$ but estimated, against the alternative of constant $\lambda$ (implicitly, $\lambda = F(x'\delta) = .5$, where $\delta = (0, 0)'$ but not estimated), which is actually a special case of our present specification.
The results are presented in Table 12, and the ratios here can be compared with the ratios in Table 7 of the previous chapter, where, for fixed $\lambda$, we perform the same experiments. We call this Case 1.

$\gamma$ has some information, i.e. $\gamma_1 \neq -\gamma_0$, so that the variance ratios when regime is partly known are less than or equal to the variance ratios when regime is unknown. They become more equal as the two regimes become distinctly separate, meaning that there is little value in obtaining more information on sample separation when the two distributions are clearly far apart.

Compared to fixed $\lambda$, where $\delta$ is not estimated (as in Table 7), the ratios we derive now are slightly larger (particularly for $\beta_{12}$, $\beta_{22}$, and [illegible]) probably due to the fact that more parameters are estimated here, or maybe simply due to randomness. But as the regimes become clearly separate, i.e. $\beta = (0, 0, 4, 0)'$, the ratios now are almost equal to those derived when $\lambda$ was fixed.

The $\delta_1$ and $\delta_2$ variance ratios are higher than the $\lambda$ ratios even when both imply that $\delta = 0$. The reason behind this is that parameter values for $\delta$ now have to be estimated, thereby introducing more randomness in the process, compared to the case when $\delta = 0$, but not estimated.

When regimes are quite close to each other, i.e. $\beta = (0, 0, 1, 0)'$, the $\hat\beta_{12}$ and $\hat\beta_{22}$ variance ratios are larger than the $\hat\beta_{11}$ and $\hat\beta_{21}$ variance ratios, a pattern very unlike that when $\lambda$ was fixed. However, this observation only holds when regime is partly known. When regimes are completely unknown, the $\beta_{12}$ and $\beta_{22}$ ratios are much smaller than the $\beta_{11}$ and $\beta_{21}$ ratios, a pattern evident when $\lambda$ was fixed. The same pattern holds, but on a smaller scale, when $\beta = (0, 0, 2, 0)'$. This implies that when regimes are very close, and there is partly known information on regime classification, then there is a larger value of sample separation information when $\lambda$ is not fixed, as compared to when $\lambda$ is fixed.

Table 12. Ratios of Asymptotic Variances, Varying $\beta_{21}$, When $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (0, 0)'$, $\gamma = (1, -1, 1, 1)'$
[Table body illegible in the scanned source.]
Note: n = 5000. Figures in parentheses are the ratios of parameter variances when regime is unknown relative to when regime is known. This is true for the other tables in the chapter.

We now repeat the previous experiment, this time choosing non-zero $\delta$ parameters -- call this Case 2. The results are presented in Table 13. We set $\delta = (1, -1)'$ so that the implied probability for each observation is no longer constant at .5. The mean value of the (different) $\lambda$'s is .438. This implies that there is a slightly larger probability that an observation is generated from regime 2 rather than regime 1. Consistent with the findings of past experiments, the ratios which reflect the value of sample separation information are larger for the regime which is sampled with the lower probability. Since regime 1 is sampled with the lower probability on the average, the variance ratios of $\beta_{11}$, $\beta_{12}$, and $\sigma_1^2$ are all larger than the corresponding estimated parameter ratios of regime 2.

The only difference between Case 1 and Case 2 is in the value of the $\delta$ parameters.
In Case 1, the choice of the $\delta$ values assures that for each observation, $\lambda = .5$; in Case 2, the choice of the $\delta$ values assures that for each observation, $\lambda$ assumes different values, depending on the magnitude of the independent variables $x$ that determine the value of $\lambda$, i.e. $\lambda = F(x'\delta)$.

Table 13. Ratios of Asymptotic Variances, Varying $\beta_{21}$, When $\beta = (0, 0, \beta_{21}, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (1, -1)'$, $\gamma = (1, -1, 1, 1)'$
[Table body illegible in the scanned source.]

A comparison of the ratios between Case 1 and Case 2 shows that, in the latter, the value of sample separation information varies less. That is, when the two regimes are fairly close and the value of sample separation information is important (the ratios are high) in Case 1, the value of sample separation information is less important (the ratios are lower) in Case 2. On the other hand, when the regimes become farther apart, the value of sample separation information in Case 1 becomes lower, and the ratios of asymptotic variances tend to approach one, as they should. For Case 2, the decline in the ratios is slower, so that ratios in Case 2 are higher than those obtained in Case 1 when regimes are distinctly separate.

To illustrate, take the ratios for $\beta_{11}$. In Case 1, they range from 17.5 to 2.0 (as the distributions become farther apart) when the regimes are partly known, and from [figure illegible] to 2.3 when the regimes are unknown. In Case 2, they range from 14.8 to 3.5 when the regimes are partly known, and from 25.6 to 3.7 when the regimes are unknown. The same pattern holds for all the other variance ratios of the estimated parameters.
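The difference between the two cases can be made concrete with a small simulation. The following is only a minimal sketch, assuming the regressor design described in this section ($x_1 = 1$, $x_2 = \exp(-x_3)$, $x_3 \sim N(0,1)$, $n = 5000$); the function names, the symbol $\delta$ for the switching probit parameters, and the random seed are illustrative assumptions, not taken from the original study. It evaluates $\lambda = F(x'\delta)$ for $\delta = (0, 0)'$ and $\delta = (1, -1)'$; the second average should fall near the reported .438, up to sampling noise.

```python
import math
import random


def probit(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def switching_probabilities(delta, n=5000, seed=0):
    """Return lambda_j = F(x_j' delta) for n simulated observations.

    Assumed design: x1 = 1 (intercept), x2 = exp(-x3), x3 ~ N(0, 1).
    """
    rng = random.Random(seed)
    lambdas = []
    for _ in range(n):
        x3 = rng.gauss(0.0, 1.0)
        x1, x2 = 1.0, math.exp(-x3)
        lambdas.append(probit(delta[0] * x1 + delta[1] * x2))
    return lambdas


case1 = switching_probabilities((0.0, 0.0))    # Case 1: constant lambda
case2 = switching_probabilities((1.0, -1.0))   # Case 2: lambda varies with x

mean_case1 = sum(case1) / len(case1)
mean_case2 = sum(case2) / len(case2)
print("Case 1 mean lambda:", mean_case1)
print("Case 2 mean lambda:", round(mean_case2, 3))
print("Case 2 range:", round(min(case2), 3), "to", round(max(case2), 3))
```

In Case 1 every observation has exactly the same switching probability, so estimating $\delta$ adds parameters without adding variation; in Case 2 the probability differs across observations, which is the feature the comparison above turns on.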
There does seem to be an advantage in postulating that the switching probabilities be non-constant rather than constant (even if the $\delta$ parameters have to be estimated in both cases), so that the probability that an observation is generated by a particular distribution depends on the values of the exogenous variables. However, this advantage only holds when $\delta_2 \neq 0$. 8/ This is supported by the observation that the efficiency of the estimates does not suffer as much (variance ratios are lower) when $\lambda$ is non-constant ($\delta \neq 0$), as compared to when $\lambda$ is constant ($\delta = 0$), when the regimes are very close to each other and are hardly distinct, i.e. $\beta = (0, 0, 1, 0)'$. It is evident when regime is either partly known or unknown. As the regimes become separate, the decline in the ratios is quite slow, so that the variance ratios are actually lower when $\lambda$ is constant.

8/ $\delta_2 \neq 0$ basically implies that $\lambda = F(x'\delta)$ is non-constant for all observations, while $\delta_2 = 0$ implies that $\lambda = F(x'\delta)$ is constant, since the effects of the variable $x$ are wiped out and are not reflected in the resulting values of $\lambda$. In our experiments, we adopted the special case of $\lambda = F(x'\delta) = .5$, where $\delta = (0, 0)'$ but estimated.

Among all the ratios, the highest values belong to the estimated $\delta$ parameters, just as in Case 1. This means that among all the parameters to be estimated, the largest efficiency losses originate from the parameters that determine the switching probabilities. This is fairly intuitive, since the efficiency of the estimates for the parameters in the two regimes is affected by the initial probability of switching regimes or of correctly matching the observations with the proper regimes; therefore, the greater burden of efficiency losses corresponds to the $\delta$ parameters, which enter the switching probability probit function. These remarks are applicable only when regimes are difficult to distinguish from one another.
When the regimes are sufficiently apart, then the ratios of the $\delta$ parameters are comparable in magnitude to the other ratios. Consistent with the observation in Case 1, the decline in the variance ratios is monotonic for all estimated parameters as the distributions become far apart.

In our second set of experiments, we fix $\sigma_1^2 = \sigma_2^2 = 1$ and also choose a particular sample mix, i.e. $\beta = (0, 0, 2, 0)'$. We vary our $\gamma$ parameters to reflect different observability levels. We do this set of experiments twice -- Case 1, where $\delta = (0, 0)'$, and Case 2, where $\delta = (1, -1)'$. The results are presented in Tables 14 and 15, respectively.

Let us start with Case 1. In the first experiment, $\gamma_1 = -\gamma_0$, that is, the $\gamma$ parameters imply that no information is provided at all, and the ratios derived here are very similar to those derived when $\lambda$ is fixed for all observations, but $\delta = 0$ is not estimated (as seen in Table 8 of the previous chapter). Ratios when regime is partly known are exactly equal to those derived when regime is unknown. The only difference between the ratios derived here and those derived when $\delta = 0$ but not estimated is that the $\beta_{12}$, $\beta_{22}$, and [illegible] ratios are much higher when the regimes are close together, i.e. $\beta = (0, 0, 2, 0)'$.

When information is now introduced into the $\gamma$ parameters ($\gamma_1 \neq -\gamma_0$), as in $\gamma = (1, -1, 1, 1)'$ and $\gamma = (1, 1, -1, 1)'$, then the ratios when regime is partly known are less than the ratios when regime is completely unknown. That is, the presence of sample separation information presents efficiency gains, as reflected in the decline of the ratios as compared to when there is no information at all.

Table 14. Ratios of Asymptotic Variances, Varying $\gamma$, When $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (0, 0)'$
[Table body illegible in the scanned source.]
The results here can be compared to the ratios in Table 8 and Table 9 for the same parameter values of $\sigma_1^2$, $\sigma_2^2$, $\beta$, $\gamma$, and $\lambda = .5$ where $\delta = 0$ but not estimated.

$\gamma = (1, -1, 1, 1)'$ presents a wider divergence in regime classification probabilities $p_{11}$ and $p_{00}$, as compared to $\gamma = (1, 1, -1, 1)'$. That is why ratios are lower, or efficiency gains are higher, when $p_{11}$ is close to $p_{00}$ as in $\gamma = (1, 1, -1, 1)'$.

The observation of the previous experiment also applies here. That is, the ratios derived when $\lambda$ is fixed and $\delta = 0$ but not estimated are close to, but slightly less than, the ratios derived here where $\lambda$ is also fixed and $\delta = 0$, but estimated. Again, the difference may be due to randomness or to the fact that more parameters have to be estimated this time.

We now explore Case 2, which is shown in Table 15. The $\delta$ values are set differently, where $\delta = (1, -1)'$ so that the switching probabilities vary for all observations. This results in an average value of $\lambda = .438$, meaning that there is a slightly larger probability on the average that an observation is generated from regime 2 rather than regime 1. Consequently, the ratios are larger for the estimated parameters of the regime which is observed with the lower probability.

Table 15. Ratios of Asymptotic Variances, Varying $\gamma$, When $\beta = (0, 0, 2, 0)'$, $\sigma_1^2 = \sigma_2^2 = 1$, $\delta = (1, -1)'$
[Table body illegible in the scanned source.]

The first experiment illustrates a non-informative case, where $\gamma = (1, -1, -1, 1)'$. Therefore, ratios when regime is partly known are equal to the ratios when regime is completely unknown. The next two experiments provide informative $\gamma$ choices, which essentially duplicate those in Case 1, so that the variance ratios decline when information is not denied from the model.
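Why $\gamma_1 = -\gamma_0$ is the non-informative case can be checked directly. The sketch below is illustrative only: it assumes that $\gamma$ is stacked as $(\gamma_1', \gamma_0')'$, that $p_{11} = F(x'\gamma_1)$ and $p_{00} = F(x'\gamma_0)$, and that the regressors follow the design of this chapter; the function names and the seed are hypothetical. The indicator carries no information when the chance of being classified into regime 1 is the same under either true regime, that is, when $p_{11} = 1 - p_{00}$ for every observation.

```python
import math
import random


def probit(z):
    """Standard normal cumulative distribution function F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def indicator_gap(gamma, n=5000, seed=0):
    """Largest |P(w=1 | regime 1) - P(w=1 | regime 2)| over simulated x.

    gamma is assumed to be stacked as (gamma_1', gamma_0')', with
    p11 = F(x'gamma_1) and p00 = F(x'gamma_0). A gap of zero means the
    imperfect indicator w does not depend on the true regime.
    """
    g1, g0 = gamma[:2], gamma[2:]
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(n):
        x1, x2 = 1.0, math.exp(-rng.gauss(0.0, 1.0))
        p11 = probit(g1[0] * x1 + g1[1] * x2)   # P(w=1 | regime 1)
        p00 = probit(g0[0] * x1 + g0[1] * x2)   # P(w=0 | regime 2)
        gap = max(gap, abs(p11 - (1.0 - p00)))
    return gap


gap_noninformative = indicator_gap((1.0, -1.0, -1.0, 1.0))  # gamma_1 = -gamma_0
gap_informative_a = indicator_gap((1.0, -1.0, 1.0, 1.0))
gap_informative_b = indicator_gap((1.0, 1.0, -1.0, 1.0))
print(gap_noninformative, gap_informative_a, gap_informative_b)
```

With $\gamma = (1, -1, -1, 1)'$ the gap is zero up to rounding, which is consistent with the partly-known and unknown ratios coinciding in the first row of each table; the other two choices of $\gamma$ give a clearly positive gap.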
When the $\delta$ values ensure that the switching probabilities are non-constant for all observations, the ratios in Case 2 vary less than those in Case 1. Even when the $\gamma$ parameters are non-informative, variance ratios in Case 2 are lower than the corresponding variance ratios of Case 1. This reinforces our earlier findings in the first set of experiments that there are efficiency advantages when we postulate that the switching probabilities be modelled as non-constant ($\delta_2 \neq 0$). However, as information is provided on sample separation, the decline in the variance ratios is very slow or is quite minimal in Case 2. To illustrate this point, consider the $\beta_{11}$ ratios -- in Case 1, the decline in the values ranges from 52.4 to 4.6 when information is provided, while in Case 2, the decline in the values ranges from 7.7 to 6.8 when the same $\gamma$ information is provided. A similar pattern is evident for the ratios of the other parameters. Therefore, the advantages of improved efficiency associated with non-constant switching probabilities seem to occur only within that range of parameter values where information is very valuable in determining sample separability -- in this instance, when the $\gamma$ parameters are non-informative.

Since $\gamma = (1, 1, -1, 1)'$ provides less divergence in the $p_{11}$ and $p_{00}$ regime classification probabilities as compared to $\gamma = (1, -1, 1, 1)'$, we would expect that the ratios in the latter should be consistently higher than the ratios in the former. However, this does not hold, particularly in the case of the $\beta_{11}$, $\beta_{12}$, and $\sigma_1^2$ variance ratios, where the decline in the ratios is not monotonic as we vary the $\gamma$ values from the least informative to the more informative.

Another effect of the non-constant $\lambda$ values is seen in the fact that among all the derived ratios of asymptotic variances, it is the $\delta$ ratios which are always the highest.
This implies that as information is provided on regime classification, efficiency losses associated with the $\delta$ parameters remain quite substantial when $\lambda$ is not constant for all observations. When $\lambda$ is constant for all observations, but the $\delta$ parameters still have to be estimated (as in Case 1), then the ratios are much lower (when information is provided to the model) and the decline in the values of the asymptotic variance ratios is monotonic as more information is provided on sample separation.

The last set of experiments we conduct involves varying the values assumed by the $\delta$ parameters given fixed values for $\sigma_1^2$, $\sigma_2^2$, $\beta$, and $\gamma$. The results are presented in Table 16. For these experiments, $\gamma_1 \neq -\gamma_0$, so we have informative cases. Ratios when information is partly available are less than ratios derived when there is no information available at all.

Table 16. Ratios of Asymptotic Variances, Varying $\delta$, When $\beta = (0, 0, 2, 0)'$, $\gamma = (1, -1, 1, 1)'$, $\sigma_1^2 = \sigma_2^2 = 1$ (with mean values of $\lambda$ and $1 - \lambda$)
[Table body illegible in the scanned source.]

The different $\delta$ values suggest different average values for $\lambda$ and $(1 - \lambda)$. The resulting variance ratios are consistent with the expectation that the ratios associated with the parameters of the regime observed with the lower probability assume higher values. Therefore, as the average value of $\lambda$ goes up, the ratios associated with regime 1, i.e. $\beta_{11}$, $\beta_{12}$, and $\sigma_1^2$, all go down.

Regarding the ratios corresponding to the $\delta$
parameters, the lowest ratios occur when $\lambda = 1 - \lambda = .5$; this means that the efficiency of the estimates of the parameters of the $\lambda$ model is highest when there are equal probabilities for an observation to be generated by either regime. As the switching probabilities increase for any one regime, i.e. as the $\delta$ parameter values increase absolutely, the $\delta$ ratios also increase monotonically, implying that the efficiency of the estimates declines substantially when the switching probabilities become biased in favor of any one regime.

When $\delta = (1, -1)'$ and $\delta^* = (-1, 1)'$, then $\delta^* = -\delta$, so that $F(x'\delta) = 1 - F(-x'\delta) = 1 - F(x'\delta^*)$. Therefore, $\lambda = 1 - \lambda^*$. This transformation is similar to simply interchanging the names of the regimes. Consequently, the $\beta_1$ ratios for $\lambda$ are simply equal to the $\beta_2$ ratios for $1 - \lambda^*$; the $\sigma_1^2$ ratios are similar to the $\sigma_2^2$ ratios when $-\delta$ is equal to $\delta^*$; and so on. Any difference in the values of the variance ratios may be attributed to randomness, and to differences in the information provided by the $\gamma$ parameters to both regimes.

4.5 Summary

This chapter has focused on the possibility of modelling the switching probabilities as probit functions of the exogenous variables in a switching regression model. It has, however, retained the other features of the preceding chapter -- two exogenous variables, and modelling the regime classification probabilities given the true regime also as probit functions of the explanatory variables. In addition, all the parameters have to be estimated. This expanded model is aimed at improving on the previous specification, since using all the available observations on the dependent and independent variables may increase the chances of correct switching between regimes. It also serves as a better indication of the model's ability to classify observations based on the values of the exogenous variables.

Different types of experiments were conducted here.
In the first two sets -- vary $\beta$ given $\gamma$, and vary $\gamma$ given $\beta$ -- we apply both constant ($\delta = 0$) and non-constant ($\delta \neq 0$) switching probabilities, where the $\delta$ parameters are estimated in both instances. When $\delta = 0$, we have the special case of our former model with fixed $\lambda$ (implicitly, $\delta = 0$ but not estimated) and the ratios we derived previously can be compared with our present results. When $\delta \neq 0$, we can evaluate the merits of our probit model when the resulting switching probabilities are either constant ($\delta = 0$) or non-constant ($\delta \neq 0$). In the last set of experiments, we vary our $\delta$ parameters, all $\delta \neq 0$, to find out the effects of such an action on the resulting parameter efficiencies.

We come up with the following important findings. First, there are advantages when the switching probabilities are modelled as non-constant ($\delta_2 \neq 0$) as compared to constant switching probabilities ($\delta_2 = 0$ but still estimated). These advantages are in terms of greatly improved efficiency of the estimates of the parameters. However, these gains only occur in instances where information is most valuable -- when samples are hardly distinct from each other, and when the information provided by the $\gamma$ parameters is not informative at all. Under these circumstances, we get smaller variance ratios when the switching probabilities are not fixed for all observations.

Second, since there are more parameters to estimate in this model, a lot of randomness and variability is introduced. This may account for the fact that the ratios we derive here are slightly larger than those derived when $\lambda$ was fixed ($\delta = 0$ but not estimated). In addition, the slope variance ratios in the regression model are now larger than the intercept variance ratios in instances when the value of information is most important (as mentioned above) and for the sample which is observed with the lower probability on the average. This was not evident at all when we had a constant mixing parameter $\lambda$.
Third, when we vary the $\delta$ parameters to yield various average levels of switching probabilities, the variance ratios of the estimated parameters which correspond to the sample observed with the lower average probability are generally higher. The $\hat\delta$ variance ratios also increase as the probability of observing a particular regime diverges from .5.

Last, the value of imperfect sample separation information is still largely dependent on the natural separation of the two samples. Variance ratios are higher when the samples are more difficult to distinguish from each other, and they are lower when samples are far apart. Also, the use of imperfect information improves parameter estimates as compared to when no information is used at all. Naturally, the more reliable the imperfect information (as evident from the $\gamma$ parameters), the better our estimates will be.

CHAPTER FIVE

CONCLUSIONS

We set out in this study with the purpose of assessing the value or importance of imperfect sample separation information in a switching regression model, where all the parameters have to be estimated, so as not to understate the true value of such information. We accomplished this by evaluating information matrices using simulation experiments over a large sample size (i.e. 100000 and 5000) in order to derive the asymptotic variances of the estimated parameters when regime is either unknown (no available sample separation information) or partly known (the available information is imperfect). These asymptotic variances are simply the corresponding diagonal elements of the inverse of the information matrix. We then solved for asymptotic variance ratios when regime is either partly known or completely unknown, relative to when regime is completely known (full sample separation information). A comparison of these two sets of ratios shows the advantage of using imperfect regime classification information relative to no information at all.
All these ratios are greater than or equal to one, and the extent to which they differ from one measures the value of information, or imperfect information, as the case may be. The higher these variance ratios, the greater is the value of regime classification information. On the other hand, variance ratios which approach the lower bound of 1.0 imply that information is not very valuable to the model.

In the past three chapters, where we evaluated the value of imperfect sample separation information, we made variations on the basic switching regression model by postulating different assumptions about the parameter values. In Chapter 2, we examined a normal mixture model with imperfect regime classification information, where the probabilities of correct regime classification (given actual regime classification) are constant over observations. This is a straightforward extension of Schmidt's work to the Lee and Porter model with constant regime classification probabilities, $p_{11}$ and $p_{00}$. Our experiments consisted of varying the regime classification probabilities, the difference between the means of the two samples, the difference between the variances of the two samples, and the mixing parameter -- each time holding the other parameter values fixed.

In Chapter 3, we added another explanatory variable into our switching regression model and further assumed that the presumed regime classification probabilities ($p_{11}$ and $p_{00}$) are non-constant over observations, but are, in fact, probit functions of the exogenous variables. This extension is aimed at improving the flexibility of the model and is plausible since it is highly likely that the imperfect regime classification probabilities vary from one observation to another, and that their values are affected by the exogenous variables.
Our experiments consisted of varying the probit parameters of the regime classification probabilities for a particular sample mix, and varying the sample mixes for a particular set of imperfect indicators -- each time holding the other parameter values constant.

In Chapter 4, we maintained the features of Chapter 3 but added another assumption, namely, that the switching probabilities, formerly assumed to be a constant mixing parameter for all observations, are now non-constant and can be modelled as a probit function of the explanatory variables. This extension is aimed at providing the model with a better ability to classify observations into the two regimes, by using as much information as possible at each observation. Therefore, actual regime classification probabilities as well as imperfect regime classification probabilities are modelled here as probit functions of the explanatory variables. There are three sets of experiments here: varying the probit parameters of the imperfect regime classification probabilities for a particular sample mix, and varying the sample mixes for a particular set of imperfect regime classification probabilities, each time using non-constant switching probabilities; and varying the parameters in the switching probabilities probit model given fixed values of the other parameters.

We have discussed the results of our experiments in detail already, so here we will discuss only a few of the more important findings. First, there are advantages in terms of efficiency gains when using imperfect sample separation information, as compared to no information at all. These efficiency gains can be substantial in some cases. This is especially so when the two samples are not very distinct, so that there is not much sample separation information in the sample itself.

Our second important finding follows from the first one.
There are two cases in which imperfect sample separation information does not improve efficiency of estimation:

(1) The imperfect sample separation information is not informative. This occurs when the probability of a particular observed regime classification does not depend on the true regime classification; that is, when $p_{11} = 1 - p_{00}$. In terms of the model of Chapter 3, where these probabilities are modelled as probit functions, this occurs when $\gamma_1 = -\gamma_0$.

(2) The samples are very distinct. The two distributions are sufficiently far apart so that there is a very small probability of misclassification for any observation. Therefore, there is hardly any need for information (imperfect or otherwise) in determining sample separability. This occurs when the means of the two distributions in the sample are clearly separate ($\mu_1$ distinct from $\mu_2$; $\beta_1$ distinct from $\beta_2$).

Our third conclusion again follows from the first. The value of imperfect sample separation information is highest, or the gains in efficiency in using unreliable information are greatest, under the following circumstances:

(1) The imperfect sample separation information is highly informative. In the extreme case, $p_{11} = p_{00} = 1$ so that the imperfect indicators are perfect indicators, and the regime is fully identifiable based on the available information. The more reliable the imperfect indicators, the more efficient the estimates are. This occurs when $p_{11} = p_{00}$ or $p_{11}$ is near $p_{00}$ in the extreme ranges of probability, where there is great certainty and confidence that both regime classifications are right.

(2) The samples are not very distinct. It is here where information (even if imperfect) is most helpful in determining sample separability and improving the efficiency of the estimates.
This agrees with the findings of previous studies (Kiefer, 1979; Schmidt, 1981; Lee and Porter, 1984) that the value of sample separation information is largely dependent on the natural separation of the two samples. The closer the distributions in the sample ($\mu_1$ close to $\mu_2$; $\beta_1$ close to $\beta_2$) and the closer the variances are, the more important is information in assigning regime membership.

Fourth, it is the intercept term rather than the slope term in a switching regression model which mostly determines sample separability. It is more difficult to distinguish one sample from the other when the intercepts are close together rather than when the slopes are. Therefore, the efficiency losses in using no information or using partial information are far greater for the parameter estimates when the intercepts are hardly distinct from each other as compared to when the slopes are hardly distinct from one another.

Fifth, the value of sample separation information is highest for the estimates corresponding to the mixing parameter or to the parameters of the switching probabilities.

Sixth, there are definite efficiency gains when we model our switching probabilities as non-constant probit functions of the explanatory variables. These gains occur in circumstances where information is most valuable; that is, when samples are hardly distinct from each other and when the imperfect regime classification information is not very informative.

Seventh, as we continually expand on our basic switching regression model, we find that regime classification information becomes more valuable. The value of sample separation information is more important for complicated models, as Kiefer (1979) suggested. This is due to the fact that as we try to estimate more parameters, more variability is introduced to the estimates, which is naturally reflected in larger variances.
This notion of more variability in the model is also evident in other situations -- when the $\gamma$ parameters are not very informative, when samples are difficult to disentangle, and when a particular regime is observed with a lower probability.

Eighth, in accordance with the findings of Schmidt (1981), the value of information, imperfect or otherwise, is higher for the regime which is observed with the lower probability.

In light of these findings, a final word is warranted. Sample separation information, even if imperfect or unreliable, can be used to improve the efficiency of parameter estimates in switching regression models. Its use is most valuable when the samples are hard to disentangle from each other, and when the imperfect information is informative and fairly reliable. Under these conditions, it may also be advisable to model the switching probabilities as non-constant, since this action can further increase the efficiency of the estimates, particularly when the samples are difficult to distinguish from one another. Presumed regime classification probabilities given the actual regimes may also be modelled as non-constant to further improve the model's flexibility. However, when the imperfect information is highly unreliable or when the samples are clearly separate, there is little point in using imperfect information, since only small efficiency gains are possible. In addition, one must consider the trade-off implied when adding more parameters to the model (like imperfect regime indicator functions with probit parameters), since such an action gives the model more variability and tends to increase the variance estimates. Therefore, gains achieved by improving the model's plausibility may be lost or at least partially offset by introducing more variability into the model when additional parameters have to be estimated.
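The ratio construction that underlies all of these findings can be illustrated numerically. What follows is a minimal sketch under simplifying assumptions that are not those of the Chapter 4 experiments: each regime has only an intercept, and $\lambda$, $p_{11}$ and $p_{00}$ are constants (closer in spirit to the Chapter 2 model). The scores are obtained by central finite differences rather than from the analytic first derivatives, and every name, parameter value and the seed are hypothetical. The information matrix is approximated by the average outer product of the scores, the asymptotic variances are the diagonal of its inverse, and each variance is divided by its full-information counterpart.

```python
import numpy as np


def log_normal_pdf(y, mu, s2):
    """Log density of N(mu, s2) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * s2) - (y - mu) ** 2 / (2.0 * s2)


def logf_known(t, y, d, w):
    """Regime known: d = 1 if the observation comes from regime 1."""
    mu1, mu2, s1, s2, lam = t
    return (d * (np.log(lam) + log_normal_pdf(y, mu1, s1))
            + (1 - d) * (np.log(1 - lam) + log_normal_pdf(y, mu2, s2)))


def logf_unknown(t, y, d, w):
    """Regime unknown: plain two-component normal mixture."""
    mu1, mu2, s1, s2, lam = t
    return np.log(lam * np.exp(log_normal_pdf(y, mu1, s1))
                  + (1 - lam) * np.exp(log_normal_pdf(y, mu2, s2)))


def logf_partly(t, y, d, w):
    """Regime partly known: imperfect indicator w with constant p11, p00."""
    mu1, mu2, s1, s2, lam, p11, p00 = t
    q1 = w * p11 + (1 - w) * (1 - p11)
    q2 = w * (1 - p00) + (1 - w) * p00
    return np.log(lam * q1 * np.exp(log_normal_pdf(y, mu1, s1))
                  + (1 - lam) * q2 * np.exp(log_normal_pdf(y, mu2, s2)))


def asymptotic_variances(logf, theta, y, d, w, h=1e-5):
    """Diagonal of the inverse of the simulated information matrix."""
    theta = np.asarray(theta, dtype=float)
    scores = np.empty((y.size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        # Central-difference approximation to the score for parameter k.
        scores[:, k] = (logf(theta + step, y, d, w)
                        - logf(theta - step, y, d, w)) / (2.0 * h)
    info = scores.T @ scores / y.size   # first-derivative (outer product) form
    return np.diag(np.linalg.inv(info))


rng = np.random.default_rng(0)
n = 20000
mu1, mu2, s1, s2, lam, p11, p00 = 0.0, 2.0, 1.0, 1.0, 0.5, 0.8, 0.8
d = (rng.random(n) < lam).astype(float)            # true regime indicator
e = rng.standard_normal(n)
y = np.where(d == 1.0, mu1 + np.sqrt(s1) * e, mu2 + np.sqrt(s2) * e)
u = rng.random(n)
w = np.where(d == 1.0, u < p11, u >= p00).astype(float)  # imperfect indicator

theta5 = [mu1, mu2, s1, s2, lam]
theta7 = theta5 + [p11, p00]
v_known = asymptotic_variances(logf_known, theta5, y, d, w)
v_unknown = asymptotic_variances(logf_unknown, theta5, y, d, w)
v_partly = asymptotic_variances(logf_partly, theta7, y, d, w)[:5]

ratio_unknown = v_unknown / v_known
ratio_partly = v_partly / v_known
print("unknown / known:      ", np.round(ratio_unknown, 2))
print("partly known / known: ", np.round(ratio_partly, 2))
```

Both sets of ratios should exceed one, with the partly-known ratios below the unknown-regime ratios; the gap between the two rows is the value of the imperfect indicator in this simplified setting. Raising `mu2` (more distinct samples) or moving `p11` and `p00` toward .5 (a less informative indicator) should shrink that gap, in line with the conclusions above.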
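APPENDICES

APPENDIX A

THE SECOND DERIVATIVE COMPONENTS OF THE INFORMATION MATRIX IN THE CASE OF NON-CONSTANT CLASSIFICATION PROBABILITIES

The density function (we drop the subscript j for simplicity) when the regime is unknown is:

\[ f(y; \theta) = \lambda f_1(y) + (1 - \lambda) f_2(y) \]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda)' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y - x'\beta_i)^2}{2\sigma_i^2} \right], \qquad i = 1, 2
\end{aligned}
\]

The information matrix is given by:

\[ \mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right] \]

where:

\[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left( \frac{\partial f_j}{\partial\theta} \right) \left( \frac{\partial f_j}{\partial\theta} \right)' \]

The first derivatives of f with respect to θ are given in the text of Chapter 3. The non-zero second derivatives of f with respect to θ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= \lambda\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= \lambda\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\lambda} &= \frac{\partial f_1}{\partial\beta_{1k}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1 - \lambda)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1 - \lambda)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\lambda} &= -\frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= \lambda\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\lambda} &= \frac{\partial f_1}{\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1 - \lambda)\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\lambda} &= -\frac{\partial f_2}{\partial\sigma_2^2}
\end{aligned}
\]

where:

\[
\begin{aligned}
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}} &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m) + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2} &= (y - x'\beta_i)\,x_k \left[ \frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2} - \frac{f_i}{\sigma_i^4} \right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2} \right] + f_i \left[ \frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6} \right]
\end{aligned}
\]

i = 1, 2; k, m = 1, 2, ..., K

The joint density function (we omit the subscript j for simplicity) when the regime is partly known is:

\[ f(y, w; \theta) = \lambda f_1(y)\,\bigl( w\,p_{11} + (1 - w)(1 - p_{11}) \bigr) + (1 - \lambda) f_2(y)\,\bigl( w\,(1 - p_{00}) + (1 - w)\,p_{00} \bigr) \]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \lambda, \gamma_1', \gamma_0')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\gamma_s &= (\gamma_{s1}, \gamma_{s2}, \ldots, \gamma_{sK})' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y - x'\beta_i)^2}{2\sigma_i^2} \right] \\
p_{11} &= F(x'\gamma_1) = \int_{-\infty}^{x'\gamma_1} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv \\
p_{00} &= F(x'\gamma_0) = \int_{-\infty}^{x'\gamma_0} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv
\end{aligned}
\]

i = 1, 2; s = 0, 1

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[ \mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right] \]

where:

\[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left( \frac{\partial f_j}{\partial\theta} \right) \left( \frac{\partial f_j}{\partial\theta} \right)' \]

The first derivatives of f with respect to θ are given in the text of Chapter 3. The non-zero second derivatives of f with respect to θ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= \lambda\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= \lambda\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\lambda} &= Q_1\,\frac{\partial f_1}{\partial\beta_{1k}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\gamma_{1m}} &= \lambda\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= (1 - \lambda)\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= (1 - \lambda)\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\lambda} &= -Q_2\,\frac{\partial f_2}{\partial\beta_{2k}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\gamma_{0m}} &= -(1 - \lambda)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= \lambda\,Q_1\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\lambda} &= Q_1\,\frac{\partial f_1}{\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\gamma_{1k}} &= \lambda\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= (1 - \lambda)\,Q_2\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\lambda} &= -Q_2\,\frac{\partial f_2}{\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\gamma_{0k}} &= -(1 - \lambda)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\lambda\,\partial\gamma_{1k}} &= f_1\,\bigl( w - (1 - w) \bigr)\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial\lambda\,\partial\gamma_{0k}} &= f_2\,\bigl( w - (1 - w) \bigr)\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\gamma_{1k}\,\partial\gamma_{1m}} &= \lambda\,f_1\,\bigl( w - (1 - w) \bigr)\,\frac{\partial^2 F(x'\gamma_1)}{\partial\gamma_{1k}\,\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\gamma_{0k}\,\partial\gamma_{0m}} &= -(1 - \lambda)\,f_2\,\bigl( w - (1 - w) \bigr)\,\frac{\partial^2 F(x'\gamma_0)}{\partial\gamma_{0k}\,\partial\gamma_{0m}}
\end{aligned}
\]

where:

\[
\begin{aligned}
Q_1 &= w\,F(x'\gamma_1) + (1 - w)\bigl( 1 - F(x'\gamma_1) \bigr) \\
Q_2 &= w\,\bigl( 1 - F(x'\gamma_0) \bigr) + (1 - w)\,F(x'\gamma_0) \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}} &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m) + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2} &= (y - x'\beta_i)\,x_k \left[ \frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2} - \frac{f_i}{\sigma_i^4} \right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2} \right] + f_i \left[ \frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6} \right] \\
\frac{\partial^2 F(x'\gamma_s)}{\partial\gamma_{sk}\,\partial\gamma_{sm}} &= \phi(x'\gamma_s)\,(-x'\gamma_s)\,x_k x_m
\end{aligned}
\]

i = 1, 2; s = 0, 1; k, m = 1, 2, ..., K

F( ) = standard normal cumulative distribution function

φ( ) = standard normal probability density function

APPENDIX B

THE SECOND DERIVATIVE COMPONENTS OF THE INFORMATION MATRIX IN THE CASE OF NON-CONSTANT CLASSIFICATION PROBABILITIES AND NON-CONSTANT SWITCHING PROBABILITIES

The density function (we drop the subscript j for simplicity) when the regime is unknown is:

\[ f(y; \theta) = \lambda f_1(y) + (1 - \lambda) f_2(y) \]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \alpha')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\alpha &= (\alpha_1, \alpha_2, \ldots, \alpha_K)' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y - x'\beta_i)^2}{2\sigma_i^2} \right] \\
\lambda &= F(x'\alpha) = \int_{-\infty}^{x'\alpha} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv
\end{aligned}
\]

i = 1, 2

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[ \mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right] \]

where:

\[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left( \frac{\partial f_j}{\partial\theta} \right) \left( \frac{\partial f_j}{\partial\theta} \right)' \]

The first derivatives of f with respect to θ are given in the text of Chapter 4.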
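The identity for the second derivatives of ln f stated above can be checked numerically. The following is a minimal Python sketch, not the procedure used in this study: it evaluates the unknown-regime density with λ = F(x'α) at one arbitrary point and compares a finite-difference Hessian of ln f with the right-hand side of the identity. The parameter values, step sizes, and helper names (`norm_cdf`, `normal_pdf`, `mixture_density`, `gradient`, `hessian`) are assumptions chosen only for illustration.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal cumulative distribution function F( )."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(y, mean, var):
    """Normal density f_i(y) with mean x'beta_i and variance sigma_i^2."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(theta, y, x):
    """f(y; theta) = F(x'a) f1(y) + (1 - F(x'a)) f2(y), theta = (b1, b2, s1, s2, a)."""
    k = len(x)
    b1, b2 = theta[:k], theta[k:2 * k]
    s1, s2 = theta[2 * k], theta[2 * k + 1]  # variances, not standard deviations
    a = theta[2 * k + 2:]
    lam = norm_cdf(float(x @ a))
    return (lam * normal_pdf(y, float(x @ b1), s1)
            + (1.0 - lam) * normal_pdf(y, float(x @ b2), s2))

def gradient(func, theta, h=1e-5):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        e = np.zeros(len(theta))
        e[i] = h
        g[i] = (func(theta + e) - func(theta - e)) / (2.0 * h)
    return g

def hessian(func, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function."""
    p = len(theta)
    out = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = h
            ej[j] = h
            out[i, j] = (func(theta + ei + ej) - func(theta + ei - ej)
                         - func(theta - ei + ej) + func(theta - ei - ej)) / (4.0 * h * h)
    return out

# Hypothetical evaluation point with K = 2 (illustrative values only).
x = np.array([1.0, 0.5])
y = 0.7
theta = np.array([1.0, 0.5, -1.0, 0.3, 1.0, 2.0, 0.2, -0.4])

f = lambda t: mixture_density(t, y, x)
log_f = lambda t: math.log(mixture_density(t, y, x))

f0 = f(theta)
g = gradient(f, theta)
lhs = hessian(log_f, theta)                              # d2 ln f / d theta d theta'
rhs = hessian(f, theta) / f0 - np.outer(g, g) / f0 ** 2  # right-hand side of the identity
max_gap = float(np.max(np.abs(lhs - rhs)))
print(max_gap)
```

A small value of `max_gap` indicates that the two sides agree up to finite-difference error at the chosen point.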
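The non-zero second derivatives of f with respect to θ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= F(x'\alpha)\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= F(x'\alpha)\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\alpha_m} &= \frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\alpha)}{\partial\alpha_m} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= \bigl( 1 - F(x'\alpha) \bigr)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= \bigl( 1 - F(x'\alpha) \bigr)\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\alpha_m} &= -\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\alpha)}{\partial\alpha_m} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= F(x'\alpha)\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\alpha_k} &= \frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\alpha)}{\partial\alpha_k} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= \bigl( 1 - F(x'\alpha) \bigr)\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\alpha_k} &= -\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\alpha)}{\partial\alpha_k} \\
\frac{\partial^2 f}{\partial\alpha_k\,\partial\alpha_m} &= (f_1 - f_2)\,\frac{\partial^2 F(x'\alpha)}{\partial\alpha_k\,\partial\alpha_m}
\end{aligned}
\]

where:

\[
\begin{aligned}
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}} &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m) + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2} &= (y - x'\beta_i)\,x_k \left[ \frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2} - \frac{f_i}{\sigma_i^4} \right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2} \right] + f_i \left[ \frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6} \right] \\
\frac{\partial^2 F(x'\alpha)}{\partial\alpha_k\,\partial\alpha_m} &= \phi(x'\alpha)\,(-x'\alpha)\,x_k x_m
\end{aligned}
\]

i = 1, 2; k, m = 1, 2, ..., K

F( ) = standard normal cumulative distribution function

φ( ) = standard normal probability density function

The joint density function (we omit the subscript j for simplicity) when the regime is partly known is:

\[ f(y, w; \theta) = \lambda f_1(y)\,\bigl( w\,p_{11} + (1 - w)(1 - p_{11}) \bigr) + (1 - \lambda) f_2(y)\,\bigl( w\,(1 - p_{00}) + (1 - w)\,p_{00} \bigr) \]

where:

\[
\begin{aligned}
\theta &= (\beta_1', \beta_2', \sigma_1^2, \sigma_2^2, \alpha', \gamma_1', \gamma_0')' \\
\beta_i &= (\beta_{i1}, \beta_{i2}, \ldots, \beta_{iK})' \\
\gamma_s &= (\gamma_{s1}, \gamma_{s2}, \ldots, \gamma_{sK})' \\
\alpha &= (\alpha_1, \alpha_2, \ldots, \alpha_K)' \\
f_i(y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(y - x'\beta_i)^2}{2\sigma_i^2} \right] \\
\lambda &= F(x'\alpha) = \int_{-\infty}^{x'\alpha} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv \\
p_{11} &= F(x'\gamma_1) = \int_{-\infty}^{x'\gamma_1} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv \\
p_{00} &= F(x'\gamma_0) = \int_{-\infty}^{x'\gamma_0} \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{v^2}{2} \right] dv
\end{aligned}
\]

i = 1, 2; s = 0, 1

F( ) = standard normal cumulative distribution function

The information matrix is given by:

\[ \mathcal{I} = -E \sum_j \left[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} \right] \]

where:

\[ \frac{\partial^2 \ln f_j}{\partial\theta\,\partial\theta'} = \frac{1}{f_j}\,\frac{\partial^2 f_j}{\partial\theta\,\partial\theta'} - \frac{1}{f_j^2} \left( \frac{\partial f_j}{\partial\theta} \right) \left( \frac{\partial f_j}{\partial\theta} \right)' \]

The first derivatives of f with respect to θ are given in the text of Chapter 4.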
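The information matrix above can be approximated by simulation over a large sample. The sketch below is an illustration under stated assumptions, not the program used in this study: it draws a sample from the partly known regime model with K = 2, then takes the negative finite-difference Hessian of the sample mean of ln f, which gives the information per observation rather than the sum over observations. The parameter values, sample size, seed, step size, and function names are all hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal cumulative distribution function F( ), elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def unpack(theta, k):
    """Split theta = (b1, b2, s1, s2, a, g1, g0); s1 and s2 are variances."""
    b1, b2 = theta[:k], theta[k:2 * k]
    s1, s2 = theta[2 * k], theta[2 * k + 1]
    start = 2 * k + 2
    a, g1, g0 = (theta[start + i * k: start + (i + 1) * k] for i in range(3))
    return b1, b2, s1, s2, a, g1, g0

def mean_log_density(theta, y, w, X):
    """Sample mean of ln f(y, w; theta) in the partly known regime case."""
    b1, b2, s1, s2, a, g1, g0 = unpack(theta, X.shape[1])
    f1 = np.exp(-(y - X @ b1) ** 2 / (2 * s1)) / np.sqrt(2 * np.pi * s1)
    f2 = np.exp(-(y - X @ b2) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    lam, p11, p00 = norm_cdf(X @ a), norm_cdf(X @ g1), norm_cdf(X @ g0)
    q1 = w * p11 + (1 - w) * (1 - p11)
    q2 = w * (1 - p00) + (1 - w) * p00
    return np.mean(np.log(lam * f1 * q1 + (1 - lam) * f2 * q2))

def simulate(theta, n, k, rng):
    """Draw (y, w, X): regime 1 occurs with probability F(x'a)."""
    b1, b2, s1, s2, a, g1, g0 = unpack(theta, k)
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    regime1 = rng.random(n) < norm_cdf(X @ a)
    mean = np.where(regime1, X @ b1, X @ b2)
    sd = np.sqrt(np.where(regime1, s1, s2))
    y = mean + sd * rng.standard_normal(n)
    # w = 1 with probability p11 in regime 1 and 1 - p00 in regime 2.
    prob_w1 = np.where(regime1, norm_cdf(X @ g1), 1 - norm_cdf(X @ g0))
    w = (rng.random(n) < prob_w1).astype(float)
    return y, w, X

def information_matrix(theta, y, w, X, h=1e-4):
    """Negative central-difference Hessian of the mean log density."""
    p = len(theta)
    info = np.zeros((p, p))
    for i in range(p):
        for j in range(i, p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = h
            ej[j] = h
            d2 = (mean_log_density(theta + ei + ej, y, w, X)
                  - mean_log_density(theta + ei - ej, y, w, X)
                  - mean_log_density(theta - ei + ej, y, w, X)
                  + mean_log_density(theta - ei - ej, y, w, X)) / (4 * h * h)
            info[i, j] = info[j, i] = -d2
    return info

# Hypothetical parameter values: b1, b2, s1, s2, a, g1, g0 with K = 2.
theta = np.array([2.0, 1.0, -2.0, 0.5, 1.0, 1.5, 0.0, 0.5, 1.0, 0.3, 1.0, -0.3])
rng = np.random.default_rng(0)
y, w, X = simulate(theta, n=2000, k=2, rng=rng)
info = information_matrix(theta, y, w, X)
print(np.round(np.diag(info), 3))
```

Under these assumptions, inverting `info` would give an approximation to the asymptotic covariance matrix of the estimates, scaled by the sample size. Numerical differentiation is used here only for brevity; the analytic second derivatives listed in this appendix serve the same purpose.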
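The non-zero second derivatives of f with respect to θ are:

\[
\begin{aligned}
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\beta_{1m}} &= F(x'\alpha)\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\beta_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\sigma_1^2} &= F(x'\alpha)\,Q_1\,\frac{\partial^2 f_1}{\partial\beta_{1k}\,\partial\sigma_1^2} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\alpha_m} &= Q_1\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\alpha)}{\partial\alpha_m} \\
\frac{\partial^2 f}{\partial\beta_{1k}\,\partial\gamma_{1m}} &= F(x'\alpha)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_1}{\partial\beta_{1k}}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\beta_{2m}} &= \bigl( 1 - F(x'\alpha) \bigr)\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\beta_{2m}} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\sigma_2^2} &= \bigl( 1 - F(x'\alpha) \bigr)\,Q_2\,\frac{\partial^2 f_2}{\partial\beta_{2k}\,\partial\sigma_2^2} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\alpha_m} &= -Q_2\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\alpha)}{\partial\alpha_m} \\
\frac{\partial^2 f}{\partial\beta_{2k}\,\partial\gamma_{0m}} &= -\bigl( 1 - F(x'\alpha) \bigr)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_2}{\partial\beta_{2k}}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial(\sigma_1^2)^2} &= F(x'\alpha)\,Q_1\,\frac{\partial^2 f_1}{\partial(\sigma_1^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\alpha_k} &= Q_1\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\alpha)}{\partial\alpha_k} \\
\frac{\partial^2 f}{\partial\sigma_1^2\,\partial\gamma_{1k}} &= F(x'\alpha)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_1}{\partial\sigma_1^2}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1k}} \\
\frac{\partial^2 f}{\partial(\sigma_2^2)^2} &= \bigl( 1 - F(x'\alpha) \bigr)\,Q_2\,\frac{\partial^2 f_2}{\partial(\sigma_2^2)^2} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\alpha_k} &= -Q_2\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\alpha)}{\partial\alpha_k} \\
\frac{\partial^2 f}{\partial\sigma_2^2\,\partial\gamma_{0k}} &= -\bigl( 1 - F(x'\alpha) \bigr)\,\bigl( w - (1 - w) \bigr)\,\frac{\partial f_2}{\partial\sigma_2^2}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0k}} \\
\frac{\partial^2 f}{\partial\alpha_k\,\partial\alpha_m} &= (f_1 Q_1 - f_2 Q_2)\,\frac{\partial^2 F(x'\alpha)}{\partial\alpha_k\,\partial\alpha_m} \\
\frac{\partial^2 f}{\partial\alpha_k\,\partial\gamma_{1m}} &= f_1\,\bigl( w - (1 - w) \bigr)\,\frac{\partial F(x'\alpha)}{\partial\alpha_k}\,\frac{\partial F(x'\gamma_1)}{\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\alpha_k\,\partial\gamma_{0m}} &= f_2\,\bigl( w - (1 - w) \bigr)\,\frac{\partial F(x'\alpha)}{\partial\alpha_k}\,\frac{\partial F(x'\gamma_0)}{\partial\gamma_{0m}} \\
\frac{\partial^2 f}{\partial\gamma_{1k}\,\partial\gamma_{1m}} &= F(x'\alpha)\,f_1\,\bigl( w - (1 - w) \bigr)\,\frac{\partial^2 F(x'\gamma_1)}{\partial\gamma_{1k}\,\partial\gamma_{1m}} \\
\frac{\partial^2 f}{\partial\gamma_{0k}\,\partial\gamma_{0m}} &= -\bigl( 1 - F(x'\alpha) \bigr)\,f_2\,\bigl( w - (1 - w) \bigr)\,\frac{\partial^2 F(x'\gamma_0)}{\partial\gamma_{0k}\,\partial\gamma_{0m}}
\end{aligned}
\]

where:

\[
\begin{aligned}
Q_1 &= w\,F(x'\gamma_1) + (1 - w)\bigl( 1 - F(x'\gamma_1) \bigr) \\
Q_2 &= w\,\bigl( 1 - F(x'\gamma_0) \bigr) + (1 - w)\,F(x'\gamma_0) \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\beta_{im}} &= \frac{f_i}{\sigma_i^2}\,(-x_k x_m) + \frac{\partial f_i}{\partial\beta_{im}}\,\frac{x_k\,(y - x'\beta_i)}{\sigma_i^2} \\
\frac{\partial^2 f_i}{\partial\beta_{ik}\,\partial\sigma_i^2} &= (y - x'\beta_i)\,x_k \left[ \frac{\partial f_i}{\partial\sigma_i^2}\,\frac{1}{\sigma_i^2} - \frac{f_i}{\sigma_i^4} \right] \\
\frac{\partial^2 f_i}{\partial(\sigma_i^2)^2} &= \frac{\partial f_i}{\partial\sigma_i^2} \left[ \frac{(y - x'\beta_i)^2}{2\sigma_i^4} - \frac{1}{2\sigma_i^2} \right] + f_i \left[ \frac{1}{2\sigma_i^4} - \frac{(y - x'\beta_i)^2}{\sigma_i^6} \right] \\
\frac{\partial^2 F(x'\alpha)}{\partial\alpha_k\,\partial\alpha_m} &= \phi(x'\alpha)\,(-x'\alpha)\,x_k x_m \\
\frac{\partial^2 F(x'\gamma_s)}{\partial\gamma_{sk}\,\partial\gamma_{sm}} &= \phi(x'\gamma_s)\,(-x'\gamma_s)\,x_k x_m
\end{aligned}
\]

i = 1, 2; s = 0, 1; k, m = 1, 2, ..., K

F( ) = standard normal cumulative distribution function

φ( ) = standard normal probability density function

BIBLIOGRAPHY

Ashford, J.R. and Sowden, R.R. "Multivariate Probit Analysis." Biometrics, 1970, Volume 26, 535-546.

Eaton, Jonathan and Gersovitz, Mark. "LDC Participation in International Financial Markets: Debt and Reserves." Journal of Development Economics, 1980, Volume 7, [pages illegible].

Fair, R.C. and Jaffee, D.M. "Methods of Estimation for Markets in Disequilibrium." Econometrica, 1972, Volume 40, 497-514.

Gersovitz, Mark. "Classification Probabilities for the Disequilibrium Model." Journal of Econometrics, 1980, Volume 14, 239-246.

Goldfeld, Stephen and Quandt, Richard. "Estimation in a Disequilibrium Model and the Value of Information." Journal of Econometrics, 1975, Volume 3, 325-348.

Hamermesh, Daniel. "Wage Bargains, Threshold Effects, and the Phillips Curve." Quarterly Journal of Economics, 1970, Volume 84, 501-517.

Hartley, Michael. "Comment on 'Estimating Mixtures of Normal Distributions and Switching Regressions'." Journal of the American Statistical Association, 1978, Volume 73, [pages illegible].

Judge, George G., Griffiths, William E., Hill, R. Carter, and Lee, Tsoung-Chao. The Theory and Practice of Econometrics. 1980, New York: John Wiley and Sons, Inc.

Kendall, Maurice and Stuart, Alan. The Advanced Theory of Statistics, Volume 1. 1963, New York: Hafner.

Kiefer, Nicholas. "Discrete Parameter Variation: Efficient Estimation of a Switching Regression Model." Econometrica, 1978, Volume 46, 427-434.

________. "On the Value of Sample Separation Information." Econometrica, 1979, Volume 47, 997-1003.

________. "A Note on Regime Classification in Disequilibrium Models." Review of Economic Studies, 1980, Volume 47, 637-639.

Laffont, Jean-Jacques and Garcia, Rene. "Disequilibrium Econometrics for Business Loans." Econometrica, 1977, Volume 45, 1187-1204.

Lee, Lung-Fei and Porter, Richard. "Switching Regression Models with Imperfect Sample Separation Information -- with an Application on Cartel Stability." Econometrica, 1984, Volume 52, 391-418.

Quandt, Richard. "A New Approach to Estimating Switching Regression Models." Journal of the American Statistical Association, [year, volume, and pages illegible].

________ and Ramsey, James. "Estimating Mixtures of Normal Distributions and Switching Regressions." Journal of the American Statistical Association, 1978, Volume 73, [pages illegible].

Rosen, Harvey and Quandt, Richard. "Estimation of a Disequilibrium Aggregate Labor Market." Review of Economics and Statistics, 1978, Volume 60, 371-379.

Schmidt, Peter. "Further Results on the Value of Sample Separation Information." Econometrica, 1981, Volume 49, 1339-1343.

________. "An Improved Version of the Quandt-Ramsey MGF Estimator for Mixtures of Normal Distributions and Switching Regressions." Econometrica, 1982, Volume 50, 501-516.

Suits, Daniel. "An Econometric Model of the Watermelon Market." Journal of Farm Economics, 1955, Volume 37, 237-251.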