THE COST OF PARTIAL OBSERVABILITY IN THE BIVARIATE PROBIT MODEL

By

Chun-Lo Katy Ho

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics

1982

ABSTRACT

THE COST OF PARTIAL OBSERVABILITY IN THE BIVARIATE PROBIT MODEL

By Chun-Lo Katy Ho

Some recent studies have made use of the bivariate probit model in testing various hypotheses, but with only partial observability of the dichotomous dependent variables. The maximum likelihood estimators in these partial observability cases will be inefficient compared to those obtained under full observability. In this study we therefore present several cases with different levels of observability for the bivariate probit model, and we measure the efficiency loss of the maximum likelihood estimators for each case through a series of experiments. The example of a two-member committee voting under a unanimity rule can be applied to all of these cases. Case one is the case of full observability, in which the dichotomous choices of both voters are always observable. Case two is the case of partial observability in the sense of Poirier, under the assumption that only the result of the joint choice of the two decision-makers is observed. Case three is called the case of partial partial observability, in which one of the two parties' decisions is fully observable. In case four, which is called the case of partial observability with observed veto, when the outcome is "no" we observe one of the two parties casting its "no" vote. Three alternative possibilities are presented for this case, concerning who will use the veto first if both parties wish to vote "no".

The log-likelihood functions are provided for the joint estimation of the parameters in each of the cases above. Since the inverse of the information matrix is the asymptotic variance-covariance matrix of the maximum likelihood estimator, the derivation of the information matrices for all these cases is presented. The conditions for identification in the partial observability cases are also discussed. Then a large variety of experiments are done to measure the cost (in terms of lost efficiency) of partial observability.

Here are some of our main conclusions. First, we notice that the cost of partial observability is quite high, especially for case two. The cost of partial observability decreases markedly if any piece of observability information can be found. The law of diminishing marginal utility of information usually holds: it is the first piece of observability information which is most important. The second conclusion is that specifying ρ (the correlation coefficient of the two probit equations) a priori improves the efficiencies of the estimates of the other parameters a great deal. A third conclusion is that the sample split has a strong influence on the relative efficiencies of the parameter estimates. For a given partial observability case, its efficiency relative to full observability will be higher, the smaller the proportion of observations which fall into the indistinguishable categories. The last conclusion is that the strength of identification matters.
The relative efficiency of each partial observability case is very low for parameter values near such points, and it increases rapidly as the parameters move away from such points of singularity.

ACKNOWLEDGEMENT

I must first express my sincere gratitude to my thesis advisor, Professor Peter Schmidt, who generously bestowed an enormous amount of his time providing detailed guidance on every aspect of this study. Without his continuing assistance and encouragement, I could not have finished this thesis and my graduate study at Michigan State University. I am also very grateful to the other members of my dissertation committee: Professors John Goddeeris, Daniel Hamermesh and William Quinn. I owe thanks to my typist, Miss Kelli Sweet; her careful and expedient typing is appreciated. Finally, my deepest appreciation goes to my parents and my husband, Hai-Zui, for their support and encouragement throughout these years. To them I owe a debt more than I know how to express in words.

TABLE OF CONTENTS

LIST OF TABLES ............................................................. iii

CHAPTER
ONE    INTRODUCTION ........................................................ 1
TWO    BIVARIATE PROBIT MODELS WITH FULL AND PARTIAL OBSERVABILITY
       2.1  Introduction .................................................... 7
       2.2  Case One: Full Observability ................................... 9
       2.3  Case Two: Partial Observability in the Sense of Poirier ........ 11
       2.4  Case Three: Partial Partial Observability ...................... 12
       2.5  Case Four: Partial Observability With Observed Veto ............ 14
       2.6  Summary ......................................................... 19
THREE  DERIVATION OF INFORMATION MATRICES AND CONDITIONS FOR IDENTIFICATION
       3.1  Introduction .................................................... 21
       3.2  Case One: Full Observability ................................... 22
       3.3  Case Two: Partial Observability in the Sense of Poirier ........ 25
       3.4  Case Three: Partial Partial Observability ...................... 30
       3.5  Case Four: Partial Observability With Observed Veto ............ 33
       3.6  Summary ......................................................... 41
FOUR   RESULTS OF EXPERIMENTS MEASURING THE COST OF PARTIAL OBSERVABILITY
       4.1  Introduction .................................................... 43
       4.2  General Results of Some Basic Experiments ...................... 46
       4.3  Results of Further Experiments ................................. 49
       4.4  The Results of Experiments With Either the Identification Effect or the Sample Split Effect Constant ... 54
       4.5  Summary ......................................................... 57
FIVE   SUMMARY AND CONCLUSIONS ............................................. 81
APPENDIX A ................................................................. 87
REFERENCES ................................................................. 90

LIST OF TABLES

Table                                                                       Page
 1   Sample splits for different β ......................................... 60
 2   Ratios of asymptotic variances (cost of partial observability) ........ 61
 3   Sample splits for different c when β_1 = β_2 = [-cX̄, c]' .............. 62
 4   Ratios of asymptotic variances (cost of partial observability) when β_1 = β_2 = [-cX̄, c]' and ρ = 0 ... 63
 5   Ratios of asymptotic variances (cost of partial observability) when β_1 = β_2 = [-cX̄, c]' and ρ = 0.5 ... 64
 6   Sample splits for different c when β_1 = [cX̄, -c]' and β_2 = [-cX̄, c]' ... 65
 7   Ratios of asymptotic variances (cost of partial observability) when β_1 = [cX̄, -c]', β_2 = [-cX̄, c]' and ρ = 0 ... 66
 8   Ratios of asymptotic variances (cost of partial observability) when β_1 = [cX̄, -c]', β_2 = [-cX̄, c]' and ρ = 0.5 ... 67
 9   Sample splits for different ρ when β_1 = β_2 = [-cX̄, c]' and c = 1.0 ... 68
10   Sample splits for different ρ when β_1 = [cX̄, -c]', β_2 = [-cX̄, c]' and c = 1.0 ... 68
11   Ratios of asymptotic variances (cost of partial observability) when β_1 = β_2 = [-cX̄, c]' and c = 1.0 ... 69
12   Ratios of asymptotic variances (cost of partial observability) when β_1 = [cX̄, -c]', β_2 = [-cX̄, c]' and c = 1.0 ... 70
13   Cost of partial observability for case four when β_1 = β_2 = [-cX̄, c]' and c = 1.0 ... 71
14   Cost of partial observability for case four when β_1 = [cX̄, -c]', β_2 = [-cX̄, c]' and c = 1.0 ... 72
15   Cost of partial observability for case two when β_1 = [d, -c]', β_2 = [d, c]' and P̄_11 = 0.25 ... 73
16   Cost of partial observability for case two when β_1 = [d, -c]', β_2 = [d, c]' and c = 1.0 ... 74
17   Cost of partial observability for case three when β_1 = β_2 = [-d, c]' and P̄_01 + P̄_00 = 0.50 ... 75
18   Cost of partial observability for case three when β_1 = [-d_1, c]', β_2 = [-d_2, c]' and P̄_01 = P̄_00 = 0.25 ... 76
19   Cost of partial observability for case three when β_1 = β_2 = [-d, c]' and c = 1.0 ... 77
20   Cost of partial observability for case three when β_1 = [-d_1, c]', β_2 = [-d_2, c]' and c = 1.0 ... 78
21   Cost of partial observability for case four when β_1 = β_2 = [-d, c]', ρ = 0.5 and P̄_00 = 0.25 ... 79
22   Cost of partial observability for case four when β_1 = β_2 = [-d, c]', ρ = 0.5 and c = 1.0 ... 80

CHAPTER ONE
INTRODUCTION

The purpose of this study is to consider the bivariate probit model under various levels of observability of the dependent variables, and to measure the loss in efficiency caused by less than full observability. There have been quite a few studies using the bivariate probit model in a variety of settings.

Zellner and Lee (1965) presented the probit model as well as other models to analyze discrete random variables. They showed that a joint estimation approach for a set of equations with dichotomous endogenous variables yields estimators which are asymptotically more efficient than single-equation techniques, provided that the variables being analyzed are correlated. They considered the example of a durable good purchase decision (buy or not buy) and a credit decision (use installment credit or not use such credit), while the exogenous variable is disposable income. In this example, both decisions are observable.

Ashford and Sowden (1970) considered a multivariate probit model and proposed maximum likelihood estimation for its parameters. They applied their techniques to a bivariate probit model, where the two endogenous variables are breathlessness and wheeze of a coal miner, and the exogenous variable is his age. A coal miner may have a positive response to neither, to one or the other, or to both of the two symptoms; so there are four possible outcomes. All four possible outcomes are distinguishable; the data give the number of individuals with each combination of symptoms, within each age group in the sample.
Amemiya (1974) proposed two minimum chi-squared estimators for the same model and found that the FIMC (Full Information Minimum Chi-Square) Probit estimator is asymptotically as efficient as the maximum likelihood estimator.

Gunderson (1974) discussed alternative statistical models for estimating the probability that an on-the-job trainee will be retained by the sponsoring company after training. In this situation, the employer must decide whether or not to make a job offer, and the trainee must decide whether or not to seek a job offer. Each individual's (either employer's or trainee's) decision is not observed; only whether or not the trainee continues working after training is known. Gunderson used a single-equation model with the dichotomous dependent variable coded 1 if the trainee stays with the company, and 0 otherwise. Explanatory variables include the characteristics of both the trainee (age, sex, education, experience, etc.) and the company (company size, area designation, etc.).

Poirier (1980) proposed a bivariate probit model under the same assumptions as Gunderson's concerning the amount of available information. His model includes two probit equations, each representing the binary choice of a decision-maker, but only the outcome of the joint (unanimous) choice is observable. That is, the only information about the two dichotomies is whether or not both equal unity, and the remaining possible outcomes can't be distinguished from each other whenever there is a negative choice made by either party.

Farber's research (1982) on the demand for unionism shows that the union status of workers is determined by a combination of workers' demand for union representation and the decisions of union employers as to whom to hire. That is, a worker is a union member if and only if he desires a union job and a union employer is willing to hire him. If only the final outcome (union status) is observed, it is impossible to determine whether nonunion workers didn't want a union job, couldn't get a union job, or both, and we have Poirier's model. In Farber's study, a unique data set is employed which can be used to identify the union or non-union preference of non-union workers. So workers' preferences are fully observable, while union employers' decisions are still unknown for those nonunion workers who didn't want to be unionized.

Connolly's study (1982) analyzes the joint decision to arbitrate or negotiate the contracts between employees' unions and municipalities in Michigan. According to law, there will be negotiation if both sides desire so and there will be arbitration otherwise, but one of the two parties has to cast a veto to seek the arbitration. Therefore, besides the observable result that the contract is negotiated or arbitrated, one party (only) is observed to use the veto whenever there is arbitration. However, the decision of the party which didn't use the veto remains unknown.

The examples above can all be analyzed using the bivariate probit model, but under different assumptions concerning what can be observed. The first two cases (Zellner-Lee and Ashford-Sowden) are different from the others in that the two decisions (or symptoms) are both related to one person instead of to two different parties. But they still represent the case in which the two binary dependent variables are both observable, which can be called the case of full observability. All the other cases have less than full observability, in varying degrees.
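The classification just sketched — both decisions seen; only the joint outcome seen; one party always seen; or a single observed veto — can be summarized compactly. The sketch below is ours rather than the dissertation's (the function name and event labels are illustrative); it simply maps the underlying pair of votes to the event an outside observer actually records in each case.

```python
# Illustrative encoding (not from the dissertation) of what each observability
# case lets an outside observer record about the two binary votes (y1, y2).
def observed_event(case, y1, y2, party_one_vetoes_first=True):
    if case == "full":                 # case one: both votes are seen
        return (y1, y2)
    if case == "poirier":              # case two: only the joint outcome z = y1 * y2
        return y1 * y2
    if case == "partial_partial":      # case three: party one is always seen;
        return (y1, y2) if y1 == 1 else (0, None)   # party two is seen only if y1 = 1
    if case == "observed_veto":        # case four: exactly one "no" voter is identified
        if y1 == 1 and y2 == 1:
            return "both yes"
        if y1 == 0 and y2 == 1:
            return "party 1 vetoes"
        if y1 == 1 and y2 == 0:
            return "party 2 vetoes"
        # both want "no": who is seen vetoing depends on the assumption (cases 4A1, 4A2, 4B)
        return "party 1 vetoes" if party_one_vetoes_first else "party 2 vetoes"
    raise ValueError(case)
```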
The model established by Poirier (using Gunderson's example) assumes the least observability information among all the cases. As outsiders, we can only tell whether something failed or succeeded. For Farber's or Connolly's case, besides the observable joint choice, one of the two individual choices is observed. All of these cases can be called partial observability cases of the bivariate probit model.

With incomplete information, the maximum likelihood estimators obtained in these partial observability cases will be inefficient compared to the estimators obtained in the case of full observability. In other words, there is a cost (in terms of lost efficiency) of partial observability. The point of this research is to measure this cost. The study of the cost of partial observability is important in itself, but it also has some practical implications. For a researcher facing a high price of getting additional information, it is important to know how valuable the information is, so that an intelligent decision can be made about whether additional information is worth obtaining.

This paper is divided into five chapters. In Chapter Two, a formal statement of the bivariate probit model is presented. All of the cases considered assume this basic model, but with different levels of observability. Case one is the full observability case, in which both parties' choices are observable, and every possible outcome can therefore be distinguished. Case two is the model with partial observability in the sense of Poirier, in which the only information is the binary outcome of the joint choice made by both parties. If either party fails to say "yes", then the remaining outcomes are indistinguishable. Case three is called the partial partial observability case. In this case one of the two parties' decisions is fully observable, but if the observable party has a negative response, the other party's decision is not known. Case four is called the partial observability case with observed veto. In this case, if both sides do not say "yes", then we can observe one and only one party saying "no" (casting the veto). There are three alternative possibilities which we will consider, under different assumptions about who will cast the veto first if both parties wish to say "no". The appropriate likelihood functions for all cases and possibilities are provided, so that maximum likelihood estimates can be obtained.

Chapter Three contains the derivation of information matrices for all of the cases which were presented in Chapter Two. The conditions for identification for the partial observability cases are also discussed. The information matrix (whose inverse is the asymptotic variance-covariance matrix of the maximum likelihood estimator) can help us to measure the efficiency loss with different levels of partial observability. The conditions of identification are relevant for efficiency comparisons, because the closer the information matrix is to being singular, the greater the variances of the estimates will be.

In Chapter Four, a large variety of experiments are done to measure the cost of various levels of partial observability. For the purpose of simplification, we assume there are only two exogenous variables, and one of them is a constant term. For each experiment, specific values of the parameters are picked in order to evaluate the inverse of the information matrix.
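In code, the efficiency comparison described here reduces to a few lines of linear algebra once the information matrices have been evaluated at the chosen parameter values. A minimal sketch, with our own function names and assuming the two matrices are supplied as NumPy arrays:

```python
# Sketch of the cost-of-partial-observability measure used throughout this study:
# compare inverses of information matrices for a partial observability case and for
# the full observability case.
import numpy as np

def cost_of_partial_observability(J_partial, J_full):
    """Element-wise ratios of the two inverse information matrices."""
    return np.linalg.inv(J_partial) / np.linalg.inv(J_full)

def variance_ratios(J_partial, J_full):
    """Just the diagonal: ratios of asymptotic variances (values > 1 mean lost efficiency)."""
    return np.diag(np.linalg.inv(J_partial)) / np.diag(np.linalg.inv(J_full))
```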
All of the elements of the inverse of the information matrices for the partial observability cases are divided by the corresponding elements of the inverse of the information matrix for the full observability case. Thus the ratios we present are the ratios of asymptotic variances and covariances of parameters in the partial observability case compared to those of the full observability case. The cost of partial observability increases as these ratios increase. We attempt both to make a rough statement about the cost of partial observability in typical cases, and also to identify what types of changes in the parameters cause this cost to increase or decrease. The results of these experiments will be interpreted in Chapter Four, and the tables at the end of that chapter list the main numerical results. The summary and conclusions of this study will be given in Chapter Five.

CHAPTER TWO
BIVARIATE PROBIT MODELS WITH FULL AND PARTIAL OBSERVABILITY

2.1 Introduction

In this chapter, we will give a formal statement of the bivariate probit model, and consider its estimation under various assumptions about what is observed. Basically, our treatment of the estimation problem is just to provide the appropriate likelihood function to maximize, though in some cases we point out alternative possibilities. The questions of identification and of the relative efficiencies of the various estimators will be deferred until Chapters 3 and 4.

Now we start by reviewing the bivariate probit model. Consider two individuals (j = 1, 2), each faced with a binary choice, y_j = m, m = 0, 1. The dependent variable y_j takes on the value 1 if an event occurs or 0 if it does not occur. Suppose the two individuals have utility functions of the form

    U_1m = g_1m(w_1m, y_2*) + η_1m,    m = 0, 1
    U_2m = g_2m(w_2m, y_1*) + η_2m

where, for j = 1, 2, g_jm is a non-stochastic scale function, w_jm is a fixed vector of characteristics of individual j and choice y_j = m, η_jm is a random disturbance term, and y_j* is the utility differential

    y_j* = U_j1 - U_j0,    j = 1, 2.

This specification permits interdependency between the utility functions of the two individuals in the sense that the utility of each individual is a function of the sentiment of the other individual. Further suppose

    g_11(w_11, y_2*) - g_10(w_10, y_2*) = γ_1 y_2* + Xδ_1
    g_21(w_21, y_1*) - g_20(w_20, y_1*) = γ_2 y_1* + Xδ_2
    η_11 - η_10 = V_1
    η_21 - η_20 = V_2

where X is a K-dimensional row vector of explanatory variables, δ_1 and δ_2 are K-dimensional column vectors of unknown coefficients, γ_1 and γ_2 are unknown parameters, and V = [V_1, V_2]' ~ N(0, Ω) with

    Ω = [ ω_11  ω_12 ]
        [ ω_12  ω_22 ].

Then it is easy to show that

    y_1* = γ_1 y_2* + Xδ_1 + V_1    -------------  (1)
    y_2* = γ_2 y_1* + Xδ_2 + V_2    -------------  (2)

and that individual j will select

    y_j = 1  iff  y_j* > 0,  i.e., U_j1 > U_j0
    y_j = 0  iff  y_j* ≤ 0,  i.e., U_j1 ≤ U_j0.

The reduced form equations corresponding to (1) and (2) are

    y_1* = Xβ_1 + ε_1    -------------  (3)
    y_2* = Xβ_2 + ε_2    -------------  (4)

where

    β_1 = (δ_1 + γ_1 δ_2)/(1 - γ_1 γ_2),    β_2 = (δ_2 + γ_2 δ_1)/(1 - γ_1 γ_2),
    ε_1 = (V_1 + γ_1 V_2)/(1 - γ_1 γ_2),    ε_2 = (V_2 + γ_2 V_1)/(1 - γ_1 γ_2),

and [ε_1, ε_2]' has a bivariate normal distribution with mean 0 and variance-covariance matrix

    [ 1  ρ ]
    [ ρ  1 ].

Here the variances of ε_1 and ε_2 have been normalized to equal unity and ρ is the correlation between ε_1 and ε_2.

The model just presented is common to all of the cases we will consider. However, the cases differ with respect to how much one observes about y_1 and y_2.
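The reduced-form model in equations (3) and (4) is straightforward to simulate, which is a convenient way to check any likelihood code against artificial data. A minimal sketch in Python (our own names; not part of the dissertation), drawing correlated standard normal errors and applying the threshold rule y_j = 1 iff y_j* > 0:

```python
# Minimal simulation of the reduced-form model (3)-(4): latent y_j* = X beta_j + e_j,
# with (e_1, e_2) bivariate standard normal with correlation rho, and y_j = 1 iff y_j* > 0.
import numpy as np

def simulate_bivariate_probit(X, beta1, beta2, rho, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    y1_star = X @ beta1 + eps[:, 0]           # equation (3)
    y2_star = X @ beta2 + eps[:, 1]           # equation (4)
    return (y1_star > 0).astype(int), (y2_star > 0).astype(int)

# Example with K = 2 regressors (a constant and one standard normal variable) and N = 50,
# the setup used in the experiments of Chapter Four.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.standard_normal(50)])
y1, y2 = simulate_bivariate_probit(X, np.array([0.0, 1.0]), np.array([0.0, 1.0]), rho=0.5)
```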
2.2 Case One: Full Observability

Here we assume that y_1 and y_2 are both observed. Among all the cases we are going to discuss here, this is the one which has the most complete observability, and which leads to the most efficient estimates. An example of such a case would be a two-member committee voting under a unanimity rule, but with both votes observable. That is, in our random sample of votes, i = 1, ..., N, we can observe not only the explanatory variables X_i, but also the votes of both voters, i.e., y_i1 and y_i2. Therefore, there are four possible outcomes, which are all distinguishable:

(1) both vote "yes", i.e., y_i1 = 1 and y_i2 = 1;
(2) the first party votes "yes" and the second votes "no", i.e., y_i1 = 1 and y_i2 = 0;
(3) the first party votes "no" and the second votes "yes", i.e., y_i1 = 0 and y_i2 = 1;
(4) both vote "no", i.e., y_i1 = 0 and y_i2 = 0.

The distribution of y_i1 and y_i2 in this case is, for i = 1, ..., N,

    P(y_i1 = 1 and y_i2 = 1) = F(X_iβ_1, X_iβ_2; ρ)
    P(y_i1 = 1 and y_i2 = 0) = F(X_iβ_1, -X_iβ_2; -ρ)
    P(y_i1 = 0 and y_i2 = 1) = F(-X_iβ_1, X_iβ_2; -ρ)
    P(y_i1 = 0 and y_i2 = 0) = F(-X_iβ_1, -X_iβ_2; ρ) = 1 - Φ(X_iβ_1) - Φ(X_iβ_2) + F(X_iβ_1, X_iβ_2; ρ)

where F(·, ·; ·) denotes the bivariate standard normal distribution function with correlation coefficient ρ, while Φ(·) is the univariate standard normal distribution function.

We can always estimate the reduced form equations separately. The log-likelihood functions are

    ln L = Σ_{i=1}^{N} { y_ij ln Φ(X_iβ_j) + (1 - y_ij) ln Φ(-X_iβ_j) },    j = 1, 2.

Possibility One: Case 4A1, Observed Veto, p is a Given Constant

The log-likelihood function is

    ln L(β_1, β_2, ρ) = Σ_{i: Z_i = 1 is observed} ln F(X_iβ_1, X_iβ_2; ρ)
        + Σ_{i: y_i1 = 0 is observed} ln { p[1 - Φ(X_iβ_1)] + (1 - p)[Φ(X_iβ_2) - F(X_iβ_1, X_iβ_2; ρ)] }
        + Σ_{i: y_i2 = 0 is observed} ln { p[Φ(X_iβ_1) - F(X_iβ_1, X_iβ_2; ρ)] + (1 - p)[1 - Φ(X_iβ_2)] }.

Possibility Two: Case 4A2, Observed Veto, p is Another Parameter

Here, instead of having p as a given constant, we let p be an unknown parameter in the model. The log-likelihood function is the same as above except that p needs to be estimated too:

    ln L(β_1, β_2, ρ, p) = Σ_{i: Z_i = 1 is observed} ln F(X_iβ_1, X_iβ_2; ρ)
        + Σ_{i: y_i1 = 0 is observed} ln { p[1 - Φ(X_iβ_1)] + (1 - p)[Φ(X_iβ_2) - F(X_iβ_1, X_iβ_2; ρ)] }
        + Σ_{i: y_i2 = 0 is observed} ln { p[Φ(X_iβ_1) - F(X_iβ_1, X_iβ_2; ρ)] + (1 - p)[1 - Φ(X_iβ_2)] }.
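A hedged numerical rendering of the likelihoods above may be useful. The sketch below uses our own function names, with scipy's bivariate normal CDF standing in for F(·, ·; ρ); it evaluates the four cell probabilities of case one and the observed-veto log-likelihood of cases 4A1/4A2, whose three distinguishable events are "both yes", "party one seen vetoing" and "party two seen vetoing". Passing p as a fixed number corresponds to case 4A1; treating it as a free parameter in the optimization corresponds to case 4A2.

```python
# Sketch (not the dissertation's code) of the case-one cell probabilities and the
# observed-veto log-likelihood of cases 4A1/4A2.
import numpy as np
from scipy.stats import norm, multivariate_normal

def biv_cdf(a, b, rho):
    """F(a, b; rho): bivariate standard normal CDF with correlation rho."""
    return multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).cdf([a, b])

def cell_probs(a, b, rho):
    """P(y1, y2) for the four outcomes of case one, given a = X*beta1, b = X*beta2."""
    F = biv_cdf(a, b, rho)
    return {(1, 1): F,
            (1, 0): norm.cdf(a) - F,
            (0, 1): norm.cdf(b) - F,
            (0, 0): 1.0 - norm.cdf(a) - norm.cdf(b) + F}

def loglik_case_one(beta1, beta2, rho, X, y1, y2):
    """Full observability: each observation contributes its own cell probability."""
    return sum(np.log(cell_probs(ai, bi, rho)[(u, v)])
               for ai, bi, u, v in zip(X @ beta1, X @ beta2, y1, y2))

def loglik_observed_veto(beta1, beta2, rho, p, X, event):
    """Cases 4A1/4A2: event[i] is 'yes', 'veto1' (y_i1 = 0 observed) or 'veto2'."""
    ll = 0.0
    for ai, bi, e in zip(X @ beta1, X @ beta2, event):
        F = biv_cdf(ai, bi, rho)
        if e == "yes":
            prob = F
        elif e == "veto1":     # equals P(y1=0, y2=1) + p * P(y1=0, y2=0)
            prob = p * (1.0 - norm.cdf(ai)) + (1.0 - p) * (norm.cdf(bi) - F)
        else:                  # 'veto2': equals P(y1=1, y2=0) + (1 - p) * P(y1=0, y2=0)
            prob = p * (norm.cdf(ai) - F) + (1.0 - p) * (1.0 - norm.cdf(bi))
        ll += np.log(prob)
    return ll
```

Minimizing the negative of either function with a general-purpose optimizer (for example scipy.optimize.minimize over β_1, β_2 and ρ, and over p as well for case 4A2) would yield the maximum likelihood estimates discussed in the text.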
Possibility Three: Case 4B, The First Party to Ask for Arbitration is the One Who Wants Negotiation Least

Recall equations (3) and (4):

    y_1* = Xβ_1 + ε_1
    y_2* = Xβ_2 + ε_2.

y_1* and y_2* are the utility differentials between voting "yes" and "no"; they represent individual j's (j = 1, 2) "sentiment" toward y_j = 1. When y_i1* < 0 and y_i2* < 0, so that neither party wants negotiation, it may be that the party whose sentiment is more strongly against negotiation will cast the veto first. That is, when

    y_i1* = X_iβ_1 + ε_i1 < 0,    y_i2* = X_iβ_2 + ε_i2 < 0,    and    X_iβ_1 + ε_i1 < X_iβ_2 + ε_i2,

it is reasonable to conclude that the first party will cast the veto first. So we use P(ε_i2 < -X_iβ_2, ε_i1 - ε_i2 < X_iβ_2 - X_iβ_1) to represent the probability that the first party is observed using the veto, and P(ε_i1 < -X_iβ_1, ε_i2 - ε_i1 < X_iβ_1 - X_iβ_2) to represent the probability that the second party uses the veto, when both of them do not want a negotiation. The log-likelihood function is

    ln L(β_1, β_2, ρ) = Σ_{i: Z_i = 1 is observed} ln F(X_iβ_1, X_iβ_2; ρ)
        + Σ_{i: y_i1 = 0 is observed} ln [ F(-X_iβ_1, X_iβ_2; -ρ) + P(ε_i2 < -X_iβ_2, ε_i1 - ε_i2 < X_iβ_2 - X_iβ_1) ]
        + Σ_{i: y_i2 = 0 is observed} ln [ F(X_iβ_1, -X_iβ_2; -ρ) + P(ε_i1 < -X_iβ_1, ε_i2 - ε_i1 < X_iβ_1 - X_iβ_2) ].

Here

    P(ε_i2 < -X_iβ_2, ε_i1 - ε_i2 < X_iβ_2 - X_iβ_1)
        = P( (ε_i1 - ε_i2)/√(2(1-ρ)) < (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), ε_i2 < -X_iβ_2 )
        = F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; -√((1-ρ)/2) )
        = F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; ρ' ),    where ρ' = -√((1-ρ)/2),

and

    P(ε_i1 < -X_iβ_1, ε_i2 - ε_i1 < X_iβ_1 - X_iβ_2)
        = F(-X_iβ_1, -X_iβ_2; ρ) - F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; ρ' )
        = 1 - Φ(X_iβ_1) - Φ(X_iβ_2) + F(X_iβ_1, X_iβ_2; ρ) - F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; ρ' ).

Then

    ln L(β_1, β_2, ρ) = Σ_{i: Z_i = 1 is observed} ln F(X_iβ_1, X_iβ_2; ρ)
        + Σ_{i: y_i1 = 0 is observed} ln [ Φ(X_iβ_2) - F(X_iβ_1, X_iβ_2; ρ) + F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; ρ' ) ]
        + Σ_{i: y_i2 = 0 is observed} ln [ 1 - Φ(X_iβ_2) - F( (X_iβ_2 - X_iβ_1)/√(2(1-ρ)), -X_iβ_2; ρ' ) ].

2.6 Summary

In this chapter, six cases are introduced to represent full observability and different types of partial observability for a bivariate probit model. The example of a two-member committee voting under a unanimity rule can be applied to all cases.

Case One gives full observability of the model, since the dichotomous choices of both voters are always observable. Separate estimation is possible for each probit equation, but this would not be efficient unless the correlation coefficient ρ = 0.

In Case Two, partial observability in the sense of Poirier, only the result of the joint choice of the two decision-makers is observed. As long as either party votes "no", the separate votes of the two voters are indistinguishable. This is the case which gives us the least information.

Case Three and Case Four each lie somewhere between the above two cases. One of the two voters' behavior is observed in Case Three. But when this observable party votes "no", the two choices of the other voter are indistinguishable. The observable party's probit equation can always be separately estimated, but only when ρ = 0 will this be as efficient as joint estimation. Separate estimation for the other party's probit equation is impossible unless ρ is known to be equal to zero.

In Case Four, when either party (or both) votes "no", we observe the casting of a "no" vote. But while one party is observed casting the veto, the vote of the other party remains unknown. Some assumptions must be made here about who will use the veto first. We can either assume some fixed p to be the probability that the first party does so (Case 4A1), or have p as another unknown parameter in the model (Case 4A2). Another possibility, which is Case 4B, is that the party with the strongest sentiment for a "no" vote will be observed casting the veto.

We have provided likelihood functions for the various cases. In each case, a numerical maximization of the likelihood function provides maximum likelihood estimates. In Chapter 3 we will consider the asymptotic distributions of these estimates, and in Chapter 4 we will compare their relative efficiencies.

CHAPTER THREE
DERIVATION OF INFORMATION MATRICES AND CONDITIONS FOR IDENTIFICATION

3.1 Introduction

The log-likelihood functions that we presented in the last chapter for the full and partial observability cases have prepared us for the derivation of the information matrices here. The information matrix by definition is equal to minus the matrix of expectations of the second-order derivatives of the log-likelihood function with respect to the parameters; that is, -E(∂² log L / ∂θ ∂θ'), where θ is the vector of unknown parameters. Under certain regularity conditions, it can be shown that the maximum likelihood estimates are consistent and asymptotically normal, with a variance-covariance matrix which is equal to the inverse of the information matrix.
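The information-matrix definition just given also equals the expected outer product of the per-observation scores (the identity noted below), which suggests a simple numerical stand-in for the analytic derivations in this chapter: approximate each observation's score by central differences and sum the outer products. The sketch below is ours, not the dissertation's method (which evaluates the expectations analytically); near-singularity of the resulting matrix signals the identification problems discussed in the following sections.

```python
# Numerical stand-in for the information matrix: sum of outer products of
# per-observation scores, with the scores approximated by central differences.
import numpy as np

def numerical_score(loglik_obs, theta, h=1e-5):
    """Central-difference gradient of one observation's log-likelihood at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        g[k] = (loglik_obs(theta + step) - loglik_obs(theta - step)) / (2.0 * h)
    return g

def outer_product_information(loglik_obs_i, theta, n_obs):
    """J = sum_i s_i s_i', where s_i is the score of observation i evaluated at theta."""
    J = np.zeros((len(theta), len(theta)))
    for i in range(n_obs):
        s = numerical_score(lambda t: loglik_obs_i(t, i), theta)
        J += np.outer(s, s)
    return J

# Identification and efficiency checks:
#   np.linalg.matrix_rank(J) < len(theta)  -> parameters not (locally) identified at theta
#   np.linalg.inv(J)                       -> asymptotic variance-covariance matrix
```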
Therefore, through the information matrices which we derive in this chapter, we can compare the variances and covariances of the parameter estimates in different cases. That is, we can measure the efficiency lost for lack of full observability, which can be called the cost of partial observability. Note that [- E(azlog L/aoao')] 'E [(3 109 L)(3 109 L): ] The latter formula will be used for all cases in this chapter. We also consider the problem of identification of the parameters of the model, under various levels of observability. Since (given certain regularity conditions) a necessary and sufficient condition for local 21 22 identification is non-singularity of the information matrix, we examine the rank of the information matrices which we present. For all levels of observability which we consider, the parameters are identified except in certain special (perverse) cases, whiCh we point out. The question of identification is important in its own right, but we are also interested in it because it is relevant for efficiency comparisons. The closer the information matrix is to being singular, the larger are the variances of the estimates. In the next chapter, we will compare efficiencies by evaluating the inverse of the information matrices, for various specific parameter values. Knowledge of the perverse cases which lead to non-identi- fication will help us in picking regions of the parameters space to investigate. Section 3.2-3.5 contain derivations of the information matrices and discussions of identification for the different cases listed in the last chapter. Section 3.6 gives a summary of this chapter. 3.2 Case One: Full Observability In order to simplify the notation, we let F]. = min], x132; o) i=1,2,...,N a1 = Xi81 b1 = X182 0 = (81'. 82', o)‘ . Then the log-likelihood function for the full observability case is N in L(0) = S {y}.1 yi2 In F1 + yi](1-y12) 2n[ °(-) denotes the standard normal density function, fi=f(xi81’ X182; 0) denotes the standard bivariate normal density function and A1 ___ «17:2 “’1' ‘ “i" B. = 1 (a. - ob.) . ' «TIT? ‘ ' If p is given a priori, the equation (1) is still the same. But 6 now is (31', 82')'instead of (81', 82', o)‘ and the information matrix is (2K)-(2K). If the two probit equations are separately estimated, the log-likelihood function of the first equation is N In L (81) = g {yi1 2n 0(61) + (1'yil) 2n ¢(-ai)} and the information matrix of the first probit equation is N J= 2. M(a1.) M(-a1.)X1.'X1. 1 where M(ai) = 25 Using the same method, we can get the information matrix of the second equation. Separate estimation will not be as efficient as joint estimation unless o=0. Here the information matrix for joint estimation with full observability is the same as given by Amemiya (1974) using the FIMC (Full Information Minimum Chi-Square) Probit method. When the probit equations are estimated separately, the information matrix is the same as his using LIMC (Limited Information Minimum Chi-Square) Probit Method. Under certain conditions, the information matrices of other (partial observability) cases will be singular. But this is not the case here. For full observability, the parameters are identified, except of course in the case of perfect multicollinearity. Perfect multicollinearity is also a perverse case for all of the levels of observability which we consider, but we will not discuss it further. 
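Finally, the sample-split quantities that organize the tables (P̄_11, P̄_10, P̄_01, P̄_00 and, for case four, H̄) can be reproduced for any parameter point with a short script. The sketch below is illustrative only: it uses its own draw of the exogenous variable and the β_1 = β_2 = (-cX̄, c)' design of Experiment 2, so its numbers will not match the tables, which are based on the dissertation's particular sample of 50 standard normal deviates.

```python
# Illustrative computation of the average cell probabilities ("sample split") and of
# H-bar, the average probability that both parties want "no" and party one's
# sentiment is the lower (so party one is seen vetoing first in case 4B).
import numpy as np
from scipy.stats import norm, multivariate_normal

def sample_split(X, beta1, beta2, rho):
    mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
    a, b = X @ beta1, X @ beta2
    F = np.array([mvn.cdf([ai, bi]) for ai, bi in zip(a, b)])
    p11 = F.mean()
    p10 = (norm.cdf(a) - F).mean()
    p01 = (norm.cdf(b) - F).mean()
    p00 = (1.0 - norm.cdf(a) - norm.cdf(b) + F).mean()
    rho_p = -np.sqrt((1.0 - rho) / 2.0)                    # rho' from case 4B
    mvn_p = multivariate_normal(mean=[0, 0], cov=[[1, rho_p], [rho_p, 1]])
    H = np.mean([mvn_p.cdf([(bi - ai) / np.sqrt(2.0 * (1.0 - rho)), -bi])
                 for ai, bi in zip(a, b)])
    return {"P11": p11, "P10": p10, "P01": p01, "P00": p00, "H": H}

# Experiment-2-style design: beta1 = beta2 = (-c*Xbar, c)', so that sum_i X_i*beta_j = 0.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.standard_normal(50)])
c = 1.0
beta = np.array([-c * X[:, 1].mean(), c])
print(sample_split(X, beta, beta, rho=0.0))
```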
3.3 Case Two: Partial Observability in the Sense of Poirier The log-likelihood function of Poirier's partial observability model is N in L(o) = a {2i 2nFi + (1-Zi) 2n [l-FiJ} and the information matrix is 1 J=c‘c s 5 "“(2) where C5 is the N r(2K+1) matrix with ith row equalling VFizW-Fi; [¢(ai) 9(A11X1, ¢(b1) ¢(Bi) Xi, f1] , When p is equal to zero, the ith row of C5 becomes 1 76(ai)¢(bi)[1-¢(ai)¢(bi)] [¢(a1-)(b1-)Xi, ¢(b1.)(a)<1>(A)X,-2 ¢(b)¢(B) L¢(b)o(B)X1-z L f ..- It can be seen that in the information matrix, the first row x 1% = the third row, ¢(a)¢(A) the first row x-—-¥:————-= the last row, ¢(a)¢(A) and the second row x 3i9131§l-= the fourth row. { ¢(a)¢(A) Thus the rank of the information matrix is only two; the parameters are not identified. This is so even if p is known a priori. The second perverse case we consider, which was noted by Poirier, is the case in which B1=82. (That is, B11=82], 812=822.) Then bi=ai’ Bi=Ai and the 28 U . . O 1 1nformat1on matr1x1s J: 2 (F). (F ).' i Fi(l-Fi) <3 1 0 1 where MaiMAi) ' (F ). = 4’(‘"i)"’(”'i)xiza ¢(ai)¢(Ai) ci(- ai+bi 1 x.' 381 720-9) /2(1+p) 72(1-6) ' _3_H_i. = 4%in ., ai+bi ) 1 .1 as2 /2(1-p) /2(1+o)' 72(1-6) ' 91 3H. b -a. a.+b. b.-a. —1 = fly—L— [¢(_L_‘_)¢(- .1.__‘_) ‘ ') .Bp 72(1-6) 72(1-9) v/2(1+p) 1-o bi-a. __._.-.. - f(———l-. - 6.; -/<'I'-o)_/2)1 /2(l-o) 1 The derivations of these three derivatives are in Appendix A. Identification problems still occur under the assumptions that 811 E'21 Xi - [l X12], 5] - [B12] and 82 - [822] as before. When B12 - 322 - O and p is unknown, the information matrix of the case of observed veto with a 37 given probability p is of the form :9 ' N 1 (F ) (F ) ' + -l'(Q ) ( 1 ' +'l- ( ) ( ) '} ‘ §=1 {‘f' o i o i Q1 10 i 010 i 02 Q26 i 026 i where - ptl- ¢(a)1 + (l-p)[¢(b)-F] - p[¢(a)-F] + (l-p)[l-¢(b)l O O N .—I l I (¢(a)¢(A) ‘ (at)i>(A)Xi2 ¢(b)¢(8) j_¢(b)i(B)Xi J f r¢(a) l-p-(l-p)¢(A)J ¢(a) I-p-(l-p)¢(A)]Xi2 'Qie'i "' (l-p)¢(b) n- wen (l-p)¢(b) [l- ¢(B)]X1-2 - (l-plf and ’p¢(a) u- ium W p¢(a) [l- (A)1xi2 (029). 3 ¢(b) I-(l-pl-p i(B)] ¢(b) I-(l-pl-p ¢(B)]X1.2 -pf b It can be seen that for the whole information matrix (the first row x q1) + (the third row x q2) = the last row 38 where = (l-pr_ q 1 ¢(a) [(l-p)¢(A)+p¢(B)l q = 9f 2 ¢(b) [(l-p)i(A)+p¢(B)] So the information matrix of the case of observed veto with given probability p is singular, and neither 51 nor 62 can be identified. However this is true only when p is unknown. If p is known and o=(81', 82',)', the information matrix is not singular when 312 = 322 = O. . In the case of observed veto with p as a parameter, the information matrix is still of the form as before N = 1 l- I 1 I 1 I but 6 now includes p as (81'. 82', p, p)‘ and (Fo)i’ (Qlo)i and (029), are all matrices with dimension 6:1.' The extra rows of (Fo)i’ (Qlo)i and (020), are 0, $L-[1- ¢(a)- 6(b) + F] and fil-[l- 6(a) - o(b) + F] respectively. For this 1 2 6-6 information matrix, besides the relationship that (the first row x q1) + (the third row x q2) = the fifth row where q1, q2 are the same as before, it is also true that [the first row x ¢(b)¢(8) - the third row x ¢(a)¢(A)] -[l- i(a) - ¢(b) + F1 [9 ¢(a) ¢(b)¢(8) + (l-p) 6(a) ¢(b)o(A)l X = the last row. 39 Therefore for the case of observed veto with p as another parameter, when 812 = 822 = O the information matrix is singular whether p is known a priori or not. Another way to say this is that this model, B] and 82 are identified only if both p and p are known. 
Finally we come to the question of identification for the case of observed veto under the assumption that the first to use the veto is the one who wants it worst. When 812 = 822 = O and p is not known, the information matrix is J .—. 1 domz 1 . 1 . 1 . {F(F0)i(Fe)' +R1—(Rle)i(Rle)i +R'2‘(R26)1(R26)i} where (Fo)i is the same as above, R1 = 0(b) - F + H R2 = l - 4(b) - H ' - ¢(a)i>(A) + H 1 B1 [- ¢(a)¢(A) + H811 x,2 - H8] [- H811xi2 - f + Hp W (Rlo)i = - H8] ('”61)Xi2 "s - ¢(b)(A)Hp - fHB1 o(a)¢(b)¢(A)¢(B)-[6(a)¢(A)+¢(b)¢(B)]HB] So for this case, the information matrix is singular when 812 = 822 = O and p is not known, but not so when p is known. Intuitively, when the coefficients of all the exogenous variables except the constant term are equal to zero, there are three distinguishable events (yes, no with party one vetoing, no with party two vetoing). Hence two probabilities are independently estimatable and there are 2K pieces of infor- mation available. Thus we can identify at most 2K parameters. From this point of view we can see that the informationmatrices in the first and the third situations are singular because there are (2K+1) parameters when p is not known. But all the parameters can be identified if p is given. The second case (which is the one with p as andther parameter) can't be identified unless both p and p are known. 41 3.6 Summary In this chapter, six different information matrices have been derived corresponding to joint estimation of the bivariate probit model with varying degrees of observability. Some information matrices for separate estimation of the two equations are also derived, but separate estimation won't be as efficient as joint estimation unless p=O. Because it is the inverse of the information matrix that is the asymptotic variance — covariance matrix of parameters, we also discuss question of identification by analyzing the rank of these matrices. We especially analyze the perverse case in which all the coefficients of exogenous variables are equal to zero. It is found that in the above situation, regardless of the values of the constant terms, all of the information matrices for the partial observability cases are singular if p is not known. When p is known a priori, only the cases of partial observability in the sense of Poirier and of observed veto with p as another parameter still suffer from a lack of identification. The model with full observability is still identified in this case, whether a is known or not. The second perverse case is the one in which 81 = 82; that is, the two probit equations are identical. Then the model with partial observability in the sense of Poirier is not identified, though there are no problems with the other cases. The third perverse case is when 8]] = p321 and 312 = ”822' The model with partial observability in the sense of Poirier and the partial partial observability case are not identified when p is unknown. But there are no problems when p is known a priori. In the similar situation when 821 = pen and 822 = 9812, only the model with partial observability in the sense of Poirier is not identified when p is unknown. 42 These results will be useful in picking parameter points at which to evaluate the cost of partial observability. This we will do in the next chapter. CHAPTER FOUR RESULTS OF EXPERIMENTS MEASURING THE COST OF PARTIAL OBSERVABILITY 4.1 Introduction In this chapter, the results of some experiments measuring the cost of partial observability are presented. 
We call these "experiments" because various specific values of the parameters have to be picked in order to evaluate the inverse of information matrix. What we are interested in is primarily (l) with the same values of the parameters, a comparison of efficiencies under different levels of partial observability; and (2) the reasons that will cause the change of cost for each individual level of partial observability. With this knowledge, a researcher can compare costs of using and not using a piece of information according to his case and make a better decision. When measuring the cost of partial observability, we let the elements of the inverse of the information matrices for the various partial observability cases be divided by the corresponding elements of the inverse of the information matrix for case one. The reason for choosing case one as a standard of comparison is because it has the most complete observability and therefore leads to the most efficient estimates. Thus the ratios that . we get are the ratios of asymptotic variances and covariances for the various partial observability cases compared to the full observability case, and represent the relative efficiencies of parameter estimates under different levels of observability. The bigger the ratios are, the greater the cost of partial observability. 43 44 Some simplifications made in the last chapter are still applied here. 8 B 11 _ 21 I I s B - [ = e'x13 and the X1.3 are random normal deviates with zero mean and unit Specifically, we assume Xi=(l Xi2)’ s] = ] , where Xi variance. All the experiments in this chapter have been done with a sample size of 50. We also tried sample sizes of 10, 100 and 500, but the results didn't show any significant difference. This simply indicates that the ratios of asymptotic variances are more or less independent of sample size. We do not address the question of what sample sizes are necessary for the asymptotic distributions to be reliable. In section 4.2, we first try three arbitrary cases B1=32=[}]; 81=82=[?]; 6]=[J]] and 32=[}]. We present some general results which we believe are always true for any values of 8. In section 4.3, all the experiments have been done with one common characteristic, namely that g X131 ==§X132 = O (ensured by appropriate choice of value of the constant term). The idea of these experiments is to show the effects of changes in the parameters (8], 82, p and p) when (on average) each party has an equal probability of saying "yes" and "no". These results are not as readily interpreted as we might hope, since changing any parameter (e.g. 812) changes a number of features of the data and model which are relevant to the relative efficiencies, such as the degree of identification, the split of the sample into the various distinguishable outcomes, and so forth. Therefore in section 4.4 we try some more complicated experiments in which we manipulate the parameters in such a way as to isolate these various influences. For the most part these results are in accord with our a priori expectations. Section 4.5 gives the summary of this chapter. 45 Twenty-two tables at the end of this chapter present the results of the experiments. Only the ratios of asymptotic variances of the parameter estimates of the non-constant-term exogenous variables (B12 and 322), for different levels of partial observability, are listed and discussed. 
Except for those experiments especially designed to focus on p and p, all of the experiments have been done for p=O and p=0.5, and p is fixed at 0.5 for case four. As before, case one is the full observability case; case two is the case of partial observability in the sense of Poirier; case three is the partial partial observability case; case 4A1 is the observed veto case given a known value of p; case 4A2 is the observed veto case with p as another parameter; and case 48 is the observed veto case under the assumption that the party who uses the veto first is the one that wants the veto more strongly. Also we define N “—'= Z F(X-B . X.B ; p)/N P1] i 1 l 1 2 N 10 1 N 01 1 N L 13‘“? [l- ¢(Xie]) - <1(X1-82) + F(xis]. X182;p)]/N OO 1 N X.B -X B s_____._ g = z P(l—Z—i—l. 41.32; - /(1-p)/2)/N. l /2(1-o) They are the average probabilities of: both parties say "yes"; party one says "yes" and party two says "no"; party one says "no" and party two says 46 "yes"; both parties say "no"; and party one uses veto first given that both parties desire to do so; respectively. The distribution of the first four probabilities are called "sample split" and all the probabilities are listed in some of the tables. Also in the tables, "p is not known" means that p is not given but rather is estimated as a parameter, and the information matrices are of dimension (2k+l)°(2k+l) (or (2k+2)-(2k+2) for case 4A2). 4.2 General Results of Some Basic Experiments The experiments presented in this section are called "Experiment 1" in order to be distinguished from the others in section 4.3 and 4.4. Experiment 1 includes three different B's. They are (l) B] = 82 = [1]; (2) B1 = 82 = [?1; 1]. For the first two choices of 3, Poirier's and (3) a] = [3,1 and 82 = I model (case two) is not identified because X181 = X182 for all i. The other cases are all identified. - Table 1 lists the (expected) sample splits for each case under both p=O and p=0.5. Table 2 shows the relative efficiencies of B12 and 322 for each partial observability case. The results in Tables 1 and 2 are not easy to summarize, but we do note the following. (1) When 81 = 82 = [:1, Xie1 and Xie2 are all positive numbers greater than 1, so the average probability of both parties voting "yes" is close to 1, and the average probability of both parties voting "no" is very small. Since X18] = X162, P76 = 01; the average probability of one party voting "yes" and the other voting "no" is the same as the probability of the opposite situation. Both probabilities are small. This is a fairly extreme sample split.. (2) (4) (5) 47 When 81 = 82 = [1] changes to B] = 82 = [$1, both the values of XiBl and xiBZ are decreased, so the average probability that both parties will say "yes" is decreased too. On the other hand, the average probabilities that one or both parties say "no" are all increased. The samples split is still heavily weighted toward fboth yes", but less extremely than before. Knowing p results in smaller ratios of asymptotic variances, and hence reduces the <:ost of partial observability. This agrees with the general principle that we will call the "law of decreasing marginal utility of information" (LOMUI), which is that the more information one has, the less should be the value of another piece of information. Observability information is less valuable when p is known a priori than when it is not. The Poirier case is the worst among all the partial observability cases either when c>is unknown or known, because it is the one that is based on the least information. 
For case three, the ratios of asymptotic variances for estimates of 312 are equal to one for p=0 and are still very close to one for p=O.5, for all values of B. This is so because party one's behavior is fully observed in this case, and when p=0 party two's behavior is not informative for party one's parameters. This is not true however when pfO. For 82 (party two's parameters), the ratios are bigger than any of those for case four when p is unknown. When p is known, they are bigger than those for case 4A1 or case 48 but smaller than those for case 4A2. (6) (7) (8) (9) 48 When Kip] = XiBZ and p = 0.5, 812 and 822 have the same efficiencies for all three different possibilities of case four. The estimates for case 4A2 are less efficient than for case 4A1, since in case 4A1 more is known (namely p). In general the gain from knowing p is greater when p is known (and conversely). This is a counter- example to the LOMUI. For case 48, H'is the average probability of XiBl + 21] < X162 + €i2 given that X181 + e1] < O and Xis2 + £12 < 0. So when Xis1 = X132, H'= %-P66 . In the case of 31 = [1]] and 32 = [1], xis1 < X132, so that party one is more likely to use the veto first when both sides vote "no", and thus H'> %-PBB . Furthermore, the average probability of the indistinguishable outcome is P6;'+ H for party two while it is P}; + P66 - H for party one. Hence 822 tends to be less efficient than 812 in case 48 when Hi> %-PBB , as it is here. With X161 = X182 and p = 0.5, the efficiencies of 812 and 822 in case 4A1 are very close to those in case 43. This is so because when X181 = xiBZ’ the probability of X181 + 811 < XiB2 + 612 is indeed 0.5. The above conclusions will be seen to hold also in the experiments yet to be presented,and thus are fairly general. (Some of them, of course, are perfectly general since they must be true.) We will not discuss them further. There is some evidence above on the effects of changes in p. However. this will be discussed in the next section. 49 4.3 Results of Further Experiments In this section we report the results of more experiments, which vary 8], 82, p and p more widely than in the last section. All of the experiments in this section have in common the feature that g X131 = g X132 = O. The point of this is to try to minimize effects of parameter changes on the sample split. For example, consider an experiment which varies 812 from zero to some high level. As B12 increases, for some of the partial observability cases the degree of identification increases and we might expect the cost of partial observability to fall. (We might call this an "identification effect".) However, if we change 812 while holding 8]] constant, the probability of a "yes" vote also changes, and thus the sample split changes. This also may affect relative efficiencies; for example, as P;;' increases there is fuller observability for all cases and the relative efficiencies of the partial observability estimators should increase. (We might call this a "sample split effect".) Therefore in this section when 812 is changed, 811 is also changed in such a way that XiB1 is (on average) zero: E X181 = 0, and similarly for 82. For p=O and small B's, this will yield a symmetric sample split: P66 = P6; = P;6'= PT;'= i-. For larger B's this is unfortunately not so. Because of the non-linearity of the model, the average probabilities (over the sample) are not the same as the probabilities evaluated at the average of Xi. The latter will all equal %~(for p=O, anyway) but the former will not. 
Thus we are not entirely successful in separating identification effects from sample split effects. Indeed, it is not clear how well one can hope to do so, but some more successful attempts will be made in the next section. 50 Several types of experiments are presented in this section: (1) Experiment 2: B1 = 82 = ['CCX] for c = 0.3, 0.5, l, 2 and 3, -X. where X ‘3 e = 1.525 , l x. = 1 '2 i II II M 2 II M Z i X1.3 , i=1,2,...,N, are random numbers with standard normal distribution havingii=0 and o=l. The same X's will be used for all the experiments here, and the sample size is 50. The results of Experiments 2 are in Tables 3, 4 and 5. (2) Experiment 3: B] = [C_E] and 32 = ['ccx] for c = 0.3, 0.5, l, 2 and 3. The results are in Tables 6, 7 and 8. . . - g 4X (3) Exper1ment 4A. 3] - 32 . [1.0]; Experiment 4B: 3] = [_]¥0] and 82 [;¥0] s both for p=-O.5, O, 0.2, 0.5 and 0.8. The results are in Tables 9, 10, 11 and 12. ' o = = -7. (4) Exper1ment 5A. 31 32 [1.0]; [-1161 and B2 . , -7' Exper1ment 58. B1 [1.0], both for p=O, 0.2, 0.5, 0.8 and l (but for cases 4A1 and 4A2 only). The results are in Tables 13 and 14. It is hard to summarize so many tables briefly. However, we will discuss what we find to be the most interesting results. For experiment 2, we know that case two is not identified for any value of c because X181 = xiBZ’ i=1,2,...,N, and case three (82 only) and case 51 four are not identified if c is zero when p is not known. If p is known only case 4A2 is not identified when c is zero. As c increases, the "identification effect" should be to increase relative efficiencies of the partial observ- ability cases. Meanwhile, we notice that the average probabilities of the indistinguishable outcomes, namely (Pa;'+ P66) for 32 of case three and P66' for case four are getting larger when c increases. That is, the identification effect and the sample split effect work against each other when c changes. From the results when p is not known, we can see that when c is smaller (0.3, 0.5) the identification effect is quite strong so that the ratios of asymptotic variances are decreasing as c gets larger for both p=O and p=O.5. But when c increases to a certainlevel (261:3), these two effects seem to cancel each other out and the results are less clear. For the p=O case, whether these ratios will increase or decrease after c > 2.0 is uncertain. When p = 0.5, although ratios are monotonically decreasing as c increases from 0.3 to 3.0, whether they would keep on increasing or not can't really be predicted. When p is known, the relative efficiencies of case three (62 only) and cases 4A1 and 4B are generally decreasing as c increases, presumably because of the sample split effect. For case 4A2, since it still has an identification problem if c is too close to zero, relative efficiencies are increasing as c increases until c=3. In Experiment 3, all of the parameters are identified for all cases. However, for c=0 cases 2 and 4A2 would not be identified, while cases 3, 4A1 and 48 would be identified only with p known. As for the sample split effect, the average probability of the indistinguishable outcome for case two, which is (1-P1 ), is increasing with c but those of case three (82 only) 52 and case four are decreasing as c increases. Thus we are going to discuss the relative efficiencies of the partial observability cases one by one. For case two, the relative efficiencies of both 5] and 32 improve as c increases from 0.3 to 3 for either p=0 or p=0.5, and whether p is known or unknown. 
This shows that the identification effect is very strong for case two, which is not surprising. For 82 of case three, when p is not known, the identification effect plus the sample split effect make the relative efficiency increase as c increases both for p=0 and p=0.5. When o is known, the sample split effect alone makes the relative efficiency increase for both o=0 and p=0.5 after c=0.5. For case 4A1, the effect of changing c is ambiguous. Note that case 4A1 yields much more efficient estimates than case 4A2, especially when c is small. The same is true in comparing case 4A1 to case 48 when p is unknown. When p is known, cases 4A1 and 48 yield rather similar efficiencies. For case 4A2, for either p=O or p=0.5 and whether p is known or not, the identification effect and the sample split effect both make 8] and 82 relatively more efficient as c gets larger. Therefore, the efficiency relative to full observability is monotonically increasing all the way from c=0.3 to 3. Since case 48 has identification problems only when p is not known, its relative efficiency improves dramatically when c increases, when p is unknown. When p is known, the effect of increasing c is ambiguous. Experiment 4 shows the effect of the correlation coefficient p. For Experiment 4A with p unknown, the relative efficiencies of all cases (only 32 for case three) get worse as p increases. The changes are considerable 53 especially for bigger p. When p is known, the result is mostly the opposite. Most ratios get smaller as p increases, but these are mixed results for case 4A2. However all the changes are much smaller; this is, 0 doesn't matter much if it is known. For Experiment 48 with p unknown, relative efficiencies of all cases (only 82 for case three) except case 4A1 improve as p increases. The changes are eSpecially big when p increases from -O.5 to O. The ratios for case 4A1 are very small compared to other cases, but they increase as p increases. When p is known, the relative efficiencies of case two and case 4A2 improve but cases three (82 only), 4A1 and 48 get worse as p increases. All the changes are also smaller when p is known. For both Experiment 4A and 4B, the relative efficiency of 8] in case three, being affected by the correlation with another unobservable party, gets away from 1 as the absolute value of p increases. Experiment 5 shows the effect of p on cases 4A1 and 4A2. For case 4A1 and 4A2, the average probability of the indistinguishable outcome for party one is P16 + (l-p) 566' and is 501. + p PBS’ for party two. So as expected, increasing p decreases the ratios for a] but increases them for 32, and the effects are very strong. Both experiments 5A and SB have the same results for cases 4A1 and 4A2 whether p is known or not, but the results are mixed for case 4A2 in Experiment 78 when p is known. 54 4.4 The Results of Experiments with Either the Identification Effect or the Sample Split Effect Constant From the results of the last section, we can see that because of the mixture of the identification effect and the sample split effect, sometimes we can not really tell the direction of change of the relative efficiency when a parameter changes. Therefore, in this section, we try some other experiments designed to change one effect while holding the other constant. All the experiments are done by adjusting the values of the constant terms to manipulate the sample split. 
Since different cases depend on different features of the sample split, we do a different experiment for each partial observability case. There are three types of experiments in this section: (1) Experiment 6A: 31 = [_g] and 32 = [g] for c = 0.3, 0.5, l, 2 and 3; d is adjusted so that qu'is fixed at 0.25. Experiment 68: 81 = [_i] , 82 = [g] and c=l; d is adjusted so that P?— —l varies from 0.15, 0.25, 0.35, 0.45, 0.55 to 0.65. These are experiments designed for case two and results are in Tables 15 and 16. (2) Experiment 7A: a] = 32 = [‘3] for c = o 3,0.5, 1, 2 and 3; d is adjusted so that P01 + P00 = 0.50. Experiment 7B: 31 = ['31] and 52 = [’32] for c = 0.3, 0.5, 1, 2 and 3; d1 and d2 are adjusted so that P0] = P00 and both are fixed at 0.25. 55 Experiment 7C: 81 = 82 = [-3] and c = 1.0; d is adjusted so that PET" + Paa'varies from 0.20, 0.30, 0.40, 0.50. 0.60 to 0.70. ° - = -d] = -d = ' Exper1ment 7D. 81 [ c-]’ 82 [ c2] and c 1.0, d1 and d2 are adjusted so that P6? = P66' and both vary from 0.10, 0.15, 0.20. 0.30 and 0.35. These four experiments are designed for case three. Since 81 has all the ratios very close to one, only the results for 82 are listed and they are in Tables 17, 18, 19 and 20. (3) Experiment 8A: 3] = 32 = ['2] for c = 0.3, 0.5, l, 2 and 3; d is adjusted so that P66 = 0.25. Experiment 88: s] = 32 = [’3] and c = 1.0; d is adjusted so that P66 varies from 0.15, 0.25, 0.35, 0.45, 0.55 to 0.65. These are experiments designed for case four and results are in Tables 21 and 22. We get the following conclusions from the results of these experiments: From the results of Experiment 6B, 7C, 7D and 88, it can be seen that sample split effect does affect the relative efficiencies of partial observability cases. The higher the average probability of the indistinguish- able outcome for each case, the worse the relative efficiency and higher the cost of partial observability. This result holds under different situations 56 concerning p. Thus, in Experiment 68 (Table 16), increasing PT; increases the efficiency of case two relative to case one. In Experiments 7C and 70 (Tables 19 and 20), increasing PB;'+ P66 decreases the efficiency of case three relative to case one. In Experiment 88 (Table 22), increasing PBB' decreases the efficiency of case four relative to case one. All of this is as expected. Experiments 6A, 7A, 7B and 8A attempt to investigate the identification effect while holding the sample split effect constant. This leads to results that are less clear-cut than those just reported. Basically, identification effects are strong and predictable near points of singularity, but less so far from points of singularity. Experiment 6A investigates the identification effect for case two. The probability of the indistiguishable event for case two are l-PYTL so we hold - the sample split effect constant by holding constant PT;'= 0.25. Lack of identification occurs if c=0 regardless of whether p is known or not. Therefore the efficiency of case two relative to case one is expected to increase (and the entires in Table 15 to fall) as c increases. The results in Table 15 show that mostly they do. The exceptions occur when c is big (c=3) and p is known, which are far from points of singularity. Experiments 7A and 7B investigate the identification effect for case three. Since the probability of the indistinguishable event for case three is (P01 + P00), we attempt to hold the sample split effect constant by -1 . —=—___l . 
holding P̄₀₁ + P̄₀₀ = 1/2 (Experiment 7A) or P̄₀₁ = P̄₀₀ = 1/4 (Experiment 7B). Lack of identification for case three occurs when c=0 and ρ is unknown. Therefore, when ρ is unknown, we would expect the efficiency of case three relative to case one to rise (and the entries in Tables 17 and 18 to fall) as c increases. This does occur when c is small, but for c > 1 the opposite occurs. In other words, the identification effect shows up only when the model is close to non-identification. The same phenomenon occurs when ρ is known. Then the parameters are identified for all c, and the relative efficiency for case three falls monotonically except for very small c.

Experiment 8A investigates the identification effect for case four. The probability of the indistinguishable event here is P̄₀₀, so the relevant portion of the sample split is P̄₀₀, which we hold constant as c changes. Lack of identification occurs when c=0; for case 4A2 this is so regardless of whether ρ is known, while for cases 4A1 and 4B this is so only if ρ is unknown. The results in Table 21 are fairly predictable. Wherever identification is relevant (all cases when ρ is unknown, but only case 4A2 when ρ is known) the efficiency of case four relative to case one rises (entries in the table fall) as c increases. For cases where identification is not relevant (cases 4A1 and 4B when ρ is known) relative efficiency first rises and then falls as c increases.

4.5 Summary

In this chapter, we have conducted a large variety of experiments to measure the cost of various levels of partial observability. The results have been given in some detail in the preceding sections. Here we will give a brief summary of the most important conclusions.

The cost of partial observability is quite high. The estimates from Poirier's model (our case two) typically have variances tens or hundreds of times as large as the estimates from the model with full observability (our case one). This cost decreases markedly if any piece of observability information can be found; for example, observability for either party (our case three) or an observed veto (our case four). The law of diminishing marginal utility of information usually holds: the gain in moving from case two to case three or four usually exceeds the gain in moving from case three or four to full observability (our case one). It is the first piece of observability information which is most important.

A second clear conclusion is that specifying ρ a priori improves the efficiency of the estimates of the other parameters a great deal. Furthermore, the improvement from knowing ρ is largest when it is needed most; that is, when the relative efficiency is lowest.

A third conclusion is that the sample split has a strong influence on the relative efficiencies of the estimates. For a given partial observability case, its efficiency relative to full observability will be higher, the smaller the proportion of observations which fall into the indistinguishable categories. Thus, for example, for Poirier's model relative efficiency will be high only when most observations are of the "yes, yes" variety. The fraction of such observations is observable. Similarly, in our case three the observations which reduce relative efficiency are the ones for which the observable party votes "no", and the proportion of such observations is also observable. On the other hand, for the observed veto cases the relevant proportion of observations is not directly observable.
Our last main conclusion is that the strength of identification matters. All of the partial observability cases are unidentified for some perverse parameter points (as described in Chapter 3). Their relative efficiency is naturally low for parameter values near such points, and it increases rapidly as the parameters move away from such points of singularity. However, these effects are not strong except in the immediate neighborhood of points of non-identification. Furthermore, this last conclusion depends on unobserved parameters, and therefore is less likely to be informative, in practical applications, than the other three conclusions listed above.

TABLE 1
Sample Splits for Different β

β₁ = β₂ = (1, 1)′
  ρ          0.0       0.5
  P̄₁₁      0.9233    0.9320
  P̄₁₀      0.0365    0.0283
  P̄₀₁      0.0365    0.0283
  P̄₀₀      0.0031    0.0113
  R̄        0.0016    0.0057

β₁ = β₂ = (0, 1)′
  ρ          0.0       0.5
  P̄₁₁      0.6846    0.7243
  P̄₁₀      0.1297    0.0900
  P̄₀₁      0.1297    0.0900
  P̄₀₀      0.0560    0.0957
  R̄        0.0200    0.0473

β₁ = (-1, 1)′, β₂ = (1, 1)′
  ρ          0.0       0.5
  P̄₁₁      0.4422
  P̄₁₀      0.0133
  P̄₀₁      0.5182
  P̄₀₀      0.0263
  R̄        0.0192

TABLE 2
Ratios of Asymptotic Variances (Cost of Partial Observability)

                    β₁=β₂=(1,1)′        β₁=β₂=(0,1)′        β₁=(-1,1)′, β₂=(1,1)′
ρ                   0.0       0.5       0.0       0.5       0.0         0.5

ρ is not known
 β₁₂  Case 2          *         *         *         *      17.6810      6.2666
      Case 3     1.0000    1.0117    1.0000    1.0119       1.0000      1.0007
      Case 4A1   5.1284   28.0120    8.7086   41.8127       4.5844      2.8136
      Case 4A2   7.0715   33.2000   10.9572   45.7816       4.7963      2.9075
      Case 4B    5.1225   27.9721    8.6645   41.6975       2.6225      1.8931
 β₂₂  Case 2          *         *         *         *    8857.6250    35637.07
      Case 3     9.3734   52.2389   17.2976   78.4054     178.6166    864.2910
      Case 4A1   5.1284   28.0120    8.7086   41.8127       2.4228      1.9527
      Case 4A2   7.0715   33.2000   10.9572   45.7816      46.7191      3.8118
      Case 4B    5.1225   27.9706    8.6644   41.6975      19.8626     47.1023

ρ is known
 β₁₂  Case 2          *         *         *         *       7.6118      3.7920
      Case 3     1.0000    1.0088    1.0000    1.0073       1.0000      1.0008
      Case 4A1   1.0255    1.0731    1.1135    1.1755       1.0189      1.0273
      Case 4A2   2.9686    6.2613    3.3621    5.1455       4.3046      2.0516
      Case 4B    1.0196    1.0332    1.0694    1.0604       1.0103      1.0067
 β₂₂  Case 2          *         *         *         *     105.3535    191.2173
      Case 3     1.0500    1.1219    1.2118    1.2796       2.1731      5.0474
      Case 4A1   1.0255    1.0731    1.1135    1.1755       1.3622      1.6642
      Case 4A2   2.9686    6.2613    3.3621    5.1455      10.1104      3.7010
      Case 4B    1.0196    1.0331    1.0694    1.0604       1.6767      3.0236

*The information matrix of case two is singular (parameters are not identified).

TABLE 3
Sample Splits for Different c, when β₁ = β₂ = (-cx̄, c)′

ρ = 0.0
  c            0.3       0.5       1.0       2.0       3.0
  P̄₁₁        0.2491    0.2486    0.2542    0.2629    0.2811
  P̄₁₀        0.2297    0.2076    0.1464    0.0993    0.0450
  P̄₀₁        0.2297    0.2076    0.1464    0.0993    0.0450
  P̄₀₀        0.2916    0.3362    0.4529    0.5385    0.6288
  P̄₀₁+P̄₀₀    0.5213    0.5438    0.5993    0.6378    0.6838
  R̄          0.1458    0.1681    0.2265    0.2693    0.3144

ρ = 0.5
  c            0.3       0.5       1.0       2.0       3.0
  P̄₁₁        0.3244    0.3151    0.2982    0.2913    0.2944
  P̄₁₀        0.1544    0.1411    0.1024    0.0709    0.0317
  P̄₀₁        0.1544    0.1411    0.1024    0.0709    0.0317
  P̄₀₀        0.3669    0.4027    0.4970    0.5670    0.6422
  P̄₀₁+P̄₀₀    0.5213    0.5438    0.5994    0.6379    0.6739
  R̄          0.1834    0.2013    0.2485    0.2835    0.3211

TABLE 4
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = β₂ = (-cx̄, c)′ and ρ = 0

  c              0.3        0.5        1.0        2.0        3.0

ρ is not known
 β₁₂  Case 3    1.0000     1.0000     1.0000     1.0000     1.0000
      Case 4A1  91.2821    50.9838    19.9076    23.0626    18.5827
      Case 4A2  103.0591   56.0029    22.1733    23.8038    18.8309
      Case 4B   91.1068    50.7766    19.4382    21.7407    16.8380
 β₂₂  Case 3    205.0555   121.5853   42.5952    52.0272    37.4875
      Case 4A1  91.2821    50.9838    19.9076    23.0626    18.5827
      Case 4A2  103.0591   56.0029    22.1733    23.8038    18.8309
      Case 4B   91.1063    50.7766    19.4382    21.7407    16.8380

ρ is known
 β₁₂  Case 3    1.0000     1.0000     1.0000     1.0000     1.0000
      Case 4A1  1.3747     1.4050     1.7985     3.0932     3.6511
      Case 4A2  13.1517    6.4240     4.0643     3.8345     3.8994
      Case 4B   1.1991     1.1977     1.3291     1.7713     1.9064
 β₂₂  Case 3    1.7174     1.7369     2.3280     4.0159     4.6467
      Case 4A1  1.3747     1.4050     1.7985     3.0932     3.6511
      Case 4A2  13.1517    6.4240     4.0643     3.8345     3.8994
      Case 4B   1.1991     1.1977     1.3291     1.7713     1.9064

*The information matrix of case two is singular in this experiment.

TABLE 5
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = β₂ = (-cx̄, c)′ and ρ = 0.5

  c              0.3        0.5        1.0        2.0        3.0

ρ is not known
 β₁₂  Case 3    1.0008     1.0038     1.0080     1.0095     1.0119
      Case 4A1  267.7677   207.5012   63.1785    61.4412    43.9869
      Case 4A2  281.0156   213.1307   65.5496    62.1682    44.2313
      Case 4B   267.4762   207.1903   62.6215    60.1184    42.3013
 β₂₂  Case 3    527.4460   402.6952   125.7184   129.2031   91.1600
      Case 4A1  267.7677   207.5012   63.1785    61.4412    43.9869
      Case 4A2  281.0156   213.1307   65.5496    62.1682    44.2313
      Case 4B   267.4762   207.1903   62.6215    60.1184    42.3013

ρ is known
 β₁₂  Case 3    1.0016     1.0019     1.0046     1.0064     1.0074
      Case 4A1  1.4345     1.4460     1.7343     2.6848     3.1727
      Case 4A2  14.6908    7.0827     4.1108     3.4136     3.4176
      Case 4B   1.1428     1.1351     1.1760     1.3589     1.4840
 β₂₂  Case 3    1.6862     1.6701     1.9542     2.8662     3.3950
      Case 4A1  1.4345     1.4460     1.7343     2.6848     3.1727
      Case 4A2  14.6908    7.0827     4.1108     3.4136     3.4176
      Case 4B   1.1428     1.1351     1.1760     1.3589     1.4840

*The information matrix of case two is singular in this experiment.

TABLE 6
Sample Splits for Different c, when β₁ = (cx̄, -c)′ and β₂ = (-cx̄, c)′

ρ = 0.0
  c            0.3       0.5       1.0       2.0       3.0
  P̄₁₁        0.2297    0.2075    0.1464    0.0704    0.0450
  P̄₁₀        0.2916    0.3362    0.4529    0.5881    0.6288
  P̄₀₁        0.2491    0.2486    0.2542    0.2712    0.2811
  P̄₀₀        0.2297    0.2076    0.1464    0.0704    0.0450
  P̄₀₁+P̄₀₀    0.4788    0.4562    0.4006    0.3416    0.3261
  R̄          0.1079    0.0914    0.0566    0.0282    0.0205

ρ = 0.5
  c            0.3       0.5       1.0       2.0       3.0
  P̄₁₁        0.3011    0.2667    0.1800    0.0856    0.0558
  P̄₁₀        0.2202    0.2771    0.4194    0.5729    0.6181
  P̄₀₁        0.1777    0.1895    0.2206    0.2560    0.2704
  P̄₀₀        0.3011    0.2667    0.1800    0.0856    0.0558
  P̄₀₁+P̄₀₀    0.4788    0.4562    0.4006    0.3416    0.3262
  R̄          0.1334    0.1130    0.0605    0.0332    0.0255

TABLE 7
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = (cx̄, -c)′, β₂ = (-cx̄, c)′ and ρ = 0

  c              0.3         0.5         1.0        2.0        3.0

ρ is not known
 β₁₂  Case 2    605.5864    518.8082    151.5306   221.9935   75.3350
      Case 3    1.0000      1.0000      1.0000     1.0000     1.0002
      Case 4A1  1.7312      1.5502      1.3647     1.5126     1.6559
      Case 4A2  104.7621    57.8441     10.8805    4.8029     3.8208
      Case 4B   48.9987     22.8783     13.1748    11.8071    10.9172
 β₂₂  Case 2    21751.885   5910.3139   659.7719   78.4286    42.9161
      Case 3    890.1655    411.7187    119.8679   30.1673    19.4758
      Case 4A1  1.8019      1.6830      1.6583     1.7880     1.8296
      Case 4A2  91.1751     48.8784     12.1145    7.1413     5.9189
      Case 4B   64.2327     30.7263     11.6161    6.0176     5.0118

ρ is known
 β₁₂  Case 2    365.3644    313.1222    44.8365    32.2472    22.9143
      Case 3    1.0000      1.0000      1.0000     1.0000     1.0002
      Case 4A1  1.3767      1.3311      1.3634     1.4480     1.5318
      Case 4A2  81.4734     36.9280     10.7038    4.5243     3.4053
      Case 4B   1.1951      1.2293      1.5222     2.5273     3.0136
 β₂₂  Case 2    589.6485    529.1590    59.2528    22.1721    12.8180
      Case 3    2.7547      3.2830      3.1431     2.5740     2.1090
      Case 4A1  1.6010      1.6209      1.5473     1.4188     1.3331
      Case 4A2  71.8316     32.5675     12.0924    6.2325     4.7055
      Case 4B   1.5562      1.8215      1.9029     1.7929     1.5520

TABLE 8
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = (cx̄, -c)′, β₂ = (-cx̄, c)′ and ρ = 0.5

  c              0.3         0.5         1.0       2.0       3.0

ρ is not known
 β₁₂  Case 2    314.6830    312.4838    35.6600   48.1634   27.8839
      Case 3    1.0042      1.0078      1.0089    1.0081    1.0075
      Case 4A1  1.7337      1.4753      1.3893    1.5222    1.6044
      Case 4A2  54.1470     18.4067     5.1199    2.6099    2.5202
      Case 4B   19.8579     10.2136     8.1579    8.5622    8.4649
 β₂₂  Case 2    4389.5386   2022.1393   82.0819   18.1384   14.0965
      Case 3    344.3801    176.7509    36.3043   11.6794   7.9632
      Case 4A1  1.8750      1.7257      1.8023    1.7998    1.7341
      Case 4A2  45.6778     15.6901     6.3376    3.9413    3.7014
      Case 4B   30.8417     15.6040     6.6949    3.9136    3.2999

ρ is known
 β₁₂  Case 2    154.3449    47.5010     18.1745   15.2464   14.2868
      Case 3    1.0049      1.0086      1.0076    1.0047    1.0034
      Case 4A1  1.3572      1.3378      1.3793    1.4578    1.5356
      Case 4A2  35.7942     12.6180     5.1139    2.5252    2.3125
      Case 4B   1.1593      1.2532      1.8462    3.6369    3.8029
 β₂₂  Case 2    297.7500    105.2685    20.4917   8.7703    5.8623
      Case 3    3.1706      4.0090      3.6484    2.7647    2.3078
      Case 4A1  1.7264      1.7192      1.6060    1.4609    1.3779
      Case 4A2  31.4848     11.8300     6.1416    3.5386    2.9358
      Case 4B   1.6245      2.0532      2.1276    1.9355    1.6636

TABLE 9
Sample Splits for Different ρ, when β₁ = β₂ = (-cx̄, c)′ and c = 1.0

  ρ           -0.5       0.0       0.2       0.5       0.8
  P̄₁₁        0.2206    0.2542    0.2701    0.2982    0.3363
  P̄₁₀        0.1800    0.1464    0.1305    0.1024    0.0643
  P̄₀₁        0.1800    0.1464    0.1305    0.1024    0.0643
  P̄₀₀        0.4194    0.4970    0.4689    0.4970    0.5351
  R̄          0.2097    0.2485    0.2345    0.2485    0.2676

TABLE 10
Sample Splits for Different ρ, when β₁ = (cx̄, -c)′, β₂ = (-cx̄, c)′ and c = 1.0

  ρ           -0.5       0.0       0.2       0.5       0.8
  P̄₁₁        0.1024    0.1464    0.1609    0.1800    0.1961
  P̄₁₀        0.4970    0.4529    0.4385    0.4194    0.4032
  P̄₀₁        0.2982    0.2542    0.2398    0.2206    0.2045
  P̄₀₀        0.1024    0.1464    0.1609    0.1800    0.1961
  R̄          0.0441    0.0566    0.0590    0.0605    0.0603

TABLE 11
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = β₂ = (-cx̄, c)′ and c = 1.0

  ρ              -0.5       0.0        0.2        0.5        0.8

ρ is not known
 β₁₂  Case 3    1.0088     1.0000     1.0014     1.0080     1.0166
      Case 4A1  9.5192     19.9076    29.4137    63.1785    225.4031
      Case 4A2  11.5298    22.1733    31.7488    65.5496    227.6452
      Case 4B   9.1134     19.4382    28.9126    62.6215    224.7830
 β₂₂  Case 3    20.0543    42.5952    62.4095    125.7184   351.5529
      Case 4A1  9.5192     19.9076    29.4137    63.1785    225.4031
      Case 4A2  11.5298    22.1733    31.7488    65.5496    227.6452
      Case 4B   9.1134     19.4382    28.9126    62.6215    224.7830

ρ is known
 β₁₂  Case 3    1.0042     1.0000     1.0007     1.0046     1.0106
      Case 4A1  2.0485     1.7985     1.7632     1.7343     1.7078
      Case 4A2  4.0619     4.0643     4.0991     4.1108     3.9623
      Case 4B   1.6420     1.3291     1.2619     1.1760     1.0843
 β₂₂  Case 3    3.1039     2.3280     2.1651     1.9542     1.6801
      Case 4A1  2.0485     1.7985     1.7632     1.7343     1.7078
      Case 4A2  4.0619     4.0643     4.0991     4.1108     3.9623
      Case 4B   1.6420     1.3291     1.2619     1.1760     1.0843

*The information matrix of case two is singular in this experiment.

TABLE 12
Ratios of Asymptotic Variances (Cost of Partial Observability)
when β₁ = (cx̄, -c)′, β₂ = (-cx̄, c)′ and c = 1.0

  ρ              -0.5        0.0        0.2        0.5       0.8

ρ is not known
 β₁₂  Case 2    2358.6990   151.5306   74.2684    35.6600   20.9822
      Case 3    1.0079      1.0000     1.0014     1.0089    1.0232
      Case 4A1  1.3347      1.3647     1.3735     1.3893    1.4071
      Case 4A2  34.7615     10.8805    7.8550     5.1199    3.2795
      Case 4B   25.1054     13.1748    10.7202    8.1579    6.3586
 β₂₂  Case 2    7560.6707   659.7719   288.2825   82.0819   21.5540
      Case 3    323.2625    119.8679   77.3109    36.3043   14.5077
      Case 4A1  1.5246      1.6583     1.7142     1.8023    1.8533
      Case 4A2  35.6887     12.1145    9.1256     6.3376    4.2237
      Case 4B   23.2502     11.6161    9.2417     6.6949    4.8725

ρ is known
 β₁₂  Case 2    193.8613    44.8365    30.0131    18.1745   11.3890
      Case 3    1.0085      1.0000     1.0014     1.0076    1.0168
      Case 4A1  1.3369      1.3634     1.3692     1.3793    1.3910
      Case 4A2  31.9380     10.7038    7.8430     5.1139    3.2787
      Case 4B   1.3655      1.5222     1.6122     1.8462    2.6237
 β₂₂  Case 2    252.0110    59.2528    38.0065    22.1721   11.0502
      Case 3    2.5817      3.1431     3.3396     3.6484    3.8425
      Case 4A1  1.4760      1.5473     1.5679     1.6060    1.6596
      Case 4A2  33.5898     12.0924    9.0963     6.1416    4.0939
      Case 4B   1.6567      1.9029     1.9851     2.1276    2.4247

TABLE 13
Cost of Partial Observability for Case Four, when β₁ = β₂ = (-x̄, 1.0)′

  p              0.0        0.2        0.5       0.8        1.0

ρ = 0.0 but is not known a priori
 β₁₂  Case 4A1  42.5952    37.9504    19.9076   3.9124     1.0000
      Case 4A2  44.5322    38.0962    22.1733   8.3578     3.5517
 β₂₂  Case 4A1  1.0000     3.9124     19.9076   37.9504    42.5952
      Case 4A2  3.5517     8.3578     22.1733   38.0962    44.5322

ρ = 0.0 and is known a priori
 β₁₂  Case 4A1  2.3280     2.1301     1.7985    1.3913     1.0000
      Case 4A2  3.3388     3.8546     4.0643    3.8814     3.2158
 β₂₂  Case 4A1  1.0000     1.3913     1.7985    2.1301     2.3280
      Case 4A2  3.2158     3.8814     4.0643    3.8546     3.3388

ρ = 0.5 but is not known a priori
 β₁₂  Case 4A1  125.7184   120.5843   63.1785   9.2338     1.0080
      Case 4A2  147.9888   127.0938   65.5496   16.4543    3.3842
 β₂₂  Case 4A1  1.0080     9.2338     63.1785   120.5843   125.7184
      Case 4A2  3.3842     16.4543    65.5496   127.0938   147.9888

ρ = 0.5 and is known a priori
 β₁₂  Case 4A1  1.9542     1.9082     1.7343    1.4090     1.0046
      Case 4A2  3.0465     3.8067     4.1108    3.8258     2.9611
 β₂₂  Case 4A1  1.0046     1.4090     1.7343    1.9082     1.9542
      Case 4A2  2.9611     3.8258     4.1108    3.8067     3.0465

TABLE 14
Cost of Partial Observability for Case Four, when β₁ = (x̄, -1.0)′ and β₂ = (-x̄, 1.0)′

  p              0.0       0.2       0.5       0.8       1.0

ρ = 0.0 but is not known a priori
 β₁₂  Case 4A1  42.5952   5.3042    1.3647    1.3192    1.0000
      Case 4A2  71.7093   37.5250   10.8805   3.2471    1.0002
 β₂₂  Case 4A1  1.0000    1.3031    1.6583    7.9127    119.8679
      Case 4A2  1.0008    3.6176    12.1145   40.9526   154.8729

ρ = 0.0 and is known a priori
 β₁₂  Case 4A1  2.3280    1.7456    1.3634    1.1418    1.0000
      Case 4A2  7.6646    14.1330   10.7038   2.2691    1.0001
 β₂₂  Case 4A1  1.0000    1.1850    1.5473    2.2172    3.1431
      Case 4A2  1.0002    2.2614    12.0924   19.1311   5.3093

ρ = 0.5 but is not known a priori
 β₁₂  Case 4A1  20.0543   2.4665    1.3893    1.1930    1.0089
      Case 4A2  23.2379   14.3879   5.1199    1.8668    1.0089
 β₂₂  Case 4A1  1.0088    1.1927    1.8023    5.8469    36.3043
      Case 4A2  1.0089    2.0992    6.3376    18.5893   36.4342

ρ = 0.5 and is known a priori
 β₁₂  Case 4A1  3.1039    1.9304    1.3793    1.1413    1.0076
      Case 4A2  4.5041    10.5578   5.1139    1.6483    1.0076
 β₂₂  Case 4A1  1.0025    1.1938    1.6060    2.4451    3.6484
      Case 4A2  1.0061    1.9620    6.1416    10.5167   3.6696

TABLE 15
Cost of Partial Observability for Case Two, when β₁ = (d, -c)′, β₂ = (d, c)′ and P̄₁₁ = 0.25

ρ = 0.0
  c              0.3          0.5          1.0          2.0          3.0
  d              0.0957       0.1963       0.4508       0.9458       1.4487
 ρ is not known
  β₁₂           1486.7109    618.7743     48.6514      13.8296      6.1085
  β₂₂           121893.56    52071.731    36717.786    27168.080    18131.267
 ρ is known
  β₁₂           552.7849     75.3081      17.4288      5.6840       3.7545
  β₂₂           2524.1335    728.7705     256.3893     146.7752     270.6402

ρ = 0.5
  c              0.3          0.5          1.0          2.0          3.0
  d             -0.0857       0.0447       0.3535       0.9138       1.4409
 ρ is not known
  β₁₂           680.0607     35.2165      11.3308      4.3019       3.2452
  β₂₂           327797.83    159416.21    117079.78    97378.301    62839.36
 ρ is known
  β₁₂           103.4635     15.2357      6.2589       3.1468       2.1256
  β₂₂           1536.1228    641.0490     448.0437     459.6155     1106.0470

TABLE 16
Cost of Partial Observability for Case Two, when β₁ = (d, -c)′, β₂ = (d, c)′ and c = 1.0

ρ = 0.0
  P̄₁₁           0.15          0.35         0.45         0.55         0.65
  d              0.0877        0.7636       1.0680       1.3908       1.7623
 ρ is not known
  β₁₂           108.2441      26.6034      15.8334      9.7708       6.0623
  β₂₂           111910.395    15725.367    7589.4556    3886.9336    2047.7613
 ρ is known
  β₁₂           33.9630       10.6121      6.8617       4.7238       3.2380
  β₂₂           570.5558      148.0751     96.5562      67.6946      49.9944

ρ = 0.5
  P̄₁₁           0.15          0.35         0.45         0.55         0.65
  d             -0.0543        0.6977       1.0258       1.3666       1.7510
 ρ is not known
  β₁₂           17.7333       8.1456       6.1336       4.6763       3.8507
  β₂₂           273997.19     60851.628    34097.547    19487.394    10932.74
 ρ is known
  β₁₂           9.2281        4.7224       3.7254       2.9860       2.3859
  β₂₂           864.2748      277.7323     185.5475     127.9618     89.0390

TABLE 17
Cost of Partial Observability for Case Three, when β₁ = β₂ = (-d, c)′ and P̄₀₁ + P̄₀₀ = 0.50

ρ = 0.0
  c                       0.3        0.5       1.0       2.0       3.0
  d                       0.4003     0.6357    1.1557    2.0572    2.9064
  β₂₂, ρ is not known     206.2943   94.5461   37.6200   43.0889   48.6845
  β₂₂, ρ is known         1.6581     1.6153    1.8644    2.5978    2.9655

ρ = 0.5
  c                       0.3        0.5        1.0        2.0        3.0
  d                       0.4005     0.6356     1.1557     2.0578     2.9064
  β₂₂, ρ is not known     554.9503   340.8961   117.2571   115.8412   125.9888
  β₂₂, ρ is known         1.6471     1.5918     1.7113     2.1069     2.3266

TABLE 18
Cost of Partial Observability for Case Three, when β₁ = (-d₁, c)′, β₂ = (-d₂, c)′ and P̄₀₁ = P̄₀₀ = 0.25

ρ = 0.0
  c                       0.3        0.5       1.0       2.0        3.0
  d₁                      0.4003     0.6357    1.1557    2.0578     2.9064
  d₂                      0.3008     0.4323    0.6820    1.1212     1.5651
  β₂₂, ρ is not known     234.5919   96.6320   58.0045   119.1044   217.5330
  β₂₂, ρ is known         1.6596     1.6206    1.9260    3.1437     6.2018

ρ = 0.5
  c                       0.3         0.5        1.0        2.0         3.0
  d₁                      0.4005      0.6357     1.1557     2.0578      2.9064
  d₂                     -0.0622      0.1009     0.4250     0.9644      1.4679
  β₂₂, ρ is not known     1623.2131   706.0171   503.8493   1265.8145   3283.1157
  β₂₂, ρ is known         1.9167      1.9058     2.3351     4.1795      9.2418

TABLE 19
Cost of Partial Observability for Case Three, when β₁ = β₂ = (-d, c)′ and c = 1.0

ρ = 0.0
  P̄₀₁+P̄₀₀                 0.20      0.30      0.40      0.60      0.70
  d                       0.0630    0.4564    0.8084    1.5275    1.9623
  β₂₂, ρ is not known     18.1194   24.4290   31.3962   42.6311   54.2505
  β₂₂, ρ is known         1.2294    1.3728    1.5702    2.3317    3.0872

ρ = 0.5
  P̄₀₁+P̄₀₀                 0.20      0.30      0.40       0.60       0.70
  d                       0.0630    0.4564    0.8084     1.5275     1.9623
  β₂₂, ρ is not known     80.7077   95.9903   108.5038   125.7936   149.8488
  β₂₂, ρ is known         1.2941    1.4034    1.5364     1.9560     2.3189

TABLE 20
Cost of Partial Observability for Case Three, when β₁ = (-d₁, c)′, β₂ = (-d₂, c)′ and c = 1.0

ρ = 0.0
  P̄₀₁ = P̄₀₀               0.10      0.15      0.20      0.30       0.35
  d₁                      0.0630    0.4564    0.8084    1.5275     1.9623
  d₂                      0.5546    0.5962    0.6376    0.7329     0.7951
  β₂₂, ρ is not known     14.1692   22.4785   35.5819   100.6174   196.7489
  β₂₂, ρ is known         1.2093    1.3642    1.5850    2.5186     3.7620

ρ = 0.5
  P̄₀₁ = P̄₀₀               0.10      0.15       0.20       0.30        0.35
  d₁                      0.0630    0.4564     0.8084     1.5275      1.9623
  d₂                     -0.0085    0.1564     0.2961     0.5513      0.6817
  β₂₂, ρ is not known     89.9929   158.1305   276.4605   1014.1426   2513.5569
  β₂₂, ρ is known         1.3155    1.5315     1.8423     3.2329      5.2786

TABLE 21
Cost of Partial Observability for Case Four, when β₁ = β₂ = (-d, c)′, p = 0.5 and P̄₀₀ = 0.25

ρ = 0.0
  c              0.3        0.5       1.0       2.0       3.0
  d              0.3497     0.5294    0.8933    1.4822    2.0127
 ρ is not known (β₁₂ and β₂₂)
      Case 4A1  89.5407    34.2244   15.5636   14.8929   13.2041
      Case 4A2  100.9828   38.8355   18.0756   16.4319   14.3767
      Case 4B   89.3978    34.0865   15.3714   14.5476   12.6529
 ρ is known (β₁₂ and β₂₂)
      Case 4A1  1.3176     1.2881    1.3605    1.5742    1.8609
      Case 4A2  12.7598    5.8992    3.8724    3.1132    3.0335
      Case 4B   1.1745     1.1502    1.1682    1.2289    1.3097

ρ = 0.5
  c              0.3        0.5        1.0       2.0       3.0
  d              0.1338     0.3202     0.6993    1.3050    1.8420
 ρ is not known (β₁₂ and β₂₂)
      Case 4A1  312.4136   125.3176   54.4721   46.5124   40.1396
      Case 4A2  326.3303   131.2031   57.7210   48.3723   41.5624
      Case 4B   312.2180   125.1313   54.2374   46.1354   39.5681
 ρ is known (β₁₂ and β₂₂)
      Case 4A1  1.3013     1.2777     1.3297    1.4946    1.7272
      Case 4A2  15.2242    7.1676     4.5824    3.3578    3.1530
      Case 4B   1.1058     1.0913     1.0947    1.1170    1.1559

TABLE 22
Cost of Partial Observability for Case Four, when β₁ = β₂ = (-d, c)′, p = 0.5 and c = 1.0

ρ = 0.0
  P̄₀₀           0.15       0.35       0.45       0.55       0.65
  d              0.5452     1.2098     1.5158     1.8394     2.2113
 ρ is not known (β₁₂ and β₂₂)
      Case 4A1  12.3757    17.9619    19.8483    22.7009    31.2475
      Case 4A2  14.8237    20.4170    22.1214    24.6609    32.8082
      Case 4B   12.2678    17.6562    19.3845    22.0101    30.2299
 ρ is known (β₁₂ and β₂₂)
      Case 4A1  1.2250     1.5389     1.7896     2.1554     2.6895
      Case 4A2  3.6730     3.9940     4.0627     4.1154     4.2228
      Case 4B   1.1171     1.2333     1.3258     1.4646     1.6718

ρ = 0.5
  P̄₀₀           0.15       0.35       0.45       0.55       0.65
  d              0.2919     1.0422     1.2098     1.7083     2.0917
 ρ is not known (β₁₂ and β₂₂)
      Case 4A1  47.3035    58.5931    60.0426    66.2930    80.0371
      Case 4A2  50.9623    61.4985    62.7722    68.4357    81.6770
      Case 4B   47.1487    58.2580    59.6427    65.6211    79.0767
 ρ is known (β₁₂ and β₂₂)
      Case 4A1  1.2272     1.4570     1.5383     1.8772     2.2411
      Case 4A2  4.8880     4.3675     4.2734     4.0250     3.8850
      Case 4B   1.0723     1.1213     1.1377     1.2038     1.2785

CHAPTER FIVE

SUMMARY AND CONCLUSIONS

Some recent studies have made use of the bivariate probit model in testing various hypotheses, but with only partial observability about the dichotomous dependent variables.
These studies include Poirier's bivariate probit model using Gunderson's example of the retention of trainees, Farber's research on the demand for union representation, and Connolly's study concerning the decisions to arbitrate or negotiate the contracts between employees' unions and municipalities in Michigan. The maximum likelihood estimators in these partial observability cases will be inefficient compared to those obtained under full observability. But the degree of efficiency loss caused by partial observability is not yet known. Therefore, in this study we present several cases with different levels of observability for the bivariate probit model, and we measure the efficiency loss of maximum likelihood estimators for each case through some experiments. The results that we get give us some idea about the cost of partial observability, and have practical relevance in studies like those above.

In Chapter Two, a formal statement of the bivariate probit model is presented. A general form of the model would be

\[
y_{i1}^{*} = X_i \beta_1 + \varepsilon_{i1}, \qquad
y_{i2}^{*} = X_i \beta_2 + \varepsilon_{i2}, \qquad i = 1, 2, \ldots, N,
\]

and

\[
y_{ij} = 1 \ \text{iff} \ y_{ij}^{*} > 0, \qquad
y_{ij} = 0 \ \text{iff} \ y_{ij}^{*} \le 0, \qquad j = 1 \ \text{or} \ 2,
\]

where X_i is a k-dimensional row vector of explanatory variables, β₁ and β₂ are k-dimensional column vectors of unknown parameters, and the disturbance term (ε_{i1}, ε_{i2})′ has a bivariate normal distribution with zero mean and variance-covariance matrix

\[
\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}.
\]

The variables y₁* and y₂* are always unobserved; different assumptions about the observability of y₁ and y₂ are considered. Six cases are introduced to represent full observability and different types of partial observability for the model. The example of a two-member committee voting under a unanimity rule can be applied to all of these cases. Case one is the case of full observability, in which the dichotomous choices of both voters are always observable. Case two is the case of partial observability in the sense of Poirier, under the assumption that only the result of the joint choice of the two decision-makers is observed. Case three is called the case of partial partial observability, in which one of the two parties' decision is fully observable. The other party's decision can be known only when the observable party votes "yes". In case four, which is called the case of partial observability with observed veto, when the outcome is "no", we observe one of the two parties casting its "no" vote. There are three alternative possibilities here concerning who will use the veto first if both parties wish to vote "no". The first possibility is that we assume some fixed and known probability p that the first party does so (case 4A1). The second possibility is having p as another parameter which needs to be estimated (case 4A2). Another possibility is that the party with the strongest sentiment toward a "no" vote will be observed casting the veto (case 4B).

We have provided likelihood functions for the joint estimation of the parameters for each of the various cases. Separate estimation (one equation at a time) is always possible for case one and for the first probit equation (the observable one) of case three. The separate estimation of the second probit equation of case three is possible only when the correlation coefficient ρ is equal to zero. However, joint estimation is always more efficient than separate estimation unless the correlation coefficient ρ between the two probit equations is equal to zero.
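As a concrete illustration of the estimation problem just summarized, the following sketch writes out the log-likelihood for case one and for case two under the model stated above. It is a minimal sketch only: the function names and the reliance on scipy's bivariate normal CDF are my own choices, not part of the original study.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bvn_cdf(a, b, rho):
    """F(a, b; rho) = P(e1 < a, e2 < b) for a standard bivariate normal pair."""
    return multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho], [rho, 1.0]]).cdf([a, b])

def loglik_case_one(theta, X, y1, y2):
    """Case one (full observability): both dichotomous choices are observed."""
    theta = np.asarray(theta, dtype=float)
    k = X.shape[1]
    b1, b2, rho = theta[:k], theta[k:2 * k], theta[-1]
    ll = 0.0
    for x, s, t in zip(X, y1, y2):
        q1, q2 = 2 * s - 1, 2 * t - 1          # map {0, 1} to {-1, +1}
        # P(y1 = s, y2 = t) = F(q1*X*b1, q2*X*b2; q1*q2*rho)
        ll += np.log(bvn_cdf(q1 * (x @ b1), q2 * (x @ b2), q1 * q2 * rho))
    return ll

def loglik_case_two(theta, X, z):
    """Case two (Poirier): only z = y1*y2, the unanimous 'yes', is observed."""
    theta = np.asarray(theta, dtype=float)
    k = X.shape[1]
    b1, b2, rho = theta[:k], theta[k:2 * k], theta[-1]
    ll = 0.0
    for x, zi in zip(X, z):
        p11 = bvn_cdf(x @ b1, x @ b2, rho)     # P(y1 = 1, y2 = 1)
        ll += zi * np.log(p11) + (1 - zi) * np.log(1.0 - p11)
    return ll
```

Maximizing the case-two function over (β₁, β₂, ρ), for example by applying a numerical optimizer to its negative, would give Poirier's partial-observability estimates; the likelihoods for cases three and four are built in the same way from whatever cells remain distinguishable in those cases.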
In Chapter Three, six different information matrices have been derived, corresponding to the joint estimation of the parameters in our six cases (with varying degrees of observability). The question of identification is also discussed by analyzing the rank of these matrices. We especially analyze the perverse case in which all of the coefficients of the exogenous variables except the constant terms are equal to zero. It has been found that in this situation, regardless of the values of the constant terms, all of the information matrices for the partial observability cases are singular if ρ is not known. When ρ is known a priori, only the case of partial observability in the sense of Poirier and the case of observed veto with p as another parameter (case 4A2) still cannot be identified. Another perverse case is when the two probit equations are identical. Then the case of partial observability in the sense of Poirier is not identified, but there are no problems with the other cases. There are also other situations that will cause identification problems for some cases, which we have discussed in Chapter Three. The perverse cases that we mentioned there do not necessarily cover all that will make the information matrices of the various partial observability cases singular. In general one needs to check the rank of the information matrix in each specific situation to make sure that the parameters are identified.

In Chapter Four, a large variety of experiments have been done to measure the cost of partial observability. We first try three arbitrary experiments and illustrate some general results. Then we try a second set of experiments by varying the values of the parameters from low to high levels while holding the sample average values of X_iβ₁ and X_iβ₂ equal to zero. Two important effects have been observed from these experiments. One effect is that the degree of identification changes as the values of the parameters change; we call this the "identification effect". The other effect is that the probability of a "yes" vote by either (or both) party changes when the values of the parameters change, and thus the sample split between the four possible outcomes changes; we call this the "sample split effect". Both of these effects change the cost of partial observability. If they work against each other, sometimes we cannot tell the direction of the change of efficiency when the values of the parameters change. Therefore, some more complicated experiments have been done with either the identification effect or the sample split effect held constant while the other changes. We are then more certain about the change of the cost under only one effect.

Among all the conclusions that we obtain from the results of these experiments, here we report some rather general and important ones. First we notice that the cost of partial observability is quite high, especially for the case of partial observability in the sense of Poirier (our case two). The cost of partial observability decreases markedly if any piece of observability information can be found. The law of diminishing marginal utility of information usually holds: the gain in moving from case two to case three (partial partial observability) or case four (observed veto) usually exceeds the gain in moving from case three or four to full observability (our case one). It is the first piece of observability information which is most important.
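Concretely, each entry in the tables of Chapter Four is a ratio of diagonal elements of inverted information matrices, and the identification checks of Chapter Three are rank conditions on those matrices. The sketch below illustrates this computation; it relies on numerical derivatives of the outcome-cell probabilities rather than the analytic information matrices actually derived in Chapter Three, so it should be read as an illustration of the idea, not as the study's method.

```python
import numpy as np

def information_matrix(cell_prob_fn, theta, X, eps=1e-5):
    """Expected information sum_i sum_m (1/P_m) dP_m dP_m' for a model in
    which observation i falls into one of a few distinguishable cells with
    probabilities cell_prob_fn(theta, x_i) summing to one."""
    theta = np.asarray(theta, dtype=float)
    k = len(theta)
    info = np.zeros((k, k))
    for x in X:
        p = np.asarray(cell_prob_fn(theta, x))
        grad = np.zeros((len(p), k))
        for j in range(k):                     # central-difference derivatives
            tp, tm = theta.copy(), theta.copy()
            tp[j] += eps
            tm[j] -= eps
            grad[:, j] = (np.asarray(cell_prob_fn(tp, x)) -
                          np.asarray(cell_prob_fn(tm, x))) / (2 * eps)
        for pm, g in zip(p, grad):
            info += np.outer(g, g) / pm
    return info

def variance_ratio(info_full, info_partial, j):
    """One table entry: the ratio of asymptotic variances of parameter j."""
    return np.linalg.inv(info_partial)[j, j] / np.linalg.inv(info_full)[j, j]

def identified(info, tol=1e-8):
    """A singular (rank-deficient) information matrix signals non-identification."""
    return np.linalg.matrix_rank(info, tol=tol) == info.shape[0]
```

Passing in the four-cell probabilities of case one and the coarser cells of a partial observability case reproduces the kind of ratios tabulated above, and evaluating the rank check at a point where all the slope coefficients are zero illustrates the perverse cases just described.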
From this first conclusion, our suggestion for Poirier's model using Gunderson's retention of trainees or other similar examples is that if any information can be obtained, for example, observability of either party's decision or an observed veto, then the efficiencies of the estimated parameters can be greatly improved. This is relevant, for example, to Connolly's research on the arbitration or negotiation of the contracts between employees' unions and municipalities. In this case there is an observed veto. If this information is not used, this would be just a case of partial observability in the sense of Poirier. The high cost of partial observability for Poirier's model should make one reconsider the possibility of using the observed veto information.

The second conclusion is that specifying ρ a priori improves the efficiencies of the estimates of the other parameters a great deal. Also, the improvement from knowing ρ is largest when the relative efficiency is lowest. This applies to all the partial observability cases. For Farber's case as an example, if the observability of the union employers' selection decision cannot be obtained, or the cost of getting the information is too high, then specifying ρ in the model is another way to improve the efficiency.

A third conclusion is that the sample split has a strong influence on the relative efficiencies of the parameter estimates. For a given partial observability case, its efficiency relative to full observability will be higher, the smaller the proportion of observations which fall into the indistinguishable categories. For Poirier's model, the more observations that are of the "yes, yes" variety, the higher the relative efficiency. The fraction of such observations is observable. In case three, the higher the proportion of observations having the observable party voting "no", the lower the relative efficiency will be. The proportion of such observations is also observable. For example, 62.8% of Farber's sample are non-union workers, and 37.6% of these nonunion workers expressed a preference for union representation. That is, 39.2% of the whole sample belongs to the indistinguishable categories (not in a union and would not vote for a union). The relative efficiency of the estimated parameters in the probit equation for the union employers' selection will be lower as this percentage increases. For the observed veto case, it is the proportion of observations having both parties voting "no" that is relevant, but this proportion is not directly observable.

The last conclusion is that the strength of identification matters. All of the partial observability cases are unidentified for some perverse values of the parameters, as we mention above. Their relative efficiency is very low for parameter values near such points, and it increases rapidly as the parameters move away from such points of singularity. However, these effects are not strong except in the immediate neighborhood of points of non-identification.

APPENDIX A

Let

\[
H_i = P(\varepsilon_{i2} < -X_i\beta_2,\ \varepsilon_{i1} - \varepsilon_{i2} < X_i\beta_2 - X_i\beta_1)
    = P\!\left(\frac{\varepsilon_{i1} - \varepsilon_{i2}}{\sqrt{2(1-\rho)}} < \frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}},\ \varepsilon_{i2} < -X_i\beta_2\right)
    = F\!\left(\frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}},\ -X_i\beta_2;\ \rho'\right),
\]

where \(\rho' = -\sqrt{(1-\rho)/2}\).

Recall that \(\partial F(a, b; \rho')/\partial a = \phi(a)\,\Phi\!\big((b - \rho' a)/\sqrt{1-\rho'^2}\big)\); here define \(a = (X_i\beta_2 - X_i\beta_1)/\sqrt{2(1-\rho)}\) and \(b = -X_i\beta_2\). Then

\[
\frac{\partial H_i}{\partial \beta_1}
= -\frac{1}{\sqrt{2(1-\rho)}}\,
\phi\!\left(\frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}}\right)
\Phi\!\left(-\frac{X_i\beta_1 + X_i\beta_2}{\sqrt{2(1+\rho)}}\right) X_i'.
\]

Let \(G_i = P(\varepsilon_{i1} < -X_i\beta_1,\ \varepsilon_{i2} - \varepsilon_{i1} < X_i\beta_1 - X_i\beta_2)\). Using the same method as above, we can get
\[
\frac{\partial G_i}{\partial \beta_2}
= -\frac{1}{\sqrt{2(1-\rho)}}\,
\phi\!\left(\frac{X_i\beta_1 - X_i\beta_2}{\sqrt{2(1-\rho)}}\right)
\Phi\!\left(-\frac{X_i\beta_1 + X_i\beta_2}{\sqrt{2(1+\rho)}}\right) X_i'.
\]

But

\[
G_i = F(-X_i\beta_1, -X_i\beta_2; \rho) - H_i
    = 1 - \Phi(X_i\beta_1) - \Phi(X_i\beta_2) + F_i - H_i,
\]

and

\[
\frac{\partial G_i}{\partial \beta_2}
= -\frac{\partial \Phi(X_i\beta_2)}{\partial \beta_2}
+ \frac{\partial F_i}{\partial \beta_2}
- \frac{\partial H_i}{\partial \beta_2},
\]

so

\[
\frac{\partial H_i}{\partial \beta_2}
= -\frac{\partial G_i}{\partial \beta_2}
- \phi(X_i\beta_2)\left[1 - \Phi\!\left(\frac{X_i\beta_1 - \rho X_i\beta_2}{\sqrt{1-\rho^2}}\right)\right] X_i'.
\]

For \(\partial H_i/\partial \rho\), note that \(\rho' = -(1-\rho)^{1/2}/\sqrt{2}\), so

\[
\frac{\partial \rho'}{\partial \rho} = \frac{1}{2\sqrt{2(1-\rho)}},
\]

and, since \(a = (X_i\beta_2 - X_i\beta_1)/(-2\rho')\),

\[
\frac{\partial a}{\partial \rho'} = \frac{X_i\beta_2 - X_i\beta_1}{2\rho'^2} = \frac{X_i\beta_2 - X_i\beta_1}{1-\rho}.
\]

Then

\[
\frac{\partial H_i}{\partial \rho}
= \frac{\partial \rho'}{\partial \rho}
\left[\frac{\partial F(a, b; \rho')}{\partial a}\,\frac{\partial a}{\partial \rho'}
+ f\!\left(\frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}},\ -X_i\beta_2;\ \rho'\right)\right]
= \frac{1}{2\sqrt{2(1-\rho)}}
\left\{\phi\!\left(\frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}}\right)
\Phi\!\left(-\frac{X_i\beta_1 + X_i\beta_2}{\sqrt{2(1+\rho)}}\right)
\frac{X_i\beta_2 - X_i\beta_1}{1-\rho}
+ f\!\left(\frac{X_i\beta_2 - X_i\beta_1}{\sqrt{2(1-\rho)}},\ -X_i\beta_2;\ \rho'\right)\right\},
\]

where \(f(\cdot, \cdot; \rho')\) denotes the standard bivariate normal density with correlation \(\rho'\).

REFERENCES

Abowd, J. M. and Farber, H. S., 1982, "Job Queues and the Union Status of Workers," Industrial and Labor Relations Review 35, 354-367.

Amemiya, T., 1974, "Bivariate Probit Analysis: Minimum Chi-Squared Methods," Journal of the American Statistical Association 69, 940-944.

Amemiya, T., 1975, "Qualitative Response Models," Annals of Economic and Social Measurement 4, 363-372.

Amemiya, T., 1978, "The Estimation of a Simultaneous Equation Generalized Probit Model," Econometrica 46, 1193-1205.

Ashford, J. R. and Sowden, R. R., 1970, "Multivariate Probit Analysis," Biometrics 26, 535-546.

Connolly, M., 1982, "The Effect of Compulsory Arbitration on the Bargaining Process and Wage Outcomes," unpublished Ph.D. dissertation, Michigan State University.

Farber, H. S., 1982, "Worker Preference for Union Representation," Research in Labor Economics, forthcoming.

Farber, H. S., 1982, "The Demand for Union Representation," Working Paper No. 295, Massachusetts Institute of Technology.

Gunderson, M., 1974, "Retention of Trainees: A Study with Dichotomous Dependent Variables," Journal of Econometrics 2, 79-93.

Heckman, J. J., 1978, "Dummy Endogenous Variables in a Simultaneous Equation System," Econometrica 46, 931-959.

Heckman, J. J., 1979, "Sample Selection Bias as a Specification Error," Econometrica 47, 153-161.

Kmenta, J., 1971, Elements of Econometrics, New York: Macmillan Publishing Co., Inc.

Poirier, D. J., 1980, "Partial Observability in Bivariate Probit Models," Journal of Econometrics 12, 210-217.

Rothenberg, T. J., 1971, "Identification in Parametric Models," Econometrica 39, 577-591.

Theil, H., 1971, Principles of Econometrics, New York: John Wiley and Sons, Inc.

Zellner, A. and Lee, T. H., 1965, "Joint Estimation of Relationships Involving Discrete Random Variables," Econometrica 33, 382-394.