WllllllllHillllUlllllllllllllllilllUIHIIIIIHIIWIIUIHI

3 1293 10454 1952

 

MSU

LIBRARIES
n

\uv

 

 

 

RETURNING MATERIALS:
P1ace in book drop to
remove this checkout from
your record. FINES will
be charged if book is
returned after the date
stamped be10w.

 

 

 

 

 

 

THE COST OF PARTIAL OBSERVABILITY
IN THE BIVARIATE PROBIT MODEL

By
Chun-Lo Katy Ho

A DISSERTATION

Submitted to
Michigan State University
in partia] fu1fi11ment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Economics
1982

ABSTRACT

THE COST OF PARTIAL OBSERVABILITY
IN THE BIVARIATE PROBIT MODEL

By
Chun-Lo Katy Ho

Some recent studies have made use of the bivariate probit model in
testing various hypotheses, but with only partial observability about the
dichotomous dependent variables. The maximum likelihood estimators in these
partial observability cases will be inefficient compared to those obtained
under full observability. Therefore in this study, we present several
cases with different levels of observability for the bivariate probit model
and we measure the efficiency loss of maximum likelihood estimators for each
case through some experiments.

The example of a two-member committee voting under a unanimity rule can
be applied to all of these cases. Case one is the case of full observability
in which the dichotomous choices of both voters are always observable. Case
two is the case of partial observability in the sense of Poirier, which under
the assumption that only the result of the joint choice of the two decision-
makers is observed. Case three is called the case of partial partial observ-
ability, in which one of the two parties' decision is fully observable. In
case four, which is called the case of partial observability with observed
veto, when the outcome is "no", we observe one of the two parties casting its
"no" vote. Three alternative possibilities are presented for this case, concern-

ing who will use the veto first if both parties wish to vote "no".

Chun-Lo Katy Ho

The log-likelihood functions are provided for the joint estimation of
the parameters for each of the various cases above. Since the inverse of
information matrix is the asymptotic variance-covariance matrix of maximum
likelihood estimator, the derivation of information matrices for all these
cases are presented. The conditions for identification for the partial
observability cases are also discussed. Then a large variety of experiments
are done to measure the cost (in terms of lost efficiency) of partial
observability.

Here are some of our main conclusions. First we notice that the cost of
partial observability is quite high, especially for case two. The cost of
partial observability decreases markedly if any piece of observability
information can be found. The law of diminishing marginal utility of inform-
ation usually holds: it is the first piece of observability information which
is most important. The second conclusion is that specifying p (the correlation
coefficient of the two probit equations) a priori improves the efficiencies
of the estimates of the other parameters a great deal. A third conclusion
is that the sample split has a strong influence on the relative efficiencies
of the parameter estimates., For a given partial observability case, its
efficiency relative to full observability will be higher, the smaller the
proportion of observations which fall into the indistinguishable categories.
The last conclusion is that the strength of identification matters. The
relative efficiency of each partial observability case is very low for
parameter values near such points, and it increases rapidly as the parameters

move away from such points of singularity.

ACKNOWLEDGEMENT

 

I must first express my sincere gratitude to my thesis advisor,
Professor Peter Schmidt, who generously bestowed an enormous amount of his
time providing detailed guidance on every aspect of this study. Without
his continuing assistance and encouragement, I could not finish this thesis
and my graduate study at Michigan State University.

I am also very grateful to the other members of my dissertation committee.
They are Professors John Goddeeris, Daniel Hamermesh and William Quinn.

I owe thanks to my typist, Miss Kelli Sweet. Her careful and expedient
typing is appreciated. 1

Finally, my deepest appreciation goes to my parents and my husband,
Hai-Zui, for their support and encouragement throughout these years. To

them, I owe a debt more than I know how to express in words.

LIST OF TABLES ..........................................................
CHAPTER
ONE INTRODUCTION ..................................................
TWO BIVARIATE PROBIT MODELS WITH FULL

THREE

TABLE OF CONTENTS

AND PARTIAL OBSERVABILITY

2.1
2.2
2.3

2.4
2.5

2.6

Introduction .. ...........................................
Case One: Full Observability ............................
Case Two: Partial Observability in the

Sense of Poirier .........................................
Case Three: Partial Partial Observability ...............
Case Four: Partial Observability With

Observed Veto ............................................
Summary ..................................................

DERIVATION OF INFORMATION MATRICES
AND CONDITIONS FOR IDENTIFICATION

3.1
3.2
3.3

3.4
3.5

3.6

Introduction .............................................
Case One: Full Observability ............................

Case Two: Partial Observability in the Sense
of Poirier ...............................................

Case Three: Partial Partial Observability ...............

Case Four: Partial Observability With
Observed Veto ............................................

Summary ..................................................

Page

iii

1

7
9

11
12

14
19

21
22

25
3O

33
4l

Page

FOUR RESULTS OF EXPERIMENTS MEASURING THE
COST OF PARTIAL OBSERVABILITY

4.l Introduction ............................................. 43
4.2 General Results of Some Basic Experiments ................ 46
4.3 Results of Further Experiments ........................... 49

4.4 The Results of Experiments With Either
The Identification Effect of the Sample

Split Effect Constant .................................... 54

4.5 Summary .................................................. 57

FIVE SUMMARY AND CONCLUSIONS ....................................... 81
APPENDIX A .............................................................. 87
REFERENCES .............................................................. 90

ii

LIST OF TABLES

Table . Page
1 Sample splits for different 3 ................................... 60
2 Ratios of asymptotic variances (cost of partial
observability) .................................................. 6l
3 Sample splits for different c when
a] = 82 = ['CCX] ................................................ 62

4 Ratios of asymptotic variances (cost of partial
observability) when a] = 32 = ['chi and p=o ................... 63

5 Ratios of asymptotic variances (cost of partial
observability) when 81 = 82= [’CCXI and p? 0.5 .................. 64

6 Sample splits_for different c_.

when a] = [C_x] and 32 = ['CCX] ................................ 65

7 Ratios of asymptotic variances (cost of partial

observability) when 81 = [c_§l, 32 = ['céxl and p=0 ............ 66

8 Ratios of asymptotic variances (cost of partial
observability) when 8] = [c_§I, 82 : ['chI and p= 0.5 ......... 67

9 Sample splits for different p
when 51 = 32 = [-ch] and c= l.0 ............................... 68

lo Sample splits for different p
when 81 = [C_ZI, 82 = [TCQY] and c= 1.0 ........................ 68

iii

Table
ll

12

13

14

15

'16

17

‘8

19

20

Page

Ratios of asymptotic variances (cost of partial
observability) when 81 = 82 = ['CCYT and c= 1.0 ................. 69

Ratios of asymptotic variance (cost of partial

observability) when 81 = [C_ZI. 82 = I

Cost
when

Cost
when

Cost
when

Cost
when

Cost
when

Cost
when

Cost
when

Cost
when

.of

-c'X

c ] and c= l.0 .......... 70

of
81

partial observability for case four

3 = "CX = ooooooooooooooooooooooooooooooo
82 I c ] and c 1.0 . 7l

of
81

partial observability for case four
= [c_§1. 82 = ['ch1 and c= l.0 ......................... 72

partial observability for case two

- d - d ---
B1 - [-c]’ 32 - [c] and Pll - 0.25 ......................... 73

of
81

partial observability for case two
= [.dc], 82 = [g] and c: 1.0 ............................. 74

of
Bl

partial observability for case three

---d = ........................
_ 32 - [ c] and P6;'+ FOO. 0.50 75

of partial observability for case three

- -d — -d2 —-—-= ...............
of partial observability for case three

- a -d :
a] - 32 [ c] and c 1.0 .................................. 77
of partial observability for case three

- -d - -d2 =
B] - [ C11’ 32 - [ C ] and c 1.0 .......................... 78

iv

Table Page

21 Cost of partial observability for case four

= a -d = ——=
when 81 82 [ c], p 0.5 and P00 0.25 ...................... 79

22 Cost of partial observability for case four
when a] = 32 = [‘3], p= 0.5 and c= 1.0 .......................... 8O

CHAPTER ONE

 

INTRODUCTION

 

The purpose of this study is to consider the bivariate probit model under
various levels of observability of the dependent variables, and to measure
the loss in efficiency caused by less than full observability.

There have been quite a few studies using the bivariate probit model in
a variety of settings. Zellner and Lee (1965) presented the probit model as
well as other models to analyze discrete random variables. They showed that
a joint estimation approach for a set of equations with dichotomous endogenous
variables yields estimators which are asymptotically more efficient than single
equation techniques, provided that the variables being analyzed are correlated.
They considered the example of a durable good purchase decision (buy or not
buy) and a credit decision (use installment credit or not-use such credit),
while the exogenous variable is disposable income. In this example, both
decisions are observable.

Ashford and Sowden (1970) considered a multivariate probit model and
proposed maximum likelihood estimation for its parameters. They applied their
techniques to a bivariate probit model, where the two endogenous variables are
breathlessness and wheeze of a coal miner, and the exogenous variable is his
age. A coal miner may have positive reSponse to neither, to one or the other,
or to both of the two symptoms; so there are four possible outcomes. All four
possible outcomes are distinguishable; the data gives the number of individuals
with each combination of symptoms, within each age group in the sample.
Amemiya (1974) proposed two minimum chi-squared estimators for the same

model and found that the FIMC (Full Information Minimum Chi-Square) Probit

1

2
estimator is asymptotically as efficient as the maximum likelihood estimator.

Gunderson (1974) discussed alternative statistical models for estimating
the probability that an on-the-job trainee will be retained by the sponsoring
company after training. In this situation, the employer must decide whether
or not to make a job offer, and the trainee must decide whether or not to
seek a job offer. Each individual's (either employer's or trainee's) decision
is not observed; only whether or not the trainee continues working after
training is known. Gunderson used a single-equation model with the dichoto-
mous dependent variable coded 1 if the trainee stays with the company, and 0
otherwise. Explanatory variables includes the characteristics of both the
trainee (age, sex, education, experience, etc.) and the company (company size,
area designation, etc.). Poirier (1980) proposed a bivariate probit model
under the same assumptions as Gunderson's concerning the amount of available
information. His model includes two probit equations each representing the
binary choice of a decision-maker, but only the outcome of the joint (unanimous)
choice is observable. That is, the only information about the two dichotomies
is whether or not both equal unity, and the remaining possible outcomes can't
be distinguished from each other whenever there is a negative choice made by
either party.

Farber's research (1982) on the demand for unionism shows that the union
status of workers is determined by a combination of workers' demand for union
representation and the decisions of union employers as to whom to hire. That
is, a worker is a union member if and only if he desires a union job and a
union employer is willing to hire him. If only the final outcome (union
status) is observed, it is impossible to determine whether nonunion workers

didn't want a union job, couldn't get a union job, or both, and we have

3
Poirier's model. In Farber's study, a unique data is employed which can be
used to identify the union or non-union preference of non-union workers.

50 workers' preferences are fully observable, while union employers' decisions
are still unknown for those nonunion workers who didn't want to be unionized.
'Connolly's study (1982) analyzes the joint decision to arbitrate or

negotiate the contracts between employees' unions and municipalities in
Michigan. According to law, there will be negotiation if both sides desire
so and there will be arbitration otherwise, but one of the two parties has
to cast a veto to seek the arbitration. Therefore, besides the observable
result that the contract is negotiated or arbitrated, one party (only) is
observed to use the veto whenever there is arbitration. However, the
decision of the party which didn't use the veto remains unknown.

The examples above can all be analyzed using the bivariate probit model,
but under different assumptions concerning what can be observed. The first
two cases (Zellner-Lee and Ashford-Sowden) are different from the others in
that the two decisions (or symptoms) are all related to one person instead
of to two different parties. But they still represent the case in which the
two binary dependent variables are both observable, which can be called the
case of full observability. All the other cases have less than full observ-
ability, in varying degrees. The model established by Poirier (using
Gunderson's example) assumes the least observability information among all the
cases. As outsiders, we can only tell whether something failed or succeeded.
For Farber's or Connally's case, besides the observable joint choice, one of
the two individual choices is observed. All of these cases can be called

partial observability cases of the bivariate probit model.

4

With incomplete information, the maximum likelihood estimators obtained
in these partial observability cases will be inefficient compared to the
estimators obtained in the case of full observability. In other words, there
is,a cost (in terms of lost efficiency) of partial observability. The point
of this research is to measure this cost. The study of the cost of partial
observability is important in itself, but it also has some practical
implications. For a researcher facing a high price of getting additional
information, it is important to know how valuable the information is, so that
an intelligent decision can be made about whether additional information is
worth obtaining.

This paper is divided into five chapters. In Chapter Two, a formal
statement of bivariate probit model is presented. All of the cases considered
assume this basic model, but with different levels of observability. Case one
is the full observability case, in which both parties' choices are observable,
and every possible outcomes can therefore be distinguished. Case two is the
model with partial observability in the sense of Poirier, in which the only
information is the binary outcome of the joint choice made by both parties.

If either one party fails to say "yes", then the remaining outcomes are
indistinguishable. Case three is called the partial partial observability
case. In this case one of the two parties' decision is fully observable, but
if the observable party has a negative response, the other party's decision

is not known. Case four is called the partial observability case with observed
veto. In this case if both sides do not say "yes", then we can observe one

and only one party saying "no" (casting the veto). There are three alter-
native possibilities which we will consider, under different assumptions about

who will cast the veto first if both parties wish to say "no". The appropriate

S
likelihood functions for all cases and possibilities are provided, so that
maximum likelihood estimates can be obtained.

Chapter Three contains the derivation of information matrices for all
of the (:ases which were presented in Chapter Two. The conditions for
identification for the partial observability cases are also discussed. The
information matrix (whose inverse is the asymptotic variance-covariance
matrix of maximum likelihood estimator) can help us to measure the efficiency
loss with different levels of partial observability. The conditions of
identification are relevant for efficiency comparisons, because the closer the
information matrix is to being singular, the greater the variances of the
estimates will be.

In Chapter Four, a large variety of experiments are done to measure the
cost of various levels of partial observability. For the purpose of
simplification, we assume there are only two exogenous variables, and one of
them is a constant term. For each experiment, specific values of the para-
meters are picked in order to evaluate the inverse of the information matrix.
All of the elements of the inverse of the information matrices for the partial
observability cases are divided by the corresponding elements of the inverse
of the information matrix for the full observability case. Thus the ratios
we present are the ratios of asymptotic variances and covariances of parameters
in the partial observability case compared to those of the full observability
case. The cost of partial observability increases as these ratios increase.
We attempt both to make a rough statement about the cost of partial observ-
ability in typical cases, and also to identify what types of changes in the

parameters cause this cost to increase or decrease.

6
The results of these experiments will be interpreted in Chapter Four,
and the tables at the end list the main numerical results. The summary and

conclusions of this study will be given in Chapter Five.

CHAPTER TWO

 

BIVARIATE PROBIT MODELS WITH FULL
AND PARTIAL OBSERVABILITY

2.1 Introduction

 

In this chapter, we will give a formal statement of the bivariate probit
model, and consider its estimation under various assumptions about what is
observed. Basically, our treatment of the estimation problem is just to
provide the appropriate likelihood function to maximize, though in some cases
we point out alternative possibilities. The questions of identification and
of the relative efficiencies of the various estimators will be deferred until
Chapters 3 and 4.

Now we start by reviewing the bivariate probit model. Consider two
individuals (j=l,2) each faced with a binary choice, yj=m, m=0,l. The
dependent variable yj takes on the value 1 if an event occurs or 0 if it does

not occur. Suppose the two individuals have utility functions of the form

U1m a 91m (wlm’ y2*) + nlm "1:0’]

U2m 3 92m ("2m’ 5’1” I n2m

where for j=l,2, g. is a non-stochastic scale function, w. is a fixed vector

3m 3m
of characteristics of individual j and choice yj=m, ”jm is a random disturbance
term and yJ.* is the utility differential
* = o - o .= , .
yj UJ1 UJ0 J l 2

8

This specification permits interdependency between the utility functions of
the two individuals in the sense that the utility of each individual is a
function of the sentiment of the other individual.

Further suppose

9n ("11’ 372*) ' 910 ("10’ Y2” ‘ Yly2* + X51

921 ("21’ yi”) ' 920 (”20’ Yi”) ‘ Yzyf' + X52
"ii ' ”10 = V1
”21 ' ”20 = V2

where X is a K-dimensional row vector of explanatory variables, a] and 62 are

K-dimensional column vectors of unknown coefficients, y] and y2 are unknown

parameters, and

V a [V1, V2]' ~ N(O, 0) with
0.1 O)
a = [mll w12]
12 22

Then it is easy to show that

’f = 0’2" + X51 1 Vi "“ (1)
* = c.---
y2 125’? I X62 1 V2 (2)

and that individual j will select

'~<
ll

1 1ff yJ.*>O 1.e., Ujl > UjO

‘<
u

0 iff yj*:O i.e.,U. :u

.11 3'0

9

The reduced form equations corresponding to (l) and (2) are

yf" ' X81 + 61 ------------- (3)

y2* = XBZ + 82 ------------- (4)

B] = (51 ' Y152)/(1 Y1Y2)
where

82 - (52 ' Y251)/(] Y1Y2)

81 = (V1 + Y1V21/(1 - Y1Y2)

e2 : (V2 + Y2V1)/(1 YIYZ)

and [:1] has bivariate normal distribution with 0 mean and variance-covariance
matrix as [l :1. Here the variances of a1 and 52 have been normalized to equal
unity and p is the correlation between 8] and e2.

The model just presented is common to all of the cases we will consider.

However, the cases differ with respect to how much one observes about y1 and

Y2-

2.2 Case One: Full Observability

Here we assume that y1 and y2 are both observed. Among all the cases we
are going to discuss here, this is the one which.has the most complete
observability, and which leads to the most efficient estimates. An example
of such a case would be a two member committee voting under a unanimity rule,
but with both votes observable. That is, in our random sample of votes, i=1,
---, N, we can not only observe the explanatory variables Xi, but also the
votescafboth voters, i.e., yi1 and in' Therefore, there are four possible
outcomes which are all distinguishable:

(1) both vote yes , i.e., yi1tl and yi2=13

10
(2) the first party votes "yes" and the second votes "no", i.e., yi1=l and

yi2=°3

(3) the first party votes "no" and the second votes "yes", i.e., y11=0 and
’12:“

(4) both vote "no", i.e., y11=0 and y12=0.

The distribution of in and yi2 in this case is

P(yi]=1 and y12=1) F(X181, X182; 9) i=1,---,N

P(yi1=l and yiz=0) = F(Xi81, -X182; ~p)

P(yi1=0 and yi2=l) = F(-xi81, xiBZ; — p)

p(y11=0 and y12=0) F(-XiB], -XiBz; 9)

1 - 4(X181) - ¢(Xi82) + F(Xi81, x182; p)

where F(-,-; -) denotes the bivariate standard normal distribution function
with correlation coefficient 9, while o(-) is the univariate standard normal
distribution function.

We can always estimate the reduced form equations separately. The log-
likelihood functions are

N

in L = g {yij 2n ¢(Xisj) + (l-yij) 2n<b(-Xi8j)}

La
II
_.a
O
1
N

11

But this is efficient only when::=0. When 0 is not equal to zero, it is more
efficient to estimate the two probits jointly. Then the log-likelihood function
of the sample is

N.

In L(B]9 829 D) = g {inin 2" F(xi819 x182; p)

+ yi](1 - Yiz) 2" [¢(XiB]) ' F(XiB]9 X182; 0)]

+ (1 ' yi1) yiz 2" [¢(Xi82) ' F(XiB], X182; 0)]

+ (1 - yi1)(1 - yi2) 2n [1 - 4(X181) - 6(Xi82)

+ F(X181. xiBZ; 011}.

2.3 Case Two: Partial Observability in the Sense of Poirier

This is the case treated by Poirier (1980). An example of this would be
a two member. committee voting under a unanimity rule, but anonymously. As
outsiders we can only observe whether a motion passes (i.e. both members vote
"yes") or whether it fails (i.e. at least one member votes “no"). So instead
of observing Xi, yi] and in’ i=1,2,...,N, we observe only Xi and Zi’ where Zi
= yil'in’ i=1,2,...,N. That is.

Zi=1 iff yi1=l and yi2=l
=0 otherwise .

In terms of the four possible outcomes that we listed in section 2.2, the last
three are indistinguishable because all we can see for these outcomes is just
that the motion does not pass.

This model has been used by Connally (1982) to study the decision to
arbitrate or negotiate the contracts between public employees' unions and
municipalities in Michigan. By law, binding arbitration occurs if either party

so desires. Therefore, a contract is negotiated (Zi=l) if an)only if the

12

union desires negotiation (yi]=l) and the municipality desires negotiation
(yi2=l). Otherwise the contract is arbitrated (Zi=0)' Poirier's model applies
if we observe only the outcome (negotiation or arbitration) and not the desire
of either party separately.

The distribution of 21 is

P(Zi=l) = P(yi]=l and y12=1)
P(Zi=0) = P(yi]=0 or y12=0)

l - F(XiB], X1823 9)

and the log-likelihood function of the sample is

in L(B], 82, o)

N
= i {21 in F (Xi81, X182; 9) + (1'21) 2H[1-F(Xi819 X182; 0)]}

2.4 Case Three: Partial Partial Observability

In this model, we observe more than in Poirier's model, but less than in
the full observability case. Specifically, it is assumed that we observe yi],
Zi = yil'in’ and X1, i=1,2,...,N. Note that the vote of the first voter
(yi1) is always observed. However, the vote of the second voter is only
sometimes observed. Essentially, we observe y].2 if and only if yi1=l. This is
so because if yi1=l, then yi2=Zi, and Zi is observed. However, if yi1=0 we
have no information about in' Thus in terms of the four possible outcomes,
two (yi]=0, y12=l and yi1=0, y12=0) are indistinguishable. This model can

also be regarded as a censored probit model, since the sample for which y].2 is

13

observed is censored, by the value of yi].

This model has been used by Farber (1982) to study the demand for union-
ism. Let yi1=1 if individual i wishes to be in a union, and yi1=0 otherwise;
let yiz=l if a union employer is willing to hire individual i, and yi2=0
otherwise. Individual i is a union member (Zi=yi]-yiz=l) if both y11=l and"
yi2=l, and is not a union member (Zi=0) otherwise; Zi is observed for all i.
If nothing more were known, this model would be Poirier's model. However,
non-union workers in Farber‘s sample were asked if they desired union repre-
sentation, so that yi] is also observed for all i. 0n the other hand, y}.2 is
observed only if y11=1.

Now give that we observe yi], Z1.=yﬂ-y1.2 and Xi’ the log-likelihood

function for the model is

in L(B]s 82: p)

N
= E {yII yiz z" F(XiB]s X182; 9)
+ yi](l - in) znl 9(X§B]) - F(Xie]. X282; 9)]

+ (l - yi1) 2n ¢(-Xie1)}.

Because it is observable, the first probit equation can always be estimated
separately with the log-likelihood function as

an L(6]) = g {yi1 2n ©(Xi81) + (l.- yil) 2n ¢(-X131)}
but joint estimation is more efficient for the first equation unless p=0.
Separate estimation for the second equation is possible if and only if o=0.
When p=0, using the observation of the first party, we can establish the log-

likelihood function of the second probit equation as

14

In L(82) = {yil 2n 6(Xi81) + (l - y11) tn ¢(-X161)}

yi] {yiz 2n ¢(Xisz) + (1 - ’12) In ¢(-X182)} .

.JOMZ doMZ

2.5 Case Four: Partial Observability With Observed Veto

In this case, it is assumed that we observe Zi=yil°yi2 and X1, as in Poirier's
case. However, when Zi=0 we observe either y11=0 or yi2=0 (but never both).
That is, we observe the casting of a veto by one party.

The situation analyzed by Connally (1982) again provides a good example.
Negotiation occurs if and only if both the municipality and the union wish to
negotiate. Otherwise arbitration occurs. However, one party or the other
has to ask for arbitration, so that the veto (of negotiation) by one party is
observed. That is, there is negotiation (Zi=l) iff yi1=l and y12=l, and
there is arbitration (Zi=0) iff yi1=0 or y12=0. Furthermore, when Zi=0
is observed we observe either yi] or y].2 as zero.

Among the four possible outcomes, Zi=l is straightforward. In the case
when Zi=0 and yi1=0 are observed, however, we don't know if party two votes
"yes" or "no". The same.is so for the Z1=0 and yi2=0 situation. In such a case,

the diStribution of Z, is
P(Zi=l) = P(yi]=l and y12=1) = F(XiB], X182; 0)
P(Zi=0 and y11=0) = P(yi]=0 and y12=l) + P(yi]=0 and y12=0)
: P(lst party requests arbitration/yi]=0 and y12=0)
P(Zi=0 and y12=0) = P(yi]=l and yi2=0) + P(yi]=0 and yi2=0)

- P(an party requests arbitration/yi]=0, y12=0) .

15

Note that the events (yi]=0, y12=l) and (yi]=0, y12=0, party one requests
arbitration first) are indistinguishable.

We can always ignore the information about the observed veto and convert
this back to a standard Poirier partial observability case (Case Two). The
cost for doing that is some loss of efficiency of the estimates. Otherwise,
to close the model some assumption must be made about who would ask for

arbitration first, if both parties desire it. Here we have three possibilities.

Possibility One: Case 4A1, Observed Veto, Given Probability =gp
Assume there is some fixed probability for the first party to request

arbitration first given that both sides don't want a negotiation. That is,

p =p(lst party requests arbitration first/O, 0)
l-p =P(2nd party requests arbitration first/0, 0).

Here p is a given constant. It might come from past experience or just

represent a reasonable guess. The log-likelihood function is
in L(B],BZ, o)
= 2 2n F(Xi8], X182; 0)

irZi=l
is observed

+ E’; =0 2n {F(-XiB], X182; - D) + pF(-X13]. -X132;P)}
yil
is observed
+ F - in {F(XiB]s 'XiBZ; ‘0) + (1-p)F(-Xi81’ -X182; 0)}
1’yi2-0

is observed

16

= §+Z.=l ﬁn F(Xi81, X182;p)
1

is observed

+ Z I"III ¢(X.8 ) - P(X B . X-B ; 0)]
iyy11=0 1 2 1 l 1 2

15 observed + le- ¢(xi81) , ¢(x132) + F(x131, x132; p)]}

+ 2 2n {[¢(X-B ) - F(X.6 . x.s ; 9)]
i>y12=0 1 l 1 l 1 2

15 observed + (l-p) [1_ ¢(XiB]) _¢(x132) + F(Xis], xiBZ; o)]}

= §)Z.=] in P(XIBI, x182; p)
1

is observed

+ 2 In {pll- ¢(X.8 )l + (l-p) [¢(X.B ) - F(x.6 , X-B ; o)l}
i;yi1=0 1 1 1 2 1 1 1 2
is observed

+ 2 in {p [¢(X-8 ) - F(X-B , X.B ; o)l + (l-p) [l-¢(X.B )1}.
i’y12=0 1 l 1 l 1 2 1 2
is observed

Possibility Two: Case 4A2, Observed Veto, p is Another Parameter

 

Here instead of having p as a given constant, we let p be an unknown
parameter in the model. The log-likelihood function is the same as above

except p needs to be estimated too.

In L(B]: 82: 0: p)

= §+Z.=] 2" F(Xi31, x182; p)
1

is observed

17

+ s an {pil- 9(X181)] + (l-p) [4(X182) - F(X161, X182; p)l}
19yi1=0

is observed

+ i in {p[¢(XiB]) - F(X18]. X182; 0)] + (l-p) [l- ¢(X182)]}
*y12’O

is observed

Possibility Three: Case 43, The First Party to Ask for_Arbitration is the
One Who Wants Negotiation Least

Recall equations (3) and (4)

Y1* X81 + e1

’2* x82 + 62

y1* and y2* are the utility differentials between voting "yes" and "no".

They represent individual j's (j=l,2) "sentiment" toward yj=l. When yi]* < O
and yi2* < 0, so that neither party wants negotiation, it may be that the
party whose sentiment is more strongly against negotiation will cast the

veto first. That is, when

‘<
._n
.1

*
II
x
m
._I
+
m
.4
_.n
A
O

:1-
u
x
&>

and XiBl + 611 < XiBZ + 612 ,

it is reasonable to conclude that the first party will cast the veto first.

So we use P(ei2 < - xiBZ’ 611 - 812 < XiB2 - Xi81) to represent the probab1lity

that the first party is observed using the veto and P(ei] < - XiB], €i2 - 61]

< ..
X181 x18

2) to represent the probability that the second party uses the

veto, given that both of them don't want a negotiation. The log-likelihood

18

function is

2" L (8]: 829 p)

= §*Z.=] 2" F(X181’ X182; 9)
1

is observed

+ 2 2n [F(-X.B , X.B ; -p)
i,yi]=0 1 l 1 2

is observed

+ P(Eiz < 'xiBZ’ e1'1 ' E1'2 < X132 ' X131)]

+ 2 2n [F(X B , -X-B ; -p)
i’y12=0 1 l 1 2

is observed

+ P(611 < -X181, 812 - €11 < X18] - X18211 .

”ere P(eiZ < 'XiBZ’ e1'1 ' 812 < X182 ' X181)

6. -8. XOB -XOB
_ 11 12 1 2 1 l _
’ P < ’ e1'2 < x182)

12(1-9) ¢2(l-o)

XLB -X.B

 

‘ FELL-1J- , ‘- X.8 ; -/(1-o)/2)
/2(]—p) 1 2
X B -X B
F( 1 i 1 1 s ‘ x182; 9') 9
-Lp'

where p' =-¥(lep)/2

P(511 ‘ 'XiBl’ €12 ‘ 811 < X181 ‘ X182)

19

XiBZ'XiBl .
F(- XiBla ' X182, 9) ' F(-—:§ET———'9 - X182; 0 )

1 ‘ ¢(X181) ' ¢(X182) + F(XIB]’ X182; 0)

X.3 -X.s
' F64L2L_l—l' s ‘ X-B 3 p.) a
on 12
-aa
then
an L (81. 62, p) = 2 2n F(X181. X182; 9)
irZi=l
is observed
+ 2 2n [¢(X B ) - F(X B , X-B ; o)
1*y11’0 1 2 1 1 1 2
is observed XiBZ-Xp1
+ F(—_—.—'—— . - x162; o')1
X.B -X.B
'1'; in [1’ ¢(x182) " F(_1"'g_'_1_'l s ' X182; 0.)]
1*y12=0 '29.
is observed
2.6 Summary

In this chapter, six cases are introduced to represent full observability
and different types of partial observability for a bivariate probit model.
The example of a two member comittee voting under-a unanimity rule can be
applied to all cases. Case One gives full observability about the model,
since the dichotomous choices of both voters are always observable. Separate
estimation is possible for each probit equation but this would not be

efficient unless the correlation coefficient p=O.

20

In Case Two, partial observability in the sense of Poirier, only the
result of the joint choice of two decision-makers is observed. As long as
either party votes "no", the separate votes of the two voters are indis-
tinguishable. This is the case which gives us the least information.

Case Three and Case Four each lie somewhere between the above two cases.
One of the two voters' behavior is observed in Case Three. But when this
observable party votes "no", the two choices of the other voter are indistin-
guishable. The observable party's probit equation can always be separately
estimated, but only when p=O will this be as efficient as joint estimation.
Separate estimation for the other party's probit equation is impossible
unless p is known to be equal to zero.

In Case Four, when either party (or both) votes ”no", we observe the
casting of a "no" vote. But while the one party is observed casting the
veto, the vote of the other party remains unknown. Some assumptions must be
made here about who will use the veto first. We can either assume some
fixed p to be the probability that the first party does so (Case 4A1), or
have p as another unknown parameter in the model (Case 4A2). Another
possibility, which is Case 43, is that the party with the strongest sentiment
for a "no" vote will be observed casting the veto.

We have provided likelihood functions for the various cases. In each
case, a numerical maximization of the likelihood function provides maximum
likelihood estimates. In Chapter 3 we will consider the asymptotic
distributions of these estimates, and in Chapter 4 we will compare their

relative efficiencies.

CHAPTER THREE

DERIVATION OF INFORMATION MATRICES AND CONDITIONS
FOR IDENTIFICATION

3.1 Introduction

The log-likelihood functions that we presented in the last chapter for
full and partial observability cases have prepared us for the derivation of
the information matrices here. The information matrix by definition is
equal to minus the matrix of the expectation of the second-order derivatives
of the log-likelihood function with respect to the parameters. That is,

E(333%%§7L), where e is the vector of unknown parameters. Under certain

regularity conditions, it can be shown that the maximum likelihood estimates
are consistent and asymptotically normal, with a variance-covariance matrix
which is equal to the inverse of information matrix. Therefore, through
the information matrices which we derive in this chapter, we can compare
the variances and covariances of the parameter estimates in different cases.
That is, we can measure the efficiency lost for lack of full observability,

which can be called the cost of partial observability.

Note that

[- E(azlog L/aoao')]
'E [(3 109 L)(3 109 L): ]

The latter formula will be used for all cases in this chapter.

We also consider the problem of identification of the parameters of the
model, under various levels of observability. Since (given certain

regularity conditions) a necessary and sufficient condition for local

21

22

identification is non-singularity of the information matrix, we examine the
rank of the information matrices which we present. For all levels of
observability which we consider, the parameters are identified except in
certain special (perverse) cases, whiCh we point out.

The question of identification is important in its own right, but we are
also interested in it because it is relevant for efficiency comparisons.
The closer the information matrix is to being singular, the larger are the
variances of the estimates. In the next chapter, we will compare efficiencies
by evaluating the inverse of the information matrices, for various specific
parameter values. Knowledge of the perverse cases which lead to non-identi-
fication will help us in picking regions of the parameters space to investigate.

Section 3.2-3.5 contain derivations of the information matrices and
discussions of identification for the different cases listed in the last

chapter. Section 3.6 gives a summary of this chapter.

3.2 Case One: Full Observability

In order to simplify the notation, we let

F]. = min], x132; o) i=1,2,...,N
a1 = Xi81

b1 = X182

0 = (81'. 82', o)‘ .

Then the log-likelihood function for the full observability case is

N
in L(0) = S {y}.1 yi2 In F1 + yi](1-y12) 2n[ °<ai) - F1]
1

23
+ (1-y11)y12 in [¢(b1) ' F1]
+ (l-y1])(l-y12) 2n [1- ¢(ai) - ¢(bi) + F11}.
the information matrix of this case is
e9= c1'c1 + cz'c2 + c3'c3 + c4'c4 ----(1)

where Cl" 02', 03' and C4' are all (2K+l)°N matrices and:

 

 

 

1 8Fi
the ith column of C1' is-::: --
/F. 39
1
a[¢(a.)-F.]
n u czl 1S 1 ;e 1
u u c3' 15 1 $9 1

1 a[l-¢(ai)-¢(bi)+Fi] .

H I. C I is
4 71-¢(ai)-¢(bi)+Fi 39

 

Some of the derivatives which appear in the information matrix above have been

derived by Ashford and Sowden (1970):

aFi

S's—1" = (a1)¢(A1)X1

3F1

33": ¢(bi)¢(Bi)Xi
2

BF

24

and we know that

 

 

 

 

 

 

a¢(ai) ' 36(ai) 36(ai)
as] "p(ai) xi as ‘ 0 so ' 0
2
30(b1) ' 34(bi) 30(bi)
382 '4’(bi) Xi as1 ’ 0 3o ‘ 0

wherei>(-) denotes the standard normal density function, fi=f(xi81’ X182; 0)

denotes the standard bivariate normal density function and

 

 

A1 ___ «17:2 “’1' ‘ “i"
B. = 1 (a. - ob.) .
' «TIT? ‘ '

If p is given a priori, the equation (1) is still the same. But 6 now

is (31', 82')'instead of (81', 82', o)‘ and the information matrix is

(2K)-(2K).
If the two probit equations are separately estimated, the log-likelihood

function of the first equation is

N
In L (81) = g {yi1 2n 0(61) + (1'yil) 2n ¢(-ai)}

and the information matrix of the first probit equation is

N
J= 2. M(a1.) M(-a1.)X1.'X1.

1

 

where M(ai) =

25

Using the same method, we can get the information matrix of the second equation.
Separate estimation will not be as efficient as joint estimation unless o=0.
Here the information matrix for joint estimation with full observability
is the same as given by Amemiya (1974) using the FIMC (Full Information
Minimum Chi-Square) Probit method. When the probit equations are estimated
separately, the information matrix is the same as his using LIMC (Limited
Information Minimum Chi-Square) Probit Method.
Under certain conditions, the information matrices of other (partial
observability) cases will be singular. But this is not the case here.
For full observability, the parameters are identified, except of course in
the case of perfect multicollinearity. Perfect multicollinearity is also
a perverse case for all of the levels of observability which we consider,

but we will not discuss it further.

3.3 Case Two: Partial Observability in the Sense of Poirier

The log-likelihood function of Poirier's partial observability model is

N
in L(o) = a {2i 2nFi + (1-Zi) 2n [l-FiJ} and the information matrix is
1

J=c‘c

s 5 "“(2)

where C5 is the N r(2K+1) matrix with ith row equalling

VFizW-Fi; [¢(ai) 9(A11X1, ¢(b1) ¢(Bi) Xi, f1] ,

When p is equal to zero, the ith row of C5 becomes

1
76(ai)¢(bi)[1-¢(ai)¢(bi)]

 

[¢(a1-)<I>(b1-)Xi, ¢(b1.)<b(a1-)X1-. 6(ai)¢(b1-)l .

26

If p is known, 6 is just (81', 82')'and C5 is a N-ZK matrix with ith row equal

 

to
«WW—7F) [Raiiempxp ¢(b1-)¢(Bi)X1-l .
1 1

which is the same as for 0 unknown except that the last element has been

dropped.
In discussing the identification problem, some simplifications are made

here. We assume that there are only two independent variables and one of

them is a constant term. That is, we assume

Xi ‘ “ Xi2]
s
- 11
e1 [8121
B
62 = I 211
322

(The same simplification will be used in discussing identification in the
rest of this chapter.) These assumptions are only for the purpose of
simplification, and won't change our conclusion.

The first perverse case we consider is the case of 812 = 822 = 0. That
is, all coefficients equal zero except for the constant term.
Then
a"""hi‘"

i 821

27

- 2 _
A1 ' (82] ‘ 081])/ /1'9 A
= - - 2 —
f1 = f(a: b; P) = f
F = F(a, b; p) = F

are all constants for i=1,2,...,N.

Under the assumptions above, the information matrix is

N
1 .
‘0 ‘f T=T1'-'I'=T (Fe)i(Fe)i
where _ (4(a)¢(A)
(Fe), = 4>(a)<1>(A)X,-2
¢(b)¢(B)
L¢(b)o(B)X1-z
L f ..-

 

 

It can be seen that in the information matrix,
the first row x 1% = the third row,
¢(a)¢(A)

the first row x-—-¥:————-= the last row,

¢(a)¢(A)

and the second row x 3i9131§l-= the fourth row.
{ ¢(a)¢(A)
Thus the rank of the information matrix is only two; the parameters are not
identified. This is so even if p is known a priori.
The second perverse case we consider, which was noted by Poirier, is the

case in which B1=82. (That is, B11=82], 812=822.) Then bi=ai’ Bi=Ai and the

28

U . . O 1
1nformat1on matr1x1s J: 2 (F). (F ).'
i Fi(l-Fi) <3 1 0 1

where

MaiMAi) '
(F ). = 4’(‘"i)"’(”'i)xiza
¢(ai)¢(Ai)

ci<a1.)<i(A1-)Xi2

 

 

The information matrix will have rank equal to three, and the parameters are
not identified. Again, this is so whether or not o is known a priori.
The third perverse case we consider is the case in which 311=9321 and

81239822: Then ai=pbi

 

b.
usi=RD ')=um=e

 

 

1 /1- 92
(ob-)2 + b? - 2p(pb.)b.
f, -_-___1__ exp {_ 1 1 1 1 }
‘ - 2n/1- a2 2(1- ,2)
b.
l 1
= ——-—--—- exp {- --}
ZH/T- oz 2
= s ¢(b,)
where s = 1
/2H(1- oz)

and the information matrix is of the same form as in the second perverse

case where

2.9
. ¢(a1)¢(A1) q
¢(ai)<b(A,-)Xi2
oh = 1f‘mi)
J.2""("i)"i2
) 5¢(b1‘) .1

 

 

It can be seen that the third row times 25 is equal to the last row, so the
information matrix is singular. This will not occur if p is known a priori.
The similar situation happens when 821:0811, 822=p812. @(Ai)= %- and

fi = s-¢(ai) where s is the same as above, then the first row times 25 is
equal to the last row and the information matrix is singular. Also there is
no problem if p is known a priori.

These perverse cases can be given an intuitive explanation. In this
partial observability case, there are only two distinguishable events (yes,
no) and one probability is independently estimatable. When the coefficients
of the exogenous variables (Xi) except for the constant term are all equal to
zero, this probability is unrelated to Xi except for the constant term. Hence
there exist only K(the number of exogenous variables) pieces of information,
namely one probability and (K-l) things it doesn't correlate with. From this,
we can't estimate (2K+l) parameters (p is not known) or 2K parameters (9 is
known). In the case when 81=82. the two probit equations are observed to be
the same and there are only (K+l) pieces of information. We still don't have
enough information to estimate all the parameters. In the case when 6H=p321
and B12=p822 (or 8213p811 and 822:9812), there are 2K pieces of information.
We can't estimate all the parameters when p is not known, but won't have the

same problem if p is known.

30

Another situation in which the information matrix will be singular is
called “the peculiar case" by Poirier and has been discussed clearly in his
paper. This case involves specific exogenous variable configurations and will
not be discussed here.

There may be some other situations that the information matrix will be
singular; the above cases do not necessarily cover all. It appears that in
general one needs to check to see whether parameters are identified according
to whether the information matrix has rank equalling 2k+l for- each specific
condition. This is also true for all the identification problems of other

partial observability cases that we are going to discuss in other sections.

3.4 Case Three: Partial Partial Observability

 

 

The log-likelihood function for joint estimates is

N
In L(o) = g {yi1 yi2 anFi
1

+ yi1(l- Viz) 2n[¢(ai) - F1]

T (1' y11) 2n[]' 9(3171}

O = (81', 32.9 p)'
1 = 1,2,...,N

and the information matrix is
“p = c1 c1 + c2 C2 1 C6 C6

where c1, c2 are the same as before and c6' is a (2k+1):N matrix with ith
column equalling
1 3L1“ ¢(a1)]

/l- 6(a1) 39

 

31

When p is known, 6 changes to (81', 82')' and the information matrix is a
2k ¥ 2k matrix.
The first probit equation -- the observable one -- can always be estimated

separately by maximizing

2n L(81) = z {yil 2n 6(a.

i 1) + (1-y11) 1" ¢(-a1)}

and the information matrix is

c) = 2 M( .)M(- .)x.'x.
1 a1 a1 1 1

¢(ai)
where M(ai) = .

¢<ai)

 

Only when p=0, the second probit can be estimated separately from the

observations with yi1 by maximizing

tn L(82) = 1 y I'Iyiz 2n ¢(bi) + (l-yiz) 2n p(-bi)}

and the information matrix is

ei== ; 6(ai) [M(bi)M(-bi)]Xi'Xi
1

 

¢(b,)
where M(bi) =
0(bi)
Under the assum tion that X = [l X ] B = [811] B =[821] and for
p i i2’1 812’2 822

the case that B12=822=O, the information matrix for the joint estimation is

 

N
J =3; 1%? (F911LFQ)1‘ 1' 37—31731:(¢e)i'(F®)i][(¢9)i - (FO)T]'
+ 1 )i(¢®)i'}

32

'¢(a)o(A) 1

¢(a)¢(A)Xiz
where (Fe); o(b)¢(B)
¢(b)¢(B)X12
. f l

 

 

1
and F¢(a)
6(alxi2

(¢o)i o

 

 

It can be seen that the third row of the whole information matrix times

f will be equal to the last row, so the information matrix is singular.

¢(b)¢(B)
= = = =1
Another case is when 81] p821 and B12 “822’ then ai pbi’ 4(Bi) 2- and
l . .
f.=s-¢(b.) where s = -———-—-——— . The information matr1x is of the form that
1 1 V2H(1-pz)

J: 1 {Eli— (Fo)i(Fo)i' + 35:17::- [(‘”e)i‘(Fe)i][(¢e)i‘(Fe)i]'

1

 

+ . .'}
]_¢(ai) (¢G)1(¢O)1
where F¢(ai)¢(Ai) 1
(F ). = ¢(ai)°(Ai)xi2
GI
%-¢(bi)

%'¢(bi)xiz
S¢(bi)

 

 

.J

33

and

' “3.3)
¢(ai)xi2

COO

 

 

It also can be seen that the third row time 25 equals the last row, so the
information matrix is singular.

Since the last row of the information matrix is the one corresponding
to p, for the above two perverse cases, the parameters are identified if p is
known a priori (in which case the last row and column are deleted), but they
are not.identified when p is unknown. However, it should be pointed out that
81 is always identified; the lack of identification is only for 82 and p.

Intuitively, since there are three distinguishable events (yes, no but
party one voted yes, no and party one (voted no), there are two independent
probabilities. Thus there are 2K pieces of information in the data for the
above two perverse cases, which do identify 31 and 32 when p is known, but
which do not identify all of B]. 82 and p.

There also may exist some other situations in which identification fails,

but these are the only two that we found so far.

3.5 Case Four: Partial Observability with Observed Veto
First we will discuss the case in which there is a given probability for
party one to cast the veto first when both parties wish to do so. Let p be
this given probability, then the log-likelihood function is
in L(e) = 2 2n F1
i9Zi=l
is observed

34

+ 2 2n [p(l- 6(ai)) + (l-p)(¢(bi)-Fi)l
iayi1=0
is observed

+ § In [p(¢(ai)-Fi) + (I-P)(1- 9(b1))]
19yi2=0
is observed

where 0 (81'. 82', o)'

i = 1,2,...,N

and the information matrix of this case is

:9

Ci'ci I C7'C7 + ca'ca

where c]' is as before, c7' and c8' are matrices of dimension (2k+l)-N with

ith columns given by

 

 

 

 

_ 1 A§[p(l-¢(ai))+(l-p)(¢(bi)-Fi)]
/p(l-¢(ai))+(l-p)(¢(bi)-F1) as
and
1 a[p(¢(ai)-Fi)+(l-p)(l-¢(bi))l
/P(¢(ai)-Fi)+(1-p)(l-¢(b1)) 66
respectively.

If p is another parameter which is unknown and needs to be estimated, so
that 6 now includes p, this does not change any of the above except that we
now add another column to c1, c7 and c8. The extra column of c1 is zero, the
extra column of c7 has ith element

1
nan-Ra.))+(i-p)(¢(b,)-F,)

 

[l- ¢(ai)- ¢(bi)+Fil

35

and the extra column of c8 has ith element

-1

[l- 6(ai)- ¢(bi)+Fi] .
/p(¢(a1)-F1)+(]-p)(]-¢(b1))

 

The information matrix now is a (2k+2) . (2k+2) matrix.

Now we discuss another possibility of observed veto under the assumption
that the first party to cast the veto is the one whose desire for a "no"
vote is stronger. That is, given X181 + £11 < O and XiBZ + 812 < 0, it is
assumed that the party with the lesser value is observed to cast the veto
first. Letting Hi be the probability that X181 + a].1 < X162 + 612 given
X181 + e11 < O and xiBZ + €i2 < O, the log-likel1hood funct1on is

in L(o) = 2 in F1
ia-Z1.=l
is observed

+ z in [¢(bi)-Fi + H1]
i9yi1=0
is observed

+ 2 2n [1- 9(b11'Hi] .
iey12=0
is observed

The information matrix then is

‘0 ‘ c1'c1 + cei'c9 I c1o'cio

where c1' is as above, c9' and c10' are (2k+l)-N matrices and have ith

columns respectively as

36

 

 

 

 

1 a[¢(bi)'Fi+Hi]
/¢(bi)'Fi+Hi 36
and
1 a[l- °(bi)'Hi]

where o = (61', 62', o)‘

 

 

 

 

 

 

 

i”; = _ ¢ bi'ai <:>(- ai+bi 1 x.'
381 720-9) /2(1+p) 72(1-6) '
_3_H_i. = 4%in ., ai+bi ) 1 .1
as2 /2(1-p) /2(1+o)' 72(1-6) '
91

3H. b -a. a.+b. b.-a.
—1 = fly—L— [¢(_L_‘_)¢(- .1.__‘_) ‘ ')
.Bp 72(1-6) 72(1-9) v/2(1+p) 1-o

bi-a. __._.-..

- f(———l-. - 6.; -/<'I'-o)_/2)1
/2(l-o) 1

The derivations of these three derivatives are in Appendix A.

Identification problems still occur under the assumptions that

811 E'21
Xi - [l X12], 5] - [B12] and 82 - [822] as before. When B12 - 322 - O and

p is unknown, the information matrix of the case of observed veto with a

37

given probability p is of the form

:9 ' N 1 (F ) (F ) ' + -l'(Q ) ( 1 ' +'l- ( ) ( ) '}
‘ §=1 {‘f' o i o i Q1 10 i 010 i 02 Q26 i 026 i

where

- ptl- ¢(a)1 + (l-p)[¢(b)-F]
- p[¢(a)-F] + (l-p)[l-¢(b)l

O O
N .—I
l I

(¢(a)¢(A) ‘
<i>(at)i>(A)Xi2
¢(b)¢(8)
j_¢(b)i(B)Xi J
f

 

 

r¢(a) l-p-(l-p)¢(A)J

¢(a) I-p-(l-p)¢(A)]Xi2

'Qie'i "' (l-p)¢(b) n- wen

(l-p)¢(b) [l- ¢(B)]X1-2
- (l-plf

 

 

and

’p¢(a) u- ium W

p¢(a) [l- <I>(A)1xi2

(029). 3 ¢(b) I-(l-pl-p i(B)]

¢(b) I-(l-pl-p ¢(B)]X1.2
-pf

 

 

b

It can be seen that for the whole information matrix

(the first row x q1) + (the third row x q2)
= the last row

38

where

= (l-pr_

q
1 ¢(a) [(l-p)¢(A)+p¢(B)l

q = 9f
2 ¢(b) [(l-p)i(A)+p¢(B)]

So the information matrix of the case of observed veto with given probability
p is singular, and neither 51 nor 62 can be identified. However this is true
only when p is unknown. If p is known and o=(81', 82',)', the information

matrix is not singular when 312 = 322 = O. .

In the case of observed veto with p as a parameter, the information

matrix is still of the form as before

N
= 1 l- I 1 I 1 I
but 6 now includes p as (81'. 82', p, p)‘ and (Fo)i’ (Qlo)i and (029), are all
matrices with dimension 6:1.' The extra rows of (Fo)i’ (Qlo)i and (020), are

0, $L-[1- ¢(a)- 6(b) + F] and ﬁl-[l- 6(a) - o(b) + F] respectively. For this
1 2

6-6 information matrix, besides the relationship that

(the first row x q1) + (the third row x q2)
= the fifth row

where q1, q2 are the same as before, it is also true that

[the first row x ¢(b)¢(8) - the third row x ¢(a)¢(A)]

-[l- i(a) - ¢(b) + F1
[9 ¢(a) ¢(b)¢(8) + (l-p) 6(a) ¢(b)o(A)l

X

= the last row.

39
Therefore for the case of observed veto with p as another parameter, when
812 = 822 = O the information matrix is singular whether p is known a priori
or not. Another way to say this is that this model, B] and 82 are identified
only if both p and p are known.

Finally we come to the question of identification for the case of
observed veto under the assumption that the first to use the veto is the one
who wants it worst. When 812 = 822 = O and p is not known, the information
matrix is

J .—.

1

domz

1 . 1 . 1 .
{F(F0)i(Fe)' +R1—(Rle)i(Rle)i +R'2‘(R26)1(R26)i}

where (Fo)i is the same as above,

R1 = 0(b) - F + H
R2 = l - 4(b) - H

' - ¢(a)i>(A) + H 1
B1

[- ¢(a)¢(A) + H811 x,2

- H8]

[- H811xi2
- f + Hp W

(Rlo)i =

 

 

- H8]
('”61)Xi2
"s - ¢(b)<i(B)

1
[H81 ' ¢(b)¢(B)]X12
-H
O

(R20)i

 

 

and where

b-a -—-————

H = F(-—--- . - b; - (l-o)/2)
/2(1-p) '

 

4O

 

HB = - ¢(—_P_é_a:_—) <D( a+b 1 1

 

l 72(1-9) -/2(1+p) 72(1-6)

Ho =12-i- “a, (¥E§) + f]

H, H and HD are constant for i=1,2,...,N.

B1

It can be seen that for this information matrix
(the first row x r1) + (the third row x r2)

= the last row

 

 

where
¢(b)i(B)[f-H l-f H
r = o 81
' 1(a)¢(b)¢(A)¢(B)-[¢(a)¢(A)+¢(b)¢(B)1HB]
r2 = ¢(a)<1>(A)Hp - fHB1

o(a)¢(b)¢(A)¢(B)-[6(a)¢(A)+¢(b)¢(B)]HB]

So for this case, the information matrix is singular when 812 = 822 = O and
p is not known, but not so when p is known.

Intuitively, when the coefficients of all the exogenous variables except
the constant term are equal to zero, there are three distinguishable events
(yes, no with party one vetoing, no with party two vetoing). Hence two
probabilities are independently estimatable and there are 2K pieces of infor-
mation available. Thus we can identify at most 2K parameters. From this
point of view we can see that the informationmatrices in the first and the
third situations are singular because there are (2K+1) parameters when p is
not known. But all the parameters can be identified if p is given. The
second case (which is the one with p as andther parameter) can't be identified

unless both p and p are known.

41

3.6 Summary

In this chapter, six different information matrices have been derived
corresponding to joint estimation of the bivariate probit model with varying
degrees of observability. Some information matrices for separate estimation
of the two equations are also derived, but separate estimation won't be as
efficient as joint estimation unless p=O. Because it is the inverse of the
information matrix that is the asymptotic variance — covariance matrix of
parameters, we also discuss question of identification by analyzing the rank
of these matrices. We especially analyze the perverse case in which all the
coefficients of exogenous variables are equal to zero. It is found that in
the above situation, regardless of the values of the constant terms, all of
the information matrices for the partial observability cases are singular
if p is not known. When p is known a priori, only the cases of partial
observability in the sense of Poirier and of observed veto with p as another
parameter still suffer from a lack of identification. The model with full
observability is still identified in this case, whether a is known or not.

The second perverse case is the one in which 81 = 82; that is, the two
probit equations are identical. Then the model with partial observability
in the sense of Poirier is not identified, though there are no problems with
the other cases.

The third perverse case is when 8]] = p321 and 312 = ”822' The model
with partial observability in the sense of Poirier and the partial partial
observability case are not identified when p is unknown. But there are no
problems when p is known a priori. In the similar situation when 821 = pen
and 822 = 9812, only the model with partial observability in the sense of

Poirier is not identified when p is unknown.

42
These results will be useful in picking parameter points at which to
evaluate the cost of partial observability. This we will do in the next

chapter.

CHAPTER FOUR

RESULTS OF EXPERIMENTS MEASURING THE
COST OF PARTIAL OBSERVABILITY

4.1 Introduction

In this chapter, the results of some experiments measuring the cost of
partial observability are presented. We call these "experiments" because
various specific values of the parameters have to be picked in order to
evaluate the inverse of information matrix. What we are interested in is
primarily (l) with the same values of the parameters, a comparison of
efficiencies under different levels of partial observability; and (2) the
reasons that will cause the change of cost for each individual level of
partial observability. With this knowledge, a researcher can compare costs
of using and not using a piece of information according to his case and
make a better decision.

When measuring the cost of partial observability, we let the elements
of the inverse of the information matrices for the various partial
observability cases be divided by the corresponding elements of the inverse
of the information matrix for case one. The reason for choosing case one
as a standard of comparison is because it has the most complete observability
and therefore leads to the most efficient estimates. Thus the ratios that .
we get are the ratios of asymptotic variances and covariances for the various
partial observability cases compared to the full observability case, and
represent the relative efficiencies of parameter estimates under different
levels of observability. The bigger the ratios are, the greater the cost

of partial observability.

43

44

Some simplifications made in the last chapter are still applied here.
8 B
11 _ 21
I I s B - [
= e'x13 and the X1.3 are random normal deviates with zero mean and unit

Specifically, we assume Xi=(l Xi2)’ s] = ] , where Xi
variance. All the experiments in this chapter have been done with a sample
size of 50. We also tried sample sizes of 10, 100 and 500, but the results
didn't show any significant difference. This simply indicates that the

ratios of asymptotic variances are more or less independent of sample size.

We do not address the question of what sample sizes are necessary for the
asymptotic distributions to be reliable.

In section 4.2, we first try three arbitrary cases B1=32=[}]; 81=82=[?];
6]=[J]] and 32=[}]. We present some general results which we believe are
always true for any values of 8. In section 4.3, all the experiments have
been done with one common characteristic, namely that g X131 ==§X132 = O
(ensured by appropriate choice of value of the constant term). The idea of
these experiments is to show the effects of changes in the parameters
(8], 82, p and p) when (on average) each party has an equal probability of
saying "yes" and "no". These results are not as readily interpreted as we
might hope, since changing any parameter (e.g. 812) changes a number of
features of the data and model which are relevant to the relative efficiencies,
such as the degree of identification, the split of the sample into the various
distinguishable outcomes, and so forth. Therefore in section 4.4 we try
some more complicated experiments in which we manipulate the parameters in
such a way as to isolate these various influences. For the most part these

results are in accord with our a priori expectations. Section 4.5 gives

the summary of this chapter.

45

Twenty-two tables at the end of this chapter present the results of the
experiments. Only the ratios of asymptotic variances of the parameter
estimates of the non-constant-term exogenous variables (B12 and 322), for
different levels of partial observability, are listed and discussed. Except
for those experiments especially designed to focus on p and p, all of the
experiments have been done for p=O and p=0.5, and p is fixed at 0.5 for
case four. As before, case one is the full observability case; case two
is the case of partial observability in the sense of Poirier; case three is
the partial partial observability case; case 4A1 is the observed veto case
given a known value of p; case 4A2 is the observed veto case with p as
another parameter; and case 48 is the observed veto case under the assumption
that the party who uses the veto first is the one that wants the veto more

strongly. Also we define

 

N
“—'= Z F(X-B . X.B ; p)/N
P1] i 1 l 1 2
N
10 1
N
01 1
N L
13‘“? [l- ¢(Xie]) - <1(X1-82) + F(xis]. X182;p)]/N
OO 1
N X.B -X B s_____._
g = z P(l—Z—i—l. 41.32; - /(1-p)/2)/N.
l /2(1-o)

They are the average probabilities of: both parties say "yes"; party one

says "yes" and party two says "no"; party one says "no" and party two says

46
"yes"; both parties say "no"; and party one uses veto first given that both
parties desire to do so; respectively. The distribution of the first four
probabilities are called "sample split" and all the probabilities are listed
in some of the tables. Also in the tables, "p is not known" means that p is
not given but rather is estimated as a parameter, and the information matrices

are of dimension (2k+l)°(2k+l) (or (2k+2)-(2k+2) for case 4A2).

4.2 General Results of Some Basic Experiments

The experiments presented in this section are called "Experiment 1" in
order to be distinguished from the others in section 4.3 and 4.4. Experiment 1
includes three different B's. They are (l) B] = 82 = [1]; (2) B1 = 82 = [?1;

1]. For the first two choices of 3, Poirier's

and (3) a] = [3,1 and 82 = I

model (case two) is not identified because X181 = X182 for all i. The other

cases are all identified. -
Table 1 lists the (expected) sample splits for each case under both

p=O and p=0.5. Table 2 shows the relative efficiencies of B12 and 322 for

each partial observability case. The results in Tables 1 and 2 are not easy

to summarize, but we do note the following.

(1) When 81 = 82 = [:1, Xie1 and Xie2 are all positive numbers greater than 1,
so the average probability of both parties voting "yes" is close to 1,
and the average probability of both parties voting "no" is very small.
Since X18] = X162, P76 = 01; the average probability of one party voting
"yes" and the other voting "no" is the same as the probability of the

opposite situation. Both probabilities are small. This is a fairly

extreme sample split..

(2)

(4)

(5)

47
When 81 = 82 = [1] changes to B] = 82 = [$1, both the values of XiBl
and xiBZ are decreased, so the average probability that both parties
will say "yes" is decreased too. On the other hand, the average
probabilities that one or both parties say "no" are all increased. The
samples split is still heavily weighted toward fboth yes", but less
extremely than before.
Knowing p results in smaller ratios of asymptotic variances, and hence
reduces the <:ost of partial observability. This agrees with the
general principle that we will call the "law of decreasing marginal
utility of information" (LOMUI), which is that the more information one
has, the less should be the value of another piece of information.
Observability information is less valuable when p is known a priori
than when it is not.
The Poirier case is the worst among all the partial observability cases
either when c>is unknown or known, because it is the one that is based
on the least information.
For case three, the ratios of asymptotic variances for estimates of 312
are equal to one for p=0 and are still very close to one for p=O.5, for
all values of B. This is so because party one's behavior is fully
observed in this case, and when p=0 party two's behavior is not
informative for party one's parameters. This is not true however when
pfO. For 82 (party two's parameters), the ratios are bigger than any
of those for case four when p is unknown. When p is known, they are
bigger than those for case 4A1 or case 48 but smaller than those for

case 4A2.

(6)

(7)

(8)

(9)

48
When Kip] = XiBZ and p = 0.5, 812 and 822 have the same efficiencies
for all three different possibilities of case four.
The estimates for case 4A2 are less efficient than for case 4A1, since
in case 4A1 more is known (namely p). In general the gain from knowing
p is greater when p is known (and conversely). This is a counter-
example to the LOMUI.
For case 48, H'is the average probability of XiBl + 21] < X162 + €i2
given that X181 + e1] < O and Xis2 + £12 < 0. So when Xis1 = X132,
H'= %-P66 . In the case of 31 = [1]] and 32 = [1], xis1 < X132, so
that party one is more likely to use the veto first when both sides
vote "no", and thus H'> %-PBB . Furthermore, the average probability
of the indistinguishable outcome is P6;'+ H for party two while it is
P}; + P66 - H for party one. Hence 822 tends to be less efficient
than 812 in case 48 when Hi> %-PBB , as it is here.
With X161 = X182 and p = 0.5, the efficiencies of 812 and 822 in case
4A1 are very close to those in case 43. This is so because when
X181 = xiBZ’ the probability of X181 + 811 < XiB2 + 612 is indeed 0.5.

The above conclusions will be seen to hold also in the experiments yet

to be presented,and thus are fairly general. (Some of them, of course, are

perfectly general since they must be true.) We will not discuss them

further.

There is some evidence above on the effects of changes in p. However.

this will be discussed in the next section.

49

4.3 Results of Further Experiments

In this section we report the results of more experiments, which vary
8], 82, p and p more widely than in the last section. All of the experiments
in this section have in common the feature that g X131 = g X132 = O. The
point of this is to try to minimize effects of parameter changes on the sample
split. For example, consider an experiment which varies 812 from zero to
some high level. As B12 increases, for some of the partial observability
cases the degree of identification increases and we might expect the cost of
partial observability to fall. (We might call this an "identification effect".)
However, if we change 812 while holding 8]] constant, the probability of a
"yes" vote also changes, and thus the sample split changes. This also may
affect relative efficiencies; for example, as P;;' increases there is fuller
observability for all cases and the relative efficiencies of the partial
observability estimators should increase. (We might call this a "sample
split effect".) Therefore in this section when 812 is changed, 811 is also
changed in such a way that XiB1 is (on average) zero: E X181 = 0, and
similarly for 82. For p=O and small B's, this will yield a symmetric sample
split: P66 = P6; = P;6'= PT;'= i-. For larger B's this is unfortunately
not so. Because of the non-linearity of the model, the average probabilities
(over the sample) are not the same as the probabilities evaluated at the
average of Xi. The latter will all equal %~(for p=O, anyway) but the former
will not. Thus we are not entirely successful in separating identification
effects from sample split effects. Indeed, it is not clear how well one can

hope to do so, but some more successful attempts will be made in the next

section.

50

Several types of experiments are presented in this section:

(1) Experiment 2: B1 = 82 = ['CCX] for c = 0.3, 0.5, l, 2 and 3,

-X.
where X ‘3

e = 1.525 ,
l

x. =
1 '2 i

II
II M 2
II M Z

i

X1.3 , i=1,2,...,N, are random numbers with standard
normal distribution havingii=0 and o=l. The same X's will
be used for all the experiments here, and the sample size
is 50. The results of Experiments 2 are in Tables 3, 4 and 5.

(2) Experiment 3: B] = [C_E] and 32 = ['ccx] for c = 0.3, 0.5, l, 2 and 3.

The results are in Tables 6, 7 and 8.

. . - g 4X
(3) Exper1ment 4A. 3] - 32 . [1.0];

Experiment 4B: 3] = [_]¥0] and 82 [;¥0] s

both for p=-O.5, O, 0.2, 0.5 and 0.8. The results are in
Tables 9, 10, 11 and 12.

' o = = -7.
(4) Exper1ment 5A. 31 32 [1.0];

[-1161 and B2

. , -7'
Exper1ment 58. B1 [1.0],

both for p=O, 0.2, 0.5, 0.8 and l (but for cases 4A1 and 4A2
only). The results are in Tables 13 and 14.

It is hard to summarize so many tables briefly. However, we will
discuss what we find to be the most interesting results.
For experiment 2, we know that case two is not identified for any value

of c because X181 = xiBZ’ i=1,2,...,N, and case three (82 only) and case

51

four are not identified if c is zero when p is not known. If p is known only
case 4A2 is not identified when c is zero. As c increases, the "identification
effect" should be to increase relative efficiencies of the partial observ-
ability cases. Meanwhile, we notice that the average probabilities of the
indistinguishable outcomes, namely (Pa;'+ P66) for 32 of case three and P66'
for case four are getting larger when c increases. That is, the identification
effect and the sample split effect work against each other when c changes.

From the results when p is not known, we can see that when c is smaller
(0.3, 0.5) the identification effect is quite strong so that the ratios of
asymptotic variances are decreasing as c gets larger for both p=O and p=O.5.
But when c increases to a certainlevel (261:3), these two effects seem to
cancel each other out and the results are less clear. For the p=O case,
whether these ratios will increase or decrease after c > 2.0 is uncertain.

When p = 0.5, although ratios are monotonically decreasing as c increases
from 0.3 to 3.0, whether they would keep on increasing or not can't really
be predicted.

When p is known, the relative efficiencies of case three (62 only) and
cases 4A1 and 4B are generally decreasing as c increases, presumably because
of the sample split effect. For case 4A2, since it still has an identification
problem if c is too close to zero, relative efficiencies are increasing as
c increases until c=3.

In Experiment 3, all of the parameters are identified for all cases.
However, for c=0 cases 2 and 4A2 would not be identified, while cases 3,
4A1 and 48 would be identified only with p known. As for the sample split
effect, the average probability of the indistinguishable outcome for case

two, which is (1-P1 ), is increasing with c but those of case three (82 only)

52

and case four are decreasing as c increases. Thus we are going to discuss
the relative efficiencies of the partial observability cases one by one.

For case two, the relative efficiencies of both 5] and 32 improve as
c increases from 0.3 to 3 for either p=0 or p=0.5, and whether p is known
or unknown. This shows that the identification effect is very strong for
case two, which is not surprising.

For 82 of case three, when p is not known, the identification effect
plus the sample split effect make the relative efficiency increase as c
increases both for p=0 and p=0.5. When o is known, the sample split effect
alone makes the relative efficiency increase for both o=0 and p=0.5 after
c=0.5.

For case 4A1, the effect of changing c is ambiguous. Note that case
4A1 yields much more efficient estimates than case 4A2, especially when c
is small. The same is true in comparing case 4A1 to case 48 when p is
unknown. When p is known, cases 4A1 and 48 yield rather similar efficiencies.

For case 4A2, for either p=O or p=0.5 and whether p is known or not,
the identification effect and the sample split effect both make 8] and 82
relatively more efficient as c gets larger. Therefore, the efficiency
relative to full observability is monotonically increasing all the way from
c=0.3 to 3.

Since case 48 has identification problems only when p is not known, its
relative efficiency improves dramatically when c increases, when p is
unknown. When p is known, the effect of increasing c is ambiguous.

Experiment 4 shows the effect of the correlation coefficient p. For
Experiment 4A with p unknown, the relative efficiencies of all cases (only

32 for case three) get worse as p increases. The changes are considerable

53

especially for bigger p. When p is known, the result is mostly the opposite.
Most ratios get smaller as p increases, but these are mixed results for case
4A2. However all the changes are much smaller; this is, 0 doesn't matter
much if it is known.

For Experiment 48 with p unknown, relative efficiencies of all cases
(only 82 for case three) except case 4A1 improve as p increases. The changes
are eSpecially big when p increases from -O.5 to O. The ratios for case 4A1
are very small compared to other cases, but they increase as p increases.
When p is known, the relative efficiencies of case two and case 4A2 improve
but cases three (82 only), 4A1 and 48 get worse as p increases. All the
changes are also smaller when p is known.

For both Experiment 4A and 4B, the relative efficiency of 8] in case
three, being affected by the correlation with another unobservable party,
gets away from 1 as the absolute value of p increases.

Experiment 5 shows the effect of p on cases 4A1 and 4A2. For case 4A1
and 4A2, the average probability of the indistinguishable outcome for party
one is P16 + (l-p) 566' and is 501. + p PBS’ for party two. So as expected,
increasing p decreases the ratios for a] but increases them for 32, and the
effects are very strong.

Both experiments 5A and SB have the same results for cases 4A1 and 4A2
whether p is known or not, but the results are mixed for case 4A2 in

Experiment 78 when p is known.

54

4.4 The Results of Experiments with Either the Identification Effect or
the Sample Split Effect Constant

 

From the results of the last section, we can see that because of the
mixture of the identification effect and the sample split effect, sometimes
we can not really tell the direction of change of the relative efficiency
when a parameter changes. Therefore, in this section, we try some other
experiments designed to change one effect while holding the other constant.
All the experiments are done by adjusting the values of the constant terms
to manipulate the sample split. Since different cases depend on different
features of the sample split, we do a different experiment for each partial
observability case.

There are three types of experiments in this section:
(1) Experiment 6A: 31 = [_g] and 32 = [g] for c = 0.3, 0.5, l, 2 and 3;

d is adjusted so that qu'is fixed at 0.25.

Experiment 68: 81 = [_i] , 82 = [g] and c=l; d is adjusted so that P?—

—l

varies from 0.15, 0.25, 0.35, 0.45, 0.55 to 0.65.
These are experiments designed for case two and results are

in Tables 15 and 16.
(2) Experiment 7A: a] = 32 = [‘3] for c = o 3,0.5, 1, 2 and 3;

d is adjusted so that P01 + P00 = 0.50.

Experiment 7B: 31 = ['31] and 52 = [’32] for c = 0.3, 0.5, 1, 2 and 3;

d1 and d2 are adjusted so that P0] = P00 and both are fixed
at 0.25.

55
Experiment 7C: 81 = 82 = [-3] and c = 1.0;

d is adjusted so that PET" + Paa'varies from 0.20, 0.30, 0.40,
0.50. 0.60 to 0.70.

° - = -d] = -d = '
Exper1ment 7D. 81 [ c-]’ 82 [ c2] and c 1.0, d1 and d2

are adjusted so that P6? = P66' and both vary from 0.10, 0.15,
0.20. 0.30 and 0.35.

These four experiments are designed for case three. Since 81 has all
the ratios very close to one, only the results for 82 are listed and they are

in Tables 17, 18, 19 and 20.
(3) Experiment 8A: 3] = 32 = ['2] for c = 0.3, 0.5, l, 2 and 3;

d is adjusted so that P66 = 0.25.

Experiment 88: s] = 32 = [’3] and c = 1.0;

d is adjusted so that P66 varies from 0.15, 0.25, 0.35, 0.45,
0.55 to 0.65.

These are experiments designed for case four and results are in Tables
21 and 22.

We get the following conclusions from the results of these experiments:

From the results of Experiment 6B, 7C, 7D and 88, it can be seen that
sample split effect does affect the relative efficiencies of partial
observability cases. The higher the average probability of the indistinguish-
able outcome for each case, the worse the relative efficiency and higher the

cost of partial observability. This result holds under different situations

56

concerning p. Thus, in Experiment 68 (Table 16), increasing PT; increases
the efficiency of case two relative to case one. In Experiments 7C and 70
(Tables 19 and 20), increasing PB;'+ P66 decreases the efficiency of case
three relative to case one. In Experiment 88 (Table 22), increasing PBB'
decreases the efficiency of case four relative to case one. All of this is
as expected.

Experiments 6A, 7A, 7B and 8A attempt to investigate the identification
effect while holding the sample split effect constant. This leads to results
that are less clear-cut than those just reported. Basically, identification
effects are strong and predictable near points of singularity, but less so
far from points of singularity.

Experiment 6A investigates the identification effect for case two. The
probability of the indistiguishable event for case two are l-PYTL so we hold -
the sample split effect constant by holding constant PT;'= 0.25. Lack of
identification occurs if c=0 regardless of whether p is known or not.
Therefore the efficiency of case two relative to case one is expected to
increase (and the entires in Table 15 to fall) as c increases. The results
in Table 15 show that mostly they do. The exceptions occur when c is big
(c=3) and p is known, which are far from points of singularity.

Experiments 7A and 7B investigate the identification effect for case

three. Since the probability of the indistinguishable event for case three

 

is (P01 + P00), we attempt to hold the sample split effect constant by

-1 . —=—___l .
holding constant P6;'+ P66 - 7-(Exper1ment 7A) or P0] P00 Z-(Exper1ment
78). Lack of identification for case three occurs when c=O and p is unknown.
Therefore when p is unknown we would expect the efficiency of case three

relative to case one to rise (and the entires in Table 17 and 18 to fall) as

57

c increases. This does occur as c increases, when c is small, but for c > 1
the opposite occurs. In other words, the identification effect shows up
only when the model is close to non-identification. The same phenomenon
occurs when p is known. Then the parameters are identified for all c, and
the relative efficiency for case three falls monotonically except for very
small c.

Experiment 8A investigates the identification effect for case four. The
probability of the indistinguishable event here is P66} so the relevant
portion of the sample split is P66 , which we hold constant as c changes.
Lack of identification occurs when c=0; for case 4A2 this is so regardless of
whether p is known, while for case 4A1 and 48 this is so only if p is unknown.
The results in Table 2.1 are fairly predictable. Wherever identification is
relevant (all cases when p is unknown, but only case 4A2 when p is known)
the efficiency of case four relative to case one rises (entires in the table
fall) as c increases. For cases where identification is not relevant (cases
4A1 and 48 when p is known) relative efficiency first rises and then falls as

c increases.

4.5 Summary

In this chapter, we have conducted a large variety of experiments to
measure the cost of various levels of partial observability. The results
have been given in some detail in the preceeding sections. Here we will
give a brief summary of the most important conclusions.

The cost of partial observability is quite high. The estimates from
Poirier's model (our case two) typically have variances tens or hundreds of
times as large as do the estimates from the model with full observability

(our case one). This cost decreases markedly if any piece of observability

58,
information can be found; for example, observability for either party (our
case three) or observed veto (our case four). The law of diminishing
marginal utility of information usually holds: the gain in moving from case
two to case three or four usually exceeds the gain in moving from case three
or four to full observability (our case one). It is the first piece of
observability information which is most important.

A second clear conclusion is that specifying c>a priori improves the
efficiency of the estimates of the other parameters a great deal. Further-
more the improvement from knowing a is largest when it is needed most; that
is, when the relative efficiency is lowest.

A third conclusion is that the sample split has a strong influence on
the relative efficiencies of the estimates. For a given partial observability
case, its efficiency relative to full observability will be higher, the
smaller the proportion of observations which fall into_the indistinguishable
categories. Thus, for example, for Poirier's model relative efficiency will
be high only when most observations are of the "yes, yes" variety. The
fraction of such observations is observable. Similarly, in our case three
the observations which reduce relative efficiency are the ones for which the
observable party votes "no", and the proportion of such observations is also
observable. On the other hand, for the observed veto cases the relevant
proportion of observations is not directly observable.

Our last main conclusion is that the strength of identification matters.
All of the partial observability cases are unidentified for some perverse
parameter points (as described in Chapter 3). Their relative efficiency
is naturally low for parameter values near such points, and it increases

rapidly as the parameters move away from such points of singularity. However,

59

these effects are not strong except in the immediate neighborhood of points
of non-identification. Furthermore, this last conclusion depends on
unobserved parameters, and therefore is less likely to be informative, in

practical applications, than the other three conclusions listed above.

TABLE 1

Sample Splits for Different B

 

mﬂﬂﬂﬂ°

01=02= [11

0.0 0.5
0.9233 0.9320
0.0365 0.0283
0.0365 0.0283
0.0031 0.0113
0.0016 0.0057

91=02= [$1

0.0 0.5
0.6846 0.7243
0.1297 0.0900
0.1297 0 0900
0.0560 0.0957
0.0200 0.0473

0.4422
0.0133
0.5182
0.0263
0.0192

 

6O

TABLE 2

Ratios of Asymptotic Variances (Cost of Partial Observability)

 

 

 

 

 

 

 

 

_ _ 1 _ _ O _ l - l
B 81"82' [1] 81-82' [1] 81’ [_‘|]9 82' [1]
p 0.0 0.5 0.0 0.5 0.0 0.5
p is not known
Case 2 * * * * 17.6810 6.2666
Case 3 1.0000 1.0117 1.0000 1.0119 1.0000 1.0007
812 Case 4A1 5.1284 28.0120 8.7086 41.8127 4.5844 2.8136
Case 4A2 7.0715 33.2000 10.9572 45.7816 4.7963 2.9075
Case 48 5.1225 27.9721 8.6645 41.6975 2.6225 1.8931
Case 2 * * * * 8857.6250 35637.07
Case 3 9.3734 52.2389 17.2976 78.4054 178.6166 864.2910
822 Case 4A1 5.1284 28.0120 8.7086 41.8127 2.4228 1.9527
Case 4A2 7.0715 33.2000 10.9572 45.7816 46.7191 3.8118
Case 48 5.1225 27.9706 8.6644 41.6975 19.8626 47.1023
is known
Case 2 * * * * 7.6118 3.7920
Case 3 1.0000 1.0088 1.0000 1.0073 1.0000 1.0008
812 Case 4A1 1.0255 1.0731 1.1135 1.1755 1.0189 1.0273
Case 4A2 2.9686 6.2613 3.3621 5.1455 4.3046 2.0516
Case 48 1.0196 1.0332 1.0694 1.0604 1.0103 1.0067
Case 2 * * * * 105.3535 191.2173
Case 3 1.0500 1.1219 1.2118 1.2796 2.1731 5.0474
822 Case 4A1 1.0255 1.0731 1.1135 1.1755 1.3622 1.6642
Case 4A2 2.9686 6.2613 3.3621 5.1455 10.1104 3.7010
Case 48 1.0196 1.0331 1.0694 1.0604 1.6767 3.0236

 

*The information matrix of case two is singular (parameters are not identified).

61

TABLE 3

Samp1e Sp1its for Different c

 

 

 

 

_ _ -c'X
when 81'82' [ C ]
p=0.0
c 0.3 0.5 1.0 2.0 3.0
1571‘ 0.2491 0.2486 0.2542 0.2629 0.2811
1713 0. 2297 0. 2076 0.1464 0. 0993 0. 0450
1'33? 0.2297 0.2076 0.1464 0.0993 0.0450
1'55; 0.2916 0. 3362 0.4529 0.5385 0. 6288
35+} 0.5213 0. 5438 0. 5993 0. 6378 0. 6838
1'1" 0.1458 0.1681 0.2265 0.2693 0.3144
p=0.5
c 0.3 0.5 1.0 2.0 3.0
1°? 0.3244 0.3151 0.2982 0.2913 0.2944
F}; 0.1544 0.1411 0.1024 0.0709 0.0317
F01- 0.1544 0.1411 0.1024 0.0709 0.0317
9—06 0. 3669 0. 4027 0. 4970 0. 5670 0. 6422
1567.433 0. 5213 0.5438 0. 5994 0. 6379 0. 6739
‘11 0.1834 0.2013 0.2485 0.2835 0.3211

 

62

TABLE 4

 

 

 

 

 

 

 

Ratios of Asymptotic Variances (Cos§_of Partia1 0bservabi1ity)
.. _ 'C _
When 81-82- [ c ] and p-O
c 0.3 0.5 1.0 2.0 3.0
p is not known

Case 3 1.0000 1.0000 1.0000 1.0000 1.0000

8 Case 4A1 91.2821 50.9838 19.9076 23.0626 18.5827
12 Case 4A2 103.0591 56.0029 22.1733 23.8038 18.8309

Case 43 91.1068 50.7766 19.4382 21.7407 16.8380

Case 3 205.0555 121.5853 42.5952 52.0272 37.4875

8 Case 4A1 91.2821 50.9838 19.9076 23.0626 18.5827
22 Case 4A2 103.0591 56.0029 22.1733 23.8038 18.8309

Case 48 91.1063 50.7766 19.4382 21.7407 16.8380

9 is known

Case 3 1.0000' 1.0000 1.0000 1.0000 1.0000

Case 4A1 1.3747 1.4050 1.7985 3.0932 3.6511

812 Case 4A2 13.1517 6.4240 4.0643 3.8345 3.8994
Case 48 1.1991 1.1977 1.3291 1.7713 1.9064

Case 3 1.7174 1.7369 2.3280 4.0159 4.6467

8 Case 4A1 1.3747 1.4050 1.7985 3.0932 3.6511
22 Case 4A2 13.1517 6.4240 4.0643 3.8345 3.8994

Case 48 1.1991 1.1977 1.3291 1.7713 1.9064

 

*The information matrix of case

two is singu1ar in this experiment.

63

TABLE 5

 

 

 

 

 

 

 

Ratios of Asymptotic Variances (Cost of Partia1 0bservabi1ity)
when 81=82= {-ch] and p=0.5
c 0.3 0.5 1.0 2.0 3.0
o is not known

Case 3 1.0008 1.0038 1.0080 1.0095 1.0119

8 Case 4A1 267.7677 207.5012 63.1785 61.4412 43.9869
12 Case 4A2 281.0156 213.1307 65.5496 62.1682 44.2313

Case 4B 267.4762 207.1903 62.6215 60.1184 42.3013

Case 3 527.4460 402.6952 125.7184 129.2031 91.1600

8 Case 4A1 267.7677 207.5012 63.1785 61.4412 43.9869
22 Case 4A2 281.0156 213.1307 65.5496 62.1682 44.2313

Case 43 267.4762 207.1903 62.6215 60.1184 42.3013

p'1S known

Case 3 1.0016 1.0019 1.0046 1.0064 1.0074

8 Case 4A1 1.4345 1.4460 1.7343 2.6848 3.1727
12 Case 4A2 14.6908 7.0827 4.1108 3.4136 3.4176

Case 48 1.1428 1.1351 1.1760 1.3589 1.4840

Case 3 1.6862 1.6701 1.9542 2.8662 3.3950

Case 4A1 1.4345 1.4460 1.7343 2.6848 3.1727

822 Case 4A2 14.6908 7.0827 4.1108 3.4136 3.4176
Case 4B 1.1428 1.1351 1.1760 1.3589 1.4840

 

*The information matrix of case two is singu1ar in this experiment.

64

m

Samp1e Sp1its {or Different cY
-c ___-c
When 8] - [ -c] and 82 [ c 1

 

 

 

 

p=0.0
c 0.3 0.5 1.0 2.0 3.0
p;;‘ 0.2297 0.2075 0.1464 0.0704 0.0450
510. 0.2916 0.3362 0.4529 0.5881 0.6288
F01 0.2491 0.2486 0.2542 0.2712 ' 0.2811
F00, 0.2297 0.2076 0.1464 0.0704 0.0450
'pa;4p'5' 0.4788 0.4562 0.4006 0.3416 0.3261
‘R' 0.1079 0.0914 0.0566 0.0282 ' 0.0205
p=0.5
c 0.3 0.5 1.0 2.0 3.0
P11 0.3011 0.2667 0.1800 0.0856 0.0558
975' 0.2202 0.2771 0.4194 0.5729 0.6181
F01 0.1777 0.1895 0.2206 0.2560 0.2704
235' 0.3011 0.2667 0.1800 0.0856 0.0558
03749‘6' 0 4788 0.4562 0.4006 0.3416 0.3262
R' 0.1334 0.1130 0.0605 0.0332 0.0255

 

65

TABLE 7

Ratios of Asymptotic Variances (Cost of Partia1 0bservabi1ity)
When 81= [C_Z1, 82= ['ch1 and p=0

 

 

 

 

 

 

 

c 0.3 0.5 1.0 2.0 3.0
p is not known
Case 2 605.5864 518.8082 151.5306 221.9935 75.3350
Case 3 1.0000 1.0000 1.0000 1.0000 1.0002
812 Case 4A1 1.7312 1.5502 1.3647 1.5126 1.6559
Case 4A2 104.7621 57.8441 10.8805 4.8029 3.8208
Case 48 48.9987 22.8783 13.1748 11.8071 10.9172
Case 2 21751.885 5910.3139 659.7719 78.4286 42.9161
Case 3 890.1655 411.7187 119.8679 30.1673 19.4758
822 Case 4A1 1.8019 1.6830 1.6583 1.7880 1.8296
Case 4A2 91.1751 48.8784 12.1145 7.1413 5.9189
Case 48 64.2327 30.7263 11.6161 6.0176 5.0118
p is known
Case 2 365.3644 313.1222 44.8365 32.2472 22.9143
Case 3 1.0000 1.0000 1.0000 1.0000 1.0002
312 Case 4A1 1.3767 1.3311 1.3634 1.4480 1.5318
Case 4A2 81.4734 36.9280 10.7038 4.5243 3.4053
Case 48 1.1951 1.2293 1.5222 2.5273 3.0136
Case 2 589.6485 529.1590 59.2528 22.1721 12.8180
Case 3 2.7547 3.2830 3.1431 2.5740 2.1090
622 Case 4A1 1.6010 1.6209 1.5473 1.4188 1.3331
Case 4A2 71.8316 32.5675 12.0924 6.2325 4.7055
Case 48 1.5562 1.8215 1.9029 1.7929 1.5520

 

66

TABLE 8

Ratios of Asymptotic Variances (Cost of Partia1 Observabi1ity)

_ c 7' _ -c'Y _
When 8]“ [ -C]’ 82' [ C ] and p-0.5

 

 

 

 

 

 

 

c 0.3 0.5 1.0 2.0 3.0
o is not known
Case 2 314.6830 312.4838 35.6600 48.1634 27.8839
Case 3 1.0042 1.0078 1.0089 1.0081 1.0075
812 Case 4A1 1.7337 1.4753 1.3893 1.5222 1.6044
Case 4A2 54.1470 18.4067 5.1199 2.6099 2.5202
Case 48 19.8579 10.2136 8.1579 8.5622 8.4649
Case 2 4389.5386 2022.1393 82.0819 18.1384 14.0965
Case 3 344.3801 176.7509 36.3043 11.6794 7.9632
822 Case 4A1 1.8750 1.7257 1.8023 1.7998 1.7341
Case 4A2 45.6778 15.6901 6.3376 3.9413 3.7014
Case 48 30.8417 15.6040 6.6949 3.9136 3.2999
p is known
Case 2 154.3449 47.5010 18.1745 15.2464 14.2868
Case 3 1.0049 1.0086 1.0076 1.0047 1.0034
812 Case 4A1 1.3572 1.3378 1.3793 1.4578 1.5356
Case 4A2 35.7942 12.6180 5.1139 2.5252 2.3125
Case 48 1.1593 1.2532 1.8462 3.6369 3.8029
Case 2 297.7500 105.2685 20.4917 8.7703 5.8623
Case 3 3.1706 4.0090 3.6484 2.7647 2.3078
822 Case 4A1 1.7264 1.7192 1.6060 1.4609 1.3779
Case 4A2 31.4848 11.8300 6.1416 3.5386 2.9358
Case 1.6245 2.0532 2.1276 1.9355 1.6636

48

 

67

TABLE 9

Samp1e Sp1its for Different 0
When 81=82= ['CC71 and c=1.0

 

 

 

p -0.5 0.0 0.2 0.5 0.8
9;;' 0.2206 0.2542 0.2701 0 2982 0.3363
573' 0 1800 0.1464 0.1305 0 1024 0.0643
93;' 0 1800 0 1464 0.1305 0.1024 0.0643
F35” 0.4194 0.4970 0.4689 0 4970 0.5351
R' 0.2097 0.2485 0.2345 0.2485 0.2676
TABLE 10
Samp1e Sp1its for Different 0
When 81= [C_Z1, 82= ['CCY1 and c=1.0
p -0.5 0.0 0.2 0.5 0.8
9;;’ 0 1024 0 1464 0.1609 0.1800 0.1961
576' 0.4970 0.4529 0.4385 0 4194 0.4032
$37' 0 2982 0.2542 0.2398 0.2206 0.2045
565' 0 1024 0.1464 0.1609 0.1800 0.1961
R' 0.0441 0.0566 0.0590 0.0605 0.0603

 

68

TABLE 11

Ratios of Asymptotic Variances (Cost of Partia1 0bservabi1ity)
When 81=82 {-ccx] and c=1.0

 

 

 

 

 

 

 

p -0.5 0.0 0.2 0.5 0.8
p is not known

Case 3 1.0088 1.0000 1.0014 1.0080 1.0166

8 Case 4A1 9.5192 19.9076 29.4137 63.1785 225.4031
12 Case 4A2 11.5298 22.1733 31.7488 65.5496 227.6452

- Case 48 9.1134 19.4382 28.9126 62.6215 224.7830
Case 3 20.0543 42.5952 62.4095 125.7184 351.5529

8 Case 4A1 9.5192 19.9076 29.4137 63.1785 225.4031
22 Case 4A2 11.5298 22.1733 31.7488 65.5496 227.6452

Case 48 9.1134 19.4382 28.9126 62.6215 224.7830

9 is known

Case 3 1.0042 1.0000 1.0007 1.0046 1.0106

8 Case 4A1 2.0485 1.7985 1.7632 1.7343 1.7078
12 Case 4A2 4.0619 4.0643 4.0991 4.1108 3.9623

Case 48 1.6420 1.3291 1.2619 1.1760 1.0843

Case 3 3.1039 2.3280 2.1651 1.9542 1.6801

8 Case 4A1 2.0485 1.7985 1.7632 1.7343 1.7078
22 Case 4A2 4.0619 4.0643 4.0991 4.1108 3.9623

Case 48 1.6420 1.3291 1.2619 1.1760 1.0843

 

1*
The information matrix of case two is singu1ar in this experiment.

69

TABLE 12

Ratios of Asymptotic Variances (Cost of

Partia1 0bservabi1ity)

 

 

 

 

 

 

 

7' - 'Y
When s1= [c_c], 82= [ cc 1 and c=1.0
p -0.5 0.0 0.2 0.5 0.8
p is not known
Case 2 2358.6990 151.5306 74.2684 35.6600 20.9822
Case 3 1.0079 1.0000 1.0014 1.0089 1.0232
812 Case 4A1 1.3347 1.3647 1.3735 1.3893 1.4071
Case 4A2 34.7615 10.8805 7.8550 5.1199 3.2795
Case 48 25.1054 13.1748 10.7202 8.1579 6.3586
Case 2 7560.6707 659.7719 288.2825 82.0819 21.5540
Case 3 323.2625 119.8679 77.3109 36.3043 14.5077
822 Case 4A1 1.5246 1.6583 1.7142 1.8023 1.8533
Case 4A2 35.6887 12.1145 9.1256 6.3376 4.2237
Case 48 23.2502 11.6161 9.2417 6.6949 4.8725
p is known
Case 2 193.8613 44.8365 30.0131 18.1745 11.3890
Case 3 1.0085 1.0000 1.0014 1.0076 1.0168
812 Case 4A1 1.3369 1.3634 1.3692 1.3793 1.3910
Case 4A2 31.9380 10.7038 7.8430 5.1139 3.2787
Case 48 1.3655 1.5222 1.6122 1.8462 2.6237
Case 2 252.0110 59.2528 38.0065 22.1721 11.0502
Case 3 2.5817 3.1431 3.3396 3.6484 3.8425
822 Case 4A1 1.4760 1.5473 1.5679 1.6060 1.6596
Case 4A2 33.5898 12.0924 9.0963 6.1416 4.0939
Case 48 1.6567 1.9029 1.9851 . 2.1276 2.4247

 

70

Cost of Partia1 0bservabi1i

TABLE 13

When 81:82: [1.0]

§y for Case Four

 

 

 

 

 

 

 

 

 

p 0.0 0.2 0.5 0.8 1.0
=0.0 but isn't known a priori

812 Case 4A1 42.5952 37.9504 19.9076 3.9124 1.0000

Case 4A2 44.5322 38.0962 22.1733 8.3578 3.5517

822 Case 4A1 1.0000 3.9124 19.9076 37.9504 42.5952

Case 4A2 3.5517 8.3578 22.1733 38.0962 44.5322

p=0.0 and is known a priori

812 Case 4A1 2.3280 2.1301 1.7985 1.3913 1.0000

Case 4A2 3.3388 3.8546 4.0643 3.8814 3.2158

. 822 Case 4A1 1.0000 1.3913 1.7985 2.1301 2.3280

Case 4A2 3.2158 3.8814 4.0643 3.8546 3.3388
p=0.5 but isn't known a priori

812 Case 4A1 125.7184 120.5843 63.1785 9.2338 1.0080

Case 4A2 147.9888 127.0938 65.5496 16.4543 3.3842

822 Case 4A1 1.0080 9.2338 63.1785 120.5843 125.7184

Case 4A2 3.3842 16.4543 65.5496 127.0938 147.9888

p=0.5 and is known a priori

812 Case 4A1 1.9542 1.9082 1.7343 1.4090 1.0046

Case 4A2 3.0465 3.8067 4.1108 3.8258 2.9611

822 Case 4A1 1.0046 1.4090 1.7343 1.9082 1.9542

Case 4A2_ 2.9611 3.8258 4.1108 3.8067 3.0465

 

71

TABLE 14

Cost of Partia1 0bservabi1ity for Case Four

_ 'X _ '7
When 81‘ [_1.0]9 82‘ [1.0]

 

 

 

 

 

 

 

 

 

p 0.0 0.2 0.5 0.8 1.0
p=0.0 but isn't known a priori
812 Case 4A1 42.5952 5.3042 1.3647 1.3192 1.0000
Case 4A2 71.7093 37.5250 10.8805 3.2471 1.0002
822 Case 4A1 1.0000 1.3031 1.6583 7.9127 119.8679
Case 4A2 1.0008 3.6176 12.1145 40.9526 154.8729
p=0.0 and is known a priori
812 Case 4A1 2.3280 1.7456 1.3634 1.1418 1.0000
Case 4A2 7.6646 14.1330 10.7038 2.2691 1.0001
822 Case 4A1 1.0000 1.1850 1.5473 2.2172 3.1431
Case 4A2 1.0002 2.2614 12.0924 19.1311 5.3093
p=0.5 but isn't known a priori
812 Case 4A1 20.0543 2.4665 1.3893 1.1930 1.0089
Case 4A2 23.2379 14.3879 5.1199 1.8668 1.0089
822 Case 4A1 1.0088 1.1927 1.8023 5.8469 36.3043
Case 4A2 1.0089 2.0992 6.3376 18.5893 36.4342
p=0.5 and is known a priori
812 Case 4A1 3.1039 1.9304 1.3793 1.1413 1.0076
Case 4A2 4.5041 10.5578 5.1139 1.6483 1.0076
822 Case 4A1 1.0025 1.1938 1.6060 2.4451 3.6484
Case 4A2 1.0061 1.9620 6.1416 10.5167 3.6696

 

72

TABLE 15

Cost of Partia1 Observabi1ity for Case Two

 

 

 

 

 

 

 

 

 

 

 

 

- d -d —.
When 8 - {-c1’ 82- [c] and P11-0.25
p=0.0
c 0.3 0.5 1.0 2.0 3.0
d 0.0957 0.1963 0.4508 0.9458 1.4487
p is not known
812 1486.7109 618.7743 48.6514 13.8296 6.1085
822 121893.56 52071.731 36717.786 27168.080 18131.267
p is known
812 552.7849 ' 75.3081 17.4288 5.6840 3.7545
822 2524.1335 728.7705 256.3893 146.7752 270.6402
p=0.5
c 0.3 0.5 1.0 2.0 3.0
d -0.0857 0.0447 0.3535 0.9138 1.4409
p is not known
812 680.0607 35.2165 11.3308 4.3019 3.2452
822 327797.83 159416.21 117079.78 97378.301 62839.36
p is known
812 103.4635 15.2357 6.2589 3.1468 2.1256
822 1536.1228 641.0490 448.0437 459.6155 1106.0470

 

73

TABLE 16

Cost of Partia1 Observabi1ity for Case Two

= d = d =
When 81 [_c], 82 [c] and c 1.0

 

 

 

 

 

 

 

 

 

 

 

 

p=0.0
977' 0.15 0.35 0.45 0.55 0.65
d 0.0877 0.7636 1.0680 1.3908 1.7623
p is not known
812 108.2441 26.6034 15.8334 9.7708 6.0623
822 111910.395 15725.367 7589.4556 3886.9336 2047.7613
p is known
812 33.9630 10.6121 6.8617 4.7238 3.2380
822 570.5558 148.0751 96.5562 67.6946 49.9944
p=0.5
P;;' 0.15 0.35 0.45 0.55 0.65
d -0.0543 0.6977 1.0258 1.3666 1.7510
p is not known
B12 17.7333 8.1456 6.1336 4.6763 3.8507
822 273997.19 60851.628 34097.547 19487.394 10932.74
9 1S known
812 9.2281 4.7224 3.7254 2.9860 2.3859
822 864.2748 277.7323 185.5475 127.9618 89.0390

 

74

TABLE 17

Cost of Partia1 0bservabi1ity for Case Three

 

 

 

 

 

 

 

 

 

 

 

 

- .. -d —— =
When 81-82- [ c] and P01+ 00 0.50
o=0.0
c 0.3 0.5 1.0 2.0 3.0
d 0.4003 0.6357 1.1557 2.0572 2.9064
p is not known
322 206.2943 '94.5461 37.6200 43.0889 48.6845
p is known
822 1.6581 1.6153 1.8644 2.5978 2.9655
p=0.5
c 0.3 0.5 1.0 2.0 3.0
d 0.4005 0.6356 1.1557 2.0578 2.9064
p is not known
822 554.9503 340.8961 117.2571 115.8412 125.9888
0 '15 known
322 1.6471 1.5918 1.7113 2.1069 2.3266

 

75

TABLE 18

Cost of Partia1 0bservabi1ity for Case Three

 

 

 

 

 

 

 

 

 

 

 

 

= -d1 = -d2 —_=-—-=
When 81 [ c 1. 82 [ c ] and 01 P00 0.25
p=0.0
c 0.3 0.5 1.0 2.0 3.0
d1 0.4003 0.6357 1.1557 2.0578 2.9064
(12 0.3008 0.4323 0.6820 1.1212 1.5651
p is not known
822 234.5919 96.6320 58.0045 119.1044 217.5330
p is known
822 1.6596 1.6206 1.9260 3.1437 6.2018
p=0.5
c 0.3 0.5 1.0 2.0 3.0
d1 0.4005 0.6357 1.1557 2.0578 .2.9064
d2 -0.0622 0.1009 0.4250 0.9644 1.4679
p is not known
822 1623.2131 706.0171 503.8493 1265.8145 3283.1157
p is known
822 1.9167 1.9058 2.3351 4.1795 9.2418

 

76

TABLE 19

Cost of Partia1 0bservabi1ity for Case

When 81=82= ['3] and c=1.0

Three

 

 

 

 

 

 

 

 

 

 

 

 

o=0.0
FEE 0.20 0.30 0.40 0.60 0.70
d 0.0630 0.4564 0.8084 1.5275 1.9623
p is not known
822 18.1194 24.4290 31.3962 42.6311 54.2505
p is known
822 1.2294 1.3728 1.5702 2.3317 3.0872
p=0.5
5674066' 0.20 0.30 0.40 0.60 0.70
0 0.0630 0.4564 0.8084 1.5275 1.9623
p is not known
822 80.7077 95.9903 108.5038 125.7936 149.8488
p is known
822 1.2941 1.4034 1.5364 1.9560 2.3189

 

77

Cost

TABLE 20

of Partia1 0bservabi1ity for Case Three
When B1= [-31], 82= ['22] and c=1.0

 

 

 

 

 

 

 

 

 

 

 

 

p=0.0
15543; 0.10 0.15 0.20 0.30 0.35
d1 0.0630 0.4564 0.8084 1.5275 1.9623
02 0.5546 0.5962 0.6376 0.7329 0.7951
p is not known
822’ 14.1692 22.4785 35.5819 100.6174 196.7489
p is known
322 1.2093 1.3642 1.5850 2.5186 3.7620
p=0.5
FBYEDBB' 0.10 0.15 0.20 0.30 0.35
d] - 0.0630 0.4564 0.8084 1.5275 1.9623
d2 —0.0085 0.1564 0.2961 0.5513 0.6817
p is not known
822 89.9929 158.1305 276.4605 1014.1426 2513.5569
p is known
822 1.3155 1.5315 1.8423 3.2329 5.2786

 

78

1591.531.

Cost of Partia1 0bservabi1ity for Case Four

 

 

 

 

 

 

 

 

 

 

 

 

- _ -d _ -—;
When 81-82- [ c], p-0.5 and POO-0.25
p=0.0
C 0.3 0.5 1.0 2.0 3.0
d 0.3497 0.5294 0.8933 1.4822 2.0127
p is not known
Case 4A1 89.5407 34.2244 15.5636 14.8929 13.2041
812 & 822 Case 4A2 100.9828 38.8355 18.0756 16.4319 14.3767
Case 4B 89.3978 34.0865 15.3714 14.5476 12.6529
0 is known
Case 4A1 1.3176 1.2881 1.3605 1.5742 1.8609
812 & 822 Case 4A2 12.7598 5.8992 3.8724 3.1132 3.0335
Case 4B 1.1745 1.1502 1.1682 1.2289 1.3097
p=0.5
c 0.3 0.5 1.0 2.0 3.0
d 0.1338 0.3202 0.6993 1.3050 1.8420
p is not known
Case 4A1 312.4136 125.3176 54.4721 46.5124 40.1396
812 & 8 Case 4A2 326.3303 131.2031 57.7210 48.3723 41.5624
22 Case 48 312.2180 125.1313 54.2374 46.1354 39.5681
p is known
Case 4A1 1.3013 1.2777 1.3297 1.4946 1.7272
812 & 822 Case 4A2 15.2242 7.1676 4.5824 3.3578 3.1530
Case 48 1.1058 1.0913 1.0947 1.1170 1.1559

 

79

TABLE 22

Cost of Partia1 Observabi1ity for Case Four

 

 

 

 

 

 

 

 

 

 

 

 

When 81:82: [.3], p=0.5 and c=1.0
o=0.0
15]); 0.15 0.35 0.45 0.55 0.65
d 0.5452 1.2098 1.5158 1.8394 2.2113
p is not known
Case 4A1 12.3757 17.9619 19.8483 22.7009 31.2475
812 & 822 Case 4A2 14.8237 20.4170 22.1214 24.6609 32.8082
Case 48 12.2678 17.6562 19.3845 22.0101 30.2299
p is known
Case 4A1 1.2250 1.5389 1.7896 2.1554 2.6895
812 & 822 Case 4A2 3.6730 3.9940 4.0627 4.1154 4.2228
Case 48 1.1171 1.2333 1.3258 1.4646 1.6718
p=0.5
566' 0.15 0.35 0.45 0.55 0.65
d 0.2919 1.0422 1.2098 1.7083 2.0917
p is not known _—'
Case 4A1 47.3035 58.5931 60.0426 66.2930 80.0371
812 8 82 Case 4A2 50.9623 61.4985 62.7722 68.4357 81.6770
2 Case 48 47.1487 58.2580 59.6427 65.6211 79.0767
p is known
Case 4A1 1.2272 1.4570 1.5383 1.8772 2.2411
812 & 822 Case 4A2 4.8880 4.3675 4.2734 4.0250 3.8850
' Case 48 1.0723 1.1213 1.1377 1.2038 1.2785

 

80

CHAPTER FIVE

 

SUMMARY AND CONCLUSIONS

Some recent studies have made use of the bivariate probit mode1 in testing
various hypotheses, but with on1y partia1 observabi1ity about the dichotomous
dependent variab1es. These studies inc1ude Poirier‘s bivariate probit mode1
using Gunderson's examp1e of the retention of trainees, Farber's research on
the demand for union representation, and Conno11y's study concerning the
decisions to arbitrate or negotiate the contracts between emp1oyees' unions
and municipa1ities in Michigan. The maximum 1ike1ihood estimators in these
partia1 observabi1ity cases wi11 be inefficient compared to those obtained
under fu11 observabi1ity. But the degree of efficiency 1055 caused by partia1
observabi1ity is not yet known. Therefore in this study, we present severa1
cases with different 1eve1s of observabi1ity for the bivariate probit mode1
and we measure the efficiency 1oss of maximum 1ike1ihood estimators for each
case through some experiments. The resu1ts that we get give us some idea
about the cost of partia1 observabi1ity, and have practica1 re1evance in
studies 1ike those above.

In Chapter Two, a forma1 statement of the bivariate probit mode1 is
presented. A genera1 form of the mode1 wou1d be

y.*=X.B +5.
‘1 ‘ 1 ‘1 i=1, 2, ..., N.

y12* 3 X182 + 812

and
yij = 1 iff yij* > 0 j=1 or 2.
yij = 0 iff yij* 5_0

81

82

where X1 is a k-dimensiona1 row vector of exp1anatory variab1es, 81 and 82
are k-dimensiona1 co1umn vector of unknown parameters and disturbance term
[:;1 has bivariate norma1 distribution with zero mean and variance-covariance
matrix as [l 1]- The variab1es y1* and y2* are a1ways unobserved; different
assumptions about the observabi1ity of y1 and y2 are considered. Six cases
are introduced to represent fu11 observabi1ity and different types of partia1
observabi1ity for the mode1. The examp1e of a two-member committee voting
under a unanimity ru1e can be app1ied to a11 of these cases. Case one is the
case of fu11 observabi1ity in which the dichotomous choices of both voters
are a1ways observab1e. Case two is the case of partia1 observabi1ity in the
sense of Poirier, . under the assumption that on1y the resu1t of the joint
choice of two decision-makers is observed. Case three is ca11ed the case of
partia1 partia1 observabi1ity, in which one of the two parties' decision is
fu11y observab1e. The other party's decision can be known on1y when the
observab1e party votes "yes". In case four, which is ca11ed the case of
partia1 observabi1ity with observed veto, when the outcome is "no“, we
observe one of the two parties casting its "no" vote. There are three
a1ternative possibi1ities here concerning who wi11 use the veto first if both
parties wish 'to vote "no". The first possibi1ity is that we assume some
fixed and known probabi1ity p that the first party does so (case 4A1). The
second possibi1ity is having p as another parameter which needs to be
estimated (case 4A2). Another possibi1ity is that the party with the strongest
sentiment toward a "no" vote wi11 be observed casting the veto (case 48).

We have provided 1ike1ihood functions for the joint estimation of the

parameters for each of the various cases. Separate estimation (one equation

at a time) is a1ways possib1e for case one and for the first probit equation

83
(the observab1e one) of case three. The separate estimation of the second
probit equation of case three is possib1e on1y when the corre1ation
coefficient p is equa1 to zero. However, joint estimation is a1ways more
efficient than separate estimation un1ess the corre1ation coefficient (p)
between the two probit equations is equa1 to zero.

In Chapter Three, six different information matrices have been derived
corresponding to the joint estimation of the parameters in our six cases
(with varying degrees of observabi1ity). The question of identification is
a1so discussed by ana1yzing the rank of these matrices. We especia11y ana1yze
the perverse case in which a11 of the coefficients of the exogenous variab1es
except the constant terms are equa1 to zero. It has been found that in the
above situation, regard1ess of the va1ues of the constant terms, a11 of the
information matrices for the partia1 observabi1ity cases are singu1ar if p
is not known. When p is known a priori, on1y the case of partia1 observabi1-
ity in the sense of Poirier and of observed veto with p as another parameter
sti11 can't be identified.

Another perverse case is when the two probit equations are identica1.
Then the case of partia1 observabi1ity in the sense of Poirier is not
identified. But there are no prob1ems with the other cases.

There are a1so other situations that wi11 cause identification prob1ems
for some cases, which we have discussed in Chapter Three. The perverse cases
that we mentioned there do not necessari1y cover a11 that wi11 make the
information matrices of various partia1 observabi1ity cases singu1ar. In
genera1 one needs to check the rank of the information matrix in each specific

situation to make sure that the parameters are identified.

84

In Chapter Four, a 1arge variety of experiments have been done to measure
the cost of partia1 observabi1ity. We first try three arbitrary experiments
and i11ustrate some genera1 resu1ts. Then we try a second set of experiments
by varying the va1ues of parameters from 10w to high 1eve1s whi1e ho1ding
the samp1e average va1ues of X181 and X182 equa1 to zero. Two important
effects have been observed from these experiments. One effect is that the
degree of identification changes as the va1ues of parameters change, and we
ca11 it the "identification effect". The other effect is that the probabi1ity
of a "yes" vote of either (or both) party changes when the va1ues of para-
meters change, and thus the samp1e sp1it between the four possib1e outcomes
changes. Hence we ca11 this the "samp1e sp1it effect". Both of these two
effects change the cost of partia1 observabi1ity. If they work against each
other, sometimes we can't te11 the direction of the change of efficiency
when the va1ues of parameters change. Therefore, some more comp1icated
experiments have been done with either the identification effect or the
samp1e sp1it effect he1d constant whi1e the other changes. We then are more
certain about the change of the cost under on1y one effect.

Among a11 the conc1usions that we obtain from the resu1ts of these
experiments, here we report some rather genera1 and important ones. First
we notice that the cost of partia1 observabi1ity is quite high, especia11y
for the case of partia1 observabi1ity in the sense of Poirier (our case two).
The cost of partia1 observabi1ity decreases marked1y if any piece of observ-
abi1ity information can be found. The 1aw of diminishing margina1 uti1ity
of information usua11y ho1ds: the gain in moving from case two to case three
(partia1 partia1 observabi1ity) or case four (observed veto) usua11y exceeds

the gain in moving from case three or four to fu11 observabi1ity (our case

85.

one). It is the first piece of observabi1ity information which is most
important. From this, our suggestion for Poirier's mode1 using Gunderson's
retention of trainees or other simi1arexamp1esis that if any information can
be obtained, for examp1e, observabi1ity for either party's decision or an
observed veto, then the efficiencies of the estimated parameters can be
great1y improved. This is re1evant, for examp1e, to Conno11y's research on
the arbitration or negotiation of the contracts between emp1oyees' unions and
municipa1ities. In this case there is an observed veto. If this information
is not used, this wou1d be just a case of partia1 observabi1ity in the sense
of Poirier. The high cost of partia1 observabi1ity for Poirier's mode1 shou1d
make one reconsider the possibi1ity of using the observed veto information.

The second conc1usion is that specifying p a priori improves the
efficiencies of the estimates of the other parametersa great dea1. A1so the
improvement from knowing p is 1argest when the re1ative efficiency is 1owest.
This can be app1ied to a11 the partia1 observabi1ity cases. For Farber's
case as an examp1e, if the observabi1ity of the union emp1oyers' se1ection
decision can't be obtained or the cost of getting the information is too high,
then specifying p in the mode1 is another way to improve the efficiency.

A third conc1usion is that the samp1e sp1it has a strong inf1uence on
the re1ative efficiencies of the parameter estimates. For a given partia1
observabi1ity case, its efficiency re1ative to fu11 observabi1ity wi11 be
higher, the sma11er the proportion of observations which fa11 into the
indistinguishab1e categories. For Poirier's mode1, the more observations
that are of the "yes, yes? variety, the higher the re1ative efficiency. The
fraction of such observations is observab1e. In case three, the higher the

pr0portion of observations having the observab1e party voting "no", the

86
1ower the re1ative efficiency wi11 be. The proportion of such observations
is a1so observab1e. For examp1e, 62.8% of Farber's samp1e are non-union
workers and 37.6% of these nonunion workers expressed a preference for union
representation. Thatisq 39.2% of the who1e samp1e be1ongs to the indistin-
guishab1e categories (not in a union and wou1d not vote for a union). The
re1ative efficiency of the estimated parameters in the probit equation for
the union emp1oyers' se1ection wi11 be 1ower as this percentage increases.
For the observed veto case, it is the proportion of observations having both
parties voting "no" that is re1evant, but this proportion is not direct1y
observab1e.

The 1ast conc1usion is that the strength of identification matters. A11
of the partia1 observabi1ity cases are unidentified for some perverse va1ues
of the parameters, as we mention above. Their re1ative efficiency is very
1ow for parameter va1ues near such points, and it increases rapid1y as the
parameters move away from such points of singu1arity. However, these effects
are not strong except in the immediate neighborhood of points of nonidenti-

fication.

APPENDIX A

 

I
ll

P(eiz ‘ ' x132’ E11 ' 612 < X132 ‘ X131)

1

8i1'812 < XiBz-X.B1 6
7211—1.) /2(1-p) ‘2

P (

 

< ' X182)

XTBZ-XiBI
F (----—- ‘- X.32; - 7(1-p1/2
72(1-0) 1

 

I
T]
,\
a
“J
—J
I
><
rip
v
'0
\J

where 0' = '7(1-p)/2

. _ x.8 -X.B
Reca11 that 35433§431-= 0(a)¢(—9—33—-), here define a = _l_§__l_l., b= -XiB

1-9 -20' 2

bfp'a 1 |
—— - 9(6)“ 1 X.
38] /1 12 201 1

'9

 

X132+x131) 1 x .

72(1-p) 72(1+p) 72(1-9) '

 

 

Let G, = P(ei1 < - X181, 512 - 81] < XiB1 - X182) using the same method as

above we can get

86, - XiB1-Xi32)¢(- xiB1+X132 1

—.... ¢ .
as2 /2(1-p) 72(1+p) 72(1-p) '

87

.88

 

 

 

 

But Gi = F(- X181, -x182; p) - “1
= 1- 4(x181) - 0(x182) + '1 - H.
361 _ 30(Xi32) 8F, 3H1.
and?" 38 +238 ’38 ’
2 2 2 2
3H. 36. X.B -pX.B
1 1 11 1 2
so -——- = [--——- - ¢(X.B ) + ¢(X.B )9( )]X.'
3H X.B -pX.B
i 11 12
= {- ——-- ¢(X.B) [1- 4( )1} x.' .
381 12 5:2 1
3H.
1
FOP-'5'; ,
note that p' = - 4: (1-p)'§
72
223.: 1
3" 2/2(1-p)
a[(X.B -X B )/-20']
and 1 2 i1
ap'
_ xiBZ'XiB1 = xiBZ'xiB1
2012 '9
331;. 9.9;331

+

 

 

 

 

 

 

89 ..
. 3H. 38 1
= .QL[__1. . 'l -20
3° 381 3(xi32'xiB1) ap'
XIBZ-X‘IB'I -
+f( 9 XB29 " (I-pj/2)]
/2(1-p)
X.B -X.B X.B +X.B X.B -X.B
= 1 {¢( 1 2 1 1)¢(_ 1 2 1 1)( 1 2 1 1)
272(1-p) 72(1-p) 72(1+p) 1-p
X B -X B

 

REFERENCES

 

Abowd, J. M. and Farber, H.S., 1982, "Job Quenes and the Union Status of

Workers," Industria1 and Labor Re1ation Review 35, 354-367.

Amemiya, T., 1974, ”Bivariate Probit Ana1ysis: Minimum Chi-Squared Methods,"
Journa1 of American Statistica1 Association 69, 940-944.

 

, 1975, "Qua1itative Response Mode1s," Anna1s of Economic and

 

Socia1 Measurement 4, 363-372.

, 1978, "The Estimation of a Simu1taneous Equation Genera1ized

 

Probit Mode1," Econometrica 46, 1193-1205.

Ashford, J. R. and Sowden, R. R., 1970, "Mu1tivariate Probit Ana1ysis,"
Biometrica 26, 535-546.

 

Conna11y, M., 1982, "The Effect of Compu1sory Arbitration on the Bargaining
Process and Wage Outcomes," unpub1ished Ph.D. dissertation, Michigan

State University.

Farber, H. 5., 1982, "Worker Preference for Union Representation," Research

in Labor Economics, forthcoming.

, 1982, "The Demand for Union Representation," Working Paper

 

No. 295, Massachusetts Institute of Techno1ogy.

90

91
Gunderson, M., 1974, "Retention of Trainees: A Study With Dichotomous

Dependent Variab1es," Journa1 of Econometrics 2, 79-93.

Heckman, J. J., 1978, "Dummy Endogenous Variab1es in a Simu1taneous Equation

System," Econometrica 46, 931-959.

, 1979, "Samp1e Bias as a Specification Error," Econometrica 47,

 

153-161.

Kmenta, J., 1971, E1ements of Econometrics, New York: Macmi11an Pub1ishing

Co., Inc.

Poirier, D. J., 1980, "Partia1 0bservabi1ity in Bivariate Probit Mode1s,"

Journa1 of Econometrics 12, 210-217.

Rothenberg, T. J., 1971, "Identification in Parametric Mode1s," Econometrica

39, 577-591.

Thei1, H., 1971, Princip1e of Econometrics, New York; John Wi1ey and Sons,

 

Inc.

Ze11ner, A. and Lee, T. H., 1965, "Joint Estimation of Re1ationships

Invo1ving Discrete Random Variab1es," Econometrica 33, 382-394.

"7'111111111141111'