This is to certify that the dissertation entitled

Sufficiency In The Presence Of Nuisance Parameters

presented by

Nupun Andhivarothai

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor

Date: Nov. 5, 1990

MSU is an Affirmative Action/Equal Opportunity Institution.

SUFFICIENCY IN THE PRESENCE OF NUISANCE PARAMETERS

By

Nupun Andhivarothai

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1990

ABSTRACT

SUFFICIENCY IN THE PRESENCE OF NUISANCE PARAMETERS

by Nupun Andhivarothai

This dissertation is devoted to the study of the concept of sufficiency in the presence of nuisance parameters. We mainly investigate the notion of partial sufficiency proposed by Hájek in 1965. Decision-theoretic aspects of Hájek's definition are investigated, and we prove a converse to a Rao-Blackwell type theorem in the context of partial sufficiency. We next extend the concept to one experiment being partially sufficient for another experiment. Finally, we give some examples and applications to illustrate the concepts studied.

To my parents and my sisters, Jinnarat and Sawanee

Acknowledgements

I wish to express my sincere thanks to Professor R.V. Ramamoorthi for his guidance, patience and encouragement during the preparation of this dissertation. I would like to thank Professor Habib Salehi and Professor Joseph Gardiner for serving on my committee.
Special thanks are due to Professor Dennis Gilliland for not only serving on my committee but also for introducing me to statistical consulting, which has greatly enhanced my career opportunities. I would like to thank the Department of Statistics and Probability and also the Department of Family Medicine for the financial support during my graduate studies at Michigan State University. Finally, I would like to express my deep appreciation to my parents and my sisters for their patience, encouragement and support.

Contents

1 Introduction and Summary 1

2 Partial Sufficiency 3
2.1 Notation and Preliminaries 3
2.2 Partial Sufficiency 5
2.3 Main Theorem 7
2.4 Invariance 10

3 Comparison of Experiments in the Presence of Nuisance Parameters 13
3.1 Preliminaries 13
3.2 Main Results 14

4 Examples and Applications 20
4.1 Examples 20
4.2 Application of Comparison of Experiments in the Presence of Nuisance Parameters 25
4.2.1 Comparison of Normal Experiments with Unknown Mean and Unknown Variance 25
4.2.2 Comparison of Linear Normal Experiments With A Known Nonsingular Covariance Matrix 26

Bibliography 28

Chapter 1
Introduction and Summary

Let X be a random variable distributed as P_{θ,σ}, where θ and σ are unknown parameters. Classically, a reduction of X is achieved via a sufficient statistic for (θ,σ). That this reduction does not entail any loss of information is established by the Rao-Blackwell theorem, which shows that for any decision problem the decision rules based on the sufficient statistic form an essentially complete class.
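The Rao-Blackwell reduction just described can be illustrated numerically. The following sketch is not from the dissertation; it uses an assumed Bernoulli(p) model purely for illustration, comparing the naive unbiased rule δ(X) = X₁ with its Rao-Blackwellization E[X₁ | ΣXᵢ] = X̄ under squared-error loss:

```python
import random

random.seed(0)
p, n, reps = 0.3, 10, 20000

mse_naive = 0.0  # Monte Carlo risk of the naive rule delta(X) = X_1
mse_rb = 0.0     # risk of its Rao-Blackwellization E[X_1 | sum X_i] = mean(X)
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    mse_naive += (x[0] - p) ** 2
    mse_rb += (sum(x) / n - p) ** 2
mse_naive /= reps
mse_rb /= reps

# Conditioning on the sufficient statistic can only reduce the risk.
assert mse_rb <= mse_naive
```

Here mse_naive approximates p(1 - p) = 0.21 while mse_rb approximates p(1 - p)/n = 0.021, the n-fold variance reduction guaranteed by the theorem.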
A variety of converses to the Rao-Blackwell theorem also show that the Fisher-Neyman definition of sufficiency is appropriate if we are looking for a reduction of X that would be as effective as X for all decision problems.

However, it very often happens that we are interested in only a subset of the set of all decision problems. A typical case arises when we are interested in making inferences on the parameter θ and are indifferent to the value of σ. In such a situation θ is referred to as the parameter of interest and σ as the nuisance parameter. There have been many attempts at defining a sufficient statistic for part of the parameters; in the case just mentioned, a sufficient statistic for the parameter θ in the presence of a nuisance parameter σ [Neyman and Pearson (1936), Fraser (1956), Kolmogorov (1942), Hájek (1965)]. In this study, we extend the concept of "partial sufficiency" introduced by Hájek (1965) and give a result which is a converse to a Rao-Blackwell type theorem.

We next turn our attention to the problem of comparison of two experiments. Let E and F be two experiments parameterized by (θ,σ). Bohnenblust, Shapley and Sherman defined the notion of E being more informative than F in terms of the risk functions obtainable in the experiments. Blackwell extended the concept of a sufficient statistic and defined E being sufficient for F in terms of the existence of Markov kernels. Blackwell then showed that "more informative" and "sufficient" are equivalent. Blackwell's theory involves sufficiency for both parameters (θ,σ); more specifically, it requires the consideration of a loss function that may depend on both θ and σ. However, when σ is a nuisance parameter, it seems appropriate to consider a loss function that depends only on θ. These considerations motivate our study of "partial sufficiency" of experiments in Chapter 2. We extend the concept of partial sufficiency to two experiments in Chapter 3.
The notions of E being more informative than F for θ, and of E being partially sufficient for F, are introduced. The equivalence of these two concepts is proved. A criterion for determining whether E is partially sufficient for F, in terms of sufficiency of reduced experiments, is also established. To conclude this study, in Chapter 4 we give some examples to illustrate the concept of a partially sufficient statistic and some applications of the results of the earlier chapters.

Chapter 2
Partial Sufficiency

This chapter is devoted to the study of a notion of partial sufficiency introduced by Hájek in 1965. We first establish the notation and preliminaries, and then prove the main result, which is a converse to a Rao-Blackwell type theorem.

2.1 Notation and Preliminaries

A statistical experiment is a triplet (X, A, P) where X is a set, A is a σ-algebra of subsets of X, and P is a family of probability measures on (X, A). P will be endowed with the σ-algebra C, the smallest σ-algebra making the maps P ↦ P(A), A ∈ A, measurable. Subsets of P will be equipped with the relative σ-algebra. We will assume that the family P is parameterized by Θ × Σ, that is, there is a 1-1 function (θ,σ) ↦ P_{θ,σ} from Θ × Σ onto P. Thus for us an experiment is given by (X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ), where (X, A) is the sample space and Θ × Σ is referred to as the parameter space.

A decision problem consists of a measurable space, the "action space" (A, 𝒜), and a "loss function" L(θ,σ,a) from Θ × Σ × A → ℝ which is measurable in (θ,σ,a). By a decision rule δ we mean a function δ : X × 𝒜 → [0,1] such that

i) for all A ∈ 𝒜, δ(x, A) is, as a function of x, A-measurable;

ii) for each x ∈ X, δ(x, ·) is a probability measure on 𝒜.

If a decision rule δ as in i) above is measurable with respect to a sub σ-algebra B of A, then we shall refer to δ as a B-measurable decision rule. Denote by L_A the set of all bounded loss functions defined on Θ × Σ × A.
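In finite settings the definition above has a concrete matrix form: a randomized decision rule is a row-stochastic array, one probability vector over actions per sample point. The following sketch is a toy example of ours, not from the text; it checks property ii) and evaluates the integral of a bounded function against δ(x, ·) as a weighted sum:

```python
# Finite sample space X = {0, 1, 2} and two actions {a0, a1}.
# A randomized decision rule: delta[x][j] = delta(x, {a_j}).
delta = {
    0: [0.9, 0.1],
    1: [0.5, 0.5],
    2: [0.1, 0.9],
}

# Property ii): for each x, delta(x, .) is a probability measure on the actions.
for row in delta.values():
    assert all(pr >= 0 for pr in row)
    assert abs(sum(row) - 1.0) < 1e-12

# For a bounded function f on actions, integrating f against delta(x, .)
# reduces to a finite weighted sum.
f = [0.0, 1.0]  # f(a0), f(a1); here the indicator of action a1
inner = {x: sum(pr * fj for pr, fj in zip(row, f)) for x, row in delta.items()}
assert inner == {0: 0.1, 1: 0.5, 2: 0.9}
```

Averaging inner over x against a distribution P_{θ,σ} on X then yields the risk of δ once f is taken to be the loss a ↦ L(θ,σ,a).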
If L ∈ L_A and δ is a decision rule, the "risk function" of δ (with respect to L) is the function on Θ × Σ defined by

R_L(θ,σ,δ) = ∫_X ∫_A L(θ,σ,a) δ(x,da) dP_{θ,σ}(x).

We shall throughout treat θ as the "parameter of interest" and σ as the nuisance parameter. This treatment may be formalized in one way by considering only the following subset of L_A:

L°_A = {L ∈ L_A : L depends on (θ,σ) through θ only}.

Let B be a sub σ-algebra of A. If P is a probability measure on (X, A), then for any bounded A-measurable function f, E_P(f | B) will denote any version of the conditional expectation of f given B under the measure P. If P₀ is a family of probability measures on (X, A), then B is called sufficient for P₀ if for any bounded A-measurable function f there exists a B-measurable function g such that g = E_P(f | B) [P] for all P ∈ P₀.

We will assume throughout that

i) X is a Borel subset of a complete separable metric space and A is the relativized Borel σ-algebra;

ii) {P_{θ,σ} : θ ∈ Θ, σ ∈ Σ} are all mutually absolutely continuous.

As mentioned earlier, θ is the parameter of interest and σ is the nuisance parameter.

2.2 Partial Sufficiency

Definition 2.1 (Hájek (1965)) B is said to be H-sufficient for θ in {P_{θ,σ} : θ ∈ Θ, σ ∈ Σ} if

i) B is θ-oriented, that is, for each θ, B is ancillary for the family P_θ = {P_{θ,σ} : σ ∈ Σ}, i.e. P_{θ,σ₁}(B) = P_{θ,σ₂}(B) for σ₁, σ₂ in Σ and B ∈ B;

ii) for each θ, there exists a probability measure ξ_θ on P_θ such that B is sufficient for {P_{ξ_θ} : θ ∈ Θ}, where P_{ξ_θ} is the marginal probability measure on (X, A) defined by

P_{ξ_θ}(A) = ∫ P_{θ,σ}(A) dξ_θ(σ).

Definition 2.2 B is said to be partially sufficient for θ if B contains an H-sufficient σ-algebra.

Theorem 2.1 Let B be partially sufficient for θ in {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ}. Then given any decision problem (A, 𝒜) and an A-measurable decision rule δ, there exists a B-measurable decision rule δ* such that for all θ ∈ Θ and σ ∈ Σ,

∫ δ*(x,E) dP_{θ,σ}(x) = ∫∫ δ(x,E) dP_{θ,σ}(x) dξ_θ(σ).

Proof.
Since B is partially sufficient for θ, there is a σ-algebra B₀ ⊂ B which is H-sufficient for θ. B₀ is sufficient for {P_{ξ_θ} : θ ∈ Θ}, and since (X, A) is standard Borel there exists an omnibus version of the conditional probability given B₀. That is, there is a function Q from X × A → [0,1] such that

(a) Q(x, A) is B₀-measurable for all A ∈ A;

(b) Q(x, ·) is a probability measure on (X, A) for all x; and

(c) ∫∫ Q(x,A) dP_{θ,σ}(x) dξ_θ(σ) = P_{ξ_θ}(A) for all A ∈ A.

Given any decision problem (A, 𝒜) and a decision rule δ, define δ* by

δ*(x,E) = ∫ δ(y,E) Q(x,dy), E ∈ 𝒜.

By (a), δ* is a B₀-measurable decision rule. Further, since B₀ is θ-oriented, ∫ δ*(x,E) dP_{θ,σ}(x) is constant in σ, and hence for each θ ∈ Θ

∫ δ*(x,E) dP_{θ,σ}(x) = ∫∫ δ*(x,E) dP_{θ,σ}(x) dξ_θ(σ) = ∫∫ δ(x,E) dP_{θ,σ}(x) dξ_θ(σ). □

The next theorem is an analogue of the Rao-Blackwell theorem in the context of partial sufficiency and appears as Theorem 2.2 in Hájek (1965).

Theorem 2.2 (Hájek (1965)) Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and B be partially sufficient for θ in E. Let (A, 𝒜) be a decision space. Then given any decision rule δ, there exists a B-measurable decision rule δ* such that for all loss functions L ∈ L°_A we have, for each θ,

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ). (2.1)

Proof. Let δ be any decision rule and δ* be a B-measurable decision rule satisfying the conclusion of Theorem 2.1. We then have

∫_X ∫_A f(a) δ*(x,da) dP_{θ,σ}(x) = ∫_Σ ∫_X ∫_A f(a) δ(x,da) dP_{θ,σ}(x) dξ_θ(σ)

whenever f is of the form I_E(a), E ∈ 𝒜. A standard induction argument via simple functions yields

R_L(θ,σ,δ*) = ∫_Σ R_L(θ,σ,δ) dξ_θ(σ),

so that

sup_{σ∈Σ} R_L(θ,σ,δ*) = ∫_Σ R_L(θ,σ,δ) dξ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ). □

2.3 Main Theorem

We now move to a converse of Theorem 2.1, which is the main theorem of this chapter.

Theorem 2.3 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B be a sub σ-algebra of A.
If B satisfies Condition A below, then B is partially sufficient for θ in E.

Condition A: For each θ there exists a probability measure ξ_θ on Σ such that, for any decision space (A, 𝒜) and any decision rule δ, there exists a B-measurable decision rule δ* satisfying, for all loss functions L ∈ L°_A,

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dξ_θ(σ).

Proof. Choose (A, 𝒜) to be (X, A), and for each A ∈ A let L_A(θ,σ,a) = I_A(a), where I_A(·) denotes the indicator function, and set δ(x,E) = I_E(x). We then have from Condition A that there exists a B-measurable decision rule δ* such that

∫ δ*(x,A) dP_{θ,σ}(x) = ∫ P_{θ,σ}(A) dξ_θ(σ)

for all A ∈ A. Since the left-hand side of the above equation does not depend on σ, we in fact have

P_{ξ_θ}(A) = ∫ δ*(x,A) dP_{ξ_θ}(x).

Now define

δ₁*(x,A) = δ*(x,A),

δₙ₊₁*(x,A) = ∫ δₙ*(y,A) δ*(x,dy).

An easy argument shows that

P_{ξ_θ}(A) = ∫ δₙ*(x,A) dP_{ξ_θ}(x) (2.2)

for all n and A ∈ A. Let us define

δ₀*(x,A) = lim_{n→∞} (1/n) Σ_{k=1}^{n} δₖ*(x,A) when the limit exists,
         = P(A) otherwise,

where P is an arbitrary probability measure on (X, A). By Hopf's ergodic theorem in Neveu (1965), for each θ ∈ Θ,

δ₀*(x,A) = E_{P_{ξ_θ}}(I_A | B_θ) a.e. [P_{ξ_θ}],

where B_θ = {B : δ₀*(x,B) = I_B [P_{ξ_θ}]}. If we set

B₀ = {B : δ₀*(x,B) = I_B [P_{ξ_θ}] for all θ ∈ Θ},

then δ₀* is B₀-measurable and we have

δ₀*(x,A) = E_{P_{ξ_θ}}(I_A | B₀).

This shows that B₀ is sufficient for {P_{ξ_θ} : θ ∈ Θ}. We next note that for B ∈ B₀, δ₀*(x,B) = I_B [P_{ξ_θ}], and from the mutual absolute continuity assumption ii) of Section 2.1 we have δ₀*(x,B) = I_B [P_{θ,σ}] for all σ, so that

∫ δ₀*(x,B) dP_{θ,σ}(x) = P_{θ,σ}(B).

On the other hand, B₀-measurability of δ₀* yields

∫ δ₀*(x,B) dP_{θ,σ}(x) = P_{ξ_θ}(B).

So P_{θ,σ}(B) is constant in σ, thereby establishing that B₀ is θ-oriented. This shows that B₀ is H-sufficient, and since B₀ ⊂ B, we have that B is partially sufficient for θ. □

Remark: We feel that Theorem 2.3, while interesting, is still rather weak. This is because, given a decision rule δ, we require a decision rule δ* which would be as good as δ for all loss functions in L°_A.
A more reasonable condition would allow δ* to depend on the loss function L. However, we are unable to establish the theorem under such a condition.

2.4 Invariance

In the last section we studied the notion of partial sufficiency that was proposed by Hájek in 1965. In the same paper he demonstrated that, in situations where the nuisance parameter is generated by a group of transformations on the sample space, the maximal invariant is partially sufficient. In this section we present Hájek's result, since it provides a wide class of examples of partially sufficient statistics. More specific examples will be given in a later chapter.

Let X be a random variable with a probability distribution P_θ, where θ is the parameter of interest and (X, A) is the sample space of X. Suppose P_θ ∈ P = {P_θ : θ ∈ Θ}, a family of probability distributions dominated by a σ-finite measure μ. Let G = {g} be a group of 1-1 transformations from X onto X. For A ∈ A, put

P_{θ,g}(A) = P_θ(g⁻¹A). (2.3)

We will assume the following conditions.

Condition B: Let 𝒢 be a σ-algebra of subsets of G, and assume the following:

i) the measure μ_g defined by μ_g(A) = μ(gA) satisfies μ_g ≪ μ for all g ∈ G;

ii) P_{θ,g} has a density p_θ(x,g) with respect to μ, and p_θ(x,g) is A × 𝒢-measurable;

iii) the functions φ_h(g) = hg and ψ_g(h) = gh are 𝒢-measurable;

iv) there exists an invariant probability measure ν on 𝒢, that is, ν(Bg) = ν(gB) = ν(B) for all g ∈ G and B ∈ 𝒢.

We shall say that an event A ∈ A is G-invariant if gA = A for all g ∈ G. The set of G-invariant events is a sub σ-algebra B ⊂ A, and a function f is B-measurable iff f(g(x)) = f(x) for all g ∈ G.

Theorem 2.4 (Hájek (1965)) Let P_θ ∈ P and define P_{θ,g} by Equation (2.3) for each θ. Under Condition B, the sub σ-algebra B of G-invariant events is partially sufficient for θ.

Proof. It is enough to show that B satisfies Definition 2.1.

i) Since B is the sub σ-algebra of G-invariant events, for B ∈ B we have P_{θ,g}(B) = P_θ(B).
ii) For A ∈ A, let

P_{ν,θ}(A) = ∫ P_{θ,g}(A) dν(g).

We have

P_{ν,θ}(A) = ∫ [∫_A p_θ(x,g) dμ] dν(g) = ∫_A [∫ p_θ(x,g) dν(g)] dμ.

With p̄_θ(x) = ∫ p_θ(x,g) dν(g),

P_{ν,θ}(A) = ∫_A p̄_θ(x) dμ. (2.4)

Let P₀ be some probability measure such that P_θ ≪ P₀ ≪ μ, and define p̄₀(x) by

p̄₀(x) = ∫ p₀(x,g) dν(g).

Then

P_{ν,θ}(A) = ∫_A [p̄_θ(x)/p̄₀(x)] dP_{ν,0}(x). (2.5)

It follows from Theorem 3.3 of Hájek (1965) that p̄_θ(x)/p̄₀(x) is B-measurable and that it is also a density of P_{ν,θ} with respect to P_{ν,0}. By Lemma 1, page 401, of Billingsley (1979), it follows that B is sufficient for {P_{ν,θ}}. □

Chapter 3
Comparison of Experiments in the Presence of Nuisance Parameters

3.1 Preliminaries

In this chapter we study the extension of the concept of partial sufficiency to two experiments. Let E = {X, A, P_t : t ∈ T} and F = {Y, B, Q_t : t ∈ T} be two experiments. Following Blackwell (1951), we define:

Definition 3.1 E is more informative than F if for any decision problem (A, 𝒜) and loss function L(t,a), given any decision rule δ in F, there exists a decision rule δ* in E such that the risk functions satisfy, for all t ∈ T,

R(t,δ*) ≤ R(t,δ).

Definition 3.2 E is sufficient for F if there exists a Markov kernel M from E to F such that, for all t ∈ T and B ∈ B,

∫ M(x,B) dP_t(x) = Q_t(B).

Blackwell (1951) showed that Definition 3.1 and Definition 3.2 are equivalent when T is finite. When the experiments are dominated the equivalence continues to hold; see Feldman and Ramamoorthi (1984) for a proof.

3.2 Main Results

A direct analogue of Definition 3.1 in the context of "partial sufficiency" is, in view of the remark in Section 2.3 of Chapter 2, not available. However, motivated by Theorem 2.1 of Chapter 2, we define the following. Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} and F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be two experiments. As before we treat θ as the parameter of interest and σ as the nuisance parameter. For an action space (A, 𝒜), let L°_A be the class of all bounded loss functions which do not depend on σ.
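Definition 3.2 can be verified by hand in small families. In the following sketch, a toy example of ours with illustrative names, E observes X ~ Binomial(2, p) and F observes a single Bernoulli(p) trial; the Markov kernel that, given X = x, outputs 1 with probability x/2 reproduces the law of F exactly for every p, exhibiting E as sufficient for F in the sense above:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Markov kernel M from E (X ~ Binomial(2, p)) to F (Y ~ Bernoulli(p)):
# given X = x, output Y = 1 with probability x / 2.
def kernel_prob_one(x):
    return x / 2

for p in (0.1, 0.25, 0.5, 0.8):
    # Left side of Definition 3.2: integral of M(., {1}) against P_p.
    lhs = sum(kernel_prob_one(x) * binom_pmf(2, x, p) for x in range(3))
    # Right side: Q_p({1}) = p for a single Bernoulli(p) observation.
    assert abs(lhs - p) < 1e-12
```

The identity holds because Σ_x (x/2) P_p(x) = E_p[X]/2 = p; crucially, the kernel itself does not involve p.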
Definition 3.3 E is more informative than F for θ if for any decision problem there exists a probability measure μ_θ on Σ such that, given any decision rule δ in F, there exists δ* in E such that for all L ∈ L°_A,

R_L(θ,σ,δ*) ≤ ∫ R_L(θ,σ,δ) dμ_θ(σ).

Remark: For any L ∈ L°_A, defining

L₁(θ,σ,a) = sup_{a'} L(θ,σ,a') − L(θ,σ,a),

it is easy to see that the "≤" in Definition 3.3 can be replaced by "=".

Definition 3.4 E is partially sufficient for F if there exists a Markov kernel M(·,·) from E to F and probability measures μ_θ on Σ such that

∫ M(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ)

for all (θ,σ) ∈ Θ × Σ and B ∈ B.

Theorem 3.1 E is more informative than F for θ iff E is partially sufficient for F.

Proof. (i) Suppose E is more informative than F for θ. Choose (A, 𝒜) to be (Y, B) and let δ(y,E) = I_E(y), where I_E(·) is an indicator function. Then, from the assumption that E is more informative than F for θ, we have a decision rule δ* in E such that, for all L ∈ L°_A,

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ). (3.1)

For each B ∈ B, the loss function L(θ,σ,a) = I_B(a) is in L°_A, where I_B(·) is an indicator function. Using Equation (3.1), it is evident that δ* satisfies

∫ δ*(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ).

(ii) Suppose E is partially sufficient for F. Let M be the Markov kernel provided by the partial sufficiency of E for F. Given any decision rule δ in F, define δ* by

δ*(x,E) = ∫ δ(y,E) M(x,dy).

It is easily verified that δ* satisfies

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ)

for all L ∈ L°_A. □

Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment. If A₀ is a sub σ-algebra of A, we will denote by E₀ the experiment {X, A₀, P_{θ,σ} : (θ,σ) ∈ Θ × Σ}. Our next theorem relates partial sufficiency and sufficiency.

Theorem 3.2 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let the sub σ-algebra A₀ be H-sufficient for θ. Similarly, let F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B₀ be H-sufficient for θ. Then E is partially sufficient for F iff E₀ is sufficient for F₀.

Proof.
Suppose E is partially sufficient for F. Let δ be any decision rule in F₀. Then, since δ is also a decision rule in F, there exists a decision rule δ* in E such that

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ)

for L ∈ L°_A. However, since B₀ is H-sufficient for θ in {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} and L ∈ L°_A, R_L(θ,σ,δ) is constant in σ, so that for all (θ,σ) ∈ Θ × Σ

R_L(θ,σ,δ) = ∫ R_L(θ,σ,δ) dμ_θ(σ),

and hence we have

R_L(θ,σ,δ*) = R_L(θ,σ,δ), (θ,σ) ∈ Θ × Σ. (3.2)

By H-sufficiency of A₀, we have a decision rule δ₀* in E₀ such that

R_L(θ,σ,δ₀*) = ∫ R_L(θ,σ,δ*) dξ_θ(σ).

However, by Equation (3.2), R_L(θ,σ,δ*) is constant in σ, so that

R_L(θ,σ,δ₀*) = R_L(θ,σ,δ*) = R_L(θ,σ,δ).

Conversely, suppose E₀ is sufficient for F₀. Then

a) the Markov kernel M₁(x,A) = I_A(x) from E to E₀ satisfies, for all A ∈ A₀,

∫ M₁(x,A) dP_{θ,σ}(x) = P_{θ,σ}(A);

b) there exists a Markov kernel M₂ from E₀ to F₀ such that, for all B ∈ B₀,

∫ M₂(x,B) dP_{θ,σ}(x) = Q_{θ,σ}(B);

c) there exists a Markov kernel M₃ from F₀ to F such that, for all B ∈ B,

∫ M₃(y,B) dQ_{θ,σ}(y) = ∫ Q_{θ,σ}(B) dμ_θ(σ).

It is easily verified that the Markov kernel

M(x,B) = ∫∫ M₃(y,B) M₂(x₂,dy) M₁(x,dx₂)

satisfies

∫ M(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ). (3.3) □

In the presence of partially sufficient σ-algebras, the following theorem is of interest.

Theorem 3.3 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let A₀ be H-sufficient for θ. Similarly, let F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B₀ be H-sufficient for θ. Then the following are equivalent:

i) given any decision rule δ in F, there exists δ* in E such that for all L ∈ L°_A,

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ);

ii) E₀ is sufficient for F₀.

Proof. i) implies ii). Let δ be any decision rule in F₀. For L ∈ L°_A we have, for all (θ,σ) ∈ Θ × Σ,

R_L(θ,σ,δ) = sup_{σ∈Σ} R_L(θ,σ,δ). (3.4)

Since δ is also a decision rule in F, we have by i) a decision rule δ* in E such that

R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ).
Since A₀ is H-sufficient for θ, there exists δ₀* in E₀ such that

R_L(θ,σ,δ₀*) = ∫ R_L(θ,σ,δ*) dξ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ).

So E₀ is sufficient for F₀.

ii) implies i). Let δ be any decision rule in F. Then there exist δ₁ in F₀ and δ* in E₀ such that

R_L(θ,σ,δ₁) = ∫ R_L(θ,σ,δ) dμ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ)

and

R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ₁),

so that

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ).

Since δ* is in E₀ and hence in E, this establishes i). □

Chapter 4
Examples and Applications

We give a few examples to illustrate the notion of partial sufficiency, and then we show applications of the theorems in Chapter 3. Some of these examples already appear in Hájek (1965); others are new.

4.1 Examples

In this section, examples of partial sufficiency will be given in terms of a statistic T instead of the sub σ-algebra B induced by T. Before giving the examples, one may note that for T to be an H-sufficient or partially sufficient statistic for θ, it is necessary that T be θ-oriented, or equivalently that we have a factorization of the form

p(x|θ,σ) = g(T|θ) f(x|T,θ,σ), (4.1)

where p(x|θ,σ) is a density function of P_{θ,σ}. It is also necessary that there exist ξ_θ, a probability measure on Σ, such that the "mixed" density function

p_ξ(x|θ) = ∫ p(x|θ,σ) dξ_θ(σ)

can be factored as

p_ξ(x|θ) = G(T,θ) F(x). (4.2)

We now look at examples of partial sufficiency.

Example 4.1 (Hájek (1965)) Consider a sample X = (X₁, X₂, ..., Xₙ) of size n, each i.i.d. from N(μ,σ²). The statistic T(X) = s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n−1) is partially sufficient for σ² if σ² ∈ (0,K) for K finite. To see this, we first factor the density function of X as follows:

p(x|μ,σ) = A(σ) exp[−(n−1)s²/(2σ²)] exp[−n(x̄−μ)²/(2σ²)],

where A(σ) = (√(2π) σ)⁻ⁿ. Choose ξ_σ to be a normal distribution with mean 0 and variance (K − σ²)/n. The "mixed" density function is

p_ξ(x|σ) = ∫ p(x|μ,σ) dξ_σ(μ)
        = A(σ) exp[−(n−1)s²/(2σ²)] ∫ exp[−n(x̄−μ)²/(2σ²)] dξ_σ(μ)
        = A(σ) exp[−(n−1)s²/(2σ²)] B(x̄) C(σ), (4.3)

where

B(x̄) = exp[−n x̄²/(2K)], C(σ) = σ/√K.

Since B(x̄) is free of σ, this is a factorization of the form (4.2) with G(s²,σ) = A(σ) exp[−(n−1)s²/(2σ²)] C(σ).

Remark: If we choose ξ_σ to be the uniform distribution (Lebesgue measure) over the whole real line, the proof of Theorem 2.2 breaks down, since the left-hand side of Equation (2.1) is equal to infinity but not the right-hand side. It seems unlikely that s² is partially sufficient for σ² if σ² ∈ (0,∞); however, we do not have a proof.

Example 4.2 (Neyman and Scott) Consider data consisting of 2n observations X₁, X₁′, X₂, X₂′, ..., Xₙ, Xₙ′. Let Xᵢ and Xᵢ′ be independent normal random variables with mean μᵢ (i = 1, 2, ..., n) and variance σ². The parameter of interest is σ²; the nuisance parameter is the vector μ = (μ₁, μ₂, ..., μₙ). Take s² = Σᵢ₌₁ⁿ (Xᵢ − X̄ᵢ)², X̄ᵢ = (Xᵢ + Xᵢ′)/2, and with A(σ) = (√(2π) σ)⁻²ⁿ we have

p(x|μ,σ²) = A(σ) exp[−s²/σ²] exp[−Σᵢ (X̄ᵢ − μᵢ)²/σ²].

The statistic s² is clearly σ²-oriented and is partially sufficient if we take ξ_σ(μ₁, μ₂, ..., μₙ) = Πᵢ₌₁ⁿ φ_σ(μᵢ), where φ_σ(μᵢ) is a normal density with mean 0 and variance (K − σ²)/2, and we assume σ² ∈ (0,K), K finite.

Example 4.3 Let X = (X₁, X₂, ..., X_s) have a multinomial distribution with parameters n, p₁, p₂, ..., p_s. The distribution of X is given by

P(X₁ = n₁, X₂ = n₂, ..., X_s = n_s) = [n!/(n₁! n₂! ··· n_s!)] p₁^{n₁} p₂^{n₂} ··· p_s^{n_s},

where Σᵢ₌₁ˢ pᵢ = 1 and Σᵢ₌₁ˢ nᵢ = n. The statistic T(X) = (T₁(X₁), T₂(X₂), ..., T_s(X_s)) is sufficient for (p₁, p₂, ..., p_s). Also, the statistic T₁(X₁) = n₁ is partially sufficient for p₁, since the marginal distribution of T₁(X₁) is Binomial(n, p₁), which is free of p₂, p₃, ..., p_s; hence it is p₁-oriented. The factorization equation (4.2) holds if we take ξ_{p₁} to be a point mass at

(p₂, p₃, ..., p_s) = ((1−p₁)/(s−1), (1−p₁)/(s−1), ..., (1−p₁)/(s−1)).

So we have

P(T₁(X₁) = n₁, X₂ = n₂, ..., X_s = n_s) = [n!/(n₁! ··· n_s!)] p₁^{n₁} [(1−p₁)/(s−1)]^{n−n₁}.

The argument above works for any pᵢ, i = 1, 2, ..., s; therefore, Tᵢ(Xᵢ) is partially sufficient for pᵢ, i = 1, 2, ..., s.
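The p₁-orientedness claimed in Example 4.3 (the marginal law of n₁ is Binomial(n, p₁) no matter how the remaining mass is split among p₂, ..., p_s) can be checked exactly for small n and s. A sketch of ours with illustrative numbers (s = 3, n = 4):

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """P(X1 = counts[0], ..., Xs = counts[-1]) for a multinomial sample."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    out = float(coef)
    for c, p in zip(counts, probs):
        out *= p ** c
    return out

n = 4
# Two parameter vectors with the same p1 = 0.2 but different splits of the rest.
for probs in ([0.2, 0.5, 0.3], [0.2, 0.1, 0.7]):
    for k in range(n + 1):
        # Marginal P(n1 = k): sum over all completions (n2, n3) with n2+n3 = n-k.
        marg = sum(multinomial_pmf((k, n2, n - k - n2), probs)
                   for n2 in range(n - k + 1))
        # Binomial(n, p1) probability of k successes.
        binom = comb(n, k) * probs[0] ** k * (1 - probs[0]) ** (n - k)
        assert abs(marg - binom) < 1e-12
```

Both parameter vectors produce the identical Binomial(4, 0.2) marginal, which is exactly the p₁-oriented property used in the example.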
Example 4.4 A linear model is represented by Y = X₁τ + X₂β + e, E(e) = 0, Var(e) = σ²Iₙ, where Y is an n × 1 random vector, X₁ is an n × k matrix with rank k, X₂ is an n × p matrix with rank p, and Iₙ is the n × n identity matrix. τ is a k × 1 vector of parameters of interest, β is a p × 1 vector of nuisance parameters, and e is an n × 1 random vector of errors from a normal distribution with mean 0 and covariance matrix σ²Iₙ. The density function of Y with respect to Lebesgue measure is

p(y|τ,β) = (√(2π) σ)⁻ⁿ exp[−(1/(2σ²)) ‖y − X₁τ − X₂β‖²].

Let M(Xᵢ) denote the space spanned by the columns of Xᵢ, and M(Xᵢ)⊥ = {x : x′y = 0 for y ∈ M(Xᵢ)}, for i = 1,2. Let P denote the orthogonal projection onto M(X₂)⊥, so Q = I − P is the orthogonal projection onto M(X₂), with y = Py + Qy and

‖y − X₁τ − X₂β‖² = ‖(Py − X₁τ) + (Qy − X₂β)‖² = ‖Py − X₁τ‖² + ‖Qy − X₂β‖².

Hence,

p(y|τ,β) = (√(2π) σ)⁻ⁿ exp{−(1/(2σ²)) [‖Py − X₁τ‖² + ‖Qy − X₂β‖²]}.

Py is τ-oriented since E(Py) = PX₁τ and Var(Py) = σ²P. Choosing ξ_τ(β) to be a point mass at β = 0, the statistic Py is partially sufficient for τ.

Example 4.5 (Maximal Invariance) Consider a sample X = (X₁, X₂, ..., Xₙ) of size n, each i.i.d. from N(μ,σ²). The transformation g_a(X₁, X₂, ..., Xₙ) = (X₁ + a, X₂ + a, ..., Xₙ + a) of the group G = {g_a : a ∈ ℝ} maps the sample space ℝⁿ onto itself. It is associated with the group Ḡ = {ḡ_a : a ∈ ℝ} of transformations ḡ_a(μ,σ) = (μ + a, σ) of the parameter space onto itself. The group Ḡ leaves σ invariant. The maximal invariant for the problem of estimating σ with respect to the group G is the difference statistic D = (X₂ − X₁, X₃ − X₁, ..., Xₙ − X₁). The statistic s, as a function of D, is invariant and partially sufficient for the problem of estimating σ in the sense that i) s is σ-oriented, and ii) the statistic s is sufficient for σ.

Example 4.6 (Invariance) Let X be a random vector taking values in ℝ². Let X ~ N(μ, I), μ ∈ ℝ².
To know the parameter μ ∈ ℝ², we need to know the norm of μ and the angle from a fixed ray; say μ₀ = (‖μ‖/√2, ‖μ‖/√2), represented by Γ, an orthogonal matrix such that Γμ₀ = μ. If we are only interested in estimating ‖μ‖, we may treat Γ as a nuisance parameter. In terms of invariance, let the group G = O₂ be the group of all orthogonal transformations on ℝ², with distributions P_Γ = N(Γμ₀, I). The statistic ‖X‖ is sufficient for ‖μ‖; it is also G-invariant, since ‖ΓX‖ = ‖X‖. Clearly there exists an invariant probability measure on O₂, since O₂ is compact. By Theorem 2.4, ‖X‖ is partially sufficient for ‖μ‖.

4.2 Application of Comparison of Experiments in the Presence of Nuisance Parameters

4.2.1 Comparison of Normal Experiments with Unknown Mean and Unknown Variance

Let Eᵢ be a normal experiment with unknown mean μ and unknown variance σ^{2kᵢ}, where kᵢ (> 0) is a known constant, i = 1,2. Suppose that we are only interested in making inferences on the parameter σ, regardless of the value of μ; that is, μ is the nuisance parameter and σ is the parameter of our interest. We want to determine for which values of k₁ and k₂ the experiment E₁ is more informative than E₂ for σ. The next theorem answers this question.

Theorem 4.1 Let Eᵢ be a normal experiment with unknown mean μ and unknown variance σ^{2kᵢ}, where kᵢ > 0 and i = 1,2. If k₁ > k₂ and σ ∈ (0,K], K < ∞, then E₁ is more informative than E₂ for σ.

Proof. Let sᵢ² be the sample variance obtained from nᵢ observations from Eᵢ, i = 1,2. By Example 4.1, sᵢ² is a partially sufficient statistic for σ^{2kᵢ}, i = 1,2. Let E₁⁰ and E₂⁰ be the two experiments derived from the partially sufficient statistics s₁² and s₂², respectively. Then, by Theorem 3 of Goel and DeGroot (1979), E₁⁰ is sufficient for E₂⁰ if k₁ > k₂ > 0. So by Theorem 3.1 and Theorem 3.2, we have that if k₁ > k₂ > 0 then E₁ is more informative than E₂ for σ iff E₁⁰ is sufficient for E₂⁰, and this concludes the proof. □
Remark: Goel and DeGroot (1979) proved the above theorem for the case of μ assumed known, with no restriction on σ. The condition that σ be bounded, which we required in proving the above theorem, is a legitimate assumption that can be imposed here.

4.2.2 Comparison of Linear Normal Experiments With A Known Nonsingular Covariance Matrix

Let Eᵢ = L(Xᵢτ + Zᵢβ, σ²Iₙᵢ) be a linear normal experiment represented by Y = Xᵢτ + Zᵢβ + eᵢ, where Y is an nᵢ × 1 random vector, Xᵢ is an nᵢ × k matrix with rank k, Zᵢ is an nᵢ × p matrix with rank p, and Iₙᵢ is the nᵢ × nᵢ identity matrix. τ is a k × 1 vector of parameters of interest, β is a p × 1 vector of nuisance parameters, and eᵢ is an nᵢ × 1 random vector of errors from a normal distribution with mean 0 and covariance matrix σ²Iₙᵢ, i = 1,2. By Example 4.4, Pᵢyᵢ is a partially sufficient statistic for τ, i = 1,2. Let Eᵢ⁰ = L(PᵢXᵢτ, σ²Pᵢ) be the linear experiment based on the partially sufficient statistic Pᵢyᵢ, where Pᵢ is the orthogonal projection matrix onto M(Zᵢ)⊥. Note that Pᵢ may be represented in the form Pᵢ = Iₙᵢ − Zᵢ(Zᵢ′Zᵢ)⁻¹Zᵢ′, i = 1,2.

Theorem 4.2 Under the above setup, E₁ is more informative than E₂ for τ iff X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite.

Proof. i) Suppose E₁ is more informative than E₂ for τ. By Theorem 3.1 and Theorem 3.2, E₁⁰ is sufficient for E₂⁰. It follows from the Rao-Blackwell theorem that Var(c′τ̂₁) ≤ Var(c′τ̂₂) for c ∈ M(X₂′P₂′) ⊂ M(X₁′P₁′), where c′τ̂ᵢ is the UMVUE of c′τ. Since the UMVUE and the BLUE coincide,

Var(c′τ̂ᵢ) = c′(Xᵢ′Pᵢ′PᵢPᵢXᵢ)⁻c = c′(Xᵢ′PᵢXᵢ)⁻c.

By Lemma 2 of Stepniak, Wang and Wu (1984), X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite.

ii) Suppose X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite. Let G = X₁′P₁X₁ − X₂′P₂X₂. Denote by yᵢ⁰ a random vector representing Eᵢ⁰, i = 1,2. Let E* be a "fictitious experiment" such that X*′X* = G, where X* is the design matrix of E*. Let y* be a random vector representing E*, and suppose y* and y₂⁰ are independent. Then it follows that (P₁X₁)′y₁⁰ is sufficient for τ, and (P₂X₂)′y₂⁰ + X*′y* is sufficient for τ under the combination of the experiments E₂⁰ and E*. But (P₁X₁)′y₁⁰ has the same distribution as (P₂X₂)′y₂⁰ + X*′y*. Hence E₁⁰ is sufficient for E₂⁰. By Theorem 3.1 and Theorem 3.2, E₁ is more informative than E₂ for τ. □

Bibliography

1. Billingsley, P. (1979), Probability and Measure, 2nd ed. John Wiley, New York.

2. Blackwell, D. (1951), Comparison of Experiments. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1:93-102.

3. Feldman, D. and Ramamoorthi, R. V. (1984), A Decision Theoretic Proof of Blackwell's Theorem. Technical Report, Department of Statistics and Probability, Michigan State University.

4. Fraser, D. A. S. (1956), Sufficient Statistics With Nuisance Parameters. Annals of Mathematical Statistics, 27:838-842.

5. Goel, P. K. and DeGroot, M. H. (1979), Comparison of Experiments and Information Measures. Annals of Statistics, 7:1066-1077.

6. Hájek, J. (1965), On Basic Concepts of Statistics. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:139-162.

7. Kolmogorov, A. N. (1942), Sur l'estimation statistique des paramètres de la loi de Gauss. Izv. Akad. Nauk SSSR Ser. Mat., 6:3-32.

8. Neveu, J. (1965), Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.

9. Neyman, J. and Pearson, E. S. (1936), Sufficient Statistics and Uniformly Most Powerful Tests of Statistical Hypotheses. Statistical Research Memoirs of the University of London, 1:113-137.

10. Neyman, J. and Scott, E. L. (1948), Consistent Estimates Based on Partially Consistent Observations. Econometrica, 16:1-32.

11. Stepniak, C., Wang, S. and Wu, C. F. (1984), Comparison of Linear Experiments With Known Covariances. Annals of Statistics, 12:358-365.