This is to certify that the dissertation entitled

Sufficiency In The Presence Of Nuisance Parameters

presented by

Nupun Andhivarothai

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor

Date: Nov. 5, 1990

MSU is an Affirmative Action/Equal Opportunity Institution.

SUFFICIENCY IN THE PRESENCE OF NUISANCE PARAMETERS

By

Nupun Andhivarothai

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1990

ABSTRACT

SUFFICIENCY IN THE PRESENCE OF NUISANCE PARAMETERS

by Nupun Andhivarothai

This dissertation is devoted to the study of the concept of sufficiency in the presence of nuisance parameters. We mainly investigate the notion of partial sufficiency proposed by Hájek in 1965. Decision-theoretic aspects of Hájek's definition are investigated, and we prove a converse to a Rao-Blackwell type theorem in the context of partial sufficiency. We next extend the concept to one experiment being partially sufficient for another experiment. Finally, we give some examples and applications to illustrate the concepts studied.

To my parents and my sisters, Jinnarat and Sawanee

Acknowledgements

I wish to express my sincere thanks to Professor R.V. Ramamoorthi for his guidance, patience and encouragement during the preparation of this dissertation. I would like to thank Professor Habib Salehi and Professor Joseph Gardiner for serving on my committee.
Special thanks are due to Professor Dennis Gilliland for not only serving on my committee but also for introducing me to statistical consulting, which has greatly enhanced my career opportunities. I would like to thank the Department of Statistics and Probability and also the Department of Family Medicine for the financial support during my graduate studies at Michigan State University. Finally, I would like to express my deep appreciation to my parents and my sisters for their patience, encouragement and support.

Contents

1 Introduction and Summary 1

2 Partial Sufficiency 3
2.1 Notation and Preliminaries 3
2.2 Partial Sufficiency 5
2.3 Main Theorem 7
2.4 Invariance 10

3 Comparison of Experiments in the Presence of Nuisance Parameters 13
3.1 Preliminaries 13
3.2 Main Results 14

4 Examples and Applications 20
4.1 Examples 20
4.2 Application of Comparison of Experiments in the Presence of Nuisance Parameters 25
4.2.1 Comparison of Normal Experiments with Unknown Mean and Unknown Variance 25
4.2.2 Comparison of Linear Normal Experiments With A Known Nonsingular Covariance Matrix 26

Bibliography 28

Chapter 1
Introduction and Summary

Let X be a random variable distributed as P_{θ,σ}, where θ and σ are unknown parameters. Classically, a reduction of X is achieved via a sufficient statistic for (θ,σ). That this reduction does not entail any loss of information is established by the Rao-Blackwell theorem, which shows that for any decision problem the decision rules based on the sufficient statistic form an essentially complete class.
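The Rao-Blackwell reduction just described can be illustrated numerically. The following sketch is not from the dissertation; it uses an assumed Bernoulli(p) model purely for illustration, comparing the naive unbiased rule δ(X) = X₁ with its Rao-Blackwellization E[X₁ | ΣXᵢ] = X̄ under squared-error loss:

```python
import random

random.seed(0)
p, n, reps = 0.3, 10, 20000

mse_naive = 0.0  # Monte Carlo risk of the naive rule delta(X) = X_1
mse_rb = 0.0     # risk of its Rao-Blackwellization E[X_1 | sum X_i] = mean(X)
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    mse_naive += (x[0] - p) ** 2
    mse_rb += (sum(x) / n - p) ** 2
mse_naive /= reps
mse_rb /= reps

# Conditioning on the sufficient statistic can only reduce the risk.
assert mse_rb <= mse_naive
```

Here mse_naive approximates p(1 - p) = 0.21 while mse_rb approximates p(1 - p)/n = 0.021, the n-fold variance reduction guaranteed by the theorem.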
A variety of converses to the Rao-Blackwell theorem also show that the Fisher-Neyman definition of sufficiency is appropriate if we are looking for a reduction of X that would be as effective as X for all decision problems.

However, it very often happens that we are interested in only a subset of the set of all decision problems. A typical case arises when we are interested in making inferences on the parameter θ and are indifferent to the value of σ. In such a situation θ is referred to as the parameter of interest and σ as the nuisance parameter. There have been many attempts at defining a sufficient statistic for part of the parameters; in the case just mentioned, a sufficient statistic for the parameter θ in the presence of a nuisance parameter σ [Neyman and Pearson (1936), Fraser (1956), Kolmogorov (1942), Hájek (1965)]. In this study, we extend the concept of "partial sufficiency" introduced by Hájek (1965) and give a result which is a converse to a Rao-Blackwell type theorem.

We next turn our attention to the problem of comparison of two experiments. Let E and F be two experiments parameterized by (θ,σ). Bohnenblust, Shapley and Sherman defined the notion of E being more informative than F in terms of the risk functions obtainable in the experiments. Blackwell extended the concept of a sufficient statistic and defined E being sufficient for F in terms of the existence of Markov kernels. Blackwell then showed that "more informative" and "sufficient" are equivalent. Blackwell's theory involves sufficiency for both parameters (θ,σ); more specifically, it requires the consideration of a loss function that may depend on both θ and σ. However, when σ is a nuisance parameter, it seems appropriate to consider a loss function that depends only on θ. These considerations motivate our study of "partial sufficiency" of experiments in Chapter 2. We extend the concept of partial sufficiency to two experiments in Chapter 3.
The notions of E being more informative than F for θ, and of E being partially sufficient for F, are introduced. The equivalence of these two concepts is proved. A criterion for determining whether E is partially sufficient for F, in terms of sufficiency of reduced experiments, is also established. To conclude this study, in Chapter 4 we give some examples to illustrate the concept of a partially sufficient statistic and some applications of the results of the earlier chapters.

Chapter 2
Partial Sufficiency

This chapter is devoted to the study of a notion of partial sufficiency introduced by Hájek in 1965. We first establish the notation and preliminaries, and then prove the main result, which is a converse to a Rao-Blackwell type theorem.

2.1 Notation and Preliminaries

A statistical experiment is a triplet (X, A, P) where X is a set, A is a σ-algebra of subsets of X, and P is a family of probability measures on (X, A). P will be endowed with the σ-algebra C, the smallest σ-algebra making the maps P ↦ P(A), A ∈ A, measurable. Subsets of P will be equipped with the relative σ-algebra. We will assume that the family P is parameterized by Θ × Σ, that is, there is a 1-1 function (θ,σ) ↦ P_{θ,σ} from Θ × Σ onto P. Thus for us an experiment is given by (X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ), where (X, A) is the sample space and Θ × Σ is referred to as the parameter space.

A decision problem consists of a measurable space, the "action space" (A, 𝒜), and a "loss function" L(θ,σ,a) from Θ × Σ × A → ℝ which is measurable in (θ,σ,a). By a decision rule δ we mean a function δ : X × 𝒜 → [0,1] such that

i) for all A ∈ 𝒜, δ(x, A) is, as a function of x, A-measurable;

ii) for each x ∈ X, δ(x, ·) is a probability measure on 𝒜.

If a decision rule δ as in i) above is measurable with respect to a sub σ-algebra B of A, then we shall refer to δ as a B-measurable decision rule. Denote by L_A the set of all bounded loss functions defined on Θ × Σ × A.
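In finite settings the definition above has a concrete matrix form: a randomized decision rule is a row-stochastic array, one probability vector over actions per sample point. The following sketch is a toy example of ours, not from the text; it checks property ii) and evaluates the integral of a bounded function against δ(x, ·) as a weighted sum:

```python
# Finite sample space X = {0, 1, 2} and two actions {a0, a1}.
# A randomized decision rule: delta[x][j] = delta(x, {a_j}).
delta = {
    0: [0.9, 0.1],
    1: [0.5, 0.5],
    2: [0.1, 0.9],
}

# Property ii): for each x, delta(x, .) is a probability measure on the actions.
for row in delta.values():
    assert all(pr >= 0 for pr in row)
    assert abs(sum(row) - 1.0) < 1e-12

# For a bounded function f on actions, integrating f against delta(x, .)
# reduces to a finite weighted sum.
f = [0.0, 1.0]  # f(a0), f(a1); here the indicator of action a1
inner = {x: sum(pr * fj for pr, fj in zip(row, f)) for x, row in delta.items()}
assert inner == {0: 0.1, 1: 0.5, 2: 0.9}
```

Averaging inner over x against a distribution P_{θ,σ} on X then yields the risk of δ once f is taken to be the loss a ↦ L(θ,σ,a).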
If L ∈ L_A and δ is a decision rule, the "risk function" of δ (with respect to L) is the function on Θ × Σ defined by

R_L(θ,σ,δ) = ∫_X ∫_A L(θ,σ,a) δ(x,da) dP_{θ,σ}(x).

We shall throughout treat θ as the "parameter of interest" and σ as the nuisance parameter. This treatment may be formalized in one way by considering only the following subset of L_A:

L°_A = {L ∈ L_A : L depends on (θ,σ) through θ only}.

Let B be a sub σ-algebra of A. If P is a probability measure on (X, A), then for any bounded A-measurable function f, E_P(f | B) will denote any version of the conditional expectation of f given B under the measure P. If P₀ is a family of probability measures on (X, A), then B is called sufficient for P₀ if for any bounded A-measurable function f there exists a B-measurable function g such that g = E_P(f | B) [P] for all P ∈ P₀.

We will assume throughout that

i) X is a Borel subset of a complete separable metric space and A is the relativized Borel σ-algebra;

ii) {P_{θ,σ} : θ ∈ Θ, σ ∈ Σ} are all mutually absolutely continuous.

As mentioned earlier, θ is the parameter of interest and σ is the nuisance parameter.

2.2 Partial Sufficiency

Definition 2.1 (Hájek (1965)) B is said to be H-sufficient for θ in {P_{θ,σ} : θ ∈ Θ, σ ∈ Σ} if

i) B is θ-oriented, that is, for each θ, B is ancillary for the family P_θ = {P_{θ,σ} : σ ∈ Σ}, i.e. P_{θ,σ₁}(B) = P_{θ,σ₂}(B) for σ₁, σ₂ in Σ and B ∈ B;

ii) for each θ, there exists a probability measure ξ_θ on P_θ such that B is sufficient for {P_{ξ_θ} : θ ∈ Θ}, where P_{ξ_θ} is the marginal probability measure on (X, A) defined by

P_{ξ_θ}(A) = ∫ P_{θ,σ}(A) dξ_θ(σ).

Definition 2.2 B is said to be partially sufficient for θ if B contains an H-sufficient σ-algebra.

Theorem 2.1 Let B be partially sufficient for θ in {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ}. Then given any decision problem (A, 𝒜) and an A-measurable decision rule δ, there exists a B-measurable decision rule δ* such that for all θ ∈ Θ and σ ∈ Σ,

∫ δ*(x,E) dP_{θ,σ}(x) = ∫∫ δ(x,E) dP_{θ,σ}(x) dξ_θ(σ).

Proof.
Since B is partially sufficient for θ, there is a σ-algebra B₀ ⊂ B which is H-sufficient for θ. B₀ is sufficient for {P_{ξ_θ} : θ ∈ Θ}, and since (X, A) is standard Borel there exists an omnibus version of the conditional probability given B₀. That is, there is a function Q from X × A → [0,1] such that

(a) Q(x, A) is B₀-measurable for all A ∈ A;

(b) Q(x, ·) is a probability measure on (X, A) for all x; and

(c) ∫∫ Q(x,A) dP_{θ,σ}(x) dξ_θ(σ) = P_{ξ_θ}(A) for all A ∈ A.

Given any decision problem (A, 𝒜) and a decision rule δ, define δ* by

δ*(x,E) = ∫ δ(y,E) Q(x,dy), E ∈ 𝒜.

By (a), δ* is a B₀-measurable decision rule. Further, since B₀ is θ-oriented, ∫ δ*(x,E) dP_{θ,σ}(x) is constant in σ, and hence for each θ ∈ Θ

∫ δ*(x,E) dP_{θ,σ}(x) = ∫∫ δ*(x,E) dP_{θ,σ}(x) dξ_θ(σ) = ∫∫ δ(x,E) dP_{θ,σ}(x) dξ_θ(σ). □

The next theorem is an analogue of the Rao-Blackwell theorem in the context of partial sufficiency and appears as Theorem 2.2 in Hájek (1965).

Theorem 2.2 (Hájek (1965)) Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and B be partially sufficient for θ in E. Let (A, 𝒜) be a decision space. Then given any decision rule δ, there exists a B-measurable decision rule δ* such that for all loss functions L ∈ L°_A we have, for each θ,

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ). (2.1)

Proof. Let δ be any decision rule and δ* be a B-measurable decision rule satisfying the conclusion of Theorem 2.1. We then have

∫_X ∫_A f(a) δ*(x,da) dP_{θ,σ}(x) = ∫_Σ ∫_X ∫_A f(a) δ(x,da) dP_{θ,σ}(x) dξ_θ(σ)

whenever f is of the form I_E(a), E ∈ 𝒜. A standard induction argument via simple functions yields

R_L(θ,σ,δ*) = ∫_Σ R_L(θ,σ,δ) dξ_θ(σ),

so that

sup_{σ∈Σ} R_L(θ,σ,δ*) = ∫_Σ R_L(θ,σ,δ) dξ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ). □

2.3 Main Theorem

We now move to a converse of Theorem 2.1, which is the main theorem of this chapter.

Theorem 2.3 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B be a sub σ-algebra of A.
If B satisfies Condition A below, then B is partially sufficient for θ in E.

Condition A: For each θ there exists a probability measure ξ_θ on Σ such that, for any decision space (A, 𝒜) and any decision rule δ, there exists a B-measurable decision rule δ* satisfying, for all loss functions L ∈ L°_A,

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dξ_θ(σ).

Proof. Choose (A, 𝒜) to be (X, A), and for each A ∈ A let L_A(θ,σ,a) = I_A(a), where I_A(·) denotes the indicator function, and set δ(x,E) = I_E(x). We then have from Condition A that there exists a B-measurable decision rule δ* such that

∫ δ*(x,A) dP_{θ,σ}(x) = ∫ P_{θ,σ}(A) dξ_θ(σ)

for all A ∈ A. Since the left-hand side of the above equation does not depend on σ, we in fact have

P_{ξ_θ}(A) = ∫ δ*(x,A) dP_{ξ_θ}(x).

Now define

δ₁*(x,A) = δ*(x,A),

δₙ₊₁*(x,A) = ∫ δₙ*(y,A) δ*(x,dy).

An easy argument shows that

P_{ξ_θ}(A) = ∫ δₙ*(x,A) dP_{ξ_θ}(x) (2.2)

for all n and A ∈ A. Let us define

δ₀*(x,A) = lim_{n→∞} (1/n) Σ_{k=1}^{n} δₖ*(x,A) when the limit exists,
         = P(A) otherwise,

where P is an arbitrary probability measure on (X, A). By Hopf's ergodic theorem in Neveu (1965), for each θ ∈ Θ,

δ₀*(x,A) = E_{P_{ξ_θ}}(I_A | B_θ) a.e. [P_{ξ_θ}],

where B_θ = {B : δ₀*(x,B) = I_B [P_{ξ_θ}]}. If we set

B₀ = {B : δ₀*(x,B) = I_B [P_{ξ_θ}] for all θ ∈ Θ},

then δ₀* is B₀-measurable and we have

δ₀*(x,A) = E_{P_{ξ_θ}}(I_A | B₀).

This shows that B₀ is sufficient for {P_{ξ_θ} : θ ∈ Θ}. We next note that for B ∈ B₀, δ₀*(x,B) = I_B [P_{ξ_θ}], and from the mutual absolute continuity assumption ii) of Section 2.1 we have δ₀*(x,B) = I_B [P_{θ,σ}] for all σ, so that

∫ δ₀*(x,B) dP_{θ,σ}(x) = P_{θ,σ}(B).

On the other hand, B₀-measurability of δ₀* yields

∫ δ₀*(x,B) dP_{θ,σ}(x) = P_{ξ_θ}(B).

So P_{θ,σ}(B) is constant in σ, thereby establishing that B₀ is θ-oriented. This shows that B₀ is H-sufficient, and since B₀ ⊂ B, we have that B is partially sufficient for θ. □

Remark: We feel that Theorem 2.3, while interesting, is still rather weak. This is because, given a decision rule δ, we require a decision rule δ* which would be as good as δ for all loss functions in L°_A.
A more reasonable condition would allow δ* to depend on the loss function L. However, we are unable to establish the theorem under such a condition.

2.4 Invariance

In the last section we studied the notion of partial sufficiency that was proposed by Hájek in 1965. In the same paper he demonstrated that, in situations where the nuisance parameter is generated by a group of transformations on the sample space, the maximal invariant is partially sufficient. In this section we present Hájek's result, since it provides a wide class of examples of partially sufficient statistics. More specific examples will be given in a later chapter.

Let X be a random variable with a probability distribution P_θ, where θ is the parameter of interest and (X, A) is the sample space of X. Suppose P_θ ∈ P = {P_θ : θ ∈ Θ}, a family of probability distributions dominated by a σ-finite measure μ. Let G = {g} be a group of 1-1 transformations from X onto X. For A ∈ A, put

P_{θ,g}(A) = P_θ(g⁻¹A). (2.3)

We will assume the following conditions.

Condition B: Let 𝒢 be a σ-algebra of subsets of G, and assume the following:

i) the measure μ_g defined by μ_g(A) = μ(gA) satisfies μ_g ≪ μ for all g ∈ G;

ii) P_{θ,g} has a density p_θ(x,g) with respect to μ, and p_θ(x,g) is A × 𝒢-measurable;

iii) the functions φ_h(g) = hg and ψ_g(h) = gh are 𝒢-measurable;

iv) there exists an invariant probability measure ν on 𝒢, that is, ν(Bg) = ν(gB) = ν(B) for all g ∈ G and B ∈ 𝒢.

We shall say that an event A ∈ A is G-invariant if gA = A for all g ∈ G. The set of G-invariant events is a sub σ-algebra B ⊂ A, and a function f is B-measurable iff f(g(x)) = f(x) for all g ∈ G.

Theorem 2.4 (Hájek (1965)) Let P_θ ∈ P and define P_{θ,g} by Equation (2.3) for each θ. Under Condition B, the sub σ-algebra B of G-invariant events is partially sufficient for θ.

Proof. It is enough to show that B satisfies Definition 2.1.

i) Since B is the sub σ-algebra of G-invariant events, for B ∈ B we have P_{θ,g}(B) = P_θ(B).
ii) For A ∈ A, let

P_{ν,θ}(A) = ∫ P_{θ,g}(A) dν(g).

We have

P_{ν,θ}(A) = ∫ [∫_A p_θ(x,g) dμ] dν(g) = ∫_A [∫ p_θ(x,g) dν(g)] dμ.

With p̄_θ(x) = ∫ p_θ(x,g) dν(g),

P_{ν,θ}(A) = ∫_A p̄_θ(x) dμ. (2.4)

Let P₀ be some probability measure such that P_θ ≪ P₀ ≪ μ, and define p̄₀(x) by

p̄₀(x) = ∫ p₀(x,g) dν(g).

Then

P_{ν,θ}(A) = ∫_A [p̄_θ(x)/p̄₀(x)] dP_{ν,0}(x). (2.5)

It follows from Theorem 3.3 of Hájek (1965) that p̄_θ(x)/p̄₀(x) is B-measurable and that it is also a density of P_{ν,θ} with respect to P_{ν,0}. By Lemma 1, page 401, of Billingsley (1979), it follows that B is sufficient for {P_{ν,θ}}. □

Chapter 3
Comparison of Experiments in the Presence of Nuisance Parameters

3.1 Preliminaries

In this chapter we study the extension of the concept of partial sufficiency to two experiments. Let E = {X, A, P_t : t ∈ T} and F = {Y, B, Q_t : t ∈ T} be two experiments. Following Blackwell (1951), we define:

Definition 3.1 E is more informative than F if for any decision problem (A, 𝒜) and loss function L(t,a), given any decision rule δ in F, there exists a decision rule δ* in E such that the risk functions satisfy, for all t ∈ T,

R(t,δ*) ≤ R(t,δ).

Definition 3.2 E is sufficient for F if there exists a Markov kernel M from E to F such that, for all t ∈ T and B ∈ B,

∫ M(x,B) dP_t(x) = Q_t(B).

Blackwell (1951) showed that Definition 3.1 and Definition 3.2 are equivalent when T is finite. When the experiments are dominated the equivalence continues to hold; see Feldman and Ramamoorthi (1984) for a proof.

3.2 Main Results

A direct analogue of Definition 3.1 in the context of "partial sufficiency" is, in view of the remark in Section 2.3 of Chapter 2, not available. However, motivated by Theorem 2.1 of Chapter 2, we define the following. Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} and F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be two experiments. As before we treat θ as the parameter of interest and σ as the nuisance parameter. For an action space (A, 𝒜), let L°_A be the class of all bounded loss functions which do not depend on σ.
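Definition 3.2 can be verified by hand in small families. In the following sketch, a toy example of ours with illustrative names, E observes X ~ Binomial(2, p) and F observes a single Bernoulli(p) trial; the Markov kernel that, given X = x, outputs 1 with probability x/2 reproduces the law of F exactly for every p, exhibiting E as sufficient for F in the sense above:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Markov kernel M from E (X ~ Binomial(2, p)) to F (Y ~ Bernoulli(p)):
# given X = x, output Y = 1 with probability x / 2.
def kernel_prob_one(x):
    return x / 2

for p in (0.1, 0.25, 0.5, 0.8):
    # Left side of Definition 3.2: integral of M(., {1}) against P_p.
    lhs = sum(kernel_prob_one(x) * binom_pmf(2, x, p) for x in range(3))
    # Right side: Q_p({1}) = p for a single Bernoulli(p) observation.
    assert abs(lhs - p) < 1e-12
```

The identity holds because Σ_x (x/2) P_p(x) = E_p[X]/2 = p; crucially, the kernel itself does not involve p.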
Definition 3.3 E is more informative than F for θ if for any decision problem there exists a probability measure μ_θ on Σ such that, given any decision rule δ in F, there exists δ* in E such that for all L ∈ L°_A,

R_L(θ,σ,δ*) ≤ ∫ R_L(θ,σ,δ) dμ_θ(σ).

Remark: For any L ∈ L°_A, defining

L₁(θ,σ,a) = sup_{a'} L(θ,σ,a') − L(θ,σ,a),

it is easy to see that the "≤" in Definition 3.3 can be replaced by "=".

Definition 3.4 E is partially sufficient for F if there exists a Markov kernel M(·,·) from E to F and probability measures μ_θ on Σ such that

∫ M(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ)

for all (θ,σ) ∈ Θ × Σ and B ∈ B.

Theorem 3.1 E is more informative than F for θ iff E is partially sufficient for F.

Proof. (i) Suppose E is more informative than F for θ. Choose (A, 𝒜) to be (Y, B) and let δ(y,E) = I_E(y), where I_E(·) is an indicator function. Then, from the assumption that E is more informative than F for θ, we have a decision rule δ* in E such that, for all L ∈ L°_A,

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ). (3.1)

For each B ∈ B, the loss function L(θ,σ,a) = I_B(a) is in L°_A, where I_B(·) is an indicator function. Using Equation (3.1), it is evident that δ* satisfies

∫ δ*(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ).

(ii) Suppose E is partially sufficient for F. Let M be the Markov kernel provided by the partial sufficiency of E for F. Given any decision rule δ in F, define δ* by

δ*(x,E) = ∫ δ(y,E) M(x,dy).

It is easily verified that δ* satisfies

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ)

for all L ∈ L°_A. □

Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment. If A₀ is a sub σ-algebra of A, we will denote by E₀ the experiment {X, A₀, P_{θ,σ} : (θ,σ) ∈ Θ × Σ}. Our next theorem relates partial sufficiency and sufficiency.

Theorem 3.2 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let the sub σ-algebra A₀ be H-sufficient for θ. Similarly, let F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B₀ be H-sufficient for θ. Then E is partially sufficient for F iff E₀ is sufficient for F₀.

Proof.
Suppose E is partially sufficient for F. Let δ be any decision rule in F₀. Then, since δ is also a decision rule in F, there exists a decision rule δ* in E such that

R_L(θ,σ,δ*) = ∫ R_L(θ,σ,δ) dμ_θ(σ)

for L ∈ L°_A. However, since B₀ is H-sufficient for θ in {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} and L ∈ L°_A, R_L(θ,σ,δ) is constant in σ, so that for all (θ,σ) ∈ Θ × Σ

R_L(θ,σ,δ) = ∫ R_L(θ,σ,δ) dμ_θ(σ),

and hence we have

R_L(θ,σ,δ*) = R_L(θ,σ,δ), (θ,σ) ∈ Θ × Σ. (3.2)

By H-sufficiency of A₀, we have a decision rule δ₀* in E₀ such that

R_L(θ,σ,δ₀*) = ∫ R_L(θ,σ,δ*) dξ_θ(σ).

However, by Equation (3.2), R_L(θ,σ,δ*) is constant in σ, so that

R_L(θ,σ,δ₀*) = R_L(θ,σ,δ*) = R_L(θ,σ,δ).

Conversely, suppose E₀ is sufficient for F₀. Then

a) the Markov kernel M₁(x,A) = I_A(x) from E to E₀ satisfies, for all A ∈ A₀,

∫ M₁(x,A) dP_{θ,σ}(x) = P_{θ,σ}(A);

b) there exists a Markov kernel M₂ from E₀ to F₀ such that, for all B ∈ B₀,

∫ M₂(x,B) dP_{θ,σ}(x) = Q_{θ,σ}(B);

c) there exists a Markov kernel M₃ from F₀ to F such that, for all B ∈ B,

∫ M₃(y,B) dQ_{θ,σ}(y) = ∫ Q_{θ,σ}(B) dμ_θ(σ).

It is easily verified that the Markov kernel

M(x,B) = ∫∫ M₃(y,B) M₂(x₂,dy) M₁(x,dx₂)

satisfies

∫ M(x,B) dP_{θ,σ}(x) = ∫ Q_{θ,σ}(B) dμ_θ(σ). (3.3) □

In the presence of partially sufficient σ-algebras, the following theorem is of interest.

Theorem 3.3 Let E = {X, A, P_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let A₀ be H-sufficient for θ. Similarly, let F = {Y, B, Q_{θ,σ} : (θ,σ) ∈ Θ × Σ} be an experiment and let B₀ be H-sufficient for θ. Then the following are equivalent:

i) given any decision rule δ in F, there exists δ* in E such that for all L ∈ L°_A,

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ);

ii) E₀ is sufficient for F₀.

Proof. i) implies ii). Let δ be any decision rule in F₀. For L ∈ L°_A we have, for all (θ,σ) ∈ Θ × Σ,

R_L(θ,σ,δ) = sup_{σ∈Σ} R_L(θ,σ,δ). (3.4)

Since δ is also a decision rule in F, we have by i) a decision rule δ* in E such that

R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ).
Since A₀ is H-sufficient for θ, there exists δ₀* in E₀ such that

R_L(θ,σ,δ₀*) = ∫ R_L(θ,σ,δ*) dξ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ).

So E₀ is sufficient for F₀.

ii) implies i). Let δ be any decision rule in F. Then there exist δ₁ in F₀ and δ* in E₀ such that

R_L(θ,σ,δ₁) = ∫ R_L(θ,σ,δ) dμ_θ(σ) ≤ sup_{σ∈Σ} R_L(θ,σ,δ)

and

R_L(θ,σ,δ*) ≤ R_L(θ,σ,δ₁),

so that

sup_{σ∈Σ} R_L(θ,σ,δ*) ≤ sup_{σ∈Σ} R_L(θ,σ,δ).

Since δ* is in E₀ and hence in E, this establishes i). □

Chapter 4
Examples and Applications

We give a few examples to illustrate the notion of partial sufficiency, and then we show applications of the theorems in Chapter 3. Some of these examples already appear in Hájek (1965); others are new.

4.1 Examples

In this section, examples of partial sufficiency will be given in terms of a statistic T instead of the sub σ-algebra B induced by T. Before giving the examples, one may note that for T to be an H-sufficient or partially sufficient statistic for θ, it is necessary that T be θ-oriented, or equivalently that we have a factorization of the form

p(x|θ,σ) = g(T|θ) f(x|T,θ,σ), (4.1)

where p(x|θ,σ) is a density function of P_{θ,σ}. It is also necessary that there exist ξ_θ, a probability measure on Σ, such that the "mixed" density function

p_ξ(x|θ) = ∫ p(x|θ,σ) dξ_θ(σ)

can be factored as

p_ξ(x|θ) = G(T,θ) F(x). (4.2)

We now look at examples of partial sufficiency.

Example 4.1 (Hájek (1965)) Consider a sample X = (X₁, X₂, ..., Xₙ) of size n, each i.i.d. from N(μ,σ²). The statistic T(X) = s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n−1) is partially sufficient for σ² if σ² ∈ (0,K) for K finite. To see this, we first factor the density function of X as follows:

p(x|μ,σ) = A(σ) exp[−(n−1)s²/(2σ²)] exp[−n(x̄−μ)²/(2σ²)],

where A(σ) = (√(2π) σ)⁻ⁿ. Choose ξ_σ to be a normal distribution with mean 0 and variance (K − σ²)/n. The "mixed" density function is

p_ξ(x|σ) = ∫ p(x|μ,σ) dξ_σ(μ)
        = A(σ) exp[−(n−1)s²/(2σ²)] ∫ exp[−n(x̄−μ)²/(2σ²)] dξ_σ(μ)
        = A(σ) exp[−(n−1)s²/(2σ²)] B(x̄) C(σ), (4.3)

where

B(x̄) = exp[−n x̄²/(2K)], C(σ) = σ/√K.

Since B(x̄) is free of σ, this is a factorization of the form (4.2) with G(s²,σ) = A(σ) exp[−(n−1)s²/(2σ²)] C(σ).

Remark: If we choose ξ_σ to be the uniform distribution (Lebesgue measure) over the whole real line, the proof of Theorem 2.2 breaks down, since the left-hand side of Equation (2.1) is equal to infinity but not the right-hand side. It seems unlikely that s² is partially sufficient for σ² if σ² ∈ (0,∞); however, we do not have a proof.

Example 4.2 (Neyman and Scott) Consider data consisting of 2n observations X₁, X₁′, X₂, X₂′, ..., Xₙ, Xₙ′. Let Xᵢ and Xᵢ′ be independent normal random variables with mean μᵢ (i = 1, 2, ..., n) and variance σ². The parameter of interest is σ²; the nuisance parameter is the vector μ = (μ₁, μ₂, ..., μₙ). Take s² = Σᵢ₌₁ⁿ (Xᵢ − X̄ᵢ)², X̄ᵢ = (Xᵢ + Xᵢ′)/2, and with A(σ) = (√(2π) σ)⁻²ⁿ we have

p(x|μ,σ²) = A(σ) exp[−s²/σ²] exp[−Σᵢ (X̄ᵢ − μᵢ)²/σ²].

The statistic s² is clearly σ²-oriented and is partially sufficient if we take ξ_σ(μ₁, μ₂, ..., μₙ) = Πᵢ₌₁ⁿ φ_σ(μᵢ), where φ_σ(μᵢ) is a normal density with mean 0 and variance (K − σ²)/2, and we assume σ² ∈ (0,K), K finite.

Example 4.3 Let X = (X₁, X₂, ..., X_s) have a multinomial distribution with parameters n, p₁, p₂, ..., p_s. The distribution of X is given by

P(X₁ = n₁, X₂ = n₂, ..., X_s = n_s) = [n!/(n₁! n₂! ··· n_s!)] p₁^{n₁} p₂^{n₂} ··· p_s^{n_s},

where Σᵢ₌₁ˢ pᵢ = 1 and Σᵢ₌₁ˢ nᵢ = n. The statistic T(X) = (T₁(X₁), T₂(X₂), ..., T_s(X_s)) is sufficient for (p₁, p₂, ..., p_s). Also, the statistic T₁(X₁) = n₁ is partially sufficient for p₁, since the marginal distribution of T₁(X₁) is Binomial(n, p₁), which is free of p₂, p₃, ..., p_s; hence it is p₁-oriented. The factorization equation (4.2) holds if we take ξ_{p₁} to be a point mass at

(p₂, p₃, ..., p_s) = ((1−p₁)/(s−1), (1−p₁)/(s−1), ..., (1−p₁)/(s−1)).

So we have

P(T₁(X₁) = n₁, X₂ = n₂, ..., X_s = n_s) = [n!/(n₁! ··· n_s!)] p₁^{n₁} [(1−p₁)/(s−1)]^{n−n₁}.

The argument above works for any pᵢ, i = 1, 2, ..., s; therefore, Tᵢ(Xᵢ) is partially sufficient for pᵢ, i = 1, 2, ..., s.
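The p₁-orientedness claimed in Example 4.3 (the marginal law of n₁ is Binomial(n, p₁) no matter how the remaining mass is split among p₂, ..., p_s) can be checked exactly for small n and s. A sketch of ours with illustrative numbers (s = 3, n = 4):

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """P(X1 = counts[0], ..., Xs = counts[-1]) for a multinomial sample."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    out = float(coef)
    for c, p in zip(counts, probs):
        out *= p ** c
    return out

n = 4
# Two parameter vectors with the same p1 = 0.2 but different splits of the rest.
for probs in ([0.2, 0.5, 0.3], [0.2, 0.1, 0.7]):
    for k in range(n + 1):
        # Marginal P(n1 = k): sum over all completions (n2, n3) with n2+n3 = n-k.
        marg = sum(multinomial_pmf((k, n2, n - k - n2), probs)
                   for n2 in range(n - k + 1))
        # Binomial(n, p1) probability of k successes.
        binom = comb(n, k) * probs[0] ** k * (1 - probs[0]) ** (n - k)
        assert abs(marg - binom) < 1e-12
```

Both parameter vectors produce the identical Binomial(4, 0.2) marginal, which is exactly the p₁-oriented property used in the example.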
Example 4.4 A linear model is represented by Y = X₁τ + X₂β + e, E(e) = 0, Var(e) = σ²Iₙ, where Y is an n × 1 random vector, X₁ is an n × k matrix with rank k, X₂ is an n × p matrix with rank p, and Iₙ is the n × n identity matrix. τ is a k × 1 vector of parameters of interest, β is a p × 1 vector of nuisance parameters, and e is an n × 1 random vector of errors from a normal distribution with mean 0 and covariance matrix σ²Iₙ. The density function of Y with respect to Lebesgue measure is

p(y|τ,β) = (√(2π) σ)⁻ⁿ exp[−(1/(2σ²)) ‖y − X₁τ − X₂β‖²].

Let M(Xᵢ) denote the space spanned by the columns of Xᵢ, and M(Xᵢ)⊥ = {x : x′y = 0 for y ∈ M(Xᵢ)}, for i = 1,2. Let P denote the orthogonal projection onto M(X₂)⊥, so Q = I − P is the orthogonal projection onto M(X₂), with y = Py + Qy and

‖y − X₁τ − X₂β‖² = ‖(Py − X₁τ) + (Qy − X₂β)‖² = ‖Py − X₁τ‖² + ‖Qy − X₂β‖².

Hence,

p(y|τ,β) = (√(2π) σ)⁻ⁿ exp{−(1/(2σ²)) [‖Py − X₁τ‖² + ‖Qy − X₂β‖²]}.

Py is τ-oriented since E(Py) = PX₁τ and Var(Py) = σ²P. Choosing ξ_τ(β) to be a point mass at β = 0, the statistic Py is partially sufficient for τ.

Example 4.5 (Maximal Invariance) Consider a sample X = (X₁, X₂, ..., Xₙ) of size n, each i.i.d. from N(μ,σ²). The transformation g_a(X₁, X₂, ..., Xₙ) = (X₁ + a, X₂ + a, ..., Xₙ + a) of the group G = {g_a : a ∈ ℝ} maps the sample space ℝⁿ onto itself. It is associated with the group Ḡ = {ḡ_a : a ∈ ℝ} of transformations ḡ_a(μ,σ) = (μ + a, σ) of the parameter space onto itself. The group Ḡ leaves σ invariant. The maximal invariant for the problem of estimating σ with respect to the group G is the difference statistic D = (X₂ − X₁, X₃ − X₁, ..., Xₙ − X₁). The statistic s, as a function of D, is invariant and partially sufficient for the problem of estimating σ in the sense that i) s is σ-oriented, and ii) the statistic s is sufficient for σ.

Example 4.6 (Invariance) Let X be a random vector taking values in ℝ². Let X ~ N(μ, I), μ ∈ ℝ².
To know the parameter μ ∈ ℝ², we need to know the norm of μ and the angle from a fixed ray; say μ₀ = (‖μ‖/√2, ‖μ‖/√2), represented by Γ, an orthogonal matrix such that Γμ₀ = μ. If we are only interested in estimating ‖μ‖, we may treat Γ as a nuisance parameter. In terms of invariance, let the group G = O₂ be the group of all orthogonal transformations on ℝ², with distributions P_Γ = N(Γμ₀, I). The statistic ‖X‖ is sufficient for ‖μ‖; it is also G-invariant, since ‖ΓX‖ = ‖X‖. Clearly there exists an invariant probability measure on O₂, since O₂ is compact. By Theorem 2.4, ‖X‖ is partially sufficient for ‖μ‖.

4.2 Application of Comparison of Experiments in the Presence of Nuisance Parameters

4.2.1 Comparison of Normal Experiments with Unknown Mean and Unknown Variance

Let Eᵢ be a normal experiment with unknown mean μ and unknown variance σ^{2kᵢ}, where kᵢ (> 0) is a known constant, i = 1,2. Suppose that we are only interested in making inferences on the parameter σ, regardless of the value of μ; that is, μ is the nuisance parameter and σ is the parameter of our interest. We want to determine for which values of k₁ and k₂ the experiment E₁ is more informative than E₂ for σ. The next theorem answers this question.

Theorem 4.1 Let Eᵢ be a normal experiment with unknown mean μ and unknown variance σ^{2kᵢ}, where kᵢ > 0 and i = 1,2. If k₁ > k₂ and σ ∈ (0,K], K < ∞, then E₁ is more informative than E₂ for σ.

Proof. Let sᵢ² be the sample variance obtained from nᵢ observations from Eᵢ, i = 1,2. By Example 4.1, sᵢ² is a partially sufficient statistic for σ^{2kᵢ}, i = 1,2. Let E₁⁰ and E₂⁰ be the two experiments derived from the partially sufficient statistics s₁² and s₂², respectively. Then, by Theorem 3 of Goel and DeGroot (1979), E₁⁰ is sufficient for E₂⁰ if k₁ > k₂ > 0. So by Theorem 3.1 and Theorem 3.2, we have that if k₁ > k₂ > 0 then E₁ is more informative than E₂ for σ iff E₁⁰ is sufficient for E₂⁰, and this concludes the proof. □
Remark: Goel and DeGroot (1979) proved the above theorem for the case of μ assumed known, with no restriction on σ. The condition that σ be bounded, which we required in proving the above theorem, is a legitimate assumption that can be imposed here.

4.2.2 Comparison of Linear Normal Experiments With A Known Nonsingular Covariance Matrix

Let Eᵢ = L(Xᵢτ + Zᵢβ, σ²Iₙᵢ) be a linear normal experiment represented by Y = Xᵢτ + Zᵢβ + eᵢ, where Y is an nᵢ × 1 random vector, Xᵢ is an nᵢ × k matrix with rank k, Zᵢ is an nᵢ × p matrix with rank p, and Iₙᵢ is the nᵢ × nᵢ identity matrix. τ is a k × 1 vector of parameters of interest, β is a p × 1 vector of nuisance parameters, and eᵢ is an nᵢ × 1 random vector of errors from a normal distribution with mean 0 and covariance matrix σ²Iₙᵢ, i = 1,2. By Example 4.4, Pᵢyᵢ is a partially sufficient statistic for τ, i = 1,2. Let Eᵢ⁰ = L(PᵢXᵢτ, σ²Pᵢ) be the linear experiment based on the partially sufficient statistic Pᵢyᵢ, where Pᵢ is the orthogonal projection matrix onto M(Zᵢ)⊥. Note that Pᵢ may be represented in the form Pᵢ = Iₙᵢ − Zᵢ(Zᵢ′Zᵢ)⁻¹Zᵢ′, i = 1,2.

Theorem 4.2 Under the above setup, E₁ is more informative than E₂ for τ iff X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite.

Proof. i) Suppose E₁ is more informative than E₂ for τ. By Theorem 3.1 and Theorem 3.2, E₁⁰ is sufficient for E₂⁰. It follows from the Rao-Blackwell theorem that Var(c′τ̂₁) ≤ Var(c′τ̂₂) for c ∈ M(X₂′P₂′) ⊂ M(X₁′P₁′), where c′τ̂ᵢ is the UMVUE of c′τ. Since the UMVUE and the BLUE coincide,

Var(c′τ̂ᵢ) = c′(Xᵢ′Pᵢ′PᵢPᵢXᵢ)⁻c = c′(Xᵢ′PᵢXᵢ)⁻c.

By Lemma 2 of Stepniak, Wang and Wu (1984), X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite.

ii) Suppose X₁′P₁X₁ − X₂′P₂X₂ is nonnegative definite. Let G = X₁′P₁X₁ − X₂′P₂X₂. Denote by yᵢ⁰ a random vector representing Eᵢ⁰, i = 1,2. Let E* be a "fictitious experiment" such that X*′X* = G, where X* is the design matrix of E*. Let y* be a random vector representing E*, and suppose y* and y₂⁰ are independent. Then it follows that (P₁X₁)′y₁⁰ is sufficient for τ, and (P₂X₂)′y₂⁰ + X*′y* is sufficient for τ under the combination of the experiments E₂⁰ and E*. But (P₁X₁)′y₁⁰ has the same distribution as (P₂X₂)′y₂⁰ + X*′y*. Hence E₁⁰ is sufficient for E₂⁰. By Theorem 3.1 and Theorem 3.2, E₁ is more informative than E₂ for τ. □

Bibliography

1. Billingsley, P. (1979), Probability and Measure, 2nd ed. John Wiley, New York.

2. Blackwell, D. (1951), Comparison of Experiments. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1:93-102.

3. Feldman, D. and Ramamoorthi, R. V. (1984), A Decision Theoretic Proof of Blackwell's Theorem. Technical Report, Department of Statistics and Probability, Michigan State University.

4. Fraser, D. A. S. (1956), Sufficient Statistics With Nuisance Parameters. Annals of Mathematical Statistics, 27:838-842.

5. Goel, P. K. and DeGroot, M. H. (1979), Comparison of Experiments and Information Measures. Annals of Statistics, 7:1066-1077.

6. Hájek, J. (1965), On Basic Concepts of Statistics. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:139-162.

7. Kolmogorov, A. N. (1942), Sur l'estimation statistique des paramètres de la loi de Gauss. Izv. Akad. Nauk SSSR Ser. Mat., 6:3-32.

8. Neveu, J. (1965), Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.

9. Neyman, J. and Pearson, E. S. (1936), Sufficient Statistics and Uniformly Most Powerful Tests of Statistical Hypotheses. Statistical Research Memoirs of the University of London, 1:113-137.

10. Neyman, J. and Scott, E. L. (1948), Consistent Estimates Based on Partially Consistent Observations. Econometrica, 16:1-32.

11. Stepniak, C., Wang, S. and Wu, C. F. (1984), Comparison of Linear Experiments With Known Covariances. Annals of Statistics, 12:358-365.