POSTERIOR CONSISTENCY IN SOME BAYESIAN NONPARAMETRIC PROBLEMS

By

Srikanth K. Rajagopalan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1997

ABSTRACT

POSTERIOR CONSISTENCY IN SOME BAYESIAN NONPARAMETRIC PROBLEMS

By

Srikanth K. Rajagopalan

Issues regarding posterior consistency in Bayesian inference are of interest to frequentists as well as Bayesians. In this dissertation we study different notions of posterior consistency in some Bayesian nonparametric problems, using Dirichlet process and Polya tree process priors. The first part of the dissertation deals with the construction of priors (that yield consistent posteriors) for the class of all distributions symmetric about a point. We consider two natural methods of constructing priors for symmetric distributions, and study the priors obtained by the two methods using Dirichlet processes and Polya tree processes. The second part deals with the Bayesian analysis of right censored data under a nonparametric formulation. We study different Bayesian approaches to this problem with emphasis on the approaches of Susarla and Van Ryzin (1976) and Tsai (1986), who both use Dirichlet process priors. We establish posterior consistency for both approaches and also generalize some of the results to include Polya tree priors as well. The Bayesian analysis of interval censored data (again under a nonparametric formulation) is studied in the last part of the dissertation. This portion is rather tentative and we mainly highlight the difficulties in trying to adapt the approaches of Susarla and Van Ryzin (1976) and Tsai (1986) to this problem.

To Amma and Appa; Professor A. M. Goon; Arupda

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Professor R. V. Ramamoorthi, for his constant help, advice, encouragement, guidance, mentorship and extreme patience. His caring personality, friendly nature, and excellent sense of humour and wit made the whole doctoral experience enjoyable even during trying circumstances. I would also like to thank Professors James Hannan, Joseph Gardiner, V. Mandrekar and Habib Salehi for serving on my guidance committee, Professors Hannan and Gardiner for their encouragement, suggestions and many helpful conversations, and Professor V. Mandrekar for useful suggestions on improving the presentation. I would also like to thank Professor J. K. Ghosh for many informal discussions and suggestions. I cannot thank my parents and sisters enough for the support and encouragement provided by them during my entire student life. This has been the main motivating force behind all my endeavours and will always be fondly remembered and cherished. I would also like to thank Professor A. M. Goon for the care and interest he showed in my progress as a student during my undergraduate days, and for the encouragement to pursue graduate studies in Statistics. The help and guidance received from Dr. Arup Kr. Pal during my student days at the Indian Statistical Institute was instrumental in igniting my interest in Probability theory and eventually Mathematical Statistics.
Last, but not the least, I am thankful to my Beloved Lord, Bhagavan Sri Sathya Sai Baba, for all His Love and Grace, without which this would not have been possible.

[A major portion of this research was supported by the National Institutes of Health Grant 1 R01 GM49374.]

TABLE OF CONTENTS

0 An overview
1 Preliminaries
1.1 General Bayesian inference and posterior consistency
1.2 Probability measures on probability measures
1.3 Topologies on the space of probability measures
1.4 Convergence of probability measures and posterior consistency
1.5 Dirichlet processes
1.6 Polya tree processes
2 Polya Tree Priors for Symmetric Distributions
2.1 Introduction and Summary
2.2 Symmetrization using Polya tree processes
2.3 The posterior distribution and its consistency
3 Nonparametric Bayesian inference with right censored observations
3.1 Introduction and summary
3.2 Dirichlet process priors for F
3.3 Priors on the distribution of the observables
4 Nonparametric Bayesian inference with interval censored observations
4.1 Introduction and summary
4.2 Dirichlet process priors for F
4.3 Priors on the distribution of the observables
Bibliography

CHAPTER 0

An overview

In any statistical experiment, data is collected following a probability model with an unknown parameter $\theta$ lying in a parameter space $\Theta$. The problem of statistical inference deals with drawing meaningful conclusions about $\theta$, given the data. A Bayesian would use a prior probability measure on $\Theta$, representing her/his prior belief or opinion. Given the data, the posterior represents the updated belief/opinion for the Bayesian. Since all Bayes procedures are based on the posterior, it is quite natural to require that as more and more data become available, the posterior should concentrate more and more around the true parameter. This idea is formalized as the notion of posterior consistency, which has both Bayesian and frequentist interpretations. Priors that yield consistent posteriors ensure that the data eventually swamps the prior, and opinions based on very different priors will merge as the data accumulates. Doob (1948) proved a very general result on consistency, which guarantees that the posterior will be consistent for all $\theta$ except on a set of prior measure zero. When $\Theta$ is finite dimensional, Freedman (1963) and Schwartz (1965) show that under fairly general conditions the posterior is consistent at all $\theta$. Freedman (1963) also constructs an example which shows that posterior consistency will not always hold when $\Theta$ is the set of all probability measures on the space of positive integers.

Problems of statistical inference with an infinite dimensional parameter space are of great importance, both theoretically and practically. The Bayesian approach to such nonparametric problems requires the study of (prior and posterior) probability measures on the space of all probability distributions over a set. Freedman's (1963) example shows that posterior consistency may not always hold when $\Theta$ is infinite dimensional. Diaconis and Freedman (1986a) and the ensuing discussions highlight the need for a careful study of posterior consistency in nonparametric and semiparametric problems.
In this dissertation we focus on issues concerning different notions of posterior consistency in some nonparametric problems within a Bayesian formulation. Some of the problems that we study are made more complicated by the fact that we only have censored data. In Chapter 1, we begin with an introduction to general Bayesian inference and different notions of consistency. We then review and discuss some of the properties of two important families of priors used in Bayesian nonparametrics, namely the Dirichlet processes [Ferguson (1973)] and their generalization, the Polya tree processes [Mauldin et al. (1992), Lavine (1992, 1994)]. We also prove a convergence result for Dirichlet processes that enables us to establish a strong form of consistency for the posterior of a Dirichlet process.

Chapter 2 studies the problem of constructing a family of priors for problems where the parameter set is the space of all distributions symmetric about an arbitrary point on the real line, which we denote by $M_S(\mathbb{R})$. This problem has been studied by Dalal (1979), who constructs a class of priors using Dirichlet process priors, which has been used in the context of the location problem by Diaconis and Freedman (1986). We consider two natural methods of constructing a prior on $M_S(\mathbb{R})$, and study the behaviour of the posterior under the two methods, using both Dirichlet processes and Polya tree processes. We show that using appropriate Dirichlet processes the two methods yield the same prior on $M_S(\mathbb{R})$, while using appropriate Polya tree processes yields different priors on $M_S(\mathbb{R})$, unless the Polya tree processes being considered are Dirichlet processes. We also establish the posterior consistency for both the approaches.

In Chapter 3, we consider two different approaches to Bayesian inference with right censored data. Susarla and Van Ryzin (1976) first considered this problem in a Bayesian set-up by considering a Dirichlet process prior for $F$, the distribution function of interest. They obtain a Bayes estimate and show that this estimate converges to the usual product limit estimate of Kaplan and Meier (1958). Blum and Susarla (1977) complemented this result by proving that the posterior distribution given the right censored data is a mixture of Dirichlet processes. We show that the posterior can be represented as a Polya tree process, a representation which clarifies some of the calculations in Susarla and Van Ryzin (1976). Using this Polya tree representation for the posterior, we are then able to establish the posterior consistency for this approach. Yet another approach to Bayesian inference with right censored data is to consider priors for the observable random variables, as studied by Tsai (1986), who considers a Dirichlet process prior for the distribution of the observable random variables. Under this approach, using a result from Peterson (1977), we are able to establish consistency of the posterior for a wide class of priors.

Chapter 4 is somewhat tentative. Here we consider the Bayesian analysis of the interval censoring problem with a single inspection time. We began this study with the goal of obtaining a Bayesian interpretation of the well known Turnbull estimator (1976), which can also be thought of as the nonparametric maximum likelihood estimator (NPMLE). Similar to Chapter 3, here also we look at two different approaches.
We highlight the fact that approaches similar to the ones that yield interesting results in the right censoring problem do not yield interesting results in this case. In the first approach we consider a Dirichlet process prior for $F$, the distribution of interest, and study the limiting behaviour of the Bayes estimate. As pointed out in Wang (1993), the NPMLE is not necessarily the limit of the Bayes estimates. We present a set of examples which show that no obvious relationship connects the limiting Bayes estimate and the NPMLE. We also make an attempt to study consistency properties of the posterior when we consider priors for the distribution of the observable random variables. Unfortunately, the result that we have in this context, though mathematically nice, is not statistically very useful.

CHAPTER 1

Preliminaries

1.1 General Bayesian inference and posterior consistency

Consider a family of probability measures $\{Q_\theta : \theta \in \Theta\}$ on a measurable space $(\mathcal{X}, \mathcal{A})$. We view $(\Theta, \mathcal{B})$ as a measurable space such that $Q_\theta(A)$ is $\mathcal{B}$-measurable for every $A \in \mathcal{A}$. We write $Q_\theta^\infty$ for the product measure on $\mathcal{X}^\infty$ which makes the coordinate random variables $X_1, X_2, \dots$ independent with common distribution $Q_\theta$. In general, $\mathcal{X}$ and $\Theta$ are Borel subsets of complete separable metric spaces. (In this dissertation $\mathcal{X}$ will either be the real line or the positive half line, and $\Theta$ will be the set of all probability measures thereon.) Let $\mu$ be a prior probability measure on $\Theta$, and let $P_\mu$ denote the joint distribution of the parameter and the data:

$$P_\mu(B \times A) = \int_B Q_\theta^\infty(A)\,\mu(d\theta) \quad \text{for } B \in \mathcal{B} \text{ and } A \in \mathcal{A}^\infty.$$

The posterior is the $P_\mu$-distribution of the parameter $\theta$ given the data $X_1, X_2, \dots, X_n$, and is formally defined below. We denote this by $\mu_n(\cdot \mid X_1, X_2, \dots, X_n)$.

Definition 1.1.1 $\mu_n(\cdot \mid \cdot) : \mathcal{B} \times \mathcal{X}^n \to [0,1]$ is called a posterior distribution given $X_1, X_2, \dots, X_n$ if:

1. For each $(X_1, X_2, \dots, X_n) \in \mathcal{X}^n$, $\mu_n(\cdot \mid X_1, X_2, \dots, X_n)$ is a probability measure on $(\Theta, \mathcal{B})$.

2. For each $B \in \mathcal{B}$, $\mu_n(B \mid \cdot)$ is $\mathcal{A}^n$-measurable.

3. For every $B \in \mathcal{B}$ and $A \in \mathcal{A}^n$,

$$P_\mu^n(B \times A) = \int_A \mu_n(B \mid X_1, X_2, \dots, X_n)\,dP^n(X_1, X_2, \dots, X_n),$$

where $P_\mu^n(B \times A) = P_\mu(B \times (A \times \mathcal{X}^\infty))$ and $P^n(A) = P_\mu(\Theta \times A)$.

The posterior distribution is of course unique only up to $P^n$ null sets. In the situations we consider, there is a natural candidate for the posterior and we will generally refer to it as 'the posterior'. For the Bayesian, the posterior distribution encapsulates all that is known about $\theta$ following the observation of the data $X_1, X_2, \dots, X_n$, and one would want the posterior to concentrate around the true value of the parameter as more and more data become available. The main topic of study in this dissertation is the consistency property of the posterior sequence $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ in certain Bayesian nonparametric settings. The sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is said to be consistent at $\theta_0 \in \Theta$ if, whenever $\theta_0$ is the true value of the parameter $\theta$, as observations accumulate the effect of the prior diminishes and the posterior gets closer and closer to the 'true' prior $\delta_{\theta_0}$, the degenerate prior at $\theta_0$. (More formal definitions of posterior consistency will be mentioned later.) Posterior consistency has both Bayesian and frequentist interpretations, and for a detailed discussion of this notion of consistency, especially in a nonparametric set-up, the interested reader is referred to Diaconis and Freedman (1986a) [pages 3, 4, 10-20].
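Although Definition 1.1.1 makes no dominatedness assumption, it may help orientation to record the familiar explicit version available in the dominated case. The display below is a standard fact rather than anything specific to this dissertation; it assumes (an assumption not made elsewhere in this chapter) that each $Q_\theta$ has a density $q_\theta$ with respect to a fixed $\sigma$-finite measure $\lambda$:

$$\mu_n(B \mid X_1, \dots, X_n) = \frac{\int_B \prod_{i=1}^n q_\theta(X_i)\,\mu(d\theta)}{\int_\Theta \prod_{i=1}^n q_\theta(X_i)\,\mu(d\theta)}, \qquad B \in \mathcal{B}.$$

In the nonparametric settings studied here no such dominating $\lambda$ exists in general, which is why the abstract definition above is needed.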
1.2 Probability measures on probability measures

Throughout this dissertation $\mathbb{R}$ will denote the real line, $\mathcal{B}(\mathbb{R})$ will denote the Borel $\sigma$-algebra of $\mathbb{R}$, and $M(\mathbb{R})$ will denote the space of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Also, $\mathbb{R}^+$ will denote the positive half line, with $\mathcal{B}(\mathbb{R}^+)$ and $M(\mathbb{R}^+)$ having an analogous interpretation. On $M(\mathbb{R})$, we consider the smallest $\sigma$-algebra that makes the map $P \mapsto P(B)$ measurable for each Borel set $B \in \mathcal{B}(\mathbb{R})$. We denote this $\sigma$-algebra by $\mathcal{B}_M$, i.e. $\mathcal{B}_M = \sigma\{P(B) : B \in \mathcal{B}(\mathbb{R})\}$.

Since the elements of $M(\mathbb{R})$ are functions on $\mathcal{B}(\mathbb{R})$ taking values in $[0,1]$, $M(\mathbb{R})$ can be viewed as a subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$. If the product space $[0,1]^{\mathcal{B}(\mathbb{R})}$ is equipped with the product $\sigma$-algebra (the smallest $\sigma$-algebra that makes all the coordinate functions measurable), the restriction of this $\sigma$-algebra to $M(\mathbb{R})$ is $\mathcal{B}_M$. However, $M(\mathbb{R})$ is not a measurable subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$. Therefore one needs to be careful in constructing probability measures on $M(\mathbb{R})$. The following two theorems, implicit in Ferguson (1973) and mentioned in Ghosh and Ramamoorthi (1996-97), give a way of constructing and defining probability measures on $M(\mathbb{R})$.

Theorem 1.2.1 Suppose for each collection $\{B_1, B_2, \dots, B_k\}$ of subsets of $\mathbb{R}$, a distribution $\mu_{B_1,\dots,B_k}$ is assigned for $(P(B_1), \dots, P(B_k))$ such that:

1. If $\{A_1, A_2, \dots, A_l\} \subset \{B_1, B_2, \dots, B_k\}$, then the marginal distribution of $(P(A_1), \dots, P(A_l))$ derived from $\mu_{B_1,\dots,B_k}$ is $\mu_{A_1,\dots,A_l}$.

2. For every partition $\{B_1, B_2, \dots, B_k\}$ of $\mathbb{R}$, $\mu_{B_1,\dots,B_k}$ is a probability measure on $S_k = \{(p_1, \dots, p_k) : p_i \ge 0, \sum p_i = 1\}$, and further, if each $A_i$ is a union of sets from $\{B_1, B_2, \dots, B_k\}$, then $\mu_{A_1,\dots,A_n}$ is the distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$.

3. If $A_n \downarrow \emptyset$, then $P(A_n) \downarrow 0$ in distribution.

Then there exists a probability measure $\mu$ on $M(\mathbb{R})$ such that the distribution of $(P(B_1), \dots, P(B_k))$ under $\mu$ is $\mu_{B_1,\dots,B_k}$.

[The proof is taken from Ghosh and Ramamoorthi (1996-97), and is mentioned here for the sake of completeness.]

Proof: Using 1 and 2, it follows from Kolmogorov's consistency theorem that there exists a probability measure on $[0,1]^{\mathcal{B}(\mathbb{R})}$ with finite dimensional marginals given by $\mu_{B_1,B_2,\dots,B_k}$. Since $M(\mathbb{R})$ is not a measurable subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$, it is not easy to show that this measure is supported by $M(\mathbb{R})$. So we take an indirect route. Let $\mathcal{F}$ be the set of all distribution functions on $\mathbb{R}$, and let $\mathcal{F}^*$ be the restriction of functions in $\mathcal{F}$ to a countable dense set $Q$, say the rationals. Then

$$\mathcal{F} = \{F : F \text{ is monotone, right continuous, } \lim_{t \to -\infty} F(t) = 0, \ \lim_{t \to \infty} F(t) = 1\}$$

and

$$\mathcal{F}^* = \{F : F \text{ is monotone, right continuous on } Q, \ \lim_{t \to -\infty} F(t) = 0, \ \lim_{t \to \infty} F(t) = 1\}.$$

Take any $t_1 < t_2 < \cdots < t_k$ in $Q$. Set the distribution of $(F(t_1), F(t_2), \dots, F(t_k))$ as the distribution of $(P(-\infty, t_1], P(-\infty, t_2], \dots, P(-\infty, t_k])$. This assignment gives a consistent specification and hence there exists a probability measure $\mu$ on $[0,1]^Q$ with these marginals. We now argue that $\mu(\mathcal{F}^*) = 1$. It is easy to see that for any fixed $t_1 < t_2$, $F(t_1) \le F(t_2)$ with $\mu$ probability 1. Since $Q$ is countable, $\mu\{F : F \text{ is monotone on } Q\} = 1$. Condition 3 gives that $F$ is right continuous on $Q$ with probability 1, so that $\mu(\mathcal{F}^*) = 1$. Let the map $\phi : \mathcal{F} \to \mathcal{F}^*$ be the restriction of $F \in \mathcal{F}$ to $Q$. Since this map is 1-1, onto and measurable, the probability on $\mathcal{F}^*$ can be transferred to a probability measure on $\mathcal{F}$. Under this measure, $(P(B_1), P(B_2), \dots, P(B_k))$ has the marginal distribution $\mu_{B_1,B_2,\dots,B_k}$ whenever $B_i$ is of the form $(-\infty, t_i]$, $t_i \in Q$.
A standard induction argument shows that the statement holds for all Borel sets. $\diamond$

Theorem 1.2.2 stated below shows that it is enough to specify $\mu_{B_1,\dots,B_k}$ for every partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$.

Theorem 1.2.2 Suppose the following two conditions hold:

1. For every finite partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$, $(P(B_1), \dots, P(B_k))$ has a distribution $\mu_{B_1,\dots,B_k}$ on $S_k$.

2. If $B_1, B_2, \dots, B_k$ and $A_1, A_2, \dots, A_n$ are two partitions of $\mathbb{R}$ such that each $A_i$ is a union of some $B_j$'s, then $\mu_{A_1,\dots,A_n}$ is the same as the $\mu_{B_1,\dots,B_k}$ distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$.

For any collection $A_1, A_2, \dots, A_n$ of subsets of $\mathbb{R}$, take any partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$ such that each $A_i$ is a union of some $B_j$'s, and define $\mu_{A_1,\dots,A_n}$ as the $\mu_{B_1,\dots,B_k}$ distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$. Then $\{\mu_{A_1,\dots,A_n} : A_i \in \mathcal{B}(\mathbb{R}), i = 1, 2, \dots, n; n = 1, 2, \dots\}$ satisfies condition 1 of Theorem 1.2.1.

Remarks: We will see later how Theorems 1.2.1 and 1.2.2 are used to define the most commonly mentioned prior in Bayesian nonparametrics, called the Dirichlet process. Another way of defining a probability measure on $M(\mathbb{R})$ is via probability measures on the space of all probability measures on sequences, and the Polya tree process discussed later is an example of one such prior.

1.3 Topologies on the space of probability measures

A major focus of this dissertation is on issues related to posterior consistency in nonparametric problems. Thus the parameter space is $M(\mathbb{R})$, and the sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is a sequence of probability measures on $M(\mathbb{R})$. Since the notion of consistency involves convergence of probability measures on $M(\mathbb{R})$, we next look at some of the commonly considered modes of convergence on $M(\mathbb{R})$, and later present the corresponding notions of convergence on the space of probability measures on $M(\mathbb{R})$.

Weak Topology: The first notion we look at is the weak topology arising from the usual weak convergence on $M(\mathbb{R})$. We recall that weak convergence on $M(\mathbb{R})$ is defined as follows: let $\{P_n\}_{n \ge 1}, P \subset M(\mathbb{R})$; $P_n$ is said to converge weakly (or in the weak topology) to $P$ if $\int f\,dP_n \to \int f\,dP$ for all bounded continuous functions $f$ on $\mathbb{R}$. For any $P_0 \in M(\mathbb{R})$, sets of the form

$$U_{P_0} = \Big\{P : \Big|\int f_i\,dP - \int f_i\,dP_0\Big| < \epsilon_i;\ i = 1, \dots, k\Big\},$$

where each $f_i$ is a bounded continuous function on $\mathbb{R}$, constitute a base of open neighbourhoods for $P_0$ under the weak topology. It is well known that under this topology $M(\mathbb{R})$ is a complete separable metric space, with $\mathcal{B}_M$ as its Borel $\sigma$-algebra [Parthasarathy (1967), Chapter II, Section 6].

Kolmogorov Metric: The Kolmogorov metric on $M(\mathbb{R})$ is defined as follows:

$$d_k(P, Q) = \sup_{t \in \mathbb{R}} |P(-\infty, t] - Q(-\infty, t]|.$$

Interest in this metric stems from the 1-1 correspondence between probability measures on $\mathbb{R}$ and cumulative distribution functions, and the Glivenko-Cantelli theorem on the convergence of empirical distribution functions. Under the metric $d_k$, $M(\mathbb{R})$ is neither separable nor complete.

Total Variation Metric: The total variation metric $d_t$ on $M(\mathbb{R})$ is defined as

$$d_t(P, Q) = \sup_{B \in \mathcal{B}(\mathbb{R})} |P(B) - Q(B)|.$$

This metric is uninteresting in the context of all of $M(\mathbb{R})$. However, when the parameter space is restricted to subsets of $M(\mathbb{R})$ of the form $L_1(\mu) = \{$all probability measures in $M(\mathbb{R})$ dominated by a $\sigma$-finite measure $\mu\}$, it is extremely useful and has a nice form. For $P, Q \in L_1(\mu)$,

$$d_t(P, Q) = \frac{1}{2}\int \Big|\frac{dP}{d\mu} - \frac{dQ}{d\mu}\Big|\,d\mu.$$

Further, $L_1(\mu)$ equipped with $d_t$ is a complete separable metric space.
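As a concrete illustration of the two metrics (a small sketch, not taken from the dissertation; the two distributions are arbitrary), the following computes $d_k$ and $d_t$ for a pair of discrete distributions on a common finite grid, where both suprema reduce to finite maxima:

```python
import numpy as np

# Two probability vectors on the common grid points 0, 1, 2, 3.
p = np.array([0.10, 0.40, 0.30, 0.20])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Kolmogorov distance: sup_t |P(-inf, t] - Q(-inf, t]|; on a finite
# grid the supremum is attained at one of the grid points.
d_k = np.max(np.abs(np.cumsum(p) - np.cumsum(q)))

# Total variation distance: sup_B |P(B) - Q(B)|, which for discrete
# P and Q equals half the L1 distance between the mass functions.
d_t = 0.5 * np.sum(np.abs(p - q))

print(d_k, d_t)  # 0.15 and 0.2; d_k <= d_t always, since (-inf, t] is a Borel set
```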
1.4 Convergence of probability measures and posterior consistency

As noted earlier, $M(\mathbb{R})$ when equipped with the weak convergence metric becomes a complete separable metric space with $\mathcal{B}_M$ as its Borel $\sigma$-algebra. Thus a natural topology on the space of probability measures on $M(\mathbb{R})$ is the weak topology arising from this metric on $M(\mathbb{R})$. A formal definition is given below.

Definition 1.4.1 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge weakly to a probability measure $\mu$ (on $M(\mathbb{R})$) if

$$\int \phi(P)\,d\mu_n(P) \to \int \phi(P)\,d\mu(P)$$

for all bounded continuous functions $\phi$ on $M(\mathbb{R})$, and we write $\mu_n \to^w \mu$ or $\mu_n \Rightarrow \mu$.

Under this convergence, the space of probability measures on $M(\mathbb{R})$ also becomes a complete separable metric space [Parthasarathy (1967), Chapter II, Section 6]. A detailed study of weak convergence requires an understanding of the continuous (in the weak topology) functions on $M(\mathbb{R})$. But we will mainly be interested in the case when $\mu = \delta_{P_0}$ for some $P_0 \in M(\mathbb{R})$. Since convergence in distribution of $\mu_n$ to $\delta_{P_0}$ is equivalent to convergence in probability of $P_n$ to $P_0$, where $P_n \sim \mu_n$, this convergence can be described in terms of the continuous functions on $\mathbb{R}$ rather than those on $M(\mathbb{R})$, as mentioned in the following proposition.

Proposition 1.4.1 $\mu_n \to^w \delta_{P_0}$ if $\mu_n(U_{P_0}) \to 1$ for all $U_{P_0}$ of the form $U_{P_0} = \{P : |\int f_i\,dP - \int f_i\,dP_0| < \epsilon_i;\ i = 1, \dots, k\}$, where each $f_i$ is a bounded continuous function on $\mathbb{R}$.

The non-separability of $M(\mathbb{R})$ with either the Kolmogorov metric $d_k$ or the total variation metric $d_t$ prevents the induction of a natural topology on $M(M(\mathbb{R}))$ when $M(\mathbb{R})$ is equipped with either $d_k$ or $d_t$. However, Proposition 1.4.1 still enables us to speak of 'convergence' of $\mu_n$ to $\delta_{P_0}$ in the sense that as $n \to \infty$, $\mu_n$ concentrates more and more around $P_0$; this is formalized in the definitions below.

Definition 1.4.2 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge to $\delta_{P_0}$ on uniform (total variation) neighbourhoods if $\mu_n(P : d_t(P, P_0) < \epsilon) \to 1$ for all $\epsilon > 0$, and we write $\mu_n \to^t \delta_{P_0}$.

Definition 1.4.3 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge to $\delta_{P_0}$ on k-neighbourhoods if $\mu_n(P : d_k(P, P_0) < \epsilon) \to 1$ for all $\epsilon > 0$, and we write $\mu_n \to^k \delta_{P_0}$.

Note: The last two notions of convergence provide for a stronger sense of convergence than the weak convergence of Definition 1.4.1 (and Proposition 1.4.1). Most of our discussion will focus on weak convergence and hence on Proposition 1.4.1. We will on occasion consider convergence in k-neighbourhoods. Convergence on uniform neighbourhoods will in general not be relevant to our discussion.

We now formally define the notion of posterior consistency under the same set-up mentioned in Section 1.1.

Definition 1.4.4 The sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is said to be

1. weakly consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^w \delta_{P_0}$ a.s. $P_0$,

2. k-consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^k \delta_{P_0}$ a.s. $P_0$, and

3. t-consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^t \delta_{P_0}$ a.s. $P_0$.

We end this section by mentioning a result which is used quite a lot in proving weak consistency of a sequence of posteriors. Throughout this dissertation, for any $\mu \in M(M(\mathbb{R}))$, $\bar\mu \in M(\mathbb{R})$ will denote the probability measure defined as follows: $\bar\mu(A) = E_\mu(P(A))$, for all $A \in \mathcal{B}(\mathbb{R})$.
Proposition 1.4.2 Let $\{\mu_n\}_{n \ge 1} \subset M(M(\mathbb{R}))$ be such that $\{\bar\mu_n\}_{n \ge 1}$ is tight as a family of probability measures on $\mathbb{R}$. Then $\{\mu_n\}_{n \ge 1}$ is a tight family of probability measures (with respect to weak convergence) on $M(\mathbb{R})$.

Proof: The proof is along the same lines as that of Theorem 3.1 of Sethuraman and Tiwari (1982), and is mentioned here for the sake of completeness. Fix $\epsilon > 0$. By the tightness of $\{\bar\mu_n\}_{n \ge 1}$, for every positive integer $d$ there exists a compact set $K_d$ in $\mathbb{R}$ such that $\sup_n \bar\mu_n(K_d^c) \le \frac{6\epsilon}{d^3\pi^2}$. For $d = 1, 2, \dots$, let $M_d = \{P \in M(\mathbb{R}) : P(K_d^c) \le \frac{1}{d}\}$, and let $M = \cap_d M_d$. Then by its very definition $M$ is a compact subset of $M(\mathbb{R})$ in the weak topology. Further, by Markov's inequality,

$$\mu_n(M_d^c) \le d\,E_{\mu_n}(P(K_d^c)) = d\,\bar\mu_n(K_d^c) \le \frac{6\epsilon}{d^2\pi^2}.$$

Hence, for any $n = 1, 2, \dots$, $\mu_n(M^c) \le \sum_d \frac{6\epsilon}{d^2\pi^2} = \epsilon$. By Theorem 6.7 on page 47 of Parthasarathy (1967), this proves that $\{\mu_n\}_{n \ge 1}$ is tight. $\diamond$

In the next two sections we introduce the two families of priors that are used in the problems considered in this dissertation, namely the Dirichlet processes and the Polya tree processes.

1.5 Dirichlet processes

Dirichlet processes were formally introduced by Ferguson (1973, 1974), who mentions many of their basic properties and applies them to a variety of nonparametric problems. In the process, a Bayesian interpretation for some of the commonly used nonparametric procedures was provided for the first time. Dirichlet processes arise naturally as an infinite dimensional analogue of the finite dimensional Dirichlet distribution, which itself is the multivariate generalization of the Beta distribution. Here we restrict ourselves to stating the definition and mentioning some of the basic properties. For a detailed account we refer the interested reader to Ferguson (1973, 1974), Schervish (1995), and Ghosh and Ramamoorthi (1997).

Definition 1.5.1 Let $\alpha$ be a finite non-null measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. A (prior) probability measure $\mathbb{P}$ on $M(\mathbb{R})$ is said to be a Dirichlet process with parameter (or base measure) $\alpha$ if, for every finite measurable partition $\{B_1, B_2, \dots, B_k\}$ of $\mathbb{R}$, the random vector $(P(B_1), P(B_2), \dots, P(B_k))$ has the Dirichlet distribution $\mathcal{D}(\alpha(B_1), \alpha(B_2), \dots, \alpha(B_k))$ under $\mathbb{P}$.

In particular, for any $A \in \mathcal{B}(\mathbb{R})$, $P(A)$ has the Beta distribution $B(\alpha(A), \alpha(\mathbb{R}) - \alpha(A))$ under $\mathbb{P}$. So $E_{\mathcal{D}(\alpha)}(P(A)) = \frac{\alpha(A)}{\alpha(\mathbb{R})}$ is the 'prior' guess for $P(A)$. We view the Dirichlet process as choosing a probability $P$ randomly according to $\mathcal{D}(\alpha)$, and write $P \in \mathcal{D}(\alpha)$. The existence of the Dirichlet process can be established using Theorems 1.2.1 and 1.2.2 mentioned earlier. A very clever and elegant construction of the Dirichlet process is given by Sethuraman (1994) and is mentioned in the next theorem. This construction gives an insight into some of the peculiarities of the Dirichlet process, and is an extremely useful tool for simulation purposes. We will make use of this construction in Chapter 3.

Theorem 1.5.1 Let $\alpha$ be a finite non-null measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $\{Y_n\}_{n \ge 1}$ be an i.i.d. sequence of random variables with $Y_1 \sim \bar\alpha$, let $\{\theta_n\}_{n \ge 1}$ be an i.i.d. sequence of random variables with $\theta_1 \sim \mathrm{Beta}(1, \alpha(\mathbb{R}))$, and let $\{Y_n\}_{n \ge 1}$ and $\{\theta_n\}_{n \ge 1}$ be independent. Define $p_1 = \theta_1$ and, for $n \ge 2$, $p_n = \theta_n \prod_{i=1}^{n-1}(1 - \theta_i)$. Then $\mathbb{P} = \sum_1^\infty p_n \delta_{Y_n}$ is a Dirichlet process with parameter $\alpha$.
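Sethuraman's construction translates directly into a simulation recipe: break a stick of unit length at $\mathrm{Beta}(1, \alpha(\mathbb{R}))$ fractions and plant the resulting weights at i.i.d. draws from $\bar\alpha$. The sketch below (an illustration, not code from the dissertation) draws an approximate realization by truncating the infinite sum at a fixed number of atoms; the truncation level, total mass $c = \alpha(\mathbb{R})$, and standard normal base measure are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dirichlet_process(c, base_sampler, n_atoms=500):
    """Approximate draw from D(alpha) with alpha(R) = c and normalized base
    measure given by base_sampler, via truncated stick-breaking."""
    thetas = rng.beta(1.0, c, size=n_atoms)            # theta_n ~ Beta(1, alpha(R))
    stick_left = np.cumprod(np.concatenate(([1.0], 1.0 - thetas[:-1])))
    weights = thetas * stick_left                      # p_n = theta_n * prod_{i<n}(1 - theta_i)
    atoms = base_sampler(n_atoms)                      # Y_n i.i.d. from alpha-bar
    return atoms, weights                              # P ~= sum_n weights[n] * delta_{atoms[n]}

atoms, weights = sample_dirichlet_process(5.0, lambda k: rng.standard_normal(k))
print(weights.sum())  # close to 1; the deficit is the truncated tail mass
```

That every realization is a countable mixture of point masses makes the almost sure discreteness recorded next transparent.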
Support [Ferguson (1974), Facts 2 and 3].

1. If $P \in \mathcal{D}(\alpha)$, then with probability one $P$ is discrete.

2. The topological support (that is, the smallest closed set with probability one) of $\mathcal{D}(\alpha)$, with respect to the topology of weak convergence, is the set of all distributions whose (topological) support is contained in the (topological) support of $\alpha$.

Thus, even though the measure theoretic support is 'small', the topological support is fairly large. For example, if the (topological) support of $\alpha$ is $\mathbb{R}$, then the (topological) support of $\mathcal{D}(\alpha)$ is all of $M(\mathbb{R})$, and $\mathcal{D}(\alpha)$ gives positive mass to every open set in $M(\mathbb{R})$.

Posterior Distribution [Ferguson (1973), Theorem 1]. Let $P \in \mathcal{D}(\alpha)$. If, given $P$, $X_1, X_2, \dots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_1^n \delta_{X_i})$, where $\delta_x$ is the measure giving mass one to $x$. Thus, just like the (finite dimensional) Dirichlet distribution priors for the vector of proportions in a multinomial model, the Dirichlet processes provide a conjugate family of priors for $M(\mathbb{R})$.

Predictive Distribution and Bayes Estimates. Let $P \in \mathcal{D}(\alpha)$ and let $\bar\alpha(\cdot) = \frac{\alpha(\cdot)}{\alpha(\mathbb{R})}$. The Bayes estimate (w.r.t. squared error loss) of $P(A)$, given a sample $X_1, X_2, \dots, X_n$ from $P$, is

$$\hat F_n(A) = E(P(A) \mid X_1, X_2, \dots, X_n) = p_n \bar\alpha(A) + (1 - p_n) F_n(A),$$

where $F_n(\cdot)$ denotes the sample (empirical) distribution and $p_n = \frac{\alpha(\mathbb{R})}{\alpha(\mathbb{R}) + n}$. The Bayes estimate $\hat F_n$ is thus a linear combination of $\bar\alpha$ and the sample distribution function $F_n$. This Bayes estimate can also be looked upon as the 'predictive distribution' of a future observation given $X_1, X_2, \dots, X_n$.
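In code, the conjugate update is immediate: the posterior base measure is $\alpha + \sum_i \delta_{X_i}$, so the posterior mean CDF mixes the prior guess with the empirical CDF in proportion $p_n : (1 - p_n)$. A minimal sketch (the standard normal prior guess, the mass $c$, and the simulated data are all arbitrary assumptions made for illustration):

```python
import numpy as np
from scipy.stats import norm

def dp_posterior_mean_cdf(t, data, c, prior_cdf):
    """Bayes estimate E[P(-inf, t] | data] under a D(alpha) prior with
    alpha(R) = c and prior guess prior_cdf: p_n * prior + (1 - p_n) * empirical."""
    t = np.asarray(t, dtype=float)
    p_n = c / (c + len(data))
    empirical = np.mean(np.asarray(data)[None, :] <= t[:, None], axis=1)
    return p_n * prior_cdf(t) + (1.0 - p_n) * empirical

rng = np.random.default_rng(1)
data = rng.exponential(size=50)                 # pretend the 'true' P0 is Exp(1)
print(dp_posterior_mean_cdf([0.5, 1.0, 2.0], data, c=5.0, prior_cdf=norm.cdf))
```

As $n$ grows, $p_n \to 0$ and the estimate is dominated by the empirical CDF; this is the mechanism behind the consistency results that follow.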
Convergence of Dirichlet processes. The Dirichlet process possesses nice continuity properties with respect to the base measure $\alpha$. In Propositions 1.5.1 and 1.5.2 we mention two such properties; Proposition 1.5.1 is well known, while Proposition 1.5.2 is a new result. (Throughout, '$\Rightarrow$' will denote weak convergence and all convergences are as $n$ goes to $\infty$.) Also, we will write $\bar\alpha$ to denote the probability measure defined as $\bar\alpha(A) = E_{\mathcal{D}(\alpha)}(P(A))$, for all $A \in \mathcal{B}(\mathbb{R})$.

Proposition 1.5.1 Let $\alpha_n$, for $n = 1, 2, \dots$, be finite non-null measures on $\mathbb{R}$ such that $\bar\alpha_n \Rightarrow P_0$ (where $P_0 \in M(\mathbb{R})$) and $\alpha_n(\mathbb{R}) \to \infty$. Then $\mathcal{D}(\alpha_n) \Rightarrow \delta_{P_0}$.

[We mention the proof to illustrate the general principles behind the weak convergence results proved in this dissertation.]

Proof: By Proposition 1.4.2, since $\{\bar\alpha_n\}_{n \ge 1}$ is tight, $\{\mathcal{D}(\alpha_n)\}_{n \ge 1}$ is a tight family of probability measures. Let $f$ be a bounded continuous function on $\mathbb{R}$ with compact support. It is enough to show that $\mathcal{D}(\alpha_n)(V_{P_0}^\epsilon) \to 1$, where

$$V_{P_0}^\epsilon = \Big\{P : \Big|\int f\,dP - \int f\,dP_0\Big| < \epsilon\Big\}.$$

Since $f$ is bounded continuous with compact support, there exists a simple function $f_\delta = \sum_{i=1}^k a_i I_{A_i}$ such that the $A_i$'s are $P_0$-continuity sets and $\sup_x |f(x) - f_\delta(x)| < \frac{\epsilon}{3}$. Noting that

$$\Big|\int f\,dP - \int f\,dP_0\Big| \le \Big|\int f_\delta\,dP - \int f_\delta\,dP_0\Big| + \frac{2\epsilon}{3}, \qquad \int f_\delta\,dP = \sum_{i=1}^k a_i P(A_i),$$

our proof will be complete if we can show that $E_{\mathcal{D}(\alpha_n)}(P(A_i) - P_0(A_i))^2 \to 0$, and this follows from the fact that $\bar\alpha_n(A_i) \to P_0(A_i)$ and $E_{\mathcal{D}(\alpha_n)}(P(A_i))^2 \to (P_0(A_i))^2$. $\diamond$

Proposition 1.5.2 Let $\alpha_n$, for $n = 1, 2, \dots$, be finite non-null measures on $\mathbb{R}$ such that

1. $\alpha_n(\mathbb{R}) \to \infty$,

2. $\sup_{t \in \mathbb{R}} |\bar\alpha_n(-\infty, t] - P_0(-\infty, t]| \to 0$ and $\sup_{t \in \mathbb{R}} |\bar\alpha_n(-\infty, t) - P_0(-\infty, t)| \to 0$.

Then $\mathcal{D}(\alpha_n) \to^k \delta_{P_0}$.

Proof: For any $P \in M(\mathbb{R})$, let $P(t) = P(-\infty, t]$ and let $P(t-) = P(-\infty, t)$. We need to show that for any $\epsilon > 0$,

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \to 0.$$

Let $m$ be a fixed positive integer. Let $\psi(u) = \inf\{x : P_0(x) \ge u\}$, and let $x_{m,k} = \psi(k/m)$ for $k = 1, 2, \dots, m$. We observe that $P_0(\psi(u)-) \le u \le P_0(\psi(u))$, and hence $P_0(x_{m,1}-) \le 1/m$, $P_0(x_{m,m-1}) \ge 1 - 1/m$, and for $2 \le k \le m$, $P_0(x_{m,k}-) - P_0(x_{m,k-1}) \le 1/m$. Let $1 \le k \le m - 1$. For $x_{m,k-1} \le t < x_{m,k}$,

$$|P(t) - P_0(t)| \le |P(x_{m,k}-) - P_0(x_{m,k-1})| \vee |P(x_{m,k-1}) - P_0(x_{m,k}-)|,$$

and for $t \ge x_{m,m-1}$,

$$|P(t) - P_0(t)| \le (1 - P_0(x_{m,m-1})) \vee (1 - P(x_{m,m-1})).$$

Therefore $\sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \le B_m$, where

$$B_m = \max_k\{B_{m,k}\} \vee (1 - P_0(x_{m,m-1})) \vee (1 - P(x_{m,m-1})),$$

and $B_{m,k} = |P(x_{m,k}-) - P_0(x_{m,k-1})| \vee |P(x_{m,k-1}) - P_0(x_{m,k}-)|$. Hence

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \le \mathcal{D}(\alpha_n)(P : B_m \ge \epsilon).$$

Let $\delta > 0$, and let $N_m$ be such that, for all $n \ge N_m$,

$$\sup_{t \in \mathbb{R}} |\bar\alpha_n(t) - P_0(t)| < \frac{\delta\epsilon^2}{2m}, \qquad \sup_{t \in \mathbb{R}} |\bar\alpha_n(t-) - P_0(t-)| < \frac{\delta\epsilon^2}{2m}, \qquad \alpha_n(\mathbb{R}) > \frac{2m}{\delta\epsilon^2}.$$

By Markov's inequality, and our choice of the $x_{m,k}$'s,

$$\mathcal{D}(\alpha_n)(P : |P(x_{m,k}-) - P_0(x_{m,k-1})| \ge \epsilon) \le \frac{E_{\mathcal{D}(\alpha_n)}(P(x_{m,k}-) - P_0(x_{m,k-1}))^2}{\epsilon^2} \le \frac{1}{\epsilon^2}\Big[(P_0(x_{m,k}-) - P_0(x_{m,k-1}))^2 + \frac{\delta\epsilon^2}{2m}\Big] \le \frac{1}{\epsilon^2 m^2} + \frac{\delta}{2m}, \quad \text{for all } n \ge N_m.$$

The second inequality above follows from our choice of $N_m$ and from the fact that for any finite non-null measure $\alpha$ on $\mathbb{R}$ and $t \in \mathbb{R}$,

$$E_{\mathcal{D}(\alpha)}(P(t-)) = \bar\alpha(t-) \qquad \text{and} \qquad E_{\mathcal{D}(\alpha)}(P(t-))^2 = \bar\alpha(t-) \times \frac{\alpha(t-) + 1}{\alpha(\mathbb{R}) + 1}.$$

The remaining terms in $B_m$ are bounded in the same way. Hence

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \le \mathcal{D}(\alpha_n)(P : B_m \ge \epsilon) \le 2m\Big[\frac{1}{\epsilon^2 m^2} + \frac{\delta}{2m}\Big], \quad \text{for all } n \ge N_m.$$

Since $\delta > 0$ is arbitrary and the last mentioned inequality holds for all $m$,

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \to 0. \quad \diamond$$

Posterior consistency: It is well known that a $\mathcal{D}(\alpha)$ prior leads to a posterior that is weakly consistent at all $P_0 \in M(\mathbb{R})$. (This fact follows on observing that the posterior for $\mathcal{D}(\alpha)$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_1^n \delta_{X_i})$, and then taking $\alpha_n = \alpha + \sum_1^n \delta_{X_i}$ in Proposition 1.5.1.) The next theorem mentions that in this case the posterior has the stronger k-consistency property.

Theorem 1.5.2 Let $P \in \mathcal{D}(\alpha)$, and given $P$, let $X_1, X_2, \dots, X_n$ be a sample from $P$. Then $\mathcal{D}(\alpha)(U_{P_0} \mid X_1, X_2, \dots, X_n) \to 1$ a.s. $P_0$, for all k-neighbourhoods $U_{P_0}$ of $P_0$.

Proof: We observe that the posterior for $\mathcal{D}(\alpha)$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_{i=1}^n \delta_{X_i})$. The proof now follows from Proposition 1.5.2 by taking $\alpha_n = \alpha + \sum_1^n \delta_{X_i}$ and a simple application of the Glivenko-Cantelli theorem. $\diamond$

Critics of the Dirichlet process point to the fact that, with probability one, the Dirichlet process selects a discrete distribution, as its major shortcoming. The Polya tree processes discussed in the next section are a family of workable priors which overcome this drawback of the Dirichlet process.

1.6 Polya tree processes

Polya tree priors (or Polya tree processes) are a generalization of Dirichlet processes, and share many of the properties of the Dirichlet processes. These processes are described through a large number of parameters, and a suitable choice of these parameters allows the statistician to overcome some of the shortcomings of the Dirichlet processes. Here also we mention only some of the basic properties; for a detailed account, the interested reader is referred to Lavine (1992, 1994), Mauldin et al. (1992), Schervish (1995), and Ghosh and Ramamoorthi (1997).

Let $B_\emptyset = \mathbb{R}$ and $\Pi = \{\pi_m ; m = 0, 1, \dots\}$, where $\pi_0, \pi_1, \dots$ is a sequence of partitions of $\mathbb{R}$ such that $\mathcal{B} = \sigma(\cup_0^\infty \pi_m)$ and such that every $B \in \pi_{m+1}$ is an interval and is obtained by splitting some $B' \in \pi_m$ into two pieces. Let $\pi_m = \{B_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}$, and let $B_{\epsilon_1,\dots,\epsilon_m 0} \in \pi_{m+1}$ and $B_{\epsilon_1,\dots,\epsilon_m 1} \in \pi_{m+1}$ be the two pieces into which $B_{\epsilon_1,\dots,\epsilon_m}$ is split.

Definition 1.6.1 A random probability measure $P$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is said to have a Polya tree distribution, or a Polya tree prior, with parameters $(\Pi, \alpha)$, and we write $P \in PT(\Pi, \alpha)$, if there exists a collection of non-negative numbers $\alpha = \{\alpha_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m;\ m = 1, 2, \dots\}$ such that the following hold:

1. $\{P(B_{\epsilon_1,\dots,\epsilon_m 0} \mid B_{\epsilon_1,\dots,\epsilon_m}) : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m;\ m = 1, 2, \dots\}$ are independent random variables.

2. $P(B_{\epsilon_1,\dots,\epsilon_m 0} \mid B_{\epsilon_1,\dots,\epsilon_m})$ has the Beta distribution $B(\alpha_{\epsilon_1,\dots,\epsilon_m 0}, \alpha_{\epsilon_1,\dots,\epsilon_m 1})$.
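To make the definition concrete, the sketch below (an illustration under the stated assumptions, not code from the dissertation) samples the level-$d$ masses of a Polya tree realization on $(0,1]$ with the dyadic partitions and $\alpha_{\epsilon_1,\dots,\epsilon_m} = m^2$, the parameter choice discussed shortly that makes the realization absolutely continuous with probability one:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_polya_tree_masses(depth):
    """Sample P(B_e) for all 2**depth dyadic intervals of (0,1] at level
    `depth`, with each split at level m drawn from Beta(m**2, m**2)."""
    masses = np.array([1.0])                              # level 0: P((0,1]) = 1
    for m in range(1, depth + 1):
        splits = rng.beta(m**2, m**2, size=masses.size)   # P(left child | parent)
        left, right = masses * splits, masses * (1.0 - splits)
        masses = np.column_stack((left, right)).ravel()   # interleave the children
    return masses   # masses[i] = P((i / 2**depth, (i + 1) / 2**depth])

masses = sample_polya_tree_masses(depth=10)
print(masses.sum())   # equals 1 up to rounding
# masses * 2**10 is a step-function approximation to the random density.
```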
Polya tree priors seem to have their origin in Blackwell (1973) and Ferguson (1974, page 620) (even though neither Blackwell nor Ferguson uses the phrase 'Polya tree priors'); recently, Lavine (1992, 1994) and Mauldin et al. (1992) investigated some of their interesting properties and set the course for their use in Bayesian analysis.

Existence. The existence of Polya tree processes can be shown by first realizing the process as a prior on the space of probabilities on the sequence space $\{0,1\}^{\mathbb{N}}$ and then transferring it to $M(\mathbb{R})$. A more elegant way is to use de Finetti's theorem. We refer the reader to Mauldin et al. (1992) for a discussion of these issues.

Support. The support of a Polya tree process is controlled by the choice of the parameters $\alpha$ and, of course, the partitions $\Pi$. Mauldin et al. (1992) give sufficient conditions for the Polya tree prior to give mass one to the space of all continuous probability distributions. If for simplicity we consider the Polya tree prior for $(0,1]$ with $\pi_m = \{(\frac{i-1}{2^m}, \frac{i}{2^m}] : i = 1, \dots, 2^m\}$, the set of all dyadic intervals of length $\frac{1}{2^m}$, and take $\alpha_{\epsilon_1,\dots,\epsilon_m} = m^2$, the resulting Polya tree will be absolutely continuous with probability one. This feature of Polya tree priors makes them more attractive as priors, especially in the context of density estimation problems. Lavine (1992, 1994) has a discussion of the implications and interpretations of various choices of the partitions $\Pi$ and the non-negative numbers $\alpha$.

From now on, to avoid cumbersome notation, we will write $B_\epsilon$ for $B_{\epsilon_1,\dots,\epsilon_m}$ and $\alpha_\epsilon$ for $\alpha_{\epsilon_1,\dots,\epsilon_m}$, unless it is very important to write otherwise.

Connection to Dirichlet process [Lavine (1994), Fact 2]. The Polya tree prior is a generalization of the Dirichlet process in the sense explained below.

a) A Dirichlet process $\mathcal{D}(\alpha)$ is a Polya tree w.r.t. any sequence of partitions $\Pi$, with $\alpha_\epsilon = \alpha(B_\epsilon)$ for all $B_\epsilon \in \Pi$.

b) A Polya tree $PT(\Pi, \alpha)$ is a Dirichlet process if $\alpha_\epsilon = \alpha_{\epsilon 0} + \alpha_{\epsilon 1}$ for all possible values of $\epsilon$. The parameter $\alpha$ of the associated Dirichlet process is specified as $\alpha(B_\epsilon) = \alpha_\epsilon$.

Posterior Distribution [Mauldin et al. (1992), Theorem 4.3]. Let $P \in PT(\Pi, \alpha)$ and, given $P$, let $X_1, \dots, X_n$ be a sample from $P$. Then the posterior distribution of $P$ given $X_1, \dots, X_n$ is $PT(\Pi, \alpha_{X_1,\dots,X_n})$, where each $\alpha_\epsilon$ in $\alpha$ is replaced by $\alpha_\epsilon + \sum_1^n I[X_i \in B_\epsilon]$ in $\alpha_{X_1,\dots,X_n}$. Thus the Polya tree priors form a conjugate family of priors.

Posterior distribution given incomplete/partial observations [Lavine (1994), page 1223]. One feature of interest to us is the fact that a Polya tree process permits easy posterior updating even in the presence of partial information. More precisely, let $P \in PT(\Pi, \alpha)$ and, given $P$, let $X_1, \dots, X_n$ be a sample from $P$. Then the posterior distribution of $P$ given $\{X_1 \in B_{\epsilon^1}, \dots, X_n \in B_{\epsilon^n}\}$ is again a Polya tree with respect to $\Pi$, with $\alpha_\epsilon$ changing to $\alpha_\epsilon + \sum_1^n I\{B_{\epsilon^i} \subset B_\epsilon\}$. [Remark: In the case where we have some observations fully specified and some partially specified, the updating for the posterior is first done for the fully specified observations and then for the partially specified observations in an obvious way.]

Bayes Estimates. Let $P \in PT(\Pi, \alpha)$. Then the Bayes estimate (w.r.t. squared error loss) of $P(B_{\epsilon_1,\dots,\epsilon_m})$ given a sample $X_1, X_2, \dots, X_n$ from $P$ is

$$E(P(B_{\epsilon_1,\dots,\epsilon_m}) \mid X_1, \dots, X_n) = \prod_{i=1}^m \frac{\alpha_{\epsilon_1,\dots,\epsilon_i} + \sum_{j=1}^n I[X_j \in B_{\epsilon_1,\dots,\epsilon_i}]}{\alpha_{\epsilon_1,\dots,\epsilon_{i-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{i-1} 1} + \sum_{j=1}^n I[X_j \in B_{\epsilon_1,\dots,\epsilon_{i-1}}]}.$$

As with the Dirichlet process, here also, if the $\alpha_\epsilon$'s are small (compared to $n$), the Bayes estimate is close to the sample distribution function. This expression for the Bayes estimate also describes the predictive distribution of a future observation given $X_1, X_2, \dots, X_n$. Details of this can be found in Mauldin et al. (1992).
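The conjugate update is purely combinatorial: every $\alpha_\epsilon$ is incremented by the number of observations known to lie in $B_\epsilon$. A minimal sketch for the dyadic tree on $(0,1]$ (the dictionary representation of $\alpha$, the default prior $\alpha_\epsilon = m^2$, and the data are all illustrative assumptions):

```python
import numpy as np

def polya_tree_posterior(alpha, data, depth):
    """Update Polya tree parameters on (0,1] with dyadic partitions:
    alpha_eps <- alpha_eps + #{j : X_j in B_eps} for all eps up to `depth`.
    `alpha` maps binary strings eps to alpha_eps; missing entries default
    to the prior value m**2 at level m."""
    post = dict(alpha)
    for x in data:
        for m in range(1, depth + 1):
            # The level-m dyadic interval containing x has index floor(x * 2^m).
            eps = format(int(x * 2**m), f"0{m}b")
            post[eps] = post.get(eps, m**2) + 1
    return post

rng = np.random.default_rng(3)
post = polya_tree_posterior({}, rng.uniform(size=20), depth=4)
print(post["0"], post["1"])   # prior value 1 plus the counts in (0,1/2] and (1/2,1]
```

A partially observed point reported only as $X_j \in B_{\epsilon^j}$ would, per the updating rule above, increment only the nodes $B_\epsilon \supset B_{\epsilon^j}$, i.e. the prefixes of $\epsilon^j$.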
Posterior consistency. Calculations similar to those carried out for the Dirichlet process show that the Polya tree priors also lead to posteriors that are weakly consistent at all $P_0 \in M(\mathbb{R})$. But, unlike the Dirichlet process, the stronger k-consistency need not hold for the Polya tree priors.

Remarks: The properties of Dirichlet processes and Polya tree processes on $M(\mathbb{R})$ that have been mentioned in the last two sections have obvious extensions to $M(\mathbb{R}^+)$ and $M(\mathbb{R}^+ \times \{0,1\})$, the two other spaces discussed in this dissertation.

CHAPTER 2

Polya Tree Priors for Symmetric Distributions

2.1 Introduction and Summary

In many semi-parametric inference problems within a Bayesian formulation, identifiability conditions require the Bayesian to consider priors on the class of distributions symmetric around an arbitrary point on the real line. A typical example is the location problem. Diaconis and Freedman (1986a, 1986b) consider the location problem using symmetrized Dirichlet process priors. The first paper and the subsequent discussions provide a good summary of the basic issues involved in such problems, and also elaborate on the need for families of rich priors on the class of symmetric distributions. More recently, Ghosal et al. (1996) consider the same problem using symmetrized Polya tree priors on distributions with symmetric densities. Dalal (1979) constructs a class of priors which are invariant under a finite group of transformations, using the Dirichlet process priors, and calls them Dirichlet Invariant processes. In this chapter, we study priors on the class of symmetric distributions using the Polya tree processes. We consider two natural methods (discussed below) of constructing priors on the class of symmetric distributions and compare the two methods using Dirichlet processes and Polya tree processes.

A prior $\mathbb{P}$ on the class of all symmetric distributions on $\mathbb{R}$, denoted by $M_S(\mathbb{R})$, can be constructed in two natural ways; a small numerical sketch of the two constructions is given at the end of this section.

Method 1. For any prior (say $\Lambda_1$) on $M(\mathbb{R})$, the map $P \mapsto P_f$, defined by

$$P_f(A) = \frac{P(A) + P(-A)}{2}, \quad \text{for } A \in \mathcal{B}(\mathbb{R}),$$

induces a prior on $M_S(\mathbb{R})$.

Method 2. For any prior (say $\Lambda_2$) on $M(\mathbb{R}^+)$, the map $P \mapsto P_s$, defined by

$$P_s(A) = \frac{P(A \cap \mathbb{R}^+)}{2} + \frac{P(-(A \cap \mathbb{R}^-))}{2}, \quad \text{for } A \in \mathcal{B}(\mathbb{R}),$$

induces a prior on $M_S(\mathbb{R})$.

Dalal (1979) looks at symmetrization using Method 1, with $\Lambda_1 = \mathcal{D}(\alpha)$, where $\alpha$ is a symmetric measure on $\mathbb{R}$. Using the transformation invariance property of Dirichlet processes, it can be verified [Hannum and Hollander (1983), Theorem 2.1] that with $\Lambda_2 = \mathcal{D}(2\alpha^+)$, the Method 2 symmetrization is equivalent to the (Method 1) symmetrization considered by Dalal (1979). In the next section we look at the two methods of symmetrization using analogous Polya tree priors and show that the two methods yield the same prior iff the Polya tree processes being considered are Dirichlet processes. In the last section we consider (a version of) the posterior distributions under the two methods and establish the weak consistency for the sequences of posterior distributions.
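On discrete realizations, both maps amount to mirroring the atoms about 0 and halving the weights, applied to a draw on $\mathbb{R}$ for Method 1 and to a draw supported on $\mathbb{R}^+$ for Method 2. The following sketch (illustrative only; crude finite-atom stand-ins replace actual prior draws) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(4)

def symmetrize(atoms, weights):
    """Mirror a discrete measure about 0 and halve the weights. Applied to a
    measure on R this realizes P_f (Method 1); applied to a measure on R+
    it realizes P_s (Method 2)."""
    return np.concatenate((atoms, -atoms)), np.concatenate((weights, weights)) / 2.0

# Method 1: fold a 5-atom stand-in for a draw from a prior on M(R).
f_atoms, f_weights = symmetrize(rng.standard_normal(5), rng.dirichlet(np.ones(5)))

# Method 2: reflect a 5-atom stand-in for a draw from a prior on M(R+).
s_atoms, s_weights = symmetrize(rng.exponential(size=5), rng.dirichlet(np.ones(5)))
# Both results place equal mass at x and -x, hence are symmetric about 0.
```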
2.2 Symmetrization using Polya tree processes

In this section we study the two methods of symmetrization using Polya trees that can be considered analogous to $\mathcal{D}(\alpha)$ and $\mathcal{D}(2\alpha^+)$. With this in mind, we now introduce notation that is crucial to our construction and results. Let

$$\pi_m^+ = \{B^+_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\},$$

where $\{B^+_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}$ is a partition of $\mathbb{R}^+$. Let $B^-_{\epsilon_1,\dots,\epsilon_m} = -B^+_{\epsilon_1,\dots,\epsilon_m}$, and let

$$\pi_m^- = \{B^-_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}.$$

Let $\Pi^+ = \{\pi_m^+ : m = 1, 2, \dots\}$ and $\Pi^- = \{\pi_m^- : m = 1, 2, \dots\}$, and let $\Pi = \Pi^+ \cup \Pi^-$.

In Method 1 we take $\Lambda_1 = PT(\Pi, \alpha)$, where $\alpha$ is a symmetric collection, i.e. under $PT(\Pi, \alpha)$, $P(B^+_{\epsilon 0} \mid B^+_\epsilon)$ and $P(B^-_{\epsilon 0} \mid B^-_\epsilon)$ both have the $\mathrm{Beta}(\alpha_{\epsilon 0}, \alpha_{\epsilon 1})$ distribution. In Method 2 we take $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$, where $2\alpha^+$ has an obvious interpretation. In the remainder of this chapter, $\Lambda_1$ will always represent $PT(\Pi, \alpha)$, and $\Lambda_2$ will always represent $PT(\Pi^+, 2\alpha^+)$.

Theorem 2.2.1 The priors induced on $M_S(\mathbb{R})$ by Method 1 (using $\Lambda_1 = PT(\Pi, \alpha)$) and Method 2 (using $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$) are the same if and only if

$$\alpha_{\epsilon_1,\dots,\epsilon_m} = \alpha_{\epsilon_1,\dots,\epsilon_m 0} + \alpha_{\epsilon_1,\dots,\epsilon_m 1}, \quad \text{for all } (\epsilon_1,\dots,\epsilon_m) \in \{0,1\}^m;\ m = 1, 2, \dots,$$

and hence for Polya tree processes the two methods yield the same prior if and only if the process is a Dirichlet process. Before the proof, a quick numerical check of the moment computation that drives the 'only if' part is sketched below.
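The base case of the 'only if' argument matches the second moment of $P(B_0^+) + P(B_0^-)$ under Method 1 against the second moment of the corresponding mass under Method 2. Using the closed-form Beta moments recalled at the start of the proof, this can be verified numerically in exact rational arithmetic (an illustrative sketch; the parameter values are arbitrary):

```python
from fractions import Fraction as F

def beta_mean(a, b):        # E[Y] for Y ~ Beta(a, b)
    return F(a, a + b)

def beta_second(a, b):      # E[Y^2] for Y ~ Beta(a, b)
    return F(a * (a + 1), (a + b) * (a + b + 1))

def method1_second_moment(a, a0, a1):
    """E[(X Y1 + (1 - X) Y2)^2] with X ~ Beta(a, a), Y1, Y2 ~ Beta(a0, a1),
    all independent: this is E_{Lambda_1}[P(B_0^+) + P(B_0^-)]^2."""
    ex2 = beta_second(a, a)                    # E[X^2] = E[(1 - X)^2]
    exx = beta_mean(a, a) - ex2                # E[X(1 - X)]
    return 2 * ex2 * beta_second(a0, a1) + 2 * exx * beta_mean(a0, a1) ** 2

def method2_second_moment(a0, a1):
    """E[Z^2] with Z ~ Beta(2 a0, 2 a1): this is E_{Lambda_2}[P(B_0^+)]^2."""
    return beta_second(2 * a0, 2 * a1)

a0, a1 = 2, 3
print(method1_second_moment(a0 + a1, a0, a1) == method2_second_moment(a0, a1))  # True
print(method1_second_moment(7, a0, a1) == method2_second_moment(a0, a1))        # False
```

The moments agree exactly when $a = a_0 + a_1$ and disagree otherwise, in line with the theorem.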
Proof: The proof is by the principle of mathematical induction and uses elementary properties of the Beta distribution. We recall that if $X \sim \mathrm{Beta}(a, b)$, then

$$E(X) = \frac{a}{a+b}, \qquad E(X^2) = \frac{a(a+1)}{(a+b)(a+b+1)}.$$

If part: If the condition holds, then $\Lambda_1 = \mathcal{D}(\alpha)$, with the measure $\alpha$ given by $\alpha(B_\epsilon) = \alpha_\epsilon$ for all $B_\epsilon \in \Pi$, and $\Lambda_2 = \mathcal{D}(2\alpha^+)$. The result now follows from the remarks made in the last section about symmetrization using Dirichlet process priors.

Only if part: If the priors induced by the two methods on $M_S(\mathbb{R})$ are the same, then

$$E_{\Lambda_1}[P(B^+_{\epsilon_1,\dots,\epsilon_n}) + P(B^-_{\epsilon_1,\dots,\epsilon_n})]^2 = E_{\Lambda_2}[P(B^+_{\epsilon_1,\dots,\epsilon_n})]^2 \quad \text{for all } (\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n;\ n = 1, 2, \dots$$

To avoid trivialities, we will assume that $\alpha_{\epsilon_1,\dots,\epsilon_n} > 0$ for all $(\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n$. Then

$$E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = E[X_1 Y_1 + (1 - X_1) Y_2]^2,$$

where $X_1 \sim \mathrm{Beta}(a, a)$, $Y_i \sim \mathrm{Beta}(a_0, a_1)$ $(i = 1, 2)$, and $\{X_1, Y_1, Y_2\}$ are independent. Hence

$$E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = 2\Big[\frac{a(a+1)}{2a(2a+1)} \times \frac{a_0(a_0+1)}{(a_0+a_1)(a_0+a_1+1)} + \frac{a \cdot a}{2a(2a+1)}\Big(\frac{a_0}{a_0+a_1}\Big)^2\Big] = \frac{1}{2a+1} \cdot \frac{a_0}{a_0+a_1}\Big[\frac{(a+1)(a_0+1)}{a_0+a_1+1} + \frac{a\,a_0}{a_0+a_1}\Big].$$

Similarly,

$$E_{\Lambda_2}[P(B_0^+)]^2 = \frac{2a_0(2a_0+1)}{(2a_0+2a_1)(2a_0+2a_1+1)}.$$

Therefore, $E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = E_{\Lambda_2}[P(B_0^+)]^2$ iff

$$\frac{(a+1)(a_0+1)}{a_0+a_1+1} + \frac{a\,a_0}{a_0+a_1} = \frac{(2a_0+1)(2a+1)}{2a_0+2a_1+1},$$

which in turn holds iff $a = a_0 + a_1$.

Let

$$\alpha_{\epsilon_1,\dots,\epsilon_j} = \alpha_{\epsilon_1,\dots,\epsilon_j 0} + \alpha_{\epsilon_1,\dots,\epsilon_j 1}, \quad \text{for } (\epsilon_1,\dots,\epsilon_j) \in \{0,1\}^j;\ j = 1, 2, \dots, (n-1).$$

Then we will show that

$$\alpha_{\epsilon_1,\dots,\epsilon_n} = \alpha_{\epsilon_1,\dots,\epsilon_n 0} + \alpha_{\epsilon_1,\dots,\epsilon_n 1}, \quad \text{for } (\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n,$$

by equating $E_{\Lambda_1}[P(B^+_{\epsilon_1,\dots,\epsilon_n 0}) + P(B^-_{\epsilon_1,\dots,\epsilon_n 0})]^2$ and $E_{\Lambda_2}[P(B^+_{\epsilon_1,\dots,\epsilon_n 0})]^2$. Writing $\epsilon$ for $\epsilon_1, \dots, \epsilon_n$,

$$E_{\Lambda_1}[P(B^+_{\epsilon 0}) + P(B^-_{\epsilon 0})]^2 = 2\Big[\frac{a(a+1)}{2a(2a+1)} \prod_{j=1}^n \frac{\alpha_{\epsilon_1,\dots,\epsilon_j}(\alpha_{\epsilon_1,\dots,\epsilon_j}+1)}{(\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1})(\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1} + 1)} \times \frac{\alpha_{\epsilon 0}(\alpha_{\epsilon 0}+1)}{(\alpha_{\epsilon 0}+\alpha_{\epsilon 1})(\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1)} + \frac{a \cdot a}{2a(2a+1)} \prod_{j=1}^n \Big(\frac{\alpha_{\epsilon_1,\dots,\epsilon_j}}{\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1}}\Big)^2 \Big(\frac{\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}}\Big)^2\Big]$$

$$= \frac{2}{2a(2a+1)}\Big[\frac{\alpha_\epsilon(\alpha_\epsilon+1)\,\alpha_{\epsilon 0}(\alpha_{\epsilon 0}+1)}{(\alpha_{\epsilon 0}+\alpha_{\epsilon 1})(\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1)} + \Big(\frac{\alpha_\epsilon\,\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}}\Big)^2\Big].$$

(The second equality follows from the above hypothesis, under which the products telescope.) Similarly,

$$E_{\Lambda_2}[P(B^+_{\epsilon 0})]^2 = \frac{2\alpha_\epsilon(2\alpha_\epsilon+1)}{(2a_0+2a_1)(2a_0+2a_1+1)} \times \frac{2\alpha_{\epsilon 0}(2\alpha_{\epsilon 0}+1)}{(2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1})(2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1}+1)}.$$

Using the above hypothesis again, we can conclude that $E_{\Lambda_1}[P(B^+_{\epsilon 0}) + P(B^-_{\epsilon 0})]^2 = E_{\Lambda_2}[P(B^+_{\epsilon 0})]^2$ iff

$$\frac{(\alpha_\epsilon+1)(\alpha_{\epsilon 0}+1)}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1} + \frac{\alpha_\epsilon\,\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}} = \frac{(2\alpha_\epsilon+1)(2\alpha_{\epsilon 0}+1)}{2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1}+1},$$

which in turn holds iff $\alpha_{\epsilon_1,\dots,\epsilon_n} = \alpha_{\epsilon_1,\dots,\epsilon_n 0} + \alpha_{\epsilon_1,\dots,\epsilon_n 1}$. $\diamond$

2.3 The posterior distribution and its consistency

We observe that there is a 1-1 correspondence between $M_S(\mathbb{R})$ and $M(\mathbb{R}^+)$, and we will make use of this correspondence in the remainder of this section. With this in mind, we briefly review the properties of this correspondence that are relevant to our discussion. Let $\phi : M_S(\mathbb{R}) \to M(\mathbb{R}^+)$ be defined as $\phi(P)(A) = 2P(A)$, for $A \in \mathcal{B}(\mathbb{R}^+)$. $\phi$ is 1-1 and onto. We will on occasion write $P^+$ for $\phi(P)$ in the remainder of this section. Let $\mu$ be a (prior) probability measure on $M_S(\mathbb{R})$; then the map $\phi$ induces the prior probability measure $\mu\phi^{-1}$ on $M(\mathbb{R}^+)$. The following propositions summarize the important consequences of using the map $\phi$, and consider the following set-up: let $P \sim \mu$, and given $P$, let $\{X_1, X_2, \dots, X_n\}$ be i.i.d. $P$.

Proposition 2.3.1 The posterior distribution of $P$ given $\{X_1, X_2, \dots, X_n\}$ is the same as the conditional distribution of $\mu$ given $\{|X_1|, |X_2|, \dots, |X_n|\}$, i.e.

$$\mu(\cdot \mid X_1, X_2, \dots, X_n) = \mu(\cdot \mid |X_1|, |X_2|, \dots, |X_n|).$$

[This follows from the fact that $\{|X_1|, |X_2|, \dots, |X_n|\}$ is a sufficient statistic for symmetric distributions.]

Proposition 2.3.2 $\mu(\cdot \mid |X_1|, |X_2|, \dots, |X_n|) = \mu\phi^{-1}(\phi(\cdot) \mid |X_1|, |X_2|, \dots, |X_n|)$.

Proof: For notational convenience we will consider the case $n = 1$. We need to show that for any $B \in \mathcal{B}(\mathbb{R}^+)$, the measures $\mu_1$ and $\nu_1$ on $M_S(\mathbb{R})$, defined as

$$\mu_1(C) = \int_B \mu\phi^{-1}(\phi(C) \mid |x|)\,\bar\mu(dx), \qquad \nu_1(C) = \int_C 2P(B)\,\mu(dP),$$

are the same, where $C \subset M_S(\mathbb{R})$ and $\bar\mu(A) = E_\mu(P(|X| \in A))$. Now,

$$\nu_1(C) = \int_C 2P(B)\,\mu(dP) = \int_C \phi(P)(B)\,\mu(dP) = \int_{\phi(C)} P(B)\,\mu\phi^{-1}(dP) = \int_B \mu\phi^{-1}(\phi(C) \mid |x|)\,\bar\mu(dx) = \mu_1(C).$$

(The third equality above follows from the change of variable theorem, and the fourth equality follows from the definition of conditional distributions.) $\diamond$

Proposition 2.3.3 Let $P_0 \in M_S(\mathbb{R})$, and let $\{\mu_n\}_{n \ge 1}$ be probability measures on $M_S(\mathbb{R})$. Then $\mu_n \Rightarrow \delta_{P_0}$ iff $\mu_n\phi^{-1} \Rightarrow \delta_{P_0^+}$.

Proof: For any bounded continuous function $f$ on $\mathbb{R}$, and any $P \in M_S(\mathbb{R})$, we observe that

$$\int_{\mathbb{R}} f(x)\,dP(x) = \int_{\mathbb{R}^+} \frac{f(x) + f(-x)}{2}\,dP^+(x) = \int_{\mathbb{R}^+} f^*(x)\,dP^+(x),$$

where $f^*(x) = \frac{f(x) + f(-x)}{2}$ is bounded and continuous on $\mathbb{R}^+$. Similarly, we can show that for any bounded continuous function $f^+$ on $\mathbb{R}^+$, there exists a bounded continuous function $f$ on $\mathbb{R}$ such that $\int_{\mathbb{R}^+} f^+(x)\,dQ(x) = \int_{\mathbb{R}} f(x)\,d(\phi^{-1}Q)(x)$. Therefore, with $\{P_n\}_{n \ge 1}, P \subset M_S(\mathbb{R})$, $P_n \Rightarrow P$ iff $\phi(P_n) \Rightarrow \phi(P)$, and hence $\mu_n \Rightarrow \delta_{P_0}$ iff $\mu_n\phi^{-1} \Rightarrow \delta_{P_0^+}$. $\diamond$

In view of Propositions 2.3.1, 2.3.2, and 2.3.3 (and for ease of notation), we will study the priors (and posteriors) induced by Method 1 and Method 2 on $M(\mathbb{R}^+)$, rather than on $M_S(\mathbb{R})$. We will see that even though the two methods are not always equivalent, the posteriors are still weakly consistent for both the methods. We recall that in Method 1 the prior on $M(\mathbb{R}^+)$ is $\Lambda_1 \circ g^{-1}$, where $(g \circ P)(A) = P(A) + P(-A)$, and in Method 2 the prior on $M(\mathbb{R}^+)$ is $\Lambda_2$, with $\Lambda_1 = PT(\Pi, \alpha)$ and $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$. The next proposition follows from the properties of Polya tree priors mentioned in Chapter 1.
Proposition 2.3.4 Let $P \sim \Lambda_2$, and given $P$, let $\{T_1, T_2, \dots, T_n\}$ be a random sample from $P$. The posterior distribution of $P$ given $\{T_1, T_2, \dots, T_n\}$ is $PT(\Pi^+, 2\alpha^+_{T_1,T_2,\dots,T_n})$, and further the posterior is consistent at all $Q_0 \in M(\mathbb{R}^+)$.

Thus, in view of the comments made earlier, the Method 2 symmetrization using Polya tree priors yields (weakly) consistent posteriors. The more interesting result is that the Method 1 symmetrization also yields a (weakly) consistent sequence of posteriors. To establish this we first need to consider (a version of) the posterior distribution under Method 1.

Theorem 2.3.1 Let $P \sim \Lambda_1 \circ g^{-1}$, and given $P$, let $T_1, T_2, \dots, T_n$ be a random sample from $P$. The posterior distribution of $P$ given $\{T_1, T_2, \dots, T_n\}$ is given by

$$\Lambda_1 \circ g^{-1}(\cdot \mid T_1, T_2, \dots, T_n) = \sum_{\{x_1,x_2,\dots,x_n : |x_i| = T_i\}} PT(\Pi, \alpha_{x_1,x_2,\dots,x_n}) \circ g^{-1}(\cdot)\,f_{T_1,T_2,\dots,T_n}(x_1, x_2, \dots, x_n),$$

where

$$f_{T_1,T_2,\dots,T_n}(\mp T_1, \dots, \mp T_n) = \lim_k \frac{Pr(X_1 \in B^{\mp k}_{\epsilon_1}, \dots, X_n \in B^{\mp k}_{\epsilon_n})}{Pr(T_1 \in B^{+k}_{\epsilon_1}, \dots, T_n \in B^{+k}_{\epsilon_n})},$$

and $B^{+k}_{\epsilon_i}$ denotes the level-$k$ partitioning set containing $T_i$, with $B^{-k}_{\epsilon_i} = -B^{+k}_{\epsilon_i}$, for $i = 1, \dots, n$.

[Remark: In an intuitive sense, $f_{T_1,T_2,\dots,T_n}(x_1, x_2, \dots, x_n) = Pr(X_1 = x_1, \dots, X_n = x_n \mid |X_1| = T_1, \dots, |X_n| = T_n)$.]

Proof: For any measurable $C \subset M(\mathbb{R}^+)$ and $B \in \mathcal{B}(\mathbb{R}^{+n})$, let

$$\mu(C) := \int_B \Lambda_1 \circ g^{-1}(C \mid T_1, T_2, \dots, T_n)\,\bar\alpha^n \circ g^{-1}(dT),$$

$$\nu(C) := \int_C P^n(B)\,PT(\Pi, \alpha) \circ g^{-1}(dP),$$

where $\bar\alpha^n \circ g^{-1}(B) = E_{PT(\Pi,\alpha) \circ g^{-1}}P^n(B)$ and $P^n(B) = P(T \in B)$, with $T = \{T_1, T_2, \dots, T_n\}$ and $T_i \sim^{iid} P$. To verify that $\Lambda_1 \circ g^{-1}(\cdot \mid T_1, T_2, \dots, T_n)$ is indeed a version of the posterior, we need to verify that for every $B \in \mathcal{B}(\mathbb{R}^{+n})$ the measures $\mu$ and $\nu$ on $M(\mathbb{R}^+)$ defined above are the same. In fact, it is enough to verify the same for $B$ of the form $B = B_1 \times \cdots \times B_n$, where $B_i \in \mathcal{B}(\mathbb{R}^+)$. We observe that if $B = B_1 \times \cdots \times B_n$, with $B_i \in \mathcal{B}(\mathbb{R}^+)$, then

$$\bar\alpha^n \circ g^{-1}(B) = E_{PT(\Pi,\alpha) \circ g^{-1}}(P(B_1) \times \cdots \times P(B_n)) = E_{PT(\Pi,\alpha)}[(P(B_1^+) + P(B_1^-)) \times \cdots \times (P(B_n^+) + P(B_n^-))].$$

Let

$$\tilde\alpha^n((B_1^+ \cup B_1^-) \times \cdots \times (B_n^+ \cup B_n^-)) := E_{PT(\Pi,\alpha)}[(P(B_1^+) + P(B_1^-)) \times \cdots \times (P(B_n^+) + P(B_n^-))].$$

It suffices to check that the moments of $\{P(B^+_{\epsilon'}) : \epsilon' \in \{0,1\}^m\}$ under the two measures $\mu(\cdot)$ and $\nu(\cdot)$ are the same when $B = B_1 \times \cdots \times B_n$; i.e., we want to verify that for positive integers $r_i$,

$$E_\nu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = E_\mu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big].$$

Now,

$$E_\nu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = \int \Big(\prod_{i=1}^n P(B_i)\Big)\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\,PT(\Pi, \alpha) \circ g^{-1}(dP) = \int \prod_{i=1}^n (P(B_i^+) + P(B_i^-)) \prod_{i=1}^{2^m} \cdots
Also,

$$E_\mu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = \int \prod_{i=1}^{2^m} \cdots$$