POSTERIOR CONSISTENCY IN SOME BAYESIAN NONPARAMETRIC PROBLEMS

By

Srikanth K. Rajagopalan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1997

ABSTRACT

POSTERIOR CONSISTENCY IN SOME BAYESIAN NONPARAMETRIC PROBLEMS

By

Srikanth K. Rajagopalan

Issues regarding posterior consistency in Bayesian inference are of interest to frequentists as well as Bayesians. In this dissertation we study different notions of posterior consistency in some Bayesian nonparametric problems, using Dirichlet process and Polya tree process priors. The first part of the dissertation deals with the construction of priors (that yield consistent posteriors) for the class of all distributions symmetric about a point. We consider two natural methods of constructing priors for symmetric distributions, and study the priors obtained by the two methods using Dirichlet processes and Polya tree processes. The second part deals with the Bayesian analysis of right censored data under a nonparametric formulation. We study different Bayesian approaches to this problem with emphasis on the approaches of Susarla and Van Ryzin (1976) and Tsai (1986), who both use Dirichlet process priors. We establish posterior consistency for both approaches and also generalize some of the results to include Polya tree priors as well. The Bayesian analysis of interval censored data (again under a nonparametric formulation) is studied in the last part of the dissertation. This portion is rather tentative and we mainly highlight the difficulties in trying to adapt the approaches of Susarla and Van Ryzin (1976) and Tsai (1986) to this problem.

To Amma and Appa; Professor A. M. Goon; Arupda

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Professor R. V. Ramamoorthi, for his constant help, advice, encouragement, guidance, mentorship and extreme patience. His caring personality, friendly nature, and excellent sense of humour and wit made the whole doctoral experience enjoyable even during trying circumstances. I would also like to thank Professors James Hannan, Joseph Gardiner, V. Mandrekar and Habib Salehi for serving on my guidance committee, Professors Hannan and Gardiner for their encouragement, suggestions and many helpful conversations, and Professor V. Mandrekar for useful suggestions on improving the presentation. I would also like to thank Professor J. K. Ghosh for many informal discussions and suggestions. I cannot thank my parents and sisters enough for the support and encouragement provided by them during my entire student life. This has been the main motivating force behind all my endeavours and will always be fondly remembered and cherished. I would also like to thank Professor A. M. Goon for the care and interest he showed in my progress as a student during my undergraduate days, and for the encouragement to pursue graduate studies in Statistics. The help and guidance received from Dr. Arup Kr. Pal during my student days at the Indian Statistical Institute was instrumental in igniting my interest in Probability theory and eventually Mathematical Statistics.
Last, but not the least, I am thankful to my Beloved Lord, Bhagavan Sri Sathya Sai Baba, for all His Love and Grace, without which this would not have been possible.

[A major portion of this research was supported by the National Institutes of Health Grant 1 R01 GM49374.]

TABLE OF CONTENTS

0 An overview
1 Preliminaries
1.1 General Bayesian inference and posterior consistency
1.2 Probability measures on probability measures
1.3 Topologies on the space of probability measures
1.4 Convergence of probability measures and posterior consistency
1.5 Dirichlet processes
1.6 Polya tree processes
2 Polya Tree Priors for Symmetric Distributions
2.1 Introduction and Summary
2.2 Symmetrization using Polya tree processes
2.3 The posterior distribution and its consistency
3 Nonparametric Bayesian inference with right censored observations
3.1 Introduction and summary
3.2 Dirichlet process priors for F
3.3 Priors on the distribution of the observables
4 Nonparametric Bayesian inference with interval censored observations
4.1 Introduction and summary
4.2 Dirichlet process priors for F
4.3 Priors on the distribution of the observables
Bibliography

CHAPTER 0

An overview

In any statistical experiment, data is collected following a probability model with an unknown parameter $\theta$ lying in a parameter space $\Theta$. The problem of statistical inference deals with drawing meaningful conclusions about $\theta$, given the data. A Bayesian would use a prior probability measure on $\Theta$, representing her/his prior belief or opinion. Given the data, the posterior represents the updated belief/opinion for the Bayesian. Since all Bayes procedures are based on the posterior, it is quite natural to require that as more and more data become available, the posterior should concentrate more and more around the true parameter. This idea is formalized as the notion of posterior consistency, which has both Bayesian and frequentist interpretations. Priors that yield consistent posteriors ensure that the data eventually swamps the prior, and opinions based on very different priors will merge as the data accumulates. Doob (1948) proved a very general result on consistency, which guarantees that the posterior will be consistent for all $\theta$ except on a set of prior measure zero. When $\Theta$ is finite dimensional, Freedman (1963) and Schwartz (1965) show that under fairly general conditions the posterior is consistent at all $\theta$. Freedman (1963) also constructs an example which shows that posterior consistency will not always hold when $\Theta$ is the set of all probability measures on the space of positive integers.

Problems of statistical inference with an infinite dimensional parameter space are of great importance, both theoretically and practically. The Bayesian approach to such nonparametric problems requires the study of (prior and posterior) probability measures on the space of all probability distributions over a set. Freedman's (1963) example shows that posterior consistency may not always hold when $\Theta$ is infinite dimensional. Diaconis and Freedman (1986a) and the ensuing discussions highlight the need for a careful study of posterior consistency in nonparametric and semiparametric problems.
In this dissertation we focus on issues concerning different notions of posterior consistency in some nonparametric problems within a Bayesian formulation. Some of the problems that we study are made more complicated by the fact that we only have censored data. In Chapter 1, we begin with an introduction to general Bayesian inference and different notions of consistency. We then review and discuss some of the properties of two important families of priors used in Bayesian nonparametrics, namely the Dirichlet processes [Ferguson (1973)] and their generalization, the Polya tree processes [Mauldin et al. (1992), Lavine (1992, 1994)]. We also prove a convergence result for Dirichlet processes that enables us to establish a strong form of consistency for the posterior of a Dirichlet process.

Chapter 2 studies the problem of constructing a family of priors for problems where the parameter set is the space of all distributions symmetric about an arbitrary point on the real line, which we denote by $M_S(\mathbb{R})$. This problem has been studied by Dalal (1979), who constructs a class of priors using Dirichlet process priors, which has been used in the context of the location problem by Diaconis and Freedman (1986). We consider two natural methods of constructing a prior on $M_S(\mathbb{R})$, and study the behaviour of the posterior under the two methods, using both Dirichlet processes and Polya tree processes. We show that using appropriate Dirichlet processes the two methods yield the same prior on $M_S(\mathbb{R})$, while using appropriate Polya tree processes yields different priors on $M_S(\mathbb{R})$, unless the Polya tree processes being considered are Dirichlet processes. We also establish the posterior consistency for both the approaches.

In Chapter 3, we consider two different approaches to Bayesian inference with right censored data. Susarla and Van Ryzin (1976) first considered this problem in a Bayesian set-up by considering a Dirichlet process prior for $F$, the distribution function of interest. They obtain a Bayes estimate and show that this estimate converges to the usual product limit estimate of Kaplan and Meier (1958). Blum and Susarla (1977) complemented this result by proving that the posterior distribution given the right censored data is a mixture of Dirichlet processes. We show that the posterior can be represented as a Polya tree process, a representation which clarifies some of the calculations in Susarla and Van Ryzin (1976). Using this Polya tree representation for the posterior, we are then able to establish the posterior consistency for this approach. Yet another approach to Bayesian inference with right censored data is to consider priors for the observable random variables, as studied by Tsai (1986), who considers a Dirichlet process prior for the distribution of the observable random variables. Under this approach, using a result from Peterson (1977), we are able to establish consistency of the posterior for a wide class of priors.

Chapter 4 is somewhat tentative. Here we consider the Bayesian analysis of the interval censoring problem with a single inspection time. We began this study with the goal of obtaining a Bayesian interpretation of the well known Turnbull estimator (1976), which can also be thought of as the nonparametric maximum likelihood estimator (NPMLE). Similar to Chapter 3, here also we look at two different approaches.
We highlight the fact that approaches similar to the ones that yield interesting results in the right censoring problem do not yield interesting results in this case. In the first approach we consider a Dirichlet process prior for $F$, the distribution of interest, and study the limiting behaviour of the Bayes estimate. As pointed out in Wang (1993), the NPMLE is not necessarily the limit of the Bayes estimates. We present a set of examples which show that no obvious relationship connects the limiting Bayes estimate and the NPMLE. We also make an attempt to study consistency properties of the posterior when we consider priors for the distribution of the observable random variables. Unfortunately, the result that we have in this context, though mathematically nice, is not statistically very useful.

CHAPTER 1

Preliminaries

1.1 General Bayesian inference and posterior consistency

Consider a family of probability measures $\{Q_\theta : \theta \in \Theta\}$ on a measurable space $(\mathcal{X}, \mathcal{A})$. We view $(\Theta, \mathcal{B})$ as a measurable space such that $Q_\theta(A)$ is $\mathcal{B}$-measurable for every $A \in \mathcal{A}$. We write $Q_\theta^\infty$ for the product measure on $\mathcal{X}^\infty$ which makes the coordinate random variables $X_1, X_2, \dots$ independent with common distribution $Q_\theta$. In general, $\mathcal{X}$ and $\Theta$ are Borel subsets of complete separable metric spaces. (In this dissertation $\mathcal{X}$ will either be the real line or the positive half line, and $\Theta$ will be the set of all probability measures thereon.) Let $\mu$ be a prior probability measure on $\Theta$, and let $P_\mu$ denote the joint distribution of the parameter and the data:

$$P_\mu(B \times A) = \int_B Q_\theta^\infty(A)\,\mu(d\theta) \quad \text{for } B \in \mathcal{B} \text{ and } A \in \mathcal{A}^\infty.$$

The posterior is the $P_\mu$-distribution of the parameter $\theta$ given the data $X_1, X_2, \dots, X_n$, and is formally defined below. We denote this by $\mu_n(\cdot \mid X_1, X_2, \dots, X_n)$.

Definition 1.1.1 $\mu_n(\cdot \mid \cdot) : \mathcal{B} \times \mathcal{X}^n \to [0,1]$ is called a posterior distribution given $X_1, X_2, \dots, X_n$ if:

1. For each $(X_1, X_2, \dots, X_n) \in \mathcal{X}^n$, $\mu_n(\cdot \mid X_1, X_2, \dots, X_n)$ is a probability measure on $(\Theta, \mathcal{B})$.

2. For each $B \in \mathcal{B}$, $\mu_n(B \mid \cdot)$ is $\mathcal{A}^n$-measurable.

3. For every $B \in \mathcal{B}$ and $A \in \mathcal{A}^n$,

$$P_\mu^n(B \times A) = \int_A \mu_n(B \mid X_1, X_2, \dots, X_n)\,dP^n(X_1, X_2, \dots, X_n),$$

where $P_\mu^n(B \times A) = P_\mu(B \times (A \times \mathcal{X}^\infty))$ and $P^n(A) = P_\mu(\Theta \times A)$.

The posterior distribution is of course unique only up to $P^n$ null sets. In the situations we consider, there is a natural candidate for the posterior and we will generally refer to it as 'the posterior'. For the Bayesian, the posterior distribution encapsulates all that is known about $\theta$ following the observation of the data $X_1, X_2, \dots, X_n$, and one would want the posterior to concentrate around the true value of the parameter as more and more data become available. The main topic of study in this dissertation is the consistency property of the posterior sequence $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ in certain Bayesian nonparametric settings. The sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is said to be consistent at $\theta_0 \in \Theta$ if, whenever $\theta_0$ is the true value of the parameter $\theta$, as observations accumulate the effect of the prior diminishes and the posterior gets closer and closer to the 'true' prior $\delta_{\theta_0}$, the degenerate prior at $\theta_0$. (More formal definitions of posterior consistency will be mentioned later.) Posterior consistency has both Bayesian and frequentist interpretations, and for a detailed discussion of this notion of consistency, especially in a nonparametric set-up, the interested reader is referred to Diaconis and Freedman (1986a) [pages 3, 4, 10-20].
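Although Definition 1.1.1 makes no dominatedness assumption, it may help orientation to record the familiar explicit version available in the dominated case. The display below is a standard fact rather than anything specific to this dissertation; it assumes (an assumption not made elsewhere in this chapter) that each $Q_\theta$ has a density $q_\theta$ with respect to a fixed $\sigma$-finite measure $\lambda$:

$$\mu_n(B \mid X_1, \dots, X_n) = \frac{\int_B \prod_{i=1}^n q_\theta(X_i)\,\mu(d\theta)}{\int_\Theta \prod_{i=1}^n q_\theta(X_i)\,\mu(d\theta)}, \qquad B \in \mathcal{B}.$$

In the nonparametric settings studied here no such dominating $\lambda$ exists in general, which is why the abstract definition above is needed.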
1.2 Probability measures on probability measures

Throughout this dissertation $\mathbb{R}$ will denote the real line, $\mathcal{B}(\mathbb{R})$ will denote the Borel $\sigma$-algebra of $\mathbb{R}$, and $M(\mathbb{R})$ will denote the space of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Also, $\mathbb{R}^+$ will denote the positive half line, with $\mathcal{B}(\mathbb{R}^+)$ and $M(\mathbb{R}^+)$ having an analogous interpretation. On $M(\mathbb{R})$, we consider the smallest $\sigma$-algebra that makes the map $P \mapsto P(B)$ measurable for each Borel set $B \in \mathcal{B}(\mathbb{R})$. We denote this $\sigma$-algebra by $\mathcal{B}_M$, i.e. $\mathcal{B}_M = \sigma\{P(B) : B \in \mathcal{B}(\mathbb{R})\}$.

Since the elements of $M(\mathbb{R})$ are functions on $\mathcal{B}(\mathbb{R})$ taking values in $[0,1]$, $M(\mathbb{R})$ can be viewed as a subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$. If the product space $[0,1]^{\mathcal{B}(\mathbb{R})}$ is equipped with the product $\sigma$-algebra (the smallest $\sigma$-algebra that makes all the coordinate functions measurable), the restriction of this $\sigma$-algebra to $M(\mathbb{R})$ is $\mathcal{B}_M$. However, $M(\mathbb{R})$ is not a measurable subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$. Therefore one needs to be careful in constructing probability measures on $M(\mathbb{R})$. The following two theorems, implicit in Ferguson (1973) and mentioned in Ghosh and Ramamoorthi (1996-97), give a way of constructing and defining probability measures on $M(\mathbb{R})$.

Theorem 1.2.1 Suppose for each collection $\{B_1, B_2, \dots, B_k\}$ of subsets of $\mathbb{R}$, a distribution $\mu_{B_1,\dots,B_k}$ is assigned for $(P(B_1), \dots, P(B_k))$ such that:

1. If $\{A_1, A_2, \dots, A_l\} \subset \{B_1, B_2, \dots, B_k\}$, then the marginal distribution of $(P(A_1), \dots, P(A_l))$ derived from $\mu_{B_1,\dots,B_k}$ is $\mu_{A_1,\dots,A_l}$.

2. For every partition $\{B_1, B_2, \dots, B_k\}$ of $\mathbb{R}$, $\mu_{B_1,\dots,B_k}$ is a probability measure on $S_k = \{(p_1, \dots, p_k) : p_i \ge 0, \sum p_i = 1\}$, and further, if each $A_i$ is a union of sets from $\{B_1, B_2, \dots, B_k\}$, then $\mu_{A_1,\dots,A_n}$ is the distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$.

3. If $A_n \downarrow \emptyset$, then $P(A_n) \downarrow 0$ in distribution.

Then there exists a probability measure $\mu$ on $M(\mathbb{R})$ such that the distribution of $(P(B_1), \dots, P(B_k))$ under $\mu$ is $\mu_{B_1,\dots,B_k}$.

[The proof is taken from Ghosh and Ramamoorthi (1996-97), and is mentioned here for the sake of completeness.]

Proof: Using 1 and 2, it follows from Kolmogorov's consistency theorem that there exists a probability measure on $[0,1]^{\mathcal{B}(\mathbb{R})}$ with finite dimensional marginals given by $\mu_{B_1,B_2,\dots,B_k}$. Since $M(\mathbb{R})$ is not a measurable subset of $[0,1]^{\mathcal{B}(\mathbb{R})}$, it is not easy to show that this measure is supported by $M(\mathbb{R})$. So we take an indirect route. Let $\mathcal{F}$ be the set of all distribution functions on $\mathbb{R}$, and let $\mathcal{F}^*$ be the restriction of functions in $\mathcal{F}$ to a countable dense set $Q$, say the rationals. Then

$$\mathcal{F} = \{F : F \text{ is monotone, right continuous, } \lim_{t \to -\infty} F(t) = 0, \ \lim_{t \to \infty} F(t) = 1\}$$

and

$$\mathcal{F}^* = \{F : F \text{ is monotone, right continuous on } Q, \ \lim_{t \to -\infty} F(t) = 0, \ \lim_{t \to \infty} F(t) = 1\}.$$

Take any $t_1 < t_2 < \cdots < t_k$ in $Q$. Set the distribution of $(F(t_1), F(t_2), \dots, F(t_k))$ as the distribution of $(P(-\infty, t_1], P(-\infty, t_2], \dots, P(-\infty, t_k])$. This assignment gives a consistent specification and hence there exists a probability measure $\mu$ on $[0,1]^Q$ with these marginals. We now argue that $\mu(\mathcal{F}^*) = 1$. It is easy to see that for any fixed $t_1 < t_2$, $F(t_1) \le F(t_2)$ with $\mu$ probability 1. Since $Q$ is countable, $\mu\{F : F \text{ is monotone on } Q\} = 1$. Condition 3 gives that $F$ is right continuous on $Q$ with probability 1, so that $\mu(\mathcal{F}^*) = 1$. Let the map $\phi : \mathcal{F} \to \mathcal{F}^*$ be the restriction of $F \in \mathcal{F}$ to $Q$. Since this map is 1-1, onto and measurable, the probability on $\mathcal{F}^*$ can be transferred to a probability measure on $\mathcal{F}$. Under this measure, $(P(B_1), P(B_2), \dots, P(B_k))$ has the marginal distribution $\mu_{B_1,B_2,\dots,B_k}$ whenever $B_i$ is of the form $(-\infty, t_i]$, $t_i \in Q$.
A standard induction argument shows that the statement holds for all Borel sets. $\diamond$

Theorem 1.2.2 stated below shows that it is enough to specify $\mu_{B_1,\dots,B_k}$ for every partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$.

Theorem 1.2.2 Suppose the following two conditions hold:

1. For every finite partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$, $(P(B_1), \dots, P(B_k))$ has a distribution $\mu_{B_1,\dots,B_k}$ on $S_k$.

2. If $B_1, B_2, \dots, B_k$ and $A_1, A_2, \dots, A_n$ are two partitions of $\mathbb{R}$ such that each $A_i$ is a union of some $B_j$'s, then $\mu_{A_1,\dots,A_n}$ is the same as the $\mu_{B_1,\dots,B_k}$ distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$.

For any collection $A_1, A_2, \dots, A_n$ of subsets of $\mathbb{R}$, take any partition $B_1, B_2, \dots, B_k$ of $\mathbb{R}$ such that each $A_i$ is a union of some $B_j$'s, and define $\mu_{A_1,\dots,A_n}$ as the $\mu_{B_1,\dots,B_k}$ distribution of $(\sum_{B_i \subset A_1} P(B_i), \dots, \sum_{B_i \subset A_n} P(B_i))$. Then $\{\mu_{A_1,\dots,A_n} : A_i \in \mathcal{B}(\mathbb{R}), i = 1, 2, \dots, n; n = 1, 2, \dots\}$ satisfies condition 1 of Theorem 1.2.1.

Remarks: We will see later how Theorems 1.2.1 and 1.2.2 are used to define the most commonly mentioned prior in Bayesian nonparametrics, called the Dirichlet process. Another way of defining a probability measure on $M(\mathbb{R})$ is via probability measures on the space of all probability measures on sequences, and the Polya tree process discussed later is an example of one such prior.

1.3 Topologies on the space of probability measures

A major focus of this dissertation is on issues related to posterior consistency in nonparametric problems. Thus the parameter space is $M(\mathbb{R})$, and the sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is a sequence of probability measures on $M(\mathbb{R})$. Since the notion of consistency involves convergence of probability measures on $M(\mathbb{R})$, we next look at some of the commonly considered modes of convergence on $M(\mathbb{R})$, and later present the corresponding notions of convergence on the space of probability measures on $M(\mathbb{R})$.

Weak Topology: The first notion we look at is the weak topology arising from the usual weak convergence on $M(\mathbb{R})$. We recall that weak convergence on $M(\mathbb{R})$ is defined as follows: let $\{P_n\}_{n \ge 1}, P \subset M(\mathbb{R})$; $P_n$ is said to converge weakly (or in the weak topology) to $P$ if $\int f\,dP_n \to \int f\,dP$ for all bounded continuous functions $f$ on $\mathbb{R}$. For any $P_0 \in M(\mathbb{R})$, sets of the form

$$U_{P_0} = \Big\{P : \Big|\int f_i\,dP - \int f_i\,dP_0\Big| < \epsilon_i;\ i = 1, \dots, k\Big\},$$

where each $f_i$ is a bounded continuous function on $\mathbb{R}$, constitute a base of open neighbourhoods for $P_0$ under the weak topology. It is well known that under this topology $M(\mathbb{R})$ is a complete separable metric space, with $\mathcal{B}_M$ as its Borel $\sigma$-algebra [Parthasarathy (1967), Chapter II, Section 6].

Kolmogorov Metric: The Kolmogorov metric on $M(\mathbb{R})$ is defined as follows:

$$d_k(P, Q) = \sup_{t \in \mathbb{R}} |P(-\infty, t] - Q(-\infty, t]|.$$

Interest in this metric stems from the 1-1 correspondence between probability measures on $\mathbb{R}$ and cumulative distribution functions, and the Glivenko-Cantelli theorem on the convergence of empirical distribution functions. Under the metric $d_k$, $M(\mathbb{R})$ is neither separable nor complete.

Total Variation Metric: The total variation metric $d_t$ on $M(\mathbb{R})$ is defined as

$$d_t(P, Q) = \sup_{B \in \mathcal{B}(\mathbb{R})} |P(B) - Q(B)|.$$

This metric is uninteresting in the context of all of $M(\mathbb{R})$. However, when the parameter space is restricted to subsets of $M(\mathbb{R})$ of the form $L_1(\mu) = \{$all probability measures in $M(\mathbb{R})$ dominated by a $\sigma$-finite measure $\mu\}$, it is extremely useful and has a nice form. For $P, Q \in L_1(\mu)$,

$$d_t(P, Q) = \frac{1}{2}\int \Big|\frac{dP}{d\mu} - \frac{dQ}{d\mu}\Big|\,d\mu.$$

Further, $L_1(\mu)$ equipped with $d_t$ is a complete separable metric space.
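As a concrete illustration of the two metrics (a small sketch, not taken from the dissertation; the two distributions are arbitrary), the following computes $d_k$ and $d_t$ for a pair of discrete distributions on a common finite grid, where both suprema reduce to finite maxima:

```python
import numpy as np

# Two probability vectors on the common grid points 0, 1, 2, 3.
p = np.array([0.10, 0.40, 0.30, 0.20])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Kolmogorov distance: sup_t |P(-inf, t] - Q(-inf, t]|; on a finite
# grid the supremum is attained at one of the grid points.
d_k = np.max(np.abs(np.cumsum(p) - np.cumsum(q)))

# Total variation distance: sup_B |P(B) - Q(B)|, which for discrete
# P and Q equals half the L1 distance between the mass functions.
d_t = 0.5 * np.sum(np.abs(p - q))

print(d_k, d_t)  # 0.15 and 0.2; d_k <= d_t always, since (-inf, t] is a Borel set
```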
1.4 Convergence of probability measures and posterior consistency

As noted earlier, $M(\mathbb{R})$ when equipped with the weak convergence metric becomes a complete separable metric space with $\mathcal{B}_M$ as its Borel $\sigma$-algebra. Thus a natural topology on the space of probability measures on $M(\mathbb{R})$ is the weak topology arising from this metric on $M(\mathbb{R})$. A formal definition is given below.

Definition 1.4.1 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge weakly to a probability measure $\mu$ (on $M(\mathbb{R})$) if

$$\int \phi(P)\,d\mu_n(P) \to \int \phi(P)\,d\mu(P)$$

for all bounded continuous functions $\phi$ on $M(\mathbb{R})$, and we write $\mu_n \to^w \mu$ or $\mu_n \Rightarrow \mu$.

Under this convergence, the space of probability measures on $M(\mathbb{R})$ also becomes a complete separable metric space [Parthasarathy (1967), Chapter II, Section 6]. A detailed study of weak convergence requires an understanding of the continuous (in the weak topology) functions on $M(\mathbb{R})$. But we will mainly be interested in the case when $\mu = \delta_{P_0}$ for some $P_0 \in M(\mathbb{R})$. Since convergence in distribution of $\mu_n$ to $\delta_{P_0}$ is equivalent to convergence in probability of $P_n$ to $P_0$, where $P_n \sim \mu_n$, this convergence can be described in terms of the continuous functions on $\mathbb{R}$ rather than those on $M(\mathbb{R})$, as mentioned in the following proposition.

Proposition 1.4.1 $\mu_n \to^w \delta_{P_0}$ if $\mu_n(U_{P_0}) \to 1$ for all $U_{P_0}$ of the form $U_{P_0} = \{P : |\int f_i\,dP - \int f_i\,dP_0| < \epsilon_i;\ i = 1, \dots, k\}$, where each $f_i$ is a bounded continuous function on $\mathbb{R}$.

The non-separability of $M(\mathbb{R})$ with either the Kolmogorov metric $d_k$ or the total variation metric $d_t$ prevents the induction of a natural topology on $M(M(\mathbb{R}))$ when $M(\mathbb{R})$ is equipped with either $d_k$ or $d_t$. However, Proposition 1.4.1 still enables us to speak of 'convergence' of $\mu_n$ to $\delta_{P_0}$ in the sense that as $n \to \infty$, $\mu_n$ concentrates more and more around $P_0$; this is formalized in the definitions below.

Definition 1.4.2 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge to $\delta_{P_0}$ on uniform (total variation) neighbourhoods if $\mu_n(P : d_t(P, P_0) < \epsilon) \to 1$ for all $\epsilon > 0$, and we write $\mu_n \to^t \delta_{P_0}$.

Definition 1.4.3 A sequence of probability measures $\{\mu_n\}$ on $M(\mathbb{R})$ is said to converge to $\delta_{P_0}$ on k-neighbourhoods if $\mu_n(P : d_k(P, P_0) < \epsilon) \to 1$ for all $\epsilon > 0$, and we write $\mu_n \to^k \delta_{P_0}$.

Note: The last two notions of convergence provide for a stronger sense of convergence than the weak convergence of Definition 1.4.1 (and Proposition 1.4.1). Most of our discussion will focus on weak convergence and hence on Proposition 1.4.1. We will on occasion consider convergence in k-neighbourhoods. Convergence on uniform neighbourhoods will in general not be relevant to our discussion.

We now formally define the notion of posterior consistency under the same set-up mentioned in Section 1.1.

Definition 1.4.4 The sequence of posteriors $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\}_{n \ge 1}$ is said to be

1. weakly consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^w \delta_{P_0}$ a.s. $P_0$,

2. k-consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^k \delta_{P_0}$ a.s. $P_0$, and

3. t-consistent at $P_0$ if $\{\mu_n(\cdot \mid X_1, X_2, \dots, X_n)\} \to^t \delta_{P_0}$ a.s. $P_0$.

We end this section by mentioning a result which is used quite a lot in proving weak consistency of a sequence of posteriors. Throughout this dissertation, for any $\mu \in M(M(\mathbb{R}))$, $\bar\mu \in M(\mathbb{R})$ will denote the probability measure defined as follows: $\bar\mu(A) = E_\mu(P(A))$, for all $A \in \mathcal{B}(\mathbb{R})$.
Proposition 1.4.2 Let $\{\mu_n\}_{n \ge 1} \subset M(M(\mathbb{R}))$ be such that $\{\bar\mu_n\}_{n \ge 1}$ is tight as a family of probability measures on $\mathbb{R}$. Then $\{\mu_n\}_{n \ge 1}$ is a tight family of probability measures (with respect to weak convergence) on $M(\mathbb{R})$.

Proof: The proof is along the same lines as that of Theorem 3.1 of Sethuraman and Tiwari (1982), and is mentioned here for the sake of completeness. Fix $\epsilon > 0$. By the tightness of $\{\bar\mu_n\}_{n \ge 1}$, for every positive integer $d$ there exists a compact set $K_d$ in $\mathbb{R}$ such that $\sup_n \bar\mu_n(K_d^c) \le \frac{6\epsilon}{d^3\pi^2}$. For $d = 1, 2, \dots$, let $M_d = \{P \in M(\mathbb{R}) : P(K_d^c) \le \frac{1}{d}\}$, and let $M = \cap_d M_d$. Then by its very definition $M$ is a compact subset of $M(\mathbb{R})$ in the weak topology. Further, by Markov's inequality,

$$\mu_n(M_d^c) \le d\,E_{\mu_n}(P(K_d^c)) = d\,\bar\mu_n(K_d^c) \le \frac{6\epsilon}{d^2\pi^2}.$$

Hence, for any $n = 1, 2, \dots$, $\mu_n(M^c) \le \sum_d \frac{6\epsilon}{d^2\pi^2} = \epsilon$. By Theorem 6.7 on page 47 of Parthasarathy (1967), this proves that $\{\mu_n\}_{n \ge 1}$ is tight. $\diamond$

In the next two sections we introduce the two families of priors that are used in the problems considered in this dissertation, namely the Dirichlet processes and the Polya tree processes.

1.5 Dirichlet processes

Dirichlet processes were formally introduced by Ferguson (1973, 1974), who mentions many of their basic properties and applies them to a variety of nonparametric problems. In the process, a Bayesian interpretation for some of the commonly used nonparametric procedures was provided for the first time. Dirichlet processes arise naturally as an infinite dimensional analogue of the finite dimensional Dirichlet distribution, which itself is the multivariate generalization of the Beta distribution. Here we restrict ourselves to stating the definition and mentioning some of the basic properties. For a detailed account we refer the interested reader to Ferguson (1973, 1974), Schervish (1995), and Ghosh and Ramamoorthi (1997).

Definition 1.5.1 Let $\alpha$ be a finite non-null measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. A (prior) probability measure $\mathbb{P}$ on $M(\mathbb{R})$ is said to be a Dirichlet process with parameter (or base measure) $\alpha$ if, for every finite measurable partition $\{B_1, B_2, \dots, B_k\}$ of $\mathbb{R}$, the random vector $(P(B_1), P(B_2), \dots, P(B_k))$ has the Dirichlet distribution $\mathcal{D}(\alpha(B_1), \alpha(B_2), \dots, \alpha(B_k))$ under $\mathbb{P}$.

In particular, for any $A \in \mathcal{B}(\mathbb{R})$, $P(A)$ has the Beta distribution $B(\alpha(A), \alpha(\mathbb{R}) - \alpha(A))$ under $\mathbb{P}$. So $E_{\mathcal{D}(\alpha)}(P(A)) = \frac{\alpha(A)}{\alpha(\mathbb{R})}$ is the 'prior' guess for $P(A)$. We view the Dirichlet process as choosing a probability $P$ randomly according to $\mathcal{D}(\alpha)$, and write $P \in \mathcal{D}(\alpha)$. The existence of the Dirichlet process can be established using Theorems 1.2.1 and 1.2.2 mentioned earlier. A very clever and elegant construction of the Dirichlet process is given by Sethuraman (1994) and is mentioned in the next theorem. This construction gives an insight into some of the peculiarities of the Dirichlet process, and is an extremely useful tool for simulation purposes. We will make use of this construction in Chapter 3.

Theorem 1.5.1 Let $\alpha$ be a finite non-null measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $\{Y_n\}_{n \ge 1}$ be an i.i.d. sequence of random variables with $Y_1 \sim \bar\alpha$, let $\{\theta_n\}_{n \ge 1}$ be an i.i.d. sequence of random variables with $\theta_1 \sim \mathrm{Beta}(1, \alpha(\mathbb{R}))$, and let $\{Y_n\}_{n \ge 1}$ and $\{\theta_n\}_{n \ge 1}$ be independent. Define $p_1 = \theta_1$ and, for $n \ge 2$, $p_n = \theta_n \prod_{i=1}^{n-1}(1 - \theta_i)$. Then $\mathbb{P} = \sum_1^\infty p_n \delta_{Y_n}$ is a Dirichlet process with parameter $\alpha$.
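Sethuraman's construction translates directly into a simulation recipe: break a stick of unit length at $\mathrm{Beta}(1, \alpha(\mathbb{R}))$ fractions and plant the resulting weights at i.i.d. draws from $\bar\alpha$. The sketch below (an illustration, not code from the dissertation) draws an approximate realization by truncating the infinite sum at a fixed number of atoms; the truncation level, total mass $c = \alpha(\mathbb{R})$, and standard normal base measure are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dirichlet_process(c, base_sampler, n_atoms=500):
    """Approximate draw from D(alpha) with alpha(R) = c and normalized base
    measure given by base_sampler, via truncated stick-breaking."""
    thetas = rng.beta(1.0, c, size=n_atoms)            # theta_n ~ Beta(1, alpha(R))
    stick_left = np.cumprod(np.concatenate(([1.0], 1.0 - thetas[:-1])))
    weights = thetas * stick_left                      # p_n = theta_n * prod_{i<n}(1 - theta_i)
    atoms = base_sampler(n_atoms)                      # Y_n i.i.d. from alpha-bar
    return atoms, weights                              # P ~= sum_n weights[n] * delta_{atoms[n]}

atoms, weights = sample_dirichlet_process(5.0, lambda k: rng.standard_normal(k))
print(weights.sum())  # close to 1; the deficit is the truncated tail mass
```

That every realization is a countable mixture of point masses makes the almost sure discreteness recorded next transparent.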
Support [Ferguson (1974), Facts 2 and 3].

1. If $P \in \mathcal{D}(\alpha)$, then with probability one $P$ is discrete.

2. The topological support (that is, the smallest closed set with probability one) of $\mathcal{D}(\alpha)$, with respect to the topology of weak convergence, is the set of all distributions whose (topological) support is contained in the (topological) support of $\alpha$.

Thus, even though the measure theoretic support is 'small', the topological support is fairly large. For example, if the (topological) support of $\alpha$ is $\mathbb{R}$, then the (topological) support of $\mathcal{D}(\alpha)$ is all of $M(\mathbb{R})$, and $\mathcal{D}(\alpha)$ gives positive mass to every open set in $M(\mathbb{R})$.

Posterior Distribution [Ferguson (1973), Theorem 1]. Let $P \in \mathcal{D}(\alpha)$. If, given $P$, $X_1, X_2, \dots, X_n$ is a sample from $P$, then the posterior distribution of $P$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_1^n \delta_{X_i})$, where $\delta_x$ is the measure giving mass one to $x$. Thus, just like the (finite dimensional) Dirichlet distribution priors for the vector of proportions in a multinomial model, the Dirichlet processes provide a conjugate family of priors for $M(\mathbb{R})$.

Predictive Distribution and Bayes Estimates. Let $P \in \mathcal{D}(\alpha)$ and let $\bar\alpha(\cdot) = \frac{\alpha(\cdot)}{\alpha(\mathbb{R})}$. The Bayes estimate (w.r.t. squared error loss) of $P(A)$, given a sample $X_1, X_2, \dots, X_n$ from $P$, is

$$\hat F_n(A) = E(P(A) \mid X_1, X_2, \dots, X_n) = p_n \bar\alpha(A) + (1 - p_n) F_n(A),$$

where $F_n(\cdot)$ denotes the sample (empirical) distribution and $p_n = \frac{\alpha(\mathbb{R})}{\alpha(\mathbb{R}) + n}$. The Bayes estimate $\hat F_n$ is thus a linear combination of $\bar\alpha$ and the sample distribution function $F_n$. This Bayes estimate can also be looked upon as the 'predictive distribution' of a future observation given $X_1, X_2, \dots, X_n$.
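In code, the conjugate update is immediate: the posterior base measure is $\alpha + \sum_i \delta_{X_i}$, so the posterior mean CDF mixes the prior guess with the empirical CDF in proportion $p_n : (1 - p_n)$. A minimal sketch (the standard normal prior guess, the mass $c$, and the simulated data are all arbitrary assumptions made for illustration):

```python
import numpy as np
from scipy.stats import norm

def dp_posterior_mean_cdf(t, data, c, prior_cdf):
    """Bayes estimate E[P(-inf, t] | data] under a D(alpha) prior with
    alpha(R) = c and prior guess prior_cdf: p_n * prior + (1 - p_n) * empirical."""
    t = np.asarray(t, dtype=float)
    p_n = c / (c + len(data))
    empirical = np.mean(np.asarray(data)[None, :] <= t[:, None], axis=1)
    return p_n * prior_cdf(t) + (1.0 - p_n) * empirical

rng = np.random.default_rng(1)
data = rng.exponential(size=50)                 # pretend the 'true' P0 is Exp(1)
print(dp_posterior_mean_cdf([0.5, 1.0, 2.0], data, c=5.0, prior_cdf=norm.cdf))
```

As $n$ grows, $p_n \to 0$ and the estimate is dominated by the empirical CDF; this is the mechanism behind the consistency results that follow.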
Convergence of Dirichlet processes. The Dirichlet process possesses nice continuity properties with respect to the base measure $\alpha$. In Propositions 1.5.1 and 1.5.2 we mention two such properties; Proposition 1.5.1 is well known, while Proposition 1.5.2 is a new result. (Throughout, '$\Rightarrow$' will denote weak convergence and all convergences are as $n$ goes to $\infty$.) Also, we will write $\bar\alpha$ to denote the probability measure defined as $\bar\alpha(A) = E_{\mathcal{D}(\alpha)}(P(A))$, for all $A \in \mathcal{B}(\mathbb{R})$.

Proposition 1.5.1 Let $\alpha_n$, for $n = 1, 2, \dots$, be finite non-null measures on $\mathbb{R}$ such that $\bar\alpha_n \Rightarrow P_0$ (where $P_0 \in M(\mathbb{R})$) and $\alpha_n(\mathbb{R}) \to \infty$. Then $\mathcal{D}(\alpha_n) \Rightarrow \delta_{P_0}$.

[We mention the proof to illustrate the general principles behind the weak convergence results proved in this dissertation.]

Proof: By Proposition 1.4.2, since $\{\bar\alpha_n\}_{n \ge 1}$ is tight, $\{\mathcal{D}(\alpha_n)\}_{n \ge 1}$ is a tight family of probability measures. Let $f$ be a bounded continuous function on $\mathbb{R}$ with compact support. It is enough to show that $\mathcal{D}(\alpha_n)(V_{P_0}^\epsilon) \to 1$, where

$$V_{P_0}^\epsilon = \Big\{P : \Big|\int f\,dP - \int f\,dP_0\Big| < \epsilon\Big\}.$$

Since $f$ is bounded continuous with compact support, there exists a simple function $f_\delta = \sum_{i=1}^k a_i I_{A_i}$ such that the $A_i$'s are $P_0$-continuity sets and $\sup_x |f(x) - f_\delta(x)| < \frac{\epsilon}{3}$. Noting that

$$\Big|\int f\,dP - \int f\,dP_0\Big| \le \Big|\int f_\delta\,dP - \int f_\delta\,dP_0\Big| + \frac{2\epsilon}{3}, \qquad \int f_\delta\,dP = \sum_{i=1}^k a_i P(A_i),$$

our proof will be complete if we can show that $E_{\mathcal{D}(\alpha_n)}(P(A_i) - P_0(A_i))^2 \to 0$, and this follows from the fact that $\bar\alpha_n(A_i) \to P_0(A_i)$ and $E_{\mathcal{D}(\alpha_n)}(P(A_i))^2 \to (P_0(A_i))^2$. $\diamond$

Proposition 1.5.2 Let $\alpha_n$, for $n = 1, 2, \dots$, be finite non-null measures on $\mathbb{R}$ such that

1. $\alpha_n(\mathbb{R}) \to \infty$,

2. $\sup_{t \in \mathbb{R}} |\bar\alpha_n(-\infty, t] - P_0(-\infty, t]| \to 0$ and $\sup_{t \in \mathbb{R}} |\bar\alpha_n(-\infty, t) - P_0(-\infty, t)| \to 0$.

Then $\mathcal{D}(\alpha_n) \to^k \delta_{P_0}$.

Proof: For any $P \in M(\mathbb{R})$, let $P(t) = P(-\infty, t]$ and let $P(t-) = P(-\infty, t)$. We need to show that for any $\epsilon > 0$,

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \to 0.$$

Let $m$ be a fixed positive integer. Let $\psi(u) = \inf\{x : P_0(x) \ge u\}$, and let $x_{m,k} = \psi(k/m)$ for $k = 1, 2, \dots, m$. We observe that $P_0(\psi(u)-) \le u \le P_0(\psi(u))$, and hence $P_0(x_{m,1}-) \le 1/m$, $P_0(x_{m,m-1}) \ge 1 - 1/m$, and for $2 \le k \le m$, $P_0(x_{m,k}-) - P_0(x_{m,k-1}) \le 1/m$. Let $1 \le k \le m - 1$. For $x_{m,k-1} \le t < x_{m,k}$,

$$|P(t) - P_0(t)| \le |P(x_{m,k}-) - P_0(x_{m,k-1})| \vee |P(x_{m,k-1}) - P_0(x_{m,k}-)|,$$

and for $t \ge x_{m,m-1}$,

$$|P(t) - P_0(t)| \le (1 - P_0(x_{m,m-1})) \vee (1 - P(x_{m,m-1})).$$

Therefore $\sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \le B_m$, where

$$B_m = \max_k\{B_{m,k}\} \vee (1 - P_0(x_{m,m-1})) \vee (1 - P(x_{m,m-1})),$$

and $B_{m,k} = |P(x_{m,k}-) - P_0(x_{m,k-1})| \vee |P(x_{m,k-1}) - P_0(x_{m,k}-)|$. Hence

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \le \mathcal{D}(\alpha_n)(P : B_m \ge \epsilon).$$

Let $\delta > 0$, and let $N_m$ be such that, for all $n \ge N_m$,

$$\sup_{t \in \mathbb{R}} |\bar\alpha_n(t) - P_0(t)| < \frac{\delta\epsilon^2}{2m}, \qquad \sup_{t \in \mathbb{R}} |\bar\alpha_n(t-) - P_0(t-)| < \frac{\delta\epsilon^2}{2m}, \qquad \alpha_n(\mathbb{R}) > \frac{2m}{\delta\epsilon^2}.$$

By Markov's inequality, and our choice of the $x_{m,k}$'s,

$$\mathcal{D}(\alpha_n)(P : |P(x_{m,k}-) - P_0(x_{m,k-1})| \ge \epsilon) \le \frac{E_{\mathcal{D}(\alpha_n)}(P(x_{m,k}-) - P_0(x_{m,k-1}))^2}{\epsilon^2} \le \frac{1}{\epsilon^2}\Big[(P_0(x_{m,k}-) - P_0(x_{m,k-1}))^2 + \frac{\delta\epsilon^2}{2m}\Big] \le \frac{1}{\epsilon^2 m^2} + \frac{\delta}{2m}, \quad \text{for all } n \ge N_m.$$

The second inequality above follows from our choice of $N_m$ and from the fact that for any finite non-null measure $\alpha$ on $\mathbb{R}$ and $t \in \mathbb{R}$,

$$E_{\mathcal{D}(\alpha)}(P(t-)) = \bar\alpha(t-) \qquad \text{and} \qquad E_{\mathcal{D}(\alpha)}(P(t-))^2 = \bar\alpha(t-) \times \frac{\alpha(t-) + 1}{\alpha(\mathbb{R}) + 1}.$$

The remaining terms in $B_m$ are bounded in the same way. Hence

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \le \mathcal{D}(\alpha_n)(P : B_m \ge \epsilon) \le 2m\Big[\frac{1}{\epsilon^2 m^2} + \frac{\delta}{2m}\Big], \quad \text{for all } n \ge N_m.$$

Since $\delta > 0$ is arbitrary and the last mentioned inequality holds for all $m$,

$$\mathcal{D}(\alpha_n)\Big(P : \sup_{t \in \mathbb{R}} |P(t) - P_0(t)| \ge \epsilon\Big) \to 0. \quad \diamond$$

Posterior consistency: It is well known that a $\mathcal{D}(\alpha)$ prior leads to a posterior that is weakly consistent at all $P_0 \in M(\mathbb{R})$. (This fact follows on observing that the posterior for $\mathcal{D}(\alpha)$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_1^n \delta_{X_i})$, and then taking $\alpha_n = \alpha + \sum_1^n \delta_{X_i}$ in Proposition 1.5.1.) The next theorem mentions that in this case the posterior has the stronger k-consistency property.

Theorem 1.5.2 Let $P \in \mathcal{D}(\alpha)$, and given $P$, let $X_1, X_2, \dots, X_n$ be a sample from $P$. Then $\mathcal{D}(\alpha)(U_{P_0} \mid X_1, X_2, \dots, X_n) \to 1$ a.s. $P_0$, for all k-neighbourhoods $U_{P_0}$ of $P_0$.

Proof: We observe that the posterior for $\mathcal{D}(\alpha)$ given $X_1, X_2, \dots, X_n$ is $\mathcal{D}(\alpha + \sum_{i=1}^n \delta_{X_i})$. The proof now follows from Proposition 1.5.2 by taking $\alpha_n = \alpha + \sum_1^n \delta_{X_i}$ and a simple application of the Glivenko-Cantelli theorem. $\diamond$

Critics of the Dirichlet process point to the fact that, with probability one, the Dirichlet process selects a discrete distribution, as its major shortcoming. The Polya tree processes discussed in the next section are a family of workable priors which overcome this drawback of the Dirichlet process.

1.6 Polya tree processes

Polya tree priors (or Polya tree processes) are a generalization of Dirichlet processes, and share many of the properties of the Dirichlet processes. These processes are described through a large number of parameters, and a suitable choice of these parameters allows the statistician to overcome some of the shortcomings of the Dirichlet processes. Here also we mention only some of the basic properties; for a detailed account, the interested reader is referred to Lavine (1992, 1994), Mauldin et al. (1992), Schervish (1995), and Ghosh and Ramamoorthi (1997).

Let $B_\emptyset = \mathbb{R}$ and $\Pi = \{\pi_m ; m = 0, 1, \dots\}$, where $\pi_0, \pi_1, \dots$ is a sequence of partitions of $\mathbb{R}$ such that $\mathcal{B} = \sigma(\cup_0^\infty \pi_m)$ and such that every $B \in \pi_{m+1}$ is an interval and is obtained by splitting some $B' \in \pi_m$ into two pieces. Let $\pi_m = \{B_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}$, and let $B_{\epsilon_1,\dots,\epsilon_m 0} \in \pi_{m+1}$ and $B_{\epsilon_1,\dots,\epsilon_m 1} \in \pi_{m+1}$ be the two pieces into which $B_{\epsilon_1,\dots,\epsilon_m}$ is split.

Definition 1.6.1 A random probability measure $P$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is said to have a Polya tree distribution, or a Polya tree prior, with parameters $(\Pi, \alpha)$, and we write $P \in PT(\Pi, \alpha)$, if there exists a collection of non-negative numbers $\alpha = \{\alpha_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m;\ m = 1, 2, \dots\}$ such that the following hold:

1. $\{P(B_{\epsilon_1,\dots,\epsilon_m 0} \mid B_{\epsilon_1,\dots,\epsilon_m}) : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m;\ m = 1, 2, \dots\}$ are independent random variables.

2. $P(B_{\epsilon_1,\dots,\epsilon_m 0} \mid B_{\epsilon_1,\dots,\epsilon_m})$ has the Beta distribution $B(\alpha_{\epsilon_1,\dots,\epsilon_m 0}, \alpha_{\epsilon_1,\dots,\epsilon_m 1})$.
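To make the definition concrete, the sketch below (an illustration under the stated assumptions, not code from the dissertation) samples the level-$d$ masses of a Polya tree realization on $(0,1]$ with the dyadic partitions and $\alpha_{\epsilon_1,\dots,\epsilon_m} = m^2$, the parameter choice discussed shortly that makes the realization absolutely continuous with probability one:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_polya_tree_masses(depth):
    """Sample P(B_e) for all 2**depth dyadic intervals of (0,1] at level
    `depth`, with each split at level m drawn from Beta(m**2, m**2)."""
    masses = np.array([1.0])                              # level 0: P((0,1]) = 1
    for m in range(1, depth + 1):
        splits = rng.beta(m**2, m**2, size=masses.size)   # P(left child | parent)
        left, right = masses * splits, masses * (1.0 - splits)
        masses = np.column_stack((left, right)).ravel()   # interleave the children
    return masses   # masses[i] = P((i / 2**depth, (i + 1) / 2**depth])

masses = sample_polya_tree_masses(depth=10)
print(masses.sum())   # equals 1 up to rounding
# masses * 2**10 is a step-function approximation to the random density.
```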
Polya tree priors seem to have their origin in Blackwell (1973) and Ferguson (1974, page 620) (even though neither Blackwell nor Ferguson uses the phrase 'Polya tree priors'); recently, Lavine (1992, 1994) and Mauldin et al. (1992) investigated some of their interesting properties and set the course for their use in Bayesian analysis.

Existence. The existence of Polya tree processes can be shown by first realizing the process as a prior on the space of probabilities on the sequence space $\{0,1\}^{\mathbb{N}}$ and then transferring it to $M(\mathbb{R})$. A more elegant way is to use de Finetti's theorem. We refer the reader to Mauldin et al. (1992) for a discussion of these issues.

Support. The support of a Polya tree process is controlled by the choice of the parameters $\alpha$ and, of course, the partitions $\Pi$. Mauldin et al. (1992) give sufficient conditions for the Polya tree prior to give mass one to the space of all continuous probability distributions. If for simplicity we consider the Polya tree prior for $(0,1]$ with $\pi_m = \{(\frac{i-1}{2^m}, \frac{i}{2^m}] : i = 1, \dots, 2^m\}$, the set of all dyadic intervals of length $\frac{1}{2^m}$, and take $\alpha_{\epsilon_1,\dots,\epsilon_m} = m^2$, the resulting Polya tree will be absolutely continuous with probability one. This feature of Polya tree priors makes them more attractive as priors, especially in the context of density estimation problems. Lavine (1992, 1994) has a discussion of the implications and interpretations of various choices of the partitions $\Pi$ and the non-negative numbers $\alpha$.

From now on, to avoid cumbersome notation, we will write $B_\epsilon$ for $B_{\epsilon_1,\dots,\epsilon_m}$ and $\alpha_\epsilon$ for $\alpha_{\epsilon_1,\dots,\epsilon_m}$, unless it is very important to write otherwise.

Connection to Dirichlet process [Lavine (1994), Fact 2]. The Polya tree prior is a generalization of the Dirichlet process in the sense explained below.

a) A Dirichlet process $\mathcal{D}(\alpha)$ is a Polya tree w.r.t. any sequence of partitions $\Pi$, with $\alpha_\epsilon = \alpha(B_\epsilon)$ for all $B_\epsilon \in \Pi$.

b) A Polya tree $PT(\Pi, \alpha)$ is a Dirichlet process if $\alpha_\epsilon = \alpha_{\epsilon 0} + \alpha_{\epsilon 1}$ for all possible values of $\epsilon$. The parameter $\alpha$ of the associated Dirichlet process is specified as $\alpha(B_\epsilon) = \alpha_\epsilon$.

Posterior Distribution [Mauldin et al. (1992), Theorem 4.3]. Let $P \in PT(\Pi, \alpha)$ and, given $P$, let $X_1, \dots, X_n$ be a sample from $P$. Then the posterior distribution of $P$ given $X_1, \dots, X_n$ is $PT(\Pi, \alpha_{X_1,\dots,X_n})$, where each $\alpha_\epsilon$ in $\alpha$ is replaced by $\alpha_\epsilon + \sum_1^n I[X_i \in B_\epsilon]$ in $\alpha_{X_1,\dots,X_n}$. Thus the Polya tree priors form a conjugate family of priors.

Posterior distribution given incomplete/partial observations [Lavine (1994), page 1223]. One feature of interest to us is the fact that a Polya tree process permits easy posterior updating even in the presence of partial information. More precisely, let $P \in PT(\Pi, \alpha)$ and, given $P$, let $X_1, \dots, X_n$ be a sample from $P$. Then the posterior distribution of $P$ given $\{X_1 \in B_{\epsilon^1}, \dots, X_n \in B_{\epsilon^n}\}$ is again a Polya tree with respect to $\Pi$, with $\alpha_\epsilon$ changing to $\alpha_\epsilon + \sum_1^n I\{B_{\epsilon^i} \subset B_\epsilon\}$. [Remark: In the case where we have some observations fully specified and some partially specified, the updating for the posterior is first done for the fully specified observations and then for the partially specified observations in an obvious way.]

Bayes Estimates. Let $P \in PT(\Pi, \alpha)$. Then the Bayes estimate (w.r.t. squared error loss) of $P(B_{\epsilon_1,\dots,\epsilon_m})$ given a sample $X_1, X_2, \dots, X_n$ from $P$ is

$$E(P(B_{\epsilon_1,\dots,\epsilon_m}) \mid X_1, \dots, X_n) = \prod_{i=1}^m \frac{\alpha_{\epsilon_1,\dots,\epsilon_i} + \sum_{j=1}^n I[X_j \in B_{\epsilon_1,\dots,\epsilon_i}]}{\alpha_{\epsilon_1,\dots,\epsilon_{i-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{i-1} 1} + \sum_{j=1}^n I[X_j \in B_{\epsilon_1,\dots,\epsilon_{i-1}}]}.$$

As with the Dirichlet process, here also, if the $\alpha_\epsilon$'s are small (compared to $n$), the Bayes estimate is close to the sample distribution function. This expression for the Bayes estimate also describes the predictive distribution of a future observation given $X_1, X_2, \dots, X_n$. Details of this can be found in Mauldin et al. (1992).
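The conjugate update is purely combinatorial: every $\alpha_\epsilon$ is incremented by the number of observations known to lie in $B_\epsilon$. A minimal sketch for the dyadic tree on $(0,1]$ (the dictionary representation of $\alpha$, the default prior $\alpha_\epsilon = m^2$, and the data are all illustrative assumptions):

```python
import numpy as np

def polya_tree_posterior(alpha, data, depth):
    """Update Polya tree parameters on (0,1] with dyadic partitions:
    alpha_eps <- alpha_eps + #{j : X_j in B_eps} for all eps up to `depth`.
    `alpha` maps binary strings eps to alpha_eps; missing entries default
    to the prior value m**2 at level m."""
    post = dict(alpha)
    for x in data:
        for m in range(1, depth + 1):
            # The level-m dyadic interval containing x has index floor(x * 2^m).
            eps = format(int(x * 2**m), f"0{m}b")
            post[eps] = post.get(eps, m**2) + 1
    return post

rng = np.random.default_rng(3)
post = polya_tree_posterior({}, rng.uniform(size=20), depth=4)
print(post["0"], post["1"])   # prior value 1 plus the counts in (0,1/2] and (1/2,1]
```

A partially observed point reported only as $X_j \in B_{\epsilon^j}$ would, per the updating rule above, increment only the nodes $B_\epsilon \supset B_{\epsilon^j}$, i.e. the prefixes of $\epsilon^j$.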
Posterior consistency. Calculations similar to those carried out for the Dirichlet process show that the Polya tree priors also lead to posteriors that are weakly consistent at all $P_0 \in M(\mathbb{R})$. But, unlike the Dirichlet process, the stronger k-consistency need not hold for the Polya tree priors.

Remarks: The properties of Dirichlet processes and Polya tree processes on $M(\mathbb{R})$ that have been mentioned in the last two sections have obvious extensions to $M(\mathbb{R}^+)$ and $M(\mathbb{R}^+ \times \{0,1\})$, the two other spaces discussed in this dissertation.

CHAPTER 2

Polya Tree Priors for Symmetric Distributions

2.1 Introduction and Summary

In many semi-parametric inference problems within a Bayesian formulation, identifiability conditions require the Bayesian to consider priors on the class of distributions symmetric around an arbitrary point on the real line. A typical example is the location problem. Diaconis and Freedman (1986a, 1986b) consider the location problem using symmetrized Dirichlet process priors. The first paper and the subsequent discussions provide a good summary of the basic issues involved in such problems, and also elaborate on the need for families of rich priors on the class of symmetric distributions. More recently, Ghosal et al. (1996) consider the same problem using symmetrized Polya tree priors on distributions with symmetric densities. Dalal (1979) constructs a class of priors which are invariant under a finite group of transformations, using the Dirichlet process priors, and calls them Dirichlet Invariant processes. In this chapter, we study priors on the class of symmetric distributions using the Polya tree processes. We consider two natural methods (discussed below) of constructing priors on the class of symmetric distributions and compare the two methods using Dirichlet processes and Polya tree processes.

A prior $\mathbb{P}$ on the class of all symmetric distributions on $\mathbb{R}$, denoted by $M_S(\mathbb{R})$, can be constructed in two natural ways; a small numerical sketch of the two constructions is given at the end of this section.

Method 1. For any prior (say $\Lambda_1$) on $M(\mathbb{R})$, the map $P \mapsto P_f$, defined by

$$P_f(A) = \frac{P(A) + P(-A)}{2}, \quad \text{for } A \in \mathcal{B}(\mathbb{R}),$$

induces a prior on $M_S(\mathbb{R})$.

Method 2. For any prior (say $\Lambda_2$) on $M(\mathbb{R}^+)$, the map $P \mapsto P_s$, defined by

$$P_s(A) = \frac{P(A \cap \mathbb{R}^+)}{2} + \frac{P(-(A \cap \mathbb{R}^-))}{2}, \quad \text{for } A \in \mathcal{B}(\mathbb{R}),$$

induces a prior on $M_S(\mathbb{R})$.

Dalal (1979) looks at symmetrization using Method 1, with $\Lambda_1 = \mathcal{D}(\alpha)$, where $\alpha$ is a symmetric measure on $\mathbb{R}$. Using the transformation invariance property of Dirichlet processes, it can be verified [Hannum and Hollander (1983), Theorem 2.1] that with $\Lambda_2 = \mathcal{D}(2\alpha^+)$, the Method 2 symmetrization is equivalent to the (Method 1) symmetrization considered by Dalal (1979). In the next section we look at the two methods of symmetrization using analogous Polya tree priors and show that the two methods yield the same prior iff the Polya tree processes being considered are Dirichlet processes. In the last section we consider (a version of) the posterior distributions under the two methods and establish the weak consistency for the sequences of posterior distributions.
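On discrete realizations, both maps amount to mirroring the atoms about 0 and halving the weights, applied to a draw on $\mathbb{R}$ for Method 1 and to a draw supported on $\mathbb{R}^+$ for Method 2. The following sketch (illustrative only; crude finite-atom stand-ins replace actual prior draws) makes this explicit:

```python
import numpy as np

rng = np.random.default_rng(4)

def symmetrize(atoms, weights):
    """Mirror a discrete measure about 0 and halve the weights. Applied to a
    measure on R this realizes P_f (Method 1); applied to a measure on R+
    it realizes P_s (Method 2)."""
    return np.concatenate((atoms, -atoms)), np.concatenate((weights, weights)) / 2.0

# Method 1: fold a 5-atom stand-in for a draw from a prior on M(R).
f_atoms, f_weights = symmetrize(rng.standard_normal(5), rng.dirichlet(np.ones(5)))

# Method 2: reflect a 5-atom stand-in for a draw from a prior on M(R+).
s_atoms, s_weights = symmetrize(rng.exponential(size=5), rng.dirichlet(np.ones(5)))
# Both results place equal mass at x and -x, hence are symmetric about 0.
```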
2.2 Symmetrization using Polya tree processes

In this section we study the two methods of symmetrization using Polya trees that can be considered analogous to $\mathcal{D}(\alpha)$ and $\mathcal{D}(2\alpha^+)$. With this in mind, we now introduce notation that is crucial to our construction and results. Let

$$\pi_m^+ = \{B^+_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\},$$

where $\{B^+_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}$ is a partition of $\mathbb{R}^+$. Let $B^-_{\epsilon_1,\dots,\epsilon_m} = -B^+_{\epsilon_1,\dots,\epsilon_m}$, and let

$$\pi_m^- = \{B^-_{\epsilon_1,\dots,\epsilon_m} : \epsilon_j = 0 \text{ or } 1 \text{ for } j = 1, \dots, m\}.$$

Let $\Pi^+ = \{\pi_m^+ : m = 1, 2, \dots\}$ and $\Pi^- = \{\pi_m^- : m = 1, 2, \dots\}$, and let $\Pi = \Pi^+ \cup \Pi^-$.

In Method 1 we take $\Lambda_1 = PT(\Pi, \alpha)$, where $\alpha$ is a symmetric collection, i.e. under $PT(\Pi, \alpha)$, $P(B^+_{\epsilon 0} \mid B^+_\epsilon)$ and $P(B^-_{\epsilon 0} \mid B^-_\epsilon)$ both have the $\mathrm{Beta}(\alpha_{\epsilon 0}, \alpha_{\epsilon 1})$ distribution. In Method 2 we take $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$, where $2\alpha^+$ has an obvious interpretation. In the remainder of this chapter, $\Lambda_1$ will always represent $PT(\Pi, \alpha)$, and $\Lambda_2$ will always represent $PT(\Pi^+, 2\alpha^+)$.

Theorem 2.2.1 The priors induced on $M_S(\mathbb{R})$ by Method 1 (using $\Lambda_1 = PT(\Pi, \alpha)$) and Method 2 (using $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$) are the same if and only if

$$\alpha_{\epsilon_1,\dots,\epsilon_m} = \alpha_{\epsilon_1,\dots,\epsilon_m 0} + \alpha_{\epsilon_1,\dots,\epsilon_m 1}, \quad \text{for all } (\epsilon_1,\dots,\epsilon_m) \in \{0,1\}^m;\ m = 1, 2, \dots,$$

and hence for Polya tree processes the two methods yield the same prior if and only if the process is a Dirichlet process. Before the proof, a quick numerical check of the moment computation that drives the 'only if' part is sketched below.
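The base case of the 'only if' argument matches the second moment of $P(B_0^+) + P(B_0^-)$ under Method 1 against the second moment of the corresponding mass under Method 2. Using the closed-form Beta moments recalled at the start of the proof, this can be verified numerically in exact rational arithmetic (an illustrative sketch; the parameter values are arbitrary):

```python
from fractions import Fraction as F

def beta_mean(a, b):        # E[Y] for Y ~ Beta(a, b)
    return F(a, a + b)

def beta_second(a, b):      # E[Y^2] for Y ~ Beta(a, b)
    return F(a * (a + 1), (a + b) * (a + b + 1))

def method1_second_moment(a, a0, a1):
    """E[(X Y1 + (1 - X) Y2)^2] with X ~ Beta(a, a), Y1, Y2 ~ Beta(a0, a1),
    all independent: this is E_{Lambda_1}[P(B_0^+) + P(B_0^-)]^2."""
    ex2 = beta_second(a, a)                    # E[X^2] = E[(1 - X)^2]
    exx = beta_mean(a, a) - ex2                # E[X(1 - X)]
    return 2 * ex2 * beta_second(a0, a1) + 2 * exx * beta_mean(a0, a1) ** 2

def method2_second_moment(a0, a1):
    """E[Z^2] with Z ~ Beta(2 a0, 2 a1): this is E_{Lambda_2}[P(B_0^+)]^2."""
    return beta_second(2 * a0, 2 * a1)

a0, a1 = 2, 3
print(method1_second_moment(a0 + a1, a0, a1) == method2_second_moment(a0, a1))  # True
print(method1_second_moment(7, a0, a1) == method2_second_moment(a0, a1))        # False
```

The moments agree exactly when $a = a_0 + a_1$ and disagree otherwise, in line with the theorem.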
Proof: The proof is by the principle of mathematical induction and uses elementary properties of the Beta distribution. We recall that if $X \sim \mathrm{Beta}(a, b)$, then

$$E(X) = \frac{a}{a+b}, \qquad E(X^2) = \frac{a(a+1)}{(a+b)(a+b+1)}.$$

If part: If the condition holds, then $\Lambda_1 = \mathcal{D}(\alpha)$, with the measure $\alpha$ given by $\alpha(B_\epsilon) = \alpha_\epsilon$ for all $B_\epsilon \in \Pi$, and $\Lambda_2 = \mathcal{D}(2\alpha^+)$. The result now follows from the remarks made in the last section about symmetrization using Dirichlet process priors.

Only if part: If the priors induced by the two methods on $M_S(\mathbb{R})$ are the same, then

$$E_{\Lambda_1}[P(B^+_{\epsilon_1,\dots,\epsilon_n}) + P(B^-_{\epsilon_1,\dots,\epsilon_n})]^2 = E_{\Lambda_2}[P(B^+_{\epsilon_1,\dots,\epsilon_n})]^2 \quad \text{for all } (\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n;\ n = 1, 2, \dots$$

To avoid trivialities, we will assume that $\alpha_{\epsilon_1,\dots,\epsilon_n} > 0$ for all $(\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n$. Then

$$E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = E[X_1 Y_1 + (1 - X_1) Y_2]^2,$$

where $X_1 \sim \mathrm{Beta}(a, a)$, $Y_i \sim \mathrm{Beta}(a_0, a_1)$ $(i = 1, 2)$, and $\{X_1, Y_1, Y_2\}$ are independent. Hence

$$E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = 2\Big[\frac{a(a+1)}{2a(2a+1)} \times \frac{a_0(a_0+1)}{(a_0+a_1)(a_0+a_1+1)} + \frac{a \cdot a}{2a(2a+1)}\Big(\frac{a_0}{a_0+a_1}\Big)^2\Big] = \frac{1}{2a+1} \cdot \frac{a_0}{a_0+a_1}\Big[\frac{(a+1)(a_0+1)}{a_0+a_1+1} + \frac{a\,a_0}{a_0+a_1}\Big].$$

Similarly,

$$E_{\Lambda_2}[P(B_0^+)]^2 = \frac{2a_0(2a_0+1)}{(2a_0+2a_1)(2a_0+2a_1+1)}.$$

Therefore, $E_{\Lambda_1}[P(B_0^+) + P(B_0^-)]^2 = E_{\Lambda_2}[P(B_0^+)]^2$ iff

$$\frac{(a+1)(a_0+1)}{a_0+a_1+1} + \frac{a\,a_0}{a_0+a_1} = \frac{(2a_0+1)(2a+1)}{2a_0+2a_1+1},$$

which in turn holds iff $a = a_0 + a_1$.

Let

$$\alpha_{\epsilon_1,\dots,\epsilon_j} = \alpha_{\epsilon_1,\dots,\epsilon_j 0} + \alpha_{\epsilon_1,\dots,\epsilon_j 1}, \quad \text{for } (\epsilon_1,\dots,\epsilon_j) \in \{0,1\}^j;\ j = 1, 2, \dots, (n-1).$$

Then we will show that

$$\alpha_{\epsilon_1,\dots,\epsilon_n} = \alpha_{\epsilon_1,\dots,\epsilon_n 0} + \alpha_{\epsilon_1,\dots,\epsilon_n 1}, \quad \text{for } (\epsilon_1,\dots,\epsilon_n) \in \{0,1\}^n,$$

by equating $E_{\Lambda_1}[P(B^+_{\epsilon_1,\dots,\epsilon_n 0}) + P(B^-_{\epsilon_1,\dots,\epsilon_n 0})]^2$ and $E_{\Lambda_2}[P(B^+_{\epsilon_1,\dots,\epsilon_n 0})]^2$. Writing $\epsilon$ for $\epsilon_1, \dots, \epsilon_n$,

$$E_{\Lambda_1}[P(B^+_{\epsilon 0}) + P(B^-_{\epsilon 0})]^2 = 2\Big[\frac{a(a+1)}{2a(2a+1)} \prod_{j=1}^n \frac{\alpha_{\epsilon_1,\dots,\epsilon_j}(\alpha_{\epsilon_1,\dots,\epsilon_j}+1)}{(\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1})(\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1} + 1)} \times \frac{\alpha_{\epsilon 0}(\alpha_{\epsilon 0}+1)}{(\alpha_{\epsilon 0}+\alpha_{\epsilon 1})(\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1)} + \frac{a \cdot a}{2a(2a+1)} \prod_{j=1}^n \Big(\frac{\alpha_{\epsilon_1,\dots,\epsilon_j}}{\alpha_{\epsilon_1,\dots,\epsilon_{j-1} 0} + \alpha_{\epsilon_1,\dots,\epsilon_{j-1} 1}}\Big)^2 \Big(\frac{\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}}\Big)^2\Big]$$

$$= \frac{2}{2a(2a+1)}\Big[\frac{\alpha_\epsilon(\alpha_\epsilon+1)\,\alpha_{\epsilon 0}(\alpha_{\epsilon 0}+1)}{(\alpha_{\epsilon 0}+\alpha_{\epsilon 1})(\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1)} + \Big(\frac{\alpha_\epsilon\,\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}}\Big)^2\Big].$$

(The second equality follows from the above hypothesis, under which the products telescope.) Similarly,

$$E_{\Lambda_2}[P(B^+_{\epsilon 0})]^2 = \frac{2\alpha_\epsilon(2\alpha_\epsilon+1)}{(2a_0+2a_1)(2a_0+2a_1+1)} \times \frac{2\alpha_{\epsilon 0}(2\alpha_{\epsilon 0}+1)}{(2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1})(2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1}+1)}.$$

Using the above hypothesis again, we can conclude that $E_{\Lambda_1}[P(B^+_{\epsilon 0}) + P(B^-_{\epsilon 0})]^2 = E_{\Lambda_2}[P(B^+_{\epsilon 0})]^2$ iff

$$\frac{(\alpha_\epsilon+1)(\alpha_{\epsilon 0}+1)}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}+1} + \frac{\alpha_\epsilon\,\alpha_{\epsilon 0}}{\alpha_{\epsilon 0}+\alpha_{\epsilon 1}} = \frac{(2\alpha_\epsilon+1)(2\alpha_{\epsilon 0}+1)}{2\alpha_{\epsilon 0}+2\alpha_{\epsilon 1}+1},$$

which in turn holds iff $\alpha_{\epsilon_1,\dots,\epsilon_n} = \alpha_{\epsilon_1,\dots,\epsilon_n 0} + \alpha_{\epsilon_1,\dots,\epsilon_n 1}$. $\diamond$

2.3 The posterior distribution and its consistency

We observe that there is a 1-1 correspondence between $M_S(\mathbb{R})$ and $M(\mathbb{R}^+)$, and we will make use of this correspondence in the remainder of this section. With this in mind, we briefly review the properties of this correspondence that are relevant to our discussion. Let $\phi : M_S(\mathbb{R}) \to M(\mathbb{R}^+)$ be defined as $\phi(P)(A) = 2P(A)$, for $A \in \mathcal{B}(\mathbb{R}^+)$. $\phi$ is 1-1 and onto. We will on occasion write $P^+$ for $\phi(P)$ in the remainder of this section. Let $\mu$ be a (prior) probability measure on $M_S(\mathbb{R})$; then the map $\phi$ induces the prior probability measure $\mu\phi^{-1}$ on $M(\mathbb{R}^+)$. The following propositions summarize the important consequences of using the map $\phi$, and consider the following set-up: let $P \sim \mu$, and given $P$, let $\{X_1, X_2, \dots, X_n\}$ be i.i.d. $P$.

Proposition 2.3.1 The posterior distribution of $P$ given $\{X_1, X_2, \dots, X_n\}$ is the same as the conditional distribution of $\mu$ given $\{|X_1|, |X_2|, \dots, |X_n|\}$, i.e.

$$\mu(\cdot \mid X_1, X_2, \dots, X_n) = \mu(\cdot \mid |X_1|, |X_2|, \dots, |X_n|).$$

[This follows from the fact that $\{|X_1|, |X_2|, \dots, |X_n|\}$ is a sufficient statistic for symmetric distributions.]

Proposition 2.3.2 $\mu(\cdot \mid |X_1|, |X_2|, \dots, |X_n|) = \mu\phi^{-1}(\phi(\cdot) \mid |X_1|, |X_2|, \dots, |X_n|)$.

Proof: For notational convenience we will consider the case $n = 1$. We need to show that for any $B \in \mathcal{B}(\mathbb{R}^+)$, the measures $\mu_1$ and $\nu_1$ on $M_S(\mathbb{R})$, defined as

$$\mu_1(C) = \int_B \mu\phi^{-1}(\phi(C) \mid |x|)\,\bar\mu(dx), \qquad \nu_1(C) = \int_C 2P(B)\,\mu(dP),$$

are the same, where $C \subset M_S(\mathbb{R})$ and $\bar\mu(A) = E_\mu(P(|X| \in A))$. Now,

$$\nu_1(C) = \int_C 2P(B)\,\mu(dP) = \int_C \phi(P)(B)\,\mu(dP) = \int_{\phi(C)} P(B)\,\mu\phi^{-1}(dP) = \int_B \mu\phi^{-1}(\phi(C) \mid |x|)\,\bar\mu(dx) = \mu_1(C).$$

(The third equality above follows from the change of variable theorem, and the fourth equality follows from the definition of conditional distributions.) $\diamond$

Proposition 2.3.3 Let $P_0 \in M_S(\mathbb{R})$, and let $\{\mu_n\}_{n \ge 1}$ be probability measures on $M_S(\mathbb{R})$. Then $\mu_n \Rightarrow \delta_{P_0}$ iff $\mu_n\phi^{-1} \Rightarrow \delta_{P_0^+}$.

Proof: For any bounded continuous function $f$ on $\mathbb{R}$, and any $P \in M_S(\mathbb{R})$, we observe that

$$\int_{\mathbb{R}} f(x)\,dP(x) = \int_{\mathbb{R}^+} \frac{f(x) + f(-x)}{2}\,dP^+(x) = \int_{\mathbb{R}^+} f^*(x)\,dP^+(x),$$

where $f^*(x) = \frac{f(x) + f(-x)}{2}$ is bounded and continuous on $\mathbb{R}^+$. Similarly, we can show that for any bounded continuous function $f^+$ on $\mathbb{R}^+$, there exists a bounded continuous function $f$ on $\mathbb{R}$ such that $\int_{\mathbb{R}^+} f^+(x)\,dQ(x) = \int_{\mathbb{R}} f(x)\,d(\phi^{-1}Q)(x)$. Therefore, with $\{P_n\}_{n \ge 1}, P \subset M_S(\mathbb{R})$, $P_n \Rightarrow P$ iff $\phi(P_n) \Rightarrow \phi(P)$, and hence $\mu_n \Rightarrow \delta_{P_0}$ iff $\mu_n\phi^{-1} \Rightarrow \delta_{P_0^+}$. $\diamond$

In view of Propositions 2.3.1, 2.3.2, and 2.3.3 (and for ease of notation), we will study the priors (and posteriors) induced by Method 1 and Method 2 on $M(\mathbb{R}^+)$, rather than on $M_S(\mathbb{R})$. We will see that even though the two methods are not always equivalent, the posteriors are still weakly consistent for both the methods. We recall that in Method 1 the prior on $M(\mathbb{R}^+)$ is $\Lambda_1 \circ g^{-1}$, where $(g \circ P)(A) = P(A) + P(-A)$, and in Method 2 the prior on $M(\mathbb{R}^+)$ is $\Lambda_2$, with $\Lambda_1 = PT(\Pi, \alpha)$ and $\Lambda_2 = PT(\Pi^+, 2\alpha^+)$. The next proposition follows from the properties of Polya tree priors mentioned in Chapter 1.
Proposition 2.3.4 Let $P \sim \Lambda_2$, and given $P$, let $\{T_1, T_2, \dots, T_n\}$ be a random sample from $P$. The posterior distribution of $P$ given $\{T_1, T_2, \dots, T_n\}$ is $PT(\Pi^+, 2\alpha^+_{T_1,T_2,\dots,T_n})$, and further the posterior is consistent at all $Q_0 \in M(\mathbb{R}^+)$.

Thus, in view of the comments made earlier, the Method 2 symmetrization using Polya tree priors yields (weakly) consistent posteriors. The more interesting result is that the Method 1 symmetrization also yields a (weakly) consistent sequence of posteriors. To establish this we first need to consider (a version of) the posterior distribution under Method 1.

Theorem 2.3.1 Let $P \sim \Lambda_1 \circ g^{-1}$, and given $P$, let $T_1, T_2, \dots, T_n$ be a random sample from $P$. The posterior distribution of $P$ given $\{T_1, T_2, \dots, T_n\}$ is given by

$$\Lambda_1 \circ g^{-1}(\cdot \mid T_1, T_2, \dots, T_n) = \sum_{\{x_1,x_2,\dots,x_n : |x_i| = T_i\}} PT(\Pi, \alpha_{x_1,x_2,\dots,x_n}) \circ g^{-1}(\cdot)\,f_{T_1,T_2,\dots,T_n}(x_1, x_2, \dots, x_n),$$

where

$$f_{T_1,T_2,\dots,T_n}(\mp T_1, \dots, \mp T_n) = \lim_k \frac{Pr(X_1 \in B^{\mp k}_{\epsilon_1}, \dots, X_n \in B^{\mp k}_{\epsilon_n})}{Pr(T_1 \in B^{+k}_{\epsilon_1}, \dots, T_n \in B^{+k}_{\epsilon_n})},$$

and $B^{+k}_{\epsilon_i}$ denotes the level-$k$ partitioning set containing $T_i$, with $B^{-k}_{\epsilon_i} = -B^{+k}_{\epsilon_i}$, for $i = 1, \dots, n$.

[Remark: In an intuitive sense, $f_{T_1,T_2,\dots,T_n}(x_1, x_2, \dots, x_n) = Pr(X_1 = x_1, \dots, X_n = x_n \mid |X_1| = T_1, \dots, |X_n| = T_n)$.]

Proof: For any measurable $C \subset M(\mathbb{R}^+)$ and $B \in \mathcal{B}(\mathbb{R}^{+n})$, let

$$\mu(C) := \int_B \Lambda_1 \circ g^{-1}(C \mid T_1, T_2, \dots, T_n)\,\bar\alpha^n \circ g^{-1}(dT),$$

$$\nu(C) := \int_C P^n(B)\,PT(\Pi, \alpha) \circ g^{-1}(dP),$$

where $\bar\alpha^n \circ g^{-1}(B) = E_{PT(\Pi,\alpha) \circ g^{-1}}P^n(B)$ and $P^n(B) = P(T \in B)$, with $T = \{T_1, T_2, \dots, T_n\}$ and $T_i \sim^{iid} P$. To verify that $\Lambda_1 \circ g^{-1}(\cdot \mid T_1, T_2, \dots, T_n)$ is indeed a version of the posterior, we need to verify that for every $B \in \mathcal{B}(\mathbb{R}^{+n})$ the measures $\mu$ and $\nu$ on $M(\mathbb{R}^+)$ defined above are the same. In fact, it is enough to verify the same for $B$ of the form $B = B_1 \times \cdots \times B_n$, where $B_i \in \mathcal{B}(\mathbb{R}^+)$. We observe that if $B = B_1 \times \cdots \times B_n$, with $B_i \in \mathcal{B}(\mathbb{R}^+)$, then

$$\bar\alpha^n \circ g^{-1}(B) = E_{PT(\Pi,\alpha) \circ g^{-1}}(P(B_1) \times \cdots \times P(B_n)) = E_{PT(\Pi,\alpha)}[(P(B_1^+) + P(B_1^-)) \times \cdots \times (P(B_n^+) + P(B_n^-))].$$

Let

$$\tilde\alpha^n((B_1^+ \cup B_1^-) \times \cdots \times (B_n^+ \cup B_n^-)) := E_{PT(\Pi,\alpha)}[(P(B_1^+) + P(B_1^-)) \times \cdots \times (P(B_n^+) + P(B_n^-))].$$

It suffices to check that the moments of $\{P(B^+_{\epsilon'}) : \epsilon' \in \{0,1\}^m\}$ under the two measures $\mu(\cdot)$ and $\nu(\cdot)$ are the same when $B = B_1 \times \cdots \times B_n$; i.e., we want to verify that for positive integers $r_i$,

$$E_\nu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = E_\mu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big].$$

Now,

$$E_\nu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = \int \Big(\prod_{i=1}^n P(B_i)\Big)\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\,PT(\Pi, \alpha) \circ g^{-1}(dP) = \int \prod_{i=1}^n (P(B_i^+) + P(B_i^-)) \prod_{i=1}^{2^m} \cdots
Also,

$$E_\mu\Big[\prod_{i=1}^{2^m} P(B^+_{\epsilon_i})^{r_i}\Big] = \int \prod_{i=1}^{2^m} \cdots$$