THESIS

This is to certify that the dissertation entitled "Some Properties and Characterizations of Neutral to the Right Priors and Beta Processes," presented by Jyotirmoy Dey, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor

Date: August 23, 1999

SOME PROPERTIES AND CHARACTERIZATIONS OF NEUTRAL TO THE RIGHT PRIORS AND BETA PROCESSES

By

Jyotirmoy Dey

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1999

ABSTRACT

SOME PROPERTIES AND CHARACTERIZATIONS OF NEUTRAL TO THE RIGHT PRIORS AND BETA PROCESSES

By

Jyotirmoy Dey

The nonparametric Bayesian paradigm requires us to consider probability measures (priors) on the infinite-dimensional space ℱ of all cumulative distribution functions. This dissertation is a study of one such class of measures, introduced by Doksum [8], known as neutral to the right (NR) priors. The material is organized in three chapters. Chapter 1 is a survey of NR priors and serves as a prelude to the subsequent chapters and results. We conclude the chapter by observing that these priors can be chosen to have all of ℱ as support and thus satisfy one of the desirable requirements of nonparametric prior distributions as laid down by Ferguson [10].
In Chapter 2 we provide necessary and sufficient conditions on a sequence of exchangeable random variables such that the prior distribution obtained from it via de Finetti's Theorem is NR. En route to such characterization we also obtain characterizations in terms of the posterior distributions.

Chapter 3 contains a study of beta and beta-Stacy processes. Hjort [14] and Walker and Muliere [26], respectively, developed beta processes and beta-Stacy processes as concrete examples of NR prior distributions. We first give a construction of beta process priors directly on ℱ and then prove the following:

- the posterior corresponding to a beta process is weakly consistent,
- beta processes with distinct parameters are mutually singular,
- carefully chosen Dirichlet process priors on the space of right-censored observations induce beta priors on the space of lifetime and censoring-time distributions,
- a beta-Stacy prior is a simple reparameterization of a beta process.

To the memory of my father.

ACKNOWLEDGEMENTS

My deepest regards and sincerest thanks go to my dissertation advisor, Professor R. V. Ramamoorthi. I have benefited immensely from his insightful advice, brilliant vision, constant encouragement and fathomless patience. His profound knowledge of Statistics, clarity of thought and exemplary work ethic have been a constant source of inspiration to me. I will always cherish the care and interest with which he oversaw my academic and professional progress.

I thank Professors R. Erickson, H. Koul and C. Weil for serving on my dissertation committee. I greatly appreciate the contribution made by Professor Koul in carefully reading previous drafts of this document and suggesting several improvements to the text. Special thanks go to Professors C. Page and J. Stapleton for their involvement and guidance in my development as a graduate instructor and statistical consultant.
Their interest and assistance in my search for a professional career are gratefully acknowledged.

It is impossible to express in words how much I owe my academic achievements to the encouragement, affection, support and sacrifice of my mother and my sisters. Without them, my pursuit of a higher education would never have been possible.

I must also thank my colleagues and friends for their company, help and encouragement. I will refrain from furnishing a complete list here as it is too long. I must, however, mention that I will always retain fond memories of my associations with Ms. M. Geraldes, Mr. N. Choudhury and Mr. D. Biswas.

Contents

Introduction

1 Neutral to the Right Priors
  1.1 The Space ℱ
  1.2 Definition and Examples
  1.3 Independent Increment Processes
  1.4 NR Priors via Lévy Processes
  1.5 Support of NR Priors

2 Posterior Properties and Characterizations
  2.1 Posterior Distribution
  2.2 Characterizations of NR Priors
  2.3 NR Priors from Censored Observations

3 Beta Processes
  3.1 Definition and Construction
  3.2 Properties of Beta Processes
  3.3 The Problem of Right-Censored Observations
  3.4 Mutual Singularity of Beta Processes
  3.5 Beta-Stacy Process Priors

Introduction

Let θ be an unknown parameter and let X be an observable random variable whose distribution F_θ depends on θ. The goal of statistical investigation is to make inference on θ based on the observed value of X.
In the Bayesian paradigm θ itself is endowed with a distribution Π, called the prior distribution, and the inference essentially consists of updating the prior Π to Π_X, the conditional distribution of θ given X, commonly known as the posterior distribution of θ given X. The prior distribution Π reflects, often as an approximation, the investigator's knowledge of the parameter θ prior to observing X.

In the parametric case θ is generally taken to be an element of ℝ^k and the map θ ↦ F_θ is a smooth parameterization. In other words, the distribution of X is assumed to be among {F_θ : θ ∈ Θ}, where Θ ⊂ ℝ^k. In the nonparametric case the restriction to a finite-dimensional parametric family is removed. The set of permissible distributions for X is typically the set of all distributions or a large subset thereof. The model that we consider consists of

1. ℱ, the set of probability distributions on ℝ+,
2. n independent identically distributed random variables X_1, …, X_n whose common distribution F is, of course, an element of ℱ, and
3. Π, a probability measure (probability distribution) on ℱ.

In the nonparametric case the Bayesian paradigm requires us to consider probability measures Π on the infinite-dimensional space ℱ. It is thus necessary to develop and study probability measures on ℱ which would be analytically tractable and which would also be interpretable. This thesis is devoted to a study of a class of priors called neutral to the right (NR) priors. A prior Π is said to be NR if, for all k ≥ 1 and all t_1 < ··· < t_k, the ratios

F̄(t_i)/F̄(t_{i−1}), i = 1, …, k,

are independent, where F̄(·) = 1 − F(·). These priors were introduced by Doksum [8], who also showed that if Π is NR, then the posterior given n observations is also NR. This result was extended to the case of right-censored data by Ferguson and Phadia [12].
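The defining independence of the ratios F̄(t_i)/F̄(t_{i−1}) is easy to exhibit by simulation on a finite support. The sketch below is ours, not the text's, and the choice V_i ~ Beta(a, b) is purely an illustrative assumption:

```python
import random

random.seed(0)

# A discrete NR prior on a finite support t_1 < ... < t_k: draw independent
# V_i ~ Beta(a, b) (illustrative choice) and set
# F-bar(t_i) = prod_{j <= i} (1 - V_j).
def sample_nr_survival(k=4, a=1.0, b=2.0):
    fbar, path = 1.0, []
    for _ in range(k):
        fbar *= 1.0 - random.betavariate(a, b)
        path.append(fbar)
    return path

draws = [sample_nr_survival() for _ in range(20000)]

# The defining property: successive ratios F-bar(t_i)/F-bar(t_{i-1}) = 1 - V_i
# are independent by construction, so their sample correlation is near zero.
r1 = [d[1] / d[0] for d in draws]
r2 = [d[2] / d[1] for d in draws]
n = len(draws)
m1, m2 = sum(r1) / n, sum(r2) / n
cov = sum((x - m1) * (y - m2) for x, y in zip(r1, r2)) / n
sd1 = (sum((x - m1) ** 2 for x in r1) / n) ** 0.5
sd2 = (sum((y - m2) ** 2 for y in r2) / n) ** 0.5
corr = cov / (sd1 * sd2)
print(abs(corr) < 0.05)
```

With 20000 draws the sample correlation of independent ratios is of order 1/√20000 ≈ 0.007, so the printed check succeeds with very high probability.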
While these authors considered NR priors in abstraction, Hjort [14] and Walker and Muliere [26], respectively, developed beta processes and beta-Stacy processes, which provide concrete and useful classes of NR priors. These priors, which are analogous to the beta prior for the Bernoulli(θ), are analytically tractable and are flexible enough to incorporate a wide variety of prior beliefs.

The map F ↦ φ_D(F) = −log(1 − F) maps ℱ into the space of increasing functions. Doksum showed that a prior Π on ℱ is NR if and only if the induced measure Π ∘ φ_D^{−1} gives rise to independent increment processes. Hjort proved a similar result by considering the map

φ_H(F)(·) = ∫_{(0,·]} dF(s)/F[s,∞).

When F is continuous the two images φ_D(F) and φ_H(F) are the same. Since independent increment processes are well understood, this connection provides a powerful tool for studying NR priors. In particular, independent increment processes have a canonical structure, the so-called Lévy representation. The associated Lévy measure can be used to elucidate properties of NR priors. For instance, Hjort provides an explicit expression for the posterior given n independent observations in terms of the Lévy representation when the Lévy measure is of a specific form.

In Chapter 1 we provide a brief introduction to NR priors and related independent increment processes. We then show that a NR prior with support ℱ can be obtained by choosing a Lévy measure with full support. This result shows, in particular, that with a proper choice of parameters the beta and beta-Stacy processes can have all of ℱ as support.

In Chapter 2 we first recall Doksum's result and the fact that if Π is NR, then the posterior distribution of F(t), given X_1, …, X_n, depends only on {N_n(s) : s ≤ t}, where N_n(·) = Σ_{i≤n} I_{(0,·]}(X_i). In other words, the posterior distribution of F(t) does not depend on the exact values of the observations larger than t, but only on how many there are.
We then show that this property of the posterior actually characterizes NR priors. This characterization is then used to provide yet another characterization of NR priors via de Finetti's Theorem. The chapter concludes with an extension of the results to the case of right-censored data.

Chapter 3 is devoted to a study of beta and beta-Stacy processes. We first construct beta processes directly on ℱ. Then we show that beta processes yield consistent posteriors, i.e., if F_0 is indeed the true distribution then, as more and more observations accumulate, the posterior converges to F_0 with probability 1. Hjort has shown that beta processes possess many pleasing properties in the context of right-censored data, such as easy updating of the prior parameters. In the same spirit we show that if (Z, Δ) is a right-censored observation arising from a survival time X and an independent censoring time Y, then under a Dirichlet process prior for the distribution of (Z, Δ) the distributions of X and Y marginally have beta priors. We then use a result of Brown [1] on mutual singularity of Poisson processes to show that any two beta process priors are mutually singular. We conclude the chapter by observing that any beta-Stacy process is just a reparameterization of a beta process.

Chapter 1

Neutral to the Right Priors

Neutral to the right (NR) priors are a specific class of nonparametric priors introduced by Doksum [8]. Historically, the concept of neutrality is due to Connor and Mosimann [3], who considered it in the multinomial context. Doksum extended it to distributions on the real line in the form of neutral and neutral to the right priors. Subsequent papers by Ferguson [11], Ferguson and Phadia [12], Hjort [14] and Walker and Muliere [26] have made significant contributions to their theory. As mentioned earlier, the theory of independent increment processes provides a powerful tool to understand these priors.
The purpose of this chapter is to give a summary of the basic properties of NR priors. In Section 1.1 the Bayesian nonparametric set-up is formalized. Definitions and examples of NR priors are provided in Section 1.2. In Section 1.3 we discuss independent increment processes and their Lévy representation. Section 1.4 then provides a description of the connections between NR priors and independent increment processes. Section 1.5 discusses the support of NR priors.

1.1 The Space ℱ

Consider the measurable space ([0, ∞), B_[0,∞)), where B_[0,∞) denotes the collection of all Borel subsets of [0, ∞). Let M(ℝ+) denote the space of all probability measures on ([0, ∞), B_[0,∞)) and let ℱ denote the space of all cumulative distribution functions on [0, ∞). Our goal is to investigate certain properties of probability measures on M(ℝ+). However, since there is a 1-1 correspondence between M(ℝ+) and ℱ, it suffices to consider probability measures on ℱ. To further simplify notation, we will let F denote both a distribution function and its corresponding probability measure.

Equip M(ℝ+) with the Borel σ-algebra under the topology of weak convergence. Since weak convergence on M(ℝ+) is equivalent to convergence in distribution on ℱ (also called weak convergence for this reason), we will equip ℱ with the topology of weak convergence too. Formally, a sequence of distribution functions {F_n : n ≥ 1} ⊂ ℱ converges to F ∈ ℱ weakly if F_n(t) → F(t) as n → ∞ for all continuity points t of F. We write this fact as F_n →_w F. One of the properties that makes such convergence useful is the fact that, under the weak topology, ℱ is a complete, separable metric space.

Let Σ_ℱ denote the σ-algebra of Borel subsets of ℱ. It is also the smallest σ-algebra with respect to which all coordinate maps F ↦ F(t), t ≥ 0, are measurable. A prior is defined to be a probability measure on the space (ℱ, Σ_ℱ).
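The pointwise criterion for F_n →_w F can be checked directly in a toy case. The uniform-grid example below is ours, not the text's: the cdf of the uniform distribution on the grid {1/n, …, n/n} converges weakly to the Uniform(0, 1) cdf.

```python
def F_n(t, n):
    """cdf of the uniform distribution on the grid {1/n, 2/n, ..., 1}.
    The small epsilon guards against floating-point error at grid points."""
    if t < 0:
        return 0.0
    return min(int(t * n + 1e-9), n) / n

def F(t):
    """Uniform(0,1) cdf, the weak limit."""
    return max(0.0, min(t, 1.0))

# F_n(t) -> F(t) at every continuity point t of F; the error is at most 1/n.
errs = [abs(F_n(t, 10**4) - F(t)) for t in (0.1234, 0.5, 0.75, 2.0)]
print(max(errs) < 1e-4)
```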
It is common practice to view a prior as the process measure corresponding to a stochastic process with paths almost surely in ℱ. A prior is thus the distribution of a random ℱ-valued function. The standard nonparametric Bayesian set-up consists of a random distribution function F with a prior distribution Π and, given F ∈ ℱ, a random sequence of observations X = (X_1, X_2, …) which are independent and identically distributed as F. In short, one writes

F ~ Π, and given F, X_1, X_2, … i.i.d. ~ F.    (1.1.1)

Formally, consider the probability space (Ω, Σ, P_Π), where Ω = ℱ × [0, ∞)^∞, Σ = Σ_ℱ ⊗ B_[0,∞)^∞ and the measure P_Π is defined by

P_Π(F ∈ D, X_1 ∈ B_1, …, X_n ∈ B_n) = ∫_D ∏_{i=1}^n F(B_i) dΠ(F),

where D ∈ Σ_ℱ and B_1, …, B_n ∈ B_[0,∞), for every n ≥ 1. For each n ≥ 1, a version of the conditional distribution of F given X_1, …, X_n is called a posterior distribution of F or, sometimes, a posterior corresponding to the prior distribution Π of F. When there is no ambiguity about the prior, we will refer to it simply as the posterior.

1.2 Definition and Examples

For any F ∈ ℱ, let F̄(·) = 1 − F(·); F̄ is commonly known as the survival function corresponding to F. Let F̄(0) = 1.

Definition 1.1. A prior Π on ℱ is said to be neutral to the right (NR) if, under Π, for all k ≥ 1 and all 0 < t_1 < t_2 < ··· < t_k, the random variables

F̄(t_1), F̄(t_2)/F̄(t_1), …, F̄(t_k)/F̄(t_{k−1})

are independent. For s < t, the ratio F̄(t)/F̄(s) is viewed as the conditional probability P(X > t | X > s). For t > 0, F̄(t) is viewed as the conditional probability P(X > t | X > 0).

Example 1. Consider a finite ordered set {t_1, …, t_n} of points in (0, ∞). To construct a NR prior on the set ℱ_{t_1,…,t_n} of distribution functions supported by the points t_1, …, t_n, we only need to specify (n − 1) independently distributed [0, 1]-valued random variables V_1, …, V_{n−1}, and then set F̄(t_i)/F̄(t_{i−1}) = 1 − V_i for 1 ≤ i ≤ n − 1. Finally, set F̄(t_n)/F̄(t_{n−1}) = 0. Observe that F̄(t_n) = 0 and, for 1 ≤ i ≤ n − 1,

F̄(t_i) = ∏_{j=1}^i (1 − V_j).

Example 2. In a very similar fashion we can construct a NR prior on the space ℱ_T of all distribution functions supported by a countable subset T = {t_1 < t_2 < …} of (0, ∞).
Let {V_i}_{i≥1} be a sequence of independent [0, 1]-valued random variables such that, for some η > 0,

Σ_{i≥1} P(V_i > η) = ∞.

This happens, for instance, when the V_i are identically distributed with P(V_i > η) > 0. As before, for i ≥ 1, set F̄(t_i)/F̄(t_{i−1}) = 1 − V_i. In other words, F̄(t_k) = ∏_{i=1}^k (1 − V_i) for all k ≥ 1. By the second Borel-Cantelli lemma, we have

P(∏_{i≥1} (1 − V_i) = 0) = 1.

This defines a NR prior Π on ℱ because

lim_{t→∞} F̄(t) = lim_{k→∞} ∏_{i=1}^k (1 − V_i) = 0 a.s. Π.

Other non-trivial examples of NR priors are also available.

Example 3. Dirichlet process priors, introduced by Ferguson [10], provide a ready example of a family of NR priors. Doksum [8] suggested construction of general classes of such priors via independent increment processes. Examples of suitable independent increment processes are:

Example 4. Beta processes, developed by Hjort [14]. These are a family of independent increment processes that correspond to NR prior distributions on ℱ. We will refer to the induced NR priors also as beta priors.

Example 5. Log-beta processes, developed by Walker and Muliere [26], are another family of independent increment processes that lead to NR priors on ℱ for suitable choices of parameters. Priors on ℱ constructed via suitable log-beta processes were named beta-Stacy prior processes by Walker and Muliere. We defer further discussion of the last two prior processes and their construction to Chapter 3.

Consider, briefly, the problem of specifying a general NR prior on ℱ. Let Q be a dense subset of [0, ∞) and let {t_1, t_2, …} be an enumeration of Q. As seen in Example 1, it is easy to specify a NR prior Π_n on ℱ_{t_1,…,t_n}. Let t_1^{(n)} < ··· < t_n^{(n)} be an ordering of {t_1, …, t_n}. We wish to specify the distributions of the independent [0, 1]-valued variables

F̄(t_1^{(n)}) = V_1^{(n)}, F̄(t_2^{(n)})/F̄(t_1^{(n)}) = V_2^{(n)}, …, F̄(t_{n−1}^{(n)})/F̄(t_{n−2}^{(n)}) = V_{n−1}^{(n)}

in such a fashion that the sequence of priors thus generated, {Π_n}, is weakly convergent to some measure Π, which will then be NR.
The difficulty is that we need to know the convergence of a whole family of finite-dimensional distributions and their limits for this kind of specification, which is equivalent to knowing Π beforehand. However, given a prior Π on ℱ, we can, in the above fashion, construct a sequence of priors Π_n with support ℱ_{t_1,…,t_n}, n ≥ 1, which will converge to Π weakly. We will refer to priors that give probability one to ℱ_A, where A is at most a countable ordered set, as time-discrete prior processes. Thus, concisely, any NR prior on ℱ can be obtained as a weak limit of time-discrete NR prior processes. Later we will construct the beta prior along these lines.

1.3 Independent Increment Processes

The theory of NR priors owes much of its development and analytic elegance to its connection with independent increment processes. The principal examples of general families of NR priors have been constructed via this connection. Naturally, no discussion of NR priors can be complete without reference to this phenomenon. This connection also leads to special statistical significance for NR priors.

In the next section we will establish the relationship between NR priors and independent increment processes with non-decreasing paths. For now we briefly discuss the relevant theory of these processes in terms of a representation due to P. Lévy [18]. The following facts are well known and may be found in Ito [15] and/or Kallenberg [16].

Definition 1.2. A stochastic process {A(t)}_{t≥0} is said to be an independent increment process if A(0) = 0 almost surely and if, for every k and every {t_0 < t_1 < ··· < t_k} ⊂ [0, ∞), the family {A(t_i) − A(t_{i−1})}_{i=1}^k is independent.

Let ℋ be a space of functions defined by

ℋ = {H | H : [0, ∞) → [0, ∞], H(0) = 0, H non-decreasing, right-continuous}.    (1.3.2)

Let B_{(0,∞)×[0,∞]} be the Borel σ-algebra on (0, ∞) × [0, ∞].

Theorem 1.1 (Ito). Let Π* be a probability on ℋ.
Under Π*, {A(t) : t > 0} is an independent increment process if and only if the following three conditions hold: there exist

1. a finite or countable set M = {t_1, t_2, …} of points in (0, ∞) and, for each t_i ∈ M, a positive random variable V_i defined on ℋ with density f_i;

2. a non-random continuous non-decreasing function b; and

3. a measure λ on ((0, ∞) × [0, ∞], B_{(0,∞)×[0,∞]}) which, for all t > 0, satisfies

(a) λ({t} × [0, ∞]) = 0,
(b) ∫_0^t ∫_{[0,∞]} (u/(1 + u)) λ(ds du) < ∞;

such that A(t) = b(t) + Σ_{t_i ≤ t} V_i + ∫_{(0,t]} ∫_{[0,∞]} u μ(ds du), where μ is a Poisson random measure with mean measure λ, i.e., for disjoint Borel subsets E_1, …, E_k of (0, ∞) × [0, ∞], μ(E_1, ·), …, μ(E_k, ·) are independent, and μ(E_i, ·) ~ Poisson(λ(E_i)) for 1 ≤ i ≤ k.

Note the following facts about independent increment processes, which will be useful to us later and facilitate understanding of the remaining subject matter.

1.3.1. The measure λ on (0, ∞) × [0, ∞] is often expressed as a family of measures {λ_t : t > 0}, where λ_t(A) = λ((0, t] × A) for Borel sets A.

1.3.2. The above representation may be expressed equivalently in terms of the moment generating function of A(t) as

E(e^{−θA(t)}) = e^{−θb(t)} [∏_{t_i ≤ t} E(e^{−θV_i})] exp(−∫_0^t ∫_{[0,∞]} (1 − e^{−θu}) λ(ds du)), θ > 0.

1.3.3. A point t is called a fixed jump-point of the process if P(A{t} > 0) > 0. It is known that there are at most countably many such fixed jump-points and that the set M is precisely the set of such points.

1.3.4. The random measure A ↦ μ(·, A) also has an explicit description. For any Borel subset E of (0, ∞) × [0, ∞],

μ(E, A) = #{s : (s, A{s}) ∈ E, A{s} > 0}.

1.3.5. Let A_c(t) = A(t) − b(t) − Σ_{t_i ≤ t} A{t_i}. Then

A_c(t) = ∫_{(0,t]} ∫_{[0,∞]} u μ(ds du, A).

1.3.6. The measure λ determines the law of the process {A_c(t) : t > 0}, or, equivalently, of the measure Π*. The measure λ is known as the Lévy measure of Π*.

1.3.7. A Lévy process Π* without any non-random component, i.e., for which b(t) = 0 for all t > 0, has sample paths that increase only in jumps, almost surely Π*. All Lévy processes that we will encounter will be of this type.

1.4 NR Priors via Lévy Processes

Consider the map φ_D(F) = −log F̄ of ℱ into ℋ. Doksum [8] showed that a prior distribution Π on ℱ is NR if and only if the measure Π* induced by the map φ_D gives rise to an independent increment process.
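The independent increment processes on the other side of this correspondence can be simulated directly from the representation in Theorem 1.1 when there are no fixed jumps and b ≡ 0. The sketch below is ours and assumes, purely for illustration, a finite product-form Lévy measure λ = c · Leb × Uniform(0, 1) on (0, T] × [0, 1]:

```python
import random

random.seed(1)

def sample_jumps(T=1.0, c=5.0):
    """Points of a Poisson random measure with mean measure c * Leb x U(0,1):
    jump times arrive at rate c (exponential interarrivals); jump heights
    are independent Uniform(0,1) draws."""
    jumps, t = [], random.expovariate(c)
    while t <= T:
        jumps.append((t, random.random()))
        t += random.expovariate(c)
    return jumps

def A(jumps, t):
    """A(t) = sum of jump heights up to time t (pure jump: b = 0, M empty)."""
    return sum(h for (s, h) in jumps if s <= t)

# Sanity checks: paths are non-decreasing, and
# E A(T) = c * T * E(height) = 5 * 1 * 0.5 = 2.5.
paths = [sample_jumps() for _ in range(4000)]
mean_AT = sum(A(p, 1.0) for p in paths) / len(paths)
monotone = all(A(p, 0.5) <= A(p, 1.0) for p in paths[:200])
print(monotone and 2.3 < mean_AT < 2.7)
```

Because the jump times and heights in disjoint time intervals come from independent Poisson counts, the resulting increments are independent, which is exactly the structure Theorem 1.1 describes.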
On the other hand, Hjort [14] used the function φ_H(F)(·) = A_F(·) = ∫_{(0,·]} dF(s)/F[s,∞) towards the same end. These maps lend special statistical significance to NR priors in the context of survival analysis. Several authors define the cumulative hazard as H_F(t) := φ_D(F)(t) = −log F̄(t). For us, however, the cumulative hazard will be the map φ_H, for the following reasons. First, when F is discrete, A_F{t} = F{t}/F̄(t−) corresponds to a measure of the rate of occurrence of an event at time t given that it has not occurred before. The cumulative hazard, then, is just the sum of these rates. Thus, while φ_D is mathematically simple, the function A_F is a more natural choice for the cumulative hazard function. Second, as noted earlier, the two definitions coincide when F is continuous. However, in estimating a survival function or a cumulative hazard function one typically employs a discontinuous estimate. The nature of the map, therefore, plays an important role in inference about lifetime distributions and hazard rates. Also, since all independent increment processes we will consider increase only in jumps, distributions sampled from the corresponding NR priors are discrete.

We will now proceed with the formal details about these maps. Let F ∈ ℱ and let T_F = inf{t : F(t) = 1}. Note that T_F may be ∞. Define a transform φ_H of F as follows:

A_F(t) = ∫_{(0,t]} dF(s)/F[s,∞) for t ≤ T_F, and A_F(t) = A_F(T_F) for t > T_F.

1.4.1. The integral in the definition of A_F, for t ≤ T_F, is a Stieltjes integral. Let {s_1, s_2, …} be a dense subset of (0, ∞). For each n, let s_1^{(n)} < ··· < s_n^{(n)} be an ordering of {s_1, …, s_n}. Let s_0^{(n)} = 0 and define

A_F^n(t) = Σ_{s_i^{(n)} ≤ t} F(s_{i−1}^{(n)}, s_i^{(n)}]/F(s_{i−1}^{(n)}, ∞) for t ≤ T_F, and A_F^n(t) = A_F^n(T_F) for t > T_F.

Then, for all t, A_F^n(t) → A_F(t) as n → ∞.

1.4.2. A_F is non-decreasing and right-continuous. That A_F is non-decreasing follows trivially since F is non-decreasing. To see that A_F is right-continuous, fix a point t and note that, if j = max{i ≤ n : s_i^{(n)} ≤ t}, then

A_F(t+) − A_F(t) = lim_{n→∞} F(s_{j+1}^{(n)}, s_{j+2}^{(n)}]/F(s_{j+1}^{(n)}, ∞),

where both {s_{j+1}^{(n)}} and {s_{j+2}^{(n)}} are sequences converging to t from above. Thus F(s_{j+1}^{(n)}, s_{j+2}^{(n)}] → 0 as n → ∞. If t < T_F, then the denominator on the right-hand side of the above equation is positive for some n. Hence right-continuity follows. For t ≥ T_F it follows from the definition.

It is easy to see that A_F(t) < ∞ for every t < T_F. As with F, we will think of A_F simultaneously as a function and a measure. Thus the measure of any interval (s, t] under A_F will be defined as A_F(s, t] = A_F(t) − A_F(s). For T_F < s < t, define A_F(s, t] = 0. One may now uniquely extend A_F to a σ-finite measure on the Borel sets.

1.4.3. For any t, A_F has a jump at t iff F has a jump at t, i.e., {t : A_F{t} > 0} = {t : F{t} > 0}.

1.4.4. The map A_F has an explicit expression. Let

F_c(t) = F(t) − Σ_{s≤t} F{s}

be the continuous part of F. Then

A_F(t) = Σ_{s≤t} F{s}/F[s,∞) + ∫_{(0,t]} dF_c(s)/F̄(s−).

1.4.5. It follows from 1.4.4 above that

(a) T_F = inf{t : A_F(t) = ∞ or A_F{t} = 1},
(b) A_F{t} ≤ 1 for all t,
(c) A_F(T_F) = ∞ if T_F is a continuity point of F,
(d) A_F{T_F} = 1 if F{T_F} > 0.

These and other properties of φ_H, together with further details, may be found in Gill and Johansen [13] and Hjort [14].

Now let A′ be the space of all functions on [0, ∞) that are non-decreasing and right-continuous, may at any finite point be infinite, but have jumps no greater than one, i.e.,

A′ = {B ∈ ℋ | B{t} ≤ 1 for all t}.

Equip A′ with the smallest σ-algebra under which the maps {A ↦ A(t), t ≥ 0} are measurable. From 1.4.1 it follows that φ_H is measurable with respect to this σ-algebra. Also note, from 1.4.2, that φ_H maps ℱ into A′. The actual range, which we will now describe, is smaller. For A ∈ A′, let T_A = inf{t : A(t) = ∞ or A{t} = 1}. Let A denote the space of all cumulative hazard functions on [0, ∞). Formally, define

A = {A ∈ A′ | A(t) = A(T_A) for all t ≥ T_A}.

Endow A with the σ-algebra which is the restriction of the σ-algebra on A′ to A. The map φ_H is a 1-1 measurable map from ℱ onto A and, in fact, has an inverse (see Gill and Johansen [13]). We consider this inverse map next and briefly summarize its properties.
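For a purely discrete F the transform reduces to the jump sum in 1.4.4, A_F(t) = Σ_{s≤t} F{s}/F[s,∞), and properties 1.4.5(b) and (d) can be checked directly. The three-point distribution below is an arbitrary illustration of ours:

```python
# For a purely discrete F: A_F(t) = sum_{s <= t} F{s} / F[s, infinity).
# Toy three-point distribution (arbitrary illustrative masses):
masses = {1.0: 0.2, 2.0: 0.5, 3.0: 0.3}

def A_F(t):
    total, tail = 0.0, 1.0          # tail = F[s, infinity) just before s
    for s in sorted(masses):
        if s > t:
            break
        total += masses[s] / tail   # hazard contribution F{s} / F[s, infinity)
        tail -= masses[s]
    return total

# 1.4.5(b): every jump of A_F is at most 1;
# 1.4.5(d): the jump at T_F = 3.0, where F{T_F} > 0, equals exactly 1.
jumps = [A_F(s) - A_F(s - 1e-9) for s in sorted(masses)]
print(all(j <= 1.0 + 1e-12 for j in jumps) and abs(jumps[-1] - 1.0) < 1e-9)
```

Each jump A_F{s} = F{s}/F[s,∞) is the conditional probability of failing at s given survival up to s, which is the "rate" interpretation given at the start of Section 1.4.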
The map (by is a 1-1 measurable map from .7 onto A, and, in fact, has an inverse (see Gill and Johansen [13]). We consider this inverse map next and briefly summarize its properties. 16 Let A 6 A'. Let [3,, 32, . . .} be dense in (0,00). For each 71, let s[") < < 3;") be as before. Fix 3 < t. If A(t) < 00, define the product integral of A by H(1—dA)=,}i_g; H (1-A(s.‘-'_‘l.s5"’]) (s,t] s 0 for some A E A as n —-> 00, if p5(A:,AT) —+ 0 for all T > 0 where A: and AT are restrictions of An and A to [0, T]. It may be shown, following Hjort ([14], Lemma A.2, pp. 1290—91), that if {An}, A E A and p5(A,,, A) -—) 0, then ¢§1(An) 3) 031(A). Thus, if A is endowed with the Skorokhod metric, then (1)171 is a continuous map. The next result establishes the connection between NR priors and independent increment processes with non-decreasing paths via the map (by. Theorem 1.3. Let II be 0 NR prior on 7. Then, under the measure II" on A induced by the map a”, {A(t) : t > 0} has independent increments. Conversely, if II‘ is a probability measure on A such. that the process {A(t) : t > 0} has independent increments, then the measure induced on 7 by the map ogleal—HU—dA). (0.1] is neutral to the right. A formal proof of this simple result does not appear anywhere in print, and hence, we provide one here. Proof. First suppose that H is NR on 7 and let t1 < < tk be arbitrary points in (0,00). Consider, as before, a dense set {81,82,...} in (0,00). Let, for each n, (7!) 31 < < sh") be as before. Suppose n is large enough such that 3%") 2 tk. Then, for each 1 g i S k, we have, 18 with A} as in 1.4.1 on page 14, F(sm 00) Hi Ara.) — A%(t.-_i) = 2: ti- 1 (sgn)St§ ass-">1 = Z (1_F(s‘.’:’,))' t,-1 0} is an independent increment process. Again, let t1 < ~- - < t), be arbitrary points in (0, 00). Then with 5]") < < sh") as before, let, for 1 S i S k, Pie.) = H (1«A. S§n)St. If F A = (2531(A), then it follows from the definition of the product integral that F£")(t) —> FA(t) for all t, as n —> 00. 
Now observe that, for 1 ≤ i ≤ k,

F̄^{(n)}(t_i)/F̄^{(n)}(t_{i−1}) = ∏_{t_{i−1} < s_j^{(n)} ≤ t_i} (1 − A(s_{j−1}^{(n)}, s_j^{(n)}]).

Therefore F̄^{(n)}(t_1), F̄^{(n)}(t_2)/F̄^{(n)}(t_1), …, F̄^{(n)}(t_k)/F̄^{(n)}(t_{k−1}) are independent under Π*. Letting n → ∞, we conclude that F̄_A(t_1), F̄_A(t_2)/F̄_A(t_1), …, F̄_A(t_k)/F̄_A(t_{k−1}) are independent, which means that the induced prior is NR. ∎

The relationship between the maps φ_D and φ_H induces a relationship between the corresponding processes and their respective Lévy measures. We summarize it in the following proposition. The proof is a simple application of the "mapping theorem" for Poisson processes, Kingman [17].

Proposition 1.4. Let Λ^D and Λ^H denote the Lévy measures of the processes induced on A by a NR prior Π via the maps φ_D and φ_H respectively. Then:

(1) for each t, Λ_t^D is the distribution of x ↦ −log(1 − x) under the measure Λ_t^H, and
(2) for each t, Λ_t^H is the distribution of x ↦ 1 − e^{−x} under Λ_t^D,

where Λ_t^D and Λ_t^H are defined following 1.3.1.

Proof. The proposition is an easy consequence of the following, again easy, fact. If ω ↦ μ(·, ω) is a random measure which is a Poisson process with parameter measure λ, then for any measurable function g the random measure ω ↦ μ(g^{−1}(·), ω) is a Poisson process with parameter measure λ ∘ g^{−1}. Now note that

φ_D(F)(t) − φ_D(F)(t−) = −log(F̄(t)/F̄(t−)) = −log[1 − (φ_H(F)(t) − φ_H(F)(t−))]. ∎

1.5 Support of NR Priors

Since Π on ℱ corresponds, via φ_H, to an independent increment process measure Π*, and since Π* is described by its Lévy measure λ, it is natural to expect that properties of Π would be describable in terms of λ. The next theorem is a case in point.

Theorem 1.5. Let Π be a NR prior on ℱ and let Π* be the measure on A induced by the map φ_H. If λ, the Lévy measure of Π*, has support [0, ∞) × [0, 1], then the support of Π is ℱ.

Proof. Let F_0 ∈ ℱ and let A_0 = φ_H(F_0). For ε > 0, let U = {A : ρ_S^T(A_0, A) < ε}, where ρ_S^T is the Skorokhod metric on D[0, T], the space of all right-continuous functions with left limits on [0, T]. By 1.4.8, every weak open neighborhood of F_0 contains a set of the form φ_H^{−1}(U). Hence it is enough to show that Π*(U) > 0.

Recall that ρ_S^T(A_0, A) is the infimum of all ε′ > 0 such that there exists a strictly increasing function α from [0, T] onto [0, T] for which

sup_{t≤T} |α(t) − t| < ε′ and sup_{t≤T} |A_0(t) − A(α(t))| < ε′.

For any given δ > 0 and T, let 0 = t_0 < t_1 < ··· < t_n = T be chosen such that A_0(t_i, t_{i+1}) < δ for all i = 0, …, n − 1.
Consider A E A such that, for every 0 S i S n, (i) |A(a,~) — A0010] < 5, (ii) |44(br) - A0(b,-)| < 6, (iii) there exists a, < s,- < b, such that Is,- — t,| < 6 and |A{s,-} —— A{t,~}| < 6. Suppose W be the set of all such A E A, i.e. let W = {A E A : A satisfies (i), (ii) and (iii) above}. We will argue that W Q U. Let A E W and let so, . . . , 3,, as in (iii). Let a be the function defined by [3,, for x = t,; x, for b,_1 S x S a,-; linear for a,- S x S t,; linear for t, S x S b,. i Clearly [a(t) - t] < 6 for all t S T. Consider t E [b,-_1, a,). Then A(a(t)) > A(b,_1) > A0(b,_1) — (5, and A0“) < A0(ai) < A0(bi—l) + 6- 22 Therefore Ao(t) — A(a(t)) < 26. Similarly A(a(t)) < A(a,) < Ao(a,-) + 6, and A00) > A0(bi—l) > A0(ai) - 5 thus, A(a(t)) — A0(t) < 26. From above, it follows that |A(a(t)) - Ao(t)| < 26 for t E [b,-_1,a,-). Now suppose t 6 [a,-, t,). Then, since A(QU» Z A(a,) > A0014) — (I, and .40“) S A0(ti—) < A0(a,-) + (5/2, we have Ao(t) — A(a(t)) < 33-6. On the other hand, A(O(t)) S A(Si—) S 14(01) — A{Si} < 140(1),) + (5 — .40{t1'} + 6 < (Ao(t,-) + 6/2) + 6 — A0{t,-} + 6 : A0(t,—) + 36 and Ao(t) Z Ao(a,) > A0(t,-—) — 6/2. Hence A(a(t)) — Ao(t) < 36. It follows that [A(a(t)) — Ao(t)| < 36 for t E [a,,t,~). Finally, let t E [t,-, b,-). Observe that ."I(Cl(t)) 2 A(Sz‘) 2 14(01') '— A{Si} > 140((14') + Ao{t,‘} —' 26 and .400) § .40(b,) < Ao(t,) + 6/2 < Ao(t,—) + A0{t,~} +‘6/2 < Ao(a,-) + A0{t,} + 6. 23 Therefore, A0(t) — A(a(t)) < 36. Also, since A(Q(t)) < A(bi) < 140(1),) + 6, and A00) 2 A0(ti) > A0091)" (5/2. we conclude A(a(t)) — A0(t) < 36. Combining the last two inequalities we get |A(a(t)) — Ao(t)| < 36 for t 6 [t,~, bi). Now let 6 S 6/3 and note that, for all 0 S t S T, [A(a(t)) — Ao(t)| < 6. Consequently A E U. This shows that W Q U. We will now find a subset W0 of W such that II‘(I«I"0) > 0. 
To this end, for δ as above and i = 1, …, n, define the sets

E_i = [b_{i−1}, a_i) × (A_0(t_{i−1}, t_i) − δ/(4n), A_0(t_{i−1}, t_i) + δ/(4n)),
G_i = [a_i, b_i) × (A_0{t_i} − δ/(4n), A_0{t_i} + δ/(4n)),

and subsequently define

W_0 = {A | μ(E_i, A) = 1 and μ(G_i, A) = 1, i = 1, …, n}.

W_0 is clearly measurable and has positive Π* probability, since μ(E_i, A) and μ(G_i, A), i = 1, …, n, are all independent Poisson variables with positive parameters. Hence, to conclude the proof, all we need to show is that W_0 ⊆ W. To see this, let A ∈ W_0 and observe that, since each of the at most n summands in either sum below is smaller than δ/(4n),

(a) for i = 1, …, n,

|A(a_i) − A_0(a_i)| ≤ |A(a_i) − A_0(t_i)| + δ/2 ≤ Σ_{j=1}^i |A[b_{j−1}, a_j) − A_0(t_{j−1}, t_j)| + Σ_{j=1}^{i−1} |A[a_j, b_j) − A_0{t_j}| + δ/2 < δ/4 + δ/4 + δ/2 = δ;

(b) for i = 0, …, n − 1,

|A(b_i) − A_0(b_i)| ≤ |A(b_i) − A_0(t_i)| + δ/2 ≤ Σ_{j=1}^i |A[b_{j−1}, a_j) − A_0(t_{j−1}, t_j)| + Σ_{j=1}^i |A[a_j, b_j) − A_0{t_j}| + δ/2 < δ/4 + δ/4 + δ/2 = δ;

and, since μ(G_i, A) = 1, letting {s_i} = {t ∈ [a_i, b_i) : A{t} > 0}, we get

(c) A[a_i, b_i) = A{s_i} and consequently |A{s_i} − A_0{t_i}| = |A[a_i, b_i) − A_0{t_i}| < δ/(4n) < δ,

which concludes the proof. ∎

Chapter 2

Posterior Properties and Characterizations

In discussing the specification of prior distributions on ℱ for nonparametric problems, Ferguson [11] stated that a good prior should be such that: (1) its support, with respect to some suitable topology on ℱ, should be "large", and (2) the posterior distribution given a sample from the true distribution should be analytically manageable. In Section 1.5 we addressed the issue of support for NR priors. In the present chapter we focus attention on their posterior.

Doksum [8] demonstrated that NR priors are a conjugate family of priors on ℱ. Later, Ferguson and Phadia [12] established that these priors are also conjugate in the presence of right-censored data. Section 2.1 presents Doksum's result on the conjugacy of NR priors and its extension to the case of right-censored observations due to Ferguson and Phadia.
An explicit expression for the posterior, in terms of the Lévy representation, due to Hjort, is provided. In Section 2.2 we observe that, if $\Pi$ is NR, then the posterior distribution of $F(t)$ given $n$ observations depends on the exact observations preceding $t$ and just the number of observations beyond $t$. We show that this property of the posterior actually characterizes NR priors. We then use the result to provide another characterization via de Finetti's Theorem. The result is then extended to the case of right-censored data in Section 2.3.

2.1 Posterior Distribution

Consider the standard Bayesian set-up considered in Section 1.1, i.e. let $\Pi$ be a prior and let $F$ be a random element of $\mathcal{F}$ distributed as $\Pi$. Given $F$, let $X_1, X_2, \dots$ be i.i.d. $F$. For each $n \ge 1$, denote by $\Pi_{X_1, \dots, X_n}$ a version of the posterior distribution, i.e. the conditional distribution of $F$ given $X_1, \dots, X_n$. We will now state Doksum's result on the conjugacy of NR priors and provide an alternate proof for it.

Theorem 2.1 (Doksum, 1974). Let $\Pi$ be NR. Then $\Pi_{X_1, \dots, X_n}$ is also NR.

Remark 1. Hereafter, all equalities involving conditional probabilities, in particular posterior probabilities, and conditional expectations are to be interpreted in the almost sure sense.

Notation 1. For $n \ge 1$, define the observation process $N_n(\cdot)$ as follows:
\[ N_n(t) = \sum_{i \le n} I_{\{X_i \le t\}} \quad \text{for all } t > 0. \]
For every $n \ge 1$, let $N_n(0) \equiv 0$. Observe that $N_n(\cdot)$ is right-continuous on $[0, \infty)$. Let
\[ \mathcal{G}_{t_1 \dots t_k} = \sigma\left\{ \bar F(t_1), \frac{\bar F(t_2)}{\bar F(t_1)}, \dots, \frac{\bar F(t_k)}{\bar F(t_{k-1})} \right\}. \]
Thus $\mathcal{G}_{t_1 \dots t_k}$ denotes the collection of all sets of the form
\[ D = \left\{ \left( \bar F(t_1), \frac{\bar F(t_2)}{\bar F(t_1)}, \dots, \frac{\bar F(t_k)}{\bar F(t_{k-1})} \right) \in C \right\} \]
where $C \in \mathcal{B}_{[0,1]^k}$.

Proof of the theorem. Fix $k \ge 1$ and let $t_1 < t_2 < \dots < t_k$ be arbitrary points in $(0, \infty)$. Denote by $Q$ the set of all rationals in $(0, \infty)$ and let $\bar Q = Q \cup \{t_1, \dots, t_k\}$. Let $\{s_1, s_2, \dots\}$ be an enumeration of $\bar Q$. Observe that, for large enough $m$, $\{t_1, \dots, t_k\} \subset \{s_1, \dots, s_m\}$. For such an $m$, let $s_1^{(m)} < \dots < s_m^{(m)}$ be an ordering of $\{s_1, \dots, s_m\}$. Let
\[ Y_i^{(m)} = \frac{\bar F(s_i^{(m)})}{\bar F(s_{i-1}^{(m)})} \]
and, under $\Pi$, let $\Pi^{(m)}$ denote the distribution of $(Y_1^{(m)}, \dots, Y_m^{(m)})$. Let $n_1 \le \dots \le n_m$. Then, given $\{N_n(s_1^{(m)}) = n_1, \dots, N_n(s_m^{(m)}) = n_m\}$, the posterior density of $(Y_1^{(m)}, \dots, Y_m^{(m)})$ is
\[ \frac{\prod_{i=1}^m (1 - y_i)^{n_i - n_{i-1}} \, y_i^{\,n - n_i} \, d\Pi^{(m)}(y_1, \dots, y_m)}{\int \prod_{i=1}^m (1 - y_i)^{n_i - n_{i-1}} \, y_i^{\,n - n_i} \, d\Pi^{(m)}(y_1, \dots, y_m)} = \prod_{i=1}^m \frac{(1 - y_i)^{n_i - n_{i-1}} \, y_i^{\,n - n_i} \, d\Pi_i^{(m)}(y_i)}{\int (1 - y_i)^{n_i - n_{i-1}} \, y_i^{\,n - n_i} \, d\Pi_i^{(m)}(y_i)}, \]
the factorization holding because, $\Pi$ being NR, $\Pi^{(m)}$ is a product measure. This shows that $(Y_1^{(m)}, \dots, Y_m^{(m)})$ are independent under the posterior given $\{N_n(s_1^{(m)}), \dots, N_n(s_m^{(m)})\}$. Hence
\[ \bar F(t_1) = \prod_{s_j^{(m)} \le t_1} Y_j^{(m)}, \qquad \frac{\bar F(t_i)}{\bar F(t_{i-1})} = \prod_{t_{i-1} < s_j^{(m)} \le t_i} Y_j^{(m)}, \quad i = 2, \dots, k, \]
are also independent under the posterior given the same information. Now, by the right-continuity of $N_n(\cdot)$ we have, as $m \to \infty$,
\[ \sigma\{N_n(s_j^{(m)}), j \le m\} \uparrow \sigma\{N_n(t), t \ge 0\} \equiv \sigma(X_1, \dots, X_n). \]
Hence, for any $A \in \mathcal{G}_{t_1 \dots t_k}$, by the Martingale Convergence Theorem, we have
\[ \Pi\big(A \mid N_n(s_1^{(m)}), \dots, N_n(s_m^{(m)})\big) \to \Pi(A \mid X_1, \dots, X_n) \]
almost surely. In other words, one concludes that the distribution of $(\bar F(t_1), \bar F(t_2)/\bar F(t_1), \dots, \bar F(t_k)/\bar F(t_{k-1}))$ under the posterior given $N_n(s_1^{(m)}), \dots, N_n(s_m^{(m)})$ converges to its distribution under the posterior given $X_1, \dots, X_n$, as $m \to \infty$. Since $\bar F(t_1), \bar F(t_2)/\bar F(t_1), \dots, \bar F(t_k)/\bar F(t_{k-1})$ are independent given $\sigma(N_n(s_1^{(m)}), \dots, N_n(s_m^{(m)}))$, independence also holds in the limit. $\blacksquare$

Doksum provides a representation of the posterior $\Pi_{X_1, \dots, X_n}$ of a general NR prior and shows that, unlike the Dirichlet, the distribution of $F(t_k)$, under the posterior, depends on the exact values of the observations in $(0, t_k]$ and not just on the number of observations in that interval. Ferguson and Phadia [12] extend Doksum's result to the case of inclusively and exclusively right-censored observations. Let $x$ be a real number in $(0, \infty)$. Given a cdf $F \in \mathcal{F}$, an observation $X$ from $F$ is said to be exclusively right-censored if we only know $X \ge x$, and inclusively right-censored if we know $X > x$. We state their result next and reproduce the proof for the sake of completeness.

Theorem 2.2 (Ferguson and Phadia, 1979).
Let $F$ be a random cdf neutral to the right. Let $X$ be a sample of size one from $F$, and let $x$ be a number in $(0, \infty)$. Then:
(a) the posterior distribution of $F$ given $X > x$ is NR;
(b) the posterior distribution of $F$ given $X \ge x$ is NR.

Proof. Let $t_1 < \dots < t_k$ be arbitrary points in $(0, \infty)$ including $x$. Let $t_j = x$ and let $t_0 = 0$. Define
\[ Y_l = \begin{cases} \dfrac{\bar F(t_{l+1})}{\bar F(t_l)} & \text{for } l = 0, \dots, j-2; \\[2mm] \dfrac{\bar F(t_j-)}{\bar F(t_{j-1})} & \text{for } l = j-1; \\[2mm] \dfrac{\bar F(t_j)}{\bar F(t_j-)} & \text{for } l = j; \\[2mm] \dfrac{\bar F(t_l)}{\bar F(t_{l-1})} & \text{for } l = j+1, \dots, k. \end{cases} \]
Under the prior, $Y_0, \dots, Y_k$ are independent random variables with joint density
\[ f_{Y_0 \dots Y_k}(y_0, \dots, y_k) = \prod_{i=0}^k f_{Y_i}(y_i) \]
with respect to some convenient product measure. Given $F$, the probability of $X > x$ is $\bar F(t_j) = \prod_{i=0}^{j} Y_i$; hence the posterior density of $Y_0, \dots, Y_k$ given $X > x$ is
\[ f_{Y_0 \dots Y_k}(y_0, \dots, y_k \mid X > x) = C \left( \prod_{i=0}^{j} y_i f_{Y_i}(y_i) \right) \left( \prod_{i=j+1}^{k} f_{Y_i}(y_i) \right) \]
where $C$ is a normalizing constant depending on $x$. Thus the $Y_i$ are independent under the posterior as well, and hence the posterior is NR. In the same way, since, given $F$, the probability of $X \ge x$ is $\bar F(t_j-) = \prod_{i=0}^{j-1} Y_i$, the posterior density of $Y_0, \dots, Y_k$ given $X \ge x$ is
\[ f_{Y_0 \dots Y_k}(y_0, \dots, y_k \mid X \ge x) = C \left( \prod_{i=0}^{j-1} y_i f_{Y_i}(y_i) \right) \left( \prod_{i=j}^{k} f_{Y_i}(y_i) \right) \]
where $C$ is again a normalizing constant depending on $x$. Thus the $Y_i$ are again independent under the posterior given $X \ge x$ and hence it is NR. $\blacksquare$

Suppose $\Pi$ is a NR prior for which the corresponding independent increment process has $b \equiv 0$ and a Lévy measure of the form
\[ d\lambda = a(s, u) \, ds \, dA_0(u) \tag{2.1.1} \]
for some non-decreasing function $A_0$ and a positive function $a$. In view of Theorems 2.1 and 2.2, the posterior, given some possibly right-censored observations, is also NR. The following theorem, taken from Hjort [14] and due to Ferguson and Phadia, provides an explicit expression for the components of the independent increment process corresponding to the posterior. Let $A_0 \in \mathcal{A}$ and let $M = \{t_1, \dots, t_k\}$ be the set of fixed jumps of $A_0$.

Theorem 2.3 (Hjort, 1990).
Let $F$ be a NR random cdf such that $A_F$ is an independent increment process with component $b \equiv 0$, Lévy measure given by equation (2.1.1), fixed points of jump belonging to $M$, and density for the jump $A_F\{t_j\}$ denoted by $f_j$, $j = 1, \dots, k$. Denote by $M^*$, $\{f_j^*\}$ and $a^*$ the posterior parameters. Then the following hold:

(1) Given $X > x$, the posterior parameters are given by
(a) $M^* = M$;
(b) \[ f_j^*(s) = \begin{cases} c(1-s) f_j(s) & \text{if } t_j \le x; \\ f_j(s) & \text{if } t_j > x, \end{cases} \]
where $c$ is a normalizing constant; and
(c) \[ a^*(s, u) = \begin{cases} (1-s)\, a(s, u) & \text{if } u \le x; \\ a(s, u) & \text{if } u > x. \end{cases} \]

(2) Given $X = x$, $x \in M$, the posterior parameters are given by
(a) $M^* = M$;
(b) \[ f_j^*(s) = \begin{cases} c(1-s) f_j(s) & \text{if } t_j < x; \\ c\, s f_j(s) & \text{if } t_j = x; \\ f_j(s) & \text{if } t_j > x, \end{cases} \]
where $c$ is again a normalizing constant; and
(c) $a^*(s, u)$ as in (1).

(3) Given $X = x$, $x \notin M$, the posterior parameters are given by
(a) $M^* = M \cup \{x\}$;
(b) \[ f_j^*(s) = \begin{cases} c(1-s) f_j(s) & \text{if } t_j < x; \\ f_j(s) & \text{if } t_j > x. \end{cases} \]
The new jump $A\{x\}$ has density $f^*(s) = c\, s\, a(s, x)$, where the $c$'s are normalizing constants; and
(c) $a^*(s, u)$ as in (1).

Of interest to us is the following consequence.

Theorem 2.4. Let $A$ be a Lévy process with no fixed jumps, i.e. $M = \emptyset$. Suppose its Lévy measure is given by $d\lambda = a(s, u)\, ds\, dA_0(u)$ for some $A_0 \in \mathcal{A}$. Then the posterior distribution given $(X_1, \dots, X_n) = (x_1, \dots, x_n)$ is a Lévy process with parameters
(a) $M^* = \{s_1 < \dots < s_k\}$, the distinct elements of $\{x_1, \dots, x_n\}$;
(b) $f_j^*(s) = c\, s^{\,N_n(s_j) - N_n(s_j-)} (1-s)^{\,n - N_n(s_j)}\, a(s, s_j)$, where $N_n$ is as before; and
(c) $a^*(s, u) = (1-s)^{\,n - N_n(u-)}\, a(s, u)$.

The proof follows by repeated application of Theorem 2.3.

2.2 Characterizations of NR Priors

For each $T$ in $\mathbb{R}$, let $N_n^T = \{N_n(t) : t \le T\}$. As before, for a prior $\Pi$, $\Pi_{X_1, \dots, X_n}$ stands for its posterior given $X_1, \dots, X_n$, and $\Pi_{N_n^T}$ for its posterior given $\{N_n(t) : t \le T\}$. In general, for any family of random variables $N$ depending on $X_1, \dots, X_n$, and any measure $\nu$, the conditional distribution corresponding to $\nu$, given $N$, will be denoted $\nu_N$.

Theorem 2.5.
Suppose $\Pi$ is a prior such that $\Pi\{F : 0 < \bar F(t) < 1 \ \forall t\} = 1$. Then the following are equivalent:

(1) $\Pi$ is NR.
(2) For all $T \in \mathbb{R}$, the distributions of $\{F(t) : t \le T\}$ under $\Pi_{X_1, \dots, X_n}$ and $\Pi_{N_n^T}$ are the same.
(3) For all $t_1 < \dots < t_k < t_{k+1}$, the distributions of $(F(t_1), \dots, F(t_k))$ under $\Pi_{N_n(t_1), \dots, N_n(t_{k+1})}$ and $\Pi_{N_n(t_1), \dots, N_n(t_k)}$ are the same.

Remark 2. Note that (2) is a statement that holds almost everywhere with respect to the marginal distribution of $X_1, \dots, X_n$. However, under our assumption, (3) holds everywhere. Interpret (2) as: there exists a version of the left-hand side equal to the right-hand side everywhere.

Proof. (1) $\Rightarrow$ (2) is well known; see for instance Doksum [8]. To see (3) $\Rightarrow$ (1), we will show that, for fixed $0 = t_0 < t_1 < \dots < t_{k+1}$, $\{\bar F(t_1), \dots, \bar F(t_k)\}$ is independent of $\bar F(t_{k+1})/\bar F(t_k)$. This would then show that
\[ \left\{ \bar F(t_1), \frac{\bar F(t_2)}{\bar F(t_1)}, \dots, \frac{\bar F(t_k)}{\bar F(t_{k-1})} \right\} \text{ is independent of } \frac{\bar F(t_{k+1})}{\bar F(t_k)}. \]
Let $n$ be a positive integer. Since the posterior densities of $(F(t_1), \dots, F(t_k))$, given $\{N_n(t_1) = 0, \dots, N_n(t_k) = 0\}$ and given $\{N_n(t_1) = 0, \dots, N_n(t_k) = 0, N_n(t_{k+1}) = 0\}$, are equal, the same holds for the posterior densities of $(\bar F(t_1), \bar F(t_2)/\bar F(t_1), \dots, \bar F(t_{k+1})/\bar F(t_k))$ given the same information. This gives
\[ \frac{\int_{\mathcal F} \bar F(t_{k+1})^n\, g\, d\Pi(F)}{\int_{\mathcal F} \bar F(t_{k+1})^n\, d\Pi(F)} = \frac{\int_{\mathcal F} \bar F(t_k)^n\, g\, d\Pi(F)}{\int_{\mathcal F} \bar F(t_k)^n\, d\Pi(F)} \quad \text{for every bounded } \mathcal{G}_{t_1 \dots t_k}\text{-measurable } g, \]
and hence that
\[ \mathcal{E}_\Pi\left[ \left( \frac{\bar F(t_{k+1})}{\bar F(t_k)} \right)^{\!n} \,\Big|\, \bar F(t_1), \dots, \bar F(t_k) \right] = \text{constant a.e. } \Pi. \]
Since this holds for every $n \ge 1$, $\bar F(t_{k+1})/\bar F(t_k)$ is independent of $\{\bar F(t_1), \dots, \bar F(t_k)\}$, and (1) follows. $\blacksquare$

Theorem 2.6. Let $X_1, X_2, \dots$ be an exchangeable sequence with joint distribution $\mu$ and let $\Pi$ be the corresponding de Finetti measure. Suppose that, for all $t > 0$, as $n \to \infty$,
\[ \mu\{X_1 < t, \dots, X_n < t\} \to 0 \quad \text{and} \quad \mu\{X_1 > t, \dots, X_n > t\} \to 0. \]
Then the following are equivalent:

(1) $\Pi$ is NR.
(2) For all $t \in \mathbb{R}$ and $n \ge 1$, $\mu_{X_1, \dots, X_n}\{X_{n+1} > t\} = \mu_{N_n^t}\{X_{n+1} > t\}$.

Proof. In view of Theorem 2.5, (2) immediately follows from (1), since
\[ \mu_{X_1, \dots, X_n}\{X_{n+1} > t\} = \int_{\mathcal F} \bar F(t)\, d\Pi_{X_1, \dots, X_n}(F) \quad \text{and} \quad \mu_{N_n^t}\{X_{n+1} > t\} = \int_{\mathcal F} \bar F(t)\, d\Pi_{N_n^t}(F). \]
To see (2) $\Rightarrow$ (1), fix $S \in \mathbb{R}$ and define $T(x) = x I_{(0,S]}(x) + (S+1) I_{(S,\infty)}(x)$. Then (2) implies, for every positive integer $m$, that
\[ X_1, \dots, X_n \text{ are conditionally independent of } T(X_{n+1}), \dots, T(X_{n+m}) \text{ given } N_n^S, \text{ under } \mu. \tag{2.2.2} \]
To prove this claim, consider $0 = t_0 < t_1 < \dots < t_k < t_{k+1} = S$. Since $\sigma\{N_n^{t_1}\} \subseteq \dots \subseteq \sigma\{N_n^{t_{k+1}}\} = \sigma\{N_n^S\}$, by (2) each of the events $\{X_{n+1} > t_i\}$, $i = 1, \dots, k+1$, is conditionally independent of $X_1, \dots, X_n$, given $N_n^S$, under the measure $\mu$. Hence the random variable $T_k(X_{n+1}) = \sum_{i=1}^{k+1} t_i I_{(t_{i-1}, t_i]}(X_{n+1}) + (S+1) I_{(S, \infty)}(X_{n+1})$ is conditionally independent of $X_1, \dots, X_n$, given $N_n^S$, under $\mu$. Letting $t_1, \dots, t_k$ run through a countable dense set in $(0, S]$, we get that $X_1, \dots, X_n$ are conditionally independent of $T(X_{n+1})$ given $N_n^S$. A simple induction argument then yields the claim (2.2.2).

Now, fix $t_1 < \dots < t_k$. Given integers $n_1, \dots, n_k$, set $m = n_1 + \dots + n_k$ and, given $X_1, \dots, X_n$, consider the predictive probability of the event "of the next $m$ observations, $n_i$ fall in $(t_{i-1}, t_i]$ for $i = 1, \dots, k$." Since the event $\{X_{n+j} \in (t_{i-1}, t_i]\}$ is the same as the event $\{T(X_{n+j}) \in (t_{i-1}, t_i]\}$, for all $i = 1, \dots, k$ and $S \ge t_k$, we have
\[ \int_{\mathcal F} [1 - \bar F(t_1)]^{n_1} [\bar F(t_1) - \bar F(t_2)]^{n_2} \cdots [\bar F(t_{k-1}) - \bar F(t_k)]^{n_k} \, d\Pi_{X_1, \dots, X_n}(F) = \int_{\mathcal F} [1 - \bar F(t_1)]^{n_1} [\bar F(t_1) - \bar F(t_2)]^{n_2} \cdots [\bar F(t_{k-1}) - \bar F(t_k)]^{n_k} \, d\Pi_{N_n^S}(F). \]
This shows that the distribution of $\bar F(t_1), \bar F(t_1) - \bar F(t_2), \dots, \bar F(t_{k-1}) - \bar F(t_k)$, given $X_1, \dots, X_n$, is the same as that given $N_n^S$, for all $S \ge t_k$. Since $F(t_1), \dots, F(t_k)$ is a function of these quantities, the same will be true for $F(t_1), \dots, F(t_k)$. Hence (1) follows from Theorem 2.5. $\blacksquare$

In a recent work, yet to appear, Walker and Muliere [27] also obtained a similar result. Their condition on $\mu$ is expressed in terms of the expected instantaneous hazard rate under the de Finetti prior and corresponding posteriors. When the set of values for the observations is finite, they provide an explicit condition on the predictive distributions corresponding to $\mu$. The above results and a similar characterization for tailfree priors may be found in Dey, Draghici and Ramamoorthi [5].

2.3 NR Priors from Censored Observations

We will now extend Theorem 2.5 and provide a characterization of neutral to the right priors in terms of their posterior distributions in the presence of right-censored observations.
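Before turning to censoring, it is worth noting that Ferguson's Dirichlet process, itself an NR prior, gives a concrete instance of the predictive condition of the preceding section: its predictive tail probability depends on the data only through the number of observations beyond $t$. A minimal sketch; the base measure $\alpha$, represented here by an illustrative tail-mass function, is an assumption of the example:

```python
import math

def dp_predictive_tail(t, data, alpha_tail, alpha_total):
    # Under a Dirichlet process prior with base measure alpha,
    # P(X_{n+1} > t | X_1, ..., X_n) = (alpha(t, inf) + #{i : X_i > t}) / (alpha(R+) + n).
    # The exact values of the observations never enter, only the count beyond t.
    n = len(data)
    beyond = sum(1 for x in data if x > t)
    return (alpha_tail(t) + beyond) / (alpha_total + n)
```

Two samples with the same counts beyond $t$ yield identical predictive probabilities, which is exactly condition (2) of the characterization specialized to this prior.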
Suppose, as before, that $F \sim \Pi$ and, given $F$, $X_1, X_2, \dots$ are independent and identically distributed as $F$. Here $X_1, X_2, \dots$ are thought of as survival times. Let $c_1, c_2, \dots$ be constants. These are our censoring times. For each $i$, we only get to observe
\[ Z_i = \min(X_i, c_i) \quad \text{and} \quad \Delta_i = I_{\{X_i \le c_i\}}. \]
Therefore $\Delta_i = 0$ means $Z_i$ is a right-censored observation. Define the observation processes
\[ N_i^{(n)}(t) = \sum_{j=1}^{n} I_{\{Z_j \le t,\, \Delta_j = i\}}, \quad i = 0, 1, \]
let $N^{(n)}(t) = (N_0^{(n)}(t), N_1^{(n)}(t))$, and also let
\[ \mathcal{A}_T^n = \{ t \le T : N_0^{(n)}(t) - N_0^{(n)}(t-) > 0 \}. \]
Under this set-up, the following theorem characterizes NR priors.

Theorem 2.7. Suppose $\Pi$ is a prior such that $\Pi\{F : 0 < \bar F(t) < 1 \ \forall t\} = 1$. Then the following are equivalent:

(1) $\Pi$ is NR.
(2) For all $T \in \mathbb{R}$ and $n \ge 1$, the distributions of $\{F(t) : t \le T\}$ under $\Pi_{\{N^{(n)}(t),\, t > 0\}}$ and $\Pi_{\{N^{(n)}(t),\, t \le T\}}$ are the same.
(3) For all $n \ge 1$, $k \ge 1$ and $t_1 < \dots < t_k < t_{k+1}$, the distribution of $F(t_1), \dots, F(t_k)$ under the posterior given $\{N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_{k+1})\}$, $\{N^{(n)}(t) : t \le t_k\}$ is the same as that under the posterior given $\{N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_k)\}$, $\{N^{(n)}(t) : t \le t_k\}$.

Proof. (1) $\Rightarrow$ (2) is due to Ferguson and Phadia [12]; we omit the proof here.

To prove (2) $\Rightarrow$ (3), note that, by (2), $\{F(t_1), \dots, F(t_k)\}$ is conditionally independent of $\{N^{(n)}(t) : t \ge 0\}$ given $\{N^{(n)}(t) : t \le t_k\}$. Therefore $\{F(t_1), \dots, F(t_k)\}$ is conditionally independent of $\{N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_{k+1})\}$ and $\{N_0^{(n)}(t) : t \le t_k\}$ given $\{N^{(n)}(t) : t \le t_k\}$. Then, for any measurable function $g$ of $(F(t_1), \dots, F(t_k))$,
\[ \mathcal{E}\big[ g \mid N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_{k+1});\ N_0^{(n)}(t), t \le t_k \big] \]
is measurable both with respect to $\sigma\{N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_{k+1})\} \vee \sigma\{N_0^{(n)}(t) : t \le t_k\}$ and with respect to $\sigma\{N^{(n)}(t) : t \le t_k\}$. Hence it is also measurable with respect to their intersection, which is $\sigma\{N_1^{(n)}(t_1), \dots, N_1^{(n)}(t_k)\} \vee \sigma\{N_0^{(n)}(t) : t \le t_k\}$.

To see (3) $\Rightarrow$ (1), we will show that, for arbitrary $t_1 < \dots < t_{k+1}$,
\[ \{\bar F(t_1), \dots, \bar F(t_k)\} \text{ is independent of } \frac{\bar F(t_{k+1})}{\bar F(t_k)}. \]
Fix $y_1 < \dots < y_r \le t_k$.
Let $\mathcal{A}_{t_k}^n = \{y_1, \dots, y_r\}$ and let $s_1 < \dots < s_{k+r+1}$ be points such that
\[ \mathcal{A}_{t_k}^n \cup \{t_1, \dots, t_{k+1}\} = \{s_1 < \dots < s_{k+r+1}\}. \]
Note that $s_{k+r} = t_k$ and $s_{k+r+1} = t_{k+1}$. Now, for integers $m_1 \le \dots \le m_r$, consider the set
\[ B = \Big\{ N_0^{(n)}(t) = \sum_{j=1}^{r} (m_j - m_{j-1}) I_{[y_j, \infty)}(t),\; t \le t_k \Big\}. \]
Thus, on $B$, $(m_j - m_{j-1})$ observations are censored at the point $y_j$, $j = 1, \dots, r$. Let $C = \{N_1^{(n)}(s_j) = 0,\; j = 1, \dots, k+r\}$ and let $C' = \{N_1^{(n)}(s_j) = 0,\; j = 1, \dots, k+r+1\}$. Since the posterior density of $(F(t_1), \dots, F(t_k))$ given $C \cap B$ and the posterior density given $C' \cap B$ are equal, so are the densities of $(\bar F(t_1), \bar F(t_2)/\bar F(t_1), \dots, \bar F(t_{k+1})/\bar F(t_k))$. Equality of the corresponding ratios of integrals then yields
\[ \mathcal{E}_\Pi\left[ \left( \frac{\bar F(s_{k+r+1})}{\bar F(s_{k+r})} \right)^{n - m_r} \Big|\ \bar F(s_1), \dots, \bar F(s_{k+r}) \right] = \text{constant a.e. } \Pi. \]
Since this holds for all $n - m_r$, $\bar F(t_{k+1})/\bar F(t_k)$ is independent of $\{\bar F(t_1), \dots, \bar F(t_k)\}$. Repeating the argument with $k$ replaced by $k-1, k-2, \dots, 1$, the result follows. $\blacksquare$

Chapter 3

Beta Processes

Beta processes, introduced by Hjort [14], are independent increment processes with sample paths almost surely in $\mathcal{A}$. As such, the process measures for beta processes, hereafter referred to as beta process priors on $\mathcal{A}$, induce NR prior distributions on $\mathcal{F}$ via the $\phi_H^{-1}$ transform. We will call the induced priors beta priors on $\mathcal{F}$. Apart from Ferguson's well-known Dirichlet process priors, beta priors constitute the first specific examples of a family of NR priors on $\mathcal{F}$. Another family of NR priors are the beta-Stacy priors introduced and named by Walker and Muliere [26]. However, as we shall see in this chapter, these two families are identical.

The organization of this chapter is as follows. Section 3.1 serves as an introduction to beta processes and discusses the construction of beta priors on $\mathcal{F}$. Section 3.2 provides some posterior properties and establishes weak consistency of beta priors.
Section 3.3 considers Dirichlet process priors for distributions on the space of right-censored observations. We show that suitable such Dirichlet process priors induce independent beta priors on the space of lifetime and censoring-time distributions. Section 3.4 then shows that beta processes with distinct parameters tend to be mutually singular. We conclude the chapter in Section 3.5 by discussing the beta-Stacy priors and establishing that they are simple reparameterizations of beta priors on $\mathcal{F}$.

3.1 Definition and Construction

Let $A_0$ be a hazard function with finitely many jumps. Let $t_1, \dots, t_k$ be the jump-points of $A_0$. Let $c(\cdot)$ be a piecewise continuous non-negative function on $[0, \infty)$ and let $A_0^c$ denote the continuous part of $A_0$. Let $A_0(t) < \infty$ for all $t$.

Definition 3.1. An independent increment process $A$ is said to be a beta process with parameters $c(\cdot)$ and $A_0(\cdot)$, written $A \sim \text{beta}(c, A_0)$, if the following holds: $A$ has Lévy representation as in Theorem 1.1 with

(1) $M = \{t_1, \dots, t_k\}$ and the jump-size at any $t_j$ given by
\[ A\{t_j\} \sim \text{beta}\big( c(t_j) A_0\{t_j\},\; c(t_j)(1 - A_0\{t_j\}) \big); \]
(2) Lévy measure given by
\[ \lambda(ds\, du) = c(s)\, u^{-1} (1-u)^{c(s)-1} \, du \, dA_0^c(s) \]
for $0 \le s < \infty$, $0 < u < 1$; and for which
(3) $b(t) \equiv 0$ for all $t > 0$.

The existence of such a process with sample paths almost surely in $\mathcal{A}$ is guaranteed by the Lévy representation theorem (Theorem 1.1), because
\[ \sum_{t_j \le t} \mathcal{E}(A\{t_j\}) + \int_0^t \int_0^1 u \, \lambda(ds\, du) < \infty \quad \text{for all } t > 0. \]

We now construct the induced beta prior directly on $\mathcal{F}$. Fix $F_0 \in \mathcal{F}$, for notational convenience continuous, and let $A_0 = \phi_H(F_0)$. For each $n \ge 1$, let $s_1^{(n)} < \dots < s_n^{(n)}$ be a partition becoming dense in $(0, \infty)$ as $n \to \infty$, and let $V_1^{(n)}, \dots, V_{n-1}^{(n)}$ be independent random variables with
\[ V_i^{(n)} \sim \text{beta}\Big( c(s_i^{(n)}) A_0(s_i^{(n)}, s_{i+1}^{(n)}],\; c(s_i^{(n)}) \big(1 - A_0(s_i^{(n)}, s_{i+1}^{(n)}]\big) \Big), \tag{3.1.1} \]
where $A_0(s, s'] = F_0(s, s']/F_0(s, \infty)$. Let $\Pi_n$ denote the distribution of the random cdf $F_n$ defined by
\[ \bar F_n(t) = \prod_{s_i^{(n)} \le t} \big(1 - V_i^{(n)}\big). \]

Theorem 3.1. $\{\Pi_n\}_{n \ge 1}$ converges weakly to a NR prior $\Pi$ on $\mathcal{F}$ which corresponds to a beta process.

Proof. First observe that, as $n \to \infty$,
\[ \mathcal{E}\big(\bar F_n(t)\big) = \prod_{s_i^{(n)} \le t} \big(1 - \mathcal{E} V_i^{(n)}\big) = \prod_{s_i^{(n)} \le t} \left( 1 - \frac{F_0(s_i^{(n)}, s_{i+1}^{(n)}]}{F_0(s_i^{(n)}, \infty)} \right) \to \prod_{(0,t]} (1 - dA_0) = \bar F_0(t) \]
for all $t \ge 0$. Thus $\mathcal{E}_{\Pi_n}(F) = F_n \xrightarrow{w} F_0$ as $n \to \infty$. Hence, by a result due to Sethuraman, $\{\Pi_n\}$ is tight. We shall now follow Hjort's calculations to show that the finite-dimensional distributions of the process $F$, under the prior $\Pi_n$, converge weakly to those under the prior induced by a beta process with parameters $c$ and $A_0$ on $\mathcal{A}$.
Consider, for each $n \ge 1$, an independent increment process $A_n$ with process measure $\Pi_n^*$ on $\mathcal{A}$ such that, for each fixed $t > 0$,
\[ A_n(t) = \sum_{s_i^{(n)} \le t} V_i^{(n)}. \]
Thus, for each $n \ge 1$, $A_n$ is a purely-jump process with fixed jump-points at $s_1^{(n)}, \dots, s_{n-1}^{(n)}$ and with random jump sizes given by $V_1^{(n)}, \dots, V_{n-1}^{(n)}$ at these sites. Clearly $\Pi_n^*$ induces the prior $\Pi_n$ on $\mathcal{F}$. Now, for any fixed $t > 0$, repeating computations as in Hjort ([14], Theorem 3.1, pp. 1270-72) with
\[ c_{n,i} = c(s_i^{(n)}), \qquad b_{n,i} = c_{n,i} \cdot \frac{F_0(s_{i+1}^{(n)}, \infty)}{F_0(s_i^{(n)}, \infty)}, \qquad a_{n,i} = c_{n,i} - b_{n,i}, \]
one concludes that, for each $\theta$, as $n \to \infty$,
\[ \mathcal{E}\big[ e^{-\theta A_n(t)} \big] \to \exp\left\{ - \int_0^1 \int_0^t (1 - e^{-\theta u}) \lambda(ds\, du) \right\} \]
and, similarly,
\[ \mathcal{E} \exp\left\{ - \sum_{j=1}^m \theta_j A_n(a_{j-1}, a_j] \right\} \to \exp\left\{ - \sum_{j=1}^m \int_{a_{j-1}}^{a_j} \int_0^1 (1 - e^{-\theta_j u}) \lambda(ds\, du) \right\}. \]
Thus the finite-dimensional distributions of the independent increment processes $A_n$ converge to the finite-dimensional distributions of an independent increment process with Lévy measure as in Definition 3.1. If the process measure is denoted by $\Pi^*$ and the corresponding induced measure on $\mathcal{F}$ is denoted by $\Pi$, then, considering the Skorokhod topology on $\mathcal{A}$ and by the continuity of $\phi_H^{-1}$, we conclude that, for all $a_1, \dots, a_m$,
\[ \mathcal{L}\big( F(a_1), \dots, F(a_m) \mid \Pi_n \big) \xrightarrow{w} \mathcal{L}\big( F(a_1), \dots, F(a_m) \mid \Pi \big). \]
Therefore $\{\Pi_n\}$ converges weakly to $\Pi$, a NR prior on $\mathcal{F}$. $\blacksquare$

As noted earlier, the existence of a Lévy process with the given Lévy measure is not difficult to establish. It is also not much trouble to show that the finite-dimensional distributions under the time-discrete processes converge to the finite-dimensional distributions of the given Lévy process. The difficult issues to resolve are the tightness of the sequence of time-discrete processes and the fact that the paths are almost surely in $\mathcal{A}$. By constructing the prior directly on $\mathcal{F}$ one is able to apply Sethuraman's result and thereby resolve the tightness issue very easily. Since explicit expressions for the finite-dimensional distributions are hard to obtain, we take recourse to the continuity of the product integral to establish proper convergence of these distributions.

3.2 Properties of Beta Processes

In this section we prove weak consistency of the posteriors of beta process priors on $\mathcal{A}$. First we take a look at some properties of the prior and the posterior.

3.2.1 General Properties

The following properties of beta processes may be found in Hjort [14].

(1) Let $A_0 \in \mathcal{A}$ be a hazard function with finitely many points of discontinuity and let $c$ be a piecewise continuous function on $(0, \infty)$. If $A \sim \text{beta}(c, A_0)$ then $\mathcal{E}(A(t)) = A_0(t)$. In other words, $F = \phi_H^{-1}(A)$ follows a beta$(c, F_0)$ prior distribution and we have $\mathcal{E}(F(t)) = F_0(t)$, where $F_0 = \phi_H^{-1}(A_0)$. The function $c$ enters the expression for the variance. If $M = \{t_1, \dots, t_k\}$ is the set of discontinuity points of $A_0$ then
\[ \mathcal{V}(A(t)) = \sum_{t_j \le t} \frac{A_0\{t_j\}(1 - A_0\{t_j\})}{c(t_j) + 1} + \int_0^t \frac{dA_{0,c}(s)}{c(s) + 1}, \]
where $A_{0,c}(t) = A_0(t) - \sum_{t_i \le t} A_0\{t_i\}$.

(2) Let $A \sim \text{beta}(c, A_0)$ where, as before, $A_0$ has discontinuities at points in $M$. Let, given $F$, $X_1, \dots, X_n$ be i.i.d. $F$. Then the posterior distribution of $F$ given $X_1, \dots, X_n$ is again a beta prior, i.e. the corresponding independent increment process is again beta. To describe the posterior parameters, let $\mathcal{X}_n$ be the set of distinct elements of $\{x_1, \dots, x_n\}$. Define
\[ \bar Y_n(t) = \sum_{i=1}^n I_{\{x_i > t\}} \quad \text{and} \quad Y_n(t) = \sum_{i=1}^n I_{\{x_i \ge t\}}. \]
With $N_n(t)$ as before, note that $\bar Y_n(t) = n - N_n(t)$ and $Y_n(t) = n - N_n(t-)$. Using this notation, the posterior beta process has parameters
\[ c_{X_1 \dots X_n}(t) = c(t) + Y_n(t); \qquad A_{0, X_1 \dots X_n}(t) = \int_0^t \frac{c(z)\, dA_0(z) + dN_n(z)}{c(z) + Y_n(z)}. \]
More explicitly, $A_{0, X_1 \dots X_n}$ has discontinuities at points in $M^* = M \cup \mathcal{X}_n$, and for $t \in M^*$,
\[ A_{0, X_1 \dots X_n}\{t\} = \frac{c(t) A_0\{t\} + N_n\{t\}}{c(t) + Y_n(t)}; \qquad A_{0, X_1 \dots X_n}^c(t) = \int_0^t \frac{c(z)\, dA_0^c(z)}{c(z) + \bar Y_n(z)}. \]
Note that, if $t \in M^*$,
\[ A\{t\} \sim \text{beta}\big( c(t) A_0\{t\} + N_n\{t\},\; c(t)(1 - A_0\{t\}) + Y_n(t) - N_n\{t\} \big). \]

(3) Our interest is in the following special case of (2).
Suppose $A \sim \text{beta}(c, A_0)$ and that $A_0$ is continuous. Then the posterior given $X_1, \dots, X_n$ is again a beta process with parameters
\[ c_{X_1 \dots X_n}(t) = c(t) + Y_n(t) \quad \text{and} \quad A_{0, X_1 \dots X_n}(t) = A_{0, X_1 \dots X_n}^d(t) + A_{0, X_1 \dots X_n}^c(t), \]
where
\[ A_{0, X_1 \dots X_n}^d(t) = \sum_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} \frac{N_n\{t_i\}}{c(t_i) + Y_n(t_i)} \quad \text{and} \quad A_{0, X_1 \dots X_n}^c(t) = \int_0^t \frac{c(z)\, dA_0(z)}{c(z) + \bar Y_n(z)}. \]
As a consequence, if $t \in \mathcal{X}_n$, then under the posterior $\Pi_{X_1, \dots, X_n}$ we have $A\{t\} \sim \text{beta}\big(N_n\{t\},\, c(t) + \bar Y_n(t)\big)$. Also note that the Bayes estimates are
\[ \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\big(\bar F(t)\big) = \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} \left( 1 - \frac{N_n\{t_i\}}{c(t_i) + Y_n(t_i)} \right) \exp\left\{ - \int_0^t \frac{c(z)\, dA_0(z)}{c(z) + \bar Y_n(z)} \right\}. \tag{3.2.2} \]

(4) If $A_0$ is continuous then, since $A_{0, X_1 \dots X_n}$ has jumps, the prior and posterior are mutually singular.

3.2.2 Weak Consistency of the Posterior

Let $\Pi$ be a prior on $\mathcal{F}$ and, as before, let $\Pi_{X_1, \dots, X_n}$ be (a fixed version of) the posterior given $X_1, \dots, X_n$.

Definition 3.2. The sequence of posteriors $\{\Pi_{X_1, \dots, X_n}\}$ is said to be weakly consistent at $F_0$ if
\[ \lim_{n \to \infty} \Pi_{X_1, \dots, X_n}(U) = 1 \quad \text{a.s. } F_0^\infty \]
for every weak neighborhood $U$ of $F_0$.

Weak consistency, as shown in the next proposition, is a weak requirement.

Proposition 3.1. If, for all continuity points $t$ of $F_0$ and all $\epsilon > 0$,
\[ \Pi_{X_1, \dots, X_n}\{F : |F(t) - F_0(t)| < \epsilon\} \to 1 \quad \text{a.s. } F_0^\infty, \]
then $\Pi_{X_1, \dots, X_n}$ is weakly consistent.

Proof. Since every weak neighborhood of $F_0$ contains a set of the form
\[ \{F : |F(t_i) - F_0(t_i)| < \epsilon_i,\ i = 1, \dots, k,\ t_i \text{ continuity points of } F_0\}, \]
the result follows easily. $\blacksquare$

Theorem 3.2. Let $A_0 \in \mathcal{A}$ be continuous and finite for all $t > 0$. Let $\Pi$ be a beta$(c, A_0)$ prior. Then the posterior is weakly consistent at all $F_0 \in \mathcal{F}$.

Proof. Let $\Omega = \{(x_1, x_2, \dots) : Y_n(t)/n \to \bar F_0(t-) \text{ for all } t\}$. By the Glivenko-Cantelli lemma $F_0^\infty(\Omega) = 1$. Fix $\omega = (x_1, x_2, \dots)$ in $\Omega$. Let $\mathcal{X}_n = \{t_1, \dots, t_{k(n)}\}$ be the distinct elements in $\{x_1, \dots, x_n\}$. Fix $t$, a continuity point of $F_0$, with $\bar F_0(t) > 0$. For notational convenience we will use $N_{n,j} = N_n\{t_j\}$, $Y_{n,j} = Y_n(t_j)$, $\bar Y_{n,j} = \bar Y_n(t_j)$ and $c_j = c(t_j)$. Also let $K = \sup_{z \le t} c(z)$. Recall from (3.2.2) that
\[ S_n(t) := \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\big(\bar F(t)\big) = \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \left( 1 - \frac{N_{n,j}}{c_j + Y_{n,j}} \right) \exp\left\{ - \int_0^t \frac{c(z)\, dA_0(z)}{c(z) + \bar Y_n(z)} \right\}. \]
We will argue that $S_n(t) \to \bar F_0(t)$ as $n \to \infty$. Since $\bar Y_n(t) \to \infty$ as $n \to \infty$,
\[ \int_0^t \frac{c(z)\, dA_0(z)}{c(z) + \bar Y_n(z)} \le K \int_0^t \frac{dA_0(z)}{\bar Y_n(z)} \le K\, \frac{A_0(t)}{\bar Y_n(t)} \to 0. \tag{3.2.3} \]
Consequently, as $n \to \infty$,
\[ \exp\left\{ - \int_0^t \frac{c(z)\, dA_0(z)}{c(z) + \bar Y_n(z)} \right\} \to 1. \]
Now, observe that
\[ \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \left( 1 - \frac{N_{n,j}}{c_j + Y_{n,j}} \right) = \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \frac{c_j + \bar Y_{n,j}}{c_j + Y_{n,j}}. \]
Since, for positive numbers $a, b, c$ such that $c > b$, we have $\frac{a+b}{a+c} > \frac{b}{c}$, we get, as $n \to \infty$,
\[ \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \frac{c_j + \bar Y_{n,j}}{c_j + Y_{n,j}} \ \ge\ \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \frac{\bar Y_{n,j}}{Y_{n,j}} \ \ge\ \frac{\bar Y_n(t)}{n} \ \to\ \bar F_0(t). \]
On the other hand,
\[ \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \frac{c_j + \bar Y_{n,j}}{c_j + Y_{n,j}} \ \le\ \prod_{\substack{t_j \in \mathcal{X}_n \\ t_j \le t}} \frac{K + \bar Y_{n,j}}{K + Y_{n,j}} \ \le\ \frac{K + \bar Y_n(t)}{K + n} \ \to\ \bar F_0(t). \]
Next we will show that
\[ \lim_n \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\big(\bar F^2(t)\big) = \bar F_0^2(t), \tag{3.2.4} \]
which will prove that
\[ \lim_n \mathcal{V}_{\Pi_{X_1, \dots, X_n}}\big(\bar F(t)\big) = 0 \tag{3.2.5} \]
and will establish the result. For each $n \ge 1$, let $A_{n,c}(t) = A(t) - \sum_{t_i \in \mathcal{X}_n, t_i \le t} A\{t_i\}$; then, for any $A$,
\[ \bar F(t) = \prod_{[0,t]} (1 - dA) = \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} (1 - A\{t_i\}) \prod_{[0,t]} (1 - dA_{n,c}(s)). \]
We will show that

(1) \[ \lim_{n \to \infty} \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\left[ \prod_{[0,t]} (1 - dA_{n,c}(s)) \right]^2 = 1; \tag{3.2.6} \]
and that

(2) \[ \lim_{n \to \infty} \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\left[ \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} (1 - A\{t_i\}) \right]^2 = \lim_{n \to \infty} \mathcal{E}_{\Pi_{X_1, \dots, X_n}} \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} (1 - A\{t_i\})^2 = \bar F_0^2(t). \]

Since $A\{t\} < 1$ for all $A$,
\[ 1 \ \ge\ \left[ \prod_{[0,t]} (1 - dA_{n,c}(s)) \right]^2 \ \ge\ \prod_{[0,t]} (1 - 2\, dA_{n,c}(s)) \ \ge\ 1 - 2 A_{n,c}(t), \]
and since, by (3.2.3), $\mathcal{E}_{\Pi_{X_1, \dots, X_n}} A_{n,c}(t) \to 0$, we get
\[ 1 \ \ge\ \mathcal{E}_{\Pi_{X_1, \dots, X_n}}\left[ \prod_{[0,t]} (1 - dA_{n,c}(s)) \right]^2 \ \ge\ 1 - 2\, \mathcal{E}_{\Pi_{X_1, \dots, X_n}} A_{n,c}(t) \ \to\ 1, \]
proving (3.2.6). Since, under $\Pi_{X_1, \dots, X_n}$, $A\{t_i\} \sim \text{beta}(N_{n,i},\, c_i + \bar Y_{n,i})$ and the jumps are independent,
\[ \mathcal{E}_{\Pi_{X_1, \dots, X_n}} \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} (1 - A\{t_i\})^2 = \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} \frac{\bar Y_{n,i} + c_i}{Y_{n,i} + c_i} \cdot \frac{\bar Y_{n,i} + c_i + 1}{Y_{n,i} + c_i + 1} \ \le\ \prod_{\substack{t_i \in \mathcal{X}_n \\ t_i \le t}} \frac{\bar Y_{n,i} + K}{Y_{n,i} + K} \cdot \frac{\bar Y_{n,i} + K + 1}{Y_{n,i} + K + 1} \ \le\ \frac{\bar Y_n(t) + K}{n + K} \cdot \frac{\bar Y_n(t) + K + 1}{n + K + 1} \ \to\ \bar F_0^2(t-) = \bar F_0^2(t). \tag{3.2.7} \]
Thus, by (3.2.6) and (3.2.7), we have $\lim_n \mathcal{E}_{\Pi_{X_1, \dots, X_n}}(\bar F^2(t)) \le \bar F_0^2(t)$, which, of course, yields (3.2.4) and hence (3.2.5). $\blacksquare$

3.3 The Problem of Right-Censored Observations

Let $X$ and $Y$ be independent positive random variables with distributions $F_X$ and $F_Y$ respectively.
Our motivation is to think of $X$ as a lifetime and of $Y$ as a censoring time. In the problem of right-censored data, we get to observe $Z = \min(X, Y)$. We also observe $\Delta = I_{\{X \le Y\}}$, thus being informed whether a lifetime has been observed or was censored. Our intention is to infer about $F_X$. One of the many Bayesian approaches to this problem is to put a suitable prior on the joint distribution of $(Z, \Delta)$. We will refer to the space of all such joint distributions as the observational distribution space and denote it by $\mathcal{F}_{(Z, \Delta)}$. Peterson [20] provides explicit expressions for recovering $F_X$ and $F_Y$ from the joint distribution of $(Z, \Delta)$ under certain mild conditions. We will use this to study the distributions of $F_X$ and $F_Y$ for suitably chosen priors on $\mathcal{F}_{(Z, \Delta)}$.

Specifically, we will consider a Dirichlet process prior for the observational distribution space. Our goal is to obtain the induced distributions of $F_X$ and $F_Y$ under such a prior. We begin by defining a Dirichlet process on $\mathbb{R}^+ = (0, \infty)$.

Definition 3.3. Let $\alpha$ be a positive measure on $\mathbb{R}^+$. A Dirichlet process prior on $\mathbb{R}^+$ with parameter $\alpha$, denoted $\text{Dir}(\alpha)$, is such that for any $k \ge 1$ and for arbitrary points $0 < t_1 < t_2 < \dots < t_k$, under $\text{Dir}(\alpha)$, $(F(t_1), F(t_2) - F(t_1), \dots, F(t_k) - F(t_{k-1}))$ has a Dirichlet distribution with parameters $(\alpha((0, t_1]), \alpha((t_1, t_2]), \dots, \alpha((t_{k-1}, t_k]))$.

As before, let $\bar F_X(t) = 1 - F_X(t)$ and $\bar F_Y(t) = 1 - F_Y(t)$ be the survival functions for the lifetime and the censoring time respectively. Let
\[ F_1(t) = P(Z \le t \mid \Delta = 1); \qquad F_0(t) = P(Z \le t \mid \Delta = 0); \qquad p = P(\Delta = 1). \]
Let $\alpha$ be a positive measure on $\mathbb{R}^+ \times \{0, 1\}$ such that

- $\alpha_0(\cdot) = \alpha(\cdot \times \{0\})$ and $\alpha_1(\cdot) = \alpha(\cdot \times \{1\})$ are positive measures on $\mathbb{R}^+$ with full support;
- $\{t : \alpha_0\{t\} > 0\}$ and $\{t : \alpha_1\{t\} > 0\}$ are disjoint.

Definition 3.4. A Dirichlet process prior $\Pi$ with parameter $\alpha$ on $\mathcal{F}_{(Z, \Delta)}$ is a prior such that, under $\Pi$,
(1) $F_1 \sim \text{Dir}(\alpha_1)$;
(2) $F_0 \sim \text{Dir}(\alpha_0)$; and
(3) $p \sim \text{beta}(\alpha_1(\mathbb{R}^+), \alpha_0(\mathbb{R}^+))$,
and $F_0$, $F_1$ and $p$ are independent.
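The finite-dimensional marginals in Definitions 3.3 and 3.4 can be sampled directly, since a Dirichlet vector is a normalized vector of independent gamma variables. A minimal sketch; the partition cells and the masses assigned to them are illustrative choices:

```python
import random

def dirichlet_sample(masses, rng):
    # Dirichlet(masses) via normalized independent Gamma(mass, 1) draws.
    g = [rng.gammavariate(a, 1.0) for a in masses]
    total = sum(g)
    return [x / total for x in g]

def sample_observation_prior(alpha1_masses, alpha0_masses, rng):
    # One draw of (F1 cell probabilities, F0 cell probabilities, p) as in
    # Definition 3.4: F1 ~ Dir(alpha1), F0 ~ Dir(alpha0),
    # p ~ Beta(alpha1(R+), alpha0(R+)), all mutually independent.
    F1 = dirichlet_sample(alpha1_masses, rng)
    F0 = dirichlet_sample(alpha0_masses, rng)
    p = rng.betavariate(sum(alpha1_masses), sum(alpha0_masses))
    return F1, F0, p
```

Here `alpha1_masses` and `alpha0_masses` are the $\alpha_1$- and $\alpha_0$-masses of a fixed finite partition of $\mathbb{R}^+$; each draw yields a probability vector for $F_1$, one for $F_0$, and a value of $p$.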
Our goal is to obtain the distribution of $F_X$ using Peterson's map. The definition of the hazard function is crucial to the map. Peterson uses Doksum's definition for this purpose. However, we intend to use the product integral, i.e. the definition used by Hjort. Hence a slight modification of the map is necessary. First let us observe the following properties of $F_0$, $F_1$ and $p$. These properties and more may be found in Peterson [20].

(1) $p \bar F_1(t) = \int_t^\infty \bar F_Y(s-)\, dF_X(s)$, and $(1-p) \bar F_0(t) = \int_t^\infty \bar F_X(s)\, dF_Y(s)$.

(2) With obvious notation, $p \bar F_1(t) + (1-p) \bar F_0(t) = \bar F_X(t)\, \bar F_Y(t) = \bar F_Z(t)$.

(3) Jump-points of $F_1$ are jump-points of $F_X$ and vice versa. Similarly, jump-points of $F_0$ are jump-points of $F_Y$ and vice versa.

Under our assumptions on $\alpha_1$ and $\alpha_0$, if $\Pi$ is a Dirichlet process with parameter $\alpha$, then $F_1$ and $F_0$ do not share jump-points almost surely $\Pi$. Also $F_1$ and $F_0$ have full support almost surely $\Pi$. As before, we will treat each of the distributions $F_X$, $F_Y$, $F_Z$, $F_0$ and $F_1$ both as distribution functions and as the measures corresponding to them.

Proposition 3.2. Let $F_1$ and $F_0$ be distribution functions with full support and no jump-points in common. Let $p > 0$. Then $F_0$, $F_1$ and $p$ uniquely determine $F_X$ and $F_Y$. The map for $F_X$ is given by
\[ \bar F_X(t) = \prod_{s \le t} \left( 1 - \frac{p F_1\{s\}}{F_Z[s, \infty)} \right) \exp\left\{ - \int_0^t \frac{p\, dF_1^c(s)}{F_Z[s, \infty)} \right\}, \]
where $F_1^c(t) = F_1(t) - \sum_{s \le t} F_1\{s\}$. The map for $F_Y$ is similar. Observe that here $F_Z[s, \infty) = p F_1[s, \infty) + (1-p) F_0[s, \infty)$.

Proof. We know that if $A_X$ is the cumulative hazard function corresponding to $F_X$, then
\[ \bar F_X(t) = \prod_{[0,t]} (1 - dA_X(s)) = \prod_{s \le t} (1 - A_X\{s\}) \exp\big[-A_X^c(t)\big], \]
where $A_X^c$ is the continuous part of $A_X$. Note that
\[ A_X(t) = \int_0^t \frac{dF_X(s)}{\bar F_X(s-)} = \int_0^t \frac{\bar F_Y(s-)\, dF_X(s)}{\bar F_X(s-)\, \bar F_Y(s-)} = \int_0^t \frac{p\, dF_1(s)}{F_Z[s, \infty)}, \]
using (1) and (2). The stated expression for $\bar F_X$ then follows from the product-integral representation above. $\blacksquare$

For each $n$, let $\{t_i^{(n)}\}$ be a partition of $(0, \infty)$ with mesh going to $0$, and define
\[ A_n(t) = \sum_{t_{i+1}^{(n)} \le t} \frac{p F_1[t_i^{(n)}, t_{i+1}^{(n)})}{F_Z[t_i^{(n)}, \infty)}. \]
Therefore $A_n(t) \to A_X(t)$ as $n \to \infty$. Let $F_{(Z,\Delta)}$ denote the joint distribution of $(Z, \Delta)$ specified by $F_1$, $F_0$ and $p$. Since $F_{(Z,\Delta)} \sim \Pi$, the $\text{Dir}(\alpha)$ distribution,
\[ \frac{p F_1[t_i^{(n)}, t_{i+1}^{(n)})}{F_Z[t_i^{(n)}, \infty)} \sim \text{beta}\big( \alpha(A_i^{(n)}),\; \alpha(B_i^{(n)}) \big) \]
where $M_i^{(n)} = [t_i^{(n)}, \infty) \times \{0, 1\}$, $A_i^{(n)} = [t_i^{(n)}, t_{i+1}^{(n)}) \times \{1\}$ and $B_i^{(n)} = M_i^{(n)} - A_i^{(n)}$.
Rewrite this as
\[ \frac{p F_1[t_i^{(n)}, t_{i+1}^{(n)})}{F_Z[t_i^{(n)}, \infty)} \sim \text{beta}\left( \alpha(M_i^{(n)}) \frac{\alpha(A_i^{(n)})}{\alpha(M_i^{(n)})},\; \alpha(M_i^{(n)}) \left( 1 - \frac{\alpha(A_i^{(n)})}{\alpha(M_i^{(n)})} \right) \right), \]
and note that the variables
\[ X_{n,i} = \frac{p F_1[t_i^{(n)}, t_{i+1}^{(n)})}{F_Z[t_i^{(n)}, \infty)}, \quad i \ge 1, \]
are all independent.

Assume that $\alpha_1$ is continuous. We will now show that, under $\Pi$, the distribution of $A_n$ converges in distribution to a beta process with parameters $c$ and $A_\alpha$, where
\[ c(t) = \alpha([t, \infty) \times \{0, 1\}) \tag{3.3.9} \]
and $A_\alpha$ is as in (3.3.8). Since $A_n(t) \to A_X(t)$ for all $t$ as $n \to \infty$, the distribution of $A_n$ under $\Pi$ converges to the distribution of $A_X$ under $\Pi$, i.e.
\[ \mathcal{L}\big( \phi_n(F_1, F_0, p) \mid \Pi \big) \xrightarrow{w} \mathcal{L}\big( A_X \mid \Pi \big). \]
We may then conclude that $\mathcal{L}(A_X \mid \Pi)$ is a beta process with parameters as stated above.

Theorem 3.3. $A_X \sim \text{beta}(c, A_\alpha)$, where $c$ and $A_\alpha$ are as in (3.3.9) and (3.3.8) respectively.

Proof. In view of the above discussion it suffices to show that
\[ \mathcal{E} \exp\{-\theta A_n(t)\} \to \exp\left\{ - \int_0^1 (1 - e^{-\theta s})\, d\lambda_t(s) \right\}, \quad \text{where } d\lambda_t(s) = \left( \int_0^t c(z)\, s^{-1} (1-s)^{c(z)-1}\, dA_\alpha(z) \right) ds, \]
and that $\{A_n\}$ is tight. Letting $c_{n,i} = \alpha([t_i^{(n)}, \infty) \times \{0, 1\})$,
\[ a_{n,i} = c_{n,i}\, \frac{\alpha(A_i^{(n)})}{\alpha(M_i^{(n)})} = \alpha(A_i^{(n)}) \quad \text{and} \quad b_{n,i} = c_{n,i} - a_{n,i}, \]
and observing that $X_{n,i} \sim \text{beta}(a_{n,i}, b_{n,i})$ and $A_n(t) = \sum_{t_{i+1}^{(n)} \le t} X_{n,i}$, the proof follows easily by mimicking Hjort's construction of the beta process ([14], Theorem 3.1, pp. 1270-72). $\blacksquare$

3.4 Mutual Singularity of Beta Processes

Let $\Pi_1^*$ and $\Pi_2^*$ be two independent increment processes on $\mathcal{A}$ with no fixed jump-points. Then, as we saw from the Lévy representation, the random measure $A \mapsto \mu(\cdot, A)$ defined by
\[ \mu(E, A) = \#\big\{ (t, A\{t\}) \in E : A\{t\} > 0 \big\}, \]
for any Borel subset $E$ of $(0, \infty) \times [0, \infty]$, is a Poisson process under $\Pi_1^*$ and $\Pi_2^*$ with mean measures, say, $\lambda_1$ and $\lambda_2$. If the Poisson processes induced by $\Pi_1^*$ and $\Pi_2^*$ are mutually singular then so are $\Pi_1^*$ and $\Pi_2^*$. Conditions on $\lambda_1$ and $\lambda_2$ can be given which will ensure that the corresponding Poisson processes are mutually singular. We quote below a theorem due to Brown [1] in this context.

Theorem 3.4 (M. Brown, 1971). Let $P_{\lambda_1}$ and $P_{\lambda_2}$ be Poisson processes over a measurable space $(\mathcal{I}, \mathcal{A})$ with $\sigma$-finite mean measures $\lambda_1$ and $\lambda_2$.
Let $\lambda_1 = \mu + \nu$ be the Lebesgue decomposition of $\lambda_1$ with respect to $\lambda_2$ ($\mu \ll \lambda_2$, $\nu \perp \lambda_2$). Then $P_{\lambda_1} \perp P_{\lambda_2}$ if and only if one of the following conditions holds:

(1) $\nu(\mathcal{I}) = \infty$;
(2) $\int_{B_c} |f - 1| \, d\lambda_2 = \infty$ for some $c > 0$, where $f = \frac{d\mu}{d\lambda_2}$ and $B_c = \{|f - 1| > c\}$;
(3) $\int_{B_c^c} (f - 1)^2 \, d\lambda_2 = \infty$ for all $c > 0$.

We apply the above theorem and show that, in general, beta processes tend to be singular.

Theorem 3.5. Let $\Pi_1^*$ and $\Pi_2^*$ be two beta processes with parameters $(c_1, A_1)$ and $(c_2, A_2)$. Assume that $A_1$ and $A_2$ are continuous. Then $(c_1, A_1) \ne (c_2, A_2)$ implies $\Pi_1^* \perp \Pi_2^*$.

Proof. Recall that the Lévy measures corresponding to $\Pi_1^*$ and $\Pi_2^*$ are given by
\[ \lambda_1(ds\, dz) = c_1(z)\, s^{-1} (1-s)^{c_1(z)-1} \, dA_1(z)\, ds, \qquad \lambda_2(ds\, dz) = c_2(z)\, s^{-1} (1-s)^{c_2(z)-1} \, dA_2(z)\, ds. \]
As before, we will continue to use $\lambda_1$ and $\lambda_2$ for the measures generated by these functions.

(1) Suppose $A_1$ and $A_2$ are not mutually absolutely continuous, so that there exists a Borel set $B \subset \mathbb{R}^+$ such that $A_1(B) > 0$ but $A_2(B) = 0$. Then
\[ \lambda_1(B \times (0, 1)) = \int_B \int_0^1 (1-s)^{c_1(z)-1} s^{-1} c_1(z)\, ds\, dA_1(z) = \infty \]
because the inner integral is $\infty$ for each fixed $z$, while $\lambda_2(B \times (0, 1)) = 0$. Consequently, by condition (1) of Theorem 3.4, the result follows.

(2) Now suppose that $A_1$ and $A_2$ are mutually absolutely continuous. Let $g(z) = \frac{dA_1}{dA_2}(z)$. Let $\delta > 0$ and consider the sets
\[ E_\delta = \left\{ z : \frac{c_1(z)}{c_2(z)}\, g(z) < 1 - \delta \right\} \quad \text{and} \quad D_\delta = \left\{ z : \frac{c_1(z)}{c_2(z)}\, g(z) > 1 + \delta \right\}. \]
Let $G = \{z : \frac{c_1(z)}{c_2(z)}\, g(z) = 1\}$.

Case 1. Suppose $\lambda_2(D_\delta \times (0, 1)) > 0$.

(a) If, in addition, $\lambda_2\left( D_\delta \cap \left\{ \frac{c_1(z)}{c_2(z)} \ge 1 \right\} \times (0, 1) \right) > 0$, then let $\epsilon < \frac{\delta}{2(1+\delta)}$ and note that, for each fixed $z \in D_\delta \cap \{\frac{c_1}{c_2} \ge 1\}$, there exists $t_0(z)$ such that
\[ (1-s)^{c_1(z) - c_2(z)} > 1 - \epsilon \quad \text{for all } s < t_0(z). \]
Let $D = \{(z, s) : z \in D_\delta,\, s < t_0(z)\}$. On $D$,
\[ f(z, s) = \frac{d\lambda_1}{d\lambda_2}(z, s) = \frac{c_1(z)}{c_2(z)}\, g(z)\, (1-s)^{c_1(z) - c_2(z)} > (1 + \delta)(1 - \epsilon) > 1 + \frac{\delta}{2}. \]
Letting $c = \delta/2$ in Theorem 3.4, we have
\[ \int_{B_c} |f - 1| \, d\lambda_2 \ \ge\ \int_D (f - 1)\, d\lambda_2 \ \ge\ \frac{\delta}{2}\, \lambda_2(D). \]
Since, for each $z$,
\[ \int_0^{t_0(z)} s^{-1} (1-s)^{c_2(z)-1}\, ds = \infty, \]
we have $\lambda_2(D) = \infty$. Thus, condition (2) of Brown's Theorem is satisfied and, hence, we have the result.
(b) If, on the other hand, $A_2\bigl( D_\delta \cap \{ \tfrac{c_1(z)}{c_2(z)} \ge 1 \} \bigr) = 0$, then
\[
A_2\Bigl( D_\delta \cap \Bigl\{ \tfrac{c_1(z)}{c_2(z)} < 1 \Bigr\} \Bigr) > 0.
\]
Since, for each such $z$, $(1-s)^{c_1(z)-c_2(z)} \to \infty$ as $s \uparrow 1$, there exists $t_1(z)$ such that
\[
(1-s)^{c_1(z)-c_2(z)} > 1 + \delta \quad \text{for all } s > t_1(z).
\]
Noting that $\int_{t_1(z)}^1 s^{-1} (1-s)^{c_1(z)-c_2(z)}\, ds = \infty$ for each $z$, an argument similar to the one in (a) yields the result.

Case 2. Suppose $A_2(D_\delta) = 0$ for all $\delta$, but $A_2(E_\delta) > 0$ for some $\delta$. Then the above argument goes through with the roles of $\lambda_1$ and $\lambda_2$ reversed.

Case 3. Suppose $A_2(D_\delta) = 0$ and $A_2(E_\delta) = 0$ for all $\delta$. Then $\tfrac{c_1(z)}{c_2(z)}\, g(z) = 1$ a.e. $A_2$. In this case $f(z, s) = (1-s)^{c_1(z)-c_2(z)}$. Since, on $G$, at least one of $\{ c_1(z) - c_2(z) > \delta \}$ and $\{ c_2(z) - c_1(z) > \delta \}$ has positive measure for some $\delta$, it is easy to see that at least one of $\int_{B_c} |f - 1|\, d\lambda_2$ and $\int_{B_c} |f^{-1} - 1|\, d\lambda_1$ is $\infty$. $\blacksquare$

3.5 Beta-Stacy Process Priors

The family of beta-Stacy process priors was introduced by Walker and Muliere [26] as examples of NR priors on $\mathcal{F}$. We will see here that this family is, however, identical to the family of beta priors on $\mathcal{F}$.

Definition 3.5. Let $c(\cdot)$ be a piecewise continuous positive function on $(0, \infty)$. Let $F_0 \in \mathcal{F}$ have finitely many jumps, at the points $t_1, \ldots, t_k$. A random cdf $F$ is said to have a beta-Stacy process prior with parameters $c$ and $F_0$, written $F \sim \text{beta-Stacy}(c, F_0)$, if, for all $t \ge 0$, $F(t) = 1 - e^{-H(t)}$, where $H(t)$ is an independent increment process with a Lévy representation such that

(1) $t_1, \ldots, t_k$ are fixed jump-points of $H$ and the jump-sizes are given by
\[
1 - e^{-H\{t_j\}} \sim \mathrm{beta}\bigl( c(t_j) F_0\{t_j\},\ c(t_j) F_0(t_j, \infty) \bigr);
\]

(2) the Lévy measure is given by
\[
\Lambda(ds\, du) = c(s)\, \frac{e^{-u\, c(s) F_0(s, \infty)}}{1 - e^{-u}}\, dF_0^c(s)\, du,
\qquad 0 \le s < \infty,\ 0 < u < \infty,
\]
where $F_0^c$ is the continuous part of $F_0$; and for which

(3) $b(t) \equiv 0$.

Walker and Muliere referred to the Lévy process $H$ as a log-beta process. A discussion of such processes is unnecessary for our purpose. The distribution of $F$ as above will be called a beta-Stacy$(c, F_0)$ distribution.
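The definition is easiest to internalize through a finite-grid sketch: over each grid cell the conditional survival ratio is an independent beta variable whose parameters are set by $c$ and $F_0$. The snippet below is a minimal simulation of that idea, not the dissertation's construction; the grid, the exponential choice of $F_0$, the constant $c$, and all function names are our own illustrative assumptions. Under this discretization the beta means telescope, so the prior mean of the random survival function at each grid point is exactly $F_0(s_i, \infty)$, which the simulation reflects.

```python
import numpy as np

def sample_beta_stacy_survival(grid, c, F0, n_paths, rng):
    """Sample survival curves s_i -> W_1 * ... * W_i from a naive
    discretization of a beta-Stacy(c, F0) prior on the grid points.
    F0 is assumed continuous, so there are no fixed jump-points."""
    F0bar = 1.0 - F0(grid)                      # F0(s_i, infinity)
    F0bar_prev = np.concatenate(([1.0], F0bar[:-1]))
    a = c(grid) * F0bar                         # c(s_i) F0(s_i, infinity)
    b = c(grid) * (F0bar_prev - F0bar)          # c(s_i) F0((s_{i-1}, s_i])
    # independent beta survival ratios over the grid cells
    W = rng.beta(a, b, size=(n_paths, len(grid)))
    return np.cumprod(W, axis=1)                # survival function at s_i

rng = np.random.default_rng(0)
grid = np.linspace(0.1, 3.0, 30)
F0 = lambda t: 1.0 - np.exp(-t)                 # prior guess: exponential(1)
c = lambda t: np.full_like(t, 4.0)              # constant precision function
S = sample_beta_stacy_survival(grid, c, F0, 20000, rng)
# the beta means telescope: E[W_1 ... W_i] equals exp(-s_i) here
print(np.max(np.abs(S.mean(axis=0) - np.exp(-grid))))
```

Individual paths are highly variable where $F_0(s_i, \infty)$ is small, even though the prior mean tracks $F_0$; taking a larger constant for $c$ concentrates the prior more tightly around $F_0$, which is the usual precision role of $c$.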
We will now show that any beta-Stacy$(c, F_0)$ prior distribution on $\mathcal{F}$ is simply a reparameterization of a beta prior. For this we take a careful look at the following construction of beta-Stacy process priors given in Walker and Muliere [26]. To avoid notational complications, let us again assume that $F_0 \in \mathcal{F}$ is continuous.

Let $Q$ be a dense subset of $(0, \infty)$. As before, let $\{s_1, s_2, \ldots\}$ be an enumeration of $Q$ and, for each $n \ge 1$, let $s_1^{(n)} < \cdots < s_n^{(n)}$ denote an ordering of $s_1, \ldots, s_n$. Now, for $1 \le i \le n-1$, define
\[
W_i^{(n)} \sim \mathrm{beta}\Bigl( c(s_{i+1}^{(n)})\, F_0(s_{i+1}^{(n)}, \infty),\ c(s_{i+1}^{(n)})\, F_0(s_i^{(n)}, s_{i+1}^{(n)}] \Bigr). \tag{3.5.10}
\]
Define an independent increment process $H_n$ as
\[
H_n(t) = -\sum_{s_{i+1}^{(n)} \le t} \log W_i^{(n)}, \qquad t \ge 0,
\]
and denote its process measure by $\Pi'_n$. If $\Pi'$ is the process measure of the beta-Stacy process, then Walker and Muliere show that the finite dimensional distributions under $\Pi'_n$ converge to the finite dimensional distributions under $\Pi'$, in the following manner. Let $c_{n,i} = c(s_{i+1}^{(n)})$, $a_{n,i} = c_{n,i} F_0(s_{i+1}^{(n)}, \infty)$ and $b_{n,i} = c_{n,i} F_0(s_i^{(n)}, s_{i+1}^{(n)}]$. Fix $t > 0$ and observe that
\[
E\bigl[ e^{-\theta H_n(t)} \bigr]
= \prod_{s_{i+1}^{(n)} \le t} E\bigl[ (W_i^{(n)})^{\theta} \bigr]
= \prod_{s_{i+1}^{(n)} \le t} \frac{\Gamma(a_{n,i} + b_{n,i})\, \Gamma(a_{n,i} + \theta)}{\Gamma(a_{n,i})\, \Gamma(a_{n,i} + b_{n,i} + \theta)}.
\]
Therefore, for $0 = t_0 < t_1 < \cdots < t_m$,
\[
\log E \exp\Bigl\{ -\sum_{j=1}^m \theta_j \bigl( H_n(t_j) - H_n(t_{j-1}) \bigr) \Bigr\}
\;\to\; -\sum_{j=1}^m \int_{t_{j-1}}^{t_j} \int_0^\infty \bigl( 1 - e^{-\theta_j u} \bigr)\, \Lambda(ds\, du).
\]
The following result now explicitly provides the reparameterization that yields a beta-Stacy prior from a beta prior.

Theorem 3.6. $\Pi$ is a beta-Stacy$(c, F_0)$ prior if and only if $\Pi$ is a beta$(c', F_0)$ prior, where $c'(s) = c(s) F_0(s, \infty)$.

Proof. Let $\Pi_1$ denote a beta$(c', F_0)$ prior and $\Pi_2$ a beta-Stacy$(c, F_0)$ prior on $\mathcal{F}$. Since $c'(s) = c(s) F_0(s, \infty)$, it is easy to see that $V_i^{(n)}$ in equation (3.1.1) and $W_i^{(n)}$ in equation (3.5.10) have the same distribution for all $n \ge 1$ and all $1 \le i \le n-1$. Also, if $\bar{F}_n(t) = e^{-H_n(t)}$, then, for each fixed $t$,
\[
\prod_{s_{i+1}^{(n)} \le t} W_i^{(n)} \;\stackrel{d}{=}\; \prod_{s_{i+1}^{(n)} \le t} V_i^{(n)},
\]
where $\stackrel{d}{=}$ denotes equality in distribution.
Let $\{\Pi_n\}$ be the approximating sequence of time-discrete prior distributions converging weakly to one of these two priors. Then it converges to the other as well, and by the uniqueness of the limit we conclude that the two priors are the same. $\blacksquare$

Another way to see the above connection between the two prior processes is in terms of the Lévy measures of the corresponding independent increment processes. Let $\Lambda_{BS}$ denote the Lévy measure for the independent increment process corresponding to the beta-Stacy$(c, F_0)$ distribution, and let $\Lambda_\beta$ denote the Lévy measure for the beta$(c', A_0)$ process, where $A_0 = \phi_H(F_0)$. If $F$ is a random cdf such that $F$ follows a beta-Stacy$(c, F_0)$ distribution, then $H(t) = \phi_D(F)(t)$ is an independent increment process with Lévy measure
\[
\Lambda_{BS}(ds\, du) = \frac{c(s)\, e^{-u\, c(s) F_0(s, \infty)}}{1 - e^{-u}}\, dF_0(s)\, du. \tag{3.5.11}
\]
On the other hand, if $F$ follows a beta$(c', F_0)$ distribution, then $A(t) = \phi_H(F)(t)$ is an independent increment process with Lévy measure
\[
\Lambda_\beta(ds\, du) = c'(s)\, u^{-1} (1-u)^{c'(s)-1}\, dA_0(s)\, du. \tag{3.5.12}
\]
Making the transformation $g : x \mapsto 1 - e^{-x}$ in (3.5.11), and using the relationship between $c'$ and $c$, we get, for any Borel set $B$,
\[
\Lambda_{BS}\bigl( g^{-1}(B) \bigr)
= \int\!\!\int_{g^{-1}(B)} \frac{c(s)\, e^{-u\, c(s) F_0(s, \infty)}}{1 - e^{-u}}\, dF_0(s)\, du
= \int\!\!\int_B c(s)\, v^{-1} (1-v)^{c(s) F_0(s, \infty) - 1}\, dF_0(s)\, dv
\]
\[
= \int\!\!\int_B c(s)\, F_0(s, \infty)\, v^{-1} (1-v)^{c'(s)-1}\, dA_0(s)\, dv
= \int\!\!\int_B c'(s)\, v^{-1} (1-v)^{c'(s)-1}\, dA_0(s)\, dv
= \Lambda_\beta(B),
\]
where $v = 1 - e^{-u}$. That the two priors are the same now follows from Proposition 1.1.

Thus, if we consider a particular sample path of the beta$(c', F_0)$ process with jumps at the points $\{t_1, t_2, \ldots\}$ (note that such a path increases only in jumps), and replace the corresponding jump-sizes $\frac{F\{t_j\}}{F[t_j, \infty)}$ with the jump-amounts $-\log\bigl( 1 - \frac{F\{t_j\}}{F[t_j, \infty)} \bigr)$, then we obtain a sample path of the independent increment process corresponding to a beta-Stacy$(c, F_0)$ prior.

Bibliography

[1] Brown, M. (1971). Discrimination of Poisson processes. Ann. Math. Statist. 42, 773-776.

[2] Cox, D. R.; Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London.

[3] Connor, R. J.; Mosimann, J. E. (1969).
Concepts of independence of proportions with a generalization of the Dirichlet distribution. J. Amer. Statist. Assoc. 64, 194-206.

[4] Damien, P.; Laud, P. W.; Smith, A. F. M. (1996). Implementation of Bayesian non-parametric inference based on beta processes. Scand. J. Statist. 23, 27-36.

[5] Dey, J.; Draghici, L.; Ramamoorthi, R. V. (1999). Characterizations of tailfree and neutral to the right priors. MSU Tech. Report.

[6] Diaconis, P.; Freedman, D. (1986). On the consistency of Bayes estimates (with discussion). Ann. Statist. 14, 1-67.

[7] Diaconis, P.; Freedman, D. (1986). On inconsistent Bayes estimates of location. Ann. Statist. 14, 68-87.

[8] Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 2, 183-201.

[9] Dykstra, R. L.; Laud, P. (1981). A Bayesian nonparametric approach to reliability. Ann. Statist. 9, 356-367.

[10] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 615-629.

[11] Ferguson, T. S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 209-230.

[12] Ferguson, T. S.; Phadia, E. G. (1979). Bayesian nonparametric estimation based on censored data. Ann. Statist. 7, 163-186.

[13] Gill, R. D.; Johansen, S. (1990). A survey of product integration with a view toward application in survival analysis. Ann. Statist. 18, 1501-1555.

[14] Hjort, N. L. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18, 1259-1294.

[15] Ito, K. (1969). Stochastic Processes. Lecture Notes Series, 16.

[16] Kallenberg, O. (1997). Foundations of Modern Probability. Springer-Verlag, New York.

[17] Kaplan, E. L.; Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.

[18] Lévy, P. (1937). Théorie de l'Addition des Variables Aléatoires. Gauthier-Villars, Paris.

[19] Muliere, P.; Walker, S. A. (1997).
Bayesian nonparametric approach to determining a maximum tolerated dose. J. Statist. Plann. Inference 61, 339-353.

[20] Peterson, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. J. Amer. Statist. Assoc. 72, 854-858.

[21] Schervish, M. J. (1995). Theory of Statistics. Springer Series in Statistics, Springer-Verlag, New York.

[22] Susarla, V.; Van Ryzin, J. (1976). Nonparametric Bayesian estimation of survival curves from incomplete observations. J. Amer. Statist. Assoc. 71, 897-902.

[23] Tsai, W. (1986). Estimation of survival curves from dependent censorship models via a generalized self-consistent property with nonparametric Bayesian estimation application. Ann. Statist. 25, 1762-1780.

[24] Walker, S. A. (1998). Characterisation of Hjort's discrete time beta process. Statist. Probab. Lett. 37, 351-355.

[25] Walker, S. A.; Damien, P. (1998). A full Bayesian nonparametric analysis involving a neutral to the right process. Scand. J. Statist. 25, 669-680.

[26] Walker, S. A.; Muliere, P. (1997). Beta-Stacy processes and a generalization of the Polya-urn scheme. Ann. Statist. 25, 1762-1780.

[27] Walker, S. A.; Muliere, P. (1999). A characterisation of a neutral to the right prior via an extension of Johnson's sufficientness postulate. Ann. Statist. 27.