This is to certify that the dissertation entitled "Bayesian Bootstrap Credible Sets for Multidimensional Mean Functional," presented by Nidhan Choudhuri, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics. Major professor: Hira L. Koul. Date: June 15, 1998.

BAYESIAN BOOTSTRAP CREDIBLE SETS FOR MULTIDIMENSIONAL MEAN FUNCTIONAL

By

Nidhan Choudhuri

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Statistics and Probability
1998

ABSTRACT

BAYESIAN BOOTSTRAP CREDIBLE SETS FOR MULTIDIMENSIONAL MEAN FUNCTIONAL

By

Nidhan Choudhuri

Let X_1, . . . , X_n be i.i.d. observations from an unknown d-dimensional distribution function F with finite expectation. The aim is to obtain a Bayesian set estimate for the mean of F in a nonparametric setup. Results using several kinds of nonparametric priors can be found in the literature for this purpose. But quantifying the prior information in the form of a nonparametric prior distribution is not an easy task. Beyond this, the main difficulty is that one often does not have enough initial information to construct any prior. Hence there is a need for a non-informative nonparametric prior.

Rubin (1981) introduced the concept of the Bayesian bootstrap (BB) to express the posterior knowledge about F and its functionals in the absence of any prior information in a nonparametric setup. The justification for using the Bayesian bootstrap as the posterior under a non-informative prior can be found in Rubin (1981) and Gasparini (1995). Hence, in the absence of any prior knowledge, it is natural to use the (1 − p) central part of the BB distribution of the mean functional as a (1 − p) level posterior credible set.

This dissertation establishes the existence of a strongly unimodal Lebesgue density for the exact BB distribution of the multidimensional mean functional provided the convex hull of the data has nonempty interior. This result is then used to identify the posterior credible sets at different levels of coverage. A two-step procedure is then described for constructing a credible set. First a Monte Carlo procedure is used to simulate observations from the BB distribution. Then a histogram smoothing approach is adopted to approximate the posterior credible set. A theorem proves that for almost every simulation sequence, the simulation-based credible set converges to the exact BB credible set at the rate O(m^{-1/(d+2)} log m) with respect to the metric defined by the Lebesgue measure of the symmetric difference. Here m is the simulation size. The results are then extended to the case when the interior of the convex hull of the data is empty.

The shapes of the credible sets are also investigated. It is found that the shape of a BB credible set is completely determined by the data alone and reflects the presence of any skewness in the underlying distribution F. The influence of an outlier on these sets is discussed in detail. A theorem quantifies the extent of non-robustness by considering how much an outlier can deform a credible set.
The effect is proportional to the distance of the outlier from the data cloud and inversely proportional to the sample size n. In this outlier context, a comparison is made with the empirical likelihood ratio (ELR) confidence set (Owen: 1990) and a normal approximation set estimate. Another theorem shows that the effect of an outlier on an ELR confidence set is of the same type as on a BB credible set. But the constants of proportionality for the BB credible sets are found to be smaller than those of the ELR confidence sets at every level of coverage and for every dimension of the data. The dissertation ends with an argument showing that the Bayesian bootstrap can be viewed as the Bayesian counterpart of empirical likelihood in a general nonparametric setup.

To My Parents

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Professor Hira L. Koul, for his constant help, advice, encouragement, guidance, mentorship and extreme patience. His caring personality and friendly nature made the whole doctoral experience enjoyable.

I would also like to thank Professor Raoul LePage, Professor Dennis Gilliland and Professor W.T. Sledd for serving on my guidance committee, and Professor V. Mandrekar, Professor R.V. Ramamoorthi and Mr. Alex White for their encouragement, suggestions and many helpful conversations. I would also like to thank Professor A. Dasgupta for many informal discussions and suggestions.

I would also like to thank Mr. Aditya Vailaya, Mr. Visal Thakkar, Mr. Prasun Sinha and Mr. Samik Sengupta for helping me in writing the C program for the simulations and processing the simulated images.

I cannot thank my parents and my brother enough for the support and encouragement provided by them. This has been the main motivating force behind all my endeavors. I would also like to thank Professor P. Bhimasankaran and Professor D. Sengupta for the care and interest they showed in my progress as a student at the Indian Statistical Institute, and for motivating me toward research.

TABLE OF CONTENTS

LIST OF TABLES vii
LIST OF FIGURES viii
1 Introduction 1
2 Bayesian Bootstrap in the Nonparametric Bayesian Inference: A Historic Review 5
2.1 General Bayesian inference and credible set .................... 5
2.2 Non-parametric Bayes and Dirichlet Process Priors ................ 6
2.3 Non-informative priors and Bayesian bootstrap .................. 8
2.4 Other perspectives of the Bayesian bootstrap ................... 10
3 The BB distribution and credible sets for the mean functional 13
3.1 A normal approximation credible set ........................ 14
3.2 An exact BB credible set .............................. 16
3.3 Constructing the confidence region ......................... 20
3.4 The case of singular data .............................. 32
3.5 Extension to linear functionals ........................... 33
4 Connection with Empirical Likelihood 34
4.1 Empirical Likelihood Ratio Confidence Sets .................... 34
4.2 Comparison of BB credible sets with ELR Confidence sets in the presence of an outlier ....................................... 35
4.3 Proof of Theorem 3.1 ................................. 43
4.4 Proof of Theorem 3.2 ................................. 46
4.5 Connection Between EL and BB methodology ................... 49

LIST OF TABLES

4.1 The values of u_n in (3.7) along with u_0 for dimensions 2 and 3, and the values of (−log p) in (3.12) for four values of p ...................... 42
LIST OF FIGURES

3.1 Small sample BB credible sets of levels 80%, 95%, and 99% for the mean ..... 23
3.2 Large sample BB credible sets of levels 80%, 95%, and 99% for the mean ..... 24
4.1 The BB, ELR and Normal approximation credible sets of level 95% ........ 36
4.2 The three credible sets of level 95% for a normal data set with an outlier ..... 37
4.3 Diagram to identify the inflation effect and the shift effect of an outlier ...... 38

Chapter 1

Introduction

Let X_1, . . . , X_n be i.i.d. d-dimensional random variables having an arbitrary unknown distribution F_0 with finite expectation. Let ℱ denote the class of all distribution functions on the d-dimensional Euclidean space R^d and ℱ_1 denote the subclass of ℱ with finite expectation, i.e.,

    ℱ_1 = {F ∈ ℱ : ∫_{R^d} ||x|| dF(x) < ∞},

and let μ be the mean functional on ℱ_1 defined as

(0.1)    μ(F) = ∫_{R^d} x dF(x),    F ∈ ℱ_1.

The focus of this thesis is to construct a set estimate for μ(F_0).

The Bayes approach to this problem is to construct a prior probability on ℱ. One assumes F to be a random element of ℱ distributed according to this prior probability, F_0 to be a particular realization of F, and, given F, X_1, . . . , X_n to be i.i.d. F. Then one uses the posterior distribution of F and μ(F) to infer about F_0 and μ(F_0). A non-parametric prior often used in the literature is a Dirichlet process prior with a finite shape measure α (Ferguson: 1973).

But most often one does not have enough initial information about F to construct any kind of prior. Besides, quantifying the prior knowledge in the form of a prior is not an easy task. The need for a non-informative prior to represent vague initial information in non-parametric Bayesian statistics is thus well justified.

Rubin (1981) introduced the concept of the Bayesian bootstrap to express the posterior knowledge about F and its functionals in the absence of any prior information. Replacing the mass 1/n of the empirical distribution F_n by random weights, he defined a random distribution function on R^d as

(0.2)    D_n = Σ_{i=1}^n W_i δ_{X_i},

where the joint distribution of (W_1, . . . , W_n) is uniform on the simplex

(0.3)    Ω_n = {w ∈ R^n : Σ_{i=1}^n w_i = 1, w_i ≥ 0} ⊂ R^n

and is independent of the sample X_1, . . . , X_n. The Bayesian bootstrap (BB) distribution of any functional θ on ℱ is the conditional distribution of θ(D_n), given X_1, . . . , X_n.

Rubin (1981) argued that for a fixed finite sample, the BB distribution of F can be obtained as a weak limit of the posterior distributions under Dirichlet priors when the total mass of the shape measure α tends to zero, i.e. α(R^d) → 0. The results thus obtained are then comparable to standard frequentist results, as illustrated by the applications in Section 5 of Ferguson (1973). Gasparini (1995) proves that the posterior distribution of μ(F), for a large class of Dirichlet priors, converges weakly to μ(D_n) when α(R^d) → 0. These facts establish the role of the Bayesian bootstrap as a non-informative prior in nonparametric Bayesian statistics. Hence, in the absence of any prior knowledge, using the (1 − p) central part of the BB distribution of μ(F) as a posterior credible set, and in turn using it as a (1 − p) level Bayesian set estimate for μ(F_0), is natural.

The concept of using the BB distribution to produce credible sets has been used before. Example 1.1 of Lo (1987) uses the BB distribution to obtain a 95% probability band for a univariate distribution function F. For the multidimensional mean functional, the difficulty lies in selecting the central (1 − p) part of the BB distribution.
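Whatever notion of centrality is adopted, the basic computational ingredient is the ability to draw from the BB distribution of μ(F) in (0.2). The following is a minimal sketch of such a simulation (an illustration only, not part of the thesis): it uses the fact that a weight vector uniform on the simplex Ω_n has the Dirichlet(1, . . . , 1) distribution. NumPy is assumed, and the data array X is a hypothetical placeholder.

import numpy as np

def bb_mean_draws(X, m, rng=None):
    """Simulate m draws from the BB distribution of the mean functional.

    X : (n, d) array of observations X_1, ..., X_n.
    m : number of Monte Carlo draws.
    Each draw is mu(D_n) = sum_i W_i X_i with (W_1, ..., W_n)
    uniform on the simplex Omega_n, i.e. Dirichlet(1, ..., 1).
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    W = rng.dirichlet(np.ones(n), size=m)   # (m, n) weight vectors
    return W @ X                            # (m, d) BB means

# Hypothetical usage: 12 bivariate observations, 200,000 BB draws.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 2))
    draws = bb_mean_draws(X, m=200_000, rng=rng)
    print(draws.mean(axis=0), X.mean(axis=0))  # both close to the sample mean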
In the one-dimensional case the interval between the (p/2)th and the (1 − p/2)th quantiles represents the central (1 − p) part of the distribution. Hence this interval can be used as a credible set. But this quantile approach does not extend to higher dimensions due to the lack of a proper definition of quantiles. To identify the central part of a multivariate distribution, it is important to know the nature of the distribution.

This thesis establishes the existence of a strongly unimodal Lebesgue density for the exact BB distribution of the multidimensional mean functional under some mild conditions. This result is then used in the construction of credible sets. The construction procedure is then extended to the cases when the condition fails. The thesis then finds the influence of an outlier on BB credible sets and compares these credible sets with the empirical likelihood confidence sets (Owen: 1990) in this context. An argument is presented to show that the BB can be thought of as a Bayesian counterpart of the empirical likelihood.

The plan for the thesis is as follows. The general Bayesian procedure is presented in Section 1.1. Sections 1.2 and 1.3 establish the role of BB as a non-informative prior. A brief literature survey ends Section 1.4. Section 2.2 presents the main strong unimodality result and identifies the credible sets. A two-step procedure for constructing the credible sets is presented in Section 2.3. The strong unimodality result is then extended to more general cases in Sections 2.4 and 2.5. Section 3.2 compares BB credible sets with the empirical likelihood confidence sets, and the connection between the two procedures is presented in Section 3.5.

Chapter 2

Bayesian Bootstrap in the Nonparametric Bayesian Inference: A Historic Review

2.1 General Bayesian inference and credible set

In a statistical experiment, data is collected following a probability model with an unknown parameter θ lying in a parameter space Θ. A Bayesian would use a prior probability on Θ, representing his/her prior belief about the unknown parameter θ. Then the prior belief is updated using the data to obtain a posterior belief. For a Bayesian, the posterior distribution encapsulates all that is known about θ following the observations, and any inference about θ should be made by analyzing the posterior distribution.

Let X_1, . . . , X_n be i.i.d. d-dimensional random variables having unknown distribution F_0. Let B^d be the Borel σ-field in R^d. Consider a family of distribution functions {F_θ : θ ∈ Θ} on (R^d, B^d) which describes the probability model of the statistical experiment generating each of the X_i's, and let θ_0 be the true parameter value such that F_{θ_0} = F_0. A Bayesian approach here is to first obtain a σ-field 𝒜 on Θ such that the map θ → F_θ{B} is 𝒜-measurable for every B ∈ B^d. Then one constructs a prior probability π on (Θ, 𝒜) and assumes that the unknown parameter takes a value according to this prior probability and that, conditional on θ, X_1, . . . , X_n are i.i.d. with common distribution F_θ. So one can think of (θ_0, X_1, . . . , X_n) as a particular realization from the joint distribution of the parameter and the data, where the parameter value θ_0 is missing. The Bayesian idea is to use π_n(θ | X_1, . . . , X_n), the conditional distribution of θ given X_1, . . . , X_n, to infer about θ_0. This conditional distribution is known as the posterior distribution.

Let γ : Θ → R^k be an 𝒜-measurable map. The aim is to obtain a set estimate for γ(θ_0).
First one obtains the distribution on R^k induced by π_n under the map γ, known as the posterior distribution of γ(θ). Then one analyses this posterior distribution to obtain a credible set for γ(θ), as defined below.

DEFINITION 1.1. A credible set for γ(θ) of level 1 − p is a set C contained in the support of the posterior distribution of γ(θ) such that

(i) the posterior probability of C is 1 − p, and
(ii) C represents the central part of the posterior distribution.

Since a credible set represents the central probability concentration region of the posterior distribution, it is natural to use this set as a Bayesian set estimate for γ(θ_0). Note that the choice of C is not unique and depends on the definition of centrality in item (ii).

2.2 Non-parametric Bayes and Dirichlet Process Priors

In many statistical experiments, it is desirable to make fewer assumptions about the underlying population from which the data are obtained than are required for a parametric model. Non-parametric models are constructed to provide support for more eventualities than are supported by a parametric model. Often one assumes that F_0, the common distribution of the X_i's, can be any element of the set ℱ, the class of all distribution functions on R^d, or an element of a large subset of ℱ.

For a Bayesian, the first job is to construct a σ-field and a prior probability on ℱ. The natural σ-field, 𝒜, on ℱ is the smallest σ-field that makes the map F → F{B} measurable for every B ∈ B^d. Since R^d is a complete separable metric space, one can think of the weak convergence topology on ℱ. A sequence {F_r} in ℱ converges weakly to F ∈ ℱ if and only if ∫ g dF_r → ∫ g dF for all bounded continuous functions g on R^d. Under the weak convergence topology, ℱ becomes a complete separable metric space and the corresponding Borel σ-field is the same as 𝒜.

There are several classes of priors on (ℱ, 𝒜) in the literature. The Dirichlet process priors play a central role in non-parametric Bayesian analysis.

DEFINITION 1.2. (Ferguson 1973) Let α be a non-zero finite measure on (R^d, B^d). A probability D_α on (ℱ, 𝒜) is said to be a Dirichlet process with shape measure α if, for every finite measurable partition {B_1, . . . , B_k} of R^d, the random vector (F{B_1}, . . . , F{B_k}) has a Dirichlet distribution on [0, 1]^k with parameters (α(B_1), . . . , α(B_k)).

RESULT 1.1. (Existence and uniqueness) For every non-zero finite measure α on (R^d, B^d), there exists a unique Dirichlet process measure on (ℱ, 𝒜).

RESULT 1.2. (Posterior distribution) Let F be a random element in ℱ with a Dirichlet process prior with shape measure α and, given F, let X_1, . . . , X_n be i.i.d. F; then the posterior distribution of F is also a Dirichlet process with shape measure α + Σ_{i=1}^n δ_{X_i}.

Result 1.1 can be established in many ways. One proof using the Kolmogorov consistency result can be found in Ferguson (1973). The same paper contains a proof of Result 1.2.

2.3 Non-informative priors and Bayesian bootstrap

Non-informative priors are of some special interest in the Bayesian literature. In the parametric case, non-informative priors are obtained as a limit of a sequence of priors in which the prior information decreases along the sequence. In the nonparametric case, one can start with a sequence of Dirichlet priors.

Let F be a random element in ℱ with a prior distribution D_α. Let α* = α(R^d), the total mass of α, and ᾱ = α/α*, the normalized probability measure. Then for every B ∈ B^d, F{B} has a Beta distribution Beta(α(B), α* − α(B)).
Hence

    E F{B} = α(B)/α* = ᾱ(B),
    Var F{B} = ᾱ(B)(1 − ᾱ(B))/(α* + 1).

So ᾱ is the center of D_α in ℱ and can be thought of as a prior guess for F, while α* controls the variance of F. A large value of α* implies that D_α is concentrated near ᾱ, and a small value of α* implies that D_α is widely dispersed. With a D_α prior, the posterior expectation of F is

    E(F | X_1, . . . , X_n) = (α + n F_n)/(α* + n) = (α*/(α* + n)) ᾱ + (n/(α* + n)) F_n,

which is a weighted average of the prior guess and the empirical distribution. Hence α* can be thought of as an index of confidence in the prior probability, and by letting α* tend to 0 one can expect to obtain a non-informative prior.

In the parametric situation, by taking limits of a reasonable sequence of priors, one often ends up with an improper prior instead of a probability measure. Still, the procedure is useful in the sense that the corresponding sequence of posteriors often has a probability measure as its limit, which carries no influence of the prior guess and is completely determined by the data. Since ℱ, equipped with the weak convergence topology, is a complete separable metric space, one can introduce the notion of convergence in distribution of a sequence of prior and posterior probabilities. We need to see what happens to a sequence of posterior distributions under Dirichlet priors D_{α_r} when α*_r → 0. The following convergence result will be useful in this regard.

RESULT 1.3. (Sethuraman and Tiwari 1982) Let {α_r} be a sequence of finite measures on (R^d, B^d) such that

    sup_{B ∈ B^d} |α_r(B) − α_0(B)| → 0

for some non-zero finite measure α_0 on (R^d, B^d). Then D_{α_r} converges in distribution to D_{α_0} on ℱ.

RESULT 1.4. Let D_{α_r} be a sequence of Dirichlet priors on (ℱ, 𝒜) such that α*_r → 0. Then the sequence of posterior distributions of F converges in distribution to the BB distribution of F on ℱ for any data set X_1, . . . , X_n.

PROOF: Under the D_{α_r} prior, the posterior is D_{α_r + nF_n}. For every B ∈ B^d,

    |(α_r + nF_n)(B) − (nF_n)(B)| = α_r(B) ≤ α*_r.

This implies that sup_B |(α_r + nF_n)(B) − (nF_n)(B)| → 0 as α*_r → 0. Hence by Result 1.3, the posterior distribution of F converges to D_{nF_n}. Note that D_{nF_n} is the BB distribution of F, and this completes the proof. □

The above result proves the claim in Rubin (1981) that the BB distribution of F can be obtained as a weak limit of the posteriors under Dirichlet process priors when the faith in the prior decreases to zero, and hence the BB distribution can be thought of as a non-informative posterior in a nonparametric setup.

Since our objective is to infer about the mean functional, it would be interesting to know what happens to the posterior distributions of μ(F) in the above case. Since the mean functional is not continuous on ℱ with respect to the weak convergence topology, the limiting behavior of μ(F) cannot be obtained from Result 1.4. Gasparini (1995) has a result in this regard.

RESULT 1.5. Let α_r = α*_r ξ_1 be a sequence of non-zero finite measures on (R^d, B^d) such that ξ_1 is a probability measure with

    ∫_{R^d} ||x||² dξ_1(x) < ∞

and α*_r → 0. Then the posterior distribution of μ(F) under the D_{α_r} priors converges in distribution to the BB distribution of μ(F) on (R^d, B^d).

Results 1.4 and 1.5 establish the role of the Bayesian bootstrap as a non-informative prior in a nonparametric setup.

2.4 Other perspectives of the Bayesian bootstrap

The asymptotic equivalence of the BB distribution and the posterior distribution under a Dirichlet prior with non-zero α has been noticed earlier.
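Before reviewing those earlier observations, here is a small numerical sketch of the finite-dimensional content of Result 1.4 (an illustration only, not part of the thesis): for any fixed partition {B_1, . . . , B_k} with cell counts n_j, the posterior of (F{B_1}, . . . , F{B_k}) under a D_α prior is Dirichlet(α(B_1) + n_1, . . . , α(B_k) + n_k), which approaches the Dirichlet(n_1, . . . , n_k) law that the BB assigns to the same cell probabilities as α* → 0. NumPy is assumed, and the data, partition and seed are hypothetical choices.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-dimensional data and a three-cell partition of R.
# Each cell is assumed to be non-empty for this sample.
X = rng.normal(size=30)
edges = [-np.inf, -0.5, 0.5, np.inf]
counts = np.array([np.sum((X > a) & (X <= b))
                   for a, b in zip(edges[:-1], edges[1:])])

def posterior_cell_draws(alpha_star, m=100_000):
    # Prior shape measure: alpha_star spread uniformly over the three cells.
    prior_cells = alpha_star * np.ones(3) / 3.0
    return rng.dirichlet(prior_cells + counts, size=m)

bb = rng.dirichlet(counts, size=100_000)   # BB law of the cell probabilities
for a_star in (10.0, 1.0, 0.01):
    post = posterior_cell_draws(a_star)
    # Cell-wise posterior means approach the BB (empirical) proportions as alpha* shrinks.
    print(a_star, post.mean(axis=0).round(3), bb.mean(axis=0).round(3))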
Lo (1987) showed that in the one dimension case, the posterior distribution Of F for a Dirichlet prior and the BB distribution, condi- tional on the data, are first order asymptotically equivalent in the sense that for almost all sample sequences and subject to prOper centering and n1/2 scaling, they achieve the same 11 limiting conditional distribution. Weng (1989) pointed out that for the one dimensional mean functional, the two distributions are equivalent up to a second order asymptotic if / llwll3da(:v) < oo, Rd and F0 finite has third moment. Thus one can approximate the posterior distribution under a Dirichlet prior by the BB distribution. This approximation becomes useful as it is very easy to Simulate from BB distribution. The Operational and structural similarities between BB and bootstrap of Efron (1979) are mentioned in Rubin (1981) and Efron (1982). Rubin has shown that the ordinary boot- strap is the same as BB except the very fact that the weights (W1, . . . , W") are continuous in BB, whereas they are replaced by some discrete weights in ordinary bootstrap. Rubin (1981) gives an example in which the histogram of 1000 BB correlation coefficients is Similar to, but smoother than, a histogram of 1000 ordinary bootstrap correlation coefficients. LO (1987) proved the first order asymptotic equivalence of the two procedures for a variety of functionals including the mean functional and the identity functional. Similar results for finite population case were obtained in L0 (1988). This gives a frequentist perspective Of BB method. In the case Ofa finite support set {d1, - -- ,dk} for F, a vector 9 with 61- = F{dj}, F{dj} being the probability of Singleton set {(1,}, uniquely identifies F. Hence the space of all prob- ability measure on {d1, - . ~ , dk} can be parameterized by the k-variate unit simplex. Now a prior on 0 with density proportional to H 0;” leads to the posterior density proportional to I] 627+”, where nj’s are the number of observations equal to dj’s. A non-informative prior (improper) with all pj = —1 leads to the fact that 0,- = 0 with posterior (improper) proba- 12 bility 1 for any unobserved d,- and the posterior distribution becomes the BB distribution. An important fact is that one does not need to know the value Of unobserved d,’s, as pointed out in Owen (1990). This gives a justification behind using BB as an non-informative prior in finite support case. Owen (1990) introduced the concept of empirical likelihood as a nonparametric gen- eralization of the well studied parametric likelihood and used this concept to construct confidence sets and test statistics for several nonparametric functionals. He Observed that in the finite support case the empirical likelihood is proportional to the BB density. He thus argued in favOr Of connecting empirical likelihood with the posterior under a non-informative prior as in the parametric case. This argument is extended for the general nonparametric set up in the Section 3.5 Of this Thesis. Chapter 3 The BB distribution and credible sets for the mean functional The role of Bayesian bootstrap in the nonparametric inference may inspire one to use the BB distribution of p(F) for constructing a set estimate for p(Fo) by means of a BB credible set. As we have seen in the introduction, there are some difficulties in choosing the central probability concentration region of a multivariate distribution. Besides, one is also concerned about the shape Of the region. 
Since one is using this credible set as a set estimate of the unknown mean, a connected region is preferred to a union of disjoint sets. Since the mean is a convex combination of the points in the support of a distribution function, convexity of the set estimate is desired. Beyond the shape, the size of the credible set should be as small as possible (in terms of Lebesgue measure) to make the estimate precise. Along with these, one needs to remember that a credible set should have the required (1 − p) posterior coverage probability. These are the issues to keep in mind while constructing the Bayesian bootstrap credible set.

3.1 A normal approximation credible set

Let X denote the sample sequence {X_1, X_2, . . . }, F_0^∞ denote the infinite product measure on (R^d)^∞, X̄_n = μ(F_n) denote the sample mean, and A_{n,X} denote the BB distribution of the mean functional. The aim is to find a central high probability concentration set of A_{n,X}. When

(2.1)    ∫_{R^d} ||x||² dF_0(x) < ∞,

a normal approximation of the BB distribution may be useful for this purpose.

THEOREM 2.1. If (2.1) holds then for almost every sample sequence X,

(2.2)    √n (μ(D_n) − X̄_n) | X ⇒ N_d(0, Σ),

where Σ is the dispersion matrix of F_0.

PROOF: The proof is just a multidimensional extension of its one-dimensional version in Lo (1987), Theorem 4.1, which says that if ∫ x² dF_0 < ∞, then for almost every sample sequence X,

(2.3)    √n (μ(D_n) − X̄_n) | X ⇒ N(0, σ²),

where σ² is the variance of F_0. Now for every l ∈ R^d, by (2.1),

    ∫_{R^d} (lᵀx)² dF_0(x) ≤ ||l||² ∫_{R^d} ||x||² dF_0(x) < ∞.

So by (2.3), for almost every sample sequence X,

    √n (lᵀ{μ(D_n) − X̄_n}) | X = √n {Σ_{i=1}^n W_i (lᵀX_i) − lᵀX̄_n} | X ⇒ N(0, σ_l²),

where σ_l² = Var(lᵀX_1) = lᵀΣl. Hence by the Cramér-Wold device, the proof is complete. □

If Σ is of full rank, then one can substitute the sample dispersion matrix S_n for Σ in the limiting normal distribution and obtain

    A_p = {x : n (x − X̄_n)ᵀ S_n^{-1} (x − X̄_n) ≤ χ²_{d,1−p}}

as an approximate central high probability concentration region of A_{n,X}. This set is the smallest set (in terms of Lebesgue measure) with limiting coverage probability (1 − p) and also has the desired convex shape. Hence A_p can be used as a credible set of level (1 − p) for μ(F).

But there are some limitations of this asymptotic procedure. First, the convergence in (2.2) is only first order, as indicated by Weng (1989). So A_p does not reflect any higher order moment structure of A_{n,X}, such as skewness. Besides, A_p is always elliptical in shape. If the data show a clear skewness, it may be difficult to have faith in an elliptical set estimate for the mean. A data-determined shape is thus more desirable. Note that A_p is the same as the frequentist confidence set obtained from Hotelling's T²-distribution up to a scale factor, i.e. the cut-off point χ²_{d,1−p} is replaced by a multiple of some quantile of an F-distribution. If the posterior distribution under a non-informative prior is the prime object, then it is important to find the central part of the exact BB distribution.

3.2 An exact BB credible set

A probability on R^d is said to be strongly unimodal if it has a Lebesgue density g such that every high density contour {x ∈ R^d : g(x) ≥ c} is a convex set. The existence of the density implies that any high density contour is the smallest set (in terms of Lebesgue measure) among all sets with the same probability. Strong unimodality implies that such high density contours are convex and surrounded by a low probability region.
Hence a high density contour in some sense represent the central high probability concentration region and can be used as a credible set. If we can prove the strong unimodality of An,x, than a high density contour can be used as a BB credible set for the mean. DEFINITION 2.1. [Prékopa (1973), eqno 1.1] A nonnegative function f on IR" is said to be logconcave if for every 17, y E IRd,t E [0, 1], (2.4) f(trv + (1 — t)y) 2 [f(a:)l‘ [Hz/)1“- PROPOSITION 2.1. A probability with logconcave Lebesgue density is strongly unimodal. PROOF: Fix an c > 0. TO prove strong unimodality, we have to show that {x : f (x) Z c} is a convex set. Let 23,3; 6 {2: : f(:c) Z c}. Then f(:r) 2 c and fly) 2 c. So for any t 6 [0,1], by (2-4), f(t:r: +(1—t)y) Z ctcl‘t = c. Hence ta: + (1 — t)y E {x : f(:r) Z c} which implies that {z : f(:r) Z c} is convex. This completes the proof. [:1 17 THEOREM 2.2. If the convex hull of X1,...,X,, has a nonempty interior in IRd, then Amx is absolutely continuous with respect to the Lebesgue measure on Rd and the corresponding density is logconcave. A probability distribution F on Rd is said to be non-singular if F{H} < 1 for every hyperplane H. Note that the convex hull of X1, . . . ,Xn has nonempty interior if and only if all Xi’s are not confined in a hyperplane i.e. E, is non-singular. If F0 iS non-singular then for almost every sample sequence X, there is an N, depending on the sample sequence X, such that Fn is non-singular for n > N. Hence An,X is eventually strongly unimodal. Moreover if F0{H} = 0 for any hyperplane H, then with F6” probability one, Fn is non- singular for every n _>_ (d + 1). Hence the condition of Theorem 2.2 is satisfied in most Of the cases and a BB credible set could be Obtained through high density contours. The rest Of section establishes the proof of Theorem 2.2. An affine in Rd is a subset M Of le such that for every 11:, y E M and —00 < t < 00 we have ta: + (1 — t)y E M i.e. the entire line passing through a: and y are in M. If 0 E S, then M is called a subspace. Affines are sometimes known as lower dimensional planes. Define the affine hull of a set A in IRd as (2.5) ’H(A) = {tx+(1—t)y::r,yEA,—oo 0 around a point x E IRd as R(x,h) = {y E IRd : IyU) — xml S h/2, V l = 1, - -- ,d}. Now we will partition the region ’R into small hyper cubes. Fix an h > 0. For I = 1,--- ,d, let S1={(i+1/2)h: i = 0,--- ,[(b, —-a1)/h]} C IR, where [c] denotes the largest integer less than or equal to c. Define the cells on IR“ as 72,, = [[11 S;, the Cartesian product of 51’s. Then the hyper cubes {R(x, h) : x E Rh} cover the set ’R and are disjoint except at the boundaries. For each x E IRd define (2.7) r(x) = :02,- E R(x,h)} = # of 59- belonging to R(x, h). j=l STEP 2. For the data X1, . . . ,Xn, obtain the set R. Choose h = m‘l/(d+2l and obtain the cells Rh. Calculate r(x) for each x E R}, and order the cells according the descending order of T(x). Let {x1, - -- ,xk} denote the ordered cells, where k is the number of points in R3. Find the integer [to such that ko—l ’50 Z 'r(xj) < (1—p)m and Zflxj) _>_ (1—p)m. 1 1 This can be done by adding T(£L‘j) ’3 one at a time until we reach (1 —p)m. Now use the set k0 1 as an approximation to C33. 23 - I. (TX X9) . q— -1 .- o O O (a) Bivariate Normal data 0 ’ 6 0 ti:- o o O .. 2 . 0 D 2 o . 6 (b) Bivariate Gamma data Figure 3.1: Small sample BB credible sets of levels 80%, 95%, and 99% for the mean. 24 U ..- 2 O .9 no 0 O o o .. O o O O O T am, . 
-2 ° 9 o 0 D 2 O O - O o O o P o D -2 d)- (a) Bivariate Normal data db 6 O (b) Bivariate Gamma data Figure 3.2: Large sample BB credible sets of levels 80%, 95%, and 99% for the mean. 25 Some simulation results are presented in Figure 1 using the simulation Size m = 200, 000 for each case. Figure 2.1(a) shows the BB credible sets with confidence level 80%, 95% and 99% for the twelve Observation from a bivariate normal distribution with mean 0 variance 1 for each component and the correlation coeflicient —0.5. The credible sets are almost elliptical in shape as expected with data coming from an elliptically symmetric distribution. Figure 2.1(b) shows these credible sets based on twelve Observation form a skewed bivariate distribution with density f (u,v) = uve‘("+”), u,v > 0, i.e bivariate Gamma. Note that these credible sets are able to reflect the skewness Of the underlying distribution in their shape. The moderate sample BB credible sets taking n = 40 from these two distribution are presented in Figure 2.2(a) and 2.2(b). These credible sets for Gamma observations in Figure 2.2(b) are almost elliptical in shape with very little skewness. This is expected as the standardized BB distribution is asymptotically normal. Commenting on the computational aspect, Step 1 takes time proportional to the sim- ulation Size m. The number of cells k = Hilfibl — a¢)m‘1/(d+2l + 1] 2 cmd/(d+2l. Hence the calculation Of r(x) for x E 72,, will take time proportional to km which is of the order smaller than 0(m2). Ordering of cells will take time proportional to k2 which is also of the order smaller than o(m2). Hence the magnitude Of time taken for the entire procedure will be of the order smaller than 0(m2). This allows one to perform the simulation with large m and making the approximation more accurate. Another advantage here is that the computational time depends mainly on m and remains almost unchanged with a change in dimension d or in sample size n. But the convergence rate Of Cm decreases with an increase in the dimension and one needs to use a larger m to achieve the same resolution. To measure the performance of Cm in approximating C33, one needs tO define a measure 26 of proximity between sets. Define a metric d1 on the subsets Of le as (11 (31,32) = Leb(B1ABg), 81,82 C Rd, where A defines the symmetric difference Of sets and ’Leb’ denote the Lebesgue measure on IRd. Then we have the following convergence result on Cm. THEOREM 2.3 If the convex hull of any n — 1 data pints has nonempty interior, then for a.e. simulation sequence, a. (cmcgs) = 0(m‘1/<°'+2>1n(m)). The rest Of this section is dedicated to the proof of Theorem 2.3. Throughout this proof, the data X1, . . . ,Xn is fixed and the randomness is coming from the randomness in the simulation. Recall that the hyper cubes {R(x, h) : x E R3} cover 72 and are disjoint except at the boundaries. Then the function gm on IRd defined as 9m($) = ("thall—l Z 1R(x,,h)($)7'($i) is a histogram smoothing density estimate Of g and the set Cm is the same as {x : gm(x) 2 Am} with Am = (mhd)‘lr(xko). Hence Cm is a high density contour of gm. First we shall Show that gm -—> g and Am —> A a.e, where /\ is defined in (2.6). We shall identify the BB density 9 with a multivariate B-spline to obtain some smooth- ness result. Here is a probabilistic definition of B-spline, taken from Section 2 of Karlin, Micchelli and Rinott (1986). DEFINITION 2.3. Let X1, . . . ,Xn be n points in IR“! such that their convex hull has 27 a non-empty interior. 
Then there is a function Mn(-|X1,...,X,,) : IRd ——> [0,00) with X1, . . . , Xn as parameters, such that for every bounded continuous function h : IRd —-> IR, (2.8) (n -1)l/ h(w1X1+---+ wan)dw =f h(x)M,,(x|X1, . . . ,Xn)dx. 9,, Rd Here (2,, = {w E IR" : 2 w,- = 1,w,~ Z 0} as defined in (0.3) and the LHS integration is with respect the n — 1 dimensional Lebesgue measure on (In. The function Mn(-|X1, . . . , X") is called a B-spline function with knots X1, . . . , Xn. One can see that the LHS Of (2.8) is IE[h(Z)|X1, . . . ,Xn], where Z = W1X1+- - -+W,,X,, and (W1, . . . , W") are uniform on (In. Thus the RHS of (2.8) implies that, Mn(-|X1, . . . , Xn) is a version of the Lebesgue density Of the distribution Of Z on IR". Note that Z is nothing but a BB mean and hence Mn(-|X1, . . . , X") is also a version of the BB density 9. Thus one can use standard B—splines results on 9. By Corollary 3 Of Micchelli (1980) and its extension in the adjacent paragraph, we deduce the following result. RESULT 2.1. If the convex hull of any n — 1 data points X1,...,X,, has nonempty interior, then g has a continuous derivative on IRd. The derivative vector Vg is bounded above and is also non-zero in the interior of the support of g except at the mode of g. LEMMA 2.5. Let A be as in (2.6). Then there exist constants b,- > 0, i = 1,2,3 and 60 > 0, possibly depending on A, such that for all 6 S 60, we have (i) Leb{x: |9($) — Al S 5} S b15, (2-9) (ii) Leb{x : 0 < A — g(x) g 6} > b26, (iii) Leb{x: 0 < g(x) — A S 6} Z b36. PROOF: First we shall prove (2.9.ii) and (2.9.iii) using the fact that H V g“ is bounded 28 above (Result 2.1). Let A = {y : g(y) = A} and k = supat II V g(x)”. Then for any x,y, |9(y) - g(x)! S III! - atllk- Thus, a; IA—gun 56} :2 Us: = le—yll s 6/k}, yEA and {x2 0 0. Hence for small 6, there exists a b2 such that Leb[Uy€A{x : ||x — y” g 6/k} (I {x : g(x) — A < 0}] Z b26. This completes the proof of (2.9.ii). The proof of (2.9.iii) is similar. TO prove (2.9.i), we shall use the fact that vg is nonzero. Continuity of g implies that A is a closed set and A > 0 implies that A is in the interior of the support of g, which is a bounded set. Hence A is a compact set. Since A does not contain the mode of g, by Result 2.1, H v g(y)“ is continuous and positive on A. SO the compactness of A implies that 29 II V g(y)“ is bounded away from zero on A. Let infyeA II V g(y)“ 2 k1. First we shall Show, by contradiction, that {23: |g(a=) - Al S 5} E U {x = Ila: - 31“ S 36/k1}. yEA Suppose this is not true. Then there exists an x0 such that [g(xo) --Al 3 6 but leo -y|| > 36/k1 for all y E A. Then either g(xo) < A or g(xo) > A. Assume g(xo) < A. Let yo be the closest point on A such that the line joining yo and x0 is perpendicular to the tangential plane to A passing through yo. Then the vector vg(y0) has the same direction as the vector (yo-$0) Hence (V9(yo))T(yo-$o) = IIV9(yolll ° ”yo-$0” Z killyo-ftoll- The contimfity 0f vg implies that, when x is in a neighborhood Of yo, (Vg($))T(i/0 - x0) > (kl/2)||y0 — xoll > (3/2)5. By the fundamental theorem Of calculus, 1 g(yo) — g(xo) = [0 gm. + <1 — t)zo)dt l = A (VQUyo + (1 — t)SiiollTQJo — xoldt > [1(3/2)6dt 0 >6. Hence a contradiction. The case g(xo) > A is handled similarly. Note that UyeA{x : ||x—y|| S 36/k1} is a thin region Of width 36/191 around A. Convexity of {x : g(x) — A Z 0} implies that the Lebesgue measure of UyEA{x : ”x — y“ S 36/k1} divided by 6 has a positive limit as 6 ——) 0. 
Hence for small 6, there exists a b1 such that Leb[Uy€A{x 2 “III -- y“ S 36/k1}] S 016. 30 The proof Of (2.9.i) follows from the fact that {an lg(x) - AI 3 6} g uyeAIx : ”a: — gm 3 we}. :1 LEMMA 2.6. Let 7m = supx |gm(x) — g(x)]. Then for a.e. simulation sequence (2.9) 7m 2 0(m-1/(d+2)ln(m)). PROOF: Since g has bounded support, continuous bounded derivative, the result follows as a multidimensional extension of Theorem 3 in Révész (1972). El LEMMA 2.7. Am — A = 0(7m) a.e. PROOF: Because fgI(g 2 A) = l —p = fng(gm 2 Am), we obtain MIN!) 2 A.gm < Am) —I(g < x,y,, 2 Am)} = f9{1(9 Z A) -1(gm 2 Am)} = f91(9 Z b)“f9m1(9m 2 Am) +f(gm —g)I(9m 2 Am) = f(gm _ g)I(gm 2 Am)- (2.11) Suppose (Am — A) /7m is not bounded from the above. Then there is a subsequence such that (Am — A) /7m > 1 through that subsequence. By definition of 7",, (2:12) {.9 < )‘agm 2 Am} C {Am _7m < 9 _<. A}- Since Am - 7", > A through that subsequence, the event in the RHS Of (2.12) is a null set. Thus by (2.11), through that subsequence, (2.13) f My 2 x,y,. < A...) = fun. — My... 2 A...) s Lebuz) 7..., 31 as {gm 2 Am} C R. Again, [91(92 A.gm 0 such that lAm — A] 3 K7,". Hence by (2.9.i), Leb(CmAC33) g b(K + 1)7m. Hence the proof is complete by Lemma 2.6. C] 32 3.4 The case of Singular data One can also proceed even if the data does not have a non-empty interior. In this case, all the X,’s are confined in an affine of ROI. Let ’Ho be the afline hull Of the data set {X 1, . . . , Xn} and s be the dimension of H0. Then we have the following theorem. THEOREM 2.4 If 0 < s < d, then An, X is absolutely continuous with respect to the s-dimensional Lebesgue measure restricted to ’Ho and there is a logconcave version of the corresponding density. PROOF: Let A, denote the Lebesgue measure on IR’ and A, denote the s—dimensional Lebesgue measure restricted on 7-10. Then there exists a bijective affine map IL : HQ ——> IRS such that A,IL‘1 = A,. (A,IL‘l denotes the induced measure Of A, on IR”.) Let Y,- = IL(X,-). Linearity Of IL implies that the affine hull of {Y1,- -- , Yn} is the image of the afline hull Of X1, . . . , Xn under IL, i.e. the entire IR3. Hence the convex hull of {Y1,- - - , Yn} has non—empty interior in R3. Note that AleL—1 is the same as the BB distribution on IR3 obtained by the data {Y1, - -- ,Yn}. Hence by Theorem 2.2, Amle-l has logconcave Lebesgue density g, on 1R3. Since IL is one to one, Amle—1 << A, = A,IL‘1 implies An,X << A, and g(x) = g(lL(x)) dAmx 8 defines a version of . Affine property of IL and logconcave property of 9, implies g is logconcave on 7-10. El Note that the map IL is not unique and g, depends on IL. But g, 0 IL is a version of gig-fill and hence is independent of the choice Of IL. Since IL is one to. one and affine, the inverse image Of a high density contour of 9, will be a high density contour Of g in ’Ho. The high density contours of 9, may depend on IL but their inverse images in ’Ho under the map IL will not depend on IL, as they are the high density contours Of g. Hence these high density contours Of g can be used as BB credible sets in ’Ho. To apply this result, one needs to find an IL. AS the affine hull of data has dimension 3, 33 the rank of the sample dispersion matrix 5,, is s and S, has exactly 3 non-zero eigen-values. Take a spectral decomposition of Sn, and let e1, - . - ,e, be the eigen vectors corresponding to the non-zero eigen-values. Then a candidate for IL is IL = [e1,--- ,e,]T(x — X). In this case the inverse function IL‘1 : IRS —> H0 is of the form IL‘l (y) = 2: ye, + X. 
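As a concrete illustration of the map L just constructed (a sketch under stated assumptions, not code from the thesis; NumPy is assumed and the eigenvalue tolerance tol is a hypothetical choice), the following builds L and L^{-1} from a spectral decomposition of the sample dispersion matrix and maps singular data into R^s, where the non-singular construction of the earlier sections can then be applied.

import numpy as np

def affine_reduction(X, tol=1e-10):
    """Build L and L^{-1} for data whose affine hull H_0 has dimension s < d.

    X : (n, d) array. Returns (to_s, back, s), where to_s maps R^d -> R^s
    and back maps R^s back onto the affine hull H_0 of the data.
    """
    Xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # sample dispersion matrix (n-1 divisor; rank is what matters)
    vals, vecs = np.linalg.eigh(S)       # spectral decomposition of S
    E = vecs[:, vals > tol]              # eigenvectors e_1, ..., e_s with non-zero eigenvalues

    def to_s(x):
        # L(x) = [e_1, ..., e_s]^T (x - Xbar)
        return (np.atleast_2d(x) - Xbar) @ E

    def back(y):
        # L^{-1}(y) = sum_l y_l e_l + Xbar
        return np.atleast_2d(y) @ E.T + Xbar

    return to_s, back, E.shape[1]

# Hypothetical usage: 3-dimensional observations lying on a 2-dimensional plane.
rng = np.random.default_rng(2)
Y = rng.normal(size=(15, 2))
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])   # embeds R^2 as a plane in R^3
X = Y @ A.T + np.array([5.0, 0.0, 1.0])
to_s, back, s = affine_reduction(X)
Z = to_s(X)            # data expressed in R^s; the BB credible set is built here
print(s, np.allclose(back(Z), X))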
3.5 Extension to linear functionals

The above logconcavity result can be extended to the BB distribution of a linear functional. Let φ : R^d → R^k be a Borel measurable function and μ_φ be the linear functional on ℱ defined as

    μ_φ(F) = ∫_{R^d} φ(x) dF(x).

The BB distribution of μ_φ(F) is then the BB distribution of the mean functional based on the transformed data φ(X_1), . . . , φ(X_n), so Theorem 2.2 applies whenever the convex hull of the transformed data has nonempty interior in R^k; the singular case will be taken care of by Theorem 2.4.

Chapter 4

Connection with Empirical Likelihood

4.1 Empirical Likelihood Ratio Confidence Sets

For i.i.d. data X_1, . . . , X_n, Owen (1990) defines the empirical likelihood of a distribution function F ∈ ℱ as

(3.1)    L(F) = ∏_{i=1}^n F{X_i},

where F{x} denotes the probability of the singleton set {x} under F. This likelihood function is maximized at the empirical distribution function F_n, the well known nonparametric MLE of F_0. In some cases, the empirical likelihood ratio function

(3.2)    R(F) = L(F)/L(F_n) = n^n ∏_{i=1}^n F{X_i}

can be used to construct confidence sets and test statistics for a functional θ on ℱ. Consider sets of the form {θ(F) : F << F_n, R(F) ≥ r} for 0 < r < 1. Owen (1990) gives conditions on θ and F_0 under which these sets can be used as confidence sets for θ(F_0). For θ = μ, the mean functional, define

(3.3)    C_{EL,r} = {μ(F) : F << F_n, R(F) ≥ r}.

Then we have the following result by Owen [(1990), Theorem 1].

RESULT 3.1. If F_0 has finite second moment, i.e. (2.1) holds, and the dispersion matrix Σ is of rank s > 0, then for every 0 < r < 1, C_{EL,r} is a convex set and

    lim_{n→∞} P_{F_0}(C_{EL,r} ∋ μ(F_0)) = P(χ²_s ≤ −2 log r).

If one chooses r = exp{−½ χ²_{s,1−p}}, then C_{EL,r} serves as a confidence set for μ(F_0) with (frequentist) asymptotic coverage 1 − p. Theorem 1 of Owen (1990) also contains some results related to the O(n^{-1/2}) rate of convergence of the above limit. DiCiccio, Hall and Romano (1991) have shown that the rate is O(n^{-1}) if the assumptions justifying Edgeworth expansions are met, and that the Bartlett factor improves the rate to O(n^{-2}). Results related to some other functionals can be found in Owen (1990).

4.2 Comparison of BB credible sets with ELR Confidence sets in the presence of an outlier

One advantage of both the BB and EL methods for constructing set estimates is that the shapes of these sets are completely determined by the data.

Figure 4.1: The BB, ELR and Normal approximation credible sets of level 95%. (a) Bivariate Normal data; (b) Bivariate Gamma data.

Figure 4.2: The three credible sets of level 95% for a normal data set with an outlier. (a) Data with small outlier; (b) Data with large outlier.

Figure 4.3: Diagram to identify the inflation effect and the shift effect of an outlier. (b) Measuring shift effect.

These sets are also able to incorporate the skewness of the data in their shapes and hence in turn capture the skewness of the underlying distribution. Figure 3.1(a) shows 95% confidence sets using the BB, ELR and normal approximation methods based on the twelve normal observations used in Section 2.3. Figure 3.1(b) shows these three sets based on the twelve observations from the skewed distribution used in Section 2.3. For the normal data, all three sets behave similarly, whereas in the skewed distribution case the BB credible set and the ELR confidence set are able to reflect the skewness in the underlying distribution while the normal approximation method fails to do so.

But a problem with the BB and ELR methods is that both regions are sensitive to outliers. This can be seen from Figures 3.2(a) and 3.2(b). An outlier (not random) is added to the twelve normal observations used earlier. Figures 3.2(a) and 3.2(b) show 95% confidence sets using the three methods based on all thirteen observations for two different values of the outlier. One can see that the outlier has deformed all three regions and inflated them towards itself. But the extent of inflation of the BB credible set is less than that of the ELR confidence set, while the normal approximation method is least affected.

A quantitative study of the extent of the sensitivity of the BB and ELR confidence sets is done here. We will define two measures of non-robustness by considering how much an outlier can deform a set estimate. Let X_1, . . . , X_{n−1} be the first n − 1 observations and X̄_{n−1} denote their average. Let the n-th observation X_n be such that ||X_n − X̄_{n−1}|| is large compared to

(3.4)    η = sup{||X_i − X̄_{n−1}|| : 1 ≤ i ≤ (n − 1)}.

Then call X_1, . . . , X_{n−1} the data cloud and X_n an outlier. A diagram in two dimensions
Figure 3.2(a) and 3.2(b) shows 95% confidence sets using the three methods based on all the thirteen observations for two difl'erent values of the outlier. One can see that the outlier has deformed all the three regions and inflated them towards itself. But the extent Of inflation in BB credible set is less than that of the ELR confidence set, while the normal approximation method is least effected. A quantitative study of the extent of the sensitivity of BB and ELR confidence sets is done here. We will define two measures of non-robustness by considering how much an outlier can deform a set estimate. Let X1, . . . ,Xn_1 be the first n — 1 observations and Xn_1 denote their average. Let the nth observation Xn be such that ”X” — Xn_1|| is large compared to (3.4) n = sup{||X.- - 22.-.”: 1: 2' s (n —1)}. Then call X1, . . . ,Xn_1 as the data cloud and X n as an outlier. A diagram in two dimensions 40 (d = 2) is presented in Figure 3.3. Let C be an arbitrary set estimate for )1 based on all observations including the outlier. To measure the inflation of C, introduce the quantity (3.5) U = sup{llz — X.-.“ = x e C}, which is the distance Of the farthest point in C from Xn_1. (See Figure 3.3b). Note that, large U signifies C has a long nose towards the outlier Xn and one can conclude that the outlier has inflated the region C towards itself. Whereas small U implies less eflect of X, on C. Sometimes the influence of an outlier is so much that the whole region shifts away from the data cloud towards the outlier. (See Figure 3.3b). We say that C has shifted from the data cloud if X-n-1 E C and we measure the shift by the quantity (3.6) L = inf{||x — Xn_1||: x E C}. L = 0 implies no shift and the vice-verse. Large L implies the region C has largely shifted from the data cloud, indicating large influence of the outlier. Let U33 and U 33 denote the extent Of inflation of the BB credible set and the ELR confidence set with coverage level (1 —- p). Let L33 and L 33 denote the shifts for these two sets. Note that in Figure 3.2(a) and 3.2(b), both L33 and L33 are zero indicating that there is no shift effect of the outlier. But there is a large extent of inflation effect. Here we give some theoretical bounds on U 33, U 33 and L 31,. THEOREM 3.1. For any data set X1,...,X,,, (3.7) U3). 2 ..,, “X” ’nX"'1”, 41 and (3.8) L3), 21,,”X" — X’H“ — 1723/2(— logr)l/2(n —1)-1/2, n where 1,, and un are the smallest and the largest roots of the equation l—h n—l (3.9) fn(h) :2 h (1+ >114: r, 0 g h g n. Here r = exp{—%X3’p}. Moreover, as n —-) oo, (3.10) 1,, =10 + 0(n‘l) and un = uo + 0(n’1), where lo and ac are the smallest and the largest roots of the equation (3.11) f(h) := hel‘h = r, o g h < oo. OBSERVATION 3.1. The function fn is continuous, strictly increasing on [0,1), strictly decreasing on (1,n] and fn(0) = 0, fn(1) = 1 and fn(n) = 0. So for every 0 < r < 1, fn(h) = r has exactly two solution, In in (0,1) and an in (1, n) and {h : fn(h) Z r} = [Imun]. OBSERVATION 3.2. For a fixed coverage level (1 — p), the quantity r in (3.9) and (3.11) decreases with an increase in the dimension d, as the percentiles Of a x2 distribution is an increasing function of the degrees Of freedom. Thus 11,, decreases with an increase in the dimension d as both f7, and f are decreasing for x > 1. THEOREM 3.2. Let the outlier Xn satisfies IIXnH = 0(n). 
Then ”Xn "' Xvi—III (3.12) U33 z (- logp) Table 4.1: The values of un in (3.7) along with no for dimension 2 and 3, and the values of (— logp) in (3.12) for four values ofp 42 d=2 d=3 P "13 U0 1113 no (-10gp) .01 5.95392 7.63835 6.60939 8.85321 4.605170 .05 4.79608 5.74386 5.48027 6.82846 2.995732 .10 4.21403 4.88972 4.89876 5.90078 2.302585 .20 3.55979 3.99431 4.23016 4.91262 1.609438 Theorem 3.1 and 3.2 help one in comparing the non-robustness Of the BB credible sets and the ELR confidence sets. The extent of inflation on both type of sets is proportional to the distance of outlier from the data cloud and is inversely proportional to the sample size n. an in (3.7) describes the constant Of proportionality for an ELR set and increases with the increase in the dimension of the data as well as with the increase in the coverage level. On the other hand, — logp in (3.12), the BB constant, does not depend on the dimension of the data. And most importantly, the BB constants are always smaller than the ELR constants at every level of coverage and whatever be the dimension. Table 1 presents the values of un for n = 13 and d = 2, 3 along with the values of — log p at four different level Of coverage error p. Since an ——> uo, as n —> 00, the values of ac are also attached. These observations indicates some robustness advantage for BB method over the ELR method, but neither method is robust. NO theoretical bound is found for L33. The BB credible sets usually contain Xn_1 and L 33 = 0 unless the outlier is too big. L3), is also equal to zero for small magnitude of the outlier. The first term in the lower bound (3.8) is of the order 0(n‘1) whereas the second term is of the order 0(n’1/2). Hence the RHS is negative often making the inequality trivial. This implies that the shift effect Of an outlier on both these set estimates are negligible, but the inflation effect is very much prominent. 43 The data diameter 7) is stochastically increasing in n and the magnitude is of the order 0(n1/") if the 1“" moment is finite. Hence an Observation Of order 0(n) is rare. But the effect of an outlier is inversely proportional to n and so only an outlier with magnitude Of order 0(n) or more should be of concern. 4.3 Proof Of Theorem 3.1. Define a likelihood function and a likelihood ratio on 52,, respectively as in w = w,- (3.13) ( ) H Rn(w) = n" Hwi. LEMMA 3.1. For every 0 < r < 1, the set (3.14) 633 = {fl(w) : w E (In, Rn(w) 2 r} is the same as the set C33 defined in (3.3). PROOF: By Lemma 1 of Owen (1988), we have (3.15) I‘Mw) 2 r => MP...) 2 r and There is an w E (2,, s.t. (3.16) F << Fn, R(F) 2 r = Rn(w)27'ande=Fa 44 where Fw = ngix... Hence C33 C C33 follows from (3.15). To prove the converse, let 2 E C33. SO there is an F << Fn such that z = 11(F) and R(F) 2 r. By (3.16) there is a w E (In such that Rn(w) Z r and Fw = F. Hence [1(w) E C33. But 11(w) = 11(Fw) = 2. SO C33 C C33 and the proof is complete. Cl By Lemma 3.1, it is enough to consider C33 instead of C33. Define fn(h) = sup{B.,,(w) : w E 911,101; = h/n}. It is easy tO see that the supremum in the RHS. is attained at wh, where the n-vector w" is defined as w? = (1 — h/n)/(n — 1) i=1,--- , (n — 1) and wfi = h/n. So the fn defined here is the same as that in (3.9) and Now fl(w") = WM. + {(1 - h/n)/ 22) fn(hw) Z 1‘. By using the definition of n in (3.4), the second term of (3.17) can be bounded above as n—l (3.19) H 2am.- - X21)“ 3 nK.._1(r) l where n—l - 1 - ~ - Kn_1(r) = supQ: Iw, — 71—517 : w e 52,,_1,R,,_1(w) 2 r}. 
1 By equation (5.1) of Owen (1988), K._1(r) s 2(_210g.~)1/2(n —1)“/2. Again by (3.18.ii) and Observation 3.1, Rn(w) Z r implies hw 2 la and the first term of (3.17) can be bounded bellow as Ian — Xn-lll wnlIXn — Xn—l“ Z In 71. 46 Hence for any w E 9,, with Rn(w) 2 r, ||fi(w) — X.-.” 21.. — n23/2(-losr)1/2(n —1)‘1/2, “Xn " XVI—1 II n and (3.8) is proved. The proof Of (3.10) is routine calculus and is omitted. 4.4 Proof of Theorem 3.2. Since all the distances are measured from Xn_1, without loss Of generality we can assume 59-1 2 0. As the BB distribution is the conditional distribution of u(D,,), given X1, . . . , X n; throughout this proof, the sample sequence X is fixed and the randomness comes from (W1, . . .,W,,). Let V,, denote 2'; W,X,-. Then n-l Vn = Wan ‘l’ (1 ‘” Wu) Z WiXia l where W,- = Wi/(l — Wn). Identify W,’s in terms of U(i)’s, the order statistics Of i.i.d. U (0,1), as in Procedure 1, in Section 2.3. AS the joint distribution of (U(1)/U(,,_1),...,U(,,_2)/U(,,_1)) is independent Of U(n_1), so the joint distribution of (W1, . . ., Wn_1) is independent of Wu. Let Vn_1 denote Z?_1W,X,-. Then V, = Wan + (1 — Wn)l7,,_1, and {4,4 is independent of W". Let Z, denote (Van)/||X,,|| and Zn_1 denote (Vlanfllanll. We will find a tn > 0 such that P(Z,, > t,,) z p. To this effect observe that, |2n_1| g n and for large n, lanII > n. 47 Therefore, for a t > 17, using the independence of W, and 2,4, E{P(Wnllxnll+(1- Wnlzn—l > tlZn—lll .. n—l = E 1— t- Z"? IIanl - Zn—l .. l-n t n—l Zn—l = 1— —— E 1— . ( um) { nan} Let cn = n(t/||X,,||). Then P(Z,, > t) t n—l 1 — ) z e—C", ( lanll and - l-n ~ ~ 2 _ n—l = _ Zn—l (n — 1)" Zn—l . (3'20) {1 lennl 1+3. ”Mani“ 2 {1123.11} 0‘” Note that for every i = 1,... ,(n — 1), ~ 11 — 2 Var(W,-) - W, and for i aé j, Cov(W,,Wj) = —:1—2 Var(W1) Hence for an unit vector l E IRn—1 , n—l (TVar(V,,_1)l = Var (Z: WilTXi) 1 n—l = fl gym-)2 — figflxmflxi) I .7 48 — n(n —1)2 n — 2 1 i < 17271—1 as IlTXil S n. Again by the construction of Zn_1, we obtain (3.21) EZ,,_1 = XH = 0, ~ XT ",,_ X (3.22) EZ,2,_1= "V”(V 1) " Sn2 “1. Ianll2 Using (3.20) and (3.21) in (3.22), one Obtains .. l—n Zn—l _1 E 1—— =1+On , { lenul ( ) P(Z,, > t) z 6‘“. Hence Xn-X_ tnz(_logp)ll n n 1”. Note that 17,4 is confined in a small region around 0 with diameter n, X, is far away from 0 and the random variable Wn is concentrated around zero with the density (n — 1)(1 - U)"_21[o,1)(UI- SO the density of V, is high near zero and decreases as we approach X n. So tn will serve as an approximate upper bound for U 3 3. Hence (3.12) is proved. 49 4.5 Connection Between EL and BB methodology. Let P denote the class Of all finite measurable partition of IR". For any n1,7r2 E P, say 1n 5 r2 (read n2 is finer than 1n) if m is a refinement of 1n, i.e. the elements of n2 are Obtained by partitioning some or all of the elements Of 7n. Then j defines a partial order on ’P and ’P is a directed set under j. (A directed set is a set along with a partial order 5 such that for every two elements 1n and n2, there is an element 7r with m j 1r and M j it.) Now for every element 7r 2 {A1, 1 -- ,Ak} E ’P, define a map g7r : .7 —) Q), as 91I’(F) =(F{A1}a”' :F{Al€}): where 52)c is the unit simplex in IR" defined in (1.3). Then for every F E .7, the collection Of vectors {g,,(F) : 7r E ’P} uniquely identifies F. Let A denote the Borel o-algebra generated by the weak convergence tOpology on .7. Then g,r is A-measurable and A is the smallest o-algebra under which every g,r is measurable. Now for i.i.d. data set X1, . . . 
,Xn in IR", define a net of non-negative functions on .7 as i=1 j=1 n k (3.23) L,,(F) =H{ZF{A,}1{X,EA,}}, F67, where 1r = {A1, - - - ,A),} is a k-partition of IRd. (See Kelley: 1955, page 65, for the definition Of net.) Then the net Of functions L,r converges pointwise to the empirical likelihood function L(F) in (3.1). TO see this, take no ={{X1},--- , {Xn},IRd — {X1,.. . ,Xn}}. Then L,0(F) is exactly equal to L(F) and hence the limit is achieved at any finite stage 1r with no -_< 71'. Now fix an arbitrary 1r = {A1, - - - , A3}. If one is interested in knowing only the proba- bilities Of A1, - -- , A)c under F, then the problem reduces tO a multinomial problem with cell 50 probabilities equal to g,,(F). Let n,- denote the number Of X ,- ’s in A,. Then the multinomial likelihood Of the parameters g,(F) is proportional to k H(,F{A})" i=1 which is the same as L7r in (3.23). Since the collection Of vectors {g,,(F) : 7r E P} uniquely identifies F, one can think g,(F) as the finite stage parameter value of F and L,r as the finite stage likelihood function at stage 1r. SO at a finite stage 1r, for the multinomial problem, one can put a prior density 1»: on g,,(F) and Obtain a posterior density proportional to L,,(F)n(g,,(F)). A common choice Of non-informative prior in multinomial case is an improper prior with Lebesgue density 1: =(1'[F{A})“ i=1 Under this prior, the posterior density is proportional to k (3.24) H( F{A }) )""l ==R(g«(F)). 1:1 and for any set A,- with n]- = 0, we have F {A,-} = 0 with posterior probability one. Thus we get a collection Of posterior densities Ft(g,,(F)). The question is, does this collection of densities lead to any probability on (.7 , A). The answer is affirmative. One can use the Kolmogorov consistency result to prove the existence of a probability on (.7, A). But the converse approach is easier here. Recall the .7 valued random variable Dn defined in (1.2), i.e. the BB distribution of F. Note that for any finite measurable partition 1r = {A1, - .. , Ak}, the distribution of (Dn{A1}, . .. ,Dn{Ak}) is Dirichlet distribution with 51 parameter (n1, - - - , nk) and the density is proportional to (3.24). SO these collection of pos- teriors lead us to the BB distribution on (.7 , A). Since the empirical likelihood is Obtained as the limit of the finite stage multinomial likelihood and the BB distribution is obtained from the collection of posterior probabilities under a non-informative prior of these multi- nomial problems, one can think of BB distribution as the posterior under a non-informative prior incorporated with empirical likelihood in the entire parameter space .7 . This is the extension of Owens argument in finite support case to general case. Note that this argu- ments is valid if one has the observations on an arbitrary locally compact topological space X. Bibliography [1] DHARMADHIKARI, S. and JOAC-DEv, K. (1988). Unimodality,Convexity, and Appli- cations. Academic press. [2] DEVROYE, L. (1986) Nonuniform random variate generation. Springer-Verlag, New York-Berlin. [3] DICICCIO, T. and HALL, P. and ROMANO, J. (1991). Eempirical likelihood is Bartlett- correctable. Ann. Statist. 19 1053-1061. [4] EFRON, B. (1979). Bootstrap methods: Another look at the jackknife Ann. Statist. 9 1-26. [5] EFRON, B. (1982). The jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia. [6] FERGUSON, TS. (1973). A Bayesian analysis of some nonparametric problems. Ann. S tatist. 1 209-230. [7] GASPARINI, M. (1995). 
Exact multivariate Bayesian bootstrap distributions of moments. Ann. Statist. 23 762-768.

[8] HARTIGAN, J.A. (1987). Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc. 82 267-270.

[9] KARLIN, S., MICCHELLI, C.A. and RINOTT, Y. (1986). Multivariate splines: a probabilistic perspective. J. Multivariate Anal. 20 69-90.

[10] KELLEY, J.L. (1955). General Topology. Van Nostrand Reinhold Co., New York.

[11] LO, A.Y. (1987). A large sample study of the Bayesian bootstrap. Ann. Statist. 15 360-375.

[12] LO, A.Y. (1988). A Bayesian bootstrap for a finite population. Ann. Statist. 16 1684-1695.

[13] LO, A.Y. (1991). Bayesian bootstrap clones and a biometry function. Sankhya Ser. A 53 320-333.

[14] MICCHELLI, C.A. (1980). A constructive approach to Kergin interpolation in R^k: multivariate B-splines and Lagrange interpolation. Rocky Mountain J. Math. 10 485-497.

[15] OWEN, A.B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 237-249.

[16] OWEN, A.B. (1990). Empirical likelihood ratio confidence regions. Ann. Statist. 18 90-120.

[17] POLONIK, W. (1995). Measuring mass concentrations and estimating density contour clusters - an excess mass approach. Ann. Statist. 23 855-881.

[18] PRÉKOPA, A. (1973). On logarithmic concave measures and functions. Acta Sci. Math. 34 335-343.

[19] RÉVÉSZ, P. (1972). On empirical density function. Period. Math. Hungar. 2 85-110.

[20] RUBIN, D.B. (1981). The Bayesian bootstrap. Ann. Statist. 9 130-134.

[21] SETHURAMAN, J. and TIWARI, R.C. (1982). Convergence of Dirichlet measures and the interpretation of their parameter. In Statistical Decision Theory and Related Topics III (S.S. Gupta and J.O. Berger, eds.) 2 305-315. Academic Press, New York.

[22] TSYBAKOV, A.B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25 948-969.

[23] WENG, C.S. (1989). On a second-order asymptotic property of the Bayesian bootstrap mean. Ann. Statist. 17 705-710.