This is to certify that the dissertation entitled

REGRESSION MODELS WITH (CASE 2) INTERVAL CENSORING

presented by Vasilis Katsikiotis has been accepted towards fulfillment of the requirements for the Doctor of Philosophy degree in Statistics and Probability.

Date: August 2, 1995

REGRESSION MODELS WITH (CASE 2) INTERVAL CENSORING
By Vasilis Katsikiotis

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1995

ABSTRACT

Interval censoring occurs frequently in longitudinal studies with periodic follow-up. The outcome of interest is not directly observed but its occurrence can be ascertained within an interval of successive inspection times. The Accelerated Failure Time (AFT) and the Proportional Hazards (PH) models are two of the regression models used widely in survival analysis and reliability theory. Maximum likelihood estimation is pursued in both models in a semiparametric framework. Existence of the estimators is established along the lines of Groeneboom and Wellner (1992). Strong consistency is proved and necessary conditions are given under which the information for the finite dimensional parameter is positive. The importance of the information calculation is illustrated in two ways.
A lower bound for the asymptotic variance of regular estimators is derived first. Moreover, the benefit of scheduling two inspections instead of a single one is measured explicitly by the anticipated gain in the information measure. Estimates of this measure are also provided. Lack of smoothness in the function $\theta \mapsto F_n(\cdot,\theta)$ motivates the search for alternative estimators in the AFT model. Asymptotically Generalized M-Estimators (AGME) are considered and a few of the conditions of a general theorem due to Bickel, Klaassen, Ritov and Wellner (1993) are established. A simulation study evaluates the performance of the MLE in the AFT model.

ACKNOWLEDGMENTS

I would like to thank all members of my guidance committee for their encouragement and support during my years at M.S.U. Special thanks to Professor R.V. Ramamoorthi for some very valuable suggestions. His comments have greatly improved early versions of this manuscript. My deep appreciation to Professor D. Gilliland for his critical support during my first difficult year in the US. To Professor P. Groeneboom, special thanks for making his computer programs available to me. Last but not least, my sincere appreciation to two people who mean so much to me. To my advisor, Professor Joseph Gardiner, whose guidance and continuous support were vital for the completion of this work. To Professor Nigel Paneth, whose research, analytic thinking and approach to problems will be so influential to me now and in the years to come.
TABLE OF CONTENTS

List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Regression models in survival analysis
  1.2 Random censoring
  1.3 Interval censoring - A general scheme
Chapter 2: Maximum Likelihood Estimation in Two Semiparametric Models
  2.1 The Cox PHM
  2.2 The Accelerated Failure Model
  2.3 Maximum likelihood in semiparametric models
  2.4 Interval censoring with two inspections
  2.5 Profile likelihood estimation
  2.6 Characterization of the NPMLE
Chapter 3: Strong Consistency of MLE
Chapter 4: Information Theory
  4.1 Efficient scores and information bounds
  4.2 Preliminaries
  4.3 Information lower bound
  4.4 Estimation of $I(\theta_0)$
  4.5 Comparison of information measures
Chapter 5: Generalized M-estimation in the AFT model
  5.1 The master theorem
  5.2 Generalized M-estimation under interval censoring
Chapter 6: Simulations
Appendix
Bibliography

LIST OF TABLES

5.1 Information measure in the Cox model (y=.2)
5.2 Information measure in the Cox model (y=.5)
6.1 Profile Likelihood
6.2 Profile Likelihood

Chapter 1
INTRODUCTION

1. Regression Models in Survival Analysis: One of the principal goals in survival analysis is to make inference about the time to a specific response or event, in relation to the risk factors that influence its occurrence. In most applications the identification of important risk factors is a challenging problem in itself, sometimes with significant statistical input. Regression models establish a relationship between the outcome of interest and a vector of covariates. Although such models merely approximate the true relationship between two groups of variables, they become important analytic tools, especially when they build upon the characteristics of the variables involved. Exploratory statistical methods often provide very useful insight into the relationship between the lifetime of interest and the covariates involved.
Models based on the monotonicity of failure rates, on time-invariant relative risks for failure, or on the proportionality of odds have been used in a variety of situations with considerable success in terms of goodness of fit and interpretation of results. On the other hand, general regression models often demonstrate a trade-off between adaptivity to specific applications and mathematical complexity. The Cox Proportional Hazards Model (PHM; Cox (1972)) is one of the most frequently used models to express the relationship between a lifetime and a vector of covariates. There is an enormous literature that refers to this model, and statistical methods associated with it have been examined in a variety of situations. Multiplicative Intensity Models (MIM; Aalen (1978)) make up another important category of regression models used in survival analysis. They are based on a product factorization of the rate of failure (intensity process) into a component that describes the risk set at a given time $t$ and a component that describes the risk of failing at $t$, given the covariates $Z$. Such models have been used in a variety of scientific fields like medical research, econometrics, reliability and engineering. Models with increasing (decreasing) failure rate are often used to incorporate some prior knowledge on the distribution of the failure time, while others that assume proportionality of odds-ratios are applicable to situations more general than those described by the classical PHM. A general regression model establishes the relationship between a failure time and covariates through a regression function $g$. When $g$ is linear (nonlinear) and known up to a finite dimensional parameter $\theta$, we obtain familiar formats of regression models, like the linear regression and the accelerated failure time models.
When $g$ remains unspecified to a large extent, the problem of nonparametric regression, one of the most outstanding ones, provides a wide open field for further research, while at the same time it allows the broadest range of applications. In this dissertation we consider estimation methods in regression models that represent the two major categories of models described earlier. In particular we analyze the Accelerated Failure Time (AFT) and the PH models under a censoring scheme that occurs frequently in longitudinal studies and studies with periodic follow-up.

2. Random Censoring: One of the characteristic features in the analysis of survival data is the partial loss of information on the main variable of interest. Follow-up studies usually have a finite horizon over which the outcome of interest might occur. In addition, subjects may withdraw from the study for a number of reasons or simply miss scheduled examinations. Moreover there are situations where a continuous flow of information with respect to the variable of interest is virtually impossible. In such cases, an inspection scheme is needed to provide the necessary information. If $X$ denotes the time to event, i.e. the "failure time", then there are situations where $X$ might be: 1) fully observed, 2) partially observed, 3) not observed at all. Usually there is a competing variable $Y$, called the censoring variable, and an inspection scheme that provide information about $X$ in cases 2 and 3 respectively. This variable can be either fixed or random. Non-random censoring is common in economics related research, where variables are observed or become of interest whenever they fall above or below a fixed threshold. A simple example is that of an employee who plans to achieve a certain goal during his/her tenure with a company. The time at which he/she achieves that goal is observed as long as it occurs prior to the termination of the employee's service.
Random censoring occurs naturally in a variety of situations. In toxicological experiments animals injected with suspected carcinogens are monitored for tumor development (Hoel and Walburg (1972)). The presence or absence of tumor is assessed at random times of sacrifice for each animal. Depending on the lethality of the injection, the time $X$ of tumor onset is randomly censored at the time of sacrifice $Y$. Other observational studies provide a window that allows information on $X$. To assess motor development of pre-school children, a study was planned to test the skills of participating children (Leiderman et al. (1973)). If a child had the skill prior to the initiation of the study, then the time to event is left censored at the beginning of the study. If the child develops the skill during the study period, then the time of the event is directly observed. If at the end of the study a child lacks the specific skill, then the event of interest is said to be censored to the right, at the study's termination period. The window over which $X$ is observed consists of the random time of entry and exit for each subject. This scheme is known in the literature as double censoring (Turnbull (1974)). Another form of random censorship is the one where $X$ is never observed but known to belong to a time interval, consisting of the last negative and the first positive assessment of the event's occurrence. The time of mail delivery illustrates this form of censoring. One is interested in the time the mailman delivers a piece of mail. However, only a sequence of mailbox inspections provides information about the time of delivery, which is known up to an interval. This scheme is known as interval censoring (Peto and Peto (1972), Turnbull (1976)).

3.
Interval Censoring - A General Scheme: Interval censoring occurs frequently in studies in which information about an event of interest is obtained by an inspection process that assesses, at each inspection time, the occurrence or not of the target event. The most appealing applications of interval censoring appear in cancer and AIDS related research. An important measure of effectiveness of a therapy in cancer research is the length of time that a patient remains in remission (remission duration); see Rücker and Messerer (1988). Remission duration is defined as the time period between complete remission after treatment and tumor relapse. It is clear that both the initial and terminal events that define remission duration are subject to interval censoring because they can only be assessed by a sequence of inspections. A similar problem occurs in situations where one is interested in the length of the incubation period of AIDS. By incubation period we mean the time elapsed from infection to the onset of clinical AIDS. Evidently at least the initial event (time of infection) is in almost every situation subject to interval censoring.

The following general model, which describes the interval censoring mechanism, has been proposed by Wang, Gardiner and Ramamoorthi (1994). Let $\{W_k : k \ge 1\}$ be a sequence of ordered positive random variables that represent the potential examination times for a subject. At each time $W_k$ an assessment is made of whether or not the event of interest has occurred. Let $t$ denote current time measured from the beginning of the study and define $N(t) = \min\{k \ge 1 : X \le W_k \le t\}$ if such $k$ exists, with $N(t) = \infty$ otherwise. Also let $M(t) = \max\{k \ge 1 : W_k \le t\}$, $W_0 = 0$ and $W_\infty = W_M$. Assuming at least one examination is made, $N$, when finite, marks the first assessment at which a positive diagnosis of the event of interest, occurring at the unobserved time $X$, is made by time $t$. On the other hand, if $N = +\infty$, we only have knowledge that $X > W_M$.
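The bookkeeping behind $N(t)$ and $M(t)$ can be made concrete with a short sketch. The Python fragment below is an illustration only and is not taken from the dissertation: the function names `censoring_indices` and `observed_interval`, and the mailbox inspection times, are hypothetical, and the inspection times are assumed to be supplied as a finite sorted list.

```python
import math
from bisect import bisect_right


def censoring_indices(x, w, t):
    """Return (N, M) for event time x, sorted inspection times w, current time t.

    M is the number of inspections made by time t (0 if none).
    N is the 1-based index of the first inspection W_k with x <= W_k <= t,
    or math.inf when no inspection made by time t has detected the event.
    """
    m = bisect_right(w, t)          # W_1, ..., W_m are the inspections with W_k <= t
    for k in range(1, m + 1):
        if w[k - 1] >= x:           # first positive assessment
            return k, m
    return math.inf, m


def observed_interval(x, w, t):
    """Interval known to contain x, using the convention W_0 = 0."""
    n, m = censoring_indices(x, w, t)
    if m == 0:
        raise ValueError("no inspection has been made by time t")
    if n == math.inf:
        return (w[m - 1], math.inf)  # only X > W_M is known
    left = w[n - 2] if n >= 2 else 0.0
    return (left, w[n - 1])          # W_{N-1} < X <= W_N


# Hypothetical mailbox checks at hours 1, 2, 3, 4; the mail arrives at hour 2.5.
checks = [1.0, 2.0, 3.0, 4.0]
print(censoring_indices(2.5, checks, t=10.0))
print(observed_interval(2.5, checks, t=10.0))
print(observed_interval(5.0, checks, t=10.0))
```

In the mail example the delivery time itself never appears in the output; only the last negative and first positive inspection are reported, which is exactly the information carried by $(W_{N-1}, W_N, N, M)$.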
Wang, Gardiner and Ramamoorthi (1994) prove the identifiability of the distribution of $X$ from the datum $(W_{N-1}, W_N, N, M)$. The time of diagnosis of the event of interest is defined as $Z = W_N$ if $N < \infty$, with $Z > W_M$ when $N = +\infty$. In what follows we will consider a rather simple interval censoring model consisting of two inspections. Groeneboom and Wellner (1992) call this scheme case-2 interval censoring, to distinguish it from the case of a single inspection (case-1), which can be viewed as a degenerate interval censoring situation.

The manuscript is organized as follows. In chapter 2 we discuss the existence of maximum likelihood estimators in two semiparametric models. In chapter 3 we prove the strong consistency of the estimators. Chapter 4 examines optimality in the estimation. We compute efficient scores and information lower bounds for the finite dimensional parameters and provide estimates of their asymptotic variances. In chapter 5 we consider generalized M-estimation in the AFT model. We review some fundamental results from the modern theory of empirical processes and establish a few of the necessary conditions needed to obtain the asymptotic distribution of our estimators. Finally, in chapter 6 we present some simulation studies to illustrate the performance of our estimators.

Chapter 2
MAXIMUM LIKELIHOOD ESTIMATION IN TWO SEMIPARAMETRIC MODELS

1. The Cox PHM is one of the most popular regression models used extensively in survival analysis. It assumes that the conditional hazard function, given a vector of covariates $Z$, factorizes as

(1.1) $\lambda(t \mid z) = \lambda_0(t)\, g$ [...] $> 0$, with $T_0$ a baseline "failure time".

The model's name illustrates that the covariates $Z$ have an accelerated (decelerated) effect on the survival function of $T$, compared to the corresponding function of $T_0$. The log-transformation reduces (2.1) to the familiar linear regression model

(2.2) $X = \theta' Z + \varepsilon$

with $X = \log T$ and $\varepsilon = \log T_0$. This model has been studied in a variety of situations including random (right-left) censoring.
Important work on the subject can be found in Buckley and James (1979), Koul, Susarla and Van Ryzin (1981), Ritov (1990) and Schick (1993).

3. Maximum Likelihood in Semiparametric Models: Consider a family of distributions $\mathcal{P} = \{P_{\theta,F} : (\theta,F) \in \Theta \times \mathcal{F}\}$ on a measurable space $(\mathcal{X}, \mathcal{B})$. Let $\mu$ be a measure on $\mathcal{B}$ and $p_{\theta,F}(\cdot)$ a density of $P_{\theta,F}$ with respect to $\mu$. Suppose that $\Theta \subset \mathbb{R}^d$ while $\mathcal{F}$ is an infinite dimensional space. Let $(\theta_0, F_0)$ be the true parameter and suppose that $X_1, X_2, \ldots, X_n$ is a random sample from $P_{\theta_0,F_0}$. Define the maximum likelihood estimator $(\hat\theta_n, \hat F_n)$ by

$$(\hat\theta_n, \hat F_n) \in \arg\max_{\Theta \times \mathcal{F}} \int \log p_{\theta,F}(x)\, dP_n(x),$$

with $P_n$ the empirical measure based on $X_1, X_2, \ldots, X_n$, under the assumption that such a maximizer exists. Let $\xi = (\theta, F) \in \Xi \equiv \Theta \times \mathcal{F}$.

4. Interval Censoring with two inspections: Let $A$ be a random variable having distribution $F_0$, $(T, U)$ random variables with joint distribution $H$, and $Z$ a random vector with distribution $W$. Denote by $J$ the joint distribution of $(T, U, Z)$. Suppose that $A$ is independent of $(T, U)$ conditional on $Z$, with distribution $A \mid Z \sim G_{\xi_0}$, $\xi_0 \in \Xi$. We will refer to $(T, U)$ as the censoring variables and will assume throughout that $\Pr\{T < U\} = 1$. Consider the measure space $(\mathbb{R} \times \mathbb{R}^2 \times \mathbb{R}^d, \mathcal{B}, Q_\xi)$ with $Q_\xi$ a probability measure on the Borel $\sigma$-algebra $\mathcal{B}$. Denote by $Y^0 \equiv (A, T, U, Z)$ a typical element from this space and let $\phi$ be the measurable transformation $Y = \phi(Y^0) = (\delta, \gamma, T, U, Z)$, where $\delta = 1\{A \le T\}$, $\gamma = 1\{T < A \le U\}$ [...] $dQ_\xi(a,t,u,z)$, with $A \times B$, $E$ Borel sets in $\mathbb{R}^2$ and $\mathbb{R}^d$ respectively. Our problem is to estimate $\xi_0 = (\theta_0, F_0)$ on the basis of a sample $\{Y_1, Y_2, \ldots, Y_n\}$ of independent and identically distributed observations. In what follows, we consider maximum likelihood estimation in regression models under the censoring scheme described above. We will call this situation interval censoring without any further reference.

4a. The Cox model with interval censoring: Suppose that the hazard function $\lambda$
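, associated with a nonnegative random variable $X$, conditional on a vector of covariates $Z$, is given by

(4.2) $\lambda(x \mid z) = \lambda(x)\, e^{\theta' z}$.

We maintain all the notations and assumptions introduced earlier in the section, with the exception that $A \equiv X \ge 0$ w.p. 1. In addition we will assume that $(T, U)$ and $Z$ have densities $h$ and $w$ with respect to Lebesgue measure, which do not depend on $\theta \in \Theta$. Based on the observable $y = (\delta, \gamma, t, u, z)$, we have that for $\xi = (\theta, F) \in \Xi$, and writing $\bar F = 1 - F$,

$$P_\xi\{\delta = 1 \mid T = t, U = u, Z = z\} = P_\xi(X \le t \mid Z = z) \equiv F_\theta(t \mid z) = 1 - [\bar F(t)]^{\exp(\theta' z)}.$$

Using the definition of the conditional cumulative hazard function $\Lambda(t \mid z) \equiv \int_0^t \lambda(s \mid z)\, ds$ and (4.2), we can write the density of $Y$ with respect to $\mu = \nu_2 \times \tau_{d+2}$ as

(4.3) $p_\xi(y) = \left(1 - \bar F(t)^{\exp(\theta' z)}\right)^{\delta} \left(\bar F(t)^{\exp(\theta' z)} - \bar F(u)^{\exp(\theta' z)}\right)^{\gamma} \left(\bar F(u)^{\exp(\theta' z)}\right)^{1-\gamma-\delta} h(t,u \mid z)\, w(z)$,

where $\nu_2$ = counting measure on $\{0,1\}^{\otimes 2}$ and $\tau_d$ = Lebesgue measure on $\mathbb{R}^d$. The transformation $\bar F = e^{-\Lambda}$ allows an equivalent form of the density (4.3), namely

(4.4) $p_\xi(y) = \left(1 - \exp[-\Lambda(t) e^{\theta' z}]\right)^{\delta} \times \left(\exp[-\Lambda(t) e^{\theta' z}] - \exp[-\Lambda(u) e^{\theta' z}]\right)^{\gamma} \times \left(\exp[-\Lambda(u) e^{\theta' z}]\right)^{1-\gamma-\delta} \times h(t,u \mid z) \times w(z)$.

Very often we will switch between (4.3) and (4.4) depending on the circumstances. Since we have assumed that $h$ and $w$ do not contain any information about $\theta$, and since our primary goal is to do inference about $\xi$, we can proceed safely considering $h$ and $w$ known. The log-likelihood function based on $n$ independent and identically distributed observations $\{Y_1, Y_2, \ldots, Y_n\}$ is (up to an additive constant) given by

$$l_n(\theta, F; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - \bar F(t_i)^{\exp(\theta' z_i)}\right) + \gamma_i \log\left(\bar F(t_i)^{\exp(\theta' z_i)} - \bar F(u_i)^{\exp(\theta' z_i)}\right) + \sum_{i=1}^{n} (1 - \gamma_i - \delta_i) \log \bar F(u_i)^{\exp(\theta' z_i)}$$

and

$$l_n(\theta, \Lambda; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - e^{-\Lambda(t_i) e^{\theta' z_i}}\right) + \gamma_i \log\left(e^{-\Lambda(t_i) e^{\theta' z_i}} - e^{-\Lambda(u_i) e^{\theta' z_i}}\right) - (1 - \gamma_i - \delta_i)\, \Lambda(u_i)\, e^{\theta' z_i}.$$

Early attempts to estimate the parameters in the Cox model under interval censoring were confined to purely parametric methods. Finkelstein (1986) has considered maximum likelihood estimation under the assumption that the baseline hazard has finite support.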
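As a rough illustration of the log-likelihood $l_n(\theta, \Lambda; y)$ written in terms of the cumulative hazard, the following Python sketch evaluates it for a given $\theta$ and a given baseline $\Lambda$. It is not the author's code: the names `cox_case2_contribution` and `cox_case2_loglik`, the unit-exponential baseline $\Lambda(s) = s$, and the three sample observations are all assumptions made for the example. In the dissertation $\Lambda$ is unknown and estimated nonparametrically.

```python
import math


def cox_case2_contribution(delta, gamma, t, u, z, theta, cum_hazard):
    """Log-likelihood contribution of one observation (delta, gamma, t, u, z).

    cum_hazard is the baseline cumulative hazard Lambda; theta and z are
    sequences of equal length. The factors h and w are dropped as constants.
    Assumes 0 < Lambda(t) < Lambda(u), so that no logarithm of zero occurs.
    """
    risk = math.exp(sum(a * b for a, b in zip(theta, z)))  # e^{theta'z}
    a = cum_hazard(t) * risk                                # Lambda(t) e^{theta'z}
    b = cum_hazard(u) * risk                                # Lambda(u) e^{theta'z}
    if delta == 1:                                          # X <= t
        return math.log(-math.expm1(-a))                    # log(1 - e^{-a})
    if gamma == 1:                                          # t < X <= u
        return -a + math.log(-math.expm1(-(b - a)))         # log(e^{-a} - e^{-b})
    return -b                                               # X > u


def cox_case2_loglik(theta, cum_hazard, data):
    """Sum of the contributions over a list of observations."""
    return sum(cox_case2_contribution(d, g, t, u, z, theta, cum_hazard)
               for (d, g, t, u, z) in data)


# Hypothetical inputs: unit-exponential baseline and a single covariate.
baseline = lambda s: s
sample = [(1, 0, 1.0, 2.0, (0.5,)),
          (0, 1, 0.8, 1.7, (-1.0,)),
          (0, 0, 0.4, 0.9, (0.0,))]
print(cox_case2_loglik((0.3,), baseline, sample))
```

The three branches correspond to the three factors of (4.4); using `math.expm1` instead of `1 - math.exp(...)` is only a numerical-stability choice for small hazards.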
Recently Huang (1994) has completed a thesis on efficient estimation in the Cox model with case 1 interval censoring. He has proved that the MLE of the finite dimensional parameter is asymptotically normal and efficient. Although from one point of view our results can be taken as a natural generalization of Huang's results, case 2 interval censoring, still in its infancy, displays difficulties that do not appear in case 1. The biggest of all is the potential "nearness" of the two inspections. This is not a mere technicality that one has to address in one way or another but an integral part of the problem. Although there are complete results that describe the asymptotic distribution of the NPMLE in case 1 (Groeneboom (1989)), no asymptotic theory for case 2 has been developed as of today, to the best of our knowledge. In fact, not even the rate of convergence of the NPMLE, a fundamental tool in efficiency considerations, is known for case 2 as opposed to case 1.

4b. The linear regression model with interval censoring: Consider the model (2.2) with $\varepsilon$ having distribution $F_0$. In addition to the basic assumptions for the interval censored models that we considered earlier, we assume here that $\varepsilon$ is independent of the covariates $Z$. Then for $A \equiv \varepsilon$, a density of $Y$ with respect to $\mu$ is given (up to a multiplicative constant) by

(4.5) $p_\xi(y) = \{F(t - \theta' z)\}^{\delta} \{F(u - \theta' z) - F(t - \theta' z)\}^{\gamma} \{1 - F(u - \theta' z)\}^{1-\gamma-\delta}$

and the log-likelihood function by

(4.6) $l_n(\theta, F; y) = \sum_{i=1}^{n} \delta_i \log\{F(t_i - \theta' z_i)\} + \gamma_i \log\{F(u_i - \theta' z_i) - F(t_i - \theta' z_i)\} + \sum_{i=1}^{n} (1 - \gamma_i - \delta_i) \log\{1 - F(u_i - \theta' z_i)\}$.

Finkelstein and Wolfe (1985) were the first to consider this model under interval censoring. To model the joint distribution of $(X, Z)$, they introduced a parametric formulation of the conditional distribution of $Z$ given $X$ and they estimated the distribution of $X$ using the "self-consistent" nonparametric estimator of Turnbull (1976).
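To see how the data $(\delta, \gamma, t, u, z)$ and the log-likelihood (4.6) fit together, here is a minimal Python sketch under stated assumptions. Everything specific in it is hypothetical and chosen only for illustration: a scalar covariate, standard normal errors with known cdf, uniform inspection times, and the function names `simulate_case2` and `aft_case2_loglik`. In the dissertation $F$ is unknown and is estimated jointly with $\theta$.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function (assumed error law for the sketch)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_case2(n, theta0, rng):
    """Draw n observations (delta, gamma, t, u, z) from X = theta0 * Z + eps."""
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        x = theta0 * z + rng.gauss(0.0, 1.0)   # unobserved response
        t = rng.uniform(-1.5, 0.0)             # first inspection
        u = t + rng.uniform(0.5, 2.0)          # second inspection, so T < U
        delta = 1 if x <= t else 0
        gamma = 1 if t < x <= u else 0
        data.append((delta, gamma, t, u, z))
    return data


def aft_case2_loglik(theta, cdf, data):
    """Log-likelihood (4.6) for a scalar covariate and a given error cdf."""
    total = 0.0
    for delta, gamma, t, u, z in data:
        ft = cdf(t - theta * z)
        fu = cdf(u - theta * z)
        if delta == 1:
            total += math.log(ft)
        elif gamma == 1:
            total += math.log(fu - ft)
        else:
            total += math.log(1.0 - fu)
    return total


rng = random.Random(0)
sample = simulate_case2(2000, theta0=1.0, rng=rng)
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(theta, aft_case2_loglik(theta, normal_cdf, sample))
```

Note that the simulated response `x` is discarded once the indicators are formed; the likelihood only ever sees which of the three intervals $(-\infty, t]$, $(t, u]$, $(u, \infty)$ contains it.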
Although Finkelstein and Wolfe argue that their estimators are maximum likelihood in nature, it is well known today that the self-consistent equations do not always yield maximum likelihood estimators.

5. Profile Likelihood Estimation: We consider here a three step procedure that yields a maximum likelihood estimator for the models we have introduced. This is a standard approach for M-estimation in semiparametric problems and has been used, among others, by Andersen and Gill (1982), Whittemore and Keller (1986), and Leblanc and Crowley (1995). It can be summarized in the following three steps.

Step 1: For $\theta \in \Theta$ fixed, consider $F_n(\cdot,\theta) = \arg\max_{F \in \mathcal{F}} l_n(\theta, F)$.

Step 2: Replace $F$ by $F_n(\cdot,\theta)$ and consider the profile likelihood function $\theta \mapsto l_n(\theta, F_n(\cdot,\theta))$.

Step 3: Let $\tilde\theta_n = \arg\max_{\theta \in \Theta} l_n(\theta, F_n(\cdot,\theta))$. Set $\hat\theta_n \equiv \tilde\theta_n$ and $\hat F_n(\cdot) = F_n(\cdot, \tilde\theta_n)$.

There are a number of issues that need to be clarified before we proceed to the properties of the maximum likelihood estimator. In Step 1 we need to justify the existence of the maximizer. Moreover, a practical method for the computation of the maximizer might be a priority. In the next section, we provide the arguments for the legitimacy of step 1 in the two regression models that we consider. In that regard, the work of Groeneboom and Wellner (1992) on the existence of the NPMLE is the basis for our arguments. Details on the computational aspects of the MLE from interval censored data, and algorithms to carry out step 1, are given in Groeneboom and Wellner (1992). In relation to step 3, there seems to be an ad hoc assumption that $\tilde\theta_n \in \Theta$ for all $n$. This is not generally the case. We will only need $\tilde\theta_n \in \Theta^{\circ}$ (the interior of $\Theta$) eventually, with high probability, and this result will be established in Theorem 3.1. Finally, with all three steps substantiated we obtain

$$l_n(\hat\theta_n, \hat F_n) \ge l_n(\tilde\theta_n, F_n(\cdot, \tilde\theta_n)) \quad \text{with} \quad (\hat\theta_n, \hat F_n) = \arg\max_{\Theta \times \mathcal{F}} l_n(\theta, F).$$

Moreover,

$$l_n(\tilde\theta_n, F_n(\cdot, \tilde\theta_n)) \ge l_n(\theta, F_n(\cdot, \theta)) \ge l_n(\theta, F) \quad \forall\, (\theta, F) \in \Theta \times \mathcal{F},$$

by steps 3 and 1 respectively. It follows that $l_n(\hat\theta_n, \hat F_n) = l_n(\tilde\theta_n, F_n(\cdot, \tilde\theta_n))$.
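This proves that the three step procedure described above yields a maximum likelihood estimator. In the remainder of this chapter we will fix a $\theta \in \Theta$. We will also use the abbreviation $q_i = \theta' z_i$ for $i \in \{1, 2, \ldots, n\}$.

6. Characterization of the NPMLE: In this section we state necessary and sufficient conditions for an estimate of $F$ to be a maximum likelihood estimate. Our focus is on the Cox model, for which we provide the full details. In Theorem 6.2 we state similar results for the linear regression model without any proofs, since this would be a duplication of arguments to a large extent. We consider the mapping $\Lambda \mapsto l_n(\Lambda)$ based on (4.4),

(6.1) $l_n(\Lambda; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - e^{-\Lambda(t_i) e^{q_i}}\right) + \gamma_i \log\left(e^{-\Lambda(t_i) e^{q_i}} - e^{-\Lambda(u_i) e^{q_i}}\right) - (1 - \gamma_i - \delta_i)\, \Lambda(u_i)\, e^{q_i}$.

Let $J_n^{(1)} = \{T_i : \delta_i = 1 \text{ or } \gamma_i = 1 \text{ for } i = 1, 2, \ldots, n\}$, $J_n^{(2)} = \{U_i : \gamma_i = 1 \text{ or } 1 - \delta_i - \gamma_i = 1 \text{ for } i = 1, 2, \ldots, n\}$ and $J_n = J_n^{(1)} \cup J_n^{(2)}$. Notice that $J_n \subset \{T_i ; i = 1, 2, \ldots, n\} \cup \{U_i ; i = 1, 2, \ldots, n\}$ marks the set of relevant observations which contribute to the likelihood function. Let $0 \le \eta_{(1)} \le \ldots \le \eta_{(m)}$ be the order statistics of the elements of $J_n$. Write $\Lambda_j \equiv \Lambda(\eta_{(j)})$ and notice that $0 \le \Lambda_1 \le \Lambda_2 \le \ldots \le \Lambda_m$, due to the monotonicity of $\Lambda$. We abuse notation when we write $\delta_{(j)}$, $\gamma_{(j)}$ or $z_{(j)}$, referring to the $\delta$'s, $\gamma$'s and $z$'s associated with $\eta_{(j)}$. The MLE $\Lambda_n$ of $\Lambda_0$ can be chosen to be a right continuous, nondecreasing step function, with jumps at $J_n$. Set $\eta_{(0)} = 0$ and $\Lambda_n(0) = 0$. Then $\Lambda_n$ will have the form

$$\Lambda_n(t) = \begin{cases} 0, & 0 \le t < \eta_{(1)} \\ \Lambda_j, & \eta_{(j)} \le t < \eta_{(j+1)},\ 1 \le j \le m-1 \\ \Lambda_m, & t \ge \eta_{(m)}. \end{cases}$$

It is worth noting that if $\delta_{(1)} = 0$ or ($\delta_{(m)} = 1$ or $\gamma_{(m)} = 1$) then $\Lambda_n(\eta_{(1)}) = 0$ or $\Lambda_n(\eta_{(m)}) = \infty$ respectively. So without loss of generality we will assume that $\delta_{(1)} = 1$ and $\delta_{(m)} = \gamma_{(m)} = 0$. In this way we can restrict ourselves to functions from

$$\mathcal{L} \equiv \{\Lambda : \Lambda(\eta_{(1)}) > 0,\ \Lambda(\eta_{(m)}) < \infty,\ \Lambda(\eta_{(j)}) - \Lambda(\eta_{(j-1)}) > 0\ \ \forall j \in \{1, 2, \ldots, m\}\}.$$

In this way we can avoid pathological situations with maximizers of the form $l_n(\Lambda_n, Y) = -\infty$. Before we state the main theorems in this section, we introduce some additional notations. Let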
Y: GI‘AUll‘m A110) _ Z (1_e—A(7j)e" - e—A(T,)e‘Ii _e—A(U))¢"' ) e 7.- Fri-5.- m-MUIW + Z [e—Mrw _e-A(U,):" — e—A(U,)e" )e — 8 ‘Y q—A(s)e7 — n’l [I —e‘A(’)'q — e‘Al‘)” _e-A(v)"’ )e dPn(5,Y ,S,u,z)+ sSI Y _ 1_Y _6 q—A(u)e" "i ie-Ame‘ _e-A(u)e" e-Atuw )e dP"(5'Y'S’":Z)- 115! Note that the process W has jumps at points of J" . THEOREM 6.1. For fixed 966), let q,=9'z,. for ie{1,2,...,n}. Suppose that 6 =1 and 6 (m) =7 (m) = 0. The following conditions are sufi‘icient and necessary for a (1) function An(-,9) to maximize (6.1) over i. i) dex #(s)50 VtZO and ii) JAn(s,6)dW/~\mq(s) =0. l 0 Moreover An(-,9) is uniquely determined by i) and ii). Proof. Define S={f=(A,,A2,...,Am) :Aj=A(n(j)), A61} and let 0 we have (31+elj) 63 V ISjSm. Thus lirn 0, . l ’ if I j = 0 , else forsome lskSn and Ak=A(U,,)—A(Tk). It follows that Z(y,— y) V,,.(p(y )= 0 from which we obtain y: = y, Vi such i: 5(,)=1 0R y(,)=l that 6(1) = 1 or y( 1) = 1 . This establishes the uniqueness of our estimator. Theorem (6.1) is now proved. REMARK 6.2. As a result of Proposition 2.1 in the appendix, conditions (i) and (ii) o A w. .1 always hold. This along with the consistency result (90 GO, 9,, —P> 9°) confirms that the M.L.E. A"(-) = An(én,-) always exists for n sufficiently large. REMARK 6.3. If we set i;(-) = 1 — e‘M" then we obtain a nondecreasing, right ‘3‘. 21 continuous step function, satisfying F;(0) = O, 0 < 132010)) <1 V 151's m. This function is defined explicitly as a result of our Theorem 6.1 and has jumps at the same points as A" does. The monotonicity of our transformation A —> 1 —— e“ implies that F; = argmax l,,(F) , with .7 : {subdistribution function F: F(0) = O, 0 < F(nm) <1}. For REMARK 6.4. We can treat the problem of maximum likelihood estimation in the linear regression model (2.2) in the same way. To justify step 1 in the profile likelihood approach, we need to consider _ _ I _ _ I 9.. 
$T_i^{\theta} = T_i - \theta' z_i$, $U_i^{\theta} = U_i - \theta' z_i$, $\delta_i = 1\{\varepsilon_i \le T_i^{\theta}\}$, $\gamma_i = 1\{T_i^{\theta} <$ [...] Moreover $F_n(\cdot,\theta)$ is uniquely determined by i) and ii).

Chapter 3
STRONG CONSISTENCY OF M.L.E.

In chapter 2 we have defined the maximum likelihood estimator $(\hat\theta_n, \hat\Lambda_n)$ and $(\hat\theta_n, \hat F_n)$ in the Cox and the linear regression models respectively. In the next two theorems we prove its consistency under a suitable topology on the parameter space. Pfanzagl (1988) proves consistency of the NPMLE under a global condition, compactness of $\Theta$, and a local condition, continuity of $\theta \mapsto f_\theta$ in the neighborhood of $\theta_0$. Van de Geer (1993) obtains consistency of the NPMLE with respect to the Hellinger distance using some entropy calculations. Although her results cover a much wider range of applications and in certain cases lead to rates of convergence, we adapt Pfanzagl's approach here, since it is more direct and suitable to the demands of the semiparametric nature of our problem. Theorem 1 presents the consistency result for the Cox model, while Theorem 2 is its direct counterpart for the linear regression.

For the Cox model described by (2.4.2), a density of $P_\xi$ with respect to $\mu = \nu_2 \times \tau_{d+2}$ is given by

$$p_\xi(y) = \left(1 - \bar F(t)^{\exp(\theta' z)}\right)^{\delta} \left(\bar F(t)^{\exp(\theta' z)} - \bar F(u)^{\exp(\theta' z)}\right)^{\gamma} \left(\bar F(u)^{\exp(\theta' z)}\right)^{1-\gamma-\delta} h(t,u \mid z)\, w(z)$$

with $\xi = (\theta, F) \in \Xi \equiv \Theta \times \mathcal{F}$, $\bar F = 1 - F$, $\nu_2$ = counting measure on $\{0,1\}^{\otimes 2}$ and $\tau_d$ Lebesgue measure on $\mathbb{R}^d$. Define

$$f_\xi(y) \equiv \delta\left(1 - \bar F(t)^{\exp(\theta' z)}\right) + \gamma\left(\bar F(t)^{\exp(\theta' z)} - \bar F(u)^{\exp(\theta' z)}\right) + (1 - \gamma - \delta)\, \bar F(u)^{\exp(\theta' z)}.$$

Let $S_H$ be the support of $H$ and define $a_0 \equiv \sup\{x : F_0(x) = 0\}$, $b_0 \equiv \inf\{x : F_0(x) = 1\}$.

THEOREM 1. (Consistency in the Cox model) Suppose that
(i) $\Theta$ is a subset of $\mathbb{R}^d$ with bounded closure.
(ii) $\theta_0 \in \Theta^{\circ}$, the interior of $\Theta$.
(iii) $\forall\, \theta \ne \theta_0$, $\Pr\{\theta' Z \ne \theta_0' Z\} > 0$.
(iv) $\forall\, (t,u) \in S_H$, $0 \le a_0$ [...]
[...] $\hat\theta_n \to \theta_0$ and $\hat F_n(y) \to F_0(y)$ $\forall y \in E \cap C_{F_0}$, under $P_0 \equiv P_{(\theta_0, F_0)}$, where $C_{F_0}$ denotes the set of continuity points of $F_0$ and $E = (a_0, b_0)$. In the special case that $F_0$ is continuous on $E$, the above convergence is uniform with probability one, i.e.
sup P: (y) — F}, (y)| —“'—) O. yeE REMARK 1. Assumptions (i)-(iii) are essentially the same as in Pfanzagl (1988). While assumption (iii) safeguards against non-identifiability problems, (iv) is naturally satisfied in almost all problems with interval censored observations. Moreover (iv) is essential here since the MLE F‘ I1 is uniquely defined on S H and it is a step function on [0, b0). Finally Pfanzagl’s assumption of a concave density with respect to the unknown parameter, seems unnecessary here. He used it in order to verify condition (2.6), page 24 140, in his Lemma 2.5 -originally due to Wang (1985). We show that a more direct way is indeed possible. Proof of Theorem 1. Consider the measurable space (EA?) with it the Borel o—field. Let m = { measures F on (E,d§) : F (E)Sl }. We have defined earlier .7 to be the set 3 = {F= distr. function: 5(0) = 0, o < F(n(,.)) < 1, F(n,) — F(n,._,) > 0 Vi 6 {1,2,..., m}}. Note that 7 c m. Equip m with the topology .7, of vague convergence, i.e. the smallest topology that makes fimctions of the form F 6771 1—) I de continuous V f e C‘(E), the space of continuous functions with compact support. Helly’s theorem asserts that ( m, .7, ) is a vaguely compact topological space. Equip (9 with the usual Euclidean topology .7,. Then (6 x m, .7 ) is compact in the product topology .7 = .71 x .72. We say that (9", E)-'"—’>(9, F) if and only if 9,, —) 9 and E, —'> F, with the latter denoting vague convergence. Fix an arbitrary ae(0,1) and let F 6.7. For 5,: (6, F)¢(90, 1%) = £0, E E Ego , f a j; , f0 5 ft. , Jensen’s inequality applied to the strictly concave function 1,,(E,o) which implies glog%:%20 and (1.2) go“0E:;]=§1ogg[1+a(fi:(:3—1J] ( =§log[a;:[ —(—:;]+ +(1— a)]_>_ aglogfigg 20. Consider now a collection {71,(g): e > 0} of nested neighborhoods of g. Let 71E E 719(é) andf (y): sup fg (y) Notice that §—) f§(y) is continuous under .7 for u—ae y, and é'efl. bounded above by 1. Thus Z(y) lo f (y) for u-a.e y. 
We want to prove the measurability of y 1—> Z( y) . Notice that for We denoting the closure of 71 under.7, we can find @671 suchthat f()= sup.f§(-). If £6715 then 6571‘ measurability of y 1—> f§( y) holds and there is nothing to prove. However if E 6 ‘71E \71E then El (5,")"5" in 718 such that §,—’>E. Thus f§(y)—&+f§(y) and by the completeness of u, fg is dimeasurable. It follows that Metal) .. If (MkzkeN) with Mk>0 Vk and MkToo then by the Lebesgue dominated convergence theorem we obtain 26 lsigiE[(p[%]/\ 114,]: E[(p[£((:))]f\ Mk] Vk. fly) fo(y) convergence theorem and (1.1) give mm 7.0) A = im f(y) A 11~E111101y11 “1 11911111111 M11 Moreover since (p[ )A M k 2 log(1— a) Vk , an application of the monotone Thusfor e>0 small 3 k-=-k(e): Zm A (1.3) E[0, with f: —-.supfg 561/; v—‘fi J“) a fit b \W_l 0 C:3 m S v 3 :C3 as 27 7,111) fem Set 155,. =(p[ JA Mj. The random variables {153.121} are i.i.d \7’j e{l,2,...m}. Moreover le’J —El’}'J| S M]. —log(1—a) and as we have shown in (1.3), Eljf, < 0. Applying Hoeffding’s inequality to the sum of centered, bounded random variables If, — EY’ we obtain 1.1 1 111%. es}s‘:&{iA.(y) Vy eEn CF, ix.(y)—A.(y) J—m. and in case that F; is continuous on E, sup yEE We now turn to the linear regression model (2.2.2). A density of I; with respect to u = v2 x rm is given by pg (y) = (F(t —Bz))6(F(u — Oz) — F(t - 92))7 (IT-(u -— 92))H—8 h(t, u 2M2), with I; = (9, F) , v2=counting measure on {0,1}®2 and rd Lebesgue measure on R”. Let j;(y) =8F(t - 62) + y[F(u — 62) — F(t — 92)] + (l — y - 5)F(u — 92), with 7;, = T—GoZ , Uo = U —60Z . Denote by V, J the joint distributions of (73, U0) and 29 ('1', (1.2) respectively and by S, the support of V. THEOREM 2. 
(Consistency in the Linear Regression model) Suppose that: (i) O is a subset of R” with bounded closure (ii) 60 60, the interior of 6) (iii) V 9 at 90 Pr{9Z¢ GOZ} > 0 (iv) V (t,u) eSV, 05a0 Fo(y) Vy 6 En Co, under Po E P(6..Fo)’ where C F0 denotes the set of continuity points of IS and E = (a0, be). In the special case that If; is continuous on E, the above convergence is uniform with probability one, i.e sup FA; (y) - F0 (y)‘ —"‘-) 0. yeE Proof of Theorem 2. As in Theorem 1, we endow the parameter space with the product topology .727, x .7,. Continuity of §1—> f§(y) holds for u—a.e y, under .7 and the proof goes through as in Theorem 1. REMARK 3. In his thesis, Huang (1994) gives a proof of the consistency of M.L.E. with case 1 interval censoring, based on a uniform law of large numbers (generalized Glivenco-Cantelli) for V.C. subgraph, classes of functions. This nice proof is based on 30 the specific form of the model under consideration - as opposed to the proof we gave earlier-. Moreover it introduces some useful technics - largely due to the powerfirl results from the theory of empirical processes. This proof can also work in case 2 interval censoring, with some modifications. We present it here for our linear regression model. Second Proof of Theorem 2. Using the same argument as in Theorem 1, we will treat (5 x 771, 7) as a compact topological space with .7 the product topology. Let Q={w =nsNz y]. =(8j,yj,tj,uj,zj)}, 01 the Borel c-algebra on Q, R, a P:o the true underlined distribution, (0,8,1?) the corresponding probability space. Let P" be the empirical distribution based on i.i.d. observations Yl , Y2,...,lf,. From the strong law of large numbers (S.L.L.N) and a separability argument 11;”{oaz P.(y;w)—,> m1} =1 Vy = (5,7.t,u,2)- Now fix an a) from the above set and denote Q, E Q" (co) , 1:105 [ix-,co). For every subsequence (n’) c (n), 3 (n") c (n') and a (9.,Fl) O x 771 , such that (2.0) 6,. —» e. and F —"+ F.. We want to show that 9. a 00 and F2 E Fo. 
Define a" =inf{t—énz : (t,u,z) eSJ}, b" = sup{u—é,,z : (t,u,z) eSJ} and let a. ,b. denote their counterparts when 9. replaces é" . At first we would like to prove 31 (2.1) Va,be(a.,b.) with a o, (F, — 1;)1lb “(u—é nz)—'“+o and (P, — 13,)1[b_5.](u-é nz)1W_](r -6 .2) Am. Bounded convergence theorem implies, 32 (2.7) P, y ilwot — énz)l[a”&'](t — énz) —"—) 1:, y 1”.”(21 — 9.2)1[a”m](t — 9.2) R) (1— Y —5)1[b.5.](u — énz)—,,) R) (1‘ 'Y _ 8)1[b,b,](u " 9'2) = EJFb(Uo)l[b‘b,](U') > O and 1:, 511m ](t-é,z) —.> F0 81[a"a.](t—9.z) = EJF5(73)1[0"0_](72) > 0. This is true by assumption (iv) of our theorem, the definition of a., b. and the fact that a. < a < b < b.. Notice the use of some extra notation, 7: E T—9.Z , U. E U -G.Z. The combination of (2.4) and (2.5) with (2.6) and (2.7) proves (2.1). Now let’s consider the following empirical processes B.‘ = I 1...](r 43.218 loam —é.z)dP. F. , P, a PO. The combination of G.G.C.T, (2.0), (2.1) and bounded convergence theorem gives BfiLBi j= 1,2,3. Although this convergence is straightforward for j = 1,3, it deserves a closer look when j=2. We need to check how the G.G.C.T. applies. Notice that if 1W](t -é,,z)l[afi](u —é,,z) = 0 eventually, then B: —L) B} E 0. However if 1w)“—énz)l[a»](u—énz)=l eventually, then B: as a sum over the “relevant set” of 33 observations J 9' n 3 involves only terms for which log[F,,(n?" ) — FLU)?" )] > -oo for some i >j , (REMARK 2.6.3 and 2.6.4). Therefore in this case, F,(b) > F,(a) since [a, b] contains at least one element from J?" , eventually. It follows that B: 5 mm. — M.)Il[,,,,](t ~é.z)1[,,](u -é.z) 7 d(P. — Foxy) + 11...,0 —é.z)1,.,,,,,(u—é.z) 1’ Iog[fi‘.(u—é.z) —F.(r —é.z)] dIMy), which implies that B: i) B} . A 3 . 3 , 3 . Since I,,(§0) 3 mg") 5 EB; Vn, EB; 49231, I,(§o)i>C, it follows j=l j=l H 3 that ZBJZC Va.O . The Fisher information matrix is defined as [(6) 441'ng . 
We say that 8 is regular in ('9 , if 0° is: (i) dominated with 1.1-densities pe (~) , (ii) Hellinger differentiable with derivative i9, (iii) the function e H p9 (x) is differentiable for u-a.e. x. Let’s fix a measure P0 60’ and let a elg(&)={h 617(1’0): Itho =0} be an arbitrary function. Without loss of generality we might assume that a is bounded. 36 Otherwise for some M>0 we can work with aM(x) = a(x)l{laI 5M} -J‘a(x)l{laI sM}dR) (x). For 9 one dimensional, consider the parametric family, ft) (x) = Po(x) eXp(90(x) - b(9)), with 12(0) = log” e""("’ciP0 (x)]-1-r(u|z) R()‘a(z)-o.(z) 37 , )_ E{Ze2°'ZR(Z)0.(2)| T = t,U = u} w( ,u — E{eze'ZR(Z)0,(Z)| T=t,U= u} A.,.(z)=R(z)(1+0.(z)) Q.(y)=e°"{60.(z)—vA..(z)} a(y)=e°"{iA.,.(z)—<1-a)}. THEOREM 1. Suppose that (i) v (t,u) es, o_<_ao 0: Pr{|Z|SM}=1. Then (a) The eflicient score function for e is i.i(y) = lz—l(9,F;];-) at n=0. It is givenby , QeeszIdF AeezfadF (A—1)e°z_[?1dF effiieur (1.1) lfa(y)=—5—F_7(t—)——+y F(t) — F(u) +(I—Y—O)—?(-J— [2dr EMF = " 9* ro— ‘ 92 it?) Moreover the assumptions of the theorem guarantee that both is, lfa are Po-square integrable. The efficient score function for 9 is defined by 39 (1-2) ie.(y)=ie(y)—ifa'(y) with l f a. being the projection of [9 onto the closure of the space spanned by {lfazaeLg(F)}. Let Vl(a.) = zA(t) + _ Then f” —UMm-iaode=n) [aw . M+aan(a M]+L(:flaw 2( )+aa.(M Notice that E(Q,’|T,U,z) = e2°zo,R, E(Q,Q,|T,U,Z) = —e2620(,R, E(Q22|T,U,Z) = emOUR. Thus 2dr“ -E(19(Y)-I,a.(Y))l a(Y): E H{£1T_T)E[e 262R(0T K(a.)—0,,Vz(a,))|7‘,u]}+ I: 262 13,2{F‘1(-—U—)E{eR0u(V2(m-) Wilma] Wewanttohave E(l9(y)—lfa.(y))lfa(y)=0 VaeLg(F). a(t,u)=E[e2°ZR(0iK(0o)-0..V2(ar))|T="U=“l Let r2(t,u) = E[e2°ZR0,,(V2 (a.)- V,(a.))IT = t,U = u] Set r,(t,u) = r2(t,u) = 0. It follows that Ia2dF Ia dF (1.3) giW-ga ;,( ) -f2 (u)-f.A(t) find}? Rd}? 
(1.4) g2 3%) - 'nr) =f2[A(t)—A(u)], with f,(t,u) = E{Ze2°ZRO,|T = t,U = u} g,(t,u) = E{e2°ZR0,|T = t,U = u} f2(t,u) = E{Ze2°ZROu|T = t,U = u} g2(t,u) = E{emR0u|T = t,U = 2,}, Solving (1.3) and (1.4) we obtain (1.5) [2am = —oA(r)F(t) and £21.dF = [(o — tp)A(t) — uA(u)]F(u). Now by assumption (ii) we can easily verify that LEAF! S MA(t)Iv—‘ (t) —) O as t —> 0. Moreover I "an15“ —) LEAF as r —> o . Thus a. 6 L307). Now we can use (1 . 1) and (1.5) to obtain from (1.2) the efficient score function for 9, 1.9.0!) = 1'2 (y) - i f “—0) =(2 -O : Pr{|Z|SM}=1. (iii)V(t,u)eS, O_<_a0<0t]@ =E2[[Z-£W;2U2)]@[f(7;)a +{/(U2)JI2 «70min, with T, = T-ez, U, = U-ez, A] aAj(7;,U,) Vj. which is positive definite, unless Z = E(Z|7;,U#) a.e. ti. 42 Proof of Theorem 2. We follow the pattern established in Theorem 1. Let #1. Arguing as in section 2, for a given (bounded) function a e 192(17), we can construct a regular parametric family it passing through f. Let 1t : {fi‘zln| = [(25 (y)V(t22) + Q.(y)Q2o)V(u2z)] j adF 43 14-82 + [elm/(24,2) + Q. (new) V(r,z)] 1 MR Easy computations show that E{Q.’IT,UJ}= Alma), E{Q§(Y)IT,U,Z}= Axum) and E{Q.A.2] —oo r T—9Z=r T-BZ-—-r = E1IadF|:A‘(r’S)E{V(T,Z)|U-GZ = 5}" A12(r,S)E{V(U,Z)'U_GZ = 3}] s T + E1JadF[A2(raS)E{V(U’Z)U_QZ=s U‘OZ:S —BZ=r} { IT—OZ=r}] _ A12(r,s)E V(TaZ) , with l the joint distribution of (T -—- OZ, U — 92). To satisfy the orthogonality condition 5(1}, — i,a.)i,a = 0 Va 6 13m, set the two brackets in the preceding equation equal to zero and solve the system of equations to obtain (2.2) jadF = — f (r)k(r,s) and I'mdF = — f (u)k(r, s). From (i) and (ii) we obtain I a.dF“ s Mf(r)—:;;—~>O. Since J” a.dF —-) I” a.dF as r —> 00, we conclude that a. e L3(F). Now from (2.1) and (2.2), the efficient score function for 9 is 44 1;; (y) = iety) — ya. 
(y) = —[z — k(t — oz,u — 9a)][Ql (y) f(t - 92) + Q2 (y) f(u — 92)] and the information at e, 1(6)=£{i;(Y)]2=EJ[[Z—E(217;,zt)]2[f2(m +f2(Ut)Az -2fl7;)f(l/;)An]] , with T, = T—ez, U, = 11—92, A] a rel/(mug) Vj. Notice that strict monotonicity of F and assumption (iii) imply that t# -+ 0(5) is a strictly decreasing function. Moreover mm + {my/17 —f(n)JBT}2=f2(I;)At ”We -2f(7;)f(Ua)Aa > 0, if and only if 0(a) > 0(u,,) for n < tn. This shows that the information for o is positive unless Z = E (Z / 7;,U,,) a.e. 1;. The theorem is now proved. REMARK 2. We want to emphasize here that both theorems that we have considered have immediate generalizations to situations where interval censoring is a result of an inspection process with finitely many observations-inspections. It will be a matter of future research the case of a random number of inspections as it is described in Wang et.al (1994). 4. Estimation of [(90): In this section we provide an estimator of [(90) in both models that we have analyzed in this thesis. This estimator provides at least an approximation of the lower bound for the variance of (3,, in the Neonatal Brain 45 Hemorrhage application that we consider in Chapter 6. Let (Esp/i") , (émFg) denote MLE’s of (9,A) and (9, F) in the two models respectively. We make use of the notation introduced in previous sections with obvious modifications when the MLE replaces the true parameter. To make sure that there is no confusion about the model to which we refer, we denote by if and if the estimates of the information in the Cox and the Linear regression model respectively. The major difficulty that we encounter here is the estimation of the conditional expectation that appears in the expression for I (60). The following special cases are particularly useful in the Cox model due to its special structure and the form of the information measure. No simple expression for [(90) is available in the linear regression model. 1) WW. 
Then E Zezez 0 292 Z-W(t,u)=Z— W{ 202 Rl,u(Z) 14(2)} Z—(p(t’u)=Z—EW{ZEOZ 0!(Z)}. EW{e Rl,u(Z)0u(Z)} EW {8 0, (2)} 22.8%)?» (mt, <2.) zzWo <2.) Let GIN. = 1:1" ‘ and (finJ = Fl" . _ 236.2119,“(Zj)0"‘U'(Zj) 2826an ".7,-(2]) j-l i=1 Now we can estimate 1C (00) by 2 (4.1) if = fiitz, -\II..,.-)2[RAMBO-11.0320] 0 (2.91%..(20 + i=1 fie,- —q‘>.,.-)2 [847.7120] 0",. (2,). i=1 46 WWW Let r,,.(Z)=e2°ZR,,.(Z)0.(Z) and A,(Z)=e2°20,(2). Then EW{A,(Z)|T,U}= AT(O)Pr{Z = 0| T,U}+ A,(1)Pr{z = 1| T,U} AO( O)_a___hs(T U) A (1)(l-a)fi(T,U) T h(T, U) T h(T,U) ’ _ (l-a)ht(T,U) EW{ZAT(Z)|T,U}—AT(1) MT’U) . Similarly __ ah0(T,U) (l-a)ht(T,U) Ew{rr.t/(Z)|T,U}- rum—WM Nana) MU) _ (1_a)h|(T’U) and EW{ZFT,U(Z)|T9U}_ FTJ/(l) h(T9U) with h,(t,u)=h(t,u|Z = i) i = 0,1. Now we can estimate (p,\u by (1— a)A (1)12 .(t u) aA M) ,..(t u)+(1- M. (012.. (t u) = (I-a)r.(1)h.,.(r,u) af.(0)h.,.(r,u>+<1-a>f.(1>h.,.(t,u) <3n(t,u)= and \l7,.(t,u) respectively, where hn,,(t,u)=hn (t,ulZ = i) i = O, is a kernel type estimator of h,(t,u)=h(t,u|Z = i) i = 0,1. By appropriate choice of kernel and bandwidth h can be ’n,i a consistent estimator of h, - see Silverman (1986). An estimate of [C(G) is given by (41) 118ng Wu’crfin ‘ 47 To estimate 1(9) in other cases than the previously noted, one has to estimate the conditional expectations that appear in the information for the models that we consider. In addition to that, the linear regression model requires an estimate of the density f . Kernel density estimation might be a solution here, leaving us the burden to choose the kernel and the appropriate bandwidth. In this case the estimator will have the form (4.2) f.(t) = ii K[’,;“]dfi.(u). To estimate the conditional expectations that appear in the definition of (‘1’,(p) and k in the Cox and the linear regression model, we proceed in two steps. 
If the conditional expectation has the form E(g9(T,U,Z,F)|1t9(T,U)) = g(ne(T,U)), for some known functions genre , then i) Approximate g9(T,U,Z,F) by gé.(T,U,Z,F:,) and 1136 by “é,' ii) Employ tools from nonparametric regression to estimate g in gé"(T,U,Z,Fn)= g(nén(T,U))+e , with E8 = 0. Call the resulting estimator gn(T,U). Now a meaningful estimator of 1(0) in the Cox model is given by (4.1), with filmcfin functions of §n(T,U). The corresponding estimator in the linear regression is n 2 A A A A A A A A A A A (4.3) I“: = %Z[Z,. -§.(T,,U,.>] [mam +f,3(U,-.t>A. —2f,.(T,,.)f,. O, Pr{Z= O}=1—y , Pr{Z = 1} = y. For I“,,u(Z) =ezezR,.u(Z) 0,,(Z) , A,(Z) =e262 0, (Z) as in the previous section, we obtain E. {Z F...(Z>} e2°R. .(1>0.(1)t \p(t,u) = = 29 ’ Ez{r.,.(2)} e R.,.(1)0.(1)Y+R,,..(0)0.(0)(1-y) and +(1—v)0,(0> ' We will also assume that the censoring distribution is discrete uniform on the lattice is = {(i, j): i < j, i, j e {1,2,3,4}}. Tables 5.1 and 5.2 contain the information measure for 0, for a few selected distributions under interval censoring and no censoring. In the latter case, Ritov and Wellner (1988) prove that I"(9) = E X {var(Z|X )} In the current setup, the information for 6 without any censoring is given by te°[F(x)]’° ye°[F]‘ +(1—t)F’(x) [a(9)=Ex{§(X)—E,2(X)} with €05): 49 Notice that if X ~ F , F a continuous distribution, then F (X ) ~ U (0,1) , thus making the information independent of F. This is indeed the case as the next two tables show. For interval censoring, we present the information measure in case that two inspections are available as well as in case of a single inspection (see Remark 3.1). In the following tables the numbers in parenthesis correspond to case 1 interval censoring. 
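The claim that the uncensored information does not depend on the baseline distribution can be checked numerically. The Python sketch below is only an illustration under stated assumptions: Z is binary with Pr{Z = 1} = gamma, the Cox form Pr{X > x | Z = z} equals the baseline survival raised to exp(theta z), and xi(x) is read as Pr{Z = 1 | X = x}. After the change of variable s = baseline survival at X, the expectation of xi(X) - xi(X)^2 becomes an integral over (0,1) that no longer involves F. The function name and the grid size are hypothetical choices.

```python
import math

def uncensored_information(theta, gamma, n_grid=200_000):
    """Midpoint-rule value of E[xi(X) - xi(X)^2], integrating over s = baseline survival at X.

    Assumptions (ours, for illustration): Z is binary with P(Z = 1) = gamma and
    P(X > x | Z = z) = S(x) ** exp(theta * z) for a continuous baseline survival S.
    """
    a = math.exp(theta)
    total = 0.0
    for i in range(n_grid):
        s = (i + 0.5) / n_grid            # midpoint of the i-th cell of (0, 1)
        d1 = gamma * a * s ** (a - 1.0)   # P(Z = 1) times the density of s given Z = 1
        d0 = 1.0 - gamma                  # P(Z = 0) times the density of s given Z = 0
        # xi * (1 - xi) * (marginal density of s) simplifies to d1 * d0 / (d1 + d0)
        total += d1 * d0 / (d1 + d0)
    return total / n_grid
```

At theta = 0 the integrand is constant and the function returns gamma(1 - gamma), that is .16 for gamma = .2 and .25 for gamma = .5, which agrees with the theta = 0 entries reported for no censoring (.160 and .250). For other values of theta the result is bounded above by gamma(1 - gamma), as the law of total variance requires.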
Table 5.1: Information measure in the Cox model ($\gamma = .2$)

Hazard                 θ:      -2        -1         0         1         2
exp(.5)         I(θ)         .037      .077      .117      .096      .031
                            (.0235)   (.052)    (.086)    (.076)    (.021)
U(0,5)          I(θ)         .028      .060      .101      .105      .061
                            (.0135)   (.032)    (.061)    (.079)    (.046)
f(x)=.08x,      I(θ)         .017      .039      .072      .091      .065
0≤x≤5                       (.006)    (.014)    (.030)    (.049)    (.049)
no censoring    I(θ)         .076      .125      .160      .131      .081

Table 5.2: Information measure in the Cox model ($\gamma = .5$)

Hazard                 θ:      -2        -1         0         1         2
exp(.5)         I(θ)         .077      .136      .183      .150      .052
                            (.050)    (.095)    (.134)    (.119)    (.036)
U(0,5)          I(θ)         .059      .110      .158      .152      .082
                            (.030)    (.060)    (.096)    (.110)    (.063)
f(x)=.08x,      I(θ)         .037      .073      .113      .126      .080
0≤x≤5                       (.013)    (.027)    (.046)    (.061)    (.059)
no censoring    I(θ)         .165      .223      .250      .187      .097

We have chosen the three distributions on the basis of severity in censoring. Clearly an exponential distribution for the variable of interest creates many left censored observations, while the distribution specified by the density f puts most of its mass towards the right endpoint of the interval [0,5], thus causing an excess of right censored observations.

Chapter 5
GENERALIZED M-ESTIMATION IN THE ACCELERATED FAILURE TIME MODEL

In Chapter 2 we defined maximum (profile) likelihood estimators for the Cox and the linear regression models and established sufficient and necessary conditions for step 1 in the profile likelihood algorithm to be well defined. Although in the Cox model the entire algorithm seems to be easily implemented, largely due to the smoothness of the function $\theta \mapsto e^{\theta z}$, the same result might not hold in linear regression (in step 2 we might not get a maximizer). The problem arises from the unpleasant situation of nonsmoothness in the function $\theta \mapsto \hat F_n(\cdot;\theta)$, with $\hat F_n(\cdot;\theta)$ the maximum likelihood estimator of $F_0$ for a fixed $\theta$. Any attempt to develop the asymptotic theory for this class of estimators will have to confront problems of this nature.
To avoid artificial assumptions which are often impossible to verify in practice and make the problem unnatural, we turn to a smaller class of M-estimators which hopefully provides asymptotic theory under reasonable assumptions on the underlying model. Our efforts follow closely the theory of Asymptotic Generalized M-Estimators (AGME) of Bickel, Klaassen, Ritov, Wellner (1993), hereafter abbreviated to BKRW. In their master theorem 7.3.1, page 312, these authors establish the conditions for an AGM estimator to be asymptotically normal. However their requirement for a consistent AGM estimator is extremely difficult to verify, especially in models where a closed form of such an estimator is not available or very hard to obtain. We manage to obtain the result of their theorem by relaxing the consistency assumption at the expense of more smoothness in the objective function. To verify the rest of the conditions in their theorem, we exploit the modern arsenal in the theory of empirical processes. Sufficient conditions for convergence of stochastic processes and results of the form $\sup\{\sqrt{n}(P_n - P)f : f \in \mathcal{F}_n \subset \mathcal{F}\} = o_p(1)$ are given in Pollard (1989) for certain classes of functions $\mathcal{F}$. In the appendix of this thesis we put together the necessary machinery that will enable us to verify the assumptions of our modified version of Theorem 7.3.1 of BKRW. Most of this work will be the subject of future research.

1. The master theorem: Let $\mathcal{P}$ be a model and $\mathcal{P} \subset \mathcal{M}_0$, with $\mathcal{M}_0$ containing all measures with finite support. Let $W_n, W: \mathbb{R}^m \times \mathcal{M}_0 \to \mathbb{R}^m$ and $\nu: \mathcal{P} \to \mathbb{R}^m$. Suppose $W(\nu(P), P) = 0$ $\forall P \in \mathcal{P}$ and $W_n(\nu, P_0) = W(\nu, P_0) + o(1)$. Introduce the notation $W_n(\nu) \equiv W_n(\nu, P_n)$, with $P_n$ denoting the empirical distribution. We say that $\hat\nu_n$ is a generalized M-estimator (GME) of $\nu(P)$ if $W_n(\hat\nu_n) = 0$, and an asymptotically generalized M-estimator (AGME) if $W_n(\hat\nu_n) = o_p(n^{-1/2})$. Let $V_n(\nu) \equiv \sqrt{n}\,(W_n(\nu) - W(\nu, P_0))$, $\nu_0$
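$\equiv \nu(P_0)$.

(GM0) $\sup\left\{ \dfrac{|V_n(\nu) - V_n(\nu_0)|}{1 + \sqrt{n}\,|\nu - \nu_0|} : |\nu - \nu_0| \le \varepsilon_n \right\} = o_p(1)$ $\forall\, \varepsilon_n \downarrow 0$.

(GM0') $\forall\, M < \infty$: $\sup\{ |V_n(\nu) - V_n(\nu_0)| : |\nu - \nu_0| \le M n^{-1/2} \} = o_p(1)$.

(GM1) $\exists\, \nu: \mathcal{M}_0 \to \mathbb{R}^m$ such that $W(\nu(P), P) = 0$ $\forall P \in \mathcal{M}_0$.

(GM2) $\exists\, \psi: \mathcal{X} \times \mathcal{M}_0 \to \mathbb{R}^m$, $\psi(\cdot, P_0) \in \{L_2^0(P_0)\}^m$, such that $W_n(\nu_0) = n^{-1}\sum_{i=1}^{n} \psi(X_i, P_0) + o_p(n^{-1/2})$.

(GM3) $W(\cdot, P_0)$ is differentiable at $\nu_0$ and $\dot W(P_0) \equiv \dot W(\nu_0, P_0)$ is nonsingular.

THEOREM (BKRW-1993). Suppose $P_0 \in \mathcal{P}$. Let $\hat\nu_n$ be an AGM estimator of $\nu_0$. If $\hat\nu_n$ is consistent and (GM0)-(GM3) hold, or $\hat\nu_n$ is $\sqrt{n}$-consistent and (GM0')-(GM3) hold, then
$$\sqrt{n}\,(\hat\nu_n - \nu_0) = -\dot W^{-1}(P_0)\, \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(X_i, P_0) + o_p(1).$$

2. Generalized M-estimation under interval censoring: From our computations in Theorem 4.3.2, the efficient score function for $\theta$ is
$$(2.0)\qquad \ell^*_\theta(y) = -[z - k(t_\#, u_\#)]\,[Q_1(y) f(t_\#) + Q_2(y) f(u_\#)],$$
with $a_\# \equiv a - \theta z$ $\forall a$. We will derive a generalized estimator of $\theta$ by appropriately modifying (2.0). Notice (although not exhibited) that $Q_1, Q_2$ depend on both $\theta$ and $F$. We replace $f$, $k$ in (2.0) by known functions $\tilde f$, $\tilde k$. In addition we replace $F(\cdot - \theta z)$ by its maximum likelihood estimator $\hat F_n(\cdot - \theta z; \theta)$. Denote the corresponding generalized score function by $\tilde\ell_n(y; \theta)$ and let $W_n(\theta) \equiv P_n \tilde\ell_n(\theta; Y)$, $W_n(\theta, P) \equiv P \tilde\ell_n(\theta; Y)$, with the functional representation of the integral $Pf = \int f\,dP$ used everywhere in this chapter. Finally let $X - \theta Z \sim F(\cdot; \theta)$ $\forall \theta \in \Theta$. From the consistency property of the NPMLE of $F_0$ (Groeneboom and Wellner (1992)), we obtain $\hat F_n(\cdot - \theta z; \theta) \xrightarrow{a.s.} F(\cdot - \theta z; \theta)$ with $F(\cdot - \theta_0 z; \theta_0) \equiv F_0(\cdot - \theta_0 z)$. It follows that
$$(2.1)\qquad \tilde\ell_n(y; \theta) \xrightarrow{a.s.} \tilde\ell(y; \theta) \equiv -[z - \tilde k(t_\#, u_\#)] \left\{ \left[ \frac{\delta}{F(t_\#; \theta)} - \frac{\gamma}{F(u_\#; \theta) - F(t_\#; \theta)} \right] \tilde f(t_\#) + \left[ \frac{\gamma}{F(u_\#; \theta) - F(t_\#; \theta)} - \frac{1 - \delta - \gamma}{\bar F(u_\#; \theta)} \right] \tilde f(u_\#) \right\}.$$
Choosing carefully $\tilde f$, $\tilde k$, we will prove that
$$(2.2)\qquad P_0 \tilde\ell_n(\theta_0; Y) \to P_0 \tilde\ell(\theta_0; Y)$$
and
$$(2.3)\qquad W_n(\theta_0) \equiv (P_n - P_0)\tilde\ell_n(\theta_0; Y) + P_0 \tilde\ell_n(\theta_0; Y) \xrightarrow{a.s.} P_0 \tilde\ell(\theta_0; Y) \equiv W(\theta_0, P_0).$$
Moreover
$$W(\theta, P_0) = P_0 \tilde\ell(\theta; Y) = -\int [z - \tilde k(t_\#, u_\#)] \left\{ \left[ \frac{F_0(t - \theta_0 z)}{F(t_\#; \theta)} - \frac{F_0(u - \theta_0 z) - F_0(t - \theta_0 z)}{F(u_\#; \theta) - F(t_\#; \theta)} \right] \tilde f(t_\#) + \left[ \frac{F_0(u - \theta_0 z) - F_0(t - \theta_0 z)}{F(u_\#; \theta) - F(t_\#; \theta)} - \frac{\bar F_0(u - \theta_0 z)}{\bar F(u_\#; \theta)} \right] \tilde f(u_\#) \right\} dJ(t, u, z),$$
from which it clearly follows that
$$(2.4)\qquad W(\theta_0; P_0) = 0.$$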
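The distinction drawn in Section 1 between an exact GME and an AGME can be made concrete with a toy Python sketch. It is unrelated to the interval-censoring score above: it uses the hypothetical estimating function psi(x, v) = 1{x > v} - 1/2, whose population root is the median. For odd n the empirical function W_n only takes values of the form k/n - 1/2, so it has no exact root, while the sample median still gives |W_n| = 1/(2n), which is of smaller order than n^(-1/2). All names and data are illustrative.

```python
def w_n(v, xs):
    """Empirical estimating function W_n(v) = P_n[ 1{x > v} - 1/2 ]."""
    return sum(1.0 if x > v else 0.0 for x in xs) / len(xs) - 0.5

def sample_median(xs):
    """Middle order statistic; an odd sample size is assumed."""
    return sorted(xs)[len(xs) // 2]

# Hypothetical sample of odd size n = 5.
xs = [0.3, 2.1, -1.4, 0.9, 1.7]
v_hat = sample_median(xs)
residual = w_n(v_hat, xs)   # equals -1/(2n): small, but not exactly zero
```

Because W_n is a step function here, asking for W_n = 0 exactly would leave the estimator undefined, whereas the asymptotic requirement is met by the median.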
The following theorem is our modification of the master theorem presented earlier. It shows that we can relax the consistency assumption and still obtain the same asymptotic result, at the expense of more smoothness in the limit of the objective function. Consider the conditions

(C1) $\sup\{ |V_n(\theta) - V_n(\theta_0)| : |\theta - \theta_0| \le n^{-\alpha} \} = o_p(1)$, for some $\alpha > 0$.

(C2) $\exists\, \psi: \mathbb{R}_+^2 \times \mathbb{R}^m \times \{0,1\}^2 \times \Theta \to \mathbb{R}^m$, $\psi \in \{L_2^0(P_0)\}^m$, such that $W_n(\theta_0) = P_n \psi(Y, \theta_0) + o_p(n^{-1/2})$.

(C3) $\theta \mapsto W(\theta, P_0)$ is differentiable at $\theta_0$ with $\dot W(\theta_0)$ nonsingular. Moreover $W(\theta, P_0) = W(\theta_0, P_0) + \dot W(\theta_0)(\theta - \theta_0) + |\theta - \theta_0|^{\gamma}$ with $\gamma\alpha > 1/2$.

(C4) $W(\theta_0, P_0) = 0$.

THEOREM 1. Let $\theta_0 \in \Theta$, $V_n(\theta) \equiv \sqrt{n}\,[W_n(\theta) - W(\theta, P_0)]$, $\theta \in \Theta$. Suppose that (C1)-(C4) hold. Then
(i) For large $n$, $\exists\, \tilde\theta_n$: $|\tilde\theta_n - \theta_0| \le n^{-\alpha}$ with $\gamma\alpha > 1/2$, such that $W_n(\tilde\theta_n) = o_p(n^{-1/2})$ almost surely $P_0$.
(ii) $\dot W(\theta_0)\sqrt{n}\,(\tilde\theta_n - \theta_0) = -\sqrt{n}\, P_n \psi(Y, \theta_0) + o_p(1)$, from which it follows that $\sqrt{n}\,(\tilde\theta_n - \theta_0) \xrightarrow{d} N(0, \Sigma_0)$ with $\Sigma_0 = \dot W(\theta_0)^{-1} [E_0 \psi\psi^T] [\dot W(\theta_0)^{-1}]^T$.

REMARK 1. As we have mentioned earlier, the verification of the conditions of Theorem 1 in our interval censoring problem will be the subject of future work. However, in Theorem 2 we prove a uniform law of large numbers, tailored for the needs of our linear regression model, see display (2.3). This result is used in the proof of Theorem 1.

REMARK 2. Notice the connection between the modulus of continuity in the objective function (C1) and the smoothness of the model (C3). If $\theta \mapsto W(\theta, P_0)$ is twice differentiable and (C3) holds with $\gamma = 2$, then (C1) needs to be established for $\alpha > 1/4$. In this case, it might be appropriate to follow the approach of Manski (1975), (1985), seeking “maximum scored” estimators of $\theta_0$. Huang (1993) proved that the “maximum scored estimator” of $\theta_0$ with case 1 interval censored data is $n^{1/3}$-consistent. If we manage to establish such a result with case 2 interval censoring, then all that is left to do in the present context is to solve $W_n(\theta) = 0$ in a $|\theta - \hat\theta_n| = O_p(n^{-1/3})$ neighborhood of $\hat\theta_n$.

Proof of Theorem 1. Let $\theta \in B_n(\alpha) = \{\theta : |\theta - \theta_0| \le n^{-\alpha}\}$.
In view of (C4) ,write (C1) as W (9) = W") + W..(90) + 0601"”) (3’ W,,(00) + W(0,)(0 —0,) + |0 — 0,|’ + spot-"2) =W,,(e,)+ W(0,)(0 —90)+op(n'"2) since ya >1/2. Now from (2.3), (2.4) and the definition of B" (a) , (i) follows. To obtain (ii) write W,,('0',,) =W,,(0,) + W(0,)(0', —0,) + op(n'”2), from which it follows that 0,, — 0, = —[W(90)]" W,,(e,) + op(n'”2). In view of (C2), the theorem is now proved. The following lemma is used in the proof of Theorem 2. Recall that H, J, V denote the joint distributions of (T, U), (T, U,Z), (T—GOZ,U —00Z) respectively, while [ambo] indicates the support of F3. 57 LEMMA 1. Suppose that (i) F, is strictly increasing. (ii) J is a continuous distribution, 0, U) and Z have densities with respect to Lebesgue measure. (iii) (T10 and Z are bounded. Moreover V (1:,p) e supportflO E S, 00 : H{(t,u): u2t+§} = 1. Then there exists a neighborhood N ( 0) of 00 and an M >0 such that l l _ S M efinlfilfiO-Gzzfim) + 15;,(u—0z;0;00)—F,,(t—02;0;c0)+ F,,(u—02;0;00)] for n large enough and (t) e B with Po°°(B) = l . Proof of Lemma 1. According to our definition in chapter 2 and for 0 e (9 fixed, F,(-,0) = argmaxl,(0,F). F67 Let a e(0,l). For simplicity we denote by F,() E F,,(-,0). Then 1,,[(1— e)F+eF,]— 1,,( )<0 vtt. It follows that l:rr)1-2{l,,[(—l 8)F+8F]- l,,( (F)}<0 Vn. If we evaluate this limit explicitly we obtain (2. 5) [[8 BEL—92+) F3 n} and some n > 0. — l—y—B 2.9 F b -._——dP _<_ l . ( ) 0( )I E, (u _ 92) n From the strong law of large ntunbers we obtain the following weak convergence result P(-;0))—"'—)P0() for (1) EB :R,”(B)= 1. n We now claim that 3 constant M2 > 0 such that for n large enough and co 6 B , l (2.10) F,(u—02;0))— F,(t-02;co) S M2 V(t,u,z) ES, : {t—02,u—02} e[a,b]. If this is not true, then for (t. ,u.,z.) ES, , (t. -— 02. ,u. — 02.) e [a,b]®2 and with positive probability QJD 1 >AJ VM>0. F,,(u. -02. ;c0)— F,,(t. —02. ;(0) Let A = {(x, y): a S t. - 02 S x < y S u. — 02 S b}. 
Note that by the monotonicity of F,(-;o3), (2.11) holds for every point in A. Moreover A has positive Lebesgue measure. Therefore 11J'F( ,,(—u —02;a)) F—,,(t— —02 ;0))dP"(.;w) > T1M jrdPJ-W) AnnA = nMP,(A,,nAn{t 1 for large n, M. Since this violates (2.8), we conclude that (2.10) holds. Similarly we can prove that l (2.12) E,(t—02;0))SM' V(t,u,z)eSJ :t—026[a,b] and (2-13) E(u-192;03)S M3 V(t,u,z)eS_, : u—02 e[a,b] with (0 EB :P0°°(B)=1. The result now follows from (2.10), (2.12) and (2.13) for M = M1 + M2 + M,. We conclude this section with the theorem that establishes the uniform strong law of large numbers for our empirical process W,,(0) a P,,l:(Y ,0) . THEOREM 2. In addition to the assumptions in Lemma 1 , assume that (i) I? is a bounded function on [a,,b,]®2 . (ii) 7 is Lipschitz, i. e. lf(x)—f(y)| S mlx —y| Vx,y. Then limW,,(0,) = W(Oo) as P0”. Proof of Theorem 2. Recall the definitions of (2.14) "2(9) 5 P.Z.'(Y,9) =(P. - P We; Y) + Pile; Y) 9 66>- and a(y;e)a—[z—t(t.,u.)]t[ 5 - Y ]70.) 17,059) E10459)" 120169) 60 +[ Y _l—8—Y].7(u#) }- E(ut;9)-Fn(tt;9) E(urfi) Let r,, be a sequence of positive numbers such that r,, i 0. Write sup | P,T(0,,Y). Now from (2.15) and (2.16) we obtain W,,(0,) —) W(0,, 1,) = 0 as p,“ . The theorem is now proved. Chapter 6 SIMULATIONS We have seen in chapter 5 that the profile likelihood approach doesn’t always work in the AFT model as nicely as it does in the PH model. The source of most problems with this model is the frequent lack of smoothness in the function 0 l—) F,,(-,0). In fact, very little one can say about this function and its properties. Nevertheless, smoothness is essential in obtaining the maximizer in step 2 of the profile likelihood algorithm. To shed some light in the behavior of 0 1—-) F,(-,0) , a limited simulation study is conducted to examine the behavior of the profile likelihood as a function of 0 . 
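A small Python sketch may help to see why the map from theta to the NPMLE cannot be smooth. For simplicity it uses case 1 (current status) data rather than the case 2 data studied in this thesis, because there the NPMLE for a fixed theta is simply the isotonic regression of the indicators delta_i on the ordered residuals t_i - theta z_i, which pool-adjacent-violators computes directly. The data, the function names and the assumption of no tied residuals are hypothetical. The estimate depends on theta only through the ordering of the residuals, so the profile log-likelihood is piecewise constant in theta.

```python
import math

def pava(y):
    """Nondecreasing least-squares fit of y by pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the current block mean
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def profile_loglik(theta, t, z, delta):
    """Current-status profile log-likelihood at theta (case 1 stand-in, no ties assumed)."""
    order = sorted(range(len(t)), key=lambda i: t[i] - theta * z[i])
    F = pava([delta[i] for i in order])
    ll = 0.0
    for Fi, i in zip(F, order):
        p = Fi if delta[i] == 1 else 1.0 - Fi
        if p > 0.0:   # guard only; the fitted value makes p positive for each observation
            ll += math.log(p)
    return ll
```

For example, with t = [1, 2, 3, 4], z = [0, 1, 0, 1] and delta = [0, 1, 0, 1], the ordering of the residuals is the same for every theta in [0, 1), so the profile log-likelihood is constant there and jumps once theta passes 1.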
We work with the model $X = \theta Z + \varepsilon$, where Z is a single covariate having a Bernoulli(.5) distribution and $\varepsilon$ is a normally distributed random variable. We present the results of the simulation for sample sizes of n=100 and n=1000 observations. Nonnegative, independent random variables $T_1$, $T_2$ are generated from preselected distributions and the censoring pair (inspection times) is constructed according to $T \equiv T_1$, $U \equiv T_1 + T_2$. In an interval centered around the true $\theta$, we take a grid of points and for every such point we compute the MLE $\hat F_n(\cdot, \theta_i)$, $i = 1, 2, \dots, k$, with k indicating the cardinality of the grid. We then record the value of the profile likelihood function $\pi(\theta_i) = l(\theta_i, \hat F_n(\cdot, \theta_i))$ and plot the pairs $(\theta_i, \pi(\theta_i))$. For $\theta_* \equiv \arg\max_{1 \le i \le k} \pi(\theta_i)$, we set $\hat\theta_n \equiv \theta_*$ and $\hat F_n(\cdot) \equiv \hat F_n(\cdot, \theta_*)$.

Table 6.1: Profile Likelihood, X=2Z+N(0,1), T1 ~ Exp(.5), T2 ~ Exp(.5)

        n=100                    n=1000
    θ        π(θ)            θ        π(θ)
  -1.0     -58.54           1.0     -475.9
  -0.5     -51.66           1.2     -460.5
   0       -44.15           1.4     -445.1
   0.5     -33.19           1.6     -433.7
   1       -26.7            1.8     -426.8
   1.5     -25.16           2.0     -424.4
   2       -25.6            2.2     -428.2
   2.5     -28.7            2.4     -434.7
   3       -29.4            2.6     -441.4
   3.5     -29.5            2.8     -448.7
   4.0     -29.5            3.0     -457.6
   4.5     -29.6
   5       -29.8

The proportion of censored observations in the two samples by type of censoring is (.60,.30,.10) and (.60,.25,.15) respectively, for left, interval, right censoring. The $\pi$ function is maximized for $\theta_* = 1.5$ when n=100 and for $\theta_* = 2$ when n=1000.

Table 6.2: Profile Likelihood, X=.5Z+N(4,1), T1 ~ Unif[3,5], T2 ~ Unif[0,1]

        n=100                    n=1000
    θ        π(θ)            θ        π(θ)
  -1.0     -93.18           0       -860.0
  -0.8     -90.88           0.1     -854.0
  -0.6     -88.31           0.2     -850.2
  -0.4     -86.60           0.3     -845.0
  -0.2     -87.0            0.4     -840.2
   0       -83.29           0.5     -839.8
   0.2     -79.45           0.6     -837.6
   0.4     -80.61           0.7     -841.5
   0.6     -78.34           0.8     -840.6
   0.8     -77.59           0.9     -845.5
   1.0     -79.71           1.0     -850.4

The proportion of (left, interval, right) censoring is (.40,.18,.42) and (.38,.18,.44) respectively, in the two groups. With 50% of the data essentially censored (to the right), convergence of the profile estimator is much slower. With n=100, we get $\theta_*$
= .8, while increasing the sample size to n=1000, we only obtain $\theta_* = .6$. The $\pi$ function is obviously “less” smooth here. In the next figures we plot the $\pi$ function and the maximum (profile) likelihood estimate of the error distribution.

Figure 6.1a: Profile Likelihood: X=2Z+N(0,1), n=100, T1 ~ Exp(.5), T2 ~ Exp(.5) (log-likelihood against theta)
Figure 6.1b: Profile Likelihood: X=2Z+N(0,1), n=1000, T1 ~ Exp(.5), T2 ~ Exp(.5) (log-likelihood against theta)
Figure 6.2a: Profile Likelihood: X=.5Z+N(4,1), n=100, T1 ~ U[3,5], T2 ~ U[0,1] (log-likelihood against theta)
Figure 6.2b: Profile Likelihood: X=.5Z+N(4,1), n=1000, T1 ~ U[3,5], T2 ~ U[0,1] (log-likelihood against theta)
Figure 6.3a: M.L.E. of error distribution: X=2Z+N(0,1)
Figure 6.3b: M.L.E. of error distribution: X=.5Z+N(4,1)

APPENDIX

1. Concavity of the function $\Lambda \mapsto l_n(\theta; \Lambda)$: In chapter 2 we have used without proof the concavity of the log-likelihood function with respect to $\Lambda$ for every fixed $\theta \in \Theta$. In this section we give a short proof of this claim. Let $a > 0$ and consider the function
$$w_a(t,u) = \delta \log(1 - e^{-ta}) + \gamma \log(e^{-ta} - e^{-ua}) - (1 - \gamma - \delta)\,a u.$$
Then
$$\frac{\partial}{\partial t} w_a(t,u) = \frac{\delta a e^{-ta}}{1 - e^{-ta}} - \frac{\gamma a e^{-ta}}{e^{-ta} - e^{-ua}}, \qquad \frac{\partial}{\partial u} w_a(t,u) = \frac{\gamma a e^{-ua}}{e^{-ta} - e^{-ua}} - (1 - \gamma - \delta)\,a.$$
Set
$$W_{tt}(\delta,\gamma) \equiv \frac{\delta e^{-ta}}{(1 - e^{-ta})^2} + \frac{\gamma e^{-(t+u)a}}{(e^{-ta} - e^{-ua})^2}, \qquad W_{uu}(\delta,\gamma) \equiv \frac{\gamma e^{-(t+u)a}}{(e^{-ta} - e^{-ua})^2}, \qquad W_{tu}(\delta,\gamma) \equiv -\frac{\gamma e^{-a(t+u)}}{(e^{-ta} - e^{-ua})^2}.$$
Then we can write
$$\frac{\partial^2}{\partial t^2} w_a(t,u) = -a^2 W_{tt}(\delta,\gamma), \qquad \frac{\partial^2}{\partial u^2} w_a(t,u) = -a^2 W_{uu}(\delta,\gamma), \qquad \frac{\partial^2}{\partial u\,\partial t} w_a(t,u) = -a^2 W_{tu}(\delta,\gamma).$$
Now the matrix of second derivatives is
$$\ddot w_a(t,u) = -a^2 W(\delta,\gamma), \qquad W(\delta,\gamma) \equiv \begin{pmatrix} W_{tt}(\delta,\gamma) & W_{tu}(\delta,\gamma) \\ W_{tu}(\delta,\gamma) & W_{uu}(\delta,\gamma) \end{pmatrix}.$$
It is easy to verify that $W(\delta,\gamma)$ is non-negative definite. Thus $w_a(t,u)$ is concave as claimed.

2. Alternative characterization of MLE: We give here an alternative characterization of the maximum likelihood estimator in both models that we considered. It is based on a geometric interpretation of the NPMLE, as the left derivative of the greatest convex minorant of a cumulative sum diagram.
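The geometric characterization can be sketched in a few lines of Python. The function below takes the points of a cumulative sum diagram, starting at the origin with strictly increasing abscissae, builds the greatest convex minorant as a lower convex hull, and returns its left derivative at every point after the origin. It is generic: how the diagram is built for the Cox and the linear regression models is model specific and is not reproduced here, and the function name is an illustrative choice.

```python
def gcm_left_derivative(points):
    """Left derivative of the greatest convex minorant of a cumulative sum diagram.

    points: list of (x, y) pairs with points[0] == (0, 0) and strictly increasing x.
    Returns one slope for each point after the first.
    """
    hull = []  # vertices of the lower convex hull, i.e. of the minorant
    for p in points:
        # drop the last vertex while it lies on or above the chord from hull[-2] to p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p[0] - x1) >= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    slopes = []
    j = 1
    for x, _ in points[1:]:
        # locate the hull segment (hull[j-1], hull[j]] that contains x
        while hull[j][0] < x:
            j += 1
        (x1, y1), (x2, y2) = hull[j - 1], hull[j]
        slopes.append((y2 - y1) / (x2 - x1))
    return slopes
```

For instance, the diagram of cumulative sums of the indicators 0, 1, 0, 1 with unit weights, namely (0,0), (1,0), (2,1), (3,1), (4,2), yields the slopes 0, .5, .5, 1, which is the isotonic regression of the indicators.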
The idea is given in Robertson, Wright and Dykstra (1988) and was first implemented in the interval censored problems by Groeneboom, see Groeneboom and Wellner (1992). The process W associated with the first derivative of the log-likelihood function with respect to A in the Cox model is 5.- Yi qrMT’W WA,q(t) = Z (1_e-A(T.)¢"" _ e_/t(7;)¢8 _e—A((/,)e<fi ) e + i: T,St ll 1‘71 '61 q,—A((l,)e‘” Z 'Alrl‘v’fi -A(U-)eq" _ _,\((/ )erh e - e ' -e ' e l 1': (1,5! Let d G,,(t) = —EWM,(t) and dVA,(t) = dWA, (t) + A(r)dG,,,, (t) . The following proposition is adapted from Groeneboom and Wellner (1992). It provides an alternative characterization of the MLE of A , thus giving an equivalent statement to the one presented in our Theorem 2.6.1. PROPOSITION 1. For fixed 0 66), let q, =0'z, ,ie{l,2,...,n}. Suppose that 8,, =1 and 5(m)=1(m) :0. Then X,(-,0) is the NPMLE of A, ifand onlyif K,(.,0) 69 is the lefl derivative of the convex minorant of the “cumulative sum diagram consisting of the points PM = (Gx,(..e).a(n(j))’ VK.(-.6).q(n(1))) where R, =(0,0) and ”(1) eJn, j=1,2,...,m. A similar proposition can be formulated to characterize the MLE in the linear regression model. It can be used as an equivalent statement to Theorem 2.6.2. The associated W process is given by n 8? .- WM’) = livellrm) _ F(U.°)Y- F(Yfll , 7? _1-5? -7? W Foil-Fm we?) ' — + 3 3. Some results from empirical process theory: In this section we summarize results that can be used to prove uniform central limit theorems and laws for large numbers. We see the need for such machinery in our chapter 5, in our effort to verify the hypotheses of the master theorem of BKRW (1994). Most of them can be found in Dudley (1984), (1987) and Pollard (1984), (1989). Let (S,d) be a metric space, B c S, e > 0 and .7 c [(P) , a family of functions for some r>0. Denote by In N (e,B,d) , ln D(c,B,d) , In N B (8,3 ,d) , the e—entropy, e-capacity and a— bracket entropy of B respectively. 
The following proposition is a simple consequence of the definition of MD. PROPOSITION 1: For every 8 > 0 and Vset B in a metric space (S ,d), 70 D(2e,B,d) s N(e,B,d) s D(e, B,d). We now define the concept of a manageable class of functions . Pollard (1989) introduced these classes and obtained results that go beyond the Vapnik-Cervonenkis (1971) theory of VC classes of sets, thus extending the availability of central limit theorems to estimators that depend on larger classes of functions. DEFINITION 1: Let .7 be a class of functions with an envelope F, that is I f |SF Vf 6.7 and let [HIM indicate the L,(Q) norm.. We say that .7 is manageable for the envelope F if there exists a decreasing function F() for which (i) E(logl"(x))mdxoo. 31") Then 2 —)O as n——)oo. Esura hf 50') The concept of a manageable class comes close to Dudley’s definition of functional Donsker classes, see Dudley (1987). Although a manageable class for a constant envelope is a functional Donsker class, not all Donsker classes are manageable. Let .7 be a class of uniformly bounded functions on a probability space (1,111, P). Set v,(f)th(P,—P)f f6.7. DEFINITION 2. The class of functions .7 is said to be afimctional Donsker class if and only if (i) .7 is totally bounded for the sup-norm. (ii) 3 8 > 0 such that sup Iv, (f) — v, (g)| = 0,,(1) , modulo measurability constraints. If-gldi BIBLIOGRAPHY BIBLIOGRAPHY Aalen, 0.0. (1978b). Nonparametric inference for a family of counting processes. Ann. Statist. 6, 701-26. Andersen, P. K., Borgan, 0., Gill, R.D., Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York. Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. Ann. Statist. 10, 1100-1120. Bickel, P. J ., Klaassen, C. A. J ., Ritov, Y., Wellner, J. A. (1993). Efi‘icient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore. Buckley, J ., and James, I. R. (1979). 
Linear regression with censored data. Biometrika 66, 429-436.
Cox, D. R. (1972). Regression models and life-tables. J. R. Statist. Soc. B, 34, 187-220.
Leblanc, M. and Crowley, J. (1995). Semiparametric regression functionals. JASA, 429, 95-105.
Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII-1982. Lecture Notes in Math. 1097, 2-142. Springer, New York.
Dudley, R. M. (1987). Universal Donsker classes and metric entropy. Ann. Probab. 15, 1306-1326.
Finkelstein, D. M. and Wolfe, R. A. (1985). A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 41, 933-945.
Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics 42, 845-854.
Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
Green, P. and Yandell, B. (1985). Semiparametric generalized linear models. Proceedings 2nd International GLIM Conference, Lecture Notes in Statistics, 32, Springer-Verlag, Berlin.
Heckman, N. E. (1986). Spline smoothing in a partially linear model. J. Roy. Statist. Soc. B, 48, 244-248.
Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions. Prob. Theory and Related Fields, 81, 79-109.
Groeneboom, P. and Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar Band 19, Birkhäuser, Basel.
Hoel, D. G. and Walburg, H. E. (1972). Statistical analysis of survival experiments. J. Nat. Canc. Inst., 49, 361-372.
Huang, J. (1993). Maximum scored likelihood estimation of a linear regression model with interval censored data. Tech. Report No. 253, Dept. of Statist., Univ. of Washington.
Huang, J. and Wellner, J. A. (1993). Regression models with interval censoring. Proceedings of the Kolmogorov Seminar, Euler Mathematics Institute, St. Petersburg, Russia.
Koul, H. L., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with randomly right censored data. Ann. Statist. 9, 1276-1288.
Leiderman, P. H., Babu, D., Kagia, J., Kramer, H. C., Leiderman, G. P. (1973). African infant precocity and some social influence during the first year. Nature, 242, 247-249.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. J. Econometrics, 3, 205-228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. J. Econometrics, 27, 313-333.
Paneth, N., Pinto-Martin, J., Gardiner, J., Wallenstein, S., Katsikiotis, V., Hegyi, T., Hiatt, M., Susser, M. (1993). Incidence and timing of Germinal Matrix/Intraventricular hemorrhage in low birth weight infants. Am. J. Epidemiol. 137, 1167-1176.
Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures. J. R. Statist. Soc. A, 135, 185-206.
Pfanzagl, J. (1988). Consistency of maximum likelihood estimators for certain nonparametric families, in particular: Mixtures. J. Statist. Plann. Inference, 19, 137-158.
Pinto-Martin, J., Paneth, N., Witomski, T., Stein, I., Schonfeld, S., Rosenfeld, D., Rose, W., Kazam, E., Kairam, R., Katsikiotis, V., Susser, M. (1992). The central New Jersey neonatal brain haemorrhage study: design of the study and reliability of ultrasound diagnosis. Paediatr. Perinat. Epidemiol. 6, 273-84.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
Pollard, D. (1989). Asymptotics via Empirical Processes. Statist. Sci. 4, 341-366.
Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist. 18, 303-328.
Robertson, T., Wright, F. T., Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, New York.
Rücker, G. and Messerer, D. (1988). Remission duration: An example of interval censored observations. Statist. in Medicine, 7, 1139-1145.
Schick, A. (1993). On efficient estimation in regression models. Ann. Statist. 21, 1486-1521.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc. 69, 169-173.
Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Statist. Soc. B, 38, 290-295.
Van de Geer, S. (1993). Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 21, 14-44.
Wang, J.-L. (1985). Strong consistency of approximate maximum likelihood estimators with applications to nonparametrics. Ann. Statist. 13, 932-946.
Wang, Z., Gardiner, J. C., Ramamoorthi, R. V. (1994). Identifiability in interval censorship models. Statist. Prob. Letters, 21, 215-221.
Whittemore, A. S. and Keller, J. B. (1986). Survival estimation using splines. Biometrics, 42, 495-506.