This is to certify that the dissertation entitled Semiparametric Estimation For Current Status Data With Flexible Covariate Effects presented by Wenliang Lu has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.
Major professor: Hira L. Koul
Date: December 4, 2000
MSU is an Affirmative Action/Equal Opportunity Institution

SEMIPARAMETRIC ESTIMATION FOR CURRENT STATUS DATA WITH FLEXIBLE COVARIATE EFFECTS
By
Wenliang Lu
A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
2000

ABSTRACT

This thesis studies a semiparametric hazard model with a parametric baseline hazard rate and nonparametric covariate dependence, based on current status data. Two estimators are proposed: the generalized profile maximum likelihood estimator (GPMLE) and the sieve maximum likelihood estimator (SMLE). The GPMLE is obtained by maximizing the profile likelihood function in which the nonparametric covariate part is estimated by kernel and least squares methods.
Under some regularity conditions, the thesis establishes the $\sqrt{n}$-consistency and asymptotic normality of this estimator. The SMLE of the parameter is obtained by maximizing the log-likelihood function with respect to both the finite-dimensional parameter and the infinite-dimensional nuisance parameter, where the latter is constrained to a subset of the parameter space that grows with the sample size. This estimator is shown to be consistent and asymptotically normal. Moreover, its asymptotic variance achieves the semiparametric lower bound.

ACKNOWLEDGMENTS

I would like to thank my advisor, Professor Hira L. Koul, for his guidance and many helpful discussions on the subject of this thesis. He was always available when I had doubts or questions. His general way of thinking about a statistical problem and of solving it will help my future research and work. I would also like to thank the other committee members, Professors Joseph Gardiner, James Hannan and Habib Salehi, for serving on my guidance committee, especially Professor James Hannan, who was also my academic advisor. Finally, I would like to thank the Department of Statistics and Probability for offering me graduate assistantships, so that I could come to the States and complete my thesis and my graduate study at Michigan State University.

TABLE OF CONTENTS

LIST OF TABLES
  Simulation results for the two estimators
Introduction
  0.1 Overview
    Literature review
    Summary description
  0.2 The model
    The cumulative hazard function $\Lambda(x, \theta_0)g(z)$
Chapter 1 Generalized Profile Maximum Likelihood Estimation
  1.1 Definition of estimators of $\theta_0$ and $g$
    Kernel estimate, $\hat F(T_j, Z_i)$, of $F(T_j, Z_i, \theta_0)$
    Least squares estimate, $\hat g_\theta(Z_i)$, of $g(Z_i)$ for each fixed $\theta$
    The estimator, $\hat\theta$, of $\theta_0$
  1.2 Asymptotic properties of the estimators
    1.2.1 Consistency
      Assumptions for the consistency and asymptotic normality of $\hat\theta$
      $\hat\theta$ converges to $\theta_0$ in probability: Theorem 1
    1.2.2 Asymptotic normality
      $\sqrt{n}(\hat\theta - \theta_0)$ converges weakly to a normal distribution: Theorem 2
  1.3 Simulation
    Weibull distribution with c.d.f. $1 - e^{-x^{\theta}z}$
    Means and standard deviations of $\hat\theta$ for small and moderate samples
  1.4 Proof of the consistency and asymptotic normality
    1.4.1 Lemmas preliminary to the proof
      Lemma 1: Uniform consistency of $\xi_n(t)$, the mean of independent r.v.'s, $t \in D$
      Lemmas 2-5: Uniform consistency of kernel estimators
    1.4.2 Proof of Theorems 1 and 2
      Lemmas 6-9: Uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ and of $\hat g_\theta(Z_i)$ and its derivatives for $g_\theta(Z_i)$ and its derivatives
      Proof of Theorem 1 (Consistency)
      Proof of Theorem 2 (Asymptotic normality)
Chapter 2 Sieve Estimation
  2.1 Estimation
    Score equation: $S_n(\theta, \alpha_n) = 0$
    The estimator, $\hat\theta$, of $\theta_0$: $S_n(\hat\theta, \hat\alpha_n) = 0$
  2.2 Consistency
    Assumptions for the asymptotic properties
    Theorem 3: $|\hat\theta - \theta_0| + \|\hat\alpha_n - \alpha_0\|_\infty$ converges to 0 in probability
    Theorem 4: Convergence rates of the estimators $\hat\theta$ and $\hat\alpha_n$
  2.3 Asymptotic normality of $\hat\theta$
    Theorem 5: $\sqrt{n}(\hat\theta - \theta_0)$ converges weakly to a normal distribution
  2.4 Information bound for $\theta_0$
    Efficient score for $\theta_0$
  2.5 Simulation
    Weibull distribution with c.d.f. $1 - e^{-x^{\theta}z}$
    Means and standard deviations of $\hat\theta$ for small and moderate samples
  2.6 Proof of the theorems
    2.6.1 Proof of Theorem 3
      Lemma 10: Inverse function theorem with sup-norm
    2.6.2 Proof of Theorem 4
      Lemma 11: The convergence rates of sieve estimators
    2.6.3 Proof of Theorem 5
      Lemma 12: Stochastic equicontinuity for empirical processes
Bibliography

LIST OF TABLES

Table 1 Simulation results for the GPMLE
Table 2 Simulation results for the SMLE

Introduction

0.1 Overview

Current status data arise in some clinical settings when the survival time of interest can only be determined to lie below or above a random examination time. Current status data are commonly encountered in settings such as destructive testing, animal experiments in which the occurrence of a survival time is only observable upon sacrifice, and epidemiologic studies in which obtaining more than one examination is not cost effective. Nonparametric estimation of the survival time distribution and of some smooth functionals thereof has been discussed for current status data by a number of authors, including Groeneboom and Wellner (1992, §2.3), Huang and Wellner (1995), Geskus and Groeneboom (1996) and Geskus and Groeneboom (1997).
Semiparametric models based on current status data have also been studied in the literature. Klein and Spady (1993), Rabinowitz, Tsiatis and Aragon (1995), Li and Zhang (1998), and Murphy, Van der Vaart and Wellner (1999) considered the linear regression model based on current status data. Klein and Spady used the profile maximum likelihood method to derive estimators of the regression parameters, which were shown to achieve the semiparametric lower bound. Rabinowitz, Tsiatis and Aragon proposed a class of score statistics that may be used for estimation and confidence procedures. Li and Zhang minimized a class of U-statistics of order 3 to obtain estimators of the parameters. Murphy, Van der Vaart and Wellner considered the penalized maximum likelihood estimator of the regression parameter, which was shown to be efficient. Koul and Schick (1999) studied estimation and hypothesis testing for the ratio of scale parameters in the two-sample setting, using a U-statistic of order 2. Cox's regression model has also been studied for current status data. Finkelstein (1986), Diamond and McDonald (1991), and Shiboski and Jewell (1992) developed several methods to fit the model. Huang (1996) showed that the maximum likelihood estimator of the regression parameter, obtained by profiling over the cumulative baseline hazard function, is asymptotically normal with an $n^{1/2}$ convergence rate. Among other semiparametric models for current status data, the additive hazards regression model was studied by Lin, Oakes and Ying (1998) and the proportional odds regression model by Rossini and Tsiatis (1996). Under certain conditions on the examination time, Lin, Oakes and Ying found that one can make inferences about the regression parameters of the additive hazards model by using the familiar asymptotic theory and software for the proportional hazards model with right-censored data.
Rossini and Tsiatis's approach to the proportional odds regression model is based on approximating the infinite-dimensional nuisance parameter, the baseline log-odds of failure, by a step function and carrying out a maximum likelihood procedure. The resulting estimates of the finite-dimensional regression parameters are shown to be asymptotically normal and semiparametrically efficient. Although these models, especially Cox's regression model, are popular and widely used in practice, in many applications the shape of the baseline hazard is thought to be well understood while the covariate effect is rarely specified precisely. For example, in insurance problems the Gompertz-Makeham hazard has a long tradition of successful application [Jordan (1975), page 21]. Meshalkin and Kagan (1972) claimed that the logarithm of the baseline hazard is approximately linear for a number of chronic diseases. As an alternative to Cox's regression model, Nielsen, Linton and Bickel (1998) studied a model in which the baseline hazard rate belongs to a parametric class of hazard functions but the covariate part is of unknown functional form. They obtained an estimator of the underlying parameter by the profile maximum likelihood method when the data are randomly right censored. This dissertation discusses the estimation of the underlying parameter in this model (Nielsen, Linton and Bickel, 1998) for current status data. Two estimators are proposed. The first is obtained by maximizing a profile likelihood in which the infinite-dimensional nuisance parameter is estimated nonparametrically; it is called the generalized profile maximum likelihood estimator. A set of sufficient conditions is provided for its consistency and asymptotic normality.
The second estimator, called the sieve maximum likelihood estimator, is obtained by maximizing the log-likelihood function with respect to both the finite-dimensional parameter and the infinite-dimensional nuisance parameter, where the latter is constrained to a subset of the parameter space that grows with the sample size. It is shown to be consistent and asymptotically normal, with asymptotic variance achieving the semiparametric lower bound. Simulations are conducted to study the behavior of these estimators for small and moderate sample sizes. The generalized profile maximum likelihood estimator appears to have slightly lower bias and variance than the sieve maximum likelihood estimator. Since the latter achieves the lower bound, it should behave better than the generalized profile maximum likelihood estimator as the sample size grows.

0.2 The model

Let $(X, T, Z)$ be a random vector, where $X$ represents the survival time, $T$ the monitoring variable and $Z$ the covariate, which could be a vector. Let $(X_1, T_1, Z_1), \ldots, (X_n, T_n, Z_n)$ be i.i.d. copies of $(X, T, Z)$. Assume that, given $Z$, $X$ and $T$ are conditionally independent. The conditional distribution of $X$, given $Z$, is assumed to depend on some parameter and on the covariate. In Cox's regression model, the cumulative hazard function of $X$ given $Z$ has the form $\Lambda_0(x)e^{\beta'Z}$, where the first part $\Lambda_0$, of unspecified form, is called the baseline cumulative hazard function, and $\beta$ is a vector of parameters. Nielsen, Linton and Bickel (1998) proposed an alternative model in which the first part depends only on some parameter $\theta_0$ and the second part is of unspecified form. More specifically, the cumulative hazard function is of the form $\Lambda(x, \theta_0)g(Z)$, where $\Lambda(x, \theta)$ is a known function with unknown parameter $\theta_0$, and $g$ is an unknown function. Here $\theta_0$ belongs to $\Theta$, a subset of $\mathbb{R}^d$ for some $d \ge 1$.
They discussed the estimation of $\theta_0$ and $g$ under right censoring. In this dissertation we discuss the estimation of $\theta_0$ and $g(z)$ based on current status data, also called Case I interval-censored data, where one observes $(T_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$, with $\delta_i = I(X_i \le T_i)$. It is assumed in the following sections that $\theta_0$ is a scalar; similar results can be obtained for vector-valued $\theta_0$. Because of the curse of dimensionality, $Z$ is also assumed to be a scalar. Let $F(x, Z, \theta_0)$ be the conditional distribution function of $X$, given $Z$. Assume that the cumulative hazard function is continuous. Then
$$F(x, Z, \theta_0) = 1 - \exp\big(-\Lambda(x, \theta_0)g(Z)\big).$$
We also assume that the distribution of $(T, Z)$ does not depend on $\theta_0$ or $g$, and that if $\Lambda(t, \theta_1)g_1(z) = \Lambda(t, \theta_0)g(z)$ for all $(t, z)$ in the support of $(T, Z)$, then $\theta_1 = \theta_0$ and $g_1(z) = g(z)$ for all $z$. The latter is the identifiability condition.

Chapter 1 Generalized Profile Maximum Likelihood Estimation

1.1 Definition of estimators of $\theta_0$ and $g$

In this dissertation we first use a semiparametric profile likelihood method to define the estimator of the parameter. Both Klein and Spady (1993) and Nielsen, Linton and Bickel (1998) used generalized profile likelihood methods to estimate the finite-dimensional parameter, while the infinite-dimensional nuisance parameter was estimated by the kernel method. The ensuing discussion in this section is somewhat informal; the precise conditions under which all definitions are valid are stated in the next section. In this chapter, $\Theta$ is assumed to be a compact subset of $\mathbb{R}^1$ and is denoted by $N_0$. Given $(T_i, Z_i)$, $i = 1, 2, \ldots, n$, the (conditional) log-likelihood for $\theta$ and $g$ based on $(T_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$, is
$$\sum_{i=1}^n \Big[\delta_i \log\big(1 - \exp(-\Lambda(T_i, \theta)g(Z_i))\big) - (1 - \delta_i)\Lambda(T_i, \theta)g(Z_i)\Big].$$
The idea of generalized profile likelihood methods is as follows:
(1) For a fixed $\theta$, obtain estimates $\hat g_\theta(Z_i)$ of $g(Z_i)$, $i = 1, \ldots, n$, by some method such as the kernel method.
(2) The generalized profile likelihood for $\theta$ arising when $g(Z_i)$ is replaced by $\hat g_\theta(Z_i)$ is
$$\sum_{i=1}^n \Big[\delta_i \log\big(1 - \exp(-\Lambda(T_i, \theta)\hat g_\theta(Z_i))\big) - (1 - \delta_i)\Lambda(T_i, \theta)\hat g_\theta(Z_i)\Big].$$
Maximize it with respect to $\theta$ to obtain the estimate $\hat\theta$ of $\theta$.
(3) To estimate $g(z)$, treat $\hat\theta$ as the true parameter and use the method of step (1), or some other method.

When $\theta = \theta_0$, $\hat g_{\theta_0}(Z_i)$ should approach $g(Z_i)$ for every fixed $Z_i$ as the sample size $n$ tends to infinity. Moreover, the convergence must be faster than a certain rate. This is hard to achieve for all $Z_i$, $i = 1, \ldots, n$, because of edge effects in kernel estimation. Hence we use the following modified likelihood for $\theta$ and $g$:
$$l_n(\theta, g) = \sum_{i=1}^n w_1(T_i, b)w_2(Z_i, b)\Big[\delta_i \log\big(1 - \exp(-\Lambda(T_i, \theta)g(Z_i))\big) - (1 - \delta_i)\Lambda(T_i, \theta)g(Z_i)\Big],$$
where $w_2(Z_i, b) = 1$ if $Z_i$ is at least $b$ away from the boundary of the support of $Z$ and 0 otherwise, and $w_1(T_j, b) = 1$ if $T_j$ is at least $b$ away from the boundary of the support of $T$ and 0 otherwise. More precisely, if for example the support of $Z$ is an interval $[z_1^*, z_2^*]$, then $w_2(Z_i, b) = 1$ if $Z_i \in [z_1^* + b, z_2^* - b]$ and 0 otherwise, where $b$ depends on $n$ and $b \to 0$ as $n \to \infty$. Therefore the modified likelihood is almost the same as the real likelihood for $n$ large enough. In this dissertation, the support of a random variable (or random vector) having a density with respect to Lebesgue measure means the closure of the set of all points at which the density is positive.

To estimate $g$ for any fixed $\theta$, our approach uses a two-dimensional kernel method to estimate $F(T_j, Z_i, \theta_0)$, $i, j = 1, \ldots, n$, and then combines these estimates for each fixed $i$ to obtain $\hat g_\theta(Z_i)$, $i = 1, \ldots, n$. The least squares method is used in the latter step. Let $K$ be a kernel and $b$ the bandwidth. Define
$$\hat F(T_j, Z_i) = \frac{\sum_{l \ne j, i} \delta_l K_b(T_l - T_j)K_b(Z_l - Z_i)}{\sum_{l \ne j, i} K_b(T_l - T_j)K_b(Z_l - Z_i)}, \qquad 1 \le j, i \le n, \tag{1.1.1}$$
where $K_b(t) = K(t/b)/b$. Note that $\hat F(\cdot, \cdot)$ depends on $j, i$, but we do not make this explicit until it is necessary.
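The leave-two-out estimator (1.1.1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the simulations in this dissertation: it assumes NumPy, uses the fourth-order kernel $K(x) = 9/8 - (15/8)x^2$ on $[-1, 1]$ from the simulation in Section 1.3, and generates data from the simulation design of that section, taking the Weibull model to mean $\Lambda(x, \theta) = x^\theta$ and $g(z) = z$ (an assumption). The helpers `simulate` and `F_hat` are hypothetical names introduced only for illustration.

```python
import numpy as np


def K(x):
    """Fourth-order kernel K(x) = 9/8 - (15/8) x^2 on [-1, 1], 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 9.0 / 8.0 - 15.0 / 8.0 * x**2, 0.0)


def K_b(u, b):
    """Rescaled kernel K_b(u) = K(u / b) / b."""
    return K(np.asarray(u, dtype=float) / b) / b


def simulate(n, theta0=1.5, seed=0):
    """Hypothetical helper: current status data with P(X <= x | Z) = 1 - exp(-x**theta0 * Z)."""
    rng = np.random.default_rng(seed)
    T = rng.uniform(1.0, 2.0, n)                          # monitoring times
    Z = rng.uniform(0.2, 1.2, n)                          # covariate
    X = (rng.exponential(1.0, n) / Z) ** (1.0 / theta0)   # latent survival times
    delta = (X <= T).astype(float)                        # only delta = I(X <= T) is observed
    return T, delta, Z


def F_hat(T, delta, Z, j, i, b):
    """Hypothetical helper: leave-two-out kernel estimate (1.1.1) of F(T_j, Z_i, theta_0)."""
    keep = np.ones(len(T), dtype=bool)
    keep[[j, i]] = False                                  # sums run over l != j, i
    w = K_b(T[keep] - T[j], b) * K_b(Z[keep] - Z[i], b)
    den = w.sum()
    return float("nan") if den == 0 else float(np.sum(delta[keep] * w) / den)


T, delta, Z = simulate(20000, seed=1)
j = int(np.argmin(np.abs(T - 1.5)))   # an interior monitoring time
i = int(np.argmin(np.abs(Z - 0.7)))   # an interior covariate value
print(F_hat(T, delta, Z, j, i, b=0.2), 1.0 - np.exp(-T[j] ** 1.5 * Z[i]))
```

Because this kernel is negative for $|x| > \sqrt{3/5}$, the estimate is not guaranteed to lie in $[0, 1]$, so $\log(1 - \hat F)$ is only well defined when $\hat F < 1$.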
Under certain conditions on $K$, $F$ and the density of $(T, Z)$, if $b \to 0$ and $nb^2 \to \infty$, then, conditionally on $T_j, Z_i$, in probability,
$$\hat F(T_j, Z_i) \to \lim_{b \to 0}\frac{E_{j,i}[\delta K_b(T - T_j)K_b(Z - Z_i)]}{E_{j,i}[K_b(T - T_j)K_b(Z - Z_i)]} = \frac{F(T_j, Z_i, \theta_0)h(T_j, Z_i)}{h(T_j, Z_i)} = F(T_j, Z_i, \theta_0),$$
where $h(t, z)$ is the joint density of $(T, Z)$ and $E_{j,i}$ denotes conditional expectation given $T_j, Z_i$. Therefore $\hat F(T_j, Z_i)$ can be used to estimate $F(T_j, Z_i, \theta_0)$. Now if $\theta$ is the true parameter, then $-\log(1 - F(T_j, Z_i, \theta)) = \Lambda(T_j, \theta)g(Z_i)$, and $-\log(1 - \hat F(T_j, Z_i))$ should be close to $-\log(1 - F(T_j, Z_i, \theta))$ for all $j$ and $i$ when the sample size is large enough. For fixed $Z_i$, we therefore estimate $g(Z_i)$ so that $\Lambda(T_j, \theta)g(Z_i)$ is close to $-\log(1 - \hat F(T_j, Z_i))$, $j = 1, \ldots, n$. Let
$$\hat g_\theta(Z_i) = -\frac{\sum_{j=1}^n w_1(T_j, b)\Lambda(T_j, \theta)\log(1 - \hat F(T_j, Z_i))}{\sum_{j=1}^n w_1(T_j, b)\Lambda^2(T_j, \theta)}, \tag{1.1.2}$$
the least squares estimator of $g(Z_i)$, attaining
$$\min_{g(Z_i)} \sum_{j=1}^n w_1(T_j, b)\big[\log(1 - \hat F(T_j, Z_i)) + \Lambda(T_j, \theta)g(Z_i)\big]^2.$$
The limiting counterpart of $\hat g_\theta(z)$ is
$$g_\theta(z) = -\frac{E[\Lambda(T, \theta)\log(1 - F(T, z, \theta_0))]}{E[\Lambda^2(T, \theta)]} = \frac{E[\Lambda(T, \theta)\Lambda(T, \theta_0)]}{E[\Lambda^2(T, \theta)]}\,g(z), \tag{1.1.3}$$
where $E$ denotes expectation under the true parameter $\theta_0$ and $g$. Note that, by (1.1.3), $g_{\theta_0} = g$. Let
$$F(t, z, \theta) = 1 - e^{-\Lambda(t, \theta)g_\theta(z)}, \qquad \bar F = 1 - F, \tag{1.1.4}$$
and
$$\hat F(t, z, \theta) = 1 - e^{-\Lambda(t, \theta)\hat g_\theta(z)}. \tag{1.1.5}$$
The modified profile log-likelihood that arises when $g$ is replaced by $\hat g_\theta$ is
$$l_{n1}(\theta) = \sum_{i=1}^n w_1(T_i, b)w_2(Z_i, b)\Big[\delta_i \log\big(1 - \exp(-\Lambda(T_i, \theta)\hat g_\theta(Z_i))\big) - (1 - \delta_i)\Lambda(T_i, \theta)\hat g_\theta(Z_i)\Big]. \tag{1.1.6}$$
The estimator $\hat\theta$ of $\theta_0$ is the maximizer of this likelihood over $\theta \in N_0$. Finally, the estimator of $g(z)$ is defined as
$$\hat g(z) = -\frac{\sum_{j=1}^n w_1(T_j, b)\Lambda(T_j, \hat\theta)\log(1 - \hat F(T_j, z))}{\sum_{j=1}^n w_1(T_j, b)\Lambda^2(T_j, \hat\theta)}.$$

1.2 Asymptotic properties of the estimators

1.2.1 Consistency

In this section, we state the consistency of the generalized profile maximum likelihood estimator $\hat\theta$. Before doing this, we list the assumptions that will be used to prove the consistency and asymptotic normality of $\hat\theta$.
(A1) The respective supports $\mathcal{Z}$ and $\mathcal{T}$ of $Z$ and $T$ are closed intervals of $\mathbb{R}^1$. The functions $\Lambda(t, \theta)$, $g(z)$ and $h(t, z)$ are positive and continuous on their domains of definition $\mathcal{T} \times N_0$, $\mathcal{Z}$ and $\mathcal{T} \times \mathcal{Z}$, respectively. Moreover, $\Lambda(t, \theta)$ is continuous in $\theta$ uniformly in $t$. The first and second derivatives of $\Lambda(t, \theta)$ with respect to $\theta$, $\dot\Lambda(t, \theta)$ and $\ddot\Lambda(t, \theta)$, exist, and $\dot\Lambda(t, \theta)$, $\ddot\Lambda(t, \theta)$ are continuous in $\theta$ uniformly in $t$, and continuous in $t$ for each fixed $\theta$.
(A2) The functions $g(z)$ and $h(t, z)$ are four times differentiable on their domains of definition with continuous fourth (partial) derivatives, and $\Lambda(t, \theta_0)$ is four times differentiable in $t$ with continuous fourth derivative.
(A3) The kernel function $K$ is an $r$-th order kernel supported on $[-1, 1]$, symmetric about zero and Lipschitz continuous on its support. ($K$ is an $r$-th order kernel if $\int K(t)\,dt = 1$, $\int t^s K(t)\,dt = 0$ for $s = 1, \ldots, r - 1$, and $\int |t|^r|K(t)|\,dt < \infty$.)
(A3') The kernel function $K$ is Lipschitz continuous, supported on $[-1, 1]$, and satisfies $\int K(t)\,dt = 1$.
(A4) $b = O(n^{-a})$ with $\ldots < a < \ldots$.
(A5) $\theta_0$ is an interior point of $N_0$, which is a compact subset of $\mathbb{R}^1$.
(A6) $E\Big[\big(\dot\Lambda(T, \theta_0)g(Z) + \Lambda(T, \theta_0)\dot g_{\theta_0}(Z)\big)^2\,\frac{\bar F(T, Z, \theta_0)}{F(T, Z, \theta_0)}\Big] > 0$, where $\dot g_\theta(z)$ is the (partial) derivative of $g_\theta(z)$ with respect to $\theta$.

Assumption (A1), or similar assumptions, can be found in the literature; see, for example, Huang (1996), Klein and Spady (1993), and Nielsen, Linton and Bickel (1998). Assumption (A2) is a smoothness condition on the model, used mainly for the asymptotic normality. Assumptions (A3) and (A3') concern the kernel; note that (A3) implies (A3'). Assumption (A4) is the bandwidth condition in kernel estimation, which is crucial to the asymptotic normality; for consistency of the estimator, this bandwidth condition can be weakened. Assumptions (A1), (A3') and (A5) are imposed for consistency of the estimator. To prove the asymptotic normality, we use assumptions (A1)-(A6). Next we state the theorem on the consistency of the estimator.
The proof will be given in Section 1.4.2, following the general preliminary Lemmas 1-5 on kernel estimation in Section 1.4.1. Before the proof of the theorem in Section 1.4.2, we first give Lemmas 6-9 on the uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ and of $\hat g_\theta(Z_i)$ and its derivatives for $g_\theta(Z_i)$ and its derivatives, $1 \le i, j \le n$.

Theorem 1 Suppose that (A1), (A3') and (A5) hold, and $b = O(n^{-a})$ with $0 < a < \ldots$. Then the generalized profile maximum likelihood estimator $\hat\theta$, obtained by maximizing $l_{n1}(\theta)$, converges in probability to the true parameter $\theta_0$.

1.2.2 Asymptotic normality

In this section, we state the theorem on the asymptotic distribution of the estimator; the proof will be given after the proof of Theorem 1 in Section 1.4.2.

Theorem 2 (Asymptotic distribution of $\hat\theta$) Suppose (A1)-(A6) hold with $r = 4$ in (A3). Then $\sqrt{n}(\hat\theta - \theta_0) \Rightarrow N(0, \sigma^2)$, where
$$\sigma^2 = \frac{E\big\{[D_1(T, Z, \theta_0) - A(T, Z, \theta_0)]^2 R(T, Z, \theta_0)\big\}}{\big[E\big(D_1^2(T, Z, \theta_0)R(T, Z, \theta_0)\big)\big]^2},$$
$$A(T, Z, \theta_0) = \frac{\Lambda(T, \theta_0)h_1(T)}{C_0\,h(T, Z)R(T, Z, \theta_0)}\int \Lambda(t, \theta_0)D_1(t, Z, \theta_0)R(t, Z, \theta_0)h(t, Z)\,dt,$$
$$C_0 = E\Lambda^2(T, \theta_0), \qquad D_1(t, z, \theta_0) = \dot\Lambda(t, \theta_0)g(z) + \Lambda(t, \theta_0)\dot g_{\theta_0}(z),$$
and
$$h_1(t) = \int_{\mathcal{Z}} h(t, z)\,dz, \qquad R(t, z, \theta_0) = \frac{\bar F(t, z, \theta_0)}{F(t, z, \theta_0)}.$$

1.3 Simulation

Before we prove the stated asymptotic properties of the estimator, let us look at its behavior for small and moderate samples. Assume that the conditional distribution of $X$, given $Z$, is a Weibull distribution with distribution function $1 - e^{-x^{\theta_0}g(Z)}$, where $g(z) = z$. Also assume that $T$ and $Z$ are uniformly distributed on $[1, 2]$ and $[0.2, 1.2]$, respectively. For each sample size ($n$ = 30, 60, 100, 200) and several appropriate values of $b$, 100 samples are generated with true parameter $\theta_0 = 1.5$, and 100 replications of the generalized profile maximum likelihood estimate (GPMLE) of $\theta_0$ are obtained. The means and standard deviations are shown in the following table.

Table 1. Simulation results for the GPMLE
   n      b        mean     s.d.
  30   0.0400   1.3847   1.3299
       0.0420   1.4915   1.4075
       0.0450   1.7596   1.3905
  60   0.0308   1.4720   0.9801
       0.0310   1.4824   0.9947
       0.0312   1.4908   1.0043
 100   0.0238   1.4535   0.7720
       0.0240   1.4876   0.7702
       0.0242   1.5075   0.7943
 200   0.0166   1.4560   0.4795
       0.0168   1.4990   0.4902
       0.0170   1.5421   0.5103

The kernel function used in the simulation is $K(x) = 9/8 - (15/8)x^2$ for $-1 \le x \le 1$, and $K(x) = 0$ otherwise. Since $\int K(x)\,dx = 1$ and $\int x^2K(x)\,dx = 0$, and $K$ is symmetric, it is a fourth-order kernel in the sense of (A3).

From the table we can see that the mean is close to the true value for all sample sizes, while the standard deviation decreases as the sample size increases. The choice of $b$ is crucial to reducing the bias of the estimator.

1.4 Proof of the consistency and asymptotic normality

1.4.1 Lemmas preliminary to the proof

To prove the consistency and asymptotic normality of the generalized profile maximum likelihood estimator, we first prove the uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ over all $1 \le i, j \le n$, and of $\hat g_\theta(Z_i)$ for $g_\theta(Z_i)$ over all $1 \le i \le n$ and $\theta \in N_0$. Since $\hat g_\theta(Z_i)$ is a function of $\hat F(T_j, Z_i)$, which, in view of (1.1.1), is a ratio of two sums (or means) of independent random variables, we first discuss some uniform convergence results for sums (or means) of independent random variables in a general setting.

Lemma 1 Let $Y_1, \ldots, Y_n$ be i.i.d. $d$-dimensional random vectors. Let $D$ be a compact subset of $\mathbb{R}^d$, and for each $t \in D$, let $W_n(t, \cdot)$, $n \ge 1$, be a sequence of measurable functions on $\mathbb{R}^d$. Let
$$\xi_n(t) = \frac{1}{n}\sum_{i=1}^n W_n(t, Y_i), \qquad t \in D. \tag{1.4.1}$$
Let $0 < h_n = O(n^{-a_0})$ with $a_0 > 0$, and assume that, for some $0 \le s, r < \infty$ and some finite real number $C_0$,
$$h_n^r|W_n(t, y)| \le C_0, \qquad |W_n(t_1, y) - W_n(t_2, y)| \le C_0h_n^{-s}\sum_{j=1}^d |t_{1j} - t_{2j}|, \tag{1.4.2}$$
uniformly in $y \in \mathbb{R}^d$ and for all $t, t_1, t_2$ in $D$. Assume also that $E(W_n(t, Y_1)) = 0$, $t \in D$. Then, for all $a > 0$,
$$\sup_{t \in D}|\xi_n(t)| = o_p\big(n^{-\frac{1-a}{2}}h_n^{-r}\big). \tag{1.4.3}$$

Proof. Let
$$\Delta_n = \frac{\epsilon_n h_n^s}{2C_0 d},$$
where $0 < \epsilon_n \to 0$ is to be chosen later. By (1.4.1) and the second part of (1.4.2), for all $t_1, t_2 \in D$ with $t_l = (t_{l1}, \ldots, t_{ld})$, $l = 1, 2$,
$$|\xi_n(t_1) - \xi_n(t_2)| \le C_0h_n^{-s}\sum_{j=1}^d |t_{1j} - t_{2j}|.$$
If $|t_{1j} - t_{2j}| \le \Delta_n$ for $j = 1, \ldots, d$, then this inequality and the definition of $\Delta_n$ give
$$|\xi_n(t_1) - \xi_n(t_2)| \le C_0h_n^{-s}d\Delta_n = \frac{\epsilon_n}{2}.$$
Since $D$ is compact, it is contained in a hypercube; without loss of generality, let it be contained in the unit cube. Let $N_n = (1/\Delta_n)^d$ if $1/\Delta_n$ is an integer, and $N_n = ([1/\Delta_n] + 1)^d$ otherwise, where $[x]$ denotes the integer part of $x$. Divide the unit cube into small cubes $C_{in}$, $i = 1, \ldots, N_n$, each with side length at most $\Delta_n$. Cover $D$ with the sets $D \cap C_{in}$, $i = 1, \ldots, N_n$. Discard the empty sets and let $D_{in}$, $i = 1, \ldots, M_n$, be the remaining sets. Then $t_1, t_2 \in D_{in}$ implies $|t_{1j} - t_{2j}| \le \Delta_n$, $j = 1, \ldots, d$. Note also that
$$M_n \le \Big(\frac{1}{\Delta_n} + 1\Big)^d.$$
For $i = 1, \ldots, M_n$, let $t_i$ be a point in $D_{in}$. Then, by the triangle inequality,
$$\sup_{t \in D}|\xi_n(t)| \le \sup_{i = 1, \ldots, M_n}\Big[|\xi_n(t_i)| + \sup_{t \in D_{in}}|\xi_n(t) - \xi_n(t_i)|\Big] \le \sup_{i = 1, \ldots, M_n}|\xi_n(t_i)| + \frac{\epsilon_n}{2}.$$
It follows that
$$P\Big(\sup_{t \in D}|\xi_n(t)| > \epsilon_n\Big) \le P\Big(\sup_{i = 1, \ldots, M_n}|\xi_n(t_i)| > \frac{\epsilon_n}{2}\Big) \le M_n\sup_{t \in D}P\Big(|\xi_n(t)| > \frac{\epsilon_n}{2}\Big). \tag{1.4.4}$$
Notice that, by (1.4.1) and the first part of (1.4.2), $nh_n^r\xi_n(t)$ is a sum of independent and bounded random variables. Recall Bernstein's inequality (see, for example, Shorack and Wellner (1986), page 855): for independent random variables $\eta_1, \ldots, \eta_n$ with ranges in $[-M, M]$ and zero means,
$$P(|\eta_1 + \cdots + \eta_n| > x) \le 2\exp\Big(-\frac{x^2/2}{v + Mx/3}\Big) \tag{1.4.5}$$
for $v \ge \mathrm{Var}(\eta_1 + \cdots + \eta_n)$. Apply this inequality with $\eta_i = h_n^rW_n(t, Y_i)$, $M = C_0$, $x = nh_n^r\epsilon_n/2$ and $v = nC_0^2$ to obtain
$$P\Big(|\xi_n(t)| > \frac{\epsilon_n}{2}\Big) \le 2\exp\Big(-\frac{1}{2}\cdot\frac{n^2h_n^{2r}\epsilon_n^2/4}{nC_0^2 + C_0nh_n^r\epsilon_n/6}\Big).$$
Since $h_n^r\epsilon_n \to 0$ as $n \to \infty$, the second term in the denominator of the fraction is less than the first term for large enough $n$, and hence the above is less than $2\exp(-Cnh_n^{2r}\epsilon_n^2)$ for some $0 < C < \infty$ not depending on $n$, $h_n$ and $\epsilon_n$. It now readily follows from (1.4.4), the upper bound for $M_n$ and the definition of $\Delta_n$ that
$$P\Big(\sup_{t \in D}|\xi_n(t)| > \epsilon_n\Big) \le 2\Big(\frac{2C_0d}{h_n^s\epsilon_n} + 1\Big)^d\exp\big(-Cnh_n^{2r}\epsilon_n^2\big), \tag{1.4.6}$$
which is $o(1)$ if $\epsilon_n = \epsilon n^{-\frac{1-a}{2}}h_n^{-r}$, for all $\epsilon > 0$ and $a > 0$. The lemma is proved.
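Bernstein's inequality (1.4.5) is the probabilistic ingredient of the proof of Lemma 1. As an illustration that is not part of the original argument, the following Python sketch (assuming NumPy) compares the right-hand side of (1.4.5) with a Monte Carlo estimate of the tail probability for sums of i.i.d. Uniform$[-1, 1]$ variables, which are bounded by $M = 1$ and have mean zero and variance $1/3$. The choices $n = 50$, $x = 8$ and the helper names `bernstein_bound` and `empirical_tail` are hypothetical.

```python
import numpy as np


def bernstein_bound(x, v, M):
    """Right-hand side of (1.4.5): 2 exp(-(x**2 / 2) / (v + M x / 3))."""
    return 2.0 * np.exp(-(x**2 / 2.0) / (v + M * x / 3.0))


def empirical_tail(n, x, reps=20000, seed=0):
    """Monte Carlo estimate of P(|eta_1 + ... + eta_n| > x), eta_i i.i.d. Uniform[-1, 1]."""
    rng = np.random.default_rng(seed)
    sums = rng.uniform(-1.0, 1.0, size=(reps, n)).sum(axis=1)
    return float(np.mean(np.abs(sums) > x))


n, x, M = 50, 8.0, 1.0
v = n / 3.0  # exact variance of the sum of n Uniform[-1, 1] variables
bound = float(bernstein_bound(x, v, M))
freq = empirical_tail(n, x)
print(f"empirical tail {freq:.4f} <= Bernstein bound {bound:.4f}")
```

The bound is far from tight here; in Lemma 1 only its exponential decay in $nh_n^{2r}\epsilon_n^2$ matters.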
Next we use Lemma 1 to show the uniform convergence in probability of means of independent random variables that have the same forms as those in the definition of $\hat F(T_j, Z_i)$, $1 \le i, j \le n$. Moreover, their mean square convergence is also established, which is crucial to the proof of the asymptotic normality of the generalized profile maximum likelihood estimator.

Let $U = (U_1, U_2, \ldots, U_d)$ be a random vector in $\mathbb{R}^d$ and $\gamma$ a random variable taking values 0 or 1, and let $U_i = (U_{i1}, U_{i2}, \ldots, U_{id})$, $\gamma_i$, $i = 1, \ldots, n$, be i.i.d. copies of $U$, $\gamma$, respectively. Let $g$ be a function on $\mathbb{R}^d$ and $K$ a function on $\mathbb{R}^1$. Let $K_b(t) = K(t/b)/b$, $t \in \mathbb{R}^1$, where $b$ depends on $n$ and $b \to 0$ as $n \to \infty$. Also let $v = (v_1, v_2, \ldots, v_d)$ be a vector in $\mathbb{R}^d$. If $u, v \in \mathbb{R}^d$ and $x, y \in \mathbb{R}^1$, then $xu + yv = (xu_1 + yv_1, \ldots, xu_d + yv_d)$. Let also $du = du_1 \cdots du_d$ in integrals. Define
$$T_n(v) = \frac{1}{n}\sum_{i=1}^n g(U_i)K_b(U_{i1} - v_1)\cdots K_b(U_{id} - v_d).$$
The following two lemmas establish the convergence of $T_n(v)$. Lemma 2 establishes the rate of convergence of $T_n(v)$ to its mean, in probability and in mean square, uniformly in $v$. Lemma 3 studies the rate of the asymptotic bias of $T_n(v)$.

Lemma 2 Assume $U$ has a bounded (joint) density $f(u)$ with support $D_f = [s_1^*, t_1^*] \times \cdots \times [s_d^*, t_d^*]$, where $s_i^*, t_i^* \in \mathbb{R}^1$, $i = 1, \ldots, d$. Also assume that $K(\cdot)$ is a bounded and Lipschitz continuous function with $\int_{-\infty}^{\infty}K^2(t)\,dt < \infty$, and that $g(u)$ is bounded. Then
$$nb^d\sup_{v \in D_f}E|T_n(v) - ET_n(v)|^2 = O(1), \tag{1.4.7}$$
and, for all $a > 0$,
$$n^{\frac{1-a}{2}}b^d\sup_{v \in D_f}|T_n(v) - ET_n(v)| = o_p(1). \tag{1.4.8}$$

Proof. Using the fact that $\mathrm{Var}(Y) \le EY^2$ for any random variable $Y$, and the change of variables formula, we obtain
$$\mathrm{Var}(T_n(v)) = \frac{1}{n}\mathrm{Var}\big(g(U)K_b(U_1 - v_1)\cdots K_b(U_d - v_d)\big) \le \frac{1}{n}E\big[g(U)K_b(U_1 - v_1)\cdots K_b(U_d - v_d)\big]^2$$
$$= \frac{1}{nb^{2d}}\int g^2(u)K^2\Big(\frac{u_1 - v_1}{b}\Big)\cdots K^2\Big(\frac{u_d - v_d}{b}\Big)f(u)\,du = \frac{1}{nb^d}\int g^2(v + bt)K^2(t_1)\cdots K^2(t_d)f(v + bt)\,dt.$$
Therefore, by the boundedness of $f$ and $g$ and the square integrability of $K$,
$$\sup_{v \in D_f}nb^d\,\mathrm{Var}(T_n(v)) = O(1).$$
Hence (1.4.7) is proved. Apply Lemma 1 with $t = v$, $D = D_f$, $\xi_n(t) = T_n(v) - ET_n(v)$, $h_n = b$, $r = d$ and $s = d + 1$ to obtain (1.4.8).

Lemma 3 Assume the conditions of Lemma 2 hold.
(1) If $f$ and $g$ are also Lipschitz continuous and $K$ has support $[-1, 1]$ and satisfies $\int K(t)\,dt = 1$, $\int |K(t)|\,dt < \infty$, then
$$\sup_{v \in D_f^0}|ET_n(v) - g(v)f(v)| = O(b),$$
where $D_f^0 = [s_1^* + b, t_1^* - b] \times \cdots \times [s_d^* + b, t_d^* - b]$.
(2) Suppose $f$ and $g$ have bounded and continuous (partial) derivatives up to order $r$, and $K$ is an $r$-th order kernel supported on $[-1, 1]$ and symmetric around zero. Then
$$\sup_{v \in D_f^0}|ET_n(v) - g(v)f(v)| = O(b^r).$$

Proof. We only prove the second assertion, since the first can be proved in a similar but simpler way. Writing $\partial_j = \partial/\partial v_j$, a change of variables and a Taylor expansion of $gf$ around $v$ yield
$$E[T_n(v)] = \int g(u)\frac{1}{b^d}K\Big(\frac{u_1 - v_1}{b}\Big)\cdots K\Big(\frac{u_d - v_d}{b}\Big)f(u)\,du = \int (gf)(v + bt)\prod_{j=1}^d K(t_j)\,dt$$
$$= \sum_{k=0}^{r-1}\frac{b^k}{k!}\int\Big(\sum_{j=1}^d t_j\partial_j\Big)^k(gf)(v)\prod_{j=1}^d K(t_j)\,dt + \frac{b^r}{r!}\int\Big(\sum_{j=1}^d t_j\partial_j\Big)^r(gf)(v^*)\prod_{j=1}^d K(t_j)\,dt = g(v)f(v) + O(b^r),$$
uniformly in $v \in D_f^0$, where $v^* = (v_1^*, \ldots, v_d^*)$ and $v_j^*$ is between $v_j - b$ and $v_j + b$. In the last step, for $1 \le k \le r - 1$, every monomial $t_1^{k_1}\cdots t_d^{k_d}$ with $k_1 + \cdots + k_d = k$ has some exponent $k_j$ with $1 \le k_j \le r - 1$, so its integral against $\prod_j K(t_j)$ vanishes; this uses the assumptions $\int t^sK(t)\,dt = 0$, $s = 1, \ldots, r - 1$, and $\int K(t)\,dt = 1$.

The following two lemmas discuss the convergence of two other kernel-based means of independent random variables. They will be used to prove the theorems in the next section. In Lemma 4 the summands are already centered, while in Lemma 5 they are centered at $g(v)$.

Lemma 4 Assume that the conditions of Lemma 2 hold. Assume also that $g(v)$ is the conditional expectation of $\gamma$ given $U = v$. Let
$$S_n(v) = \frac{1}{n}\sum_{i=1}^n[\gamma_i - g(U_i)]\prod_{j=1}^d K_b(U_{ij} - v_j).$$
Then
$$nb^d\sup_{v \in D_f}E|S_n(v)|^2 = O(1),$$
and, for all $a > 0$,
$$n^{\frac{1-a}{2}}b^d\sup_{v \in D_f}|S_n(v)| = o_p(1).$$

Proof. Note that $E[S_n(v) \mid U_i, i = 1, \ldots, n] = 0$. Hence, since $\gamma_i$ takes values 0 or 1,
$$\mathrm{Var}(S_n(v)) = E\big[\mathrm{Var}(S_n(v) \mid U_i, i = 1, \ldots, n)\big] = E\Big[\frac{1}{n^2}\sum_{i=1}^n g(U_i)(1 - g(U_i))\prod_{j=1}^d K_b^2(U_{ij} - v_j)\Big].$$
The rest of the proof is exactly the same as that of Lemma 2.

Lemma 5 Assume the conditions of Lemma 2 hold and that $g$ is Lipschitz continuous.
Let
$$T_n'(v) = \frac{1}{n}\sum_{i=1}^n[g(U_i) - g(v)]K_b(U_{i1} - v_1)\cdots K_b(U_{id} - v_d).$$
For the variance part of $T_n'(v)$, we have
$$nb^{d-2}\sup_{v \in D_f}E|T_n'(v) - ET_n'(v)|^2 = O(1) \tag{1.4.9}$$
and, for all $a > 0$,
$$n^{\frac{1-a}{2}}b^d\sup_{v \in D_f}|T_n'(v) - ET_n'(v)| = o_p(1). \tag{1.4.10}$$
For the bias part, we have the following.
(1) If $f$ is also Lipschitz continuous and $K$ satisfies $\int K(t)\,dt = 1$, $\int |K(t)|\,dt < \infty$, and has support $[-1, 1]$, then
$$\sup_{v \in D_f^0}|ET_n'(v)| = O(b). \tag{1.4.11}$$
(2) If $f$ and $g$ have bounded and continuous (partial) derivatives up to order $r$, and $K$ is an $r$-th order kernel supported on $[-1, 1]$ and symmetric around zero, then
$$\sup_{v \in D_f^0}|ET_n'(v)| = O(b^r). \tag{1.4.12}$$

Proof. Since $T_n'(v)$ contains the difference $g(U_i) - g(v)$, we should expect a better convergence rate than that of $T_n(v)$. The proof is similar to those of Lemmas 2 and 3. For any $v \in D_f$,
$$\mathrm{Var}(T_n'(v)) = \frac{1}{n}\mathrm{Var}\Big[(g(U) - g(v))\prod_{j=1}^d K_b(U_j - v_j)\Big] \le \frac{1}{n}E\Big[(g(U) - g(v))^2\prod_{j=1}^d K_b^2(U_j - v_j)\Big]$$
$$= \frac{1}{nb^{2d}}\int [g(u) - g(v)]^2K^2\Big(\frac{u_1 - v_1}{b}\Big)\cdots K^2\Big(\frac{u_d - v_d}{b}\Big)f(u)\,du = \frac{1}{nb^d}\int [g(v + bt) - g(v)]^2K^2(t_1)\cdots K^2(t_d)f(v + bt)\,dt \le \frac{Cb^2}{nb^d}$$
for some finite real number $C$ not depending on $v$. Thus (1.4.9) is proved. Apply Lemma 1 with $t = v$, $D = D_f$, $\xi_n(t) = T_n'(v) - ET_n'(v)$, $h_n = b$, $r = d$ and $s = d + 1$ to obtain (1.4.10). Assertions (1.4.11) and (1.4.12) can be proved in the same way as Lemma 3.

1.4.2 Proof of Theorems 1 and 2

Before giving the proofs of Theorems 1 and 2, we use the general results of the previous section to obtain some preliminaries. To begin with, we establish the uniform convergence of $\hat F(T_j, Z_i)$ to $F(T_j, Z_i, \theta_0)$ over all $1 \le i, j \le n$, and of $\hat g_\theta(Z_i)$ to $g_\theta(Z_i)$, $\dot{\hat g}_\theta(Z_i)$ to $\dot g_\theta(Z_i)$ and $\ddot{\hat g}_\theta(Z_i)$ to $\ddot g_\theta(Z_i)$ over all $1 \le i \le n$ and $\theta \in N_0$. The expected squared differences between $\hat F(T_j, Z_i)$ and $F(T_j, Z_i, \theta_0)$, between $\hat g_\theta(Z_i)$ and $g_\theta(Z_i)$, and between $\dot{\hat g}_\theta(Z_i)$ and $\dot g_\theta(Z_i)$ are established as well. By assumption (A1), let $\mathcal{Z} = [z_1^*, z_2^*]$ and $\mathcal{T} = [t_1^*, t_2^*]$, two finite real intervals.
Let $\mathcal{Z}^0 = [z_1^* + b,\, z_2^* - b]$ and $\mathcal{T}^0 = [t_1^* + b,\, t_2^* - b]$. The support of $h$ is then $\mathcal{D}_h = [t_1^*, t_2^*] \times [z_1^*, z_2^*]$. Also let $\mathcal{D}_h^0 = \mathcal{T}^0 \times \mathcal{Z}^0$.

Recall the definition of $\hat F(T_j, Z_i)$ from (1.1.1). Write
$$\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0) = \frac{V_n^{(j,i)}(T_j, Z_i) + B_{n1}^{(j,i)}(T_j, Z_i)}{B_{n0}^{(j,i)}(T_j, Z_i)}, \quad (1.4.13)$$
where
$$V_n^{(j,i)}(t, z) = \frac{1}{n}\sum_{l \ne i, j} [\delta_l - F(T_l, Z_l, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z), \quad (1.4.14)$$
$$B_{n1}^{(j,i)}(t, z) = \frac{1}{n}\sum_{l \ne i, j} [F(T_l, Z_l, \theta_0) - F(t, z, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z) \quad (1.4.15)$$
and
$$B_{n0}^{(j,i)}(t, z) = \frac{1}{n}\sum_{l \ne i, j} K_b(T_l - t) K_b(Z_l - z).$$

We first show that $V_n^{(j,i)}(T_j, Z_i)$ and $B_{n1}^{(j,i)}(T_j, Z_i)$ converge to 0 in probability, uniformly over $1 \le i, j \le n$. We also show that the conditional expectations of their squares, given $T_j$ and $Z_i$, converge to 0 at certain rates, uniformly over $1 \le i, j \le n$. The same convergence results are obtained for $B_{n0}^{(j,i)}(T_j, Z_i)$ to $h(T_j, Z_i)$. These results follow from the previous lemmas, as made precise in the following lemma. In what follows, $\sup_{ij}$ stands for $\sup_{1 \le i, j \le n}$, and $\sup_{ij, (T_j, Z_i) \in \mathcal{D}_h^0}$ stands for $\sup_{1 \le i, j \le n,\, (T_j, Z_i) \in \mathcal{D}_h^0}$.

Lemma 6 (1) Assume that conditions (A1) and (A3') hold, and $b = O(n^{-a})$ with $0 < a \ldots$ … $a_0 > 0$,
$$\sup_{(t,z) \in \mathcal{D}_h} |B_{n1}(t, z)| = o_p\big(\sqrt{n^{-1+a_0} b^{-4}}\big) + O(b),$$
which is $o_p(1)$ when $a_0$ is chosen small enough, because of the assumed rate of convergence of $b$ to 0. A similar argument leads to (1.4.18). Similarly, apply (1.4.8) of Lemma 2 and part (1) of Lemma 3 with $T_n(\tilde v) = B_{n0}(t, z)$, $g(\tilde v) = 1$, $d = 2$ and $\tilde U_i = (T_i, Z_i)$ to obtain that $\sup |B_{n0}(t, z) \ldots$ … $\tfrac{1}{6}$. Thus (1.4.20) follows from the same discussion as above.

Similarly, apply (1.4.7) of Lemma 2 and part (2) of Lemma 3 with $T_n(\tilde v) = B_{n0}(t, z)$, $r = 4$ and $d = 2$ to obtain
$$\sup_{(t,z) \in \mathcal{D}_h^0} E|B_{n0}(t, z) - h(t, z)|^2 = o(n^{-1/2}).$$
(1.4.21) follows from the same discussion as above. The lemma is proved.
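The sketch below makes two ingredients of Lemma 6 concrete.

First, it checks numerically that one $r = 4$ kernel satisfies the moment conditions used in part (2) of Lemma 3. The specific kernel $K(t) = \tfrac{15}{32}(3 - 10t^2 + 7t^4)$ on $[-1,1]$ is our own illustrative choice, not one taken from the text.

Second, it computes the leave-two-out ratio behind (1.4.13)-(1.4.15). Its numerator and denominator are taken to be $\sum_{l \ne i,j} \delta_l K_b(T_l - t)K_b(Z_l - z)$ and $\sum_{l \ne i,j} K_b(T_l - t)K_b(Z_l - z)$, which is what makes (1.4.13) an exact identity. The function names `k4`, `kernel_moment`, `Kb` and `leave_two_out` are hypothetical.

```python
import math


def k4(t):
    """Fourth-order kernel on [-1, 1]: K(t) = (15/32)(3 - 10 t^2 + 7 t^4)."""
    return 15.0 / 32.0 * (3 - 10 * t**2 + 7 * t**4) if abs(t) <= 1 else 0.0


def kernel_moment(K, s, m=2000):
    """Composite Simpson approximation of int_{-1}^{1} t^s K(t) dt (m must be even)."""
    h = 2.0 / m
    total = 0.0
    for idx in range(m + 1):
        t = -1.0 + idx * h
        weight = 1 if idx in (0, m) else (4 if idx % 2 else 2)
        total += weight * t**s * K(t)
    return total * h / 3


def Kb(x, b, K=k4):
    """Rescaled kernel K_b(x) = K(x / b) / b."""
    return K(x / b) / b


def leave_two_out(t, z, T, Z, delta, F0, b, i, j):
    """Return (F_hat, V, B1, B0) at (t, z), leaving out observations i and j.

    V, B1 and B0 mirror (1.4.14), (1.4.15) and B_n0; F0(t, z) stands in for
    the true F(t, z, theta_0)."""
    n = len(T)
    num = V = B1 = B0 = 0.0
    for l in range(n):
        if l in (i, j):
            continue
        w = Kb(T[l] - t, b) * Kb(Z[l] - z, b)
        num += delta[l] * w
        V += (delta[l] - F0(T[l], Z[l])) * w
        B1 += (F0(T[l], Z[l]) - F0(t, z)) * w
        B0 += w
    return num / B0, V / n, B1 / n, B0 / n
```

With this kernel, $\int K = 1$, the first three moments vanish and $\int t^4 K(t)\,dt = -1/21 \ne 0$. The identity $\hat F - F = (V + B_1)/B_0$ holds for any choice of `F0`, because the $F(T_l, Z_l, \theta_0)$ terms cancel between $V$ and $B_1$.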
By assumption (A1), $h(t, z)$ is bounded away from 0 and $\infty$, and $F(t, z, \theta_0)$ is bounded away from 0 and 1. Their estimators therefore have the same properties with probability approaching 1 as the sample size tends to infinity. We thus study the convergence of these estimators only on the set where these properties hold.

There exist real numbers $0 < a_1 \le a_2 < \infty$ such that $a_1 < \inf_{(t,z) \in \mathcal{D}_h} h(t, z)$ and $a_2 > \sup_{(t,z) \in \mathcal{D}_h} h(t, z)$. There also exist $0 < d_1 \le d_2 < 1$ such that $d_1 < \inf_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0)$ and $d_2 > \sup_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0)$. In particular, choose
$$a_1 = \inf_{(t,z) \in \mathcal{D}_h} h(t, z) - \epsilon, \qquad a_2 = \sup_{(t,z) \in \mathcal{D}_h} h(t, z) + \epsilon,$$
and
$$d_1 = \inf_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0) - \epsilon, \qquad d_2 = \sup_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0) + \epsilon,$$
for some $\epsilon > 0$.

Write $\hat F^{(j,i)}(T_j, Z_i)$ for $\hat F(T_j, Z_i)$, since the latter depends on $(j, i)$. Let $\hat F^{(j,i)}(t, z)$ be obtained from (1.1.1) by replacing $T_j, Z_i$ with $t, z$ respectively. Let
$$A_{n1} = \Big\{a_1 \le \min_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} B_{n0}^{(j,i)}(t, z) \le \max_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} B_{n0}^{(j,i)}(t, z) \le a_2\Big\},$$
$$A_{n2} = \Big\{d_1 \le \min_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} \hat F^{(j,i)}(t, z) \le \max_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} \hat F^{(j,i)}(t, z) \le d_2\Big\}.$$

In the definition of $\hat g(Z_i)$ (see (1.1.2)), the sum runs over those $j$ with $T_j \in \mathcal{T}^0$, i.e. $w_1(T_j, b) = 1$, $j = 1, \ldots, n$. When studying the rate at which $\hat g(Z_i)$ converges to $g(Z_i)$, we must exclude the case where every $T_j$ falls in the edge area, that is, $\sum_{j=1}^n w_1(T_j, b) = 0$. Therefore define
$$A_{n3} = \Big\{\sum_{j=1}^n w_1(T_j, b) > 0\Big\}.$$
By assumption (A1), the complement of $A_{n3}$ has probability $P(A_{n3}^c) = [P(w_1(T, b) = 0)]^n = [O(b)]^n$.

Let $A_n = A_{n1} \cap A_{n2} \cap A_{n3}$. The probability of $A_n$ tends to 1 as $n \to \infty$; this is proved later. The main results used to prove the consistency and asymptotic normality of the generalized profile maximum likelihood estimator are established in the next two lemmas.

Lemma 7 (1) Assume conditions (A1) and (A3') hold, and $b = O(n^{-a})$ with $0 < a < \tfrac{1}{4}$. Then
$$\sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)| = o_p(1). \quad (1.4.26)$$
(2) Assume conditions (A1), (A2) and (A3)
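hold, and $b = O(n^{-a})$ with $\tfrac{1}{6} < a < \tfrac{1}{4}$. Then
$$\sup_{(T_j, Z_i) \in \mathcal{D}_h^0} E_{ji}|\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)|^2 I_{A_n} = o_p(n^{-1/2}), \quad (1.4.27)$$
where $E_{ji}$ denotes the conditional expectation given $(T_j, Z_i)$.

Proof. Note that $h(t, z)$ is bounded away from 0. Thus (1.4.26) follows from (1.4.13), (1.4.16), (1.4.18) and (1.4.19). Similarly, (1.4.27) follows from (1.4.13), (1.4.17), (1.4.20) and (1.4.19). The lemma is proved.

Recall that $\dot g_\theta(z)$ is the first (partial) derivative of $g_\theta(z)$ with respect to $\theta$. Let $\ddot g_\theta(z)$ be the second (partial) derivative of $g_\theta(z)$ with respect to $\theta$. Define $\dot{\hat g}_\theta(z)$ and $\ddot{\hat g}_\theta(z)$ in the same way from $\hat g_\theta(z)$.

Lemma 8 (1) If (A1)-(A3) hold and $b = O(n^{-a})$ with $\tfrac{1}{6} < a < \tfrac{1}{4}$, then
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} E_i|\hat g_\theta(Z_i) - g_\theta(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}) \quad (1.4.28)$$
and
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} E_i|\dot{\hat g}_\theta(Z_i) - \dot g_\theta(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}), \quad (1.4.29)$$
where $E_i$ stands for the conditional expectation given $Z_i$.

(2) If conditions (A1) and (A3') hold and $b = O(n^{-a})$ with $0 < a < \tfrac{1}{4}$, then
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} |\hat g_\theta(Z_i) - g_\theta(Z_i)| = o_p(1), \quad (1.4.30)$$
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} |\dot{\hat g}_\theta(Z_i) - \dot g_\theta(Z_i)| = o_p(1),$$
and
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} |\ddot{\hat g}_\theta(Z_i) - \ddot g_\theta(Z_i)| = o_p(1).$$

Proof. We prove only (1.4.30) and (1.4.28); the remaining results are proved similarly. In view of (1.1.2) and (1.1.3), $\hat g_\theta(Z_i) - g_\theta(Z_i)$ can be decomposed into $R_{n1,\theta}(Z_i) + R_{n2,\theta}(Z_i)$, where
$$R_{n1,\theta}(Z_i) = -\frac{\sum_{j \ne i} w_1(T_j, b)\Lambda(T_j, \theta)\big[\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))\big]}{\sum_{j \ne i} w_1(T_j, b)\Lambda^2(T_j, \theta)}$$
and
$$R_{n2,\theta}(Z_i) = \Big[\frac{\sum_{j \ne i} w_1(T_j, b)\Lambda(T_j, \theta)\Lambda(T_j, \theta_0)}{\sum_{j \ne i} w_1(T_j, b)\Lambda^2(T_j, \theta)} - \frac{E\Lambda(T, \theta)\Lambda(T, \theta_0)}{E\Lambda^2(T, \theta)}\Big] g(Z_i).$$
Under the conditions of part (1), it is enough to show that
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} E_i|R_{nk,\theta}(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}), \quad k = 1, 2. \quad (1.4.31)$$
Under the conditions of part (2), it is enough to show that
$$\sup_{Z_i \in \mathcal{Z}^0,\, \theta \in \mathcal{N}_0} |R_{nk,\theta}(Z_i)| = o_p(1), \quad k = 1, 2. \quad (1.4.32)$$

By the mean value theorem,
$$|\log x - \log y| \le \frac{|x - y|}{x \wedge y} \quad (1.4.33)$$
for all positive $x, y$. Apply this with $x = 1 - \hat F(T_j, Z_i)$ and $y = 1 - F(T_j, Z_i, \theta_0)$ to obtain
$$|\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))| \le \frac{|\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)|}{(1 - \hat F(T_j, Z_i)) \wedge (1 - F(T_j, Z_i, \theta_0))}. \quad (1.4.34)$$
By the definition of $R_{n1,\theta}(Z_i)$ and the boundedness of $\Lambda(t, \theta)$ away from 0 and $\infty$, we obtain that, on $A_{n3}$,
$$\sup_{Z_i \in \mathcal{Z}^0}\sup_{\theta \in \mathcal{N}_0} |R_{n1,\theta}(Z_i)| \le C \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))|$$
for some constant $0 < C < \infty$. This bound, (1.4.34), (1.4.26), (1.4.27) and the boundedness of $F(t, z, \theta_0)$ away from 0 and 1 imply (1.4.31) and (1.4.32) with $k = 1$. The Lipschitz continuity of $\Lambda(t, \theta)$ in $\theta$, uniformly in $t$, together with the uniform SLLN, implies (1.4.31) and (1.4.32) with $k = 2$. (These can also be proved by applying Lemma 1.)

Note that $\hat F(T_j, Z_i)$ does not depend on $\theta$, and $\hat g_\theta(Z_i)$ depends on $\theta$ only through $\Lambda(T_j, \theta)$. Using the assumptions on $\Lambda(T_j, \theta)$, the remaining assertions are proved similarly. The lemma is proved.

We now show that the probability of $A_n$ approaches 1 as $n \to \infty$.

Lemma 9. Assume that (A1) and (A3') hold. Then $\lim_{n \to \infty} P(A_n) = 1$.

Proof. It suffices to show that $\lim_{n \to \infty} P(A_{nk}) = 1$, or equivalently $\lim_{n \to \infty} P(A_{nk}^c) = 0$, for $k = 1, 2, 3$. We have already seen that $\lim_{n \to \infty} P(A_{n3}^c) = 0$ by its definition.

We first prove the assertion for $k = 1$. By the definition of $A_{n1}$, its complement is contained in
$$\Big\{\sup_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} |B_{n0}^{(j,i)}(t, z) - h(t, z)| > \epsilon\Big\}.$$
Note also that
$$B_{n0}(t, z) - B_{n0}^{(j,i)}(t, z) = \big[K_b(T_i - t)K_b(Z_i - z) + K_b(T_j - t)K_b(Z_j - z)\big]/n.$$
Since $K$ is bounded, the absolute value of this difference is less than $C/(nb^2)$ for all $1 \le i, j \le n$ and $(t, z) \in \mathcal{D}_h$, for some finite constant $C$. Hence
$$P\Big(\sup_{1 \le i,j \le n,\, (t,z) \in \mathcal{D}_h^0} |B_{n0}^{(j,i)}(t, z) - h(t, z)| > \epsilon\Big) \le P\Big(\sup_{(t,z) \in \mathcal{D}_h^0} |B_{n0}(t, z) - h(t, z)| > \epsilon - C/(nb^2)\Big).$$
This is $o(1)$ in view of (1.4.25) and the fact that $nb^2 \to \infty$ as $n \to \infty$. We thus obtain $\lim_{n \to \infty} P(A_{n1}) = 1$.

Next, let
$$\tilde F(t, z) = \frac{\sum_{l=1}^n \delta_l K_b(T_l - t)K_b(Z_l - z)}{\sum_{l=1}^n K_b(T_l - t)K_b(Z_l - z)}.$$
In the same way one obtains
$$P(A_{n2}^c) \le P\Big(\sup_{(t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - F(t, z, \theta_0)| > \epsilon - \sup_{i,j,\, (t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - \hat F^{(j,i)}(t, z)|\Big).$$
This is $o(1)$ provided
$$\sup_{i,j,\, (t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - \hat F^{(j,i)}(t, z)| = o_p(1),$$
which is easy to show and is omitted here. The lemma is proved.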
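The decomposition $R_{n1,\theta} + R_{n2,\theta}$ in the proof of Lemma 8 suggests a particular form for $\hat g_\theta(Z_i)$. It reads as a least-squares fit, through the origin, of $-\log(1 - \hat F(T_j, Z_i))$ on $\Lambda(T_j, \theta)$, summed over $j \ne i$ with $w_1(T_j, b) = 1$.

The sketch below implements that reading. Two points are assumptions on our part: the exact form of (1.1.2) is inferred from the decomposition, and the baseline $\Lambda(t, \theta) = t^\theta$ is a hypothetical choice. The names `Lam` and `g_hat` are also hypothetical.

```python
import math


def Lam(t, theta):
    """Hypothetical baseline cumulative hazard Lambda(t, theta) = t**theta."""
    return t ** theta


def g_hat(theta, T, F_vals, w1, i):
    """Least-squares estimate of g_theta(Z_i) from F_vals[j] = hat F(T_j, Z_i), j != i.

    Regresses -log(1 - hat F(T_j, Z_i)) on Lambda(T_j, theta) through the origin,
    using only the j with w1[j] == 1 (T_j away from the edge of the T-support)."""
    num = den = 0.0
    for j, (t, Fj) in enumerate(zip(T, F_vals)):
        if j == i or w1[j] == 0:
            continue
        lam = Lam(t, theta)
        num += lam * -math.log1p(-Fj)   # -log(1 - F) equals Lambda * g for the true F
        den += lam * lam
    if den == 0.0:
        raise ValueError("no T_j outside the edge area (the event A_n3 fails)")
    return num / den
```

If the estimated $F$ values are replaced by the true $F(T_j, Z_i, \theta_0)$ and $\theta = \theta_0$, then $R_{n1,\theta}$ and $R_{n2,\theta}$ both vanish and the fit returns $g(Z_i)$ exactly. At any other $\theta$ it returns the finite-sample ratio that appears in $R_{n2,\theta}$, multiplied by $g(Z_i)$.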
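Proof of Theorem 1 (Consistency). It is enough to show that $l_{n1}(\theta)/n$ converges in probability, uniformly in $\mathcal{N}_0$, to a nonrandom function that has a unique maximizer at $\theta_0$. We prove below that
$$\sup_{\theta \in \mathcal{N}_0} |l_{n1}(\theta)/n - \tilde l_{n1}(\theta)| = o_p(1), \quad (1.4.35)$$
where
$$\tilde l_{n1}(\theta) = \frac{1}{n}\sum_{i=1}^n w_{ii}\big[\delta_i \log F(T_i, Z_i, \theta) - (1 - \delta_i)\Lambda(T_i, \theta) g_\theta(Z_i)\big].$$
A uniform law of large numbers holds under our conditions. Together with $P(w_1(T, b) = 0) = O(b)$ and $P(w_2(Z, b) = 0) = O(b)$, it gives
$$\sup_{\theta \in \mathcal{N}_0} |\tilde l_{n1}(\theta) - l(\theta)| = o_p(1), \quad (1.4.36)$$
where
$$l(\theta) = E\big[\delta \log(1 - e^{-\Lambda(T,\theta) g_\theta(Z)}) - (1 - \delta)\Lambda(T, \theta) g_\theta(Z)\big]$$
$$= \iint \big[(1 - e^{-\Lambda(t,\theta_0) g(z)}) \log(1 - e^{-\Lambda(t,\theta) g_\theta(z)}) - \Lambda(t, \theta) g_\theta(z)\, e^{-\Lambda(t,\theta_0) g(z)}\big] h(t, z)\, dt\, dz.$$
This can also be obtained by applying Lemma 1 with $t = \theta$, $\xi(t) = \tilde l_{n1}(\theta)$, $\mathcal{D} = \mathcal{N}_0$ and $r = s = 0$.

Next we prove that $l(\theta)$ has a unique maximizer at $\theta_0$. For any $x > 0$, consider
$$f(y) = (1 - e^{-x}) \log(1 - e^{-y}) - y e^{-x}, \qquad y > 0.$$
This function attains its maximum at $y = x$, because
$$f'(y) = \frac{e^{-y} - e^{-x}}{1 - e^{-y}},$$
which is positive for $y < x$, equals 0 at $y = x$ and is negative for $y > x$. Apply this with $x = \Lambda(t, \theta_0) g(z)$ and $y = \Lambda(t, \theta) g_\theta(z)$. We obtain $l(\theta) \le l(\theta_0)$, with $l(\theta_1) = l(\theta_0)$ if and only if $\Lambda(t, \theta_1) g_{\theta_1}(z) = \Lambda(t, \theta_0) g(z)$ for $(t, z) \in \mathcal{D}_h$. Combined with the identifiability condition, this implies $l(\theta) < l(\theta_0)$ for every $\theta \ne \theta_0$. Therefore $l(\theta)$ is uniquely maximized at $\theta_0$. This, (1.4.35) and (1.4.36) prove the theorem.

We now establish (1.4.35). Write $w_{ii}$ for $w_1(T_i, b) w_2(Z_i, b)$. It is enough to prove that
$$\sup_{\theta \in \mathcal{N}_0} \Big|\frac{1}{n}\sum_{i=1}^n w_{ii}\delta_i \log \hat F(T_i, Z_i, \theta) - \frac{1}{n}\sum_{i=1}^n w_{ii}\delta_i \log F(T_i, Z_i, \theta)\Big| = o_p(1) \quad (1.4.37)$$
and
$$\sup_{\theta \in \mathcal{N}_0} \Big|\frac{1}{n}\sum_{i=1}^n w_{ii}(1 - \delta_i)\Lambda(T_i, \theta)\hat g_\theta(Z_i) - \frac{1}{n}\sum_{i=1}^n w_{ii}(1 - \delta_i)\Lambda(T_i, \theta) g_\theta(Z_i)\Big| = o_p(1). \quad (1.4.38)$$
Apply (1.4.33) with $x = \hat F(T_i, Z_i, \theta)$ and $y = F(T_i, Z_i, \theta)$ to obtain
$$|\log \hat F(T_i, Z_i, \theta) - \log F(T_i, Z_i, \theta)| \le \frac{|\hat F(T_i, Z_i, \theta) - F(T_i, Z_i, \theta)|}{\hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}. \quad (1.4.39)$$
By the mean value theorem,
$$|e^{-x} - e^{-y}| \le |x - y| \quad (1.4.40)$$
for all positive $x, y$.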
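The uniqueness argument in the proof of Theorem 1 rests on two elementary facts. The first is that $f(y) = (1 - e^{-x})\log(1 - e^{-y}) - y e^{-x}$ peaks exactly at $y = x$. The second is the inequality (1.4.40).

The quick numerical check below is an illustration only, not part of the proof. It finds the maximiser by grid search and compares it with $x$, then checks the closed-form $f'(y)$ against a finite difference. The grid bounds and the names `f`, `f_prime` and `grid_argmax` are arbitrary choices.

```python
import math


def f(y, x):
    """f(y) = (1 - e^{-x}) log(1 - e^{-y}) - y e^{-x}, for x, y > 0."""
    return (1 - math.exp(-x)) * math.log(1 - math.exp(-y)) - y * math.exp(-x)


def f_prime(y, x):
    """Closed-form derivative f'(y) = (e^{-y} - e^{-x}) / (1 - e^{-y})."""
    return (math.exp(-y) - math.exp(-x)) / (1 - math.exp(-y))


def grid_argmax(x, lo=0.01, hi=10.0, m=100_000):
    """Maximiser of f(., x) over an equally spaced grid on [lo, hi]."""
    best_y, best_val = lo, f(lo, x)
    for k in range(1, m + 1):
        y = lo + (hi - lo) * k / m
        val = f(y, x)
        if val > best_val:
            best_y, best_val = y, val
    return best_y
```

For example, `grid_argmax(1.0)` lands within one grid step (about $10^{-4}$) of 1.0. The sign of `f_prime` flips from positive to negative as $y$ crosses $x$.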
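Apply (1.4.40) with $x = \Lambda(T_i, \theta)\hat g_\theta(Z_i)$ and $y = \Lambda(T_i, \theta) g_\theta(Z_i)$. Recalling the definitions of $\hat F(T_i, Z_i, \theta)$ and $F(T_i, Z_i, \theta)$ (see (1.1.5) and (1.1.4)), we find that the right-hand side of (1.4.39) is no more than
$$\frac{\Lambda(T_i, \theta)|\hat g_\theta(Z_i) - g_\theta(Z_i)|}{\hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}.$$
Therefore the left-hand side of (1.4.37) is no more than
$$\frac{\sup_{\theta \in \mathcal{N}_0,\, (T_i, Z_i) \in \mathcal{D}_h^0} \Lambda(T_i, \theta)|\hat g_\theta(Z_i) - g_\theta(Z_i)|}{\inf_{\theta \in \mathcal{N}_0,\, (T_i, Z_i) \in \mathcal{D}_h^0} \hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}.$$
This is $o_p(1)$ by the boundedness of $\Lambda$, (1.4.26), (1.4.30) and the boundedness of $F(t, z, \theta_0)$ away from 0. This proves (1.4.37); (1.4.38) is proved in a similar way. Hence (1.4.35) is proved.

Proof of Theorem 2 (Asymptotic normality). We first prove that
$$\sup_{(T_i, Z_i) \in \mathcal{D}_h^0,\, \theta \in \mathcal{N}_0} |\hat F(T_i, Z_i, \theta) - F(T_i, Z_i, \theta)| = o_p(1), \quad (1.4.41)$$
where $\hat F(t, z, \theta)$ is defined in (1.1.5) and $F(t, z, \theta)$ in (1.1.4). Apply (1.4.40) with $x = \Lambda(t, \theta)\hat g_\theta(z)$ and $y = \Lambda(t, \theta) g_\theta(z)$ to obtain
$$|\hat F(t, z, \theta) - F(t, z, \theta)| \le \Lambda(t, \theta)|\hat g_\theta(z) - g_\theta(z)|.$$
This, the boundedness of $\Lambda$ and (1.4.30) imply (1.4.41).

By the definition of $g_\theta(z)$ (see (1.1.3)) and the assumptions on $\Lambda$ and $g$ (see Assumption (A1)), $g_\theta(z)$ is bounded away from 0 and $\infty$ as a function of $(\theta, z)$ on $\mathcal{N}_0 \times \mathcal{Z}$. It follows from the definition of $F(t, z, \theta)$ that, as a function of $t$, $z$ and $\theta$, it is bounded away from 0 and 1. Let
$$\hat D_1(T_i, Z_i, \theta) = \dot\Lambda(T_i, \theta)\hat g_\theta(Z_i) + \Lambda(T_i, \theta)\dot{\hat g}_\theta(Z_i) \quad (1.4.42)$$
and
$$D_1(T_i, Z_i, \theta) = \dot\Lambda(T_i, \theta) g_\theta(Z_i) + \Lambda(T_i, \theta)\dot g_\theta(Z_i).$$
It follows from part (2) of Lemma 8 that
$$\sup_{(T_i, Z_i) \in \mathcal{D}_h^0,\, \theta \in \mathcal{N}_0} |\hat D_1(T_i, Z_i, \theta) - D_1(T_i, Z_i, \theta)| = o_p(1). \quad (1.4.43)$$

We now prove the theorem. The derivative with respect to $\theta$ of the modified profile log-likelihood $l_{n1}(\theta)$, defined in (1.1.6), is
$$\frac{\partial}{\partial\theta} l_{n1}(\theta) = \sum_{i=1}^n w_{ii}\,\frac{\delta_i - \hat F(T_i, Z_i, \theta)}{\hat F(T_i, Z_i, \theta)}\,\hat D_1(T_i, Z_i, \theta).$$
Let
$$S_n(\theta) = \frac{1}{\sqrt{n}}\frac{\partial}{\partial\theta} l_{n1}(\theta) \quad\text{and}\quad \dot S_n(\theta) = \frac{1}{n}\frac{\partial^2}{\partial\theta^2} l_{n1}(\theta).$$
Then, by the mean value theorem,
$$0 = S_n(\hat\theta) = S_n(\theta_0) + \sqrt{n}(\hat\theta - \theta_0)\dot S_n(\theta^*), \quad (1.4.44)$$
where $\theta^*$ lies between $\theta_0$ and $\hat\theta$, and
$$\dot S_n(\theta) = -\frac{1}{n}\sum_{i=1}^n w_{ii}\,\frac{\delta_i[1 - \hat F(T_i, Z_i, \theta)]}{\hat F^2(T_i, Z_i, \theta)}\,\hat D_1^2(T_i, Z_i, \theta)$$
$$+ \frac{1}{n}\sum_{i=1}^n w_{ii}\,\frac{\delta_i - \hat F(T_i, Z_i, \theta)}{\hat F(T_i, Z_i, \theta)}\big[\ddot\Lambda(T_i, \theta)\hat g_\theta(Z_i) + 2\dot\Lambda(T_i, \theta)\dot{\hat g}_\theta(Z_i) + \Lambda(T_i, \theta)\ddot{\hat g}_\theta(Z_i)\big].$$

We show that $\dot S_n(\theta^*)$ converges in probability to the negative number $-d(\theta_0)$ defined below. To this end, let $S_n^*(\theta)$ be obtained from $\dot S_n(\theta)$ by making the following replacements:
- $\hat F(T_i, Z_i, \theta)$ by $F(T_i, Z_i, \theta)$;
- $\hat D_1(T_i, Z_i, \theta)$ by $D_1(T_i, Z_i, \theta)$;
- $\hat g_\theta(Z_i)$ by $g_\theta(Z_i)$;
- $\dot{\hat g}_\theta(Z_i)$ by $\dot g_\theta(Z_i)$;
- $\ddot{\hat g}_\theta(Z_i)$ by $\ddot g_\theta(Z_i)$.

In view of (1.4.41), (1.4.43), part (2) of Lemma 8, the boundedness of $F(t, z, \theta)$ away from 0, and the boundedness of $\Lambda$, $\dot\Lambda$ and $\ddot\Lambda$, we obtain
$$\sup_{\theta \in \mathcal{N}_0} |\dot S_n(\theta) - S_n^*(\theta)| = o_p(1). \quad (1.4.45)$$
Under assumption (A1), $S_n^*(\theta)$ is also Lipschitz continuous in $\theta$ on $\mathcal{N}_0$. This, (1.4.45), the consistency of $\hat\theta$ (Theorem 1) and the triangle inequality imply
$$|\dot S_n(\theta^*) - S_n^*(\theta_0)| = o_p(1). \quad (1.4.46)$$
Since $S_n^*(\theta_0)$ is a mean of bounded random variables, the Strong Law of Large Numbers (SLLN) shows that $S_n^*(\theta_0)$ converges with probability 1 to
$$-E\Big[\frac{1 - F(T, Z, \theta_0)}{F(T, Z, \theta_0)}\, D_1^2(T, Z, \theta_0)\Big] =: -d(\theta_0).$$
This and (1.4.46) imply that $\dot S_n(\theta^*)$ converges to $-d(\theta_0)$ in probability. It then follows from (1.4.44) that
$$\sqrt{n}(\hat\theta - \theta_0) = d^{-1}(\theta_0) S_n(\theta_0)[1 + o_p(1)]. \quad (1.4.47)$$

Next we find the limiting distribution of $S_n(\theta_0)$. Write $\hat g(z)$ for $\hat g_{\theta_0}(z)$, and write
$$S_n(\theta_0) = E_n + Q_n, \quad (1.4.48)$$
where
$$E_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[\delta_i - F(T_i, Z_i, \theta_0)]\,\frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)} \quad (1.4.49)$$
and
$$Q_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)]\,\frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)}. \quad (1.4.50)$$
Both $E_n$ and $Q_n$ contribute to the limiting distribution of $S_n(\theta_0)$.

First we deal with $E_n$. Write $w_{j1}$ for $w_1(T_j, b)$. By its definition, $E_n$ can be rewritten as
$$E_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[\delta_i - F(T_i, Z_i, \theta_0)]\,\frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + R_n, \quad (1.4.51)$$
where
$$R_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[\delta_i - F(T_i, Z_i, \theta_0)]\,\tilde D(T_i, Z_i)$$
and
$$\tilde D(T_i, Z_i) = \frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)} - \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)}. \quad (1.4.52)$$
In view of (1.4.41) and (1.4.43), $R_n$ is expected to go to 0 in probability.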
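To prove this, we show that the expectation of $R_n^2$ converges to 0 as $n$ tends to infinity. Note that
$$R_n^2 = \frac{1}{n}\sum_{i=1}^n w_{ii}[\delta_i - F(T_i, Z_i, \theta_0)]^2\,\tilde D^2(T_i, Z_i)$$
$$+ \frac{1}{n}\sum_{i_1 \ne i_2} w_{i_1 i_1}[\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)]\,\tilde D(T_{i_1}, Z_{i_1}) \times w_{i_2 i_2}[\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)]\,\tilde D(T_{i_2}, Z_{i_2}).$$
The first term on the right-hand side is $o_p(1)$ because
$$\sup_{(T_i, Z_i) \in \mathcal{D}_h^0} |\tilde D(T_i, Z_i)| = o_p(1).$$
This in turn follows from (1.4.41), (1.4.43) and the fact that $F(t, z, \theta)$ is bounded away from 0 and 1.

To prove that the second term of $R_n^2$ goes to 0 in probability, define
$$\hat F^{(2)}(T_j, Z_{i_1}) = \frac{\sum_{k \ne i_1, i_2, j} \delta_k K_b(T_k - T_j) K_b(Z_k - Z_{i_1})}{\sum_{k \ne i_1, i_2, j} K_b(T_k - T_j) K_b(Z_k - Z_{i_1})}, \qquad 1 \le j, i_1, i_2 \le n,\ i_1 \ne i_2.$$
For $1 \le i_1, i_2 \le n$ with $i_1 \ne i_2$, let $\tilde D^{(2)}(T_{i_1}, Z_{i_1})$ be obtained from $\tilde D(T_{i_1}, Z_{i_1})$ by replacing $\hat F(T_j, Z_{i_1})$, $1 \le j \le n$, with $\hat F^{(2)}(T_j, Z_{i_1})$.

Fix $1 \le i_1 \le n$. By the definitions of $\tilde D(T_{i_1}, Z_{i_1})$ (see (1.4.52)) and $\hat D_1(T_{i_1}, Z_{i_1}, \theta)$ (see (1.4.42)), $\tilde D(T_{i_1}, Z_{i_1})$ depends on the values $\hat F(T_j, Z_{i_1})$, $1 \le j \le n$, only through $\hat g_{\theta_0}(Z_{i_1})$. See (1.1.2) for how $\hat g_{\theta_0}(Z_{i_1})$ depends on these values. One can see that, for $1 \le j, i_1, i_2 \le n$ and $i_1 \ne i_2$, … (1.4.53)

For each fixed $(t, z)$, a Bernstein-type bound gives
$$P(|W(t, z)| > \epsilon_n) \le \ldots \le 2\exp(-Cnb^2\epsilon_n). \quad (1.4.54)$$
The last inequality holds for some finite positive number $C$ if $b^2\epsilon_n = O(n^{-a})$ for some $0 < a < 1$. Now revisit the proof of Lemma 1, using $nW(t, z)$, $n\epsilon_n$, $b$, 2, 2 and 3 in place of $\xi_n(t)$, $\epsilon_n$, $h_n$, $d$, $r$ and $s$. The bound (1.4.6) there, with its exponential factor replaced by that of (1.4.54), leads to
$$P\Big(\sup_{(t,z) \in \mathcal{D}_h} |W(t, z)| > \epsilon_n\Big) \le 2\,(\cdots + 1)^4 \exp(-Cnb^2\epsilon_n).$$
This is $o(1)$ if $\epsilon_n$ is chosen to be $n^{-(1-a_0)} b^{-2}$ for any $0 < a_0 < 1$. For these values of $a_0$, $b^2\epsilon_n = O(n^{-(1-a_0)})$, so (1.4.54) holds. It follows that
$$\sup_{(t,z) \in \mathcal{D}_h} |W(t, z)| = o_p(n^{-(1-a_0)} b^{-2}).$$
Since $b = O(n^{-a})$ with $\tfrac{1}{6} < a < \tfrac{1}{4}$, this rate is $o_p(n^{-(1 - a_0 - 2a)})$. It is $o_p(n^{-1/2})$ if $0 < a_0 < \tfrac{1}{2} - 2a$. Therefore
$$\sup_{(t,z) \in \mathcal{D}_h} |W(t, z)| = o_p(n^{-1/2}). \quad (1.4.55)$$

By (1.4.19) of Lemma 6 and $\inf_{(t,z) \in \mathcal{D}_h} h(t, z) > 0$, it follows from (1.4.53) and (1.4.55) that
$$\sup_{1 \le j, i_1, i_2 \le n,\ i_1 \ne i_2} |\hat F(T_j, Z_{i_1}) - \hat F^{(2)}(T_j, Z_{i_1})| = o_p(n^{-1/2}). \quad (1.4.56)$$
Now use the definitions of $\tilde D^{(2)}(T_{i_1}, Z_{i_1})$ and $\tilde D(T_{i_1}, Z_{i_1})$, $1 \le i_1, i_2 \le n$, $i_1 \ne i_2$, together with assumption (A1). With probability approaching 1, the conditional expectations of
$$[\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)][\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)]\,\tilde D^{(2)}(T_{i_1}, Z_{i_1})\,\tilde D^{(2)}(T_{i_2}, Z_{i_2})$$
and of
$$[\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)][\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)]\,[\tilde D(T_{i_1}, Z_{i_1}) - \tilde D^{(2)}(T_{i_1}, Z_{i_1})]\,\tilde D^{(2)}(T_{i_2}, Z_{i_2})$$
are zero. Thus their expectations are 0 too. Therefore the expectation of the second part of $R_n^2$ equals
$$E\Big(\frac{1}{n}\sum_{i_1 \ne i_2} w_{i_1 i_1}[\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)][\tilde D(T_{i_1}, Z_{i_1}) - \tilde D^{(2)}(T_{i_1}, Z_{i_1})]$$
$$\times\, w_{i_2 i_2}[\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)][\tilde D(T_{i_2}, Z_{i_2}) - \tilde D^{(2)}(T_{i_2}, Z_{i_2})]\Big),$$
which is $o(1)$ in view of (1.4.56) and the boundedness of $\delta_i - F(T_i, Z_i, \theta_0)$. Therefore $R_n = o_p(1)$, and hence, by (1.4.51),
$$E_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[\delta_i - F(T_i, Z_i, \theta_0)]\,\frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1).$$
Several facts now allow the weights $w_{ii}$ to be dropped:
- $P(w_{ii} = 0) = O(b) = o(1)$;
- $D_1(T_i, Z_i, \theta_0)$ is bounded;
- $F(T_i, Z_i, \theta_0)$ is bounded away from 0;
- the conditional expectation of $\delta_i - F(T_i, Z_i, \theta_0)$ given $(T_i, Z_i)$ is 0.

It follows that
$$E_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n [\delta_i - F(T_i, Z_i, \theta_0)]\,\frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1). \quad (1.4.57)$$

Now we deal with $Q_n$; recall its definition (1.4.50). The functions $\Lambda(t, \theta_0)$, $\dot\Lambda(t, \theta_0)$ and $F(t, z, \theta_0)$ are bounded away from 0 and $\infty$. Moreover, on $A_n$ (defined before), $\hat g(Z_i)$, and hence $\hat F(T_i, Z_i, \theta_0)$, is uniformly bounded. Applying (1.4.40) with $x = \Lambda(T_i, \theta_0)\hat g(Z_i)$ and $y = \Lambda(T_i, \theta_0) g(Z_i)$, we see that
$$\Big|Q_n - \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)]\,\frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)}\Big| I_{A_n}$$
$$\le \frac{C}{\sqrt{n}}\sum_{i=1}^n w_{ii}|\hat g(Z_i) - g(Z_i)|\big[|\hat g(Z_i) - g(Z_i)| + |\dot{\hat g}(Z_i) - \dot g(Z_i)|\big] I_{A_n}$$
$$= \frac{C}{\sqrt{n}}\sum_{i=1}^n w_{ii}\big[|\hat g(Z_i) - g(Z_i)|^2 + |\hat g(Z_i) - g(Z_i)|\,|\dot{\hat g}(Z_i) - \dot g(Z_i)|\big] I_{A_n}.$$
Here $C$ is a positive finite number. Take the expectation of each sub-term, conditioning first on $Z_i$. By (1.4.28), the expectation of the first term in the last display is $o(1)$.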
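By the Cauchy-Schwarz inequality, (1.4.28) and (1.4.29), the expectation of the second term is also $o(1)$. Hence, on $A_n$,
$$Q_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}[F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)]\,\frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1). \quad (1.4.58)$$
Expand $1 - e^{-x}$ in a Taylor series around a point $x_0$, with $x = \Lambda(T_i, \theta_0)\hat g(Z_i)$ and $x_0 = \Lambda(T_i, \theta_0) g(Z_i)$. Using the boundedness of $\Lambda$, we obtain
$$\big|[F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)] + \Lambda(T_i, \theta_0)[1 - F(T_i, Z_i, \theta_0)][\hat g(Z_i) - g(Z_i)]\big| \le C|\hat g(Z_i) - g(Z_i)|^2$$
for some finite positive number $C$. This, the boundedness of $\zeta(T_i, Z_i)$ (defined below) and (1.4.28) imply that, on $A_n$,
$$Q_n = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)[\hat g(Z_i) - g(Z_i)] + o_p(1), \quad (1.4.59)$$
where
$$\zeta(t, z) = \frac{\Lambda(t, \theta_0) D_1(t, z, \theta_0)[1 - F(t, z, \theta_0)]}{F(t, z, \theta_0)}, \qquad (t, z) \in \mathcal{D}_h. \quad (1.4.60)$$
Let
$$C_{n0} = \frac{1}{n}\sum_{j=1}^n w_{j1}\Lambda^2(T_j, \theta_0).$$
By the law of large numbers and $P(w_{j1} = 0) = o(1)$, $C_{n0}$ converges in probability to $E\Lambda^2(T, \theta_0)$, which is $c_0$ in the notation used before.

Now expand $\log(1 - x)$ in a Taylor series around a point $x_0$, with $x = \hat F(T_j, Z_i)$ and $x_0 = F(T_j, Z_i, \theta_0)$. Using (1.4.27) of Lemma 7, we obtain that, on $A_n$,
$$\hat g(Z_i) - g(Z_i) = -\frac{1}{C_{n0}}\frac{1}{n}\sum_{j \ne i} w_{j1}\Lambda(T_j, \theta_0)\big[\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))\big]$$
$$= \frac{1}{C_{n0}}\frac{1}{n}\sum_{j \ne i} w_{j1}\Lambda(T_j, \theta_0)\,\frac{\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)}{1 - F(T_j, Z_i, \theta_0)} + o_p(n^{-1/2}).$$
This and (1.4.13) imply that, on $A_n$,
$$\hat g(Z_i) - g(Z_i) = R_{n1}(Z_i) + R_{n2}(Z_i) + o_p(n^{-1/2}), \quad (1.4.61)$$
where
$$R_{n1}(Z_i) = \frac{1}{C_{n0}}\frac{1}{n}\sum_{j \ne i} w_{j1}\Lambda(T_j, \theta_0)\,\frac{V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)} \quad (1.4.62)$$
and
$$R_{n2}(Z_i) = \frac{1}{C_{n0}}\frac{1}{n}\sum_{j \ne i} w_{j1}\Lambda(T_j, \theta_0)\,\frac{B_{n1}^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}. \quad (1.4.63)$$
Substituting (1.4.61) into (1.4.59) gives, on $A_n$,
$$Q_n = Q_{n1} + Q_{n2} + o_p(1), \quad (1.4.64)$$
where
$$Q_{n1} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i) R_{n1}(Z_i)$$
and
$$Q_{n2} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i) R_{n2}(Z_i). \quad (1.4.65)$$
Let $C_{n1} = Q_{n1} C_{n0}$. Then
$$C_{n1} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0) V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}.$$
Next, expand $1/x$ in a Taylor series around a point $x_0$, applied with $x = B_{n0}^{(j,i)}(T_j, Z_i)$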
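and $x_0 = h(T_j, Z_i)$. We obtain
$$C_{n1} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0) V_n^{(j,i)}(T_j, Z_i)}{1 - F(T_j, Z_i, \theta_0)}\Big[\frac{1}{h(T_j, Z_i)} - \frac{B_{n0}^{(j,i)}(T_j, Z_i) - h(T_j, Z_i)}{h^{*2}(T_j, Z_i)}\Big],$$
where $h^*(T_j, Z_i)$ lies between $h(T_j, Z_i)$ and $B_{n0}^{(j,i)}(T_j, Z_i)$. By (1.4.17) and (1.4.21) of Lemma 6, the boundedness of $h$, $\zeta$, $\Lambda$ and $F$, and the Cauchy-Schwarz inequality, we obtain
$$C_{n1} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0) V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} + o_p(1).$$
Let
$$\zeta_n(T_j, Z_l) = \frac{1}{n}\sum_{i \ne j, l} w_{ii}\,\frac{\zeta(T_i, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)}\,K_b(Z_i - Z_l)$$
and
$$\tilde\zeta(t, z) = \frac{\int \zeta(s, z) h(s, z)\, ds}{[1 - F(t, z, \theta_0)]\, h(t, z)}, \qquad (t, z) \in \mathcal{D}_h, \quad (1.4.66)$$
where $\zeta(t, z)$ is defined in (1.4.60). Then, by the definition of $V_n^{(j,i)}(T_j, Z_i)$, $1 \le i, j \le n$ (see (1.4.14)), and a change in the order of summation,
$$C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\zeta_n(T_j, Z_l) + o_p(1).$$

Let $w_{ll}(2b) = 1$ if $T_l \in [t_1^* + 2b,\, t_2^* - 2b]$ and $Z_l \in [z_1^* + 2b,\, z_2^* - 2b]$, and 0 otherwise, $1 \le l \le n$. We split the main part of $C_{n1}$ into two parts according to whether $w_{ll}(2b) = 1$ or 0; this is needed because of the edge effect of kernel estimation. Write
$$C_{n1} = \hat C_{n1} + \tilde C_{n1} + o_p(1), \quad (1.4.67)$$
where
$$\hat C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n w_{ll}(2b)[\delta_l - F(T_l, Z_l, \theta_0)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\zeta_n(T_j, Z_l)$$
and
$$\tilde C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n (1 - w_{ll}(2b))[\delta_l - F(T_l, Z_l, \theta_0)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\zeta_n(T_j, Z_l).$$
Condition on $(T_i, Z_i)$, $i = 1, \ldots, n$. Then, for $l \ne k$, $\delta_l - F(T_l, Z_l, \theta_0)$ and $\delta_k - F(T_k, Z_k, \theta_0)$ are independent with mean zero and variances $F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]$ and $F(T_k, Z_k, \theta_0)[1 - F(T_k, Z_k, \theta_0)]$ respectively. Taking conditional expectations first, we see that
$$E(\tilde C_{n1})^2 = E\Big\{\frac{1}{n}\sum_{l=1}^n (1 - w_{ll}(2b)) F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]\Big[\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\zeta_n(T_j, Z_l)\Big]^2\Big\}.$$
This is $o(1)$ for two reasons. First,
$$\frac{1}{n}\sum_{j \ne l} |\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\zeta_n(T_j, Z_l)| \le \frac{1}{n}\sum_{j \ne l}\frac{1}{n}\sum_{i \ne j, l}\Big|\frac{\Lambda(T_j, \theta_0)\,\zeta(T_i, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)}\,K_b(Z_i - Z_l) K_b(T_j - T_l)\Big| = O_p(1).$$
Second, $E(1 - w_{ll}(2b)) = o(1)$. Here we use the boundedness of these quantities and $\sup_{i,j,l} E|K_b(Z_i - Z_l) K_b(T_j - T_l)| < \infty$. Therefore
$$\tilde C_{n1} = o_p(1). \quad (1.4.68)$$

Write
$$\hat C_{n1} = \hat C_{n11} + \hat C_{n12}, \quad (1.4.69)$$
where
$$\hat C_{n11} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n w_{ll}(2b)[\delta_l - F(T_l, Z_l, \theta_0)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\tilde\zeta(T_j, Z_l)$$
and
$$\hat C_{n12} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n w_{ll}(2b)[\delta_l - F(T_l, Z_l, \theta_0)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)[\zeta_n(T_j, Z_l) - \tilde\zeta(T_j, Z_l)].$$
Under the assumptions, $\tilde\zeta(t, z)$ is four times differentiable. Apply (1.4.7) of Lemma 2 and part (1) of Lemma 3 to $\zeta_n(T_j, Z_l)$ with $d = 1$ and $r = 4$ to obtain
$$\sup_{Z_l \in [z_1^* + 2b,\, z_2^* - 2b],\ T_j \in \mathcal{T}^0} E_{jl}|\zeta_n(T_j, Z_l) - \tilde\zeta(T_j, Z_l)|^2 = O\Big(\frac{1}{nb}\Big) + O(b^4). \quad (1.4.70)$$
As before, given $(T_i, Z_i)$, $i = 1, \ldots, n$, the variables $\delta_l - F(T_l, Z_l, \theta_0)$ and $\delta_k - F(T_k, Z_k, \theta_0)$ for $l \ne k$ are conditionally independent with mean zero and variances $F(T_l, Z_l)[1 - F(T_l, Z_l)]$ and $F(T_k, Z_k)[1 - F(T_k, Z_k)]$ respectively. Hence
$$E(\hat C_{n12})^2 = E\Big\{\frac{1}{n}\sum_{l=1}^n w_{ll}(2b) F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]\Big[\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)[\zeta_n(T_j, Z_l) - \tilde\zeta(T_j, Z_l)]\Big]^2\Big\}$$
$$\le E\Big\{\frac{1}{n}\sum_{l=1}^n w_{ll}(2b) F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]\Big[\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda^2(T_j, \theta_0) K_b^2(T_j - T_l)\Big]\Big[\frac{1}{n}\sum_{j \ne l} w_{j1}[\zeta_n(T_j, Z_l) - \tilde\zeta(T_j, Z_l)]^2\Big]\Big\}.$$
The last inequality follows from the Cauchy-Schwarz inequality. Since $K$ is bounded, $\sup_{j,l} K_b^2(T_j - T_l) \le C/b^2$ for some finite number $C$. This bound, the boundedness of $F(T_l, Z_l, \theta_0)$ and $\Lambda(T_j, \theta_0)$, and (1.4.70) imply that
$$E(\hat C_{n12})^2 = O\Big(\frac{1}{nb^3}\Big) + O(b^2).$$
This is $o(1)$ because $b = O(n^{-a})$ with $\tfrac{1}{6} < a < \tfrac{1}{4}$ (see assumption (A4)). Therefore $\hat C_{n12} = o_p(1)$, and it follows from (1.4.69) that
$$\hat C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n w_{ll}(2b)[\delta_l - F(T_l, Z_l)]\,\frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\tilde\zeta(T_j, Z_l) + o_p(1).$$
Let
$$\eta_n(T_l, Z_l) = \frac{1}{n}\sum_{j \ne l} w_{j1}\Lambda(T_j, \theta_0) K_b(T_j - T_l)\,\tilde\zeta(T_j, Z_l)$$
and
$$\eta(t, z) = \Lambda(t, \theta_0)\,\tilde\zeta(t, z)\, h_1(t), \qquad (t, z) \in \mathcal{D}_h, \quad (1.4.71)$$
where $\tilde\zeta(t, z)$ is defined in (1.4.66) and $h_1(t)$ is the marginal density of $T$, as defined before. Similarly, we obtain
$$\hat C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n w_{ll}(2b)[\delta_l - F(T_l, Z_l)]\,\eta(T_l, Z_l) + o_p(1).$$
Three facts hold here: $P(w_{ll}(2b) = 0) = O(b)$, $\eta(t, z)$ is bounded, and the conditional expectation of $\delta_l - F(T_l, Z_l, \theta_0)$ given $(T_l, Z_l)$ is 0. Together they give
$$\hat C_{n1} = -\frac{1}{\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\,\eta(T_l, Z_l) + o_p(1). \quad (1.4.72)$$
Since $Q_{n1} = C_{n1}/C_{n0}$ and $C_{n0} - c_0 = o_p(1)$, it follows from (1.4.67), (1.4.68) and (1.4.72) that
$$Q_{n1} = -\frac{1}{c_0\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\,\eta(T_l, Z_l) + o_p(1). \quad (1.4.73)$$
Now we deal with $Q_{n2}$.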
Let $C_{n2} = Q_{n2} C_{n0}$. It follows from (1.4.65) and (1.4.63) that
$$C_{n2} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0) B_{n1}^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}.$$
Since $b = O(n^{-a})$ with $\tfrac{1}{6} < a < \tfrac{1}{4}$ (see assumption (A1)), (1.4.20) and (1.4.21) of Lemma 6 apply. Together with the same arguments used for $C_{n1}$, they give
$$C_{n2} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0) B_{n1}^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} + o_p(1).$$
That is, $B_{n0}^{(j,i)}(T_j, Z_i)$ can be replaced by $h(T_j, Z_i)$ at the cost of an $o_p(1)$ term. By the definition of $B_{n1}^{(j,i)}(T_j, Z_i)$ (see (1.4.15)),
$$C_{n2} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n w_{ii}\,\zeta(T_i, Z_i)\,\frac{1}{n}\sum_{j \ne i} w_{j1}\,\frac{\Lambda(T_j, \theta_0)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)}$$
$$\times\, \frac{1}{n}\sum_{l \ne i, j}[F(T_l, Z_l, \theta_0) - F(T_j, Z_i, \theta_0)]\, K_b(T_l - T_j) K_b(Z_l - Z_i) + o_p(1) =: C_{n20} + o_p(1),$$
say.

To bound $E(C_{n20})^2$, write $C_{n20} = -n^{-5/2}\sum_{i}\sum_{j \ne i}\sum_{l \ne i,j} \psi_{ijl}$, where
$$\psi_{ijl} = w_{ii} w_{j1}\,\frac{\zeta(T_i, Z_i)\Lambda(T_j, \theta_0)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)}\,[F(T_l, Z_l, \theta_0) - F(T_j, Z_i, \theta_0)]\, K_b(T_l - T_j) K_b(Z_l - Z_i).$$
Then
$$E(C_{n20})^2 = \frac{1}{n^5}\sum_{i_1=1}^n \sum_{j_1 \ne i_1} \sum_{l_1 \ne i_1, j_1} \sum_{i_2=1}^n \sum_{j_2 \ne i_2} \sum_{l_2 \ne i_2, j_2} E\big(\psi_{i_1 j_1 l_1}\psi_{i_2 j_2 l_2}\big).$$
Since
$$E\{[F(T, Z, \theta_0) - F(t, z, \theta_0)] K_b(T - t) K_b(Z - z)\} = \iint [F(t + bu, z + bv, \theta_0) - F(t, z, \theta_0)]\, K(u)K(v)\, h(t + bu, z + bv)\, du\, dv = O(b^4)$$
uniformly in $(t, z) \in \mathcal{D}_h^0$, each term with $l_1 \ne l_2$ has expectation of order $O(b^8)$, conditionally on $T_{j_1}, Z_{i_1}, T_{j_2}, Z_{i_2}$ and uniformly in these variables. Hence the sum of the expectations of these terms is of order $O(nb^8)$, which is $o(1)$ if $nb^8 \to 0$. Note also that
$$E\big[|F(T, Z, \theta_0) - F(t, z, \theta_0)|\, |K_b(t - T) K_b(z - Z)|\big] = \iint |F(t + bu, z + bv, \theta_0) - F(t, z, \theta_0)|\, |K(u)K(v)|\, h(t + bu, z + bv)\, du\, dv = O(b).$$
For the terms with $l_1 = l_2$, $i_1 \ne i_2$ and $j_1 \ne j_2$, the expectation conditional on $T_{l_1}, Z_{l_1}$ is of order $O(b)$ uniformly in $T_{l_1}, Z_{l_1}$. Since there are only five free summation indices, the sum of the expectations of these terms is also of order $O(b)$, which is $o(1)$. Similarly, the sum of the expectations of the remaining terms is $o(1)$. Therefore $E(C_{n20})^2 = o(1)$, and hence $C_{n20} = o_p(1)$. Consequently $C_{n2}$, and hence $Q_{n2}$, is $o_p(1)$.
This, (1.4.64) and (1.4.73) imply that
$$Q_n = -\frac{1}{c_0\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\,\eta(T_l, Z_l) + o_p(1). \quad (1.4.74)$$
It follows from (1.4.48), (1.4.57) and (1.4.74) that, on $A_n$,
$$S_n(\theta_0) = \frac{1}{\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\Big[\frac{D_1(T_l, Z_l, \theta_0)}{F(T_l, Z_l, \theta_0)} - \frac{\eta(T_l, Z_l)}{c_0}\Big] + o_p(1). \quad (1.4.75)$$
This and (1.4.47) imply that, on $A_n$,
$$\sqrt{n}(\hat\theta - \theta_0) = d^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\sum_{l=1}^n [\delta_l - F(T_l, Z_l, \theta_0)]\Big[\frac{D_1(T_l, Z_l, \theta_0)}{F(T_l, Z_l, \theta_0)} - \frac{\eta(T_l, Z_l)}{c_0}\Big] + o_p(1),$$
where $\eta(t, z)$ is defined in (1.4.71) and
$$d(\theta_0) = E\Big[\frac{1 - F(T, Z, \theta_0)}{F(T, Z, \theta_0)}\, D_1^2(T, Z, \theta_0)\Big].$$
Since $D_1(t, z, \theta_0)$ and $\eta(t, z)$ are bounded, and $F(t, z, \theta_0)$ is bounded away from 0 on $\mathcal{D}_h$, the theorem follows from the central limit theorem and Lemma 9.

Chapter 2 Sieve Estimation

2.1 Estimation

The second approach uses the idea of sieves and is analogous to that of Rossini and Tsiatis (1996). The goal of this chapter is to estimate $\theta$ efficiently, treating $\alpha(z) = \log(g(z))$ as an infinite-dimensional nuisance parameter. The rescaled (conditional) log-likelihood of $\theta$ and $\alpha$ based on $(T_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$, is
$$L_n(\theta, \alpha) = \frac{1}{n}\sum_{i=1}^n \big[\delta_i \log F(T_i, Z_i, \theta, \alpha) + (1 - \delta_i)\log \bar F(T_i, Z_i, \theta, \alpha)\big]$$
$$= \frac{1}{n}\sum_{i=1}^n \Big[\delta_i \log\big(1 - e^{-\Lambda(T_i, \theta) e^{\alpha(Z_i)}}\big) - (1 - \delta_i)\Lambda(T_i, \theta) e^{\alpha(Z_i)}\Big]. \quad (2.1.1)$$
Here
$$F(t, z, \theta, \alpha) = 1 - e^{-\Lambda(t, \theta) e^{\alpha(z)}}, \qquad \bar F(t, z, \theta, \alpha) = 1 - F(t, z, \theta, \alpha). \quad (2.1.2)$$
To maximize the log-likelihood over all possible $\theta$ and $\alpha$, one would set $\alpha(Z_i) = +\infty$ if $\delta_i = 1$ and $\alpha(Z_i) = -\infty$ if $\delta_i = 0$. Hence the maximum likelihood estimator over all possible functions $\alpha$ does not exist. Instead, the log-likelihood is maximized as $\alpha$ varies over a small set of functions that depends on the sample size. More specifically, we approximate $\alpha$ by a step function with known jump points and maximize the log-likelihood over these step functions. As the number of steps grows with the sample size, the approximation bias disappears. Assume that the covariate lies in a bounded interval; without loss of generality, we take this interval to be $[0, 1]$.
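A minimal sketch of the sieve log-likelihood (2.1.1) follows. In it, $\alpha$ is replaced by a step function on known knots $0 = z_0 < \cdots < z_k = 1$, taking the value `alphas[j-1]` on $(z_{j-1}, z_j]$. The baseline $\Lambda(t, \theta) = t^\theta$ (Weibull form) and the names `step_alpha`, `Lam` and `sieve_log_lik` are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def step_alpha(z, knots, alphas):
    """Step function sum_j alphas[j-1] * I_j(z), where I_j(z) = 1{z_{j-1} < z <= z_j}.

    knots = [z_0, ..., z_k]; alphas has length k. Returns 0 if z lies in no interval."""
    j = bisect_left(knots, z)           # smallest index with knots[j] >= z
    return alphas[j - 1] if 1 <= j <= len(alphas) else 0.0


def Lam(t, theta):
    """Hypothetical baseline cumulative hazard Lambda(t, theta) = t**theta."""
    return t ** theta


def sieve_log_lik(theta, alphas, knots, data):
    """Rescaled log-likelihood of (2.1.1); data = [(T_i, delta_i, Z_i), ...]."""
    total = 0.0
    for t, d, z in data:
        cum = Lam(t, theta) * math.exp(step_alpha(z, knots, alphas))
        # delta = 1 contributes log F = log(1 - e^{-cum}); delta = 0 contributes -cum
        total += math.log(-math.expm1(-cum)) if d == 1 else -cum
    return total / len(data)
```

This also illustrates why the unrestricted maximum does not exist. If a step contains only observations with $\delta_i = 1$, raising its level increases `sieve_log_lik` without bound toward its supremum. Keeping $k$ finite for each $n$ is what makes the maximization well posed.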
To construct the step function, define a partition $0 = z_0 < z_1 < \cdots < z_k = 1$, where $k$ depends on $n$ and increases with $n$. The step function is then defined as
$$\alpha_n(z) = \sum_{j=1}^k \alpha_{nj} I_j(z), \quad (2.1.3)$$
where $I_j(z)$ is the indicator function of the $j$th interval: $I_j(z) = 1$ if $z_{j-1} < z \le z_j$, and zero otherwise. For a fixed partition, the step function is completely specified by the parameters $(\alpha_{n1}, \ldots, \alpha_{nk})$. From here on, $\alpha_n$ therefore denotes either the function given by (2.1.3) or, equivalently, the vector $(\alpha_{n1}, \ldots, \alpha_{nk})$, depending on the context.

The estimate $(\hat\theta, \hat\alpha_n)$ is obtained by maximizing the approximate likelihood formed by substituting (2.1.3) for $\alpha$ in (2.1.1). Since $k$ is an increasing integer-valued function of $n$, written $k(n)$, $\alpha_n$ tends to $\alpha$. The next two sections show that when $k(n) = O(n^\gamma)$ with $\tfrac{1}{4} < \gamma < \tfrac{1}{2}$, the estimator $(\hat\theta, \hat\alpha_n)$ is consistent and $\hat\theta$ is asymptotically normal.

The first and second partial derivatives of the approximate log-likelihood are used to compute the estimates and their variance. In view of (2.1.1), the first derivative with respect to $\theta$ is
$$S_{n0}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\dot\Lambda(T_i, \theta)\, e^{\alpha_n(Z_i)}, \quad (2.1.4)$$
and the first derivative with respect to $\alpha_{nj}$ is
$$S_{nj}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\Lambda(T_i, \theta)\, e^{\alpha_n(Z_i)} I_j(Z_i), \quad j = 1, \ldots, k, \quad (2.1.5)$$
where $\dot\Lambda(t, \theta)$ denotes the derivative of $\Lambda$ with respect to $\theta$. The score vector is
$$S_n(\theta, \alpha_n) = \big(S_{n0}(\theta, \alpha_n), S_{n1}(\theta, \alpha_n), \ldots, S_{nk}(\theta, \alpha_n)\big)^T. \quad (2.1.6)$$
The estimates $(\hat\theta, \hat\alpha_n)$ are defined as a solution to the score equation
$$S_n(\theta, \alpha_n) = 0. \quad (2.1.7)$$
The derivative of $S_n$ with respect to $(\theta, \alpha_n)$ is the Hessian matrix, which is related to the observed information. It is defined as
$$H_n(\theta, \alpha_n) = \frac{\partial}{\partial(\theta, \alpha_n)} S_n(\theta, \alpha_n). \quad (2.1.8)$$
This is the $(k+1) \times (k+1)$ matrix of partial derivatives, with respect to $\theta$ and $\alpha_n$, of the elements of $S_n(\theta, \alpha_n)$. The index 0 refers to the first element, $\theta$.
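The score formulas (2.1.4)-(2.1.5) can be checked against finite differences of the sieve log-likelihood. The sketch below does this under the same hypothetical choices as before: $\Lambda(t, \theta) = t^\theta$, so $\dot\Lambda(t, \theta) = t^\theta \log t$, and a fixed knot vector. The helper names are illustrative and are not taken from the dissertation.

```python
import math
from bisect import bisect_left


def Lam(t, theta):
    """Hypothetical Lambda(t, theta) = t**theta."""
    return t ** theta


def dLam(t, theta):
    """Derivative of the hypothetical Lambda with respect to theta."""
    return t ** theta * math.log(t)


def bin_of(z, knots):
    """1-based j with z_{j-1} < z <= z_j, or 0 if z lies in no interval."""
    j = bisect_left(knots, z)
    return j if 1 <= j < len(knots) else 0


def log_lik(theta, alphas, knots, data):
    """L_n(theta, alpha_n) of (2.1.1) with alpha replaced by the step function (2.1.3)."""
    total = 0.0
    for t, d, z in data:
        j = bin_of(z, knots)
        cum = Lam(t, theta) * math.exp(alphas[j - 1] if j else 0.0)
        total += math.log(-math.expm1(-cum)) if d == 1 else -cum
    return total / len(data)


def score(theta, alphas, knots, data):
    """Score vector (S_n0, S_n1, ..., S_nk) of (2.1.4)-(2.1.6)."""
    s = [0.0] * (len(alphas) + 1)
    for t, d, z in data:
        j = bin_of(z, knots)
        e_alpha = math.exp(alphas[j - 1] if j else 0.0)
        F = -math.expm1(-Lam(t, theta) * e_alpha)
        resid = (d - F) / F                  # (delta_i - F) / F
        s[0] += resid * dLam(t, theta) * e_alpha
        if j:
            s[j] += resid * Lam(t, theta) * e_alpha
    return [v / len(data) for v in s]
```

Each $S_{nj}$ picks up only the observations whose $Z_i$ falls in the $j$th interval. This is why the $\alpha$-block of the Hessian $H_n$ is diagonal.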
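Then the elements of $H_n$ are
$$h_{00}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\ddot\Lambda(T_i, \theta)\, e^{\alpha_n(Z_i)} - \frac{1}{n}\sum_{i=1}^n \frac{\delta_i D_{00}(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)},$$
$$h_{0j}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\dot\Lambda(T_i, \theta)\, e^{\alpha_n(Z_i)} I_j(Z_i) - \frac{1}{n}\sum_{i=1}^n \frac{\delta_i D_{01}(T_i, Z_i, \theta, \alpha_n) I_j(Z_i)}{F(T_i, Z_i, \theta, \alpha_n)}, \quad j = 1, \ldots, k,$$
$$h_{j0}(\theta, \alpha_n) = h_{0j}(\theta, \alpha_n), \quad j = 1, \ldots, k,$$
$$h_{jj}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\Lambda(T_i, \theta)\, e^{\alpha_n(Z_i)} I_j(Z_i) - \frac{1}{n}\sum_{i=1}^n \frac{\delta_i D_{11}(T_i, Z_i, \theta, \alpha_n) I_j(Z_i)}{F(T_i, Z_i, \theta, \alpha_n)}, \quad j = 1, \ldots, k,$$
and
$$h_{ij}(\theta, \alpha_n) = 0, \quad i \ne j,\ i, j = 1, \ldots, k,$$
where
$$D_{00}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\dot\Lambda^2(t, \theta)\, e^{2\alpha_n(z)}, \quad (2.1.9)$$
$$D_{01}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\dot\Lambda(t, \theta)\Lambda(t, \theta)\, e^{2\alpha_n(z)}, \quad (2.1.10)$$
$$D_{11}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\Lambda^2(t, \theta)\, e^{2\alpha_n(z)}, \quad (2.1.11)$$
and $\ddot\Lambda(t, \theta)$ is the second derivative of $\Lambda$ with respect to $\theta$. Expectations are taken with respect to the true parameters $(\theta_0, \alpha_0)$.

2.2 Consistency

To establish consistency and asymptotic normality of the estimator, we use the following assumptions, referred to as Condition A.

(1) The true parameter $\theta_0$ is an interior point of $\Theta$.

(2) Let $\mathcal{T}$ and $\mathcal{Z}$ be the supports of $T$ and $Z$ respectively, where $\mathcal{Z}$ is a closed interval of $\mathbb{R}$. $\Lambda(t, \theta)$ is bounded away from 0 and $\infty$ over $(t, \theta) \in \mathcal{T} \times \mathcal{N}_1$, where $\mathcal{N}_1 = \{\theta : |\theta - \theta_0| \le \Delta\}$ for some $0 < \Delta < \infty$. The density of $(T, Z)$, $h(t, z)$, is bounded on $\mathcal{T} \times \mathcal{Z}$ and Lipschitz continuous in $z$ uniformly for $t \in \mathcal{T}$.

(3) The first and second derivatives of $\Lambda(t, \theta)$ with respect to $\theta$, $\dot\Lambda(t, \theta)$ and $\ddot\Lambda(t, \theta)$, exist and are bounded for $t \in \mathcal{T}$ and $\theta \in \mathcal{N}_1$. They are continuous in $\theta$ for any fixed $t$.

(4) $\alpha_0(z)$ is Lipschitz continuous on $\mathcal{Z}$.

For any function $b(z)$ defined on the support of $Z$, let $\|b\|_\infty = \sup_{z \in \mathcal{Z}} |b(z)|$ and $\|b\| = (Eb^2(Z))^{1/2}$ denote the sup-norm and the $L_2$-norm respectively.

In the following, Theorem 3 states the existence of at least one estimator $(\hat\theta, \hat\alpha_n)$ that solves the score equation and is consistent in sup-norm.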
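The quantities $D_{00}$, $D_{01}$ and $D_{11}$ feed both the positivity condition (2.2.3) and the asymptotic variance $\sigma^2$ in (2.3.1). To get a feel for their size, the sketch below evaluates them by a midpoint rule under a hypothetical version of the Section 2.5 design. That design is taken to have $\Lambda(t, \theta) = t^\theta$, $\alpha_0(z) = \log z$ and $\theta_0 = 1.5$, with $T \sim U[1,2]$ and $Z \sim U[0.2, 1.2]$ assumed independent; the independence is our assumption. The function names and grid size are arbitrary, and the numbers it produces are illustrations, not results from the dissertation.

```python
import math

THETA0 = 1.5  # true theta in the simulation of Section 2.5


def D_terms(t, z, theta=THETA0):
    """D00, D01, D11 of (2.1.9)-(2.1.11) at alpha(z) = log z and Lambda(t, theta) = t**theta."""
    lam = t ** theta
    dlam = lam * math.log(t)
    e2a = z * z                                   # e^{2 alpha(z)} when alpha(z) = log z
    cum = lam * z                                 # Lambda(t, theta) e^{alpha(z)}
    odds = math.exp(-cum) / -math.expm1(-cum)     # Fbar / F
    return odds * dlam**2 * e2a, odds * dlam * lam * e2a, odds * lam**2 * e2a


def information(m=200):
    """Midpoint-rule values of E D00, the left side of (2.2.3), and 1/sigma^2 from (2.3.1).

    Assumes T ~ U[1, 2] and Z ~ U[0.2, 1.2] independent, so (T, Z) has density 1."""
    ts = [1.0 + (a + 0.5) / m for a in range(m)]
    zs = [0.2 + (c + 0.5) / m for c in range(m)]
    ed00 = ed01 = ed11 = proj = 0.0
    for z in zs:
        c00 = c01 = c11 = 0.0                     # E(D.. | Z = z), averaging over t
        for t in ts:
            d00, d01, d11 = D_terms(t, z)
            c00 += d00 / m
            c01 += d01 / m
            c11 += d11 / m
        ed00 += c00 / m
        ed01 += c01 / m
        ed11 += c11 / m
        proj += c01**2 / c11 / m                  # E[(E(D01|Z))^2 / E(D11|Z)]
    return ed00, ed00 - ed01**2 / ed11, ed00 - proj
```

By the Cauchy-Schwarz inequality, the efficient information (the third returned value) never exceeds the quantity in (2.2.3), which in turn never exceeds $E D_{00}$. Its reciprocal gives $\sigma^2$ for this hypothetical design.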
Theorem 4 establishes the convergence rate of the estimator (in L2 norm), which will be used to prove the asymptotic normality of the estimator. The proof of them will be given later. Theorem 3 Assume that Condition A holds, and the number of intervals is increas- ing at a rate k(n) : 717, with O < 7 < 1. Assume also that for all k and 010,, with Hag, — aOHoo < A0 for some positive and finite number A0, P(Ij(Z) =1) 2 0(1), kP(IJ-(Z) = 1) > c, '=1,2,--- ,k, (2.2.1) and F(T12190 10011 (WDIKT. Z, 90, 00n)1j(z)) k . E [D00(T, Z, 90,90,“ - Z > c, (2.2.2) 1:1 2 9(mfiflwamuzammmo) E F(T,Z,9o,ao,,) for some 0 < c < 00, not depending on n. Then there is at least one consistent {in sup-norm) solution to (2.1.7), i.e. there exists at least one (d, 61,) such that lé - 90|+H51n - aoiloo = 012(1). The proof is given in Section 2.6. Theorem 4 Assume that the conditions in Theorem 3 holds. Assume also k(n) = n7, - 1 1 with Z < ”y < 5, and E(001(T1Z190.ao))2 E (D11(T1 21001 00)) E [D00(T, Z, 60, 0(0)] — > 0. (2.2.3) Then the estimator (0162,.) in Theorem 3 has the following convergence rate 9—v=4m%ilnwwm=4mdi The proof is given in Section 2.6. 48 2.3 Asymptotic normality of 6 In this section, the asymptotic normality of the estimator is stated and the proof will be given later. Theorem 5 Assume that the conditions in Theorem 4 hold, and 02 defined below is finite. Assume also that the third derivative of A(t, 6) with respect to 6 exists for 6 in a neighborhood of 60, and is continuous at 60. Then fi(é — 60) _T N(01 02), where the asymptotic variance is given by (2.3.1) (E(DOI(T, Z, 90, a0)|z))2)] —1. 2: ED T29 4' —E o ( oo( , , 0:00)) ( E(D11(T,Z,90,040)IZ) The proof is given in Section 2.6. 2.4 Information bound for 60 The true model has two parameters: 6 is finite dimensional, and oz is an infinite- dimensional functional parameter. 
The semiparametric information bound for estimating $\theta$ is based on the maximum of the asymptotic variance bounds for regular estimators of $\theta$ obtained from parametric submodels of $\alpha$. Section 2.3 states that the estimator $\hat\theta$ is asymptotically normal with asymptotic variance $\sigma^2$ given by (2.3.1). This section shows that this asymptotic variance achieves the bound. Projection methods (Bickel et al. 1993) are used to find the efficient score for the semiparametric model and hence the variance bound.

The log-likelihood of $\theta$ and $\alpha$ based on $(T,\delta,Z)$ is

$$\delta\log\left(1 - e^{-\Lambda(T,\theta)e^{\alpha(Z)}}\right) - (1-\delta)\Lambda(T,\theta)e^{\alpha(Z)}. \tag{2.4.1}$$

Consider a general parametric submodel $\alpha = \alpha_\eta$, indexed by a real variable $\eta$, where $\frac{\partial}{\partial\eta}\alpha_\eta(z)\big|_{\eta=0} = a(z)$ for some function $a(z)$ with $E\,a^2(Z) < \infty$. Differentiating (2.4.1) with respect to $\theta$ and $\eta$ at $(\theta = \theta_0, \eta = 0)$ gives the scores

$$S_\theta(T,Z,\delta,\theta_0,\alpha_0) = \frac{\delta - F(T,Z,\theta_0,\alpha_0)}{F(T,Z,\theta_0,\alpha_0)}\,\dot\Lambda(T,\theta_0)e^{\alpha_0(Z)} \tag{2.4.2}$$

and

$$S_a(T,Z,\delta,\theta_0,\alpha_0) = \frac{\delta - F(T,Z,\theta_0,\alpha_0)}{F(T,Z,\theta_0,\alpha_0)}\,\Lambda(T,\theta_0)e^{\alpha_0(Z)}a(Z). \tag{2.4.3}$$

To find the information bound, project $S_\theta$ onto the linear span of all square-integrable $S_a$. Denote this projection by $S_{a^*}$; it is found by solving

$$E(S_\theta S_a) = E(S_{a^*}S_a) \quad \text{for all } S_a. \tag{2.4.4}$$

Note that the conditional expectation and variance of $\delta$ given $(T,Z)$ are $F(T,Z,\theta_0,\alpha_0)$ and $F(T,Z,\theta_0,\alpha_0)\bar F(T,Z,\theta_0,\alpha_0)$, respectively. Substituting (2.4.2) and (2.4.3) for $S_\theta$ and $S_a$ in (2.4.4), taking the conditional expectation given $(T,Z)$ first and then the expectation with respect to $(T,Z)$, we obtain

$$E\left(D_{01}(T,Z,\theta_0,\alpha_0)a(Z)\right) = E\left(D_{11}(T,Z,\theta_0,\alpha_0)a^*(Z)a(Z)\right),$$

where $D_{01}$ and $D_{11}$ are defined in (2.1.10) and (2.1.11), respectively. Taking the conditional expectation given $Z$ first, and then the expectation with respect to $Z$, gives

$$E\left[E(D_{01}(T,Z,\theta_0,\alpha_0)\mid Z)\,a(Z)\right] = E\left[E(D_{11}(T,Z,\theta_0,\alpha_0)\mid Z)\,a^*(Z)a(Z)\right]. \tag{2.4.5}$$

Since (2.4.5) must hold for every square-integrable $a$, it is solved by

$$a^*(Z) = \frac{E(D_{01}(T,Z,\theta_0,\alpha_0)\mid Z)}{E(D_{11}(T,Z,\theta_0,\alpha_0)\mid Z)},$$

which therefore also solves (2.4.4). Hence the efficient score is

$$S_\theta(T,Z,\delta,\theta_0,\alpha_0) - S_{a^*}(T,Z,\delta,\theta_0,\alpha_0) = \frac{\left(\delta - F(T,Z,\theta_0,\alpha_0)\right)e^{\alpha_0(Z)}}{F(T,Z,\theta_0,\alpha_0)}\left[\dot\Lambda(T,\theta_0) - \Lambda(T,\theta_0)\frac{E(D_{01}(T,Z,\theta_0,\alpha_0)\mid Z)}{E(D_{11}(T,Z,\theta_0,\alpha_0)\mid Z)}\right].$$

The semiparametric information bound equals $E\left[S_\theta(T,Z,\delta,\theta_0,\alpha_0) - S_{a^*}(T,Z,\delta,\theta_0,\alpha_0)\right]^2$, and the asymptotic variance bound is the inverse of the information bound. Taking the conditional expectation of the squared efficient score given $(T,Z)$ first, and then the expectation with respect to $(T,Z)$, gives

$$E\left[S_\theta - S_{a^*}\right]^2 = E\left[\frac{\bar F(T,Z,\theta_0,\alpha_0)e^{2\alpha_0(Z)}}{F(T,Z,\theta_0,\alpha_0)}\left(\dot\Lambda(T,\theta_0) - \Lambda(T,\theta_0)\frac{E(D_{01}(T,Z,\theta_0,\alpha_0)\mid Z)}{E(D_{11}(T,Z,\theta_0,\alpha_0)\mid Z)}\right)^2\right].$$

Expanding the square and taking the conditional expectation given $Z$ first shows that the right-hand side of the previous display equals

$$E\left[D_{00}(T,Z,\theta_0,\alpha_0) - \frac{\left(E(D_{01}(T,Z,\theta_0,\alpha_0)\mid Z)\right)^2}{E(D_{11}(T,Z,\theta_0,\alpha_0)\mid Z)}\right].$$

In view of (2.3.1), the asymptotic variance of $\hat\theta$ achieves the asymptotic variance bound.

2.5 Simulation

Before proving the stated asymptotic properties of the estimator, we present a simulation study. As in Section 1.3, assume that the conditional distribution of $X$ given $Z$ is Weibull with distribution function $1 - e^{-t^{\theta_0}e^{\alpha_0(z)}}$, where $\alpha_0(z) = \log(z)$. Also assume that $T$ and $Z$ are uniformly distributed on $[1,2]$ and $[0.2,1.2]$, respectively. For each sample size ($n$ = 30, 60, 100, 200, 500 and 1000) and appropriate values of $k$, 100 samples are generated with true parameter $\theta_0 = 1.5$, giving 100 replications of the sieve maximum likelihood estimate (SMLE) of $\theta_0$. The means and standard deviations of these estimates are shown in the following table.

Table 2. Simulation results for the SMLE

n     k    mean    s.d.
30    ?    1.6976  1.8180
30    ?    2.1060  2.0269
30    ?    2.4360  2.9598
60    ?    1.7064  1.2145
60    ?    1.8189  1.2680
60    ?    1.9675  1.4248
100   ?    1.5954  0.8047
100   ?    1.6427  0.8103
100   ?    1.6932  0.8502
200   ?    1.5624  0.5154
200   ?    1.5838  0.5330
200   ?    1.6240  0.5365
500   ?    1.5591  0.2946
500   ?    1.5671  0.2893
500   ?    1.6076  0.2964
1000  10   1.5432  0.2136
1000  15   1.5530  0.2125
1000  20   1.5651  0.2177

The table shows that, for small sample sizes, the bias and variance are slightly larger than those of the generalized profile maximum likelihood estimator (see Table 1). However, both decrease as the sample size increases, and the variance will eventually fall below that of the generalized profile maximum likelihood estimator, since the SMLE achieves the semiparametric lower bound. Unfortunately, a very large sample size is needed for this to happen, as a comparison of the above table with the simulation results for the generalized profile maximum likelihood estimator in Section 1.3 shows.

2.6 Proof of the theorems

2.6.1 Proof of Theorem 3

We first define sup-norms for vectors and matrices. If $a$ is a vector with elements $a_j$, $1 \le j \le m$, then $\|a\|_\infty = \max_{1\le j\le m}|a_j|$. If $A$ is an $m$ by $m$ matrix whose $(i,j)$ element is $a_{ij}$, then

$$\|A\|_\infty = \max_{1\le i\le m}\left(\sum_{j=1}^m |a_{ij}|\right).$$

Now define a step function $\alpha_{0n}$ of the form (2.1.3) as an approximation to $\alpha_0$:

$$\alpha_{0n}(z) = \sum_{j=1}^{k(n)} \alpha_0(z_j)I_j(z).$$

The Lipschitz continuity of $\alpha_0$ implies that

$$\|\alpha_{0n} - \alpha_0\|_\infty = O(k(n)^{-1}). \tag{2.6.1}$$

Let $\beta_n = (\theta, a_1, \dots, a_k)$, $\beta_{0n} = (\theta_0, \alpha_0(z_1), \dots, \alpha_0(z_k))$ and $\beta_0 = (\theta_0, \alpha_0)$.

Note that $S_n(\beta_n) = 0$ is equivalent to

$$\tilde S_n(\beta_n) := \begin{pmatrix} S_{n0}(\beta_n) \\ kS_{n1}(\beta_n) \\ \vdots \\ kS_{nk}(\beta_n) \end{pmatrix} = 0. \tag{2.6.2}$$

The derivative of $\tilde S_n(\beta_n)$ with respect to $\beta_n$ is

$$\tilde H_n(\beta_n) = \begin{pmatrix} h_{00}(\beta_n) & h_{01}(\beta_n) & \cdots & h_{0k}(\beta_n) \\ kh_{01}(\beta_n) & kh_{11}(\beta_n) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ kh_{0k}(\beta_n) & 0 & \cdots & kh_{kk}(\beta_n) \end{pmatrix}, \tag{2.6.3}$$

where the $h_{ij}$ are defined in Section 2.1. The lower-right $k$ by $k$ submatrix is diagonal. Let $\tilde S(\beta_n) = E\tilde S_n(\beta_n)$, with the expectation taken elementwise.
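Then, by (2.6.2), (2.1.4), (2.1.5) and the fact that the conditional expectation of $\delta_i$ given $(T_i,Z_i)$ is $F(T_i,Z_i,\theta_0,\alpha_0)$, we obtain

$$\tilde S(\beta_n) = \begin{pmatrix} E\left(A(T,Z,\beta_0,\beta_n)\dot\Lambda(T,\theta)\right) \\ kE\left(A(T,Z,\beta_0,\beta_n)\Lambda(T,\theta)I_1(Z)\right) \\ \vdots \\ kE\left(A(T,Z,\beta_0,\beta_n)\Lambda(T,\theta)I_k(Z)\right) \end{pmatrix}, \tag{2.6.4}$$

where

$$A(T,Z,\beta_0,\beta_n) = \left(\frac{F(T,Z,\theta_0,\alpha_0)}{F(T,Z,\theta,\alpha_n)} - 1\right)e^{\alpha_n(Z)}.$$

By (2.1.2) and Assumptions (2) and (3) of Condition A, $F(t,z,\theta,\alpha)$ is Lipschitz in $\theta$ and $\alpha$, uniformly for $(t,z) \in \mathcal{T}\times\mathcal{Z}$. It follows easily that $\|\tilde S(\beta_n)\|_\infty = o(1)$ if $\|\beta_n - \beta_{0n}\|_\infty = O(k^{-1})$ and $P(I_j(Z) = 1) = o(1)$ for $j = 1,\dots,k$.

Let $\tilde H(\beta_n) = E\tilde H_n(\beta_n)$. Similarly, by (2.6.3) and the definition of $h_{ij}(\beta_n)$, $0 \le i,j \le k$ (see Section 2.1),

$$\tilde H(\beta_n) = \begin{pmatrix} b_{00}(\beta_n) & b_{01}(\beta_n) & \cdots & b_{0k}(\beta_n) \\ kb_{01}(\beta_n) & kb_{11}(\beta_n) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ kb_{0k}(\beta_n) & 0 & \cdots & kb_{kk}(\beta_n) \end{pmatrix},$$

where

$$b_{00}(\beta_n) = E\left[\left(R(T,Z,\beta_0,\beta_n) - 1\right)\ddot\Lambda(T,\theta)e^{\alpha_n(Z)}\right] - E\left(R(T,Z,\beta_0,\beta_n)D_{00}(T,Z,\theta,\alpha_n)\right),$$

$$b_{0j}(\beta_n) = E\left[\left(R(T,Z,\beta_0,\beta_n) - 1\right)\dot\Lambda(T,\theta)e^{\alpha_n(Z)}I_j(Z)\right] - E\left(R(T,Z,\beta_0,\beta_n)D_{01}(T,Z,\theta,\alpha_n)I_j(Z)\right), \quad j = 1,\dots,k,$$

$$b_{jj}(\beta_n) = E\left[\left(R(T,Z,\beta_0,\beta_n) - 1\right)\Lambda(T,\theta)e^{\alpha_n(Z)}I_j(Z)\right] - E\left(R(T,Z,\beta_0,\beta_n)D_{11}(T,Z,\theta,\alpha_n)I_j(Z)\right), \quad j = 1,\dots,k,$$

and

$$R(T,Z,\beta_0,\beta_n) = \frac{F(T,Z,\theta_0,\alpha_0)}{F(T,Z,\theta,\alpha_n)}.$$

Notice that $\tilde H(\beta_n) = \frac{\partial}{\partial\beta_n}\tilde S(\beta_n)$. The inverse of $\tilde H(\beta_n)$ is as follows [...] $\mathbb{R}$ be a suitably chosen function. We are interested in the properties of an estimate $\hat\beta_n$ over a subset $\Theta_n$ of $\Theta$ obtained by maximizing the empirical criterion $C_n(\beta) = \frac{1}{n}\sum_{i=1}^n l(\beta, Y_i)$, that is, $C_n(\hat\beta_n) = \max_{\beta\in\Theta_n}C_n(\beta)$. Here $\Theta_n$ approximates $\Theta$ in the sense that for any $\beta \in \Theta$ there exists $\pi_n\beta \in \Theta_n$ such that, for an appropriate pseudo-distance $\rho$, $\rho(\pi_n\beta, \beta) \to 0$ as $n \to \infty$. The lemma requires the following assumptions.

C0. $l$ is bounded.

C1. For some constants $A_1 > 0$ and $a > 0$, and for all small $\varepsilon > 0$,

$$\inf_{\rho(\beta,\beta_0)\ge\varepsilon,\ \beta\in\Theta_n} E\left(l(\beta_0,Y) - l(\beta,Y)\right) \ge 2A_1\varepsilon^{2a}.$$

C2. For some constants $A_2 > 0$ and $b > 0$, and for all small $\varepsilon > 0$,

$$\sup_{\rho(\beta,\beta_0)\le\varepsilon,\ \beta\in\Theta_n} \mathrm{Var}\left(l(\beta_0,Y) - l(\beta,Y)\right) \le 2A_2\varepsilon^{2b}.$$

C3. Let $\mathcal{F}_n = \{l(\beta,\cdot) - l(\pi_n\beta_0,\cdot) : \beta \in \Theta_n\}$.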
For some constants $r_0 < \frac12$ and $A_3 > 0$,

$$H(\varepsilon,\mathcal{F}_n) \le A_3 n^{2r_0}\log\left(\frac{1}{\varepsilon}\right) \quad \text{for all small } \varepsilon > 0,$$

where $H(\varepsilon,\mathcal{F}_n)$ is the $L_\infty$-metric entropy of $\mathcal{F}_n$; that is, $\exp(H(\varepsilon,\mathcal{F}_n))$ is the smallest number of $\varepsilon$-balls in the $L_\infty$-metric needed to cover $\mathcal{F}_n$.

Lemma 11 Suppose Assumptions C0 to C3 hold. Then

$$\rho(\hat\beta_n,\beta_0) = O_p\left(\max\left(n^{-\tau},\ \rho(\pi_n\beta_0,\beta_0),\ K^{1/(2a)}(\pi_n\beta_0,\beta_0)\right)\right),$$

where $K(\pi_n\beta_0,\beta_0) = E\left(l(\beta_0,Y) - l(\pi_n\beta_0,Y)\right)$ and

$$\tau = \begin{cases} \dfrac{1-2r_0}{2a} - \dfrac{\log\log n}{2a\log n}, & \text{if } b \ge a, \\[2mm] \dfrac{1-2r_0}{4a-2b}, & \text{if } b < a. \end{cases}$$

From the proof of Theorem 1 of Shen and Wong (1994), the global maximizer can be replaced by a local maximizer around the true parameter, and the convergence rate still holds for the local maximizer. In this situation, the sieve $\Theta_n$ is a sequence of shrinking neighborhoods of the true parameter $\beta_0$.

To apply the lemma to our case, let $Y = (T,Z,\delta)$, $\beta = (\theta,\alpha)$ and $\pi_n\beta = (\theta,\alpha_n)$, where $\alpha_n$ is of the form (2.1.3) with $a_{nj} = \alpha(z_j)$. Also let

$$\Theta_n = \{(\theta,\alpha_n) : |\theta - \theta_0| \le a_n,\ \|\alpha_n - \alpha_0\|_\infty < b_n\},$$

where $a_n$ and $b_n$ are chosen such that, with probability approaching 1, $(\hat\theta,\hat\alpha_n)$ is the maximum point in $\Theta_n$. Define the metric

$$\rho(\beta,\beta_0) = |\theta - \theta_0| + \|\alpha - \alpha_0\|, \tag{2.6.8}$$

and define

$$l(\beta,Y) = \delta\log\left(1 - e^{-\Lambda(T,\theta)e^{\alpha(Z)}}\right) - (1-\delta)\Lambda(T,\theta)e^{\alpha(Z)}.$$

Under our assumptions, C0 holds. Consider $E\,l(\beta,Y)$. Taking a Taylor expansion of $l(\beta,Y)$ with respect to $\theta$ and $\alpha$, and noting that the expectation of the first derivative vanishes at $\beta_0$ and that the matrix of second derivatives is negative definite by (2.2.3), we obtain

$$E\left(l(\beta_0,Y) - l(\beta,Y)\right) \ge c\,\rho^2(\beta,\beta_0) \tag{2.6.9}$$

for some finite and positive number $c$. Hence C1 holds with $a = 1$. Under Condition A, it is easy to see that

$$\mathrm{Var}\left(l(\beta_0,Y) - l(\beta,Y)\right) \le E\left(l(\beta_0,Y) - l(\beta,Y)\right)^2 \le C\rho^2(\beta,\beta_0)$$

for some $0 < C < \infty$. Thus C2 holds with $b = 1$. Since, for all $y$,

$$|l(\beta,y) - l(\beta_0,y)| \le C\left(|\theta - \theta_0| + \|\alpha - \alpha_0\|_\infty\right)$$

for some $0 < C < \infty$ not depending on $y$, it is easy to see that $H(\varepsilon,\mathcal{F}_n) \le H(\varepsilon/C,\Theta_n)$, where $H(\eta,\Theta_n)$ is the metric entropy of $\Theta_n$ with respect to the norm $|\theta - \theta_0| + \|\alpha - \alpha_0\|_\infty$.

Since $\Theta_n$ is a sequence of shrinking neighborhoods of $\beta_0 = (\theta_0,\alpha_0)$, there exists a positive and finite number $C_0$ such that $|\theta| \le C_0$ and $\|\alpha_n\|_\infty \le C_0$ for $(\theta,\alpha_n) \in \Theta_n$ with $\alpha_n$ of the form (2.1.3). For any $\eta > 0$, divide the interval $[-C_0, C_0]$ into subintervals of length $\eta/2$ or less, such that the number of subintervals is at most $4C_0/\eta + 1$. Then, for $\eta$ small enough,

$$H(\eta,\Theta_n) \le \log\left(\left(\frac{4C_0}{\eta} + 1\right)\left(\frac{4C_0}{\eta} + 1\right)^{k(n)}\right) \le Ck(n)\log\left(\frac{1}{\eta}\right)$$

for some positive and finite constant $C$. Hence, for small $\varepsilon > 0$,

$$H(\varepsilon,\mathcal{F}_n) \le Ck(n)\log\left(\frac{1}{\varepsilon}\right) = Cn^\gamma\log\left(\frac{1}{\varepsilon}\right)$$

for some positive and finite number $C$. Therefore C3 holds with $r_0 = \gamma/2$. Applying the lemma, we obtain

$$\rho(\hat\beta,\beta_0) = O_p\left(\max\left(n^{-\tau},\ \rho(\pi_n\beta_0,\beta_0),\ K^{1/2}(\pi_n\beta_0,\beta_0)\right)\right), \tag{2.6.10}$$

where

$$\tau = \frac{1-\gamma}{2} - \frac{\log\log n}{2\log n}.$$

Note that, for large $n$, $\frac14 < \tau < \frac38$, since $\frac14 < \gamma < \frac12$. Since $\beta_0 = (\theta_0,\alpha_0)$ and $\pi_n\beta_0 = (\theta_0,\alpha_{0n})$, where $\alpha_{0n}$ is of the form (2.1.3), (2.6.8) and (4) of Condition A give

$$\rho^2(\pi_n\beta_0,\beta_0) = \|\alpha_{0n} - \alpha_0\|^2 \le Ck(n)^{-2} = Cn^{-2\gamma}$$

for some positive and finite number $C$. Thus

$$\rho(\pi_n\beta_0,\beta_0) \le Cn^{-\gamma}. \tag{2.6.11}$$

The same argument as that leading to (2.6.9) gives, for some finite and positive number $C$,

$$K(\pi_n\beta_0,\beta_0) = E\left(l(\beta_0,Y) - l(\pi_n\beta_0,Y)\right) \le C\|\alpha_{0n} - \alpha_0\|^2 \le Cn^{-2\gamma}, \tag{2.6.12}$$

which lies between the orders $n^{-1}$ and $n^{-1/2}$ for $\frac14 < \gamma < \frac12$. It follows from (2.6.10), (2.6.11) and (2.6.12) that, for $\frac14 < \gamma < \frac12$, $\rho(\hat\beta,\beta_0) = o_p(n^{-1/4})$. The theorem is proved.

2.6.3 Proof of Theorem 5

$S_{n0}(\theta,\alpha)$ was defined in (2.1.4). Further, for a function $a$ on $\mathcal{Z}$ with $E(a(Z))^2 < \infty$, denote

$$S_n(\theta,\alpha)[a] = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i - F(T_i,Z_i,\theta,\alpha)}{F(T_i,Z_i,\theta,\alpha)}\,\Lambda(T_i,\theta)e^{\alpha(Z_i)}a(Z_i),$$

where $F(t,z,\theta,\alpha)$ is defined in (2.1.2). Denote the expectations of $S_{n0}$ and $S_n[a]$ by $S_0$ and $S[a]$, respectively. Since the conditional expectation of $\delta_i$ given $(T_i,Z_i)$ is $F(T_i,Z_i,\theta_0,\alpha_0)$, we obtain

$$S_0(\theta,\alpha) = E\left[\frac{F(T,Z,\theta_0,\alpha_0) - F(T,Z,\theta,\alpha)}{F(T,Z,\theta,\alpha)}\,\dot\Lambda(T,\theta)e^{\alpha(Z)}\right]$$