TESTS OF HOMOGENEITY IN TWO-COMPONENT MIXTURE MODELS

By

Wei-Wen Hsu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Statistics

2011

ABSTRACT

TESTS OF HOMOGENEITY IN TWO-COMPONENT MIXTURE MODELS

By Wei-Wen Hsu

In many applications of two-component mixture models, such as zero-inflated models and cure rate models, it is often of interest to conduct inferences for the mixing weights. Score and Wald tests have been particularly useful for this purpose. Existing testing procedures, however, often rely on restrictive assumptions, such as the constancy of the mixing weights under the alternative. In this thesis, we suggest an extension to covariate-dependent mixing weights, which are very typical in real applications of these finite mixture models. We discuss these tests under the two representations of the mixture model: the marginal and the hierarchical representations. Under the marginal representation of the mixture model, we propose a technique based on a decomposition of the mixing weights into terms that have an obvious statistical interpretation, and we make good use of this decomposition to lay the foundation of the test. We apply this technique to both the score and Wald tests. An important feature of score tests is that they do not require the alternative model to be fitted. However, the advent of computer software with robust functions and procedures has made routine fitting of the alternative model simple in practice. We make good use of this opportunity by using results generated from these fits to develop a Wald test statistic for evaluating homogeneity in this class of models. We show how the proposed Wald test can be performed with minimal programming effort by the practicing statistician. Simulation results show that the proposed score test and Wald test statistics can greatly improve efficiency over test statistics based on constant mixing weights.
A real-life example from dental caries research is used to illustrate the methodology.

The tests of homogeneity developed under the marginal representation of the mixture model are usually based on two-sided alternatives with potentially negative mixing weights. These tests, however, are not valid for two-component models that maintain the hierarchical representation of the mixture. Yet in practice, zero-inflated models that maintain this representation are usually fit. To assess the inclusion of mixing weights in two-component models under the hierarchical representation, we adopt a general formulation in which the mixing weight is allowed to depend on covariates. A complication is that some nuisance parameters are unidentified under the null. One possible solution to this identifiability problem is to fix these nuisance parameters, conditional upon which the pivotal function results in processes of these parameters. We extend this idea by deriving a score-based test statistic from these processes. We establish the limiting null distribution of this statistic as a functional of chi-squared processes, which is rigorously approximated by a simple resampling procedure. We apply these tests to zero-inflated models for discrete data. Our simulation results again show that the proposed tests can greatly improve efficiency over tests based on constant mixing weights. The practical utility of the methodology is illustrated using a real-life data example, the dental caries data.

ACKNOWLEDGMENT

I owe my deepest gratitude to my advisor, Professor David Todem, for his guidance and encouragement during my years as a student at Michigan State University. I especially thank Dr. Todem for the financial support from his various funded projects. It is a great honor for me to work with him and to be his student. I would also like to show my gratitude to Dr. Yuehua Cui for stimulating discussions about my research work and for providing valuable help with my job search.
I also wish to thank my other committee members, Dr. Lijian Yang and Dr. Connie Page, for their many comments and suggestions. A special thanks goes to Dr. Hira Koul for his valuable and helpful suggestions. Their input improved my research, which resulted in a much better final product. I am further grateful to Dr. Jessica V. Barnes of University Outreach and Engagement, Michigan State University, for her encouragement and financial support during the second and third years of my PhD work. During my tenure as a research assistant on her team, I gained valuable experience in analyzing large data sets. I am very grateful to Professor Lijian Yang for his encouragement and, most importantly, for providing technical help during my preparation for the Statistics PhD preliminary exam. Without his help, it would have been very hard for me to succeed in this challenging exam on my own. Next, I would like to acknowledge the encouragement of my best friend Robert Chuang, and the support of my older sisters Pei-Feng and Pei-Jing, my father Hsin-Yi, and my mother Tsai-Feng. Without your support and encouragement, my dream would never have come true. Special thanks to my wife Shu-Wei Fang and my two lovely daughters, Chelsea Hsu and Adeline H. Fang. Because of their love and support, I overcame all obstacles during challenging moments and completed all requirements of the degree. This accomplishment truly belongs to them. For all of this, I would like to tell them: "Thank you so much and I love you all."

TABLE OF CONTENTS

List of Figures
List of Tables

1 Introduction
  1.1 Background and objectives
  1.2 Principles for solutions
  1.3 Outline of the thesis
2 Tests of homogeneity under the marginal representation
  2.1 Introduction
  2.2 The method
    2.2.1 Hypothesis formulation and reparameterization
    2.2.2 Generalized score tests for homogeneity
    2.2.3 Generalized Wald tests for homogeneity
  2.3 Numerical study
    2.3.1 Empirical performances of proposed score tests
    2.3.2 Empirical performances of proposed Wald tests
    2.3.3 Dental caries data
  2.4 Discussion

3 Tests of homogeneity under the hierarchical representation
  3.1 Introduction
  3.2 The main framework
    3.2.1 Hypothesis formulation and reparameterization
    3.2.2 Covariate-adjusted score statistic
    3.2.3 Large sample properties
    3.2.4 Specification of the support set Γ
  3.3 Numerical study
    3.3.1 The score test for zero-inflated Poisson models
    3.3.2 Application to dental caries data
  3.4 Discussion

4 Summary, discussion and future study
  4.1 Summary
  4.2 A score-type test for positive functions
  4.3 Bias-adjustment of the score test
  4.4 Future study
Appendices

A Calculation of the information matrix
  A.1 Non-degenerate Poisson model with y* = 0
  A.2 Non-degenerate binomial model with y* = 0

B Example SAS code for the Wald test

C Proofs of Theorems
  C.1 Large sample properties of score functions
  C.2 Justification of the resampling procedure

Bibliography

LIST OF FIGURES

2.1 Probability mass functions fitted under the null model (homogeneous negative binomial distribution) by age group for the dental caries data

LIST OF TABLES

2.1 Natural parameterizations relating π_i to the linear predictor z_i^t γ for some commonly used two-component mixture models and link functions relating µ_i to the linear predictor x_i^t β when y* = 0

2.2 Empirical size of score test statistics when the null model is a non-degenerate Poisson model with mean µ_i* = exp{β_0* − 1.45 x_i}, x_i ~ Uniform(0, 1), at the 5% significance level

2.3 Empirical power of score test statistics to detect various forms ω_i* of heterogeneity coupled with a non-degenerate Poisson model with mean µ_i* = exp{β_0* − 1.45 x_i}, x_i ~ Uniform(0, 1), at the 5% significance level

2.4 Finite-sample size of Wald tests at the 5% significance level based on 1000 Monte Carlo samples for a negative binomial model with mean µ_i* = exp{β_0* + 1.45 x_i}, overdispersion parameter κ = 1.1, x_i ~ Uniform(0, 1)
2.5 Power of Wald tests at the 5% significance level based on 1000 Monte Carlo samples for a negative binomial model with mean µ_i* = exp{β_0* + 1.45 x_i} and overdispersion parameter κ = 1.1, x_i ~ Uniform(0, 1)

2.6 Score test and Wald test statistics, degrees of freedom and associated p-values for heterogeneity in the dental caries data when the null model is a non-degenerate negative binomial distribution

2.7 Model fitting for the number of decayed surfaces under H0 and various H1 when the baseline distribution is negative binomial with mean µ_i = exp{β_0 + β_1·Age_i + β_2·SI_i + β_3·Age_i·SI_i} and overdispersion parameter κ

3.1 Size of score-based test statistics assuming a homogeneous model

3.2 Power of score-based test statistics assuming well-specified models

3.3 Power of score-based test statistics under various model misspecifications

3.4 Score-based test values (p-values) for the dental caries data

4.1 Empirical size of the bias-adjusted score test statistics assuming a homogeneous Poisson model with mean λ_i* = exp{θ_0* + 0.4 x_{1i} + 0.5 x_{2i}}, at the 5% significance level

Chapter 1

Introduction

1.1 Background and objectives

Under a two-component mixture model, observations are generated from two sources, representing heterogeneity in the population. The mixing weight measures the extent of this heterogeneity in the population. In the sequel, the term heterogeneity will be used to characterize this mixture in distribution. Likewise, the term homogeneity will be used to characterize a population generated from a single-component distribution. In other words, under a homogeneous model, the mixing weight equals zero. Two-component mixture models provide an interesting parametric framework to accommodate heterogeneity in the population.
Typical examples of such mixture models include the class of zero-inflated models for discrete data and cure models in survival analysis. Zero-inflated models for discrete data have been extensively studied in statistical research and applied in the medical, social, and behavioral sciences. These models view the data as being generated from a mixture of a point mass at zero and a non-degenerate discrete distribution. One special case of this mixture model is the zero-inflated Poisson model, first studied by Mullahy (1986), Farewell and Sprott (1988) and Lambert (1992), and its extension to dependent discrete data by Hall (2000). Another important example is the zero-inflated binomial model (Lambert, 1992; Hall, 2000). These two-component models allow for two types of representations, depending on the support of the mixing weights. Under the hierarchical representation, the mixing weights are probability masses, subject to the distributional constraint of being bounded between 0 and 1. Under the marginal representation, however, the only distributional constraint is imposed on the marginal distribution, which may result in potentially negative mixing weights. The marginal model can therefore accommodate both zero-inflated data (positive mixing weight) and zero-deflated data (negative mixing weight), a property unfortunately not shared by its hierarchical counterpart. It should be strongly emphasized, however, that the marginal model does not allow any hierarchical interpretation of the mixture model when the mixing weight is negative. A prevailing concern for this class of models is whether the inherent heterogeneity is consistent with the observed data; in other words, one is typically interested in whether the inclusion of mixing weights is necessary. It is well established that mixture models are prone to efficiency loss, especially when heterogeneity does not hold.
Fitting homogeneous models, however, may yield biased inferences if the observed data support heterogeneity. Many authors have examined this issue in a variety of settings using score statistics. In a Biometrics paper, van den Broek (1995) examined this important issue using a score test statistic. This test was derived under the marginal representation of the mixture model, which ignores its hierarchical representation but potentially improves the model fit. While this approach is valid in principle when the score function is evaluated at the null value, it is often unclear how the constraints on the mixing weights from the marginal model are accounted for in the testing scheme. Most importantly, existing methodologies have not considered a general structure for the mixing weights. One common limitation is that constant mixing weights are often assumed under the alternative hypothesis. From a practical perspective, this is an important limitation, as covariate-dependent mixing weights are very typical in real applications of zero-mixture regression models. In this thesis, we suggest an extension of the test to covariates. A recent illustration of this approach for zero-inflated models that do not maintain their hierarchical representation is given by Jansakul and Hinde (2009). These authors related the mixing weights to covariates using an identity link function. A limitation, however, is that the identity link function is seldom used in practice for this class of models. For mixture models that maintain their hierarchical representation, we are not aware of any test for heterogeneity that depends on covariates. In this thesis, we therefore also suggest a new test which assumes that heterogeneity depends on covariates. Complications are that the implied hypotheses may not be typical and standard regularity conditions to conduct the test may not hold.
There may be: (i) hypothesized parameters that are evaluated at infinity, leading to parameters under the null being on the boundary of the parameter space; (ii) one-sided composite hypotheses under the alternatives; (iii) some nuisance parameters that are unidentified under the null while others are identified; and (iv) uncertainty regarding the support space of the unidentified nuisance parameters under the null. From a methodological standpoint, there have been several papers in the literature which examine related issues and complications, but these works accommodate only one or two of these complications. We are not aware of any testing procedure that addresses all these complications simultaneously. The problem of parameters under the null being on the boundary of the parameter space has been extensively discussed in the literature. A seminal work on the topic is that of Chernoff (1954); more recent papers include Self and Liang (1987) and Andrews (2001) and references therein. Several papers also consider score tests for one-sided alternatives (see, for example, Silvapulle and Silvapulle, 1995; Verbeke and Molenberghs, 2003), but these are often discussed in situations where the nuisance parameters are fully identified under the null. Various approaches to the problem of unidentifiable nuisance parameters under the null have appeared in the literature. One approach requires the unidentifiable nuisance parameters to be estimated from data under the alternative hypothesis (see, for example, Conniffe, 2001). Another popular approach to address non-identifiability under the null is to fix the nuisance parameters, conditional upon which the pivotal function results in processes of these parameters (see, for example, Davies, 1977; Hansen, 1996; Andrews, 2001; Ritz and Skovgaard, 2005; Zhu and Zhang, 2006; Song et al., 2009). This approach has been by far the most popular method for dealing with unidentifiable parameters under the null.
Finally, in papers which discuss the issue of unidentified nuisance parameters under the null, it is often assumed that the support space of these parameters is known to the analyst. For example, when testing the hypothesis that q ≥ 2 random-effects variances are equal to zero in a random coefficients model, the unidentified nuisance parameters under the null are typically the correlations among the random coefficients, which have a well-defined space (see Zhu and Zhang, 2006 for a specific example). In our testing problem, as formulated in Chapter 3, the support of the nuisance parameter may not be known to the analyst a priori.

1.2 Principles for solutions

We formulate and develop a generalized score test of homogeneity based on an intuitive approach that accommodates the features of the marginal model. Specifically, we embed the structural constraints of the mixing weights under the marginal model into the procedure. The technique is based on decomposing the mixing weights into terms that have an obvious statistical interpretation. We make good use of this decomposition to lay the foundation of the test. One appealing feature of this decomposition is that it naturally incorporates covariates into the mixing weights. We show that this is a natural strategy to adopt when testing is carried out under the marginal model that ignores the hierarchical representation of the mixture distribution. We further show that the test, in its current formulation, can be naturally used to study heterogeneity at any point of the population and for any non-degenerate parametric distributions that are not necessarily members of the exponential family. For mixture models that maintain their hierarchical representation, we extend the idea of pivotal functions that result in processes of unidentified parameters to evaluate the hypothesis of interest.
Specifically, for each fixed value of the unidentified parameters, we construct a modified score function in the spirit of Silvapulle and Silvapulle (1995) to capture one-sided alternatives. The statistic is then constructed by mapping functionals, such as the supremum, of these score-based processes over the whole support of the unidentified nuisance parameters to the real line. We approximate the limiting null distribution of this statistic using a simple resampling procedure.

1.3 Outline of the thesis

In Chapter 2, we describe the generalized tests of homogeneity in two-component models under the marginal representation of the mixture. We specify the homogeneity hypothesis at any point of the population and, for inference, we develop a generalized score test and a Wald test that accommodate the features of the marginal model. The empirical performance of the tests is studied and their real-life application is illustrated using dental caries counts in young children. Some remaining issues are discussed in the last section of the chapter. In Chapter 3, we describe a generalized one-sided score test for homogeneity of zeros against alternatives from mixture models that maintain their hierarchical representation with covariate-dependent mixing probabilities. We discuss the test for the class of zero-inflated models for discrete data as a special case. For this special case, we conduct numerical studies to evaluate the size and power of the proposed testing procedure, and we illustrate its practical utility using a real-life example from dental caries research. Finally, in Chapter 4, we summarize our work and discuss: (i) how a supremum score statistic can be used to evaluate non-standard parametric restrictions that are positive in the maintained hypothesis, and (ii) an adjustment of the supremum score test to reduce its conservativeness in small to moderate sample sizes.
In particular, the first extension mentioned above is presented in a more general framework which does not require the full distribution of the population to be specified. Instead, an estimating function for a finite-dimensional component of the model is assumed, while other aspects of the model, which may be described by infinite-dimensional parameters, are left completely unspecified. Finally, we present some topics that could be investigated in future work.

Chapter 2

Tests of homogeneity under the marginal representation

2.1 Introduction

Suppose that a sample of n independent realizations {y_1, ..., y_n} of some random variable Y is drawn from P_i, a mixture of two distributions Q and G, related in the simple form

P_i(y_i) = ω_i Q(y_i) + (1 − ω_i) G(y_i), i = 1, ..., n, (2.1)

where ω_i is an unknown mixing weight, Q is an atom distribution at a known point y* and G is an unknown non-degenerate discrete distribution. Under this mixture model, observations are generated from two sources, representing heterogeneity in the population. The mixing weight ω_i measures the extent of this heterogeneity in the population. In the sequel, the term heterogeneity will be used to characterize this mixture in distribution. Likewise, the term homogeneity will be used to characterize a population generated from a single-component distribution. In other words, under a homogeneous model, the mixing weight ω_i equals zero. Special cases of model (2.1) with y* = 0 include zero-inflated models for independent data, first studied by Mullahy (1986), Farewell and Sprott (1988) and Lambert (1992), and their extension to correlated data by Hall (2000). This class of models has been extensively studied in statistical research and applied to data from various disciplines including agriculture, econometrics, medicine, engineering, sociology and the behavioral sciences. Ridout, Demetrio and Hinde (1998) provide an extensive review of this literature.
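To make the special case y* = 0 with G Poisson (the zero-inflated Poisson model) concrete, the sketch below simulates from the hierarchical form of model (2.1) and compares the observed zero frequency with what a homogeneous Poisson would predict. This is a minimal illustration, not code from the thesis; the values ω = 0.3 and λ = 2 are arbitrary.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def zip_pmf(y, omega, lam):
    """Mixture pmf (2.1) with Q a point mass at 0 and G = Poisson(lam)."""
    pois = math.exp(-lam) * lam**y / math.factorial(y)
    return omega + (1 - omega) * pois if y == 0 else (1 - omega) * pois

rng = random.Random(1)
omega, lam, n = 0.3, 2.0, 20000
# hierarchical sampling: draw the latent component first, then the outcome
sample = [0 if rng.random() < omega else rpois(lam, rng) for _ in range(n)]
frac = sample.count(0) / n
# the mixture inflates the zero probability relative to Poisson(2):
# omega + (1 - omega) * exp(-2) is roughly 0.39, versus exp(-2) ~ 0.14
```

The empirical zero fraction `frac` tracks the mixture probability ω + (1 − ω)e^{−λ}, well above the homogeneous-model value e^{−λ}, which is exactly the excess-zero heterogeneity these models capture.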
The mixture model (2.1) allows for two types of representations, depending on the support of ω_i. Under its hierarchical representation, the mixing weight is a probability mass, requiring the distributional constraint 0 ≤ ω_i ≤ 1 for each subject i = 1, ..., n. Under the marginal representation, however, the only distributional constraint is that 0 ≤ p_i(y_i) ≤ 1, which results in the constraint on the mixing weights

−g_i(y*) / (1 − g_i(y*)) ≤ ω_i ≤ 1, i = 1, ..., n, (2.2)

where p_i(·) and g_i(·) denote the probability mass functions of the distributions P_i and G, respectively, and g_i(y*) is the probability mass function of G evaluated at y*. The marginal model can be used to accommodate both inflation (positive ω_i) and deflation (negative ω_i) at y*, a property unfortunately not shared by its hierarchical counterpart. The marginal model, however, does not allow any hierarchical interpretation of the mixture model when the mixing weight is negative. For further discussion of the interpretation of these models when y* = 0, see Heilbron (1994). A fundamental question in these two-component mixture models is whether or not the inclusion of mixing weights is necessary. Many authors have examined this important issue using two-sided score tests, especially when y* = 0. In a Biometrics paper, van den Broek (1995, 738-748) developed a two-sided score test to assess homogeneity in zero-inflated models for count data. Others have extended this test to situations where the non-degenerate distribution is a member of the exponential family (Deng and Paul, 2000) and to clustered data (Xiang et al., 2006). The original test and its extensions were constructed under alternatives to homogeneity that allow for negative mixing weights. Specifically, they were derived under the marginal representation of the mixture model, which ignores its hierarchical representation.
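The constraint (2.2) is easy to check numerically. The sketch below (an illustration under the assumption that G is Poisson with an arbitrary mean λ = 1.5) computes the admissible range of ω_i and verifies that a negative weight inside this range, i.e. deflation at y*, still yields a valid probability mass.

```python
import math

def poisson_pmf(y, lam):
    """pmf of the non-degenerate Poisson component G."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def omega_bounds(g_ystar):
    """Constraint (2.2): the mixing weight must lie in
    [-g(y*)/(1 - g(y*)), 1] for the marginal mixture to remain a pmf."""
    return -g_ystar / (1.0 - g_ystar), 1.0

lam, ystar = 1.5, 0
g0 = poisson_pmf(ystar, lam)       # g_i(y*) = P(Y = 0) under Poisson(1.5)
lo, hi = omega_bounds(g0)
print(f"admissible omega in [{lo:.4f}, {hi}]")   # lower bound is negative

# a negative omega inside the bound (deflation at y*) still gives a valid
# probability mass at y*
omega = lo / 2
p0 = omega + (1 - omega) * g0
assert 0.0 <= p0 <= 1.0
```

The negative lower bound −g(y*)/(1 − g(y*)) is exactly what lets the marginal model accommodate zero-deflated data, which the hierarchical representation (with 0 ≤ ω ≤ 1) cannot.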
While this approach uses score functions that are well defined at the null value, it is often unclear how the constraints in (2.2) are accommodated under the alternative. Most importantly, existing testing procedures often do not fully exploit the general structure of the mixing weights. A common restriction is that constant mixing weights are assumed under heterogeneity. From a practical perspective, this is an important limitation, as covariate-dependent mixing weights are very typical in real applications of these two-component mixture models. Such tests may also lack power to detect heterogeneity when both deflation and inflation are present in the population. In this chapter, we suggest an extension of the test to covariates. A recent illustration of this approach when y* = 0 is given by Jansakul and Hinde (2002, 2009). These authors related the mixing weights to covariates using an identity link function. A limitation, however, is that the identity link function is seldom used in practice for this class of models. A common limitation of existing methodologies is that the structure of the marginal model is often ignored or not properly integrated into the testing procedure. As a solution, we formulate and develop a generalized score test and a Wald test of homogeneity based on an intuitive approach that accommodates the features of the marginal model. Specifically, we embed the structural constraints of the mixing weights under the marginal model into the procedure. The technique is based on decomposing the mixing weights into terms that have an obvious statistical interpretation. We make good use of this decomposition to lay the foundation of the test. One appealing feature of this decomposition is that it naturally incorporates covariates into the mixing weights. We show that this is a natural strategy to adopt when testing is carried out under the marginal model that ignores the hierarchical representation of the mixture distribution.
We further show that the generalized score test, in its current formulation, can be naturally used to study heterogeneity at any point y* of the population and for any non-degenerate parametric distributions that are not necessarily members of the exponential family. The remainder of this chapter is organized as follows. In section 2.2, we specify the homogeneity hypothesis at any point y* of the population and, for inference, we develop a generalized score test and a generalized Wald test that accommodate the features of the marginal model. In section 2.3, the empirical performances of the tests are studied and their real-life application is illustrated using dental caries counts in young children. Some remaining issues are discussed in section 2.4.

2.2 The method

2.2.1 Hypothesis formulation and reparameterization

Suppose that a sample of n independent observations {y_1, ..., y_n} is potentially generated from the mixture distribution P_i in (2.1), and we are interested in evaluating the hypothesis of zero mixing weights. The constraints on ω_i are critical in specifying the alternative hypothesis. Under the marginal representation of the mixture model, one is typically interested in the two-sided hypothesis

H_0: ω_i = 0 for all i, vs. H_1: ω_i ≠ 0 for some i,

where ω_i satisfies (2.2). To test this homogeneity hypothesis, we need to find a suitable transformation of ω_i that incorporates the constraints in (2.2) into the testing scheme. Let δ_i* denote the indicator of y_i = y*, i.e., δ_i* = 1 if y_i = y* and 0 otherwise. The expectation of δ_i* is equal to the probability mass of the mixture distribution at y*. This probability mass p_i(y*) can be estimated very well if many independent copies of δ_i* are available; moreover, in practice, we can relate it to covariates by using various link functions, such as the logit function. On the other hand, we know that p_i(y*) = ω_i + {1 − ω_i} g_i(y*) from the mixture model (2.1).
Therefore, we can use p_i(y*) to connect the mixing weight ω_i and potential covariates. More specifically, we consider a quantity π_i which is a function of covariates and let π_i = E{δ_i*}; a suitable transformation of ω_i can then be obtained from the equation π_i = ω_i + {1 − ω_i} g_i(y*). Thus, a natural transformation in light of these constraints is given by

ω_i = (π_i − g_i(y*)) / (1 − g_i(y*)), (2.3)

where the quantity π_i is bounded between 0 and 1. Clearly, the lower and upper bounds of ω_i in (2.3) are attained at the points 0 and 1 of π_i, respectively. The transformation (2.3) arises from equating the probability mass at y* as predicted by the mixture model (2.1) to the first moment of δ_i*. This novel parameterization is critical when testing for inflation or deflation at y*, and we make good use of it to lay the foundation of the test. Based on the parameterization (2.3), we formally state the homogeneity hypothesis as

H_0: π_i = g_i(y*) for all i, vs. H_1: π_i ≠ g_i(y*) for some i.

We consider a suitable parameterization for π_i and g_i(y*) that reduces the homogeneity hypothesis above to a finite-dimensional problem. For this, assume that G is a parametric family known up to a parameter vector θ, which we denote by G_θ to highlight this dependence. A good example of G_θ is the basic exponential family with probability mass functions of the form g(y; θ) = exp{ã(θ) y − b̃(θ) + c̃(y)}, where ã(θ), b̃(θ) and c̃(y) are known continuously differentiable functions. A non-member of the exponential family, such as the two-parameter negative binomial distribution, also constitutes an example of this parametric family of interest. We assume that µ_i, the mean of y_i under the non-degenerate parametric model G_θ, is finite. We express the probability mass g_i(y*; θ) as a function of µ_i, g_i(y*; θ) = ġ_{θ,y*}(µ_i), where ġ_{θ,y*}(·) is a function bounded between 0 and 1 and indexed by θ and y*.
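The decomposition (2.3) can be verified numerically. In the sketch below (g_i(y*) = 0.25 is an arbitrary illustrative value), the endpoints π_i = 0 and π_i = 1 map to the bounds of ω_i in (2.2), and π_i = g_i(y*) recovers homogeneity, ω_i = 0.

```python
def omega_from_pi(pi_i, g_ystar):
    """Reparameterization (2.3): omega_i = (pi_i - g_i(y*)) / (1 - g_i(y*)),
    where pi_i = E[delta_i*] is the mixture probability mass at y*."""
    return (pi_i - g_ystar) / (1.0 - g_ystar)

g0 = 0.25  # illustrative value of g_i(y*)

# endpoints of pi_i in [0, 1] hit exactly the bounds of omega_i in (2.2)
assert abs(omega_from_pi(0.0, g0) - (-g0 / (1 - g0))) < 1e-12   # lower bound
assert abs(omega_from_pi(1.0, g0) - 1.0) < 1e-12                # upper bound

# homogeneity: pi_i = g_i(y*)  <=>  omega_i = 0
assert omega_from_pi(g0, g0) == 0.0
```

Because π_i is a probability, it can be modeled with standard link functions, which is how the reparameterization lets covariates enter the mixing weights while the constraint (2.2) is enforced automatically.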
If the data can be arranged to form strata with a finite, possibly small, number of terms π_i and ġ_{θ,y*}(µ_i), the homogeneity hypothesis naturally reduces to a finite-dimensional problem. This stratified approach has its limitations, however, when the number of strata, or more generally the number of covariates, increases. Alternatively, regression techniques relating µ_i and π_i to some observed covariates can be used to address the curse of dimensionality. Consider two column vectors of covariates, x_i and z_i, of dimensions r and q, respectively, that are observed alongside the responses y_i, i = 1, 2, ..., n. We relate µ_i to the covariates x_i as follows: h(µ_i) = x_i^t β, where β is a subset of θ and h(·) is a monotone, differentiable and invertible function. Likewise, we relate π_i to the covariates z_i as follows: π_i = f_{θ,y*}(z_i^t γ), where f_{θ,y*}(·) is a differentiable function indexed by θ and y*. We seek a suitable function f_{θ,y*}(·) that translates the homogeneity hypothesis into an equality involving only the parameters β and γ. Note that µ_i = h^{−1}(x_i^t β) and, under the null, π_i = ġ_{θ,y*}(µ_i) for all i, which by substitution gives π_i = ġ_{θ,y*}(h^{−1}(x_i^t β)). Thus, a natural function f_{θ,y*}(·) relating π_i to the linear predictor z_i^t γ is defined such that f_{θ,y*} = ġ_{θ,y*} ∘ h^{−1}, in which case π_i = ġ_{θ,y*} ∘ h^{−1}(z_i^t γ), where ∘ represents the composite function operator. We illustrate this result for commonly used non-degenerate parametric distributions G_θ and link functions h(·). Specific results when y* = 0 are given in Table 2.1.

(i) The two-component mixture model when G_θ is a Poisson distribution with mean µ_i. Here θ = β and, using the log link function log{µ_i} = x_i^t β, we have ġ_{θ,y*}(µ_i) = exp{−exp{x_i^t β}} {exp{x_i^t β}}^{y*} Γ^{−1}(y* + 1). Thus, a natural parameterization relating π_i to z_i^t γ is given by

π_i = exp{−exp{z_i^t γ}} {exp{z_i^t γ}}^{y*} Γ^{−1}(y* + 1).
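For this Poisson case with y* = 0, the composite parameterization π_i = ġ_{θ,0} ∘ h^{−1}(z_i^t γ) reduces to exp{−exp{z_i^t γ}}. A small sketch (with made-up coefficient and covariate values, purely for illustration) confirms that under the null, with z_i = x_i and γ = β, this parameterization returns exactly g_i(0; θ):

```python
import math

def h_inv(eta):
    """Inverse of the log link: mu = exp(eta)."""
    return math.exp(eta)

def g_dot_poisson0(mu):
    """g_dot_{theta,0}(mu): probability of y = 0 under Poisson(mu)."""
    return math.exp(-mu)

def pi_poisson0(z, gamma):
    """pi_i = g_dot ∘ h^{-1}(z_i' gamma) = exp(-exp(z_i' gamma))."""
    eta = sum(zj * gj for zj, gj in zip(z, gamma))
    return g_dot_poisson0(h_inv(eta))

beta = [0.2, -1.45]   # hypothetical regression coefficients
x = [1.0, 0.37]       # hypothetical covariate vector (intercept, x_i)
mu_i = h_inv(sum(xj * bj for xj, bj in zip(x, beta)))

# under H0 with z_i = x_i and gamma = beta, pi_i = g_i(0; theta)
assert abs(pi_poisson0(x, beta) - g_dot_poisson0(mu_i)) < 1e-12
```

This is precisely why the homogeneity hypothesis reduces to the equality of regression coefficients, H_0: β = γ, when x_i = z_i.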
(ii) The two-component mixture model when G_θ is a binomial model with success probability µ_i and planned number of trials m_i. Here θ = β and, assuming the logit link function log{µ_i/(1 − µ_i)} = x_i^t β, we have ġ_{θ,y*}(µ_i) = C(m_i, y*) {1 + exp{−x_i^t β}}^{−y*} {1 + exp{x_i^t β}}^{y* − m_i}, with C(m_i, y*) = Γ(m_i + 1) / {Γ(y* + 1) Γ(m_i − y* + 1)}. A natural parameterization for π_i is then given by,

    π_i = C(m_i, y*) {1 + exp{−z_i^t γ}}^{−y*} {1 + exp{z_i^t γ}}^{y* − m_i}.

(iii) The two-component mixture model when G_θ is a negative binomial distribution with mean µ_i and overdispersion parameter κ. Here θ = (β, κ) and, assuming the log link function log{µ_i} = x_i^t β, we have ġ_{θ,y*}(µ_i) = Γ(y* + κ^{−1}) / {Γ(κ^{−1}) Γ(y* + 1)} · {1 + κ exp{x_i^t β}}^{−1/κ} {1 + κ^{−1} exp{−x_i^t β}}^{−y*}. A natural parameterization for π_i is then,

    π_i = Γ(y* + κ^{−1}) / {Γ(κ^{−1}) Γ(y* + 1)} · {1 + κ exp{z_i^t γ}}^{−1/κ} {1 + κ^{−1} exp{−z_i^t γ}}^{−y*}.

Proposition 1. Under the parameterizations µ_i = h^{−1}(x_i^t β) and π_i = ġ_{θ,y*} ∘ h^{−1}(z_i^t γ), the null hypothesis can be formulated as,

    H_0: ġ_{θ,y*} ∘ h^{−1}(z_i^t γ) = ġ_{θ,y*} ∘ h^{−1}(x_i^t β), for all i,    (2.4)

which further reduces to a linear contrast involving the parameters β and γ.

Proof. We first consider the situation where ġ_{θ,y*}(·), as a function of µ_i, is invertible. Then the composition ġ_{θ,y*} ∘ h^{−1} is invertible, and the null hypothesis in (2.4) reduces to the simple form x_i^t β = z_i^t γ for all i. This ultimately reduces to a linear contrast of the parameters β and γ. In particular, when x_i = z_i for all i, the null hypothesis is H_0: β = γ. Now assume that ġ_{θ,y*}(·) is not invertible. If x_i = z_i for all i, the identifiability of the non-degenerate model G_θ requires that β = γ under the null. When x_i ≠ z_i for some i, the identifiability condition of the non-degenerate model G_θ ensures that the null reduces to a linear contrast involving only the parameters β and γ.
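The y* = 0 masses in cases (ii) and (iii) simplify considerably and can be checked numerically. The sketch below (assumed helper names, illustrative values) verifies that the binomial zero mass under the logit link equals (1 − µ)^m and that the negative binomial zero mass under the log link equals {1 + κµ}^{−1/κ}.

```python
import math

# Sketch checking the y* = 0 masses: for the binomial with logit link the
# zero mass is (1 - mu)^m = {1 + exp(x'beta)}^{-m}; for the negative
# binomial with log link it is {1 + kappa*mu}^{-1/kappa}.

def binom_zero(eta, m):
    return (1.0 + math.exp(eta)) ** (-m)

def negbin_zero(eta, kappa):
    return (1.0 + kappa * math.exp(eta)) ** (-1.0 / kappa)

eta, m, kappa = 0.2, 5, 1.1
mu_binom = 1.0 / (1.0 + math.exp(-eta))   # inverse logit link
mu_nb = math.exp(eta)                     # inverse log link
print(abs(binom_zero(eta, m) - (1.0 - mu_binom) ** m) < 1e-12)                 # True
print(abs(negbin_zero(eta, kappa) - (1.0 + kappa * mu_nb) ** (-1.0 / kappa)) < 1e-12)  # True
```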
Table 2.1: Natural parameterizations relating π_i to the linear predictor z_i^t γ for some commonly used two-component mixture models and link functions relating µ_i to the linear predictor x_i^t β when y* = 0

Non-degenerate distribution G_θ | Link function h(·) | g_i(0; θ) = ġ_{θ,0}(µ_i) | Parameterization relating π_i to z_i^t γ
Poisson with mean µ_i | log{µ_i} | exp{−exp{x_i^t β}} | exp{−exp{z_i^t γ}}
Binomial with success probability µ_i and m_i trials | logit{µ_i} | {1 + exp{x_i^t β}}^{−m_i} | {1 + exp{z_i^t γ}}^{−m_i}
 | Φ^{−1}(µ_i) | {1 − Φ(x_i^t β)}^{m_i} | {1 − Φ(z_i^t γ)}^{m_i}
 | log{−log{µ_i}} | {1 − exp{−exp{x_i^t β}}}^{m_i} | {1 − exp{−exp{z_i^t γ}}}^{m_i}
Negative binomial with mean µ_i and overdispersion parameter κ | log{µ_i} | {1 + κ exp{x_i^t β}}^{−1/κ} | {1 + κ exp{z_i^t γ}}^{−1/κ}

Note: θ = β for the Poisson and binomial processes; θ = (β, κ) for the negative binomial process.

2.2.2 Generalized score tests for homogeneity

Without any loss of generality, we will develop the test for the case where x_i = z_i, although our method generalizes quite naturally to any x_i and z_i. Under this assumption, our homogeneity hypothesis becomes H_0: β = γ. Let C be a selector matrix such that Cθ = β; the homogeneity hypothesis is then equivalent to H_0: Cθ = γ. Under the alternative model, the log-likelihood function, given observations {y_1, ..., y_n}, as a function of θ and γ is given by,

    ℓ_n(θ, γ) = Σ_{i=1}^n [ δ_i* log{π_i} + (1 − δ_i*) log{1 − π_i} + (1 − δ_i*) log{ g_i(y_i; θ) / (1 − g_i(y*; θ)) } ].

We assume the change of variables α = Cθ − γ, with α = 0 under the null hypothesis. We denote by u_n(θ, α) the score function ∂ℓ_n(θ, Cθ − α)/∂(θ, α). To construct the score test statistic and its limiting null distribution, we impose the following conditions.

C1. The support set of (θ, α) is compact.
C2. There exists a nonsingular matrix I(θ, α) such that −n^{−1} ∂u_n(θ, α)/∂(θ, α) = I(θ, α) + o_p(1).
C3. For all a > 0, sup_{‖ζ‖ ≤ a} ‖ n^{−1/2}[u_n((θ, α) + n^{−1/2}ζ) − u_n(θ, α)] + I(θ, α)ζ ‖ = o_p(1).
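The alternative-model log-likelihood ℓ_n(θ, γ) above can be evaluated directly. A minimal Python sketch for y* = 0 with a Poisson non-degenerate model and the Poisson parameterization of π_i from Table 2.1 is given below; the function name and data are ours, not the thesis code.

```python
import math

# Sketch of l_n(theta, gamma) for y* = 0 and a Poisson non-degenerate model:
# delta_i = 1{y_i = 0} contributes log(pi_i); y_i > 0 contributes
# log(1 - pi_i) plus the log zero-truncated Poisson mass at y_i.

def loglik(y, x, beta, gamma):
    ll = 0.0
    for yi, xi in zip(y, x):
        mu = math.exp(beta[0] + beta[1] * xi)
        pi = math.exp(-math.exp(gamma[0] + gamma[1] * xi))  # Table 2.1 form
        if yi == 0:
            ll += math.log(pi)
        else:
            g = math.exp(-mu) * mu ** yi / math.factorial(yi)
            ll += math.log(1.0 - pi) + math.log(g / (1.0 - math.exp(-mu)))
    return ll

# Under the null (gamma = beta) this collapses to the homogeneous Poisson
# log-likelihood, since pi_i then equals the Poisson mass at zero.
print(round(loglik([0, 2, 1], [0.0, 0.5, 1.0], [0.1, -0.3], [0.1, -0.3]), 4))
```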
Here o_p(1) denotes convergence in probability as n → ∞. Under Conditions C1-C3, standard asymptotic results give n^{−1/2} u_n(θ, α) →_d N(0, I(θ, α)), where →_d denotes convergence in distribution as n → ∞ (see Cox and Hinkley, 1974, pp. 311-343). Assume the decomposition u_n(θ, α) = (u_{n,θ}^t(θ, α), u_{n,α}^t(θ, α))^t, where the entries are the first-order derivatives of the log-likelihood function with respect to θ and α, respectively. Let I_{θθ}(θ, α), I_{θα}(θ, α) and I_{αα}(θ, α) be the corresponding blocks in I(θ, α). The main building block of the test statistic, for any parametric probability mass function g_i(y_i; θ), is given by,

    u_{n,α}(θ, α) = ∂ℓ_n(θ, Cθ − α)/∂α = Σ_{i=1}^n [ δ_i*/π_i − (1 − δ_i*)/(1 − π_i) ] ∂π_i/∂α = Σ_{i=1}^n (δ_i* − π_i)/{π_i(1 − π_i)} · ∂π_i/∂α.

Note that π_i in this expression depends on both θ and α because of the change of variables. The other building block, I(θ, α), is in general tedious to compute. In the appendix, we show the details of these calculations for two well-known members of the exponential family under the condition y* = 0. Let b_n(θ) denote the score function with respect to θ under the null distribution G_θ. We assume the existence of a root-n consistent estimator θ̂ of θ*, the true value of θ under the null distribution G_θ, such that n^{1/2}(θ̂ − θ*) = n^{−1/2} I_{θθ}^{−1}(θ*, 0) b_n(θ*) + o_p(1). The core of the score statistic for evaluating the hypothesis α = 0 is û_{n,α} = u_{n,α}(θ̂, 0). For Poisson and binomial non-degenerate distributions with θ = β, these cores at y* = 0 are, respectively, given by,

    û_{n,α} = Σ_{i=1}^n x_i µ̂_i (δ_i* − exp{−µ̂_i}) / (1 − exp{−µ̂_i})   and   û_{n,α} = Σ_{i=1}^n x_i m_i µ̂_i (δ_i* − (1 − µ̂_i)^{m_i}) / (1 − (1 − µ̂_i)^{m_i}).

In these expressions, µ̂_i takes the values exp{x_i^t θ̂} and {1 + exp{−x_i^t θ̂}}^{−1} for the Poisson and binomial non-degenerate distributions, respectively.
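The Poisson score core at y* = 0 is easy to code from the displayed formula. The sketch below is an illustration with toy data; in practice µ̂_i would come from a homogeneous Poisson fit.

```python
import math

# Sketch of the Poisson score core at y* = 0:
# u_hat = sum_i x_i * mu_i * (delta_i - exp(-mu_i)) / (1 - exp(-mu_i)),
# where delta_i = 1{y_i = 0} and mu_i = exp(x_i' theta_hat).

def score_core_poisson(x, y, theta_hat):
    q = len(x[0])
    u = [0.0] * q
    for xi, yi in zip(x, y):
        eta = sum(a * b for a, b in zip(xi, theta_hat))
        mu = math.exp(eta)
        delta = 1.0 if yi == 0 else 0.0
        w = mu * (delta - math.exp(-mu)) / (1.0 - math.exp(-mu))
        for j in range(q):
            u[j] += xi[j] * w
    return u

# Toy data: intercept plus one covariate; theta_hat is illustrative.
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 1, 3]
print(score_core_poisson(x, y, [0.1, 0.2]))
```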
The asymptotic distribution of û_{n,α} can be derived by applying a standard Taylor series expansion coupled with the law of large numbers:

    n^{−1/2} û_{n,α} = n^{−1/2} Σ_{i=1}^n [ u_{i,α}(θ*, 0) − I_{θα}^t(θ*, 0) I_{θθ}^{−1}(θ*, 0) b_{i,n}(θ*) ] + o_p(1),

where u_{i,α}(θ*, 0) and b_{i,n}(θ*) are, respectively, the random contributions of subject i to the score functions u_{n,α}(θ*, 0) and b_n(θ*). The right-hand side of this equation is a sum of independent random vectors v_i(θ*) = u_{i,α}(θ*, 0) − I_{θα}^t(θ*, 0) I_{θθ}^{−1}(θ*, 0) b_{i,n}(θ*), to which the multivariate central limit theorem applies. That is, n^{−1/2} û_{n,α} →_d N(0, Λ(θ*)), where Λ(θ*) = lim_{n→∞} n^{−1} Σ_{i=1}^n E(v_i^{⊗2}(θ*)), with v_i^{⊗2}(θ*) = v_i(θ*) v_i^t(θ*). A straightforward calculation gives Λ(θ*) = I_{αα}(θ*, 0) − I_{θα}^t(θ*, 0) I_{θθ}^{−1}(θ*, 0) I_{θα}(θ*, 0), for which a consistent estimator is obtained by replacing θ* with its estimator θ̂. From the estimator Λ(θ̂) of Λ(θ*), we construct the score test statistic,

    s_n = n^{−1} û_{n,α}^t Λ^{−1}(θ̂) û_{n,α} →_d χ_r^2 under H_0 as n → ∞,

where r = q. The convergence in distribution as n → ∞ follows immediately from Slutsky's theorem. The null hypothesis is rejected for unusually large values of s_n.

2.2.3 Generalized Wald tests for homogeneity

Tests of homogeneity in zero-mixture models for discrete data have been well discussed in the literature, but the existing methodologies have relied primarily on score statistics. An often-cited justification for the use of these statistics is that they do not require the model to be fitted under the alternative. But the advent of computer software with robust functions and procedures has made it easy to fit these alternative models routinely in practice. In this section, we make good use of this opportunity by using results generated from such software to develop a Wald test statistic for evaluating homogeneity in this class of models.
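Assembling s_n from û_{n,α} and the estimate of Λ is a small linear-algebra step. The sketch below uses illustrative numbers (not output from a real fit) with q = 2, for which the chi-square survival function has the closed form exp(−s/2).

```python
import math

# Sketch: s_n = n^{-1} u' Lambda^{-1} u and its chi-square(2) p-value.
# u and lam below are illustrative, not from an actual model fit.

def score_statistic(u, lam, n):
    # invert the 2x2 matrix lam and form n^{-1} u' lam^{-1} u
    (a, b), (c, d) = lam
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    s = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return s / n

u = [12.0, -8.0]                # hypothetical u_hat_{n,alpha}
lam = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical Lambda(theta_hat)
n = 100
s_n = score_statistic(u, lam, n)
p_value = math.exp(-s_n / 2.0)  # chi-square survival function with 2 df
print(round(s_n, 4), round(p_value, 4))  # 0.8 0.6703
```

The null is rejected when s_n exceeds the upper chi-square quantile, i.e. when the p-value falls below the nominal level.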
Whether or not the model is fitted under H_0, the zero-mixture model under the alternative can easily be fitted via built-in software procedures or functions, such as NLMIXED in SAS and maxLik in R. We demonstrate that the proposed Wald test can be performed with minimal programming effort by the practicing statistician. Technically, the test is based on the same reparameterization of the mixing weights that we introduced in the previous section. The reparameterization translates the homogeneity hypothesis into a linear combination of parameters from the marginal model. One appealing feature of this reparameterization is that it naturally incorporates covariates into the mixing weights, a characteristic often ignored by existing testing procedures. A simulation study evaluating the empirical performance of the proposed Wald test statistic is presented in the next section.

Under the marginal representation of the mixture model, one is typically interested in the two-sided hypothesis,

    H_0: ω_i = 0, for all i,  vs.  H_1: ω_i ≠ 0, for some i,

where ω_i satisfies (2.2). In general, one could follow standard theory to construct a Wald test of homogeneity with statistic ŵ^t {cov(ŵ)}^{−1} ŵ, where w = (ω_1, ω_2, ..., ω_n)^t is the mixing weight vector and ŵ is the maximum likelihood estimate (MLE) of w under the alternative. However, the covariance matrix cov(ŵ) may be singular when the sample size is large, because the dimension of the covariance matrix grows with the sample size n. We therefore again adopt the reparameterization (2.3) introduced in the previous section to translate the homogeneity hypothesis into a new hypothesis involving only finitely many parameters. Without any loss of generality, we develop the test assuming x_i = z_i. Under this assumption and the reparameterization (2.3), our homogeneity hypothesis becomes H_0: β = γ.
Let ϑ = (θ, γ) and let C be a max{dim(θ), dim(γ)} × dim(ϑ) contrast matrix such that the homogeneity hypothesis is equivalent to H_0: Cϑ = 0. Under some regularity conditions, the estimate ϑ̂ is asymptotically normal, and the Wald test statistic can be written as,

    (Cϑ̂)^t {cov[Cϑ̂]}^{−1} (Cϑ̂),    (2.5)

which follows a χ² distribution with rank(C) degrees of freedom under H_0. More specifically, consider for example a two-component mixture model in which G_θ is a Poisson process with mean µ_i = exp{x_i^t β}, i = 1, ..., n, and θ = β. Here ϑ = (β^t, γ^t)^t. Assuming x_i and z_i are the same covariates and dim(β) = dim(γ) = q, the contrast matrix is C = (I_β, −I_γ), where I_β and I_γ are identity matrices of rank q. Thus, the null hypothesis is,

    H_0: I_β β − I_γ γ = 0,    (2.6)

and the Wald test statistic (2.5) becomes,

    (I_β β̂ − I_γ γ̂)^t {(I_β, −I_γ) I_n^{−1}(ϑ̂) (I_β, −I_γ)^t}^{−1} (I_β β̂ − I_γ γ̂),    (2.7)

where ϑ̂ = (β̂, γ̂) is the MLE of ϑ and I_n(ϑ̂) is the Fisher information matrix with ϑ replaced by ϑ̂. The test statistic has a χ² distribution with q degrees of freedom, the rank of C, under H_0.

2.3 Numerical study

The two representations of the mixture model (2.1) have technical implications for the data-generating mechanism. When ω_i is positive, the mixture model maintains its hierarchical representation, and data can be generated using the usual two-stage process. However, for negative ω_i the mixture model loses its hierarchical representation and the data cannot be generated from the two-stage process. Instead, data are generated directly from the marginal distribution P_i by inverting its cumulative distribution function at a draw from a uniform distribution on (0, 1).

2.3.1 Empirical performances of proposed score tests

We conducted a simulation study to evaluate the empirical performance of the proposed generalized score test in small to moderate sample sizes.
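The Wald statistic (2.5)-(2.7) can be sketched for the scalar case q = 1, where the quadratic form collapses to a squared difference over its variance. The estimates and inverse information matrix below are hypothetical placeholders; in practice they come from fitting the alternative model (e.g. with SAS NLMIXED or R maxLik).

```python
# Sketch of the Wald statistic (2.5)/(2.7) for q = 1 (scalar beta, gamma)
# with C = (1, -1); inv_info stands in for I_n^{-1}(theta_hat).

def wald_q1(beta_hat, gamma_hat, inv_info):
    """(C v)' [C I^{-1} C']^{-1} (C v) with C = (1, -1), v = (beta, gamma)."""
    d = beta_hat - gamma_hat
    # C I^{-1} C' = var(beta) - 2 cov(beta, gamma) + var(gamma)
    var_d = inv_info[0][0] - 2.0 * inv_info[0][1] + inv_info[1][1]
    return d * d / var_d

inv_info = [[0.04, 0.01], [0.01, 0.05]]  # hypothetical I_n^{-1}
print(wald_q1(0.9, 0.5, inv_info))       # compare with chi-square, 1 df
```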
We compared these performances with those of the tests proposed by van den Broek (1995) and Jansakul and Hinde (2009). Throughout our simulations, we set y* = 0 and assumed that the non-degenerate distribution is a Poisson model with a simple conditional mean µ_i* = exp{β_0* − 1.45 x_i}, where the intercept β_0* takes values in the set {−0.75, 0, 0.75} and x_i is a covariate generated from a uniform distribution on (0, 1). Throughout our simulations, we performed the generalized score test assuming the working mixing weight ω_i = {π_i − exp{−µ_i}}{1 − exp{−µ_i}}^{−1} under the alternative, where π_i = exp{−exp{γ_0 + γ_1 x_i}} and µ_i = exp{β_0 + β_1 x_i}. The test of van den Broek (1995) was performed assuming ω_i = γ_0, and that of Jansakul and Hinde (2009) assuming ω_i = γ_0 + γ_1 x_i. With these parameterizations, the null hypotheses to be evaluated were then given by: H_0: γ_0 = β_0, γ_1 = β_1 for our formulation; H_0: γ_0 = 0 for van den Broek's test; and H_0: γ_0 = 0, γ_1 = 0 for Jansakul and Hinde's test. The maximum likelihood estimate β̂ of the true value of β under the null was obtained from a homogeneous Poisson model. Finally, all simulations were replicated 1,000 times for sample sizes 50, 100 and 200.

To investigate the empirical type I error rates of the proposed score tests, we generated data from the homogeneous Poisson model. The empirical type I error rates at the 5% nominal level are reported in Table 2.2.

Table 2.2: Empirical size of score test statistics when the null model is a non-degenerate Poisson model with mean µ_i* = exp{β_0* − 1.45 x_i}, x_i ∼ Uniform(0, 1), at the 5% significance level

ω* = 0        n=50                    n=100                   n=200
β_0*          −0.75   0      0.75     −0.75   0      0.75     −0.75   0      0.75
vdB           0.031   0.053  0.051    0.048   0.063  0.059    0.042   0.050  0.066
J&H           0.033   0.051  0.054    0.038   0.044  0.061    0.036   0.049  0.052
Gen.          0.036   0.048  0.048    0.040   0.046  0.057    0.033   0.045  0.060

Note: vdB: van den Broek test, df = 1. J&H: Jansakul and Hinde test, df = 2. Gen.: generalized score test, df = 2.
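The data-generating step used across these simulations can be sketched as follows. Because the marginal mixture P_i(y) = ω_i 1{y = 0} + (1 − ω_i) Poisson(µ_i) may have negative ω_i, the sampler inverts the marginal CDF at a Uniform(0,1) draw rather than using the two-stage process. This is illustrative code, not the authors' implementation.

```python
import math
import random

# Sketch: draw from the marginal mixture by CDF inversion, valid even for
# negative omega (no hierarchical representation assumed), as long as all
# marginal probabilities P_i(y) remain nonnegative.

def marginal_mixture_draw(omega, mu, rng):
    u = rng.random()
    cdf, y, pois = 0.0, 0, math.exp(-mu)   # Poisson pmf at y = 0
    while True:
        cdf += omega * (1.0 if y == 0 else 0.0) + (1.0 - omega) * pois
        if u <= cdf or y > 10_000:
            return y
        y += 1
        pois *= mu / y                     # Poisson pmf recursion

rng = random.Random(1)
sample = [marginal_mixture_draw(-0.1, 1.5, rng) for _ in range(5)]
print(sample)
```

With ω = 0 the sampler reduces to ordinary Poisson sampling, which is how the size (Table 2.2) scenarios are generated.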
All considered score tests have well-controlled type I error rates even for a sample size as small as 50, except when the true conditional mean µ_i* nears zero. In other words, when −β_0* is increasingly large, the tests tend to reject the null hypothesis less often than anticipated, and this conservativeness does not diminish with increasing sample size.

Table 2.3: Empirical power of score test statistics to detect various forms ω_i* of heterogeneity coupled with a non-degenerate Poisson model with mean µ_i* = exp{β_0* − 1.45 x_i}, x_i ∼ Uniform(0, 1), at the 5% significance level

              n=50                    n=100                   n=200
β_0*          −0.75   0      0.75     −0.75   0      0.75     −0.75   0      0.75
ω_i* = 0.25
vdB           0.060   0.096  0.313    0.086   0.147  0.578    0.122   0.294  0.895
J&H           0.069   0.087  0.249    0.094   0.118  0.481    0.123   0.238  0.851
Gen.          0.071   0.086  0.247    0.094   0.122  0.478    0.124   0.241  0.849
ω_i* = 0.45 + 0.15 x_i
vdB           0.100   0.221  0.698    0.156   0.388  0.960    0.219   0.684  1.000
J&H           0.102   0.190  0.623    0.178   0.362  0.927    0.237   0.608  0.999
Gen.          0.105   0.191  0.617    0.179   0.363  0.933    0.235   0.603  0.999
ω_i* = exp{−1.2 + 1.1 x_i}/(1 + exp{−1.2 + 1.1 x_i})
vdB           0.059   0.109  0.373    0.090   0.213  0.704    0.124   0.343  0.953
J&H           0.080   0.112  0.318    0.103   0.192  0.662    0.136   0.297  0.929
Gen.          0.087   0.112  0.317    0.105   0.194  0.655    0.135   0.292  0.924
ω_i* = (π_i* − exp{−µ_i*})/(1 − exp{−µ_i*}), π_i* = exp{−exp{1.3 − 2.4 x_i}}
vdB           0.166   0.293  0.257    0.295   0.395  0.413    0.392   0.612  0.590
J&H           0.236   0.454  0.386    0.383   0.628  0.544    0.506   0.847  0.731
Gen.          0.710   0.845  0.957    0.944   0.981  0.997    0.997   1.000  1.000

Note: vdB: van den Broek test, df = 1. J&H: Jansakul and Hinde test, df = 2. Gen.: generalized score test, df = 2.

We investigated the empirical power of the tests to detect various forms of heterogeneity in the population. First, we generated the data as before, but fixed the true mixing weight at 0.25.
For this constant mixing weight, increasing sample sizes and greater separation of the mixture components improve the power of all considered tests to detect the alternatives under consideration (see Table 2.3). But the improvement appears to be greater for the test proposed by van den Broek (1995), especially when the mixture components are well separated. This is expected, as the data were generated from a positive and constant mixing weight, for which the marginal model maintains its hierarchical representation. The loss of power was fairly minor for the covariate-adjusted tests, however. We further allowed the true mixing weight to depend on covariates using the linear, the logistic and the proposed transformation (see Table 2.3). Overall, the power of the three score tests improves with the sample size, regardless of the true mixing weight. When the true mixing weight is a linear or a logistic function of the covariates, all considered tests appear to have comparable power. But the power deteriorates as the mean of the non-degenerate Poisson model nears zero. This is not surprising, as the true mixing weights in these schemes are bounded between 0 and 1, a condition for the marginal mixture model to maintain its hierarchical representation. For the last model of the true mixing weight considered, the marginal model allows for negative values, and the proposed score test statistic appears to be more powerful than the other tests. The deterioration of power appears to be greater for the test proposed by van den Broek (1995). This is expected, as the constant mixing weight test averages the mixing weights over the space of covariates, which may greatly reduce power if both deflation and inflation are present in the population. In sum, incorporating covariates in the score tests for homogeneity can greatly improve efficiency. But this efficiency gain depends strongly on the pre-specified working model of the mixing weight and the behavior of the tests under model misspecification.
When the true mixing weight does not depend on covariates, our test may not be as efficient as the constant mixing weight test. However, given that the true model is usually unknown to the analyst, a general approach that assumes covariate-dependent mixing weights appears to be the most conservative data analytic strategy.

2.3.2 Empirical performances of proposed Wald tests

We conducted a simulation study similar to the one for the score tests, but with different settings. We set y* = 0 and assumed that the non-degenerate distribution is a negative binomial model with overdispersion parameter κ = 1.1 and a simple conditional mean µ_i* = exp{β_0* + 1.45 x_i}, where the intercept β_0* takes values in the set {0.5, 1, 1.5} and x_i is a covariate generated from a uniform distribution on (0, 1). We also compared these performances with those of the constant mixing weight test and the test assuming that the mixing weight is a linear function of covariates. Throughout our simulations, we performed the Wald test assuming the working mixing weight ω_i = {π_i − (1 + κµ_i)^{−1/κ}}{1 − (1 + κµ_i)^{−1/κ}}^{−1} under the alternative, where π_i = {1 + κ exp{γ_0 + γ_1 x_i}}^{−1/κ} and µ_i = exp{β_0 + β_1 x_i}. The constant mixing weight test was performed assuming ω_i = γ_0, and the linear mixing weight test assuming ω_i = γ_0 + γ_1 x_i. With these parameterizations, the null hypotheses to be evaluated were then given by: H_0: γ_0 = β_0, γ_1 = β_1 for our formulation; H_0: γ_0 = 0 for the constant test; and H_0: γ_0 = 0, γ_1 = 0 for the linear test. The maximum likelihood estimate (β̂, κ̂) of the true value of (β, κ) under the null was obtained from a homogeneous negative binomial model. Finally, all simulations were replicated 1,000 times for sample sizes 100, 200 and 400.

Table 2.4: Empirical size of Wald tests at the 5% significance level based on 1,000 Monte Carlo samples when the null model is a negative binomial model with mean µ_i* = exp{β_0* + 1.45 x_i}, overdispersion parameter κ = 1.1, x_i ∼ Uniform(0, 1)
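The negative binomial working mixing weight used in these Wald simulations is the transformation (2.3) with zero mass g_i(0) = (1 + κµ_i)^{−1/κ}. A minimal sketch (function names are ours):

```python
import math

# Sketch of the working mixing weight for the negative binomial case:
# omega_i = (pi_i - g_i(0)) / (1 - g_i(0)), g_i(0) = (1 + kappa*mu)^(-1/kappa).

def nb_zero_mass(mu, kappa):
    return (1.0 + kappa * mu) ** (-1.0 / kappa)

def working_weight(x, beta, gamma, kappa):
    mu = math.exp(beta[0] + beta[1] * x)
    pi = nb_zero_mass(math.exp(gamma[0] + gamma[1] * x), kappa)
    g0 = nb_zero_mass(mu, kappa)
    return (pi - g0) / (1.0 - g0)

# Under the null (gamma = beta) the working weight is identically zero:
print(working_weight(0.3, [0.5, 1.45], [0.5, 1.45], 1.1))  # 0.0
```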
ω* = 0        n=100                   n=200                   n=400
β_0*          0.5     1      1.5      0.5     1      1.5      0.5     1      1.5
const.        0.048   0.056  0.048    0.059   0.054  0.051    0.058   0.044  0.045
linear        0.053   0.050  0.041    0.051   0.051  0.050    0.058   0.048  0.056
Our transf.   0.040   0.039  0.041    0.043   0.041  0.033    0.062   0.040  0.044

Note: const.: constant mixing weight test, df = 1. linear: linear mixing weight test, df = 2. Our transf.: our transformation, df = 2.

To investigate the empirical type I error rates of the proposed Wald tests, we generated data from the homogeneous negative binomial model. The empirical type I error rates at the 5% nominal level are reported in Table 2.4. All considered Wald tests have well-controlled type I error rates even for a sample size as small as 100. Further, we investigated the empirical power of the tests to detect various forms of heterogeneity in the population. First, we generated the data as before, but fixed the true mixing weight at 0.15. For this constant mixing weight, increasing sample sizes and greater separation of the mixture components improve the power of all considered tests to detect the alternatives under consideration (see Table 2.5). The power of the constant test appears to be greater than that of the others, especially when the mixture components are well separated. This is expected, as the data were generated from a positive and constant mixing weight, for which the marginal model maintains its hierarchical representation. The loss of power, however, was fairly minor for the covariate-adjusted tests.
Table 2.5: Empirical power of Wald tests at the 5% significance level based on 1,000 Monte Carlo samples when the non-degenerate distribution is a negative binomial model with mean µ_i* = exp{β_0* + 1.45 x_i} and overdispersion parameter κ = 1.1, x_i ∼ Uniform(0, 1)

              n=100                   n=200                   n=400
β_0*          0.5     1      1.5      0.5     1      1.5      0.5     1      1.5
ω_i* = 0.15
const.        0.538   0.747  0.875    0.849   0.961  0.995    0.990   1.000  1.000
linear        0.420   0.623  0.755    0.768   0.928  0.985    0.985   0.999  1.000
Our transf.   0.460   0.740  0.880    0.793   0.949  0.994    0.985   1.000  1.000
ω_i* = 0.32 − 0.3 x_i
const.        0.438   0.679  0.858    0.715   0.931  0.987    0.952   0.996  1.000
linear        0.516   0.703  0.844    0.816   0.960  0.996    0.989   1.000  1.000
Our transf.   0.474   0.730  0.908    0.787   0.970  0.998    0.987   1.000  1.000
ω_i* = exp{1 − 7x_i}/(1 + exp{1 − 7x_i})
const.        0.249   0.448  0.683    0.376   0.685  0.909    0.660   0.939  0.995
linear        0.647   0.773  0.898    0.904   0.981  0.999    0.995   1.000  1.000
Our transf.   0.666   0.870  0.978    0.951   0.997  1.000    1.000   1.000  1.000
ω_i* = (π_i* − (1 + κµ_i*)^{−1/κ})/(1 − (1 + κµ_i*)^{−1/κ}), π_i* = (1 + κ exp{−0.6 + 2.8 x_i})^{−1/κ}
const.        0.057   0.298  0.687    0.124   0.637  0.952    0.221   0.911  1.000
linear        0.214   0.233  0.456    0.490   0.517  0.869    0.871   0.888  1.000
Our transf.   0.710   0.719  0.887    0.937   0.943  0.989    1.000   1.000  1.000

Note: const.: constant mixing weight test, df = 1. linear: linear mixing weight test, df = 2. Our transf.: our transformation, df = 2.

We further allowed the true mixing weight to depend on covariates using the linear, the logistic and the proposed transformation (see Table 2.5). Overall, the power of the three Wald tests improves with the sample size, regardless of the true mixing weight. When the true mixing weight is a linear or a logistic function of the covariates, all considered tests appear to have comparable power. But the power deteriorates as the mean of the non-degenerate negative binomial model nears zero.
This is not surprising, as the true mixing weights in these schemes are bounded between 0 and 1, a condition for the marginal mixture model to maintain its hierarchical representation. For the last model of the true mixing weight considered, the marginal model allows for negative values, and the proposed Wald test statistic appears to be more powerful than the other tests. The deterioration of power appears to be greater for the constant test. This is expected, as the constant mixing weight test averages the mixing weights over the space of covariates, which may greatly reduce power if both deflation and inflation are present in the population. Incorporating covariates in the Wald tests for homogeneity can greatly improve efficiency. But this efficiency gain depends strongly on the pre-specified working model of the mixing weight and the behavior of the tests under model misspecification. When the true mixing weight does not depend on covariates, our test may not be as efficient as the constant mixing weight test. However, given that the true model is usually unknown to the analyst, a general approach that assumes covariate-dependent mixing weights appears to be the most conservative strategy.

2.3.3 Dental caries data

To illustrate our methodology, we consider data from a survey conducted by the Detroit Center for Research on Oral Health Disparities, one of five centers sponsored by the US National Institute of Dental and Craniofacial Research. This unique longitudinal survey was designed to collect oral health information on low-income African American children (0-5 years) and their main caregivers (14+ years) living in the city of Detroit. The overall goal of the study was to promote oral health and reduce its disparities within the community of low-income African Americans through an understanding of the social, familial, biological, and neighborhood determinants of dental caries and periodontal disease.
In this study, data from oral examinations involving assessment of dental caries, periodontal disease, oral cancer and oral hygiene were collected. For our illustration, we focus only on the dental caries scores of the 897 surveyed children. These scores measure the number of tooth surfaces showing signs of clinically detectable enamel lesions, comprising both noncavitated and cavitated lesions. Using this stringent dental caries definition, three different outcomes were derived. The first outcome, DS, measures the number of decayed surfaces; the second, DFS, measures the number of decayed and filled surfaces; and the third, DMFS, measures the number of decayed, missing and filled surfaces. Although this survey is longitudinal in nature, our numerical computations are based on cross-sectional data obtained in the first wave of examinations and interviews, completed between 2002 and 2003. Covariates considered include Age (the child's age in years), SI (the child's sugar intake) and their multiplicative interaction. A more detailed description of the study can be found elsewhere (see, for example, Tellez et al., 2006). It is well known that dental caries data often exhibit overdispersion in addition to zero inflation (see, for example, Böhning et al., 1999). We evaluated the homogeneity hypothesis using the proposed score test and Wald test. Specifically, we considered a two-component mixture model for which the non-degenerate distribution is a negative binomial model with mean µ_i = exp{β_0 + β_1 Age_i + β_2 SI_i + β_3 Age_i SI_i} and overdispersion parameter κ, and the mixing weight ω_i is given by equation (2.3) with π_i = {1 + κ exp{γ_0 + γ_1 Age_i + γ_2 SI_i + γ_3 Age_i SI_i}}^{−1/κ}. For comparison, the score test with constant mixing weight ω_i = γ_0 of van den Broek (1995) and that proposed by Jansakul and Hinde (2009) with mixing weight ω_i = γ_0 + γ_1 Age_i + γ_2 SI_i + γ_3 Age_i SI_i were also performed.
With these parameterizations, the null hypotheses to be evaluated were then given by: H_0: γ_j = β_j, j = 0, 1, 2, 3, for our formulation; H_0: γ_0 = 0 for van den Broek's test; and H_0: γ_j = 0, j = 0, 1, 2, 3, for Jansakul and Hinde's test. All score tests were conducted by replacing the nuisance parameter β with its maximum likelihood estimate under the null distribution. In contrast, all Wald tests were conducted using the maximum likelihood estimates under the alternative. Results of this analysis are presented in Table 2.6. Our score test statistic and that of Jansakul and Hinde (2009) reject the homogeneity hypothesis for all outcomes at the 5% significance level, supporting the hypothesis of heterogeneity. But our score test provides much stronger evidence for heterogeneity in this population. The proposed Wald test leads to the same conclusion. Interestingly, the constant mixing weight test fails to reject the null for all outcomes. To gain additional insight into this finding, we further fit the data using a homogeneous negative binomial model and stratified the analysis by age group (Age < 3 and Age > 4). For this illustration, we use only one dental caries outcome, DS (the number of decayed surfaces). We draw the probability mass function for each age group in Figure 2.1. This figure demonstrates that both zero inflation (in the younger age group, Age < 3) and zero deflation (in the older age group, Age > 4) are present in the population.
However, the data can be fitted very well by a homogeneous negative binomial model. This explains why the constant mixing weight test fails to reject the null. In fact, the constant mixing weight test may lack the power to detect heterogeneity in the population when covariates are not considered in the testing scheme. Furthermore, we fit the data under different hypotheses as in Table 2.7. For this illustration, we again use only one of the dental caries outcomes, DS (the number of decayed surfaces). The model under the null and the model assuming a constant mixing weight have similar −2 log L values. This also explains why the constant test may lack power to detect the heterogeneity.

Table 2.6: Score test and Wald test statistics, degrees of freedom and associated p-values for heterogeneity in the dental caries data when the null model is a non-degenerate negative binomial distribution

              van den Broek         Jansakul & Hinde       Generalized score test   Generalized Wald test
response      df  statistic (p)     df  statistic (p)      df  statistic (p)        df  statistic (p)
DS            1   0.0164 (0.8980)   4   128.1512 (<0.001)  4   151.8566 (<0.001)    4   310.4211 (<0.001)
DFS           1   0.0083 (0.9274)   4   123.2241 (<0.001)  4   148.8391 (<0.001)    4   306.6089 (<0.001)
DMFS          1   0.0009 (0.9759)   4   126.7810 (<0.001)  4   151.8784 (<0.001)    4   343.7801 (<0.001)

[Figure 2.1 appears here: observed and fitted proportions of the number of decayed surfaces, with panels for All Subjects, Age < 3, Age between 3 and 4, and Age > 4.]

Figure 2.1: Probability mass functions fitted under the null model (homogeneous negative binomial distribution) by age group for the dental caries data
This is a good example where the score test or Wald test based on constant mixing weights is not powerful enough to capture heterogeneity in the data. But when covariate-adjusted, the score test and the Wald test greatly improve efficiency in detecting heterogeneity. It is worth noting that rejecting homogeneity does not provide evidence that the zero-inflated negative binomial model gives the best fit to the data. Instead, such rejection only gives grounds for the zero-inflated negative binomial model to be evaluated further. As a final investigation, we therefore compared the zero-inflated negative binomial model to the zero-inflated Poisson model. A one-sided test (Silvapulle and Silvapulle, 1995) of the overdispersion parameter of the negative binomial model was highly significant at the 1% significance level, revealing strong evidence for overdispersion in addition to zero inflation. This confounding of mixtures with overdispersion is not uncommon in practice, as recognized by Lindsay and Roeder (1992).
Table 2.7: Model fitting for the number of decayed surfaces under H_0 and various H_1 when the baseline distribution is negative binomial with mean µ_i = exp{β_0 + β_1 Age_i + β_2 SI_i + β_3 Age_i SI_i} and overdispersion parameter κ

Parameters   H_0: ω_i = 0        H_1: ω_i = γ_0      H_1: ω_i = γ_0 + γ_1 Age_i + γ_2 SI_i + γ_3 Age_i SI_i   H_1: ω_i = {π_i − g_i}{1 − g_i}^{−1} †
β_0          −1.3579 (0.2213)*   −1.4875 (0.3113)*   1.3044 (0.2497)*    1.3174 (0.2584)*
β_1          0.8300 (0.0734)*    0.8501 (0.0813)*    0.2002 (0.0682)*    0.1943 (0.0704)*
β_2          0.0083 (0.0022)*    0.0084 (0.0022)*    0.0022 (0.0019)     0.0020 (0.0019)
β_3          −0.0021 (0.0005)*   −0.0021 (0.0006)*   −0.0005 (0.0005)    −0.0005 (0.0005)
κ            2.6157 (0.1762)*    2.9912 (0.7173)*    0.9404 (0.1069)*    0.9932 (0.1177)*
γ_0          -                   −0.0765 (0.1383)    1.0888 (0.0367)*    −3.4186 (0.3275)*
γ_1          -                   -                   −0.1992 (0.0124)*   1.1487 (0.1177)*
γ_2          -                   -                   −0.0023 (0.0006)*   0.0097 (0.0031)*
γ_3          -                   -                   0.0004 (0.0001)*    −0.0024 (0.0008)*
−2 log L     3975.5              3975.1              3764.2              3748.1
AIC          3985.5              3987.1              3782.2              3766.1

†: π_i = {1 + κ exp{γ_0 + γ_1 Age_i + γ_2 SI_i + γ_3 Age_i SI_i}}^{−1/κ} and g_i = {1 + κ exp{β_0 + β_1 Age_i + β_2 SI_i + β_3 Age_i SI_i}}^{−1/κ}.
*: p < 0.01.
AIC = −2f(θ̂) + 2q, where f(·) is the marginal log-likelihood function, θ̂ is the vector of parameter estimates and q is the number of parameters.

2.4 Discussion

The goal of this chapter was not to argue for or against the marginal model, but rather to show how to conduct inferences for the mixing weights in this class of models. Although score tests for evaluating homogeneity in two-component mixture models for discrete data have been well discussed in the literature, existing methodologies have relied primarily on restrictive assumptions. One common restriction is that constant mixing weights are often assumed. From a practical perspective, this is an important limitation, as covariate-dependent mixing weights are very typical in real applications of two-component regression models.
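The AIC entries in Table 2.7 follow directly from the stated formula AIC = −2f(θ̂) + 2q. As a minimal check, the proposed transformation model has −2 log L = 3748.1 with q = 9 parameters (β_0 through β_3, κ, and γ_0 through γ_3):

```python
# Sketch of the AIC computation used in Table 2.7: AIC = -2*logL + 2q.

def aic(neg2loglik, q):
    return neg2loglik + 2 * q

print(round(aic(3748.1, 9), 1))  # 3766.1, matching Table 2.7
print(round(aic(3975.5, 5), 1))  # 3985.5, the H0 column (q = 5)
```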
More generally, limitations of existing methodologies result from the structure of the marginal model being simply ignored or, at best, improperly integrated into the testing procedures. In this chapter, we formulated and developed a generalized score test and a generalized Wald test of homogeneity based on an intuitive approach that accommodates the features of the marginal model. Specifically, our proposed tests adopted a novel parameterization that allows the structural constraints of the mixing weight to be embedded in the testing scheme. Most importantly, this parameterization naturally incorporates covariates in the mixing weights. We showed that this is a natural strategy to adopt when testing is carried out under the marginal model that ignores its hierarchical representation. The proposed tests can be extended to refine the model specification. We generally conduct the test of homogeneity because it is easy to control its type I error. However, if the null hypothesis is rejected, one is typically interested in evaluating composite hypotheses. For example, one may be interested in testing equality of some coefficients of the binary regression model to the corresponding coefficients of the non-degenerate regression model. While the details of evaluating such composite hypotheses warrant a separate investigation, the results presented here can be used to evaluate these specific hypotheses with a proper adjustment of the type I error. It is noteworthy that rejecting the homogeneity hypothesis against a specific two-component mixture model does not necessarily imply that the latter model is appropriate. Instead, we advocate that such rejection only gives grounds for the two-component mixture model to be further investigated. It is therefore crucial to study, from the observed data standpoint, how plausible the inferences of major substantive interest are under a candidate model.
Chapter 3

Tests of homogeneity under the hierarchical representation

3.1 Introduction

Suppose that a sample observation y of some random quantity Y is drawn from the distribution P, a mixture of two distributions Q and G related in the simple form, P = ωQ + (1 − ω)G, where ω is an unknown mixing weight, Q is an atom distribution at a known point y∗ and G is an unknown non-degenerate distribution. Under this mixture model, observations are generated from two sources, representing a heterogeneity in the population. The parameter ω measures the extent of this heterogeneity. In the sequel, the term heterogeneity will be used to characterize this mixture in distribution. Likewise, the term homogeneity will be used to characterize a population generated from a single-component distribution; in other words, under a homogeneous model, ω = 0. The distribution G requires some basic regularity conditions. In essence, its density function g(y) is a positive smooth function that decays rapidly at infinity with some degree of uniformity. A good example of such functions is the exponential family density function, for which the associated mixture model has been extensively studied in statistical research, especially under the counting measure and with y∗ = 0. Typical examples include the class of zero-inflated models for uncorrelated discrete data, first studied by Mullahy (1986), Farewell and Sprott (1988) and Lambert (1992), and its extension to dependent discrete data by Hall (2000). In survival analysis, the so-called cure mixture model, for which the mixture is expressed through the survival function with y∗ = ∞, is another popular example of this class of models (see, for example, Farewell, 1982). Two-component mixture models, and more generally finite mixture models, provide an interesting parametric framework to accommodate heterogeneity in the population.
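For concreteness, under the counting measure with y∗ = 0 and G a Poisson distribution, the mixture P = ωQ + (1 − ω)G is the familiar zero-inflated Poisson model. A minimal Python sketch of this mixture density (the parameter values are illustrative only):

```python
import math

def zip_pmf(y, omega, lam):
    """P(Y = y) under P = omega*Q + (1-omega)*G, with Q a point
    mass at y* = 0 and G a Poisson(lam) distribution."""
    g = math.exp(-lam) * lam**y / math.factorial(y)  # Poisson pmf
    q = 1.0 if y == 0 else 0.0                       # atom at zero
    return omega * q + (1.0 - omega) * g

# Under homogeneity (omega = 0) the mixture reduces to the Poisson pmf.
print(zip_pmf(0, 0.0, 1.5))   # exp(-1.5)
print(zip_pmf(0, 0.3, 1.5))   # 0.3 + 0.7*exp(-1.5): excess mass at zero
```

Setting ω = 0 recovers the homogeneous model, which is precisely the null hypothesis studied in this chapter.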
A prevailing concern for this class of models, however, is whether the inherent heterogeneity is consistent with observed data. It is well established that mixture models are prone to efficiency loss, especially when heterogeneity does not hold. Fitting homogeneous models, however, may yield biased inferences if observed data support heterogeneity. Many authors have examined this issue in a variety of settings using score statistics (see, for example, van den Broek, 1995; and Deng & Paul, 2000). However, existing testing procedures often rely on restricted alternatives. One general restriction is that constant mixing weights are often assumed under alternatives to homogeneity. Such an approach may fail to detect heterogeneity in a population with many strata for which heterogeneity is stratum specific. We are not aware of any test in this class of models that assesses heterogeneity depending on covariates. In this chapter, we suggest a new test which assumes that heterogeneity depends on covariates. Complications are that the implied hypotheses may not be typical and that standard regularity conditions to conduct the test may not hold. There may be: (i) hypothesized parameters that are evaluated at infinity, leading to parameters under the null being on the boundary of the parameter space; (ii) one-sided composite hypotheses under the alternatives; (iii) some nuisance parameters that are unidentified under the null while others are identified; and (iv) an uncertainty regarding the support space of the unidentified nuisance parameters under the null. From a methodological standpoint, several papers in the literature examine related issues and complications, but these works accommodate only one or two of these complications. We are not aware of any testing procedure that addresses all of them simultaneously.
The problem of parameters under the null being on the boundary of the parameter space has been extensively discussed in the literature. A seminal work on the topic is that of Chernoff (1954); more recent papers include those of Self and Liang (1987) and Andrews (2001), and references therein. Several papers also consider score tests for one-sided alternatives (see, for example, Silvapulle & Silvapulle, 1995; and Verbeke & Molenberghs, 2003), but these are often discussed in situations where the nuisance parameters are fully identified under the null. Various approaches to the problem of unidentifiable nuisance parameters under the null have appeared in the literature. One approach requires the unidentifiable nuisance parameters to be estimated from data under the alternative hypothesis (see Conniffe, 2001). This practice, however, defies the purpose of score tests of not requiring the alternative model to be fitted. Another popular approach to address nonidentifiability under the null is to fix the nuisance parameters, conditional upon which the pivotal function results in processes of these parameters (see, for example, Davies, 1977; Hansen, 1996; Andrews, 2001; Ritz & Skovgaard, 2005; Zhu & Zhang, 2006; and Song et al., 2009). This has been by far the most popular method for dealing with unidentifiable parameters under the null. Finally, papers which discuss the issue of unidentified nuisance parameters under the null often assume that the support space of these parameters is known to the analyst. For example, when testing the hypothesis that q ≥ 2 random effects variances are equal to zero in a random coefficients model, the unidentified nuisance parameters under the null are typically the correlations among random coefficients, which have a well defined space (see Zhu and Zhang, 2006 for a specific example).
In our testing problem as formulated in Section 3.2, the support of the nuisance parameter may not be known to the analyst a priori. In this chapter, we extend the idea of pivotal functions that result in processes of unidentified parameters to show how to evaluate the hypothesis of interest. Specifically, for each fixed value of the unidentified parameters, we construct a modified score function in the spirit of Silvapulle and Silvapulle (1995) to capture one-sided alternatives. The statistic is then constructed by mapping functionals, such as the supremum function, of these score-based processes on the whole support of the unidentified nuisance parameters to the real line. We characterize the limiting null distribution of this statistic, which involves a careful theoretical analysis of its behavior under the null hypothesis. In practice, we suggest a resampling procedure to implement this testing procedure. The rest of this chapter is organized as follows. In Section 3.2, a generalized one-sided score test for homogeneity of zeros against alternatives from mixture models with covariate-dependent mixing probabilities is described. We discuss the test for the class of zero-inflated models for discrete data as a special case in Section 3.3. In that section, we conduct numerical studies to evaluate the size and power of the proposed testing procedure and illustrate its practical utility using a real life example in dental caries research. Some remaining issues are discussed in Section 3.4.

3.2 The main framework

3.2.1 Hypothesis formulation and reparameterization

Suppose we are interested in evaluating the hypothesis of zero mixing weights against the alternative that the sample observation yi, i = 1, · · · , n, is obtained from the mixture model,

Pi(yi) = ωi Q(yi) + (1 − ωi) G(yi).   (3.1)

To specify the alternative hypothesis, specific constraints on these mixing weights are critically important.
When the model in (3.1) maintains its hierarchical representation, the mixing weight ωi is a probability mass, requiring the probability constraint 0 ≤ ωi ≤ 1. It is possible, however, for ωi to be negative when the mixture model loses its hierarchical representation. But in this chapter we shall assume 0 ≤ ωi ≤ 1, as negative weights seldom arise in practice. Moreover, we shall assume that y∗ is different from the location parameter (such as the first moment or the median) of the homogeneous distribution. This condition is often satisfied when y∗ is on the boundary of its support, as in zero-inflated models for count data and cure mixture models for survival data. Thus, to evaluate the homogeneity hypothesis, one is typically interested in the one-sided hypothesis,

H0: ωi = 0 for all i vs. H1: 0 < ωi ≤ 1 for some i.   (3.2)

To test this homogeneity hypothesis, we need to find a suitable parameterization of ωi that translates the null hypothesis as specified in (3.2) into a hypothesis involving only a small number of parameters. A technique relating the probability mass ωi to some observed covariates facilitates this parameterization. Assume that the data can be arranged to form strata with possibly few stratum-specific mixing weights. Specifically, suppose that the population under study consists of g known distinct subgroups and that each group has its own mixing weight. Let δij be a non-random binary variable which takes value 1 if subject i belongs to group j and 0 otherwise, with Σ_{j=1}^{g} δij = 1. The mixing probability can then be parameterized as ωi = f(Σ_{j=1}^{g} αj δij), where f(·) is an increasing, differentiable and invertible function with domain R (the real line) and codomain [0, 1], with f(−∞) = 0 and f(∞) = 1, and αj ∈ R represents the slope associated with δij, j = 1, · · · , g. In this formulation, f(αj) is the mixing probability of group j.
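As an illustration, take f to be the logistic function, which satisfies the stated conditions (increasing, invertible, f(−∞) = 0 and f(∞) = 1). A short sketch of the stratified parameterization follows; the number of groups and the slope values are illustrative only:

```python
import math

def logistic(t):
    # f: R -> [0, 1], increasing, with f(-inf) = 0 and f(inf) = 1
    return 1.0 / (1.0 + math.exp(-t))

def mixing_weight(delta, alpha):
    """omega_i = f(sum_j alpha_j * delta_ij) for a subject with
    group-membership indicators delta (exactly one entry equals 1)."""
    assert sum(delta) == 1  # each subject belongs to exactly one group
    eta = sum(a * d for a, d in zip(alpha, delta))
    return logistic(eta)

alpha = [-2.0, 0.0, 1.5]                # illustrative slopes, g = 3 groups
print(mixing_weight([0, 1, 0], alpha))  # group 2: f(0.0) = 0.5
```

Under this choice, f(αj) is the mixing probability of group j, and all weights vanish only as every αj tends to −∞.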
The hypothesis in (3.2) then reduces to f(αj) = 0 for all j = 1, 2, · · · , g against f(αj) > 0 for some j. Conventional statistical theory on one-sided score tests can then be used to assess significance (see Silvapulle & Silvapulle, 1995). However, the stratified approach has its limitations when the number of strata, or more generally the number of covariates, increases. Alternatively, one may adopt a regression approach to relate the probability mass ωi to observed covariates. As an example, assume that there exist additional covariates, represented by the vector x, to adjust the mixing weights under the alternative hypothesis. Assume a parametric formulation for the mixing weights such as ωi = f(Σ_{j=1}^{g} αj δij + xi^t γ), where γ = (γ1, γ2, · · · , γp)^t ∈ Γ, an arbitrary subset of R^p, is the slope parameter associated with xi. The mixing weight for subject i in group j with covariate xi is f(αj + xi^t γ). To specify the homogeneity hypothesis, we adopt the following conditions.

Condition A1. The vector x is uniformly bounded in i, i.e. sup_i ||xi|| < ∞. Here and in the sequel, ||·|| represents the Euclidean norm of a vector.

Condition A2. The subspace Γ is compact, i.e. sup_{γ∈Γ} ||γ|| ≤ τ, for some 0 ≤ τ < ∞.

These two conditions are merely technical and ensure that the mixing probabilities are all zero only if αj = −∞ for all j = 1, 2, · · · , g. Under these conditions, the homogeneity hypothesis then requires testing at infinity. The homogeneity and heterogeneity hypotheses in (3.2) become

H0: αj = −∞ for all j, vs. H1: αj > −∞ for some j.

Let ̟j = exp{αj}, j = 1, · · · , g, and ̟ = (̟1, · · · , ̟g), and define Ω = {̟ : ̟j ≥ 0, j = 1, 2, · · · , g} and Ω+ the subset of the cone Ω for which ̟j > 0 for some j. With this reparameterization, the hypothesis finally becomes

H0: ̟ = 0 vs. H1: ̟ ∈ Ω+.

Under the null ̟ = 0, the parameter γ is unidentified and the testing problem is nonstandard.
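With the logistic choice of f and the reparameterization ̟ = exp{α} (taking g = 1 for simplicity), the mixing weight becomes ωi = {1 + ̟^(-1) exp(−xi^t γ)}^(-1). The following sketch (with illustrative values) shows numerically why γ is unidentified under the null: as ̟ → 0, the weight vanishes for every choice of γ:

```python
import math

def omega(varpi, x, gamma):
    """Working mixing weight omega_i = {1 + varpi^{-1} exp(-x'gamma)}^{-1},
    i.e. the logistic model with intercept alpha = log(varpi)."""
    eta = sum(xj * gj for xj, gj in zip(x, gamma))
    return 1.0 / (1.0 + math.exp(-eta) / varpi)

x = [0.7, -0.2]
for varpi in [1e-2, 1e-4, 1e-8]:
    # As varpi -> 0 (alpha -> -infinity), omega -> 0 for ANY gamma,
    # so gamma drops out of the model under H0: it is unidentified.
    print(varpi, omega(varpi, x, [1.0, 2.0]), omega(varpi, x, [-3.0, 0.5]))
```

This is exactly the nonstandard feature that motivates fixing γ and working with score processes indexed by γ.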
3.2.2 Covariate-adjusted score statistic

Assume, without any loss of generality, that the baseline distribution G is a parametric family known up to a parameter vector θ ∈ Θ, with a support that is independent of θ. A good example of this family is the exponential family with density functions of the form g(y; θ) = exp{ã(θ)y − b̃(θ) + c̃(y)}, where ã(θ), b̃(θ) and c̃(y) are known continuously differentiable functions. A non-member of the exponential family, such as the two-parameter negative binomial distribution, also constitutes an example of this parametric family of interest. Under H0, the mixture model in (3.1) simplifies to P = G with parameter θ that can be estimated if G is well specified. Let δi = {δi1, · · · , δig} be the subgroup membership for unit i. We assume, without any loss of generality, that the observed data {yi, δi, xi}_{i=1}^{n} are i.i.d. copies of {y, δ, x}. Given data {yi, δi, xi}_{i=1}^{n}, we denote by ℓn(γ, θ, ̟) the log-likelihood function associated with parameter (γ, θ, ̟) and by ∂ℓn(γ, θ, ̟)/∂̟ the corresponding first-order right derivative with respect to the parameter vector ̟. We also denote by ∂²ℓn(γ, θ, ̟)/∂̟∂θ^t and ∂²ℓn(γ, θ, ̟)/∂̟∂̟^t the first-order derivatives of ∂ℓn(γ, θ, ̟)/∂̟ with respect to θ and ̟, respectively. To construct the test, we assume the following condition.

Condition B1. Let A∗ be a nonsingular matrix and vn(θ) be the score function with respect to θ under the null distribution G. We assume the existence of a root-n consistent estimator θ̂ of θ∗, the true value of θ under the null distribution G, such that n^{1/2}(θ̂ − θ∗) = n^{−1/2} A∗^{−1} vn(θ∗) + op(1), where op(1) represents convergence to 0 in probability as n → ∞.

Assume that γ is fixed. We denote by un(γ, θ) the score function ∂ℓn(γ, θ, ̟)/∂̟ evaluated at ̟ = 0, and by un(γ, θ̂) its estimated version when θ is replaced by θ̂.
We also denote by bi(γ, θ) and di(γ, θ) the contributions of subject i to the function un(γ, θ) and to the matrix −∂²ℓn(γ, θ, ̟)/∂̟∂θ^t evaluated at ̟ = 0, respectively. Let h̃(γ, θ) = E{d1(γ, θ)}. Define ci(γ, θ∗) = bi(γ, θ∗) − h̃(γ, θ∗) A∗^{−1} ai(θ∗), where ai(θ∗) is the contribution of subject i to the score function vn(θ) evaluated at θ∗. A Taylor expansion combined with Condition B1 and the law of large numbers gives n^{−1/2} un(γ, θ̂) = n^{−1/2} Σ_{i=1}^{n} ci(γ, θ∗) + ηn(γ), where ηn(γ) →p 0 as n → ∞, for each γ. Here →p represents convergence in probability. This establishes the pointwise convergence of n^{−1/2} un(γ, θ̂) to a centered normal distribution with asymptotic covariance matrix E{c1(γ, θ∗)⊗2}, for each γ. Here v⊗2 = vv^t. Following Silvapulle and Silvapulle (1995), because of the one-sided constraints ̟j ≥ 0 for all j, we consider the likelihood ratio test statistic for H0 against H1 based on the single observation n^{−1/2} un(γ, θ̂) to derive the modified score statistic,

sn(γ) = n^{−1} un(γ, θ̂)^t ι̂(γ) un(γ, θ̂) − inf_{̟̇∈Ω} {n^{−1/2} un(γ, θ̂) − ̟̇}^t ι̂(γ) {n^{−1/2} un(γ, θ̂) − ̟̇},   (3.3)

with ι̂(γ) being an estimate of the inverse of E{c1(γ, θ∗)⊗2}. A sufficient condition that guarantees the existence of this inverse is that E{c1(γ, θ∗)⊗2} is uniformly positive definite over γ ∈ Γ, i.e. inf_{γ∈Γ,θ∈Θ} eigmin(E{c1(γ, θ)⊗2}) > 0, where eigmin(·) denotes the smallest eigenvalue of a matrix. The computation of the function sn(γ) for each fixed γ involves the minimization of the quadratic and convex function {n^{−1/2} un(γ, θ̂) − ̟̇}^t ι̂(γ) {n^{−1/2} un(γ, θ̂) − ̟̇} in ̟̇, subject to the constraint ̟̇ ∈ Ω. When g > 1, this function can easily be minimized in practice using available optimization subroutines, such as the optim routine in R. Note, however, that when g = 1, the score-based function sn(γ) reduces to the simple form sn(γ) = n^{−1} un(γ, θ̂)^t ι̂(γ) un(γ, θ̂) 1{un(γ, θ̂) ≥ 0}, where 1{E} is the indicator function for the event E.
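For fixed γ, the infimum in (3.3) is a convex quadratic program over the nonnegative orthant Ω. One convenient way to compute it, sketched below under the assumption that ι̂(γ) is positive definite, is to factor ι̂(γ) = L^t L and solve an equivalent nonnegative least-squares problem; this is an illustration, not the implementation used in the thesis:

```python
import numpy as np
from scipy.optimize import nnls

def s_n(u_scaled, iota):
    """Modified score statistic s_n(gamma) of (3.3).

    u_scaled : n^{-1/2} u_n(gamma, theta-hat), shape (g,)
    iota     : estimate of [E{c_1(gamma, theta*)^{(x)2}}]^{-1}, shape (g, g)
    """
    L = np.linalg.cholesky(iota).T        # iota = L' L
    # inf over varpi >= 0 of (u - varpi)' iota (u - varpi)
    # equals the squared NNLS residual || L varpi - L u ||^2
    _, resid = nnls(L, L @ u_scaled)
    return float(u_scaled @ iota @ u_scaled - resid ** 2)

# g = 1 sanity checks: s_n reduces to u' iota u * 1{u >= 0}
print(s_n(np.array([0.5]), np.array([[2.0]])))   # 0.5
print(s_n(np.array([-0.5]), np.array([[2.0]])))  # 0.0
```

The two printed checks confirm the g = 1 reduction sn(γ) = n^{−1} u^t ι̂ u 1{u ≥ 0} noted above; for g > 1 the same routine performs the cone-constrained minimization.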
If γ is known, the statistic in (3.3) is the generalized one-sided score test statistic proposed by Silvapulle & Silvapulle (1995). In general, however, sn(γ) is not really a score statistic, because γ is unknown. To remove the dependence on γ, we introduce the maximum score statistic defined as

Tn = sup_{γ∈Γ} sn(γ).   (3.4)

The null hypothesis is then rejected for unusually large values of Tn.

3.2.3 Large sample properties

Let B = {bi(γ, θ) : γ ∈ Γ, θ ∈ Θ, i = 1, · · · , n} and D = {di(γ, θ) : γ ∈ Γ, θ ∈ Θ, i = 1, · · · , n} be two function classes evaluated at the null ̟ = 0. Define σb∗(γ1, γ2) = E{b1(γ1, θ∗) b1(γ2, θ∗)^t} and σc∗(γ1, γ2) = E{c1(γ1, θ∗) c1(γ2, θ∗)^t}, γ1, γ2 ∈ Γ. In addition to the previous conditions, we adopt the following conditions.

Condition B2. Assume that Θ, the support of θ, is a compact set and that θ∗, the true value of θ under the null, is an interior point of Θ.

Condition B3. The function classes B and D are pointwise measurable and satisfy the uniform entropy condition; see van der Vaart & Wellner (2000a) for the definitions.

Condition B4. The processes n^{−1} Σ_{i=1}^{n} bi(γ1, θ∗) bi(γ2, θ∗)^t and n^{−1} Σ_{i=1}^{n} ci(γ1, θ∗) ci(γ2, θ∗)^t converge almost surely to σb∗(γ1, γ2) and σc∗(γ1, γ2), respectively, uniformly over γ1, γ2 ∈ Γ.

Condition B5. Under the null ̟ = 0, the function h̃(γ, θ) is uniformly bounded above, i.e. sup_{γ∈Γ,θ∈Θ} ||h̃(γ, θ)||2 < ∞, where ||·||2 denotes a matrix norm.

Condition B2 defines the parameter space for the implied parameter θ∗. The entropy condition B3 ensures that the processes un(γ, θ∗) and un(γ, θ̂) are well behaved across all γ ∈ Γ. The condition is satisfied by functions which are uniformly bounded and uniformly Lipschitz of order > {dim(θ) + dim(γ)}/2, where dim(·) denotes the dimension of a vector. Conditions B4 and B5 give conditions under which uniform asymptotic results for un(γ, θ∗) and un(γ, θ̂) may be obtained.
The zero-inflated Poisson and negative binomial models presented in the numerical study in Section 3.3 meet these requirements; they satisfy more primitive sufficient conditions that imply these high-level conditions. Theoretical results on the pointwise convergence of un(γ, θ∗) for each γ ∈ Γ are well established. We give below a more general result on its uniform convergence across all γ ∈ Γ.

Theorem 1. Under Conditions B2-B4 and H0, as n → ∞, sup_{γ∈Γ} ||n^{−1} un(γ, θ∗)|| →p 0 and the random processes n^{−1/2} un(γ, θ∗), indexed by γ, converge to centered Gaussian processes in γ with covariance function σb∗(γ1, γ2), γ1, γ2 ∈ Γ.

Assume that θ∗, the true but unknown value of θ when ̟ = 0, is replaced by a root-n consistent estimator θ̂ verifying Condition B1. The theorem below gives a more general result on the uniform convergence of un(γ, θ̂).

Theorem 2. Under H0 and Conditions B1-B5, as n → ∞, sup_{γ∈Γ} ||n^{−1} un(γ, θ̂)|| →p 0 and the random processes n^{−1/2} un(γ, θ̂), indexed by γ, converge in distribution to centered Gaussian processes in γ with covariance function σc∗(γ1, γ2), γ1, γ2 ∈ Γ.

Assume that the random processes sn(γ) converge to processes s(γ) as n → ∞. Given that the set of discontinuity points of the supremum function has zero probability mass, the continuous mapping theorem can be used to retain the limit-preserving properties. This leads to the following corollary.

Corollary. Under H0 and the conditions of Theorem 2, Tn →d sup_{γ∈Γ} s(γ), as n → ∞.

A well known result due to Shapiro (1988) states that s(γ) is distributed as a mixture of chi-squared distributions, for each γ ∈ Γ. That is, Pr(s(γ) ≥ ν) = Σ_{j=0}^{g} πj(γ, Ω) Pr(χ²j ≥ ν), ν ≥ 0, where χ²j is a chi-squared random variable with j degrees of freedom (χ²0 being a point mass at zero) and πj(γ, Ω), j = 0, · · · , g, are nonnegative mixing weights such that Σ_{j=0}^{g} πj(γ, Ω) = 1.
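In the simplest case g = 1 with γ known, the chi-bar-squared weights reduce to π0 = π1 = 1/2, so Pr(s ≥ ν) = ½ Pr(χ²1 ≥ ν) for ν > 0, and the 5% critical value solves Pr(χ²1 ≥ c) = 0.10. A quick numerical check of this standard fact, sketched with scipy:

```python
from scipy.stats import chi2

# g = 1: Pr(s >= nu) = 0.5*Pr(chi2_0 >= nu) + 0.5*Pr(chi2_1 >= nu)
#                    = 0.5*Pr(chi2_1 >= nu) for nu > 0
def tail_prob(nu):
    return 0.5 * chi2.sf(nu, df=1)

# 5% critical value: 0.5*Pr(chi2_1 >= c) = 0.05, i.e. c = F^{-1}(0.90)
c = chi2.ppf(0.90, df=1)
print(round(c, 4))             # about 2.7055, not the two-sided 3.8415
print(round(tail_prob(c), 4))  # 0.05
```

This illustrates why one-sided (chi-bar-squared) calibration rejects more readily than the naive two-sided chi-squared cutoff.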
These weights depend on γ, Ω and E{c1(γ, θ∗)⊗2}, and can be expressed in closed form in some special cases (Shapiro, 1988). While for each fixed γ the process s(γ) follows a mixture of chi-squared distributions under the null, the asymptotic distribution of sup_{γ∈Γ} s(γ) is quite complicated and analytically intractable. Davies (1977) proposed upper bounds on significance levels for supremum test statistics, but these bounds may be of limited utility in practice (see Hansen, 1996 for a discussion of this issue). A simple nonparametric bootstrap may then be used to approximate the null distribution of the statistic (Efron & Tibshirani, 1993). The validity of the bootstrap follows automatically from empirical process theory under the regularity conditions given in van der Vaart & Wellner (2000b). A difficulty with the nonparametric bootstrap is that it requires fitting the null model for each bootstrap sample, which may be computationally demanding. Instead, we propose a simple resampling technique to approximate the asymptotic null distribution of the test statistic Tn. This technique has been used extensively in the literature when the true asymptotic distribution is hard, if not impossible, to derive analytically (see, for example, Parzen et al., 1994; Lin et al., 1994; Hansen, 1996; Peng & Huang, 2007; and Zhu & Zhang, 2006). In contrast to Peng & Huang (2007), we do not solve a stochastic equation to derive the asymptotic null distribution of Tn. Instead, we use normal variates to approximate the desired limiting distribution, as in Hansen (1996) and Zhu & Zhang (2006). We operate conditionally on the sample {yi, δi, xi}_{i=1}^{n}. Define ξn(γ, θ) = Σ_{i=1}^{n} ci(γ, θ)ζi, where {ζ1, . . . , ζn} are independent and identically distributed standard normal variates. The processes ξn(γ, θ∗) and ξn(γ, θ̂) are defined by replacing θ by its true null value θ∗ or by a consistent estimator θ̂ of θ∗.
Given observed data {yi, δi, xi}_{i=1}^{n}, the random variation in ξn(γ, θ∗) and ξn(γ, θ̂) results from the randomness of ζi, i = 1, · · · , n. Theorems 3 and 4 below give a theoretical justification for the proposed resampling procedure.

Theorem 3. Under H0 and Conditions B1-B5, the unconditional distribution of n^{−1/2} un(γ, θ̂) is asymptotically equivalent to the conditional distribution of n^{−1/2} ξn(γ, θ∗) given {yi, δi, xi}_{i=1}^{n}, the observed data.

We further use asymptotic results on the conditional distribution of n^{−1/2} ξn(γ, θ∗) given observed data {yi, δi, xi}_{i=1}^{n} to establish the limiting distribution of n^{−1/2} un(γ, θ̂). This is summarized in the following theorem.

Theorem 4. Under H0 and Conditions B1-B5, the conditional distribution of n^{−1/2} ξn(γ, θ̂) is asymptotically equivalent to the conditional distribution of n^{−1/2} ξn(γ, θ∗), given observed data {yi, δi, xi}_{i=1}^{n}.

Resampling Procedure

We give below details of the resampling procedure used to obtain the large sample distribution of the proposed score test Tn. For k = 1, · · · , K, we execute the following steps.

Step 1. We generate i.i.d. standard normal variates, denoted {ζ1^{(k)}, . . . , ζn^{(k)}}, where the superscript (k) indexes replications.

Step 2. Given the realizations of the data {yi, δi, xi}_{i=1}^{n} and values of γ ∈ Γ, we calculate the statistic ξn^{(k)}(γ, θ̂) = Σ_{i=1}^{n} ci(γ, θ̂) ζi^{(k)}, where θ̂ is a root-n consistent estimator of θ∗ under the null model G. We then calculate

sn^{(k)}(γ) = n^{−1} ξn^{(k)}(γ, θ̂)^t ι̂(γ) ξn^{(k)}(γ, θ̂) − inf_{̟̇∈Ω} {n^{−1/2} ξn^{(k)}(γ, θ̂) − ̟̇}^t ι̂(γ) {n^{−1/2} ξn^{(k)}(γ, θ̂) − ̟̇}.

Step 3. We calculate the value of the test statistic Tn^{(k)} = sup_{γ∈Γ} sn^{(k)}(γ) for the artificial observations.

By repeatedly generating the normal variates {ζ1, . . . , ζn}, say K times, and repeating Steps 2 and 3 for each generated sample, we obtain the empirical distribution of Tn^{(k)}, k = 1, · · · , K, which converges to the asymptotic distribution of Tn. The p-value of the test is then the proportion of these artificial statistics which exceed tn, the observed value of the statistic given data {yi, δi, xi}_{i=1}^{n}. That is, p-value = K^{−1} Σ_{k=1}^{K} 1{Tn^{(k)} ≥ tn}.

3.2.4 Specification of the support set Γ

The calculation of the statistic Tn and of the artificial statistics Tn^{(k)} = sup_{γ∈Γ} sn^{(k)}(γ), k = 1, · · · , K, requires Γ, the support set of γ, to be specified. Ideally, the analyst may use historical information or unrelated data to determine this set. In practice, however, this information is often unavailable. Alternatively, two strategies can be adopted. In the first strategy, the analyst may elect to choose plausible values of γ and conduct the test using the full sample. One would expect this approach to work well if the chosen set contains the true parameter, and poorly otherwise. Due to this uncertainty, it is recommended that a sensitivity analysis be conducted with respect to the choice of Γ, with a special focus on wider sets. Of course, wider sets may render the test unnecessarily conservative and thereby sacrifice power. Most importantly, large values of γ may compromise the validity of the test by violating Condition A2. Another, more practical, strategy consists of randomly splitting the sample into a smaller planning sample of size ρn and a larger analysis sample of size (1 − ρ)n, where 0 < ρ < 1. The planning sample of size ρn is used to empirically approximate Γ, using say a 99% joint confidence region for γ∗ under the alternative model, and is then discarded; the remaining sample is used to conduct the test. This approach was recently illustrated by Heller et al.
(2009) in the context of observational studies, where the planning sample is used to guide decisions about the design in such a way that the study would be less sensitive to unobserved biases. In the sequel, Γρ shall represent the set Γ obtained from the planning sample of size ρn and, by convention, Γ0 shall represent a set not derived from the sample.

3.3 Numerical Study

We illustrate the use of the proposed score-type statistic to evaluate homogeneity for two-component mixture models for discrete data, a particular but representative special case. In this class of models, the mixture is expressed through the density functions with respect to the counting measure. For the real data analysis, the zero-inflated negative binomial model will be used; for simplicity, the zero-inflated Poisson regression model will be used for the numerical experiment. Other interesting and popular mixture models which will not be discussed numerically are the so-called cure mixture model in survival analysis, for which the mixture is expressed through the survival function, and the random intercept model. In this study, the estimating function is the score function derived from the parametric likelihood. We conduct the proposed hypothesis tests by replacing the parameter θ∗ by its maximum likelihood estimate θ̂ under the null. We also approximate Γ, the space of the unidentified nuisance parameter γ, by a fine grid to ease the computational burden. The choice of the size of the grid depends on computational feasibility and our empirical observations.

3.3.1 The score test for zero-inflated Poisson models

In this section, we conduct a numerical study to evaluate the finite sample performance of the score test statistic Tn using the proposed schemes to specify Γ. This performance is compared to that of the constant mixing weight score statistic derived from (3.4) by restricting the support set Γ to the origin.
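Before turning to the simulations, the full procedure of Sections 3.2.2 and 3.2.3 (compute sn(γ) on a grid approximating Γ, take the supremum, and calibrate by normal-multiplier resampling) can be sketched as follows. This is a schematic illustration only: the arrays standing in for the contributions ci(γ, θ̂) would come from the fitted null model in a real application, whereas here they are simulated noise.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

def s_stat(v, iota):
    """Modified one-sided score function of (3.3) for a scaled score v."""
    L = np.linalg.cholesky(iota).T           # iota = L' L
    _, resid = nnls(L, L @ v)                # inf over the cone varpi >= 0
    return float(v @ iota @ v - resid ** 2)

def sup_score_test(c_hat, K=500):
    """Supremum statistic T_n of (3.4) with the resampling p-value.

    c_hat maps each grid value of gamma to an (n, g) array of estimated
    contributions c_i(gamma, theta-hat) from the fitted null model.
    """
    n = next(iter(c_hat.values())).shape[0]
    iota = {gam: np.linalg.inv(c.T @ c / n)  # [n^{-1} sum c_i c_i']^{-1}
            for gam, c in c_hat.items()}
    t_obs = max(s_stat(c.sum(axis=0) / np.sqrt(n), iota[gam])
                for gam, c in c_hat.items())
    t_art = np.empty(K)                      # artificial statistics T_n^(k)
    for k in range(K):
        zeta = rng.standard_normal(n)        # multipliers zeta_1, ..., zeta_n
        t_art[k] = max(s_stat(zeta @ c / np.sqrt(n), iota[gam])
                       for gam, c in c_hat.items())
    return t_obs, float(np.mean(t_art >= t_obs))

# toy illustration: a single grid point for gamma, g = 1, with noise
# standing in for the contributions c_i(gamma, theta-hat) under the null
c_toy = {0.0: rng.standard_normal((200, 1))}
t_n, p_value = sup_score_test(c_toy)
print(t_n, p_value)
```

The dictionary keyed by grid values of γ mirrors the grid approximation of Γ described above; the p-value is the proportion of artificial statistics exceeding the observed one.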
In situations where γ∗, the true γ, is known, the performance of Tn is further compared to that of the statistic evaluated on the singleton set Γ = {γ∗} and on a set that does not contain the true parameter γ∗. To keep the simulation simple, we consider a homogeneous Poisson process with covariate-dependent mean λi∗ = exp{θ0∗ + 0.04 x1i + 0.05 x2i}, where the covariates x1i and x2i are generated independently from a uniform random variable on the interval [0, 1] and a standard normal variable truncated to the interval [−1, 1], respectively. The intercept parameter θ0∗ takes values in the set {−0.4, −0.3, −0.2} and captures the separation between the two components of the mixture model. The maximum likelihood estimate θ̂ of θ∗ is obtained from a homogeneous Poisson model with mean λi = exp{zi^t θ}, where zi = (1, x1i, x2i)^t. Throughout our simulations, we perform the test Tn assuming the working mixing weight model ωi = {1 + ̟^{−1} exp{−xi^t γ}}^{−1} under the alternative, where xi = (x1i, x2i)^t. The two approaches mentioned in Section 3.2.4 for selecting the set Γ are contrasted. For a priori choices, the support of γ is set to Γ0^(1) = [−6, 6] × [−6, 6] or Γ0^(2) = [−4, 4] × [−4, 4]. These regions all contain the true γ but, in practice, it is often unknown to the analyst whether a selected region contains the true γ. As an alternative, we consider the split sample approach, where Γ is approximated by a 99% joint confidence region Γ1/4 or Γ1/3 for γ∗ under the alternative model, based on 1/4 or 1/3 of the total sample size. In our calculations, we set the number of resamples K that approximates the null distribution of the test statistics to 1,000. Finally, all simulations were replicated 1,000 times, for sample sizes 100, 200 and 400. To investigate the empirical type I error rate of the proposed tests, we generate data from the homogeneous Poisson model with true mean λi∗ specified above.
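The data-generating mechanism of this numerical study, including the covariate-dependent mixing weight ωi = {1 + ̟∗^{−1} exp{0.5x1i − 3.5x2i}}^{−1} used in the power study below, can be sketched as follows (a schematic reproduction of the design, not the original simulation code):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, theta0, varpi_star):
    """Draw (y, x1, x2): Poisson component with mean
    exp{theta0 + 0.04*x1 + 0.05*x2}, mixed with a point mass at zero
    having weight omega_i = {1 + varpi^{-1} exp(0.5*x1 - 3.5*x2)}^{-1}."""
    x1 = rng.uniform(0.0, 1.0, n)
    z = rng.standard_normal(4 * n)           # truncate N(0,1) to [-1, 1]
    x2 = z[np.abs(z) <= 1.0][:n]             # rejection sampling
    lam = np.exp(theta0 + 0.04 * x1 + 0.05 * x2)
    omega = 1.0 / (1.0 + np.exp(0.5 * x1 - 3.5 * x2) / varpi_star)
    at_zero = rng.uniform(size=n) < omega    # draws from the atom Q at 0
    y = np.where(at_zero, 0, rng.poisson(lam))
    return y, x1, x2

y, x1, x2 = simulate(400, -0.3, 0.3)
print(y.mean(), (y == 0).mean())             # summary of simulated counts
```

Setting ̟∗ near zero in this sketch reproduces the homogeneous Poisson model used for the size study.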
The empirical test sizes at the 5% nominal level are reported in Table 3.1. All tests have conservative type I error rates for all sample sizes and all values of θ0∗ considered; in other words, the tests tend to reject the null hypothesis less often than anticipated. This conservativeness, however, diminishes as the sample size increases. These findings are consistent with the literature, in that the supremum test can be somewhat conservative in some practical situations (see, for example, Hansen, 1996).

Table 3.1: Empirical size of score-based test statistics assuming a homogeneous (ωi = 0) Poisson model G with mean λi∗ = exp{θ0∗ + 0.04x1i + 0.05x2i}, at the 5% significance level

               θ0∗ = -0.4            θ0∗ = -0.3            θ0∗ = -0.2
Support        n=100  n=200  n=400   n=100  n=200  n=400   n=100  n=200  n=400
Γ0(1)          0.023  0.026  0.036   0.022  0.028  0.043   0.022  0.029  0.043
Γ0(2)          0.029  0.031  0.030   0.036  0.043  0.034   0.027  0.026  0.035
Γ1/4           0.030  0.045  0.041   0.028  0.024  0.038   0.028  0.050  0.035
Γ1/3           0.022  0.032  0.033   0.025  0.023  0.032   0.017  0.029  0.041
{(0,0)}∓       0.034  0.037  0.028   0.029  0.050  0.050   0.041  0.044  0.032
               0.031  0.041  0.040   0.028  0.029  0.046   0.045  0.044  0.048

Note: Γ0(1) = [-6, 6] × [-6, 6], Γ0(2) = [-4, 4] × [-4, 4].
∓ Tests performed using the proposed resampling scheme (first row) and the usual mixture of chi-square distributions (second row).

We further investigate the empirical power of the tests when the true mixing weight depends on covariates. Specifically, we generate data from a two-component mixture model for which the non-degenerate distribution is a Poisson model with mean λi∗ and covariate-dependent mixing weight ωi = {1 + ̟∗^{−1} exp{0.5x1i − 3.5x2i}}^{−1}, where ̟∗ takes values in the set {0.1, 0.3}. Table 3.2 presents the results of this simulation study. Overall, a larger sample size and increasing values of ̟∗ improve the power of detecting the alternatives for all tests considered.
Moreover, the power increases with increasing values of the intercept $\theta_0^*$. But when $\theta_0^*$ is small, the mixture components are poorly separated and the powers are adversely affected. The power also increases for tighter pre-selected sets of $\gamma$, with the highest power being achieved when $\Gamma$ is restricted to the singleton $\{(-0.5, 3.5)\}$, the true $\gamma$. Covariate-adjusted tests appear to be consistently more powerful than constant-weight-based tests whenever the pre-selected set $\Gamma$ contains the true parameter. Note, however, that when $\Gamma$ does not contain the true parameter, covariate-adjusted tests lose power and become even less powerful than constant-weight-based tests. Similar findings are obtained when the pre-selected $\Gamma$ does contain the true parameter but is overwhelmingly wide (simulation results not shown). These are situations where sample splitting proves useful in calibrating the set $\Gamma$. Indeed, for moderate to large sample sizes, the split sample approach outperforms the test based on constant mixing weights when the true mixing weight depends on covariates. Its usefulness, however, can be limited when the original sample size is small. This is expected because sample splitting reduces power by discarding observations, which has a dramatic impact on inferences when the original sample is small. We finally conduct a simulation study to evaluate the power of the proposed test statistics under various misspecifications of the true mixing weight model.
We consider situations where the true mixing weight (1) does not depend on covariates, or (2) depends on covariates via a log-log or a complementary log-log transformation (see Table 3.3). The proposed tests, however, are conducted under the assumption that the working mixing weight is related to covariates through a logistic transformation, $\omega_i = \{1 + \varpi^{-1}\exp\{-x_i^t\gamma\}\}^{-1}$, under the alternative.

Table 3.2: Empirical power of score-based test statistics assuming covariate-dependent mixing weight $\omega_i^* = \{1 + \varpi^{*-1}\exp\{0.5x_{1i} - 3.5x_{2i}\}\}^{-1}$ and Poisson model $G$ with mean $\lambda_i^* = \exp\{\theta_0^* + 0.04x_{1i} + 0.05x_{2i}\}$, at 5% significance level

                 θ0* = -0.4            θ0* = -0.3            θ0* = -0.2
Support        n=100  n=200  n=400   n=100  n=200  n=400   n=100  n=200  n=400
ϖ* = 0.1
Γ0^(1)         0.138  0.400  0.773   0.201  0.441  0.794   0.223  0.535  0.871
Γ0^(2)         0.183  0.448  0.814   0.260  0.504  0.839   0.281  0.594  0.911
{(-0.5,3.5)}   0.486  0.695  0.938   0.535  0.755  0.937   0.571  0.815  0.969
Γ0^(3)         0.122  0.237  0.393   0.136  0.245  0.418   0.164  0.261  0.418
Γ_{1/4}        0.112  0.240  0.577   0.119  0.276  0.663   0.145  0.377  0.734
Γ_{1/3}        0.082  0.226  0.519   0.081  0.303  0.621   0.117  0.314  0.686
{(0,0)}∓       0.116  0.202  0.317   0.147  0.212  0.356   0.145  0.253  0.422
               0.121  0.213  0.345   0.156  0.219  0.368   0.155  0.255  0.432
ϖ* = 0.3
Γ0^(1)         0.277  0.607  0.944   0.316  0.710  0.976   0.419  0.793  0.990
Γ0^(2)         0.345  0.686  0.957   0.408  0.772  0.986   0.502  0.844  0.995
{(-0.5,3.5)}   0.700  0.877  0.989   0.723  0.916  0.993   0.797  0.940  0.999
Γ0^(3)         0.158  0.386  0.609   0.166  0.419  0.639   0.173  0.424  0.656
Γ_{1/4}        0.172  0.469  0.833   0.192  0.548  0.909   0.246  0.613  0.936
Γ_{1/3}        0.141  0.411  0.817   0.153  0.467  0.882   0.207  0.547  0.925
{(0,0)}∓       0.168  0.267  0.458   0.185  0.317  0.546   0.221  0.355  0.641
               0.195  0.296  0.503   0.217  0.350  0.584   0.257  0.415  0.671

Note: Γ0^(1) = [-6, 6] × [-6, 6], Γ0^(2) = [-4, 4] × [-4, 4], Γ0^(3) = [-18, -10] × [10, 18].
∓ Tests performed using the proposed resampling scheme (first row) and the usual mixture of chi-square distributions (second row).
Results of this study are given in Table 3.3. When the true mixing weight does not depend on covariates, the constant-weight-based test appears to be more powerful than its covariate-adjusted counterparts. But when the true mixing weight depends on covariates, the covariate-adjusted test with a logistic approximation to the true link function appears to be more powerful than the covariate-unadjusted test. In sum, this simulation evidence favors covariate-adjusted tests when the true mixing probability depends on covariates. But the rationale for covariate adjustment builds on the behavior of such tests under model misspecification. We have noticed a slight deterioration of power of these tests when the true mixing probability does not depend on covariates.

3.3.2 Application to Dental caries data

To illustrate the practical application of the proposed methodology, we consider data from a survey conducted by the Detroit Center for Research on Oral Health Disparities, hosted at the University of Michigan, Ann Arbor. This is a unique longitudinal survey designed to collect oral health information on low-income African American children (0-5 years) and their main caregivers (14+ years) living in the city of Detroit. The overall goal of the study is to promote oral health and reduce its disparities within the community of low-income African Americans through an understanding of the social, familial, biological, and neighborhood determinants of dental caries and periodontal disease. Data on oral examinations involving assessment of dental caries, periodontal disease, oral cancer and oral hygiene were collected.
Table 3.3: Empirical power of score-based test statistics under various mixing weight model misspecifications (constant and covariate-dependent mixing weights with various link functions) and Poisson model $G$ with mean $\lambda_i^* = \exp\{\theta_0^* + 0.04x_{1i} + 0.05x_{2i}\}$, at 5% significance level

                 θ0* = -0.4            θ0* = -0.3            θ0* = -0.2
Support        n=100  n=200  n=400   n=100  n=200  n=400   n=100  n=200  n=400
ωi* = 1/6
Γ0^(1)         0.072  0.156  0.277   0.093  0.171  0.339   0.104  0.220  0.429
Γ0^(2)         0.092  0.197  0.332   0.124  0.207  0.383   0.139  0.269  0.498
Γ_{1/4}        0.054  0.108  0.199   0.084  0.119  0.234   0.070  0.151  0.308
Γ_{1/3}        0.053  0.096  0.186   0.056  0.103  0.214   0.063  0.140  0.278
{(0,0)}∓       0.142  0.291  0.467   0.191  0.308  0.545   0.235  0.394  0.663
               0.163  0.308  0.483   0.199  0.320  0.552   0.245  0.409  0.668
ωi* = exp{-exp{log(10) + 0.5x_{1i} - 3.5x_{2i}}}
Γ0^(1)         0.145  0.315  0.664   0.138  0.391  0.730   0.187  0.435  0.798
Γ0^(2)         0.161  0.334  0.670   0.166  0.409  0.743   0.221  0.453  0.807
Γ_{1/4}        0.072  0.212  0.478   0.083  0.240  0.585   0.087  0.297  0.605
Γ_{1/3}        0.053  0.172  0.460   0.076  0.233  0.517   0.094  0.254  0.574
{(0,0)}∓       0.071  0.092  0.161   0.084  0.114  0.189   0.092  0.133  0.201
               0.077  0.110  0.169   0.092  0.117  0.195   0.098  0.139  0.207
ωi* = 1 - exp{-exp{-log(10) - 0.5x_{1i} + 3.5x_{2i}}}
Γ0^(1)         0.264  0.636  0.961   0.313  0.712  0.985   0.388  0.805  0.989
Γ0^(2)         0.348  0.695  0.968   0.382  0.769  0.993   0.465  0.852  0.993
Γ_{1/4}        0.138  0.453  0.873   0.159  0.515  0.908   0.219  0.596  0.951
Γ_{1/3}        0.123  0.395  0.834   0.146  0.455  0.876   0.190  0.542  0.919
{(0,0)}∓       0.114  0.191  0.357   0.136  0.222  0.397   0.147  0.263  0.458
               0.138  0.228  0.377   0.155  0.244  0.428   0.176  0.278  0.500

Note: Γ0^(1) = [-6, 6] × [-6, 6], Γ0^(2) = [-4, 4] × [-4, 4].
∓ Tests performed using the proposed resampling scheme (first row) and the usual mixture of chi-square distributions (second row).

For our illustration, we focus on scores which represent the cumulative severity of dental caries experience in each of the 897 surveyed children.
Specifically, these scores count the number of tooth surfaces that show signs of clinically detectable enamel lesions, comprising both noncavitated and cavitated lesions, for each child. Using this stringent dental caries definition, three different outcomes are derived. The first outcome, denoted DS, counts the number of decayed surfaces; the second, DFS, adds the number of filled surfaces; and the third, DMFS, further adds the number of missing surfaces. These scores have well-documented shortcomings regarding their ability to describe the intra-oral distribution of dental caries. But they continue to be instrumental in evaluating and comparing the risks of dental caries across population groups. Most importantly, these scores remain popular in dental caries research for their ability to support historical comparisons in population-based studies (Lewsey and Thomson 2004). Our numerical computations are based on cross-sectional data obtained in the first wave of examinations and interviews completed between 2002 and 2003. Covariates considered include the child's age in years (Age), the child's sugar intake (SI) and their multiplicative interaction (Age*SI). A more detailed description of the study can be found elsewhere (see, for example, Tellez et al. 2006).
It is common practice to use zero-inflated models to describe dental caries scores in young children. Moreover, dental caries data in this population often exhibit overdispersion in addition to zero inflation (see, for example, Böhning et al. 1999). More generally, confounding of mixtures with overdispersion is not uncommon in practical settings (Lindsay and Roeder 1992).
As a basic starting model, we consider a zero-inflated model for which the homogeneous model is a negative binomial model with mean $\lambda_i = \exp\{z_i^t\theta\}$, where $z_i = (1, \mathrm{Age}_i, \mathrm{SI}_i, \mathrm{Age}_i{*}\mathrm{SI}_i)^t$, and overdispersion parameter $\kappa$, together with a working mixing weight model $\omega_i = \{1 + \varpi^{-1}\exp\{-x_i^t\gamma\}\}^{-1}$, where $x_i = (\mathrm{Age}_i, \mathrm{SI}_i, \mathrm{Age}_i{*}\mathrm{SI}_i)^t$.
As a goodness-of-fit check, we evaluate this zero-inflated model against the homogeneous negative binomial model using the proposed covariate-dependent score tests. To perform these tests, the support of each dimension of $\gamma$ is set a priori to either $[-10, 10]$ or $[-5, 5]$. For the split sample approach, $\Gamma$ is approximated using 99% simultaneous confidence regions of $\gamma^*$ under the alternative model based on 1/4 or 1/3 of the total sample size. For comparison purposes, score tests with constant mixing weights are also performed. Since all considered covariates are continuous, we set $g = 1$. Finally, we set $K = 10{,}000$ resamples to approximate the null distribution of the test statistics. Results of this analysis are presented in Table 3.4.

Table 3.4: Score-based test values (p-values) for dental caries data

            Full sample†                                                Random split samples∗
Response    covariate indep.    covariate dep.      covariate dep.      covariate dep.     covariate dep.
            Γ = {(0,0,0)}       Γ = Γ0^(1)          Γ = Γ0^(2)          Γ = Γ_{1/4}        Γ = Γ_{1/3}
DS          0 (0.4944)          116.564 (0.0087)    116.070 (0.0096)    93.544 (0.0128)    69.520 (0.0095)
DFS         0 (0.4905)          115.634 (0.0223)    114.948 (0.0202)    94.780 (0.0245)    69.914 (0.0251)
DMFS        0 (0.5070)          119.342 (0.0123)    118.869 (0.0118)    96.283 (0.0156)    72.182 (0.0096)

† Γ0^(1) = [-10, 10] × [-10, 10] × [-10, 10] and Γ0^(2) = [-5, 5] × [-5, 5] × [-5, 5].
∗ The first sample is used for calibrating the support set of γ and the remaining sample for conducting the test.

The observed value of the test statistic appears virtually insensitive to the two pre-selected sets of $\gamma$, $\Gamma_0^{(1)}$ and $\Gamma_0^{(2)}$, in Table 3.4.
In view of the resampling variation, the p-values of the tests are also insensitive to a wide set $\Gamma$, and all remain significant at the 5% level. In practice, contemplating extreme values of $\gamma$ may lead to an overly conservative test. Interestingly, in this example, the evidence of heterogeneity is rather strong, so that the supremum test rejects over a pre-specified wide region. The split sample approach appears to be sensitive to the pre-specified proportions used to approximate the set $\Gamma$. But the two proportions, 1/4 and 1/3, both yield significant p-values, providing further evidence of heterogeneity in this population. Unlike the covariate-dependent tests, the constant mixing weight test fails to reject the homogeneity hypothesis for all outcomes. This is an excellent example where the score test based on constant mixing weights is not powerful enough to capture heterogeneity in the data; when covariate-adjusted, the score test greatly improves efficiency in detecting heterogeneity.
It is worth noting that rejection of homogeneity does not give evidence that the zero-inflated negative binomial model provides the best fit for these data. Instead, such rejection only gives grounds for this model to be further evaluated against other competing models. One such model is the marginal model that does not maintain the hierarchical representation of the mixture. It allows for negative mixing weights, provided the densities with respect to the counting measure are bounded by 0 and 1.

3.4 Discussion

The issue of evaluating the homogeneity hypothesis in two-component mixture models has attracted considerable attention in the literature, but most of this debate has focused primarily on restrictive assumptions on the form of the alternative. One common limitation is that constant mixing probabilities are often assumed. This chapter extends the literature by developing a score-based statistic to detect heterogeneity with covariate-dependent mixing weights.
The main contribution is that the power of the tests can be greatly improved if the working model for the mixing weights depends on covariates. There may, however, be a slight deterioration of power when the true mixing probability does not depend on covariates. Given that the true data generating mechanism is usually unknown to the analyst, a general approach that assumes covariate-dependent mixing probabilities appears to be the most conservative data analytic strategy.
The proposed test can be extended to refine the model specification. For example, when the null hypothesis is rejected, it may be of interest to identify subgroups of the population that have nonzero mixing probabilities. We did not discuss how to identify these components, but our results can be used for this purpose. Inference regarding the significance of each subgroup's mixing weight can be conducted taking into account that multiple tests are performed, which necessitates an adjustment of the type I error. Other issues also merit further research. One major issue is to construct tests based on estimating functions that are robust against outliers and poor separation of components. These issues and other generalizations of the methodology are the subject of further research.

Chapter 4

Summary, discussion and future study

4.1 Summary

In this thesis, we have developed a general testing procedure to evaluate homogeneity in two-component mixture models. Evaluating homogeneity in this class of models has attracted considerable attention in the literature, but most of this debate has focused primarily on restrictive assumptions on the form of the alternative. One common limitation is the constancy assumption, under the alternative, of the quantity being tested, which may adversely affect statistical power. This thesis has extended the literature by showing that efficiency can be greatly improved if covariate information is incorporated into the testing scheme.
There may, however, be a slight deterioration of power when the quantities being tested do not depend on covariates. Nonetheless, given that the true data generating mechanism is usually unknown to the analyst, a general approach that accommodates covariates appears to be the most conservative data analytic strategy.
To conduct the test under the marginal representation of the mixture model, we have formulated and developed a generalized score test and Wald test of homogeneity based on an intuitive approach that accommodates the features of the marginal model. The proposed tests adopt a novel parameterization that allows the boundary of the mixing weight to be embedded in the testing scheme. Most importantly, we showed that this is a natural strategy to adopt when testing is carried out under the marginal model that ignores its hierarchical representation.
To conduct the test under the hierarchical representation of the mixture model, the analyst is required in practice to specify $\Gamma$, the support set of the unidentified nuisance parameters. In some contexts, information from earlier studies can be used to specify this set. Alternatively, the sample under study can be randomly split into a smaller planning sample, used only to calibrate $\Gamma$ and then discarded, and a larger analysis sample used to conduct the test. This approach works well when the original sample size is moderate or large but performs poorly otherwise. In small samples, the analyst can avoid the loss of power due to sample splitting by choosing plausible values of $\gamma$ and conducting the test using the full sample. This approach also works well if the chosen set contains the true parameter, but does not perform so well otherwise. It is therefore recommended that a sensitivity analysis be conducted with respect to the choice of $\Gamma$, with a special focus on wider sets. In the following section, we propose and develop some extensions of the proposed methodology.
Specifically, we develop, under non-standard conditions, a score-based statistic to evaluate parametric restrictions that are positive in the maintained hypothesis. Here, the methodology is presented in a simple but general framework which does not require the full distribution of the population to be specified. Instead, an estimating function for a finite-dimensional component of the model is assumed, but other aspects of the model, which may be described by infinite-dimensional parameters, are left completely unspecified. We also propose an adjustment to reduce the conservativeness of the supremum test statistics in small to moderate sample sizes.

4.2 A score-type test for positive functions

In this section, we develop a score-type statistic to evaluate parametric functions that are positive in the maintained hypothesis. These functions take a general form, involving a regression on some model parameters and auxiliary variables. To be specific, consider a random sample $y_i$, $i = 1, \cdots, n$, independently drawn from a potentially stratified population of $g \geq 1$ known distinct strata. The law governing this population is assumed parametric with a potentially infinite-dimensional parameter $\psi$. We are interested in evaluating constraints involving only a finite-dimensional but unknown subset of $\psi$, denoted $\xi$, of dimension $s \times 1$. Specifically, we are interested in parametric functions of the form $f_i(\xi) = f(v_i^t\alpha + x_i^t\gamma)$, where $f(\cdot)$ is a known, increasing, differentiable and invertible function on the extended real line $\bar{R}$ (the real line $R$ plus $-\infty$ and $\infty$) with codomain $C \subseteq [0, \infty)$; $v_i = (v_{i1}, \cdots, v_{ig})^t$, with $v_{ij}$ taking value 1 if unit $i$ belongs to stratum $j = 1, \cdots, g$, and 0 otherwise, subject to $\sum_{j=1}^g v_{ij} = 1$; $x_i$ is an observed continuous covariate vector of dimension $p$; and $\alpha = (\alpha_1, \cdots, \alpha_g)^t$ and $\gamma = (\gamma_1, \cdots, \gamma_p)^t$ are unknown regression parameters associated with covariates $v_i$ and $x_i$.
Under the maintained hypothesis $f(v_i^t\alpha + x_i^t\gamma) \geq 0$ for all $i$, we consider hypotheses of the general form,

$H_0: f(v_i^t\alpha^* + x_i^t\gamma^*) = 0$ for all $i$, vs. $H_1: f(v_i^t\alpha^* + x_i^t\gamma^*) > 0$ for some $i$,   (4.1)

where $\alpha^*$ and $\gamma^*$ are the unknown true values of $\alpha$ and $\gamma$. Special examples of this testing problem include, as part of goodness-of-fit, the evaluation of homogeneity for parametric mixture models in which heterogeneity depends on covariates through a regression technique. As a simple illustration, suppose we are interested in evaluating heterogeneity under a relatively simple linear mixed effects model, the so-called random intercept model with a covariate-dependent random effects variance:

$y_i = \mu_i 1_{m_i} + z_i^t\vartheta + \epsilon_i$,

where $y_i$ and $z_i$ are respectively the response vector and covariate vector for cluster unit $i$, $\vartheta$ is a vector of unknown, non-random regression coefficients associated with $z_i$, and $1_{m_i}$ is a vector of ones of length $m_i$, the number of observations for cluster $i$. The cluster-specific random intercepts $\mu_i \sim N(0, \sigma^2_{\mu_i})$, $i = 1, \cdots, n$, are mutually independent and independently distributed from the measurement error vector $\epsilon_i \sim N(0, \sigma^2 I_{m_i})$, with $I_{m_i}$ being an identity matrix of order $m_i$. Unlike in a classical random intercept model, we assume that $\sigma^2_{\mu_i}$ depends on a cluster-level covariate $x_i$ through the regression $\sigma^2_{\mu_i} = \exp\{\alpha + x_i^t\gamma\}$, assuming that the population is formed of only one stratum. The hypothesis of no random effect is given by $\exp\{\alpha^* + x_i^t\gamma^*\} = 0$ for all $i$, and the null model is indexed by the parameter $\theta = (\vartheta^t, \sigma)^t$. Classical inferential procedures for assessing heterogeneity in mixed models of this type are typically based on restricted alternatives. For example, Verbeke and Molenberghs (2003) constructed a score test to evaluate the need for random effects using constant variances under the alternative.
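To make the random-intercept illustration concrete, the following sketch (the function name, scalar covariates, and parameter values are our own choices for illustration) draws clustered data with variance $\sigma^2_{\mu_i} = \exp\{\alpha + \gamma x_i\}$ for a scalar cluster-level covariate:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_random_intercept(n_clusters, m, vartheta, alpha, gamma):
    """Random-intercept model y_ij = mu_i + vartheta*z_ij + eps_ij with
    cluster-specific random-effect variance sigma2_mu_i = exp(alpha + gamma*x_i)."""
    clusters = []
    for _ in range(n_clusters):
        x = rng.uniform(-1.0, 1.0)                  # cluster-level covariate
        sigma2_mu = np.exp(alpha + gamma * x)       # covariate-dependent variance
        mu = rng.normal(0.0, np.sqrt(sigma2_mu))    # random intercept mu_i
        z = rng.uniform(-1.0, 1.0, m)
        y = mu + vartheta * z + rng.normal(0.0, 1.0, m)
        clusters.append((x, z, y))
    return clusters

heterog = simulate_random_intercept(300, 5, 1.0, alpha=1.5, gamma=0.8)
homog = simulate_random_intercept(300, 5, 1.0, alpha=-8.0, gamma=0.0)  # near-null
```

Letting $\alpha \to -\infty$ drives the random-effect variance to zero for every cluster, which is exactly the "testing at infinity" feature that motivates the reparameterization $\varpi = \exp\{\alpha\}$ introduced below.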
Such an approach may fail to reject homogeneity in a population where heterogeneity depends on covariates.
Another interesting example where this testing problem is often faced is in evaluating heterogeneity for two-component mixture models, introduced in the previous chapter. In this class of models, the sample observation $y_i$ is obtained from $P_i$, a mixture of two distributions $Q$ and $G$, related in the simple form,

$P_i(y_i) = \omega_i Q(y_i) + (1 - \omega_i)G(y_i)$,

where $0 \leq \omega_i \leq 1$ is an unknown mixing weight which measures the extent of heterogeneity in the population, $Q$ is an atom at a known point $y^*$ and $G$ is a non-degenerate distribution known up to a parameter $\theta$. The point $y^*$ is often on the boundary of the support of the data, such as $y^* = 0$ for count data and $y^* = \infty$ for survival data. Typical examples of these models include the class of zero-inflated models for uncorrelated discrete data, first studied by Mullahy (1986), Farewell and Sprott (1988) and Lambert (1992), and its extension to dependent discrete data by Hall (2000). In survival analysis, the so-called cure mixture model, for which the mixture is expressed through the survival function (Farewell 1982), is another popular example of this class of models. In practical applications of these two-component mixture models, the mixing weight is often allowed to depend on covariates through known functions. A typical example of such functions is the logistic transformation, $\omega_i = \{1 + \exp\{-\alpha - x_i^t\gamma\}\}^{-1}$, for which the homogeneity hypothesis is given by $\{1 + \exp\{-\alpha^* - x_i^t\gamma^*\}\}^{-1} = 0$ for all $i$, again assuming only one stratum.
The test developed by Silvapulle and Silvapulle (1995) is useful for assessing homogeneity against heterogeneity that depends on categorical covariates. Its usefulness is, however, limited when heterogeneity depends on continuous covariates. We consider the general hypothesis in (4.1).
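In the zero-inflated case ($y^* = 0$ with $G$ Poisson), the mixture $P_i(y) = \omega_i Q(y) + (1 - \omega_i)G(y)$ has an explicit pmf; a minimal sketch (the function name is ours, chosen for illustration):

```python
from math import exp, factorial

def zip_pmf(y, lam, omega):
    """Pmf of the two-component mixture with an atom at y* = 0:
    omega * 1{y = 0} + (1 - omega) * Poisson(lam) pmf at y."""
    poisson_term = exp(-lam) * lam ** y / factorial(y)
    return omega * (1.0 if y == 0 else 0.0) + (1.0 - omega) * poisson_term

p0 = zip_pmf(0, 1.5, 0.3)   # zero mass inflated above the Poisson value exp(-1.5)
```

Setting `omega = 0` recovers the homogeneous Poisson pmf, which is the null being tested.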
In the absence of continuous covariates, that is, $x_i = 0$ for all $i$, this hypothesis simply reduces to $f(\alpha_j^*) = 0$ for all $j$ against $f(\alpha_j^*) > 0$ for some $j$. Conventional testing procedures for one-sided alternatives can then be used to assess significance (see Silvapulle and Silvapulle 1995). However, when $x_i \neq 0$ for some $i$, the testing problem is nonstandard. Without any loss of generality, we assume that $f(-\infty) = 0$ and adopt the following assumptions to specify the hypothesis.

Assumption 1. The vector $x_i$ is uniformly bounded in $i$, i.e. $\sup_i \|x_i\| < \infty$. Here and in the sequel, $\|\cdot\|$ represents the Euclidean norm of a vector.

Assumption 2. The subspace $\Gamma \subset R^p$ of the parameter vector $\gamma$ is compact, i.e. $\sup_{\gamma\in\Gamma}\|\gamma\| \leq \tau$, for some $0 \leq \tau < \infty$.

The two assumptions are merely technical and ensure that the null hypothesis holds only if $\alpha_j^* = -\infty$ for all $j = 1, 2, \cdots, g$. Evaluating this hypothesis then requires testing at infinity. For convenience, we consider the transformation $\varpi_j = \exp\{\alpha_j\}$, $j = 1, \cdots, g$, and define the cone $\Omega = \{\varpi = (\varpi_1, \cdots, \varpi_g): \varpi_j \geq 0, j = 1, 2, \cdots, g\}$ and $\Omega^+$ a subset of $\Omega$ for which $\varpi_j > 0$ for some $j$. Let $\varpi^*$ represent the true but unknown value of $\varpi$. Assuming a population with only one stratum, the homogeneity hypothesis under the simple random intercept model and the two-component mixture model is then given by $\varpi^*\exp\{x_i^t\gamma^*\} = 0$ and $\{1 + \varpi^{*-1}\exp\{-x_i^t\gamma^*\}\}^{-1} = 0$, for all $i$, respectively. It is clear that the parameter $\gamma^*$ is unidentified whenever $\varpi^* = 0$, which renders the testing problem nonstandard. More generally, the hypothesis in (4.1) becomes,

$H_0: \varpi^* = 0$, for any $\gamma^* \in \Gamma$, vs. $H_1: \varpi^* \in \Omega^+$, for some $\gamma^* \in \Gamma$.

We define by $w$ the vector of distinct elements of $\{v, x, z\}$, where $v$ is possibly a subset of $z$. We assume, without any loss of generality, that the observed data $\{y_i, w_i\}_{i=1}^n$ are i.i.d. copies of $\{y, w\}$, distributed according to a parametric law known up to a parameter $\psi$, potentially of infinite dimension.
Given data $\{y_i, w_i\}_{i=1}^n$, we denote by $Q_n(\xi)$ an $s \times 1$ estimating function vector for $\xi = (\varpi^t, \gamma^t, \theta^t)^t$, a subvector of $\psi$, where the parameter vector $\theta \in \Theta$ represents other model characteristics assumed identifiable under the null. To construct the test, we assume the following smoothness condition.

Condition A1. We assume the existence of a nonsingular matrix $D$ such that for any $\varrho > 0$, $\sup_{\|\beta\|\leq\varrho}\|n^{-1/2}\{Q_n(\xi + n^{-1/2}\beta) - Q_n(\xi)\} + D\beta\| = o_p(1)$, where $o_p(1)$ represents convergence to 0 in probability as $n \to \infty$ and $D = \lim_{n\to\infty} -n^{-1}\partial Q_n(\xi)/\partial\xi$ w.p. 1. Additionally, we assume that $n^{-1/2}Q_n(\xi)$ converges in distribution to a centered normal distribution.

A trivial example of such functions is the score function derived from a parametric likelihood, in which case $D = \lim_{n\to\infty} n^{-1}\mathrm{var}\{Q_n(\xi)\}$. Score functions derived from pseudo-likelihood (Besag, 1975) and composite likelihood (Lindsay, 1988) are other good examples of these smooth functions.
Let $v_n(\theta)$ be the component of $Q_n(\xi)$ corresponding to $\theta$, evaluated at $\varpi^* = 0$. This component only depends on $\theta$ whenever $\varpi^* = 0$. Let $\theta^*$ be the true but unknown value of $\theta$ and $\hat\theta$ its estimator under the null $\varpi^* = 0$. Based on the smoothness condition above, there exists a nonsingular matrix $A^*$ such that $n^{1/2}(\hat\theta - \theta^*) = n^{-1/2}A^{*-1}v_n(\theta^*) + o_p(1)$.
Assume that $\gamma$ is fixed. We denote by $u_n(\gamma, \theta)$ the component of $Q_n(\xi)$ corresponding to $\varpi$ evaluated at $\varpi^* = 0$, and by $u_n(\gamma, \hat\theta)$ its estimated version when $\theta$ is replaced by $\hat\theta$. We also denote by $b_i(\gamma, \theta)$ and $d_i(\gamma, \theta)$ the contributions of subject $i$ to the function $u_n(\gamma, \theta)$ and to the first-order derivative with respect to $\theta$ of the component of $Q_n(\xi)$ corresponding to $\varpi$, both evaluated at $\varpi^* = 0$. Define $c_i(\gamma, \theta^*) = b_i(\gamma, \theta^*) - h(\gamma, \theta^*)A^{*-1}a_i(\theta^*)$, where $a_i(\theta^*)$ is the contribution of subject $i$ to the score function $v_n(\theta^*)$ and $h(\gamma, \theta) = E\{d_1(\gamma, \theta)\}$.
Under condition A1, we have the following pointwise result: $n^{-1/2}u_n(\gamma, \hat\theta) = n^{-1/2}\sum_{i=1}^n c_i(\gamma, \theta^*) + \eta_n(\gamma)$, where $\eta_n(\gamma) \to_p 0$ as $n \to \infty$, for each $\gamma$. Here $\to_p$ represents convergence in probability. This establishes the pointwise convergence of $n^{-1/2}u_n(\gamma, \hat\theta)$ to a centered normal distribution with asymptotic covariance matrix $E\{c_1(\gamma, \theta^*)^{\otimes 2}\}$, for each $\gamma$. Here $c_1^{\otimes 2} = c_1 c_1^t$.
Following Silvapulle and Silvapulle (1995), because of the one-sided constraints $\varpi \in \Omega$, we consider the likelihood ratio test statistic for $H_0$ against $H_1$ based on the asymptotic normality of the single observation $n^{-1/2}u_n(\gamma, \hat\theta)$ to derive the modified score statistic,

$s_n(\gamma) = n^{-1}u_n(\gamma, \hat\theta)^t\hat\iota(\gamma)u_n(\gamma, \hat\theta) - \inf_{\dot\varpi\in\Omega}\left\{n^{-1/2}u_n(\gamma, \hat\theta) - \dot\varpi\right\}^t\hat\iota(\gamma)\left\{n^{-1/2}u_n(\gamma, \hat\theta) - \dot\varpi\right\}$,   (4.2)

with $\hat\iota(\gamma)$ being an estimate of the inverse of $E\{c_1(\gamma, \theta^*)^{\otimes 2}\}$. A sufficient condition that guarantees the existence of this inverse is given in the following condition.

Condition A2. Assume that $E\{c_1(\gamma, \theta^*)^{\otimes 2}\}$ is uniformly positive definite over $\gamma \in \Gamma$ and $\theta \in \Theta$, i.e. $\inf_{\gamma\in\Gamma,\theta\in\Theta}\mathrm{eigmin}(E\{c_1(\gamma, \theta)^{\otimes 2}\}) > 0$, where $\mathrm{eigmin}(\cdot)$ denotes the smallest eigenvalue of a matrix.

When $g = 1$, the score-based function $s_n(\gamma)$ reduces to the simple form $s_n(\gamma) = n^{-1}u_n(\gamma, \hat\theta)^t\hat\iota(\gamma)u_n(\gamma, \hat\theta)1(u_n(\gamma, \hat\theta) \geq 0)$, where $1(\cdot)$ represents the indicator function of an event. When $g > 1$, however, the computation of $s_n(\gamma)$ for each fixed $\gamma$ involves the minimization of the quadratic and convex function $\{n^{-1/2}u_n(\gamma, \hat\theta) - \dot\varpi\}^t\hat\iota(\gamma)\{n^{-1/2}u_n(\gamma, \hat\theta) - \dot\varpi\}$ in $\dot\varpi \in \Omega$. When all components of $u_n(\gamma, \hat\theta)$ are positive, this convex function is minimized at $\dot\varpi = n^{-1/2}u_n(\gamma, \hat\theta)$ and takes the value 0. When some components of $u_n(\gamma, \hat\theta)$ are negative, however, this function can easily be minimized numerically in practice using available optimization subroutines such as optimize in the R statistical computing environment.
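For a fixed $\gamma$, the statistic in (4.2) only requires a quadratic minimization over the nonnegative cone; a sketch using a generic bound-constrained optimizer (our own implementation, not the author's code):

```python
import numpy as np
from scipy.optimize import minimize

def modified_score_stat(u, iota, n):
    """s_n(gamma) of equation (4.2): u is u_n(gamma, theta_hat) (length g),
    iota an estimate of the inverse asymptotic covariance, n the sample size."""
    z = u / np.sqrt(n)
    quad = lambda w: (z - w) @ iota @ (z - w)
    # infimum of the quadratic form over the cone Omega = {w : w_j >= 0}
    res = minimize(quad, np.maximum(z, 0.0), bounds=[(0.0, None)] * len(z))
    return float(u @ iota @ u / n - res.fun)
```

For $g = 1$ this reproduces the closed form $n^{-1}u^2\hat\iota\,1(u \geq 0)$: when $u < 0$ the infimum is attained at the origin and the statistic is 0.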
If $\gamma$ is known, the statistic in (4.2) is the generalized one-sided score test statistic proposed by Silvapulle and Silvapulle (1995). In general, however, $s_n(\gamma)$ is not really a score statistic because $\gamma$ is unknown. To remove the dependence on $\gamma$, we introduce the maximum score statistic defined as

$T_n = \sup_{\gamma\in\Gamma} s_n(\gamma)$.   (4.3)

The null hypothesis is then rejected for large values of $T_n$. The large sample properties and the resampling procedure for this test statistic are similar to those of full likelihood-based models, which have been fully discussed in the previous chapter.

4.3 Bias-Adjustment of the score test

It has been reported in the literature that the supremum test can be somewhat conservative in some practical situations, such as in small to moderate sample sizes. In this section, we extend the discussion from Section 3.3.1 to address this conservativeness. In particular, we focus on adjusting the score test for zero-inflated Poisson models.
Following the notation of Section 3.3.1, we assume that $G$ is a homogeneous Poisson distribution for which the mean $\lambda_i$ for subject $i$ is related to observed covariates $z_i$ as $\lambda_i = \exp\{z_i^t\theta\}$, $i = 1, \cdots, n$. Without any loss of generality, we assume that $g = 1$. The contribution of subject $i$ to the score function $u_n(\gamma, \hat\theta)$ is given by $b_i(\gamma, \hat\theta) = \{\tilde y_i\exp\{\hat\lambda_i\} - 1\}\exp\{x_i^t\gamma\}$, where $\tilde y_i = 1\{y_i = 0\}$ and $\hat\lambda_i = \exp\{z_i^t\hat\theta\}$. The test is then constructed by computing the quantity $\hat\iota(\gamma)$ from the inverse of the estimate of $E\{c_1(\gamma, \theta^*)^{\otimes 2}\} = \tilde B(\gamma, \theta^*) - h(\gamma, \theta^*)A^{*-1}h(\gamma, \theta^*)^t$, where the matrix $\tilde B(\gamma, \theta^*)$ represents the contribution of one subject to the expectation $E\{-\partial^2\ell_n(\gamma, \theta, \varpi)/\partial\varpi\partial\varpi^t\}$ evaluated at $\varpi = 0$ and $\theta = \theta^*$. This inverse is then obtained by replacing $\theta^*$ by its estimate $\hat\theta$ in $\tilde B(\gamma, \theta^*)$, $h(\gamma, \theta^*)$ and $A^{*-1}$.
Although $n^{-1/2}u_n(\gamma, \hat\theta)$ is asymptotically unbiased for a fixed $\gamma$, a nonzero bias is induced in small to moderate sample sizes by the substitution of $\theta^*$ by $\hat\theta$ (see Ng and Cook, 1999).
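Putting the pieces together for the zero-inflated Poisson case with $g = 1$, the supremum statistic over a finite grid of $\Gamma$ can be sketched as follows (the grid search and the plug-in variance estimate based on the empirical second moment of the $c_i$ are our own simplifications, not the author's exact implementation):

```python
import numpy as np

def zip_sup_score_stat(y, Z, X, theta_hat, gamma_grid):
    """T_n = max over a finite grid of s_n(gamma) for the zero-inflated
    Poisson test with one stratum (g = 1), using the closed-form indicator."""
    n = len(y)
    lam = np.exp(Z @ theta_hat)
    ytilde = (y == 0).astype(float)            # indicator 1{y_i = 0}
    a = (y - lam)[:, None] * Z                 # Poisson score contributions a_i
    A = (lam[:, None] * Z).T @ Z / n           # information matrix for theta
    Ainv = np.linalg.inv(A)
    best = 0.0
    for gamma in gamma_grid:
        w = np.exp(X @ gamma)                  # exp{x_i' gamma}
        b = (ytilde * np.exp(lam) - 1.0) * w   # b_i(gamma, theta_hat)
        d = (ytilde * np.exp(lam) * lam * w)[:, None] * Z
        h = d.mean(axis=0)                     # h(gamma, theta_hat) = E{d_1}
        c = b - a @ (Ainv @ h)                 # c_i = b_i - h A^{-1} a_i
        iota = 1.0 / np.mean(c ** 2)           # inverse variance estimate
        u = b.sum()
        s = u * iota * u / n if u >= 0 else 0.0
        best = max(best, s)
    return best
```

The null distribution of the resulting supremum would still need the resampling scheme described in the text; this sketch only covers the point evaluation of $T_n$.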
We suggest an adjustment of this statistic via a first-order Taylor expansion of $\exp\{\hat\lambda_i\} = \exp\{\exp\{z_i^t\hat\theta\}\}$ around $\theta^*$. Specifically, we have $\exp\{\hat\lambda_i\} \approx \exp\{\lambda_i^*\} + \lambda_i^*\exp\{\lambda_i^*\}z_i^t(\hat\theta - \theta^*)$, with $\lambda_i^* = \exp\{z_i^t\theta^*\}$. This then gives $\tilde y_i\exp\{\hat\lambda_i\} \approx \tilde y_i\exp\{\lambda_i^*\} + \tilde y_i\lambda_i^*\exp\{\lambda_i^*\}z_i^t n^{-1}A^{*-1}v_n(\theta^*)$. Basic calculations give the following: $v_n(\theta^*) = \sum_{i=1}^n (y_i - \lambda_i^*)z_i$, $E\{\tilde y_i\} = \exp\{-\lambda_i^*\}$ and $E\{\tilde y_i y_i\} = 0$. Hence the expectation of the score function $u_n(\gamma, \hat\theta)$ is approximately $E\{u_n(\gamma, \hat\theta)\} \approx -n^{-1}\sum_{i=1}^n \lambda_i^{*2}\exp\{x_i^t\gamma\}z_i^t A^{*-1}z_i$. We define the approximate expected bias as $\Delta_n(\gamma, \theta^*) = n^{-1}\sum_{i=1}^n \lambda_i^{*2}\exp\{x_i^t\gamma\}z_i^t A^{*-1}z_i$, so that $E\{u_n(\gamma, \hat\theta)\} \approx -\Delta_n(\gamma, \theta^*)$. Given that all individual terms in the expected bias have order $O(1)$, the normalized expected bias $n^{-1/2}\Delta_n(\gamma, \theta^*)$ has order $O(n^{-1/2})$ and vanishes as $n \to \infty$. In small to moderate sample sizes, however, this normalized expected bias may not be ignored (Ng and Cook, 1999). As suggested by these authors, we construct an adjusted score statistic $\bar T_n$ by adding the approximate bias evaluated at $\hat\theta$ to the score function $u_n(\gamma, \hat\theta)$. Specifically, the bias-adjusted test statistic $\bar T_n$ is derived by replacing the score function $u_n(\gamma, \hat\theta)$ in (3.3) by $u_n(\gamma, \hat\theta) + \Delta_n(\gamma, \hat\theta)$. Given that $\Delta_n(\gamma, \hat\theta) \geq 0$, the bias-adjusted test statistic is at least as large as its unadjusted counterpart with probability 1. The following simulation study provides empirical evidence that the type I error rate of the unadjusted test is smaller than the nominal level.
In this simulation, we conduct a numerical study to evaluate the finite sample performance of the score test statistic $T_n$. This performance is then compared to that of $T_n^0$, a score statistic that assumes a constant mixing weight. The latter statistic is derived from (3.4) by restricting the support set $\Gamma$ to the origin, that is, $T_n^0 = s_n(0)$.
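The bias term $\Delta_n(\gamma, \hat\theta)$ has a closed form; a minimal sketch, assuming the sample Poisson information is used as the plug-in for $A^*$ (that plug-in is our own choice):

```python
import numpy as np

def bias_adjustment(Z, X, theta_hat, gamma):
    """Delta_n(gamma, theta_hat) = n^{-1} sum_i lambda_i^2 exp(x_i' gamma)
    z_i' A^{-1} z_i, with A estimated by the sample Poisson information."""
    n = Z.shape[0]
    lam = np.exp(Z @ theta_hat)
    A = (lam[:, None] * Z).T @ Z / n              # sample information matrix
    Ainv = np.linalg.inv(A)
    quad = np.einsum("ij,jk,ik->i", Z, Ainv, Z)   # z_i' A^{-1} z_i for each i
    return float(np.mean(lam ** 2 * np.exp(X @ gamma) * quad))

# toy check: two observations, theta_hat = 0 so every lambda_i = 1
Z_toy = np.array([[1.0, 0.0], [1.0, 1.0]])
delta = bias_adjustment(Z_toy, np.zeros((2, 1)), np.array([0.0, 0.0]),
                        np.array([0.0]))
```

Since $A^{-1}$ is positive definite, each quadratic term is positive, which is why the adjusted statistic can only exceed its unadjusted counterpart.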
We also compare the small sample properties of the unadjusted test statistics $T_n$ and $T_n^0$ to those of their adjusted counterparts, $\bar T_n$ and $\bar T_n^0$.
We consider a homogeneous Poisson process with covariate-dependent mean $\lambda_i^* = \exp\{\theta_0^* + 0.4x_{1i} + 0.5x_{2i}\}$, where covariates $x_{1i}$ and $x_{2i}$ are generated independently from a binary random variable taking values 0 and 1, each with probability 0.5, and a standard normal variable truncated on the interval $[-1, 1]$, respectively. The parameter $\theta_0^*$ represents the intercept and takes values in the set $\{-0.4, -0.2, 0\}$. Several values of $\theta_0^*$ are chosen to investigate the effect of the distance between the mean of the homogeneous Poisson process and 0 on the empirical size of the tests. Throughout our simulations, we perform the tests $T_n$ and $T_n^0$ assuming the mixing weight is 0. The maximum likelihood estimate $\hat\theta$ of $\theta^*$ is obtained from a homogeneous Poisson model for which the log density $\ln\{g(y_i; \theta)\}$ is proportional to $-\lambda_i + y_i\ln\{\lambda_i\}$, with $\lambda_i = \exp\{\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i}\}$. In our calculations, we set the number of resamples $K$ that approximates the null distribution of the test statistics to 10,000. Finally, all simulations were replicated 1,000 times for sample sizes 50, 100 and 200.
To investigate the empirical type I error rate of the proposed adjusted tests, we generate data from a homogeneous Poisson process with mean $\lambda_i^* = \exp\{\theta_0^* + 0.4x_{1i} + 0.5x_{2i}\}$. The empirical test sizes at the 5% nominal level are reported in Table 4.1. All the unadjusted score tests have conservative type I error rates for the sample sizes considered. In other words, the unadjusted score tests tend to reject the null hypothesis less often than anticipated. We note, however, that this conservativeness diminishes as the sample size increases.
In contrast, the adjusted tests have well controlled type I error rates, even for sample sizes as small as 50.

Table 4.1: Empirical size of the bias-adjusted score test statistics assuming a homogeneous Poisson model with mean $\lambda_i^* = \exp\{\theta_0^* + 0.4x_{1i} + 0.5x_{2i}\}$, at the 5% significance level (data generated with $\omega_i = 0$)

                        n=50                  n=100                 n=200
  $\theta_0^*$    -0.4   -0.2   0.0     -0.4   -0.2   0.0     -0.4   -0.2   0.0
  $T_n$          0.023  0.021  0.022   0.031  0.030  0.029   0.033  0.035  0.038
  $T_n^0$        0.022  0.027  0.023   0.038  0.030  0.024   0.044  0.036  0.031
  $T_n^*$        0.052  0.055  0.052   0.058  0.053  0.049   0.048  0.053  0.047
  $T_n^{0*}$     0.071  0.078  0.050   0.074  0.070  0.053   0.076  0.064  0.052

These results are consistent for the covariate-dependent and covariate-independent tests and for all values of $\theta_0^*$ considered. Our simulations show that the conservativeness of the unadjusted tests can be reduced by appropriately adjusting the score function as in Ng & Cook (1999).

4.4 Future Study

The thesis did not discuss the asymptotic optimality properties of the proposed score-type tests. By asymptotic optimality, we mean tests that are uniformly most powerful among tests of asymptotic significance level $0 < \alpha < 1$ for testing the null against contiguous alternatives. Other authors have established such asymptotic properties in other contexts. For example, Andrews and Ploberger (1994) studied asymptotic optimality results for the likelihood ratio test, and more recently Song et al. (2009) established more general results using the score, Wald and likelihood ratio statistics in the context of semiparametric models. It might be possible to extend these results to our proposed test statistics.

Other issues also merit further research. Although this thesis focuses primarily on likelihood functions and a $\sqrt{n}$ convergence rate, results for non-smooth functions converging at rates below $\sqrt{n}$ can be derived in the same spirit as Lee et al. (2011), with the resampling procedure adapted accordingly.
These issues and other generalizations of the methodology are the subject of further research.

APPENDICES

Appendix A

Calculation of the information matrix

For any parametric non-degenerate distribution, the information matrix is defined as
$$I(\theta,\alpha) = \begin{pmatrix} I_{\theta\theta}(\theta,\alpha) & I_{\theta\alpha}(\theta,\alpha) \\ I_{\alpha\theta}(\theta,\alpha) & I_{\alpha\alpha}(\theta,\alpha) \end{pmatrix},$$
where $I_{\theta\theta}(\theta,\alpha) = -E[\partial^2\ell_n(\theta,\alpha)/\partial\theta\partial\theta^t]$, $I_{\theta\alpha}(\theta,\alpha) = -E[\partial^2\ell_n(\theta,\alpha)/\partial\theta\partial\alpha^t] = I_{\alpha\theta}(\theta,\alpha)^t$, and $I_{\alpha\alpha}(\theta,\alpha) = -E[\partial^2\ell_n(\theta,\alpha)/\partial\alpha\partial\alpha^t]$. The calculation of these expectations is often tedious for a general parametric non-degenerate distribution, so we restrict our calculations to the observed information matrix. For this, we shall use the following notation: $\dot\pi_{i,\alpha} = \partial\pi_i/\partial\alpha$, $\dot\pi_{i,\theta} = \partial\pi_i/\partial\theta$, $\ddot\pi_{i,\alpha\alpha} = \partial^2\pi_i/\partial\alpha\partial\alpha^t$, $\ddot\pi_{i,\theta\alpha} = \partial^2\pi_i/\partial\theta\partial\alpha^t$, $\ddot\pi_{i,\theta\theta} = \partial^2\pi_i/\partial\theta\partial\theta^t$, $\tilde g_{i,\theta} = g_i(y_i;\theta)/\{1-g_i(y^*;\theta)\}$, $\dot{\tilde g}_{i,\theta} = \partial\tilde g_{i,\theta}/\partial\theta$, and $\ddot{\tilde g}_{i,\theta\theta} = \partial^2\tilde g_{i,\theta}/\partial\theta\partial\theta^t$. Using this notation, the second-order derivatives of the log-likelihood function are given by
$$\frac{\partial^2\ell_n(\theta,\alpha)}{\partial\theta\partial\theta^t} = \sum_{i=1}^n\left[\delta_i^*\frac{\ddot\pi_{i,\theta\theta}\pi_i - \dot\pi_{i,\theta}\dot\pi_{i,\theta}^t}{\pi_i^2} + (1-\delta_i^*)\frac{-\ddot\pi_{i,\theta\theta}(1-\pi_i) - \dot\pi_{i,\theta}\dot\pi_{i,\theta}^t}{(1-\pi_i)^2} + (1-\delta_i^*)\frac{\ddot{\tilde g}_{i,\theta\theta}\tilde g_{i,\theta} - \dot{\tilde g}_{i,\theta}\dot{\tilde g}_{i,\theta}^t}{\tilde g_{i,\theta}^2}\right],$$
$$\frac{\partial^2\ell_n(\theta,\alpha)}{\partial\theta\partial\alpha^t} = \sum_{i=1}^n\left[\delta_i^*\frac{\ddot\pi_{i,\theta\alpha}\pi_i - \dot\pi_{i,\theta}\dot\pi_{i,\alpha}^t}{\pi_i^2} + (1-\delta_i^*)\frac{-\ddot\pi_{i,\theta\alpha}(1-\pi_i) - \dot\pi_{i,\theta}\dot\pi_{i,\alpha}^t}{(1-\pi_i)^2}\right],$$
$$\frac{\partial^2\ell_n(\theta,\alpha)}{\partial\alpha\partial\alpha^t} = \sum_{i=1}^n\left[\delta_i^*\frac{\ddot\pi_{i,\alpha\alpha}\pi_i - \dot\pi_{i,\alpha}\dot\pi_{i,\alpha}^t}{\pi_i^2} + (1-\delta_i^*)\frac{-\ddot\pi_{i,\alpha\alpha}(1-\pi_i) - \dot\pi_{i,\alpha}\dot\pi_{i,\alpha}^t}{(1-\pi_i)^2}\right].$$

A.1 Non-degenerate Poisson model with $y^* = 0$

If the non-degenerate distribution is a Poisson process, we have $g_i(0;\theta) = \exp\{-\mu_i\}$ where $\mu_i = \exp\{x_i^t\beta\} = \exp\{x_i^t\theta\}$, with $\theta = \beta$. A natural parameterization for $\pi_i$ is given by $\pi_i = \exp\{-\exp\{z_i^t\gamma\}\}$.
Assuming $x_i = z_i$, estimates of the entries of the information matrix are given by
$$\hat I_{\theta\theta}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{\hat\mu_i\left(2\exp\{-\hat\mu_i\}-1\right)\right\}_{i=1}^n x,$$
$$\hat I_{\theta\alpha}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{-\frac{\hat\mu_i^2\exp\{-\hat\mu_i\}}{1-\exp\{-\hat\mu_i\}}\right\}_{i=1}^n x,$$
$$\hat I_{\alpha\alpha}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{\frac{\hat\mu_i^2\exp\{-\hat\mu_i\}}{1-\exp\{-\hat\mu_i\}}\right\}_{i=1}^n x,$$
where $x = (x_1^t, x_2^t, \cdots, x_n^t)^t$ and $\hat\mu_i = \exp\{x_i^t\hat\theta\}$, with $\hat\theta$ being the maximum likelihood estimator of $\theta^*$ under $H_0$. In these expressions, $\mathrm{diag}\{a_i\}_{i=1}^n$ represents a diagonal matrix of order $n$ with $a_i$, $i=1,\cdots,n$, being the entries on the diagonal.

A.2 Non-degenerate binomial model with $y^* = 0$

If the non-degenerate distribution is a binomial distribution with number of trials $m_i$ and success probability $\mu_i$, then we have $\mu_i = 1/(1+\exp\{-x_i^t\beta\}) = 1/(1+\exp\{-x_i^t\theta\})$, with $\theta = \beta$. A natural parameterization for $\pi_i$ is given by $\pi_i = \{1+\exp\{z_i^t\gamma\}\}^{-m_i}$. Assuming $x_i = z_i$, estimates of the entries of the information matrix are given by
$$\hat I_{\theta\theta}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{\frac{m_i(1-\hat\mu_i)^{m_i+2}\left(\hat\mu_i^2(1-\hat\mu_i)^{-2}-1\right)}{1-(1-\hat\mu_i)^{m_i}} + m_i\hat\mu_i(1-\hat\mu_i)\right\}_{i=1}^n x,$$
$$\hat I_{\theta\alpha}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{-\frac{m_i^2\hat\mu_i^2(1-\hat\mu_i)^{m_i}}{1-(1-\hat\mu_i)^{m_i}}\right\}_{i=1}^n x,$$
$$\hat I_{\alpha\alpha}(\hat\theta,0) = x^t\,\mathrm{diag}\left\{\frac{m_i^2\hat\mu_i^2(1-\hat\mu_i)^{m_i}}{1-(1-\hat\mu_i)^{m_i}}\right\}_{i=1}^n x,$$
where $x = (x_1^t, x_2^t, \cdots, x_n^t)^t$ and $\hat\mu_i = 1/(1+\exp\{-x_i^t\hat\theta\})$, with $\hat\theta$ being the maximum likelihood estimator of $\theta^*$ under $H_0$.
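As a sketch of how the Poisson-case estimates of Appendix A.1 might be assembled numerically, the following Python code builds the three blocks as $x^t\,\mathrm{diag}\{w_i\}\,x$ from the diagonal weights displayed above; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def observed_information_poisson(x, theta_hat):
    """Observed information blocks under H0 for the non-degenerate
    Poisson model with y* = 0, assuming x_i = z_i.  Each block has the
    form x' diag{w_i} x with the diagonal weights given in the text."""
    mu = np.exp(x @ theta_hat)        # mu_i = exp{x_i' theta_hat}
    e = np.exp(-mu)
    w_tt = mu * (2.0 * e - 1.0)       # weight for the (theta, theta) block
    w_ta = -mu**2 * e / (1.0 - e)     # weight for the (theta, alpha) block
    w_aa = mu**2 * e / (1.0 - e)      # weight for the (alpha, alpha) block
    I_tt = x.T @ (w_tt[:, None] * x)
    I_ta = x.T @ (w_ta[:, None] * x)
    I_aa = x.T @ (w_aa[:, None] * x)
    return I_tt, I_ta, I_aa
```

Note that the $(\theta,\alpha)$ and $(\alpha,\alpha)$ weights differ only in sign, so $\hat I_{\theta\alpha} = -\hat I_{\alpha\alpha}$ in this construction.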
Appendix B

Example SAS code for the Wald test

Estimating the parameters under the alternative via the NLMIXED procedure:

ods output ParameterEstimates=outputpara CovMatParmEst=outputcov;
proc nlmixed data=dental cov;
   parameters beta0=0 beta1=0 gamma0=0 gamma1=0 k1=1;
   eta2=beta0 + beta1*age1;
   eta1=gamma0 + gamma1*age1;
   /* pi function */
   pi=exp(-(1/k1)*log(1+k1*exp(eta1)));
   /* Negative binomial mean function */
   mu=exp(eta2);
   /* wi = mixing weight */
   wi=(pi-exp((-1/k1)*log(1+k1*mu)))/(1-exp((-1/k1)*log(1+k1*mu)));
   /* The zero-inflated negative binomial log-likelihood function */
   if y=0 then ll=log(wi+(1-wi)*exp(-(1/k1)*log(1+mu*k1)));
   else ll=log(1-wi)+y*log(mu)+y*log(k1)-lgamma(y+1)
           -(y+1/k1)*log(1+k1*mu)+lgamma(y+1/k1)-lgamma(1/k1);
   model y ~ general(ll);
run;
ods output close;

Constructing the Wald test:

proc iml;
use outputcov;
read all var {beta0 beta1 gamma0 gamma1 k1} into covmat;
use outputpara;
read all var {estimate} into theta;
contrast={1 0 -1 0 0, 0 1 0 -1 0};
wt=contrast*theta;
incovw=inv(contrast*covmat*t(contrast));
/* The Wald test statistic */
Waldtest=t(wt)*incovw*wt;
/* Calculate the p-value */
pvalue=1-probchi(Waldtest,2);
print Waldtest pvalue;
quit;

Appendix C

Proofs of Theorems

C.1 Large sample properties of score functions

Proof of Theorem 1

i) Uniform consistency of $n^{-1}u_n(\gamma,\theta^*)$. Condition B3 implies that $\mathcal{B}$ is Donsker and hence Glivenko-Cantelli (van der Vaart & Wellner, 2000a). Therefore,
$$\sup_{\gamma\in\Gamma,\theta\in\Theta}\left\|n^{-1}\sum_{i=1}^n b_i(\gamma,\theta) - E\{b_1(\gamma,\theta)\}\right\| \to_p 0.$$
By fixing $\theta$ at $\theta^*$ and from Condition B2, which requires $\theta^*$ to be an interior point of $\Theta$, we immediately have $\sup_{\gamma\in\Gamma}\|n^{-1}\sum_{i=1}^n b_i(\gamma,\theta^*) - E\{b_1(\gamma,\theta^*)\}\| \to_p 0$. Moreover, by noting that $E\{b_1(\gamma,\theta^*)\} = 0$ for all $\gamma\in\Gamma$ and that $n^{-1}u_n(\gamma,\theta^*) = n^{-1}\sum_{i=1}^n b_i(\gamma,\theta^*)$, we have the desired uniform consistency result.

ii) Asymptotic distribution of $n^{-1/2}u_n(\gamma,\theta^*)$. Because of Condition B3, the set $\mathcal{B}$ is Donsker.
Hence, under the null hypothesis and Condition B4, the random processes $n^{-1/2}u_n(\gamma,\theta^*)$ converge weakly to a Gaussian process with mean 0 and covariance kernel $\sigma_b^*(\gamma_1,\gamma_2) = E\{b_1(\gamma_1,\theta^*)b_1(\gamma_2,\theta^*)^t\}$, $\gamma_1,\gamma_2\in\Gamma$.

Proof of Theorem 2

i) Uniform consistency of $n^{-1}u_n(\gamma,\hat\theta)$. Using a Taylor expansion, we have
$$u_n(\gamma,\hat\theta) = u_n(\gamma,\theta^*) - h_n(\gamma,\tilde\theta)(\hat\theta - \theta^*), \qquad (C.1)$$
where $h_n(\gamma,\theta) = \sum_{i=1}^n d_i(\gamma,\theta)$ and $\tilde\theta$ belongs to the line segment between $\theta^*$ and $\hat\theta$; that is, $\tilde\theta = \alpha^*\theta^* + (1-\alpha^*)\hat\theta$ for some $\alpha^*\in[0,1]$. This expansion is possible because of Condition B2, which ensures that $\theta^*$, the true value of $\theta$ when $P = G$, is an interior point of $\Theta$. By the triangle inequality, we then have
$$\|n^{-1}u_n(\gamma,\hat\theta)\| \le \|n^{-1}u_n(\gamma,\theta^*)\| + \|n^{-1}h_n(\gamma,\tilde\theta)(\hat\theta-\theta^*)\|.$$
From Theorem 1, $\sup_{\gamma\in\Gamma}\|n^{-1}u_n(\gamma,\theta^*)\| \to_p 0$ as $n\to\infty$. We focus on $\|n^{-1}h_n(\gamma,\tilde\theta)(\hat\theta-\theta^*)\|$. Note that
$$\|n^{-1}h_n(\gamma,\tilde\theta)(\hat\theta-\theta^*)\| \le \|\{n^{-1}h_n(\gamma,\tilde\theta) - \tilde h(\gamma,\tilde\theta)\}(\hat\theta-\theta^*)\| + \|\tilde h(\gamma,\tilde\theta)(\hat\theta-\theta^*)\|.$$
We define the norm of a matrix $M$ as $\|M\|_2 = \sup_{z\neq 0}\|Mz\|/\|z\|$. Hence, for any $z\neq 0$, $\|M\|_2 \ge \|Mz\|/\|z\|$ and subsequently $\|Mz\| \le \|M\|_2\|z\|$. Applying this inequality to the matrices $n^{-1}h_n(\gamma,\tilde\theta) - \tilde h(\gamma,\tilde\theta)$ and $\tilde h(\gamma,\tilde\theta)$, and assuming that $\hat\theta - \theta^* \neq 0$ w.p. 1, we get
$$\|n^{-1}h_n(\gamma,\tilde\theta)(\hat\theta-\theta^*)\| \le \left\{\|n^{-1}h_n(\gamma,\tilde\theta) - \tilde h(\gamma,\tilde\theta)\|_2 + \|\tilde h(\gamma,\tilde\theta)\|_2\right\}\|\hat\theta-\theta^*\|. \qquad (C.2)$$
From Condition B3, $\mathcal{D}$ is Donsker and hence Glivenko-Cantelli (van der Vaart & Wellner, 2000a). Therefore, $\sup_{\gamma\in\Gamma,\theta\in\Theta}\|n^{-1}h_n(\gamma,\theta) - \tilde h(\gamma,\theta)\|_2 \to_p 0$ as $n\to\infty$ and, in particular, $\sup_{\gamma\in\Gamma}\|n^{-1}h_n(\gamma,\tilde\theta) - \tilde h(\gamma,\tilde\theta)\|_2 \to_p 0$. From Condition B5, it is also clear that $\sup_{\gamma\in\Gamma}\|\tilde h(\gamma,\tilde\theta)\|_2 < \infty$. Under Condition B1, $\hat\theta - \theta^* \to_p 0$, which, coupled with inequality (C.2), yields the uniform consistency of $n^{-1}u_n(\gamma,\hat\theta)$.
ii) Asymptotic distribution of $n^{-1/2}u_n(\gamma,\hat\theta)$. Based on the uniform consistency of $n^{-1}u_n(\gamma,\hat\theta)$ and applying a Taylor expansion to $u_n(\gamma,\hat\theta)$,
$$u_n(\gamma,\hat\theta) \approx u_n(\gamma,\theta^*) - h_n(\gamma,\theta^*)(\hat\theta - \theta^*),$$
where $h_n(\gamma,\theta^*) = \sum_{i=1}^n d_i(\gamma,\theta^*)$. Under Condition B1, we have $n^{1/2}(\hat\theta - \theta^*) = n^{-1/2}A^{*-1}v_n(\theta^*) + o_p(1)$. This then leads to
$$n^{-1/2}u_n(\gamma,\hat\theta) \approx n^{-1/2}u_n(\gamma,\theta^*) - n^{-1}h_n(\gamma,\theta^*)\,n^{-1/2}A^{*-1}v_n(\theta^*).$$
From Condition B3, $\mathcal{D}$ is Donsker and hence Glivenko-Cantelli (van der Vaart & Wellner, 2000a). Therefore, $\sup_{\gamma\in\Gamma,\theta\in\Theta}\|n^{-1}h_n(\gamma,\theta) - \tilde h(\gamma,\theta)\|_2 \to_p 0$ as $n\to\infty$ and, in particular, $\sup_{\gamma\in\Gamma}\|n^{-1}h_n(\gamma,\theta^*) - \tilde h(\gamma,\theta^*)\|_2 \to_p 0$. Hence,
$$n^{-1/2}u_n(\gamma,\hat\theta) \approx n^{-1/2}u_n(\gamma,\theta^*) - \tilde h(\gamma,\theta^*)\,n^{-1/2}A^{*-1}v_n(\theta^*).$$
From the definitions of $u_n(\gamma,\theta^*)$ and $v_n(\theta^*)$, the previous approximation in distribution can be rewritten as
$$n^{-1/2}u_n(\gamma,\hat\theta) \approx n^{-1/2}\sum_{i=1}^n\left\{b_i(\gamma,\theta^*) - \tilde h(\gamma,\theta^*)A^{*-1}a_i(\theta^*)\right\}.$$
Under Condition B3, the function class $\mathcal{B}$ is Donsker and, because $\tilde h(\gamma,\theta^*)$ is uniformly bounded for $\gamma\in\Gamma$ by Condition B5, the function class $\{b_i(\gamma,\theta^*) - \tilde h(\gamma,\theta^*)A^{*-1}a_i(\theta^*),\ \gamma\in\Gamma,\ i=1,2,\cdots,n\}$ is also Donsker. Therefore, $n^{-1/2}u_n(\gamma,\hat\theta)$, a random process in $\gamma$, converges in distribution to a centered Gaussian process as $n\to\infty$ with covariance kernel $\sigma_c^*(\gamma_1,\gamma_2) = E\{c_1(\gamma_1,\theta^*)c_1(\gamma_2,\theta^*)^t\}$, $\gamma_1,\gamma_2\in\Gamma$, where $c_1(\gamma,\theta^*) = b_1(\gamma,\theta^*) - \tilde h(\gamma,\theta^*)A^{*-1}a_1(\theta^*)$, $\gamma\in\Gamma$.

C.2 Justification of the resampling procedure

Proof of Theorem 3

To prove Theorem 3, we need to show that the conditional distribution of $n^{-1/2}\xi_n(\gamma,\theta^*)$ given the observed data $\{y_i,\delta_i,x_i\}_{i=1}^n$ converges asymptotically to a centered Gaussian process as $n\to\infty$ with covariance kernel $\sigma_c^*(\gamma_1,\gamma_2)$, $\gamma_1,\gamma_2\in\Gamma$. Note that, given the observed data $\{y_i,\delta_i,x_i\}_{i=1}^n$, $n^{-1/2}\xi_n(\gamma,\theta^*)$ is a Gaussian process with conditional covariance function
$$\mathrm{Cov}\left(n^{-1/2}\xi_n(\gamma_1,\theta^*),\, n^{-1/2}\xi_n(\gamma_2,\theta^*)\right) = n^{-1}\mathrm{Cov}\left(\sum_{i=1}^n c_i(\gamma_1,\theta^*)\zeta_i,\ \sum_{i=1}^n c_i(\gamma_2,\theta^*)\zeta_i\right).$$
From the independence assumption we have
$$\mathrm{Cov}\left(n^{-1/2}\xi_n(\gamma_1,\theta^*),\, n^{-1/2}\xi_n(\gamma_2,\theta^*)\right) = n^{-1}\sum_{i=1}^n \mathrm{Cov}\left(c_i(\gamma_1,\theta^*)\zeta_i,\, c_i(\gamma_2,\theta^*)\zeta_i\right), \quad \gamma_1,\gamma_2\in\Gamma.$$
Note that
$$n^{-1}\sum_{i=1}^n \mathrm{Cov}\left(c_i(\gamma_1,\theta^*)\zeta_i,\, c_i(\gamma_2,\theta^*)\zeta_i\right) = n^{-1}\sum_{i=1}^n c_i(\gamma_1,\theta^*)E\{\zeta_i^2\}c_i(\gamma_2,\theta^*)^t,$$
with $E\{\zeta_i^2\} = 1$. Using Condition B4, the random processes $n^{-1}\sum_{i=1}^n c_i(\gamma_1,\theta^*)c_i(\gamma_2,\theta^*)^t$, indexed by $\gamma_1,\gamma_2$, converge to $\sigma_c^*(\gamma_1,\gamma_2)$ uniformly over $\gamma_1,\gamma_2\in\Gamma$. We have the desired result in view of Theorem 2.

Proof of Theorem 4

Given the observed data $\{y_i,\delta_i,x_i\}_{i=1}^n$, we show that under $H_0$ and Conditions B1-B5 the conditional distribution of $n^{-1/2}\sum_{i=1}^n c_i(\gamma,\hat\theta)\zeta_i$ is asymptotically equivalent to the conditional distribution of $n^{-1/2}\sum_{i=1}^n c_i(\gamma,\theta^*)\zeta_i$. Using a Taylor expansion, we have
$$c_i(\gamma,\hat\theta) \approx c_i(\gamma,\theta^*) - d_i(\gamma,\theta^*)(\hat\theta-\theta^*).$$
This then gives
$$n^{-1/2}\sum_{i=1}^n c_i(\gamma,\hat\theta)\zeta_i \approx n^{-1/2}\left\{\sum_{i=1}^n c_i(\gamma,\theta^*)\zeta_i - \sum_{i=1}^n d_i(\gamma,\theta^*)(\hat\theta-\theta^*)\zeta_i\right\}.$$
Furthermore, consider the quantity $n^{-1/2}\sum_{i=1}^n d_i(\gamma,\theta^*)(\hat\theta-\theta^*)\zeta_i = \{n^{-1}\sum_{i=1}^n d_i(\gamma,\theta^*)\zeta_i\}\{n^{1/2}(\hat\theta-\theta^*)\}$. From Condition B1, note that $n^{1/2}(\hat\theta-\theta^*) = O_p(1)$, where $O_p(1)$ represents boundedness in probability. The function class $\{d_i(\gamma,\theta^*)\zeta_i,\ \gamma\in\Gamma,\ i=1,2,\cdots,n\}$ is Donsker because $\mathcal{D}$ is Donsker and $\zeta_i = O_p(1)$. For fixed data $\{y_i,\delta_i,x_i\}_{i=1}^n$, we then have $\sup_{\gamma\in\Gamma}\|n^{-1}\sum_{i=1}^n d_i(\gamma,\theta^*)\zeta_i\| \to_p 0$ as $n\to\infty$. Given the observed data $\{y_i,\delta_i,x_i\}_{i=1}^n$, this finally gives $n^{-1/2}\sum_{i=1}^n c_i(\gamma,\hat\theta)\zeta_i \approx n^{-1/2}\sum_{i=1}^n c_i(\gamma,\theta^*)\zeta_i$, the desired result.

BIBLIOGRAPHY

[1] Andrews, D. W. K. Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica 69, 3 (2001), 683–734.

[2] Andrews, D. W. K., and Ploberger, W. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62 (1994), 1383–1414.

[3] Besag, J. Statistical analysis of non-lattice data.
The Statistician: Journal of the Institute of Statisticians 24 (1975), 179–196.

[4] Böhning, D., Dietz, E., Schlattmann, P., Mendonça, L., and Kirchner, U. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society) 162, 2 (1999), 195–209.

[5] Chernoff, H. On the distribution of the likelihood ratio. The Annals of Mathematical Statistics 25, 3 (1954), 573–578.

[6] Conniffe, D. Score tests when a nuisance parameter is unidentified under the null hypothesis. Journal of Statistical Planning and Inference 97, 1 (2001), 67–83.

[7] Cox, D. R., and Hinkley, D. V. Theoretical Statistics. Chapman & Hall Ltd, 1974.

[8] Davies, R. B. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64 (1977), 247–254.

[9] Deng, D., and Paul, S. R. Score tests for zero inflation in generalized linear models. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 28, 3 (2000), 563–570.

[10] Efron, B., and Tibshirani, R. An Introduction to the Bootstrap. Chapman & Hall Ltd, 1993.

[11] Farewell, V., and Sprott, D. The use of a mixture model in the analysis of count data. Biometrics 44, 4 (1988), 1191–1194.

[12] Farewell, V. T. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38 (1982), 1041–1046.

[13] Hall, D. B. Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics 56, 4 (2000), 1030–1039.

[14] Hansen, B. E. Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64 (1996), 413–430.

[15] Heilbron, D. Zero-altered and other regression models for count data with added zeros. Biometrical Journal 36, 5 (1994), 531–547.

[16] Heller, R., Rosenbaum, P. R., and Small, D. S. Split samples and design sensitivity in observational studies.
Journal of the American Statistical Association 104, 487 (2009), 1090–1101.

[17] Jansakul, N., and Hinde, J. Score tests for extra-zero models in zero-inflated negative binomial models. Communications in Statistics - Simulation and Computation 38, 1 (2009), 92–108.

[18] Jansakul, N., and Hinde, J. P. Score tests for zero-inflated Poisson models. Computational Statistics and Data Analysis 40, 1 (2002), 75–96.

[19] Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34 (1992), 1–14.

[20] Lee, S., Seo, M. H., and Shin, Y. Testing for threshold effects in regression models. Journal of the American Statistical Association 106, 493 (2011), 220–231.

[21] Lewsey, J. D., and Thomson, W. M. The utility of the zero-inflated Poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. Community Dentistry and Oral Epidemiology 32 (2004), 183–189.

[22] Lin, D. Y., Fleming, T. R., and Wei, L. J. Confidence bands for survival curves under the proportional hazards model. Biometrika 81 (1994), 73–81.

[23] Lindsay, B., and Roeder, K. Residual diagnostics for mixture models. Journal of the American Statistical Association 87, 419 (1992), 785–794.

[24] Lindsay, B. G. Composite likelihood methods. In Statistical Inference from Stochastic Processes (1988), N. U. Prabhu, Ed., American Mathematical Society, pp. 221–239.

[25] Mullahy, J. Specification and testing of some modified count data models. Journal of Econometrics 33 (1986), 341–365.

[26] Ng, E. T. M., and Cook, R. J. Adjusted score tests of homogeneity for Poisson processes. Journal of the American Statistical Association 94 (1999), 308–319.

[27] Parzen, M. I., Wei, L. J., and Ying, Z. A resampling method based on pivotal estimating functions. Biometrika 81 (1994), 341–350.

[28] Peng, L., and Huang, Y. Survival analysis with temporal covariate effects.
Biometrika 94, 3 (2007), 719–733.

[29] Ridout, M., Demétrio, C. G. B., and Hinde, J. Models for count data with many zeros. In Proceedings of the International Biometric Conference (Cape Town, South Africa, 1998), pp. 179–192.

[30] Ritz, C., and Skovgaard, I. M. Likelihood ratio tests in curved exponential families with nuisance parameters present only under the alternative. Biometrika 92, 3 (2005), 507–517.

[31] Self, S. G., and Liang, K.-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association 82 (1987), 605–610.

[32] Shapiro, A. Towards a unified theory of inequality constrained testing in multivariate analysis. International Statistical Review 56 (1988), 49–62.

[33] Silvapulle, M. J., and Silvapulle, P. A score test against one-sided alternatives. Journal of the American Statistical Association 90 (1995), 342–349.

[34] Song, R., Kosorok, M. R., and Fine, J. P. On asymptotically optimal tests under loss of identifiability in semiparametric models. The Annals of Statistics 37, 5A (2009), 2409–2444.

[35] Tellez, M., Sohn, W., Burt, B., and Ismail, A. Assessment of the relationship between neighborhood characteristics and dental caries severity among low-income African-Americans: A multilevel approach. Journal of Public Health Dentistry 66 (2006), 30–36.

[36] van den Broek, J. A score test for zero inflation in a Poisson distribution. Biometrics 51 (1995), 738–743.

[37] van der Vaart, A. W., and Wellner, J. A. Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli theorems. In High Dimensional Probability II, E. Giné, D. M. Mason, and J. A. Wellner, Eds. Birkhäuser, Boston, 2000.

[38] van der Vaart, A. W., and Wellner, J. A. Weak Convergence and Empirical Processes. Springer, New York, 2000.

[39] Verbeke, G., and Molenberghs, G. The use of score tests for inference on variance components. Biometrics 59 (2003), 254–262.
[40] Xiang, L., Lee, A., Yau, K., and McLachlan, G. A score test for zero-inflation in correlated count data. Statistics in Medicine 25, 10 (2006), 1660–1671.

[41] Zhu, H., and Zhang, H. Generalized score test of homogeneity for mixed effects models. The Annals of Statistics 34, 3 (2006), 1545–1569.