ANALYSIS OF COMPLEX LIFE-HISTORY DATA AND VARIABLE SELECTION IN SURVIVAL ANALYSIS UNDER INTERVAL CENSORING By Daewoo Pak A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy 2018 ABSTRACT ANALYSIS OF COMPLEX LIFE-HISTORY DATA AND VARIABLE SELECTION IN SURVIVAL ANALYSIS UNDER INTERVAL CENSORING By Daewoo Pak Event-time data are routinely obtained from longitudinal investigations to evaluate the time to onset and progression of chronic diseases. Such data are commonly referred to as disease life course data and have several nontrivial complications. In many longitudinal studies, disease life course data are subject to interval censoring, within-sampling-unit clustering, and multiplicity of event states. Moreover, because medical studies usually collect a large number of hypothesized risk factors for the disease, identifying pertinent determinants of the disease life course is of interest for disease prevention and prediction. In Chapter 1, we describe a dental caries data set from a unique longitudinal study of young low-income urban African-American children. This data set motivates the three statistical method- ologies developed respectively in Chapters 2-4. In Chapter 2, we formulate a parametric frailty Markov model coupled with a likelihood-based inference to analyze the life course data with the complications of interval censoring, within- sampling-unit clustering and three event states. We also develop a Bayesian approach to predict observational-unit-level future transition probabilities. Such probabilities have implications for precision medicine. Albeit its ease of computations, the proposed parametric method in Chapter 2 has some limita- tions. An obvious limitation is the restrictive parametric model imposed on the baseline intensities. Thus, in Chapter 3, we propose a similar model but with unspecified baseline intensities and develop a penalized spline method for the model estimation. Numerical experiments demonstrate that the proposed methods perform very well in finite samples with moderate sizes. In Chapter 4, we propose a penalized variable selection method for interval censored data under the Cox proportional hazards model. It conducts a penalized nonparametric maximum likelihood estimation with an adaptive Lasso penalty, which can be implemented through a penalized EM algorithm. The method is proved to have the oracle property. We also extend it to left truncated and interval censored data. Our simulation studies show that the method demonstrates the oracle property in samples of modest sizes and outperforms existing approaches in terms of many operating characteristics. The practical utility of the approach is illustrated using the mouth-level dental caries data introduced in Chapter 1. Copyright by DAEWOO PAK 2018 ACKNOWLEDGEMENTS First of all, I would like to express my sincere gratitude to my advisor and co-advisors, Dr. Chenxi Li, Dr. David Todem and Dr. Yuehua Cui, for their unflagging support of my Ph.D study, for their incredible patience, and for their invaluable assistance. I could not have imagined having better mentors through the doctoral program. I would also like to thank the other members of my dissertation committees, Dr. Hyokyoung Hong and Dr. Lyudmila Sakhanenko, each of whom has provided insightful comments and feedback on this dissertation. The dissertation would not be accomplished well without their helps. Thanks to my beloved parents, Chanmo Pak and Geumhwa Jang, and my proud brother, Sewoo Pak, for their endless support during the five years that it took me to earn a Ph.D. Because of them, I was able to keep moving on and overcome any obstacle. Thanks to my fiancée, Jin Kyoung Park, who was standing by me all the time and providing encouraging words and support with her constant love. She always let me stay positive throughout entire doctoral progress. I also want to thank Dr. Song Yang whom I was working with at National Institute of Health. His support was very valuable when deciding the direction of further research, and the discussions with him have always inspired me in many aspects about statistical methodologies. Thanks also goes to the graduate school and the Department of Statistics and Probability for all financial support I recieved throughout my doctoral program such as the funds for conference travels and the dissertation completion fellowship. I am also indebted to all members in the MSU Center for Statistical Training and Consulting. I worked there as a graduate consultant for two years and had excellent opportunities for learning how to interact with people under their guidance and support, and for meeting the professors and students from many disciplines, which will be the valuable assets in my life. Thanks to my friends, Jiwoong Kim, Metin Eroğlu, Yi-Chen Zhang and Shawn Santo. I also would like to thank my office-mates, Robin Todd and Satabdi Saha. I am so fortunate to get to v know them. vi TABLE OF CONTENTS LIST OF TABLES . . . LIST OF FIGURES . . . . . . . . . . . . . . . . . CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix xi 1 CHAPTER 2 PARAMETRIC ANALYSIS OF CORRELATED AND INTERVAL CEN- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SORED EVENT-HISTORY DATA . . . . . . Parameter estimation and inference . Prediction of future transition probabilities . . . . . . . . Simulation setting . . Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 2.2.2 2.3.1 2.3.2 Introduction . . 2.1 . . 2.2 Statistical methodologies . . 2.3 Simulation study . . 2.4 Application . . 2.5 Discussion . . . CHAPTER 3 SEMIPARAMETRIC ANALYSIS OF CORRELATED AND INTERVAL . . . Introduction . . 3.1 . . 3.2 Statistical methodologies . . . . . . . . . . . . . . . . . . . . 3.2.1 A penalized likelihood estimation . 3.2.2 . 3.2.3 CENSORED EVENT-HISTORY DATA . . . . . . . . . Statistical inference . . Prediction of future transition probabilities . . . . . . . . Simulation setting . . Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 3.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Application . . . . 3.5 Discussion . 3.3 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Introduction . . FOR INTERVAL CENSORED DATA . . . . . . . . CHAPTER 4 VARIABLE SELECTION IN THE PROPORTIONAL HAZARDS MODEL . . . . . . . . . . . . . . . . . . . 4.1 . . . 4.2 Statistical methodologies . . . 4.2.1 Variable selection . . . 4.2.2 Covariance matrix of the adaptive Lasso estimator . . 4.2.3 Thresholding parameter tuning . . . . . . . . . . . 4.3 Asymptotic properties . 4.4 Extension to left truncated and interval censored data . 4.5 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii . 7 7 . . 9 . 12 . 12 . 14 . 14 . 15 . 17 . 21 . 24 . 24 . 27 . 28 . 33 . 33 . 35 . 35 . 35 . 40 . 45 . 48 . 48 . 50 . 51 . 54 . 54 . 55 . 61 . 62 4.5.1 4.5.2 Simulation setting . . Simulation results . . . . . . . . . . . . 4.6 Application . . 4.7 Discussion . . . . . . . . . APPENDICES . APPENDIX A APPENDIX B APPENDIX C . . . . . . . . BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 . 63 . 68 . 70 . 73 . 74 . 77 . 80 . . . 82 viii LIST OF TABLES Table 1.1: The classification rule for three caries states based on a two-digit number system in the International Caries Detection and Assessment System . . . Table 2.1: Simulation results for estimating the Weibull and variance parameters . . Table 2.2: Simulation results for estimating the regression parameters β . . . . Table 2.3: The DDHP data analysis using the univariate random effect model . . . . . . . . . . . . . . 5 . . . 15 . . . 16 . . . 18 Table 2.4: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity . . . . . . . . . . . . . . Table 3.1: The simulation results for estimating β01 . . Table 3.2: The simulation results for estimating β12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 . . . 36 . . . 36 Table 3.3: The proportion of the Monte Carlo iterations where the 95% prediction interval contains the true future transition probability for the last tooth of the last subject at VmMm + t∗ where t∗ = 0.1, 0.3, 0.5, 1, 1.5 . . . . . . . . . . . . . . . . . . . . 40 Table 3.4: The regression parameter estimates for the saturated model based on the DDHP data 41 Table 3.5: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity stratified by gender and transition type . . . . . . . . . . . . 43 Table 3.6: The detailed comparisons associated with the significant non-symmetries in caries transition intensity found in Table 3.5 . . . . . . . . . . . . . . . . . . . . 44 Table 4.1: The variable selection percentages of our method (Ours), Zhang & Lu (2007)’s method based on mid-point imputation (ZL), and Wu & Cook (2015)’s method (WC) in the untruncated case . . . . . . . . . . . . . . . . . . . . . . . . . . . Table 4.2: Average numbers of correct and incorrect zero coefficients and mean squared errors (MSE) of the coefficient estimators for our method (Ours), Zhang & Lu (2007)’s method based on mid-point imputation (ZL), and Wu & Cook (2015)’s method (WC) in the untruncated case. In the parentheses are the ideal numbers of correct and incorrect zero coefficients . . . . . . . . . . . . . . . . . 64 . 65 ix Table 4.3: Simulation results on the estimation of the non-zero coefficients in the untrun- cated case. The oracle method is the unpenalized nonparametric maximum likelihood estimation with only the covariates whose coefficients are non- zero, i.e., X1, X2, X9 and X10. The standard errors and the coverage of the Normal-based confidence intervals for Wu & Cook (2015)’s method were not computed due to the bootstrap, suggested by Wu & Cook (2015) for computing the standard errors, being time-consuming . . . . . . . . . . . . . . . . . . . . . 66 Table 4.4: The candidate covariates considered in the analysis of DDHP data . . . . . . . . 69 Table 4.5: The DDHP data analysis results. NPMLE is the coefficient estimate from the nonparametric maximum likelihood estimation, and Adaptive Lasso is the coefficient estimate from the proposed shrinkage method. Standard errors are given in the parentheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Table A.1: The regression parameter estimates for the saturated model based on the DDHP data 75 Table A.2: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity stratified by gender and transition type . . . . . . . . . . . . 75 Table A.3: The detailed comparisons associated with the significant non-symmetries in caries transition intensity found in Table A.2 . . . . . . . . . . . . . . . Table B.1: The variable selection percentages of our method in the truncated case . . . . . . . 76 . . . 77 Table B.2: Average numbers of correct and incorrect zero coefficients and mean squared errors (MSE) of the coefficient estimators for our method in the truncated case. In the parentheses are the ideal numbers of correct and incorrect zero coefficients 77 Table B.3: Simulation results on the estimation of the non-zero coefficients in the trun- cated case. The oracle method is the unpenalized nonparametric maximum likelihood estimation with only the covariates whose coefficients are non-zero, i.e., X1, X2, X9 and X10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Table C.1: Questionnaire items for the family-level measures (OHSE, KBU, KCOH) Table C.2: Questionnaire items for the family-level measures (OHF, SUPPORT) . . . . . . . 80 . . . 81 x LIST OF FIGURES Figure 2.1: The progressive three states: state 0 for normal condition, state 1 for early caries and state 2 for advanced caries . . . . . . . . . . . . . . . . . . . . . . . 10 Figure 2.2: Molars in a child’s mouth clockwise starting from the upper-right teeth (num- bers starting 5) from the dentist’s view. The child’s right side corresponds to the tooth chart’s left side. The numbers ending 4 and 5 correspond to first and second molars respectively . . . . . . . . . . . . . . . . . . . . . . . . . Figure 2.3: The predicted transition probabilities from state 0 to state 1 and to state 2 for the upper right first molar (A) and from state 1 to state 2 for the upper right second molar (B) for a boy in the DDHP dataset . . . . . . . . . . . . . . . . Figure 3.1: The average estimates of h(0)01 (t) and h(0)12 (t), their average 95% pointwise confidence intervals and their mean integrated squared errors (MISE) across the 1000 Monte Carlo samples . . . . . . . . . . . . . . . . . . . . . . . . . . 16 . 22 . 37 Figure 3.2: The comparison between the empirical standard errors and the average theo- retical standard errors of the baseline intensity estimators . . . . . . . . . . . . 38 Figure 3.3: The empirical coverage probabilities of the 95% pointwise confidence inter- vals for the baseline intensities . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Figure 3.4: The future transition probabilities to every possible state out of the caries state at the last visit for the upper right first molar (A) and the upper right second molar (B) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Figure 4.1: Normal Q-Q plots of the proposed estimators for the non-zero coefficients in the untruncated case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Figure B.1: Normal Q-Q plots of the proposed estimators for the non-zero coefficients in the truncated case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 xi CHAPTER 1 INTRODUCTION In medical research involving chronic diseases, it is often the practice to use a small number of predefined clinical states, which are exhaustive and mutually exclusive contemporaneously, to represent the spectrum of the disease. These classifications coupled with longitudinal investigations can be used to study the transition of subjects across the spectrum of the disease, which is paramount in understanding the life course of the condition. A good example is the joint damage in psoriatic arthritis, where researchers typically classify subject participants four states ranging from normal to severe states requiring surgery, in view of the reading of X-ray images (see, for example, Sutradhar & Cook, 2008) . Many chronic diseases manifest themselves at several regions of the body, resulting in endpoints that are potentially correlated even after conditioning on observed covariates. In diabetic retinopathy, for instance, retinal lesions in the two eyes within a body develop dependently, due to the two eyes being exposed to common risk factors, including genetic variants, environmental factors, control of glucose in the blood (management of diabetes), nutritional status, etc., which may not be fully measured. Any analysis that ignores this correlation is apt to yield biased inferences for parameters describing the life-history of the disease, if multiple endpoints are recorded cross- sectionally on the same subject. Finally, in many longitudinal investigations, even in studies that aim at describing the life course of the disease, data are collected intermittently, yielding interval censored data. For such data, the exact time of the transition from one state to another is known up to being bracketed by two assessment times. A good example of correlated and interval censored life-history data is in a life-course study of tooth decay. Tooth decay medically known as dental caries is the most prevalent chronic disease in childhood, despite being largely preventable (Selwitz et al., 2007). It is estimated that tooth decay is five times more prevalent than asthma in childhood and seven times more prevalent than hay fever (US Department of Health and Human Services, 2000; Benjamin, 2010). Its diagnosis is typically performed at the tooth-surface level where both the shape and the depth of any potential 1 lesions are scored on an ordinal scale representing the stages of the disease process, ranging from sound (no lesion) to lesions into the dental pulp. These detailed tooth-surface level assessments have enabled epidemiological and clinical studies to describe the intra-oral susceptibility of dental caries (Kandelman & Lewis, 1988; Batchelor & Sheiham, 2004; Zhang et al., 2011). While tooth and tooth-surface susceptibilities to cavity are well understood, the intra-oral life course of dental caries remains relatively understudied. It is well known that caries susceptibility differs between the maxilla and mandible and also across individual teeth and their surfaces, but little is known about the timing and the transition of the disease for these intra-oral characteristics. Understanding the intra-oral caries life course may provide important evidence on the timing and the transition of the disease, critical for caries prevention and treatment. The study of intra-oral caries life course involves a longitudinal association due to the same tooth or tooth surface being observed over time. This association is of primary importance and used to model the transition across caries states. A basic statistical approach is to use a multistate survival model where the outcome of interest is the time to transition across various caries states. Due to intermittent dental assessments, however, the time to disease transition is often interval censored in many medical researches. Interval censoring in multistate survival models gives rise to a new difficulty, which does not exist in classical survival models, in that several paths are possible for transitioning from one state to another between two sequential assessments. Unlike the longitudinal association, the intra-oral association is essentially a nuisance term that needs to be accommodated to yield valid inferences for the model parameters describing all tooth-level progressions. For example, when the transition durations across the caries states are to be compared for several teeth or tooth surfaces of the same child, a correlated data model is needed to accommodate the inherent intra-oral correlation. A unique database that provides detailed information for the study of intra-oral caries life course is the Detroit Dental Health Project (DDHP). This is a longitudinal study designed to understand the oral health of low-income African-American children (0-5 years at baseline) and their main caregivers (14+ years) residing in the city of Detroit, Michigan. Although it is well established that 2 low income and minority children exhibit the most adverse oral health outcomes (Vargas et al., 1998; Mouradian et al., 2000; Edelstein & Chinn, 2009), little is known about the intra-oral life course of caries in this underserved population. Among other information, this study comprises longitudinal investigations involving tooth and tooth-surface level information on caries life course which can shed light on important research questions such as: which teeth or tooth surfaces transition faster from sound to noncavitated lesions and ultimately to cavitated lesions? Another interesting intra- oral question is whether there are left-right or upper-lower symmetries in the development of tooth decay. In addition to the intra-oral investigations, the DDHP also provides information on a relatively large number of factors (child-, household- and community-levels) that are known to be associated with caries prevalence. Because a high proportion of children (81% in the DDHP data) did not exhibit any caries activity at the first examination (Wave 1), it may of interest to determine which of these variables, albeit their associations with caries prevalence, are associated with the duration of transition from no caries to some caries. The study on the duration of caries transition or caries development is ultimately of interest in dental practice as the information gained can provide guidance on the timing of dental screening and treatment, which is especially critical for low income African American children who have fewer dental visits and fewer protective sealants than any other socio-economic and racial groups (Mouradian et al., 2000; Edelstein & Chinn, 2009). The DDHP data contain longitudinal investigations of caries on all deciduous teeth in 1021 children during the three waves (2002-2003, 2004-2005 and 2007). The caries status was assessed at tooth-surface level and its result was coded as a two-digit number according to the International Caries Detection and Assessment System (ICDAS). The first digit identifies restorations, sealants or other conditions: 0 for surface neither restored nor sealed (sound), 1 for partial sealant, 2 for full sealant, 3 for tooth colored restoration, 4 for amalgam restoration, 5 for stainless steel crown, 6 for other various restorations (Full porcelain, PFM crown, etc.) 7 for lost restoration, 8 for temporary restoration, and 9 for surface missing due to carious or non-carious related reasons. The second digit codes the caries severity: 0 for sound surface, 1-2 for noncavitated lesions, 3-6 for cavitated 3 lesions with various severity (Enamel cavity, Extensive cavity, etc.), 7 for tooth missing because of caries, 8 for tooth missing for reasons other than caries, and 9 for an unerupted tooth. Since all teeth were assessed simultaneously when a child had a study visit, the examination times are the same for all teeth within a child. The statistical modeling and intra-oral and inter-oral inferences for the DDHP data are inherently complex because the data have several theoretical complications inherent in the study of intra-oral caries life course. The ICDAS scoring system generates multiple caries states over time (up to three times in DDHP study). Moreover, intra-oral clustering of teeth or tooth-surfaces with an unusual spatial configuration poses another level of difficulty. And, finally, the exact ’failure’ time to disease transition is not directly observed but it is only known up to occur between the periodic assessments, yielding interval censored survival endpoints. Such interval censoring is referred as mixed case interval censoring because it is generated with varying numbers of monitoring times among children, which is the most common type of interval censoring (Schick & Yu, 2000). Methodologically, there have been several works in the literature which examine related issues and complications. To our best knowledge, we are not aware of any robust statistical methodology that addresses all these complications simultaneously. In Chapter 2 and Chapter 3, we study how to analyze intra-oral caries life course data at the tooth-level. Thus, we reformulate the DDHP tooth-surface level data to the tooth-level data by classifying all possible two-digit numbers into three states: 0 for normal condition, 1 for early caries (noncavitated), and 2 for advanced caries (cavitated), as following Table 1.1. After then, the most severe caries status of all surface of a tooth were taken to represent the caries status of the tooth. An unerupted tooth is considered as normal condition (coded 99), and a lost tooth due to caries (coded 97) is treated as being in the advanced caries state. A deciduous tooth that is missing due to replacement by a permanent tooth is assumed to be right-censored at the last visit time when the primary tooth was seen. In these two chapters, motivated by the DDHP data, we propose a multistate Markov frailty models to analyze interval censored tooth-level life-history data of caries in the deciduous dentition. Specifically, in Chapter 2, we propose a multistate frailty Markov model 4 Table 1.1: The classification rule for three caries states based on a two-digit number system in the International Caries Detection and Assessment System A two-digit number system caries state 00, 10, 20, 99 01, 02, 11, 12, 21, 22 03, 04, 05, 06, 13, 14, 15, 16, 23, 24, 25, 26, 30, 31, 32, 33, 34, 35, 36, 40, 41, 42, 43, 44, 45, 46, 50, 51, 52, 53, 54, 55, 56, 60, 61, 62, 63, 64, 65, 66, 70, 71, 72, 73, 74, 75, 76, 80, 81, 82, 83, 84, 85, 86, 97 96, 98 0 1 2 right-censored coupled with a likelihood-based inference to analyze tooth-level life course data in caries research. In Chapter 3, the less restrictive formulation that uses spline functions to control the fitting of the baseline intensities in the multistate frailty model framework is described. In the analyses of Chapter 2 and Chapter 3, we focus on any spatial symmetry in the mouth with respect to the life course of dental caries and whether the same type of tooth has a similar decay process in male and female, using the tooth location and gender as covariates in the models. These analyses prove challenging because of nontrivial complications including intra-oral clustering, interval censoring, multiplicity of caries states, and computational complexities. Besides fitting appropriate models, the methodologies for predicting individuals’ future transition probabilities are also developed. It is helpful to know the probability of a tooth transitioning to a more severe caries state at a future time point for the timing of dentist visit and disease-modifying intervention. Because there is usually heterogeneity in transition rates between different individuals that are not captured by measured covariates, the future trajectory of a subject’s life-history process would be predicted more accurately if the future transition probabilities are predicted conditional on the subject’s up-to-date life course data. 5 In Chapter 4, we introduce a new variable selection approach for the time-to-event data subject to interval censoring and left truncation. Our interest lies in identifying which among important risk factors are associated with increased risk of developing dental caries from DDHP data. Guided by the conceptual model in Fisher-Owens et al. (2007), potential variables to examine in our analysis of the DDHP data will include child-, household- and community-level characteristics. Child-level variables will include demographic variables (e.g., age) and known risk factors (e.g, weight, brushing frequency). Household-level variables will include children’s oral health-related measures (e.g., oral health self-efficacy, usage of water filter/purifier on kitchen tap) and caregiver’s status (e.g., education level, parenting stress score). Community-level characteristics will include the number of dentists, full-scale grocery stores and churches in predefined neighborhoods. A list of the variables of interest for the analysis will be discussed in detail in Chapter 4. Instead of tooth- or tooth-surface level caries life course data, in this chapter, we consider mouth-level caries life course data, aiming to early detection of carious lesions among the surfaces of all teeth in a mouth. Thus, we create the mouth-level data from DDHP data by taking the time intervals within which first noncavitated/cavitated lesion is observed in any surface. The proposed variable selection is conducted under the Adaptive Lasso framework, and the model estimation is through a penalized Expectation–Maximization algorithm (EM) based on the penalized nonparametric maximum likelihood approach for the baseline hazard function and the regression parameters. The consistency of the regression parameter estimator is also studied along with its sparsity. 6 CHAPTER 2 PARAMETRIC ANALYSIS OF CORRELATED AND INTERVAL CENSORED EVENT-HISTORY DATA 2.1 Introduction Several survival analysis methods, which can accommodate some of the complications of the DDHP data such as interval censoring and intra-oral correlation, have been developed and appied to caries history data. For example, Tommi et al. (2000) and Komárek et al. (2005) used nonparametric and semiparametric Bayesian intensity models to analyze clustered and interval censored time-to- caries data. Komárek & Lesaffre (2008) proposed a Bayesian accelerated failure time model for clustered doubly-interval censored data for the time to caries of permanent first molars. Beside these methods tailored for caries data, there are other survival analysis methods for multiple (correlated) interval censored survival time data (see e.g., Goggins & Finkelstein, 2000; Bogaerts et al., 2002; Kim & Xue, 2002; Kor et al., 2013) that can be applied to caries research. These existing models, however, can only handle time-to-event data where the event is a binary endpoint, e.g. the incidence of caries defined according to a dichotomized version of the ICDAS scoring criterion. Although important, this approach only provides partial information on the complete time course of dental caries. To our best knowledge, only few papers (see e.g., Sutradhar & Cook, 2008; Joly et al., 2012) studied the class of multistate models for clustered life-history data under interval censoring. However, neither of Sutradhar & Cook (2008) and Joly et al. (2012) considered cluster-level or observation-unit-level covariates in their models, and they did not discuss how to predict unit-level future transition. Simulations corroborating their methods are also lacking. From a methodological viewpoint, this chapter aims at developing a statistical method that simultaneously addresses the complications of the intra-oral caries life course data and the limitations of existing works. Intra-oral caries life course data are complex for statistical analysis. Firstly, dental caries is usually classified into multiple states according to its severity. Secondly, the caries state of 7 each tooth is determined at intermittent dental examinations, which leads to interval censoring of the transition time between states. Thirdly, the progressions of caries at different teeth within a mouth are correlated. To address these complexities simultaneously in the analysis, we consider a parametric estimation of multistate models for clustered interval censored data coupled with a conditionally progressive Markov model. The approach is challenging in many points. Even for uncorrelated data, non-homogeneous Markov multistate models are hard to fit with interval censored observations and general software is not readily available (Cook & Lawless, 2014). The practical novelty of the methodology relies essentially on the application of multistate models to analyze complex caries life course data. This methodological approach will advance knowledge in dental caries research by providing a more detailed information on the caries life course, both at tooth and tooth-surface levels. Additionally, the effects of potential determinants of early childhood caries transition, e.g. gender, tooth types and sites on teeth, will be easily evaluated using regression techniques through the transition intensities. And finally, the proposed model will be able to predict future caries outcomes for a tooth or tooth surface given its current disease status. The proposed statistical methods also have general applications to modeling chronic dieases involving paired or multiple organs in cohort studies where the diseases status is measured inter- mittently. Besides dental caries, popular examples includes sacroiliac joint damage in psoriatic arthritis and diabetic retinopathy as discussed in Chapter 1. For these cohort studies of chronic diseases, the proposed models and the associated estimation and inferences can be used to estimate the rate of disease onset and progression, to identify demographic, genetic and environmental risk factors, and to assess therapeutic interventions; and predict future disease status at the individual or population level. Despite a wide range of applications of the proposed statistical methods, their application to dental caries is unique in view of the intral-oral clustering with an unusual spatial configuration. The rest of the chapter is organized as follows. In Section 2.2, we introduce the three-state non- homogeneous Markov frailty model and derive the corresponding likelihood of tooth-level caries life course data. Section 2.2.1 discusses the likelihood-based estimation and inference. Section 8 2.2.2 presents the Bayesian prediction method. In Section 2.3, we perform a simulation study to investigate the finite sample performance of the likelihood-based inferential approach, followed by the application of the proposed methodology to the DDHP data in Section 4.6. Section 2.5 gives some concluding remarks. 2.2 Statistical methodologies We model the caries transition profile of n specific primary teeth (n = 20 if all the primary teeth are modeled). The time scale is chosen to be age, so birth is the time origin. Let m be the number of children. The study visit times of the i-th child are denoted by Vi1, Vi2, . . . , ViMi where Mi is the number of visits for the i-th child and can vary across children. Let Ai j(t) j = 1, . . . , n) at time t be the caries state of the j-th tooth for the i-th subject (i = 1, . . . , m; , (t ≥ 0), observed only at the finite time points. Our basic model assumes that Ai j(t) follows a conditionally progressive Markov model with the three states representing the tooth-level caries spectrum because a direct transition from state 0 to 2 is biologically impossible and the only plausible reverse transition, 1 to 0 (Ismail et al., 2011), is rare in reality (3.4% of all the transitions on primary molars in the DDHP data). Also note that any continuous intensity model for Ai j(t) that allows reverse transitions is non-identifiable under interval censoring. Therefore, we consider tooth decay a progressive process from normal condition to early caries and finally to advanced caries as illustrated by Figure 2.1. The corresponding observed caries states of the j-th tooth are denoted by K( j)i1 , . . . , K( j)iMi). Let Xi j be the baseline covariate vector of interest for the j-th tooth of the i-th child, e.g., gender, tooth respectively. Define Vi ≡ (Vi1, . . . , ViMi) and Ki j ≡ (K( j)i1 , K( j)i2 , . . . , K( j)iMi location, left-handed or right-handed, number of dentists nearby, and socioeconomic status. Define Yi j ≡ (Vi, Ki j, Xi j) and Yi ≡ (Yi1, . . . , Yin). Then an interval censored tooth-level caries-history data set consists of Y ≡ (Y1, . . . , Ym). Caries life course data of the teeth within a child are correlated, so are the transition times between different pairs of states in a tooth. We use a subject-level random effect on transition intensities modulated by transition types to capture these two kinds of correlation. Specifically, we 9 Figure 2.1: The progressive three states: state 0 for normal condition, state 1 for early caries and state 2 for advanced caries account for the intra-oral correlation between Ai j(t)’s ( j = 1, . . . , n) by positing a common random effect ui in their transition intensities, and assume that hab(t|Xi j, ui) = lim ∆t→0 = lim ∆t→0 Pr(Ai j(t− + ∆t) = b| Ai j(u), 0 ≤ u < t, Ai j(t−) = a, Xi j, ui) Pr(Ai j(t− + ∆t) = b| Ai j(t−) = a, Xi j, ui), where (a, b) ∈ {(0, 1), (1, 2)}. Furthermore, we postulate the following multiplicative intensity model for h01(t|Xi j, ui) and h12(t|Xi j, ui), h01(t|Xi j, ui) = h(0)01 (t) exp(XT h12(t|Xi j, ui) = h(0)12 (t) exp(XT i j β01 + σ01ui) i j β12 + σ12ui) (2.1a) (2.1b) where β01 and β12 are the regression coefficient vectors of Xi j , ui’s are assumed to be i.i.d. from the standard normal distribution, σ10 and σ12 are the standard deviations of the log frailties corresponding to the two transition types respectively, and h(0)01 (t) and h(0)12 (t) are the baseline intensity functions. In the parametric modeling framework, we can use the Weibull distribution to model the baseline intensities, namely, h(0)01 (t) = k01r01tr01−1 and h(0)12 (t) = k12r12tr12−1 with unknown positive parameters (k01, r01, k12, r12). Denote the vector of all parameters by θ ≡ (log k01, log r01, log k12, log r12, log σ01, log σ12, β01, β12). Let f (·) be a generic notation of density/mass function and f (·|·) be a generic notation of conditional density/mass function. Under the assumption that the tooth decay process is Markovian and Yi’s are independent across children, the likelihood function of the data equals, up to a multiplicative constant not dependent on θ, mi=1Þ ∞ −∞ L(θ; Y) = nj =1 f (Ki j|θ, Vi, Xi j, ui)φ(ui)dui, (2.2) 10 where φ(·) is the density function of the standard normal distribution. Under the Markov assumption, f (Ki j|θ, Vi, Xi j, ui) can be expressed as the product of transition probabilities, namely, f (Ki j|θ, Vi, Xi j, ui) = Miq=1 f (Ai j(Viq) = K( j)iq | Ai j(Vi,q−1) = K( j)i,q−1 , Xi j, ui), where Vi0 = 0, K( j)i0 Thus the log-likelihood function, l(θ; Y), is simply, = 0 and Ai j(0) = 0 according to the caries classification scheme in Chapter 1. f (Ai j(Viq) = K( j)iq | Ai j(Vi,q−1) = K( j)i,q−1 φ(ui)dui . (2.3) l(θ; Y) = logÞ ∞ −∞ mi=1 Miq=1 nj =1 , Xi j, ui) To simplify the notations, we write the transition probability f (Ai j(Viq) = K( j)iq | Ai j(Vi,q−1) = (Vi,q−1, Viq, Xi j, ui, θ), which , Xi j, ui) on the right-hand side of (2.3) as a function p K( j)i,q−1 K( j)i,q−1 can be computed using h01(t|Xi j, ui) and h12(t|Xi j, ui) as follows, K( j)iq p00(Vi,q−1, Viq, Xi j, ui, ξ, σ) = S01(Vi,q−1, Viq|Xi j, ui) p01(Vi,q−1, Viq, Xi j, ui, ξ, σ) =Þ Viq Vi,q−1 S01(Vi,q−1, r|Xi j, ui)h01(r|Xi j, ui)S12(r, Viq|Xi j, ui)dr (2.4b) p02(Vi,q−1, Viq, Xi j, ui, ξ, σ) Vi,q−1Þ Viq =Þ Viq =Þ Viq Vi,q−1 r1 S01(Vi,q−1, r1|Xi j, ui) · h01(r1|Xi j, ui)S12(r1, r2|Xi j, ui)h12(r2|Xi j, ui)dr2dr1 S01(Vi,q−1, r|Xi j, ui)h01(r|Xi j, ui){1 − S12(r, Viq|Xi j, ui)}dr (2.4a) (2.4c) (2.4d) p11(Vi,q−1, Viq, Xi j, ui, ξ, σ) = S12(Vi,q−1, Viq|Xi j, ui) 11 p12(Vi,q−1, Viq, Xi j, ui, ξ, σ) =Þ Viq Vi,q−1 S12(Vi,q−1, r|Xi j, zi)h12(r|Xi j, ui)dr = 1 − S12(Vi,q−1, Viq|Xi j, ui) (2.4e) (2.4f) p22(Vi,q−1, Viq, Xi j, ui, ξ, σ) = 1 where Sab(x, y|Xi j, ui) ≡ exp{−∫ y have a closed form and thus are numerically approximated using the Gauss-Jacobi quadrature. x hab(t|Xi j, ui)dt}. The integrals involved in p01 and p02 do not 2.2.1 Parameter estimation and inference We maximize the log likelihood of the observed data (2.3) to get the maximum likelihood estimate (MLE) ˆθ. The Gauss-Jacobi quadrature is used to compute the integrals over finite intervals in (2.4b, 2.4c) that do not have a closed form, and the Gauss-Hermite quadrature is used to compute the integrals with respect to the random effect in (2.3). In the Gauss-Jacobi quadrature, the number of quadrature points is determined by # of quadrature points = min(cid:18)25, 50 × length max.length(cid:19) , where ‘length’ is the length of the integration interval and ‘max.length’ is the maximum length among all the integration intervals. In the Gauss-Hermite quadrature, 25 quadrature points are used to get a numerical integral value. The optimization is carried out by the R function nlminb. The observed information matrix for θ, Iobs( ˆθ), is founded by the summation of the cross products of the subject-level score function. The inference about θ is based on the asymptotic distribution of ˆθ, approximated by N(θ, Iobs( ˆθ)−1). 2.2.2 Prediction of future transition probabilities Using all the children’s caries life-history data, the transition probability of the j-th tooth in the i-th child from the last observed state is expressed by, p K( j)iMi q(ViMi , t, Xi j, ui, θ), 12 (2.5) and q ≥ K( j)iMi . Given the data, θ and ui are the only two unknown quantities in (2.5). where t > ViMi So a natural estimator of (2.5) can be obtained by substituting ˆθ for θ and ˆui for ui, where ˆui is the empirical Bayes estimator of ui under the zero-one loss obtained by maximizing f (Yi|ui, ˆθ)φ(ui). However, it is difficult to construct a prediction interval for (2.5) using a frequentist approach. We can also predict the transition probabilities using a Bayesian approach. Specifically, we assume θ has a flat prior distribution so that we can predict (2.5) by its posterior mean or median obtained from simulating (ui, θ) according to their joint posterior distribution. The credible interval for (2.5) can also be obtained from the simulation. We derive the posterior distribution of (ui, θ) as follows, f (ui, θ|Y) =Þ · · ·Þ f (u1, . . . , um, θ|Y)du1 · · · dui−1dui+1 · · · dum φ(u j)du1 · · · dui−1dui+1 · · · dum 1 ∝ = = ∝ f (Yi|ui, θ)φ(ui) f (Y) mj =1, j ,i mj =1Þ f (Y j|u j, θ)φ(u j)du j f (Y)Þ · · ·Þ f (Y|u1, . . . , um, θ) ∫ f (Yi|ui, θ)φ(ui)dui ×m ∫ f (Yi|ui, θ)φ(ui)dui × f (θ|Y), f (Yi|ui, θ)φ(ui) f (Yi|ui, θ)φ(ui) j =1∫ f (Y j|u j, θ)φ(u j)du j f (Y) (2.6) which implies f (ui|Y, θ) ∝ f (Yi|ui, θ)φ(ui). When m is large enough, we can approximate f (θ|Y) by the multivariate normal distribution, N( ˆθ, Iobs( ˆθ)−1). Given the observed data of i-th child along with the child’s random effect and the parameters, i.e, (Yi, ui, θ), f (Yi|ui, θ) can be computed using the transition probabilities (2.4a - 2.4f). Therefore, we can generate a posterior realization, (u∗i , θ∗), from f (ui, θ|Y) in the following two steps: (a) generate θ∗ from the multivariate normal distribution N( ˆθ, Iobs( ˆθ)−1) (b) generate u∗i from f (Yi|ui, θ∗)φ(ui) using the Metropolis-Hastings algorithm with the burn- in period 5000 and the proposal distribution being a normal distribution with appropriate variance allowing the acceptance ratio to be between 20% and 40% in general. 13 p From a large sample of (ui, θ)’s simulated from f (ui, θ|Y), we can get a large sample of , t, Xi j, ui, θ)’s. Thus, the posterior mean or median of (2.5) can be approximated by K( j)iMi the sample mean or median, and a 95% credible interval for (2.5) can be approximated by the q(ViMi interval between the 2.5% and 97.5% sample quantiles of the p q(ViMi , t, Xi j, ui, θ)’s. K( j)iMi 2.3 Simulation study 2.3.1 Simulation setting We performed a Monte Carlo simulation study to investigate the finite-sample performance of the likelihood-based inferential approaches for θ. Two different sets of sample sizes were con- sidered: 1) m = 200 and n = 4, and 2) m = 400 and n = 4. With the assumption of the Weibull intensities, the vector of Weibull and variance parameters in (2.1) was chosen to be (r01, k01, r12, k12, σ12, σ12) = (1.2, 0.1, 0.9, 0.2, 1, 1.2). After taking the logarithm of these values, we get (log r01, log k01, log r12, log k12, log σ01, log σ12) = (0.18,−2.30, −0.11,−1.61, 0, 0.18). As covariates, three dummy variables X1, X2 and X3 indicating the four tooth locations and a stan- dard normal random variable X4 were considered. For the dummy variables, the location of the first tooth was set as the reference. The continuous covariate was chosen to be a subject-specific variable which means all teeth within a subject have the same value of X4. The coefficients of , β(4)01 ) = (0.2, −0.1, 0.1, 0.3) for the 0-to-1 (X1, X2, X3, X4) were chosen to be β01 ≡ (β(1)01 transition and β12 ≡ (β(1)12 , β(4)12 ) = (0.5, −0.2, 0.3, 0.2) for the 1-to-2 transition. The , β(2)01 , β(3)01 , β(2)12 , β(3)12 exact transition times from one caries state to the next were generated by the inverse cumulative distribution function method. Three visit times, V1, V2 and V3, were generated for each subject in this manner: V1 ∼ U(1.6, 2.4), V2 = V1 + U(0.8, 1.2), and V3 = V2 + 5. The final data set in each iteration of simulation consist of visit times, caries state at each visit time, and covariates for the four teeth of every subject. The number of iterations was one thousand in the Monte Carlo simulation. 14 Table 2.1: Simulation results for estimating the Weibull and variance parameters log r01 log k01 log r12 log k12 log σ01 log σ12 True value 0.182 -2.303 -0.105 -1.609 0 0.182 m=200, n = 4 Estimate† Coverage‡ MSE Bias Empirical SE Mean of SEs∗ m=400, n = 4 Estimate Coverage MSE bias Empirical SE of Mean of SEs 0.188 0.968 0.002 0.006 0.049 0.054 0.183 0.960 0.001 0.000 0.035 0.038 -2.317 0.957 0.025 0.015 0.157 0.164 -2.302 0.946 0.013 0.000 0.115 0.113 -0.083 0.958 0.019 0.022 0.135 0.150 -0.093 0.962 0.009 0.012 0.092 0.103 -1.674 0.968 0.139 0.065 0.367 0.402 -1.636 0.961 0.065 0.027 0.254 0.275 -0.011 0.944 0.008 0.011 0.087 0.087 -0.017 0.939 0.004 0.017 0.062 0.061 0.164 0.947 0.017 0.019 0.129 0.128 0.157 0.951 0.008 0.026 0.087 0.088 † The average of the estimates across the 1000 simulations. ‡ The empirical coverage of the 95% confidence interval based on the asymptotic distribution of ˆθ. ∗ The average of the asymptotic standard errors across the 1000 simulations 2.3.2 Simulation results Table 2.1 and Table 2.2 show the simulation results of model estimation for the two different sample sizes. As seen in Table 2.1 and Table 2.2, the bias, standard error and mean squared error of every parameter’s estimator are small in both scenarios. The empirical coverage of every parameter’s asymptotic confidence interval is close to the nominal level. The empirical standard error of every estimator is also close to the asymptotic standard error. When the sample size doubles, both MSE and variance of every estimator reduce about by half as expected according to the asymptotic theory. In a word, the likelihood-based inference for θ performs very well when the number of subjects is equal to or greater than 200. 15 Table 2.2: Simulation results for estimating the regression parameters β β(1)01 0.2 β(2)01 -0.1 β(3)01 0.1 β(4)01 0.3 β(1)12 0.5 β(2)12 -0.2 β(3)12 0.3 β(4)12 0.2 True value m=200, n = 4 0.198 -0.098 0.100 0.305 0.510 -0.191 0.310 0.201 Estimate 0.965 0.938 0.953 Coverage 0.017 0.036 0.014 MSE 0.010 0.001 Bias 0.002 0.188 0.118 Empirical SE 0.131 Mean of SEs 0.135 0.199 0.112 0.954 0.928 0.972 0.018 0.008 0.035 0.000 0.005 0.010 0.134 0.091 0.186 0.136 0.084 0.198 0.954 0.040 0.009 0.200 0.204 0.963 0.017 0.002 0.132 0.138 m=400, n = 4 0.200 -0.103 0.099 0.307 0.497 -0.199 0.302 0.203 Estimate 0.947 0.956 0.921 Coverage 0.018 0.008 0.009 MSE 0.002 0.003 bias 0.000 0.135 0.090 Empirical SE 0.095 Mean of SEs 0.094 0.137 0.076 0.951 0.912 0.947 0.009 0.004 0.018 0.001 0.007 0.003 0.093 0.065 0.133 0.094 0.058 0.137 0.953 0.009 0.003 0.093 0.096 0.962 0.017 0.001 0.132 0.141 Figure 2.2: Molars in a child’s mouth clockwise starting from the upper-right teeth (numbers starting 5) from the dentist’s view. The child’s right side corresponds to the tooth chart’s left side. The numbers ending 4 and 5 correspond to first and second molars respectively 16 2.4 Application In this section, we apply the three-state Markov frailty model to the Analysis of Detriot Dental Health Project (DDHP) data and demonstrate how to predict the future tooth decay based on the given information on a child. The data on the first and second molars were analyzed, resulting in eight teeth per child (see Figure 2.2). We consider fifteen dummy covariates to indicate every possible combination of tooth type (first or second molar), tooth location (upper or lower, right or left) and gender of the child in the model. The upper right first molar for male is set as the reference category. The total number of parameters is 36 because there are additionally four Weibull parameters and two random-effect scale parameters and all the parameters are specific to the transition type (0-to-1 and 1-to-2). We first fit this saturated model, and the resulting estimates and standard errors are shown in the Table 2.3. Based on the saturated model, we try to answer the following three scientific questions: i. Is there an interaction effect between gender and tooth identity (type and location) on tooth decay? ii. Do the first and second molars have the same tooth decay profile? iii. Is there a upper-lower symmetry or right-left symmetry in tooth decay profile? Every question above can be answered by testing a hypothesis of the form H0: Aθ = 0 formulated by choosing an appropriate matrix A with full row rank s. A test for it would be the Wald test: (A ˆθ)T(AI−1 obs( ˆθ)AT)−1 A ˆθ H0∼ χ2 s . (2.7) To test the interaction between gender and tooth identity, the hypothesis is formulated as H0:  αi,01 = β(i+1),01 − β1,01 αi,12 = β(i+1),12 − β1,12 17 , i = 1, . . . , 7. (2.8) Table 2.3: The DDHP data analysis using the univariate random effect model Parameter Estimate SE p-value α1,01 α1,12 α2,01 α2,12 α3,01 α3,12 α4,01 α4,12 α5,01 α5,12 α6,01 α6,12 α7,01 α7,12 β1,01 β1,12 β2,01 β2,12 β3,01 β3,12 β4,01 β4,12 β5,01 β5,12 β6,01 β6,12 β7,01 β7,12 β8,01 β8,12 -0.0087 0.1088 0.3464 0.2495 0.3450 0.2214 1.4359 -0.1986 1.5080 -0.2002 1.3746 -0.1279 1.4169 -0.1234 0.2178 0.0110 0.3071 0.0769 0.2723 0.3337 0.3277 0.3509 1.6215 -0.0885 1.6814 0.0694 1.8786 0.0509 1.9004 -0.0679 0.1219 0.1842 0.1106 0.1768 0.1028 0.1657 0.9431 0.5546 0.0017 0.1580 0.0008 0.1816 0.2453 0.1051 < 0.0001 0.1709 0.1071 < 0.0001 0.1537 0.1006 < 0.0001 0.1585 0.1057 < 0.0001 0.1543 0.1925 0.4194 0.4237 0.1198 0.1621 0.1118 0.1565 0.1126 0.1518 0.1141 0.1608 0.0691 0.9461 0.0060 0.6230 0.0156 0.0279 0.0041 0.0291 0.5590 0.1088 < 0.0001 0.1515 0.1153 < 0.0001 0.1503 0.1192 < 0.0001 0.1531 0.1114 < 0.0001 0.1518 0.6546 0.6440 0.7395 Gender Molar Type Quadrant Upper left teeth (64) 1st molar Lower left teeth (74) Male 2nd molar Lower right teeth (84) Upper right teeth (55) Upper left teeth (65) Lower left teeth (75) Lower right teeth (85) Upper right teeth (54) 1st molar Upper left teeth (64) Lower left teeth (74) Female 2nd molar Lower right teeth (84) Upper right teeth (55) Upper left teeth (65) Lower left teeth (75) Lower right teeth (85) Transition 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 18 According to the data analysis, this null hypothsis was rejected at a 5% significant level (p- value< 0.0001). It indicates that there is a statistically significant interaction effect between gender and tooth identity on tooth decay, so questions (ii) and (iii) would be more meaningful if asked for each gender. The subsequent null hypotheses of question (ii) are formulated as H0: for male and αj,01 = α( j +4),01 α4,01 = 0 αj,12 = α( j +4),12 α4,12 = 0  βk,01 = β(k+4),01 βk,12 = β(k+4),12 , j = 1, 2, 3, (2.9) , k = 1, 2, 3, 4, (2.10) H0:  for female. The p-values for testing these two hypotheses are both less than 0.0001. Thus the first and second molars have different tooth decay profiles. Question (iii) asks whether the dental caries progresses in a same pattern between lower (or right) and upper (or left) teeth. The null hypotheses for the upper-lower symmetry are formulated as (2.11) α1,01 = α2,01 α1,12 = α2,12 α3,01 = α3,12 = 0 α4,01 = α7,01 α4,12 = α7,12 α5,01 = α6,01 α5,12 = α6,12 H0:  19 for male and H0: β1,01 = β4,01 β1,12 = β4,12 β2,01 = β3,01 β2,12 = β3,12 β5,01 = β8,01 β5,12 = β8,12 β6,01 = β7,01 β6,12 = β7,12 (2.12)  for female, and the null hypotheses for the right-left symmetry are formulated as for male and αi,01 = αi+1,01 αi,12 = αi+1,12 , i = 2, 4, 6, (2.13) α1,01 = α1,12 = 0 βk,01 = βk+1,01 βk,12 = βk+1,12 , k = 1, 3, 5, 7, (2.14) H0:  H0:  for female. The test statistics for the hypotheses (2.11), (2.12), (2.13) and (2.14) are 35.412 (p-value < 0.0001), 29.658 (p-value = 0.0002), 0.999 (p-value = 0.8415) and 2.916 (p-value = 0.9395) respectively. Therefore, right-left symmetry exists in the tooth decay profile for each gender. The hypotheses above can be expressed in simpler forms if we use more subscripts (see Section 3.4 of Chapter 3), although the current expression of the hypotheses is more intuitive. We further performed the tests for questions (ii) and (iii) only to the transition from 0 to 1 or from 1 to 2. Table 2.4 shows the Wald test statistics and their p-values of these tests for symmetry. One can see that there is a symmetry in 1-to-2 transition intensity between the upper and lower molars in boys. Since hypotheses (2.13) and (2.14) were not rejected, it is no wonder that none of the tests for right-left symmetry in Table 2.4 shows any significant result. 20 Table 2.4: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity Gender Test for symmetry Upper vs Lower Right vs Left Transition 0 to 1 28.95(< 0.0001) 0.52(0.9716) 1 to 2 4.77(0.3115) 0.37(0.9849) Male Female First molar vs Second molar 1013.65(< 0.0001) 28.65(< 0.0001) Upper vs Lower Right vs Left 16.58(0.0023) 1.08(0.8967) First molar vs Second molar 1742.93(< 0.0001) 12.42(0.0145) 1.83(0.7666) 16.45(0.0025) To illustrate predicting the future transition probability, we chose a boy from the dataset who had only two visits (2.14 yrs and 4.23 yrs) with the following caries states: the upper right first molar and all the lower teeth in study remained sound over the two study visits; the rest of the considered teeth were sound at the first visit but were found to have early caries at the second. No tooth had advanced caries. The empirical Bayes estimator of ui is 0.13 for the child. The posterior median and the 95 % credible interval of this boy’s transition probability after his last visit were computed over the time interval (4.23 yrs, 8 yrs). A sample of 200 transition probability functions were computed based on 200 (ui, θ)’s generated from their joint posterior distribution. The top plot of Figure 2.3 shows the probabilities of the 0-to-1 and 0-to-2 transitions for his upper right first molar, and the bottom plot shows the 1-to-2 transition probability for his upper right second molar. The solid line is the posterior median of the transition probability. The credible interval refers to the interval between the 2.5% and 97.5% sample quantiles of the transition probability. One can see that at age six, this boy’s upper right first molar only has probability 0.03 of having advanced caries, but his upper right second molar has probability 0.23 to be in advanced caries state. 2.5 Discussion We proposed a parametric three-state Markov frailty model for tooth-level caries life course data under interval censoring. The likelihood-based inference for the model was shown to perform very well in finite samples by simulation. Furthermore, we came up with a Bayesian approach to predict future transition probabilities given a child’s observed caries life-history data. It is worthy to point 21 A) B) y t i l i b a b o r p n o i t i s n a r t e r u t u F 8 . 0 6 . 0 4 . 0 2 . 0 0 y t i l i b a b o r p n o i t i s n a r t e r u t u F 8 . 0 6 . 0 4 . 0 2 . 0 0 4.23 5 6 7 8 4.23 5 6 7 8 Age (years) Age (years) Posterior mode (0−to−1 transition) 95% prediction interval (0−to−1 transition) Posterior mode (0−to−2 transition) 95% prediction interval (0−to−2 transition) Posterior mode (1−to−2 transition) 95% prediction interval (1−to−2 transition) Figure 2.3: The predicted transition probabilities from state 0 to state 1 and to state 2 for the upper right first molar (A) and from state 1 to state 2 for the upper right second molar (B) for a boy in the DDHP dataset out that the prediction approach can predict future caries development not only for individual teeth but also at the mouth level. For instance, using the Bayesian approach, one can obtain a predicted value and a prediction interval of the probability that no tooth in the mouth will develop new caries within one year after the most recent dental visit. Information gained from applying the proposed methods to real data will provide guidance on the timing of dental screening and treatment. Our statistical methods also have general applications to modeling chronic diseases involving paired or multiple organs in cohort studies where the disease status is measured intermittently. Besides dental caries, popular examples include sacroiliac joint damage in psoriatic arthritis (Rahman et al., 1998) and diabetic retinopathy (Diabetes Control and Complications Trial Research Group, 1995). Several future research directions are worth to pursue. Firstly, it would be appealing to model the baseline transition intensity nonparametrically, which will be discussed in Chapter 3. To achieve 22 this, we will use a linear spline model for the log baseline intensity since it does not induce additional numerical integrations in the likelihood calculation. Secondly, the intra-oral and the inter-transition correlations of caries life course data can be accounted for in the analysis in a different way. For instance, we can replace the shared random effect in our model by two transition-specific random effects with a bivariate normal distribution. GEE-type of analysis is also feasible. Lastly, one may relax the proportionality assumption on intensities between different teeth by forming strata (e.g., dental quadrants) and using stratum-specific baseline intensities. 23 CHAPTER 3 SEMIPARAMETRIC ANALYSIS OF CORRELATED AND INTERVAL CENSORED EVENT-HISTORY DATA 3.1 Introduction As discussed in Chapter 1, in many instances, the data generated from longitudinal investiga- tions are interval censored, in that the transition time point is only known up to a time interval. Additionally, many chronic conditions, despite being inherently continuous, are often represented by a finite number of states derived from clinical evaluations, serological tests and other medical assessment tools. Furthermore, many chronic conditions manifest themselves at various locations of the body, resulting in correlated time-to-event endpoints even after conditioning on observed covariates. In caries research, carious lesions are likely to develop on multiple teeth of a person, owing to the intra-oral association resulting from sharing common risk factors, including genetic variants and environmental factors which may not be fully measured. In this chapter, we are interested in formulating a flexible multistate model to describe a set of multiple life-history processes observed only at intermittent time points. A data example motivating the proposed model comes from the Detroit Dental Health Project (DDHP) involving tooth-level caries assessments in primary dentition for young inner-city African-American children, which has been introduced in detail in Chapter 1. In this era of precision medicine, understanding the transition process of a tooth across the caries spectrum could aid dentists design tooth-specific prevention and treatment plans. Existing works in caries research have primarily focused on mouth-level indices such as the number of Decayed, Missing or Filled teeth or surfaces, a typical measure of caries experience in human populations (Lewsey & Thomson, 2004). Albeit important, these indices have well documented limitations regarding their inability to provide a comprehensive assessment of dental caries (Lewsey & Thomson, 2004), which is critical for personalizing preventive oral health care and treatment plans. 24 Analysis of correlated interval censored event-history data is nontrivial, owing to the difficulty of the theoretical argument and the computational burden of the numerical implementation. For example, the use of counting process techniques is very difficult in the framework of interval censored data, hence prohibiting the use of martingale theory. Consequently, unlike right-censored data, one has to rely on complicated empirical process arguments to investigate the asymptotic properties of the estimation procedures (Zhang & Sun, 2010). It is also well known that when multiple life-history assessments are made on the same sampling unit, any analysis involving the within-unit correlation of the data, viewed as a quantity of scientific interest or a nuisance, necessitates the use of methods for correlated survival data (Kalbfleisch & Prentice, 2002, Chapter 10). To the best of our knowledge, only few statistical papers (Sutradhar & Cook, 2008; Joly et al., 2012) have studied analysis of correlated interval censored event-history data without imposing the common modeling restriction that the life-history process is time-homogeneous, i.e. having a constant transition intensity matrix across time. Assuming a restrictive model structure is apt to yield biased analysis results due to model misspecification. The existing works and the methodological approach of Chapter 2 relied on likelihood-based inference by means of frailty models for intensity functions to account for the within-sampling-unit clustering. Albeit important, these works still have notable limitations. For example, the works of Sutradhar & Cook (2008) and Joly et al. (2012) are not amenable to analysis involving cluster-level or observation-unit level covariates. As well as the approach in Chapter 2, the works of Joly et al. (2012) also assumed that the baseline intensity functions are of Weibull type, which is apt to yield biased inferences for the covariate effects on the transition intensities. Sutradhar and Cook (2008) attempted to relax this modeling assumption by allowing the baseline intensities to be piecewise constant. A limitation, however, is that the selection of the number and locations of knots was ad hoc ignoring that a poorly chosen number of knots may introduce bias. Finally, existing works such as Sutradhar & Cook (2008) and Joly et al. (2012) often lacked numerical simulations to support the heuristic arguments being proposed. To circumvent some of the limitations of existing works, in this chapter, we propose a semipara- metric Markov model with frailties to analyze clustered interval censored tooth-level event-history 25 data on early childhood caries. Our model is similar in spirit to the parametric model introduced in Chapter 2, which uses frailties to tie together all life-history data from the same sampling unit. But unlike Chapter 2 which considers Weibull-type baseline intensities and shared frailties for various types of transitions, the formulation in this chapter relaxes these assumptions by using unshared frailties and approximating the unknown baseline intensities by spline functions. The estimation of the model parameters including fixed effects coefficients, spline parameters, and random effects’ variance components then proceeds by maximizing a penalized log-likelihood which constrains the spline coefficients to avoid the overfitting problem. As discussed by Brumback & Rice (1998) and Cai & Betensky (2003), penalizing the spline coefficients is equivalent to treating these coefficients as random effects, and penalty parameters can be estimated as reciprocal variance components of the random effects using a restricted maximum likelihood approach. Thus, this mixed-model representation avoids the use of computationally intensive techniques such as cross-validation to select penalty parameters. More importantly, it enables the fitting algorithm and inferential ap- proaches to be devised within the framework of hierarchical likelihood (Lee & Nelder, 1996). As a by-product of the analysis, we propose a Bayesian approach for predicting the future transition probabilities of tooth-level caries in the spirit of Section 2.2.2. Due to the inherent heterogeneity in transition rates among different individuals not captured by measured covariates, these future transition probabilities are predicted using the subject’s historical data via the frailties estimates. We conduct a simulation study to evaluate the performance of the Bayesian prediction procedure. The rest of the chapter is organized as follows. The proposed semiparametric Markov frailty model is outlined in Section 3.2, followed by a detailed description of the model estimation and associated inference, and a Bayesian approach for predicting future transition probabilities. In Section 3.3, simulation studies are conducted to evaluate the accuracy of the estimators as well as the coverage rates of the confidence intervals for the regression parameters and baseline intensities. We also investigate the coverage of the prediction intervals for future transition probabilities through simulations. The application of the proposed methods to the DDHP data is presented in Section 3.4. We give some concluding remarks in Section 3.5. 26 3.2 Statistical methodologies We define Ai j(t) as the caries state of the j-th tooth for the i-th subject (i = 1, . . . , m; j = 1, . . . , n) at time t (t ≥ 0), observed only at the time points Vi. The intra-oral correlation among the Ai j(t)’s ( j = 1, . . . , n) is modeled through a bivariate frailty wi = (wi,01, wi,12)T on transition intensities, coupled with the conditional independence assumption. To highlight the dependence on covariates and the frailty, we denote by hab(t|Xi j, wi) the transition intensity from a to b, defined as follows, hab(t|Xi j, wi) = lim ∆t→0 = lim ∆t→0 where (a, b) ∈ {(0, 1), (1, 2)}. Pr(Ai j(t− + ∆t) = b| Ai j(u), 0 ≤ u < t, Ai j(t−) = a, Xi j, wi) Pr(Ai j(t− + ∆t) = b| Ai j(t−) = a, Xi j, wi), The following multiplicative intensity model for hab(t|Xi j, wi) is assumed hab(t|Xi j, wi) = h(0)ab(t)wi,ab exp(XT i j βab), (a, b) ∈ {(0, 1), (1, 2)}, where h(0)ab(t)’s are unspecified baseline transition intensities and log(wi) = (log(wi,01), log(wi,12))T is assumed to follow a bivariate zero-mean normal distribution, that is, log(wi) ∼ N©(cid:173)(cid:173)« ©(cid:173)(cid:173)« 0 0ª®®¬ ,©(cid:173)(cid:173)« σ2 01 σ01σ12 ρ σ01σ12 ρ σ2 12 . ª®®¬ ª®®¬ Here σ2 ab ((a, b) ∈ {(0, 1), (1, 2)}) reflects not only the strength of the intra-oral association of caries progressions among different teeth, but also the magnitude of heterogeneity in caries transition intensities between subjects that is not captured by observed covariates. The parameter ρ reflects the within-subject correlation between the two types of transitions, with positive ρ’s implying that the two intensities evolve similarly (for example, if the transition from 0 to 1 is fast and so is 1 to 2). The assumed bivariate frailty provides a more flexible representation of the inter-transition dependence compared to the shared frailty model in chapter 2, which constrains ρ the correlation between the log frailties of the two transitions to one. 27 For computational simplicity, we reparameterize log(wi) as log(wi) = (σ01zi1, σ12(ρzi1 + p1 − ρ2zi2))T where zi1 and zi2 are independent standard normal random variables. Under this for- mulation, hab(t|Xi j, wi) are rewritten as hab(t|Xi j, zi) ((a, b) ∈ {(0, 1), (1, 2)}) with zi ≡ (zi1, zi2)T . 3.2.1 A penalized likelihood estimation As described in Chapter 1, the transition times between caries states in the Detroit study are only known up to an interval between two consecutive study visits. This naturally complicates the estimation of the multistate model for tooth decay. Without interval censoring, the regression parameters and baseline intensities for different transitions can be estimated separately based on the transition-specific partial likelihood and Breslow’s estimator (Kalbfleisch & Prentice, 2002, Section 8.3), even after conditioning on random effects; while under interval censoring, they have to estimated jointly as the partial likelihood approach is not applicable. Consequently, the model estimation would require a full-likelihood-based method, which is developed in this subsection. Unlike the model in Chapter 2 which imposes a parametric form on h(0)01 (·) and h(0)12 (·), we estimate these functions nonparametrically via penalized spline smoothing. Specifically, we ap- proximate the logarithms of baseline intensity functions by linear splines, log h(0)ab(t) = α0,ab + b0,abt + Nabj =1 b j,ab(t − κ j,ab)+, (a, b) ∈ {(0, 1), (1, 2)} where (x)+ = max(0, x), and κ j,01’s ( j = 1, . . . , N01) and κ j,12’s ( j = 1, . . . , N12) are the locations of knots. The roughness of the spline functions, defined later, is penalized in the model estimation. We choose linear spline basis because it leads to closed-form expressions of cumulative baseline intensities involved in the likelihood function. The numbers of knots for these spline models increase with the sample size. In practice, a large number of knots can be used to reduce the approximation bias while not incurring the overfitting problem due to the penalization, but small N01 and N12 are preferred computationally. According to Kauermann et al. (2009), a penalized spline with dimension proportional to a small fractional power of the sample size can achieve the optimal rate of convergence (Stone, 1982). This guides our selection on the number of knots discussed later. The 28 knots for the a-to-b transition, (κ1,ab, · · · , κNab,ab), are equally spaced with respect to the quantiles of the unique values of {Vi,l−1, Vil : an a-to-b transition occurred to a tooth in (Vi,l−1, Vil], i = , i = 1, . . . , m}, 1, . . . , m, l = 1, . . . , Mi} ∪ {VMi where Vi0 = 0. : no a-to-b transition occurred to a tooth by VMi Define σ ≡ (σ01, σ12, ρ)T . Denote the vector of all model parameters except the variance components by ξ ≡ (βT 12)T where b01 ≡ (b0,01, b1,01, · · · , bN01,01)T and b12 ≡ (b0,12, b1,12, · · · , bN12,12)T . Under the assumption that the tooth decay processes are , α0,01, α0,12, bT 01 , βT 12 , bT 01 independent across subjects, the log-likelihood of the observed data Y is l(Y|ξ, σ) = mi=1 −∞Þ ∞ logÞ ∞ −∞ nj =1 f (Yi j|zi, ξ, σ)φ(zi1)φ(zi2)dzi1dzi2, where φ(·) is the standard normal density. Under the conditional Markov assumption on the tooth decay process, f (Yi j|zi, ξ, σ) ∝ Miq=1 Pr(cid:16)Ai j(Viq) = K( j)iq | Ai j(Vi,q−1) = K( j)i,q−1 , Xi j, zi, ξ, σ(cid:17) , (3.1) where K( j)i0 = 0 (Ai j(0) = 0 according to the caries classification scheme in Chapter 1). To simplify the notations, we write the transition probabilities on the right-hand side of (3.1) as ,K( j)iq (Vi,q−1, Viq, Xi j, zi, ξ, σ), which can be expressed in terms of the transition intensities p K( j)i,q−1 hab(t|Xi j, zi) ((a, b) ∈ {(0, 1), (1, 2)}) as follows, p00(Vi,q−1, Viq, Xi j, zi, ξ, σ) = S01(Vi,q−1, Viq|Xi j, zi) p01(Vi,q−1, Viq, Xi j, zi, ξ, σ) =Þ Viq p02(Vi,q−1, Viq, Xi j, zi, ξ, σ) =Þ Viq Vi,q−1 Vi,q−1 S01(Vi,q−1, r|Xi j, zi)h01(r|Xi j, zi)S12(r, Viq|Xi j, zi)dr S01(Vi,q−1, r|Xi j, zi)h01(r|Xi j, zi){1 − S12(r, Viq|Xi j, zi)}dr p11(Vi,q−1, Viq, Xi j, zi, ξ, σ) = S12(Vi,q−1, Viq|Xi j, zi) p12(Vi,q−1, Viq, Xi j, zi, ξ, σ) = 1 − S12(Vi,q−1, Viq|Xi j, zi) p22(Vi,q−1, Viq, Xi j, zi, ξ, σ) = 1 29 hab(t|Xi j, zi)dt}. The integrals involved in p01 and p02 do not have a closed form and thus are numerically approximated using the Gauss-Jacobi quadrature. In x where Sab(x, y|Xi j, zi) ≡ exp{−∫ y max(cid:18)5,(cid:24)10 × our implementation, the number of quadrature points is, max{Vik − Vi,k−1 : all i and k}(cid:25)(cid:19) . Viq − Vi,q−1 We estimate (ξ, σ) by maximizing the following penalized log-likelihood, lpen(Y|ξ, σ) = l(Y|ξ, σ) − τ01 2 bT 01b01 − τ12 2 bT 12b12, (3.2) where τ01 and τ12 are two penalty parameters that regularize the roughness of the two estimated baseline intensity functions. Our roughness definition is similar to∫ (∂ log h(0)ab(t)/∂t)2dt but slightly different from that in Equation (2.4) of Kauermann et al. (2009) , which does not include . We tried the penalty excluding b2 b2 0,ab than including b2 Kauermann et al. (2009) to set the two numbers of knots to N01 = N12 = max(⌈(mn)1/5⌉, 5). . Albeit using a slightly different penalty term, we follow Assumption 4 of in simulations; the results (not reported here) are worse 0,ab 0,ab The values of the penalty parameters are generally determined by cross-validation techniques. However, these procedures are known to be computationally intensive, especially for penalized log-likelihoods involving intractable integrals. An alternative approach is to treat the penalized likelihood as that of a mixed-effects model and then estimate the penalty parameters as reciprocal variance components of the random effects as, e.g., in Cai & Betensky (2003). More specifically, we treat the spline parameter vectors b01 and b12 as random effects that follow the multivariate normal distributions below: b01 ∼ N(cid:16)0, θ2 01IN01+1(cid:17) and b12 ∼ N(cid:16)0, θ2 12IN12+1(cid:17) , = 1/τ01, θ2 where θ2 01 sulting mixed-effects model has fixed effect parameters ζ ≡ (βT parameters γ ≡ (bT ψ(ρ), log θ01, log θ12)T . Here, ψ(·) denotes the Fisher’s z-transformation function. = 1/τ12, and Ik denotes a k × k identity matrix (k ∈ Z+). The re- , α01, α12)T , random effect m)T , and variance component parameters ν ≡ (log σ01, log σ12, , · · · , zT , βT 12 , bT 12 , zT 1 01 12 01 30 Ideally, we would like to estimate ζ and ν by maximizing the log-likelihood of the observed data Y (marginal likelihood) based on the above mixed-effects model, lmixed(Y|ζ, ν) = logÞ Þ Þ f (Y|z, ξ, σ) f (z) f (b01) f (b12)dzdb01db12 = logÞ elh(Y,γ|ζ,ν)dγ − (N01 + 1) log θ01 − (N12 + 1) log θ12, mi=1 nj =1 log(cid:8) f (Yi j|zi, ξ, σ)(cid:9) − 2  − bT 01b01 − , · · · , zT m)T and 1 2θ2 12 bT 12b12. zT i zi 1 2θ2 01 where z = (zT 1 lh(Y, γ|ζ, ν) = (3.3) (3.4) However, this marginal maximum likelihood estimation for ζ and ν is numerically rather complex, due to the marginal likelihood involving a multiple integral that does not have a closed form. To circumvent this difficulty, other methods which are computationally feasible have been proposed for estimating parameters of mixed-effects models; see, e.g., Pinheiro & Bates (1995) for a summary. Here we adopt Lee & Nelder (1996)’s hierarchical likelihood approach to estimate ζ , γ and ν. This approach estimates ν by maximizing the following restricted likelihood with respect to ν, lr(Y|ν) = logÞ elh(Y,γ|ζ,ν)dγdζ − (N01 + 1) log θ01 − (N12 + 1) log θ12. (3.5) This restricted likelihood is constructed following Harville (1974)’s Bayesian interpretation of the restricted maximum likelihood estimator (REML) for variance components in linear mixed- effects models. Specifically, assuming that ζ and ν have a prior density that is flat relative to the likelihood function lmixed(Y|ζ, ν), lr(Y|ν) is then the logarithm of the marginal posterior density for ν. Thus, the estimator for ν, denoted by ˆν, obtained via maximizing lr(Y|ν) is ν’s marginal posterior mode, which is the same as the Bayesian interpretation of REML for linear mixed-effects models (Harville, 1974). Given ˆν, the hierarchical likelihood approach estimates η ≡ (ζ T, γT)T via maximizing lh(Y, γ|ζ, ˆν), called h-likelihood by Lee & Nelder (1996), with respect to η. In practice, due to the lack of a closed form of the multiple integral in (3.5), we evaluate the restricted likelihood using the Laplace’s approximation. This leads to the adjusted profile h-likelihood (APHL) (Lee & Nelder, 1996), lA(Y|ν) = lh(Y, ˆγ(ν)| ˆζ(ν), ν)− 1 2 log |−Hh( ˆη(ν); Y, ν)|−(N01 +1) log θ01−(N12 +1) log θ12, (3.6) 31 where ˆη(ν) ≡ ( ˆγ(ν), ˆζ(ν)) maximizes lh(Y, γ|ζ, ν) given ν, Hh( ˆη(ν); Y, ν) is the Hessian matrix of lh(Y, γ|ζ, ν) with respect to η evaluated at ˆη(ν), and | · | denotes the matrix determinant. Because there is no analytical expression for ˆη(ν), we compute the maximum adjusted profile h-likelihood estimate (MAPHLE) ˆν for ν and the maximum h-likelihood estimate (MHLE) ˆη for η by alternatively maximizing lA(Y|ν) and lh(Y, γ|ζ, ν) as in the following steps: (i) Set s = 0, choose an initial value ν(0) for ν and maximize lh(Y, γ|ζ, ν(0)) w.r.t. η to obtain η(0); (ii) Maximize the following function w.r.t. ν to obtain ν(s+1): lh(Y, γ(s)|ζ(s), ν) − 2−1 log | − Hh(η(s); Y, ν)| − (N01 + 1) log θ01 − (N12 + 1) log θ12; (iii) Maximize lh(Y, γ|ζ, ν(s+1)) w.r.t. η to obtain η(s+1) and set s = s + 1; (iv) If k(η(s)T )T k/k(η(s−1)T , ν(s−1)T )T k < 0.01, stop and set ˆη = η(s) , ν(s)T )T −(η(s−1)T , ν(s−1)T and ˆν = ν(s); otherwise, go to (ii). Here, k · k denotes the Euclidean norm. The simulation study and data analysis in this chapter were performed using the R statistical software (R 3.1.1). We used the optim function with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton option to maximize lh(Y, γ|ζ, ν(s+1)) w.r.t. η and lh(Y, γ(s)|ζ(s), ν) − 2−1 log | − Hh(η(s); Y, ν)| − (N01 + 1) log θ01 −(N12 + 1) log θ12 w.r.t. ν. In particular, the limited- memory BFGS method was used to maximize lh(Y, γ|ζ, ν(s+1)), since it is more suited for the maximization of a function w.r.t. a large number of arguments. We computed the gradient and the hessian matrix of h(Y, γ|ζ, ν) w.r.t η using their analytic expressions. expressions, we sometimes encountered the situation where the denominator of a fraction was so In evaluating those close to zero that the software recognized it as zero. When that happened, the denominator was replaced with 10−40. 32 3.2.2 Statistical inference According to Lee & Nelder (1996), the MHLE ˆζ is asymptotically equivalent to the marginal MLE for ζ obtained by maximizing (3.3), and both are asymptotically normal with a mean ζ and a covariance matrix that can be consistently estimated by the upper-left (2p + 2)×(2p + 2) sub-matrix of −H−1 h ( ˆη; Y, ˆν) where 2p + 2 is the length of the vector ζ . Hence, hypothesis tests and confidence intervals for ζ can be constructed based on this asymptotic normal distribution. In view of the random effect treatment of the penalized spline coefficients, we use a Bayesian approach to conduct inferences on h(0)01 (·) and h(0)12 (·). Specifically, we treat ζ and ν as random quantities with a flat joint prior density, whence exp{lh(Y, γ|ζ, ν)} is proportional to the joint posterior density of ξ and z, f (ξ, z|Y, ν), where z = (zT m)T . A Taylor expansion similar to (5.3) in O’Sullivan (1988) can be used to show that f (ξ|Y, ˆν) is approximately a multivariate normal distribution when m is large. Similar arguments to Appendices B, C and G in Lee , . . . , zT 1 & Nelder (1996) can be used to show that the normal approximation of f (ξ|Y, ˆν) has a mean approximately equal to the MHLE ˆξ and a covariance matrix approximately equal to the upper-left (2p + 4 + N01 + N12)×(2p + 4 + N01 + N12) block of −H−1 h ( ˆη; Y, ˆν), which enables one to construct pointwise Bayesian confidence intervals, a.k.a. credible intervals, for h(0)01 (t) and h(0)12 (t). The use of the conditional posterior f (ξ|Y, ˆν) is justified because ξ and ν are expected to be asymptotically orthogonal (Ha et al., 2016, Section 4.1). 3.2.3 Prediction of future transition probabilities Understanding the process by which a tooth transitions from being sound to being decayed is of great interest for primary and secondary prevention in dental care. An important step toward reaching this goal is to predict the tooth-level probabilities of future caries state transitions. The future transition probability is defined here as the conditional probability of a tooth being in a specific caries state at a future time point given the individual’s up-to-date caries life-history data. In terms of the previous notations, the future transition probability of the j-th tooth of the i-th 33 subject is where t > ViMi K( j)iMi p K( j)iMi q(ViMi , t, Xi j, zi, ξ, σ), (3.7) and q ≥ K( j)iMi . Note that ViMi is the last dental exam time for the i-th child and is the dental caries state of the j-th tooth of the i-th child at ViMi . A natural point predictor K( j)iMi q(ViMi for (3.7) is p , t, Xi j, ˆzi, ˆξ, ˆσ), which is the empirical Bayes estimator of (3.7) under the zero-one loss. It is, however, not straightforward to construct a 1 − α prediction interval for the future transition probability, so we resort to Bayesian methods again. Treating ζ and ν as random quantities with a flat joint prior density, we can obtain credible intervals for (3.7) as its prediction intervals through simulating (ξ, zi) from their joint posterior density f (ξ, zi|Y, ˆν). The use of this conditional posterior is justified because (3.7) depends on (zi, ξ, σ) through (ξ, wi), which is expected to be asymptotically orthogonal to ν (Ha et al., 2016, Section 4.1). A posterior realization of (ξ∗, z∗i ) is generated as follows: (a) Generate ξ∗ from f (ξ|Y, ˆν) ≈ N(A ˆη( ˆν), −AH−1 h ( ˆη( ˆν); Y, ˆν)AT), where A = (I2p+4+N01+N12 N12) × 2m zero matrix. , 0(2p+4+N01+N12)×2m) and 0(2p+4+N01+N12)×2m is a (2p + 4 + N01 + (b) Generate z∗i from f (zi|Y, ξ∗, ˆν) ∝ f (Yi|zi, ξ∗, ˆν)φ(zi1)φ(zi2) using the Metropolis-Hastings algorithm which employs a standard bivariate normal distribution as the proposal distribution and has a burn-in period of 5000. p q(ViMi , t, Xi j, z∗i We repeat steps (a) and (b) until we obtain enough pairs of (ξ∗, z∗i ), each of which results in a , ξ∗, ˆσ), a realization of (3.7) from its posterior distribution. The credible K( j)iMi interval formed by the α/2-th and (1 − α/2)-th sample quantiles of the posterior sample of (3.7) is then used as a 1 − α prediction interval for (3.7). The posterior sample mean and median are two alternative point predictors for (3.7) to the empirical Bayes estimator. 34 3.3 Simulation study 3.3.1 Simulation setting A Monte Carlo simulation study was performed to evaluate the finite-sample performance of the estimation and inference methods for the regression parameters and the baseline intensity functions, and the prediction method for the future transition probability. The simulation study was conducted for two sample sizes m = 100 and m = 200, and for n = 8 tooth-level processes. Weibull intensities, h(0)01 (t) = 0.12t1.2−1 and h(0)12 (t) = 0.18t0.9−1 were used as the baseline intensities. Seven dummy variables X1 through X7 were generated to indicate the eight tooth sites, with the , β(2)01 first tooth serving as the reference. Additionally, one continuous tooth-specific covariate X8 was generated from a standard normal distribution. The coefficients of (X1, · · · , X8) were chosen to , β(8)01 ) = (0.2, −0.1, 0.1, 0.3, 0.2, −0.3, 0.1, 0.2) for the 0-to-1 be (β(1)01 , β(3)01 , β(8)12 ) = (0.5, −0.2, 0.3, 0.2, 0.2, −0.3, 0.2, −0.1) , β(6)12 transition and (β(1)12 for the 1-to-2 transition. We set σ01, σ12 and ρ as 1, 0.8 and 0.8, respectively. Three dental exam times were simulated as follows: V1 ∼ U(1.6, 2.4), V2 = V1 + U(1.6, 2.4), and V3 = V2 + U(3.2, 4.8). And finally, 1000 Monte Carlo samples were generated for each of the two sample sizes. , β(5)01 , β(3)12 , β(4)01 , β(2)12 , β(6)01 , β(4)12 , β(7)01 , β(5)12 , β(7)12 3.3.2 Simulation results Tables 3.1 and 3.2 show the simulation results for estimating β ≡ (βT 01 , βT 12)T . The mean estimates are close to the true parameter values. The empirical coverage probabilities of the 95% confidence intervals based on the asymptotic normal distribution of ˆβ are centered around the nominal level. The standard deviation of the estimates (empirical standard error) for each component in β is almost the same as the mean of the estimated asymptotic standard errors. The mean squared errors (MSEs) and the empirical variances decrease roughly by half as the number of subjects doubles, which is consistent with the asymptotic theory. Figure 3.1 shows that the average baseline intensity estimates are reasonably close to the true baseline intensities over the interval [1.6, 9.6], with the endpoints being the lower and the upper 35 Table 3.1: The simulation results for estimating β01 (0-to-1 transition) Truth β(1)01 0.2 β(2)01 -0.1 β(3)01 0.1 β(4)01 0.3 β(5)01 0.2 β(6)01 -0.3 β(7)01 0.1 β(8)01 0.2 m=100 Estimate Coverage MSE Bias Empirical SE Mean of SEs m=200 Estimate Coverage MSE Bias Empirical SE Mean of SEs 0.201 -0.106 0.096 0.301 0.193 -0.304 0.098 0.194 0.958 0.961 0.955 0.032 0.012 0.033 0.001 0.002 0.006 0.178 0.109 0.181 0.181 0.182 0.113 0.954 0.956 0.954 0.032 0.031 0.032 0.004 0.001 0.007 0.179 0.177 0.178 0.182 0.180 0.181 0.953 0.035 0.004 0.186 0.188 0.940 0.035 0.006 0.186 0.185 0.199 -0.100 0.105 0.303 0.200 -0.299 0.097 0.195 0.964 0.956 0.950 0.016 0.015 0.006 0.003 0.005 0.001 0.124 0.076 0.128 0.128 0.129 0.079 0.959 0.954 0.954 0.016 0.016 0.015 0.005 0.003 0.000 0.125 0.127 0.124 0.128 0.127 0.128 0.943 0.017 0.001 0.132 0.133 0.955 0.016 0.000 0.125 0.130 Table 3.2: The simulation results for estimating β12 (1-to-2 transition) Truth β(1)12 0.5 β(2)12 -0.2 β(3)12 0.3 β(4)12 0.2 β(5)12 0.2 β(6)12 -0.3 β(7)12 0.2 β(8)12 -0.1 m=100 Estimate Coverage MSE Bias Empirical SE Mean of SEs m=200 Estimate Coverage MSE Bias Empirical SE Mean of SEs 0.509 -0.191 0.308 0.222 0.205 -0.281 0.211 -0.104 0.966 0.942 0.012 0.053 0.000 0.009 0.108 0.230 0.242 0.105 0.938 0.960 0.958 0.060 0.054 0.056 0.008 0.022 0.005 0.245 0.231 0.237 0.245 0.241 0.244 0.947 0.060 0.011 0.245 0.246 0.943 0.074 0.019 0.272 0.269 0.961 0.062 0.009 0.249 0.259 0.484 -0.197 0.291 0.189 0.197 -0.298 0.188 -0.099 0.948 0.959 0.005 0.027 0.001 0.016 0.164 0.072 0.073 0.170 0.961 0.954 0.954 0.028 0.028 0.028 0.009 0.011 0.003 0.166 0.167 0.166 0.172 0.170 0.171 0.950 0.030 0.012 0.174 0.173 0.948 0.033 0.003 0.181 0.182 0.953 0.035 0.002 0.188 0.189 36 0−1 transition (m=100, n=8, MISE=0.0148) 0−1 transition (m=200, n=8, MISE=0.0080) True baseline intensity Average estimate Average 95% pointwise CI True baseline intensity Average estimate Average 95% pointwise CI y t i s n e t n i e n i l e s a B 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) 1−2 transition (m=100, n=8, MISE=0.0115) 1−2 transition (m=200, n=8, MISE=0.0073) True baseline intensity Average estimate Average 95% pointwise CI True baseline intensity Average estimate Average 95% pointwise CI y t i s n e t n i e n i l e s a B 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 y t i s n e t n i e n i l e s a B y t i s n e t n i e n i l e s a B 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) Figure 3.1: The average estimates of h(0)01 (t) and h(0)12 (t), their average 95% pointwise confidence intervals and their mean integrated squared errors (MISE) across the 1000 Monte Carlo samples bounds of V1 and V3 respectively. The mean integrated squared errors (MISE), E[∫ 9.6 h(0)ab(t)}2dt] (ab = 01, 12), decrease with the sample size m. 1.6 { ˆh(0)ab(t) − The average 95% pointwise confidence intervals for the baseline intensities are of proper widths and contain the true values at all the time points. The posterior standard deviations of the baseline intensities in the calculation of the Bayesian confidence intervals were obtained by the delta method, and they are treated as the theoretical standard errors of the baseline intensity estimators. The pointwise averages of these theoretical standard errors over the 1000 Monte Carlo 37 0−1 transition (m=100, n=8) 0−1 transition (m=200, n=8) Average of theoretical SE Empirical SE Average of theoretical SE Empirical SE 4 . 0 3 . 0 l e u a V 2 . 0 1 . 0 0 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) 1−2 transition (m=100, n=8) 1−2 transition (m=200, n=8) Average of theoretical SE Empirical SE Average of theoretical SE Empirical SE 4 . 0 3 . 0 l e u a V 2 . 0 1 . 0 0 . 0 4 . 0 3 . 0 l e u a V 2 . 0 1 . 0 0 . 0 4 . 0 3 . 0 l e u a V 2 . 0 1 . 0 0 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) Figure 3.2: The comparison between the empirical standard errors and the average theoretical standard errors of the baseline intensity estimators samples are shown to be close to the empirical standard errors of the baseline intensity estimators in Figure 3.2, except for ˆh(0)01 (t) near the upper boundary. In addition, both standard errors reduce as the sample size increases. Figure 3.3 displays the empirical coverage probabilities of the 95% pointwise confidence inter- vals for the baseline intensities. Most of the coverage probabilities are scattered around 95% except when the time point is near the upper boundary, which shows that the Bayesian confidence interval 38 0−1 transition (m=100, n=8) 0−1 transition (m=200, n=8) ********************************************************************************* ********************************************************************************* l s a v r e t n i e c n e d i f n o c % 5 9 e h t f o e g a r e v o c e h T 0 0 . 1 5 9 . 0 0 9 . 0 0 8 . 0 0 7 . 0 0 6 . 0 0 5 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) 1−2 transition (m=100, n=8) 1−2 transition (m=200, n=8) ********************************************************************************* ********************************************************************************* l s a v r e t n i e c n e d i f n o c % 5 9 e h t f o e g a r e v o c e h T 0 0 . 1 5 9 . 0 0 9 . 0 0 8 . 0 0 7 . 0 0 6 . 0 0 5 . 0 l s a v r e t n i e c n e d i f n o c % 5 9 e h t f o e g a r e v o c e h T l s a v r e t n i e c n e d i f n o c % 5 9 e h t f o e g a r e v o c e h T 0 0 . 1 5 9 . 0 0 9 . 0 0 8 . 0 0 7 . 0 0 6 . 0 0 5 . 0 0 0 . 1 5 9 . 0 0 9 . 0 0 8 . 0 0 7 . 0 0 6 . 0 0 5 . 0 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 1.6 3.0 4.0 5.0 6.0 7.0 8.0 9.0 9.6 Time (t) Time (t) Figure 3.3: The empirical coverage probabilities of the 95% pointwise confidence intervals for the baseline intensities has a good frequentist performance. To evaluate the accuracy of the prediction procedure for the future transition probability de- scribed in Section 3.2.3, we picked the last tooth (the 8-th) of the last (the m-th) subject in each Monte Carlo sample and checked whether the 95% prediction interval contains the true probability of that tooth remaining in the last observed caries state at several future time points. We considered five future time points, which were obtained by adding 0.1, 0.3, 0.5, 1 and 1.5 to the last visit time 39 Table 3.3: The proportion of the Monte Carlo iterations where the 95% prediction interval contains the true future transition probability for the last tooth of the last subject at VmMm + t∗ where t∗ = 0.1, 0.3, 0.5, 1, 1.5 m 100 200 t∗ = 0.1 0.9445 0.9381 VmMm + t∗ 0.3 0.5 1 1.5 0.9428 0.9428 0.9428 0.9376 0.9381 0.9434 0.9434 0.9416 respectively. Five hundred posterior realizations of (ξ∗, z∗i ) were used to compute the prediction intervals. Table 3.3 shows the coverage rates of the 95% prediction intervals at the future time points. They are all close to 0.95. The Monte Carlo iterations where the m-th subject has an advanced caries state at the last visit were not considered in the calculation of the coverage rates because that is an absorbing state. The number of removed iterations was 423 for m = 100 and 435 for m = 200. Overall, the estimation and inference methods for regression parameters and baseline intensities as well as the prediction procedure for the future transition probability all perform reasonably well when the number of subjects is no less than 100 and the number of teeth considered is at least eight. 3.4 Application We applied the proposed methods to the analysis of the DDHP data. This analysis was restricted to caries data on first and second molars of each quadrant, resulting in n = 8 teeth per child being studied. One subject was removed from the analysis data set because that subject’s birth date was missing, leading to m = 1020 including 487 boys and 533 girls. The caries state of a tooth may be missing at a study visit if some tooth surface is not visible; the tooth is missing for reasons other than caries; or the child simply missed the visit. In these instances, the observation was removed from the analysis data set, assuming a missing at random mechanism. The remaining observations are then used to study the intra-oral distribution of dental caries by contrasting for example the transition intensities for the first molar vs. second molar, the upper molar (maxilla) vs. lower (mandible) molar, and right quadrant vs. left quadrant. More importantly, they are used to predict tooth-level future transition probabilities for caries formation. 40 Table 3.4: The regression parameter estimates for the saturated model based on the DDHP data Gender Molar Type Quadrant Upper right tooth (54) 1st molar Upper left tooth (64) Lower left tooth (74) Male Lower right tooth (84) Upper right tooth (55) 2nd molar Upper left tooth (65) Lower left tooth (75) Lower right tooth (85) Upper right tooth (54) 1st molar Upper left tooth (64) Lower left tooth (74) Female Lower right tooth (84) Upper right tooth (55) 2nd molar Upper left tooth (65) Lower left tooth (75) Lower right tooth (85) Transition 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 Parameter Estimate SE p-value β1,1,1,01 β1,1,1,12 β1,1,2,01 β1,1,2,12 β1,1,3,01 β1,1,3,12 β1,1,4,01 β1,1,4,12 β1,2,1,01 β1,2,1,12 β1,2,2,01 β1,2,2,12 β1,2,3,01 β1,2,3,12 β1,2,4,01 β1,2,4,12 β2,1,1,01 β2,1,1,12 β2,1,2,01 β2,1,2,12 β2,1,3,01 β2,1,3,12 β2,1,4,01 β2,1,4,12 β2,2,1,01 β2,2,1,12 β2,2,2,01 β2,2,2,12 β2,2,3,01 β2,2,3,12 β2,2,4,01 β2,2,4,12 The reference category -0.0200 0.1208 0.3410 0.2174 0.3580 0.2139 1.4652 -0.3856 1.5330 -0.3883 1.4061 -0.3307 1.4426 -0.3115 0.1274 0.0096 0.2324 0.0483 0.1805 0.2544 0.2326 0.2647 1.5512 -0.3185 1.6083 -0.2099 1.8307 -0.2834 1.8725 -0.3777 0.1147 0.1681 0.1118 0.1591 0.1125 0.1581 0.8614 0.4724 0.0023 0.1719 0.0015 0.1762 0.0095 0.0293 0.0104 0.1075 < 0.0001 0.1504 0.1081 < 0.0001 0.1498 0.1078 < 0.0001 0.1517 0.1076 < 0.0001 0.1496 0.1617 0.1624 0.1608 0.1582 0.1612 0.1584 0.1606 0.1581 0.0373 0.4306 0.9530 0.1484 0.7599 0.2628 0.1083 0.1477 0.0941 0.0300 0.1569 < 0.0001 0.1467 0.1573 < 0.0001 0.1457 0.1579 < 0.0001 0.1454 0.1580 < 0.0001 0.1458 0.1496 0.0512 0.0096 Because each child has a total of eight primary molars, we considered a saturated model with fifteen dummy covariate variables indicating the combination of tooth identity and gender. The upper right first molar for boys was set as the reference category. Seven knots were used for approximating log h(0)01 (t) and log h(0)12 (t) by linear splines. The total number of parameters in the model is 51 including the spline parameters, the regression coefficients and the variance components of the log frailties for both types of transitions. 41 The regression coefficient estimates and their standard errors for the saturated model are shown in Table 3.4. The subscripts of β represent gender (1= Male, 2= Female), molar type (1=First molar, 2=Second molar), quadrant (1=Upper right, 2=Upper left, 3=Lower left, 4=Lower right) and transition type (01=0-to-1 transition, 12=1-to-2 transition) in order. For instance, β1,1,1,01 and β1,1,1,12 are the coefficients for 0-1 and 1-2 transitions of the upper right first molar in a boy, the reference category, respectively, thus β1,1,1,01 = β1,1,1,12 = 0. It is of scientific interest to compare tooth decay profiles between first and second molars, between upper and lower molars, and between right and left molars. These symmetry evaluations need to be stratified by gender if there exists an interaction effect between gender and tooth identity (type and quadrant) on the tooth decay profile. All these questions can be answered by testing a hypothesis of the form H0 : Gβ = 0, where G is an appropriate matrix with full row rank, denoted by r, and 0 is a zero vector of length r. Based on the asymptotic normality of ˆβ, the corresponding Wald test is, (G ˆβ)T(G ˆV ar( ˆβ)GT)−1G ˆβ H0 χ2 r , where ˆV ar( ˆβ) is the sub-matrix of −H−1 h ( ˆη; Y, ˆν) corresponding to β. The null hypothesis for testing the interaction between gender and tooth identity can be formu- lated as H0 : β1,t,l,ab − β1,1,1,ab = β2,t,l,ab − β2,1,1,ab, for all t, l, ab. This null hypothesis was rejected at 5% significant level (p-value = 0.0016) based on the data. Hence we evaluate whether there are any intra-oral symmetries with respect to the tooth decay profile for each gender. The null hypothesis of comparing the tooth decay profile between first and second molars for each gender is, H0 : βg,1,l,ab = βg,2,l,ab for all l, ab, where g takes values 1 or 2. These two null hypotheses were rejected (both p-values less than 0.0001). The null hypothesis of comparing the tooth decay profile between upper and lower molars for each gender is, H0 : βg,t,1,ab = βg,t,4,ab, βg,t,2,ab = βg,t,3,ab for all t, ab, 42 Table 3.5: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity stratified by gender and transition type Gender Test for symmetry Transition 0 to 1 1 to 2 Male Female Upper vs Lower 21.87(0.0002) 0.71(0.9498) 18.96(0.0008) 1.86(0.7617) First molar vs Second molar 775.38(< 0.0001) 2.74(0.6020) 0.54(0.9695) First molar vs Second molar 538.75(< 0.0001) 48.35(< 0.0001) 5.75(0.2180) 1.66(0.7973) 51.22(0.0004) Right vs Left Upper vs Lower Right vs Left where g takes values 1 or 2. This hypothesis was rejected for each gender (p-value = 0.0011 for boys and p-value= 0.0020 for girls). Lastly, the right-left symmetry in tooth decay for each gender can be tested using the null hypothesis: H0 : βg,t,1,ab = βg,t,2,ab, βg,t,3,ab = βg,t,4,ab for all t, ab, (3.8) where g takes values 1 or 2. The corresponding test statistics are 1.256 (p-value = 0.9961) for boys and 3.690 (p-value= 0.8840) for girls, which indicates that the right-left symmetry in tooth decay exists for both genders. We further performed the tests for the three types of symmetries in caries transition intensity stratified by gender and transition type. The Wald test statistics and the p-values for these tests are shown in Table 3.5. Since the null hypotheses (3.8) were not rejected, it is expected that none of the tests comparing the right and left molars in Table 3.5 was rejected. The detailed comparisons associated with the significant non-symmetries found in Table 3.5 (at 5% level) were conducted in order to find out where the difference is. The results of these specific tests can be found in Table 3.6. The upper-lower non-symmetry in 0-to-1 transition intensity for boys is due to lower first molars transitioning faster from state 0 to 1 than upper first molars in boys. The first vs. second difference in 0-to-1 transition intensity for boys is because of the first molar transitioning slower from 0 to 1 than the second molar in every quadrant of a boy’s mouth. In contrast, the first molar transitions faster from state 1 to 2 than the second molar in every quadrant of a boy’s mouth, which explains the first vs. second non-symmetry in 1-to-2 transition intensity for boys. The upper-lower 43 Table 3.6: The detailed comparisons associated with the significant non-symmetries in caries transition intensity found in Table 3.5 Gender Transition Teeth Compared Wald Test Statistic (p-value) 0 → 1 Male 0 → 1 1 → 2 0 → 1 Upper right first Upper left first Upper right second Lower right second Upper left second Lower right first Lower left first Lower left second Upper right first Upper left first Lower right first Lower left first Upper right first Upper left first Lower right first Lower left first Upper right second Upper left second Lower right second Lower left second Upper right second Upper left second Lower right second Lower left second Upper right first Upper left first Upper right second Lower right second Upper left second Lower right first Lower left first Lower left second Female 0 → 1 1 → 2 Upper right first Upper left first Lower right first Lower left first Upper right first Upper left first Lower right first Lower left first Upper right second Upper left second Lower right second Lower left second Upper right second Upper left second Lower right second Lower left second -3.181 (0.0007) -3.200 (0.0007) 0.236 (0.4068) 1.323 (0.0928) -13.625 (< 0.0001) -14.212 (< 0.0001) -10.382 (< 0.0001) -10.261 (< 0.0001) 2.563 (0.0052) 3.443 (0.0003) 3.864 (0.0001) 3.934 (< 0.0001) -0.988 (0.1592) 0.494 (0.3015) -3.489 (0.0002) -2.423 (0.0077) -14.055 (< 0.0001) -13.681 (< 0.0001) -16.200 (< 0.0001) -16.168 (< 0.0001) 2.424 (0.0077) 2.000 (0.0227) 4.966 (< 0.0001) 4.167 (< 0.0001) non-symmetry in 0-to-1 transition intensity for girls is due to lower second molars transitioning faster from state 0 to 1 than upper second molars in girls. The first vs. second difference in 0-to-1 transition intensity for girls is because of the first molar transitioning slower from 0 to 1 than the second molar in every quadrant of a girl’s mouth, but it is the opposite for comparing the first molars with the second in 1-to-2 transition intensity for girls: the first molar transitions faster from 1 to 2 than the second in every quadrant of a girl’s mouth. We predicted future transition probabilities for two molars of a boy in the data set who had only two study visits at 2.15 and 4.25 years of age respectively. The boy is the same child who was used for the data analysis in Chapter 2. The molars of interest were his upper right first and second 44 molars. His upper right second molar was sound at the first visit but was in the early caries state at the second visit, whereas his upper right first molar remained sound over the two visits. For each molar, the empirical Bayes estimates and the 95% prediction intervals for the probabilities of all possible transitions after his last visit were computed up to age 8 using 500 posterior realizations of (ξ∗, z∗i ). The left-side plot in Figure 3.4 shows the probabilities of the 0-to-0, 0-to-1 and 0-to-2 transitions for his upper right first molar, and the right-side plot shows the probabilities of the 1-to-1 and 1-to-2 transitions for his upper right second molar. The probabilities for the molars to stay in the current state get smaller as the child ages, and those for the molars to develop caries get greater as time goes by. Dentists can decide the proper timing of future dental exams for this child based on these probabilities. The correlation parameter ρ in the distribution of the bivariate frailty could be impacted by covariates. To examine the heterogeneity in ρ and assess its impact on the analysis results, we fit our model with ρ being gender-specific to the Detroit data. The analysis results are given in Appendix A and they are very similar to the results obtained from assuming a homogeneous ρ, although the likelihood ratio test for the heterogeneity based on the adjusted profile h-likelihood (3.6) showed that gender has a statistically significant effect on ρ (p-value = 0.0343). 3.5 Discussion We have proposed a semiparametric three-state Markov frailty model for tooth-level caries life course data subject to interval censoring. Estimation of the model proceeds by maximizing a penalized likelihood, imposing a constraint on the coefficients of the linear splines that approximate the unknown baseline intensities. The linear spline approximation is computationally convenient as it avoids certain numerical integrations with respect time appearing in the log-likelihood function, the score function and the hessian matrix. The penalized likelihood resembles the joint likelihood of a mixed model, treating the coefficients of the spline as random effects. Using the mixed-model representation avoids the classical problem of penalty parameter tuning. Because the marginal maximum likelihood estimation for the mixed-effects model involves high-dimensional intractable 45 A) B) y t i l i b a b o r p n o i t i s n a r t e r u t u F 0 . 1 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 y t i l i b a b o r p n o i t i s n a r t e r u t u F 0 . 1 8 . 0 6 . 0 4 . 0 2 . 0 0 . 0 4.23 5.00 6.00 Age (years) 7.00 8.00 4.23 5.00 6.00 Age (years) 7.00 8.00 Posterior mode (0−to−0 transition) 95% prediction interval (0−to−0 transition) Posterior mode (0−to−1 transition) 95% prediction interval (0−to−1 transition) Posterior mode (0−to−2 transition) 95% prediction interval (0−to−2 transition) Posterior mode (1−to−1 transition) 95% prediction interval (1−to−1 transition) Posterior mode (1−to−2 transition) 95% prediction interval (1−to−2 transition) Figure 3.4: The future transition probabilities to every possible state out of the caries state at the last visit for the upper right first molar (A) and the upper right second molar (B) integrals, we adopted the hierarchical likelihood approach of Lee and Nelder (1996) for model fitting and inference. In addition, we developed a Bayesian approach, as a by-product of the model estimation, to predict future caries transition probabilities at the tooth level. Tooth-level caries transition risks are useful for prevention and treatment planning in dental care. The simulation study demonstrated that the estimation and inference methods for the regression parameters and baseline intensities have a good performance. The Bayesian prediction interval for the future transition probability is also shown to have a good frequentist property. The methodology is applied to the DDHP data to describe the intra-oral life-history of caries, with an emphasis on spatial symmetry evaluation. Our statistical methods are generally applicable to cohort studies and medical records involving chronic diseases in which the disease progression is assessed at various locations of the body and at intermittent time points. The model has some limitations. For example, to account for the intra-oral clustering, we assume that the intensities of different teeth share a bivariate frailty. This restriction can be relaxed 46 by allowing quadrant-specific frailties. Also, the conditional Markov assumption might not be realistic and could be replaced by a conditional semi-Markov assumption. These extensions of the model merit future research. 47 CHAPTER 4 VARIABLE SELECTION IN THE PROPORTIONAL HAZARDS MODEL FOR INTERVAL CENSORED DATA 4.1 Introduction Epidemiologic/clinical studies usually collect a large number of variables such as subjects’ biomarkers, genotypes, demographic characteristics, and environmental risk factors to study their relationships with the outcome. Building a regression model including all those variables is often undesirable because it has low prediction accuracy and is hard to interpret. Therefore, variable selection is an important topic in regression modeling. There have been many statistical methods on variable selection. Among them, the traditional method of best subset selection is being generally used in many applications, but it is known to be unstable as resulting in poor predictive performance (Breiman, 1996). Another popular class of methods, which is more stable, is variable selection via regularization, also known as penalized variable selection. This class of methods can simultaneously determine which variables to put in a regression model and estimate their effects on the outcome. Popular penalized variable selection methods include Lasso (Tibshirani, 1996), the smoothy-clipped absolute deviation (SCAD; Fan & Li, 2001) and adaptive Lasso (Zou, 2006). The latter two methods enjoy the so-called oracle property: when the sample size is large, they build the regression model as if they knew which variables are important and which are not. Most penalized variable selection methods were proposed for linear or generalized linear models, where the outcome is fully observed. There are some methods for survival analysis. For example, Lasso, SCAD and adaptive Lasso have been extended to the Cox model for right censored data by Tibshirani (1997), Fan & Li (2002) and Zhang & Lu (2007) respectively. However, to the best of our knowledge, there is only one published method (Wu & Cook, 2015) on penalized variable selection for interval censored data. Wu & Cook (2015) assumes a proportional hazards model with a piecewise constant baseline hazard function, and performs the variable selection through 48 penalized likelihood estimation with a LASSO, SCAD or adaptive LASSO penalty. They did not prove any asymptotic property of the method and used an ad hoc approach to specify the number and location of break points for the piecewise constant baseline hazard function. In this chapter, we propose a penalized variable selection method for the Cox proportional hazards model of interval censored data and investigate the performance of the proposed model in realistic samples, in terms of both variable selection and estimation consistency. The model is semiparametric because its baseline hazard function is completely unspecified. The nonparametric maximum likelihood estimator for the baseline harzard function is considered with the support set which was introduced by Alioum & Commenges (1996). The variable selection is performed via a penalized nonparametric maximum likelihood estimation with an adaptive Lasso penalty. In the model estimation procedure, we first obtain the unpenalized nonparametric maximum likelihood estimators for the regression parameters as following the EM algorithm in Zeng et al. (2016) with some modifications, and use these consistent estimates in the penalized likelihood estimation procedure with the adaptive LASSO penalty. We prove that our method has the oracle property and give an approach to estimate the covariance matrix of the penalized estimator. In addition, we extend the method to left truncated and interval censored data. To evaluate model performance, we compared our proposed method to the existing methods introduced by Zhang & Lu (2007) and Wu & Cook (2015) under interval censoring. Because Zhang and Lu’s method is designed for right censored data, the evaluation on their method is conducted using mid-point imputations of the censoring intervals, which is a common practice in many studies. The application of the proposed model in this chapter is performed with the mouth-level DDHP data described in Chapter 1. We consider the event that a child has at least one noncavitated/cavitated lesion in the surfaces of all teeth. Unlike Chapter 2 and Chapter 3, we have only one censored interval for each child. In addition, we use the child-level, family-level or community-level influence variables as covariates of the model to determine social and behavioral factors that play a critical role in developing dental caries in deciduous dentition. 49 In Section 4.2, we introduce a new variable selection approach via adaptive Lasso estimation for interval censored time-to-event data. Followed by Section 4.2, the asymptotic property of the adaptive Lasso estimator from the proposed model is shown in Section 4.3. The extension of the proposed model to and interval censored data is demonstrated in Section 4.4. We also conduct the simulation study in Section 4.5, as comparing the proposed model with the existing variable selection approaches. The simulation study with left truncated and interval censored data is also described in this section. The practical utility of the proposed variable selection approach is illustrated with the mouth-level DDHP data in Section 4.6. The concluding remarks are given in Section 4.7. 4.2 Statistical methodologies We consider a random sample of n independent subjects. Let Ti and Xi respectively denote the time to event of interest and a p-dimensional vector of covariates for subject i (i = 1, . . . , n). We study how to select significant covariates among Xi for the time-to-event outcome Ti in the situation where Ti is subject to mixed case interval censoring (Schick & Yu, 2000). Denote the sequence of inspection times for subject i by ®Vi = (Vi1, . . . , ViKi)T . Define ∆ik = I(Vi,k−1 < Ti ≤ Vik) (k = 1, . . . , Ki) with Vi0 = 0, and ®∆i = (∆i1, . . . , ∆iKi)T . Then the observed data consist of {Oi = (®Vi, ®∆i, Xi)}n i=1 . We assume that the inspection process is independent of the time to event given the covariates, i.e. Ti ⊥ (®Vi, Ki)|Xi . We also assume that the conditional distribution of Ti given Xi satisfies the Cox proportional hazards model, i.e. h(t|Xi) = h(t) exp(βT Xi), (4.1) where h(t|Xi) ≡ lim∆t→0+ pr(t ≤ Ti < t + ∆t|Ti ≥ t, Xi)/∆t, h(t) is an unspecified baseline hazard function, and β is a vector of unknown regression parameters. 50 4.2.1 Variable selection We conduct the variable selection via an adaptive Lasso estimation. Let Li and Ri denote respec- tively the last inspection time before Ti and the first inspection time after Ti. Set Li = 0 if Ti is smaller than Vi1 and Ri = ∞ if Ti is larger than ViKi (4.1) and the mixed case interval censoring, the logarithm of the observed-data likelihood is , i.e. Ti is right censored. Under the Cox model ln(β, H) = ni=1 where H(t) =∫ t Lasso estimation is 0 h(s)ds and the convention, exp{−H(∞) exp(βT Xi)} = 0, is used. The adaptive loghexp{−H(Li) exp(βT Xi)} − exp{−H(Ri) exp(βT Xi)}i , β,H | β j|/| ˜β j| ln(β, H) − nθ dj =1 max , (4.2) (4.3) where ˜β ≡ ( ˜β1, . . . , ˜βd)T is the unpenalized nonparametric maximum likelihood estimator for β, which can be obtained using the EM algorithm in Zeng et al. (2016), θ is a thresholding parameter whose selection is discussed later, and the maximization with respect to H is over the space of nondecreasing nonnegative functions. We denote the estimator from (4.3) by ( ˆβ, ˆH). Note that the penalty in (4.3) does not involve H. Thus the support set over which ˆH(·) increases is the same as that of the unpenalized nonparametric maximum likelihood estimator for H, which was characterized by Alioum & Commenges (1996). Specifically, the estimator ˆH increases only on so-called maximal intersections: intervals of the form (l, u] where l ∈ {Li}n u ∈ {Ri}n and there is no Li or Ri in (l, u). Additionally, ˆH is indifferent to how it increases on the maximal intersections, as only the overall jump sizes over (l, u]’s (u < ∞), ˆH(u) − ˆH(l), affect i=1 i=1 , the penalized likelihood in (4.3). Write the maximal intersections with a finite upper endpoint as (l1, u1], . . . , (lm, um]. According to the characterizations of ˆH, we can assume, just for the purpose of computing ˆH, that H is flat on ∪m j =1(lj, u j]. Define hk = H(uk) − H(lk) (k = 1, . . . , m). Then 51 ln(β, h) = = the log likelihood ln(β, H) can be written as ln(β, h), where h = (h1, . . . , hm)T , and ni=1 ni=1 log exp{− uk≤Li exp{− uk≤Li log©(cid:173)(cid:173)« hk exp(βT Xi)} − I(Ri < ∞) exp{− uk≤Ri hk exp(βT Xi)} 1 − exp{− Li 0) : 1 ≤ i ≤ n and I(Ri) < ∞} as an observed data set, where Ai = 0 means that Ai is observed to be zero and Bi > 0 means that Bi is observed to be positive. The log likelihood of ˜O has the form Wik and Bi = I(Ri < ∞)Li 0) = E(Wik| Li 0) exp(β(s)T Xi) h(s)j exp(β(s)T Xi)} . At the M-step, we first maximize the expected complete-data log likelihood with respect to h conditioning on β. The maximizer has an analytical expression: h(s+1) k (β) ≡ n n i=1 i=1 I(uk ≤ R∗i ) ˆE(Wik) I(uk ≤ R∗i ) exp(βT Xi) (k = 1, . . . , m). (β) into the conditional expectation of (4.4), we update β by maximizing Q(β, h(s+1)(β)| β(s), h(s)) − nθ | β j|/| ˜β j|, Plugging hk = h(s+1) k where dj =1 − log dj =1 Q(β, h(s+1)(β)| β(s), h(s)) = mk=1 ni=1 (β), . . . , h(s+1)m I(uk ≤ R∗i ) ˆE(Wik) nj =1 I(uk ≤ R∗j) exp(βT X j) + βT Xi 1 and h(s+1)(β) = (h(s+1) (β))T . To perform this maximization, we approximate −Q(β, h(s+1)(β)| β(s), h(s)) by a second-order Taylor expansion around β(s). It can be written in a quadratic form 2−1(Y−Zβ)T(Y−Zβ), where Z is from the Cholesky decomposition of∇2Q(β(s)) ≡ −∂2Q/∂ β∂ βT|β=β(s), that is ∇2Q(β(s)) = ZT Z, and Y = (ZT)−1{∇2Q(β(s))β(s)−∇Q(β(s))} with ∇Q(β(s)) = −∂Q/∂ β|β=β(s). Then we minimize 1 2(Y − Zβ)T(Y − Zβ) + nθ | β j|/| ˜β j| (4.5) to obtain β(s+1), using the modified shooting algorithm in Zhang & Lu (2007). The corresponding update for h is h(s+1) = h(s+1)(β(s+1)). The EM algorithm stops if the maximum of the absolute differences between the estimates at two successive iterations is smaller than, say 10−3. We choose the initial parameter values 53 (β(0), h(0)) to be the unpenalized nonparametric maximum likelihood estimator ( ˜β, ˜h), which can be obtained from the same EM algorithm as the above except that the objective function (4.5) becomes 2−1(Y − Zβ)T(Y − Zβ). 4.2.2 Covariance matrix of the adaptive Lasso estimator Note that the estimator ˆβ obtained from (4.3) is equivalent to the solution of the following penalized profile likelihood estimation, max β  lpn(β) − nθ dj =1 | β j|/| ˜β j| , (4.6) where lpn(β) = supH ln(β, H). Adapting the standard error derivation in Section 4.1 of Lu & Zhang (2007) to the Adaptive Lasso estimation (4.6), the covariance matrix of ˆβ can be estimated by a sandwich formula: {∇2lpn( ˆβ) + nθ A( ˆβ)}−1Σ( ˆβ){∇2lpn( ˆβ) + nθ A( ˆβ)}−1, (4.7) where ∇2lpn(β) is the negative hessian of lpn(β), Σ( ˆβ) = {∇2lpn( ˆβ) + nθ D( ˆβ)}{∇2lpn( ˆβ)}−1 {∇2lpn( ˆβ) + nθ D( ˆβ)}, A(β) = diag(1/β2 , . . . , I(βd , 0)/β2 d). Here we take the convention 0/0 = 0 and set 1/0 to a very large number, say 1010. Since lpn(β) does not have a closed form, we calculate ∇2lpn( ˆβ) using a second-order numerical 1 , . . . , 1/β2 d) and D(β) = diag(I(β1 , 0)/β2 1 difference as in Murphy & Van Der Vaart (1999), that is, (∇2lpn( ˆβ))i, j ≈ − lpn( ˆβ + δnei + δne j) − lpn( ˆβ + δnei) − lpn( ˆβ + δne j) + lpn( ˆβ) δ2 n , (4.8) where ei is the ith unit vector in Rd and δn = Op(n−1/2). The value of lpn(β) can be evaluated using the EM algorithm in Section 4.2.1 with β held fixed. 4.2.3 Thresholding parameter tuning Like all the variable selection methods via regularization, the performance of our adaptive Lasso estimator ˆβθ depends critically on the choice of the thresholding parameter θ. To select it, we 54 minimize the following Bayesian information criterion (BIC), BIC(θ) = −2lpn( ˆβθ) + |αθ| log(n), (4.9) where αθ = { j : ˆβθ, j , 0} is the active set identified by the adaptive Lasso estimation with thresholding parameter θ, and |αθ| is its size. For generalized linear models with a fixed number of covariates, the adaptive Lasso with the thresholding parameter selected using BIC identifies the true model consistently (Zhang et al., 2010; Hui et al., 2015). This motivates us to use BIC to select the thresholding parameter in our variable selection method for interval censored data. The grid points of θ is constructed by following (Simon et al., 2011, Section 2.3). We first find the smallest tuning parameter θ which all coefficient estimates are zero through a rough search, and denote it as θmax because it will be used as the largest value among the possible tuning parameters. After then, the minimum value for the tuning parameter is calculated from θmax by θmin = ǫ θmax where ǫ = 0.0001. In our simulation study and data analysis, we use one hundred grid points for θ between θmin and θmax (including the maximum and the minimum) so as to ignore a value of θ very near 0 and to distribute them reasonable. Thus, as suggested by Simon et al. (2011), the j-th grid point is chosen by θl = θmax(θmin/θmax)l/100, l = 1, · · · , 100. 4.3 Asymptotic properties We study the asymptotic properties of the adaptive Lasso estimator ˆβ obtained from maximizing the penalized likelihood, Wn(β, H) = ln(β, H) − nθn dj =1 | β j|/| ˜β j|, (4.10) 10 with respect to β and H. Denote the true value of β by β0 = (βT 20)T , where β10 denotes the vector of all q non-zero components (1 ≤ q ≤ d) and β20 the vector of zero components. Write ˆβ as ( ˆβ 2)T accordingly. , βT T T 1, ˆβ We assume the following regularity conditions: (C1) The true value β0 belongs to the interior of a known compact set B. The union of the supports of (V1, . . . , VK) is a finite interval [ζ, τ], where 0 < ζ < τ < ∞. The true value of 55 H(·), denoted by H0(·), is strictly increasing and continuously differentiable on [ζ, τ], and 0 < H0(ζ) < H0(τ) < ∞. (C2) The covariate vector X is bounded almost surely. (C3) The covariance matrix of X is positive definite. (C4) The number of inspection times, K, is positive almost surely, and E(K) < ∞. Additionally, pr(Vj +1 − Vj ≥ η|X, K) = 1 ( j = 1, . . . , K − 1) for some positive constant η. Furthermore, the conditional densities of (Vj, Vj +1) given (X, K), denoted by fj(s, t|X, K) ( j = 1, . . . , K − 1), have continuous second-order derivatives with respect to s and t when t − s > η and are continuously differentiable with respect to X. Condition (C1) is the regularity condition 1 assumed in Zeng et al. (2016). (C2) and (C3) are the special cases of their regularity conditions 2 and 3 respectively. (C4) is almost the same as their regularity condition 4 except that we do not assume pr(VK = τ|X, K) to be greater than a positive constant since this assumption is too restrictive and unnecessary for proving the asymptotic properties in Zeng et al. (2016) as discussed by Zeng et al. (2017). Note that the regularity condition 5 in Zeng et al. (2016) automatically holds for the Cox proportional hazards model. Conditions (C1)–(C4) ensure the root-n consistency of the unpenalized maximum likelihood estimator ˜β (Zeng et al., 2016), which is required for the penalty term in (4.10) to be an adaptive Lasso penalty as defined in Zou (2006). These conditions also ensure that the log profile likelihood lpn(β) has a quadratic expansion around β0 (see Remark A1 in Zeng et al., 2016). Our proofs of the asymptotic properties of ˆβ relies on it. The following asymptotic results are obtained under the above regularity conditions. Theorem 4.3.1 If √nθn = Op(1), then k ˆβ − β0k = Op(n−1/2). Proof First note that the estimator ˆβ obtained from maximizing (4.10) is equivalent to the solution 56 of maximizing the following penalized profile likelihood, Qn(β) = lpn(β) − nθn dj =1 | β j|/| ˜β j|. (4.11) Under the regularity conditions (C1)–(C4), Theorem 1 of Murphy & van der Vaart (2000) is applicable (see Remark A1 in Zeng et al., 2016). Hence, for any random sequence β∗n log profile likelihood lpn(β∗n) has the following quadratic expansion around β0: lpn(β∗n) = lpn(β0) +(β∗n− β0)T n(β∗n− β0)T ˜I0(β∗n− β0) +oPβ0,H0(√nk β∗n− β0k +1)2, ˜S0(Oi)− 2 1 p −→ β0, the ni=1 (4.12) where ˜S0 is the efficient score function for β as given implicitly in Zeng et al. (2016) and ˜I0 is its covariance matrix, the efficient Fisher information matrix. According to Proposition 3.1 and the discussion in Section 4.4 of Huang & Wellner (1997), ln(β, h) is concave in (β, h). Following the proof of Theorem 1 in Zeng et al. (2016), it can be shown that ˆH(τ) is bounded almost surely when n is large. Thus, without loss of generality, we may restrict the parameter h to a compact space. Then lpn(β) = suph ln(β, h) is concave in β. This j =1 | β j|/| ˜β j| leads to that Qn(β) is concave. Hence, to show the root-n consistency of ˆβ, it is sufficient to show that, for any ǫ > 0, there exists a large constant together with the concavity of −nθnd pr( sup C such that lim inf n→∞ kuk=C Qn(β0 + n−1/2u) < Qn(β0)) ≥ 1 − ǫ . (4.13) This implies that when n is large, there is a local maximizer of Qn(β) in the interior of the ball {β0 + n−1/2u : kuk ≤ C} with probability at least 1 − ǫ . By the concavity of Qn(β), the local maximizer must be ˆβ and thus k ˆβ − β0k = Op(n−1/2). By the quadratic expansion (4.12), we have, when n is large, 1 n{lpn(β0 + n−1/2u) − lpn(β0)} = = 1 n 1 n ni=1 dj =1 uT n−1/2 ˜S0(Oi) − 1 2n uT ˜I0u + oPβ0,H0(n−1/2kuk + n−1/2)2 Op(1) |u j| − 1 2n uT ˜I0u + oPβ0,H0((kuk + 1)2 n ) , 57 1 n{Qn(β0 + n−1/2u) − Qn(β0)} where the second equality holds because n−1/2n dj =1 qj =1 n{lpn(β0 + n−1/2u) − lpn(β0)} − θn n{lpn(β0 + n−1/2u) − lpn(β0)} − θn ≤ 1 1 = 1 n ≤ Op(1) |u j| − 1 2n dj =1 i=1 ˜S0(Oi) = Op(1). It follows that | β j0 + n−1/2u j| − | β j0| | ˜β j| | β j0 + n−1/2u j| − | β j0| | ˜β j| uT ˜I0u + oPβ0,H0((kuk + 1)2 n ) + n−1/2θn qj =1 . |u j| | ˜β j| (4.14) Since the unpenalized maximum likelihood estimator ˜β satisfies k ˜β − β0k = Op(n−1/2), we have, for 1 ≤ j ≤ q, 1 | ˜β j| = 1 | β j0| − sign(β j0) β2 j0 ( ˜β j − β j0) + op(| ˜β j − β j0|) = + Op(1)√n . 1 | β j0| Then, under the condition √nθn = Op(1), we have + |u j| √n = n−1/2θn n−1/2θn qj =1(cid:18) |u j| | β j0| qj =1 |u j| | ˜β j| Op(1)(cid:19) ≤ Cn−1/2θnOp(1) = Cn−1Op(1) Therefore, in (4.14), the first and fourth terms are of the order Cn−1, the third term is of a order smaller than C2n−1, and the second term is of the order C2n−1 because ˜I0 is positive definite (see the proof in Zeng et al., 2016). If C is sufficiently large, the second term dominates the others in (4.14). Hence, (4.13) holds, which completes the proof. Theorem 4.3.2 If √nθn → 0 and nθn → ∞, then ˆβ has the following properties: (i) limn→∞ pr( ˆβ2 = 0) = 1; (ii) √n( ˆβ1 − β10) N(0, ˜I−1 10 ), where ˜I10 is the upper-left q × q sub-matrix of the efficient Fisher information matrix for β, denoted by ˜I0, as given implicitly in Zeng et al. (2016). Proof (i) Since k ˆβ − β0k = Op(n−1/2) according to Theorem 4.3.1, to prove limn→∞ pr( ˆβ2 = 0) = 1, it suffices to show that for any sequence β1 such that k β1 − β10k = Op(n−1/2) and for any 58 positive constant C, pr(Qn(β1, 0) = lim n→∞ Qn(β1, β2)) = 1. max k β2k≤Cn−1/2 We show this by proving that for any such sequence β1 and any positive constant C, ∂Qn(β)/∂ β j and β j have opposite signs for any β j ∈ (−Cn−1/2, 0) ∪ (0, Cn−1/2) ( j = q + 1, . . . , d) with probability tending to 1. For any β1 satisfying k β1− β10k = Op(n−1/2) and any β2 such that k β2k ≤ Cn−1/2, lpn{β} has the quadratic expansion (4.12) when n is large. It is then easy to see that ∂lpn{β}/∂ β j = Op(n1/2) ( j = q + 1, . . . , d). Hence, for β j ∈ (−Cn−1/2, 0) ∪ (0, Cn−1/2) ( j = q + 1, . . . , d), = Op(n1/2) − (nθn)n1/2 sign(β j) |n1/2 ˜β j| ∂lpn(β) ∂ β j − nθn ∂Qn(β) ∂ β j sign(β j) | ˜β j| = . Since n1/2( ˜β j − 0) = Op(1), it follows that ∂Qn(β) ∂ β j = n1/2(cid:26)Op(1) − nθn sign(β j) |Op(1)|(cid:27) . (4.15) Because nθn → ∞, the sign of β j determines the sign of ∂Qn(β)/∂ β j in (4.15) when n is large, and they have opposite signs. representation of ˆβ1 in the event { ˆβ2 = 0}. Set ˆh = √n( ˆβ1 − β10) and ∆1n = n−1/2n (ii) According to (i), limn→∞ pr( ˆβ2 = 0) = 1. Thus we just need to derive the asymptotic ˜S10(Oi), where ˜S10 is the sub-vector of ˜S0 corresponding to β1. For any random sample in the event { ˆβ2 = 0}, we have Qn( ˆβ1, 0) ≥ Qn(β0 + n−1/2( ˜I−1 ∆1n, 0)). Note that k ˆβ1 − β10k = Op(n−1/2) according to Theorem 4.3.1 and ˜I−1 10 ∆1n = Op(1). By (4.12), we have i=1 10 qj =1 | ˆβ j| | ˜β j| ˆh + op(k ˆhk + 1)2 − nθn | ˆβ j| | ˜β j| ˆh + op(1) − nθn qj =1 Qn( ˆβ1, 0) = lpn(β0) + ˆhT ∆1n − = lpn(β0) + ˆhT ∆1n − 1 2 1 2 ˆhT ˜I10 ˆhT ˜I10 59 and 10 Qn(β0 + n−1/2( ˜I−1 ˜I−1 = lpn(β0) + ∆T 10 1n ∆1n, 0)) ∆1n − 1 2 ∆T 1n ˜I−1 10 ∆1n + op(k ˜I−1 10 ∆1nk + 1)2 − nθn = lpn(β0) + 1 2 ∆T 1n ˜I−1 10 ∆1n + op(1) − nθn qj =1 | β j0 + n−1/2( ˜I−1 10 | ˜β j| ∆1n) j| . qj =1 | β j0 + n−1/2( ˜I−1 10 | ˜β j| ∆1n) j| Since Qn( ˆβ1, 0) − Qn(β0 + n−1/2( ˜I−1 10 ∆1n, 0)) ≥ 0, it follows that ˆhT ∆1n − 1 2 ˆhT ˜I10 ˆh − 1 2 ∆T 1n ˜I−1 10 ∆1n − nθn The left side of (4.16) is equal to qj =1 | ˆβ j| − | β j0 + n−1/2( ˜I−1 10 | ˜β j| ∆1n) j| ≥ op(1) (4.16) 1 2( ˆh − ˜I−1 10 − ≤ −ck ˆh − ˜I−1 10 10 ∆1n) ˜I10( ˆh − ˜I−1 ∆1nk2 + √nθn | ˆβ j| − | β j0 + n−1/2( ˜I−1 10 ∆1n) j| ∆1n) − nθn qj =1 | ˜β j| ∆1n) j| |√n( ˆβ j − β j0) − ( ˜I−1 10 | ˜β j| qj =1 qj =1 for some positive constant c since ˜I10 is positive definite. It follows that k ˆh − ˜I−1 10 ∆1nk2 ≤ √nθn 1 c |√n( ˆβ j − β j0) − ( ˜I−1 10 | ˜β j| ∆1n) j| + op(1). p −→ | β j0| > 0 for j = 1, . . . , q. Thus, under the condition √nθn → 0, k ˆh− ˜I−1 10 ∆1nk = Note that | ˜β j| op(1), and then √n( ˆβ1 − β10) = ˜I−1 10 ∆1n + op(1) N(0, ˜I−1 10 ). The consistency and sparsity of ˆβ shown in Theorem 4.3.1 and Theorem 4.3.2(i) respec- tively imply that the adaptive Lasso estimator enjoys the selection consistency property, that is, limn→∞ pr(An = {1, . . . , q}) = 1 with An = { j : ˆβ j , 0}. Theorem 4.3.2(ii) implies that the adaptive Lasso estimator for the non-zero regression parameters is semiparametrically efficient as if the unimportant covariates were known. It is more efficient than the unpenalized maximum likelihood estimator ˜β1, whose covariance matrix is ( ˜I−1 0 )11, the leading q × q sub-matrix of ˜I−1 0 . 60 4.4 Extension to left truncated and interval censored data In this section, we extend the variable selection method to left truncated and interval censored data. Such data have one additional variable Vi0 (i = 1, . . . , n), the time to entering the study, to the observed data described in Section 1. Left truncation means that the random sample comes from the subpopulation of subjects whose time to event is greater than his/her time to study entry, also called left truncation time. To avoid confusion in understanding the sampling plan, we define T∗, V∗0 and X∗ to be the time to event, left truncation time and covariate vector of a subject from the target population, respectively. Then (Ti, Vi0, Xi) (i = 1, . . . , n) are a random sample from the subpopulation of (T∗, V∗0 . To describe the assumption below on the inspection process, we introduce a positive random variable Uk (k = 1, . . . , K) to represent the time from study entry to the k-th inspection of a sampled subject, where K denotes the random total number , X∗) with T∗ > V∗0 of inspections. Hence Uik = Vik −Vi0 (i = 1, . . . , n; k = 1, . . . , Ki). Define ®U = (U1, . . . , UK)T . We assume that T∗ is independent of (V∗0 , K, ®U, X∗) , K, ®U) given X∗ and that the joint distribution of (V∗0 does not involve β and H. Under the above assumptions, the log likelihood of left truncated and interval censored data is, up to an additive constant free of (β, H), l(T)n (β, H) = loghexp{−(H(Li) − H(Vi0)) exp(βT Xi)} − exp{−(H(Ri) − H(Vi0)) exp(βT Xi)}i , ni=1 (4.17) where the superscript(T) means truncation. We perform the variable selection through the following adaptive Lasso estimation, , (4.18) | β j|/| ˜β j| l(T)n (β, H) − nθ dj =1 max β,H where ˜β ≡ ( ˜β1, . . . , ˜βd)T is the unpenalized nonparametric maximum likelihood estimator for β, which can be obtained using the EM algorithm below except no penalty being involved, θ is a thresholding parameter whose selection can follow the method in Section 4.2.3, and the maximization with respect to H is over the space of nondecreasing nonnegative functions. We 61 denote the estimator from (4.18) by ( ˆβ, ˆH). Similar to the case of interval censored data, ˆH has the same characterizations as the unpenalized nonparametric maximum likelihood estimator for H, which were given in Alioum & Commenges , i=1 i=1 (1996). Specifically, the estimator ˆH increases only on intervals of the form (l, u] where l ∈ {Li}n u ∈ {Ri, Vi0}n and there is no Li, Ri or Vi0 in (l, u). Additionally, ˆH is indifferent to how it increases on those intervals, as only the overall jump sizes over (l, u]’s with l > 0 and u < ∞, ˆH(u) − ˆH(l), affect the penalized likelihood in (4.18). Write the intervals (l, u]’s with l > 0 and u < ∞ as (l1, u1], . . . , (lm, um]. According to the characterizations of ˆH, we can assume, just for the purpose of computing ˆH, that H is flat on ∪m j =1(lj, u j]. Define hk = H(uk) − H(lk) (k = 1, . . . , m). Then the log likelihood l(T)n (β, H) can be written as l(T)n (β, h), where h = (h1, . . . , hm)T , and l(T)n (β, h) = log exp{− Vi0 = 7), and frequencies of wiping teeth, cleaning teeth with water (0 for never or rarely, 1 for sometimes or usually) and whether a child participates in WIC or Head Start (0 for no, 1 for yes). The weight-for-age percentiles are obtained using the 2000 Centers for Disease Control and Prevention growth charts (see Kuczmarski, et al., 2000). Eleven family-level factors are taken into consideration in the model, inclusive of five children’s oral health-related measures (OHSE, KBU, KCOH, OHF, KITCHEN) and six caregiver’s status-related measures (CESD, PARENTSTRESS, SUPPORT, EDU, EMPLOY, RELIGION). Some family-level measures (OHSE, KBU, KCOH) are constructed by averaging the responses to the questionnaire. SUPPORT was created based on availability of social support in five areas: 1 for support from all five areas and 0 for otherwise (Ismail et al., 2009). The items of the questionnaire for each measure (if applicable) are shown in Appendix C (see Table C.1 and 68 Table 4.4: The candidate covariates considered in the analysis of DDHP data Variable Name Variable Description Child-level GENDER WEIGHT BRUSHRATE WIPE WATER WIC HEADSTART Family-level OHSE KBU KCOH OHF KITCHEN CESD PARENTSTRESS SUPPORT EDU Gender of child Weight-for-age percentile Brushing frequency during the preceding week (0 for < 7; 1 for ≥ 7) Frequency of wiping teeth of the child (0=never or rarely; 1=sometimes or usually) Frequency of cleaning teeth of the child with water (0=never or rarely; 1=sometimes or usually) Participating in WIC (0=no; 1=yes) Participating in Head Start (0=no; 1=yes) Caregiver’s score of perception of self-efficacy related to brushing the child’s teeth regularly Caregiver’s score of knowledge of bottle use Caregiver’s score of knowledge of children’s oral hygiene Caregiver’s belief in oral health fatalism (0 = neutral, disagree, or strongly disagree; 1 = agree or strongly agree) Water filter/purifier on kitchen tap (0=no; 1=yes) Caregiver’s depressive symptoms (0=absence; 1=presence) Caregiver’s parenting stress score Social support received by the caregiver (0=low; 1=high) Caregiver’s education attainment (0 = less than high school, 1 = high school diploma or more) Caregiver’s full-time employment status (0 = no, 1 = yes) EMPLOYMENT RELIGIOUSNESS Caregiver’s frequency of attending religious services (1 for less than three times a month; 0 for at least once a week) Community-level DENTIST GROCER CHURCH Number of dentists in the neighborhood Number of grocery stores in the neighborhood Number of churches in the neighborhood 69 Table C.2). The depression scale (CESD) is dichotomized additionally (0 for < 23, 1 for >= 23) to indicate the presence of symptoms by following Siefert et al. (2007). The rest of the factors are also dichotomized: oral health-related fatalism (0 for neutral or disagree, 1 for agree), the level of caregiver’s education (0 for less than high school, 1 for high school diploma or more), Having water filter/purifier on kitchen tap (0 for no, 1 for yes), the employment status (0 for no, 1 for yes), and frequency of attending religious services (0 for less than three times a month, 1 for at least once a week). Lastly, the numbers of dentists, grocery stores and churches in the neighborhood are included as community-level factors, based on geocoding (Tellez et al., 2006). The data analysis results are shown in Table 4.5. Six covariates (BRUSHRATE, WIPE, HEAD- START, KBU, PARENTSTRESS and EDU) were selected in the Cox model by the proposed adaptive Lasso approach. All of their effect directions except WIPE’s and PARENTSTRESS’s make common sense. A possible explanation about the positive effect of WIPE could be that the wiping cloth was dirty so that wiping teeth accelerated the caries development. A possible expla- nation about the negative effect of PARENTSTRESS could be that spending more efforts taking care of the child’s oral health increased the caregiver’s prarenting stress. As can be seen from Table 4.5, the 5%-level Wald tests for each covariate based on the unpenalized nonparametric maximum likelihood estimation would pick the same set of significant variables as the adaptive Lasso. But this test-based variable selection approach suffers from the multiple-testing issue, especially in this case of 21 candidate covariates. 4.7 Discussion In this chapter, we have proposed a new adaptive Lasso approach for left truncated and interval censored data, based on nonparametric maximum likelihood estimation. The simulation study shows that the proposed model led to better and reasonable performance in parameter estimation and variable selection, compared to the other two models. The data application of the proposed model was performed with the mouth-level DDHP data, focusing on the effects of individual-, family- and community-level factors on the development of noncavitated/cavitated lesions in a 70 Table 4.5: The DDHP data analysis results. NPMLE is the coefficient estimate from the nonpara- metric maximum likelihood estimation, and Adaptive Lasso is the coefficient estimate from the proposed shrinkage method. Standard errors are given in the parentheses Variable NPMLE Adaptive Lasso Child-level GENDER WEIGHT BRUSHRATE WIPE WATER WIC HEADSTART -0.039 (0.040) -0.038 (0.039) -0.114 (0.043) 0.160 (0.041) -0.002 (0.041) 0.033 (0.04) -0.112 (0.042) 0 (-) 0 (-) -0.092 (0.087) 0.138 (0.086) 0 (-) 0 (-) -0.087 (0.107) 0 (-) -0.060 (0.043) 0 (-) 0 (-) 0 (-) 0 (-) -0.094 (0.060) 0 (-) -0.080 (0.085) 0 (-) 0 (-) 0 (-) 0 (-) 0 (-) Family-level OHSE KBU KCOH OHF -0.032 (0.042) -0.098 (0.044) 0.058 (0.045) 0.040 (0.041) 0.015 (0.040) -0.003 (0.041) -0.118 (0.044) -0.023 (0.042) -0.094 (0.042) -0.023 (0.040) EMPLOYMENT RELIGIOUSNESS -0.026 (0.045) PARENTSTRESS KITCHEN CESD SUPPORT EDU Community-level DENTIST GROCER CHURCH 0.054 (0.051) 0.040 (0.052) -0.077 (0.052) 71 child’s caries experience trajectory. We consider the support set of Alioum & Commenges (1996) that has positive mass of NPMLE of H when constructing the log-likelihood. This support set is also applicable for the left truncated and interval censored data, which fact allows us to extend the proposed approach to the truncated case with a simple modification of the estimation procedure. In the truncated case, the corresponding covariance matrix can also be estimated using the profile likelihood method (Murphy & van der Vaart, 2000). These approaches, to the best of our knowledge, are new in the literature. They performed well in finite samples, as seen in Table B.3 of Appendix B. The proposed variable selection method can be readily extended to interval censored data with time-dependent covariates whose trajectories are fully observed, e.g., marital status and parity. The asymptotic properties for the extension can be also easily derived based on this chapter and Zeng et al. (2016), which considered fully-observed time-dependent covariates. A more interesting and challenging extension is to the high-dimensional setting, i.e., d is comparable to n or even much larger than n. The modification of the EM algorithm by replacing the penalty term with a Lasso penalty will be applicable. However, it will loose the consistency of the estimators, as being generally discussed in much of the Lasso approaches. 72 APPENDICES 73 APPENDIX A We regress the correlation parameter ρ in the distribution of the bivariate frailty on gender after the Fisher z-transformation, namely, 1 2 log 1 + ρ 1 − ρ = β0 + β1gender. Table A.1 shows the model estimation results based on the heterogeneity in ρ. The further results are available in Table A.2 and Table A.3. 74 Table A.1: The regression parameter estimates for the saturated model based on the DDHP data Gender Molar Type Quadrant Upper right tooth (54) 1st molar Upper left tooth (64) Lower left tooth (74) Male Lower right tooth (84) Upper right tooth (55) 2nd molar Upper left tooth (65) Lower left tooth (75) Lower right tooth (85) Upper right tooth (54) 1st molar Upper left tooth (64) Lower left tooth (74) Female Lower right tooth (84) Upper right tooth (55) 2nd molar Upper left tooth (65) Lower left tooth (75) Lower right tooth (85) Transition 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 0 → 1 1 → 2 Parameter Estimate β1,1,1,01 β1,1,1,12 β1,1,2,01 β1,1,2,12 β1,1,3,01 β1,1,3,12 β1,1,4,01 β1,1,4,12 β1,2,1,01 β1,2,1,12 β1,2,2,01 β1,2,2,12 β1,2,3,01 β1,2,3,12 β1,2,4,01 β1,2,4,12 β2,1,1,01 β2,1,1,12 β2,1,2,01 β2,1,2,12 β2,1,3,01 β2,1,3,12 β2,1,4,01 β2,1,4,12 β2,2,1,01 β2,2,1,12 β2,2,2,01 β2,2,2,12 β2,2,3,01 β2,2,3,12 β2,2,4,01 β2,2,4,12 -0.0216 0.1192 0.3386 0.2177 0.3556 0.2131 1.4635 -0.3788 1.5315 -0.3816 1.4042 -0.3242 1.4407 -0.3054 0.1258 0.0293 0.2306 0.0680 0.1788 0.2740 0.2308 0.2839 1.5512 -0.2991 1.6082 -0.1906 1.8309 -0.2635 1.8726 -0.3576 SE p-value The reference category 0.0118 0.0326 0.0108 0.8506 0.4781 0.0025 0.1709 0.0016 0.1775 0.1147 0.168 0.1118 0.159 0.1125 0.158 0.1075 < 0.0001 0.1504 0.1081 < 0.0001 0.1497 0.1078 < 0.0001 0.1516 0.1076 < 0.0001 0.1495 0.1616 0.1623 0.1607 0.1581 0.1611 0.1583 0.1605 0.158 0.1568 < 0.0001 0.1466 0.1571 < 0.0001 0.1456 0.1575 < 0.0001 0.1453 0.1575 < 0.0001 0.1457 0.0411 0.4361 0.8567 0.1512 0.6672 0.2670 0.0835 0.1504 0.0723 0.1904 0.0413 0.0697 0.0141 Table A.2: The Wald test statistics (p-value) for the three types of symmetry in caries transition intensity stratified by gender and transition type Gender Test for symmetry Transition 0 to 1 1 to 2 Male Female Right vs Left Upper vs Lower 2.74(0.6025) 0.53(0.9709) First molar vs Second molar 538.41(< 0.0001) 47.08(< 0.0001) 5.73(0.2202) 1.66(0.7976) 51.20(0.0004) 21.71(0.0002) 0.72(0.9490) 19.05(0.0008) 1.85(0.7625) First molar vs Second molar 779.47(< 0.0001) Upper vs Lower Right vs Left 75 0 → 1 Male 0 → 1 1 → 2 0 → 1 Female 0 → 1 1 → 2 Lower left second Upper right second Upper left second Lower right second Lower left second Upper right second Upper left second Lower right second Lower left second Lower right first Lower left first Upper right first Upper left first Upper right second Lower right second Upper left second Upper right first Upper left first Lower right first Lower left first Upper right first Upper left first Lower right first Lower left first Upper right first Upper left first Upper right second Lower right second Upper left second Upper right first Upper left first Lower right first Lower left first Upper right first Upper left first Lower right first Lower left first Lower left second Upper right second Upper left second Lower right second Lower left second Upper right second Upper left second Lower right second Lower left second Table A.3: The detailed comparisons associated with the significant non-symmetries in caries transition intensity found in Table A.2 Gender Transition Teeth Compared Wald Test Statistic (p-value) Lower right first Lower left first -3.160 (0.0008) -3.193 (0.0007) 0.238 (0.4060) 1.327 (0.0923) -13.611 (< 0.0001) -14.208 (< 0.0001) -10.381 (< 0.0001) -10.261 (< 0.0001) 2.519 (0.0059) 3.388 (0.0004) 3.815 (0.0001) 3.891 (< 0.0001) -0.986 (0.1597) 0.494 (0.3016) -3.495 (0.0002) -2.427 (0.0076) -14.066 (< 0.0001) -13.693 (< 0.0001) -16.235 (< 0.0001) -16.201 (< 0.0001) 2.428 (0.0076) 2.003 (0.0226) 4.961 (< 0.0001) 4.165 (< 0.0001) 76 APPENDIX B Table B.1: The variable selection percentages of our method in the truncated case n X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 200 0.996 0.986 0.063 0.062 0.064 0.070 0.075 0.087 0.988 0.993 400 0.031 0.039 0.035 0.033 0.031 0.042 1 1 1 1 Table B.2: Average numbers of correct and incorrect zero coefficients and mean squared errors (MSE) of the coefficient estimators for our method in the truncated case. In the parentheses are the ideal numbers of correct and incorrect zero coefficients n Correct (6) Incorrect (0) MSE 200 400 5.579 5.789 0.037 0 0.0877 0.0347 77 Table B.3: Simulation results on the estimation of the non-zero coefficients in the truncated case. The oracle method is the unpenalized nonparametric maximum likelihood estimation with only the covariates whose coefficients are non-zero, i.e., X1, X2, X9 and X10 Method Oracle β1 = 0.5 β2 = 0.5 β9 = 0.5 β10 = 0.5 Our method β1 = 0.5 β2 = 0.5 β9 = 0.5 β10 = 0.5 n = 200 n = 400 Estimate Coverage Empirical SE Mean of SEs Estimate Coverage Empirical SE Mean of SEs 0.542 0.546 0.545 0.539 0.479 0.480 0.478 0.475 0.968 0.955 0.958 0.960 0.934 0.942 0.945 0.931 0.128 0.129 0.129 0.129 0.143 0.148 0.150 0.145 0.134 0.134 0.134 0.133 0.137 0.147 0.147 0.136 0.542 0.546 0.545 0.539 0.495 0.492 0.496 0.495 0.946 0.948 0.948 0.937 0.944 0.958 0.953 0.942 0.086 0.086 0.088 0.088 0.091 0.093 0.096 0.092 0.089 0.089 0.089 0.089 0.091 0.098 0.098 0.091 78 b^ 1 b^ 2 s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 Theoretical Quantiles Theoretical Quantiles b^ 9 b^ 10 s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 s e l i t n a u Q e p m a S l s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 Theoretical Quantiles Theoretical Quantiles (a) n = 200 b^ 1 b^ 2 s e l i t n a u Q e p m a S l s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 Theoretical Quantiles Theoretical Quantiles b^ 9 b^ 10 s e l i t n a u Q e p m a S l 1.0 0.8 0.6 0.4 0.2 0.0 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 Theoretical Quantiles Theoretical Quantiles (b) n = 400 Figure B.1: Normal Q-Q plots of the proposed estimators for the non-zero coefficients in the truncated case 79 APPENDIX C Table C.1: Questionnaire items for the family-level measures (OHSE, KBU, KCOH) Questionnaire items Oral health self-efficacy (OHSE): 1 = not at all confident, 4 = very confident (Q1 - Q9) How confident are you that the child’s teeth will get brushed when: - you are stressed - you are depressed - you are anxious - you are too busy - you are tired - you are worried - child is crying - child does not stay still - child do not feel like brushing right now Knowledge of bottle use (KBU): 1 = strongly agree, 4 = strongly disagree (Q1 - Q3) Putting a baby to bed with a bottle helps the child: - to be better fed - sleep better - to gain weight and grow Q4. There is nothing wrong with putting a baby to bed with a bottle Knowledge of children’s oral hygiene (KCOH): 1 = strongly agree, 5 = strongly disagree Q1. Cavities in baby teeth don’t matter since they fall out anyway Q2. Keeping baby teeth clean is not very important; after all, they fall out Q3. There is not much I can do to stop the child from developing dental cavities Q4. There is not much I can do to help the child have healthy teeth Q5. Children don’t need to brush every day until they get their permanent teeth Q6. Children don’t really need their own toothbrush until all their teeth are in 80 Table C.2: Questionnaire items for the family-level measures (OHF, SUPPORT) Questionnaire items Oral health-related fatalism (OHF): 1 = strongly disagree, 5 = strongly agree Q1. Most children eventually develop dental cavities Social support (SUPPORT): 1 = yes, 0 = no (Q1 - Q5). Is there someone you could count on to: - run errand for you if you needed them to - lend you some money if you really needed it in a time of financial crisis - give you encouragement and reassurance if you really needed it - watch your (child/children) for you if you needed them to - lend you a car or give you a ride if you needed them 81 BIBLIOGRAPHY 82 BIBLIOGRAPHY Alioum, Ahmadou & Daniel Commenges. 1996. A proportional hazards model for arbitrarily censored and truncated data. Biometrics 52(2). 512–524. Batchelor, Paul A & Aubrey Sheiham. 2004. Grouping of tooth surfaces by susceptibility to caries: a study in 5–16 year-old children. BMC Oral Health 4(1). 2. Benjamin, Regina M. 2010. Oral health: the silent epidemic. Public health reports 125(2). 158–159. Bogaerts, Kris, Roos Leroy, Emmanuel Lesaffre & Dominique Declerck. 2002. Modelling tooth emergence data based on multivariate interval-censored data. Statistics in medicine 21(24). 3775–3787. Breiman, Leo. 1996. Heuristics of instability and stabilization in model selection. Ann. Statist. 24(6). 2350–2383. Brumback, Babette A & John A Rice. 1998. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association 93(443). 961–976. Cai, Tianxi & Rebecca A Betensky. 2003. Hazard regression for interval-censored data with penalized spline. Biometrics 59(3). 570–579. Cook, Richard J & Jerald F Lawless. 2014. Statistical issues in modeling chronic disease in cohort studies. Statistics in Biosciences 6(1). 127–161. Diabetes Control and Complications Trial Research Group. 1995. Progression of retinopathy with intensive versus conventional treatment in the diabetes control and complications trial. Ophthalmology 102. 647–661. Edelstein, Burton L. & Courtney H. Chinn. 2009. Update on disparities in oral health and access to dental care for America’s children. Academic Pediatrics 9(6). 415 – 419. Fan, Jianqing & Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456). 1348–1360. Fan, Jianqing & Runze Li. 2002. Variable selection for cox’s proportional hazards model and frailty model. Ann. Statist. 30(1). 74–99. Fisher-Owens, Susan A, Stuart A Gansky, Larry J Platt, Jane A Weintraub, Mah-J Soobader, Influences on children9s oral health: A Matthew D Bramlett & Paul W Newacheck. 2007. conceptual model. Pediatrics 120(3). e510–e520. Goggins, William B & Dianne M Finkelstein. 2000. A proportional hazards model for multivariate interval-censored failure time data. Biometrics 56(3). 940–943. 83 Ha, Il Do, Florin Vaida & Youngjo Lee. 2016. Interval estimation of random effects in proportional hazards models with frailties. Statistical Methods in Medical Research 25(2). 936–953. Harville, David A. 1974. Bayesian inference for variance components using only error contrasts. Biometrika 383–385. Huang, Jian & Jon A Wellner. 1997. Interval censored survival data: a review of recent progress. In Proceedings of the first seattle symposium in biostatistics, 123–169. Springer. Hui, Francis K. C., David I. Warton & Scott D. Foster. 2015. Tuning parameter selection for the adaptive lasso using eric. Journal of the American Statistical Association 110(509). 262–269. Ismail, A.I., W. Sohn, S. Lim & J.M. Willem. 2009. Predictors of dental caries progression in primary teeth. Journal of Dental Research 88(3). 270–275. Ismail, Amid I., Lim Sungwoo & Sohn Woosung. 2011. A transition scoring system of caries increment with adjustment of reversals in longitudinal study: evaluation using primary tooth surface data. Community Dentistry and Oral Epidemiology 39(1). 61–68. Joly, Pierre, Thomas A. Gerds, Vibeke Qvist, Daniel Commenges & Niels Keiding. 2012. Esti- mating survival of dental fillings on the basis of interval-censored data and multi-state models. Statistics in Medicine 31. 1139–1149. Kalbfleisch, John D & Ross L Prentice. 2002. The statistical analysis of failure time data. John Wiley & Sons. Kandelman, D. P. & D. W. Lewis. 1988. Pit and fissure sealants. Preventive Dental Services second edition. Ottawa: Minister of Supply and Services Canada.[Cat. No. H 39-4/1988E] 13–31. Kauermann, Göran, Tatyana Krivobokova & Ludwig Fahrmeir. 2009. Some asymptotic results on generalized penalized spline smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(2). 487–503. Kim, Mimi Y & Xiaonan Xue. 2002. The analysis of multivariate interval-censored survival data. Statistics in Medicine 21(23). 3715–3726. Komárek, Arnost & Emmanuel Lesaffre. 2008. Bayesian accelerated failure time model with multivariate doubly interval-censored data and flexible distributional assumptions. Journal of the American Statistical Association 103(482). 523–533. Komárek, Arnost, Emmanuel Lesaffre, Tommi T Härkänen, Dominique Declerck & Jorma I. Virta- nen. 2005. A bayesian analysis of multivariate doubly-interval-censored dental data. Biostatistics 6 1. 145–55. Kor, Chew-Teng, Kuang-Fu Cheng & Yi-Hau Chen. 2013. A method for analyzing clustered interval-censored data based on cox’s model. Statistics in medicine 32(5). 822–832. Kuczmarski, Robert J. 2000. Cdc growth charts; united states . 84 Lee, Youngjo & John A Nelder. 1996. Hierarchical generalized linear models. Journal of the Royal Statistical Society. Series B (Methodological) 619–678. Lewsey, J. D. & W. M. Thomson. 2004. The utility of the zero-inflated poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal dmf data examining the effect of socio-economic status. Community Dentistry and Oral Epidemiology 32(3). 183– 189. Lu, Wenbin & Hao H. Zhang. 2007. Variable selection for proportional odds model. Statistics in Medicine 26(20). 3771–3781. Mouradian, WE, E Wehr & JJ Crall. 2000. Disparities in children’s oral health and access to dental care. JAMA 284(20). 2625–2631. Murphy, S. A. & A. W. van der Vaart. 2000. On profile likelihood. Journal of the American Statistical Association 95(450). 449–465. Murphy, Susan A. & Aad W. Van Der Vaart. 1999. Observed information in semi-parametric models. Bernoulli 5(3). 381–412. O’Sullivan, F. 1988. Fast computation of fully automated log-density and log-hazard estimators. SIAM Journal on Scientific and Statistical Computing 9(2). 363–379. Pinheiro, José C & Douglas M Bates. 1995. Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of computational and Graphical Statistics 4(1). 12–35. Rahman, P, D D Gladman, R J Cook, Y Zhou, G Young & D Salonen. 1998. Radiological assessment in psoriatic arthritis. Rheumatology 37. 760–765. Schick, Anton & Qiqing Yu. 2000. Consistency of the gmle with mixed case interval-censored data. Scandinavian Journal of Statistics 27(1). 45–55. Selwitz, Robert H, Amid I Ismail & Nigel B Pitts. 2007. Dental caries. The Lancet 369(9555). 51–59. Siefert, Kristine, David R Williams, Tracy L Finlayson, Jorge Delva & Amid I Ismail. 2007. Modifiable risk and protective factors for depressive symptoms in low-income african american mothers. American Journal of Orthopsychiatry 77(1). 113–123. Simon, Noah, Jerome Friedman, Trevor Hastie & Rob Tibshirani. 2011. Regularization paths for cox’s proportional hazards model via coordinate descent. Journal of statistical software 39(5). 1. Stone, Charles J. 1982. Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10(4). 1040–1053. Sutradhar, Rinku & Richard J Cook. 2008. Analysis of interval-censored data from clustered multistate processes: application to joint damage in psoriatic arthritis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 57. 553–566. 85 Tellez, Marisol, Woosung Sohn, Brian A Burt & Amid I Ismail. 2006. Assessment of the relationship between neighborhood characteristics and dental caries severity among low-income african- americans: a multilevel approach. Journal of public health dentistry 66(1). 30–36. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58(1). 267–288. Tibshirani, robert. 1997. The lasso method for variable selection in the cox model. Statistics in Medicine 16(4). 385–395. Tommi, Harkanen, Virtanen Jorma I. & Arjas Elja. 2000. Caries on permanent teeth: A non- parametric bayesian analysis. Scandinavian Journal of Statistics 27(4). 577–588. US Department of Health and Human Services. 2000. National Institute of Dental and Craniofacial Research. Oral Health in America: A Report of the Surgeon General. Washington, DC: May . Vargas, Clemencia M., James J. Crall & Donald A. Schneider. 1998. Sociodemographic distribution of pediatric dental caries: NHANES III, 1988-1994. The Journal of the American Dental Association 129. 1229–1238. Wu, Ying & Richard J. Cook. 2015. Penalized regression for interval-censored times of disease progression: Selection of hla markers in psoriatic arthritis. Biometrics 71(3). 782–791. Zeng, Donglin, Fei Gao & D. Y. Lin. 2017. Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104(3). 505–525. Zeng, Donglin, Lu Mao & D. Y. Lin. 2016. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103(2). 253–271. Zhang, Hao Helen & Wenbin Lu. 2007. Adaptive lasso for cox’s proportional hazards model. Biometrika 94(3). 691–703. Zhang, Yang, D. Todem, K. Kim & Emmanuel Lesaffre. 2011. Bayesian latent variable models for spatially correlated tooth-level binary data in caries research. Statistical modelling 11(1). 25–47. Zhang, Yiyun, Runze Li & Chih-Ling Tsai. 2010. Regularization parameter selections via general- ized information criterion. Journal of the American Statistical Association 105(489). 312–323. Zhang, Zhigang & Jianguo Sun. 2010. Interval censoring. Statistical Methods in Medical Research 19(1). 53–70. PMID: 19654168. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476). 1418–1429. 86