A Hierarchical Bayesian Approach to Model Spatially Correlated Binary Data with Applications to Dental Research

This is to certify that the dissertation entitled "A Hierarchical Bayesian Approach to Model Spatially Correlated Binary Data with Applications to Dental Research", presented by Yanwei Zhang, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major Professor's Signature
Date: 06/03/08

A HIERARCHICAL BAYESIAN APPROACH TO MODEL SPATIALLY CORRELATED BINARY DATA WITH APPLICATIONS TO DENTAL RESEARCH

By
Yanwei Zhang

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics
2008

ABSTRACT

Statistical analysis of multivariate binary data measured repeatedly in time or cross-sectionally clustered in space raises a number of challenges beyond the difficulties posed by the non-continuous nature of the data. For instance, dental data from the oral health research community are always discrete, clustered spatially and repeated in time. Researchers are interested in the risk factors and in the spatial symmetry property of caries prevalence and incidence. It is widely believed, for example, that caries outcomes adjacent to each other are highly likely to be correlated, which necessitates the use of methodologies for correlated discrete data. The generalized estimating equation (GEE) based approach might help answer research questions about the marginal mean and pairwise association of correlated units of interest.
When association among units is of primary research concern, GEE suffers from a serious loss of efficiency. The search for methodologies for analyzing multivariate categorical data clustered in space, with both the marginal mean and the association being of research interest, needs to continue. In this thesis, we introduce complete likelihood based approaches for analyzing spatially correlated binary data. Specifically, we discuss a class of methods that explicitly take some unique spatial structure features into consideration for valid and efficient inference at the tooth level. Furthermore, we propose different models that use latent variables at hierarchical levels to account for the spatial dependence of the data from different points of view. The hierarchical structure of the model and the local identifiability of latent variable models make statistical inference appropriate within the Bayesian framework through an MCMC based posterior sampling algorithm. The performances of the different models are compared under a Bayesian model selection criterion (DIC) for missing data problems. Finally, we give Bayesian hypothesis tests for the spatial symmetries of caries incidence by providing simultaneous credible regions for the differences of quantities that measure spatial association strength. The methodology is illustrated using dental data from the Signal Tandmobiel (STM) project.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor Dr. David Todem for his valuable advice and continuing encouragement during my years as a research assistant at the Department of Epidemiology. His help and expectations made my academic life challenging but rewarding. I am also grateful for the financial support from his funded project. Dr. Todem's hard work, dedication, and creativity have inspired me, and will always inspire me, to achieve my career goal. Appreciation is extended to Dr. Ramamoorthi and Dr.
Gardiner for making it possible for me to become an applied statistician who is Bayesian in principle. I would like to thank Dr. Ramamoorthi, who, in the first year of my studies in East Lansing, helped me get through the toughest time in my life so far. Dr. Gardiner gave me advice that helped me make the transition from a mathematician to an applied statistician. I would also like to thank Dr. Melfi and Dr. Cui for being my thesis committee members. Special thanks to my parents; they are the greatest parents I could possibly ask for. In particular, my father has always been supportive of my efforts to carry out my career goals. It was my father who tried as much as possible to make my education financially possible. I dedicate this work to all of you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
  1.1 Background and objectives
  1.2 Principles for the analysis
  1.3 Outline of the thesis
2 Bayesian Generalized Latent Variable Models
  2.1 Introduction
  2.2 The Spatial Dependence Structures
    2.2.1 Notation
    2.2.2 Principles of our modeling approach
  2.3 Models
    2.3.1 Generalized latent variable model
  2.4 Bayesian Estimations and Statistical Inference
    2.4.1 Identifiability of the models
    2.4.2 Prior distributions
    2.4.3 Posterior computations
    2.4.4 Missing data issue
    2.4.5 Bayesian Model Selection
    2.4.6 Spatial symmetry hypothesis testing
    2.4.7 Example
  2.5 The Signal Tandmobiel Project Example
    2.5.1 Primary results
    2.5.2 The results from our approach
  2.6 Discussion
3 Bayesian Finite Mixture of Generalized Latent Variable Models
  3.1 Introduction
  3.2 The Spatial Dependence Structures
    3.2.1 Notation
    3.2.2 Principles of our modeling approach
  3.3 Models
    3.3.1 Bayesian Mixture Models
    3.3.2 Response Models
    3.3.3 The Structure Model for Latent Variables
  3.4 Bayesian Estimations and Statistical Inference
    3.4.1 Identifiability of the models
    3.4.2 Prior distributions
    3.4.3 Posterior computations
    3.4.4 Bayesian Model Selection
  3.5 The Signal Tandmobiel Project Example
    3.5.1 Primary results
    3.5.2 The results from our approach
  3.6 Discussion
4 Discussion and Future Research
  4.1 Bayesian generalized latent variable models
  4.2 Bayesian mixture of generalized latent variable models
  4.3 Missing data
  4.4 Comparison between frequentist and Bayesian
APPENDICES
A The First Appendix
  A.1 WinBUGS code one for BGLVM
  A.2 WinBUGS code two for BGLVM
B The Second Appendix
  B.1 WinBUGS code one for BMGLVM
  B.2 WinBUGS code two for BMGLVM
BIBLIOGRAPHY

LIST OF TABLES

2.1 Prevalence of caries experience (% affected) in the deciduous dentition of 7-year-old children (n = 4,351).
2.2 Odds ratios and 95% confidence intervals for the 2x2 association models for caries on deciduous molars on tooth in 7-year-old children.
2.3 Credible intervals of overall spatial association strength comparisons based on UGGM with unstructured covariance structure.
2.4 Credible intervals of overall spatial association strength comparisons based on UGGM with CAR model based covariance structure.
2.5 Credible intervals of specific spatial association strength comparisons based on UGGM with unstructured covariance structure.
2.6 Credible intervals of specific spatial association strength comparisons based on UGGM with CAR model based covariance structure.
3.1 Prevalence of caries experience (% affected) in the deciduous dentition of 6,7,8-year-old children (n = 4,351).
3.2 Odds ratios and 95% confidence intervals for the 2x2 association models for caries on deciduous molars on tooth in 7-year-old children.
3.3 Credible intervals of spatial association strength comparisons based on BGLVMs and UGGM with unstructured covariance structure.
3.4 Credible intervals of spatial similarity comparisons based on mixture model with 2 components and UGGM with unstructured covariance structure.
3.5 Credible intervals of spatial similarity comparisons based on mixture model with 2 components and UGGM with CAR model based covariance structure.
3.6 Credible intervals of spatial similarity comparisons based on mixture model with 3 components and UGGM with unstructured covariance structure.
3.7 Credible intervals of spatial similarity comparisons based on mixture model with 3 components and UGGM with CAR model based covariance structure.

LIST OF FIGURES

2.1 The response vectors $y_{i1}$, $y_{i2}$, $y_{i3}$ and $y_{i4}$ are tied together by the spatial latent vector $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.
2.2 The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied together by the
spatial latent vector $T_{ij} = (T_{i,1(j)}, T_{i,2(j)}, T_{i,3(j)}, T_{i,4(j)}, T_{i,5(j)})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.
2.3 Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied together by the spatial latent vector $T_{ij} = (T_{i,1(j)}, T_{i,2(j)}, T_{i,3(j)}, T_{i,4(j)}, T_{i,5(j)})'$, whose joint distribution is given by a UGGM with precision matrix under the CAR model assumption.
3.1 The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are allocated to the $m$th cluster and tied together by the spatial latent vector $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.
3.2 Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are allocated to the $m$th cluster and tied together by the spatial latent vector $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM with precision matrix under the CAR model assumption.

CHAPTER 1
Introduction

1.1 Background and objectives

In biomedical studies, it is common for a binary disease outcome to be measured either repeatedly across time or cross-sectionally across spatial locations. The motivating example for this research comes from dental research. The caries status of each tooth is evaluated as a binary outcome, with 1 indicating the presence of caries and 0 otherwise. Caries prevalence and incidence are suspected to have a certain spatial symmetry property in terms of the configuration of the quadrants within the mouth, a view widely held by dentists in practice. It is well known, for example, that dental caries outcomes adjacent to one another are likely to be correlated. Specifically, there are four quadrants within the mouth and all the quadrants are believed to be correlated with one another.
Within each quadrant, adjacent teeth are also likely to be correlated, and the correlation might be affected by the quadrant. Hence, methodologies for correlated data are needed to analyze dental data.

When a patient first visits a dentist, either for a check-up or for a more serious dental issue, the dentist will normally conduct a full examination to gain an understanding of the patient's overall dental health as well as the patient's particular dental problem(s), if any. Because of the complexity and diversity of dental issues and the numerous teeth involved, it is difficult for dental health researchers to analyze the dental data except in a most general and superficial way with respect to quadrant, tooth position, age, sex, geographical region, etc. In dental practice, it is of interest to find patterns of caries among the teeth, which will help the dentist examine patients' oral health efficiently and provide informative guidance for caries intervention. Researchers have been working on different methodologies for analyzing dental data to address questions about caries incidence patterns. The traditional method for analyzing dental data is based on the number of Decayed/Missing/Filled Surfaces (DMFS) or Decayed/Missing/Filled Teeth (DMFT), introduced by Klein et al. (1938). DMFT and DMFS roughly express caries prevalence numerically and are obtained by counting the number of Decayed (D), Missing (M) and Filled (F) teeth (T) or surfaces (S) within the mouth. The DMFT evaluation method is a well-known technique and has been used for many years to analyze the effects of variables, such as fluoride, on the dental health of given populations. This approach operates at the mouth level, which is not informative, in terms of caries pattern, to dentists and patients for oral health examination and caries interventions.
Dentists and patients are really interested in spatial symmetry patterns of caries. For example, if caries is found on one specific tooth within a specific quadrant, which tooth is most likely to develop caries next? If dentists have some information about the spatial symmetry of caries, they may efficiently locate or predict the next tooth at high risk of developing caries, and dentists and patients may then be able to pay more attention to the teeth at high risk. Due to the spatial configuration of the quadrants and of the teeth within each quadrant, the nature of the data requires methodology for correlated binary data.

Lesaffre et al. (2006) proposed several methods to analyze the dental data from the Signal Tandmobiel (STM) project. Their approach was based on the Generalized Estimating Equation (GEE) (Zeger and Liang, 1986) to deal with correlated data. Lesaffre's approach used a logistic regression framework to model marginal caries incidence, with an exchangeable working correlation matrix to account for the dependence in the data. Their GEE based approach is not able to capture the special correlation structure among quadrants and among teeth within the same quadrant. Roy (2006) proposed a model-based approach for imputing missing values. His method exploited the spatial correlation among teeth without considering the different strengths of spatial dependence among quadrants. Vanobbergen et al. (2007) proposed an ALR (Alternating Logistic Regression) (Carey et al., 1993) approach to investigate spatial correlation with respect to caries patterns in the primary dentition of 7-year-old children. At the population level, symmetry in the prevalence of caries experience across the midline was tested at the tooth and tooth surface levels under the ALR model. ALR simultaneously models the marginal expectation of each binary variable as well as the association between pairs of outcomes using GEE. Liang et al.
(1992) showed that GEE estimates can be reasonably efficient only when the covariance structure of the response variables is correctly specified. Meanwhile, ALR models have convergence issues when the cluster size is large.

GEE based logistic regression models and ALR models are both marginal models, which means they do not account for the heterogeneity and dependence among quadrants and among teeth nested within the corresponding quadrants. The estimates of the fixed-effect parameters of interest are consistent, but they might be inefficient and seriously biased. The GEE based approach, as a distribution-free methodology, does not lend itself to classical tools for model checking. GEE is based on the first-order moment, and ALR tries to model higher-order moments of the data while still focusing only on pairwise association, without modeling the joint relationship among the observations. More importantly, it is infeasible to address the spatial symmetry of association strength among quadrants and among the teeth within the corresponding quadrants, since all these higher-order moment characteristics are unobserved. Hence, the search for alternative solutions continues.

A valid and efficient joint model for the spatially correlated binary dental data incorporates latent variables to induce the dependence structure among quadrants and the nested dependence structure among teeth within the corresponding quadrants. The latent variables can also generate flexible multivariate distributions for the binary dental data. Without an obvious multivariate distribution for multivariate spatially correlated binary data, a joint model that accounts for the nature of the data is not straightforward. Another way to model the dental data is to use mixture models. Specifically, we can view the distribution of the caries status of the tooth of interest as a mixture of Bernoulli distributions with different probabilities of success.
The probability of caries incidence is modeled by a logistic regression model that takes the design structure, namely quadrant and tooth position within the corresponding quadrant, into account. Generalized latent variable models and mixture models allow factorization of the joint distribution of the multivariate correlated binary data into a product of conditional distributions, given the latent variables and allocation random variables that induce the unobserved heterogeneity and dependence structures among the observations.

The objective of this thesis is to develop a new methodology for complex, likelihood based analysis of multivariate spatially correlated binary caries experience from dental data, which can help us examine the spatial symmetry of the quadrants and the association strength among teeth within each specific quadrant. In this thesis, we propose the Bayesian generalized latent variable model (BGLVM) and the Bayesian mixture of generalized latent variable models (BMGLVM) to give flexible multivariate distributions for the spatially correlated binary dental data, with the dependence structure induced by the latent variables. BGLVM and BMGLVM are specified from the frequentist point of view but implemented under the Bayesian framework. The BGLVM uses a logistic regression model, giving a flexible multivariate distribution for the dental data, with two levels of latent variables inducing the dependence structure for the corresponding level of spatial configuration. For the BGLVM, the dependence structures among quadrants and among teeth nested within quadrants are induced by latent variable models whose covariance structure is modeled by an undirected graphical Gaussian model or a conditional autoregressive model. For the BMGLVM, the dependence among quadrants is induced by the weights of the components of the mixture model, and the dependence among teeth within the same quadrant is induced by a generalized latent variable model in the same way as in the BGLVM.
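The mixture-of-Bernoulli view described in this section can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the function names, the two component weights and the two linear predictors below are hypothetical values chosen for the example, not quantities estimated in the thesis.

```python
import numpy as np

def expit(x):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def mixture_caries_probability(weights, linear_predictors):
    """Marginal P(y = 1) for one tooth under a finite mixture of Bernoullis.

    weights: mixing proportions (non-negative, summing to one).
    linear_predictors: one logistic linear predictor per mixture component.
    """
    weights = np.asarray(weights, dtype=float)
    component_probs = expit(np.asarray(linear_predictors, dtype=float))
    return float(np.dot(weights, component_probs))

# Hypothetical example: a low-risk and a high-risk component.
weights = [0.7, 0.3]
linear_predictors = [-2.0, 1.0]
prob = mixture_caries_probability(weights, linear_predictors)
```

The marginal caries probability is a weighted average of the component probabilities, so it always lies between the smallest and the largest of them.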

1.2 Principles for the analysis

Our approach to modeling the multivariate spatially correlated dental data is based on latent variables, which are incorporated into the likelihood based model to generate flexible multivariate distributions for the observations and to induce multilevel dependence structures arising from unobserved heterogeneity in the complex structure of the multivariate correlated binary data. Specifically, two levels of random vectors are introduced into the model via latent variables, which are used to induce spatial dependence structures among subunits at their corresponding level and to generate flexible distributions for the subunits. The joint distribution for each of the two levels of latent variables is given by an Undirected Graphical Gaussian Model (UGGM) (Dempster, 1972; Giudici and Green, 1999) with respect to the different spatial configurations of the subunits at the corresponding level. The first level of spatial dependence is the spatial association among the four quadrants. The four quadrants are spatially adjacent and coexist in the same oral biological environment, which makes them correlated through some unobserved structure. The second level of spatial dependence is the spatial association among the teeth within the same quadrant. It is reasonable to believe that teeth adjacent to one another are likely to be correlated. At the same time, the oral biological environment is complicated enough that associations exist not only between adjacent teeth but also with other teeth in the same quadrant. The UGGMs for the latent variables will be based on two kinds of precision matrices: one is unstructured and the other is Markovian, based on the CAR model (Cressie, 1991).
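To make the Markovian precision structure concrete, the following sketch builds a CAR-type precision matrix for five teeth arranged in a row. It assumes the common "proper CAR" parameterization $\tau (D - \rho W)$, with $W$ the adjacency matrix and $D$ the diagonal matrix of neighbour counts; this parameterization, the chain-shaped neighbourhood, and the values of `tau` and `rho` are assumptions made for illustration and may differ from the specification used later in the thesis.

```python
import numpy as np

def chain_adjacency(n_teeth):
    """Adjacency matrix for teeth in a row: tooth k neighbours k-1 and k+1."""
    W = np.zeros((n_teeth, n_teeth))
    for k in range(n_teeth - 1):
        W[k, k + 1] = 1.0
        W[k + 1, k] = 1.0
    return W

def car_precision(W, tau=1.0, rho=0.9):
    """Proper CAR precision tau * (D - rho * W), D = diag(neighbour counts).

    For |rho| < 1 and no isolated sites, the matrix is strictly diagonally
    dominant and hence positive definite.
    """
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

W = chain_adjacency(5)
P_car = car_precision(W)          # zeros wherever two teeth are not adjacent
Sigma_car = np.linalg.inv(P_car)  # implied covariance matrix
```

Zeros in the precision matrix encode conditional independence of non-adjacent teeth given the rest, while the implied covariance matrix is dense, so non-adjacent teeth remain marginally associated. An unstructured precision matrix, by contrast, leaves every off-diagonal entry free.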
In this thesis, we try to combine the merits of the frequentist and Bayesian approaches in model formulation and implementation. Specifically, the design structure based models are formulated within the frequentist framework so that the marginal identifiability of the model is taken into account. The latent variables are incorporated hierarchically in the graphical structure of the Bayesian model, and the models are implemented following Bayesian principles. Since our models are based on a latent variable approach, local identifiability and model complexity raise many technical problems within the frequentist framework: for example, the computational feasibility of the optimization, singularity of the information matrix, and the accuracy and computational feasibility of approximating high dimensional integrals using either adaptive Gaussian quadrature or MCMC based approaches. The Bayesian approach provides a way to avoid these technical concerns by using Gibbs sampling to obtain the posterior distributions of the quantities of interest. We use noninformative priors for the parameters of interest, so that posterior inference does not rely on subjective prior information and gives results comparable with frequentist ones as the sample size increases. Meanwhile, we use independent proper conjugate priors for the parameters of interest, which ensures the validity of the posterior samples obtained by Gibbs sampling and improves the convergence of the MCMC based posterior sampling algorithm. More importantly, the Bayesian approach can be helpful in complex modeling situations where a frequentist analysis is difficult or does not exist. Lee and Song demonstrated better performance of a Bayesian approach in small samples compared with ML estimation. Frequentist results rely on asymptotic arguments, whereas Bayesian inference is feasible as long as the posterior sampling algorithm converges, which can usually be achieved by running a large number of MCMC iterations.
All inferences will be based on credible intervals within the Bayesian framework and implemented in WinBUGS. The appropriate model will be chosen by a formal Bayesian model selection criterion based on the DIC for missing data problems (Celeux et al., 2006).

1.3 Outline of the thesis

In Chapter 2, we systematically describe the principles of generalized latent variable approaches for jointly modeling correlated discrete data. We also describe the generalized latent variable model within the Bayesian framework for analyzing the dental data from the STM project. We use multivariate spatial latent variables at both the quadrant level and the tooth-nested-within-quadrant level to model a very flexible multivariate distribution for the binary vectors and to induce spatial dependence among teeth through the dependence structure of the spatial latent vectors in the generalized linear model setting. The joint relationship among the spatial latent variables is modeled under the undirected graphical model and the conditional autoregressive model, respectively. Model fitting and statistical inference about the parameters of interest are carried out under the Bayesian framework.

In Chapter 3, we describe the finite mixture model within the Bayesian framework for analyzing the dental data from the STM project. We use a Dirichlet process to model the mixing proportions, and multivariate spatial latent variables to model a very flexible multivariate distribution for each mixture component and to induce spatial dependence among teeth through the dependence structure of the spatial latent vectors in the latent variable model setting. The joint relationship among the spatial latent variables is modeled under the undirected graphical model and the conditional autoregressive model, respectively. Model fitting and statistical inference about the parameters of interest are carried out under the Bayesian framework.

In Chapter 4, we summarize our work and outline some directions for future work.

CHAPTER 2
Bayesian Generalized Latent Variable Models

2.1 Introduction

Dental caries is a common oral disease that results in demineralization of the tooth. In oral health research, the number of Decayed/Missing/Filled Surfaces (DMFS) or Decayed/Missing/Filled Teeth (DMFT), introduced by Klein et al. (1938), is often analyzed. The two scores are the sums of binary indicators of caries on the teeth and tooth surfaces for the primary dentition. This approach operates at the mouth level. Leroux et al. (2006) noted that dental data present a unique set of challenges for statistical analysis, including large cluster sizes, multilevel data structures (e.g., teeth within patients, sites or surfaces within teeth), and complex correlation structures. Lesaffre et al. (2006) proposed several methods to analyze the dental data from the Signal Tandmobiel (STM) project. They used a GEE based logistic model and a log-linear model to model the marginal mean, with an exchangeable working correlation matrix to account for the dependence in the data. Vanobbergen et al. (2007) proposed an ALR (Alternating Logistic Regression) approach to investigate spatial correlation with respect to caries activity patterns in the primary dentition of 7-year-old children. ALR simultaneously models the marginal expectation of each binary variable as well as the association between pairs of outcomes. Zhu et al. (2005) proposed a generalized latent variable model framework to analyze multivariate spatially correlated data, which provided an appropriate approach to complex spatially correlated data with large cluster sizes and multilevel data structures. Their approach is sensitive to Euclidean space and cannot accommodate the multilevel dependence structure of the dental data. More importantly, their method is EM based and implemented via MCMC, which is computationally intensive for posterior sampling of high dimensional correlated latent variables and does not yield the Fisher information matrix as a byproduct.
The purpose of this article is to introduce a Bayesian Generalized Latent Variable Model (BGLVM) framework for general spatial topology structures to explain multilevel correlations introduced by "between-cluster" and "within-cluster" random effects. Specifically, the "between-cluster" random effects are used to induce dependence among quadrants, and the "within-cluster" random effects are used to induce dependence among teeth within the same quadrant. The BGLVM, implemented using Gibbs sampling with noninformative priors, allows us to model the "between-cluster" and "within-cluster" correlation structures explicitly. This makes it possible to examine the spatial symmetry of the quadrants in terms of caries incidence, and to capture the special spatial association structure between quadrants for the same subject and among teeth within quadrants, which can help us characterize caries incidence efficiently at the tooth level.

2.2 The Spatial Dependence Structures

2.2.1 Notation

To model the observations, let $y_{ijk}$ denote the $k$th response variable within the $j$th cluster of the $i$th subject of interest, where $k = 1, \dots, K$; $j = 1, \dots, J$; $i = 1, \dots, n$. Let $y_{ij} = (y_{ij1}, \dots, y_{ijk}, \dots, y_{ijK})'$ denote the response vector within the $j$th cluster of the $i$th subject. Let $y_i = (y_{i1}', \dots, y_{ij}', \dots, y_{iJ}')'$ denote the collection of response variables of the $i$th subject. Let $y = (y_1', \dots, y_i', \dots, y_n')'$ denote the collection of response variables of all subjects in this study.

For modeling the latent variables, we use the undirected graphical Gaussian model. Let $Q_i = (Q_{i1}, \dots, Q_{ij}, \dots, Q_{iJ})'$ denote the latent variables at cluster level for the $i$th subject, where $i = 1, \dots, n$. Let $T_{ij} = (T_{ij1}, \dots, T_{ijk}, \dots, T_{ijK})'$ denote the intermediate level latent variables that are nested within the $j$th cluster associated with the $i$th subject. Let $T_i = (T_{i1}', \dots, T_{ij}', \dots, T_{iJ}')'$ denote the collection of all latent variables at intermediate level associated with the $i$th subject in the study.
Let $L_i = (Q_i', T_i')'$ denote the collection of latent variables at both levels associated with the $i$th subject.

2.2.2 Principles of our modeling approach

The dental data show a two-level spatial association structure. The first level is the spatial association among quadrants (V)-(VIII). For convenience of indexing the data, we will use quadrant (1) instead of quadrant (V), with corresponding indices for the other quadrants. The second level, nested within the corresponding quadrant, is the spatial correlation among teeth.

In general, valid approaches for analyzing correlated data without an explicit multivariate distribution are based on either GEE or random effect models. The former is suitable for statistical problems oriented towards the marginal mean or pairwise associations between response outcomes, and the latter for subject-specific statistical issues. The dental data are spatially correlated and carry information about the spatial configuration of the teeth that needs to be incorporated in the model to provide an explicit structure for inducing dependence among quadrants and teeth at their corresponding levels. The main contribution of this paper is to develop a methodology to model this unique spatial dependence of the deciduous dentition. There is no explicit multivariate distribution available for the spatially correlated binary dental caries experience outcomes. Generalized latent variable models (Skrondal & Rabe-Hesketh, 2004) are commonly used to generate flexible multivariate distributions and induce unobserved heterogeneity for correlated data with an implicit multivariate distribution. To take the unique spatial structure of dental data into account, we use two levels of latent variables to account for the spatial dependence of the teeth within the mouth for each subject.
For the $i$th subject, at the higher level, we introduce the quadrant level latent vector $Q_i$, which ties the four quadrants together by inducing a dependence structure among them. The latent vector at the higher level is also used to generate flexible multivariate distributions for the quadrant specific response vectors. The joint distribution of this spatial latent vector is given by an undirected graphical Gaussian model that takes the spatial configuration of the quadrants into account. The quadrant-wise observation vectors $\{y_{ij} : j = 1, \dots, J\}$ will be conditionally independent given $Q_i$ for $i = 1, \dots, n$. At the intermediate level, the quadrant-specific spatial latent vector $T_{ij}$ is introduced, which ties the five teeth within the same quadrant together by inducing a dependence structure among teeth within the same quadrant. Similarly, the intermediate level spatial latent vector is also used to generate flexible univariate distributions for the tooth specific response outcomes. The joint distribution of this spatial latent vector is given by an undirected graphical Gaussian model that takes the spatial configuration of the teeth and the quadrant into account. The observations $\{y_{ijk} : k = 1, \dots, K\}$ will be conditionally independent given $T_{ij}$ for $j = 1, \dots, J$ and $i = 1, \dots, n$. Meanwhile, the intermediate level spatial latent vectors $\{T_{ij} : j = 1, \dots, J\}$ are conditionally independent given the higher level spatial latent vector $Q_i$ for $i = 1, \dots, n$. In order to assess the spatial symmetry of the caries experience of the deciduous dentition, we will examine the association among latent variables at the higher level. Due to the complexity of the oral biological system, we will allow flexible covariance structures for the undirected graphical Gaussian models, and a formal model selection procedure will be used to choose an appropriate one for the data.
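The conditional independence assumptions of this section imply a factorization of the complete-data likelihood for subject $i$. As a sketch, writing $\theta$ for the collection of all model parameters (a symbol introduced here only for illustration) and using the notation of Section 2.2.1:

```latex
p(y_i, Q_i, T_i \mid \theta)
  = p(Q_i \mid \theta)
    \prod_{j=1}^{J} \left[ p(T_{ij} \mid Q_i, \theta)
    \prod_{k=1}^{K} p(y_{ijk} \mid Q_{ij}, T_{ijk}, \theta) \right]
```

Integrating $Q_i$ and $T_i$ out of this expression gives the marginal likelihood of $y_i$, which in general has no closed form for binary responses; this is one reason sampling based inference is attractive for this class of models.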
2.3 Models

2.3.1 Generalized latent variable model

Sammel (1997) proposed a joint model for different outcomes in the generalized linear model framework, with normal latent variables introduced into the different models. Moustaki (2000) extended this framework to a class of generalized latent trait models. Both approaches rely on the EM algorithm for model fitting, and the computational hurdles become serious as the number of latent variables increases. One of the primary difficulties is integrating out the latent variables: although standard approximations can be used, their accuracy decreases with the dimension of the latent variables. Dunson (2000) proposed a model that allows observed and latent variables to have distributions in the exponential family. Wang's (2003) multivariate spatial latent variable model was extended by Zhu et al. (2005) to generalized linear latent variable models for repeated measurements of spatially correlated multivariate data. An MCEMG (Monte Carlo EM gradient) algorithm was used for model fitting, based on numerical approximations that marginalize the score functions and Hessian matrix over the latent variables. MCEMG is known to be computationally intensive and to lose accuracy as the dimension of the latent variables increases.

In this paper, we propose Bayesian generalized linear latent variable models with two levels of spatial latent vectors. The joint distributions of the latent vectors are given by the undirected graphical Gaussian model (UGGM) (Dempster, 1972; Giudici and Green, 1999). In order to test the spatial symmetry of tooth caries experience within a subject, we propose statistical hypothesis tests for all possible situations under the Bayesian framework. Under the latent variable models, it is assumed that, given the two levels of spatial latent vectors, the teeth are mutually conditionally independent, so that we can specify the complete likelihood.
We use an MCMC approach to perform posterior inference for the quantities of interest under noninformative priors, which lets the data dominate the inference and gives results comparable to frequentist ones.

Response Models

We model the $k$th response variable within the $j$th quadrant of the $i$th subject, $y_{ijk}$, which is a binary indicator of caries experience of tooth $k(j)$. Conditional on the corresponding two levels of latent variables for the $k$th tooth position within the $j$th quadrant of the $i$th subject, the response model is given by an exponential family distribution with probability density function of the general form

$$p(y_{ijk} \mid Q_{ij}, T_{ik(j)}, \theta, \varphi) = p(y_{ijk} \mid \eta_{ijk}, \varphi) = \exp\left\{ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right\}, \tag{2.1}$$

where $\eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)}$ (McCullagh and Nelder, 1989). We assume that the link function $g(\cdot)$ is a canonical link that relates the mean of $y_{ijk}$ to a linear predictor as follows:

$$g(E[y_{ijk} \mid \eta_{ijk}]) = \eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)},$$

where $\alpha$, $\beta = (\beta_1, \ldots, \beta_j, \ldots, \beta_J)'$ and $\gamma = (\gamma_{1(1)}, \ldots, \gamma_{K(1)}, \gamma_{1(2)}, \ldots, \gamma_{K(J)})'$ are the regression coefficients for the fixed effects, with constraints $\sum_j \beta_j = 0$ and $\sum_k \gamma_{k(j)} = 0$, $j = 1, \ldots, J$, for identifiability of the marginal mean. $Q_{ij}$ and $T_{ik(j)}$ are the random effects used to generate flexible multivariate distributions and to induce dependence and unobserved heterogeneity among the spatially correlated binary dental caries outcomes. It is assumed that the quadrant-level spatial latent vectors $\{Q_i\}$ are independent and identically distributed Gaussian with zero mean and covariance matrix $\Sigma_Q$. Furthermore, we assume that, given the quadrant-level spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$, the tooth-level spatial latent vectors $\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$ are mutually independent multivariate Gaussian with zero mean and covariance matrices $\{\Sigma_{T_j} : j = 1, \ldots, J\}$, correspondingly.
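As a concrete instance of the response model (2.1), the sketch below evaluates the linear predictor and the exponential-family log density for a binary response, assuming the Bernoulli choices $a = 1$, $b(\eta) = \log(1 + \exp(\eta))$ and $c = 0$. The function names and the numeric values are illustrative assumptions and are not estimates from the dental data.

```python
import math

def linear_predictor(alpha, beta_j, gamma_kj, q_ij, t_ikj):
    """eta_ijk = alpha + beta_j + gamma_k(j) + Q_ij + T_ik(j)."""
    return alpha + beta_j + gamma_kj + q_ij + t_ikj

def bernoulli_log_density(y, eta):
    """(eta*y - b(eta))/a + c with a = 1, b(eta) = log(1 + exp(eta)), c = 0."""
    return eta * y - math.log1p(math.exp(eta))

def caries_probability(eta):
    """Inverse of the canonical (logit) link, i.e. E[y | eta]."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values for one tooth, chosen only for illustration.
eta = linear_predictor(alpha=-2.0, beta_j=0.3, gamma_kj=0.5, q_ij=0.4, t_ikj=-0.2)
p = caries_probability(eta)
```

With these choices the density of $y = 1$ equals the inverse-logit of $\eta_{ijk}$, so the latent effects shift the caries probability of a tooth on the logit scale.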
The generalized linear model relates the response variables to quadrant-specific and tooth-specific covariates and the latent variables.

Under the latent variable model approach, we can assume that the response variables are conditionally mutually independent, given the vectors of latent variables $L = \{L_i = (Q_i', T_{i1}', \ldots, T_{iJ}')' : i = 1, \ldots, n\}$. The joint probability density of $y$, conditional on the set of latent variables $L$ and $\{\alpha, \beta, \gamma, \varphi\}$, is as follows:

$$p(y \mid L, \alpha, \beta, \gamma, \varphi) = \prod_{i=1}^{n} \prod_{j=1}^{J} \prod_{k=1}^{K} p(y_{ijk} \mid \eta_{ijk}, \varphi) = \exp\left\{ \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] \right\}. \tag{2.2}$$

Structure Models for Latent Variables

In the response model, given the two levels of spatial latent variables, the conditional independence assumption allows the specification of the complete likelihood. In our modeling approach, the two levels of spatial latent vectors are used to induce the dependence structure of the teeth of interest. To incorporate appropriate spatial latent vectors into the model, we need to choose ones that represent the design structure and characterize the random mechanism of the data-generating process. The objective of these latent processes is to generate flexible distributions for the observations and to induce dependence among them. UGGMs operate on specific spatial configurations of nodes, and the possible graphs at both the quadrant level and the tooth-within-quadrant level are shown in Figures 2.1-2.3.

Figure 2.1. The response vectors $y_{i1}$, $y_{i2}$, $y_{i3}$ and $y_{i4}$ are tied by the spatial latent vector $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.

Figure 2.2. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied by the spatial latent vector $T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.

Figure 2.3. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied by the spatial latent vector $T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$, whose joint distribution is given by a UGGM with precision matrix under the CAR model assumption.

As shown in the graphs above, the four quadrants can be viewed as four nodes in a graph. If two nodes are not directly connected, they are said to be conditionally independent given the other nodes in the graph. The graphical model, for the $i$th subject, describes the spatial configuration of the nodes and characterizes the association strength between nodes of interest through the partial correlations between the corresponding random variables $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$ assigned to the nodes. In statistics, partial correlation measures the degree of association between two random variables with the effect of a set of controlling random variables removed. We can assign a multivariate Gaussian random vector to the nodes, i.e., $Q_i \sim N(0, \Sigma_Q)$, which leads to the undirected graphical Gaussian model. After introducing the latent vector $Q_i$ modeled by the UGGM, the quadrants, i.e., the quadrant-wise response vectors $\{y_{ij} : j = 1, \ldots, J\}$, are conditionally mutually independent. Because teeth are nested within quadrants, the nested dependence structure is essential for the model to be valid for the problem of interest. The second level of spatial latent vectors, nested within the corresponding quadrant, therefore needs to be incorporated into the model.
Similarly, within one specific quadrant, say the $j$th quadrant, a quadrant-specific UGGM with random nodes $T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$ is introduced. The spatial associations among teeth within the $j$th quadrant are induced by $T_{ij}$, and the vectors $\{T_{ij}\}$ are mutually independent conditional on $Q_i$. Furthermore, we assume $T_{ij} \sim N(0, \Sigma_{T_j})$, $j = 1, \ldots, J$. After introducing the latent vector $T_{ij}$ modeled by a UGGM, the teeth within the $j$th quadrant are conditionally mutually independent.

Gaussian random variables are determined by their first two moments. For identifiability, we have already assumed that the means of the two levels of spatial latent variables are zero vectors, so what remains is the covariance structure. The general covariance matrix is unstructured, subject to symmetry and positive definiteness. The unstructured covariance matrix can be simplified if we assume Markov properties for the nodes, as in the third graph. The Markov-type covariance matrix can be incorporated through the CAR model from spatial statistics (Cressie (1991)). The choice between the two types of covariance structure for the spatial latent vectors at the tooth-within-quadrant level is made through Bayesian model selection via the deviance information criterion (DIC) for missing data problems proposed by Celeux et al. (2006), which is an extension of the DIC introduced in Spiegelhalter et al. (2002) for Bayesian model selection.

Undirected Graphical Gaussian Model

In this section, we review the graphical Gaussian model (Dempster, 1972) required for this paper. Let $G = (V, E)$ be an undirected graph with vertex set $V = \{1, \ldots, k, \ldots, K\}$ and edge set $E = \{e_{kk'} : k \neq k' = 1, \ldots, K\}$, where $e_{kk'} = 1$ or $0$ according to whether vertices $k$ and $k'$, $1 \leq k \neq k' \leq K$, are directly connected in $G$ or not. In the undirected graphical Gaussian model, the edge set describes the association structure of the vertex set.
A random vector is assigned to the vertices, and the association strength between connected vertices is represented by its partial correlations. The undirected graphical Gaussian model consists of all $K$-dimensional normal distributions, say $X = \{X_1, \ldots, X_k, \ldots, X_K\}$, with $X \sim N(0, \Sigma)$ and precision matrix $\Omega = \Sigma^{-1} = \{\omega_{kk'} : k, k' = 1, \ldots, K\}$, where $\Sigma$ is unknown but satisfies the following restrictions in terms of the pairwise conditional independencies determined by the Markov properties (Drton and Perlman (2004)):

$$X_k \perp X_{k'} \mid X_{V \setminus \{k, k'\}} \iff \rho_{kk'} = 0, \quad \forall k \neq k' = 1, \ldots, K,$$

where $\rho_{kk'}$ is the partial correlation between the $k$th and $k'$th vertices in the graph, defined as $\rho_{kk'} = -\omega_{kk'} / \sqrt{\omega_{kk}\,\omega_{k'k'}}$. This partial correlation measures the association between two quadrants of interest with the effect of the remaining quadrants removed. We will use partial correlations to examine the spatial symmetry of caries prevalence.

Conditional Autoregressive Models

For the vector of univariate variables $\nu = (\nu_1, \nu_2, \ldots, \nu_K)'$, where $K$ is the number of spatial nodes of interest, the zero-centered CAR specification, following Cressie (1991), sets

$$(\nu_k \mid \nu_{-k}, \sigma_k^2) \sim N\Big( \rho \sum_{k' \in V_{-k}} b_{kk'} \nu_{k'},\ \sigma_k^2 \Big), \quad k = 1, \ldots, K, \tag{2.3}$$

where $\nu_{-k} = \nu \setminus \{\nu_k\}$. By Brook's lemma (Brook, 1964), the resulting joint density for $\nu$ takes the form

$$f(\nu \mid \sigma^2) \propto \exp\left\{ -\tfrac{1}{2} \nu' D_{\sigma^2}^{-1} (I - \rho B) \nu \right\}, \tag{2.4}$$

where $B = (b_{kk'})$ is a $K \times K$ matrix with $b_{kk} = 0$ and $D_{\sigma^2}$ is a $K \times K$ diagonal matrix with non-zero entries $\{\sigma_k^2 : k = 1, \ldots, K\}$. The precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ needs to be symmetric, which yields the conditions

$$b_{kk'} \sigma_{k'}^2 = b_{k'k} \sigma_k^2, \quad \forall k, k' = 1, \ldots, K. \tag{2.5}$$

If the precision matrix is positive definite, then (2.4) is a proper distribution. Under the above parameterization, the precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ is nonsingular if $\rho \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $B$, respectively.
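The partial correlation formula for the UGGM and the CAR precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ can be illustrated together with a small sketch. It assumes adjacency-based weights $b_{kk'} = 1/n_k$ for neighbouring nodes and conditional variances $\sigma^2 / n_k$, where $n_k$ is the number of neighbours of node $k$, and it assumes that the five teeth of a quadrant form a chain. The chain adjacency, the function names and the parameter values are hypothetical choices made for illustration.

```python
import math

def car_precision(adjacency, rho, sigma2):
    """Return D^{-1}(I - rho*B) with b_kk' = 1/n_k for neighbours and D = sigma2 * diag(1/n_k)."""
    size = len(adjacency)
    n = [sum(row) for row in adjacency]  # number of neighbours of each node
    precision = [[0.0] * size for _ in range(size)]
    for k in range(size):
        for l in range(size):
            if k == l:
                precision[k][l] = n[k] / sigma2
            elif adjacency[k][l]:
                # (n_k / sigma2) * (-rho / n_k) = -rho / sigma2, symmetric in k and l
                precision[k][l] = -rho / sigma2
    return precision

def partial_correlation(precision, k, l):
    """rho_kl = -omega_kl / sqrt(omega_kk * omega_ll)."""
    return -precision[k][l] / math.sqrt(precision[k][k] * precision[l][l])

def is_positive_definite(matrix):
    """Attempt a Cholesky factorisation; it succeeds only for a positive definite matrix."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True

# Hypothetical chain of five teeth: each tooth is adjacent to the next one.
chain = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
omega = car_precision(chain, rho=0.5, sigma2=1.0)
```

In this sketch non-adjacent teeth have zero partial correlation, which is the conditional independence encoded by a missing edge, and a value of the spatial parameter outside the admissible interval makes the precision matrix fail the positive definiteness check.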
It is usually assumed that $D_{\sigma^2} = \sigma^2 M$, where $M$ is a diagonal matrix with diagonal elements $M_{kk}$ proportional to the conditional variances $\sigma_k^2$. Here $\sigma^2$ controls the overall variability and $\rho$ represents the overall spatial association. The weight matrix $B$, with entries $B_{kk'}$, needs to reflect the spatial association between nodes $k$ and $k'$. GeoBUGS (2004) sets $B_{kk'} = b_{kk'} = 1/n_k$ for adjacent nodes $k \neq k'$ and $M_{kk} = 1/n_k$, where $n_k$ is the number of nodes adjacent to node $k$. Under these settings, the spatial latent vector $\nu$ follows a proper distribution, i.e.,

$$\nu \sim N\big(0, \sigma^2 (I - \rho B)^{-1} M\big). \tag{2.6}$$

2.4 Bayesian Estimation and Statistical Inference

2.4.1 Identifiability of the models

Models with latent variables are frequently not globally identifiable. One can integrate out the latent spatial vectors to obtain a marginal likelihood and assess whether parameters are redundant. The likelihood of the latent variable model is parameterized by $\Sigma_Q$ and $\{\Sigma_{T_j} : j = 1, \ldots, J\}$. The identifiability problem then becomes one of examining whether the parameters in the covariance matrices are redundant, which can be problematic in the frequentist framework. Dawid (1979) and Gelfand & Sahu (1999) discussed model identifiability within the Bayesian framework. In particular, suppose that the Bayesian model is specified by the likelihood $L(\theta; y)$ and the prior $f(\theta)$, and partition the parameters of interest as $\theta = (\theta_1, \theta_2)$. If

$$f(\theta_2 \mid \theta_1, y) = f(\theta_2 \mid \theta_1), \tag{2.7}$$

then we say that $\theta_2$ is not identifiable, where $f(\theta_2 \mid \theta_1, y) \propto L(\theta_1, \theta_2; y) f(\theta_2 \mid \theta_1) f(\theta_1)$. That is, if observing the data $y$ does not add to our prior knowledge about $\theta_2$ given $\theta_1$, then $\theta_2$ is not identifiable by the data. Dawid's formal definition of Bayesian nonidentifiability states that $\theta_2$ is not identifiable if and only if $L(\theta_1, \theta_2; y)$ is free of $\theta_2$. To make our model identifiable, we need not only to address marginal identifiability of the model by integrating out the latent
variables, but also to place constraints on the covariance matrices of the Gaussian spatial latent vectors at both levels.

2.4.2 Prior distributions

In this section, prior distributions are chosen for the regression parameters $\theta$ and the association parameters $\xi$. A Gibbs sampling algorithm is applied to simulate samples from the posterior distributions of the quantities of interest. Zhao et al. (2006), Zeger et al. (1991) and Dunson et al. (2000) all suggested noninformative conjugate prior distributions for the parameters of interest, whose influence washes out as the sample size increases. Bedrick et al. (1996) noted that normal prior distributions have been suggested for the logistic regression coefficients $\theta$:

$$\theta \sim N(\mu, F), \tag{2.8}$$

where $\mu$ is a vector of location parameters and $F$ is the covariance matrix. It is common to take $\mu$ as a vector of zeros and $F$ as a diagonal matrix with very large entries.

We are interested in the joint posterior distribution of $(\theta, \xi \mid y)$. Under the mild conditions in Geman and Geman (1984), the Gibbs sampler obtains the joint posterior distribution by sampling from the conditional posterior distributions $(\theta \mid y, \xi)$ and $(\xi \mid y, \theta)$ in turn. To simplify sampling from the conditional posterior distributions, we choose hierarchically independent priors for $\theta$ and $\xi$ in this hierarchical Bayesian model, i.e., $(\xi \mid y, \theta) = (\xi \mid y)$, which holds as long as the priors satisfy $p(\theta, \xi) = p(\theta)\,p(\xi)$.

We proposed two covariance structures for the Gaussian spatial latent variable models. In the generalized linear model setting with Gaussian random effects, the proper noninformative conjugate priors are inverse gamma (IG) for a single variance component and inverse Wishart for a variance-covariance matrix. Let $\Omega_Q = \Sigma_Q^{-1}$ and $\{\Omega_{T_j} = \Sigma_{T_j}^{-1} : j = 1, \ldots, J\}$ denote the precision matrices of the two levels of spatial latent vectors, correspondingly.
At the higher level, the precision matrix for the spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$ is unstructured. A Wishart prior (Dunson et al. (2000), O'Malley and Zaslavsky (2006)) is applied as a conjugate noninformative prior for the precision matrix $\Omega_Q$ in the unstructured case:

$$\Omega_Q \sim \text{Wishart}(\nu_Q, \Lambda_Q), \tag{2.9}$$

with degrees of freedom $\nu_Q$ and precision matrix $\Lambda_Q$. In practice, the common noninformative Wishart prior is chosen by specifying $\Lambda_Q$ as the identity matrix and $\nu_Q = \text{rank}(\Sigma_Q) + 1$. This yields a prior under which the marginal distribution of each correlation parameter is $U(-1, 1)$ (O'Malley and Zaslavsky (2006)). At the intermediate level, we have two precision matrix structures for the spatial latent vectors $\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$ and give noninformative priors for each.

(1) Unstructured precision matrix in the UGGM: Conditional on the higher-level spatial latent vector $Q_i$, the intermediate-level spatial latent vectors $\{T_{ij} : j = 1, \ldots, J\}$ are conditionally independent. We therefore give independent priors to the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$. As before, independent Wishart distributions are assigned as priors for these precision matrices:

$$\Omega_{T_j} \sim \text{Wishart}(\nu_{T_j}, \Lambda_{T_j}), \quad j = 1, \ldots, J, \tag{2.10}$$

with degrees of freedom $\nu_{T_j} = \text{rank}(\Sigma_{T_j}) + 1$ and precision matrix $\Lambda_{T_j}$ equal to the identity matrix.

(2) CAR model based precision matrix in the UGGM: Conditional on the higher-level spatial latent vector $Q_i$, the intermediate-level spatial latent vectors $\{T_{ij} : j = 1, \ldots, J\}$ are conditionally independent. We therefore give independent priors to the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$, which are parameterized by $\{\sigma_j^2, \rho_j : j = 1, \ldots, J\}$. Independent inverse gamma distributions (Dunson et al. (2000)), which are proper conjugate priors, are assigned to the overall variation parameters $\{\sigma_j^2 : j = 1, \ldots, J\}$, and independent uniform distributions, with the support constraints of section 3.1.4, are assigned to the quadrant-specific overall spatial association parameters $\{\rho_j : j = 1, \ldots, J\}$ (GeoBUGS (2004)), respectively:

$$\sigma_j^2 \sim IG(\varepsilon, \varepsilon), \quad j = 1, \ldots, J, \tag{2.11}$$

and

$$\rho_j \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1}), \quad j = 1, \ldots, J, \tag{2.12}$$

where $\varepsilon$ is a very small positive number and $\lambda_{\min}$, $\lambda_{\max}$ are as defined in section 3.1.4.

2.4.3 Posterior computations

MCMC techniques are used for the posterior computations in the models proposed in section 3. The posterior distributions of the parameters of interest can be obtained in the standard way (Dunson et al. (2000), Zeger et al. (1991)). Given the precision matrices $\Omega_Q$ and $\{\Omega_{T_j} : j = 1, \ldots, J\}$, the joint posterior distribution of the regression parameters and the latent variables at both the higher and intermediate levels is

$$p(\theta, Q, T \mid y) \propto \Big[ \prod_{i,j,k} p(y_{ijk} \mid \eta_{ijk}, \varphi) \Big] \pi(\theta, Q, T) \propto \exp\Big\{ \sum_{i,j,k} \Big[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \Big] - \tfrac{1}{2} \theta' F^{-1} \theta \Big\} \times \exp\Big\{ -\tfrac{1}{2} \sum_{i=1}^{n} Q_i' \Omega_Q Q_i - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{J} T_{ij}' \Omega_{T_j} T_{ij} \Big\}, \tag{2.13}$$

where $\sum_{i,j,k}$ denotes $\sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K}$, $\pi(\cdot)$ denotes the joint prior density, $Q = (Q_1', \ldots, Q_i', \ldots, Q_n')'$ with $Q_i = (Q_{i1}, \ldots, Q_{ij}, \ldots, Q_{iJ})'$, $T = (T_1', \ldots, T_i', \ldots, T_n')'$ with $T_i = (T_{i1}', \ldots, T_{ij}', \ldots, T_{iJ}')'$, and $T_{ij} = (T_{i1(j)}, \ldots, T_{ik(j)}, \ldots, T_{iK(j)})'$. Furthermore, $\theta = (\alpha, \beta', \gamma')'$ and $\eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)}$.

If the MCMC algorithm is a Gibbs sampler, the full conditional distribution of each of the unknowns in (2.13) needs to be specified, which can be obtained in the standard way (Dunson et al. (2000), Zeger et al. (1991)). For the fixed effects $\theta$, the full conditional distribution is

$$p(\theta \mid Q, T, y) \propto \exp\Big\{ \sum_{i,j,k} \Big[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \Big] - \tfrac{1}{2} \theta' F^{-1} \theta \Big\}. \tag{2.14}$$

The full conditional distribution for the Gaussian spatial latent vector $Q_i$ is

[...]

$$f(Y_{obs}, M \mid \psi, \phi) = \int f(y_{obs}, y_{mis} \mid \psi)\, f(M \mid y_{obs}, y_{mis}, \phi)\, dy_{mis}. \tag{2.22}$$

The full likelihood of $(\psi, \phi)$ is proportional to the above, i.e.
$$L(\psi, \phi \mid Y_{obs}, M) \propto f(Y_{obs}, M \mid \psi, \phi). \tag{2.23}$$

If the distribution of the missing-data mechanism does not depend on the missing values $Y_{mis}$, then

$$f(Y_{obs}, M \mid \psi, \phi) = f(M \mid y_{obs}, \phi) \int f(Y_{obs}, y_{mis} \mid \psi)\, dy_{mis} = f(M \mid y_{obs}, \phi)\, f(y_{obs} \mid \psi). \tag{2.24}$$

Under the MAR (missing at random) assumption, and when $\psi$ and $\phi$ are distinct, likelihood-based inferences for $\psi$ are the same as likelihood-based inferences for $\psi$ from $f(y_{obs} \mid \psi)$.

From the Bayesian point of view, missing data are treated as random, like the parameters of interest. One of the advantages of the Bayesian hierarchical approach implemented in WinBUGS is that missing values in the response variables are handled routinely. In most statistical packages, incomplete cases (in either the response or the covariates) are removed from the analysis. WinBUGS generates a sample to replace the missing responses from the posterior distribution of the response variable under the MAR assumption.

2.4.5 Bayesian Model Selection

Choosing an appropriate Bayesian hierarchical model for the observed data requires methods to compare alternative models within the Bayesian framework. The DIC (deviance information criterion, Spiegelhalter et al. (2002)) is a model selection criterion for hierarchical models that can be viewed as a generalization of the AIC (Akaike information criterion, Akaike, 1973) and BIC (Bayesian information criterion, Kass and Raftery, 1995). It is particularly useful in Bayesian model selection problems where the posterior distributions of the parameters have been obtained by Markov chain Monte Carlo (MCMC) simulation. The DIC statistic combines model complexity and goodness of fit and is defined as $DIC = \bar{D}(\psi) + p_D$, where $\bar{D}(\psi)$ is the posterior mean of the deviance $D(\psi) = -2 \log f(y \mid \psi)$, which measures the goodness of fit of the proposed model for the observed data. Let $D(\bar{\psi})$ be the deviance evaluated at the posterior mean of $\psi$.
Let $p_D = \bar{D}(\psi) - D(\bar{\psi})$ denote the effective number of parameters in the model, which is a penalty for the complexity of the model. The quantities $\bar{D}(\psi)$ and $D(\bar{\psi})$ can be obtained routinely from an MCMC simulation chain. Our hierarchical models contain two levels of latent variables, which requires model selection to be based on the DIC for missing data problems (Celeux et al., 2006). In our problem, we have to deal with both missing data and latent variables to obtain a complete DIC. To deal with missing data, we consider the complete likelihood (21), and the deviance function has the form

$$D(\theta) = -2 \log\{ f(Y_{obs}, M \mid \psi, \phi) \} = -2 \log\Big\{ \int f(y_{obs}, y_{mis} \mid \psi)\, f(M \mid y_{obs}, y_{mis}, \phi)\, dy_{mis} \Big\}, \tag{2.25}$$

where $\theta = (\psi', \phi')'$. Pettitt et al. (2006) gave an approximation for (21) in the form

$$D(\theta) = -2 \log\{ f(y_{obs}, \tilde{y}_{mis} \mid \psi)\, f(M \mid y_{obs}, \tilde{y}_{mis}, \phi) \}, \tag{2.26}$$

where $\tilde{y}_{mis}$ is the posterior predictor for the missing data $Y_{mis}$.

To deal with the latent vectors, we need to compute the complete DIC of Celeux et al. (2006). Let $E_\psi[\psi \mid y, q, t]$ denote the posterior mean of $\psi$ based on the complete data $(y', q', t')'$, where $(q', t')'$ is the realization of the spatial latent vectors $(Q', T')'$. The DIC for the complete data model is

$$DIC(y, q, t) = -4 E_\psi[\log f(y, q, t \mid \psi) \mid y, q, t] + 2 \log f(y, q, t \mid E_\psi[\psi \mid y, q, t]). \tag{2.27}$$

As in the EM algorithm, we can then integrate $Q$ and $T$ out from (26) to get

$$DIC = E_{Q,T}[DIC(y, Q, T) \mid y] = -4 E_{\psi,Q,T}[\log f(y, Q, T \mid \psi) \mid y] + 2 E_{Q,T}[\log f(y, Q, T \mid E_\psi[\psi \mid y, Q, T]) \mid y]. \tag{2.28}$$

All the integrations can be obtained through Monte Carlo approximation using the MCMC posterior samples in the coda file of WinBUGS. Combining (2.25)-(2.28), we have the DIC for Bayesian generalized latent variable models with missing data in the form
$$DIC = E_{Q,T}[DIC(y_{obs}, \tilde{y}_{mis}, M, Q, T) \mid y_{obs}, M] = -4 E_{\psi,\phi,Q,T}\big[ \log\big( f(y_{obs}, \tilde{y}_{mis}, Q, T \mid \psi)\, f(M \mid y_{obs}, \tilde{y}_{mis}, \phi) \big) \mid y_{obs}, M \big] + 2 E_{Q,T}\big[ \log\big( f(y_{obs}, \tilde{y}_{mis}, Q, T \mid \bar{\psi})\, f(M \mid y_{obs}, \tilde{y}_{mis}, \bar{\phi}) \big) \mid y_{obs}, M \big], \tag{2.29}$$

where $\bar{\psi} = E_\psi[\psi \mid y_{obs}, M, Q, T]$ and $\bar{\phi} = E_\phi[\phi \mid y_{obs}, M]$.

2.4.6 Spatial symmetry hypothesis testing

Spatial symmetry in our problem means that the joint caries experience of the response variables at the quadrant level is highly associated between quadrants. Dentists believe that spatial symmetry exists in the mouth. Lesaffre et al. (2006) showed empirically that the caries experience of the left and right quadrants is more strongly associated than that of other quadrant pairs. Unfortunately, little of the literature has discussed this issue comprehensively. In our UGGM at the quadrant level, the partial correlation parameters $\{\rho_{jj'} : j \neq j' = 1, \ldots, J\}$ measure the strength of the spatial association between two different nodes (quadrants). One of the major concerns about spatial symmetry in the mouth can be formulated as the following hypothesis testing problem.

Hypothesis testing for pairwise comparisons among spatial association strength parameters

To assess the spatial symmetry of the four quadrants, we introduce different "neighborhood" relationships that describe the relative spatial structure of the quadrants of interest. Spatial symmetry is assessed at the quadrant level rather than the tooth level. At the quadrant level, we define two quadrants to be "Horizontal Neighbors" if they are both in the upper jaw or both in the lower jaw, "Vertical Neighbors" if they are both on the left side or both on the right side, and "Across Neighbors" if they lie in different jaws and on different sides, i.e., diagonally opposite each other. The assessment of quadrant spatial symmetry in terms of caries prevalence is based on "Left-right", i.e., "Horizontal Neighbors",
"Up-down", i.e., "Vertical Neighbors", and "Across", i.e., "Across Neighbors". There are two ways to assess the spatial symmetry among quadrants in terms of caries prevalence through statistical hypotheses. The first is based on the so-called "overall" spatial symmetry assessment via a weighted statistic, and the second is the so-called "specific" spatial symmetry assessment, which compares the spatial symmetry measurements directly.

First, the weighted statistics for assessing the overall spatial associations in terms of "Left-right", "Up-down" and "Across" can be formulated as follows:

$$\rho_{LR} = \tfrac{1}{2}(\rho_{56} + \rho_{78}), \quad \rho_{UD} = \tfrac{1}{2}(\rho_{67} + \rho_{58}), \quad \rho_{A} = \tfrac{1}{2}(\rho_{68} + \rho_{57}).$$

The hypotheses about the overall spatial association in terms of "Left-right" vs. "Up-down", "Left-right" vs. "Across" and "Across" vs. "Up-down" can be formulated as follows:

(1) Left-right versus Up-down
$$H_0 : \rho_{LR} = \rho_{UD} \quad \text{vs.} \quad H_a : \rho_{LR} \neq \rho_{UD}; \tag{2.30}$$

(2) Left-right versus Across
$$H_0 : \rho_{LR} = \rho_{A} \quad \text{vs.} \quad H_a : \rho_{LR} \neq \rho_{A}; \tag{2.31}$$

(3) Across versus Up-down
$$H_0 : \rho_{A} = \rho_{UD} \quad \text{vs.} \quad H_a : \rho_{A} \neq \rho_{UD}. \tag{2.32}$$

Secondly, if the assessment is based on direct comparisons of the spatial symmetry measurements, there are twelve possible hypothesis testing situations for the spatial symmetries in terms of partial correlations between quadrants.

(1.1) Left-right versus Up-down: the association between quadrant 5 and quadrant 6 vs. the association between quadrant 6 and quadrant 7, with quadrant 6 as reference.
$$H_0 : \rho_{56} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{67}; \tag{2.33}$$

(1.2) Left-right versus Up-down: the association between quadrant 5 and quadrant 6 vs. the association between quadrant 5 and quadrant 8, with quadrant 5 as reference.
$$H_0 : \rho_{56} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{58}; \tag{2.34}$$

(1.3) Left-right versus Up-down: the association between quadrant 7 and quadrant 8 vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.
$$H_0 : \rho_{78} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{67}; \tag{2.35}$$

(1.4) Left-right versus Up-down: the association between quadrant 7 and quadrant 8 vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.
$$H_0 : \rho_{78} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{58}; \tag{2.36}$$

(2.1) Left-right versus Across: the association between quadrant 5 and quadrant 6 vs. the association between quadrant 6 and quadrant 8, with quadrant 6 as reference.
$$H_0 : \rho_{56} = \rho_{68} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{68}; \tag{2.37}$$

(2.2) Left-right versus Across: the association between quadrant 5 and quadrant 6 vs. the association between quadrant 5 and quadrant 7, with quadrant 5 as reference.
$$H_0 : \rho_{56} = \rho_{57} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{57}; \tag{2.38}$$

(2.3) Left-right versus Across: the association between quadrant 7 and quadrant 8 vs. the association between quadrant 6 and quadrant 8, with quadrant 8 as reference.
$$H_0 : \rho_{78} = \rho_{68} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{68}; \tag{2.39}$$

(2.4) Left-right versus Across: the association between quadrant 7 and quadrant 8 vs. the association between quadrant 5 and quadrant 7, with quadrant 7 as reference.
$$H_0 : \rho_{78} = \rho_{57} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{57}; \tag{2.40}$$

(3.1) Across versus Up-down: the association between quadrant 5 and quadrant 7 vs. the association between quadrant 5 and quadrant 8, with quadrant 5 as reference.
$$H_0 : \rho_{57} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{57} \neq \rho_{58}; \tag{2.41}$$

(3.2) Across versus Up-down: the association between quadrant 5 and quadrant 7 vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.
$$H_0 : \rho_{57} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{57} \neq \rho_{67}; \tag{2.42}$$

(3.3) Across versus Up-down: the association between quadrant 6 and quadrant 8 vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.
$$H_0 : \rho_{68} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{68} \neq \rho_{58}; \tag{2.43}$$

(3.4) Across versus Up-down: the association between quadrant 6 and quadrant 8 vs. the association between quadrant 6 and quadrant 7, with quadrant 6 as reference.
$$H_0 : \rho_{68} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{68} \neq \rho_{67}. \tag{2.44}$$

Simultaneous credible intervals

Pairwise spatial symmetry hypothesis testing is based on credible intervals for the differences between two partial correlations corresponding to two different pairs of nodes (quadrants) in the UGGM. In Bayesian statistics, a credible interval is a posterior probability interval, used for purposes similar to those of confidence intervals in frequentist statistics. Suppose that a parameter $\zeta$ is of interest. A $100(1 - \alpha)\%$ credible interval for $\zeta$ is any set $C$ such that $P_{\pi(\zeta \mid y)}(\zeta \in C) = 1 - \alpha$, where $\pi(\zeta \mid y)$ is the posterior distribution of $\zeta$ given the observed data $y$.

Since we are performing multiple spatial symmetry comparisons among quadrants across all possible hypothesis testing situations, it is necessary to give simultaneous credible regions (Besag et al. (1995)) to control the type S error rate (Gelman et al.), a concept similar to the type I error rate in the frequentist framework. The $100K/M\%$ simultaneous credible regions for the differences in overall spatial associations are based on order statistics (Besag et al. (1995)):

$$\big\{ \big[ (\rho_I - \rho_{II})^{[M+1-t^*]},\ (\rho_I - \rho_{II})^{[t^*]} \big] : (I, II) \in \text{Neighborhood} \big\},$$

where

$$t^* = \min\big\{ t : \#\{ m : (\rho_I - \rho_{II})^{[M+1-t]} \leq (\rho_I - \rho_{II})^{(m)} \leq (\rho_I - \rho_{II})^{[t]},\ \forall (I, II) \in \text{Neighborhood} \} \geq K \big\},$$

$(\cdot)^{[t]}$ denotes the $t$th order statistic, and $\{(\rho_I - \rho_{II})^{(m)} : m = 1, \ldots, M, (I, II) \in \text{Neighborhood}\}$ are the posterior samples of $\{(\rho_I - \rho_{II}) : (I, II) \in \text{Neighborhood}\}$. Here, Neighborhood $= \{(\text{"LR"}, \text{"UD"}), (\text{"LR"}, \text{"A"}), (\text{"A"}, \text{"UD"})\}$.
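The order-statistic construction above can be sketched in a few lines. The code assumes the posterior draws are available as a list of $M$ rows, each holding the sampled differences for every pair in the neighborhood set. The function name, the simulated draws and the choice of $M = 200$, $K = 190$ are illustrative assumptions and do not come from the dental data.

```python
import random

def simultaneous_credible_region(draws, k):
    """Order-statistic simultaneous region covering at least k of the M joint draws."""
    m = len(draws)
    d = len(draws[0])
    # Order statistics of each component (each compared difference).
    ordered = [sorted(draw[i] for draw in draws) for i in range(d)]
    # For every draw, the largest value of max(rank, M + 1 - rank) over components.
    extremeness = [0] * m
    for i in range(d):
        by_value = sorted(range(m), key=lambda t: draws[t][i])
        for rank, t in enumerate(by_value, start=1):
            extremeness[t] = max(extremeness[t], rank, m + 1 - rank)
    # Smallest t such that at least k draws lie inside the box of order statistics.
    t_star = sorted(extremeness)[k - 1]
    lower = [column[m - t_star] for column in ordered]   # (M + 1 - t*)th order statistic
    upper = [column[t_star - 1] for column in ordered]   # (t*)th order statistic
    return t_star, lower, upper

# Simulated posterior draws of three differences, for illustration only.
rng = random.Random(0)
draws = [[rng.gauss(0.10, 0.05), rng.gauss(0.00, 0.05), rng.gauss(-0.10, 0.05)]
         for _ in range(200)]
t_star, lower, upper = simultaneous_credible_region(draws, k=190)
inside = sum(all(lo <= x <= hi for x, lo, hi in zip(draw, lower, upper)) for draw in draws)
```

A draw lies inside the box exactly when all of its component ranks fall between $M + 1 - t^*$ and $t^*$, so taking the $K$th smallest per-draw extremeness gives the minimal $t^*$. An interval in the resulting region that excludes zero would be read as evidence against equality of the two associations being compared.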
Similarly, the $100K/M\%$ simultaneous credible regions for the differences in specific spatial associations are given by

$$\big\{ \big[ (\rho_{ii'} - \rho_{jj'})^{[M+1-t^*]},\ (\rho_{ii'} - \rho_{jj'})^{[t^*]} \big] : i \neq i', j \neq j', (i, i') \neq (j, j'), i, j = 1, \ldots, J \big\},$$

where

$$t^* = \min\big\{ t : \#\{ m : (\rho_{ii'} - \rho_{jj'})^{[M+1-t]} \leq (\rho_{ii'} - \rho_{jj'})^{(m)} \leq (\rho_{ii'} - \rho_{jj'})^{[t]},\ \forall\, i \neq i', j \neq j', (i, i') \neq (j, j') \} \geq K \big\}$$

and $\{(\rho_{ii'} - \rho_{jj'})^{(m)} : m = 1, \ldots, M\}$ are the posterior samples of $\{\rho_{ii'} - \rho_{jj'} : i \neq i', j \neq j', (i, i') \neq (j, j'), i, j = 1, \ldots, J\}$.

2.4.7 Example

We now show how the above methodology works for the dental data, specifying all the functions and general notation. In our study all of the responses are binary, so we have the following: $a_i(\varphi) = 1$, $b_i(\eta_{ijk}) = \log(1 + \exp(\eta_{ijk}))$, $c_i(y_{ijk}, \varphi) = 0$, $E[y_{ijk} \mid \eta_{ijk}] = \exp(\eta_{ijk}) / (1 + \exp(\eta_{ijk}))$ and $g(x) = \log(x / (1 - x))$, for $k = 1, \ldots, 5$, $j = 1, \ldots, 4$, $i = 1, \ldots, n$. Hence, the parameters of interest in the observational model are $\theta = (\alpha, \beta', \gamma')'$ and $\xi = (\Sigma_Q, \Sigma_{T_1}, \ldots, \Sigma_{T_j}, \ldots, \Sigma_{T_J})'$, and

$$\log p(y_{ijk} \mid \eta_{ijk}) = \log p(y_{ijk} \mid Q_{ij}, T_{ik(j)}, \theta) = \eta_{ijk} y_{ijk} - \log(1 + \exp(\eta_{ijk})).$$

The canonical parameter $\{\eta_{ijk} : k = 1, \ldots, 5, j = 1, \ldots, 4, i = 1, \ldots, n\}$ is defined as follows:

$$\eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)}. \tag{2.45}$$

Priors for the parameters of interest are noninformative proper conjugate priors, which give results comparable to frequentist ones as the sample size increases. More specifically, the priors are given as follows:

$$\alpha \sim N(0, 1000); \tag{2.46}$$

$$\beta_j \sim N(0, 1000), \quad \forall j = 1, \ldots, J - 1, \tag{2.47}$$

with constraints $\sum_j \beta_j = 0$ and $\sum_k \gamma_{k(j)} = 0$, $j = 1, \ldots, J$, for identifiability of the observation model. For the prior of the precision matrix, O'Malley and Zaslavsky (2006) proposed the scaled Wishart distribution as a conjugate proper prior:

$$\Omega_Q \sim \text{Wishart}(4 + 1, I), \tag{2.49}$$

where $I$ is the $4 \times 4$ identity matrix.

For the priors of the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$, there are two different models for the structure of the precision matrix.
(1) Unstructured precision matrix:

$\Omega_{T_j} \sim \text{Wishart}(5 + 1, I)$, $\forall j = 1, \dots, 4$, (2.50)

where $I$ is the $5 \times 5$ identity matrix.

(2) CAR model based precision matrix:

$\tau_j \sim \text{Gamma}(0.001, 0.001)$, $\forall j = 1, \dots, 4$, (2.51)
$\rho_j \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, $\forall j = 1, \dots, 4$, (2.52)

where $\{\sigma_j^{-2} = \tau_j : j = 1, \dots, 4\}$ are the quadrant-specific parameters for overall variability and $\{\rho_j : j = 1, \dots, 4\}$ are the quadrant-specific parameters for overall spatial effects. $\lambda_{\min}$ and $\lambda_{\max}$ are as defined for CAR models in section 3.1.4.

To construct 95% simultaneous credible regions, we use 11,000 MCMC iterations with 1,000 burn-in, i.e., $M = 10{,}000$ and $K = 9{,}500$. For the multiple hypothesis statements, the 95% simultaneous credible regions are more conservative than the frequentist simultaneous confidence regions, since they have a type S error rate between 0% and 2.5% (Gelman et al.).

2.5 The Signal Tandmobiel Project Example

In the Signal-Tandmobiel project, 4,468 7-year-old schoolchildren (born in 1989) from 179 schools in Flanders (Belgium) were selected by a stratified clustered random sample. The mean age of the children on the day of examination was 7.1 years (SD = 0.4). The 15 strata were obtained by combining the 3 types of educational system (public, municipal and private schools) with geographical areas (the 5 Flemish provinces). The schools represented the clusters. This sample represents about 7% of the corresponding Flemish population. The sampling procedure aimed at selecting each child in Flanders with equal probability. A more detailed description of the design of the Signal-Tandmobiel project is reported in Vanobbergen et al. (2000).

Table 2.1. Prevalence of caries experience (% affected) in the deciduous dentition of 7-year-old children.

Tooth          55     54     53     52     51  |    61     62     63     64     65
Prevalence    8.92   5.20   0.74   3.72   7.81 |   7.06   2.23   1.86   5.20   8.55
Tooth          85     84     83     82     81  |    71     72     73     74     75
Prevalence   10.78  13.75   1.12   0.74   0.37 |   0.37   0.37   0.37  11.15   9.67
2.5.1 Primary results

The frequency table for the prevalence of caries experience in the deciduous dentition of the 7-year-old children is shown in Table 2.1. The descriptive statistics suggest a spatially symmetrical pattern in caries experience. In Vanobbergen et al. (2007), pairwise associations were assessed in terms of odds ratios of caries experience via an ALR model. The results are shown in Table 2.2. They show that left-right spatial symmetry is the most notable. Decayed teeth of discordant contralateral pairs tend to aggregate on the right or the left side of the subject's mouth more than would be expected by chance alone (Vanobbergen et al. (2007)).

Table 2.2. Odds ratios and 95% confidence intervals for the 2x2 association models for caries on deciduous molars in 7-year-old children.

First molar (ALR model)
        64                     74                    84
54      16.48 (13.75-19.74)    8.17 (6.91-9.64)      7.23 (6.13-8.53)
64                             7.61 (6.47-8.97)      7.18 (6.10-8.44)
74                                                   22.82 (19.28-27.00)

Second molar (ALR model)
        65                     75                    85
55      15.47 (13.00-18.28)    8.78 (7.52-10.27)     9.23 (7.90-10.79)
65                             8.08 (6.92-9.42)      8.86 (7.58-10.35)
75                                                   20.37 (17.20-24.11)

2.5.2 The results from our approach

Our generalized latent variable models are implemented in WinBUGS, using noninformative priors for the parameters of interest. After 1,000 burn-in iterations, the posterior distributions of the quantities are based on 10,000 MCMC iterations. There are two possible models, indexed by the precision matrix structure of the spatial latent vector at the intermediate level. The choice of the appropriate model is based on the DIC for missing data problems (Celeux et al. (2006)). In this part, we give the results for both the overall and the specific spatial symmetry assessments through simultaneous credible regions for the differences of interest. The results start from the overall spatial symmetry assessment under different model assumptions
in Tables 2.3-2.4, based on 95% simultaneous credible regions of the differences corresponding to their hypothesis testing situations. They are followed by the results for the specific spatial symmetry assessments in Tables 2.5-2.6.

The posterior inferences about the spatial symmetries from the two models are similar, which tells us that both models work fairly well.

Table 2.3. Credible intervals of overall spatial association strength comparisons, based on UGGM with unstructured covariance structure.

Spatial effects                                          Simultaneous credible interval
left/right vs. across        $\rho_{LR} - \rho_A$        (0.807, 1.238)
left/right vs. upper/down
across vs. upper/down        $\rho_A - \rho_{UD}$        (-0.775, 0.491)
DIC 593.300; N.burnin 1000; N.iteration 11000

Table 2.4. Credible intervals of overall spatial association strength comparisons, based on UGGM with CAR model based covariance structure.

Spatial effects                                          Simultaneous credible interval
left/right vs. across        $\rho_{LR} - \rho_A$        (0.807, 1.236)
left/right vs. upper/down    $\rho_{LR} - \rho_{UD}$     (0.310, 1.427)
across vs. upper/down        $\rho_A - \rho_{UD}$        (-0.779, 0.410)
DIC 780.500; N.burnin 1000; N.iteration 11000

Table 2.5. Credible intervals of specific spatial association strength comparisons, based on UGGM with unstructured covariance structure.

Spatial effects                                          Simultaneous credible interval
left/right vs. across        $\rho_{56} - \rho_{68}$     (0.134, 1.581)
                             $\rho_{56} - \rho_{57}$     (0.394, 1.719)
                             $\rho_{78} - \rho_{68}$     (0.237, 1.589)
                             $\rho_{78} - \rho_{57}$     (0.133, 1.728)
left/right vs. upper/down    $\rho_{56} - \rho_{67}$     (0.235, 1.551)
                             $\rho_{56} - \rho_{58}$     (0.117, 1.48?)
                             $\rho_{78} - \rho_{67}$     (0.230, 1.601)
                             $\rho_{78} - \rho_{58}$     (0.215, 1.504)
across vs. upper/down        $\rho_{68} - \rho_{67}$     (-1.303, 1.313)
                             $\rho_{68} - \rho_{58}$     (-1.327, 1.204)
                             $\rho_{57} - \rho_{67}$     (-1.442, 1.109)
                             $\rho_{57} - \rho_{58}$     (-1.488, 1.042)
DIC 593.300; N.burnin 1000; N.iteration 11000

Table 2.6. Credible intervals of specific spatial association strength comparisons, based on UGGM with CAR model based covariance structure.

Spatial effects                                          Simultaneous credible interval
left/right vs. across        $\rho_{56} - \rho_{68}$     (0.068, 1.404)
                             $\rho_{56} - \rho_{57}$     (0.5115, 1.001)
                             $\rho_{78} - \rho_{68}$     (0.236, 1.416)
                             $\rho_{78} - \rho_{57}$     (0.477, 1.662)
left/right vs. upper/down    $\rho_{56} - \rho_{67}$     (0.297, 1.458)
                             $\rho_{56} - \rho_{58}$     (0.007, 1.455)
                             $\rho_{78} - \rho_{67}$     (0.291, 1.524)
                             $\rho_{78} - \rho_{58}$     (0.262, 1.450)
across vs. upper/down        $\rho_{68} - \rho_{67}$     (-1.020, 1.209)
                             $\rho_{68} - \rho_{58}$     (-1.078, 1.146)
                             $\rho_{57} - \rho_{67}$     (-1.258, 0.970)
                             $\rho_{57} - \rho_{58}$     (-1.381, 0.950)
DIC 780.500; N.burnin 1000; N.iteration 11000

Bayesian model selection is based on the DIC: the smaller the DIC, the better the model. A common rule in practice is that if the DICs of two models differ by more than 10, the model with the smaller DIC is the better one. Hence, from the results in Tables 2.3 to 2.6 (DIC 593.300 versus 780.500), conditional on the data, the model with unstructured precision matrix is the better one. Specifically, the appropriate hierarchical generalized latent variable model consists of two levels of latent vectors. The first level of Gaussian spatial latent vector has an unstructured precision matrix. The second level of Gaussian spatial latent vectors also has unstructured precision matrices.

Furthermore, the choice of the unstructured covariance structure can be explained by the following two facts. (1) The oral biological environment is so complicated that the higher level Gaussian spatial latent vector might not be able to account sufficiently for the heterogeneity from the four quadrant-wise response vectors, leaving some residual heterogeneity to the intermediate level spatial latent vectors. (2) At the intermediate level, Gaussian spatial latent vectors with CAR model based precision matrices are not sophisticated enough to account for both the residual heterogeneity and the heterogeneity from the teeth within the corresponding quadrants. Hence, the second level of Gaussian spatial latent vectors needs a more complicated precision matrix than the Markovian type (CAR model based covariance matrix).
Based on the chosen model, the conclusions of the hypothesis testing about both overall and specific spatial symmetry among quadrants are as follows. (1) Left-right spatial association is the strongest: the 95% simultaneous credible intervals of the differences between left-right and across, and of the differences between left-right and up-down, all have positive lower bounds. (2) The difference of spatial associations between across and up-down is not significant at a type S error rate between 0% and 2.5% (Gelman (2006)), since the 95% simultaneous credible intervals of the difference between across spatial association and up-down spatial association include zero.

2.6 Discussion

In this chapter, we propose a flexible class of Bayesian generalized latent variable models for multivariate spatially correlated binary data with a multi-level dependence structure. Our approach is to model the response variables by distributions in the exponential family and impose a multivariate spatial correlation structure on the latent variables, which accounts for the multi-level spatial dependence structures. Statistical inference is based on sampling from the posterior distributions of the parameters of interest. We have used the undirected graphical Gaussian model (UGGM) to construct the precision matrix structures of the multivariate spatial latent vectors at both the higher and the intermediate levels. One consideration is the parameterization of both the observational and the latent variable models: for identifiability, we constrain the fixed effects to sum to zero and give the spatial process mean zero. Noninformative conjugate priors are applied for the parameters of interest, which give inference results comparable to the frequentist ones as the sample size increases. We proposed two possible models to account for the dependence structure in the dental data.
Bayesian model selection is based on the DIC for missing data problems. The spatial symmetry hypothesis is assessed by simultaneous credible intervals for multiple comparisons of pairwise spatial association strength. The results from both models show that the generalized latent variable models work well, are consistent with one another and are comparable to the results in the existing literature: the left-right spatial association is the strongest, and the spatial associations for across and up-down are not significantly different at a type S error rate between 0% and 2.5%.

For the data example, we have assumed that the Gaussian spatial latent processes $\{Q_i : i = 1, \dots, n\}$ at the higher level and $\{T_{ij} : j = 1, \dots, J, i = 1, \dots, n\}$ at the intermediate level are sufficient to induce the unobserved heterogeneities of the data at the corresponding levels. It would be interesting to introduce non-Gaussian latent processes to model the underlying spatial dependence among quadrants and among teeth nested within the corresponding quadrant, which can lead to a richer class of latent processes $\{Q_i : i = 1, \dots, n\}$ and $\{T_{ij} : j = 1, \dots, J, i = 1, \dots, n\}$. Finally, our model selection is based on the DIC; it would be preferable to carry out model selection simultaneously with estimation through Reversible Jump Markov Chain Monte Carlo (RJMCMC) (Green (1995)) or Birth and Death Markov Chain Monte Carlo (BDMCMC) (Stephens (2000)). It would also be interesting to consider the symmetry pattern of quadrants in a longitudinal study, which would lead to a spatial-temporal analysis.

CHAPTER 3

Bayesian Finite Mixture of Generalized Latent Variable Models

3.1 Introduction

As noted in the previous chapter, the dental data show a unique nested dependence structure among the caries experience response variables for the teeth of interest, which leads to wide heterogeneity in the distribution of the multivariate spatially correlated binary response variables.
Finite mixtures of distributions provide a mathematically based approach to modeling a wide range of random phenomena with flexible distributions. Mixture distributions are extremely useful for modeling heterogeneity in a cluster analysis context. It is of great interest to view the quadrant-wise multivariate binary response vectors as arising from a certain number of underlying subpopulations or clusters. Each underlying cluster is characterized by its cluster-specific parameters and by some common parameters that describe the marginal distribution of the binary response variable with respect to the spatial configuration of each quadrant-wise response vector. The spatial symmetry among quadrants, in terms of caries prevalence, can be measured by the probabilities that two different quadrant-wise response vectors fall into the same underlying cluster, which is indexed by a corresponding cluster-specific multivariate distribution.

Zhang et al. (2007) proposed a Bayesian Generalized Latent Variable Model (BGLVM) to analyze the dental data from the STM project. Their approach used a hierarchical generalized latent variable model to handle the multiple-level nested dependence structure of the dental data. The multiple-level spatial latent variables are used to generate a flexible multivariate distribution for the multivariate binary outcomes and to induce the unique nested dependence structure. The joint behavior of the multiple-level spatial latent variables is described by a Gaussian undirected graphical model with different ways of accounting for the covariance matrix structure. Spatial symmetry checking was based on the partial correlation parameters of the graphical models. Model implementation and hypothesis testing are carried out within the Bayesian framework. Since the mixture model is a very flexible method of modeling,
it is interesting to view the same problem from the mixture model point of view instead of the generalized latent variable model point of view. It is also very helpful to give a general framework for using mixture models to analyze spatially correlated multivariate binary data.

Fernandez and Green (2002) proposed a Bayesian mixture model to analyze spatially correlated data, which gives an appropriate approach in the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships. The spatial association strength was assessed through parameters that adjust the variability of the mixing weights in the mixture from one location to another. Their approach is sensitive to Euclidean space and cannot handle multi-level correlations induced by both the "between-cluster" and the "within-cluster" spatial configuration of the data. Fernandez and Green focused specifically on Poisson distributed data with applications in disease mapping, which is quite different from the situation of the dental data. For the estimation of the true risk pattern, their approach is based on continuously distributed Markov random fields that model the mixture weight of the corresponding component via a logistic-normal model. They did not consider other mixture components that can yield flexible distributions for the outcomes and induce a complex heterogeneity structure. However, their approach introduced spatial mixture models as an interesting new tool for modeling heterogeneity in spatial data.

Zhou and Wakefield (2006) proposed a Bayesian mixture model for partitioning gene expression data, which is essentially an approach to clustering the observed data by a mixture model with an unknown number of clusters inferred from the data.
The aim of their research, in which time-ordered gene expression data are collected, is to determine genes that co-express, that is, have similar patterns of expression. Their model provides a probabilistic framework for partitioning or clustering, which naturally yields a measure of similarity among genes in terms of expression. Under their approach, partitioning and estimation are conducted simultaneously, and the number of partitions can be treated as a random parameter, which gives the method a certain flexibility in applications. As always for parametric hierarchical modeling, the measures of uncertainty are only as reliable as the model, so extensive model checking should be carried out in applications. It is necessary to give the mixture components more flexibility than they obtained by modeling a marginal parametric mean structure. Extensions need to incorporate covariates at various stages, and other external information needs to be taken into account. It is also meaningful to give a framework for analyzing non-normal data under mixture models for clustering.

The purpose of this article is to introduce a Bayesian Mixture of Generalized Latent Variable Model (BMGLVM) framework for general spatial topology structures to explain multi-level correlations. The BMGLVM, implemented via Gibbs sampling with non-informative priors, allows us to model the "between-cluster" and "within-cluster" correlation structure explicitly. It lets us examine the spatial symmetry of quadrants in terms of caries incidence and capture the special spatial association structure among quadrants of the same subject and among teeth within quadrants, which helps us characterize the pattern of caries incidence at tooth level efficiently.

3.2 The Spatial Dependence Structures

3.2.1 Notation

To model the observations, let $y_{ijk}$ denote the $k$th response variable within the $j$th cluster of the $i$th subject of interest,
where $k = 1, \dots, K$, $j = 1, \dots, J$, $i = 1, \dots, n$. Let $y_{ij} = (y_{ij1}, \dots, y_{ijk}, \dots, y_{ijK})'$ denote the response vector within the $j$th cluster of the $i$th subject. Let $y_i = (y_{i1}', \dots, y_{ij}', \dots, y_{iJ}')'$ denote the collection of response variables of the $i$th subject, and let $y = (y_1', \dots, y_i', \dots, y_n')'$ denote the collection of response variables of all subjects in the study.

A multinomial model is applied for the allocation process associated with the mixture model. Let $Q_i = (Q_{i1}', \dots, Q_{ij}', \dots, Q_{iJ}')'$ denote the mixture component allocation random variables for the $i$th subject, where $Q_{ij} = (Q_{ij1}, \dots, Q_{ijm}, \dots, Q_{ijM})'$ and $M$ is the number of mixture components in the mixture model, for $i = 1, \dots, n$ and $j = 1, \dots, J$. It is assumed that the $Q_{ij}$'s are independently and identically multinomially distributed. For modeling the latent variables, we use the conditional autoregressive model. Let $T_{im} = (T_{i,1(m)}, \dots, T_{i,k(m)}, \dots, T_{i,K(m)})'$ denote the latent variables associated with the $m$th mixture component for the $i$th subject of interest. Let $T_i = (T_{i1}', \dots, T_{im}', \dots, T_{iM}')'$ denote the collection of latent variables at the intermediate level for the $i$th subject in the study. Let $L = \{(Q_i', T_i')' : i = 1, \dots, n\}$ denote the collection of all allocation random variables and latent variables for all subjects.

3.2.2 Principles of our modeling approach

The dental data show a two-level spatial association structure. The first level spatial association structure is among quadrants (V)-(VIII). For convenience in indexing the data, we will use quadrant (I) instead of quadrant (V), and the corresponding indices for the others. The second level spatial association structure is nested within the corresponding quadrant: the spatial correlation among teeth.

In general, the valid approaches for analyzing correlated data without an explicit multivariate distribution are based on either GEE or random effects models.
The former is suitable for statistical problems oriented toward marginal means or pairwise associations between response outcomes, and the latter for subject-specific statistical issues. The dental data are spatially correlated and carry information about the spatial configuration of the teeth, which needs to be incorporated in the model to provide an explicit structure for inducing dependence among quadrants and teeth at their corresponding levels. The main contribution of this paper is to develop a methodology to model this unique spatial dependence of the deciduous dentition. There is no explicit multivariate distribution available for the spatially correlated binary dental caries experience outcomes. Mixture models (McLachlan and Peel (2000)) are commonly used to generate flexible multivariate distributions and to induce unobserved heterogeneity for correlated data with an implicit multivariate distribution.

To take the unique spatial structure of the dental data into account, we use two levels of latent variables to handle the spatial dependence of the teeth within the mouth of each subject. At the higher level, the mixture component allocation random vectors $\{Q_{ij} : j = 1, \dots, J\}$ for the $i$th subject are used to allocate the quadrant-wise response vector $y_{ij}$ to its corresponding subgroup, which is characterized by a component of the mixture model. The mixture component allocation process mixes the multiple mixture components into a flexible multivariate distribution and induces the dependence among quadrants. Given the mixture component allocation process, the quadrant-wise response vectors $\{y_{ij} : j = 1, \dots, J\}$ are conditionally mutually independent. At the intermediate level, conditional on the allocation status of the mixture component process, we introduce spatial latent vectors $\{T_{im} : m = 1, \dots, M, i = 1, \dots, n\}$, which are used to generate the $m$th mixture component flexibly and to induce the dependence structure among teeth.
The joint distribution of this spatial latent vector is given by an undirected graphical Gaussian model that takes the spatial configuration of the teeth into account. The observations $\{y_{ijk} : k = 1, \dots, K, j = 1, \dots, J\}$ are conditionally independent given $Q_i$ and $T_i$ for $i = 1, \dots, n$. Meanwhile, the intermediate level spatial latent vectors $\{T_{im} : m = 1, \dots, M\}$ are conditionally independent given the higher level latent vector $Q_i$ for $i = 1, \dots, n$. In order to assess the spatial symmetry of the caries experience of the deciduous dentition, we will examine pairwise comparisons of similarity scores that will be defined later on. Due to the complexity of the oral biological system, we give a flexible covariance structure to the undirected graphical Gaussian models and leave the number of mixture components unknown. A formal model selection procedure will be used to choose an appropriate mixture model for the data.

3.3 Models

3.3.1 Bayesian Mixture Models

Finite mixture models with regression structure have a long and extensive literature and are commonly used. McLachlan and Peel (2000) gave a very general framework for mixture models with non-normal components to deal with overdispersed data. Mixture models facilitate the modeling of heterogeneity in overdispersed and correlated data by generating flexible distributions of the response variables of interest and inducing dependence structures among the response variables. Conditional on the mixture component allocation process $Q_i$, for a mixture model with $M$ components, $y_i = (y_{i1}', \dots, y_{ij}', \dots, y_{iJ}')'$ contributes to the likelihood as

$p(y_i \mid Q_i; \theta) = \prod_{j=1}^{J} \prod_{m=1}^{M} \{\pi_{jm} \, p_m(y_{ij} \mid Q_{ijm} = 1; \theta)\}^{Q_{ijm}}$, (3.1)

where $\{\pi_{jm} : m = 1, \dots, M, j = 1, \dots, J\}$ are the mixture proportions and $p_m(y_{ij} \mid Q_{ijm} = 1; \theta)$ is the $m$th component of the mixture model. Estimation of mixture models with the EM algorithm is conceptually straightforward, but it comes with difficulties and challenges.
Bayesian estimation of mixture models is feasible and well defined as long as the posterior simulation algorithm converges. Key initial papers on the Bayesian analysis of mixture models using MCMC methods include Diebolt and Robert (1994) and Escobar and West (1995). Provided that suitable (proper conjugate) priors are used, the posterior density will be proper. WinBUGS can be used to provide valid posterior samples of the quantities of interest. However, some difficulties have to be addressed with the Bayesian approach in the context of mixture models. First, improper priors might yield improper posterior distributions. Second, when the number of components $M$ is unknown, the parameter space is ill-defined, which prevents the use of classical testing procedures and priors. Finally, label switching occurs when some of the labels of the mixture components permute. The effect of label switching is important when the solution is calculated iteratively, because the labels of the components may be switched on different iterations. In this paper, we discuss the methods that have been proposed for overcoming the problems mentioned above.

3.3.2 Response Models

We model the $k$th response variable within the $j$th quadrant of the $i$th subject, $y_{ijk}$, which is a binary indicator of caries experience of tooth $k$ in quadrant $j$. The response model is specified hierarchically. At the higher level, the mixture model (3.1) gives a flexible multivariate distribution for the quadrant-wise binary data $\{y_{ij} : j = 1, \dots, J\}$ and induces the dependence structure among the four quadrants. Simultaneously, there exists a mixture component allocation random indicator $Q_i = (Q_{i1}', \dots, Q_{ij}', \dots, Q_{iJ}')'$, where $Q_{ij} = (Q_{ij1}, \dots, Q_{ijm}, \dots, Q_{ijM})'$ is a random binary vector with only one element equal to 1, for $j = 1, \dots, J$, $i = 1, \dots, n$.
At the intermediate level, we condition on $Q_i$, for instance $Q_{ijm} = 1$, i.e., $y_{ij}$ follows the $m$th mixture component of the mixture model. Meanwhile, there exists a spatial latent vector $T_{im} = (T_{i,1(m)}, \dots, T_{i,k(m)}, \dots, T_{i,K(m)})'$ that is used to tie together the $K$ binary response variables $(y_{ij1}, \dots, y_{ijk}, \dots, y_{ijK})'$. The joint distribution of $T_{im}$ is given by an undirected graphical Gaussian model (UGGM) that takes the spatial configuration of the $K$ teeth into account. Essentially, $T_{im}$ is used to generate a flexible multivariate distribution for the binary response vector and to induce the dependence among the $y_{ijk}$. Conditional on $Q_i$ and $\{T_{im} : m = 1, \dots, M\}$, the binary response variable $y_{ijk}$ can be modeled by an exponential family distribution with probability density function of the general form

$p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \alpha, \beta, \phi) = \exp\left\{ \frac{\eta_{i,mk} \, y_{ijk} - b_i(\eta_{i,mk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \right\}$, (3.2)

where $\eta_{i,mk} = \alpha_m + \beta_k + T_{i,k(m)}$ (McCullagh and Nelder (1989)). We assume that the link function $g(\cdot)$ is a canonical link that relates the mean of $y_{ijk}$ to a linear predictor as follows:

$g(E[y_{ijk} \mid Q_{ijm} = 1, \eta_{i,mk}]) = \eta_{i,mk} = \alpha_m + \beta_k + T_{i,k(m)}$,

where $\alpha = (\alpha_1, \dots, \alpha_m, \dots, \alpha_M)'$ are the overall component means with increasing order constraints and $\beta = (\beta_1, \dots, \beta_k, \dots, \beta_K)'$ are the regression coefficients of the generalized latent variable models with constraint $\sum_k \beta_k = 0$. Furthermore, the $T_{im} = (T_{i,1(m)}, \dots, T_{i,k(m)}, \dots, T_{i,K(m)})'$ are Gaussian with mean zero and covariance matrices $\{\Sigma_T^m : m = 1, \dots, M\}$. We assume that $\{Q_i : i = 1, \dots, n\}$ and $\{T_{im} : m = 1, \dots, M, i = 1, \dots, n\}$ are mutually independent; they relate the response variables to quadrant-specific and tooth-specific covariates and to the latent variables. Under the mixture model and latent variable model approach, we can assume that the response variables are conditionally mutually independent given the vectors of latent variables $L = \{L_i = (Q_i', T_{i1}', \dots, T_{im}', \dots, T_{iM}')' : i = 1, \dots, n\}$.
The joint probability density of $y$ conditional on the set of latent variables $L$ and $\{\pi', \alpha', \beta', \phi\}$, where $\pi = (\pi_1', \dots, \pi_j', \dots, \pi_J')'$ with $\pi_j = (\pi_{j1}, \dots, \pi_{jm}, \dots, \pi_{jM})'$, is as follows:

$p(y \mid L; \pi, \alpha, \beta, \phi) = \prod_{i,j,m} \left\{ \pi_{jm} \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \phi) \right\}^{Q_{ijm}}$
$= \exp\left\{ \sum_{i,j,m} Q_{ijm} \left[ \log \pi_{jm} + \sum_{k=1}^{K} \left( \frac{\eta_{i,mk} \, y_{ijk} - b_i(\eta_{i,mk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \right) \right] \right\}$, (3.3)

where $\prod_{i,j,m}$ and $\sum_{i,j,m}$ denote $\prod_{i=1}^{n} \prod_{j=1}^{J} \prod_{m=1}^{M}$ and $\sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{m=1}^{M}$, respectively.

3.3.3 The Structure Model for Latent Variables

In the response model, given the two levels of latent variables, the conditional independence assumption allows the specification of the complete likelihood for the response model. In our modeling approach, the two levels of spatial latent vectors are used to induce the dependence structure of the teeth of interest. In order to incorporate appropriate spatial latent vectors into the model, we need to choose ones that represent the design structure and characterize the random mechanism of the data generating process. The objective of these latent processes is to generate flexible distributions for the observations and to induce dependency among the observations.

At the higher level, it is assumed that there exist independent mixture component allocation processes, say $Q_{i1}, \dots, Q_{ij}, \dots, Q_{iJ}$, with

$Q_{ij} = (Q_{ij1}, \dots, Q_{ijm}, \dots, Q_{ijM})' \sim \text{Multinomial}(1, \pi_j)$, $j = 1, \dots, J$, $i = 1, \dots, n$. (3.4)

At the intermediate level, we follow the approach in Zhang et al. (2007) by incorporating appropriate spatial latent vectors to formulate flexible mixture components.

Figure 3.1. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are allocated to the $m$th cluster and tied together by the spatial latent vector $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM with unstructured precision matrix.
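The two-level generative structure of the response model can be illustrated by simulating one subject. The sketch below assumes binary responses with the logit link, one latent vector per mixture component that is shared by every quadrant allocated to that component, and arbitrary illustrative parameter values; the function and argument names are hypothetical.

```python
import numpy as np

def simulate_subject(pi, alpha, beta, covs, rng):
    """Simulate allocations and binary responses for one subject.

    pi    : (J, M) mixture proportions, each row sums to 1
    alpha : (M,)   component means
    beta  : (K,)   tooth effects (sum-to-zero in the text)
    covs  : (M, K, K) covariance matrices of the latent vectors T_im
    """
    J, M = pi.shape
    K = beta.shape[0]

    # Intermediate level: one Gaussian latent vector per mixture component.
    T = np.stack([rng.multivariate_normal(np.zeros(K), covs[m])
                  for m in range(M)])

    z = np.empty(J, dtype=int)       # component label of each quadrant
    y = np.empty((J, K), dtype=int)  # binary caries indicators
    for j in range(J):
        # Higher level: Q_ij ~ Multinomial(1, pi_j), stored as a label.
        z[j] = rng.choice(M, p=pi[j])
        # Linear predictor eta = alpha_m + beta_k + T_{i,k(m)}.
        eta = alpha[z[j]] + beta + T[z[j]]
        prob = 1.0 / (1.0 + np.exp(-eta))  # inverse logit
        y[j] = rng.binomial(1, prob)
    return z, y

# Example with illustrative values: J = 4 quadrants, M = 2 components, K = 5 teeth.
rng = np.random.default_rng(1)
pi = np.full((4, 2), 0.5)
alpha = np.array([-2.0, 0.5])
beta = np.array([0.2, -0.1, -0.3, 0.1, 0.1])
covs = np.stack([np.eye(5), np.eye(5)])
z, y = simulate_subject(pi, alpha, beta, covs, rng)
```

Quadrants that receive the same label share both the component mean and the latent vector, which is one way to see how the allocation process induces dependence among quadrants while the latent vector induces dependence among teeth.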
Undirected Graphical Gaussian Models (UGGMs) are used to give the joint distribution of the spatial latent vectors. The UGGMs take the spatial configuration of the teeth within quadrants into account. The spatial configurations of the teeth within quadrants are shown in Figures 3.1 and 3.2.

Figure 3.2. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ (incisor, incisor, cuspid, molar, molar) are allocated to the $m$th cluster and tied together by the spatial latent vector $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM with precision matrix under the CAR model assumption.

As shown in the graphs, the five teeth within each quadrant can be viewed as five nodes in a graph. If two nodes are not directly connected, they are said to be conditionally independent given the other nodes in the graph. For the $m$th mixture component, a UGGM is used to describe the spatial configuration of the nodes and to express the associations among the nodes of interest by assigning the random variables $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$ to the nodes in the graph. Meanwhile, $\{T_{im} : m = 1, \dots, M\}$ are mutually independent conditional on $Q_i$. A UGGM assumes that

$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})' \sim N(0, \Sigma_T^m)$, $m = 1, \dots, M$, (3.5)

where $\Sigma_T^m$ is a symmetric and positive definite matrix for $m = 1, \dots, M$. Gaussian random variables are determined by their first two moments. For identifiability, we already assume that the mean structures of the two levels of spatial latent variables are vectors of zeros, so the problem reduces to the covariance structures $\{\Sigma_T^m : m = 1, \dots, M\}$. A general covariance matrix is unstructured, subject to the symmetry and positive definiteness constraints.
The unstructured covariance matrix can be simplified if we assume Markovian properties for the nodes, as shown in the second graph. The Markovian type covariance matrix can be incorporated within spatial statistics by the CAR model (Cressie (1991)). The choice between the two types of covariance structure for the spatial latent vectors at the tooth-nested-within-quadrant level is made through Bayesian model selection via the Deviance Information Criterion (DIC) for missing data problems proposed by Celeux et al. (2004), which is an extension of the DIC introduced in Spiegelhalter et al. (2002) for Bayesian model selection.

Undirected Graphical Gaussian Model

In this section, we review the graphical Gaussian model (Dempster, 1972) required for this paper. Let $G = (V, E)$ be an undirected graph with vertex set $V = \{1, \dots, k, \dots, K\}$ and edge set $E = \{e_{kk'} : k \neq k' = 1, \dots, K\}$, where $e_{kk'} = 1$ or $0$ according to whether vertices $k$ and $k'$, $1 \le k \neq k' \le K$, are directly connected in $G$ or not. In the undirected graphical Gaussian model, the edge set describes the association structure of the vertex set. A random vector is assigned to the edge set to represent the association strength between the corresponding vertices. The undirected graphical Gaussian model consists of all $K$-dimensional normal distributions, say $X = \{X_1, \dots, X_k, \dots, X_K\}$, with $X \sim N(0, \Sigma)$ and precision matrix $\Omega = \Sigma^{-1} = \{\omega_{kk'} : k, k' = 1, \dots, K\}$, where $\Sigma$ is unknown but satisfies the following restrictions in terms of the pairwise conditional independences determined by the Markov properties (Drton and Perlman (2004)):

$\omega_{kk'} = 0 \iff X_k \perp X_{k'} \mid X_{V \setminus \{k, k'\}}$, $\forall k \neq k' = 1, \dots, K$.

Conditional Autoregressive Models

For the vector of univariate variables $\nu = (\nu_1, \nu_2, \dots, \nu_K)'$, where $K$ is the number of spatial nodes of interest, the zero-centered CAR specification, following Cressie (1991), sets

$\nu_k \mid \nu_{-k}, \sigma_k^2 \sim N\left(\rho \sum_{\nu_{k'} \in \nu_{-k}} b_{kk'} \nu_{k'}, \; \sigma_k^2\right)$, $k = 1, \dots, K$, (3.6)

where $\nu_{-k} = \nu \setminus \{\nu_k\}$.
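As a numerical sketch of how a precision matrix encodes the graph, the block below builds a proper CAR precision matrix for five teeth arranged in a chain and converts it to partial correlations. The chain adjacency, the weights $b_{kk'} = 1/n_k$ for adjacent nodes, the scaling $M = \mathrm{diag}(1/n_k)$, the value $\rho = 0.5$ and the function names are illustrative assumptions, not values taken from the dental analysis.

```python
import numpy as np

def car_precision(adj, rho, sigma2=1.0):
    """Precision matrix M^{-1} (I - rho * B) / sigma2 of a proper CAR model.

    adj : (K, K) symmetric 0/1 adjacency matrix with zero diagonal.
    Assumes b_kk' = 1/n_k for neighbours and M = diag(1/n_k),
    where n_k is the number of neighbours of node k.
    """
    adj = np.asarray(adj, dtype=float)
    n_k = adj.sum(axis=1)
    B = adj / n_k[:, None]
    M_inv = np.diag(n_k)
    return M_inv @ (np.eye(len(n_k)) - rho * B) / sigma2, B

def partial_correlations(Q):
    """Partial correlations -q_kk' / sqrt(q_kk * q_k'k') from a precision matrix."""
    d = np.sqrt(np.diag(Q))
    P = -Q / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Five nodes in a chain: 1 - 2 - 3 - 4 - 5.
adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
Q, B = car_precision(adj, rho=0.5)

# Admissible range for rho from the extreme eigenvalues of B.
lam = np.linalg.eigvals(B).real
rho_bounds = (1.0 / lam.min(), 1.0 / lam.max())

P = partial_correlations(Q)
```

In this sketch the partial correlation between nodes that are not adjacent in the chain is zero, which is the pairwise Markov property, while adjacent nodes have a partial correlation that grows with $\rho$.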
Following Brook's (1964) lemma, the resulting joint density for $\nu$ takes the form

$$f(\nu \mid \sigma^2, \rho) \propto \exp\Big\{-\tfrac{1}{2}\, \nu' D_{\sigma^2}^{-1}(I - \rho B)\, \nu\Big\}, \tag{3.7}$$

where $B$ is a $K \times K$ matrix with $B = \{b_{kk'} : k \ne k' = 1, \ldots, K\}$ and $b_{kk} = 0$ for all $k = 1, \ldots, K$, and $D_{\sigma^2}$ is a $K \times K$ diagonal matrix with non-zero entries $\{\sigma_k^2 : k = 1, \ldots, K\}$. The precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ needs to be symmetric, which yields the conditions

$$b_{kk'}\, \sigma_{k'}^2 = b_{k'k}\, \sigma_k^2, \quad \forall\, k, k' = 1, \ldots, K. \tag{3.8}$$

If the precision matrix is positive definite, then (3.7) is a proper distribution. Under the above parameterization the precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ is nonsingular if $\rho \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $B$ respectively. It is usually assumed that $D_{\sigma^2} = \sigma^2 M$, where $M$ is a diagonal matrix with diagonal elements $m_{kk}$ proportional to the conditional variances $\sigma_k^2$. Here $\sigma^2$ controls the overall variability and $\rho$ represents the overall spatial association. The weight matrix $B$ with entries $b_{kk'}$ needs to reflect the spatial association between nodes $k$ and $k'$. GeoBUGS (2004) sets $B_{kk'} = b_{kk'} = 1/n_k$ for adjacent nodes $k \ne k'$ and $M_{kk} = 1/n_k$, where $n_k$ is the number of nodes adjacent to node $k$. Under the above settings, the spatial latent vector $\nu$ follows a proper distribution, i.e.,

$$\nu \sim N\big(0,\ \sigma^2 (I - \rho B)^{-1} M\big). \tag{3.9}$$

3.4 Bayesian Estimations and Statistical Inference

3.4.1 Identifiability of the models

Based on the framework of the mixture of generalized latent variable models, we have to deal with model identifiability issues at both the mixture model level and the generalized latent variable level. At the mixture model level, we need to deal with the label switching issue. The interchanging of component labels is generally handled by a constraint on the mixing proportions of the form

$$\pi_{j1} \le \pi_{j2} \le \cdots \le \pi_{jM}, \quad j = 1, \ldots, J,$$

or on the component means of the form

$$\alpha_1 \le \alpha_2 \le \cdots \le \alpha_M.$$

Frequently, models with latent variables are not globally identifiable. One can integrate out the latent
spatial variable vectors to obtain a marginal likelihood and assess whether parameters are redundant. The contribution to the likelihood from the latent variable model is parameterized by $\{\Sigma_{T_m} : m = 1, \ldots, M\}$. The identifiability problem then becomes one of examining whether the parameters involved in the covariance are redundant, which might be problematic within the frequentist framework. Dawid (1979) and Gelfand & Sahu (1999) discussed model identifiability issues within the Bayesian framework. In particular, suppose that the Bayesian model is denoted by the likelihood $L(\theta; y)$ and the prior $f(\theta)$, and we partition the parameters of interest as $\theta = (\theta_1, \theta_2)$. If

$$f(\theta_2 \mid \theta_1, y) = f(\theta_2 \mid \theta_1), \tag{3.10}$$

then we say that $\theta_2$ is not identifiable, where $f(\theta_2 \mid \theta_1, y) \propto L(\theta_1, \theta_2; y)\, f(\theta_2 \mid \theta_1)\, f(\theta_1)$. That is, if observing the data $y$ does not increase our prior knowledge about $\theta_2$ given $\theta_1$, then $\theta_2$ is not identifiable by the data. Dawid's formal definition of Bayesian model nonidentifiability states that $\theta_2$ is not identifiable if and only if $L(\theta_1, \theta_2; y)$ is free of $\theta_2$. In order to make our model identifiable, we need not only to take care of the marginal identifiability of the model by integrating out the latent variables, but also to put some constraints on the covariance matrices of the Gaussian spatial latent vectors at both levels.

3.4.2 Prior distributions

In this section, prior distributions are chosen for the parameters $\theta = (\pi', \alpha', \beta')'$ and the association parameters $\xi$. The priors are assigned hierarchically to the corresponding parameters of interest. The Gibbs sampling algorithm is applied to simulate samples from the posterior distributions of the quantities of interest. At the higher level, McLachlan and Peel (2000) used a non-informative conjugate proper prior for the mixture proportions of the form

$$\pi_j = (\pi_{j1}, \ldots, \pi_{jm}, \ldots, \pi_{jM})' \sim \text{Dirichlet}\big((\varphi_1, \ldots, \varphi_m, \ldots, \varphi_M)'\big), \quad j = 1, \ldots, J, \tag{3.11}$$

where $(\varphi_1, \ldots, \varphi_M)'$ is the weight vector for the mixture proportions.
At the intermediate level, Zhao et al. (2006), Zeger et al. (1991) and Dunson et al. (2000) all suggested noninformative conjugate prior distributions for the parameters of interest, so that the effect of the priors washes out as the sample size increases. Bedrick et al. (1996) noted that normal prior distributions were suggested for the logistic regression coefficients,

$$\theta = (\alpha', \beta')' \sim N(\mu, \Gamma), \tag{3.12}$$

where $\mu$ is a vector of location parameters and $\Gamma$ is the covariance matrix. It is common to take $\mu$ as a vector of zeros and $\Gamma$ as a diagonal matrix with very large entries. We are interested in the joint posterior distribution of $(\theta, \xi \mid y)$. Under the mild conditions in Geman and Geman (1984), the Gibbs sampler can obtain the joint posterior distribution by sampling from the conditional posterior distributions $(\theta \mid y, \xi)$ and $(\xi \mid y, \theta)$ in turn. To simplify the sampling from the conditional posterior distributions, we choose hierarchical independent priors for $\theta$ and $\xi$ in this hierarchical Bayesian model, i.e. $(\xi \mid y, \theta) = (\xi \mid y)$, which is true as long as the priors satisfy $p(\theta, \xi) = p(\theta)\, p(\xi)$. We proposed two covariance structures for the Gaussian spatial latent variable models. In the generalized linear model setting with Gaussian random effects, the proper noninformative conjugate priors are the Inverse Gamma (IG) distribution for a single variance component and the Inverse Wishart distribution for a variance-covariance matrix. Based on the relative relationship among the nodes in the graph of the UGGM, we give noninformative priors to the precision matrix parameters correspondingly.

(1) Unstructured precision matrix in the UGGM:

For the $i$th subject, conditional on the higher level spatial latent vector $Q_i$, the intermediate level spatial latent vectors $\{T_{im} : m = 1, \ldots, M\}$ are conditionally independent. So, we give independent priors to the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$. Similarly, independent Wishart distributions are assigned as priors for these precision matrices.
$$\Omega_{T_m} \sim \text{Wishart}(\nu_{T_m}, \Lambda_{T_m}), \quad m = 1, \ldots, M, \tag{3.13}$$

with degrees of freedom $\nu_{T_m} = \text{rank}(\Sigma_{T_m}) + 1$ and the precision matrix $\Lambda_{T_m}$ [expression illegible in the source].

(2) CAR model based precision matrix in the UGGM:

For the $i$th subject, conditional on the higher level spatial latent vector $Q_i$, the intermediate level spatial latent vectors $\{T_{im} : m = 1, \ldots, M\}$ are conditionally independent. So, we give independent priors to the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$, which are parameterized by $\{\sigma_m^2, \rho_m : m = 1, \ldots, M\}$. Similarly, independent Inverse Gamma distributions (Dunson et al. (2000)), which are proper conjugate priors, are assigned to the overall variation parameters $\{\sigma_m^2 : m = 1, \ldots, M\}$, and independent uniform distributions, with the support constraints of section 3.3.2, are assigned to the overall quadrant-specific spatial association parameters $\{\rho_m : m = 1, \ldots, M\}$, as in GeoBUGS (2004):

$$\sigma_m^2 \sim IG(\epsilon, \epsilon), \quad m = 1, \ldots, M, \tag{3.14}$$

and

$$\rho_m \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1}), \quad m = 1, \ldots, M, \tag{3.15}$$

where $\epsilon$ is a very small positive number and $\lambda_{\min}$, $\lambda_{\max}$ are as defined in section 3.3.2.

3.4.3 Posterior computations

Let $(\pi', \alpha', \beta', Q', T', \Sigma_T)'$ denote the current state of the Markov chain. We follow steps (1)-(3) to obtain posterior samples of the quantities of interest from their posterior distributions. McLachlan & Peel (2000) and Fernandez & Green (2002) gave a general posterior sampling algorithm for the mixture model.

Step 1: Posterior sampling for mixture proportions;

$$\pi_j = (\pi_{j1}, \ldots, \pi_{jm}, \ldots, \pi_{jM})' \sim \text{Dirichlet}\big((\varphi_1 + N_{j1}, \ldots, \varphi_m + N_{jm}, \ldots, \varphi_M + N_{jM})'\big), \tag{3.16}$$

where $N_{jm} = \sum_{i=1}^{n} Q_{ijm}$ for $j = 1, \ldots, J$, $m = 1, \ldots, M$, and $\varphi = (\varphi_1, \ldots, \varphi_M)'$ is a known weight vector.
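Step 1 can be sketched in a few lines of Python. The snippet below is only an illustration under stated assumptions: the function names, the unit prior weights, and the toy allocation vector are hypothetical, and the Dirichlet draw uses the standard construction of normalising independent Gamma variates.

```python
import random


def dirichlet_posterior_params(prior_weights, allocations_j):
    """Return phi_m + N_jm for one quadrant j.

    allocations_j[i] is the (0-based) component index to which quadrant j
    of subject i is currently allocated, so N_jm is a simple count.
    """
    counts = [0] * len(prior_weights)
    for m in allocations_j:
        counts[m] += 1
    return [phi + n for phi, n in zip(prior_weights, counts)]


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet by normalising independent Gamma(a, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
# Hypothetical example: M = 3 components, n = 5 subjects, unit prior weights.
posterior_params = dirichlet_posterior_params([1.0, 1.0, 1.0], [0, 2, 2, 1, 2])
pi_j = sample_dirichlet(posterior_params, rng)
```

Here `posterior_params` holds the updated Dirichlet parameters $\varphi_m + N_{jm}$ and `pi_j` is one draw of the mixing proportions for quadrant $j$.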
Step 2: Posterior sampling for mixture component allocation random variables;

$$Q_{ij} = (Q_{ij1}, \ldots, Q_{ijM})' \sim \text{Multinomial}\big(1, (\tau_{j1}, \ldots, \tau_{jM})'\big), \quad j = 1, \ldots, J,\ i = 1, \ldots, n,$$

with

$$\tau_{jm} = \frac{\pi_{jm}\, p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta)}{\sum_{k=1}^{M} \pi_{jk}\, p_k(y_{ij} \mid Q_{ijk} = 1, T_{ik}; \theta)},$$

where

$$p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta) = \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta),$$

and $p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)$ is defined in (3) with $\text{logit}\{p_m(y_{ijk} = 1 \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)\} = \alpha_m + \beta_k + T_{i,k(m)}$.

Step 3: Posterior sampling for generalized latent variable models.

Conditional on the mixture component allocation process at the higher level, the posterior distributions of the parameters and latent variables in the generalized latent variable model can be obtained in the standard way (Dunson et al. (2000), Zeger et al. (1991)). Given the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$, the joint posterior distribution for the regression parameters and latent variables at the intermediate level is

$$p(\theta, T \mid Q, y; \pi) \propto p(y, Q \mid T; \theta, \pi)\, f(\theta, T) \propto \exp\Big\{\sum_{i,j,m} Q_{ijm}\Big\{\log(\pi_{jm}) + \sum_{k} Q_{ijm}\Big\{\frac{y_{ijk}\eta_{imk} - b(\eta_{imk})}{a(\phi)} + c(y_{ijk}, \phi)\Big\}\Big\}\Big\} \times \exp\Big\{-\tfrac{1}{2}\theta'\Gamma^{-1}\theta - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{m=1}^{M} T_{im}'\Omega_{T_m}T_{im}\Big\}, \tag{3.18}$$

where $\sum_{i,j,m}$ denotes $\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{m=1}^{M}$, $f(\cdot)$ denotes the joint prior density, $Q = (Q_1', \ldots, Q_i', \ldots, Q_n')'$ with $Q_i = (Q_{i1}', \ldots, Q_{ij}', \ldots, Q_{iJ}')'$ and $Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})'$, $T = (T_1', \ldots, T_i', \ldots, T_n')'$ with $T_i = (T_{i1}', \ldots, T_{im}', \ldots, T_{iM}')'$ and $T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})'$, $\theta = (\alpha', \beta')'$ and $\eta_{imk} = \alpha_m + \beta_k + T_{i,k(m)}$. In practice, we set $\mu$ to be a vector of zeros and $\Gamma$ to be a diagonal matrix with large diagonal elements. If the MCMC algorithm is a Gibbs sampler, the full conditional distribution of each of the unknowns in (3.18) needs to be specified, which can be obtained in the standard way (Dunson et al. (2000), Zeger et al. (1991)). For the fixed effect $\theta$, the full conditional distribution is

$$p(\theta \mid Q, T, y; \pi) \propto \exp\Big\{\sum_{i,j,m} Q_{ijm}\Big\{\log(\pi_{jm}) + \sum_{k} Q_{ijm}\Big\{\frac{y_{ijk}\eta_{imk} - b(\eta_{imk})}{a(\phi)} + c(y_{ijk}, \phi)\Big\}\Big\}\Big\} \times \exp\Big\{-\tfrac{1}{2}\theta'\Gamma^{-1}\theta\Big\}. \tag{3.19}$$

The full conditional distribution for the Gaussian spatial latent vectors $T_{im}$ is

$$p(T_{im} \mid Q, y; \pi, \theta) \propto \exp\Big\{\sum_{i,j,m} Q_{ijm}\Big\{\log(\pi_{jm}) + \sum_{k} Q_{ijm}\Big\{\frac{y_{ijk}\eta_{imk} - b(\eta_{imk})}{a(\phi)} + c(y_{ijk}, \phi)\Big\}\Big\}\Big\} \times \exp\Big\{-\tfrac{1}{2} T_{im}'\Omega_{T_m}T_{im}\Big\}. \tag{3.20}$$

The full conditional distributions of the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$ can be obtained according to the precision matrix structure.

(1) Unstructured precision matrix:

$$p(\Omega_{T_m} \mid Q, T, y; \pi, \theta) = \text{Wishart}\Big(\nu_{T_m} + N_m,\ \Lambda_{T_m} + \sum_{i=1}^{n} T_{im}T_{im}'\Big), \tag{3.21}$$

where $N_m = \sum_{j=1}^{J} N_{jm} = \sum_{i=1}^{n}\sum_{j=1}^{J} Q_{ijm}$.

(2) CAR model based precision matrix:

(2.1) Overall precision parameters (with $\tau_m = 1/\sigma_m^2$):

$$p(\tau_m \mid Q, T, y; \pi, \theta) \propto \exp\Big\{\sum_{i,j,m} Q_{ijm}\Big\{\log(\pi_{jm}) + \sum_{k} Q_{ijm}\Big\{\frac{y_{ijk}\eta_{imk} - b(\eta_{imk})}{a(\phi)} + c(y_{ijk}, \phi)\Big\}\Big\}\Big\} \times \tau_m^{\epsilon - 1}\exp\{-\tau_m \epsilon\}. \tag{3.22}$$

(2.2) Overall spatial parameters:

$$p(\rho_m \mid Q, T, y; \pi, \theta) \propto \exp\Big\{\sum_{i,j,m} Q_{ijm}\Big\{\log(\pi_{jm}) + \sum_{k} Q_{ijm}\Big\{\frac{y_{ijk}\eta_{imk} - b(\eta_{imk})}{a(\phi)} + c(y_{ijk}, \phi)\Big\}\Big\}\Big\} \times I_{(\lambda_{\min}^{-1},\, \lambda_{\max}^{-1})}(\rho_m). \tag{3.23}$$

All the posterior distributions, except those for $\{\rho_m : m = 1, \ldots, M\}$, are proper because of their proper conjugate priors. The uniform priors for the overall spatial parameters are not conjugate, which might lead to improper posterior distributions. The simplest technique for verifying that the posterior distribution of the parameters is proper is to verify that it is proper for a reduced data set, obtained by discarding all but a single outcome per subject so that the reduced data consist of independent outcomes (O'Brien and Dunson, 2004). Since the covariance structures do not appear in the reduced data likelihood and the support of the spatial association parameters is finite, i.e., $\{\rho_m \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1}) : m = 1, \ldots, M\}$, the posterior distributions of the spatial association parameters $\{\rho_m : m = 1, \ldots, M\}$ are proper. The algorithm for the posterior computation proceeds by sampling $\pi$, $Q$, $\theta$, $T$, and $\xi$ in turn from the above conditional distributions.
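The allocation probabilities of Step 2 can be computed as in the following Python sketch. It assumes the logit link $\alpha_m + \beta_k + T_{i,k(m)}$ given above; the function name, argument layout, and toy inputs are hypothetical, and the normalisation is done on the log scale purely for numerical stability.

```python
import math


def allocation_probabilities(y_ij, pi_j, alpha, beta, t_i):
    """Posterior allocation probabilities tau_jm for one response vector y_ij.

    y_ij  : list of K binary outcomes for quadrant j of subject i
    pi_j  : list of M mixing proportions for quadrant j
    alpha : list of M component intercepts
    beta  : list of K tooth effects
    t_i   : M lists of K latent values T_{i,k(m)}
    """
    log_weights = []
    for m, pi_jm in enumerate(pi_j):
        log_lik = 0.0
        for k, y in enumerate(y_ij):
            eta = alpha[m] + beta[k] + t_i[m][k]
            # Bernoulli log-likelihood under a logit link:
            # y * eta - log(1 + exp(eta))
            log_lik += y * eta - math.log1p(math.exp(eta))
        log_weights.append(math.log(pi_jm) + log_lik)
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_weights)
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs: two components with low and high caries propensity.
tau = allocation_probabilities(
    y_ij=[1, 1, 0, 1, 1],
    pi_j=[0.5, 0.5],
    alpha=[-2.0, 2.0],
    beta=[0.0] * 5,
    t_i=[[0.0] * 5, [0.0] * 5],
)
```

With these toy values the response vector, which has four caries outcomes out of five, receives most of its allocation probability from the second (high-intercept) component.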
3.4.4 Bayesian Model Selection

The formal procedure for choosing an appropriate Bayesian hierarchical model for the observed data necessitates methods for comparing alternative models within the Bayesian framework. The DIC (deviance information criterion, Spiegelhalter et al. (2002)) is a hierarchical model selection criterion that can be viewed as a generalization of the AIC (Akaike information criterion, Akaike, 1973) and BIC (Bayesian information criterion, Kass and Raftery, 1995). It is particularly useful in Bayesian model selection problems where the posterior distributions of the parameters have been obtained by Markov chain Monte Carlo (MCMC) simulation. The DIC statistic is a measure of model complexity and goodness of fit, defined as

$$DIC = \bar{D}(\vartheta) + p_D,$$

where $D(\vartheta)$ is the deviance given the model parameters $\vartheta = (\pi', \theta', \xi')'$, defined as

$$D(\vartheta) = -2\log(p(y \mid \vartheta)) + 2\log(h(y)),$$

where $h(y)$ is some fully specified standardizing term which is a function of the data alone. $\bar{D}(\vartheta)$ is the posterior mean of the deviance, a measure of the goodness of fit of the proposed model to the observed data. $D(\bar{\vartheta})$ is the deviance evaluated at the posterior mean of $\vartheta$, and $p_D = \bar{D}(\vartheta) - D(\bar{\vartheta})$ is the effective number of parameters in the model, a penalty for the complexity of the model. The quantities $\bar{D}(\vartheta)$ and $D(\bar{\vartheta})$ can be trivially computed from an MCMC simulation chain. In contrast to the setting of the conventional DIC introduced in Spiegelhalter et al. (2002), our hierarchical models contain two levels of latent variables, which requires the model selection to be based on the DIC for missing data problems (Celeux et al. (2006)). MCMC methods, such as the Gibbs sampler, can be employed conveniently to produce posteriors for parameters that are marginalized over the latent spatial vectors.
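As a small illustration of how $\bar{D}(\vartheta)$, $p_D$, and the DIC follow from MCMC output, the Python sketch below computes the conventional DIC from a list of deviance draws. It is not the complete-data DIC for missing data problems; the function name and the numeric values are hypothetical.

```python
def dic_from_samples(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) using DIC = Dbar + p_D and p_D = Dbar - D(theta_bar).

    deviance_draws             : deviance evaluated at each retained MCMC draw
    deviance_at_posterior_mean : deviance evaluated at the posterior mean
    """
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


# Hypothetical deviance values, for illustration only.
dic, p_d = dic_from_samples([102.0, 98.0, 101.0, 99.0], 97.0)
```

In this toy case the posterior mean deviance is 100, so `p_d` is 3 and `dic` is 103; a model with a smaller DIC would be preferred.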
We computed the complete DICs (Celeux et al. (2006)) by using the MCMC simulation results to obtain both the measure of goodness of fit and the effective number of parameters associated with each model, and used these statistics to select the most appropriate model. In our problem, we have to deal with latent variables to obtain a complete DIC. In order to compute the complete DIC, Celeux et al. (2006) gave a definition of the complete data DIC by defining the complete data estimator $E_{\vartheta}[\vartheta \mid y, q, t]$, which does not suffer from identifiability problems since the components are identified by $(q', t')'$, the realization of the spatial latent vectors $(Q', T')'$, and then obtained the DIC for the complete model as

$$DIC(y, q, t) = -4E_{\vartheta}[\log(p(y, q, t \mid \vartheta)) \mid y, q, t] + 2\log(p(y, q, t \mid E_{\vartheta}[\vartheta \mid y, q, t])). \tag{3.24}$$

As in the EM algorithm, we can then integrate this quantity to define

$$E_{Q,T}[DIC(y, Q, T) \mid y] = -4E_{\vartheta,Q,T}[\log(p(y, Q, T \mid \vartheta)) \mid y] + 2E_{Q,T}[\log(p(y, Q, T \mid E_{\vartheta}[\vartheta \mid y, Q, T])) \mid y]. \tag{3.25}$$

More specifically, notice that

$$E_{\vartheta,Q,T}[\log(p(y, Q, T \mid \vartheta)) \mid y] = E_{\vartheta}\{E_Q(E_T[\log(p(y, Q, T \mid \vartheta)) \mid y, Q, \vartheta] \mid y, \vartheta) \mid y\} = E_{\vartheta}\{E_T(E_Q[\log(p(y, Q, T \mid \vartheta)) \mid y, T, \vartheta] \mid y, \vartheta) \mid y\} \tag{3.26}$$

and also

$$\log(p(y, Q, T \mid \vartheta)) = \sum_{i,j,m}\{Q_{ijm}\log(\pi_{jm}\, p(T_{im} \mid \Sigma_{T_m}))\} + \sum_{i,j,m,k}\{Q_{ijm}\log(p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta))\}, \tag{3.27}$$

where $p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)$ is given in (3) and $p(T_{im} \mid \Sigma_{T_m})$ is given in (6). Interchanging the order of $Q$ and $T$ in the integrations by Fubini's theorem, we have

$$E_{\vartheta,Q,T}[\log(p(y, Q, T \mid \vartheta)) \mid y] = E_{\vartheta}\{E_T(E_Q[\log(p(y, Q, T \mid \vartheta)) \mid y, T, \vartheta] \mid y, \vartheta) \mid y\}$$
$$= E_{\vartheta}\Big\{E_T\Big(\sum_{i,j,m}\{E_Q(Q_{ijm} \mid y, T; \vartheta)\log(\pi_{jm}\, p(T_{im} \mid \Sigma_{T_m}))\} \,\Big|\, y, \vartheta\Big) \,\Big|\, y\Big\} + E_{\vartheta}\Big\{E_T\Big(\sum_{i,j,m,k}\{E_Q(Q_{ijm} \mid y, T; \vartheta)\log(p_m(y_{ijk} \mid Q_{ijm} = 1, T_{im}; \theta))\} \,\Big|\, y, \vartheta\Big) \,\Big|\, y\Big\}, \tag{3.28}$$

where $E_Q(Q_{ijm} \mid y, T; \vartheta)$ is given by

$$E_Q(Q_{ijm} \mid y, T; \vartheta) = P(Q_{ijm} = 1 \mid y, T; \vartheta) = \frac{\pi_{jm}\, p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta)}{\sum_{k=1}^{M}\pi_{jk}\, p_k(y_{ij} \mid Q_{ijk} = 1, T_{ik}; \theta)},$$

with

$$p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta) = \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta).$$

At the lower level, conditional on the higher level spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$ and the intermediate level spatial latent vectors $\{T_{ij} : j = 1, \ldots, J,\ i = 1, \ldots, n\}$, the binary response variables $y_{ijk}$ are mutually independent and from the Bernoulli family with probability of success $\pi_{ijk} = P(y_{ijk} = 1)$. That is,

$$(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \alpha, \beta, \gamma) \sim \text{Bernoulli}(\pi_{ijk}) = f_{k(j)}(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta),$$

where

$$\text{logit}(\pi_{ijk} \mid Q_{ij}, T_{i,k(j)}; \alpha, \beta, \gamma) = \alpha + \beta_j + \gamma_{jk} + Q_{ij} + T_{i,k(j)},$$

and $\theta = (\alpha, \beta', \gamma')'$ with constraints $\sum_{j=1}^{J}\beta_j = 0$ and $\sum_{k=1}^{K}\gamma_{jk} = 0$ for $j = 1, \ldots, J$. Let $Q = \{Q_i : i = 1, \ldots, n\}$ with $Q_i = \{Q_{ij} : j = 1, \ldots, J\}$, and $T = \{T_i : i = 1, \ldots, n\}$ with $T_i = \{T_{ij} : j = 1, \ldots, J\}$ and $T_{ij} = \{T_{i,k(j)} : k = 1, \ldots, K\}$. If the model formulation is viewed as a missing data problem, where we treat $Q$ and $T$ as missing covariates that are used to explain the wide heterogeneity of dental caries experience outcomes, then the complete likelihood is

$$p(y, Q, T \mid \theta, \Sigma_Q, \{\Sigma_{T_j}\}) = f(y \mid Q, T; \theta)\, p(Q \mid \Sigma_Q)\, p(T \mid \{\Sigma_{T_j}\})$$
$$= \prod_{i=1}^{n}\Big\{f(y_i \mid Q_i, T_i; \theta)\, p(Q_i \mid \Sigma_Q)\prod_{j=1}^{J} p(T_{ij} \mid \Sigma_{T_j})\Big\}$$
$$= \prod_{i=1}^{n}\Big\{\prod_{j=1}^{J}\{f(y_{ij} \mid Q_{ij}, T_{ij}; \theta)\, p(T_{ij} \mid \Sigma_{T_j})\}\, p(Q_i \mid \Sigma_Q)\Big\}$$
$$= \prod_{i=1}^{n}\Big\{\prod_{j=1}^{J}\Big\{\prod_{k=1}^{K}\{f_{k(j)}(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta)\}\, p(T_{ij} \mid \Sigma_{T_j})\Big\}\, p(Q_i \mid \Sigma_Q)\Big\}. \tag{4.1}$$

The distributions of $Q_i$ and $T_{ij}$ are given by UGGMs as follows:

$$Q_i \mid \Sigma_Q \sim N_J(0, \Sigma_Q), \quad i = 1, \ldots, n,$$

and

$$T_{ij} \mid \Sigma_{T_j} \sim N_K(0, \Sigma_{T_j}), \quad j = 1, \ldots, J,\ i = 1, \ldots, n,$$

where $\Sigma_Q$ is unstructured and $\Sigma_{T_j}$ can be either unstructured or CAR model based.

Other parameterizations of the fixed effects $\theta$ and other probabilistic descriptions of the spatial latent vectors $\{T_{ij} : j = 1, \ldots, J,\ i = 1, \ldots, n\}$ may be chosen. However, as can be expected, the results of the inference would not be affected substantially (Agresti (1997)). The model selection is based on the DIC for missing data problems (Celeux et al. (2006)). An optimal model selection would need to be based on RJMCMC (Green (1995)) or BDMCMC (Stephens (2000)), which is essentially a simultaneous model selection at each iteration of the MCMC posterior sampling algorithm.
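The linear predictor of Section 4.1, with its sum-to-zero constraints, can be sketched in Python as below. The constraints are imposed here by centring the raw effects, which is an assumption about implementation that mirrors the `beta[i]-mean(beta[])` device in the appendix code; the function name and the toy inputs are hypothetical.

```python
import math


def success_probability(alpha, beta, gamma, q_i, t_i, j, k):
    """pi_ijk for quadrant j and tooth k under the logit model.

    beta[j] and gamma[j][k] are centred so that sum_j beta_j = 0 and
    sum_k gamma_jk = 0 hold for the effects that enter the predictor;
    q_i[j] and t_i[j][k] are the quadrant- and tooth-level latent effects.
    """
    beta_c = beta[j] - sum(beta) / len(beta)
    gamma_c = gamma[j][k] - sum(gamma[j]) / len(gamma[j])
    eta = alpha + beta_c + gamma_c + q_i[j] + t_i[j][k]
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical inputs: J = 4 quadrants, K = 5 teeth, all latent effects zero.
prob = success_probability(
    alpha=0.0,
    beta=[1.0, 1.0, 1.0, 1.0],
    gamma=[[0.0] * 5 for _ in range(4)],
    q_i=[0.0] * 4,
    t_i=[[0.0] * 5 for _ in range(4)],
    j=0,
    k=2,
)
```

Because the raw quadrant effects are all equal, centring removes them and the probability is 0.5, which shows why an uncentred common shift in $\beta$ would not be identifiable separately from $\alpha$.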
4.2 Bayesian mixture of generalized latent variable models

Besides the generalized latent variable models, a finite mixture of distributions is another way to model response variables with wide heterogeneity. Finite mixtures of distributions provide a mathematically based approach to the statistical modeling of a wide variety of random phenomena, and are known to be an extremely flexible modeling method. Their usefulness for modeling heterogeneity in a cluster analysis context is clear. A mixture model provides a convenient semiparametric framework in which to model unknown distributional shapes, whatever the objective, whether it is density estimation or the flexible construction of Bayesian priors. A mixture model is also able to model quite complex distributions through an appropriate choice of its components and of the number of mixture components, so as to represent accurately the local areas of support of the true distribution. It can handle situations where a single parametric family is unable to provide a satisfactory model for local variations in the observed data. In our approach, we assumed that each of the four quadrant-wise response vectors came from one of a certain number, say $M$, $1 \le M \le 4$, of multivariate distributions with corresponding probability. The $M$ multivariate distributions are characterized by $M$ different situations which can accurately represent the corresponding local heterogeneity of the observed binary vector. A convenient semiparametric way to incorporate the variability among these four observed quadrant-wise response vectors is to formulate their distributions uniformly in the form of a mixture of these $M$ multivariate distributions. Specifically, the $M$ multivariate distributions correspond to $M$ underlying subgroups or subpopulations, which the four quadrant-wise response vectors should be able to identify if the subgroups actually exist; and each of the $M$
multivariate distributions corresponds to one component in the mixture model.

A mixture model can be viewed as a missing data problem in which the mixture component allocation process is latent. The latent process allocates each quadrant-wise response vector $y_{ij}$ to one of the mixture components, say the $m$th component, which means that $y_{ij}$ can be characterized by the local situation, i.e., in terms of the heterogeneity of the observed vector, associated with the $m$th underlying cluster. Hierarchically, at the higher level, for the $i$th subject, there exists a latent mixture component allocation process $Q_i = (Q_{i1}', \ldots, Q_{ij}', \ldots, Q_{iJ}')'$ with

$$Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})' \sim \text{Multinomial}\big(1, (\pi_{j1}, \ldots, \pi_{jM})'\big), \quad j = 1, \ldots, J,\ i = 1, \ldots, n,$$

which means $(y_{ij} \mid Q_{ijm} = 1) \sim f_m(y_{ij}; \theta)$. The complete distribution can be given as

$$f(y_i, Q_i \mid \pi, \theta) = \prod_{j=1}^{J}\prod_{m=1}^{M}\{\pi_{jm}\, f_m(y_{ij} \mid Q_{ijm} = 1; \theta)\}^{Q_{ijm}}. \tag{4.2}$$

At the intermediate level, for the $m$th component, which is a multivariate distribution, there exists a Gaussian spatial latent vector

$$T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})' \sim N_K(0, \Sigma_{T_m}),$$

which is used to generate a flexible distribution for the $K$ binary response variables, which are from the exponential family (McCullagh and Nelder), and to induce the dependence among the J variables. At the lower level, conditional on the allocation process and the Gaussian spatial latent vectors, the conditional distribution of the binary caries experience outcome $y_{ijk}$ is given by

$$(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta) \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_{imk})) = f_{k(m)}(y_{ijk} \mid Q_{ijm}, T_{i,k(m)}; \theta),$$

where $\eta_{imk} = \alpha_m + \beta_k + T_{i,k(m)}$ and $\theta = (\alpha', \beta')'$, with constraints $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_M$ and $\sum_{k=1}^{K}\beta_k = 0$. Let $Q = (Q_1', \ldots, Q_i', \ldots, Q_n')'$ and $T = (T_1', \ldots, T_i', \ldots, T_n')'$; then the complete likelihood is specified as

$$p(y, Q, T \mid \theta, \{\Sigma_{T_m}\}) = \prod_{i=1}^{n}\Big\{\prod_{j=1}^{J}\prod_{m=1}^{M}\{\pi_{jm}\, f(y_{ij} \mid T_{im}; \theta)\, p(T_{im} \mid \Sigma_{T_m})\}^{Q_{ijm}}\Big\}$$
$$= \prod_{i=1}^{n}\prod_{j=1}^{J}\Big\{\prod_{m=1}^{M}\Big\{\pi_{jm}\Big\{\prod_{k=1}^{K} f_{k(m)}(y_{ijk} \mid T_{i,k(m)}; \theta)\Big\}\, p(T_{im} \mid \Sigma_{T_m})\Big\}^{Q_{ijm}}\Big\}. \tag{4.3}$$

The model structure has two sources of uncertainty, from the mixture model at the higher level and from the generalized latent variable models at the intermediate level. At the higher level, the number of mixture components is left unknown. At the intermediate level, the covariance matrices $\{\Sigma_{T_m} : m = 1, \ldots, M\}$ for the generalized latent variable models can be either unstructured or CAR model based. The appropriate model needs to be determined by a formal model selection criterion based on the DIC for missing data problems (Celeux et al. (2006)). The model is implemented within the Bayesian framework via WinBUGS with noninformative conjugate proper priors.

Other parameterizations of the fixed effects $\theta$ and other probabilistic descriptions of the spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$ and $\{T_{ij} : j = 1, \ldots, J,\ i = 1, \ldots, n\}$ may be chosen. However, as can be expected, the results of the inference would not be affected substantially (Agresti (1997)). An optimal model selection would need to be based on RJMCMC (Green (1995)) or BDMCMC (Stephens (2000)), which is essentially a simultaneous model selection at each iteration of the MCMC posterior sampling algorithm.

4.3 Missing data

In biomedical research, missing data are common, and there is a large literature discussing different approaches in this area, but the methods are not yet mature enough to handle general situations. Our models were built from the features of the dental data at hand, but they apply generally to situations where multilevel discrete data are recorded spatially. The models were implemented via WinBUGS, which allows missing values in the data set.
WinBUGS handles missing values by replacing the missing data with random samples from their posterior distribution $p(y_{missing} \mid y_{observed}; \theta)$, which essentially assumes that the data are missing at random, i.e., that the missing mechanism is noninformative. However, the missing data are very likely informative, since the teeth within the mouth share the same biological environment. In the presence of informative missing data, our models need to be extended accordingly. In future work, we need to extend the model by incorporating the informative missing mechanism in a dropout process, that is, a parametric model for making inference about the missing values in the data set. Modeling the dropout pattern is problematic because the parameter that relates the measurement and the dropout process, say $\lambda$, is always unidentifiable from the data at hand. Non-identifiability of the model always causes difficulties in numerical optimization, because of either a flat or multimodal likelihood and a singular information matrix, which makes statistical inference infeasible in the frequentist framework. Under the Bayesian framework, statistical inference is always available as long as the MCMC algorithm converges when it is used to draw samples of the quantities of interest from their proper posterior distributions given the data at hand. The Bayesian approach for dealing with informative missing data is known as the selection model (Arminger et al., 1995), which requires the terms representing the non-response mechanism to be included explicitly in the likelihood. Best et al. (1996) discussed the selection model for informative non-response in a study of dementia and cognitive decline in the elderly.
They viewed the full model as two submodels, one representing the substantive relationship of interest and one reflecting the missing data process, with the possibly unobserved response variable representing the common link between the two submodels. Such a model may be readily expressed as a directed conditional independence graph, thus lending itself to Bayesian inference using the MCMC approach. However, there is considerable current interest in the topic of informative drop-out (Diggle and Kenward (1994)), in which some argue that any attempt to learn about the selection mechanism will be heavily dependent on modeling assumptions, and that it is preferable to conduct a sensitivity analysis over alternative plausible mechanisms. Meanwhile, the MCMC approach can easily provide predictive distributions for any variable of interest and, unlike approaches based on maximum likelihood or empirical Bayes, the MCMC predictions fully account for uncertainty in both the model and the parameter estimates. Since the data often cannot provide much information for estimating the parameters of the models for the non-response mechanism, informative prior distributions for the parameters of interest in the selection models are used to facilitate the posterior sampling algorithm based on MCMC. So a sensitivity analysis for the priors in the selection model is essential for the validity of a Bayesian analysis that models the non-response mechanism explicitly in the likelihood. Future work will aim to develop a more general statistical procedure for assessing the sensitivity to both the learning process for the non-response mechanism and the informative priors used in the selection models. The procedure may be based on either different model selection criteria, for instance, the DIC for missing data problems and posterior predictive checking,
or dynamic algorithms based on RJMCMC (Green (1995)) and BDMCMC (Stephens (2000)) for simultaneous model selection and parameter estimation.

4.4 Comparison between frequentist and Bayesian

It is well known that many standard statistical methods can be justified by both Bayesian and frequentist arguments. However, even when there is only one unknown parameter, there is a wide class of problems for which no Bayesian method can be found which satisfies the basic frequentist criterion (Bartholomew (1965)). Bartholomew raised two important questions about the comparison between the Bayesian and frequentist approaches when a discrepancy arises. The first is the practical question of whether the discrepancy between the two approaches is ever such as to lead to widely differing conclusions. The second concerns the reason why the two approaches to inference give different results in some cases but not in others. These two questions are also of great interest for our future work. The starting point for this work will be to consider the differences in the statistical thinking of the two schools. For instance, suppose we have observations $y = (y_1, \ldots, y_i, \ldots, y_n)'$ on a continuous random variable with density function $f(y \mid \theta)$, and consider the Bayesian and frequentist solutions to the problem of making an inference about $\theta$. The Bayesian first specifies a prior for $\theta$ and then combines it with the likelihood to obtain a posterior distribution, which enables a probability statement about $\theta$ of the form

$$P(\theta \le \theta_\alpha(y) \mid y) = \alpha_b, \tag{4.4}$$

where $\alpha_b$ denotes a degree of belief. The major problem for the Bayesian is to select a prior density $\pi(\theta)$ to express ignorance about $\theta$. Kass et al. (1995) reviewed several methods for determining suitable prior distributions for the parameters of interest. For instance, Jeffreys's rule is based on the idea that if we are ignorant about $\theta$ then we are ignorant about any function of $\theta$.
This leads him to formulate the invariance principle, i.e., $\pi(\theta) \propto \sqrt{I(\theta)}$, where $I(\theta)$ is the Fisher information function. The frequentist who wishes to make a statement of the form (4.4) is precluded from treating $\theta$ as a random variable, as the Bayesian does. He must try to find a statistic $\theta_\alpha(y)$ such that

$$P(\theta \le \theta_\alpha(y) \mid \theta) = \alpha_f, \tag{4.5}$$

where $\alpha_f$ indicates that the probability is to be interpreted in a frequency sense. The frequentist's ignorance about $\theta$ is expressed by the fact that $\alpha_f$ is independent of $\theta$. The statement $\theta \le \theta_\alpha(y)$ is thus true in the long run with probability $\alpha_f$ for any sequence of $\theta$'s. In general, there are many functions $\theta_\alpha(y)$ satisfying (4.5), and the frequentist's problem is to choose one of them. It may be possible to choose $\theta_\alpha(y)$ such that

$$\int_{-\infty}^{\theta_\alpha(y)} p(\theta \mid y)\, d\theta = \alpha,$$

where $p(\theta \mid y)$ is the posterior density of $\theta$ for some prior $\pi(\theta)$. If the statistic $\theta_\alpha(y)$ is chosen in such a way that (4.5) is true, we say the Bayesian inference in (4.4) has the frequency or confidence property. Under these circumstances, the Bayesian and frequentist approaches are said to agree. Welch et al. (1963) gave the following necessary and sufficient conditions for agreement. (1) It must be possible to write $f(y \mid \theta)$ in the form $f(s - \tau)$, where $s$ and $\tau$ are monotonic functions of $y$ and $\theta$ respectively, with $-\infty < \tau, s < \infty$. (2) The prior density of $\tau$ must be uniform over the real line. For large sample sizes, it is known that the influence of the prior $\pi(\theta)$ for the parameter $\theta$ on the form of the posterior density $p(\theta \mid y)$ diminishes as $n \to \infty$. This means that, under very general conditions, the Bayesian statement (4.4) has the confidence property in the limit as $n \to \infty$, and the approach to agreement is more rapid in $n$ if $\pi(\theta) \propto \sqrt{I(\theta)}$. Gelman et al. (2004) discussed the asymptotic normality and consistency of the posterior mean and median.
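A simple case that satisfies both agreement conditions is the normal location model with known unit variance and a uniform prior on $\theta$. The Python sketch below, offered only as an illustration under those assumptions, checks by simulation that the upper posterior bound $\theta_\alpha(y) = \bar{y} + z_\alpha/\sqrt{n}$ has frequentist coverage close to $\alpha$ even for small $n$; the function name, the seed, and the chosen values of $\theta$, $n$, and $\alpha$ are hypothetical.

```python
import math
import random
from statistics import NormalDist


def coverage_of_bayesian_bound(theta, n, alpha, replications, seed):
    """Estimate P(theta <= theta_alpha(y) | theta) for y_i ~ N(theta, 1).

    Under a uniform prior on theta the posterior is N(ybar, 1/n), so the
    posterior alpha-quantile is ybar + z_alpha / sqrt(n).
    """
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(alpha)
    sd_mean = 1.0 / math.sqrt(n)
    hits = 0
    for _ in range(replications):
        # The sample mean of n unit-variance normals is N(theta, 1/n).
        y_bar = rng.gauss(theta, sd_mean)
        if theta <= y_bar + z_alpha * sd_mean:
            hits += 1
    return hits / replications


coverage = coverage_of_bayesian_bound(theta=1.5, n=5, alpha=0.9,
                                      replications=20000, seed=1)
```

The estimated `coverage` should be near 0.9 up to Monte Carlo error, so here $\alpha_b$ and $\alpha_f$ coincide and the two approaches agree.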
Under some regularity conditions, i.e., that the likelihood is a continuous function of $\theta$ and that $\theta_0$, the true value of the parameter, is not on the boundary of the parameter space, as $n \to \infty$ the posterior distribution of $\theta$ approaches normality with mean $\theta_0$ and variance $(nI(\theta_0))^{-1}$, where $I(\theta_0)$ is the Fisher information evaluated at $\theta_0$. In the limit of large $n$, the posterior mode $\hat{\theta}$ approaches $\theta_0$, and the curvature (observed information) approaches $nI(\theta_0)$. When the truth is included in the family of models being fitted, the posterior mode, the posterior mean and the posterior median are consistent, asymptotically unbiased and efficient under mild regularity conditions (Gelman et al. (2004)). When sample sizes are small, the prior distribution is a critical part of the model specification. There can only be a serious discrepancy between Bayesian and frequentist methods if the density $f(y \mid \theta)$ does not satisfy Welch's condition (1), if the sample size is small, or possibly if it is determined sequentially. Bartholomew raised two objectives for the comparison between the two approaches. The first concerns the extent to which Bayesian and frequentist statements of the form (4.4) and (4.5) may differ in small samples. The second concerns the reason for the differences which occur and how they may be avoided. Lee and Song (2006) carried out a simulation study which showed that Bayesian inference for hierarchical models with small to moderate sample sizes performs better than frequentist inference. Bartholomew (1965) drew three conclusions about the agreement between the Bayesian and frequentist approaches. (a) For the shape parameter of the gamma distribution, Bayesian interval estimates gave good agreement even if the sample size is one; for the restricted location parameter and the exponential mean the agreement was not so good, but can be improved by an appropriately chosen confidence interval, i.e. either "shortest interval" or "equal tails".
(b) The coverage probability of a two-tailed Bayesian interval estimate depends not only on the prior but also on the way the interval is chosen.
(c) Agreement may be achieved by using a sequential rather than a fixed sample size experimental design.

The numerical magnitude of the differences between frequentist and Bayesian methods of inference can in practice be related to (a) and (b). The reason for the discrepancy is given by (c). He also conjectured that agreement can always be obtained if a correspondence is established between the Bayesian's choice of prior distribution and the frequentist's choice of sampling rule.

APPENDICES

APPENDIX A

The First Appendix

A.1 WinBUGS code one for BGLVM (with unstructured covariance matrix at intermediate level) for overall spatial symmetry assessment.

model{
  ### Gaussian Graphical Models at Quadrant level ###
  InvSigamaQ[1:I,1:I] ~ dwish(IQ[,],(I+1))
  muQ[1] <- 0
  muQ[2] <- 0
  muQ[3] <- 0
  muQ[4] <- 0
  ### Gaussian Graphical Models at Tooth level ###
  InvSigamaT[1:J,1:J] ~ dwish(IT[,],(J+1))
  muT[1] <- 0
  muT[2] <- 0
  muT[3] <- 0
  muT[4] <- 0
  muT[5] <- 0
  ### Generalized Latent Variable Models ###
  for(k in 1:N){
    Q[k,1:I] ~ dmnorm(muQ[1:I],InvSigamaQ[1:I,1:I])
    for(i in 1:I){
      T[k,i,1:J] ~ dmnorm(muT[1:J],InvSigamaT[1:J,1:J])
    }
    for(i in 1:I){
      for(j in 1:J){
        Lat[j,i,k] <- alpha+(beta[i]-mean(beta[]))+
          (gamma[j,i]-mean(gamma[,i]))+Q[k,i]+T[k,i,j]
      }
    }
    for(i in 1:I){
      for(j in 1:J){
        logit(p[j,i,k]) <- Lat[j,i,k]
        y[(k-1)*20+(i-1)*5+j] ~ dbin(p[j,i,k],1)
      }
    }
  }
  ### Priors ###
  alpha ~ dnorm(0,0.001)
  for(i in 1:I){
    beta[i] ~ dnorm(0,0.001)
  }
  for(i in 1:I){
    for(j in 1:J){
      gamma[j,i] ~ dnorm(0,0.01)
    }
  }
  ### Spatial association assessment ###
  ## Spatial association assessment between Left and Right ##
  Tlam12 <- -InvSigamaQ[1,2]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[2,2])
  Tlam34 <- -InvSigamaQ[3,4]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[4,4])
  ## Spatial association assessment between Upper and Down ##
  Tlam23 <-
    -InvSigamaQ[2,3]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[3,3])
  Tlam14 <- -InvSigamaQ[1,4]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[4,4])
  ## Spatial association assessment between Across quadrants ##
  Tlam13 <- -InvSigamaQ[1,3]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[1,1])
  Tlam24 <- -InvSigamaQ[2,4]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[4,4])
  ### Hypothesis Testing Overall Spatial Symmetry ###
  LRvsUD <- 1/2*(Tlam12+Tlam34)-1/2*(Tlam23+Tlam14)
  LRvsA <- 1/2*(Tlam12+Tlam34)-1/2*(Tlam13+Tlam24)
  AvsUD <- 1/2*(Tlam13+Tlam24)-1/2*(Tlam23+Tlam14)
}

A.2 WinBUGS code two for BGLVM (with CAR model based covariance matrix at intermediate level) for overall spatial symmetry assessment.

model{
  ### Gaussian Graphical Models at Quadrant level ###
  InvSigamaQ[1:I,1:I] ~ dwish(IQ[,],(I+1))
  muQ[1] <- 0
  muQ[2] <- 0
  muQ[3] <- 0
  muQ[4] <- 0
  ### Gaussian Graphical Models at Tooth level ###
  ### with CAR assumption for precision matrix ###
  num[1] <- 1
  num[2] <- 2
  num[3] <- 2
  num[4] <- 2
  num[5] <- 1
  m[1] <- 1
  m[2] <- 1/2
  m[3] <- 1/2
  m[4] <- 1/2
  m[5] <- 1
  cumsum[1] <- 0
  for(i in 2:6){
    cumsum[i] <- sum(num[1:(i-1)])
  }
  for(k in 1:8){
    for(i in 1:5){
      pick[k,i] <- step(k-cumsum[i]-esp)*step(cumsum[i+1]-k)
    }
    C[k] <- 1/inprod(num[],pick[k,])
  }
  esp <- 0.0001
  adj[1] <- 2
  adj[2] <- 1
  adj[3] <- 3
  adj[4] <- 2
  adj[5] <- 4
  adj[6] <- 3
  adj[7] <- 5
  adj[8] <- 4
  muT[1] <- 0
  muT[2] <- 0
  muT[3] <- 0
  muT[4] <- 0
  muT[5] <- 0
  ### Generalized Latent Variable Models ###
  for(k in 1:N){
    Q[k,1:4] ~ dmnorm(muQ[1:4],InvSigamaQ[1:4,1:4])
    T[k,1,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat1)
    T[k,2,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat2)
    T[k,3,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat3)
    T[k,4,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat4)
    for(i in 1:I){
      for(j in 1:J){
        Lat[j,i,k] <- alpha+(beta[i]-mean(beta[]))+
          (gamma[j,i]-mean(gamma[,i]))+Q[k,i]+T[k,i,j]
      }
    }
    for(i in 1:I){
      for(j in 1:J){
        logit(p[j,i,k]) <- Lat[j,i,k]
        y[(k-1)*20+(i-1)*5+j] ~ dbin(p[j,i,k],1)
      }
    }
  }
  ### Priors ###
  alpha ~ dnorm(0,0.01)
  for(i in 1:I){
    beta[i] ~ dnorm(0,0.01)
  }
  for(i in 1:I){
    for(j in 1:J){
      gamma[j,i] ~ dnorm(0,0.01)
    }
  }
  prec ~ dgamma(0.005,0.001)
  spatmax <- 0.35
  spatmin <- -0.95
  spat1 ~ dunif(spatmin,spatmax)
  spat2 ~ dunif(spatmin,spatmax)
  spat3 ~ dunif(spatmin,spatmax)
  spat4 ~ dunif(spatmin,spatmax)
  ### Spatial association assessment ###
  ## Spatial association assessment between Left and Right ##
  Tlam12 <- -InvSigamaQ[1,2]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[2,2])
  Tlam34 <- -InvSigamaQ[3,4]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[4,4])
  ## Spatial association assessment between Upper and Down ##
  Tlam23 <- -InvSigamaQ[2,3]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[3,3])
  Tlam14 <- -InvSigamaQ[1,4]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[4,4])
  ## Spatial association assessment between Across quadrants ##
  Tlam13 <- -InvSigamaQ[1,3]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[1,1])
  Tlam24 <- -InvSigamaQ[2,4]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[4,4])
  ### Hypothesis Testing Overall Spatial Symmetry ###
  LRvsUD <- 1/2*(Tlam12+Tlam34)-1/2*(Tlam23+Tlam14)
  LRvsA <- 1/2*(Tlam12+Tlam34)-1/2*(Tlam13+Tlam24)
  AvsUD <- 1/2*(Tlam13+Tlam24)-1/2*(Tlam23+Tlam14)
}

APPENDIX B

The Second Appendix

B.1 WinBUGS code one for BMGLVM (with 3 components and unstructured covariance matrix at intermediate level) for overall spatial symmetry assessment.
model{
  for(n in 1:N){
    ### Mixture models (for "mth" mixture (with M components)) ###
    ### at Quadrant level ###
    for(i in 1:I){
      ### Mixture models (for "kth" mixture (with K components)) ###
      ### at Tooth level ###
      for(j in 1:J){
        y[((n-1)*20+(i-1)*5+j)] ~ dbern(p[n,AQ[n,i],j])
      } # End of positions index #
      APQ[n,i,1:M] ~ ddirch(alphaQ[])
      AQ[n,i] ~ dcat(APQ[n,i,])
    } # End of quadrants index #
    Q12[n] <- equals(AQ[n,1],AQ[n,2])
    Q13[n] <- equals(AQ[n,1],AQ[n,3])
    Q14[n] <- equals(AQ[n,1],AQ[n,4])
    Q23[n] <- equals(AQ[n,2],AQ[n,3])
    Q24[n] <- equals(AQ[n,2],AQ[n,4])
    Q34[n] <- equals(AQ[n,3],AQ[n,4])
  } # End of Subjects index #
  ### Mixture Components Specification via ###
  ### GLVMs with Unstructured Covariance ###
  theta[1:J] ~ dmnorm(mu[],ian[,])
  alpha1 ~ dnorm(0,tau)
  loca1 ~ dnorm(0,tau)I(0,)
  loca2 ~ dnorm(0,tau)I(,0)
  alpha[1] <- alpha1
  alpha[2] <- alpha1+loca1
  alpha[3] <- alpha1+loca2
  for(n in 1:N){
    for(m in 1:M){
      T[n,m,1:5] ~ dmnorm(muT[1:5],InvSigamaT[1:5,1:5])
      for(j in 1:J){
        logit(p[n,m,j]) <- alpha[m]+theta[j]-mean(theta[])+T[n,m,j]
      }
    }
  }
  ### Priors ###
  InvSigamaT[1:5,1:5] ~ dwish(IT[,],6)
  tau ~ dgamma(0.01,0.01)
  muT[1] <- 0
  muT[2] <- 0
  muT[3] <- 0
  muT[4] <- 0
  muT[5] <- 0
  ### Similarity Assessment ###
  MQ12 <- mean(Q12[])
  MQ13 <- mean(Q13[])
  MQ14 <- mean(Q14[])
  MQ23 <- mean(Q23[])
  MQ24 <- mean(Q24[])
  MQ34 <- mean(Q34[])
  ### Hypothesis Testing Overall Spatial Symmetry ###
  LRvsUD <- 1/2*(MQ12+MQ34)-1/2*(MQ23+MQ14)
  LRvsA <- 1/2*(MQ12+MQ34)-1/2*(MQ13+MQ24)
  AvsUD <- 1/2*(MQ13+MQ24)-1/2*(MQ23+MQ14)
}

B.2 WinBUGS code two for BMGLVM (with 3 components and CAR model based covariance matrix at intermediate level) for overall spatial symmetry assessment.
model{
  for(n in 1:N){
    ### Mixture models (for "mth" mixture (with M components)) ###
    ### at Quadrant level ###
    for(i in 1:I){
      ### Mixture models (for "kth" mixture (with K components)) ###
      ### at Tooth level ###
      for(j in 1:J){
        y[((n-1)*20+(i-1)*5+j)] ~ dbern(p[n,AQ[n,i],j])
      } # End of positions index #
      APQ[n,i,1:M] ~ ddirch(alphaQ[])
      AQ[n,i] ~ dcat(APQ[n,i,])
    } # End of quadrants index #
    Q12[n] <- equals(AQ[n,1],AQ[n,2])
    Q13[n] <- equals(AQ[n,1],AQ[n,3])
    Q14[n] <- equals(AQ[n,1],AQ[n,4])
    Q23[n] <- equals(AQ[n,2],AQ[n,3])
    Q24[n] <- equals(AQ[n,2],AQ[n,4])
    Q34[n] <- equals(AQ[n,3],AQ[n,4])
  } # End of Subjects index #
  ### Mixture Components Specification via GLVMs under CAR Model ###
  theta[1:J] ~ dmnorm(mu[],ian[,])
  alpha1 ~ dnorm(0,tau)
  loca1 ~ dnorm(0,tau)I(0,)
  loca2 ~ dnorm(0,tau)I(,0)
  alpha[1] <- alpha1
  alpha[2] <- alpha1+loca1
  alpha[3] <- alpha1+loca2
  for(n in 1:N){
    for(m in 1:M){
      T[n,m,1:5] ~ car.proper(muT[],C[],adj[],num[],invm[],prec,spat[m])
      for(j in 1:J){
        logit(p[n,m,j]) <- alpha[m]+theta[j]-mean(theta[])+T[n,m,j]
      }
    }
  }
  ### CAR models specification ###
  num[1] <- 1
  num[2] <- 2
  num[3] <- 2
  num[4] <- 2
  num[5] <- 1
  invm[1] <- 1
  invm[2] <- 1/2
  invm[3] <- 1/2
  invm[4] <- 1/2
  invm[5] <- 1
  cumsum[1] <- 0
  for(i in 2:6){
    cumsum[i] <- sum(num[1:(i-1)])
  }
  for(k in 1:8){
    for(i in 1:5){
      pick[k,i] <- step(k-cumsum[i]-esp)*step(cumsum[i+1]-k)
    }
    C[k] <- 1/inprod(num[],pick[k,])
  }
  esp <- 0.0001
  adj[1] <- 2
  adj[2] <- 1
  adj[3] <- 3
  adj[4] <- 2
  adj[5] <- 4
  adj[6] <- 3
  adj[7] <- 5
  adj[8] <- 4
  muT[1] <- 0
  muT[2] <- 0
  muT[3] <- 0
  muT[4] <- 0
  muT[5] <- 0
  ### Priors ###
  prec ~ dgamma(0.005,0.001)
  spatmax <- 0.35
  spatmin <- -0.95
  spat1 ~ dunif(spatmin,spatmax)
  spat2 ~ dunif(spatmin,spatmax)
  spat3 ~ dunif(spatmin,spatmax)
  spat[1] <- spat1
  spat[2] <- spat2
  spat[3] <- spat3
  tau ~ dgamma(0.001,0.001)
  ### Similarity Assessment ###
  MQ12 <- mean(Q12[])
  MQ13 <- mean(Q13[])
  MQ14 <- mean(Q14[])
  MQ23 <- mean(Q23[])
  MQ24 <- mean(Q24[])
  MQ34 <- mean(Q34[])
  ### Hypothesis Testing Overall Spatial Symmetry ###
  LRvsUD <-
    1/2*(MQ12+MQ34)-1/2*(MQ23+MQ14)
  LRvsA <- 1/2*(MQ12+MQ34)-1/2*(MQ13+MQ24)
  AvsUD <- 1/2*(MQ13+MQ24)-1/2*(MQ23+MQ14)
}

BIBLIOGRAPHY

Aitkin, M. and Rubin, D., B. (1985). Estimation and hypothesis testing in finite mixture models. Journal of the Royal Statistical Society B 47, 67-75.
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6, 251-262.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd Int. Symp. Information Theory (eds B. N. Petrov and F. Csaki), pp. 267-281.
Alan Agresti. (1997). A model for repeated measurements of a multivariate binary response. Journal of American Statistical Association, Vol. 92, No. 437, 315-321.
Alan Agresti. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.
Alan Agresti and D., Hitchcock. (2005). Bayesian inference for categorical data analysis. Statistical Methods and Application (Journal of the Italian Statistical Society).
Alan E. Gelfand and Penelope Vounatsou. (2003). Proper Multivariate Conditional Autoregressive Models for Spatial Data Analysis. Biostatistics 4, 1, pp. 11-25.
Alan E. Gelfand and Sujit K. Sahu. (1994). Identifiability, improper priors, and Gibbs sampling for generalized linear models. Journal of the American Statistical Association, Vol. 94, No. 445, pp. 247-253.
Anderson, T., W. (1971). An Introduction to Multivariate Statistical Analysis, 2nd edition. New York: Wiley.
Anders Ekholm, Peter W., F., Smith and John W. McDonald. (1995). Marginal regression analysis of a multivariate binary response. Biometrika, Vol. 82, No. 4, pp. 847-854.
Anders Skrondal and Sophia Rabe-Hesketh. (2004). Generalized Latent Variable Modeling. New York: Chapman and Hall.
Andrew, Gelman, John, B. Carlin, Hal, S. Stern and Donald, B. Rubin. (2004). Bayesian Data Analysis. Chapman & Hall/CRC.
Andrew Gelman and Francis Tuerlinckx.
Type S error rates for classical and Bayesian single and multiple comparison procedures. Working paper.
Arminger, G., Clogg, C., C. and Sobel, M., E. (eds). (1995). Handbook of Statistical Modeling for the Social and Behavioral Science. New York: Plenum.
Bartholomew, D., J. (1965). A comparison of some Bayesian and frequentist inferences. Biometrika, 52, 1 and 2, 19-35.
Bartholomew, D., J. (1984a). The Foundations of Factor Analysis, Biometrika, 71, 221-232.
Bartholomew, D., J. (1984b). Scaling Binary Data using a Factor Model, Journal of the Royal Statistical Society, Series B, 46, 120-123.
Bartholomew, D., J. (1988). The sensitivity of latent trait analysis to the choice of prior distribution. British Journal of Mathematical and Statistical Psychology, 41, 101-107.
Bartholomew, D., J. (1994). Bayes's theorem in latent variable modeling. Aspects of uncertainty: A tribute to D. V. Lindley. Freeman, P., R. & Smith, A., F., M., Ed. London: Wiley.
Bartholomew, D., J. and Knott, M. (1999). Latent Variable Models and Factor Analysis. Edward Arnold Publishers Ltd.
Bayes, T., R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370-418. Reprinted in Biometrika, 45, 243-315, 1958.
Bedrick, E., J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for generalized linear models. Journal of the American Statistical Association, 91, 1450-1460.
Best, N., G., Spiegelhalter, D., J., Thomas, A. and Brayne, C., E., G. (1996). Bayesian analysis of realistically complex models. Journal of the Royal Statistical Society, A, 159, part 2, 323-342.
Bettina Grün and Friedrich Leisch. (2006). Fitting finite mixtures of generalized linear regressions in R. Computational Statistics and Data Analysis.
Box, G., E., P., and Tiao, G., C. (1973). Bayesian inference in statistical analysis, Reading, MA: Addison-Wesley.
Breslow, N., E. and Clayton, D., G. (1993).
Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9-25.
Brook, D. (1964). On the distinction between the conditional probability and joint probability approaches in the specification of the nearest neighbor systems. Biometrika 51, 481-489.
Bruce Rannala. (2002). Identifiability of parameters in MCMC Bayesian inference of phylogeny. Systematic Biology, Vol. 51, No. 5, pp. 754-760.
Carey, V., Zeger, S. and Diggle, P. (1993). Modeling multivariate binary data with alternating logistic regressions. Biometrika, 80, 3, pp. 517-526.
Carmen Fernandez and Peter, J. Green. (2002). Modeling spatially correlated data via mixture: A Bayesian approach.
Celeux, G., Forbes, F., Robert, C., Titterington, D., M. (2006). Deviance Information Criteria for missing data models with discussion. Bayesian Analysis, in print.
Chuan Zhou and Jon Wakefield. (2006). A Bayesian mixture model for partitioning gene expression data. Biometrics 62, 515-525.
Collett, D. and Stepniewska, K. (1999). Some practical issues in binary data analysis. Statist. Med. 18, 2209-2221.
Coull, B., A. and Agresti, A. (2000). Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. Biometrics, 56, 73-80.
Cox, D., R. (1970). The Analysis of Binary Data. London: Methuen.
Cox, D., R. (1972). The analysis of multivariate binary data. Applied Statistics, Vol. 21, No. 2, pp. 113-120.
Cressie, N. (1991). Statistics for Spatial Data. New York: Wiley. pp. 61-63.
Dale, J., R. (1986). Global cross-ratio models for bivariate discrete ordered responses. Biometrics 42, 909-917.
David, B., Dunson. (2007). Bayesian methods for latent trait modeling of longitudinal data. Statistical Methods in Medical Research, 16: 399-415.
Davidian, M. and Giltinan D., M. (2003). Nonlinear models for repeated measurement data: an overview and update. Journal of Agricultural, Biological, and Environmental Statistics 8, 387-409.
Dawid, A., P. (1979). Conditional Independence in Statistical Theory (with discussion). Journal of the Royal Statistical Society, Ser. B, 41, 1-31.
Dempster, A. (1972). Covariance selection. Biometrics 28, 157-175.
Diebolt, J. and Robert, C., P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society B 56, 363-375.
Diggle, P. (1992). Discussion of paper by K.-Y. Liang, S., L., Zeger and B., Qaqish. J. R. Statist. Soc. B 45, 28-39.
Diggle, P., Liang, L. and Zeger, S. (1994). Analysis of Longitudinal Data, Clarendon Press, Oxford.
Diggle, P. and Kenward, M., G. (1994). Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics, 43, 49-93.
Dunson, D., B. (2000). Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society B 62, 355-366.
Edwards, D. (1990). Hierarchical interaction models. Journal of the Royal Statistical Society, Series B, 52, 3-20.
Emily, L. Webb and Jonathan, J. Forster. (2006). Bayesian model determination for multivariate ordinal and binary data.
Escobar, M., D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of American Statistical Association 90, 577-588.
Everitt, B., S. and Hand, D., J. (1981). Finite mixture distributions. London: Chapman and Hall.
Eva Tzala and Nicky Best. (2007). Bayesian latent variable modeling of multivariate spatial-temporal variable in cancer mortality. Statistical Methods in Medical Research 2007; 1-22.
Fitzmaurice, G., M. & Laird, N., M. (1993). A likelihood-based method for analyzing longitudinal binary responses. Biometrika, 80, 141-51.
Fisher, R., A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222, 309-368.
Geman, S. and D. Geman. (1984). "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images." IEEE Trans.
Pattern Analysis and Machine Intelligence 6: 721-741.
Green, Peter. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732.
Hobert, J., P. and Casella, G. (1996). The effect of improper priors on Gibbs sampling in Hierarchical linear mixed models. Journal of the American Statistical Association 91, 1461-1473.
Ibrahim, J., Chen, M. and Lipsitz, S. (2002). Bayesian methods for generalized linear models with covariates missing at random. Canadian Journal of Statistics, 30, 55-78.
Jansen R., C. (1993). Maximum likelihood in a generalized linear finite mixture model by using the EM algorithm. Biometrics 49, 227-231.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, 186, 453-461.
Jeroen K. Vermunt and Jay Magidson. (2004). Hierarchical mixture models for nested data structure.
Julian Besag, Peter Green, David Higdon and Kerrie Mengersen. (1995). Bayesian computation and stochastic systems. Statistical Science Vol. 10, No. 1, 3-66.
Kass, R. and Raftery, A. (1995). Bayes factors and model uncertainty. Journal of American Statistical Association, 90, 773-795.
Laplace, P., S. (1820). English translation: Philosophical essay on Probabilities (1951). New York: Dover.
Lauritzen, S., L. and Wermuth, N. (1989). Graphical models for association between variables, some of which are qualitative and some quantitative. Ann. Statist., 17, 31-57.
Lee S., Y. and Song X., Y. (2004). Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research 2004; 39: 653-686.
Leroux, B. (2006). Analysis of Correlated Dental Data: Challenges and Recent Developments. Statistical Methods for Oral Health Research, JSM 2006.
Lesaffre, E. and Bogaerts, K. Spatial Correlations in Caries Attack Patterns in the Deciduous Dentition. Biostatistics, K. U.
Leuven.
Liang, K., Y. and Zeger, S., L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Liang, K., Y., Zeger, S., L. (1989). A class of logistic regression models for multivariate binary time series. Journal of American Statistical Association, 84, 447-451.
Liang, K., Y., Zeger, S., L. and Qaqish, B. (1992). Multivariate regression analysis of categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 45, 3-40.
Lipsitz, R., Laird, N., M. & Harrington, P. (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78, 153-160.
Lipsitz, S. and Ibrahim, J. (1996). A conditional model for incomplete covariates in parametric regression models. Biometrika, 83, 916-922.
Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. New York: Wiley.
Manski, C., F. and McFadden, D. (1981). Structural Analysis of Discrete Data with Econometric Applications. Cambridge: Massachusetts Institute of Technology Press.
Mardia, K. V. (1988). Multi-dimensional multivariate Gaussian Markov random fields with application to image processing. Journal of Multivariate Analysis 24, 265-284.
Mathias, Drton and Micheal Perlman. (2004). Model selection for Gaussian concentration graphs. Biometrika, 91, 3, pp. 591-602.
McLachlan, G., J. and Basford, K., E. (1988). Mixture models: inference and applications to clustering. New York: Marcel Dekker.
McLachlan, G. and Peel, D. (2001). Finite Mixture Models. Wiley.
McCullagh, P. and Nelder, J., A. (1989). Generalized Linear Models, 2nd edition. New York: Chapman and Hall.
Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson variation. Biometrika, 74, 247-257.
Moustaki, I. (1996). A Latent Trait and a Latent Class Model for Mixed Observed Variables, British Journal of Mathematical and Statistical Psychology, 49, 313-334.
Neuhaus, J., M., Hauck, W., W.
and Kalbfleisch, J., D. (1992). The effects of mixture distribution misspecification when fitting mixed effects logistic models. Biometrika, 79, 755-762.
O'Malley, A., James and Zaslavsky, Alan M. (2006). Domain-level covariance analysis for survey data with structured nonresponse. Working paper.
Paolo Giudici and Peter J. Green. (1999). Decomposable graphical Gaussian model determination. Biometrika 86, 4, pp. 785-801.
Pettitt, A., N., Tran, T., T., Haynes, M., A. and Hay, J., L. (2006). A Bayesian hierarchical model for categorical longitudinal data from a social survey of immigrants. Journal of the Royal Statistical Society A (2006), 169, Part 1, pp. 97-114.
Prentice, R., L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-1048.
Richardson, S. and Green, P. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, Vol. 59, No. 4, pp. 731-792.
Robert, C., P. (1996). Mixture of distributions: inference and estimation. In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds). London: Chapman & Hall, pp. 441-464.
Robert E. Kass and Larry Wasserman. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, Vol. 91, No. 435, pp. 1343-1370.
Roeder, K. and Wasserman, L. (1997). Practical density estimation using mixtures of normals. Journal of the American Statistical Association 92, 894-902.
Roy, J. (2006). Statistical Approaches for Dealing with Missing Tooth- and Surface Level Data in Caries Research. Statistical Methods for Oral Health Research, JSM 2006.
Sammel, M., D., Ryan, L., M., and Legler, J., M. (1997). Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society B 59, 667-678.
Samuel, M. Manda, Rebecca, E., Walls and Mark, S., Gilthorpe. (2007).
A full Bayesian hierarchical mixture model for the variance of gene differential expression. BMC Bioinformatics.
Scott L. Zeger and M. Rezaul Karim. (1991). Generalized Linear models with random effects; A Gibbs Sampling Approach. Journal of the American Statistical Association, Theory and Methods, Vol. 86, No. 413, 79-86.
Sean, M. O'Brien and David B. Dunson. (2004). Bayesian multivariate logistic regression. Biometrics 60, 739-746.
Spiegelhalter, D., J., Best, N., G., Carlin, B., P. and van der Linde A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of Royal Statistical Society, B, 64, 583-640.
Spiegelhalter, D., J., Thomas, A. and Best, N. (2003). WinBUGS Version 1.4 User Manual. www.mrc-bsu.cam.ac.uk/bugs.
Stephens, M. (2000). Bayesian analysis of mixtures with an unknown number of components - an alternative to reversible jump methods. The Annals of Statistics, Volume 28, Number 1, 40-74.
Stiratelli, R., Laird, N. and Ware, J., H. (1984). Random-effects models for serial observations with binary response. Biometrics, 40, 961-971.
Stuart, R., Lipsitz and Garrett Fitzmaurice. (1994). An extension of Yule's Q to multivariate binary data. Biometrics, 50, 847-852.
Titterington, D., M., Smith, A., F. and Makov, U., E. (1985). Statistical analysis of finite mixture distributions. New York: Wiley.
Thompson T., J., Smith P., J. and Boyle J., P. (1998). Finite mixture models with concomitant information: assessing diagnostic criteria for diabetes. Applied Statistics, 47, 393-404.
Thomas, A., Best, N., Lunn, D., Arnold, R. and Spiegelhalter, D., J. (2004). GeoBUGS Version 1.2 User Manual. www.mrc-bsu.cam.ac.uk/bugs.
Van Duijn, M. A. J. and Bockenholt, U. (1995). Mixture models for the analysis of repeated count data. Applied Statistics 44, 473-85.
Vanobbergen, J., Martens, L., Lesaffre, E., Declerck, D. (2000).
The Signal-Tandmobiel project, a longitudinal intervention health promotion study in Flanders (Belgium): baseline and first year results. European Journal of Paediatric Dentistry 2000; 2: 87-96.
Vanobbergen, J., Lesaffre, E., Garcia-Zattera, M., J., Jara, A., Martens, L. and Declerck, J. (2007). Caries Patterns in Primary Dentition in 3-, 5- and 7-year-old Children: Spatial Correlation and Preventive Consequences.
Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer (Springer series in statistics).
Vincent Carey, Scott L. Zeger and Peter Diggle. (1993). Modeling multivariate binary data with alternating logistic regressions. Biometrika, 80, 3, pp. 517-526.
Wang, P., Puterman, M., L., Cockburn, I. and Le, N., D. (1996). Mixed Poisson regression models with covariate rates. Biometrics 52, 381-400.
Wang, P. and Puterman M., L. (1998). Mixed logistic regression models. Journal of Agricultural, Biological, and Environmental Statistics 3, 175-200.
Wang, F., J. and Wall, M., M. (2003). Generalized common spatial factor model. Biostatistics 4, 569-582.
Welch, B., L. and Peers, H., W. (1963). On formulae for confidence points based on integrals of weighted likelihood. Journal of the Royal Statistical Society, B, 25, 318-329.
West, M., Muller, P., and Escobar, M., D. (1994). Hierarchical priors and mixture models with application in regression and density estimation. In Aspects of Uncertainty: A tribute to D. V. Lindley, A. F. M. Smith and P. Freeman (Eds). New York: Wiley, pp. 363-386.
Wu, M., C. and Carroll, R., J. (1988). Estimation and comparison of change in the presence of informative right censoring by modeling the censoring process. Biometrics, 44, 175-188.
Zabell, S., L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7, 369-387.
Zhao, L., P. and Prentice, R., L. (1990). Correlated binary regression using a quadratic exponential model. Biometrika 77, 642-648.
Zhao, Y., Staudenmayer, J., Coull, B., A.
and Wand, M., P. (2006). General Design Bayesian Generalized Linear Mixed Models. Statistical Science Vol. 21, No. 1, 35-51.
Zhu, J., Eickhoff, J., C. and Yan, P. (2005). Generalized Linear Latent Variable Models for Repeated Measures of Spatially Correlated Multivariate Data. Biometrics, 61, 674-683.