This is to certify that the
dissertation entitled

A Hierarchical Bayesian Approach to Model Spatially Correlated
Binary Data with Applications to Dental Research

presented by

Yanwei Zhang

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Statistics

Major Professor's Signature

06/03/08
Date

MSU is an affirmative-action, equal-opportunity employer

 

 


A HIERARCHICAL BAYESIAN APPROACH
TO MODEL SPATIALLY CORRELATED
BINARY DATA WITH APPLICATIONS TO
DENTAL RESEARCH

By

Yanwei Zhang

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics

2008

ABSTRACT

A HIERARCHICAL BAYESIAN APPROACH
TO MODEL SPATIALLY CORRELATED
BINARY DATA WITH APPLICATIONS TO
DENTAL RESEARCH

By

Yanwei Zhang

Statistical analysis of multivariate binary data, whether measured repeatedly in time
or clustered cross-sectionally in space, raises a number of challenges beyond the
difficulties posed by the non-continuous nature of the data. For instance, dental data
from the oral health research community are typically discrete, clustered spatially,
and repeated in time. Researchers are interested in the risk factors and the spatial
symmetry of caries prevalence and incidence. It is widely believed, for example, that
caries outcomes on adjacent teeth are highly likely to be correlated, which necessitates
the use of methodologies for correlated discrete data. Generalized estimating equation
(GEE) based approaches can help answer research questions about the marginal mean
and pairwise association of correlated units of interest. When the association among
units is the primary research concern, however, GEE suffers from a serious loss of
efficiency. Methodologies for analyzing multivariate categorical data clustered in
space, with both the marginal mean and the association being of research interest,
therefore need continued development. In this thesis, we introduce complete likelihood
based approaches for analyzing spatially correlated binary data. Specifically, we
discuss a class of methods that explicitly take the unique spatial structure of the data
into consideration for valid and efficient inference at the tooth level. Furthermore, we
propose different models that use latent variables at hierarchical levels to account for
the spatial dependence of the data from different points of view. The hierarchical
structure of the models and the local identifiability of latent variable models make
statistical inference appropriate within the Bayesian framework through MCMC based
posterior sampling algorithms. The performance of the different models is compared
using a Bayesian model selection criterion, the deviance information criterion (DIC)
for missing data problems. Finally, we give Bayesian hypothesis tests for the spatial
symmetry of caries incidence by providing simultaneous credible regions for the
differences of the quantities used to measure spatial association strength. The
methodology is illustrated using dental data from the Signal Tandmobiel (STM) project.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Dr. David Todem, for
his valuable advice and continuing encouragement during my years as a research
assistant at the Department of Epidemiology. His help and expectations made my
academic life challenging but rewarding. I am also grateful for the financial support
from his funded project. Dr. Todem's hard work, dedication, and creativity have been
and will always be an inspiration to me in achieving my career goals.

Appreciation is extended to Dr. Ramamoorthi and Dr. Gardiner for making it
possible for me to become an applied statistician who is Bayesian in principle. I
would like to thank Dr. Ramamoorthi, who, in my first year of study in East Lansing,
helped me get through the toughest time of my life so far. Dr. Gardiner gave me
advice that helped me make the transition from mathematician to applied
statistician. I would also like to thank Dr. Melfi and Dr. Cui for serving on my thesis
committee.

Special thanks go to my parents; they are the greatest parents I could possibly ask
for. In particular, my father has always been supportive of my efforts to pursue my
career goals. It was my father who did as much as possible to make my education
financially possible. I dedicate this work to all of you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Background and objectives
1.2 Principles for the analysis
1.3 Outline of the thesis

2 Bayesian Generalized Latent Variable Models
2.1 Introduction
2.2 The Spatial Dependence Structures
2.2.1 Notation
2.2.2 Principles of our modeling approach
2.3 Models
2.3.1 Generalized Latent Variable model
2.4 Bayesian Estimations and Statistical Inference
2.4.1 Identifiability of the models
2.4.2 Prior distributions
2.4.3 Posterior computations
2.4.4 Missing data issue
2.4.5 Bayesian Model Selection
2.4.6 Spatial symmetry hypothesis testing
2.4.7 Example
2.5 The Signal Tandmobiel Project Example
2.5.1 Primary results
2.5.2 The results from our approach
2.6 Discussion

3 Bayesian Finite Mixture of Generalized Latent Variable Models
3.1 Introduction
3.2 The Spatial Dependence Structures
3.2.1 Notation
3.2.2 Principles of our modeling approach
3.3 Models
3.3.1 Bayesian Mixture Models
3.3.2 Response Models
3.3.3 The Structure Model for Latent Variables
3.4 Bayesian Estimations and Statistical Inference
3.4.1 Identifiability of the models
3.4.2 Prior distributions
3.4.3 Posterior computations
3.4.4 Bayesian Model Selection
3.5 The Signal Tandmobiel Project Example
3.5.1 Primary results
3.5.2 The results from our approach
3.6 Discussion

4 Discussion and Future Research
4.1 Bayesian generalized latent variable models
4.2 Bayesian mixture of generalized latent variable models
4.3 Missing data
4.4 Comparison between frequentist and Bayesian

APPENDICES

A The First Appendix
A.1 WinBUGS code one for BGLVM
A.2 WinBUGS code two for BGLVM

B The Second Appendix
B.1 WinBUGS code one for BMGLVM
B.2 WinBUGS code two for BMGLVM

BIBLIOGRAPHY

LIST OF TABLES

2.1 Prevalence of caries experience (% affected) in the deciduous dentition
of 7-year-old children, n = 4,351.

2.2 Odds ratios and 95% confidence intervals for the 2x2 association models
for caries on deciduous molars on tooth in 7-year-old children.

2.3 Credible intervals of overall spatial association strength comparisons
based on UGGM with unstructured covariance structure.

2.4 Credible intervals of overall spatial association strength comparisons
based on UGGM with CAR model based covariance structure.

2.5 Credible intervals of specific spatial association strength comparisons
based on UGGM with unstructured covariance structure.

2.6 Credible intervals of specific spatial association strength comparisons
based on UGGM with CAR model based covariance structure.

3.1 Prevalence of caries experience (% affected) in the deciduous dentition
of 6, 7, 8-year-old children, n = 4,351.

3.2 Odds ratios and 95% confidence intervals for the 2x2 association models
for caries on deciduous molars on tooth in 7-year-old children.

3.3 Credible intervals of spatial association strength comparisons based on
BGLVMs and UGGM with unstructured covariance structure.

3.4 Credible intervals of spatial similarity comparisons based on mixture
model with 2 components and UGGM with unstructured covariance structure.

3.5 Credible intervals of spatial similarity comparisons based on mixture
model with 2 components and UGGM with CAR model based covariance structure.

3.6 Credible intervals of spatial similarity comparisons based on mixture
model with 3 components and UGGM with unstructured covariance structure.

3.7 Credible intervals of spatial similarity comparisons based on mixture
model with 3 components and UGGM with CAR model based covariance structure.

LIST OF FIGURES

2.1 The response vectors $y_{i1}$, $y_{i2}$, $y_{i3}$ and $y_{i4}$ are tied together by the
spatial latent vector $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$, whose joint distribution is
given by UGGM with unstructured precision matrix.

2.2 The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied together
by the spatial latent vector
$T_{ij} = (T_{i,1(j)}, T_{i,2(j)}, T_{i,3(j)}, T_{i,4(j)}, T_{i,5(j)})'$, whose joint distribution
is given by UGGM with unstructured precision matrix.

2.3 Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied
together by the spatial latent vector
$T_{ij} = (T_{i,1(j)}, T_{i,2(j)}, T_{i,3(j)}, T_{i,4(j)}, T_{i,5(j)})'$, whose joint distribution
is given by UGGM with precision matrix under the CAR model assumption.

3.1 The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are allocated to
the $m$th cluster and tied together by the spatial latent vector
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution
is given by UGGM with unstructured precision matrix.

3.2 Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are
allocated to the $m$th cluster and tied together by the spatial latent vector
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution
is given by UGGM with precision matrix under the CAR model assumption.

CHAPTER 1

Introduction

1.1 Background and objectives

In biomedical studies, it is common for a binary disease outcome to be measured
either repeatedly across time or cross-sectionally across spatial locations. The
motivating example for this research comes from dental research. The caries status
of each tooth is recorded as a binary outcome, with 1 indicating the presence of caries
and 0 otherwise. Caries prevalence and incidence are suspected to have a certain
spatial symmetry with respect to the configuration of the quadrants within the mouth,
a view widely held by practicing dentists. It is well known, for example, that caries
outcomes on teeth adjacent to one another are likely to be correlated. Specifically,
there are four quadrants within the mouth, and all the quadrants are believed to be
correlated with one another. Within each quadrant, adjacent teeth are also likely
to be correlated, and that correlation might be affected by the quadrant. Hence,
analyzing dental data requires methodologies for correlated data.
When a patient first visits a dentist, either for a check-up or for a more serious
dental issue, the dentist will normally conduct a full examination to gain an
understanding of the patient's overall dental health as well as the patient's particular
dental problem(s), if any. Because of the complexity and diversity of dental issues, and
the numerous teeth involved, it is difficult for dental health researchers to analyze
dental data except in a most general and superficial way with respect to quadrant,
tooth position, age, sex, geographical region, etc. In dental practice, it is of
interest to find patterns of caries across the teeth, which would help dentists examine
the oral health of patients efficiently and give people informative guidance for caries
intervention. Researchers have been working on different methodologies for analyzing
dental data to address questions about caries incidence patterns. The traditional
method for analyzing dental data is based on the number of Decayed/Missing/Filled
Surfaces (DMFS) or Decayed/Missing/Filled Teeth (DMFT), introduced by Klein et al.
(1938). DMFT and DMFS roughly express caries prevalence numerically and are
obtained by counting the number of Decayed (D), Missing (M) and Filled (F) teeth (T)
or surfaces (S) within the mouth. The DMFT evaluation method is a well-known
technique and has been used for many years to analyze the effects of variables, such as
fluoride, on the dental health of given populations. This approach operates at the
mouth level, which tells dentists and patients little about caries patterns for the
purposes of oral health examination and caries intervention. Dentists and patients are
really interested in the spatial symmetry patterns of caries. For example, if caries is
found on one specific tooth within a specific quadrant, which tooth is most likely to
develop caries next? If dentists have some information about the spatial symmetry of
caries, they may be able to locate or predict efficiently which tooth is next at high
risk of developing caries, and dentists and patients can then pay more attention to the
high-risk teeth. Because of the spatial configuration of the quadrants and of the teeth
within each quadrant, the data require a methodology for correlated binary data.
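
As a small illustration of the DMFT score described above, the sketch below counts decayed, missing and filled teeth from per-tooth status codes. The status codes ("D", "M", "F", and "S" for sound), the toy data and the function name `dmft_score` are assumptions made for this example; they are not the STM data format. The example also shows why a mouth-level score hides the spatial pattern: two mouths with caries in different quadrants receive the same score.

```python
from collections import Counter

# Hypothetical per-tooth status codes: "D" decayed, "M" missing, "F" filled, "S" sound.
AFFECTED = {"D", "M", "F"}


def dmft_score(tooth_status):
    """Return the DMFT score: the number of decayed, missing or filled teeth."""
    counts = Counter(tooth_status.values())
    return sum(counts[code] for code in AFFECTED)


# Two illustrative mouths, keyed by (quadrant, tooth position).
mouth_a = {(1, 4): "D", (1, 5): "F", (2, 4): "S", (2, 5): "S"}
mouth_b = {(1, 4): "S", (1, 5): "S", (2, 4): "D", (2, 5): "M"}

# Same mouth-level score, different spatial pattern of caries.
print(dmft_score(mouth_a), dmft_score(mouth_b))
```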

Lesaffre et al. (2006) proposed several methods to analyze the dental data from
the Signal Tandmobiel (STM) project. Their approach was based on the Generalized
Estimating Equation (GEE) (Zeger and Liang, 1986) to deal with correlated data.
Lesaffre's approach used the logistic regression framework to model marginal caries
incidence, with an exchangeable working correlation matrix to account for the
dependence in the data. Their GEE based approach is not able to capture the special
correlation structure among quadrants and among teeth within the same quadrant.
Roy (2006) proposed a model-based approach for imputing missing values. His method
exploited the spatial correlation among teeth without considering the different
strengths of spatial dependence among quadrants. Vanobbergen et al. (2007) proposed
an ALR (Alternating Logistic Regression) (Carey et al. (1993)) approach to investigate
spatial correlation with respect to caries patterns in the primary dentition of
7-year-old children. At the population level, symmetry in the prevalence of caries
experience across the midline was tested at the tooth and tooth surface levels under
the ALR model. ALR simultaneously models the marginal expectation of each binary
variable and the association between pairs of outcomes using GEE. Liang et al. (1992)
showed that GEE estimates can be reasonably efficient only when the covariance
structure of the response variables is correctly specified. Meanwhile, ALR models have
convergence issues when the cluster size is large.
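
The pairwise association that ALR targets is usually expressed as an odds ratio between two binary outcomes. The sketch below computes a sample odds ratio for the caries indicators of two teeth from their 2x2 table. The data are made up, and the 0.5 continuity correction is a common convention assumed here; neither is taken from the studies cited above.

```python
def pairwise_odds_ratio(y1, y2, correction=0.5):
    """Sample odds ratio between two binary sequences of equal length.

    `correction` is added to every cell of the 2x2 table so that the ratio
    stays finite when a cell is empty (an assumed convention).
    """
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same length")
    n11 = sum(1 for a, b in zip(y1, y2) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(y1, y2) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(y1, y2) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(y1, y2) if a == 0 and b == 0)
    c = correction
    return ((n11 + c) * (n00 + c)) / ((n10 + c) * (n01 + c))


# Made-up caries indicators for two adjacent teeth in eight children.
tooth_a = [1, 1, 0, 0, 1, 0, 0, 1]
tooth_b = [1, 1, 0, 0, 0, 0, 1, 1]
print(pairwise_odds_ratio(tooth_a, tooth_b))
```

An odds ratio above 1 indicates that caries on one tooth is positively associated with caries on the other.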

GEE based logistic regression models and ALR models are both marginal models,
which means they do not account for the heterogeneity and dependence among
quadrants and among teeth nested within the corresponding quadrants. The estimate of
the fixed effect parameters of interest is consistent, but it might be inefficient and
seriously biased. The GEE based approach, as a distribution-free methodology, does
not lend itself to classical tools for model checking. GEE is based on the first-order
moment, and ALR tries to model the higher-order moments of the data while still
focusing only on pairwise association, without trying to model the joint relationship
among the observations. More importantly, it is infeasible to address the spatial
symmetry of association strength among quadrants and among the teeth within the
corresponding quadrants, since all these higher-order moment characteristics are
unobserved. Hence, the search for alternative solutions continues.

A valid and efficient joint model for the spatially correlated binary dental data
incorporates latent variables to induce the dependence structure among quadrants
and the nested dependence structure among teeth within the corresponding quadrants.
The latent variables can also generate flexible multivariate distributions for the
binary dental data. Without an obvious multivariate distribution for multivariate
spatially correlated binary data, a joint model that accounts for the nature of the
data is not straightforward. Another way to model the dental data is to use mixture
models. Specifically, we can view the distribution of the caries status of the tooth of
interest as a mixture of Bernoulli distributions with different probabilities of
success. The probability of caries incidence is modeled by a logistic regression model
that takes the design structure, quadrant and tooth position within the corresponding
quadrant, into account. Generalized latent variable and mixture models allow the joint
distribution of the multivariate correlated binary data to be factorized into a product
of conditional distributions, given the latent variables and the allocation random
variables that induce the unobserved heterogeneity and dependence structures among
the observations. The objective of this thesis is to develop a new methodology for
complex, likelihood based analysis of multivariate spatially correlated binary caries
experience in dental data, which can help us examine the spatial symmetry of the
quadrants and the association strength among teeth within each specific quadrant. In
this thesis, we propose the Bayesian generalized latent variable model (BGLVM) and
the Bayesian mixture of generalized latent variable models (BMGLVM) to give flexible
multivariate distributions for the spatially correlated binary dental data, with the
dependence structure induced by the latent variables. BGLVM and BMGLVM are
specified from a frequentist point of view but implemented within the Bayesian
framework. The BGLVM uses a logistic regression model that gives a flexible
multivariate distribution for the dental data, with two levels of latent variables
inducing the dependence structure at the corresponding level of the spatial
configuration. For the BGLVM, the dependence structures among quadrants and among
teeth nested within quadrants are induced by latent variable models whose covariance
structure is modeled by an undirected graphical Gaussian model or a conditional
autoregressive model. For the BMGLVM, the dependence among quadrants is induced by
the weights of the mixture components, and the dependence among teeth within the
same quadrant is induced by a generalized latent variable model in the same way as
in the BGLVM.
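
The mixture view described above can be sketched numerically. In the sketch below, the marginal caries probability of one tooth is a weighted average of component-specific Bernoulli probabilities, each obtained from a logistic linear predictor. The weights and linear predictors are illustrative values, and `mixture_caries_probability` is a hypothetical helper, not the BMGLVM itself.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_caries_probability(weights, linear_predictors):
    """P(y = 1) under a finite mixture of Bernoulli distributions.

    weights           -- mixing proportions, assumed to sum to 1
    linear_predictors -- one logistic linear predictor per component
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(w * expit(eta) for w, eta in zip(weights, linear_predictors))


# Illustrative two-component mixture: a low-risk and a high-risk group.
p = mixture_caries_probability([0.7, 0.3], [-2.0, 1.0])
print(round(p, 4))
```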

1.2 Principles for the analysis

The principle of our approach to modeling multivariate spatially correlated dental
data is based on latent variables, which are incorporated into the likelihood based
model to generate flexible multivariate distributions for the observations and to
induce the multilevel dependence structures that arise from unobserved heterogeneity
in the complex structure of the multivariate correlated binary data. Specifically,
two levels of random vectors are introduced into the model as latent variables; they
induce spatial dependence structures among subunits at the corresponding level and
generate flexible distributions for the subunits. The joint distribution for each of
the two levels of latent variables is given by an Undirected Graphical Gaussian Model
(UGGM) (Dempster, 1972; Giudici and Green, 1999) that reflects the spatial
configuration of the subunits at the corresponding level. The first level of spatial
dependence is the spatial association among the four quadrants. The four quadrants
are adjacent in space and coexist in the same oral biological environment, which
makes them correlated through some unobserved structure. The second level of spatial
dependence is the spatial association among the teeth within the same quadrant. It is
reasonable to believe that teeth adjacent to one another are likely to be correlated.
At the same time, the oral biological environment is complicated enough that
associations exist not only between adjacent teeth but also with other teeth in the
same quadrant. The UGGMs for the latent variables will be based on two kinds of
precision matrix: one unstructured, the other Markovian, based on the CAR model
(Cressie (1991)).
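
To make the contrast between the two precision structures concrete, the sketch below builds a Markovian precision matrix for five teeth in a row under one common proper CAR parameterization, tau * (D - rho * W), where W is the adjacency matrix and D holds the neighbor counts on its diagonal. This parameterization, the values of `tau` and `rho`, and the chain-shaped neighborhood are assumptions for illustration; the thesis's own CAR specification is not given in this section. An unstructured precision matrix would instead leave every off-diagonal entry free.

```python
def car_precision(adjacency, tau=1.0, rho=0.9):
    """Precision matrix tau * (D - rho * W) of a proper CAR model.

    adjacency -- symmetric 0/1 matrix W (list of lists) with zero diagonal
    tau, rho  -- assumed precision scale and spatial dependence parameters
    """
    size = len(adjacency)
    precision = [[0.0] * size for _ in range(size)]
    for r in range(size):
        n_neighbors = sum(adjacency[r])
        for c in range(size):
            if r == c:
                precision[r][c] = tau * n_neighbors
            elif adjacency[r][c]:
                precision[r][c] = -tau * rho
    return precision


def chain_adjacency(size):
    """Adjacency for `size` units in a row: each unit neighbors the next."""
    return [[1 if abs(r - c) == 1 else 0 for c in range(size)] for r in range(size)]


# Five teeth in one quadrant, each adjacent only to its immediate neighbors.
precision = car_precision(chain_adjacency(5))
for row in precision:
    print(row)
```

A zero in position (r, c) of a Gaussian precision matrix means that units r and c are conditionally independent given all the others, which is the Markovian property referred to above.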

In this thesis, we try to combine the merits of the frequentist and Bayesian
approaches in model formulation and implementation. Specifically, the design
structure based models are formulated within the frequentist framework so that the
marginal identifiability of the model can be considered. The latent variables are
incorporated hierarchically in the graphical structure of the Bayesian model, and the
models are implemented on Bayesian principles. Since our models are based on latent
variables, local identifiability and model complexity raise many technical problems
within the frequentist framework: for example, computational feasibility of the
optimization, singularity of the information matrix, and the accuracy and
computational feasibility of approximating high dimensional integrals using either
adaptive Gaussian quadrature or MCMC based approaches. The Bayesian approach
provides a way to avoid these technical concerns by using Gibbs sampling to obtain
the posterior distributions of the quantities of interest. We use noninformative
priors for the parameters of interest, so that posterior inference does not rely on
subjective prior information and gives results comparable to the frequentist ones as
the sample size increases. At the same time, we use independent proper conjugate
priors for the parameters of interest, which ensures the validity of the posterior
samples obtained by Gibbs sampling and improves the convergence of the MCMC based
posterior sampling algorithm. More importantly, the Bayesian approach can be helpful
in complex modeling situations where a frequentist analysis is difficult or does not
exist. Lee and Song demonstrated better performance of a Bayesian approach in small
samples compared with ML estimation. Frequentist results rely on asymptotic
arguments, but Bayesian inference is feasible as long as the posterior sampling
algorithm converges, which can be promoted simply by running a large number of MCMC
iterations. All inferences will be based on credible intervals within the Bayesian
framework and implemented in WinBUGS. The appropriate model will be chosen by a
formal Bayesian model selection criterion based on the DIC for missing data problems
(Celeux et al. 2006).

1.3 Outline of the thesis

In Chapter 2, we systematically describe the principles of generalized latent
variable approaches for the joint modeling of correlated discrete data. We also
describe the generalized latent variable model within the Bayesian framework for
analyzing the dental data from STM. We use multivariate spatial latent variables at
both the quadrant level and the tooth-nested-within-quadrant level to model a very
flexible multivariate distribution for the binary vectors and to induce spatial
dependence among teeth through the dependence structure of the spatial latent vectors
in the generalized linear model setting. The joint relationship among the spatial
latent variables is modeled by an undirected graphical model and a conditional
autoregressive model, respectively. Model fitting and statistical inference about the
parameters of interest are carried out within the Bayesian framework.

In Chapter 3, we describe the finite mixture model within the Bayesian framework
for analyzing the dental data from STM. We use a Dirichlet process to model the
mixing proportions, and multivariate spatial latent variables to model a very flexible
multivariate distribution for each mixture component and to induce spatial dependence
among teeth through the dependence structure of the spatial latent vectors in the
latent variable model setting. The joint relationship among the spatial latent
variables is modeled by an undirected graphical model and a conditional autoregressive
model, respectively. Model fitting and statistical inference about the parameters of
interest are carried out within the Bayesian framework.

In Chapter 4, we summarize our work and outline some directions for future work.

CHAPTER 2

Bayesian Generalized Latent

Variable Models

2.1 Introduction

Dental caries is a common oral disease that results in demineralization of the tooth.
In oral health research, the number of Decayed/Missing/Filled Surfaces (DMFS) or
Decayed/Missing/Filled Teeth (DMFT), introduced by Klein et al. (1938), is often
analyzed. The two scores are the sums of binary indicators of caries on the teeth and
tooth surfaces of the primary dentition. This approach operates at the mouth level.
Leroux et al. (2006) noted that dental data present a unique set of challenges for
statistical analysis, including large cluster sizes, multilevel data structures (e.g.,
teeth within patients, sites or surfaces within teeth), and complex correlation
structures. Lesaffre et al. (2006) proposed several methods to analyze the dental data
from the Signal Tandmobiel (STM) project. They used a GEE based logistic model and a
log-linear model to model the marginal mean, with an exchangeable working correlation
matrix to account for the dependence in the data. Vanobbergen et al. (2007) proposed
an ALR (Alternating Logistic Regression) approach to investigate spatial correlation
with respect to caries activity patterns in the primary dentition of 7-year-old
children. ALR simultaneously models the marginal expectation of each binary variable
and the association between pairs of outcomes. Zhu et al. (2005) proposed a
generalized latent variable model framework to analyze multivariate spatially
correlated data, which gave an appropriate approach to complex spatially correlated
data with large cluster sizes and multilevel data structures. Their approach is
sensitive to the Euclidean space and cannot handle the multi-level dependence
structure of the dental data. More importantly, their method is EM based and
implemented via MCMC, which is computationally intensive when sampling high
dimensional correlated latent variables from the posterior and does not yield the
Fisher information matrix as a byproduct. The purpose of this article is to introduce
a Bayesian Generalized Latent Variable Model (BGLVM) framework for general spatial
topology structures to explain the multi-level correlations introduced by
"between-cluster" and "within-cluster" random effects. Specifically, the
"between-cluster" random effects are used to induce dependence among quadrants, and
the "within-cluster" random effects are used to induce dependence among teeth within
the same quadrant. The BGLVM, implemented using Gibbs sampling with non-informative
priors, allows us to model the "between-cluster" and "within-cluster" correlation
structures explicitly. This makes it possible to examine the spatial symmetry of the
quadrants in terms of caries incidence and to capture the special spatial association
structure between quadrants of the same subject and among teeth within quadrants,
which can help us characterize caries incidence efficiently at the tooth level.

 

2.2 The Spatial Dependence Structures

2.2.1 Notation

To model the observations, let $y_{ijk}$ denote the $k$th response variable within the
$j$th cluster of the $i$th subject of interest, where $k = 1, \ldots, K$,
$j = 1, \ldots, J$, $i = 1, \ldots, n$. Let
$y_{ij} = (y_{ij1}, \ldots, y_{ijk}, \ldots, y_{ijK})'$ denote the response vector within
the $j$th cluster of the $i$th subject. Let
$y_i = (y_{i1}', \ldots, y_{ij}', \ldots, y_{iJ}')'$ denote the collection of response
variables of the $i$th subject. Let $y = (y_1', \ldots, y_i', \ldots, y_n')'$ denote the
collection of response variables of all subjects in this study.

For modeling the latent variables, we use the undirected graphical Gaussian model.
Let $Q_i = (Q_{i1}, \ldots, Q_{ij}, \ldots, Q_{iJ})'$ denote the latent variables at the
cluster level for the $i$th subject, where $i = 1, \ldots, n$. Let
$T_{ij} = (T_{ij1}, \ldots, T_{ijk}, \ldots, T_{ijK})'$ denote the intermediate level
latent variables that are nested within the $j$th cluster associated with the $i$th
subject. Let $T_i = (T_{i1}', \ldots, T_{ij}', \ldots, T_{iJ}')'$ denote the collection of
all latent variables at the intermediate level associated with the $i$th subject in the
study. Let $L_i = (Q_i', T_i')'$ denote the collection of latent variables at both
levels associated with the $i$th subject.

2.2.2 Principles of our modeling approach

The dental data show a two-level spatial association structure. The first level is
the spatial association among quadrants (V)-(VIII). For convenience of indexing, we
use quadrant (1) in place of quadrant (V), with corresponding indices for the other
quadrants. The second level, nested within each quadrant, is the spatial correlation
among teeth.

In general, valid approaches for analyzing correlated data without an explicit
multivariate distribution are based on either GEE or random effect models. The former
suits statistical problems oriented toward the marginal mean or pairwise associations
between response outcomes, and the latter suits subject specific statistical issues.
The dental data are spatially correlated and carry information about the spatial
configuration of the teeth, which needs to be incorporated in the model to provide an
explicit structure for inducing dependence among quadrants and among teeth at their
corresponding levels. The main contribution of this paper is a methodology to model
this unique spatial dependence of the deciduous dentition. There is no explicit
multivariate distribution available for the spatially correlated binary dental caries
experience outcomes. Generalized latent variable models (Skrondal & Rabe-Hesketh
(2004)) are commonly used to generate flexible multivariate distributions and induce
unobserved heterogeneity for correlated data with an implicit multivariate
distribution.

To take the unique spatial structure of dental data into account, we use two levels
of latent variables to capture the spatial dependence of the teeth within the mouth
of each subject. For the $i$th subject, at the higher level, we introduce the quadrant
level latent vector $Q_i$, which ties the four quadrants together by inducing a
dependence structure among them. The higher level latent vector is also used to
generate flexible multivariate distributions for the quadrant specific response
vectors. The joint distribution of this spatial latent vector is given by an
undirected graphical Gaussian model that takes the spatial configuration of the
quadrants into account. The quadrant-wise observation vectors
$\{y_{ij} : j = 1, \ldots, J\}$ are conditionally independent given $Q_i$, for
$i = 1, \ldots, n$. At the intermediate level, the quadrant-specific spatial latent
vector $T_{ij}$ is introduced, which ties the five teeth within the same quadrant
together by inducing a dependence structure among them. Similarly, the intermediate
level spatial latent vector is also used to generate flexible univariate distributions
for the tooth specific response outcomes. The joint distribution of this spatial
latent vector is given by an undirected graphical Gaussian model that takes the
spatial configurations of the teeth and the quadrant into account. The observations
$\{y_{ijk} : k = 1, \ldots, K\}$ are conditionally independent given $T_{ij}$, for
$j = 1, \ldots, J$ and $i = 1, \ldots, n$. Meanwhile, the intermediate level spatial
latent vectors $\{T_{ij} : j = 1, \ldots, J\}$ are conditionally independent given the
higher level spatial latent vector $Q_i$, for $i = 1, \ldots, n$. In order to assess the
spatial symmetry of the caries experience of the deciduous dentition, we examine the
association among the latent variables at the higher level. Because of the complexity
of the oral biological system, we allow flexible covariance structures for the
undirected graphical Gaussian models, and a formal model selection procedure is used
to choose an appropriate one for the data.

2.3 Models

2.3.1 Generalized Latent variable model

Sammel (1997) proposed a joint model for different outcomes in the generalized linear
model framework, with normal latent variables introduced into the different models.
Moustaki (2000) extended this framework to a class of generalized latent trait models. Both
approaches are based on the EM algorithm for model fitting, and the computational
hurdles become serious as the number of latent variables increases. One of the primary
difficulties is integrating out the latent variables: although standard approximations
can be used, their accuracy decreases with the dimension of the latent variables.
Dunson (2000) proposed a model that allows observed and latent variables to have
distributions in the exponential family. Wang's (2003) multivariate spatial latent variable
model was extended by Zhu et al. (2005) into generalized linear latent variable models for
repeated measurements of spatially correlated multivariate data. An MCEMG (Monte
Carlo EM Gradient) algorithm was used for model implementation, which was based on
numerical approximations to marginalize the score functions and Hessian matrix over the
latent variables. MCEMG is computationally intensive and becomes less accurate as the
dimension of the latent variables increases.

In this paper, we propose Bayesian generalized linear latent variable models
with two levels of spatial latent vectors. The joint distributions of the latent vectors
are given by the Undirected Graphical Gaussian model (UGGM) (Dempster, 1972;
Giudici and Green, 1999). In order to test the spatial symmetry property of tooth
caries experience within the subject, we propose statistical hypothesis tests for
all possible situations under the Bayesian framework. Under the latent variable models,
it is assumed that, given the two levels of spatial latent vectors, the teeth are
mutually conditionally independent, so that we can specify the complete likelihood. We
will use an MCMC approach to perform posterior inference for the quantities of interest
using non-informative priors, which let the data dominate the inference and also give
results comparable to frequentist ones.

Response Models

We model the $k$th response variable within the $j$th quadrant of the $i$th subject, $y_{ijk}$,
which is a binary indicator of caries experience of $\text{tooth}_{ik(j)}$. Conditional on the
corresponding two levels of latent variables for the $k$th tooth position within the $j$th
quadrant of the $i$th subject, the response model is given by an exponential family
distribution with the probability density function in the general form

$$p(y_{ijk} \mid Q_{ij}, T_{ik(j)}, \alpha, \beta, \gamma, \varphi) = p(y_{ijk} \mid \eta_{ijk}, \varphi) = \exp\left\{ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right\}, \qquad (2.1)$$

where $\eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)}$ (McCullagh and Nelder, 1989).
We assume that the link function $g(\cdot)$ is a canonical link that relates the mean of
$y_{ijk}$ to a linear predictor as follows:

$$g(E[y_{ijk} \mid Q_{ij}, T_{ik(j)}]) = \eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)},$$

where $\alpha$, $\beta = (\beta_1, \ldots, \beta_j, \ldots, \beta_J)'$ and
$\gamma = (\gamma_{1(1)}, \ldots, \gamma_{K(1)}, \gamma_{1(2)}, \ldots, \gamma_{K(J)})'$ are the regression
coefficients for the fixed effects, with constraints $\sum_j \beta_j = 0$ and
$\sum_k \gamma_{k(j)} = 0$, $j = 1, \ldots, J$, for identifiability of the marginal mean.
$Q_{ij}$ and $T_{ik(j)}$ are the random effects that are used to generate flexible multivariate
distributions and to induce dependence and unobserved heterogeneity in the spatially
correlated binary dental caries outcomes. It is assumed that the quadrant-level spatial
latent vectors $\{Q_i\}$ are independent and identically distributed Gaussian with zero mean
and covariance matrix $\Sigma_Q$. Furthermore, we assume that, given the quadrant-level
spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$, the tooth-level spatial latent vectors
$\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$ are mutually independent multivariate
Gaussian with zero means and covariance matrices $\{\Sigma_{T_j} : j = 1, \ldots, J\}$
correspondingly. The generalized linear model relates the response variables to
quadrant-specific and tooth-specific covariates and the latent variables.
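The two-level structure described above can be illustrated with a small simulation. The following is a minimal sketch, assuming a logit link for the binary outcomes; the function name `simulate_subject` and all numerical values (intercept, effects, covariance matrices) are hypothetical and chosen only for illustration, not taken from the dental data.

```python
import numpy as np

def simulate_subject(alpha, beta, gamma, sigma_q, sigma_t, rng):
    """Simulate one subject's binary outcomes (J quadrants x K teeth).

    alpha   : scalar intercept
    beta    : (J,) quadrant effects (sum to zero)
    gamma   : (J, K) tooth-within-quadrant effects (each row sums to zero)
    sigma_q : (J, J) covariance of the quadrant-level latent vector Q_i
    sigma_t : (J, K, K) covariances of the tooth-level latent vectors T_ij
    """
    n_quadrants, n_teeth = gamma.shape
    # Higher level: Q_i ~ N(0, Sigma_Q)
    q = rng.multivariate_normal(np.zeros(n_quadrants), sigma_q)
    # Intermediate level: T_ij ~ N(0, Sigma_Tj), independent over j
    t = np.stack([
        rng.multivariate_normal(np.zeros(n_teeth), sigma_t[j])
        for j in range(n_quadrants)
    ])
    # Linear predictor eta_ijk = alpha + beta_j + gamma_k(j) + Q_ij + T_ik(j)
    eta = alpha + beta[:, None] + gamma + q[:, None] + t
    prob = 1.0 / (1.0 + np.exp(-eta))  # inverse logit (assumed link)
    y = rng.binomial(1, prob)          # conditionally independent outcomes
    return y, eta

# Hypothetical illustration values (not estimates from the study).
rng = np.random.default_rng(0)
J, K = 4, 5
beta = np.array([0.2, -0.2, 0.1, -0.1])
gamma = np.tile(np.array([-0.4, -0.2, 0.0, 0.2, 0.4]), (J, 1))
sigma_q = 0.5 * np.eye(J) + 0.5
sigma_t = np.stack([np.eye(K)] * J)
y, eta = simulate_subject(-1.0, beta, gamma, sigma_q, sigma_t, rng)
```

The shared draw of $Q_i$ induces dependence between quadrants, while each $T_{ij}$ induces dependence among the teeth of one quadrant; given both, the outcomes are drawn independently.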

Under the latent variable model approach, we can assume that the response variables
are conditionally mutually independent, given the vectors of latent variables
$L = \{L_i = (Q_i', T_{i1}', \ldots, T_{iJ}')' : i = 1, \ldots, n\}$. The joint probability density
of $y$ conditional on the set of latent variables $L$ and $\{\alpha, \beta', \gamma', \varphi\}$ is as follows:

$$p(y \mid L, \alpha, \beta, \gamma, \varphi) = \prod_{i=1}^{n} \prod_{j=1}^{J} \prod_{k=1}^{K} p(y_{ijk} \mid \eta_{ijk}, \varphi) = \exp\left\{ \sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] \right\}. \qquad (2.2)$$
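For binary outcomes with the canonical (logit) link, the exponential family terms take the standard Bernoulli form $b(\eta) = \log(1 + e^{\eta})$, $a(\varphi) = 1$ and $c(y, \varphi) = 0$. The sketch below evaluates the resulting conditional log-likelihood; the function name `bernoulli_loglik` is hypothetical and the logit link is an assumption about the binary case, not a statement of the author's implementation.

```python
import numpy as np

def bernoulli_loglik(y, eta):
    """Conditional log-likelihood sum(eta*y - log(1 + exp(eta))).

    y and eta are arrays of the same shape holding the binary outcomes and
    the linear predictors (which already include the latent terms).
    np.logaddexp(0, eta) computes log(1 + exp(eta)) in a stable way.
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return float(np.sum(eta * y - np.logaddexp(0.0, eta)))

# With eta = 0 every outcome has probability 1/2, so two outcomes
# contribute 2 * log(1/2) regardless of their values.
value = bernoulli_loglik([1, 0], [0.0, 0.0])
```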

 

Structure Models for Latent Variables

In the response model, given the two levels of spatial latent variables, the conditional
independence assumption allows the specification of the complete likelihood for the
response model. In our modeling approach, the two levels of spatial latent vectors are
used to induce the dependence structure of the teeth of interest. In order to incorporate
appropriate spatial latent vectors into the model, we need to choose ones that
represent the design structure and characterize the random mechanism of the
data generating process. The objective of these latent processes is to generate flexible
distributions for the observations and to induce dependence among observations.
UGGMs need to be defined on specific spatial configurations of nodes, and we list the
possible graphs for both the quadrant level and the tooth-nested-within-quadrant level below.

As shown in Figures 2.1-2.3, the four quadrants can be viewed as four nodes in a graph.
If two nodes are not directly connected, they are said to be conditionally independent
given the other nodes in the graph.

[Figure 2.1: graph with four nodes, Quadrant V ($y_{i1}$), Quadrant VI ($y_{i2}$), Quadrant VII ($y_{i3}$) and Quadrant VIII ($y_{i4}$), linked through the UGGM for $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$.]

Figure 2.1. The response vectors $y_{i1}$, $y_{i2}$, $y_{i3}$ and $y_{i4}$ are tied together by the spatial latent
vector $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$, whose joint distribution is given by a UGGM with
unstructured precision matrix.

[Figure 2.2: graph of the five teeth in quadrant $j$, two incisors ($y_{ij1}$, $y_{ij2}$), a canine ($y_{ij3}$) and two molars ($y_{ij4}$, $y_{ij5}$), linked through the UGGM for $T_{ij}$.]

Figure 2.2. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied together by the spatial
latent vector $T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$, whose joint distribution is
given by a UGGM with unstructured precision matrix.

[Figure 2.3: chain graph of the five teeth in quadrant $j$, Incisor ($y_{ij1}$), Incisor ($y_{ij2}$), Cuspid ($y_{ij3}$), Molar ($y_{ij4}$), Molar ($y_{ij5}$), with edges between adjacent teeth only, linked through the UGGM for $T_{ij}$.]

Figure 2.3. Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are tied together by the
spatial latent vector $T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$, whose joint distribution
is given by a UGGM with precision matrix under the CAR model assumption.

The graphical model, for the $i$th subject,
is used to describe the spatial configuration of the nodes and to characterize the
association strength between nodes of interest by the partial correlation between the
corresponding random variables $Q_i = (Q_{i1}, Q_{i2}, Q_{i3}, Q_{i4})'$ that are assigned to the nodes.
In statistics, partial correlation measures the degree of association
between two random variables, with the effect of a set of controlling random variables
removed. We can assign a multivariate Gaussian distributed random vector
to $Q_i$, i.e., $Q_i \sim N(0, \Sigma_Q)$, which leads to the undirected graphical Gaussian model.
After introducing the latent variable vector $Q_i$ modeled by the UGGM, the quadrants, i.e.,
the quadrant-wise response vectors $\{y_{ij} : j = 1, \ldots, J\}$, are conditionally mutually
independent. Considering the nested spatial structure between quadrants and teeth
within quadrants, the nested dependence structure is essential to make the model valid
for the problem of interest. The second level of spatial latent vectors, nested within the
corresponding quadrant, needs to be incorporated into the model. Similarly, within one
specific quadrant, say the $j$th quadrant, a quadrant-specific UGGM with random nodes
$T_{ij} = (T_{i1(j)}, T_{i2(j)}, T_{i3(j)}, T_{i4(j)}, T_{i5(j)})'$ is introduced. The spatial associations
among teeth within the $j$th quadrant are induced by $T_{ij}$; the $T_{ij}$ are mutually independent
conditional on $Q_i$. Furthermore, we assume
$T_{ij} \sim N(0, \Sigma_{T_j})$, $j = 1, \ldots, J$. After introducing the latent variable vector $T_{ij}$
modeled by the UGGM, the teeth within the $j$th quadrant are conditionally mutually independent.

We know that Gaussian random variables are determined by their first two moments.
For identifiability, we already assume that the mean structures of the two levels of
spatial latent variables are vectors of zeros, so the problem reduces to one about
the covariance structures. The general covariance matrix is unstructured, subject to
symmetry and positive definiteness constraints. The unstructured covariance matrix can
be simplified if we assume Markovian properties for the nodes, as shown in
the third graph. The Markovian-type covariance matrix can be incorporated within
spatial statistics through the CAR model (Cressie (1991)). The choice between the two types
of covariance structure for the spatial latent vectors at the tooth-nested-within-quadrant
level is made through model selection in the Bayesian framework via the Deviance Information
Criterion (DIC) for missing data problems proposed by Celeux et al. (2006), which is
an extension of the DIC introduced in Spiegelhalter et al. (2002) for Bayesian model
selection.

Undirected Graphical Gaussian Model

In this section, we review the graphical Gaussian model (Dempster, 1972) required
for this paper. Let $G = (V, E)$ be an undirected graph with vertex set
$V = \{1, \ldots, k, \ldots, K\}$ and edge set $E = \{e_{kk'} : k \neq k' = 1, \ldots, K\}$, where $e_{kk'} = 1$
or 0 according to whether vertices $k$ and $k'$, $1 \leq k \neq k' \leq K$, are directly
connected in $G$ or not. In the undirected graphical Gaussian model, the edge set
describes the association structure of the vertex set. A random vector is assigned
to the vertex set, and the edges represent the association strength between the
corresponding vertices. The undirected graphical Gaussian model consists of all
$K$-dimensional normal distributions, say $X = (X_1, \ldots, X_k, \ldots, X_K)'$, with $X \sim N(0, \Sigma)$
and precision matrix $\Omega = \Sigma^{-1} = \{\omega_{kk'} : k, k' = 1, \ldots, K\}$, where $\Sigma$ is unknown
but satisfies the following restrictions in terms of the pairwise conditional independencies
determined by the Markov properties (Drton and Perlman (2004)):

$$X_k \perp X_{k'} \mid X_{V \setminus \{k, k'\}} \iff \rho_{kk'} = 0, \quad \forall k \neq k' = 1, \ldots, K,$$

where $\rho_{kk'}$ is the so-called partial correlation between the $k$th and $k'$th vertices
in the graph, defined as $\rho_{kk'} = -\omega_{kk'} / \sqrt{\omega_{kk} \, \omega_{k'k'}}$. In our application, this
partial correlation measures the association between two quadrants of interest with the
effect of the remaining quadrants removed. We will use partial correlation to examine the
spatial symmetry property of caries prevalence.
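The relation between a precision matrix and its partial correlations can be sketched numerically. The helper name `partial_correlations` and the example matrix are hypothetical; the formula is the definition $\rho_{kk'} = -\omega_{kk'} / \sqrt{\omega_{kk}\,\omega_{k'k'}}$ given above.

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlation matrix implied by a precision matrix omega.

    Off-diagonal entries are -omega[k, k'] / sqrt(omega[k, k] * omega[k', k']);
    the diagonal is set to 1 by convention.
    """
    omega = np.asarray(omega, dtype=float)
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Hypothetical 3-node chain: nodes 0 and 2 are not directly connected,
# so the corresponding precision entry (and partial correlation) is zero.
omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
rho = partial_correlations(omega)
```

A zero in the precision matrix corresponds to a missing edge in the graph, which is how the UGGM encodes conditional independence between two nodes given the rest.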


Conditional Autoregressive Models

For the vector of univariate variables $\nu = (\nu_1, \nu_2, \ldots, \nu_K)'$, where $K$ is the number
of spatial nodes of interest, the zero-centered CAR specification, following
Cressie (1991), sets

$$(\nu_k \mid \nu_{-k}, \sigma_k^2) \sim N\Big( \rho \sum_{\nu_{k'} \in \nu_{-k}} b_{kk'} \nu_{k'}, \; \sigma_k^2 \Big), \quad k = 1, \ldots, K, \qquad (2.3)$$

where $\nu_{-k} = \nu \setminus \{\nu_k\}$. Following Brooks' (1961) lemma, the resulting joint density for
$\nu$ takes the form

$$f(\nu \mid \sigma^2) \propto \exp\left\{ -\tfrac{1}{2} \nu' D_{\sigma^2}^{-1} (I - \rho B) \nu \right\}, \qquad (2.4)$$

where $B = (b_{kk'})$ is a $K \times K$ matrix with $b_{kk} = 0$, and $D_{\sigma^2}$ is a $K \times K$ diagonal
matrix with non-zero entries $\{\sigma_k^2 : k = 1, \ldots, K\}$. The precision matrix
$D_{\sigma^2}^{-1}(I - \rho B)$ needs to be symmetric, which yields the conditions

$$b_{kk'} \sigma_{k'}^2 = b_{k'k} \sigma_k^2, \quad \forall k, k' = 1, \ldots, K. \qquad (2.5)$$

If the precision matrix is positive definite, then (2.4) is a proper distribution. Under
the above parameterization, the precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ is nonsingular if
$\rho \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest
eigenvalues of $B$ respectively. It is usually assumed that $D_{\sigma^2} = \sigma^2 M$, where $M$ is a
diagonal matrix with diagonal elements $M_{kk}$ proportional to the conditional variances
$\sigma_k^2$. Meanwhile, $\sigma^2$ controls the overall variability and $\rho$ represents the overall
spatial association. The weight matrix $B$, with entries $B_{kk'}$, needs to reflect the spatial
association between nodes $k$ and $k'$. GeoBUGS (2004) sets $B_{kk'} = b_{kk'} = 1/n_k$ for
adjacent nodes $k \neq k'$ and $M_{kk} = 1/n_k$, where $n_k$ is the number of nodes adjacent to
node $k$. Under the above settings, the spatial latent vector $\nu$ follows a proper
distribution, i.e.

$$\nu \sim N(0, \sigma^2 (I - \rho B)^{-1} M). \qquad (2.6)$$
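As a numerical illustration of the CAR construction, the sketch below builds the weight matrix, the precision matrix and the admissible range of the spatial parameter for five teeth arranged in a chain (adjacent teeth are neighbours). The chain adjacency and the helper names `car_precision` and `rho_bounds` are assumptions made for this example; non-adjacent weights are taken to be zero.

```python
import numpy as np

def car_precision(adjacency, rho, sigma2):
    """Weights B, scale M and precision M^{-1}(I - rho*B)/sigma2 for a CAR model.

    Uses B[k, k'] = 1/n_k for adjacent nodes (0 otherwise) and M[k, k] = 1/n_k,
    where n_k is the number of neighbours of node k.
    """
    n_k = adjacency.sum(axis=1)
    B = adjacency / n_k[:, None]
    M = np.diag(1.0 / n_k)
    precision = np.diag(n_k) @ (np.eye(len(n_k)) - rho * B) / sigma2
    return precision, B, M

def rho_bounds(B):
    """Interval (1/lambda_min, 1/lambda_max) from the eigenvalues of B."""
    eig = np.linalg.eigvals(B).real
    return 1.0 / eig.min(), 1.0 / eig.max()

# Chain of 5 nodes: tooth k is adjacent to teeth k-1 and k+1 (assumed layout).
adjacency = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
rho, sigma2 = 0.5, 2.0  # hypothetical parameter values
precision, B, M = car_precision(adjacency, rho, sigma2)
lower, upper = rho_bounds(B)

# The implied covariance should agree with sigma2 * (I - rho*B)^{-1} M.
covariance = sigma2 * np.linalg.inv(np.eye(5) - rho * B) @ M
```

With these weights the symmetry condition (2.5) holds automatically, because $b_{kk'}/M_{kk}$ equals 1 for every adjacent pair.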

 

2.4 Bayesian Estimations and Statistical Inference

2.4.1 Identifiability of the models

Frequently, models with latent variables are not globally identifiable. One can integrate
out the latent spatial variable vectors to obtain a marginal likelihood to assess
whether parameters are redundant. The likelihood of the latent variable model is
parameterized by $\Sigma_Q$ and $\{\Sigma_{T_j} : j = 1, \ldots, J\}$. The identifiability problem then
becomes one of examining whether the parameters involved in the covariance matrices are
redundant, which might be problematic within the frequentist framework. Dawid (1979) and
Gelfand & Sahu (1999) discussed model identifiability issues within the Bayesian framework.
In particular, suppose that the Bayesian model is denoted by the likelihood $L(\theta; y)$ and the
prior $f(\theta)$, and we partition the parameters of interest as $\theta = (\theta_1, \theta_2)$. If

$$f(\theta_2 \mid \theta_1, y) = f(\theta_2 \mid \theta_1), \qquad (2.7)$$

then we say that $\theta_2$ is not identifiable, where
$f(\theta_2 \mid \theta_1, y) \propto L(\theta_1, \theta_2; y) f(\theta_2 \mid \theta_1) f(\theta_1)$.
That is, if observing data $y$ does not increase our prior knowledge about $\theta_2$ given $\theta_1$,
then $\theta_2$ is not identifiable by the data. Dawid's formal definition of Bayesian model
nonidentifiability states that $\theta_2$ is not identifiable if and only if $L(\theta_1, \theta_2; y)$ is free of
$\theta_2$. In order to make our model identifiable, we need not only to take care of the marginal
identifiability of the model by integrating out the latent variables, but also to put
some constraints on the covariance matrices of the Gaussian spatial latent vectors at
both levels.

2.4.2 Prior distributions

In this section, prior distributions are chosen for the regression parameters $\theta$ and the
association parameters $\xi$. The Gibbs sampling algorithm is applied to simulate samples
from the posterior distributions of the quantities of interest. Zhao et al. (2006), Zeger
et al. (1991) and Dunson et al. (2000) all suggested noninformative conjugate prior
distributions for the parameters of interest, so that the effect of the priors washes out
as the sample size increases. Bedrick et al. (1996) noted that normal prior distributions
have been suggested for the logistic regression coefficients $\theta$:

$$\theta \sim N(\mu, \Gamma), \qquad (2.8)$$

where $\mu$ is a vector of location parameters and $\Gamma$ is the covariance matrix. It
is common to take $\mu$ as a vector of zeros and $\Gamma$ as a diagonal matrix with very large
entries.

We are interested in the joint posterior distribution of $(\theta, \xi \mid y)$. Under the mild
conditions in Geman and Geman (1984), the Gibbs sampler can obtain the joint posterior
distribution by sampling from the conditional posterior distributions $(\theta \mid y, \xi)$
and $(\xi \mid y, \theta)$ correspondingly. To simplify the sampling from the conditional posterior
distributions, we choose hierarchical independent priors for $\theta$ and $\xi$ in this hierarchical
Bayesian model, i.e. $(\xi \mid y, \theta) = (\xi \mid y)$, which is true as long as the priors satisfy
$p(\theta, \xi) = p(\theta) \, p(\xi)$. We propose two covariance structures for the Gaussian spatial
latent variable models. In the generalized linear model setting with Gaussian random
effects, the proper noninformative conjugate priors are the Inverse Gamma (IG) distribution
for a single variance component and the Inverse Wishart distribution for a
variance-covariance matrix.

Let $\Omega_Q = \Sigma_Q^{-1}$ and $\{\Omega_{T_j} = \Sigma_{T_j}^{-1} : j = 1, \ldots, J\}$ denote the precision matrices of
the two levels of spatial latent vectors correspondingly. At the higher level, the precision
matrix for the spatial latent vectors $\{Q_i : i = 1, \ldots, n\}$ is unstructured. Wishart
priors (Dunson et al. (2000), O'Malley and Zaslavsky (2006)) are applied
as conjugate non-informative priors for the precision matrix $\Omega_Q$ in the unstructured
situation:

$$\Omega_Q \sim \text{Wishart}(\nu_Q, \Lambda_Q), \qquad (2.9)$$

with degrees of freedom $\nu_Q$ and precision matrix $\Lambda_Q$. In practice, the common
noninformative Wishart prior is chosen by specifying $\Lambda_Q$ as the identity matrix and
$\nu_Q = \text{rank}(\Sigma_Q) + 1$. This yields a prior under which the marginal distribution of each
correlation parameter is $U(-1, 1)$ (O'Malley et al. (2006)). At the intermediate level, we have
two precision matrix structures for the spatial latent vectors
$\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$, and we give noninformative priors correspondingly.

(1) Unstructured precision matrix in the UGGM: Conditional on the higher-level
spatial latent vector $Q_i$, the intermediate-level spatial latent vectors
$\{T_{ij} : j = 1, \ldots, J\}$ are conditionally independent. So we give independent priors to
the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$. Similarly, independent Wishart
distributions are assigned as priors for these precision matrices:

$$\Omega_{T_j} \sim \text{Wishart}(\nu_{T_j}, \Lambda_{T_j}), \quad j = 1, \ldots, J, \qquad (2.10)$$

with degrees of freedom $\nu_{T_j} = \text{rank}(\Sigma_{T_j}) + 1$ and precision matrix $\Lambda_{T_j}$ equal to the
identity matrix.

(2) CAR model based precision matrix in the UGGM: Conditional on the higher-level
spatial latent vector $Q_i$, the intermediate-level spatial latent vectors
$\{T_{ij} : j = 1, \ldots, J\}$ are conditionally independent. So we give independent priors to
the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$, which are parameterized by
$\{\sigma_j^2, \rho_j : j = 1, \ldots, J\}$. Similarly, independent Inverse Gamma (Dunson et al. (2000))
distributions, which are proper conjugate priors, are assigned as priors to the overall
variation parameters $\{\sigma_j^2 : j = 1, \ldots, J\}$, and independent uniform distributions, with
the support constraints in section 3.1.4 (GeoBUGS (2004)), are assigned to the
quadrant-specific overall spatial association parameters $\{\rho_j : j = 1, \ldots, J\}$, respectively:

$$\sigma_j^2 \sim IG(\varepsilon, \varepsilon), \quad j = 1, \ldots, J, \qquad (2.11)$$

and

$$\rho_j \sim U(\lambda_{\min,j}^{-1}, \lambda_{\max,j}^{-1}), \quad j = 1, \ldots, J, \qquad (2.12)$$

where $\varepsilon$ is a very small positive number and $\lambda_{\min,j}$, $\lambda_{\max,j}$ are as defined in section 3.1.4.

2.4.3 Posterior computations

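MCMC techniques are used for the posterior computations in the models proposed
in section 3. The posterior distributions of the parameters of interest can be obtained in
a standard way (Dunson et al. (2000), Zeger et al. (1991)).

Given the precision matrices $\Omega_Q$ and $\{\Omega_{T_j} : j = 1, \ldots, J\}$, the joint posterior
distribution for the regression parameters and latent variables at both the higher and
intermediate levels is

$$p(\theta, Q, T \mid y) \propto \Big[ \prod_{i,j,k} p(y_{ijk} \mid \eta_{ijk}, \varphi) \Big] \pi(\theta, Q, T)$$
$$\propto \exp\left\{ \sum_{i,j,k} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] - \tfrac{1}{2} \theta' \Gamma^{-1} \theta \right\}$$
$$\times \exp\left\{ -\tfrac{1}{2} \sum_{i=1}^{n} Q_i' \Omega_Q Q_i - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{J} T_{ij}' \Omega_{T_j} T_{ij} \right\}, \qquad (2.13)$$

where $\sum_{i,j,k}$ denotes $\sum_{i=1}^{n} \sum_{j=1}^{J} \sum_{k=1}^{K}$, $\pi(\cdot)$ denotes the joint prior density,
$Q = (Q_1', \ldots, Q_i', \ldots, Q_n')'$ with $Q_i = (Q_{i1}, \ldots, Q_{ij}, \ldots, Q_{iJ})'$,
$T = (T_1', \ldots, T_i', \ldots, T_n')'$ with $T_i = (T_{i1}', \ldots, T_{ij}', \ldots, T_{iJ}')'$ and
$T_{ij} = (T_{i1(j)}, \ldots, T_{ik(j)}, \ldots, T_{iK(j)})'$. Furthermore,
$\theta = (\alpha, \beta', \gamma')'$ and $\eta_{ijk} = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{ik(j)}$.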

If the MCMC algorithm is a Gibbs sampler, the full conditional distribution of
each of the unknowns in (2.13) needs to be specified, which can be obtained in a
standard way (Dunson et al. (2000), Zeger et al. (1991)). For the fixed effects $\theta$, the full
conditional distribution is

$$p(\theta \mid Q, T, y) \propto \exp\left\{ \sum_{i,j,k} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] - \tfrac{1}{2} \theta' \Gamma^{-1} \theta \right\}. \qquad (2.14)$$

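The full conditional distribution for the Gaussian spatial latent vector $Q$ is

$$p(Q \mid T, y, \theta) \propto \exp\left\{ \sum_{i,j,k} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] - \tfrac{1}{2} \sum_{i=1}^{n} Q_i' \Omega_Q Q_i \right\}. \qquad (2.15)$$

The full conditional distribution for the Gaussian spatial latent vectors $T_{ij}$ is

$$p(T \mid Q, y, \theta) \propto \exp\left\{ \sum_{i,j,k} \left[ \frac{\eta_{ijk} y_{ijk} - b_i(\eta_{ijk})}{a_i(\varphi)} + c_i(y_{ijk}, \varphi) \right] - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{J} T_{ij}' \Omega_{T_j} T_{ij} \right\}. \qquad (2.16)$$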
The full conditional distribution of the precision matrix $\Omega_Q$ is

$$p(\Omega_Q \mid Q, T, y, \theta) = \text{Wishart}\Big( \nu_Q + n, \; \Lambda_Q + \sum_{i=1}^{n} Q_i Q_i' \Big). \qquad (2.17)$$

The posterior distributions for $\{\Omega_{T_j} : j = 1, \ldots, J\}$ can be obtained in terms of the
different precision matrix structures correspondingly.

(1) Unstructured precision matrix:

$$p(\Omega_{T_j} \mid Q, T, y, \theta) = \text{Wishart}\Big( \nu_{T_j} + n, \; \Lambda_{T_j} + \sum_{i=1}^{n} T_{ij} T_{ij}' \Big). \qquad (2.18)$$
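The conjugate Wishart update for an unstructured precision matrix can be sketched as follows. This is a sketch under stated assumptions, not the author's code: the second Wishart argument is treated as an inverse-scale ("precision") matrix, so the prior mean of the precision is the degrees of freedom times the inverse of that matrix; the function names, the simulated latent vectors and the use of an integer degrees of freedom are all choices made for this illustration.

```python
import numpy as np

def wishart_full_conditional(latent, nu0, lam0):
    """Updated (degrees of freedom, inverse-scale) given latent vectors.

    latent : (n, p) array whose rows are the current latent vectors
    nu0    : prior degrees of freedom
    lam0   : (p, p) prior inverse-scale matrix
    """
    n = latent.shape[0]
    return nu0 + n, lam0 + latent.T @ latent  # latent.T @ latent = sum_i v_i v_i'

def draw_precision(df, inv_scale, rng):
    """Draw a precision matrix from Wishart(df, inv_scale^{-1}).

    Valid for integer df >= p: the sum of df outer products of
    N(0, inv_scale^{-1}) vectors has this Wishart distribution.
    """
    scale = np.linalg.inv(inv_scale)
    scale = (scale + scale.T) / 2.0  # remove round-off asymmetry
    z = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=df)
    return z.T @ z

# Hypothetical illustration: 50 subjects, 4 quadrant-level latent variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 4))
df, inv_scale = wishart_full_conditional(latent, nu0=5, lam0=np.eye(4))
omega_draw = draw_precision(df, inv_scale, rng)
```

In a Gibbs sampler this draw would alternate with updates of the latent vectors and the fixed effects, each conditional on the current values of the others.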

(2) CAR model based precision matrix:

(2.1) Overall precision parameters:

 

U1U1—UKU1)
ptrle.T.y; Wong-"1 exp 2] ’J ”am ’3 +c,(y,,j,,,..,o)}—TJ-s .(219)
at ‘

(2.2) Overall spatial parameters:

U1U1 b071)
Pf/leQmTyb’wap Zf 'J 'J atel JJ +('z’(yijl.~¢)} 1W1 mind/J1)"

m i n'
i.)' k

 

(2.20)

All the posterior distributions, except for $\{\rho_j : j = 1, \ldots, J\}$, are proper based on
their proper conjugate priors. The uniform priors for the overall spatial parameters
are not conjugate, which might lead to improper posterior distributions. The simplest
technique for verifying whether the posterior distributions of the parameters are proper is
to verify that the posterior distribution is proper for the reduced data obtained by
discarding all but a single outcome per subject, which leads to a reduced data set
consisting of independent outcomes (O'Brien and Dunson, 2004). Since the covariance
structures do not appear in the reduced-data likelihood, and the support for the spatial
association parameters is finite, i.e., $\{\rho_j \in (\lambda_{\min,j}^{-1}, \lambda_{\max,j}^{-1}) : j = 1, \ldots, J\}$, the
posterior distributions of the spatial association parameters $\{\rho_j : j = 1, \ldots, J\}$ are
proper. The algorithm for the posterior computation proceeds by sampling $\theta$, $Q$, $T$
and $\xi$ in turn from the above conditional distributions.

2.4.4 Missing data issue

In medical research, missing data is a very common problem. Little and Rubin (2002)
gave a comprehensive framework for dealing with missing data. We will
follow their framework to incorporate missing data into our model. Let $Y$ denote the
data that would occur in the absence of missing values. We write $Y = (Y_{obs}, Y_{mis})$,
where $Y_{obs}$ denotes the observed values and $Y_{mis}$ denotes the missing values. Let
$f(Y \mid \psi) \equiv f(Y_{obs}, Y_{mis} \mid \psi)$ denote the probability or density of the joint distribution
of $Y_{obs}$ and $Y_{mis}$. From the frequentist point of view, inference is based on the
marginal probability density of $Y_{obs}$, obtained by integrating out the missing data
$Y_{mis}$:

$$f(Y_{obs} \mid \psi) = \int f(Y_{obs}, Y_{mis} \mid \psi) \, dY_{mis}.$$

More generally, we define a missing-data indicator as follows:

$$M_{ijk} = \begin{cases} 1, & y_{ijk} \text{ missing}, \\ 0, & y_{ijk} \text{ observed}. \end{cases} \qquad (2.21)$$

The full model treats $M$ as a random variable and specifies the joint distribution
of $M$ and $Y$. That is,

$$f(Y, M \mid \psi, \vartheta) = f(Y \mid \psi) f(M \mid Y, \vartheta), \quad (\psi, \vartheta) \in \Omega_{\psi, \vartheta},$$

where $\Omega_{\psi, \vartheta}$ is the parameter space of $(\psi, \vartheta)$.

The actual observed data is $(Y_{obs}, M)$. The distribution of the observed data is
obtained by integrating $Y_{mis}$ out of the joint density of $Y = (Y_{obs}, Y_{mis})$ and $M$.
That is,

$$f(Y_{obs}, M \mid \psi, \vartheta) = \int f(Y_{obs}, Y_{mis} \mid \psi) f(M \mid Y_{obs}, Y_{mis}, \vartheta) \, dY_{mis}. \qquad (2.22)$$

The full likelihood of $(\psi, \vartheta)$ is proportional to the above, i.e.

$$L_{full}(\psi, \vartheta \mid Y_{obs}, M) \propto f(Y_{obs}, M \mid \psi, \vartheta). \qquad (2.23)$$

If the distribution of the missing-data mechanism does not depend on the missing
values $Y_{mis}$, then

$$f(Y_{obs}, M \mid \psi, \vartheta) = f(M \mid Y_{obs}, \vartheta) \int f(Y_{obs}, Y_{mis} \mid \psi) \, dY_{mis} = f(M \mid Y_{obs}, \vartheta) f(Y_{obs} \mid \psi). \qquad (2.24)$$

Under the MAR (missing at random) assumption, and if $\psi$ and $\vartheta$ are distinct, the
likelihood-based inferences for $\psi$ from the full likelihood will be the same as the
likelihood-based inferences for $\psi$ from $f(Y_{obs} \mid \psi)$.

From the Bayesian point of view, missing data are treated as random, as are the
parameters of interest. One of the advantages of the Bayesian hierarchical approach
implemented in WinBUGS is that missing data in the response variables can be
routinely handled. In most statistical packages, incomplete cases (in either the
response or the covariates) are removed from any analysis. WinBUGS generates a
sample to replace missing responses from the posterior distribution of the response
variable under the MAR assumption.

2.4.5 Bayesian Model Selection

The formal procedure for choosing an appropriate Bayesian hierarchical model for
the observed data necessitates methods to compare alternative models within the Bayesian
framework. The DIC (deviance information criterion, Spiegelhalter et al. (2002)) is
a hierarchical modeling selection criterion that can be viewed as a generalization of
the AIC (Akaike information criterion, Akaike, 1973) and BIC (Bayesian information
criterion, Kass and Raftery, 1995). It is particularly useful in Bayesian model selection
problems where the posterior distributions of parameters have been obtained by
Markov chain Monte Carlo (MCMC) simulation. The DIC statistic is a measure of
model complexity and goodness of fit, defined as

$$DIC = \overline{D(\psi)} + p_D,$$

where $\overline{D(\psi)}$ is the posterior mean of the deviance $D(\psi) = -2 \log(f(y \mid \psi))$, which
measures the goodness of fit of the proposed model for the observed data. Let
$D(\bar{\psi})$ be the deviance evaluated at the posterior mean of $\psi$. Let
$p_D = \overline{D(\psi)} - D(\bar{\psi})$ denote the effective number of parameters in the model, which is a
penalty for the complexity of the model. The quantities $\overline{D(\psi)}$ and $D(\bar{\psi})$ can be obtained
routinely from an MCMC simulation chain. Our hierarchical models contain two levels of latent
variables, which necessitates basing model selection on the DIC for missing
data problems (Celeux et al., 2006). In terms of our problem, we have to deal with
both missing data and latent variables to get a complete DIC. In order to deal with
missing data, we consider the complete likelihood (2.22), and the deviance function has
the form

$$D(\theta) = -2 \log\{ f(Y_{obs}, M \mid \psi, \vartheta) \} = -2 \log\left\{ \int f(y_{obs}, y_{mis} \mid \psi) f(M \mid y_{obs}, y_{mis}, \vartheta) \, dy_{mis} \right\}, \qquad (2.25)$$

where $\theta = (\psi', \vartheta')'$. Pettitt et al. (2006) gave an approximation for (2.22) in the form

$$D(\theta) \approx -2 \log\{ f(y_{obs}, \hat{y}_{mis} \mid \psi) f(M \mid y_{obs}, \hat{y}_{mis}, \vartheta) \}, \qquad (2.26)$$

where $\hat{y}_{mis}$ is the posterior predictor for the missing data $Y_{mis}$.
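The basic DIC definition, the posterior mean deviance plus $p_D$, can be sketched from stored MCMC output. The example below is a simplified illustration under stated assumptions: it uses the Bernoulli (logit) conditional deviance, evaluates the plug-in deviance at the posterior mean of the linear predictor, and ignores the missing-data and latent-variable adjustments discussed in this section. The function names and the simulated draws are hypothetical.

```python
import numpy as np

def deviance(y, eta):
    """Bernoulli deviance -2 * sum(eta*y - log(1 + exp(eta)))."""
    return -2.0 * float(np.sum(eta * y - np.logaddexp(0.0, eta)))

def dic_from_draws(y, eta_draws):
    """Return (DIC, p_D, mean deviance) from draws of the linear predictor.

    eta_draws has shape (S, N): one row of linear predictors per retained
    MCMC iteration. Because the linear predictor is linear in the unknowns,
    its posterior mean equals the predictor at the posterior means.
    """
    d_bar = float(np.mean([deviance(y, eta) for eta in eta_draws]))
    d_hat = deviance(y, eta_draws.mean(axis=0))
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d, d_bar

# Hypothetical stand-in for MCMC output: 200 draws for 20 binary outcomes.
rng = np.random.default_rng(2)
y_obs = rng.binomial(1, 0.3, size=20)
eta_draws = -0.8 + 0.3 * rng.normal(size=(200, 20))
dic, p_d, d_bar = dic_from_draws(y_obs, eta_draws)
```

Between two candidate covariance structures, the one with the smaller DIC would be preferred, provided both values are computed on the same data with the same definition of the deviance.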
In order to deal with the latent vectors, we need to compute the complete DIC
of Celeux et al. (2006). Let $E_\psi[\psi \mid y, q, t]$ denote the posterior mean of $\psi$, based on the
complete data $(y', q', t')'$, where $(q', t')'$ is the realization of the spatial latent vectors
$(Q', T')'$. The DIC for the complete data model is

$$DIC(y, q, t) = -4 E_\psi[\log(f(y, q, t \mid \psi)) \mid y, q, t] + 2 \log(f(y, q, t \mid E_\psi[\psi \mid y, q, t])). \qquad (2.27)$$

As in the EM algorithm, we can then integrate $Q$ and $T$ out from (2.27) to get

$$DIC = E_{Q,T}[DIC(y, Q, T) \mid y]$$
$$= -4 E_{\psi,Q,T}[\log(f(y, Q, T \mid \psi)) \mid y] + 2 E_{Q,T}[\log(f(y, Q, T \mid E_\psi[\psi \mid y, Q, T])) \mid y]. \qquad (2.28)$$

All the integrations can be obtained through Monte Carlo integration
approximation using the MCMC posterior samples in the coda file of WinBUGS.
Combining (2.25)-(2.28), we have the DIC for Bayesian generalized latent variable
models with missing data in the form

$$DIC = E_{Q,T}[DIC(y_{obs}, \hat{y}_{mis}, M, Q, T) \mid y_{obs}, M]$$
$$= -4 E_{\psi,\vartheta,Q,T}[\log(f(y_{obs}, \hat{y}_{mis}, Q, T \mid \psi) f(M \mid y_{obs}, \hat{y}_{mis}, \vartheta)) \mid y_{obs}, M]$$
$$+ 2 E_{Q,T}[\log(f(y_{obs}, \hat{y}_{mis}, Q, T \mid \hat{\psi}) f(M \mid y_{obs}, \hat{y}_{mis}, \hat{\vartheta})) \mid y_{obs}, M], \qquad (2.29)$$

where $\hat{\psi} = E_\psi[\psi \mid y_{obs}, M, Q, T]$ and $\hat{\vartheta} = E_\vartheta[\vartheta \mid y_{obs}, M]$.
2.4.6 Spatial symmetry hypothesis testing

The spatial symmetry property in our problem means that the joint caries experience
presentations for the response variables at the quadrant level are highly associated with
one another. Dentists believe that spatial symmetry exists in the mouth. Lesaffre et al. (2006)
showed empirically that the caries experience of left and right quadrants is more
strongly associated than in the other cases. Unfortunately, little of the literature has
discussed this issue comprehensively. In our UGGM at the quadrant level, the partial
correlation parameters $\{\rho_{jj'} : j \neq j' = 1, \ldots, J\}$ measure the strength of the spatial
association between two different nodes (quadrants). One of the major concerns about
spatial symmetry in the mouth can be formulated as the following hypothesis situation:

Hypothesis testing for pairwise comparisons among spatial association strength parameters

In order to assess the spatial symmetry of the four quadrants, we need to introduce
different "neighborhood" relationships that explain the relative spatial structure
of the quadrants of interest. Spatial symmetry is assessed at the quadrant level,
instead of the tooth level. At the quadrant level, we define two quadrants to be
"Horizontal Neighbors" of each other if they are both in either the "Upper Jaw"
or the "Lower Jaw"; to be "Vertical Neighbors" of one another if they are both in
either the "Left Jaw" or the "Right Jaw"; and to be "Across Neighbors" of one another
if one is in the "Left Jaw" and the other in the "Right Jaw" and they are not in the same
upper or lower jaw. The assessment of quadrant spatial symmetry in terms of caries
prevalence will be based on "Left-right", i.e., "Horizontal Neighbors", "Up-down", i.e.,
"Vertical Neighbors", and "Across", i.e., "Across Neighbors".

There are two ways to assess the spatial symmetry among quadrants in terms
of caries prevalence through statistical hypothesis statements. The first
is based on the so-called "overall" spatial symmetry assessment via a weighted
statistic, and the second is the so-called "specific" spatial symmetry assessment, which
is the direct comparison of the spatial symmetry measurements.

First of all, the weighted statistics for assessing the overall spatial associations in
terms of "Left-right", "Up-down" and "Across" can be formulated as below:

$$\bar{\rho}_{LR} = \tfrac{1}{2}(\rho_{56} + \rho_{78}),$$

$$\bar{\rho}_{UD} = \tfrac{1}{2}(\rho_{67} + \rho_{58}),$$

$$\bar{\rho}_{A} = \tfrac{1}{2}(\rho_{68} + \rho_{57}).$$
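The three weighted statistics can be computed directly from a matrix of quadrant-level partial correlations. The sketch below assumes the rows and columns of that matrix are ordered as quadrants 5, 6, 7, 8; the function name `symmetry_statistics` and the numerical values are hypothetical and serve only to show the pairing of quadrants.

```python
import numpy as np

QUADRANTS = (5, 6, 7, 8)  # assumed row/column order of the matrix
PAIRS = {
    "LR": [(5, 6), (7, 8)],  # left-right: same jaw (upper or lower)
    "UD": [(6, 7), (5, 8)],  # up-down: same side (left or right)
    "A": [(5, 7), (6, 8)],   # across: diagonal quadrants
}

def symmetry_statistics(rho):
    """Average partial correlation for each neighbour type."""
    index = {quadrant: i for i, quadrant in enumerate(QUADRANTS)}
    return {
        name: float(np.mean([rho[index[a], index[b]] for a, b in pairs]))
        for name, pairs in PAIRS.items()
    }

# Hypothetical partial correlation matrix (illustration only).
rho = np.array([[1.0, 0.6, 0.2, 0.1],
                [0.6, 1.0, 0.3, 0.0],
                [0.2, 0.3, 1.0, 0.4],
                [0.1, 0.0, 0.4, 1.0]])
stats = symmetry_statistics(rho)
```

Applying the same function to the partial correlation matrix of each retained MCMC draw would give posterior samples of the three statistics and of their pairwise differences; how those samples are turned into a formal test is not specified here and is left open.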

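The statistical hypothesis tests about the overall spatial association in terms
of "Left-right" vs. "Up-down", "Left-right" vs. "Across" and "Across" vs. "Up-down"
can be formulated as follows:

(1) Left-right Versus Up-down

$$H_0 : \bar{\rho}_{LR} = \bar{\rho}_{UD} \quad \text{vs.} \quad H_a : \bar{\rho}_{LR} \neq \bar{\rho}_{UD}; \qquad (2.30)$$

(2) Left-right Versus Across

$$H_0 : \bar{\rho}_{LR} = \bar{\rho}_{A} \quad \text{vs.} \quad H_a : \bar{\rho}_{LR} \neq \bar{\rho}_{A}; \qquad (2.31)$$

(3) Across Versus Up-down

$$H_0 : \bar{\rho}_{A} = \bar{\rho}_{UD} \quad \text{vs.} \quad H_a : \bar{\rho}_{A} \neq \bar{\rho}_{UD}. \qquad (2.32)$$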

Secondly, if the assessment is based on the direct comparisons of spatial symmetry
measurement, there are twelve possible hypothesis testing situations for the spatial
symmetries in terms of partial correlation between quadrants.

(1.1) Left—right. Versus Up-down The association between quadrant 5 and quad-
rant 6 VS. the association between quadrant 6 and quadrant. 7, with quadrant 6 as

reference.

Ho : {1.56 = .067 V-S- Ha : 056 5* 967: (2.33)

(1.2) Left-right Lenses (»"p-d()*urn The association between quadrant 5 and quad-
rant 6 VS. the association between quadrant 5 and quadrant 8. with quadrant 5 as

reference.

H0 : [’56 = P58 V-S- Ha : p.56 # #582 (234)

32

(1.3) Left-right versus Up-down. The association between quadrant 7 and quadrant 8
vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.

$$H_0 : \rho_{78} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{67}; \tag{2.35}$$

(1.4) Left-right versus Up-down. The association between quadrant 7 and quadrant 8
vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.

$$H_0 : \rho_{78} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{58}; \tag{2.36}$$

(2.1) Left-right versus Across. The association between quadrant 5 and quadrant 6
vs. the association between quadrant 6 and quadrant 8, with quadrant 6 as reference.

$$H_0 : \rho_{56} = \rho_{68} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{68}; \tag{2.37}$$

(2.2) Left-right versus Across. The association between quadrant 5 and quadrant 6
vs. the association between quadrant 5 and quadrant 7, with quadrant 5 as reference.

$$H_0 : \rho_{56} = \rho_{57} \quad \text{vs.} \quad H_a : \rho_{56} \neq \rho_{57}; \tag{2.38}$$

(2.3) Left-right versus Across. The association between quadrant 7 and quadrant 8
vs. the association between quadrant 6 and quadrant 8, with quadrant 8 as reference.

$$H_0 : \rho_{78} = \rho_{68} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{68}; \tag{2.39}$$

(2.4) Left-right versus Across. The association between quadrant 7 and quadrant 8
vs. the association between quadrant 5 and quadrant 7, with quadrant 7 as reference.

$$H_0 : \rho_{78} = \rho_{57} \quad \text{vs.} \quad H_a : \rho_{78} \neq \rho_{57}; \tag{2.40}$$


(3.1) Across versus Up-down. The association between quadrant 5 and quadrant 7
vs. the association between quadrant 5 and quadrant 8, with quadrant 5 as reference.

$$H_0 : \rho_{57} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{57} \neq \rho_{58}; \tag{2.41}$$

(3.2) Across versus Up-down. The association between quadrant 5 and quadrant 7
vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.

$$H_0 : \rho_{57} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{57} \neq \rho_{67}; \tag{2.42}$$

(3.3) Across versus Up-down. The association between quadrant 6 and quadrant 8
vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.

$$H_0 : \rho_{68} = \rho_{58} \quad \text{vs.} \quad H_a : \rho_{68} \neq \rho_{58}; \tag{2.43}$$

(3.4) Across versus Up-down. The association between quadrant 6 and quadrant 8
vs. the association between quadrant 6 and quadrant 7, with quadrant 6 as reference.

$$H_0 : \rho_{68} = \rho_{67} \quad \text{vs.} \quad H_a : \rho_{68} \neq \rho_{67}. \tag{2.44}$$

Simultaneous credible intervals

Pairwise spatial symmetry hypothesis testing is based on credible intervals for the
differences between two partial correlations corresponding to two different nodes
(quadrants) in the UGGM. In Bayesian statistics, a credible interval is a posterior
probability interval, used for purposes similar to those of confidence intervals in
frequentist statistics. Suppose that a parameter $\varsigma$ is of interest. A
$(1 - \alpha)100\%$ credible interval for $\varsigma$ is any set $C$ such that
$P_{\pi(\varsigma|y)}(\varsigma \in C) = 1 - \alpha$, where $\pi(\varsigma|y)$
is the posterior distribution of $\varsigma$ given the observed data $y$. As before,
the spatial symmetry among quadrants is assessed in two ways: the "overall"
assessment via the weighted statistics and the "specific" assessment via direct
comparisons of the spatial symmetry measurements.

Since we are performing multiple spatial symmetry comparisons among quadrants
over all possible hypothesis testing situations, it is necessary to give
simultaneous credible regions (Besag et al. (1995)) to control the type S error rate
(Gelman et al.), i.e., a concept similar to the type I error rate in the frequentist
framework. The $100K/M\%$ simultaneous credible regions for the differences of overall
spatial associations are based on order statistics (Besag et al. (1995)):

$$\{[(\rho_I - \rho_{II})^{[M+1-t^*]}, (\rho_I - \rho_{II})^{[t^*]}] : (I, II) \in \text{Neighborhood}\},$$

where

$$t^* = \min\{t : \#\{t' : (\rho_I - \rho_{II})^{[M+1-t]} \le (\rho_I - \rho_{II})^{(t')} \le (\rho_I - \rho_{II})^{[t]}, \ \forall (I, II) \in \text{Neighborhood}\} \ge K\},$$

and $\{(\rho_I - \rho_{II})^{(t)} : t = 1, \ldots, M, (I, II) \in \text{Neighborhood}\}$ are the
posterior samples of $\{(\rho_I - \rho_{II}) : (I, II) \in \text{Neighborhood}\}$, with
superscript $[t]$ denoting the $t$th order statistic of these samples. Here,
Neighborhood = {("LR","UD"), ("LR","A"), ("A","UD")}.

Similarly, the $100K/M\%$ simultaneous credible regions for the differences of specific
spatial associations are given by

$$\{[(\rho_{ii'} - \rho_{jj'})^{[M+1-t^*]}, (\rho_{ii'} - \rho_{jj'})^{[t^*]}] : i \neq i', j \neq j', (i, i') \neq (j, j'), i, j = 1, \ldots, J\},$$

where

$$t^* = \min\{t : \#\{t' : (\rho_{ii'} - \rho_{jj'})^{[M+1-t]} \le (\rho_{ii'} - \rho_{jj'})^{(t')} \le (\rho_{ii'} - \rho_{jj'})^{[t]} \text{ for all such pairs}\} \ge K\},$$

and $\{(\rho_{ii'} - \rho_{jj'})^{(t)} : t = 1, \ldots, M, i \neq i', j \neq j', (i, i') \neq (j, j'), i, j = 1, \ldots, J\}$
are the posterior samples of
$\{\rho_{ii'} - \rho_{jj'} : i \neq i', j \neq j', (i, i') \neq (j, j'), i, j = 1, \ldots, J\}$.
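The order-statistics construction above can be sketched in a few lines. This is an illustrative implementation of the Besag et al. (1995) idea as described in the text, not the code used in the study: each draw is ranked within each component, the smallest band index that contains all of a draw's ranks is recorded, and t* is the K-th smallest of those indices. The function name and the simulated draws are assumptions made for the example.

```python
import random


def simultaneous_credible_region(draws, k):
    """Return ([(lower, upper), ...], t_star) jointly covering at least k draws.

    draws: list of M posterior draws, each a sequence of d differences.
    """
    m, d = len(draws), len(draws[0])
    ranks = [[0] * d for _ in range(m)]
    sorted_cols = []
    for i in range(d):
        order = sorted(range(m), key=lambda t: draws[t][i])
        for rank, t in enumerate(order, start=1):
            ranks[t][i] = rank  # 1-based rank of draw t within component i
        sorted_cols.append([draws[t][i] for t in order])
    # For each draw: smallest t whose rank band [M + 1 - t, t] holds all its ranks.
    needed = sorted(max(max(r), m + 1 - min(r)) for r in ranks)
    t_star = needed[k - 1]  # smallest t such that at least k draws fit
    region = [(col[m - t_star], col[t_star - 1]) for col in sorted_cols]
    return region, t_star


# Simulated stand-ins for posterior draws of three differences (illustrative only).
random.seed(1)
draws = [
    [random.gauss(1.0, 0.2), random.gauss(0.8, 0.3), random.gauss(0.0, 0.3)]
    for _ in range(2000)
]
region, t_star = simultaneous_credible_region(draws, k=1900)
for lower, upper in region:
    print(round(lower, 3), round(upper, 3))
```

With M = 2000 and K = 1900 the returned intervals jointly contain at least 95% of the draws, which is the property the region is built to have.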

2.4.7 Example

Now we show how the above methodology works for dental data, for which we need to
specify all the functions and general notations. In our study, all of the responses
are binary, so we have the following: $a_i(\phi) = 1$,
$b_i(\eta_{ijk}) = \log(1 + \exp(\eta_{ijk}))$, $c_i(y_{ijk}, \phi) = 0$,
$E[y_{ijk}|\eta_{ijk}] = \frac{\exp(\eta_{ijk})}{1 + \exp(\eta_{ijk})}$ and
$g(x) = \log(\frac{x}{1 - x})$, for $k = 1, \ldots, 5$, $j = 1, \ldots, 4$,
$i = 1, \ldots, n$. Hence, the parameters of interest in the observational model
are $\theta = (\alpha, \beta', \gamma')'$ and
$\xi = (\Sigma_Q, \Sigma_1, \ldots, \Sigma_j, \ldots, \Sigma_4)'$, and

$$\log p(y_{ijk}|\eta_{ijk}) = \log p(y_{ijk}|Q_{ij}, T_{ik(j)}, \theta) = \eta_{ijk} y_{ijk} - \log(1 + \exp(\eta_{ijk})).$$

The canonical parameter $\{\eta_{ijk} : k = 1, \ldots, 5, j = 1, \ldots, 4, i = 1, \ldots, n\}$
is defined as follows:

Priors for the parameters of interest are given by noninformative proper conjugate
priors, which will give results comparable to the frequentist's as the sample size
increases. More specifically, the priors are given as follows:

$$\alpha \sim N(0, 1000); \tag{2.46}$$

$$\beta_j \sim N(0, 1000), \quad \forall j = 1, \ldots, J - 1, \tag{2.47}$$

with constraints $\sum_j \beta_j = 0$ and $\sum_k \gamma_{k(j)} = 0$, $\forall j = 1, \ldots, J$,
for identifiability of the observation model. For the prior of the precision matrix,
O'Malley and Zaslavsky (2006) proposed the scaled Wishart distribution as a conjugate
proper prior:

$$\Omega_Q \sim \text{Wishart}(4 + 1, I), \tag{2.49}$$

where $I$ is the $4 \times 4$ identity matrix.


For the priors of the precision matrices $\{\Omega_{T_j} : j = 1, \ldots, J\}$, there are
two different models for the structure of the precision matrix.

(1) Unstructured precision matrix:

$$\Omega_{T_j} \sim \text{Wishart}(5 + 1, II), \quad \forall j = 1, \ldots, 4, \tag{2.50}$$

where $II$ is the $5 \times 5$ identity matrix.

(2) CAR model based precision matrix:

$$\tau_j \sim \text{Gamma}(0.001, 0.001), \quad \forall j = 1, \ldots, 4, \tag{2.51}$$

$$\rho_j \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1}), \quad \forall j = 1, \ldots, 4, \tag{2.52}$$

where $\{\sigma_j^{-2} = \tau_j : j = 1, \ldots, 4\}$ are the quadrant specific parameters for
overall variability and $\{\rho_j : j = 1, \ldots, 4\}$ are the quadrant specific parameters
for overall spatial effects. $\lambda_{\min}$ and $\lambda_{\max}$ are as defined in the CAR
models in section 3.1.4.

To construct 95% simultaneous credible regions, we use 11,000 MCMC iterations
with 1,000 burn-in iterations, i.e., M = 10,000 and K = 9,500. The 95% simultaneous
credible regions are more conservative than the frequentist's simultaneous confidence
regions for the multiple hypothesis statements, since they have a type S error rate
between 0% and 2.5% (Gelman et al.).

2.5 The Signal Tandmobiel Project Example

In the Signal-Tandmobiel project, there are 4,468 7-year-old schoolchildren (born in
1989) from 179 schools in Flanders (Belgium) who were selected by a stratified
clustered random sample. The mean age of the children on the day of examination was
7.1 years (SD = 0.4). The 15 strata were obtained by combining the 3 types of
educational system (public, municipal and private schools) with geographical areas
(the 5 Flemish provinces). The schools represented the clusters. This sample
represents about 7% of the corresponding Flemish population. The sampling procedure
aimed at selecting each child in Flanders with equal probability. A more detailed
description of the design of the Signal-Tandmobiel project is reported in
Vanobbergen et al. (2000).

Table 2.1. Prevalence of caries experience (% affected) in the deciduous dentition of
7-year-old children.

Tooth          55      54      53      52      51   |   61      62      63      64      65
Prevalence    8.92    5.20    0.74    3.72    7.81  |  7.06    2.23    1.86    5.20    8.55

Tooth          85      84      83      82      81   |   71      72      73      74      75
Prevalence   10.78   13.75    1.12    0.74    0.37  |  0.37    0.37    0.37   11.15    9.67

2.5.1 Primary results

The frequency table for the prevalence of caries experience in the deciduous dentition
of the 7-year-old children is shown in Table 2.1. The descriptive statistics suggest a
spatially symmetrical pattern in terms of caries experience.

In Vanobbergen et al. (2007), pairwise associations were assessed in terms of odds
ratios of caries experience via the ALR model. The results are shown in Table 2.2.

The above result shows that the left-right spatial symmetry is the most notable.
Decayed teeth of discordant contralateral pairs tend to aggregate on the right or the
left side of the subject's mouth more than would be expected by chance alone
(Vanobbergen et al. (2007)).


Table 2.2. Odds ratios and 95% confidence intervals for the 2x2 association models
for caries on deciduous molars in 7-year-old children.

First Molar (ALR model)

       54    64                     74                   84
54           16.48 (13.75-19.74)    8.17 (6.91-9.64)     7.23 (6.13-8.53)
64                                  7.61 (6.47-8.97)     7.18 (6.10-8.44)
74                                                       22.82 (19.28-27.00)

Second Molar (ALR model)

       55    65                     75                   85
55           15.47 (13.00-18.28)    8.78 (7.52-10.27)    9.23 (7.90-10.79)
65                                  8.08 (6.92-9.42)     8.86 (7.58-10.35)
75                                                       20.37 (17.20-24.11)

 

 

2.5.2 The results from our approach

Our generalized latent variable models are implemented in WinBUGS, using
noninformative priors for the parameters of interest. After 1,000 burn-in iterations,
the posterior distributions of the quantities are based on 10,000 MCMC iterations.
There are two possible models, indexed by the precision matrix structure for the
spatial latent vector at the intermediate level. The choice of the appropriate model
is based on the DIC for missing data problems (Celeux et al. (2006)). In this part, we
give the results for both overall and specific spatial symmetry assessments through
simultaneous credible regions for the differences of interest. The results start with
the overall spatial symmetry assessment under the different model assumptions, in
Tables 2.3-2.4, based on 95% simultaneous credible regions of the differences that
correspond to their hypothesis testing situations. They are followed by the results
for the specific spatial symmetry assessments in Tables 2.5-2.6.

Based on the results from the two different models, the posterior inferences about the
spatial symmetries are similar, which tells us both models work fairly well. Bayesian


Table 2.3. Credible intervals of overall spatial association strength comparisons based
on UGGM with unstructured covariance structure

Spatial Effects                   Simultaneous Credible Intervals
left/right vs. across
  $\rho_{LR} - \rho_{A}$          (0.807, 1.238)
left/right vs. up/down
across vs. up/down
  $\rho_{A} - \rho_{UD}$          (-0.775, 0.491)
DIC                               593.300
N.burnin                          1000
N.iteration                       11000

 

 

Table 2.4. Credible intervals of overall spatial association strength comparisons based
on UGGM with CAR model based covariance structure

Spatial Effects                   Simultaneous Credible Intervals
left/right vs. across
  $\rho_{LR} - \rho_{A}$          (0.807, 1.236)
left/right vs. up/down
  $\rho_{LR} - \rho_{UD}$         (0.310, 1.427)
across vs. up/down
  $\rho_{A} - \rho_{UD}$          (-0.779, 0.410)
DIC                               780.500
N.burnin                          1000
N.iteration                       11000

 

 


Table 2.5. Credible intervals of specific spatial association strength comparisons based
on UGGM with unstructured covariance structure

Spatial Effects                   Simultaneous Credible Intervals
left/right vs. across
  $\rho_{56} - \rho_{68}$         (0.134, 1.581)
  $\rho_{56} - \rho_{57}$         (0.394, 1.719)
  $\rho_{78} - \rho_{68}$         (0.237, 1.589)
  $\rho_{78} - \rho_{57}$         (0.133, 1.728)
left/right vs. up/down
  $\rho_{56} - \rho_{67}$         (0.235, 1.551)
  $\rho_{56} - \rho_{58}$         (0.117, 1.48)
  $\rho_{78} - \rho_{67}$         (0.230, 1.601)
  $\rho_{78} - \rho_{58}$         (0.215, 1.504)
across vs. up/down
  $\rho_{68} - \rho_{67}$         (-1.303, 1.313)
  $\rho_{68} - \rho_{58}$         (-1.327, 1.204)
  $\rho_{57} - \rho_{67}$         (-1.442, 1.109)
  $\rho_{57} - \rho_{58}$         (-1.488, 1.042)
DIC                               593.300
N.burnin                          1000
N.iteration                       11000

 

 


Table 2.6. Credible intervals of specific spatial association strength comparisons based
on UGGM with CAR model based covariance structure

Spatial Effects                   Credible Intervals
left/right vs. across
  $\rho_{56} - \rho_{68}$         (0.068, 1.404)
  $\rho_{56} - \rho_{57}$         (0.5115, 1.001)
  $\rho_{78} - \rho_{68}$         (0.236, 1.416)
  $\rho_{78} - \rho_{57}$         (0.477, 1.662)
left/right vs. up/down
  $\rho_{56} - \rho_{67}$         (0.297, 1.458)
  $\rho_{56} - \rho_{58}$         (0.007, 1.455)
  $\rho_{78} - \rho_{67}$         (0.291, 1.524)
  $\rho_{78} - \rho_{58}$         (0.262, 1.450)
across vs. up/down
  $\rho_{68} - \rho_{67}$         (-1.020, 1.209)
  $\rho_{68} - \rho_{58}$         (-1.078, 1.146)
  $\rho_{57} - \rho_{67}$         (-1.258, 0.970)
  $\rho_{57} - \rho_{58}$         (-1.381, 0.950)
DIC                               780.500
N.burnin                          1000
N.iteration                       11000

 

 

model selection is based on DICs: the smaller the DIC, the better the model. It is
common in practice that if the difference between the DICs of two models is more
than 10, then the model with the smaller DIC is the better one. Hence, from
the results in Tables 2.3 to 2.6, conditional on the data, the model with unstructured
precision matrix is the better one. Specifically, the appropriate hierarchical
generalized latent variable model consists of two levels of latent vectors. The first
level of Gaussian spatial latent vector has an unstructured precision matrix. The
second level of Gaussian spatial latent vectors also has unstructured precision
matrices. Furthermore, the choice of the unstructured covariance structure can be
explained by the following two facts. (1) The oral biological environment is so
complicated that the higher level Gaussian spatial latent vector might not be able to
account sufficiently for the heterogeneity from the four quadrant-wise response
vectors, and leaves some residuals to the intermediate level spatial latent vectors.
(2) At the intermediate level, a Gaussian spatial latent vector with CAR model based
precision matrix is not sophisticated enough to account for both the residual
heterogeneity and the heterogeneity from the teeth within the corresponding
quadrants. Hence, the second level of Gaussian spatial latent vectors needs a more
complicated precision matrix than the Markovian type (CAR model based covariance
matrix). Based on the chosen model, the conclusions of the hypothesis testing about
both overall and specific spatial symmetry among quadrants are as follows: (1) The
left-right spatial association is the strongest, which is shown by the 95%
simultaneous credible intervals of the differences between left-right and across and
of the differences between left-right and up-down, whose lower bounds are all
positive. (2) The difference of spatial associations between across and up-down is not
significant at a type S error rate between 0% and 2.5% (Gelman (2006)), since the 95%
simultaneous credible intervals of the difference between the across spatial
association and the up-down spatial association include zero.
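The two decision rules used in this paragraph can be written down compactly. The sketch below is illustrative only: the helper names are hypothetical, the 10-point DIC threshold is the rule of thumb quoted above, and the numbers are the recoverable entries of Tables 2.3 and 2.4.

```python
def excludes_zero(interval):
    """True when a credible interval lies entirely on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


def preferred_model(dic_by_model, threshold=10.0):
    """Return the smaller-DIC model if the DIC gap exceeds the threshold, else None."""
    ranked = sorted(dic_by_model.items(), key=lambda kv: kv[1])
    (best, best_dic), (_, next_dic) = ranked[:2]
    return best if next_dic - best_dic > threshold else None


# Values transcribed from Table 2.3 (unstructured model) and the two DIC rows.
intervals = {"LR - A": (0.807, 1.238), "A - UD": (-0.775, 0.491)}
dics = {"unstructured": 593.3, "CAR": 780.5}

for name, interval in intervals.items():
    print(name, "differs from zero" if excludes_zero(interval) else "includes zero")
print("preferred model:", preferred_model(dics))
```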


2.6 Discussion

In this chapter, we propose a flexible class of Bayesian generalized latent variable
models for multivariate spatially correlated binary data with a multi-level dependence
structure. Our approach is to model the response variables by distributions in the
exponential family and impose a multivariate spatial correlation structure on the
latent variables, which accounts for the multi-level spatial dependence structures.
Statistical inference is based on sampling from the posterior distributions of the
parameters of interest. We have used the undirected graphical Gaussian model (UGGM)
for constructing the precision matrix structures of the multivariate spatial latent
vectors at both the higher and intermediate levels. One consideration is the
parameterization of both the observational and latent variable models: for the
identifiability of the model, we constrain the fixed effects to sum to zero and the
spatial process to have mean zero. Noninformative conjugate priors are applied for the
parameters of interest, which give inference results comparable to the frequentist's
as the sample size increases. We proposed two possible models to account for the
dependence structure in the dental data. Bayesian model selection is based on the DIC
for missing data problems. The spatial symmetry hypothesis is assessed by simultaneous
credible intervals for multiple comparisons of pairwise spatial association strength.
The results from both models show that the generalized latent variable models work
well, are consistent with one another, and are comparable to the results in the
existing literature: it is concluded that the left-right spatial association is the
strongest and that the spatial associations for across and up-down are not
significantly different at a type S error rate between 0% and 2.5%. For the data
example, we have assumed that the Gaussian spatial latent processes
$\{Q_i : i = 1, \ldots, n\}$ at the higher level and
$\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$ at the intermediate level are sufficient
to induce the unobserved heterogeneities from the data at the corresponding levels. It
would be interesting to introduce non-Gaussian latent processes to model the
underlying spatial dependence among quadrants and among teeth nested within the
corresponding quadrant, which can lead to a richer class of latent processes
$\{Q_i : i = 1, \ldots, n\}$ and $\{T_{ij} : j = 1, \ldots, J, i = 1, \ldots, n\}$. Finally,
our model selection is based on DIC, and it would be optimal if the model selection
were carried out simultaneously through reversible jump Markov chain Monte Carlo
(RJMCMC) (Green (1995)) or birth-and-death Markov chain Monte Carlo (BDMCMC)
(Stephens (2000)). It will be more interesting to consider the symmetry pattern of
quadrants for a longitudinal study, which will lead to spatial-temporal analysis.


CHAPTER 3

Bayesian Finite Mixture of

Generalized Latent Variable

Models

3.1 Introduction

As we noticed in the previous chapter, the dental data show a unique nested
dependence structure among the caries experience response variables for the teeth
of interest, which leads to a wide heterogeneity of distributions for the multivariate
spatially correlated binary response variables. Finite mixtures of distributions have
provided a mathematically based approach to modeling various random phenomena with
flexible distributions. Mixture distributions are extremely useful in the modeling of
heterogeneity in a cluster analysis context. It is of great interest that we can view
the quadrant-wise multivariate binary response vectors as coming from a certain
number of underlying subpopulations or clusters. Each of the underlying clusters is
characterized by the corresponding cluster-specific parameters and some common
parameters that describe the marginal distribution of the binary response variable
with respect to the spatial configurations for each quadrant-wise response vector.
The spatial symmetry among quadrants, in terms of the caries prevalence, can be
measured by the probabilities that two different quadrant-wise response vectors fall
into the same underlying cluster, which is indexed by a corresponding cluster-specific
multivariate distribution.

Zhang et al. (2007) proposed a Bayesian Generalized Latent Variable Model
(BGLVM) to analyze the dental data from the STM project. Their approach used a
hierarchical generalized latent variable model to take care of the multiple level
nested dependence structure of the dental data. The multiple level spatial latent
variables are used to generate a flexible multivariate distribution for the
multivariate binary outcomes and induce the unique nested dependence structure. The
joint behavior of the multiple level spatial latent variables is described by a
Gaussian undirected graphical model with different ways to account for the covariance
matrix structures. Spatial symmetry checking was based on the partial correlation
parameters of the graphical models. Model implementation and hypothesis testing are
carried out within the Bayesian framework. Since the mixture model is a very flexible
modeling method, it is interesting to view the same problem from the mixture model
point of view instead of the generalized latent variable model point of view. It is
also very helpful to give a general framework for using mixture models to analyze
spatially correlated multivariate binary data.

Fernandez and Green (2002) proposed a Bayesian mixture model to analyze
spatially correlated data, which gives an appropriate approach in the case of finite,
typically irregular, patterns of points or regions with prescribed spatial
relationships. The spatial association strength was assessed through parameters that
are used to adjust the variability of the mixing weights in the mixture from one
location to another. Their approach is sensitive to Euclidean space, and cannot take
care of multi-level correlations induced by both "between-cluster" and
"within-cluster" spatial configurations of the data. Fernandez and Green focused
specifically on Poisson distributed data with applications in disease mapping, which
is quite different from the situation that the dental data present. For the estimation
of the true risk pattern, their approach is based on continuously distributed Markov
random fields to model the mixture weight for the corresponding component via a
logit-normal model. They did not consider other mixture components that can yield
flexible distributions for the outcomes and induce a complex heterogeneity structure.
However, their approach introduced spatial mixture models as an interesting new tool
for those modeling heterogeneity in spatial data. Zhou and Wakefield (2006) proposed
a Bayesian mixture model for partitioning gene expression data, which is essentially
an approach to clustering the observed data by a mixture model with an unknown number
of clusters inferred from the data. The aim of their research, in which time ordered
gene expression data are collected, is to determine genes that co-express, that is,
have similar patterns of expression. Their model provides a probabilistic framework
for partitioning or clustering, which naturally provides a measure of similarity among
genes in terms of expression. Under their approach, partitioning and estimation are
conducted simultaneously, and the number of partitions can be treated as a random
parameter, which gives the method a certain flexibility in applications. As always for
parametric hierarchical modeling, the measures of uncertainty are only as reliable as
the model, so extensive model checking should be carried out in applications. It is
necessary to give more flexibility to the mixture components than is obtained, as in
their work, by modeling a marginal parametric mean structure. Extensions need to
incorporate covariates at various stages, and other external information needs to be
taken into account. It is also meaningful to give a framework for analyzing non-normal
data under mixture models for clustering.

The purpose of this article is to introduce a Bayesian Mixture of Generalized
Latent Variable Model (BMGLVM) framework for general spatial topology structures
to explain multi-level correlations. The BMGLVM, implemented via Gibbs sampling
with non-informative priors, allows us to model the "between-cluster" and
"within-cluster" correlation structures explicitly. It is possible for us to examine
the spatial symmetry of quadrants in terms of caries incidence, and to capture the
special spatial association structure among quadrants for the same subject of interest
and among teeth within quadrants, which can help us efficiently characterize the
pattern of caries incidence at tooth level.

3.2 The Spatial Dependence Structures

3.2.1 Notation

To model the observations, let $y_{ijk}$ denote the $k$th response variable within the
$j$th cluster of the $i$th subject of interest, where $k = 1, \ldots, K$,
$j = 1, \ldots, J$, $i = 1, \ldots, n$. Let
$y_{ij} = (y_{ij1}, \ldots, y_{ijk}, \ldots, y_{ijK})'$ denote the response vector within
the $j$th cluster of the $i$th subject. Let
$y_i = (y_{i1}', \ldots, y_{ij}', \ldots, y_{iJ}')'$ denote the collection of response
variables of the $i$th subject. Let $y = (y_1', \ldots, y_i', \ldots, y_n')'$ denote the
collection of response variables of all subjects in this study.

A multinomial model is applied for the allocation process associated with mixture
models. Let $Q_i = (Q_{i1}', \ldots, Q_{ij}', \ldots, Q_{iJ}')'$ denote the mixture
component allocation random variables for the $i$th subject, where
$Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})'$ and $M$ is the number of mixture
components in the mixture model, for $i = 1, \ldots, n$ and $j = 1, \ldots, J$. It is
assumed that the $Q_{ij}$'s are independently and identically multinomially
distributed. For modeling the latent variables, we use the conditional autoregressive
model. Let $T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})'$ denote the
latent variables associated with the $m$th mixture component for the $i$th subject of
interest. Let $T_i = (T_{i1}', \ldots, T_{im}', \ldots, T_{iM}')'$ denote the collection of
latent variables at the intermediate level for the $i$th subject in the study. Let
$L = \{(Q_i', T_i')' : i = 1, \ldots, n\}$ denote the
collection of all allocation random variables and latent variables for all subjects.
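To make the allocation notation concrete, the sketch below draws one-hot allocation vectors for the quadrants of a single subject from a multinomial distribution with one trial. The function name, the number of components and the mixing proportions are hypothetical choices for illustration; they are not estimates from the dental data.

```python
import random


def sample_allocation(proportions, rng):
    """Draw a one-hot allocation vector from a multinomial(1, proportions)."""
    chosen = rng.choices(range(len(proportions)), weights=proportions, k=1)[0]
    return [1 if m == chosen else 0 for m in range(len(proportions))]


rng = random.Random(0)
J, M = 4, 2                      # four quadrants, two mixture components (assumed)
proportions = [0.6, 0.4]         # hypothetical mixing proportions
Q_i = [sample_allocation(proportions, rng) for _ in range(J)]
print(Q_i)
```

Each inner list plays the role of one allocation vector for a quadrant: exactly one entry equals 1 and marks the component that the quadrant-wise response vector is assigned to.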


3.2.2 Principles of our modeling approach

The dental data show a two-level spatial association structure. The first level
spatial association structure is among quadrants (V)-(VIII). For the convenience of
indexing the data, we will use quadrant (I) instead of quadrant (V), and the
corresponding indices for the others. The second level spatial association structure,
nested within the corresponding quadrant, is the spatial correlation among teeth.

In general, the valid approaches for analyzing correlated data without an explicit
multivariate distribution are based on either GEE or random effects models.
The former is suitable for statistical problems oriented toward marginal means or
pairwise associations between response outcomes, and the latter is for subject
specific statistical issues. The dental data are spatially correlated and carry
information about the spatial configurations of the teeth that needs to be
incorporated in the model to provide an explicit structure for inducing dependence
among quadrants and teeth at their corresponding levels. The main contribution of this
paper is to develop a methodology to model this unique spatial dependence of the
deciduous dentition. There is no explicit multivariate distribution available for the
spatially correlated binary dental caries experience outcomes. Mixture models
(McLachlan and Peel (2000)) are commonly used to generate flexible multivariate
distributions and induce unobserved heterogeneity for correlated data with an implicit
multivariate distribution.

To take the unique spatial structure of dental data into account, we use two levels
of latent variables to take care of the spatial dependence of the teeth within the
mouth for each subject. At the higher level, the mixture component allocation random
vectors $\{Q_{ij} : j = 1, \ldots, J\}$ for the $i$th subject are used to allocate the
quadrant-wise response vector $y_{ij}$ to its corresponding subgroup, which is
characterized by the mixture components of the mixture model. The mixture component
allocation process serves to mix the multiple mixture components into a flexible
multivariate distribution and to induce the dependence among quadrants. Given the
mixture component allocation process, the quadrant-wise response vectors
$\{y_{ij} : j = 1, \ldots, J\}$ are conditionally mutually independent. At the
intermediate level, conditional on the allocation status of the mixture component
process, we introduce spatial latent vectors,
$\{T_{im} : m = 1, \ldots, M, i = 1, \ldots, n\}$, that are used to generate the
$m$th mixture component flexibly and induce the dependence structure among teeth. The
joint distribution of this spatial latent vector is given by the undirected graphical
Gaussian model with the spatial configurations of the teeth taken into account. The
observations $\{y_{ijk} : k = 1, \ldots, K, j = 1, \ldots, J\}$ will be conditionally
independent given $Q_i$ and $T_i$ for $i = 1, \ldots, n$. Meanwhile, the intermediate
level spatial latent vectors $\{T_{im} : m = 1, \ldots, M\}$ are conditionally
independent given the higher level spatial latent vector $Q_i$ for $i = 1, \ldots, n$.
In order to assess the spatial symmetry of the caries experience of the deciduous
dentition, we will examine the pairwise comparisons of the similarity scores that will
be defined later on. Due to the complexity of the oral biological system, we will give
a flexible covariance structure to the undirected graphical Gaussian models and leave
the number of mixture components unknown. A formal model selection procedure will be
used to choose an appropriate mixture model for the data.

3.3 Models

3.3.1 Bayesian Mixture Models

Finite mixture models with regression structure have a long and extensive literature
and have been commonly used. McLachlan and Peel (2000) gave a very general
framework for mixture models with non-normal components to deal with overdispersed
data. Mixture models are used to facilitate the modeling of the heterogeneity from
overdispersed and correlated data by generating flexible distributions of the response
variables of interest and inducing dependence structures among response variables.

Conditional on the mixture component allocation process $Q_i$, for a mixture model with
$M$ components, $y_i = (y_{i1}', \ldots, y_{ij}', \ldots, y_{iJ}')'$ contributes to the
likelihood as

$$p(y_i | Q_i; \theta) = \prod_{j=1}^{J} \prod_{m=1}^{M} \{\pi_{jm} \, p_m(y_{ij} | Q_{ijm} = 1; \theta)\}^{Q_{ijm}}, \tag{3.1}$$

where $\{\pi_{jm} : m = 1, \ldots, M, j = 1, \ldots, J\}$ are the mixture proportions and
$p_m(y_{ij} | Q_{ijm} = 1; \theta)$ is the $m$th component of the mixture model.
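The likelihood contribution above is easiest to read on the log scale, where the exponent given by the allocation indicator simply selects one term per quadrant. The sketch below is a simplified illustration: it assumes, purely for the example, that each component treats the teeth of a quadrant as independent Bernoulli variables, whereas the components in this chapter also involve Gaussian latent vectors. All names and numbers are hypothetical.

```python
import math


def log_component_density(y_ij, probs):
    """Hypothetical component: independent Bernoulli teeth with probabilities probs."""
    return sum(math.log(p if y else 1.0 - p) for y, p in zip(y_ij, probs))


def log_likelihood_contribution(y_i, q_i, pi, component_probs):
    """Sum over j and m of Q_ijm * [log pi_jm + log p_m(y_ij)]."""
    total = 0.0
    for j, (y_ij, q_ij) in enumerate(zip(y_i, q_i)):
        for m, q in enumerate(q_ij):
            if q:  # only the allocated component contributes
                total += math.log(pi[j][m])
                total += log_component_density(y_ij, component_probs[m])
    return total


# Two quadrants, three teeth each, two components (illustrative values only).
y_i = [[1, 0, 0], [0, 0, 0]]
q_i = [[0, 1], [1, 0]]
pi = [[0.5, 0.5], [0.5, 0.5]]
component_probs = [[0.1, 0.1, 0.1], [0.5, 0.5, 0.5]]
value = log_likelihood_contribution(y_i, q_i, pi, component_probs)
print(value)
```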

It is known that estimation for mixture models via the EM algorithm is conceptually
straightforward, but comes with difficulties and challenges. Bayesian estimation for
mixture models is feasible and well defined as long as the posterior simulation
algorithm converges. Key initial papers on the Bayesian analysis of mixture models
using MCMC methods include Diebolt and Robert (1994) and Escobar and West (1995).
Provided that suitable (proper conjugate) priors are used, the posterior density will
be proper. WinBUGS can be used to provide valid posterior samples of the quantities of
interest. However, there are some difficulties that have to be addressed with the
Bayesian approach in the context of mixture models. First of all, improper priors
might yield improper posterior distributions. Secondly, when the number of components
$M$ is unknown, the parameter space is ill-defined, which prevents the use of
classical testing procedures and priors. Finally, label switching occurs when some of
the labels of the mixture components permute. The effect of label switching is
important when the solution is calculated iteratively, because there is the
possibility that the labels of the components may be switched on different iterations.
In this paper, we will discuss the methods that have been proposed for overcoming the
problems mentioned above.
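One widely used remedy for label switching is an identifiability constraint that orders a component-specific parameter. As a hedged illustration of that idea (not necessarily the procedure implemented for this thesis), the sketch below relabels each MCMC draw so that the component means are increasing and permutes the mixing proportions accordingly. The function name and the two example draws are hypothetical.

```python
def relabel_by_order(component_means, proportions):
    """Reorder one MCMC draw so that the component means are increasing."""
    order = sorted(range(len(component_means)), key=lambda m: component_means[m])
    return [component_means[m] for m in order], [proportions[m] for m in order]


# Two hypothetical draws; the labels are switched in the second one.
raw_draws = [
    ([-1.2, 0.9], [0.70, 0.30]),
    ([1.0, -1.1], [0.35, 0.65]),
]
relabeled = [relabel_by_order(means, props) for means, props in raw_draws]
for means, props in relabeled:
    print(means, props)
```

After relabeling, the first position always refers to the component with the smaller mean, so posterior summaries computed position by position are comparable across iterations.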

3.3.2 Response Models

We model the $k$th response variable within the $j$th quadrant of the $i$th subject, $y_{ijk}$,
which is a binary indicator of caries experience of tooth $k$ in quadrant $j$ of subject $i$.
The response model is specified hierarchically. At the higher level, the mixture model (3.1)
gives a flexible multivariate distribution for the quadrant-wise binary data
$\{y_{ij} : j = 1, \ldots, J\}$ and induces the dependence structure among the four quadrants.
Simultaneously, there exists a mixture component allocation random indicator
$Q_i = (Q_{i1}', \ldots, Q_{ij}', \ldots, Q_{iJ}')'$, where $Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})'$
is a random binary vector with exactly one element equal to 1, for $j = 1, \ldots, J$, $i = 1, \ldots, n$.
At the intermediate level, we condition on $Q_i$; for instance, $Q_{ijm} = 1$ means that
$y_{ij}$ follows the $m$th mixture component in the mixture model.
Meanwhile, there exists a spatial latent vector $T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})'$
that is used to tie together the $K$ binary response variables $(y_{ij1}, \ldots, y_{ijk}, \ldots, y_{ijK})'$. The
joint distribution of $T_{im}$ is given by an undirected graphical Gaussian model (UGGM)
that takes the spatial configuration of the $K$ teeth into account. Essentially, $T_{im}$ is
used to generate a flexible multivariate distribution for the binary response vector and to
induce the dependence among the $y_{ijk}$. Conditional on $Q_i$ and $\{T_{im} : m = 1, \ldots, M\}$, the
binary response variable $y_{ijk}$ can be modeled by an exponential family distribution
with probability density function of the general form

\[
p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \alpha, \beta, \phi)
  = \exp\left\{ \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \right\},
\tag{3.2}
\]

where $\eta_{ijmk} = \alpha_m + \beta_k + T_{i,k(m)}$ (McCullagh and Nelder, 1989).
We assume that the link function $g(\cdot)$ is a canonical link that relates the mean of
$y_{ijk}$ to a linear predictor as follows:

\[
g\big(E[y_{ijk} \mid Q_{ijm} = 1, \eta_{ijmk}]\big) = \eta_{ijmk} = \alpha_m + \beta_k + T_{i,k(m)},
\]

where $\alpha = (\alpha_1, \ldots, \alpha_m, \ldots, \alpha_M)'$ are the overall component means, subject to an
increasing order constraint, and $\beta = (\beta_1, \ldots, \beta_k, \ldots, \beta_K)'$ are the regression coefficients
of the generalized latent variable models, subject to the constraint $\sum_{k} \beta_k = 0$. Furthermore,
$T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})'$ are Gaussian with mean zero and covariance
matrices $\{\Sigma_{T_m} : m = 1, \ldots, M\}$. We assume that $\{Q_i : i = 1, \ldots, n\}$ and
$\{T_{im} : m = 1, \ldots, M,\ i = 1, \ldots, n\}$ are mutually independent; together they relate the response
variables to quadrant-specific and tooth-specific covariates and to the latent variables.

Under the mixture model and latent variable model approach, we can assume
that the response variables are conditionally mutually independent, given the vectors
of latent variables $L = \{L_i = (Q_i', T_{i1}', \ldots, T_{im}', \ldots, T_{iM}')' : i = 1, \ldots, n\}$. The joint
probability density of $y$ conditional on the set of latent variables $L$ and $\{\pi', \alpha', \beta', \phi\}$,
where $\pi = (\pi_1', \ldots, \pi_j', \ldots, \pi_J')'$ with $\pi_j = (\pi_{j1}, \ldots, \pi_{jm}, \ldots, \pi_{jM})'$, is as follows:

\[
\begin{aligned}
p(y \mid L; \pi, \alpha, \beta, \phi)
 &= \prod_{i,j,m} \Big\{ \pi_{jm} \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \alpha, \beta, \phi) \Big\}^{Q_{ijm}} \\
 &= \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k=1}^{K} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\},
\end{aligned}
\tag{3.3}
\]

where $\prod_{i,j,m}$ and $\sum_{i,j,m}$ denote $\prod_{i=1}^{n}\prod_{j=1}^{J}\prod_{m=1}^{M}$ and
$\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{m=1}^{M}$ correspondingly.
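
As an illustration of (3.3), the following minimal sketch evaluates the logarithm of the conditional density for the special case of Bernoulli responses with a logit link, so that $a_i(\phi) = 1$, $b_i(\eta) = \log(1 + e^{\eta})$ and $c_i = 0$. The function name, the array layout and the toy sizes are assumptions made for this example only; they are not part of the original analysis code.

```python
import numpy as np

def complete_loglik(y, Q, T, pi, alpha, beta):
    """Logarithm of (3.3) for Bernoulli responses with a logit link (assumed case).

    y     : (n, J, K) binary responses y_ijk
    Q     : (n, J, M) one-hot allocations Q_ijm
    T     : (n, M, K) latent effects T_{i,k(m)}
    pi    : (J, M) mixing proportions (each row sums to one)
    alpha : (M,) component means, beta : (K,) tooth effects
    """
    # eta[i, m, k] = alpha_m + beta_k + T_{i,k(m)}
    eta = alpha[None, :, None] + beta[None, None, :] + T
    # Bernoulli log-density y * eta - log(1 + exp(eta)), summed over teeth k
    log_p = (y[:, :, None, :] * eta[:, None, :, :]
             - np.logaddexp(0.0, eta)[:, None, :, :]).sum(axis=-1)  # (n, J, M)
    return float(np.sum(Q * (np.log(pi)[None, :, :] + log_p)))

# Toy usage with hypothetical sizes: 3 subjects, 4 quadrants, 5 teeth, 2 components
rng = np.random.default_rng(0)
n, J, K, M = 3, 4, 5, 2
y = rng.integers(0, 2, size=(n, J, K))
Q = np.eye(M)[rng.integers(0, M, size=(n, J))]
T = rng.normal(size=(n, M, K))
pi = np.full((J, M), 1.0 / M)
value = complete_loglik(y, Q, T, pi, np.array([-1.0, 1.0]), np.zeros(K))
```

Only the component selected by $Q_{ijm} = 1$ contributes for each quadrant, which is what makes the later Gibbs updates tractable.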

 

3.3.3 The Structure Model for Latent Variables

In the response model, given the two levels of latent variables, the conditional
independence assumption allows the specification of the complete likelihood for the response
model. In our modeling approach, the two levels of spatial latent vectors are used
to induce the dependence structure of the teeth of interest. In order to incorporate
appropriate spatial latent vectors into the model, we need to choose ones that
represent the design structure and characterize the random mechanism of the
data generating process. The objective of these latent processes is to generate flexible
distributions for the observations and to induce the dependency among observations.

At the higher level, it is assumed that there exist independent mixture component
allocation processes, say $Q_{i1}, \ldots, Q_{ij}, \ldots, Q_{iJ}$, with

\[
Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})' \sim \mathrm{Multinomial}(1, \pi_j), \quad j = 1, \ldots, J,\ i = 1, \ldots, n.
\tag{3.4}
\]

At the intermediate level, we follow the approach in Zhang et al. (2007) by incorporating
appropriate spatial latent vectors to formulate flexible mixture components.

 

 
     
 

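[Figure 3.1: graph linking the response nodes $y_{ij1}, \ldots, y_{ij5}$ (first tooth: Incisor) to the UGGM
latent vector $T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$.]

Figure 3.1. The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are allocated
to the $m$th cluster and tied together by the spatial latent vector
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM
with unstructured precision matrix.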
Undirected Graphical Gaussian Models (UGGMs) are used to give the joint distribution
for the spatial latent vectors. The UGGMs take the spatial configurations
of the teeth within quadrants into account. The spatial configurations of the teeth
within quadrants are shown in Figures 3.1 and 3.2.

As shown in these graphs, the five teeth within each quadrant can be viewed as
five nodes in a graph. If two nodes are not directly connected, they are said to be
conditionally independent given the other nodes in the graph. For the $m$th mixture
component, a UGGM is used to describe the spatial configuration of the nodes

[Figure 3.2: chain graph over the five teeth of a quadrant (Incisor, Incisor, Cuspid, Molar, Molar)
with response nodes $y_{ij1}, y_{ij2}, y_{ij3}, y_{ij4}, y_{ij5}$ and the UGGM latent vector
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$.]

Figure 3.2. Note: The response variables $y_{ij1}$, $y_{ij2}$, $y_{ij3}$, $y_{ij4}$ and $y_{ij5}$ are
allocated to the $m$th cluster and tied together by the spatial latent vector
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$, whose joint distribution is given by a UGGM
with precision matrix under the CAR model assumption.

and manifest the associations among nodes of interest by assigning the random variables
$T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})'$ to the nodes in the graph. Meanwhile,
$\{T_{im} : m = 1, \ldots, M\}$ are mutually independent conditional on $Q_i$. A UGGM
assumes that

\[
T_{im} = (T_{i,1(m)}, T_{i,2(m)}, T_{i,3(m)}, T_{i,4(m)}, T_{i,5(m)})' \sim N(0, \Sigma_{T_m}), \quad m = 1, \ldots, M,
\tag{3.5}
\]

where $\Sigma_{T_m}$ is a symmetric and positive definite matrix for $m = 1, \ldots, M$.

Gaussian random variables are determined by their first two moments. For
identifiability, we already assume that the mean structures of the two levels of spatial
latent variables are vectors of zeros, so the problem reduces to the
covariance structures $\{\Sigma_{T_m} : m = 1, \ldots, M\}$. A general covariance matrix is
unstructured, subject to symmetry and positive definiteness constraints. The unstructured
covariance matrix can be simplified if we assume Markovian properties for the nodes,
as shown in the second graph (Figure 3.2). The Markovian type covariance matrix can
be incorporated within spatial statistics by the CAR model (Cressie (1991)). The choice
between the two types of covariance structure for the spatial latent vectors at the tooth
nested within quadrant level is made through model selection in the Bayesian framework via
the Deviance Information Criterion (DIC) for missing data problems proposed by Celeux
et al. (2004), which is an extension of the DIC introduced in Spiegelhalter et al. (2002)
for Bayesian model selection.

Undirected Graphical Gaussian Model

In this section, we review the graphical Gaussian model (Dempster, 1972) required
for this paper. Let $G = (V, E)$ be an undirected graph with vertex set
$V = \{1, \ldots, k, \ldots, K\}$ and edge set $E = \{e_{kk'} : k \neq k' = 1, \ldots, K\}$, where $e_{kk'} = 1$
or $0$ according to whether vertices $k$ and $k'$, $1 \le k \neq k' \le K$, are directly connected
in $G$ or not. In the undirected graphical Gaussian model, the edge set
describes the association structure of the vertex set. A random vector is assigned
to the edge set to represent the association strength between the corresponding vertices.
The undirected graphical Gaussian model consists of all $K$-dimensional normal
distributions, say $X = (X_1, \ldots, X_k, \ldots, X_K)'$, with $X \sim N(0, \Sigma)$ and precision matrix
$\Omega = \Sigma^{-1} = (\omega_{kk'})_{k,k'=1}^{K}$, where $\Sigma$ is unknown but satisfies the following
restrictions in terms of the pairwise conditional independences determined by the
Markov properties (Drton and Perlman (2004)):

\[
\omega_{kk'} = 0 \iff X_k \perp X_{k'} \mid X_{V \setminus \{k, k'\}}, \quad \forall\, k \neq k' = 1, \ldots, K.
\]
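
The equivalence between zeros of the precision matrix and pairwise conditional independence can be checked numerically. The sketch below uses a hypothetical 5-node chain (the numbers in `omega` are invented for illustration) and the standard identity that the partial correlation between $X_k$ and $X_{k'}$ equals $-\omega_{kk'} / \sqrt{\omega_{kk}\,\omega_{k'k'}}$.

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlations implied by a precision matrix omega."""
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical precision matrix of a chain 1 - 2 - 3 - 4 - 5:
# only neighbouring nodes have a non-zero off-diagonal entry.
omega = 2.0 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
sigma = np.linalg.inv(omega)          # implied covariance matrix
pcor = partial_correlations(omega)

# Nodes 1 and 3 are marginally correlated (sigma[0, 2] != 0) but
# conditionally independent given the rest (pcor[0, 2] == 0).
```

The covariance matrix is dense even though the precision matrix is sparse, which is why the graph structure is imposed on $\Omega$ rather than on $\Sigma$.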

Conditional Autoregressive Models

For the vector of univariate variables $\nu = (\nu_1, \nu_2, \ldots, \nu_K)'$, the zero-centered
CAR specification, where $K$ is the number of spatial nodes of interest, following
Cressie (1991), sets

\[
\nu_k \mid \nu_{-k} \sim N\Big( \rho \sum_{\nu_{k'} \in \nu_{-k}} b_{kk'}\, \nu_{k'},\ \sigma_k^2 \Big), \quad k = 1, \ldots, K,
\tag{3.6}
\]

where $\nu_{-k} = \nu \setminus \{\nu_k\}$. Following Brook's (1964) lemma, the resulting joint density for
$\nu$ takes the form

\[
f(\nu \mid \sigma^2, \rho) \propto \exp\Big\{ -\tfrac{1}{2}\, \nu' D_{\sigma^2}^{-1} (I - \rho B)\, \nu \Big\},
\tag{3.7}
\]

where $B$ is a $K \times K$ matrix with $B = \{b_{kk'} : k \neq k' = 1, \ldots, K\}$ and $b_{kk} = 0$ for all
$k = 1, \ldots, K$, and $D_{\sigma^2}$ is a $K \times K$ diagonal matrix with non-zero entries
$\{\sigma_k^2 : k = 1, \ldots, K\}$. The precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ needs to be symmetric, which yields the
conditions

\[
b_{kk'}\, \sigma_{k'}^2 = b_{k'k}\, \sigma_k^2, \quad \forall\, k, k' = 1, \ldots, K.
\tag{3.8}
\]

If the precision matrix is positive definite, then (3.7) is a proper distribution. Under
the above parameterization, the precision matrix $D_{\sigma^2}^{-1}(I - \rho B)$ is nonsingular if
$\rho \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1})$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $B$
respectively. It is usually assumed that $D_{\sigma^2} = \sigma^2 M$, where $M$ is a diagonal matrix
with diagonal elements $m_{kk}$ proportional to the conditional variances $\sigma_k^2$. Here $\sigma^2$
controls the overall variability and $\rho$ represents the overall spatial association. The weights
matrix $B$ with entries $b_{kk'}$ needs to reflect the spatial association between nodes $k$ and $k'$.
GeoBUGS (2004) sets $b_{kk'} = 1/n_k$ for adjacent nodes $k \neq k'$ and $m_{kk} = 1/n_k$, where $n_k$
is the number of nodes adjacent to node $k$. Under the above settings, the
spatial latent vector $\nu$ follows a proper distribution, i.e.,

\[
\nu \sim N\big(0,\ \sigma^2 (I - \rho B)^{-1} M\big).
\tag{3.9}
\]
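
To make the CAR construction concrete, the following sketch builds the precision matrix $(\sigma^2 M)^{-1}(I - \rho B)$ for a chain of five teeth using the GeoBUGS-style weights described above, and checks the symmetry condition (3.8) and the admissible range of $\rho$. The chain adjacency, the values of $\rho$ and $\sigma^2$, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def car_precision(adj, rho, sigma2):
    """Precision matrix of (3.9): (sigma2 * M)^{-1} (I - rho * B)."""
    n_k = adj.sum(axis=1)                 # number of neighbours of each node
    B = adj / n_k[:, None]                # b_kk' = 1 / n_k for adjacent nodes
    M_inv = np.diag(n_k)                  # m_kk = 1 / n_k, so M^{-1} = diag(n_k)
    precision = M_inv @ (np.eye(len(adj)) - rho * B) / sigma2
    return precision, B

# Hypothetical chain of five adjacent teeth within one quadrant
adj = np.eye(5, k=1) + np.eye(5, k=-1)
precision, B = car_precision(adj, rho=0.9, sigma2=1.0)

# Admissible range for rho from the eigenvalues of B
lam = np.linalg.eigvals(B).real
rho_bounds = (1.0 / lam.min(), 1.0 / lam.max())
```

For this chain the eigenvalues of $B$ lie in $[-1, 1]$, so any $\rho$ strictly between $-1$ and $1$ gives a symmetric, positive definite precision matrix with zeros for non-adjacent teeth.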

3.4 Bayesian Estimations and Statistical Inference

3.4.1 Identifiability of the models

Based on the framework of the mixture of generalized latent variable models, we
have to deal with the model identifiability issues at both the mixture model level and
the generalized latent variable level. At the mixture model level, we need to deal with the label
switching issue. The interchanging of component labels is generally handled by
constraints on the mixing proportions of the form

\[
\pi_{j1} \le \pi_{j2} \le \cdots \le \pi_{jM}, \quad j = 1, \ldots, J,
\]

or on the component means of the form

\[
\alpha_1 \le \alpha_2 \le \cdots \le \alpha_M.
\]
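
One common way to impose the ordering constraint on the component means after sampling is to relabel each MCMC draw so that the means are increasing. The sketch below is an illustrative post-processing step, not the procedure used in the original analysis; the array shapes and the function name are assumptions.

```python
import numpy as np

def relabel_by_means(alpha_draws, pi_draws):
    """Reorder component labels within each draw so that alpha_1 <= ... <= alpha_M.

    alpha_draws : (S, M) draws of the component means
    pi_draws    : (S, J, M) draws of the mixing proportions
    """
    order = np.argsort(alpha_draws, axis=1)                        # (S, M)
    alpha_sorted = np.take_along_axis(alpha_draws, order, axis=1)
    idx = np.broadcast_to(order[:, None, :], pi_draws.shape)       # same permutation per quadrant
    pi_sorted = np.take_along_axis(pi_draws, idx, axis=2)
    return alpha_sorted, pi_sorted

# Two hypothetical draws whose labels are switched relative to each other
alpha_draws = np.array([[-1.0, 1.0], [1.0, -1.0]])
pi_draws = np.array([[[0.3, 0.7]], [[0.7, 0.3]]])
alpha_sorted, pi_sorted = relabel_by_means(alpha_draws, pi_draws)
```

After relabeling, both draws describe the same ordered pair of components, so posterior summaries per component are meaningful.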

Frequently, models with latent variables are not globally identifiable. One can integrate
out the latent spatial variable vectors to obtain a marginal likelihood and assess
whether parameters are redundant. The contribution to the likelihood from the latent
variable model is parameterized by $\{\Sigma_{T_m} : m = 1, \ldots, M\}$. The identifiability problem
then becomes one of examining whether the parameters involved in the covariance are redundant, which
might be problematic within the frequentist framework. Dawid (1979) and Gelfand &
Sahu (1999) discussed model identifiability issues within the Bayesian framework. In
particular, suppose that the Bayesian model is denoted by the likelihood $L(\theta; y)$ and the
prior $f(\theta)$, and we partition the parameters of interest as $\theta = (\theta_1, \theta_2)$. If

\[
f(\theta_2 \mid \theta_1, y) = f(\theta_2 \mid \theta_1),
\tag{3.10}
\]

then we say that $\theta_2$ is not identifiable, where $f(\theta_2 \mid \theta_1, y) \propto L(\theta_1, \theta_2; y)\, f(\theta_2 \mid \theta_1)\, f(\theta_1)$.
That is, if observing data $y$ does not increase our prior knowledge about $\theta_2$ given $\theta_1$,
then $\theta_2$ is not identifiable by the data. Dawid's formal definition of Bayesian model
nonidentifiability states that $\theta_2$ is not identifiable if and only if $L(\theta_1, \theta_2; y)$ is free of
$\theta_2$. In order to make our model identifiable, we need not only to take care of the marginal
identifiability of the model through integrating out the latent variables, but also to put
some constraints on the covariance matrices of the Gaussian spatial latent vectors at
both levels.

3.4.2 Prior distributions

In this section, prior distributions are chosen for the parameters $\theta = (\pi', \alpha', \beta')'$
and the association parameters $\xi$. The priors are assigned hierarchically to the
corresponding parameters of interest. The Gibbs sampling algorithm is applied to simulate
samples from the posterior distributions of the quantities of interest. At the higher level,
McLachlan and Peel (2000) used a non-informative conjugate proper prior for the mixture
proportions of the form

\[
\pi_j = (\pi_{j1}, \ldots, \pi_{jm}, \ldots, \pi_{jM})' \sim \mathrm{Dirichlet}(\varphi_1, \ldots, \varphi_m, \ldots, \varphi_M), \quad j = 1, \ldots, J,
\tag{3.11}
\]

where $(\varphi_1, \ldots, \varphi_M)$ is the weights vector for the mixture proportions. At the intermediate
level, Zhao et al. (2006), Zeger et al. (1991) and Dunson et al. (2000) all suggested
noninformative conjugate prior distributions for the parameters of interest, so that
the effect of the priors washes out as the sample size increases. Bedrick et al. (1996) noted
that normal prior distributions were suggested for the logistic regression coefficients $\theta$:

\[
(\alpha', \beta')' \sim N(\mu, \Gamma),
\tag{3.12}
\]

where $\mu$ is a vector of location parameters and $\Gamma$ is the covariance matrix. It
is common to take $\mu$ as a vector of zeros and $\Gamma$ as a diagonal matrix with very large
entries. We are interested in the joint posterior distribution of $(\theta, \xi \mid y)$. Under the mild
conditions in Geman and Geman (1984), the Gibbs sampler can obtain the joint
posterior distribution by sampling from the conditional posterior distributions $(\theta \mid y, \xi)$
and $(\xi \mid y, \theta)$ correspondingly. To simplify the sampling from the conditional posterior
distributions, we choose hierarchical independent priors for $\theta$ and $\xi$ in this hierarchical
Bayesian model, i.e. $(\xi \mid y, \theta) = (\xi \mid y)$, which is true as long as the priors satisfy
$p(\theta, \xi) = p(\theta)\,p(\xi)$. We proposed two covariance structures for the Gaussian spatial
latent variable models. In the generalized linear model setting with Gaussian random
effects, the proper noninformative conjugate priors are the Inverse Gamma (IG) for a
single variance component and the Inverse Wishart distribution for a variance-covariance
matrix.

Based on the relative relationship among nodes in the graph of the UGGM, we
give noninformative priors to the precision matrix parameters correspondingly.

(1) Unstructured precision matrix in the UGGM:

For the $i$th subject, conditional on the higher level spatial latent vector $Q_i$, the
intermediate level spatial latent vectors $\{T_{im} : m = 1, \ldots, M\}$ are conditionally independent.
So, we give independent priors to the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$.
Similarly, independent Wishart distributions are assigned as priors for these precision
matrices:

\[
\Omega_{T_m} \sim \mathrm{Wishart}(\nu_{T_m}, \Lambda_{T_m}), \quad m = 1, \ldots, M,
\tag{3.13}
\]

with degrees of freedom $\nu_{T_m} = \mathrm{rank}(\Sigma_{T_m}) + 1$ and precision matrix $\Lambda_{T_m} =$
1

=1, , ., . .
tr771 KLTI'H


(2) CAR model based precision matrix in the UGGM:

For the $i$th subject, conditional on the higher level spatial latent vector $Q_i$, the
intermediate level spatial latent vectors $\{T_{im} : m = 1, \ldots, M\}$ are conditionally
independent. So, we give independent priors to the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$,
which are parameterized by $\{\sigma_m^2, \rho_m : m = 1, \ldots, M\}$. Similarly, independent Inverse
Gamma (Dunson et al. (2000)) distributions, which are proper conjugate priors, are assigned
as priors to the overall variation parameters $\{\sigma_m^2 : m = 1, \ldots, M\}$, and independent
uniform distributions with the support constraints in section 3.3.2 are assigned to the overall spatial
association parameters $\{\rho_m : m = 1, \ldots, M\}$, the improper priors of GeoBUGS (2004) for the
quadrant-specific spatial association parameters, respectively:

\[
\sigma_m^2 \sim IG(\epsilon, \epsilon), \quad m = 1, \ldots, M,
\tag{3.14}
\]

and

\[
\rho_m \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1}), \quad m = 1, \ldots, M,
\tag{3.15}
\]

where $\epsilon$ is a very small positive number and $\lambda_{\min}^{-1}$, $\lambda_{\max}^{-1}$ are as defined in section 3.3.2.

3.4.3 Posterior computations

Let $(\pi', \alpha', \beta', Q', T', \Sigma_T)'$ denote the current state of the Markov chain. We
follow steps (1)-(3) to obtain the posterior samples of the quantities of interest from
their posterior distributions. McLachlan & Peel (2000) and Fernandez & Green (2002)
gave a general posterior sampling algorithm for the mixture model.

Step 1: Posterior sampling for mixture proportions;

\[
\pi_j = (\pi_{j1}, \ldots, \pi_{jm}, \ldots, \pi_{jM})' \sim \mathrm{Dirichlet}(\varphi_1 + N_{j1}, \ldots, \varphi_m + N_{jm}, \ldots, \varphi_M + N_{jM}),
\tag{3.16}
\]

where $N_{jm} = \sum_{i=1}^{n} Q_{ijm}$ for $j = 1, \ldots, J$, $m = 1, \ldots, M$, and $\varphi = (\varphi_1, \ldots, \varphi_M)'$ is a
known weight vector.
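
A minimal sketch of Step 1, assuming the allocations are stored as a one-hot array `Q` of shape `(n, J, M)`; the function name and the toy data are hypothetical.

```python
import numpy as np

def sample_mixing_proportions(Q, varphi, rng):
    """One Gibbs draw of pi_j from (3.16) for every quadrant j.

    Q      : (n, J, M) one-hot allocations Q_ijm
    varphi : (M,) Dirichlet prior weights
    """
    counts = Q.sum(axis=0)                                    # N_jm, shape (J, M)
    return np.vstack([rng.dirichlet(varphi + counts[j])
                      for j in range(counts.shape[0])])

# Toy usage: 50 subjects, 4 quadrants, 3 components, flat prior weights
rng = np.random.default_rng(1)
Q = np.eye(3)[rng.integers(0, 3, size=(50, 4))]
pi_draw = sample_mixing_proportions(Q, np.ones(3), rng)      # shape (4, 3)
```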

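Step 2: Posterior sampling for mixture component allocation random variables;

\[
Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})' \sim \mathrm{Multinomial}(1; \tau_{ij1}, \ldots, \tau_{ijM}), \quad j = 1, \ldots, J,\ i = 1, \ldots, n,
\]

where

\[
\tau_{ijm} = \frac{\pi_{jm}\, p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta)}{\sum_{m'=1}^{M} \pi_{jm'}\, p_{m'}(y_{ij} \mid Q_{ijm'} = 1, T_{im'}; \theta)},
\]

with

\[
p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta) = \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta),
\]

and $p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)$ is defined in (3.2) with
$\mathrm{logit}\{p_m(y_{ijk} = 1 \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)\} = \alpha_m + \beta_k + T_{i,k(m)}$.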
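
The allocation probabilities $\tau_{ijm}$ of Step 2 can be computed on the log scale to avoid numerical underflow when the product over teeth is small. The sketch below does this for one subject under the logit model; the function name, shapes and toy values are assumptions for illustration.

```python
import numpy as np

def allocation_probs(y_i, T_i, pi, alpha, beta):
    """tau[j, m] for one subject under the Bernoulli/logit model.

    y_i : (J, K) binary responses, T_i : (M, K) latent effects,
    pi  : (J, M) mixing proportions, alpha : (M,), beta : (K,)
    """
    eta = alpha[:, None] + beta[None, :] + T_i                     # (M, K)
    # log p_m(y_ij | ...) = sum_k [ y_ijk * eta - log(1 + exp(eta)) ]
    loglik = (y_i[:, None, :] * eta[None, :, :]
              - np.logaddexp(0.0, eta)[None, :, :]).sum(axis=-1)   # (J, M)
    logw = np.log(pi) + loglik
    logw -= logw.max(axis=1, keepdims=True)                        # stabilise before exponentiating
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)

# Toy usage: all teeth affected, so the high-mean component should dominate
rng = np.random.default_rng(2)
alpha, beta = np.array([-3.0, 3.0]), np.zeros(5)
y_i, T_i, pi = np.ones((4, 5)), np.zeros((2, 5)), np.full((4, 2), 0.5)
tau = allocation_probs(y_i, T_i, pi, alpha, beta)
Q_i = np.vstack([rng.multinomial(1, tau[j]) for j in range(4)])   # one draw of Q_ij per quadrant
```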

Step 3: Posterior sampling for generalized latent variable models.

Conditional on the mixture component allocation process at the higher level, the
posterior distributions of the parameters and latent variables in the generalized latent
variable model can be obtained in the standard way (Dunson et al. (2000), Zeger et al. (1991)).
Given the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$, the joint posterior distribution for
the regression parameters and latent variables at the intermediate level is

\[
\begin{aligned}
p(\theta, T \mid Q, y; \pi) &\propto p(y \mid Q, T; \theta)\, f(\theta, T) \\
 &\propto \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k} Q_{ijm} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\} \\
 &\quad \times \exp\Big\{ -\tfrac{1}{2}\, \theta' \Gamma^{-1} \theta - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{m=1}^{M} T_{im}'\, \Omega_{T_m} T_{im} \Big\},
\end{aligned}
\tag{3.18}
\]

where $\sum_{i,j,m}$ denotes $\sum_{i=1}^{n}\sum_{j=1}^{J}\sum_{m=1}^{M}$, $f(\cdot)$ denotes the joint prior density,
$Q = (Q_1', \ldots, Q_i', \ldots, Q_n')'$ with $Q_i = (Q_{i1}', \ldots, Q_{ij}', \ldots, Q_{iJ}')'$ and
$Q_{ij} = (Q_{ij1}, \ldots, Q_{ijm}, \ldots, Q_{ijM})'$, $T = (T_1', \ldots, T_i', \ldots, T_n')'$ with
$T_i = (T_{i1}', \ldots, T_{im}', \ldots, T_{iM}')'$ and $T_{im} = (T_{i,1(m)}, \ldots, T_{i,k(m)}, \ldots, T_{i,K(m)})'$,
$\theta = (\alpha', \beta', \phi)'$ and $\eta_{ijmk} = \alpha_m + \beta_k + T_{i,k(m)}$. In practice, we set $\mu$ to be a
vector of zeros and $\Gamma$ to be a diagonal matrix with large diagonal elements. If the MCMC
algorithm is a Gibbs sampler, the full conditional distribution of each of the unknowns in
(3.18) needs to be specified, which can be obtained in a standard way (Dunson et al. (2000),
Zeger et al. (1991)). For the fixed effect $\theta$, the full conditional distribution is

\[
\begin{aligned}
p(\theta \mid Q, T, y; \pi) &\propto \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k} Q_{ijm} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\} \\
 &\quad \times \exp\Big\{ -\tfrac{1}{2}\, \theta' \Gamma^{-1} \theta \Big\}.
\end{aligned}
\tag{3.19}
\]

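The full conditional distribution for the Gaussian spatial latent vectors $T_{im}$ is

\[
\begin{aligned}
p(T_{im} \mid Q, y; \pi, \theta) &\propto \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k} Q_{ijm} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\} \\
 &\quad \times \exp\Big\{ -\tfrac{1}{2}\, T_{im}'\, \Omega_{T_m} T_{im} \Big\}.
\end{aligned}
\tag{3.20}
\]

The full conditional distributions of the precision matrices $\{\Omega_{T_m} : m = 1, \ldots, M\}$ can be
obtained in terms of the different precision matrix structures correspondingly.

(1) Unstructured precision matrix:

\[
p(\Omega_{T_m} \mid Q, T, y; \pi, \theta) = \mathrm{Wishart}\Big( \nu_{T_m} + N_m,\ \Lambda_{T_m} + \sum_{i=1}^{n} T_{im} T_{im}' \Big),
\tag{3.21}
\]

where $N_m = \sum_{j=1}^{J} N_{jm} = \sum_{j=1}^{J} \sum_{i=1}^{n} Q_{ijm}$.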
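
The Wishart update (3.21) can be sketched with NumPy alone when the degrees of freedom are an integer. The sketch assumes the BUGS-style parameterization, in which the second Wishart argument $R$ is an inverse scale matrix so that the mean is $\nu R^{-1}$; this is an assumption about the convention, suggested by the use of WinBUGS elsewhere in the text. The function name and toy inputs are hypothetical.

```python
import numpy as np

def sample_precision(T_m, N_m, nu0, Lambda0, rng):
    """Draw Omega ~ Wishart(nu0 + N_m, Lambda0 + sum_i T_im T_im') as in (3.21).

    Assumes the BUGS convention: the matrix argument R is an inverse scale,
    so Omega = sum of df outer products of N(0, R^{-1}) vectors.
    T_m : (n, K) array whose rows are the latent vectors T_im.
    """
    df = int(nu0 + N_m)
    R = Lambda0 + T_m.T @ T_m                       # Lambda + sum_i T_im T_im'
    scale = np.linalg.inv(R)
    L = np.linalg.cholesky((scale + scale.T) / 2.0) # symmetrise against rounding error
    z = rng.standard_normal((df, R.shape[0])) @ L.T # rows are N(0, R^{-1}) draws
    return z.T @ z

# Toy usage with K = 5 teeth and hypothetical counts
rng = np.random.default_rng(3)
K = 5
T_m = rng.normal(size=(40, K))
Omega_draw = sample_precision(T_m, N_m=60, nu0=K + 1, Lambda0=np.eye(K), rng=rng)
```

The sum-of-outer-products construction is valid only for integer degrees of freedom at least as large as the dimension, which holds here because $\nu_{T_m} = K + 1$ and $N_m$ is a count.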
(2) CAR model based precision matrix:

(2.1) Overall precision parameters:

\[
\begin{aligned}
p(\tau_m \mid Q, T, y; \pi, \theta) &\propto \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k} Q_{ijm} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\} \\
 &\quad \times \tau_m^{\epsilon - 1} \exp\{ -\tau_m \epsilon \},
\end{aligned}
\tag{3.22}
\]

where $\tau_m = 1/\sigma_m^2$ is the overall precision of the $m$th component.

(2.2) Overall spatial parameters:

\[
\begin{aligned}
p(\rho_m \mid Q, T, y; \pi, \theta) &\propto \exp\Big\{ \sum_{i,j,m} Q_{ijm} \Big[ \log(\pi_{jm}) + \sum_{k} Q_{ijm} \Big( \frac{\eta_{ijmk}\, y_{ijk} - b_i(\eta_{ijmk})}{a_i(\phi)} + c_i(y_{ijk}, \phi) \Big) \Big] \Big\} \\
 &\quad \times I_{(\lambda_{\min}^{-1},\, \lambda_{\max}^{-1})}(\rho_m).
\end{aligned}
\tag{3.23}
\]

All the posterior distributions, except for those of $\{\rho_m : m = 1, \ldots, M\}$, are proper based
on their proper conjugate priors. The uniform priors for the overall spatial parameters
are not conjugate, which might lead to improper posterior distributions. The simplest
technique for verifying that the posterior distributions of the parameters are proper is to
verify that the posterior distribution is proper for reduced data, obtained by discarding all but a
single outcome per subject, which leads to a reduced data set consisting of independent
outcomes (O'Brien and Dunson, 2004). Since the covariance structures
do not appear in the reduced data likelihood and the support of the spatial
association parameters is finite, i.e., $\{\rho_m \in (\lambda_{\min}^{-1}, \lambda_{\max}^{-1}) : m = 1, \ldots, M\}$, the
posterior distributions of the spatial association parameters $\{\rho_m : m = 1, \ldots, M\}$ are
proper. The algorithm for the posterior computation proceeds by sampling $\pi$, $Q$, $\theta$, $T$,
and $\xi$ in turn from the above conditional distributions.

3.4.4 Bayesian Model Selection

The formal procedure for choosing an appropriate Bayesian hierarchical model for
the observed data necessitates methods to compare alternative models within the Bayesian
framework. The DIC (deviance information criterion, Spiegelhalter et al. (2002)) is
a hierarchical modeling selection criterion that can be viewed as a generalization of
the AIC (Akaike information criterion, Akaike, 1973) and BIC (Bayesian information
criterion, Kass and Raftery, 1995). It is particularly useful in Bayesian model selection
problems where the posterior distributions of the parameters have been obtained by
Markov chain Monte Carlo (MCMC) simulation. The DIC statistic is a measure of
model complexity and goodness of fit, defined as

\[
DIC = \bar{D}(\vartheta) + p_D,
\]

where $D(\vartheta)$ is the deviance given the model parameters $\vartheta = (\pi', \theta', \xi')'$, defined as

\[
D(\vartheta) = -2 \log(p(y \mid \vartheta)) + 2 \log(h(y)),
\]

where $h(y)$ is some fully specified standardizing term which is a function of the data
alone. $\bar{D}(\vartheta)$ is the posterior mean of the deviance, a measurement of goodness of fit
of the proposed model for the observed data. $D(\bar{\vartheta})$ is the deviance evaluated at the
posterior mean of $\vartheta$, and $p_D = \bar{D}(\vartheta) - D(\bar{\vartheta})$ is the effective number of parameters
in the model, a penalty for the complexity of the model. The quantities $\bar{D}(\vartheta)$ and
$D(\bar{\vartheta})$ can be trivially computed from an MCMC simulation chain. In contrast to the setting of the
conventional DIC introduced in Spiegelhalter et al. (2002), our hierarchical models
contain two levels of latent variables, which necessitates that model selection be
based on the DIC for missing data problems (Celeux et al., 2006). MCMC methods,
such as the Gibbs sampler, can be employed conveniently to produce posteriors for
parameters that are marginalized over the latent spatial vectors. We computed the
complete DICs (Celeux et al., 2006) by using the MCMC simulation results to get both
the measurement of goodness of fit and the number of effective parameters associated
with each model, and used these statistics to select the most appropriate model. In
terms of our problem, we have to deal with latent variables to get the complete DICs.
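
For the conventional DIC defined above, the computation from an MCMC chain can be sketched as follows. The example uses a deliberately simple normal-mean model with known unit variance (not the dental model), with $h(y)$ chosen to cancel the normalizing constant so that the deviance reduces to a sum of squares; the posterior draws are simulated directly from the exact posterior under a flat prior. All names and values are assumptions for illustration.

```python
import numpy as np

def dic_from_draws(deviance, draws):
    """Return (DIC, p_D) from posterior draws and a deviance function."""
    draws = np.asarray(draws, dtype=float)
    d_bar = float(np.mean([deviance(theta) for theta in draws]))  # posterior mean deviance
    d_hat = float(deviance(draws.mean(axis=0)))                   # deviance at posterior mean
    p_d = d_bar - d_hat                                           # effective number of parameters
    return d_bar + p_d, p_d

# Toy model: y_i ~ N(mu, 1), flat prior, so mu | y ~ N(mean(y), 1/n)
rng = np.random.default_rng(4)
y = rng.normal(loc=0.5, scale=1.0, size=50)
deviance = lambda mu: float(np.sum((y - mu) ** 2))
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=20000)
dic, p_d = dic_from_draws(deviance, mu_draws)
```

In this toy model there is a single free parameter, and the estimated $p_D$ is accordingly close to one.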

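In order to compute the complete DICs, Celeux et al. (2006) gave a definition
of the complete data DIC by defining the complete data estimator $E_{\vartheta}[\vartheta \mid y, q, t]$, which
does not suffer from identifiability problems since the components are identified by
$(q', t')'$, the realization of the spatial latent vectors $(Q', T')'$, and then obtained the DIC for
the complete model as

\[
DIC(y, q, t) = -4 E_{\vartheta}[\log p(y, q, t \mid \vartheta) \mid y, q, t] + 2 \log p(y, q, t \mid E_{\vartheta}[\vartheta \mid y, q, t]).
\tag{3.24}
\]

As in the EM algorithm, we can then integrate this quantity to define

\[
\begin{aligned}
DIC &= E_{Q,T}[DIC(y, Q, T) \mid y] \\
 &= -4 E_{\vartheta,Q,T}[\log p(y, Q, T \mid \vartheta) \mid y] + 2 E_{Q,T}\big[\log p(y, Q, T \mid E_{\vartheta}[\vartheta \mid y, Q, T]) \mid y\big].
\end{aligned}
\tag{3.25}
\]

More specifically, notice that

\[
\begin{aligned}
E_{\vartheta,Q,T}[\log p(y, Q, T \mid \vartheta) \mid y]
 &= E_{\vartheta}\big\{ E_Q\big( E_T[\log p(y, Q, T \mid \vartheta) \mid y, Q, \vartheta] \mid y, \vartheta \big) \mid y \big\} \\
 &= E_{\vartheta}\big\{ E_T\big( E_Q[\log p(y, Q, T \mid \vartheta) \mid y, T, \vartheta] \mid y, \vartheta \big) \mid y \big\},
\end{aligned}
\tag{3.26}
\]

also

\[
\begin{aligned}
\log p(y, Q, T \mid \vartheta) &= \sum_{i,j,m} \big\{ Q_{ijm} \log\big( \pi_{jm}\, p(T_{im} \mid \Sigma_{T_m}) \big) \big\} \\
 &\quad + \sum_{i,j,m,k} \big\{ Q_{ijm} \log\big( p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta) \big) \big\},
\end{aligned}
\tag{3.27}
\]

where $p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta)$ is given in (3.2) and $p(T_{im} \mid \Sigma_{T_m})$ is given in (3.5).
Interchanging the order of $Q$ and $T$ in the integrations by Fubini's theorem, we
have

\[
\begin{aligned}
E_{\vartheta,Q,T}[\log p(y, Q, T \mid \vartheta) \mid y]
 &= E_{\vartheta}\big\{ E_T\big( E_Q[\log p(y, Q, T \mid \vartheta) \mid y, T, \vartheta] \mid y, \vartheta \big) \mid y \big\} \\
 &= E_{\vartheta}\Big\{ E_T\Big( \sum_{i,j,m} \big\{ E_Q(Q_{ijm} \mid y, T, \vartheta) \log\big( \pi_{jm}\, p(T_{im} \mid \Sigma_{T_m}) \big) \big\} \,\Big|\, y, \vartheta \Big) \,\Big|\, y \Big\} \\
 &\quad + E_{\vartheta}\Big\{ E_T\Big( \sum_{i,j,m,k} \big\{ E_Q(Q_{ijm} \mid y, T, \vartheta) \log\big( p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta) \big) \big\} \,\Big|\, y, \vartheta \Big) \,\Big|\, y \Big\},
\end{aligned}
\tag{3.28}
\]

where $E_Q(Q_{ijm} \mid y, T, \vartheta)$ is given as below:

\[
E_Q(Q_{ijm} \mid y, T, \vartheta) = P(Q_{ijm} = 1 \mid y, T, \vartheta)
 = \frac{\pi_{jm}\, p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta)}{\sum_{m'=1}^{M} \pi_{jm'}\, p_{m'}(y_{ij} \mid Q_{ijm'} = 1, T_{im'}; \theta)},
\]

with

\[
p_m(y_{ij} \mid Q_{ijm} = 1, T_{im}; \theta) = \prod_{k=1}^{K} p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta),
\]

and

\[
\begin{aligned}
E_{Q,T}\big[\log p(y, Q, T \mid E_{\vartheta}[\vartheta \mid y, Q, T]) \mid y\big]
 &= E_T\big\{ E_Q\big[\log p(y, Q, T \mid E_{\vartheta}[\vartheta \mid y, Q, T]) \mid y\big] \big\} \\
 &= E_T\Big\{ \sum_{i,j,m} \big\{ E_Q(Q_{ijm} \mid y, T, \bar{\vartheta}) \log\big( \bar{\pi}_{jm}\, p(T_{im} \mid \bar{\Sigma}_{T_m}) \big) \big\} \,\Big|\, y \Big\} \\
 &\quad + E_T\Big\{ \sum_{i,j,m,k} \big\{ E_Q(Q_{ijm} \mid y, T, \bar{\vartheta}) \log\big( p_m(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \bar{\theta}) \big) \big\} \,\Big|\, y \Big\},
\end{aligned}
\tag{3.29}
\]

where $\bar{\vartheta}$, $\{\bar{\pi}_{jm} : j = 1, \ldots, J,\ m = 1, \ldots, M\}$, $\{\bar{\Sigma}_{T_m} : m = 1, \ldots, M\}$ and $\bar{\theta}$ are the posterior means
of $\vartheta$, $\{\pi_{jm} : j = 1, \ldots, J,\ m = 1, \ldots, M\}$, $\{\Sigma_{T_m} : m = 1, \ldots, M\}$ and $\theta$ correspondingly.
All the integrations can be obtained routinely by Monte Carlo integration
approximation using the MCMC posterior samples in the coda file of WinBUGS.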

Spatial symmetry hypothesis testing

The spatial symmetry property in our problem means that the joint caries experience
patterns of the response variables at the quadrant level are highly associated with one
another. Dentists believe that spatial symmetry exists in the mouth. Lesaffre et al. (2006)
showed empirically that the caries experiences of the left and right quadrants are more
strongly associated than in the other cases. Unfortunately, little of the literature has discussed
this issue comprehensively. The mixture of generalized latent variable models
provides a way to examine the spatial symmetry of the four quadrants in terms of joint
caries experience. Under the mixture model, if there are two quadrant-wise binary
response vectors $y_{ij} = (y_{ij1}, \ldots, y_{ijk}, \ldots, y_{ijK})'$ and $y_{ij'} = (y_{ij'1}, \ldots, y_{ij'k}, \ldots, y_{ij'K})'$
which have exactly the same joint probabilistic behavior, then the mixture component
allocation processes will always assign the two binary response vectors to the
same mixture component. Specifically, the mixture component allocation processes satisfy
$\sum_{m=1}^{M} P(Q_{ijm} = Q_{ij'm} = 1 \mid y) = 1$. Hence, the strength of the similarity of two
quadrant-wise response vectors, say those of the $j$th quadrant and the $j'$th quadrant,
denoted by $S_{jj'}$, can be measured by the quantity

\[
S_{jj'} = \frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} P(Q_{ijm} = Q_{ij'm} = 1 \mid y).
\tag{3.30}
\]
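
Given posterior draws of the allocations, the similarity measure (3.30) can be approximated by the Monte Carlo frequency with which two quadrants of the same subject fall in the same component. The sketch below assumes the draws are stored as integer component labels in an array of shape `(S, n, J)`; this layout and the function name are assumptions for illustration.

```python
import numpy as np

def similarity_matrix(z):
    """Monte Carlo estimate of S_jj' in (3.30).

    z : (S, n, J) integer array; z[s, i, j] is the component label of
        quadrant j of subject i in posterior draw s.
    Returns a (J, J) matrix whose (j, j') entry estimates S_jj'.
    """
    same = z[:, :, :, None] == z[:, :, None, :]    # (S, n, J, J) co-allocation indicators
    return same.mean(axis=(0, 1))                  # average over draws and subjects

# Toy usage: 2 draws, 3 subjects, 4 quadrants
z = np.zeros((2, 3, 4), dtype=int)
z[:, :, 2] = 1          # third quadrant always in a different component
z[0, :, 3] = 1          # fourth quadrant differs in the first draw only
S = similarity_matrix(z)
```

Averaging the indicator over draws estimates $\sum_m P(Q_{ijm} = Q_{ij'm} = 1 \mid y)$, and averaging over subjects supplies the factor $1/n$.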

Hypothesis testing for pairwise comparisons among spatial association strength
parameters.

In order to assess the spatial symmetry of the four quadrants, we need to
introduce different "Neighborhood" relationships that can explain the relative spatial
structures of the quadrants of interest. Spatial symmetry is assessed at the quadrant
level, instead of the tooth level. At the quadrant level, we define two vectors of teeth to be
"Horizontal Neighbors" of each other if the two quadrants are both in either the "Upper
Jaw" or the "Lower Jaw"; to be "Vertical Neighbors" of one another if the two
quadrants are both on either the "Left" or the "Right" side of the jaws; and to be "Across Neighbors"
of one another if the two quadrants share neither the jaw nor the side, i.e., they are diagonally
opposite. The assessment of quadrant spatial symmetry in terms of caries prevalence will be based
on "Left-right", i.e., "Horizontal Neighbors"; "Up-down", i.e., "Vertical Neighbors";
and "Across", i.e., "Across Neighbors".

There are two ways to assess the spatial symmetry among quadrants in terms
of caries prevalence through statistical hypothesis statements. The first
one is based on the so-called "overall" spatial symmetry assessment via a weighted
statistic, and the second is the so-called "specific" spatial symmetry assessment, that
is, the direct comparison of the spatial symmetry measurements.

First of all, the weighted statistics for assessing the overall spatial associations in
terms of "Left-right", "Up-down" and "Across" can be formulated as below:

\[
S_{LR} = \tfrac{1}{2}(S_{56} + S_{78});
\]

\[
S_{UD} = \tfrac{1}{2}(S_{67} + S_{58});
\]

\[
S_{A} = \tfrac{1}{2}(S_{68} + S_{57}).
\]
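
The three overall statistics are simple averages of entries of the similarity matrix. A minimal sketch, assuming a $4 \times 4$ matrix whose rows and columns are ordered as quadrants 5, 6, 7, 8 (the index mapping and the numbers in `S` are invented for illustration):

```python
import numpy as np

# Hypothetical mapping from quadrant number to row/column index
IDX = {5: 0, 6: 1, 7: 2, 8: 3}

def overall_statistics(S):
    """Left-right, up-down and across statistics from a similarity matrix S."""
    s = lambda a, b: S[IDX[a], IDX[b]]
    return {
        "LR": 0.5 * (s(5, 6) + s(7, 8)),
        "UD": 0.5 * (s(6, 7) + s(5, 8)),
        "A": 0.5 * (s(6, 8) + s(5, 7)),
    }

# Invented similarity matrix, for illustration only
S = np.array([[1.0, 0.8, 0.4, 0.6],
              [0.8, 1.0, 0.5, 0.3],
              [0.4, 0.5, 1.0, 0.9],
              [0.6, 0.3, 0.9, 1.0]])
stats = overall_statistics(S)
```

Applying the same function to the similarity matrix of each posterior draw gives posterior samples of the differences compared in the hypotheses that follow.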

The statistical hypothesis tests about the overall spatial association in terms
of "Left-right" vs. "Up-down", "Left-right" vs. "Across" and "Across" vs. "Up-down"
can be formulated as follows:

(1) Left-right versus Up-down

\[
H_0 : S_{LR} = S_{UD} \quad \text{vs.} \quad H_a : S_{LR} \neq S_{UD};
\tag{3.31}
\]

(2) Left-right versus Across

\[
H_0 : S_{LR} = S_{A} \quad \text{vs.} \quad H_a : S_{LR} \neq S_{A};
\tag{3.32}
\]

(3) Across versus Up-down

\[
H_0 : S_{A} = S_{UD} \quad \text{vs.} \quad H_a : S_{A} \neq S_{UD}.
\tag{3.33}
\]

Secondly, if the assessment is based on the direct comparison of spatial symmetry
measurements, there are twelve possible hypothesis testing situations for the spatial
symmetries in terms of partial correlation between quadrants.

(1.1) Left-right versus Up-down: the association between quadrant 5 and quadrant 6
vs. the association between quadrant 6 and quadrant 7, with quadrant 6 as reference.

\[
H_0 : S_{56} = S_{67} \quad \text{vs.} \quad H_a : S_{56} \neq S_{67};
\tag{3.34}
\]

(1.2) Left-right versus Up-down: the association between quadrant 5 and quadrant 6
vs. the association between quadrant 5 and quadrant 8, with quadrant 5 as reference.

\[
H_0 : S_{56} = S_{58} \quad \text{vs.} \quad H_a : S_{56} \neq S_{58};
\tag{3.35}
\]

(1.3) Left-right versus Up-down: the association between quadrant 7 and quadrant 8
vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.

\[
H_0 : S_{78} = S_{67} \quad \text{vs.} \quad H_a : S_{78} \neq S_{67};
\tag{3.36}
\]

(1.4) Left-right versus Up-down: the association between quadrant 7 and quadrant 8
vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.

\[
H_0 : S_{78} = S_{58} \quad \text{vs.} \quad H_a : S_{78} \neq S_{58};
\tag{3.37}
\]

(2.1) Left-right versus Across: the association between quadrant 5 and quadrant 6
vs. the association between quadrant 6 and quadrant 8, with quadrant 6 as reference.

\[
H_0 : S_{56} = S_{68} \quad \text{vs.} \quad H_a : S_{56} \neq S_{68};
\tag{3.38}
\]

(2.2) Left-right versus Across: the association between quadrant 5 and quadrant 6
vs. the association between quadrant 5 and quadrant 7, with quadrant 5 as reference.

\[
H_0 : S_{56} = S_{57} \quad \text{vs.} \quad H_a : S_{56} \neq S_{57};
\tag{3.39}
\]

(2.3) Left-right versus Across: the association between quadrant 7 and quadrant 8
vs. the association between quadrant 6 and quadrant 8, with quadrant 8 as reference.

\[
H_0 : S_{78} = S_{68} \quad \text{vs.} \quad H_a : S_{78} \neq S_{68};
\tag{3.40}
\]

(2.4) Left-right versus Across: the association between quadrant 7 and quadrant 8
vs. the association between quadrant 5 and quadrant 7, with quadrant 7 as reference.

\[
H_0 : S_{78} = S_{57} \quad \text{vs.} \quad H_a : S_{78} \neq S_{57};
\tag{3.41}
\]

(3.1) Across versus Up-down: the association between quadrant 5 and quadrant 7
vs. the association between quadrant 5 and quadrant 8, with quadrant 5 as reference.

\[
H_0 : S_{57} = S_{58} \quad \text{vs.} \quad H_a : S_{57} \neq S_{58};
\tag{3.42}
\]

(3.2) Across versus Up-down: the association between quadrant 5 and quadrant 7
vs. the association between quadrant 6 and quadrant 7, with quadrant 7 as reference.

\[
H_0 : S_{57} = S_{67} \quad \text{vs.} \quad H_a : S_{57} \neq S_{67};
\tag{3.43}
\]

(3.3) Across versus Up-down: the association between quadrant 6 and quadrant 8
vs. the association between quadrant 5 and quadrant 8, with quadrant 8 as reference.

\[
H_0 : S_{68} = S_{58} \quad \text{vs.} \quad H_a : S_{68} \neq S_{58};
\tag{3.44}
\]

(3.4) Across versus Up-down: the association between quadrant 6 and quadrant 8
vs. the association between quadrant 6 and quadrant 7, with quadrant 6 as reference.

\[
H_0 : S_{68} = S_{67} \quad \text{vs.} \quad H_a : S_{68} \neq S_{67}.
\tag{3.45}
\]

Simultaneous credible intervals

Pairwise spatial symmetry hypothesis testing is based on credible intervals for the
differences between two partial correlations corresponding to two different nodes
(quadrants) in the UGGM. In Bayesian statistics, a credible interval is a posterior
probability interval, used for purposes similar to those of confidence intervals in frequentist
statistics. Suppose that a parameter $\varsigma$ is of interest; a $(1 - \alpha)100\%$ credible interval
for $\varsigma$ is any set $C$ such that $P_{\pi(\varsigma \mid y)}(\varsigma \in C) = 1 - \alpha$, where
$\pi(\varsigma \mid y)$ is the posterior distribution of the parameter $\varsigma$ given the observed data $y$.

Since we are performing multiple spatial symmetry comparisons among quadrants
in terms of all possible hypothesis testing situations, it is necessary to give
simultaneous credible regions (Besag et al. (1995)) to control the type S error rate (Gelman
et al.), i.e. a concept similar to the type I error rate in the frequentist framework.
The $100K/M\%$ simultaneous credible regions are based on order statistics (Besag et
al. (1995)); in this subsection $M$ denotes the number of posterior samples and $K$ the number
of samples that the regions must jointly contain:

\[
\Big\{ \big[ (S_l - S_{l'})^{[M+1-t^*]},\ (S_l - S_{l'})^{[t^*]} \big] : (l, l') \in \text{Neighborhood} \Big\},
\]

where

\[
t^* = \min\Big\{ t^* : \#\big\{ t : (S_l - S_{l'})^{[M+1-t^*]} \le (S_l - S_{l'})^{(t)} \le (S_l - S_{l'})^{[t^*]},\ \forall (l, l') \in \text{Neighborhood} \big\} \ge K \Big\},
\]

and $\{(S_l - S_{l'})^{(t)} : t = 1, \ldots, M,\ (l, l') \in \text{Neighborhood}\}$ are the posterior
samples of $\{(S_l - S_{l'}) : (l, l') \in \text{Neighborhood}\}$. Here, Neighborhood =
{("LR","UD"), ("LR","A"), ("A","UD")}.

Similarly, the $100K/M\%$ simultaneous credible regions for the specific spatial
association differences are given by

\[
\Big\{ \big[ (S_{ii'} - S_{jj'})^{[M+1-t^*]},\ (S_{ii'} - S_{jj'})^{[t^*]} \big] : i \neq i',\ j \neq j',\ (i, i') \neq (j, j'),\ i, j = 1, \ldots, J \Big\},
\]

where

\[
t^* = \min\Big\{ t^* : \#\big\{ t : (S_{ii'} - S_{jj'})^{[M+1-t^*]} \le (S_{ii'} - S_{jj'})^{(t)} \le (S_{ii'} - S_{jj'})^{[t^*]} \text{ for all pairs} \big\} \ge K \Big\},
\]

and $\{(S_{ii'} - S_{jj'})^{(t)} : t = 1, \ldots, M,\ i \neq i',\ j \neq j',\ (i, i') \neq (j, j'),\ i, j = 1, \ldots, J\}$ are the
posterior samples of $\{(S_{ii'} - S_{jj'}) : i \neq i',\ j \neq j',\ (i, i') \neq (j, j'),\ i, j = 1, \ldots, J\}$.
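
The order-statistic construction of simultaneous credible regions can be sketched as follows, following the rank-based description in Besag et al. (1995) as we understand it: each posterior draw is assigned the smallest symmetric rank band that contains all of its coordinates, and $t^*$ is the $K$th smallest of these band indices. The function name, the array layout `(m, p)` (draws by compared quantities) and the simulated draws are assumptions for illustration, and ties among draws are assumed absent.

```python
import numpy as np

def simultaneous_intervals(x, k):
    """Simultaneous credible intervals containing at least k of the m draws jointly.

    x : (m, p) posterior draws of p quantities (e.g. differences S_l - S_l')
    k : required number of draws lying inside all p intervals at once
    """
    m, p = x.shape
    ranks = x.argsort(axis=0).argsort(axis=0) + 1            # ranks 1..m within each column
    # Smallest t such that every coordinate of a draw has rank in [m + 1 - t, t]
    band = np.maximum(ranks.max(axis=1), m + 1 - ranks.min(axis=1))
    t_star = int(np.sort(band)[k - 1])                       # k-th smallest band index
    xs = np.sort(x, axis=0)
    lower = xs[m - t_star]                                   # order statistic [m + 1 - t*]
    upper = xs[t_star - 1]                                   # order statistic [t*]
    return lower, upper, t_star

# Toy usage: 2000 simulated draws of 3 quantities, 95% simultaneous regions
rng = np.random.default_rng(5)
x = rng.normal(size=(2000, 3))
lower, upper, t_star = simultaneous_intervals(x, k=1900)
inside = int(np.all((x >= lower) & (x <= upper), axis=1).sum())
```

A null value such as zero lying outside one of the resulting intervals then indicates a difference in association strength while accounting for the multiple comparisons.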

Table 3.1. Prevalence of caries experience (% affected) in the deciduous dentition of
6-, 7- and 8-year-old children, n=1,351.

tooth          55      54      53      52      51   |    61      62      63      64      65
Prevalence    8.92    5.20    0.74    3.72    7.81  |   7.06    2.23    1.86    5.20    8.55
tooth          85      84      83      82      81   |    71      72      73      74      75
Prevalence   10.78   13.75    1.12    0.74    0.37  |   0.37    0.37    0.37   11.15    9.67

3.5 The Signal Tandmobiel Project Example

In the Signal-Tandmobiel project, there are 4,468 schoolchildren aged 6, 7 and 8 years
(born in 1989) from 170 schools in Flanders (Belgium), who were
selected by a stratified clustered random sample. The mean age of the children on the
day of examination was 7.1 years (SD = 0.4). The 15 strata were obtained by
combining the 3 types of educational system (public, municipal and private schools) with
geographical areas (the 5 Flemish provinces). The schools represented the clusters.
This sample represents about 7% of the corresponding Flemish population. The
sampling procedure aimed at selecting each child in Flanders with equal probability. A
more detailed description of the design of the Signal-Tandmobiel project is reported
in Vanobbergen et al. (2000).

3.5.1 Primary results

The population prevalence of caries experience in the deciduous dentition at the tooth
level is shown in Table 3.1 for the 6,7,8-year-old children. The descriptive observations
suggested a symmetrical distribution of caries experience at the population level.

In Vanobbergen et al., the null hypothesis of population symmetry at the tooth level
was tested for all deciduous molars. The results are shown in Table 3.2.

The above result shows that left-right spatial symmetry is the most notable.


 

Table 3.2. Odds ratios and 95% confidence intervals for the 2x2 association models
for caries on deciduous molars at tooth level in 7-year-old children.

First Molar (ALR model)
      54    64                    74                   84
54          16.48 (13.75-19.74)   8.17 (6.91-9.64)     7.23 (6.13-8.53)
64                                7.61 (6.47-8.97)     7.18 (6.10-8.44)
74                                                     22.82 (19.28-27.00)

Second Molar (ALR model)
      55    65                    75                   85
55          15.47 (13.09-18.28)   8.78 (7.52-10.27)    9.23 (7.90-10.79)
65                                8.08 (6.92-9.42)     8.86 (7.58-10.35)
75                                                     20.37 (17.20-24.11)

Decayed teeth of discordant contralateral pairs tend to aggregate on the right or the
left side of the subject's mouth more than would be expected by chance alone
(Vanobbergen et al. (2006)).

Zhang et al. (2007) proposed Bayesian generalized latent variable models (BGLVMs),
a complete likelihood approach for analyzing the dental data, and gave 95% simultaneous
credible intervals, in Table 3.3, for the differences of the partial correlations, which are
used to measure the association strength among different nodes (quadrants). The
simultaneous credible intervals for the spatial symmetry testing situations are given as
follows.

The above result also shows that the spatial symmetry, in terms of caries prevalence,
between the left and right quadrants is stronger than that between the upper and lower
quadrants or across quadrants.

3.5.2 The results from our approach

Now we show how the above methodology works for the dental data; to do so we need to
specify all the functions and general notations. In our study, all of the responses

Table 3.3. Credible intervals of spatial association strength comparisons based on
BGLVMs and UGGM with unstructured covariance structure

Spatial Effects              Simultaneous Credible intervals
left/right vs. across
ρ56 - ρ68                    (0.134, 1.581)
ρ56 - ρ57                    (0.394, 1.711)
ρ78 - ρ68                    (0.237, 1.589)
ρ78 - ρ57                    (0.433, 1.728)
left/right vs. upper/down
ρ56 - ρ67                    (0.235, 1.551)
ρ56 - ρ58                    (0.117, 1.485)
ρ78 - ρ67                    (0.230, 1.601)
ρ78 - ρ58                    (0.215, 1.504)
across vs. upper/down
ρ68 - ρ67                    (-1.303, 1.313)
ρ68 - ρ58                    (-1.327, 1.204)
ρ57 - ρ67                    (-1.442, 1.109)
ρ57 - ρ58                    (-1.488, 1.042)
DIC                          593.300
N.burnin                     1000
N.iteration                  11000

are binary, so we have the following: $a(\phi) = 1$, $b(\eta_{imk}) = \log(1 + \exp(\eta_{imk}))$,
$c(y_{ijk}, \phi) = 0$, $g(\pi) = \log\frac{\pi}{1-\pi}$, and
$E[y_{ijk} \mid Q_{ijm} = 1, \eta_{imk}] = \pi_{imk} = \frac{\exp(\eta_{imk})}{1 + \exp(\eta_{imk})}$ for
$k = 1,\dots,5$, $m = 1,\dots,M$, $j = 1,\dots,4$, $i = 1,\dots,n$. Hence, the parameters of interest
in the observational model are $\theta = (\pi', \alpha', \beta')'$ and $\Sigma = \{\Sigma_{T_1}, \dots, \Sigma_{T_M}\}$;
then $p(y_{ijk} \mid Q_{ijm} = 1, \eta_{imk}; \theta) = p_m(y_{ijk} \mid Q_{ijm} = 1, \eta_{imk})$, and
$\log p(y_{ijk} \mid Q_{ijm} = 1, \eta_{imk}; \theta) = \eta_{imk} y_{ijk} - \log(1 + \exp(\eta_{imk}))$. The canonical
parameter $\{\eta_{imk} : k = 1,\dots,K,\ m = 1,\dots,M,\ i = 1,\dots,n\}$ is defined as follows:

$$\eta_{imk} = \alpha_m + \beta_k + T_{i,k(m)}. \tag{3.46}$$
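As a small numerical check of the expressions above, the following sketch evaluates the
canonical parameter, the success probability and the Bernoulli log-density in canonical
form. The function names and the numeric values in the usage note are illustrative
assumptions, not part of the original analysis.

```python
import numpy as np

def canonical_parameter(alpha_m, beta_k, T_ikm):
    """eta_imk = alpha_m + beta_k + T_{i,k(m)}, as in (3.46)."""
    return alpha_m + beta_k + T_ikm

def success_probability(eta):
    """pi = exp(eta) / (1 + exp(eta)), the inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-np.asarray(eta, dtype=float)))

def bernoulli_loglik(y, eta):
    """log p(y | eta) = eta * y - log(1 + exp(eta)), computed stably."""
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return y * eta - np.logaddexp(0.0, eta)
```

For example, with hypothetical values `eta = canonical_parameter(-1.0, 0.4, 0.3)`, the
quantity `bernoulli_loglik(1, eta)` agrees with `np.log(success_probability(eta))`, which
confirms that the canonical form and the usual Bernoulli density describe the same model.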

Priors for the parameters of interest are noninformative proper conjugate priors. With a
large enough sample, these give results comparable to the frequentist ones, since the
sample then provides enough information for the parameter estimates and the prior
information is washed away. Conjugate priors also make the posterior proper if the prior
is proper, in which case the Gibbs sampler can efficiently provide the appropriate
posterior samples from the target posterior distributions. More specifically, the priors
are given as follows:

$$\pi_j = (\pi_{j1}, \dots, \pi_{jm}, \dots, \pi_{jM})' \sim \text{Dirichlet}(\vartheta), \quad j = 1,\dots,J, \tag{3.47}$$

where $\vartheta$ is an M-dimensional vector of ones with M being prespecified, and

$$\alpha_m \sim N(0, 1000), \quad m = 1,\dots,M, \tag{3.48}$$

and

$$\beta_k \sim N(0, 1000), \quad k = 1,\dots,K. \tag{3.49}$$

We assume an order restriction on the mixture component effects $\alpha$, i.e.,
$\alpha_1 \le \alpha_2 \le \dots \le \alpha_M$, to handle the label switching problem of the mixture model.
For identifiability of the generalized latent variable model, we assume $\sum_{k=1}^{K} \beta_k = 0$.
For the priors of the precision matrix, O'Malley and Zaslavsky (2006) proposed the scaled
Wishart distribution as conjugate proper priors.

For the priors of the precision matrices $\{\Omega_m = \Sigma_{T_m}^{-1} : m = 1,\dots,M\}$, there are
two different models for the structure of the precision matrix. (1) Unstructured
precision matrix: the common noninformative conjugate proper prior is the Wishart
distribution, i.e.

$$\Omega_m = \Sigma_{T_m}^{-1} \sim \text{Wishart}((5 + 1), I), \quad m = 1,\dots,M, \tag{3.50}$$

where $I$ is the $5 \times 5$ identity matrix, which gives noninformative conjugate proper
priors for the precision matrices $\Omega_m = \Sigma_{T_m}^{-1}$, $m = 1,\dots,M$.

(2) Covariance matrix with structure under the CAR model:

$$\sigma_m^{-2} = \tau_m \sim \text{Gamma}(0.001, 0.001), \quad m = 1,\dots,M, \tag{3.51}$$

and

$$\rho_m \sim U(\lambda_{\min}^{-1}, \lambda_{\max}^{-1}), \quad m = 1,\dots,M, \tag{3.52}$$

where $\{\sigma_m^2 : m = 1,\dots,M\}$ are the quadrant specific parameters for overall variability
and $\{\rho_m : m = 1,\dots,M\}$ are the quadrant specific parameters for overall spatial
effects. $\lambda_{\min}$ and $\lambda_{\max}$ are as defined in the CAR models in section 3.3.2.
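To make the prior specification in (3.47) to (3.50) concrete, the sketch below draws one set
of parameters from these priors. It is an illustration only: the function name
`draw_from_priors` is hypothetical, 1000 is assumed to be a variance (WinBUGS itself
parameterizes the normal by its precision), sorting is used to impose the order restriction,
centering is used as one simple way to impose the sum-to-zero constraint, and the CAR
priors (3.51) and (3.52) are left out.

```python
import numpy as np

def draw_from_priors(J, M, K, rng, var=1000.0):
    """One joint draw from the priors (3.47)-(3.50); all names are illustrative."""
    # (3.47): mixture weights per quadrant, Dirichlet with a vector of ones.
    pi = rng.dirichlet(np.ones(M), size=J)                  # shape (J, M)
    # (3.48) with the order restriction alpha_1 <= ... <= alpha_M:
    # sorting i.i.d. draws gives a draw from the ordered prior.
    alpha = np.sort(rng.normal(0.0, np.sqrt(var), size=M))
    # (3.49) with the sum-to-zero constraint imposed by centering (assumption).
    beta = rng.normal(0.0, np.sqrt(var), size=K)
    beta = beta - beta.mean()
    # (3.50): Wishart(K + 1, I) precision matrices, built as G'G where G has
    # K + 1 rows of independent standard normal K-vectors.
    df = K + 1
    omega = np.stack([g.T @ g for g in rng.standard_normal((M, df, K))])
    return pi, alpha, beta, omega
```

A call such as `draw_from_priors(J=4, M=3, K=5, rng=np.random.default_rng(0))` returns
weights that sum to one in each quadrant, ordered component effects, centered tooth
effects, and three symmetric positive definite 5 x 5 precision matrices.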

Our mixtures of generalized latent variable models are implemented in WinBUGS,
using noninformative priors for the parameters of interest. After a burn-in of 1000
iterations, the posterior inference is based on 11000 iterations. Model selection, in terms
of the number of mixture components at the higher level and the covariance matrix
structure for the spatial latent vectors at the intermediate level, is based on the DIC for
missing data problems (Celeux et al. (2006)).

Based on the results in Tables 3.4 to 3.7 for the four different models, the posterior
inferences about the spatial similarity in terms of caries prevalence are roughly
similar, because all the models work fairly well. Bayesian model selection
is based on the DICs of the models: the smaller the DIC, the better the model.
A common rule is that if the DICs of two models differ by more than 10,

Table 3.4. Credible intervals of spatial similarity comparisons based on mixture model
with 2 components and UGGM with unstructured covariance structure

Spatial Effects              Credible intervals (95%)
left/right vs. across
S56 - S68, S56 - S57, S78 - S68, S78 - S57
left/right vs. upper/down
S56 - S67, S56 - S58, S78 - S67, S78 - S58
Intervals for the eight comparisons above, in the order in which they appear in the
source (the row alignment is not recoverable):
(-0.021, 0.194), (0.014, 0.229), (0.028, 0.236), (0.011, 0.208),
(0.000, 0.222), (0.007, 0.208), (0.014, 0.229), (0.000, 0.215)
across vs. upper/down
S68 - S67                    (-0.111, 0.097)
S68 - S58                    (-0.111, 0.097)
S57 - S67                    (-0.111, 0.097)
S57 - S58                    (-0.111, 0.097)
DIC                          035.4100
N.burnin                     1000
N.iteration                  11000

Table 3.5. Credible intervals of spatial similarity comparisons based on
mixture model with 2 components and UGGM with CAR model based covariance
structure

Spatial Effects              Credible intervals (95%)
left/right vs. across
S56 - S68                    (0.007, 0.208)
S56 - S57                    (0.042, 0.243)
S78 - S68                    (0.021, 0.222)
S78 - S57                    (0.050, 0.257)
left/right vs. upper/down
S56 - S67                    (0.035, 0.236)
S56 - S58                    (0.021, 0.222)
S78 - S67                    (0.010, 0.213)
S78 - S58                    (0.035, 0.220)
across vs. upper/down
S68 - S67                    (-0.104, 0.083)
S68 - S58                    (-0.104, 0.083)
S57 - S67                    (-0.104, 0.083)
S57 - S58                    (-0.104, 0.083)
DIC                          452.200
N.burnin                     1000
N.iteration                  11000

Table 3.6. Credible intervals of spatial similarity comparisons based on mixture model
with 3 components and UGGM with unstructured covariance structure

Spatial Effects              Credible intervals (95%)
left/right vs. across
S56 - S68                    (0.007, 0.188)
S56 - S57                    (0.035, 0.215)
S78 - S68                    (0.007, 0.188)
S78 - S57                    (0.035, 0.215)
left/right vs. upper/down
S56 - S67                    (0.028, 0.208)
S56 - S58                    (0.014, 0.195)
S78 - S67                    (0.028, 0.201)
S78 - S58                    (0.014, 0.195)
across vs. upper/down
S68 - S67                    (-0.090, 0.076)
S68 - S58                    (-0.090, 0.076)
S57 - S67                    (-0.090, 0.076)
S57 - S58                    (-0.090, 0.076)
DIC                          537.500
N.burnin                     1000
N.iteration                  11000

Table 3.7. Credible intervals of spatial similarity comparisons based on
mixture model with 3 components and UGGM with CAR model based covariance
structure

Spatial Effects              Credible intervals (95%)
left/right vs. across
S56 - S68                    (0.014, 0.215)
S56 - S57                    (0.042, 0.243)
S78 - S68                    (0.035, 0.229)
S78 - S57                    (0.056, 0.250)
left/right vs. upper/down
S56 - S67                    (0.042, 0.243)
S56 - S58                    (0.021, 0.215)
S78 - S67                    (0.056, 0.257)
S78 - S58                    (0.035, 0.229)
across vs. upper/down
S68 - S67                    (-0.090, 0.097)
S68 - S58                    (-0.090, 0.097)
S57 - S67                    (-0.090, 0.097)
S57 - S58                    (-0.090, 0.097)
DIC                          348.600
N.burnin                     1000
N.iteration                  11000

then the model with the smaller DIC is the better one. Hence the model (shown in Table
3.7) with 3 components and CAR model based covariance matrix for the corresponding
spatial latent vectors is more appropriate than the other models for the observed
data. Specifically, at the higher level, the quadrant-wise response vectors follow a mixture
model with 3 components, and each response vector is assigned a mixture label
by its mixture component allocation process. Conditional on the mixture label, at
the intermediate level, the Gaussian spatial latent vectors, modeled by UGGM with
CAR model based covariance matrix, are introduced to specify the corresponding
mixture component. Notably, our model accounts for the heterogeneity in the dental
data hierarchically in two parts. The first part is the mixture of flexible multivariate
distributions, which gives much more flexibility for the distributions of the quadrant-wise
response vectors than was available in BGLVM (Zhang et al. (2007)) at the quadrant
level. The second part is the generalized latent variable models, similar to what was done
in Zhang et al. (2007) at the intermediate level. The choice of the model is reasonable,
since the mixture model can absorb much of the heterogeneity in the quadrant-wise
response vectors, which makes the intermediate level Gaussian spatial latent vectors
with CAR model based precision matrix structure sophisticated enough to explain
the remaining heterogeneity of the dental data. Based on the chosen model, the conclusions
of the hypothesis testing about spatial symmetry among quadrants are as follows: (1)
The left-right spatial association is the strongest: the 95% credible intervals of the
differences between left-right and across, and of the differences between left-right and
up-down, all have positive lower bounds. (2) The difference in spatial association between
across and up-down is not significant at a type S error rate between 0% and 2.5%
(Gelman (2000)), since the 95% credible interval of the difference between across and
up-down includes zero.
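The DIC-based comparison described above amounts to ranking the candidate models and
checking the gap against the threshold of 10. The sketch below applies that rule to the DIC
values reported in Tables 3.5 to 3.7; the model labels and the helper name `select_by_dic`
are illustrative, and the two-component unstructured model is omitted because its DIC
value in Table 3.4 is not legible.

```python
# DIC values as reported in Tables 3.5, 3.6 and 3.7.
dic = {
    "2 components, CAR": 452.2,
    "3 components, unstructured": 537.5,
    "3 components, CAR": 348.6,
}

def select_by_dic(dic, threshold=10.0):
    """Return the model with the smallest DIC, its gap to the runner-up,
    and whether that gap exceeds the threshold used in the text."""
    ranked = sorted(dic.items(), key=lambda item: item[1])
    (best, best_dic), (_, second_dic) = ranked[0], ranked[1]
    gap = second_dic - best_dic
    return best, gap, gap > threshold

best, gap, decisive = select_by_dic(dic)
```

Here `best` is the three-component CAR model, in line with the choice made in the text, and
the gap to the next candidate is far larger than 10.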


3.6 Discussion

In this paper, we propose a flexible class of Bayesian mixtures of generalized latent
variable models for multivariate spatially correlated binary data with multi-level nested
covariance structure. Our approach is to model the response variables in a hierarchical
structure. At the higher level, we model the quadrant-wise response vectors by
a mixture of generalized latent variable models. At the intermediate level, the response
variables within quadrants are assumed to be from the canonical exponential family
with the canonical parameters modeled by the generalized latent variable models.
Meanwhile we impose a multivariate spatial correlation structure on the latent variables,
which induces the spatial correlation structure among the teeth within the
same quadrant. Statistical inference is based on the posterior distributions of the
parameters of interest. The spatial symmetry among quadrants is assessed by the
similarity score defined in (31). There are two considerations in the model specification.
The first is that we use order constraints on the component marginal
means to deal with the label switching issue in the Bayesian mixture model. The
second is the parameterization of the generalized latent variable models.
For the identifiability of the model, we use sum-to-zero constrained fixed effects for
the tooth position and assume that the spatial process has mean zero. Noninformative
conjugate priors are applied for the parameters of interest, which give inference results
comparable to the frequentist ones as the sample size increases. We proposed four
models to account for both the number of mixture components at the higher level and the
covariance structure of the Gaussian spatial latent vectors at the intermediate level. The
choices of the number of mixture components and the covariance structure are based on
the DIC for missing data problems. The spatial hypothesis about the symmetry of
quadrants is assessed by simultaneous credible intervals for the differences of the pairwise
similarity scores of interest. The results from our model show that the mixtures of
generalized latent variable models work fairly well and are comparable to the results in
the existing literature. We conclude that the left-right spatial association is the strongest
and that the spatial associations for across and up-down are not significantly different at
a type S error rate between 0% and 2.5% (Gelman (2006)). For the data example, we
have assumed that the mixture component allocation process $\{Q_i : i = 1,\dots,n\}$ at
the higher level and $\{T_{im} : m = 1,\dots,M,\ i = 1,\dots,n\}$ at the intermediate level are
sufficient to generate flexible multivariate distributions and induce dependence among
teeth to account for the wide heterogeneity in the dental data. It would be interesting to
introduce different probability models for the latent variables at both the higher and
intermediate levels, for instance a non-Gaussian latent process to model the underlying
spatial dependence among teeth, which can lead to a richer class of latent
processes $\{T_{im} : m = 1,\dots,M,\ i = 1,\dots,n\}$. Finally, other approaches for dealing with
the label switching problems associated with Bayesian mixture models may be interesting.
It would be optimal for the model selection to be simultaneous, through either reversible
jump Markov chain Monte Carlo (RJMCMC) (Green et al. (1995)) or birth and
death Markov chain Monte Carlo (BDMCMC) (Stephens (2000)). It would also be
interesting to consider the symmetry pattern of quadrants in a longitudinal study,
which would lead to a spatial-temporal analysis.


CHAPTER 4

Discussion and Future Research

4.1 Bayesian generalized latent variable models

We have described generalized latent variable models for analyzing multilevel spatially
correlated binary outcomes, i.e., the multivariate binary caries experience outcomes
from the STM project. The model is similar to a mixed model whose random effects are
two levels of Gaussian spatial latent vectors, at the quadrant level and at the level of
tooth nested within quadrant. Notably, our model is formulated in a hierarchically
dynamic structure, which is not only feasible but also relatively easy within the Bayesian
framework, compared with the frequentist approach, where a multilevel dynamic model
is either very difficult or infeasible to formulate. The hierarchical structure of our model
specification makes our approach valid for dental data with multilevel dependence among
the subunits of interest, because it approximates the way in which the multilevel
correlated binary outcomes were generated. Our approach can be viewed as a graph with
three levels of tree structure. At the higher level, there exists a quadrant level Gaussian
spatial latent vector that ties the four quadrant-wise binary response vectors together to
induce the dependence among the quadrants and generate flexible multivariate
distributions for each response vector. At this level, our model provides both fixed effects
corresponding to quadrant location and random effects represented by the higher level
Gaussian spatial latent vector. Conditional on the Gaussian spatial latent vector at the
quadrant level, the quadrant-wise response vectors are mutually independent. The joint
probabilistic behavior of the quadrant level Gaussian spatial latent vector is given by the
UGGMs with a mean vector of zeros and an unstructured covariance matrix. At the
intermediate level, there exist four Gaussian spatial latent vectors that are nested within
the corresponding quadrants in which the teeth are located. In other words, the four
intermediate level Gaussian spatial latent vectors are characterized by the quadrant
index. Each of the four latent vectors is used to tie the corresponding five binary caries
response variables together to induce the dependence among the teeth within the same
quadrant and generate flexible distributions for each response variable. At this level, our
model provides both fixed effects for tooth location and random effects, i.e., the
intermediate level Gaussian spatial latent vector nested within the corresponding
quadrant, which generates flexible distributions for the binary caries experience outcomes
and induces the dependence among the teeth within the same quadrant. For model
identifiability, it is assumed that the Gaussian spatial latent vectors at the intermediate
level are mutually independent given the Gaussian spatial latent vector at the quadrant
level. Conditional on the Gaussian spatial latent vectors at both the higher and
intermediate levels, all the binary responses of caries experience in the mouth are
mutually independent. This hierarchical model specification makes a complete likelihood
approach feasible, which improves the efficiency of the estimation of the model
parameters. At the lower level, a linear mixed model is specified to describe the log odds
of caries experience for each tooth of interest. An important feature of our model is that
it allows irregularly spaced multilevel measurements under different spatial
configurations, where the measurements are characterized by a hierarchical spatial
dependence structure.

The common way to implement generalized latent variable models is through the EM
algorithm in the frequentist framework, where the marginal likelihood is approximated
by using an adaptive Gauss-Hermite quadrature approach to numerically integrate
out the low dimensional latent variables in the model. For high dimensional latent
variable models, a Monte Carlo EM approach is applied instead. It is known that
latent variable models are only locally identifiable and hierarchical models have complex
structures, which lead to consequences such as local optimizers and singular
information matrices. In order to obtain valid inference, we implemented our model
within the Bayesian framework via WinBUGS, since Bayesian inference is always feasible
as long as the MCMC algorithm converges. Meanwhile, the Bayesian framework makes it
much easier to specify the hierarchical model than the frequentist framework does. It is
also easy to incorporate missing data in WinBUGS by replacing $y_{missing}$ with a posterior
sample from $p(y_{missing} \mid y_{observed}; \theta)$. The implementation uses noninformative
conjugate proper priors.

Without an obvious multivariate distribution for the hierarchically spatially correlated
binary response variables, multilevel correlated latent variables can be used to
model the wide heterogeneity of the outcomes. Specifically, the dependence structure
among the Gaussian spatial latent variables at the higher level, which are used to
induce dependence among the four quadrants, is given by UGGMs with a zero mean vector
and an unstructured covariance matrix. Similarly, the dependence structure for the
Gaussian spatial latent vectors at the intermediate level, i.e., the four spatial latent
vectors accounting for the heterogeneity of teeth within the same quadrant, is given
by UGGMs with zero mean vectors and a covariance matrix that is either unstructured
or structured under the CAR model assumption, i.e., a Markovian type of covariance
structure that takes the spatial configuration into account. For the identifiability of the
model, the two levels of spatial latent vectors are mutually independent of one
another. The model is specified as below:

At the higher level, for the $i$th subject in the study, there exists a spatial latent vector
$Q_i = (Q_{i1}, \dots, Q_{ij}, \dots, Q_{iJ})'$ that is used to induce the dependence structure among
quadrants and generate flexible multivariate distributions, $f_j(\cdot)$, $j = 1,\dots,J$, for the
quadrant-wise response vectors $y_i = (y_{i1}', \dots, y_{ij}', \dots, y_{iJ}')'$ with
$y_{ij} = (y_{ij1}, \dots, y_{ijk}, \dots, y_{ijK})'$.

The conditional joint multivariate distribution for the response vectors is specified as

$$f(y_i \mid Q_i; \theta, \Sigma_Q) = \prod_{j=1}^{J} f_j(y_{ij} \mid Q_{ij}; \theta, \Sigma_Q),$$

where the associations among the elements of $Q_i$ are used to induce the associations
among the four quadrants.

At the intermediate level, for each quadrant $j$, there exists a spatial latent vector
$T_{ij} = (T_{i,1(j)}, \dots, T_{i,k(j)}, \dots, T_{i,K(j)})'$ that is used to induce the dependence
structure among teeth nested within the $j$th quadrant and generate flexible distributions,
$\{f_{k(j)}(\cdot) : k = 1,\dots,K\}$, for the binary response variables $y_{ijk}$. The conditional joint
distribution for the binary response variables is specified as

$$f_j(y_{ij} \mid Q_{ij}, T_{ij}; \theta, \Sigma_{T_j}) = \prod_{k=1}^{K} f_{k(j)}(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta, \Sigma_{T_j}).$$

At the lower level, conditional on the higher level spatial latent vectors $\{Q_i : i = 1,\dots,n\}$
and the intermediate level spatial latent vectors $\{T_{ij} : j = 1,\dots,J,\ i = 1,\dots,n\}$,
the binary response variables $y_{ijk}$ are mutually independent and from the Bernoulli
family with probability of success $\pi_{ijk} = P(y_{ijk} = 1)$. That is,

$$(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta) \sim \text{Bernoulli}(\pi_{ijk}) = f_{k(j)}(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta),$$

where

$$\text{logit}(\pi_{ijk} \mid Q_{ij}, T_{i,k(j)}; \alpha, \beta, \gamma) = \alpha + \beta_j + \gamma_{k(j)} + Q_{ij} + T_{i,k(j)},$$

and $\theta = (\alpha, \beta', \gamma')'$ with constraints $\sum_{j=1}^{J} \beta_j = 0$ and $\sum_{k=1}^{K} \gamma_{k(j)} = 0$ for $j = 1,\dots,J$.
Let $Q = \{Q_i : i = 1,\dots,n\}$ with $Q_i = \{Q_{ij} : j = 1,\dots,J\}$ and $T = \{T_i : i = 1,\dots,n\}$
with $T_i = \{T_{ij} : j = 1,\dots,J\}$ and $T_{ij} = \{T_{i,k(j)} : k = 1,\dots,K\}$. If the model
formulation is viewed as a missing data problem where we treat $Q$ and $T$ as missing
covariates that are used to explain the wide heterogeneity of the dental caries experience
outcomes, then the complete likelihood is

$$
\begin{aligned}
f(y, Q, T \mid \theta, \Sigma_Q, \{\Sigma_{T_j}\})
&= f(y \mid Q, T; \theta)\, p(Q \mid \Sigma_Q)\, p(T \mid \{\Sigma_{T_j}\}) \\
&= \prod_{i=1}^{n} \Bigl\{ f(y_i \mid Q_i, T_i; \theta)\, p(Q_i \mid \Sigma_Q) \prod_{j=1}^{J} p(T_{ij} \mid \Sigma_{T_j}) \Bigr\} \\
&= \prod_{i=1}^{n} \Bigl\{ \prod_{j=1}^{J} \bigl\{ f_j(y_{ij} \mid Q_{ij}, T_{ij}; \theta)\, p(T_{ij} \mid \Sigma_{T_j}) \bigr\}\, p(Q_i \mid \Sigma_Q) \Bigr\} \\
&= \prod_{i=1}^{n} \Bigl\{ \prod_{j=1}^{J} \Bigl\{ \prod_{k=1}^{K} \bigl\{ f_{k(j)}(y_{ijk} \mid Q_{ij}, T_{i,k(j)}; \theta) \bigr\}\, p(T_{ij} \mid \Sigma_{T_j}) \Bigr\}\, p(Q_i \mid \Sigma_Q) \Bigr\}.
\end{aligned}
\tag{4.1}
$$

The distributions for $Q_i$ and $T_{ij}$ are given by UGGMs correspondingly as below:

$$Q_i \mid \Sigma_Q \sim N_J(0, \Sigma_Q), \quad i = 1,\dots,n,$$

and

$$T_{ij} \mid \Sigma_{T_j} \sim N_K(0, \Sigma_{T_j}), \quad j = 1,\dots,J,\ i = 1,\dots,n,$$

where $\Sigma_Q$ is unstructured and $\Sigma_{T_j}$ can be either unstructured or CAR model based.
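The three-level specification above can be read as a recipe for generating data, which the
following sketch makes explicit. It is a simulation under stated assumptions rather than
the fitting code (the models were fitted in WinBUGS): the function name `simulate_caries`
is hypothetical, and all parameter values used to call it are made up for illustration.

```python
import numpy as np

def simulate_caries(n, alpha, beta, gamma, sigma_q, sigma_t, rng):
    """Simulate binary outcomes y[i, j, k] from the hierarchical logit model.

    beta    : (J,) quadrant effects, assumed to sum to zero.
    gamma   : (J, K) tooth-within-quadrant effects, each row summing to zero.
    sigma_q : (J, J) covariance of the quadrant level latent vector Q_i.
    sigma_t : list of J (K, K) covariances of the latent vectors T_ij.
    """
    J, K = gamma.shape
    # Higher level: Q_i ~ N_J(0, Sigma_Q), one vector per subject.
    q = rng.multivariate_normal(np.zeros(J), sigma_q, size=n)            # (n, J)
    # Intermediate level: T_ij ~ N_K(0, Sigma_Tj), independent given Q.
    t = np.stack(
        [rng.multivariate_normal(np.zeros(K), sigma_t[j], size=n) for j in range(J)],
        axis=1,
    )                                                                    # (n, J, K)
    # Lower level: logit(pi_ijk) = alpha + beta_j + gamma_k(j) + Q_ij + T_i,k(j).
    eta = alpha + beta[None, :, None] + gamma[None, :, :] + q[:, :, None] + t
    pi = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, pi)
    return y, pi
```

Simulated data of this kind can be used to check that a fitting routine recovers known
parameter values before it is applied to the dental data.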

Other parameterizations of the fixed effects $\theta$ and other probabilistic descriptions
of the spatial latent vectors $\{T_{ij} : j = 1,\dots,J,\ i = 1,\dots,n\}$ may be
chosen. However, as can be expected, the results of the inference would
not be affected substantially (Agresti (1997)). The model selection is based on the DIC for
missing data problems (Celeux et al. (2006)). Optimal model selection would need to
be based on RJMCMC (Green et al. (1995)) or BDMCMC (Stephens (2000)), which
is essentially a simultaneous model selection at each iteration of the MCMC posterior
sampling algorithm.

4.2 Bayesian mixture of generalized latent variable
models

Besides the generalized latent variable models, a finite mixture of distributions is
another way to model response variables with wide heterogeneity. Finite mixtures
of distributions are mathematically based approaches to the statistical modeling of a
wide variety of random phenomena. They are known as an extremely flexible
method of modeling. The usefulness of finite mixture distributions in the modeling
of heterogeneity in the cluster analysis context is obvious. A mixture model provides a
convenient semiparametric framework in which to model unknown distributional shapes,
whatever the objective, whether it is density estimation or the flexible construction
of Bayesian priors. A mixture model is also able to model quite complex distributions
through an appropriate choice of its components and of the number of mixture components
to represent accurately the local areas of support of the true distribution. It can handle
situations where a single parametric family is unable to provide a satisfactory
model for local variations in the observed data. In our approach, we assumed that
each of the four quadrant-wise response vectors was from one of a certain number,
say $1 \le M \le 4$, of multivariate distributions with corresponding probability. The
$M$ multivariate distributions are characterized by $M$ different situations which can
accurately represent the corresponding local heterogeneity of the observed binary vector.
A convenient semiparametric way to incorporate the variability among these four observed
quadrant-wise response vectors is to formulate their distributions uniformly
in the form of a mixture of these $M$ multivariate distributions. Specifically, the $M$
multivariate distributions correspond to $M$ underlying subgroups or subpopulations
that the four quadrant-wise response vectors are supposed to be able to identify
if the subgroups actually exist, and each of the $M$ multivariate distributions
corresponds to one component in the mixture model.

A mixture model can be viewed as a missing data problem where the mixture component
allocation process is latent. The latent process allocates each of the quadrant-wise
response vectors $y_{ij}$ to one of the mixture components, say the $m$th component,
which means $y_{ij}$ can be characterized by the local situation, i.e., in terms of the
heterogeneity of the observed vector, associated with the $m$th underlying cluster.
Hierarchically, at the higher level, for the $i$th subject, there exists a mixture component
allocation latent process, $Q_i = (Q_{i1}', \dots, Q_{ij}', \dots, Q_{iJ}')'$ with

$$Q_{ij} = (Q_{ij1}, \dots, Q_{ijm}, \dots, Q_{ijM})' \sim \text{Multinomial}(1, (\pi_{j1}, \dots, \pi_{jM})'), \quad j = 1,\dots,J,\ i = 1,\dots,n,$$

which means

$$(y_{ij} \mid Q_{ijm} = 1) \sim f_m(y_{ij}; \theta).$$

The complete distribution can be given as below:

$$f(y_i, Q_i \mid \pi, \theta) = \prod_{j=1}^{J} \prod_{m=1}^{M} \bigl\{ \pi_{jm}\, f_m(y_{ij} \mid Q_{ijm} = 1; \theta) \bigr\}^{Q_{ijm}}. \tag{4.2}$$

At the intermediate level, for the $m$th component that is a multivariate distribution,
there exists a Gaussian spatial latent vector
$T_{im} = (T_{i,1(m)}, \dots, T_{i,k(m)}, \dots, T_{i,K(m)})' \sim N_K(0, \Sigma_{T_m})$,
which is used to generate flexible distributions for the $K$ binary response
variables that are from the exponential family (McCullagh and Nelder) and
induce the dependence among the J variables. At the lower level, conditional on the
allocation process and the Gaussian spatial latent vectors, the conditional distribution
for the binary caries experience outcome $y_{ijk}$ is given by

$$(y_{ijk} \mid Q_{ijm} = 1, T_{i,k(m)}; \theta) \sim \text{Bernoulli}(\text{logit}^{-1}(\eta_{imk})) = f_{k(m)}(y_{ijk} \mid Q_{ijm}, T_{i,k(m)}; \theta),$$

where $\eta_{imk} = \alpha_m + \beta_k + T_{i,k(m)}$ and $\theta = (\alpha', \beta')'$, with constraints
$\alpha_1 \le \alpha_2 \le \dots \le \alpha_M$ and $\sum_{k=1}^{K} \beta_k = 0$.
Let $Q = (Q_1', \dots, Q_i', \dots, Q_n')'$ and $T = (T_1', \dots, T_i', \dots, T_n')'$, then the complete
likelihood is specified as

$$
\begin{aligned}
f(y, Q, T \mid \pi, \theta, \{\Sigma_{T_m}\})
&= \prod_{i=1}^{n} \Bigl\{ \prod_{j=1}^{J} \prod_{m=1}^{M} \bigl\{ \pi_{jm}\, f_m(y_{ij} \mid T_{im}; \theta)\, p(T_{im} \mid \Sigma_{T_m}) \bigr\}^{Q_{ijm}} \Bigr\} \\
&= \prod_{i=1}^{n} \prod_{j=1}^{J} \Bigl\{ \prod_{m=1}^{M} \Bigl\{ \pi_{jm} \Bigl\{ \prod_{k=1}^{K} f_{k(m)}(y_{ijk} \mid T_{i,k(m)}; \theta) \Bigr\}\, p(T_{im} \mid \Sigma_{T_m}) \Bigr\}^{Q_{ijm}} \Bigr\}.
\end{aligned}
\tag{4.3}
$$
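Read generatively, the mixture specification above first allocates each quadrant to a
component and then draws the tooth-level outcomes from that component. The sketch below
follows these steps; it is an illustration only, with a hypothetical function name
`simulate_mixture` and made-up parameter values, and it is not the WinBUGS implementation.

```python
import numpy as np

def simulate_mixture(n, pi, alpha, beta, sigma_t, rng):
    """Simulate y[i, j, k] from the mixture of generalized latent variable models.

    pi      : (J, M) mixture weights, each row summing to one.
    alpha   : (M,) ordered component effects.
    beta    : (K,) tooth effects, assumed to sum to zero.
    sigma_t : list of M (K, K) covariances of the latent vectors T_im.
    Returns the outcomes y (n, J, K) and the component labels z (n, J).
    """
    J, M = pi.shape
    K = beta.size
    # Intermediate level: T_im ~ N_K(0, Sigma_Tm) for every subject and component.
    t = np.stack(
        [rng.multivariate_normal(np.zeros(K), sigma_t[m], size=n) for m in range(M)],
        axis=1,
    )                                                        # (n, M, K)
    # Canonical parameter eta_imk = alpha_m + beta_k + T_i,k(m).
    eta = alpha[None, :, None] + beta[None, None, :] + t     # (n, M, K)
    # Higher level: allocate quadrant j of subject i to one component,
    # i.e. Q_ij ~ Multinomial(1, pi_j); z stores the index m with Q_ijm = 1.
    z = np.stack([rng.choice(M, size=n, p=pi[j]) for j in range(J)], axis=1)
    # Lower level: Bernoulli outcomes given the allocated component.
    eta_sel = eta[np.arange(n)[:, None], z]                  # (n, J, K)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_sel)))
    return y, z
```

Because the latent vector T_im is shared by all quadrants of subject i that are allocated
to component m, two quadrants with the same label are dependent, while quadrants with
different labels are independent given the labels.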
The model structure has two sources of uncertainty, from the mixture model at the higher
level and from the generalized latent variable models at the intermediate level. At the
higher level, the number of mixture components is left unknown. At the intermediate level,
the covariance matrices $\{\Sigma_{T_m} : m = 1,\dots,M\}$ for the generalized latent variable
models can be either unstructured or CAR model based. The appropriate model needs to be
determined by a formal model selection criterion based on the DIC for missing data
problems (Celeux et al. (2006)). The model is implemented within the Bayesian framework
via WinBUGS with noninformative conjugate proper priors.

Other parameterizations of the fixed effects $\theta$ and other probabilistic descriptions
of the spatial latent vectors $\{Q_i : i = 1,\dots,n\}$ and
$\{T_{ij} : j = 1,\dots,J,\ i = 1,\dots,n\}$ may be chosen. However, as can be
expected, the results of the inference would not be affected substantially (Agresti (1997)).
Optimal model selection would need to be based on RJMCMC (Green et al. (1995)) or
BDMCMC (Stephens (2000)), which is essentially a simultaneous model selection at
each iteration of the MCMC posterior sampling algorithm.

4.3 Missing data

In biomedical research, missing data problems are common and there is a large literature
discussing different approaches in this area, but the methods are not yet
mature enough to handle general situations. Although our models were built from the
features of the dental data at hand, they apply generally to situations where
multilevel discrete data are recorded spatially. The models were implemented via
WinBUGS, which allows missing values in the data set. WinBUGS replaces
the missing data by a random sample from their posterior
distribution $p(y_{missing} \mid y_{observed}; \theta)$, which essentially assumes that the data are
missing at random, i.e., the missing mechanism is noninformative. However, the missing
data are very likely informative, since the teeth within the mouth share the same biological
environment. In the presence of informative missing data, our models need to be
extended accordingly. In future work, we need to extend the model by incorporating
the informative missing mechanism in a dropout process, that is, a parametric
model for making inference about the missing values in the data set. The process for
modeling the dropout pattern is problematic because the parameter that relates the
measurement and the dropout process, say, A, is always unidentifiable from the data
at hand. Non-identifiability of the model always yields difficulties in the numerical
optimization because of either a flat or multimodal likelihood and a singular information
matrix, which makes the statistical inference infeasible in the frequentist framework.
Under the Bayesian framework, the statistical inference is always available as
long as the MCMC algorithm converges that is used to draw posterior samples
of the quantities of interest from their proper posterior distributions given the
data at hand.
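The replacement of missing outcomes by draws from their posterior predictive distribution,
as described above for the missing-at-random case, can be sketched as follows. This is
an illustration of the idea only and not WinBUGS code: the function name `impute_missing`
is hypothetical, missing outcomes are assumed to be coded as NaN, and `pi_draw` is assumed
to hold the success probabilities implied by the current MCMC draw of the parameters and
latent variables.

```python
import numpy as np

def impute_missing(y, pi_draw, rng):
    """Fill missing binary outcomes with Bernoulli draws given the current draw.

    y       : array of 0/1 outcomes with np.nan marking missing cells.
    pi_draw : array of the same shape with success probabilities from the
              current MCMC iteration (assumed to be available).
    """
    y = np.asarray(y, dtype=float)
    pi_draw = np.asarray(pi_draw, dtype=float)
    missing = np.isnan(y)
    completed = y.copy()
    # Conditional on the latent vectors and parameters the outcomes are
    # independent Bernoulli, so each missing cell is drawn separately.
    completed[missing] = rng.binomial(1, pi_draw[missing])
    return completed
```

Repeating this step at every iteration propagates the uncertainty about the missing
outcomes into the posterior of the parameters; an informative missing mechanism would
require an additional model for the missingness indicators, which this sketch does not
include.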

A Bayesian approach for dealing with informative missing data is known as the
selection model (Arminger et al., 1995), which requires that the terms representing the
non-response mechanism be included explicitly in the likelihood. Best et al. (1996)
discussed the selection model for informative non-response in a study of dementia
and cognitive decline in the elderly. They viewed the full model as two submodels,
one representing the substantive relationship of interest and one reflecting the missing
data process, with the possibly unobserved response variable representing the common
link between the two submodels. Such a model may be readily expressed as
a directed conditional independence graph, thus lending itself to Bayesian inference
using the MCMC approach. However, there is considerable current interest in the topic of
informative drop-out (Diggle and Kenward, 1994), in which some argue that any attempt
to learn about the selection mechanism will be heavily dependent on modeling
assumptions, and that it is preferable to conduct a sensitivity analysis over alternative
plausible mechanisms. Meanwhile, the MCMC approach can easily provide predictive
distributions for any variable of interest and, unlike approaches based on maximum
likelihood or empirical Bayes, the MCMC predictions fully account for uncertainty in
both the model and the parameter estimates. Since the data often cannot provide
much information for estimating the parameters of the model for the non-response
mechanism, informative prior distributions for the parameters of interest in the selection
model are used to facilitate the MCMC-based posterior sampling algorithm. A
sensitivity analysis for the priors in the selection model is therefore essential for the
validity of a Bayesian analysis that models the non-response mechanism explicitly in the
likelihood.
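
The two-submodel structure described above can be written compactly. The following is a minimal sketch in generic notation rather than the notation of this thesis: $y = (y_{obs}, y_{mis})$ is the response, $r$ is the vector of missingness indicators, $\theta$ collects the parameters of the substantive model and $\psi$ those of the non-response model. All of these symbols are assumptions introduced for illustration.

```latex
% Selection model: substantive submodel times non-response submodel
f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi)

% Observed-data likelihood: the unobserved responses are integrated out
L(\theta, \psi \mid y_{obs}, r)
  = \int f(y_{obs}, y_{mis} \mid \theta)\,
         f(r \mid y_{obs}, y_{mis}, \psi)\, dy_{mis}
```

The mechanism is informative when $f(r \mid y, \psi)$ depends on $y_{mis}$. Because $y_{mis}$ is never observed, the data carry little information about that dependence, which is why the priors on $\psi$ and a sensitivity analysis matter.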

Future work will aim to develop a more general statistical procedure for
assessing sensitivity to both the assumed non-response mechanism and the
informative priors used in the selection models. The procedure may be based either on
different model selection criteria, for instance DIC for missing data problems and
posterior predictive checking, or on dynamic algorithms such as RJMCMC (Green, 1995)
and BDMCMC (Stephens, 2000) for simultaneous model selection and parameter
estimation.
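
For reference, the DIC criterion mentioned above has the following standard form (Spiegelhalter et al., 2002). This is a sketch of the usual complete-data definition; how the deviance should be defined when data are missing is exactly the issue discussed by Celeux et al. (2006) and is not settled here.

```latex
D(\theta) = -2 \log f(y \mid \theta)                      % deviance
\bar{D}   = E_{\theta \mid y}\left[ D(\theta) \right]     % posterior mean deviance
p_D       = \bar{D} - D(\bar{\theta})                     % effective number of parameters
\mathrm{DIC} = \bar{D} + p_D
```

Here $\bar{\theta}$ denotes the posterior mean of $\theta$; smaller DIC values indicate a better trade-off between fit and complexity.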

4.4 Comparison between frequentist and Bayesian

It is well known that many standard statistical methods can be justified by both
Bayesian and frequentist arguments. However, even when there is only one unknown
parameter, there is a wide class of problems for which no Bayesian method
can be found that satisfies the basic frequentist criterion (Bartholomew, 1965).
Bartholomew raised two important questions about the comparison between Bayesian
and frequentist methods when a discrepancy arises. The first is the practical question of
whether the discrepancy between the two approaches is ever large enough to lead to widely
differing conclusions. The second concerns the reason why the two approaches
to inference give different results in some cases but not in others. These two questions
are also of great interest and will be addressed in our future work. The starting point
for this work will be to consider the differences in statistical thinking between the two
statistical schools. For instance, suppose we have observations
$y = (y_1, \ldots, y_i, \ldots, y_n)'$ on a continuous random variable with density function
$f(y \mid \theta)$, and consider the Bayesian and frequentist solutions to the problem of
making an inference about $\theta$.

The Bayesian first specifies a prior for $\theta$ and then combines it with the likelihood to
obtain a posterior distribution, which enables a probability statement
about $\theta$ of the form

$$P\{\theta \le \theta_\alpha(y) \mid y\} = \alpha_b \qquad (4.4)$$

where $\alpha_b$ denotes a degree of belief. The major problem for the Bayesian is to select a
prior density $\pi(\theta)$ to express ignorance about $\theta$. Kass et al. (1995) reviewed several
methods for determining suitable prior distributions for the parameters of interest.
For instance, Jeffreys's rule is based on the idea that if we are ignorant about $\theta$ then we
are also ignorant about any function of $\theta$. This led Jeffreys to formulate the invariance
principle, i.e., $\pi(\theta) \propto \sqrt{I(\theta)}$, where $I(\theta)$ is the Fisher information.
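
As a worked illustration of the invariance principle, consider a single Bernoulli observation, the simplest case of the binary outcomes modeled in this thesis. The choice of example is an assumption made here for illustration; the derivation itself is standard.

```latex
\begin{align*}
\log f(y \mid \theta) &= y \log\theta + (1 - y)\log(1 - \theta), \\
I(\theta) &= -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(y \mid \theta)\right]
           = \frac{1}{\theta} + \frac{1}{1 - \theta}
           = \frac{1}{\theta(1 - \theta)}, \\
\pi(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2}(1 - \theta)^{-1/2},
\end{align*}
% Invariance under a one-to-one reparameterization \phi = g(\theta):
\begin{align*}
I(\phi) = I(\theta)\left(\frac{d\theta}{d\phi}\right)^{2}
\quad\Longrightarrow\quad
\sqrt{I(\phi)} = \sqrt{I(\theta)}\,\left|\frac{d\theta}{d\phi}\right| .
\end{align*}
```

The resulting prior is the Beta(1/2, 1/2) density, and the second display shows that applying the same rule after reparameterization gives the prior implied by the usual change-of-variables formula.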

The frequentist who wishes to make a statement of the form (4.4) is precluded
from treating $\theta$ as a random variable, as the Bayesian does. He must instead try to
find a statistic $\theta_\alpha(y)$ such that

$$P\{\theta \le \theta_\alpha(y) \mid \theta\} = \alpha_f \qquad (4.5)$$

where $\alpha_f$ indicates that the probability is to be interpreted in a frequency sense. The
frequentist's ignorance about $\theta$ is expressed by the fact that $\alpha_f$ is independent of
$\theta$. The statement $\theta \le \theta_\alpha(y)$ is thus true in the long run with probability
$\alpha_f$ for any sequence of $\theta$'s. In general, there are many functions $\theta_\alpha(y)$
satisfying (4.5), and the frequentist's problem is to choose one of them. It may be possible
to choose $\theta_\alpha(y)$ such that

$$\int_{-\infty}^{\theta_\alpha(y)} p(\theta \mid y)\, d\theta = \alpha$$

where $p(\theta \mid y)$ is the posterior density of $\theta$ for some prior $\pi(\theta)$. If the
statistic $\theta_\alpha(y)$ is chosen in such a way that (4.5) is also true, we say that the
Bayesian inference in (4.4) has the frequency or confidence property. Under these
circumstances, the Bayesian and frequentist approaches are said to agree. Welch and
Peers (1963) gave the following necessary and sufficient conditions for agreement.

(1) It must be possible to write $f(y \mid \theta)$ in the form $f(s - \tau)$, where $s$ and $\tau$ are
monotonic functions of $y$ and $\theta$, respectively, with $-\infty < \tau, s < \infty$.

(2) The prior density of $\tau$ must be uniform over the real line.
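
As a hedged illustration of these two conditions, consider the normal location model $y_i \sim N(\theta, 1)$, which has the form $f(s - \tau)$ with $s = \bar{y}$ and $\tau = \theta$. Under a uniform prior the posterior is $N(\bar{y}, 1/n)$, so the posterior $\alpha$-quantile is $\bar{y} + z_\alpha/\sqrt{n}$. The sketch below estimates the frequentist probability that $\theta$ lies below this quantile; it should be close to $\alpha$ even for $n = 1$. The function names, seed and simulation sizes are assumptions made for this example and are not part of the thesis.

```python
import random
from statistics import NormalDist, fmean


def posterior_quantile(ys, alpha):
    """Posterior alpha-quantile of theta for y_i ~ N(theta, 1) under a flat prior.

    The posterior is N(mean(ys), 1/n), so the quantile is mean + z_alpha / sqrt(n).
    """
    n = len(ys)
    z_alpha = NormalDist().inv_cdf(alpha)
    return fmean(ys) + z_alpha / n ** 0.5


def frequentist_coverage(theta, n, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of P(theta <= theta_alpha(y) | theta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = [rng.gauss(theta, 1.0) for _ in range(n)]
        if theta <= posterior_quantile(ys, alpha):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # The estimate should be near alpha = 0.9 for both sample sizes.
    for n in (1, 5):
        print(n, frequentist_coverage(theta=2.0, n=n, alpha=0.9))
```

In this model the Bayesian degree of belief and the long-run frequency coincide exactly, so any deviation from $\alpha$ in the output is Monte Carlo error only.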

For large samples, it is known that the influence of the prior $\pi(\theta)$ for the parameter
$\theta$ on the form of the posterior density $p(\theta \mid y)$ diminishes as $n \to \infty$. This
means that, under very general conditions, the Bayesian statement (4.4) has the confidence
property in the limit as $n \to \infty$, and the approach to agreement is more rapid in
$n$ if $\pi(\theta) \propto \sqrt{I(\theta)}$. Gelman et al. (2004) discussed the asymptotic normality
and consistency of the posterior mean and median. Under some regularity conditions,
i.e., that the likelihood is a continuous function of $\theta$ and that $\theta_0$, the true value of
the parameter, is not on the boundary of the parameter space, as $n \to \infty$ the posterior
distribution of $\theta$ approaches normality with mean $\theta_0$ and variance
$(nI(\theta_0))^{-1}$, where $I(\theta_0)$ is the Fisher information evaluated at $\theta_0$. In the
limit of large $n$, the posterior mode, $\hat{\theta}$, approaches $\theta_0$, and the curvature
(observed information) approaches $nI(\theta_0)$. When the truth is included in the family of
models being fitted, the posterior mode, the posterior mean and the posterior median are
consistent, asymptotically unbiased and efficient under mild regularity conditions
(Gelman et al., 2004).
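
The large-sample statement can be checked numerically in a simple conjugate case. The sketch below is an illustration under stated assumptions: a Bernoulli likelihood with a Beta(1/2, 1/2) prior, for which the posterior is available in closed form. It compares the exact posterior variance with the normal-approximation variance $1/(nI(\hat{\theta})) = \hat{\theta}(1 - \hat{\theta})/n$. The function names and the chosen success proportion are hypothetical.

```python
def beta_posterior_moments(successes, n, a=0.5, b=0.5):
    """Mean and variance of the Beta(a + s, b + n - s) posterior."""
    a_post = a + successes
    b_post = b + n - successes
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total ** 2 * (total + 1))
    return mean, var


def normal_approximation(successes, n):
    """MLE and variance 1 / (n * I(theta_hat)), with I(theta) = 1 / (theta * (1 - theta))."""
    theta_hat = successes / n
    return theta_hat, theta_hat * (1 - theta_hat) / n


def variance_ratio(n, proportion=0.3):
    """Exact posterior variance divided by the normal-approximation variance."""
    successes = round(n * proportion)
    _, post_var = beta_posterior_moments(successes, n)
    _, approx_var = normal_approximation(successes, n)
    return post_var / approx_var


if __name__ == "__main__":
    # The ratio moves towards 1 as n grows.
    for n in (10, 100, 1000):
        print(n, round(variance_ratio(n), 4))
```

For small $n$ the prior still affects the posterior spread noticeably, which is the situation taken up in the next paragraph.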

When sample sizes are small, the prior distribution is a critical part of the model
specification. There can be a serious discrepancy between Bayesian and frequentist
methods only if the density $f(y \mid \theta)$ does not satisfy Welch's condition (1), if the
sample size is small or, possibly, if it is determined sequentially. Bartholomew set two
objectives for the comparison between the two approaches. The first is to determine
the extent to which Bayesian and frequentist statements of the form (4.4) and (4.5)
may differ in small samples. The second is to find the reason for the differences that
occur and how they may be avoided. Lee and Song (2004) carried out a simulation study,
which showed that Bayesian inference for hierarchical models with small to moderate
sample sizes performs better than its frequentist counterpart.

Bartholomew (1965) drew three conclusions regarding the agreement between
Bayesian and frequentist methods. (a) For the shape parameter of the gamma distribution,
Bayesian interval estimates gave good agreement even when the sample size is one; for a
restricted location parameter and the exponential mean the agreement was not as good, but
it can be improved by an appropriately chosen confidence interval, i.e., either the
"shortest interval" or the "equal tails" interval. (b) The coverage probability of a
two-tailed Bayesian interval estimate depends not only on the prior but also on the way the
interval is chosen. (c) Agreement may be achieved by using a sequential rather than a fixed
sample size experimental design. The numerical magnitude of the differences between
frequentist and Bayesian methods of inference relates in practice to (a) and (b). The reason
for the discrepancy is given by (c). He also conjectured that agreement can always be
obtained if a correspondence is established between the Bayesian's choice of prior
distributions and the frequentist's choice of sampling rules.


APPENDICES


APPENDIX A

The First Appendix

A.1 WinBUGS code one for BGLVM

(with unstructured covariance matrix at intermediate level) for overall spatial symmetry
assessment.

model{
   ### Gaussian Graphical Models at Quadrant level ###
   InvSigamaQ[1:I,1:I] ~ dwish(IQ[,],(I+1))
   muQ[1]<- 0
   muQ[2]<- 0
   muQ[3]<- 0
   muQ[4]<- 0

   ### Gaussian Graphical Models at Tooth level ###
   InvSigamaT[1:J,1:J] ~ dwish(IT[,],(J+1))
   muT[1]<- 0
   muT[2]<- 0
   muT[3]<- 0
   muT[4]<- 0
   muT[5]<- 0

   ### Generalized Latent Variable Models ###
   for(k in 1:N){
      Q[k,1:I] ~ dmnorm(muQ[1:I],InvSigamaQ[1:I,1:I])
      for(i in 1:I){
         T[k,i,1:J] ~ dmnorm(muT[1:J],InvSigamaT[1:J,1:J])
      }
      for(i in 1:I){
         for(j in 1:J){
            Lat[j,i,k]<- alpha+(beta[i]-mean(beta[]))+
                         (gamma[j,i]-mean(gamma[,i]))+Q[k,i]+T[k,i,j]
         }
      }
      for(i in 1:I){
         for(j in 1:J){
            logit(p[j,i,k])<- Lat[j,i,k]
            y[(k-1)*20+(i-1)*5+j] ~ dbin(p[j,i,k],1)
         }
      }
   }

   ### Priors ###
   alpha ~ dnorm(0,0.001)
   for(i in 1:I){
      beta[i] ~ dnorm(0,0.001)
   }
   for(i in 1:I){
      for(j in 1:J){
         gamma[j,i] ~ dnorm(0,0.01)
      }
   }

   ### Spatial association assessment ###
   ## Spatial association assessment between Left and Right ##
   Tlam12<- -InvSigamaQ[1,2]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[2,2])
   Tlam34<- -InvSigamaQ[3,4]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[4,4])

   ## Spatial association assessment between Upper and Down ##
   Tlam23<- -InvSigamaQ[2,3]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[3,3])
   Tlam14<- -InvSigamaQ[1,4]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[4,4])

   ## Spatial association assessment between Across quadrants ##
   Tlam13<- -InvSigamaQ[1,3]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[1,1])
   Tlam24<- -InvSigamaQ[2,4]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[4,4])

   ### Hypothesis Testing Overall Spatial Symmetry ###
   LRvsUD<- 1/2*(Tlam12+Tlam34)-1/2*(Tlam23+Tlam14)
   LRvsA<- 1/2*(Tlam12+Tlam34)-1/2*(Tlam13+Tlam24)
   AvsUD<- 1/2*(Tlam13+Tlam24)-1/2*(Tlam23+Tlam14)
}
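
The spatial association nodes in the model above are partial correlations computed from the quadrant-level precision matrix, $-\omega_{ij}/\sqrt{\omega_{ii}\omega_{jj}}$, and the three hypothesis nodes are contrasts between their pairwise averages. The following Python sketch reproduces that arithmetic for a single precision matrix. The matrix values are hypothetical and chosen only for illustration; in the WinBUGS model the same quantities are evaluated at every MCMC draw of InvSigamaQ.

```python
from math import sqrt


def partial_correlation(precision, i, j):
    """Partial correlation -omega_ij / sqrt(omega_ii * omega_jj); i and j are 0-based."""
    return -precision[i][j] / sqrt(precision[i][i] * precision[j][j])


def symmetry_contrasts(precision):
    """Contrasts mirroring the LRvsUD, LRvsA and AvsUD nodes for 4 quadrants."""

    def lam(a, b):
        # Quadrants are numbered 1..4 in the model code.
        return partial_correlation(precision, a - 1, b - 1)

    left_right = 0.5 * (lam(1, 2) + lam(3, 4))
    upper_down = 0.5 * (lam(2, 3) + lam(1, 4))
    across = 0.5 * (lam(1, 3) + lam(2, 4))
    return {
        "LRvsUD": left_right - upper_down,
        "LRvsA": left_right - across,
        "AvsUD": across - upper_down,
    }


# Hypothetical 4 x 4 precision matrix (symmetric, diagonally dominant).
example_precision = [
    [2.0, -0.8, -0.2, -0.4],
    [-0.8, 2.0, -0.4, -0.2],
    [-0.2, -0.4, 2.0, -0.8],
    [-0.4, -0.2, -0.8, 2.0],
]

if __name__ == "__main__":
    print(symmetry_contrasts(example_precision))
```

A contrast whose posterior distribution is concentrated away from zero would suggest that one type of quadrant pairing is more strongly associated than the other.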

A.2 WinBUGS code two for BGLVM

(with CAR model based covariance matrix at intermediate level) for overall spatial
symmetry assessment.

model{
   ### Gaussian Graphical Models at Quadrant level ###
   InvSigamaQ[1:I,1:I] ~ dwish(IQ[,],(I+1))
   muQ[1]<- 0
   muQ[2]<- 0
   muQ[3]<- 0
   muQ[4]<- 0

   ### Gaussian Graphical Models at Tooth level ###
   ### with CAR assumption for precision matrix ###
   num[1]<- 1
   num[2]<- 2
   num[3]<- 2
   num[4]<- 2
   num[5]<- 1
   m[1]<- 1
   m[2]<- 1/2
   m[3]<- 1/2
   m[4]<- 1/2
   m[5]<- 1
   cumsum[1]<- 0
   for(i in 2:6){
      cumsum[i]<- sum(num[1:(i-1)])
   }
   for(k in 1:8){
      for(i in 1:5){
         pick[k,i]<- step(k-cumsum[i]-esp)*step(cumsum[i+1]-k)
      }
      C[k]<- 1/inprod(num[],pick[k,])
   }
   esp<- 0.0001
   adj[1]<- 2
   adj[2]<- 1
   adj[3]<- 3
   adj[4]<- 2
   adj[5]<- 4
   adj[6]<- 3
   adj[7]<- 5
   adj[8]<- 4
   muT[1]<- 0
   muT[2]<- 0
   muT[3]<- 0
   muT[4]<- 0
   muT[5]<- 0

   ### Generalized Latent Variable Models ###
   for(k in 1:N){
      Q[k,1:4] ~ dmnorm(muQ[1:4],InvSigamaQ[1:4,1:4])
      T[k,1,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat1)
      T[k,2,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat2)
      T[k,3,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat3)
      T[k,4,1:5] ~ car.proper(muT[],C[],adj[],num[],m[],prec,spat4)
      for(i in 1:I){
         for(j in 1:J){
            Lat[j,i,k]<- alpha+(beta[i]-mean(beta[]))+
                         (gamma[j,i]-mean(gamma[,i]))+Q[k,i]+T[k,i,j]
         }
      }
      for(i in 1:I){
         for(j in 1:J){
            logit(p[j,i,k])<- Lat[j,i,k]
            y[(k-1)*20+(i-1)*5+j] ~ dbin(p[j,i,k],1)
         }
      }
   }

   ### Priors ###
   alpha ~ dnorm(0,0.01)
   for(i in 1:I){
      beta[i] ~ dnorm(0,0.01)
   }
   for(i in 1:I){
      for(j in 1:J){
         gamma[j,i] ~ dnorm(0,0.01)
      }
   }
   prec ~ dgamma(0.005,0.001)
   spatmax<- 0.35
   spatmin<- -0.95
   spat1 ~ dunif(spatmin,spatmax)
   spat2 ~ dunif(spatmin,spatmax)
   spat3 ~ dunif(spatmin,spatmax)
   spat4 ~ dunif(spatmin,spatmax)

   ### Spatial association assessment ###
   ## Spatial association assessment between Left and Right ##
   Tlam12<- -InvSigamaQ[1,2]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[2,2])
   Tlam34<- -InvSigamaQ[3,4]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[4,4])

   ## Spatial association assessment between Upper and Down ##
   Tlam23<- -InvSigamaQ[2,3]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[3,3])
   Tlam14<- -InvSigamaQ[1,4]/sqrt(InvSigamaQ[1,1]*InvSigamaQ[4,4])

   ## Spatial association assessment between Across quadrants ##
   Tlam13<- -InvSigamaQ[1,3]/sqrt(InvSigamaQ[3,3]*InvSigamaQ[1,1])
   Tlam24<- -InvSigamaQ[2,4]/sqrt(InvSigamaQ[2,2]*InvSigamaQ[4,4])

   ### Hypothesis Testing Overall Spatial Symmetry ###
   LRvsUD<- 1/2*(Tlam12+Tlam34)-1/2*(Tlam23+Tlam14)
   LRvsA<- 1/2*(Tlam12+Tlam34)-1/2*(Tlam13+Tlam24)
   AvsUD<- 1/2*(Tlam13+Tlam24)-1/2*(Tlam23+Tlam14)
}
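
The CAR inputs in the model above (adj, num, C and m) encode five tooth positions arranged in a line, where each interior position has two neighbours and each end position has one. The Python sketch below builds the same arrays from that neighbourhood structure, which may help when checking the hand-coded values or adapting them to a different number of positions. The function name is hypothetical, and the weighting rule (each neighbour of a site receives weight 1/num for that site, with conditional variance factor 1/num) is read off the model code rather than taken from any other source.

```python
def chain_car_inputs(n_sites):
    """Build adj, num, C and M for sites 1..n_sites arranged in a line.

    Assumes n_sites >= 2 so that every site has at least one neighbour.
    """
    adj, num, weights, cond_var = [], [], [], []
    for site in range(1, n_sites + 1):
        neighbours = [s for s in (site - 1, site + 1) if 1 <= s <= n_sites]
        adj.extend(neighbours)
        num.append(len(neighbours))
        # One weight per adjacency entry: 1 / (number of neighbours of this site).
        weights.extend([1.0 / len(neighbours)] * len(neighbours))
        # Diagonal conditional-variance factor for this site.
        cond_var.append(1.0 / len(neighbours))
    return adj, num, weights, cond_var


if __name__ == "__main__":
    adj, num, C, M = chain_car_inputs(5)
    print("adj:", adj)
    print("num:", num)
    print("C:  ", C)
    print("M:  ", M)
```

For five positions this gives eight adjacency entries, matching the loop bound used for pick[k,i] and C[k] in the model.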


APPENDIX B

The Second Appendix

B.1 WinBUGS code one for BMGLVM

(with 3 components and unstructured covariance matrix at intermediate level) for
overall spatial symmetry assessment.

model{
   for(n in 1:N){
      ### Mixture models (for "mth" mixture (with M components)) ###
      ### at Quadrant level ###
      for(i in 1:I){
         ### Mixture models (for "kth" mixture (with K components)) ###
         ### at Tooth level ###
         for(j in 1:J){
            y[((n-1)*20+(i-1)*5+j)] ~ dbern(p[n,AQ[n,i],j])
         }# End of positions index #
         APQ[n,i,1:M] ~ ddirch(alphaQ[])
         AQ[n,i] ~ dcat(APQ[n,i,])
      }# End of quadrants index #
      Q12[n]<- equals(AQ[n,1],AQ[n,2])
      Q13[n]<- equals(AQ[n,1],AQ[n,3])
      Q14[n]<- equals(AQ[n,1],AQ[n,4])
      Q23[n]<- equals(AQ[n,2],AQ[n,3])
      Q24[n]<- equals(AQ[n,2],AQ[n,4])
      Q34[n]<- equals(AQ[n,3],AQ[n,4])
   }# End of Subjects index #

   ### Mixture Components Specification via ###
   ### GLVMs with Unstructured Covariance ###
   theta[1:J] ~ dmnorm(mu[],ian[,])
   alpha1 ~ dnorm(0,tau)
   loca1 ~ dnorm(0,tau)I(0,)
   loca2 ~ dnorm(0,tau)I(,0)
   alpha[1]<- alpha1
   alpha[2]<- alpha1+loca1
   alpha[3]<- alpha1+loca2
   for(n in 1:N){
      for(m in 1:M){
         T[n,m,1:5] ~ dmnorm(muT[1:5],InvSigamaT[1:5,1:5])
         for(j in 1:J){
            logit(p[n,m,j])<- alpha[m]+theta[j]-mean(theta[])+T[n,m,j]
         }
      }
   }

   ### Priors ###
   InvSigamaT[1:5,1:5] ~ dwish(IT[,],6)
   tau ~ dgamma(0.01,0.01)
   muT[1]<- 0
   muT[2]<- 0
   muT[3]<- 0
   muT[4]<- 0
   muT[5]<- 0

   ### Similarity Assessment ###
   MQ12<- mean(Q12[])
   MQ13<- mean(Q13[])
   MQ14<- mean(Q14[])
   MQ23<- mean(Q23[])
   MQ24<- mean(Q24[])
   MQ34<- mean(Q34[])

   ### Hypothesis Testing Overall Spatial Symmetry ###
   LRvsUD<- 1/2*(MQ12+MQ34)-1/2*(MQ23+MQ14)
   LRvsA<- 1/2*(MQ12+MQ34)-1/2*(MQ13+MQ24)
   AvsUD<- 1/2*(MQ13+MQ24)-1/2*(MQ23+MQ14)
}
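
In the mixture model above, similarity between two quadrants is measured by how often they are allocated to the same mixture component: Q12[n] is 1 when quadrants 1 and 2 of subject n share a component, and MQ12 is the average of Q12 over subjects. The Python sketch below reproduces this calculation for one hypothetical draw of the allocation labels AQ. The label values and function names are assumptions for illustration only; in the WinBUGS model the calculation is repeated at every MCMC iteration.

```python
def pairwise_similarity(labels, a, b):
    """Fraction of subjects whose quadrants a and b (1-based) share a component."""
    matches = [1 if row[a - 1] == row[b - 1] else 0 for row in labels]
    return sum(matches) / len(matches)


def similarity_contrasts(labels):
    """Contrasts mirroring the LRvsUD, LRvsA and AvsUD nodes for 4 quadrants."""
    mq = {
        (a, b): pairwise_similarity(labels, a, b)
        for a in range(1, 5)
        for b in range(a + 1, 5)
    }
    left_right = 0.5 * (mq[1, 2] + mq[3, 4])
    upper_down = 0.5 * (mq[2, 3] + mq[1, 4])
    across = 0.5 * (mq[1, 3] + mq[2, 4])
    return {
        "LRvsUD": left_right - upper_down,
        "LRvsA": left_right - across,
        "AvsUD": across - upper_down,
    }


# Hypothetical allocation labels: one row per subject, one column per quadrant.
example_labels = [
    [1, 1, 2, 2],
    [1, 1, 1, 1],
    [3, 3, 2, 2],
    [1, 2, 2, 1],
]

if __name__ == "__main__":
    print(similarity_contrasts(example_labels))
```

Because the allocations are compared only for equality, the contrasts are unaffected by how the component labels are numbered.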

B.2 WinBUGS code two for BMGLVM

(with 3 components and CAR model based covariance matrix at intermediate level)
for overall spatial symmetry assessment.

model{
   for(n in 1:N){
      ### Mixture models (for "mth" mixture (with M components)) ###
      ### at Quadrant level ###
      for(i in 1:I){
         ### Mixture models (for "kth" mixture (with K components)) ###
         ### at Tooth level ###
         for(j in 1:J){
            y[((n-1)*20+(i-1)*5+j)] ~ dbern(p[n,AQ[n,i],j])
         }# End of positions index #
         APQ[n,i,1:M] ~ ddirch(alphaQ[])
         AQ[n,i] ~ dcat(APQ[n,i,])
      }# End of quadrants index #
      Q12[n]<- equals(AQ[n,1],AQ[n,2])
      Q13[n]<- equals(AQ[n,1],AQ[n,3])
      Q14[n]<- equals(AQ[n,1],AQ[n,4])
      Q23[n]<- equals(AQ[n,2],AQ[n,3])
      Q24[n]<- equals(AQ[n,2],AQ[n,4])
      Q34[n]<- equals(AQ[n,3],AQ[n,4])
   }# End of Subjects index #

   ### Mixture Components Specification via GLVMs under CAR Model ###
   theta[1:J] ~ dmnorm(mu[],ian[,])
   alpha1 ~ dnorm(0,tau)
   loca1 ~ dnorm(0,tau)I(0,)
   loca2 ~ dnorm(0,tau)I(,0)
   alpha[1]<- alpha1
   alpha[2]<- alpha1+loca1
   alpha[3]<- alpha1+loca2
   for(n in 1:N){
      for(m in 1:M){
         T[n,m,1:5] ~ car.proper(muT[],C[],adj[],num[],invm[],prec,spat[m])
         for(j in 1:J){
            logit(p[n,m,j])<- alpha[m]+theta[j]-mean(theta[])+T[n,m,j]
         }
      }
   }

   ### CAR models specification ###
   num[1]<- 1
   num[2]<- 2
   num[3]<- 2
   num[4]<- 2
   num[5]<- 1
   invm[1]<- 1
   invm[2]<- 1/2
   invm[3]<- 1/2
   invm[4]<- 1/2
   invm[5]<- 1
   cumsum[1]<- 0
   for(i in 2:6){
      cumsum[i]<- sum(num[1:(i-1)])
   }
   for(k in 1:8){
      for(i in 1:5){
         pick[k,i]<- step(k-cumsum[i]-esp)*step(cumsum[i+1]-k)
      }
      C[k]<- 1/inprod(num[],pick[k,])
   }
   esp<- 0.0001
   adj[1]<- 2
   adj[2]<- 1
   adj[3]<- 3
   adj[4]<- 2
   adj[5]<- 4
   adj[6]<- 3
   adj[7]<- 5
   adj[8]<- 4
   muT[1]<- 0
   muT[2]<- 0
   muT[3]<- 0
   muT[4]<- 0
   muT[5]<- 0

   ### Priors ###
   prec ~ dgamma(0.005,0.001)
   spatmax<- 0.35
   spatmin<- -0.95
   spat1 ~ dunif(spatmin,spatmax)
   spat2 ~ dunif(spatmin,spatmax)
   spat3 ~ dunif(spatmin,spatmax)
   spat[1]<- spat1
   spat[2]<- spat2
   spat[3]<- spat3
   tau ~ dgamma(0.001,0.001)

   ### Similarity Assessment ###
   MQ12<- mean(Q12[])
   MQ13<- mean(Q13[])
   MQ14<- mean(Q14[])
   MQ23<- mean(Q23[])
   MQ24<- mean(Q24[])
   MQ34<- mean(Q34[])

   ### Hypothesis Testing Overall Spatial Symmetry ###
   LRvsUD<- 1/2*(MQ12+MQ34)-1/2*(MQ23+MQ14)
   LRvsA<- 1/2*(MQ12+MQ34)-1/2*(MQ13+MQ24)
   AvsUD<- 1/2*(MQ13+MQ24)-1/2*(MQ23+MQ14)
}

 

 


BIBLIOGRAPHY

Aitkin, M. and Rubin, D., B. (1985). Estimation and hypothesis testing in finite mixture
models. Journal of the Royal Statistical Society B 47, 67-75.

Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized
linear models. Statistics and Computing, 6, 251-262.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood
principle. In Proc. 2nd Int. Symp. Information Theory (eds B. N. Petrov and F. Csaki),
pp. 267-281.

Alan Agresti. (1997). A model for repeated measurements of a multivariate binary
response. Journal of American Statistical Association, Vol. 92, No. 437, 315-321.

Alan Agresti. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.

Alan Agresti and D., Hitchcock. (2005). Bayesian inference for categorical data
analysis. Statistical Methods and Application (Journal of the Italian Statistical Society).

Alan E. Gelfand and Penelope Vounatsou. (2003). Proper Multivariate Conditional
Autoregressive Models for Spatial Data Analysis. Biostatistics 4, 1, pp. 11-25.

Alan E. Gelfand and Sujit K. Sahu. (1994). Identifiability, improper priors, and Gibbs
sampling for generalized linear models. Journal of the American Statistical Association,
Vol. 94, No. 445, pp. 247-253.

Anderson, T., W. (1971). An Introduction to Multivariate Statistical Analysis, 2nd edition.
New York: Wiley.

Anders Ekholm, Peter W., F., Smith and John W. McDonald. (1995). Marginal
regression analysis of a multivariate binary response. Biometrika, Vol. 82, No. 4,
pp. 847-854.

Anders Skrondal and Sophia Rabe-Hesketh. (2004). Generalized Latent Variable
Modeling. New York: Chapman and Hall.

Andrew, Gelman, John, B. Carlin, Hal, S. Stern and Donald, B. Rubin. (2004). Bayesian
Data Analysis. Chapman & Hall/CRC.

Andrew Gelman and Francis Tuerlincks. Type S error rates for classical and Bayesian
single and multiple comparison procedures. Working paper.


Arminger, G., Clogg, C., C. and Sobel, M., E. (eds). (1995). Handbook of Statistical
Modeling for the Social and Behavioral Science. New York: Plenum.

Bartholomew, D., J. (1965). A comparison of some Bayesian and frequentist inferences.
Biometrika, 52, 1 and 2, 19-35.

Bartholomew, D., J. (1984a). The Foundations of Factor Analysis, Biometrika, 71, 221-232.

Bartholomew, D., J. (1984b). Scaling Binary Data using a Factor Model, Journal of the
Royal Statistical Society, Series B, 46, 120-123.

Bartholomew, D., J. (1988). The sensitivity of latent trait analysis to the choice of prior
distribution. British Journal of Mathematical and Statistical Psychology, 41, 101-107.

Bartholomew, D., J. (1994). Bayes's theorem in latent variable modeling. Aspects
of uncertainty: A tribute to D. V. Lindley. Freeman, P. R. & Smith, A., F., M., Ed. London:
Wiley.

Bartholomew, D., J. and Knott, M. (1999). Latent Variable Models and Factor Analysis.
Edward Arnold Publishers Ltd.

Bayes, T., R. (1763). An essay towards solving a problem in the doctrine of
chances. Philosophical Transactions of the Royal Society, 53, 370-418. Reprinted in
Biometrika, 45, 243-315, 1958.

Bedrick, E., J., Christensen, R., and Johnson, W. (1996). A new perspective on priors for
generalized linear models. Journal of the American Statistical Association, 91, 1450-1460.

Best, N., G., Spiegelhalter, D., J., Thomas, A. and Brayne, C., E., G. (1996). Bayesian
analysis of realistically complex models. Journal of Royal Statistical Association, A, 159,
part 2, 323-342.

Bettina Grün and Friedrich Leisch. (2006). Fitting finite mixtures of generalized linear
regressions in R. Computational Statistics and Data Analysis.

Box, G., E., P., and Tiao, G., C. (1973). Bayesian inference in statistical analysis,
Reading, MA: Addison-Wesley.

Breslow, N., E. and Clayton, D., G. (1993). Approximate inference in generalized linear
mixed models. Journal of the American Statistical Association, 88, 9-25.

Brook, D. (1964). On the distinction between the conditional probability and joint
probability approaches in the specification of the nearest neighbor systems. Biometrika
51, 481-489.


Bruce Rannala. (2002). Identifiability of parameters in MCMC Bayesian inference of
phylogeny. Systematic Biology, Vol. 51, No. 5, pp. 754-760.

Carey, V., Zeger, S. and Diggle, P. (1993). Modeling multivariate binary data with
alternating logistic regressions. Biometrika, 80, 3, pp. 517-526.

Carmen Fernandez and Peter, J. Green. (2002). Modeling spatially correlated data via
mixture: A Bayesian approach.

Celeux, G., Forbes, F., Robert, C., Titterington, D. M. (2006). Deviance Information
Criteria for missing data models with discussion. Bayesian Analysis, in print.

Chuan Zhou and Jon Wakefield. (2006). A Bayesian mixture model for partitioning gene
expression data. Biometrics 62, 515-525.

Collett, D. and Stephiewska, K. (1999). Some practical issues in binary data analysis.
Statist. Med. 18, 2209-2221.

Coull, B., A. and Agresti, A. (2000). Random effects modeling of multiple binomial
responses using the multivariate binomial logit-normal distribution. Biometrics. 56, 73-80.

Cox, D., R. (1970). The Analysis of Binary Data. London: Methuen.

Cox, D., R. (1972). The analysis of multivariate binary data. Applied Statistics, Vol. 21,
No. 2, pp. 113-120.

Cressie, N. (1991). Statistics for Spatial Data. New York: Wiley. pp: 61-63.

Dale, J ., R. (1986). Global cross—ratio models for bivariate discrete ordered responses.
Biometrics 42, 909-917.

David, B., Dunson. (2007). Bayesian methods for latent trait modeling of longitudinal
data. Statistical Methods in Medical Research, 16: 399-415.

Davidian, M. and Giltinan D., M. (2003). Nonlinear models for repeated measurement
data: an overview and update. Journal of Agricultural, Biological, and Environmental
Statistics 8, 387-409.

Dawid, A., P. (1979). Conditional Independence in Statistical Theory (with discussion),
Journal of the Royal Statistical Society, Ser. B, 41, 1-31.

Dempster, A. (1972). Covariance selection. Biometrics 28, 157-175.

Diebolt, J. and Robert, C., P. (1994). Estimation of ﬁnite mixture distributions through
Bayesian sampling. Journal of the Royal Statistical Society B 56, 363-375.


Diggle, P. (1992). Discussion of paper by K.-Y. Liang, S., L., Zeger and B., Qaqish. J. R.
Statist. Soc. B 45, 28-39.

Diggle, P., Liang, L. and Zeger, S. (1994). Analysis of Longitudinal Data, Clarendon
Press, Oxford.

Diggle, P. and Kenward, M., G. (1994). Informative drop-out in longitudinal data
analysis (with discussion). Applied Statistics, 43, 49-93.

Dunson, D., B. (2000). Bayesian latent variable models for clustered mixed outcomes.
Journal of the Royal Statistical Society B 62, 355-366.

Edwards, D. (1990). Hierarchical interaction models. Journal of the Royal Statistical
Society. Series B, 52, 3-20.

Emily, L. Webb and Jonathan, J. Forster. (2006). Bayesian model determination for
multivariate ordinal and binary data.

Escobar, M., D. and West, M. (1995). Bayesian density estimation and inference using
mixtures. Journal of American Statistical Association 90, 577-588.

Everitt, B., S. and Hand, D., J. (1981). Finite mixture distributions. London: Chapman
and Hall.

Eva Tzala and Nicky Best. (2007). Bayesian latent variable modeling of multivariate
spatial-temporal variable in cancer mortality. Statistical Methods in Medical Research
2007; 1-22.

Fitzmaurice, G., M. & Laird, N., M. (1993). A likelihood-based method for analyzing
longitudinal binary responses. Biometrika, 80, 141-51.

Fisher, R., A. (1922). On the mathematical foundations of theoretical statistics.
Philosophical Transactions of the Royal Society of London, Series A, 222, 309-368.

Geman, S. and D. Geman. (1984). "Stochastic relaxation, Gibbs distributions, and the
Bayesian restoration of images." IEEE Trans. Pattern Analysis and Machine Intelligence
6: 721-741.

Green, Peter. (1995). Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination. Biometrika, 82, 711-732.

Hobert, J., P. and Casella, G. (1996). The effect of improper priors on Gibbs sampling in
hierarchical linear mixed models. Journal of the American Statistical Association 91,
1461-1473.


Ibrahim, J ., Chen, M. and Lipsitz, S. (2002). Bayesian methods for generalized linear
models with covariates missing at random. Canadian Journal of Statistics, 30, 55-78.

Jansen R., C. (1993). Maximum likelihood in a generalized linear ﬁnite mixture model
by using the EM algorithm. Biometrics 49, 227-231.

Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems.
Proceedings of the Royal Society of London. Series A, 186, 453-461.

Jeroen K. Vermunt and Jay Magidson. (2004). Hierarchical mixture models for nested
data structure.

Julian Besag, Peter Green, David Higdon and Kerrie Mengersen. (1995). Bayesian
computation and stochastic systems. Statistical Science Vol. 10, No. 1, 3-66.

Kass, R. and Raftery, A. (1995). Bayes factors and model uncertainty. Journal of
American Statistical Association, 90, 773-795.

Laplace, P., S. (1820). English translation: Philosophical essay on Probabilities (1951).
New York: Dover.

Lauritzen, S., L. and Wermuth, N. (1989). Graphical models for association between
variables, some of which are qualitative and some quantitative. Ann. Statist., 17, 31-57.

Lee S., Y. and Song X., Y. (2004). Evaluation of the Bayesian and maximum likelihood
approaches in analyzing structural equation models with small sample sizes. Multivariate
Behavioral Research 2004; 39: 653-686.

Leroux, B. (2006). Analysis of Correlated Dental Data: Challenges and Recent
Developments. Statistical Methods for Oral Health Research, JSM 2006.

Lesaffre, E. and Bogaerts, K. Spatial Correlations in Caries Attack Patterns in the
Deciduous Dentition. Biostatistics, K. U. Leuven.

Liang, K., Y. and Zeger, S., L. (1986). Longitudinal data analysis using generalized
linear models. Biometrika 73, 13-22.

Liang, K., Y., Zeger, S., L. (1989). A class of logistic regression models for multivariate
binary time series. Journal of American Statistical Association, 84, 447-451.

Liang, K., Y., Zeger, S., L. and Qaqish, B. (1992). Multivariate regression analysis of
categorical data (with discussion). Journal of the Royal Statistical Society, Series B. 45,
3-40.


Lipsitz, R., Laird, N., M. & Harrington, P. (1991). Generalized estimating equations for
correlated binary data: Using the odds ratio as a measure of association. Biometrika 78,
153-160.

Lipsitz, S. and Ibrahim, J. (1996). A conditional model for incomplete covariates in
parametric regression models. Biometrika, 83, 916-922.

Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. New York:
Wiley.

Manski, C., F. and McFadden, D. (1981). Structural Analysis of Discrete Data with
Econometric Applications. Cambridge: Massachusetts Institute of Technology Press.

Mardia, K. V. (1988). Multi-dimensional multivariate Gaussian Markov random ﬁelds
with application to image processing. Journal of Multivariate Analysis 24, 265-284.

Mathias, Drton and Micheal Perlman. (2004). Model selection for Gaussian concentra-
tion graphs. Biometrika, 91,3, pp. 591-602.

McLachlan, G., J. and Basford, K., E. (1988). Mixture models: inference and
applications to clustering. New York: Marcel Dekker.

McLachlan, G and Peel, D. Finite Mixture Models. Wiley (2001).

McCullagh, P. and Nelder, J., A. (1989). Generalized Linear Models, 2nd edition. New
York: Chapman and Hall.

Morton, R. (1987). A generalized linear model with nested strata of extra-Poisson
variation. Biometrika, 74, 247-257.

Moustaki, I. (1996). A Latent Trait and a Latent Class Model for Mixed Observed
Variables, British Journal of Mathematical and Statistical Psychology, 49, 313-334.

Neuhaus, J., M., Hauck, W., W. and Kalbfleisch, J., D. (1992). The effects of mixture
distribution misspecification when fitting mixed effects logistic models. Biometrika, 79,
755-762.

O'Malley, A., James and Zaslavsky, Alan M. (2006). Domain-level covariance analysis
for survey data with structured nonresponse. Working paper.

Paolo Giudici and Peter J. Green. (1999). Decomposable graphical Gaussian model
determination. Biometrika 86, 4, pp. 785-801.

Pettitt, A., N., Tran, T., T., Haynes, M., A. and Hay, J., L. (2006). A Bayesian hierarchical
model for categorical longitudinal data from a social survey of immigrants. Journal
of the Royal Statistical Society A (2006), 169, Part 1, pp. 97-114.


 

Prentice, R., L. (1988). Correlated binary regression with covariates speciﬁc to each
binary observation. Biometrics, 44, 1033-1048.

Richardson, S. and Green, P. (1997). On Bayesian analysis of mixtures with an
unknown number of components, Journal of the Royal Statistical Society, Series B, Vol.
59, No. 4 (1997), pp. 731-792.

Robert, C., P. (1996). Mixture of distributions: inference and estimation. In Markov
Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter
(Eds). London: Chapman & Hall, pp. 441-464.

Robert E. Kass; Larry Wasserman. (1996). The selection of prior distributions by formal
rules. Journal of the American Statistical Association, Vol. 91, No. 435. pp. 1343-1370.

Roeder, K. and Wasserman, L. (1997). Practical density estimation using mixtures of
normals. Journal of the American Statistical Association 92, 894-902.

Roy, J. (2006). Statistical Approaches for Dealing with Missing Tooth- and Surface
Level Data in Caries Research. Statistical Methods for Oral Health Research, JSM 2006.

Sammel, M., D., Ryan, L., M., and Legler, J., M. (1997). Latent variable models for
mixed discrete and continuous outcomes. Journal of the Royal Statistical Society B 59,
667-678.

Samuel, M. Manda, Rebecca, E., Walls and Mark, S., Gilthorpe. (2007). A full Bayesian
hierarchical mixture model for the variance of gene differential expression. BMC
Bioinformatics.

Scott L. Zeger and M. Rezaul Karim. (1991). Generalized Linear models with random
effects; A Gibbs Sampling Approach. Journal of the American Statistical Association,
theory and Methods Vol. 86, No. 413, 79-86.

Sean M. O'Brien and David B. Dunson. (2004). Bayesian multivariate logistic regression.
Biometrics 60, 739-746.

Spiegelhalter, D., J., Best, N., G., Carlin, B., P. and van der Linde A. (2002). Bayesian
measures of model complexity and ﬁt (with discussion) Journal of Royal Statistical
Society. B. 64, 583-640.

Spiegelhalter, D., J., Thomas, A. and Best, N. (2003). WinBUGS Version 1.4 User
Manual. www.mrc-bsu.cam.ac.uk/bugs.

 

Stephens, M. (2000). Bayesian analysis of mixtures with an unknown number of
components: an alternative to reversible jump methods. The Annals of Statistics, Volume 28,
Number 1 (2000), 40-74.


Stiratelli, R., Laird, N. and Ware, J., H. (1984). Random-effects models for serial
observations with binary response. Biometrics, 40, 961-971.

Stuart, R., Lipsitz and Garrett Fitzmaurice. (1994). An extension of Yule's Q to
multivariate binary data. Biometrics, 50, 847-852.

Titterington, D., M., Smith, A., F. and Makov, U., E. (1985). Statistical analysis of ﬁnite
mixture distributions. New York: Wiley.

Thompson T., J., Smith P., J. and Boyle J., P. (1998). Finite mixture models with
concomitant information: assessing diagnostic criteria for diabetes. Applied Statistics, 47,
393-404.

Thomas, A., Best, N., Lunn, D., Arnold, R. and Spiegelhalter, D., J. (2004). GeoBUGS
Version 1.2 User Manual. www.mrc-bsu.cam.ac.uk/bugs.

 

Van Duijn Maj and Bockenholt U. (1995). Mixture models for the analysis of repeated
count data. Applied Statistics 44, 473-85.

Vanobbergen, J ., Martens, L., Lesaffre, E., Declerck, D. (2000). The Signal-Tandmobiel
project, a longitudinal intervention health promotion study in Flanders (Belgium):
baseline and ﬁrst year results. Europe Journal Paediatric Dentistry 2000; 2: 87-96.

Vanobbergen, J., Lesaffre, E., García-Zattera, M., J., Jara, A., Martens, L. and Declerck, J.
(2007). Caries Patterns in Primary Dentition in 3-, 5- and 7-year-old Children: Spatial
Correlation and Preventive Consequences.

Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data.
New York: Springer, (Springer series in statistics).

Vincent Carey, Scott L. Zeger and Peter Diggle. (1993). Modeling multivariate binary
data with alternating logistic regressions. Biometrika, 80, 3, pp. 517-526.

Wang, P., Puterman, M., L., Cockburn, I. and Le, N., D. (1996). Mixed Poisson regression
models with covariate dependent rates. Biometrics, 52, 381-400.

Wang, P. and Puterman, M., L. (1998). Mixed logistic regression models. Journal of
Agricultural, Biological, and Environmental Statistics, 3, 175-200.

Wang, F., J. and Wall, M., M. (2003). Generalized common spatial factor model.
Biostatistics, 4, 569-582.

Welch, B., L. and Peers, H., W. (1963). On formulae for confidence points based on
integrals of weighted likelihood. Journal of the Royal Statistical Society, Series B, 25, 318-329.


West, M., Muller, P. and Escobar, M., D. (1994). Hierarchical priors and mixture
models with application in regression and density estimation. In Aspects of Uncertainty:
A Tribute to D. V. Lindley, A. F. M. Smith and P. Freeman (Eds). New York: Wiley, pp.
363-386.

Wu, M., C. and Carroll, R., J. (1988). Estimation and comparison of change in the
presence of informative right censoring by modeling the censoring process. Biometrics,
44, 175-188.

Zabell, S., L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7,
369-387.

Zhao, L., P. and Prentice, R., L. (1990). Correlated binary regression using a quadratic
exponential model. Biometrika 77, 642-648.

Zhao, Y., Staudenmayer, J., Coull, B., A. and Wand, M., P. (2006). General design
Bayesian generalized linear mixed models. Statistical Science, Vol. 21, No. 1, 35-51.

Zhu, J., Eickhoff, J., C. and Yan, P. (2005). Generalized linear latent variable models
for repeated measures of spatially correlated multivariate data. Biometrics, 61,
674-683.
