USING MACHINE LEARNING TO UNCOVER POPULATION HETEROGENEITY IN LONGITUDINAL STUDY

By

Youngjun Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Measurement and Quantitative Methods—Doctor of Philosophy

2022

ABSTRACT

USING MACHINE LEARNING TO UNCOVER POPULATION HETEROGENEITY IN LONGITUDINAL STUDY

By

Youngjun Lee

Machine learning has been an emerging data analytic tool in the fields of quantitative social and behavioral sciences. Among others, model-based recursive partitioning (MOB) is one of the popular comprehensive approaches, incorporating a parametric model into a tree-based algorithm. It has gained growing interest as a complementary data analytic tool for addressing population heterogeneity by detecting parameter instability over candidate covariates. Structural equation models using the tree algorithm (SEM Trees) have particularly shown their benefits for discovering informative covariates and their complex interactions that predict differences in structural parameters with interpretable results, which in turn produces distinct homogeneous subgroups. While previous studies make important contributions to the use of this approach, the performance of SEM Trees has been less examined when there are interaction effects among various types of covariates (i.e., categorical, ordinal, and continuous), which is the key motivation of this study.

This study has three main purposes. First, it aims to introduce a framework of MOB for educational researchers and guide them on when it can be beneficial, with an illustrative example using nationally representative longitudinal data (the High School Longitudinal Study of 2009). A parametric latent growth curve model (LGCM) is used as a template model along with MOB. Second, a simulation study for a given LGCM is conducted to investigate the performance of MOB, which provides researchers with statistical evidence of how well MOB recovers true subgroups. Simulation conditions include a) effect size (0.2, 0.4, 0.6, 0.8, and 1.0), b) sample size (1,000, 2,000, 5,000, 10,000, and 20,000), c) three different test statistics for the ordinal covariate (chi-square, adapted maximum Lagrange multiplier, and a weighted double maximum), d) a pre-pruning option limiting the minimum sample size per subgroup (250 vs. none), and e) a post-pruning option (BIC vs. none). The main evaluation criteria are a) statistical power to recover true subgroups, b) overall classification accuracy and precision, c) accuracy of cut points of ordinal/continuous covariates and labels of categorical covariates, and d) bias and root mean squared error (RMSE) of the parameter estimates per subgroup. Third, the simulation is conducted in parallel with a growth mixture model (GMM), and its results are compared with those of MOB.

The key findings suggest that a medium effect size (0.4 - 0.6) with relatively large sample sizes (5,000, 10,000, and 20,000) and a large effect size (0.8 - 1.0) with an adequate sample size (1,000 or 2,000) are enough to distinguish the difference in the focal parameters and recover the true number of subgroups. In addition, treating ordinal variables as either ordinal or categorical makes little difference in recovering the true subgroups. However, the empirical study suggests that using the test statistics for ordinal covariates is desirable when there is an association between the outcome and the ordinal covariate. Applying post-pruning with BIC and limiting the minimum sample size per subgroup simultaneously are also desirable options. 
Without the post pruning with BIC, MOB tends to over-extract the subgroups across conditions. With the same simulated datasets, GMM produced neither accurate subgroups nor reliable parameter estimates. This study sheds light on how to uncover subpopulations using MOB algorithm with a popular parametric model for longitudinal study. This approach is beneficial for large-scale data such as more than 10,000 sizes with large number of potential covariates. Limitations and future directions are also discussed. The findings play a critical role to lay the groundwork of extending the application of MOB into various statistical models by investigating its performance regarding complex covariate effects. Copyright by YOUNGJUN LEE 2022 This dissertation is dedicated to my parents, wife, and friends. Thank you for always believing in me. v TABLE OF CONTENTS LIST OF TABLES ............................................................................................................................ viii LIST OF FIGURES .............................................................................................................................. x CHAPTER 1. INTRODUCTION ......................................................................................................... 1 1.1 Background and rationale .................................................................................................. 1 1.2 Research purposes and questions ........................................................................................ 6 CHAPTER 2. LITERATURE REVIEW .............................................................................................. 9 2.1 Overview of statistical models explaining population heterogeneity ................................ 9 2.2 Model-based recursive partitioning ................................................................................. 11 2.2.1 Parameter estimation for a template model ......................................................... 13 2.2.2 Testing instability of parameter estimates........................................................... 15 2.2.3 Partitioning sample into subgroups along with selected covariates .................... 18 2.2.4 Repeating steps until stopping rules are met ....................................................... 19 CHAPTER 3. AN ILLUSTRATIVE EXAMPLE .............................................................................. 20 3.1 Data .................................................................................................................................. 23 3.2 Template model: LGCM .................................................................................................. 27 3.3 Using MOB to grow a tree ............................................................................................... 34 CHAPTER 4. METHODS .................................................................................................................. 41 4.1 Population model for data generation .............................................................................. 41 4.2 Simulation design ............................................................................................................. 46 4.3 Evaluation criteria ............................................................................................................ 47 CHAPTER 5. RESULTS .................................................................................................................... 
52 5.1 Statistical power to recover the true number of subgroups .............................................. 52 5.2 Overall classification accuracy and precision .................................................................. 55 5.3 Accuracy of splitting points of covariates ........................................................................ 62 5.4 Bias and RMSE of parameter estimates ........................................................................... 64 5.5 Several desirable options.................................................................................................. 74 5.5.1 Test statistics of ordinal covariates ..................................................................... 74 5.5.2 Post pruning method using BIC .......................................................................... 76 5.5.3 Limiting minimum sample size per subgroup ..................................................... 78 5.6 Results of growth mixture model ..................................................................................... 84 CHAPTER 6. CONCLUSION AND DISCUSSION ......................................................................... 85 6.1 Summary of findings ........................................................................................................ 85 6.2 Discussions....................................................................................................................... 89 6.3 Limitation and future research ......................................................................................... 92 vi APPENDIX ......................................................................................................................................... 95 REFERENCES ................................................................................................................................. 103 vii LIST OF TABLES Table 1. A list of variables of High School Longitudinal Study of 2009 data ............................. 26 Table 2. Descriptive statistics of continuous and ordinal variables without sampling weights (n=9,275) ...................................................................................................................... 28 Table 3. Frequency and proportions of categorical variables without sampling weights ............. 28 Table 4. Correlation matrix between continuous variables with a visualized figure .................... 29 Table 5. Results of unconditional quadratic latent growth curve model ...................................... 33 Table 6. Parameter estimates of the LGCM tree of MOB using all covariates ............................ 38 Table 7. Parameters of a population model .................................................................................. 42 Table 8. Parameters of fixed effects for each subgroup depending on effect size........................ 43 Table 9. Confusion table ............................................................................................................... 48 Table 10. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups using WDMO statistics ....................................................... 54 Table 11. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.2) ............................. 57 Table 12. 
Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.4) ............................. 58 Table 13. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.6) ............................. 59 Table 14. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.8) ............................. 60 Table 15. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=1.0) ............................. 61 Table 16. Mean of splitting points of covariates (MS) using WDMO statistics ........................... 63 Table 17. Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.4) ............................................................................... 66 Table 18. Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.6) ............................................................................... 67 viii Table 19. Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.8) ............................................................................... 68 Table 20. Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=1.0) ............................................................................... 69 Table 21. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.4) ............................................................................... 70 Table 22. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.6) ............................................................................... 71 Table 23. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.8) ............................................................................... 72 Table 24. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=1.0) ............................................................................... 73 Table 25. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.2) ............................................................... 79 Table 26. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.4) ............................................................... 80 Table 27. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.6) ............................................................... 81 Table 28. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.8) ............................................................... 82 Table 29. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 1.0) ............................................................... 83 ix LIST OF FIGURES Figure 1. 
An example of LGCM tree using MOB. ....................................................................... 22 Figure 2. An example of expected GPA changes across four grades (Nodes are subgroups). ..... 23 Figure 3. Spaghetti plot of overall GPA for 50 randomly chosen students. ................................. 30 Figure 4. The expected GPA score over four years from quadratic latent growth curve model. . 32 Figure 5. A LGCM tree of MOB with minimum size of 250 per node and pruning of BIC using WDMO statistic (BLK = Black, HIP = Hispanic, OTS = Others, ASA = Asian, WHT = White, mi = μI; mean intercept, ms = μS; mean linear slope, mq = μQ; mean quadratic slope). ........................................................................................................................... 39 Figure 6. Expected GPA changes over four years of the 13 distinct subgroups (nodes). ............. 40 Figure 7. The expected GPA changes over four years of the four subpopulations depending on effect size. ..................................................................................................................... 44 Figure 8. A true tree structure of four subgroups using the effect size of 0.2. ............................. 45 x CHAPTER 1. INTRODUCTION 1.1 Background and rationale Machine learning has been an emerging data analytic tool in the fields of methodological statistics as well as social and behavioral sciences since the first appearance of automated interaction detection (AID) for tree-structured regression analysis introduced by Morgan and Sonquist (1963). Emergence of new academic communities such as Educational Data Mining and Learning Analytics reflect the growing trend in application of machine learning within educational contexts (e.g., Baker et al., 2016; Paquette et al., 2020). Machine learning refers to either supervised or unsupervised process of modeling that automatically reveals patterns of variation in large-scale datasets or so- called big data. The main goal of machine learning is to build reliable and accurate predictive models, and it has several advantageous features over traditional statistical regression models. First, it can handle high-dimensional predictors even when the number of the predictors (i.e., the number of columns) is larger than the sample size (i.e., the number of rows). Second, there are no statistical assumptions with a model. That is, it has merely set of optimal tuning parameters and several best performing classifiers (i.e., algorithms) to enhance the performance of prediction. Third, no prior knowledge is required to select predictors (covariates or features) for constructing a model, which allows a screening of informative predictors in exploratory research. Fourth, machine learning automatically detects nonlinearity and complex interaction effects of covariates with an iterative algorithmic approach. Among many machine learning approaches, tree-based methods, also known as recursive partitioning or decision-tree, have been extensively and increasingly employed in educational and psychological research (e.g., Grimm & Jacobucci, 2020; Jacobucci & Grimm, 2020; Strobl et al., 2011; Strobl et al., 2015). A well-known algorithm within the realm of tree-based methods is the 1 classification and regression trees (CART; Breiman et al., 2017). 
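As a concrete point of reference for the tree-growing logic described next, the following is a minimal CART sketch. The rpart package call and the data frame and variable names (students, gpa9, sex, race, ses) are purely illustrative assumptions and are not part of the study's analysis.

```r
# Minimal CART illustration on a hypothetical data frame `students`
library(rpart)
library(rpart.plot)   # for a simple tree plot

# Regression tree predicting 9th-grade GPA from a few candidate covariates
cart_fit <- rpart(gpa9 ~ sex + race + ses,
                  data = students,
                  method = "anova",                               # regression tree for a continuous outcome
                  control = rpart.control(cp = 0.01, minbucket = 50))

rpart.plot(cart_fit)  # terminal nodes show the predicted mean GPA per subgroup
```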
To briefly explain the tree algorithm, a structured tree is grown by recursively partitioning the samples using available variables, starting from the entire sample (i.e., root node) through subgroups (i.e., child nodes or inner nodes) to final subgroups (i.e., terminal nodes or final nodes) of subjects based on the values of variables selected by algorithms. Thus, the subgroups of subjects are determined by a set of variables that are related to the outcome, in which the values of the outcome are different across subgroups. Two key strengths of using tree-based methods are interpretability and predictability. Although the predictive power of decision-tree is relatively weaker than other modern approaches (Fernandez-Delgado et al., 2014), such as random forests introduced by Breiman (2001), a single tree is still significantly valuable due to its simple, intuitive, and clear interpretability because it produces a visualization of the tree with the results (e.g., Eo & Cho, 2013; Le & Moore, 2020). Consequently, one can easily understand the structure, composition, and characteristics of the subgroups depicted by the tree. In contrast to the machine learning detecting complex interactions automatically, traditional statistical models should specify all the interaction terms. In the realm of social sciences, “the testing of interactions is at the very heart of theory testing” (Cohen et al., 2014). Interactions can be a form of person-person, context-context, or person-context (Bauer & Shanahan, 2007). If the interaction effects are well-established and correctly specified in a model, the results of the statistical models are clearly interpretable, unbiased, and efficient (De Gonzalez & Cox, 2007). However, when given many variables in datasets, specifying all possible combinations of interactions of the variables produces complex higher-order interactions. This not only makes ones hard to understand the terms in a model, but also is not common in practice. Even if higher-order interactions are modeled correctly, the interpretation is greatly limited as the effects of each covariate are estimated controlling for both effects of other key covariates and interaction effects. 2 Moreover, for categorical variables, multiple dummy variables have to be created and the quantity of their interactions with other covariates would increase dramatically. For instance, if there are three covariates including gender (e.g., female, male) and race/ethnicity (e.g., White, Black, Hispanic, Asian, and others), and categorized socio-economic status (e.g., low, medium, and high), the possible number of interaction terms is twenty-two. As researchers also tend to include and interpret interaction terms based on the statistical significance of each term, the model specification is likely to be exploratory and subject to a great degree of modifications, which may lead to spurious results. To partially compensate for specifying complex interaction effects of the statistical models, there has been substantially growing interests in combining parametric models and tree-based methods over the last two decades (see Loh, 2014 for more details). This analytic approach is beneficial for finding informative covariates as their higher-order interactions and nonlinearity effects can be automatically detected. 
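As a brief aside, the count of twenty-two interaction terms mentioned above for gender, race/ethnicity, and categorized SES can be verified directly. With dummy coding, gender contributes one indicator, race/ethnicity four, and SES two, so the number of two-way and three-way interaction terms is

```latex
\underbrace{1 \times 4}_{\text{gender}\times\text{race}}
+ \underbrace{1 \times 2}_{\text{gender}\times\text{SES}}
+ \underbrace{4 \times 2}_{\text{race}\times\text{SES}}
+ \underbrace{1 \times 4 \times 2}_{\text{gender}\times\text{race}\times\text{SES}}
= 4 + 2 + 8 + 8 = 22.
```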
Among many others, the model-based recursive partitioning (MOB; Zeileis et al., 2008) provides a unified framework that fits a parametric model locally, in which the heterogeneous subsamples are determined by testing overall parameter instability. In many cases of social science research, it may be unrealistic to assume that a global model fits the whole sample at a satisfactory. Instead, it would be more reasonable to assume that data for varying subsamples of subjects well fits diverse models (Zeileis et al, 2008). As regards, MOB attempts to find such subsamples given available covariates using a huge recursive searching method. Within this framework, the definition of a covariate is a candidate variable that could be potentially related to the interested outcome(s) in the available datasets. This concept of covariate is broader than the one used in statistics. That is, all the variables that could be related to the outcome(s) that researchers believe can be the candidates of covariates, making algorithm(s) detect and find those relationship automatically. The basic idea of the generic MOB is that a particular parametric model 3 is fitted to each subsample, which can be in a form of OLS regressions, generalized linear models, item response models, or structural equation models. The estimated parameters are then tested whether they are statistically different depending on the values of covariates. It has been shown that these new approaches adopting machine learning offer valuable opportunities to answer novel research questions for social and behavioral research that is notably different from what traditional parametric statistical models can answer; for example, how the subgroups (formed based on the combinations of covariates) differ in their characteristics, raising the issues of population heterogeneity (e.g., Serang, 2021). Over the past decade, researchers have been actively adopting MOB particularly for structural equation models, so called (SEM Trees), which was introduced by Brandmaier et al. (2013). SEM Trees integrates the comprehensive and flexible SEM framework with tree algorithm. The main goal of SEM Trees is analogous to MOB, which is to identify subgroups having similar covariance structures or item response patterns using a data-driven, but theory-constrained search as SEM Trees utilizes a template structural parametric model derived from an existing theory. SEM Trees are currently implemented via specific software packages, such as semtree (Brandmaier et al., 2013) by either connecting it with OpenMx (Neale et al., 2015) or lavaan (Rosseel, 2012) to estimate SEM models. Stegmann et al. (2018) further proposed an approach called nonlinear longitudinal recursive partitioning with an associated package of longRpart2, which is useful to model inherent nonlinearity of changes. Recently, Serang and his colleagues (2020) extended SEM Trees to make it easily available with popular commercial software Mplus by connecting it with MplusTrees in R package. They claim that it has some advantages that can cover broader range of SEMs estimated from Mplus over OpenMx or lavaan. Currently, there are a few simulation studies employing SEM Trees for longitudinal data (Usami et al., 2017; Usami et al., 2019). The results indicate that the informativeness of a dichotomous covariate related to the true subgroups, 4 which is measured by their correlation, was the most critical factor to recover the true number of subgroups. 
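As a sketch of how SEM Trees are typically specified in R, assuming the semtree interface accepts a fitted lavaan model and a data frame that contains both the model indicators and the candidate covariates (the data object, indicator names, and covariate names below are illustrative and should be checked against the installed package version):

```r
library(lavaan)
library(semtree)

# Template model: a linear latent growth curve for four repeated measures y1-y4
lgm <- '
  i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
  s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
'
fit <- growth(lgm, data = dat)   # dat also contains candidate covariates z1, z2, ...

# Grow an SEM tree; variables in dat that are not model indicators act as covariates.
# A semtree.control() object can be supplied to switch to the score-based tests
# of Arnold et al. (2021); the exact option names depend on the package version.
tree <- semtree(model = fit, data = dat)
plot(tree)
```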
SEM Trees originally uses the likelihood-ratio test (LRT; testing the change in deviance) by default for evaluating the heterogeneity of parameters and searching for optimal split points in covariates. The LRT-guided split evaluation appears to be powerful and efficient for dichotomous covariates, although it requires an additional step of locating the optimal cut point for categorical, ordinal, or continuous covariates that have more than two unique values. The likelihood ratio for each possible splitting value of each covariate must be calculated, which in turn produces a computationally demanding process. To complement this, Arnold et al. (2021) recently added various score-based testing options for ordinal and continuous covariates to the existing semtree package. They found that this approach was not only more computationally efficient than the LRT but also had sufficient power to detect group differences and produced unbiased estimates in the covariates selected to grow a tree.

A key difference between SEM Trees and MOB is how they choose the split points of a covariate, which is one of the key elements determining the characteristics of the subgroups. While SEM Trees locates the cut points by means of score-based testing, MOB locates the cut points by comparing likelihood ratios. That is, MOB first selects the covariate using a test statistic and then determines the cut point by optimizing the sum of the loss functions of the two resulting subgroups. This makes MOB computationally more efficient than the original SEM Trees. However, no prior studies have explored the performance of MOB for SEMs. More importantly, while all previous studies make important contributions, most of the simulation studies merely examined a single informative covariate that is directly related to the terminal subgroups. A few empirical studies using SEM Trees did explore various types of covariates showing interactions (e.g., Stegmann et al., 2018), and yet their applications in terms of adopting options for finding optimal and generalizable results are largely insufficient. Empirical literature using SEM Trees or MOB is also limited in the social sciences. Furthermore, to date, there has been no simulation study that investigates the performance of MOB for varying types of covariates and their interaction effects within the context of longitudinal SEM, which is the key motivation for this study. To the best of my knowledge, this is the first simulation study to employ the MOB algorithm from partykit, rather than semtree, for SEM trees and to scrutinize how well it performs.

1.2 Research purposes and questions

This study has three main purposes to address the needs of using the MOB approach:
a) Demonstrate how to use MOB with longitudinal data using two R packages, partykit and lavaan, with empirical data (High School Longitudinal Study of 2009).
b) Investigate the performance of MOB with a latent growth curve model (LGCM) having interactions of multiple types of covariates (categorical, ordinal, and continuous) via a simulation study.
c) Compare the results of MOB with those of the growth mixture model (GMM).

To accomplish the above three research purposes, nine specific research questions are to be answered:
1) How can the approach of MOB with longitudinal data be used to find heterogeneous subgroups?
2) For a given population model of LGCM and a certain number of covariates, how well does MOB correctly determine the true number of subgroups?
3) For a given population model of LGCM and a certain number of covariates, how accurately and precisely does MOB classify the true subgroups?
4) For a given population model of LGCM and a certain number of covariates, how well does MOB recover the splitting points of the covariates?
5) For a given population model of LGCM and a certain number of covariates, how accurately and precisely does MOB recover the parameter estimates (mean intercept)?
6) What is the best option for the test statistic of the ordinal covariates?
7) When is the post-pruning option of BIC more desirable than without it?
8) When is limiting the minimum sample size per subgroup more desirable than without it?
9) For a given population model of LGCM, how well does GMM correctly determine the true number of subgroups compared to MOB?

The first research question is answered by an illustrative example with detailed procedures. This study employs empirical data from a nationally representative longitudinal study. Using the results from the empirical data, a population model is specified and datasets for the simulations are generated. The second to the eighth research questions are answered by a Monte Carlo simulation study. The performance of MOB was evaluated under various conditions, including a) effect size (0.2, 0.4, 0.6, 0.8, and 1.0), b) sample size (1,000, 2,000, 5,000, 10,000, and 20,000), c) treatment of the ordinal covariate with different test statistics (chi-square, adapted maximum Lagrange multiplier, and a weighted double maximum), d) a pre-pruning option limiting the minimum sample size per subgroup (250 vs. none), and e) a post-pruning option (BIC vs. none). To answer the ninth research question, the simulated datasets were simultaneously fitted to a growth mixture model (GMM) to see how the results differ from each other.

The next chapter reviews traditional statistical approaches dealing with population heterogeneity depending on the types of variables (outcomes), assumptions, and contexts, followed by a review of the detailed procedures of MOB and previous literature. Chapter 3 describes how to use MOB with the empirical longitudinal data to find heterogeneous subgroups, followed by the interpretation of the resulting tree and figures. Chapter 4 presents a population model along with the simulation designs and evaluation criteria for the simulation study. Chapter 5 describes the results in the order of the research questions: a) statistical power to determine the true number of subgroups, b) overall classification accuracy and precision of the subgroups, c) accuracy of the splitting points of the covariates, d) bias and root mean squared error (RMSE) of the parameter estimates, and e) desirable options for the test statistics of the ordinal covariates, the post-pruning method using BIC, and limiting the minimum sample size per subgroup. Primary findings, implications, and limitations of this study are discussed in Chapter 6.

CHAPTER 2. LITERATURE REVIEW

2.1 Overview of statistical models explaining population heterogeneity

Population heterogeneity has gained attention in the social and behavioral science literature (Lubke & Muthén, 2005). Sources of the heterogeneity can be either observed or unobserved. For the former, heterogeneity is often explained by fixed effects of covariates such as demographics, contextual backgrounds, or behavioral characteristics and psychological traits. These fixed effects reveal differences in the outcome across the observed covariates specified by researchers in a statistical model.
The unexplained individual differences can be captured by residual (error term) in the regression model. If the data has a hierarchically nested structure (i.e., students are nested within schools, and the schools are nested in counties, and so forth), multilevel models (also known as hierarchical linear models; Raudenbush & Bryk, 2002) can be used to specify the associated random effects in a model not only to capture the remained heterogeneity of the individuals, but also to explain the differences of parameters of regression coefficients by regressing them on observed covariates at each level. This approach has been widely used for nested data structure. Another popular approach to investigating heterogeneity is to employ multi-group structural equation models (MGSEM; Jöreskog, 1971) or differential item functioning in item response theory (DIF; Mellenbergh, 1989), which tests separate structural factor models or item response functions with two or more pre-defined grouping covariates such as gender or race. This is especially useful when there is a small number of groups to be tested for comparisons. However, it becomes tiresome and infeasible with numerous groups because multiple estimation and testing should be done to compare parameters across every pair of groups, requiring adjustments of multiple statistical testing. Eventually, this reduces statistical power. If the source(s) of the heterogeneity is unobserved or latent rather than observed, other 9 approaches can be used to specify and estimate them in a statistical model. Within a comprehensive and general statistical framework of latent variable modeling (LVM), a popular flexible modeling approach to investigating heterogeneity is called finite mixture model (FMM). FMM can be defined as a parametric statistical model assuming the presence of unobserved distinct groups, also known as latent classes (McLachlan & Basford, 1988; McLachlan et al., 2019). If it is reasonable to assume that the sample consists of several latent groups showing different characteristics or distributions in terms of measured outcome variable(s), a variant of FMM can be a decent choice for researchers to adopt depending on the types of observed indicators and latent variables as well as research questions. Under the umbrella of the extended FMM framework, representative examples of sub- models that explore unobserved heterogeneity include (a) latent class analysis for both categorical observed and latent variables (LCA; McCutcheon, 1987), (b) latent profile analysis for continuous observed indicators and categorical latent variables (LPA; Gibson, 1959), (c) factor mixture model/analysis (FMA; Lubke & Muthén, 2005) and mixture item response theory (Mixture IRT; Rost, 1990) for continuous and categorical/ordinal indicators, respectively, specifying simultaneously both categorical and continuous latent variables for cross-sectional outcomes of measurement models, and (d) growth mixture model for longitudinal outcomes (GMM; Muthén & Asparouhov, 2007; Muthén & Shedden, 1999). Once the number of the latent classes and their structures are determined via a series of process of selecting the best fitting model based on some criteria such as AIC (Akaike Information Criterion; Akaike, 1974) or BIC (Bayesian Information Criterion; Schwarz, 1978) or relevant other statistical testing and theoretical consideration, a natural interest is to identify the characteristics of the latent classes using available covariates and previous knowledge from established theory. 
Since the latent class itself does not sufficiently inform meticulous implications, informative covariates should be added to explain the latent class memberships. However, there exist mixed 10 recommendations for guiding and determining the number of classes whether one should include covariates during the modeling process or after class enumeration. While some argue that simultaneous estimation of a correct model specification with the added covariates produces reliable class enumeration (e.g., Lubke & Muthén, 2007), others, for example, Vermunt (2010) and Nylund- Gibson and Masyn (2016) suggest that the number of latent classes should be determined without covariates first, then covariates or distal outcomes would be added to examine their associations with latent class memberships. The latter approach is called as three-step approach in contrast to one-step approach for the former (Asparouhov & Muthén, 2013). Following the three-step approach, it provides estimated posterior probabilities for each observation to belong to certain classes accounting for uncertainty in class membership. The class membership, which is a nominal variable generated from the posterior probabilities, is then explained by the added covariates using multinomial logistic regression model (Nylund-Gibson et al., 2014). However, the relationship between the covariates and the class membership is typically presumed to be linear, and their interactions are manually formed by one’s own choice/decision. If there are many available potential informative covariates that are likely to interact with others, and less established prior knowledge about the relationship, it challenges researchers to correctly specify those complex interactions that might be associated with the class membership. The model-based recursive partitioning method adopting machine learning technique of the decision tree algorithm can handle this issue of complex interactions of the covariates, which helps to understand and interpret the subgroups that are distinct in terms of the specified statistical models. 2.2 Model-based recursive partitioning The model-based recursive partition is introduced by Zeileis et al (2008) to not only uncover subgroups but also investigate different treatment effects depending on the groups. MOB employs an empirical score function for detecting the parameter instability (Zeileis & Hornik, 11 2007). The score is a case-wise derivative of the estimation function at the estimated parameters. It is used to inspect if the parameter estimates fluctuate randomly around their mean of zero or exhibit systematic deviations from zero over the values of covariates, which leading to construct relevant test statistic for any types of covariates. It has been widely used in various social and behavioral research including but not limited to psychometric research, such as measurement invariance (e.g., Merkle et al., 2014). MOB is also adopted to beta regression for limited responses (Grün et al., 2012) and Rasch item response theory (Rasch Trees; Strobl et al., 2015). Recently, it was extended to a linear mixed model that handles multilevel data structure (Fokkema et al., 2018). They conducted a simulation study to see if it recovers treatment-subgroup interactions under nested data structure. The results showed higher accuracy and predictive power for recovering the interaction of fixed effects, while the random effects were set to constant across subgroups. 
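Before walking through the steps, a minimal sketch of the generic MOB interface in partykit may help fix ideas. Here the node model is a simple linear regression, and the data frame and variable names (dat, y, x, z1-z3) are hypothetical:

```r
library(partykit)

# Left of | : the node (template) model; right of | : candidate partitioning covariates
mob_fit <- lmtree(y ~ x | z1 + z2 + z3,
                  data = dat,
                  minsize = 250,   # pre-pruning: minimum number of observations per node
                  alpha   = 0.05)  # significance level for the parameter instability tests

print(mob_fit)   # text representation of the tree and node-wise coefficients
plot(mob_fit)    # visualized tree
```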
The general procedures of the MOB algorithm have four steps (Zeileis et al., 2008):

(1) A parametric model is chosen by the researcher and fitted to the full sample via a selected estimation method that either maximizes or minimizes an objective function.

(2) The stability (or volatility) of the parameter estimates is assessed for every covariate considered (Z_p = Z_1, Z_2, …, Z_P). If any overall instability is detected with respect to particular covariate(s), the covariate showing the strongest instability is chosen based on the p-value of the test statistic. If no significant volatility is detected across all the values of the covariates, the process stops.

(3) The fitted model is divided into a set of segmented models according to the split points (values) of the covariates, which are searched and computed so that the model fits better locally. The number of splits can be either fixed or adaptively chosen. The split points are determined by optimizing the sum of the log-likelihoods of the two partitioned models. That is, for the selected covariate, the split point yielding the greatest improvement in model fit is determined, and the sample is divided into subsamples.

(4) The split is made on the selected covariate, and steps (1)-(3) are repeated until no further significant instability is detected. Stopping rules can also be used by researchers depending on the research questions and sample sizes; if the criteria are met, splitting does not proceed any further.

The details of how the above steps are carried out are described in the following sections.

2.2.1 Parameter estimation for a template model

The first step is to fit a parametric model to the whole sample using M-estimators such as ordinary least squares or maximum likelihood estimation. The specification of the template parametric model is determined at this stage according to the research questions and established theory. Focusing on maximum likelihood estimation under the multivariate normality assumption, the likelihood function given n independent and identically distributed observations y_i (i = 1, 2, …, n) and a set of parameters θ = {θ_1, θ_2, …, θ_K} is obtained as the product of the individual densities,

$$L(\boldsymbol{\theta}; y) = \prod_{i=1}^{n} (2\pi)^{-k/2}\, |\Sigma(\boldsymbol{\theta})|^{-1/2} \exp\!\left[-\tfrac{1}{2}\big(y_i - \mu(\boldsymbol{\theta})\big)'\, \Sigma(\boldsymbol{\theta})^{-1} \big(y_i - \mu(\boldsymbol{\theta})\big)\right], \qquad (2\text{-}1)$$

where μ(θ) is the k × 1 mean vector and Σ(θ) is the k × k covariance matrix. A set of parameters is estimated consistently and efficiently by maximizing the above likelihood function or minimizing the negative log-likelihood. The n observations y_i contribute equally to the whole log-likelihood,

$$\log L(\boldsymbol{\theta}; y) = -\tfrac{1}{2}\left[\, n k \log(2\pi) + n \log|\Sigma(\boldsymbol{\theta})| + \sum_{i=1}^{n} \big(y_i - \mu(\boldsymbol{\theta})\big)'\, \Sigma(\boldsymbol{\theta})^{-1} \big(y_i - \mu(\boldsymbol{\theta})\big)\right]. \qquad (2\text{-}2)$$

This function is typically optimized via well-established iterative procedures to find the parameter estimates θ̂. It is also widely employed to assess the goodness of fit between two competing models in order to select the better-fitting model. Multiplying the log-likelihoods by −2, the difference between the two resulting values for nested competing models asymptotically follows a χ² distribution with q degrees of freedom, where q is the difference in the number of parameters between the two models. This χ² is used as the test statistic for likelihood ratio testing (LRT). The LRT was initially suggested by Brandmaier et al. (2013) to determine whether or not the sample is to be divided into subsamples according to the values of the covariates, as indicated earlier.
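A rough sketch of this deviance-change comparison for a single candidate split, assuming a lavaan growth model syntax `m`, a node data set `dat`, and a logical indicator `left` marking one candidate child node (all hypothetical names):

```r
library(lavaan)

fit_node  <- growth(m, data = dat)           # template model fitted to the current node
fit_left  <- growth(m, data = dat[left, ])   # same template fitted to each candidate child
fit_right <- growth(m, data = dat[!left, ])

# Change in deviance when the node is split into two freely estimated submodels
lr <- -2 * (as.numeric(logLik(fit_node)) -
            (as.numeric(logLik(fit_left)) + as.numeric(logLik(fit_right))))

# Degrees of freedom: the additional set of freely estimated parameters
q <- as.numeric(fitMeasures(fit_node, "npar"))
pchisq(lr, df = q, lower.tail = FALSE)       # asymptotic p-value of the LRT
```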
MOB uses the LRT for locating the split points, but the covariate selection is carried out with other test statistics constructed from a score function. The score is defined as the gradient of the log-likelihood function with respect to the vector of K parameters. The individual scores are obtained from the individual log-likelihoods,

$$\log L(\boldsymbol{\theta}; y_i) = -\tfrac{1}{2}\left[\, k \log(2\pi) + \log|\Sigma(\boldsymbol{\theta})| + \big(y_i - \mu(\boldsymbol{\theta})\big)'\, \Sigma(\boldsymbol{\theta})^{-1} \big(y_i - \mu(\boldsymbol{\theta})\big)\right], \qquad (2\text{-}3)$$

by taking their partial derivatives with respect to each parameter, where the expected gradient of the function is zero. The parameter estimates θ̂ can be computed, under mild regularity conditions (White, 1994), by setting the sum of the partial derivatives of the individual log-likelihood functions with respect to θ̂ to zero:

$$\sum_{i=1}^{n} \frac{\partial \log L(\hat{\boldsymbol{\theta}}; y_i)}{\partial \hat{\boldsymbol{\theta}}} = 0. \qquad (2\text{-}4)$$

Then, the individual scores are calculated by taking the first partial derivatives of the individual log-likelihood functions with respect to each element of θ̂. The score function can be represented in matrix form:

$$s(\hat{\boldsymbol{\theta}}; y) = \begin{bmatrix} \dfrac{\partial \log L(\hat{\theta}_1; y_1)}{\partial \hat{\theta}_1} & \cdots & \dfrac{\partial \log L(\hat{\theta}_K; y_1)}{\partial \hat{\theta}_K} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \log L(\hat{\theta}_1; y_n)}{\partial \hat{\theta}_1} & \cdots & \dfrac{\partial \log L(\hat{\theta}_K; y_n)}{\partial \hat{\theta}_K} \end{bmatrix}. \qquad (2\text{-}5)$$

These scores represent the extent to which an individual's log-likelihood is maximized by each of the K parameters. The closer an individual's score is to zero, the better the individual fits the model; conversely, large score values imply misfit between the individual and the model. Thus, it is possible to inspect whether the scores deviate from zero systematically according to particular covariates.

2.2.2 Testing instability of parameter estimates

The second step is to test whether all parameter estimates are stable or whether they fluctuate over a set of partitioning covariates (Z_p) using the empirical score function. Based on the matrix of the empirical contributions to the gradient, the instability of the parameter estimates is tested to see whether splitting the sample with respect to one of the covariates (Z_p) improves the model fit. One of the methods is to check whether the scores fluctuate randomly around zero or deviate from zero systematically. Under parameter stability, the empirical score function fluctuates randomly around its expected value of zero. If there is some instability in the parameters, systematic departures from zero for subsamples related to certain covariates can be detected. Intuitively, the idea is similar to examining the randomness of residuals in linear regression. The deviations are monitored by the empirical fluctuation process, which is defined as the K-dimensional cumulative score process

$$B(\hat{\boldsymbol{\theta}}; j) = \frac{1}{\sqrt{n}}\, \hat{I}(\hat{\boldsymbol{\theta}})^{-1/2} \sum_{i=1}^{\lfloor nj \rfloor} s(\hat{\boldsymbol{\theta}}; y_i, Z_p) \qquad (0 \le j \le 1), \qquad (2\text{-}6)$$

where n is the total sample size within a subgroup (node) and j indexes the proportion of the sample sorted by the candidate covariate Z_p being examined. ⌊nj⌋ is the floor operator producing the integer part of nj. B_p(θ̂; j) is the partial sum process of the scores up to the ⌊nj⌋-th observation ordered by Z_p, scaled by the inverse of the square root of n and of the estimated covariance matrix of the Fisher information, Î(θ̂), evaluated at the parameter estimates. This produces an n × K matrix for the p-th covariate while accounting for the ordering of the individuals. B(θ̂; j) converges to a Brownian bridge by a functional central limit theorem under the null hypothesis that the parameter estimates are stable (see Merkle et al., 2014; Zeileis & Hornik, 2007 for details).
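A rough sketch of this cumulative score process for one candidate covariate, assuming a fitted lavaan model `fit` and a sorting covariate `z` (both hypothetical); lavaan's lavScores() returns casewise score contributions, and the scaling by the information matrix is simplified here to an empirical decorrelation:

```r
library(lavaan)

sc  <- lavScores(fit)            # n x K matrix of casewise scores at the ML estimates
ord <- order(z)                  # order cases by the candidate covariate Z_p
sc  <- sc[ord, , drop = FALSE]
n   <- nrow(sc)

# Decorrelate/scale the scores (empirical stand-in for I(theta)^(-1/2))
sc_std <- sc %*% solve(chol(crossprod(sc) / n))

# Empirical cumulative score process B(theta_hat; j), one column per parameter
B <- apply(sc_std, 2, cumsum) / sqrt(n)

# Under parameter stability, each column should fluctuate randomly around zero
matplot(seq_len(n) / n, B, type = "l",
        xlab = "j (proportion of cases ordered by Z_p)", ylab = "cumulative score")
```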
Formally, the functional central limit theorem holds as

$$B(\hat{\boldsymbol{\theta}}; \cdot) \;\xrightarrow{\;d\;}\; B^{0}(\cdot), \qquad (2\text{-}7)$$

where the arrow denotes convergence in distribution and B⁰(·) is a K-dimensional Brownian bridge. The Brownian bridge is a stochastic process that is pinned at the start (i = 0) and the end (i = n). The expected value of the bridge is zero with variance j(1 − j), implying that most of the volatility occurs in the middle of the bridge. Thus, the empirical cumulative scores can be represented in an n × K matrix with elements B_p(θ̂; i/n), denoted below as B(θ̂)_{ik}. A scalar test statistic can be derived by aggregating B(θ̂)_{ik} over the i individuals and the k parameters. Each row of the matrix represents a cumulative sum of the scores of individuals who belong to the i/n percentile of the covariate Z_p or below. Different ways of aggregating them produce different test statistics depending on the type of covariate (Merkle & Zeileis, 2013; Merkle et al., 2014). The null hypothesis of parameter homogeneity can then be tested by comparing the test statistic obtained by aggregating B(θ̂)_{ik} with the corresponding statistic derived from a Brownian bridge. Currently, MOB provides different test statistics for continuous and categorical covariates as default options, and users can optionally utilize two other test statistics for ordinal covariates (Hothorn & Zeileis, 2015).

a) Test statistic for categorical covariates (declared as 'factor' in R)

Assessing the parameter instabilities for a categorical covariate Z_p having M categories is achieved by constructing a test statistic that sums the squared differences in the sums of scores corresponding to each category m = 1, 2, …, M of the covariate over the K parameters,

$$LM = \sum_{m=1}^{M} \sum_{k=1}^{K} \Big( B(\hat{\boldsymbol{\theta}})_{i_m k} - B(\hat{\boldsymbol{\theta}})_{i_{(m-1)} k} \Big)^{2}, \qquad (2\text{-}8)$$

where i_{(m−1)} is the number of individuals in the first m − 1 categories. LM asymptotically follows a χ² distribution with K(M − 1) degrees of freedom. This captures the fluctuation within each category of the partitioning covariate Z_p. This is a Lagrange multiplier (LM) type test statistic, which is asymptotically equivalent to the corresponding LRT. Using LM is especially beneficial compared with the LRT approach in terms of reducing the computational burden because the model is fitted only once in the current subgroup (node) to estimate the parameters, and the corresponding score functions are computed once per node. Then, the scores are simply reordered and aggregated, producing a test statistic each time (Zeileis et al., 2008). The corresponding vector of p-values for the Z_p covariates can also be obtained (Hjort & Koning, 2002). This test statistic can also be used if an ordinal variable is treated as unordered/categorical in the analysis.

b) Test statistic for ordinal covariates (declared as 'ordered' in R)

Although the above LM statistic for categorical variables can be used by default for ordinal variables, two statistics specifically for ordinal variables were proposed by Merkle et al. (2014). The first is a weighted double maximum (WDM_O) and the second is an adapted maxLM_O; their associated test statistics can also be obtained. The former employs multivariate normal probabilities to calculate the p-values, and the latter obtains p-values by simulating the critical values on the fly (Kleiber et al., 2002), which requires some computation time.
Formally, these statistics can be represented as

$$WDM_O = \max_{m=1,\ldots,M-1} \left[\frac{i_m}{n}\left(1 - \frac{i_m}{n}\right)\right]^{-1/2} \max_{k=1,\ldots,K} \Big| B(\hat{\boldsymbol{\theta}})_{i_m k} \Big|, \qquad (2\text{-}9)$$

$$\max LM_O = \max_{m=1,\ldots,M-1} \left[\frac{i_m}{n}\left(1 - \frac{i_m}{n}\right)\right]^{-1} \sum_{k=1}^{K} B(\hat{\boldsymbol{\theta}})_{i_m k}^{2}. \qquad (2\text{-}10)$$

c) Test statistic for continuous covariates (declared as 'numeric' or 'integer' in R)

Among the three test statistics proposed for continuous covariates (Merkle & Zeileis, 2013), the currently available statistic is the supremum of the Lagrange multiplier statistics, which can be presented as

$$\max LM_C = \max_{i=\underline{i},\ldots,\overline{i}} \left[\frac{i}{n}\left(1 - \frac{i}{n}\right)\right]^{-1} \sum_{k=1}^{K} B(\hat{\boldsymbol{\theta}})_{i k}^{2}. \qquad (2\text{-}11)$$

This is the maximum of the sum of squares of the B(θ̂)_{ik} in each K-dimensional vector of parameters over all single-split samples i, scaled by its variance component. Since maxLM_C considers the values of all parameters at a single point, it is suitable when there exists a noticeable single change point in several parameters. The lower and upper bounds (the underlined and overlined i) are typically specified for continuous covariates because a few individuals with extreme values could affect the instability of the test statistic. maxLM_C is asymptotically equivalent to the supremum of the likelihood-ratio statistics (Chow, 1960; Zeileis et al., 2008), and the asymptotic p-values can be obtained from the table proposed by Hansen (1997). If there are many ties in the partitioning covariates, the maximum value is not unique, and the results may be affected by the ordering of the individuals. In this case, it is suggested either to investigate the results by breaking the ties randomly or to treat the continuous variable as an ordinal variable.

2.2.3 Partitioning sample into subgroups along with selected covariates

In the third step, the sample in the current node is divided into child nodes along the partitioning covariate Z_p*. The covariate showing the strongest association with the parameter instability, that is, the one producing the smallest p-value, is chosen first. Then, a split value (breakpoint) that optimizes the estimating function with the largest improvement in model fit is computed. This is achieved by an exhaustive iterative search procedure, during which the two models for the divided subsamples within each node are fitted and the split point is determined for each candidate breakpoint in the chosen covariate Z_p* based on likelihood ratio testing.

2.2.4 Repeating steps until stopping rules are met

Steps (1)-(3) are repeated until no significant parameter instability is detected in the current nodes. A node showing a stable/homogeneous set of parameters is called a terminal node, which is an uncovered distinct subgroup. Optionally, there are pre- and post-pruning strategies to determine the optimal tree size. The options for the former include setting a minimal node size (for example, a node size of 100) and/or using a Bonferroni-adjusted p-value. However, when the sample size is so large that the traditional significance level is no longer useful, the resulting trees typically produce an excessive number of terminal nodes because even small parameter instabilities can be detected, which makes the trees difficult to interpret concisely. In this case, the grown large tree is pruned back if the splits did not improve the model based on AIC or BIC (Su et al., 2004). 
That is, the large tree is pruned/evaluated based on whether or not the sum of model fit of the divided subgroups improves the model fit of the previous group including the divided subgroups statistically. This option is currently available in partykit (Hothorn & Zeileis, 2015). Among others, limiting the sample size per a subgroup (node) and post pruning method using BIC are investigated in a simulation study. 19 CHAPTER 3. AN ILLUSTRATIVE EXAMPLE To achieve the first research purpose, this chapter demonstrates how to use MOB with a parametric template model of latent growth curve model (LGCM) with an educational empirical data. Before describing the procedure of MOB, it is necessary to understand how MOB works to uncover statistically distinct and interpretable groups creating a visualized tree. I used two categorical covariates, gender, and race. They are considered to be related to the outcome, academic achievement. The outcome variables are overall GPA scores from 9th to 12th grades, which produces four repeated measurements. A quadratic latent growth curve model was chosen as a template parametric model rather than linear growth model, to estimate the fixed effects of the intercept, linear slope and quadratic slope plus their covariance-variance and the residuals. The result shows how the changes in GPAs over four years differ across demographics. An example of LGCM tree using MOB is presented in Figure 1. The composition of the subgroups can be found in this visualized tree directly. Following the terminology of the machine learning community, the top of the tree is called root node, which indicates the whole sample that is used for the study. The subsamples (subgroups) in the middle of the tree are called inner nodes or child nodes. These subgroups were determined by a race covariate first in this example because the parameters of some ethnic groups are most significantly different from other ethnic groups. Specifically, the left side of the tree consists of three race groups, Black (BLK), Hispanic (HIP), and Other ethnic groups (OTS), and the right side of the tree consists of two race categories, Asian (ASA) and White (WHT). Then, the machine learning algorithm built in MOB keeps continuing to find heterogeneity (instability) of the model parameters. The second covariate dividing the subgroups is the gender, and the subsamples are divided into subsamples again and again until either some stopping rules are met or there is no more significant instability 20 of the parameters across the values of the covariates. The final distinct subgroups are called terminal nodes. Each final distinct subgroups have their sample size and own model parameters including the intercept, linear slope, and quadratic slope, to distinguish their change trajectories based on the latent growth curve model. In this example, there are nine distinct subgroups showing significantly different initial GPA score and their changes over time. For example, the third subgroup from the left at the bottom of the tree is the group of Black male students (Node 8), which shows the lowest initial GPA score. In contrast, the sixth subgroup from the left at the bottom of the tree is the group of Asian female students (Node 13) showing the highest initial GPA score. This tree easily informs us the composition of the subgroups, which enables us to understand complex interaction effects between the race and gender. The information in the bottom of the tree can be customized by researchers. 
For instance, all the parameter estimates, including the random effects and their standard errors, can also be presented, while this study presents only three coefficients for the illustrative purpose.

Figure 1. An example of LGCM tree using MOB.

Using the parameter estimates of each terminal node, the expected GPA change trajectories across the four grades can be visualized as in Figure 2. Interestingly, most of the subgroups show similar change trajectories that slightly decrease from the first year to the third year and then increase in the fourth year. This result is expected and can be interpreted as common in the population because most students and schools would make great efforts to manage the GPA scores of the 12th grade. However, this analysis reveals that two other subgroups (Node 13 and Node 16) do not follow the same trend. Node 13 and Node 16 in Figure 2 show that their overall GPA scores constantly decreased over the four years despite having the highest initial GPA scores in the first year.

Figure 2. An example of expected GPA changes across four grades (Nodes are subgroups).

Looking back at Figure 1, Nodes 13 and 16 are the groups of Asian female and Asian male students, respectively. Moreover, regarding the change between the third and fourth years, the amount of increase for Nodes 14 (White female), 4 (Black/Hispanic female), and 8 (Black male) is large compared to the other groups. The groups with the lowest initial GPA are the Black and Hispanic male student groups (Nodes 8 and 9). Although the combination of the two covariates produces ten intersectional groups (interaction effects), nine groups are statistically different in terms of the change in GPAs over time. This section does not attempt to interpret the substantive meaning of the results because this analysis used only the two covariates of race and gender to provide an overview of the tree and of what information MOB can produce for interpretation. The next section describes how to use MOB in detail.

3.1 Data

The data used for this study are from the High School Longitudinal Study of 2009 (HSLS:09). This is a representative sample of U.S. high school students collecting a variety of information on students' academic outcomes, experiences, environments, and backgrounds. It used a stratified two-stage sampling design, which samples schools first, followed by students from the selected schools in the second stage. Data collection started in the academic year of 2009, and follow-up data were collected in 2012, 2013, and 2016. In 2013, transcripts were collected to add more detailed information on students' academic outcomes, containing students' course taking and GPA scores. In addition, the HSLS:09 contains numerous covariates on students' experiences, demographics, and noncognitive scale scores related to the educational outcomes. To produce replicable results, this study uses the publicly available datasets. The educational outcomes of interest are the overall GPA scores from 9th to 12th grade, the same as in the exemplary analysis above. These measures are non-cumulative GPA scores. For the purpose of this study, measurement invariance across years is assumed; that is, the GPA score measures the same construct across years. The outcomes are used to fit an unconditional latent growth curve model. One of the most important reasons to choose the GPA score as the outcome for this study is that GPA scores have been known as strong indicators for predicting academic success in college and career (Allensworth & Clark, 2020).
According to their recent study, high school GPA is the most critical indicator of academic readiness and performance for students and institutions of higher education. In that sense, the high school GPA scores of different students would follow different trajectories with different initial points and shapes, and some groups of students may share backgrounds, demographics, and in-school experiences associated with their GPA scores. The purpose of MOB is to find those subgroups using the available predictors/covariates.

The covariates related to GPA scores were selected based on previous literature, to avoid irrelevant covariates and to reduce estimation time (e.g., Bowers & Sprott, 2012). Only a few available student- and school-level covariates are considered in this study, although there is no limit on the number of covariates. That is, if no prior literature matches the research topic, or if the research purpose is to explore any potentially relevant covariates, researchers can add covariates without restriction, at the cost of increased estimation time. One of the benefits of MOB is that covariates can be used as they are: it is not necessary to create multiple dummy variables for categorical variables, as is commonly done in regression modeling, and it is unnecessary to treat ordinal responses as either continuous or categorical. Researchers can declare ordinal variables as ordinal, and MOB applies the corresponding test statistic for hypothesis testing of the ordinal variables, as described in Chapter 2. A list of the four outcomes and ten covariates used is presented in Table 1. Only covariates measured in the first year (9th grade) were chosen, so that they relate to the initial GPA score and its change over time.

Table 1. A list of variables of High School Longitudinal Study of 2009 data

Variables     Descriptions                                                          Types
Outcomes
GPA9-GPA12    Overall GPA scores from 9th to 12th grade (four repeated              Continuous
              measures; non-cumulative GPA scores)
Covariates
SEX           Female or male                                                        Categorical
RACE          Black, Hispanic, Asian, White, and others                             Categorical
LOCA          School location: city, suburban, town, and rural                      Categorical
FLUNCH*       Categorized percentage of students enrolled in the school who         Ordinal
              receive free or reduced-price lunch; 0 = 0%, 1 = more than 0% but
              less than 10%, 2 = at least 10% but less than 20%, ..., 11 = 100%.
              This has 11 ordered categories
HACT          Hours spent on extracurricular activities on a typical school day;    Ordinal
              1 = less than 1 hour, 2 = 1 to 2 hours, ..., 6 = 5 or more hours.
              This has six ordered categories
MISBEHAV*     Frequency of student in-class misbehavior at this school;             Ordinal
              1 = Daily, 2 = At least once a week, 3 = At least once a month,
              4 = On occasion, and 5 = Never happens. This has five ordered
              categories
SES           Socio-economic status scale                                           Continuous
MATEFF        Standardized scale of student's math self-efficacy; higher values     Continuous
              represent higher math self-efficacy
BEHAVSCH      Standardized scale of student's answer about in-school behavior       Continuous
              within the last 6 months; higher values represent more positive
              assessments, i.e., fewer problems are indicated
SCHCLI        Standardized scale of administrator's assessment of school climate;   Continuous
              higher values represent more positive assessments of the school's
              climate, i.e., fewer problems are indicated
Note.
All continuous variables are rounded to one decimal place for the analysis. FLUNCH and MISBEHAV are restricted-use variables. The covariates are chosen from the first year (9th grade) only.

First, the categorical covariates are 1) gender (female and male), 2) race (Black, Hispanic, Asian, White, and others), and 3) school location (city, suburban, town, and rural). Second, the ordinal covariates are 1) hours spent on extracurricular activities on a typical school day (HACT; 1 = less than 1 hour, 2 = 1 to 2 hours, ..., 6 = 5 or more hours), which has six ordered categories, 2) the categorized percentage of students enrolled in the school who receive free or reduced-price lunch (FLUNCH; 0 = 0%, 1 = more than 0% but less than 10%, 2 = at least 10% but less than 20%, ..., 11 = 100%), and 3) the frequency of student in-class misbehavior at the school (MISBEHAV; 1 = daily, 2 = at least once a week, 3 = at least once a month, 4 = on occasion, and 5 = never happens). Third, the continuous covariates are 1) socio-economic status (SES; standardized scale score), 2) the student's math self-efficacy scale score (MATEFF; higher values represent higher math self-efficacy), 3) the student's school motivation scale score (BEHAVSCH; higher values represent more positive assessments, i.e., fewer problems), and 4) the administrator's assessment of school climate (SCHCLI; higher values represent more positive assessments of the school's climate, i.e., fewer problems are indicated). All these scale scores are standardized composite scores and were rounded to one decimal place to reduce estimation time.

3.2 Template model: LGCM

To use a latent growth curve model (LGCM) as the template model for MOB, the first step is to scrutinize the descriptive statistics of the variables selected on the basis of previous literature or theory. After excluding cases with non-responses on any of these variables, 9,275 students were retained for the illustrative analysis, assuming the data are missing completely at random. The descriptive statistics of the continuous and ordinal variables and the frequencies and proportions of the categorical variables are presented in Table 2 and Table 3, respectively. Although the descriptive statistics of the ordinal variables are presented together with those of the continuous variables, the analysis treats the ordinal variables as ordinal rather than categorical or continuous. This feature was discussed in detail in Chapter 2.

Table 2. Descriptive statistics of continuous and ordinal variables without sampling weights (n = 9,275)

Variable      Mean     SD      Min      Max      Range    Skewness
GPA9          2.982    0.741   0.130    4.000    3.870    -0.667
GPA10         2.937    0.750   0.000    4.000    4.000    -0.632
GPA11         2.938    0.741   0.000    4.000    4.000    -0.686
GPA12         3.027    0.727   0.000    4.000    4.000    -0.971
FLUNCH*       4.494    2.468   1.000    12.000   11.000   0.296
HACT*         2.600    1.434   1.000    6.000    5.000    0.805
MISBEHAV*     2.116    1.151   1.000    5.000    4.000    0.652
SES           0.178    0.780   -1.800   2.900    4.700    0.286
MATEFF        0.059    0.997   -2.500   1.700    4.200    -0.356
BEHAVSCH      0.124    0.868   -5.600   1.200    6.800    -1.640
SCHCLI        0.253    0.995   -3.200   2.600    5.800    -0.411

Note. FLUNCH, HACT, and MISBEHAV are ordinal variables. FLUNCH = categorized percentage of students enrolled in the school who receive free or reduced-price lunch. HACT = hours spent on extracurricular activities on a typical school day. MISBEHAV = frequency of student in-class misbehavior at this school. SES = socio-economic status scale score. MATEFF = student's math self-efficacy scale score. BEHAVSCH = student's school motivation scale score.
SCHCLI = scale score of the administrator's assessment of school climate.

Table 3. Frequency and proportions of categorical variables without sampling weights

                          n                            %
Race          Female    Male     Total      Female   Male     Total
Others        410       458      868        4.4      4.9      9.4
Asian         373       397      770        4.0      4.3      8.3
Black         336       333      669        3.6      3.6      7.2
Hispanic      640       638      1,278      6.9      6.9      13.8
White         2,856     2,834    5,690      30.8     30.6     61.3
Total         4,615     4,660    9,275      49.8     50.2     100.0

School location    City      Suburb    Town      Rural     Total
n                  2,518     3,287     1,207     2,263     9,275
Proportion         0.27      0.35      0.13      0.24      1.00

Correlations between the continuous variables were estimated and are presented, together with a visualization, in Table 4. Color intensity and the size of the circles are proportional to the correlation coefficients, and the legend on the right side of the correlogram maps the coefficients to the corresponding colors. SES and school motivation (BEHAVSCH) have stronger associations with the GPA scores (around 0.3) than the other two covariates (around 0.1-0.2).

Table 4. Correlation matrix between continuous variables with a visualized figure

            GPA9    GPA10   GPA11   GPA12   SES     MATEFF  BEHAVSCH  SCHCLI
GPA9        1.000   0.843   0.761   0.684   0.354   0.216   0.290     0.132
GPA10       0.843   1.000   0.821   0.721   0.343   0.219   0.323     0.154
GPA11       0.761   0.821   1.000   0.776   0.341   0.272   0.361     0.160
GPA12       0.684   0.721   0.776   1.000   0.307   0.176   0.320     0.152
SES         0.354   0.343   0.341   0.307   1.000   0.133   0.080     0.177
MATEFF      0.216   0.219   0.272   0.176   0.133   1.000   0.200     0.066
BEHAVSCH    0.290   0.323   0.361   0.320   0.080   0.200   1.000     0.079
SCHCLI      0.132   0.154   0.160   0.152   0.177   0.066   0.079     1.000

Note. GPA9-GPA12 = overall GPA scores from the 9th to the 12th grade. SES = socio-economic status scale score. MATEFF = student's math self-efficacy scale score. BEHAVSCH = student's school motivation scale score. SCHCLI = scale score of the administrator's assessment of school climate.

The general procedure for building the template model is the same as in a traditional LGCM analysis. One of the most important steps is to examine whether the growth pattern takes a linear, quadratic, or other nonlinear form; researchers can then compare model fit indices to find the best fitting model. One can also determine whether the random effects are constrained to be equal across time points or freely estimated by comparing the candidate models. In this study, the average trend of GPA scores across the four grades (9th to 12th) was examined by drawing a spaghetti plot of 50 randomly chosen students. In Figure 3, it is not easy to discern a single general trend in GPA across years; at the same time, there are no dramatic year-to-year changes in GPA. With this in mind, both linear and quadratic latent growth curve models were considered for these data.

Figure 3. Spaghetti plot of overall GPA for 50 randomly chosen students.

Following the classical notation of Bollen and Curran (2006), a latent growth curve model can be represented as

Y_i = \Lambda \eta_i + \varepsilon_i, \qquad \eta_i = \mu_\eta + \zeta_i,   (3-1)

where Y_i is a T x 1 vector of T repeated measures for the ith individual (i = 1, 2, ..., n), \Lambda is a T x k matrix of factor loadings (intercept, linear slope, and quadratic slope; these are fixed by the coding of the time points), \eta_i is a k x 1 vector of the k latent factors, and \varepsilon_i is a T x 1 vector of residuals. The vector of latent factors is decomposed into \mu_\eta + \zeta_i, the mean and the individual deviation, respectively. The combined form can be expressed as Y_i = \Lambda(\mu_\eta + \zeta_i) + \varepsilon_i.
Then, the model-implied covariance matrix is written as

\Sigma(\theta) = \Lambda \Psi \Lambda' + \Theta_\varepsilon,   (3-2)

where \Sigma is the covariance matrix of the responses Y_i, \Psi is the covariance matrix of the latent factors \zeta_i, and \Theta_\varepsilon is the covariance structure of the residuals, a diagonal matrix consisting of the residual variance components. With four time points, the factor loadings of the intercept, linear slope, and quadratic slope were fixed at [1, 1, 1, 1], [0, 1, 2, 3], and [0, 1, 4, 9], respectively. The residual variances are constrained to be equal across time points in this study, to avoid negative variance estimates and to improve model fit; depending on the distribution of the data, they could instead be freely estimated. Following common practice, the residuals are assumed to be normally and identically distributed, \varepsilon_i ~ N(0, \sigma^2_\varepsilon), although other distributional forms can be specified. Denoting by \theta all parameters to be estimated, \mu(\theta) and \Sigma(\theta) are the model-implied mean and covariance structures, respectively. Without covariates/predictors, the parameters estimated under this specification are a) the means of the three latent factors (fixed effects: intercept \mu_I, linear slope \mu_S, and quadratic slope \mu_Q), b) their variances (\sigma^2_I, \sigma^2_S, \sigma^2_Q) and covariances (\sigma_IS, \sigma_IQ, \sigma_SQ), and c) a common residual variance (\sigma^2_\varepsilon), for ten parameters in total. The parameters are estimated with robust maximum likelihood (MLR) under a mild multivariate normality assumption.
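This specification translates directly into lavaan's growth-model syntax. The sketch below is a minimal illustration only, assuming the four repeated measures are stored in columns GPA9-GPA12 of a data frame named hsls; the code actually used for the analyses is the one given in the Appendix.

```r
library(lavaan)

# Quadratic LGCM: loadings fixed at [1,1,1,1], [0,1,2,3], [0,1,4,9];
# residual variances constrained equal across time via the shared label "rv".
lgcm_quad <- '
  i =~ 1*GPA9 + 1*GPA10 + 1*GPA11 + 1*GPA12
  s =~ 0*GPA9 + 1*GPA10 + 2*GPA11 + 3*GPA12
  q =~ 0*GPA9 + 1*GPA10 + 4*GPA11 + 9*GPA12
  GPA9  ~~ rv*GPA9
  GPA10 ~~ rv*GPA10
  GPA11 ~~ rv*GPA11
  GPA12 ~~ rv*GPA12
'

# growth() frees the latent factor means and fixes observed intercepts at zero;
# estimator = "MLR" requests robust maximum likelihood.
fit <- growth(lgcm_quad, data = hsls, estimator = "MLR")
summary(fit, fit.measures = TRUE)
```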
Figure 4. The expected GPA score over four years from the quadratic latent growth curve model.

This study examined a variety of models, including a linear latent growth curve model and a quadratic LGCM with residual variances freely estimated across time points. However, the model fit indices of the linear model were poorer than those of the quadratic model, and there were negative variance estimates when the residual variances were freely estimated. Based on these results, the quadratic LGCM was used as the template model. Table 5 shows the parameter estimates and model fit indices of the unconditional quadratic LGCM. As displayed in Figure 4, the expected overall GPA score decreases from the first year (9th grade) to the third year (11th grade) and then increases in the fourth year (12th grade). The variances and covariances of the latent growth factors are significant, except for the covariance between the intercept and the linear slope. The intercept variance is fairly large compared with the slope variances, indicating substantial individual differences in GPA at 9th grade (i.e., time point 1), whereas the overall pattern of change over time is similar across individuals. To explore which demographic, environmental, and behavioral factors are associated with the initial status (i.e., GPA in 9th grade) and the changes over time, informative covariates can be used to explain these differences.

Table 5. Results of unconditional quadratic latent growth curve model

Parameters                        Estimate      S.E.      p-value
Means
  Intercept (mu_I)                2.984         0.008     < 0.001
  Linear slope (mu_S)             -0.087        0.005     < 0.001
  Quadratic slope (mu_Q)          0.034         0.002     < 0.001
Variances
  Intercept (sigma^2_I)           0.471         0.008     < 0.001
  Linear slope (sigma^2_S)        0.057         0.005     < 0.001
  Quadratic slope (sigma^2_Q)     0.007         0.001     < 0.001
Covariances
  I ~ S (sigma_IS)                0.001         0.004     0.791
  I ~ Q (sigma_IQ)                -0.012        0.001     < 0.001
  S ~ Q (sigma_SQ)                -0.016        0.002     < 0.001
Residuals (sigma^2_e)             0.084         0.001     < 0.001
Model fit indices
  AIC                             51542.013
  BIC                             51613.364
  CFI                             0.994
  TLI                             0.991
  RMSEA                           0.064
Note. AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), CFI (Comparative Fit Index), TLI (Tucker-Lewis Index), RMSEA (Root Mean Square Error of Approximation).

3.3 Using MOB to grow a tree

The unconditional model above can be used as the template model for growing a tree by recursively partitioning the sample along the available, potentially informative covariates. The idea for this analysis was introduced by Zeileis (2020), and most of the details on connecting the two R packages partykit and lavaan follow that paper. The detailed R code for this empirical analysis is presented in the Appendix.
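As a rough sketch of how that connection works, partykit::mob() takes a user-supplied fitting function that re-estimates the template model on each candidate subgroup and returns the ingredients for the parameter-instability tests (coefficients, the value of the objective function, and casewise score contributions). The fragment below is illustrative only and rests on several assumptions (the data frame hsls, the lgcm_quad syntax above, ordinal covariates declared as ordered factors, and placeholder control settings); the exact wiring, including the test statistic used for ordinal covariates, follows Zeileis (2020) and the code in the Appendix.

```r
library(partykit)
library(lavaan)

# Matrix-valued response so the four repeated measures travel together.
hsls$gpa <- as.matrix(hsls[, c("GPA9", "GPA10", "GPA11", "GPA12")])

# Fitting function in the interface expected by partykit::mob().
lgcm_fit <- function(y, x = NULL, start = NULL, weights = NULL,
                     offset = NULL, ..., estfun = FALSE, object = FALSE) {
  fit <- lavaan::growth(lgcm_quad, data = as.data.frame(y))
  list(coefficients = coef(fit),                  # parameters tested for instability
       objfun = -as.numeric(logLik(fit)),         # objective function (to be minimized)
       estfun = if (estfun) lavaan::lavScores(fit) else NULL,  # casewise scores
       object = if (object) fit else NULL)
}

# Grow the tree: model response on the left of |, candidate partitioning
# covariates on the right (FLUNCH, HACT, MISBEHAV assumed to be ordered factors).
lgcm_tree <- mob(
  gpa ~ 1 | SEX + RACE + LOCA + HACT + FLUNCH + MISBEHAV +
    SES + MATEFF + BEHAVSCH + SCHCLI,
  data = hsls, fit = lgcm_fit,
  control = mob_control(ytype = "matrix",
                        minsize = 250,   # pre-pruning: at least 250 cases per node
                        prune = "BIC",   # post-pruning with BIC
                        ordinal = "max", # instability test for ordinal covariates
                        alpha = 0.05))
plot(lgcm_tree)
```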
There are two ways to determine the final size of the tree. The first option is to set a minimum sample size per subgroup in advance. The second option is a post-pruning strategy that prunes splits back using an information criterion such as BIC, as described earlier. The partitioning algorithm stops when there is no significant parameter instability at a pre-specified Type I error rate (e.g., α = 0.05 with a Bonferroni correction controlling the familywise error rate); however, with a dataset as large as this one, even small differences can become significant. To avoid a large number of terminal nodes with small sample sizes, a post-pruning method based on BIC is employed. Additionally, the minimum sample size per node is set to 250 to obtain stable parameter estimates for each subgroup; previous literature suggests a minimum sample size of 250 to correctly detect nonlinear change with four time points (Diallo et al., 2014). Indeed, when the minimum node size was set below 250 (e.g., 100, 150, or 200), the variance of the linear slope was estimated to be negative in a specific terminal node. The test statistic for the ordinal covariates was set to either the weighted double maximum (WDM_O) or the adapted maxLM_O, because the two gave the same results. However, using the test statistic for categorical covariates on the ordinal covariates, which is the default option in partykit, did not give the same results as the two statistics above; this difference is investigated in the simulation study. The following describes the specific procedure used to grow the tree with MOB.

First, the model was fitted with the default options, that is, without limiting the minimum subgroup size and without BIC post-pruning. The default option for the ordinal covariates was to use the test statistic for categorical covariates. The resulting tree produced 57 terminal nodes, which is far too many to interpret, and the smallest subgroup contained only 36 cases, too few to regard its parameter estimates as stable. With so many subgroups, the tree is neither meaningful to interpret nor concise enough to give a clear picture of the composition of the subgroups.

Second, the model was fitted with the same options as above but with the minimum sample size set to 250. The resulting tree produced 25 terminal nodes, fewer than before. The first splitting covariate was again SES. Still, there were many subgroups with overlapping trajectories, and the tree structure remained very complicated.

Next, the same model was fitted with both options, the minimum sample size of 250 and BIC post-pruning, again using the default treatment of the ordinal covariates. The resulting tree produced 9 terminal nodes, which is concise and meaningful to interpret. The first splitting covariate was once again SES. Interestingly, up to this point the ordinal covariates had never been used to split the subgroups; only the continuous and categorical covariates were selected.

Finally, the WDM_O test statistic for the ordinal covariates was added to the options above. The resulting tree produced 13 terminal nodes, more than in the previous analysis, and, more importantly, the ordinal covariates HACT and FLUNCH were now used to split the groups. The parameter estimates of MOB with the quadratic LGCM for each subgroup are presented in Table 6, whose first column gives the composition of each subgroup together with its node number. The substantive interpretation of the results can be carried out using Table 6 along with Figures 5 and 6.

First, the subgroup with the lowest mean intercept is the group of students who 1) spent more than one hour on extracurricular activities on a typical school day, 2) have a school motivation scale score of -1.2 or lower, 3) have a socio-economic status scale score of 0.6 or lower, and 4) attend a school in which more than 20% of students receive free or reduced-price lunch (Node 18: HACT > 1H & FLUNCH > 20% & BEHAVSCH ≤ -1.2 & SES ≤ 0.6). Looking at Figure 6, this group shows a dramatic decrease in GPA from the first to the second year and then an increase from the third to the fourth year. Second, the subgroup with the highest mean intercept is the group of female students who 1) spent more than one hour on extracurricular activities on a typical school day, 2) have a socio-economic status scale score greater than 0.5, and 3) attend a school in which 20% or fewer of the students receive free or reduced-price lunch (Node 14: HACT > 1H & FLUNCH ≤ 20% & SES > 0.5 & Female). In Figure 6, this group shows a slight decrease in GPA from the first to the second year and then a slight increase from the third to the fourth year.

There are also distinct subgroups with trajectories that differ notably from the others. In Figure 6, the purple and pink lines show a gradual increase in GPA across the school years; using the information in Table 6, these are Node 6 (purple) and Node 5 (pink). Node 6 is the group of students who 1) spent one hour or less on extracurricular activities on a typical school day, 2) have a school motivation scale score greater than -0.1, and 3) have a socio-economic status scale score greater than 0.1 (HACT ≤ 1H & BEHAVSCH > -0.1 & SES > 0.1).
The Node 5 is the group of students who 1) spent less than or equal to one hour on extracurricular activities on typical school day, 2) have larger than -0.1 36 scale score of the school motivation, and 3) have smaller than 0.1 scale score of socio-economic status (HACT ≤ 1H & BEHAVSCH > -0.1 & SES ≤ 0.1). The only difference between these two groups is the socio-economic status, making the difference of the mean intercept of GPA score in the first year. These groups spent less than one hour on extracurricular activities though, they had higher behavioral motivation in school on average, leading to gradual increase of GPA score across years. The interesting finding is that the overall GPA scores of these groups are higher than the ones of the node 3 group (HACT ≤ 1H & BEHAVSCH ≤ -0.1) across years. The node 3 group had lower behavioral motivation in school on average, leading to the second lowest overall GPA score across years regardless of their socio-economic status. 37 Table 6. Parameter estimates of the LGCM tree of MOB using all covariates Subgroup composition (Node #) 𝜇𝜇𝐼𝐼 𝜇𝜇𝑆𝑆 𝜇𝜇𝑄𝑄 𝜎𝜎𝐼𝐼2 𝜎𝜎𝑆𝑆2 𝜎𝜎𝑄𝑄2 𝜎𝜎𝐼𝐼𝐼𝐼 𝜎𝜎𝐼𝐼𝐼𝐼 𝜎𝜎𝑆𝑆𝑆𝑆 𝜎𝜎𝜀𝜀2 HACT > 1H & FLUNCH > 20% & 2.450 -0.261 0.083 0.456 0.067 0.007 -0.042 0.005 -0.018 0.189 BEHAVSCH ≤ -1.2 & SES ≤ 0.6 (18) HACT ≤ 1H & BEHAVSCH ≤ -0.1 (3) 2.474 -0.154 0.057 0.494 0.106 0.013 -0.018 -0.010 -0.031 0.128 HACT ≤ 1H & BEHAVSCH > -0.1 & SES ≤ 0.1 (5) 2.748 0.000 0.014 0.484 0.095 0.009 -0.039 -0.004 -0.024 0.113 HACT > 1H & FLUNCH > 20% & 2.771 -0.146 0.050 0.424 0.044 0.007 0.007 -0.015 -0.014 0.108 BEHAVSCH > -1.2 & SES ≤ 0.6 (19) HACT > 1H & FLUNCH ≤ 20% & 2.780 -0.153 0.053 0.383 0.040 0.006 0.014 -0.014 -0.012 0.091 BEHAVSCH > 0.0 & SES ≤ 0.5 (10) HACT > 1H & FLUNCH > 20% & BEHAVSCH > 0.3 & Others or Black or 2.866 -0.033 0.021 0.450 0.047 0.004 -0.032 -0.005 -0.010 0.090 Hispanic (22) HACT > 1H & FLUNCH > 20% & BEHAVSCH 3.001 -0.068 0.032 0.439 0.084 0.008 -0.040 -0.003 -0.021 0.073 > 0.3 & SES < -0.3 & Asian or White (24) HACT ≤ 1H & BEHAVSCH > -0.1 & SES > 0.1 (6) 3.138 -0.039 0.023 0.384 0.048 0.006 0.002 -0.012 -0.015 0.063 HACT > 1H & FLUNCH ≤ 20% & 3.177 -0.055 0.020 0.320 0.023 0.003 0.006 -0.010 -0.007 0.057 BEHAVSCH > 0.0 & SES ≤ 0.5 (11) HACT > 1H & FLUNCH > 20% & 3.198 -0.163 0.052 0.341 0.039 0.005 -0.004 -0.005 -0.011 0.072 BEHAVSCH ≤ 0.3 & SES > 0.6 (20) HACT > 1H & FLUNCH ≤ 20% & 3.240 -0.039 0.012 0.261 0.026 0.004 0.005 -0.007 -0.008 0.045 SES > 0.5 & Male (13) HACT > 1H & FLUNCH > 20% & BEHAVSCH 3.349 -0.088 0.033 0.260 0.033 0.004 0.017 -0.013 -0.010 0.063 > 0.3 & SES > -0.3 & Asian or White (25) HACT > 1H & FLUNCH ≤ 20% & 3.479 -0.063 0.022 0.206 0.041 0.004 -0.008 -0.006 -0.010 0.032 SES > 0.5 & Female (14) Note. FLUNCH, HACT, and MISBEHAV are ordinal variables. FLUNCH = categorized percentage of students enrolled in the school who receive free or reduced-price lunch, HACT = hours spent on extracurricular activities on typical school day. SES = socio- economic status scale score. BEHAVSCH = student’s school motivation scale score. 38 Figure 5. A LGCM tree of MOB with minimum size of 250 per node and pruning of BIC using 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 statistic (BLK = Black, HIP = Hispanic, OTS = Others, ASA = Asian, WHT = White, mi = 𝜇𝜇𝐼𝐼 ; mean intercept, ms = 𝜇𝜇𝑆𝑆 ; mean linear slope, mq = 𝜇𝜇𝑄𝑄 ; mean quadratic slope). 39 Figure 6. Expected GPA changes over four years of the 13 distinct subgroups (nodes). 40 CHAPTER 4. 
METHODS

This chapter describes how the performance of model-based recursive partitioning (MOB) with a latent growth curve model (LGCM) is evaluated under a population model. The simulated datasets are also analyzed with a growth mixture model (GMM) so that the results of the two approaches can be compared. Data generation and the MOB-with-LGCM analyses were conducted in R 4.0.5 (R Core Team, 2021) with the two packages partykit (Hothorn & Zeileis, 2015) and lavaan (Rosseel, 2012). MplusAutomation (Hallquist & Wiley, 2018) and Mplus version 8 (Muthén & Muthén, 1998-2017) were used together with R to fit the GMM.

4.1 Population model for data generation

A plausible population model was chosen for simulating datasets, based on the results of the empirical analysis described in Chapter 3. The parameter values are presented in Table 7. This study considers only differences in the mean intercept, because the results of the empirical analysis showed that most of the subgroups had similar change trajectories across time and that the random effects of the subgroups were very similar to one another (see Table 6); the simulation therefore mimics the empirical results. The number of true subgroups is set to four, and the four subgroups are determined by three covariates with an interaction effect. The study presumes that the four subgroups differ only in their intercepts, with the size of the difference governed by the effect size, and with little variance.

There are two reasons for this scenario. The first reason for considering a single population model is that the purpose of this study is to investigate the performance of MOB with LGCM under a few different conditions and the practical options a researcher needs to choose; this keeps the results concise to interpret and keeps the focus on the options that need to be investigated. The second reason is that the mean intercept differences required for a given effect size increase dramatically when the random effect of the intercept is large, which is unrealistic in practice; therefore, the random effect is kept small, again imitating the empirical results. All remaining parameters, including the linear and quadratic slopes and the variance-covariance components, are assumed to be the same across the four groups for the purposes of this study.

Table 7. Parameters of a population model

Parameters                      Coefficients
Means
  Intercept (mu_I)              2.800
  Linear slope (mu_S)           -0.100
  Quadratic slope (mu_Q)        0.060
Variances
  Intercept (sigma^2_I)         0.471
  Linear slope (sigma^2_S)      0.057
  Quadratic slope (sigma^2_Q)   0.007
Covariances
  I ~ S (sigma_IS)              0.001
  I ~ Q (sigma_IQ)              -0.012
  S ~ Q (sigma_SQ)              -0.016
Residuals (sigma^2_e)           0.084

The mean intercept was varied systematically across subgroups using Cohen's effect size to aid interpretation. Based on the chosen effect sizes, the mean intercept values were calculated so that adjacent subgroups differ by the corresponding amount. Cohen's d is the mean difference between two groups divided by the pooled standard deviation of the intercept,

d = \frac{\mu_{1I} - \mu_{2I}}{\sigma_I},   (4-1)

where I indicates the intercept. Since the effect sizes range from 0.2 to 1.0, the corresponding mean intercept differences are 0.14, 0.27, 0.41, 0.55, and 0.69. The expected GPA score changes of the four subpopulations over the four years are shown in Figure 7. While the mean intercept differs across the four subgroups, the change trajectories do not differ from each other.
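The intercept differences above follow directly from the intercept variance in Table 7: with sigma^2_I = 0.471, the implied standard deviation is sigma_I ≈ 0.686, and multiplying it by each effect size reproduces the listed values. A short R check, for illustration only:

```r
# Mean-intercept difference implied by Cohen's d: difference = d * sigma_I,
# using the population intercept variance from Table 7.
sigma_I <- sqrt(0.471)                  # about 0.686
d <- c(0.2, 0.4, 0.6, 0.8, 1.0)
round(d * sigma_I, 2)                   # 0.14 0.27 0.41 0.55 0.69
```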
The specific population model, with the parameters used to simulate the data for each subgroup at each effect size, is presented in Table 8.

Table 8. Parameters of fixed effects for each subgroup depending on effect size

Effect size   Subgroup   Intercept (mu_I)   Linear slope (mu_S)   Quadratic slope (mu_Q)
0.2           G1         2.660              -0.100                0.060
              G2         2.800              -0.100                0.060
              G3         2.940              -0.100                0.060
              G4         3.080              -0.100                0.060
0.4           G1         2.530              -0.100                0.060
              G2         2.800              -0.100                0.060
              G3         3.070              -0.100                0.060
              G4         3.340              -0.100                0.060
0.6           G1         2.390              -0.100                0.060
              G2         2.800              -0.100                0.060
              G3         3.210              -0.100                0.060
              G4         3.620              -0.100                0.060
0.8           G1         2.250              -0.100                0.060
              G2         2.800              -0.100                0.060
              G3         3.350              -0.100                0.060
              G4         3.900              -0.100                0.060
1.0           G1         2.110              -0.100                0.060
              G2         2.800              -0.100                0.060
              G3         3.490              -0.100                0.060
              G4         4.180              -0.100                0.060
Note. G1-G4 are the names of the subgroups. The variance-covariance components and residuals are the same across subgroups as in the population model.

Figure 7. The expected GPA changes over four years of the four subpopulations depending on effect size.

The four repeated outcomes are generated from the quadratic latent growth curve model for each subgroup. Three informative covariates of different types (categorical, ordinal, and continuous) are used to define the four subgroups through their interaction. In addition, four noise variables (two continuous, one ordinal, and one categorical) are generated independently of the subgroups. The true tree structure of the four subgroups with the cut points is visualized in Figure 8; the square boxes at the bottom of the tree show all parameter values for each subgroup. The first splitting covariate is the ordinal covariate with a cut point of 2. If the informative ordinal covariate is less than or equal to 2, cases fall into the left branch of the tree, which is then divided according to whether the informative continuous covariate is less than or equal to -0.7: the first subgroup from the left (continuous covariate ≤ -0.7) has the lowest mean intercept of 2.660, so this subgroup reflects an interaction between the ordinal and continuous covariates, and the second subgroup from the left (continuous covariate > -0.7) has a higher mean intercept than the first. Returning to the top of the tree, if the informative ordinal covariate is larger than 2, cases fall into the right branch, which is then divided into two subgroups based on the values of the informative categorical covariate. The splitting points of the informative covariates are the same across all simulation conditions.

Figure 8. A true tree structure of four subgroups using the effect size of 0.2.

4.2 Simulation design

This study considered five simulation factors: a) effect size (0.2, 0.4, 0.6, 0.8, and 1.0), b) sample size (1,000, 2,000, 5,000, 10,000, and 20,000), c) treatment of the ordinal covariate with different test statistics (chi-square, adapted maximum Lagrange multiplier, and weighted double maximum), d) minimum sample size per subgroup (250 vs. none), and e) post-pruning option (BIC vs. none). Fully crossing the factors (5 x 5 x 3 x 2 x 2) yields 300 conditions, and the number of replications was 100. Since factors c), d), and e) are options of MOB rather than of the data-generating process, 2,500 (5 x 5 x 100) datasets in total were simulated and then analyzed with MOB with LGCM and with GMM, respectively.
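For concreteness, the fully crossed design and the true subgroup assignment can be written down as below. This is an illustrative sketch only: the particular level of the categorical covariate used for the right-hand split is a placeholder assumption (the actual split is the one shown in Figure 8).

```r
# Fully crossed simulation design: 5 x 5 x 3 x 2 x 2 = 300 conditions.
conditions <- expand.grid(
  effect_size = c(0.2, 0.4, 0.6, 0.8, 1.0),
  n           = c(1000, 2000, 5000, 10000, 20000),
  ord_test    = c("chisq", "maxLM", "WDM"),  # treatment of ordinal covariate (MOB option)
  min_node    = c("none", "250"),            # pre-pruning: minimum subgroup size (MOB option)
  pruning     = c("none", "BIC")             # post-pruning (MOB option)
)
nrow(conditions)   # 300
5 * 5 * 100        # 2,500 simulated datasets (effect size x sample size x replications)

# True subgroup assignment following the tree in Figure 8; the level "A" of the
# categorical covariate used below is a placeholder assumption.
assign_group <- function(ordinal, continuous, categorical) {
  ifelse(ordinal <= 2,
         ifelse(continuous <= -0.7, "G1", "G2"),
         ifelse(categorical == "A", "G3", "G4"))
}
```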
The third purpose of this study is to investigate how well GMM extracts the true number of subgroups (latent classes, in GMM terms) from the same datasets generated under the population model. An unconditional GMM without covariates was fitted to the data to enumerate the number of latent classes (subgroups). The covariates were not used to predict the latent classes because it is infeasible to specify all possible interaction effects among the three informative and four noninformative covariates that were built into the population model. Moreover, GMM requires several additional steps to find the best fitting model among candidate models with one, two, three, four, or five classes. It is also recommended to increase the numbers of initial random starts and final-stage optimizations to avoid local maximum solutions and to ensure that the best log-likelihood is replicated; accordingly, 1,000 initial random starts and 100 final-stage optimizations were used. In sum, the five candidate models were fitted to each dataset and the best fitting model was chosen as the one with the lowest BIC and a significant Lo-Mendell-Rubin likelihood ratio test (Chen et al., 2017; Lo et al., 2001; Nylund et al., 2007). That is, if a model showed the lowest BIC value as well as a significant LMR-LRT result (p < 0.05), it was chosen as the best fitting model and its estimates were stored for evaluation. Entropy was also examined, but it was not used to determine the number of classes.
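A minimal sketch of this class-enumeration step through MplusAutomation is given below. It assumes the simulated repeated measures are stored as y1-y4 in a data frame simdat; the object and file names are placeholders, and the full set of Mplus options used in the study may differ.

```r
library(MplusAutomation)

# Unconditional GMM (quadratic growth) with k latent classes.
gmm_model <- function(k) {
  mplusObject(
    TITLE    = paste("Unconditional GMM with", k, "classes;"),
    VARIABLE = paste0("CLASSES = c(", k, ");"),
    ANALYSIS = "TYPE = MIXTURE;
                STARTS = 1000 100;",   # 1,000 random starts, 100 final-stage optimizations
    MODEL    = "%OVERALL%
                i s q | y1@0 y2@1 y3@2 y4@3;",
    OUTPUT   = "TECH11;",              # Lo-Mendell-Rubin likelihood ratio test
    usevariables = c("y1", "y2", "y3", "y4"),
    rdata    = simdat
  )
}

# Fit the one- to five-class models; BIC and the LMR-LRT are then compared
# across the fitted models to pick the best one.
fits <- lapply(1:5, function(k)
  mplusModeler(gmm_model(k), modelout = paste0("gmm_", k, "class.inp"), run = 1L))
```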
4.3 Evaluation criteria

For each condition, the following were evaluated: a) recovery of the true number of subgroups, b) overall classification accuracy and precision of the subgroups, c) accuracy of the splitting points of the covariates, d) average bias and root mean squared error (RMSE) of the focal fixed effect, the mean intercept, for each subgroup, and e) the desirable options for the test statistic for ordinal covariates, the post-pruning method using BIC, and the minimum sample size per subgroup.

First, the recovery of the true number of subgroups was evaluated with two statistics, the mean number of estimated subgroups (MNS) and the statistical power (SP) to correctly recover the true number of subgroups C_T, where \hat{C}_r denotes the number of subgroups estimated in the rth replication and R = 100 is the total number of replications. SP is the number of replications in which the estimated number of subgroups equals the true number; with R = 100, it can be read directly as a percentage. While the former statistic shows whether the number of subgroups is over- or underestimated, the latter gives the power to recover the true number of subgroups. These statistics, however, do not tell whether the composition of the subgroups is correctly recovered.

MNS = \frac{1}{R} \sum_{r=1}^{R} \hat{C}_r,   (4-2)

SP = \sum_{r=1}^{R} I(\hat{C}_r = C_T),   (4-3)

where I(\cdot) is the indicator function.

Second, the overall classification accuracy and precision of the subgroups were calculated from the confusion table presented in Table 9. The confusion matrix provides a tabular summary of the actual subgroup labels versus the predicted ones; since this study has four subgroups, it is a 4 x 4 matrix. Let N be the total number of samples in the confusion table and Diag = [N11, N22, N33, N44] the numbers of correctly classified samples per subgroup. The numbers of samples per true subgroup form the vector TN = [TN1, TN2, TN3, TN4], and the numbers of predictions per subgroup form the vector PN = [PN1, PN2, PN3, PN4].

Table 9. Confusion table

                               Predicted subgroup
                        G1      G2      G3      G4      Total
True         G1         N11     N12     N13     N14     TN1
subgroup     G2         N21     N22     N23     N24     TN2
             G3         N31     N32     N33     N34     TN3
             G4         N41     N42     N43     N44     TN4
             Total      PN1     PN2     PN3     PN4     N
Note. G1-G4 are names of subgroups. N11-N44 are the numbers of samples in each cell.

The overall classification accuracy is a metric of the overall performance of the model: the total number of correct predictions divided by the total number of predictions made for the dataset, as in equation (4-4). The precision is the proportion of the predictions for a subgroup that actually belong to that subgroup, obtained by dividing Diag by PN as in equation (4-5). The recall is the proportion of the individuals in a subgroup that are predicted to belong to it, obtained by dividing Diag by TN as in equation (4-6). The F1 score is a single number that balances precision and recall, namely their harmonic mean, as in equation (4-7). All metrics range between 0 and 1.

Accuracy = \frac{\sum Diag}{N} = \frac{N11 + N22 + N33 + N44}{N},   (4-4)

Precision_g = \frac{Diag_g}{PN_g},   (4-5)

Recall_g = \frac{Diag_g}{TN_g},   (4-6)

F1_g = \frac{2 \cdot Precision_g \cdot Recall_g}{Precision_g + Recall_g}.   (4-7)

Precision, recall, and F1 are subgroup-specific metrics, each producing a vector of four values (g = 1, ..., 4). Since this study aims to evaluate the overall performance of the classification, these three metrics are averaged over the four subgroups rather than reported per subgroup, yielding the macro-averaged precision (AP), recall (AR), and F1 score (AF1) in equations (4-8)-(4-10).

AP = \frac{1}{4} \sum_{g=1}^{4} Precision_g,   (4-8)

AR = \frac{1}{4} \sum_{g=1}^{4} Recall_g,   (4-9)

AF1 = \frac{1}{4} \sum_{g=1}^{4} F1_g.   (4-10)

The evaluation was then conducted by averaging each metric over the replications of the simulation study, giving the mean accuracy (MA), mean macro-averaged precision (MAP), mean macro-averaged recall (MAR), and mean macro-averaged F1 (MAF1) in equations (4-11)-(4-14).

MA = \frac{1}{R} \sum_{r=1}^{R} Accuracy_r,   (4-11)

MAP = \frac{1}{R} \sum_{r=1}^{R} AP_r,   (4-12)

MAR = \frac{1}{R} \sum_{r=1}^{R} AR_r,   (4-13)

MAF1 = \frac{1}{R} \sum_{r=1}^{R} AF1_r.   (4-14)
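A compact R sketch of these classification metrics is shown below, assuming that the estimated terminal nodes have already been matched to the true subgroups and that true and pred are factors with the same four levels; the function name is illustrative.

```r
# Confusion-matrix metrics for recovered subgroups: overall accuracy and
# macro-averaged precision, recall, and F1 (averaged over the four subgroups).
classification_metrics <- function(true, pred) {
  cm      <- table(true, pred)     # rows = true subgroups, columns = predictions
  correct <- diag(cm)              # Diag = [N11, N22, N33, N44]
  TN      <- rowSums(cm)           # cases per true subgroup
  PN      <- colSums(cm)           # predictions per subgroup
  precision <- correct / PN
  recall    <- correct / TN
  f1        <- 2 * precision * recall / (precision + recall)
  c(accuracy        = sum(correct) / sum(cm),
    macro_precision = mean(precision),
    macro_recall    = mean(recall),
    macro_f1        = mean(f1))
}
```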
Third, it is also important to verify that the composition of the estimated subgroups is correct. This is evaluated by checking the splitting points of the covariates; the accuracy of the splitting (cut) points is summarized by the mean of the estimated split points \hat{S}_r over replications,

MS = \frac{1}{R} \sum_{r=1}^{R} \hat{S}_r.   (4-15)

Fourth, it is necessary to examine whether the parameter estimates for each subgroup are unbiased and efficient, which is evaluated with the bias and RMSE of the estimates. Since the population model focuses on differences in the mean intercept, only the mean intercept was evaluated, using the two equations below, where \hat{\theta}_r is the estimate in the rth replication and \theta_T is the true parameter value.

Bias = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_r - \theta_T),   (4-16)

RMSE = \sqrt{\frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_r - \theta_T)^2}.   (4-17)

Fifth, a parallel simulation is conducted with the unconditional GMM on the same datasets, and the results of MOB are compared with those of GMM. The insights from this comparison reveal how the different approaches can be useful.

CHAPTER 5. RESULTS

This chapter presents the results of the simulation study, organized into six sections corresponding to the research questions described in Chapter 1. The first four sections report the results of the comprehensive simulation study investigating the performance of MOB with LGCM under the population model and the considered conditions: statistical power to detect the true number of subgroups, classification accuracy and precision, accuracy of the splitting points of the covariates, and bias and root mean squared error of the parameter estimates. The fifth section identifies desirable options for the test statistic for ordinal covariates, the post-pruning option using BIC, and the minimum sample size per subgroup. The last section briefly presents the comparison between MOB and GMM, focusing primarily on the number of subgroups.

5.1 Statistical power to recover the true number of subgroups

This section answers the second research question: how well does MOB determine the true number of subgroups for the given population model? Table 10 shows the average number of estimated subgroups and the statistical power to recover the true number of subgroups. The analysis was conducted using the WDM_O test statistic, the weighted double maximum for ordinal covariates. The results show that, with the post-pruning method using BIC, the statistical power increases as the effect size and the sample size increase; power values of 95-100 indicate conditions that recovered the true number of subgroups in nearly all replications. Without the BIC pruning option, there was a tendency to over-extract the number of subgroups. Even without a minimum sample size per subgroup, using BIC as the pruning method recovered the true number of subgroups when the effect size was at least 0.4 and the sample size was at least 10,000.

With the small effect size of 0.2, the sample sizes of 1,000 and 2,000 were not sufficient to recover the true number of subgroups; the average number of estimated subgroups ranged from 1.00 to 2.15 under these conditions. As the sample size increased to 5,000, 10,000, and 20,000, the average number of estimated subgroups and the power increased as well, but the power ranged only from 75% to 84%, which is not enough to correctly detect the true number of subgroups. When the effect size is this small, the post-pruning option reduces the number of subgroups too far, and the power is 0%.

With the medium effect sizes of 0.4 and 0.6 and the BIC pruning option, sample sizes of 10,000 and 5,000, respectively, were sufficient to recover the true number of subgroups. With the effect size of 0.4, the mean estimated number of subgroups for the sample size of 2,000 with a minimum subgroup size of 250 and without BIC pruning was 4.0, equal to the true number of subgroups, and the power to recover it was 94%.
With the large effect sizes of 0.8 and 1.0, the results for the sample sizes of 1,000 and 2,000 show that the true number of subgroups was perfectly recovered with the minimum subgroup size of 250 and without pruning. This behavior of the options is described in detail in the fifth section.

Table 10. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover the true number of subgroups using WDM_O statistics

                               N=1,000        N=2,000        N=5,000        N=10,000       N=20,000
Effect size  Min #  Pruning    MNS     P      MNS     P      MNS     P      MNS     P      MNS     P
0.2          No     No         2.13    1      2.15    2      3.47    35     4.27    78     4.24    80
                    BIC        1.00    0      1.00    0      2.00    0      2.00    0      2.00    0
             250    No         2.02    0      2.09    0      3.50    41     4.18    84     4.33    75
                    BIC        1.00    0      1.03    0      2.00    0      2.00    0      2.00    0
0.4          No     No         2.48    8      4.13    84     4.17    86     4.16    85     4.19    84
                    BIC        2.00    0      2.00    0      2.04    0      4.00    100    4.00    100
             250    No         2.88    17     4.00    94     4.21    81     4.11    90     4.14    87
                    BIC        2.00    0      2.00    0      2.05    0      4.00    100    4.00    100
0.6          No     No         4.16    84     4.22    83     4.23    80     4.27    79     4.27    78
                    BIC        2.00    0      2.11    1      4.00    100    4.00    100    4.00    100
             250    No         3.98    98     4.03    97     4.15    87     4.23    84     4.14    87
                    BIC        2.00    0      2.12    3      4.00    100    4.00    100    4.00    100
0.8          No     No         4.16    84     4.32    74     4.21    82     4.23    78     4.18    84
                    BIC        2.17    2      4.00    100    4.00    100    4.00    100    4.00    100
             250    No         4.00    100    4.02    98     4.19    85     4.12    89     4.21    83
                    BIC        2.15    2      4.00    100    4.00    100    4.00    100    4.00    100
1.0          No     No         4.12    88     4.26    79     4.25    77     4.22    84     4.19    85
                    BIC        3.99    99     4.00    100    4.00    100    4.00    100    4.00    100
             250    No         4.00    100    4.00    100    4.17    84     4.27    78     4.17    83
                    BIC        4.00    100    4.00    100    4.00    100    4.00    100    4.00    100
Note. WDM_O = test statistic of the weighted double maximum for ordinal covariates. Min # = minimum sample size per subgroup. Pruning = post-pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power (%).

5.2 Overall classification accuracy and precision

This section answers the third research question: how accurately and precisely does MOB classify the true subgroups for the given population model? Tables 11-15 show the classification accuracy and the macro-averaged precision, recall, and F1 score using the WDM_O test statistic, the weighted double maximum for ordinal covariates. The tables are presented in order of effect size. Since the values of recall and F1 were very similar to those of accuracy and precision, this section focuses on accuracy and precision.

First, looking at Table 11, most of the mean classification accuracies are below 0.90. The maximum classification accuracy was 0.90, obtained with a sample size of 20,000 and neither the minimum sample size nor the BIC pruning option. Some cells for the sample sizes of 1,000 and 2,000 are empty because those conditions never recovered the four true subgroups. As the sample size increases, the classification accuracy, precision, recall, and F1 score tend to increase. With the effect size of 0.2, the classification accuracy and precision were around 0.90 and 0.98, respectively, without the BIC pruning option. Even though the power to recover the true number of subgroups ranged from 75% to 84% without the BIC pruning option for the sample sizes of 10,000 and 20,000 under this effect size, the classification accuracy and precision ranged from 0.89 to 0.99.

Second, Table 12 shows the results for the effect size of 0.4. For the cells that perfectly recovered the true number of subgroups, the classification accuracy and precision were also 100%. These conditions are the sample sizes of 10,000 and 20,000 with the BIC pruning option.
Interestingly, classification accuracy and precision were 0.97 and 0.98, respectively, for the sample size of 2,000 using minimum sample size option without pruning method. As the effect 55 size increases from 0.4 to 0.6, the classification accuracy and precision tend to increase as well under relatively small sample sizes, such as 1,000 and 2,000 with the option of setting minimum sample size of 250. As presented in Table 13, the performance of MOB in terms of the classification accuracy and precision was almost perfect under smaller sample sizes (1,000 and 2,000) with the option of setting the minimum sample size of 250 rather than with BIC pruning option. However, the performance was reversed under larger sample sizes (5,000 - 20,000) with the BIC pruning option regardless of the minimum sample size of 250. Third, the results of large effect sizes (0.8 and 1.0) are presented in Tables 14 and 15, respectively. There were conditions that perfectly classified the subgroups with smaller sample sizes, such as 1,000 and 2,000. For the sample size of 1,000, MOB performed perfectly with the option of minimum sample size of 250 without BIC pruning when the effect sizes were 0.8 and 1.0. As the sample size increases to 2,000, MOB performs perfectly except one condition that there are no options of setting minimum sample size and BIC pruning. As the sample size increases more than 5,000, MOB requires pruning option using BIC to perfectly classify the subgroups. This is because MOB tends to over-estimate the number of subgroups without pruning with the larger sample size as presented in Table 10. 56 Table 11. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.2) Effect Size = 0.2 Sample Size Min # Pruning MA MAP MAR MAF1 1,000 No No 0.27 0.27 0.51 0.51 BIC 0.25 0.25 - - 250 No 0.25 0.25 0.50 0.50 BIC 0.25 0.25 - - 2,000 No No 0.27 0.27 0.51 0.51 BIC 0.25 0.25 - - 250 No 0.27 0.27 0.52 0.52 BIC 0.25 0.25 - - 5,000 No No 0.66 0.72 0.79 0.79 BIC 0.25 0.25 0.50 0.50 250 No 0.69 0.74 0.81 0.81 BIC 0.25 0.25 0.50 0.50 10,000 No No 0.89 0.97 0.89 0.89 BIC 0.25 0.25 0.50 0.50 250 No 0.90 0.95 0.90 0.90 BIC 0.25 0.25 0.50 0.50 20,000 No No 0.90 0.99 0.90 0.90 BIC 0.25 0.25 0.50 0.50 250 No 0.87 0.95 0.87 0.87 BIC 0.25 0.25 0.50 0.50 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 57 Table 12. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.4) Effect size = 0.4 Sample Size Min # Pruning MA MAP MAR MAF1 1,000 No No 0.35 0.35 0.58 0.58 BIC 0.25 0.25 0.50 0.50 250 No 0.61 0.64 0.82 0.82 BIC 0.25 0.25 0.50 0.50 2,000 No No 0.91 0.95 0.92 0.92 BIC 0.25 0.25 0.50 0.50 250 No 0.97 0.98 0.97 0.97 BIC 0.25 0.25 0.50 0.50 5,000 No No 0.92 0.97 0.92 0.92 BIC 0.27 0.27 0.52 0.52 250 No 0.92 0.97 0.92 0.92 BIC 0.26 0.25 0.51 0.51 10,000 No No 0.93 0.96 0.93 0.93 BIC 1.00 1.00 1.00 1.00 250 No 0.94 0.96 0.94 0.94 BIC 1.00 1.00 1.00 1.00 20,000 No No 0.91 0.95 0.91 0.91 BIC 1.00 1.00 1.00 1.00 250 No 0.92 0.96 0.92 0.92 BIC 1.00 1.00 1.00 1.00 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 58 Table 13. 
Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.6) Effect size=0.6 Sample Size Min # Pruning MA MAP MAR MAF1 1,000 No No 0.88 0.92 0.89 0.89 BIC 0.25 0.25 0.50 0.50 250 No 1.00 1.00 1.00 1.00 BIC 0.25 0.25 0.50 0.50 2,000 No No 0.90 0.95 0.90 0.90 BIC 0.29 0.29 0.54 0.54 250 No 0.98 0.99 0.98 0.98 BIC 0.29 0.29 0.53 0.53 5,000 No No 0.90 0.94 0.90 0.90 BIC 1.00 1.00 1.00 1.00 250 No 0.93 0.96 0.93 0.93 BIC 1.00 1.00 1.00 1.00 10,000 No No 0.87 0.93 0.87 0.87 BIC 1.00 1.00 1.00 1.00 250 No 0.91 0.95 0.91 0.91 BIC 1.00 1.00 1.00 1.00 20,000 No No 0.89 0.94 0.89 0.89 BIC 1.00 1.00 1.00 1.00 250 No 0.92 0.99 0.92 0.92 BIC 1.00 1.00 1.00 1.00 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 59 Table 14. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=0.8) Effect size=0.8 Sample Size Min # Pruning MA MAP MAR MAF1 1,000 No No 0.91 0.95 0.91 0.91 BIC 0.29 0.29 0.54 0.54 250 No 1.00 1.00 1.00 1.00 BIC 0.30 0.30 0.55 0.55 2,000 No No 0.89 1.00 0.89 0.89 BIC 1.00 1.00 1.00 1.00 250 No 0.98 0.99 0.98 0.98 BIC 1.00 1.00 1.00 1.00 5,000 No No 0.92 0.98 0.92 0.92 BIC 1.00 1.00 1.00 1.00 250 No 0.89 0.92 0.89 0.89 BIC 1.00 1.00 1.00 1.00 10,000 No No 0.90 0.96 0.90 0.90 BIC 1.00 1.00 1.00 1.00 250 No 0.95 0.97 0.95 0.95 BIC 1.00 1.00 1.00 1.00 20,000 No No 0.92 0.97 0.92 0.92 BIC 1.00 1.00 1.00 1.00 250 No 0.91 0.99 0.91 0.91 BIC 1.00 1.00 1.00 1.00 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 60 Table 15. Means of classification accuracy (MA), macro-averaged precision (MAP), recall (MAR), and F1 (MAF1) using WDMO statistics (Effect size=1.0) Effect size=1.0 Sample Size Min # Pruning MA MAP MAR MAF1 1,000 No No 0.96 0.99 0.96 0.96 BIC 1.00 1.00 1.00 1.00 250 No 1.00 1.00 1.00 1.00 BIC 1.00 1.00 1.00 1.00 2,000 No No 0.87 0.93 0.87 0.87 BIC 1.00 1.00 1.00 1.00 250 No 1.00 1.00 1.00 1.00 BIC 1.00 1.00 1.00 1.00 5,000 No No 0.90 0.98 0.90 0.90 BIC 1.00 1.00 1.00 1.00 250 No 0.91 0.94 0.91 0.91 BIC 1.00 1.00 1.00 1.00 10,000 No No 0.92 0.96 0.92 0.92 BIC 1.00 1.00 1.00 1.00 250 No 0.87 0.92 0.87 0.87 BIC 1.00 1.00 1.00 1.00 20,000 No No 0.93 0.97 0.93 0.93 BIC 1.00 1.00 1.00 1.00 250 No 0.90 0.96 0.90 0.90 BIC 1.00 1.00 1.00 1.00 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 61 5.3 Accuracy of splitting points of covariates This section investigates how accurate the composition of the subgroups is by evaluating the mean of the splitting points of covariates. Firstly, it was investigated if the noise covariates were used for splitting at any stages. The results show that the noise covariates were not selected as the splitting covariates under all conditions. Thus, those results are not presented in this section. The averages of the splitting points of the informative covariates are presented in Table 16. If the number of the detected subgroups is not equal to four, it is not possible to calculate the means of all splitting points of the informative covariates. 
This is because not only is the structure of the tree different from the true tree, but also the number of replications is not sufficient to calculate the mean. Since the first splitting point is 2 for the informative (true) ordinal covariate, the means of them are calculated across replications if the number of the detected subgroups is larger than 1. If MOB did not differentiate the true subgroups, leading to produce just one subgroup, there was not the splitting point. The splitting point of the ordinal covariate is presented in the column of O in Table 16. The results show that the true splitting point of 2 for the true informative ordinal covariate was correctly detected for the first splitting under most of the conditions except two conditions of the effect size of 0.2 and the sample sizes of 1,000 and 2 000 with BIC pruning option. The second true splitting is done by both a continuous covariate and a categorical covariate as described in Figure 8. The true splitting point of the continuous covariate is -0.7. The composition of the categorical covariate was also investigated though, the splitting composition was perfect if the average estimated number of subgroups was four and correctly recovered. If the number of the determined subgroups under some conditions is larger than 4, such as 5 or 6 or 7, the subgroup was divided into additional subgroups according to the values 62 of the continuous covariate. Table 16. Mean of splitting points of covariates (MS) using WDMO statistics N=1,000 N=2,000 N=5,000 N=10,000 N=20,000 Effect Size Min # Pruning O C O C O C O C O C 0.2 No No 2 - 2 - 2 - 2 - 2 - BIC - - - - 2 - 2 - 2 - 250 No 2 - 2 - 2 - 2 - 2 - BIC - - - - 2 - 2 - 2 - 0.4 No No 2 - 2 - 2 - 2 - 2 - BIC 2 - 2 - 2 - 2 -0.69 2 -0.70 250 No 2 - 2 - 2 - 2 - 2 - BIC 2 - 2 - 2 - 2 -0.69 2 -0.70 0.6 No No 2 - 2 - 2 - 2 - 2 - BIC 2 - 2 - 2 -0.69 2 -0.70 2 -0.70 250 No 2 -0.70 2 -0.64 2 - 2 - 2 - BIC 2 - 2 - 2 -0.70 2 -0.70 2 -0.70 0.8 No No 2 - 2 - 2 - 2 - 2 - BIC 2 - 2 -0.68 2 -0.69 2 -0.70 2 -0.70 250 No 2 -0.70 2 -0.65 2 - 2 - 2 - BIC 2 - 2 -0.67 2 -0.70 2 -0.70 2 -0.70 1.0 No No 2 - 2 - 2 - 2 - 2 - BIC 2 -0.74 2 -0.67 2 -0.69 2 -0.70 2 -0.70 250 No 2 -0.70 2 -0.68 2 - 2 - 2 - BIC 2 -0.70 2 -0.70 2 -0.70 2 -0.70 2 -0.70 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. O = splitting point of ordinal covariate. C = splitting point of continuous covariate. 63 5.4 Bias and RMSE of parameter estimates To further check the unbiasedness and efficiency of the parameter estimates, bias and root mean squared error (RMSE) were calculated. Since it is infeasible to calculate the bias and RMSE if the number of estimated subgroups is not the same with the true number of subgroups, only the results of the conditions that perfectly recovered the true number of subgroups are presented in Tables 17 - 20 for bias and 21 - 24 for RMSE, respectively. Also, means of intercept, linear slope, and quadratic slope with their variances are presented in this section because the biases of covariances and residuals are close to zero across all conditions. When the effect size is 0.4, the conditions that perfectly recovered the number of true subgroups are when the sample sizes are 10,000 and 20,000 with BIC post pruning option. Looking at the bias of these conditions in Table 17, the range of all bias in this condition is from -0.003 to 0.003. 
Focusing on the values of the mean intercept, the range of bias for the mean intercept is from -0.003 to 0.002. Since the values of the mean intercept parameters with the effect size of 0.4 are 2.53 (G1), 2.80 (G2), 3.07 (G3), and 3.34 (G4) for each true subgroup, the relative magnitude of bias (which is calculated by the bias divided by the corresponding parameter value times 100) is ranging from -0.09% to 0.09%. Table 21 shows the RMSE of these conditions, and the values are close to zeros. When the effect size is 0.6, the conditions that perfectly recovered the number of true subgroups are when the sample sizes are 5,000, 10,000, and 20,000 with BIC post pruning option. Looking at the bias of these conditions in Table 18, the range of all bias in this condition is from -0.005 to 0.004. Focusing on the values of the mean intercept, the range of bias for the mean intercept is from -0.002 to 0.002. Since the values of the mean intercept parameters with the effect size of 0.6 are 2.39 (G1), 2.80 (G2), 3.21 (G3), and 3.62 (G4) for each true subgroup, 64 the relative magnitude of bias is ranging from -0.21% to 0.17%. Table 22 shows the RMSE in these conditions, and the values are very close to zeros. When the effect size is 0.8, the conditions that perfectly recovered the number of true subgroups additionally included the sample sizes of 1,000 and 2,000 in certain pruning methods. Specifically, when the sample size is 1,000, setting the minimum sample size of 250 without the post pruning option using BIC recovered the true number of subgroups. Other conditions recovered the true number of subgroups include the post pruning option using BIC regardless of the minimum sample size of 250 across different sample sizes (2,000, 5,000, 10,000, and 20,000). Looking at the bias of these conditions in Table 19, the range of all bias in this condition is from -0.011 to 0.01. Focusing on the values of the mean intercept, the range of bias for the mean intercept is from -0.011 to 0.008. Since the values of the mean intercept parameters with the effect size of 0.8 are 2.25 (G1), 2.80 (G2), 3.35 (G3), and 3.90 (G4) for each true subgroup, the relative magnitude of bias is ranging from -0.29% to 0.24%. Table 23 shows the RMSE in these conditions, and the values are very close to zeros. When the effect size is 1.0, the conditions that perfectly recovered the number of true subgroups included two more conditions than the ones of the effect size of 0.8. These conditions are when 1) sample size of 1,000 setting the minimum size of 250 per subgroup with the post pruning method using BIC, 2) sample size of 2,000 setting the minimum size of 250 per subgroup without the pruning method using BIC. Likewise, other conditions recovered the true number of subgroups include the post pruning option using BIC regardless of the minimum sample size of 250 across all sample sizes (1,000, 2,000, 5,000, 10,000, and 20,000). Looking at the bias of these conditions in Table 20, the range of all bias in these conditions is from -0.012 to 0.013. Focusing on the values of the mean intercept, the range of bias for the mean intercept is 65 from -0.009 to 0.008. Since the values of the mean intercept parameters with the effect size of 1.0 are 2.11 (G1), 2.80 (G2), 3.49 (G3), and 4.18 (G4) for each subgroup, the relative magnitude of bias is ranging from -0.38% to 0.24%. Table 24 shows the RMSE in these conditions, and the values are very close to zeros. Table 17. 
Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.4) N Min # Pruning Subgroups mi ms mq vi vs vq 10,000 No BIC G1 0.002 -0.001 0.000 0.003 -0.002 0.000 G2 0.002 0.001 0.000 0.002 0.001 0.000 G3 0.002 -0.001 0.000 0.003 0.001 0.000 G4 0.001 0.001 0.000 0.000 0.002 0.000 250 BIC G1 -0.002 0.000 0.000 0.002 -0.002 0.000 G2 -0.001 -0.001 0.000 0.001 -0.002 0.000 G3 -0.002 0.000 0.000 0.002 -0.002 0.000 G4 -0.003 0.002 0.000 0.003 0.001 0.000 20,000 No BIC G1 0.001 0.000 0.000 0.002 0.000 0.000 G2 0.002 -0.002 0.000 0.001 0.000 0.000 G3 0.002 0.000 0.000 0.000 0.001 0.000 G4 0.001 0.001 0.000 0.000 -0.001 0.000 250 BIC G1 0.001 0.000 0.000 -0.001 0.002 0.000 G2 0.001 0.001 0.000 0.000 0.000 0.000 G3 0.001 0.001 0.000 -0.001 0.000 0.000 G4 0.001 -0.001 0.000 -0.001 0.000 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 66 Table 18. Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.6) N Min # Pruning Subgroups mi ms mq vi vs vq 5,000 No BIC G1 0.004 0.001 -0.001 0.003 0.001 0.000 G2 0.003 0.000 0.000 0.004 -0.001 0.000 G3 0.001 0.000 0.000 0.003 -0.001 0.000 G4 0.001 0.002 -0.001 0.002 -0.004 0.000 250 BIC G1 -0.005 0.001 0.000 0.001 -0.001 0.000 G2 -0.001 -0.001 0.000 0.004 0.000 0.000 G3 -0.002 -0.002 0.001 0.003 -0.004 0.000 G4 -0.004 0.001 0.000 -0.001 0.003 0.000 10,000 No BIC G1 0.001 -0.002 0.001 -0.001 0.000 0.000 G2 0.000 0.000 0.000 -0.001 0.000 0.000 G3 0.002 -0.002 0.001 -0.002 0.000 0.000 G4 -0.001 0.003 -0.001 -0.004 0.000 0.000 250 BIC G1 -0.001 -0.001 0.001 -0.003 0.002 0.000 G2 -0.001 0.000 0.000 -0.002 -0.001 0.000 G3 -0.002 0.001 0.000 -0.005 0.001 0.000 G4 0.000 -0.001 0.000 -0.004 0.000 0.000 20,000 No BIC G1 0.002 0.001 0.000 -0.002 0.000 0.000 G2 0.003 0.001 0.000 -0.001 0.000 0.000 G3 0.004 0.001 0.000 -0.001 0.000 0.000 G4 0.003 0.001 -0.001 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 -0.001 -0.002 0.000 G2 -0.001 -0.001 0.000 0.000 0.001 0.000 G3 -0.001 0.001 0.000 0.000 0.000 0.000 G4 -0.001 -0.001 0.000 -0.002 -0.001 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 67 Table 19. 
Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.8) N Min # Pruning Subgroups mi ms mq vi vs vq 1000 250 No G1 0.004 0.005 -0.001 -0.004 -0.002 0.000 G2 0.005 0.006 -0.002 0.010 0.001 0.000 G3 0.008 0.002 0.000 -0.003 -0.002 0.000 G4 0.007 0.005 -0.002 0.003 -0.005 0.000 2000 No BIC G1 -0.006 -0.002 0.001 0.000 -0.003 0.000 G2 -0.008 0.000 0.001 0.000 0.000 0.000 G3 -0.005 0.002 -0.001 0.001 -0.004 0.000 G4 -0.011 0.006 -0.002 0.002 -0.005 -0.001 250 BIC G1 -0.002 -0.005 0.002 -0.007 -0.006 -0.001 G2 -0.004 -0.003 0.002 -0.006 -0.001 0.000 G3 0.001 -0.010 0.003 -0.008 -0.004 0.000 G4 -0.003 -0.004 0.001 -0.004 -0.001 0.000 5000 No BIC G1 -0.003 0.004 -0.001 0.002 -0.001 0.000 G2 0.000 0.002 -0.001 0.001 -0.001 0.000 G3 0.000 0.002 -0.001 0.003 -0.002 0.000 G4 0.000 -0.003 0.001 0.003 0.001 0.000 250 BIC G1 0.004 0.002 -0.001 -0.004 -0.002 0.000 G2 0.005 -0.001 -0.001 -0.003 0.000 0.000 G3 0.003 0.001 0.000 -0.002 0.000 0.000 G4 0.004 0.000 0.000 -0.004 -0.001 0.000 10000 No BIC G1 0.001 0.000 0.000 0.001 0.000 0.000 G2 0.000 0.003 -0.001 0.002 -0.002 0.000 G3 0.001 0.000 0.000 0.000 0.000 0.000 G4 -0.001 0.003 0.000 0.002 -0.003 0.000 250 BIC G1 -0.001 -0.002 0.001 -0.002 -0.001 0.000 G2 0.001 -0.002 0.001 -0.001 0.001 0.000 G3 0.000 -0.001 0.000 -0.003 0.002 0.000 G4 0.000 -0.002 0.001 -0.004 0.001 0.000 20000 No BIC G1 -0.001 -0.001 0.000 0.001 0.000 0.000 G2 -0.001 -0.001 0.000 0.001 0.000 0.000 G3 -0.001 -0.001 0.000 0.002 0.000 0.000 G4 -0.001 0.000 0.000 0.002 0.000 0.000 250 BIC G1 -0.001 -0.001 0.000 0.001 0.001 0.000 G2 -0.001 0.000 0.000 -0.001 -0.001 0.000 G3 0.000 -0.001 0.000 -0.001 0.000 0.000 G4 -0.001 0.000 0.000 -0.001 0.001 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 68 Table 20. 
Bias of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=1.0) N Min # Pruning Subgroups mi ms mq vi vs vq 1000 250 No G1 0.004 0.001 -0.001 0.002 -0.004 0.000 No G2 -0.001 0.002 0.000 0.007 0.004 0.000 No G3 0.004 -0.007 0.003 0.000 -0.002 0.000 No G4 0.006 -0.012 0.004 0.004 0.001 0.000 250 BIC G1 -0.005 0.005 0.000 0.007 -0.002 0.000 BIC G2 0.001 -0.004 0.002 0.009 -0.001 0.000 BIC G3 -0.002 0.001 0.000 0.013 0.002 0.000 BIC G4 0.000 -0.004 0.002 0.003 0.002 0.000 2000 No BIC G1 -0.008 0.003 0.000 0.004 0.003 0.000 BIC G2 -0.009 0.003 -0.001 0.001 -0.002 0.000 BIC G3 -0.007 -0.001 0.001 0.001 -0.001 0.000 BIC G4 -0.007 -0.001 0.000 -0.002 0.001 0.000 250 No G1 -0.005 0.006 -0.002 0.004 0.000 0.000 No G2 -0.004 0.004 -0.001 0.000 -0.003 0.000 No G3 -0.002 -0.001 0.001 0.007 -0.003 0.000 No G4 -0.001 0.000 0.000 0.003 0.001 0.000 250 BIC G1 0.001 -0.005 0.002 -0.004 -0.007 0.000 BIC G2 -0.001 0.003 -0.001 -0.007 -0.007 -0.001 BIC G3 0.000 -0.003 0.001 -0.003 -0.002 0.000 BIC G4 -0.002 0.001 0.000 0.001 -0.003 0.000 5000 No BIC G1 -0.001 -0.001 0.000 -0.004 -0.002 0.000 BIC G2 -0.001 -0.001 0.000 -0.005 0.001 0.000 BIC G3 -0.001 -0.002 0.001 -0.003 -0.002 0.000 BIC G4 0.001 -0.005 0.001 -0.004 0.002 0.000 250 BIC G1 -0.001 0.001 0.000 0.003 -0.003 0.000 BIC G2 0.000 0.000 0.000 0.000 -0.006 0.000 BIC G3 0.002 -0.001 0.000 0.004 -0.002 0.000 BIC G4 0.000 0.001 0.000 0.001 -0.001 0.000 10000 No BIC G1 0.002 0.000 0.000 -0.003 0.001 0.000 BIC G2 0.004 -0.005 0.001 0.000 0.000 0.000 BIC G3 0.002 -0.002 0.000 0.000 0.002 0.000 BIC G4 0.002 0.000 0.000 -0.002 0.002 0.000 250 BIC G1 0.001 -0.002 0.001 -0.001 -0.001 0.000 BIC G2 0.000 -0.001 0.001 -0.002 -0.003 0.000 BIC G3 0.003 -0.002 0.001 0.000 -0.001 0.000 BIC G4 0.000 -0.001 0.000 -0.001 -0.002 0.000 69 Table 20 (cont’d) 20000 No BIC G1 0.003 0.000 0.000 0.000 0.000 0.000 BIC G2 0.002 0.000 0.000 0.000 0.000 0.000 BIC G3 0.002 -0.001 0.000 0.000 -0.001 0.000 BIC G4 0.003 -0.001 0.000 0.001 0.000 0.000 250 BIC G1 0.001 0.001 0.000 0.003 0.000 0.000 BIC G2 0.002 0.001 0.000 0.001 -0.001 0.000 BIC G3 0.003 0.000 0.000 0.003 -0.001 0.000 BIC G4 0.001 0.001 0.000 0.001 0.002 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. Table 21. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.4) N Min # Pruning Subgroups mi ms mq vi vs vq 10,000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 20,000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. 
vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 70 Table 22. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.6) N Min # Pruning Subgroups mi ms mq vi vs vq 5,000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.001 0.000 0.000 250 BIC G1 0.001 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.001 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 10,000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 20,000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 71 Table 23. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=0.8) N Min # Pruning Subgroups mi ms mq vi vs vq 1000 250 No G1 0.002 0.001 0.000 0.004 0.001 0.000 G2 0.002 0.001 0.000 0.004 0.001 0.000 G3 0.002 0.001 0.000 0.003 0.001 0.000 G4 0.002 0.001 0.000 0.003 0.001 0.000 2000 No BIC G1 0.001 0.000 0.000 0.001 0.001 0.000 G2 0.002 0.000 0.000 0.001 0.001 0.000 G3 0.001 0.001 0.000 0.001 0.000 0.000 G4 0.002 0.001 0.000 0.001 0.000 0.000 250 BIC G1 0.001 0.001 0.000 0.001 0.000 0.000 G2 0.001 0.000 0.000 0.002 0.000 0.000 G3 0.001 0.001 0.000 0.001 0.001 0.000 G4 0.001 0.000 0.000 0.001 0.001 0.000 5000 No BIC G1 0.001 0.000 0.000 0.000 0.000 0.000 G2 0.001 0.000 0.000 0.000 0.000 0.000 G3 0.001 0.000 0.000 0.000 0.000 0.000 G4 0.001 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.001 0.000 0.000 0.000 0.000 0.000 G2 0.001 0.000 0.000 0.001 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.001 0.000 0.000 0.001 0.000 0.000 10000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 20000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 G2 0.000 0.000 0.000 0.000 0.000 0.000 G3 0.000 0.000 0.000 0.000 0.000 0.000 G4 0.000 0.000 0.000 0.000 0.000 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. 
vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 72 Table 24. RMSE of parameter estimates of conditions perfectly recovered true subgroups using WDMO statistics (Effect size=1.0) N Min # Pruning Subgroups mi ms mq vi vs vq 1000 250 No G1 0.002 0.001 0.000 0.003 0.001 0.000 No G2 0.002 0.001 0.000 0.003 0.001 0.000 No G3 0.002 0.001 0.000 0.002 0.001 0.000 No G4 0.002 0.001 0.000 0.003 0.001 0.000 250 BIC G1 0.002 0.001 0.000 0.002 0.001 0.000 BIC G2 0.002 0.001 0.000 0.002 0.001 0.000 BIC G3 0.002 0.001 0.000 0.002 0.001 0.000 BIC G4 0.002 0.001 0.000 0.003 0.001 0.000 2000 No BIC G1 0.001 0.001 0.000 0.001 0.000 0.000 BIC G2 0.001 0.001 0.000 0.001 0.000 0.000 BIC G3 0.001 0.001 0.000 0.001 0.000 0.000 BIC G4 0.001 0.000 0.000 0.002 0.000 0.000 250 No G1 0.001 0.001 0.000 0.001 0.000 0.000 No G2 0.001 0.001 0.000 0.001 0.001 0.000 No G3 0.001 0.000 0.000 0.002 0.001 0.000 No G4 0.001 0.001 0.000 0.001 0.000 0.000 250 BIC G1 0.001 0.001 0.000 0.001 0.001 0.000 BIC G2 0.001 0.000 0.000 0.001 0.000 0.000 BIC G3 0.001 0.001 0.000 0.001 0.000 0.000 BIC G4 0.001 0.000 0.000 0.001 0.001 0.000 5000 No BIC G1 0.001 0.000 0.000 0.000 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.001 0.000 0.000 0.001 0.000 0.000 BIC G4 0.001 0.000 0.000 0.001 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.001 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.000 0.000 0.000 0.000 0.000 0.000 BIC G4 0.000 0.000 0.000 0.000 0.000 0.000 10000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.000 0.000 0.000 0.000 0.000 0.000 BIC G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.000 0.000 0.000 0.000 0.000 0.000 BIC G4 0.000 0.000 0.000 0.000 0.000 0.000 73 Table 24 (Cont’d) 20000 No BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.000 0.000 0.000 0.000 0.000 0.000 BIC G4 0.000 0.000 0.000 0.000 0.000 0.000 250 BIC G1 0.000 0.000 0.000 0.000 0.000 0.000 BIC G2 0.000 0.000 0.000 0.000 0.000 0.000 BIC G3 0.000 0.000 0.000 0.000 0.000 0.000 BIC G4 0.000 0.000 0.000 0.000 0.000 0.000 Note. 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 = a test statistic of weighted double maximum for ordinal covariate. mi = mean intercept. ms = mean linear slope. mq = mean quadratic slope. vi = variance of intercept. vs = variance of linear slope. vq = variance of quadratic slope. Min # = minimum sample size per a subgroup. Pruning = post pruning method using BIC. 5.5 Several desirable options This section focuses on differences of the results stem from several available options. As I described before, three options were considered in this study: 1) three different test statistic for the ordinal covariate (𝐿𝐿𝐿𝐿, 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑂𝑂 , and 𝑊𝑊𝑊𝑊𝑊𝑊𝑂𝑂 ), 2) an option of pre-pruning option whether or not to limit the minimum sample size per subgroup, and 3) an option whether or not to use post pruning using BIC. The mean of estimated number of subgroups and the statistical power to recover the true number of subgroups were presented as an order of effect size in Tables 25 - 29. 
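For concreteness, these three options map directly onto arguments of mob_control() in partykit, as used in the simulation code in the Appendix. The sketch below is a minimal illustration rather than the full simulation setup; the correspondence between the values of the ordinal argument ("chisq", "L2", "max") and the LM, maxLM_O, and WDM_O statistics follows the ordering used in the appendix code.

library(partykit)
# ordinal = "chisq": treat ordered covariates as unordered categorical (LM-type test)
ctrl_LM    <- mob_control(alpha = 0.05, bonferroni = TRUE, ytype = "data.frame", ordinal = "chisq")
# ordinal = "L2": adapted maximum LM statistic for ordered covariates (maxLM_O)
ctrl_maxLM <- mob_control(alpha = 0.05, bonferroni = TRUE, ytype = "data.frame", ordinal = "L2")
# ordinal = "max": weighted double maximum statistic for ordered covariates (WDM_O)
ctrl_WDM   <- mob_control(alpha = 0.05, bonferroni = TRUE, ytype = "data.frame", ordinal = "max")
# The pre- and post-pruning options are added on top of any of these, e.g.:
ctrl_full  <- mob_control(alpha = 0.05, bonferroni = TRUE, ytype = "data.frame",
                          ordinal = "max",  # WDM_O test statistic
                          minsize = 250,    # pre-pruning: at least 250 cases per subgroup
                          prune   = "BIC")  # post-pruning: collapse splits that do not improve BIC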
5.5.1 Test statistics of ordinal covariates
Regardless of whether the ordinal covariates were treated as categorical or ordinal, there was no noticeable difference in the mean of the estimated number of subgroups or in the statistical power to recover the true number of subgroups among the three test statistics across most conditions. Minor discrepancies between the test statistics appeared only when the sample sizes were relatively small (1,000 and 2,000) with medium or large effect sizes (0.4, 0.6, 0.8, and 1.0). First, Table 26 shows the results for the effect size of 0.4. When the sample size was 2,000 with the minimum size per subgroup limited and without BIC post pruning, the means of the estimated subgroups were 3.96 for LM (the test statistic for a categorical covariate), 4.02 for maxLM_O (the adapted maximum LM test statistic for an ordinal covariate), and 4.00 for WDM_O (the weighted double maximum test statistic for an ordinal covariate). Their statistical powers were 94%, 96%, and 94%, respectively. However, this discrepancy was due to the informative continuous covariate; the split point of the ordinal covariate was 2 under all conditions. Thus, the condition with an effect size of 0.4 and a sample size of 2,000 works sufficiently well to recover the true number of subgroups with the pre-pruning option of a minimum sample size of 250, even though recovery was not perfect. Under the same option settings, however, the number of estimated subgroups increased as the sample size increased, regardless of the test statistic used for the ordinal covariate. Second, Table 27 shows the results for the effect size of 0.6. When the sample size was 1,000 with the minimum size per subgroup limited and without BIC post pruning, the means of the estimated subgroups for the three test statistics LM, maxLM_O, and WDM_O were 3.98, 3.99, and 3.98, and their statistical powers were 98%, 99%, and 98%, respectively. Under the same condition with a sample size of 2,000, the means of the estimated subgroups were 4.01, 4.00, and 4.03, with corresponding statistical powers of 99%, 100%, and 97%. Likewise, the slight over-extraction was due to the informative continuous covariate, not the informative ordinal covariate. Thus, the condition with an effect size of 0.6 and a sample size of 1,000 or more works sufficiently well to recover the true number of subgroups when pre-pruning with a minimum sample size of 250 is used without the BIC post-pruning method. Under the same option settings, however, the number of estimated subgroups tends to increase as the sample size increases, regardless of the test statistic for the ordinal covariate, as with the effect size of 0.4. Third, Table 28 shows the results for the effect size of 0.8. When the sample size was 1,000 with the minimum size per subgroup limited and without BIC post pruning, the means of the estimated subgroups for LM, maxLM_O, and WDM_O were all 4.00, with 100% statistical power. Under the same condition with a sample size of 2,000, the means of the estimated subgroups were 4.00, 4.02, and 4.02, with corresponding statistical powers of 100%, 98%, and 98%, respectively.
Likewise, this was due to the informative continuous covariate, not the informative ordinal covariate. Under the same option settings, however, the number of estimated subgroups tends to increase as the sample size increases, regardless of the test statistic for the ordinal covariate, as with the effect sizes of 0.4 and 0.6, and the statistical powers fell below 90% in those cases. Thus, when there is a mean difference with an effect size of 0.8 and a sample size of 1,000 or 2,000, MOB works sufficiently well to recover the true number of subgroups with pre-pruning set to a minimum sample size of 250 and without the BIC post-pruning method. Looking at Table 29, the results for the effect size of 1.0 show a pattern similar to that for the effect size of 0.8, except for one condition that is discussed in the next section. As with the other results, there were no discrepancies among the three test statistics.
5.5.2 Post pruning method using BIC
The post pruning option with BIC plays a key role in determining the final number of subgroups. Using it or not produces a different number of subgroups under most conditions. Across most conditions, the mean of the estimated number of subgroups tends to increase without the post pruning option, which means that MOB tends to over-extract the number of subgroups. When the effect size was 0.2, no cell correctly recovered the true number of subgroups, even with a large sample size. Looking at Table 25, the means of the estimated number of subgroups increase with the sample size when the post pruning option is not used. Even when MOB was used with a minimum size of 250 but without BIC post pruning, the mean of the estimated number of subgroups increased, because tiny parameter instabilities can be detected with large sample sizes. Using BIC post pruning consistently reduced the number of subgroups compared to not using it. When the sample sizes were 1,000 or 2,000, the average number of determined subgroups was around 1, meaning that MOB failed to differentiate group differences that statistically improve the model fit, and the statistical power was zero across these conditions. In addition, as the sample size increased to 5,000 or more, the average number of determined subgroups was around 2 with BIC post pruning. That is, even though MOB might initially differentiate distinct subgroups with around 80% statistical power, the model fit of the subgroups did not improve relative to the model without them, so the number of subgroups was determined to be two instead of four. Next, when the effect size was 0.4, post pruning with BIC worked well to recover the true number of subgroups at the sample sizes of 10,000 and 20,000. Looking at Table 26, with the effect size increased from 0.2 to 0.4, the statistical power to recover the true number of subgroups was nearly 99-100% with 10,000 samples or more, regardless of whether the minimum sample size per subgroup was set. As the effect size increases, the sample size required to recover the true number of subgroups decreases. Looking at Table 27, when the effect size was 0.6, the average number of estimated subgroups was nearly 4.00 and the statistical power was nearly 100% with a sample size of 5,000 using the post pruning method. When the effect size was 0.8, a sample size of 2,000 was sufficient to recover the true number of subgroups, as presented in Table 28.
When the effect size was 1.0, a sample size of 1,000 was sufficient to recover the true number of subgroups, as presented in Table 29. Clearly, post pruning using BIC helps to avoid over-fitting (growing an overly large tree) under certain conditions.
5.5.3 Limiting minimum sample size per subgroup
The minimum sample size per subgroup was limited to 250 in order to correctly detect the nonlinear change across four time points and to obtain stable parameter estimates for each subgroup. Without this restriction, subgroups with smaller samples can be formed and used to fit the quadratic latent growth curve model. Two notable findings about how limiting the minimum size works concern the effect sizes of 0.6 and 0.8 with a sample size of 1,000. Looking at Table 27, the average estimated number of subgroups ranged from 3.98 to 3.99, and the statistical power ranged from 98% to 99%, without BIC post pruning. However, when BIC post pruning was added under the same condition, the average estimated number of subgroups and the statistical power were 2.00 and 0%, respectively. When the sample size increased to 2,000, the average estimated number of subgroups and the statistical power ranged from 4.00 to 4.03 and from 97% to 100%, respectively. When the effect size was 0.8 (Table 28), the average estimated number of subgroups and the statistical power were 4.00 and 100% with 1,000 samples, and the average estimated number of subgroups ranged from 4.00 to 4.02 with statistical power from 98% to 100% with 2,000 samples. However, as the sample size increased beyond 2,000, MOB without BIC post pruning over-extracted the number of subgroups. This means that when the effect sizes are medium (0.4 - 0.6) with a relatively small sample size, such as 2,000, limiting the minimum sample size per subgroup works better than post pruning using BIC to recover the true number of subgroups.
Table 25. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.2)
Min #  Pruning  Ordinal | N=1,000 (MNS, P) | N=2,000 (MNS, P) | N=5,000 (MNS, P) | N=10,000 (MNS, P) | N=20,000 (MNS, P)
No   No   LM      | 2.06, 0 | 2.15, 2 | 3.13, 31 | 4.19, 83 | 4.19, 82
No   No   maxLM_O | 2.06, 1 | 2.09, 1 | 3.41, 34 | 4.25, 80 | 4.24, 80
No   No   WDM_O   | 2.13, 1 | 2.15, 2 | 3.47, 35 | 4.27, 78 | 4.24, 80
No   BIC  LM      | 1.00, 0 | 1.00, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
No   BIC  maxLM_O | 1.00, 0 | 1.01, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
No   BIC  WDM_O   | 1.00, 0 | 1.00, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
250  No   LM      | 1.96, 0 | 2.11, 0 | 3.35, 31 | 4.19, 82 | 4.29, 78
250  No   maxLM_O | 2.00, 0 | 2.10, 0 | 3.48, 40 | 4.20, 83 | 4.23, 78
250  No   WDM_O   | 2.02, 0 | 2.09, 0 | 3.50, 41 | 4.18, 84 | 4.33, 75
250  BIC  LM      | 1.00, 0 | 1.04, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
250  BIC  maxLM_O | 1.00, 0 | 1.03, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
250  BIC  WDM_O   | 1.00, 0 | 1.03, 0 | 2.00, 0 | 2.00, 0 | 2.00, 0
Note. LM = Lagrange multiplier (LM) type test statistic for categorical covariate. maxLM_O = test statistic of adapted maximum of LM for ordinal covariate. WDM_O = test statistic of weighted double maximum for ordinal covariate.
Min # = minimum sample size per subgroup. Pruning = post pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power.
Table 26. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.4)
Min #  Pruning  Ordinal | N=1,000 (MNS, P) | N=2,000 (MNS, P) | N=5,000 (MNS, P) | N=10,000 (MNS, P) | N=20,000 (MNS, P)
No   No   LM      | 2.47, 5  | 4.11, 79 | 4.27, 78 | 4.17, 83 | 4.22, 86
No   No   maxLM_O | 2.39, 5  | 4.12, 86 | 4.15, 87 | 4.27, 78 | 4.28, 78
No   No   WDM_O   | 2.48, 8  | 4.13, 84 | 4.17, 86 | 4.16, 85 | 4.19, 84
No   BIC  LM      | 2.00, 0  | 2.00, 0  | 2.03, 0  | 4.00, 100 | 4.00, 100
No   BIC  maxLM_O | 2.00, 0  | 2.00, 0  | 2.03, 0  | 3.99, 99  | 4.00, 100
No   BIC  WDM_O   | 2.00, 0  | 2.00, 0  | 2.04, 0  | 4.00, 100 | 4.00, 100
250  No   LM      | 2.73, 12 | 3.96, 94 | 4.21, 82 | 4.23, 78 | 4.28, 78
250  No   maxLM_O | 2.86, 12 | 4.02, 96 | 4.22, 79 | 4.17, 84 | 4.19, 84
250  No   WDM_O   | 2.88, 17 | 4.00, 94 | 4.21, 81 | 4.11, 90 | 4.14, 87
250  BIC  LM      | 2.00, 0  | 2.00, 0  | 2.06, 0  | 4.00, 100 | 4.00, 100
250  BIC  maxLM_O | 2.00, 0  | 2.00, 0  | 2.07, 1  | 4.00, 100 | 4.00, 100
250  BIC  WDM_O   | 2.00, 0  | 2.00, 0  | 2.05, 0  | 4.00, 100 | 4.00, 100
Note. LM = Lagrange multiplier (LM) type test statistic for categorical covariate. maxLM_O = test statistic of adapted maximum of LM for ordinal covariate. WDM_O = test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per subgroup. Pruning = post pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power.
Table 27. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.6)
Min #  Pruning  Ordinal | N=1,000 (MNS, P) | N=2,000 (MNS, P) | N=5,000 (MNS, P) | N=10,000 (MNS, P) | N=20,000 (MNS, P)
No   No   LM      | 4.13, 85 | 4.14, 88 | 4.21, 84 | 4.23, 79 | 4.23, 79
No   No   maxLM_O | 4.15, 87 | 4.21, 82 | 4.26, 78 | 4.31, 73 | 4.20, 82
No   No   WDM_O   | 4.16, 84 | 4.22, 83 | 4.23, 80 | 4.27, 79 | 4.27, 78
No   BIC  LM      | 2.00, 0  | 2.21, 3  | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  maxLM_O | 2.00, 0  | 2.15, 4  | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  WDM_O   | 2.00, 0  | 2.11, 1  | 4.00, 100 | 4.00, 100 | 4.00, 100
250  No   LM      | 3.98, 98 | 4.01, 99 | 4.20, 81 | 4.31, 77 | 4.26, 77
250  No   maxLM_O | 3.99, 99 | 4.00, 100 | 4.29, 73 | 4.18, 84 | 4.21, 81
250  No   WDM_O   | 3.98, 98 | 4.03, 97 | 4.15, 87 | 4.23, 84 | 4.14, 87
250  BIC  LM      | 2.00, 0  | 2.11, 3  | 4.00, 100 | 4.00, 100 | 4.00, 100
250  BIC  maxLM_O | 2.00, 0  | 2.15, 2  | 4.00, 100 | 4.00, 100 | 4.00, 100
250  BIC  WDM_O   | 2.00, 0  | 2.12, 3  | 4.00, 100 | 4.00, 100 | 4.00, 100
Note. LM = Lagrange multiplier (LM) type test statistic for categorical covariate.
maxLM_O = test statistic of adapted maximum of LM for ordinal covariate. WDM_O = test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per subgroup. Pruning = post pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power.
Table 28. Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 0.8)
Min #  Pruning  Ordinal | N=1,000 (MNS, P) | N=2,000 (MNS, P) | N=5,000 (MNS, P) | N=10,000 (MNS, P) | N=20,000 (MNS, P)
No   No   LM      | 4.19, 82 | 4.28, 75 | 4.16, 84 | 4.23, 83 | 4.27, 76
No   No   maxLM_O | 4.24, 79 | 4.19, 83 | 4.15, 86 | 4.25, 81 | 4.28, 78
No   No   WDM_O   | 4.16, 84 | 4.32, 74 | 4.21, 82 | 4.23, 78 | 4.18, 84
No   BIC  LM      | 2.20, 3  | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  maxLM_O | 2.19, 5  | 4.01, 99  | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  WDM_O   | 2.17, 2  | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
250  No   LM      | 4.00, 100 | 4.00, 100 | 4.21, 82 | 4.20, 84 | 4.24, 82
250  No   maxLM_O | 4.00, 100 | 4.02, 98  | 4.20, 81 | 4.22, 81 | 4.30, 74
250  No   WDM_O   | 4.00, 100 | 4.02, 98  | 4.19, 85 | 4.12, 89 | 4.21, 83
250  BIC  LM      | 2.15, 3  | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
250  BIC  maxLM_O | 2.11, 3  | 4.00, 100 | 4.02, 99  | 4.00, 100 | 4.00, 100
250  BIC  WDM_O   | 2.15, 2  | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
Note. LM = Lagrange multiplier (LM) type test statistic for categorical covariate. maxLM_O = test statistic of adapted maximum of LM for ordinal covariate. WDM_O = test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per subgroup. Pruning = post pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power.
Table 29.
Mean of estimated number of subgroups (MNS) and statistical power (P) to recover true number of subgroups (Effect size = 1.0)
Min #  Pruning  Ordinal | N=1,000 (MNS, P) | N=2,000 (MNS, P) | N=5,000 (MNS, P) | N=10,000 (MNS, P) | N=20,000 (MNS, P)
No   No   LM      | 4.13, 89 | 4.22, 80 | 4.29, 77 | 4.14, 88 | 4.24, 80
No   No   maxLM_O | 4.22, 84 | 4.15, 86 | 4.28, 76 | 4.34, 74 | 4.23, 82
No   No   WDM_O   | 4.12, 88 | 4.26, 79 | 4.25, 77 | 4.22, 84 | 4.19, 85
No   BIC  LM      | 3.99, 99 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  maxLM_O | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
No   BIC  WDM_O   | 3.99, 99 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
250  No   LM      | 4.00, 100 | 4.01, 99 | 4.17, 84 | 4.27, 76 | 4.29, 77
250  No   maxLM_O | 4.00, 100 | 4.01, 99 | 4.11, 89 | 4.17, 86 | 4.25, 80
250  No   WDM_O   | 4.00, 100 | 4.00, 100 | 4.17, 84 | 4.27, 78 | 4.17, 83
250  BIC  LM      | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
250  BIC  maxLM_O | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
250  BIC  WDM_O   | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100 | 4.00, 100
Note. LM = Lagrange multiplier (LM) type test statistic for categorical covariate. maxLM_O = test statistic of adapted maximum of LM for ordinal covariate. WDM_O = test statistic of weighted double maximum for ordinal covariate. Min # = minimum sample size per subgroup. Pruning = post pruning method using BIC. MNS = mean number of estimated subgroups. P = statistical power.
5.6 Results of growth mixture model
The same simulated datasets were also fitted with growth mixture models (GMM). The results show that the selected number of classes (subgroups) was only two across all conditions; that is, GMM failed to recover the true number of subgroups as the best-fitting model under every condition. Specifically, the models with the lowest BIC values were the two-class solutions. In addition, the Lo-Mendell-Rubin likelihood ratio test was not significant for the three-, four-, and five-class solutions in more than 97 replications across all conditions. When the effect size was 0.8 or 1.0 and the sample size was 10,000 or 20,000, GMM produced significant LRT results for the four-class models, even though the BIC values of these models were higher than those of the two-class model. More importantly, examining the composition of the four classes, the class proportions were approximately 49.8%, 0.01%, 50%, and 0.01%, meaning that most subjects were classified into only two classes. Furthermore, the parameter estimates were not consistent over the replications. Since the best solutions were the two-class models, it was not possible to calculate classification accuracy and precision or the bias and RMSE of the parameter estimates. However, apart from the means of the intercept, linear slope, and quadratic slope, the variances and covariances as well as the residuals were close to the parameter values because they were fixed across classes, as in the population model.
CHAPTER 6.
CONCLUSION AND DISCUSSION
6.1 Summary of findings
This study had three main research purposes. First, it aimed to introduce and demonstrate how the model-based recursive partitioning (MOB) approach can be combined with a latent growth curve model (LGCM) in longitudinal studies to uncover heterogeneous subpopulations. Since this approach has rarely been introduced in the fields of education and psychology, I used a nationally representative longitudinal dataset in education for illustrative purposes. The procedures, findings, and interpretations were presented in Chapter 3. The second purpose was to investigate the performance of MOB with a quadratic latent growth curve model involving two interactions among three types of covariates: one between an ordinal and a continuous covariate, and another between the ordinal and a categorical covariate. Based on the results of Chapter 3, a population model was chosen for data generation. Effect size and sample size were varied to simulate datasets, and three estimation options were considered as additional simulation conditions. The simulation study was conducted to answer seven research questions under 300 conditions formed by fully crossing five factors. Lastly, the study aimed to compare the results of MOB with those of an unconditional growth mixture model (GMM). There are six key findings from the illustrative analysis and the simulation study that address the research questions. First, the results of the empirical study using HSLS:09 to find distinct subgroups with different initial status and trajectories of GPA suggest that using both options, post pruning with BIC and limiting the sample size per subgroup, makes the resulting tree more concise and the composition of the subgroups easier to interpret. In addition, using the WDM_O test statistic for the ordinal covariate makes MOB more likely to use the ordinal covariates for splitting, as it produces more accurate p-values than the test statistic for categorical covariates. If ordinal covariates are available that are strongly related to the outcomes in a parametric model, and the ordering of their values matters for the study, it is highly recommended to declare them as ordinal and to use the corresponding test statistic in the software. In addition, connecting the two packages currently has to be accomplished with a user-written function, because ready-made software for this is not yet available. Second, a simulation study was conducted to investigate the performance of this approach for a given population model under five factors. The results show that the true number of subgroups was perfectly recovered when the effect size was 1.0 with a sample size of 1,000 and the BIC post pruning option. With an effect size of 0.8, the required sample size to perfectly recover the true number of subgroups was 2,000. As the effect size of the mean intercept decreases to the medium values of 0.6 and 0.4, the required sample size increases to 5,000 and 10,000, respectively. When the effect size was small (0.2), the maximum statistical power to recover the true number of subgroups was 84%, with a sample size of 10,000. Third, whenever the number of subgroups was perfectly recovered, the overall accuracy and precision were also 1.00. Even when the number of subgroups was not perfectly recovered, with an effect size of 0.4 and a sample size of 2,000 the overall accuracy and macro-averaged precision were 0.97 and 0.98, respectively.
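As a point of reference, the sketch below is a minimal illustration, mirroring the metric computations in the appendix simulation code, of how the overall accuracy and macro-averaged precision reported here are obtained from a confusion matrix of true versus recovered subgroup memberships. The objects true_grp and pred_grp are hypothetical stand-ins for the true and MOB-recovered subgroup labels, assumed to be aligned so that the confusion matrix is square.

# toy example: 8 cases, four true subgroups, one case misclassified
true_grp <- factor(c("a", "a", "b", "b", "c", "c", "d", "d"))
pred_grp <- factor(c("a", "a", "b", "c", "c", "c", "d", "d"), levels = levels(true_grp))
cm <- table(true_grp, pred_grp)          # confusion matrix (true x recovered)
accuracy <- sum(diag(cm)) / sum(cm)      # overall classification accuracy
precision <- diag(cm) / colSums(cm)      # per-subgroup precision
recall <- diag(cm) / rowSums(cm)         # per-subgroup recall
macroPrecision <- mean(precision)        # macro-averaged precision
macroRecall <- mean(recall)              # macro-averaged recall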
Furthermore, the four noise covariates were not chosen for splitting under any condition, indicating that MOB finds splitting points as designed. Only the informative continuous covariate was used for additional splits when the number of estimated subgroups was larger than 4. The splitting points of the informative ordinal and categorical covariates were very close to the true values, 2 for the former and (10, 20) vs. (30, 40) for the latter, whenever the number of subgroups was perfectly recovered. The true splitting point of 2 for the informative ordinal covariate was correctly detected at the first split under most conditions, except the two conditions with an effect size of 0.2 and sample sizes of 1,000 and 2,000 under the BIC pruning option; in those conditions the recovered number of subgroups was 1. In addition, the parameter estimates were unbiased and efficient in the conditions where the number of estimated subgroups was essentially equal to four and the statistical power was close to 100%. Fourth, the simulation study shows no evident difference among the test statistics for the ordinal covariates. This result is partly because the population model includes only the main effect of the ordinal covariate; the continuous and categorical covariates have interaction effects only with the ordinal covariate. In addition, the population model is straightforward in that the subgroups can be differentiated using a single cut point, which would also make the three test statistics indistinguishable. However, the procedure and results of the empirical study show that when a test statistic for the ordinal covariate is used, MOB clearly selects the ordinal covariate first to split the sample. Even though the relationship between the outcome and the ordinal covariate was strong, using the default test statistic for categorical covariates on the ordinal covariate did not select it at the first split or at any later splitting stage. Thus, it is highly desirable for researchers to use the available options for the test statistics of ordinal covariates. If there are strong relationships between the outcome and candidate ordinal splitting covariates, either maxLM_O or WDM_O is preferable to LM. Fifth, based on the results of both the empirical and the simulation study, using the BIC post pruning option generally helps to avoid over-fitting, that is, growing an overly large tree. Moreover, it works better than limiting the minimum sample size per subgroup. With a smaller sample size of 1,000 and medium to large effect sizes (0.6 and 0.8), however, using the BIC post pruning option reduced the number of subgroups unnecessarily, whereas limiting the minimum sample size per subgroup did not affect the determination of the true number of subgroups. Nonetheless, it is recommended to know in advance the adequate or required sample size for the parametric model that is used as the template model. As described in the results of the empirical study, the tree contained a huge number of subgroups when the sample size per subgroup was not limited. Since a parametric model is fitted to the sample in each subgroup, an adequate sample size is required to obtain stable and correct parameter estimates. Having a rationale for determining the adequate sample size for the specific parametric model, as this study did, is a task for each researcher.
Last but not least, GMM did not differentiate the true subgroups under any of the conditions considered in this study for the given LGCM population model. The four-class solution was not the best-fitting model based on BIC, and the likelihood ratio test did not produce significant results supporting four classes of growth trajectories in the population. Even in the few conditions where GMM selected four latent classes as the best-fitting model, the classification accuracy and precision were very poor because most of the sample was divided into only two subgroups and very few cases fell into the remaining two. This is likely due in part to the fact that the data were not generated from a mixture model. Since the purpose of this study is not to compare the performance of the two approaches directly, it does not follow that MOB performs better than GMM. Each method serves its own research purposes, and the approach, assumptions, and conceptual framework that GMM uses to enumerate subgroups are entirely different from those of MOB. Finding the best-fitting GMM would require more rigorous steps and approaches. The focus of this study was to examine whether MOB not only finds complex interactions among informative covariates, including their cut points, but also determines the true number of subgroups correctly. The results above support the claim that these research goals were met.
6.2 Discussions
The findings of this study have several significant implications for the advancement of quantitative methodology in education research. As machine learning techniques and algorithms become popular and widely used by social scientists, incorporating machine learning into a variety of statistical models can be a powerful analytic tool for handling big data in the study of education. First, this study introduces a general framework of MOB and detailed procedures for its use by social and behavioral researchers. Through the illustrative example, readers can see when MOB can be useful and how it can be applied to different research questions and settings. Within the MOB framework, the definition of a covariate is broader than the one commonly used in statistics: a covariate can be any variable in the available data that is potentially related to the outcome(s) of interest. This methodological approach is therefore especially beneficial when researchers handle very large samples, such as more than 10,000 cases, to explore the unknown composition of subpopulations using many variables that are potentially related to the outcome(s) and that may interact with other variables. In addition, the results of the simulation study provide the quantitative methodology community with statistical evidence of how well MOB recovers true subgroups, equipping researchers with this analytic tool. However, this approach would not be helpful for relatively small sample sizes, such as 1,000 or 2,000, because it would be unrealistic to expect such strong effect sizes (0.8 or more) in social science. In fact, if there is a well-established theoretical statistical model that fits the data well with an adequate sample size, there is no need to consider MOB. However, when the theoretical model is less established and a huge number of variables is available with a large sample, MOB is a natural candidate as a useful analytic tool.
Thus, it is suggested to start by fitting the theoretical model to the whole sample and then to consider the suspected potential covariates for splitting the data into subgroups. Practically, those subgroups can either be interpreted directly to explain discrepancies in the outcome of interest or serve as separate subsamples for further analysis if necessary. The appropriate number of subgroups therefore depends entirely on the purpose of the study. Second, novel and important research questions can be posed and answered. Disparities in educational opportunities and achievement exist among students of different gender, race/ethnicity, SES, and other demographic and environmental characteristics. When these characteristics are considered jointly, for example, a White female student from a high-SES background versus a Black male student from a low-SES background, educational inequalities may turn out to be worse than what an examination of subgroup differences by gender or race/ethnicity alone would reveal. In this regard, the emerging framework of intersectionality in education research attends to these layered marginalizations of student populations. Yet such subpopulations are mostly defined by researchers' own decisions or by an established theory, often resulting in arbitrary groups rather than empirically identified ones. Methodologically, MOB showed its strength in detecting meaningful subgroups and advancing our understanding of complex social mechanisms by examining these interaction effects. Furthermore, the resulting MOB tree can be visualized to describe the composition of the subgroups or a trend of change, which is highly interpretable, intuitive, and straightforward. Moreover, as big data becomes more available in education research, it is critical to begin investigating other hidden contextual factors (e.g., average housing prices in the neighborhood) that contribute to the increasing gap in educational opportunities. If some informative covariates are already known to be related to the outcomes, they can be included in the population model; in that case, these covariates are not used for splitting the sample because they are already part of the statistical template model, and other, as yet unexamined, potential covariates can be used for splitting. Thus, this study ultimately contributes to informing education policy stakeholders so that they can make data-driven decisions with sound statistical properties. Third, this study has considerable potential to be extended to various other statistical models, such as causal inference methods that aim to test heterogeneous treatment effects over candidate covariates. Although the simulation study was conducted for a given population with specific parameter values in a longitudinal context, the findings can be applied in a similar way to other popular statistical models. In fact, the existing approach known as SEM Trees is tailored specifically to SEM models. This study will thus play a critical role in laying the groundwork for extending the application of MOB to various statistical models by investigating its performance with complex covariate effects when searching for subgroups. Fourth, MOB has several pre- and post-pruning options to optimize the size of the subgroups. Although this study examined which options are desirable under different conditions, pruning itself ultimately depends on the purpose of the study.
If the purpose of a study is to gain insight into the different complex effects of the covariates on the outcome, a larger number of subgroups provides all the details and a direct interpretation of them, much like a traditional regression model; indeed, if the interaction effects are not complex, the interpretation of MOB results with a large number of subgroups would not differ much from that of a regression model. On the other hand, if the purpose of a study is to find a few meaningful subpopulations showing distinctly different distributions and patterns of the outcome(s), optimizing the size of the subgroups with the pre- and post-pruning options in MOB helps reduce the number of subgroups, leading to fewer, more concise, larger subgroups. In this case, a few targeted demographic and other background variables would be most useful.
6.3 Limitation and future research
This study has certain limitations, as follows. First, individuals are assumed to be independent for the purposes of this study even though this is not strictly true: the empirical data used here have a multilevel (nested) structure. However, this machine learning approach is an exploratory data analytic tool and is complementary to existing traditional statistical models, so it is not necessary to account for the multilevel structure at this stage. If interaction effects of covariates are detected through this analysis, the corresponding terms can be incorporated into a statistical multilevel model and estimated for statistical inference as a second step. Second, this study considered one specific population model. A different population model could yield different performance because MOB is fundamentally a data-driven method; its results depend on the available data and the parametric model used. In addition, the parametric model used in this study does not include any established covariates, even though it could; in other words, the covariates are only used for splitting the sample. A future step would be to consider a population model that also includes covariates that are not used for splitting. This study also considered two types of interactions, one between the ordinal and continuous covariates and another between the ordinal and categorical covariates, and only the main effect of the ordinal covariate is strongly associated with the outcome. However, the literature has not fully addressed whether MOB can detect higher-order interaction effects of covariates when their main effects are not associated with the outcomes. Thus, this study is limited to cases in which interaction effects occur alongside associated main effects. Based on these limitations, several next steps follow. The first possible extension is to adopt multilevel structural equation models that account for a nested data structure. Distinct subgroups may be uncovered according to higher-level covariates such as school type, district, geographic information, or even state. The current study does not consider the multilevel structure, for simplicity. As stated earlier, any parametric model can be used as a template model to examine the associations between the focal model parameters and the covariates. Second, even though it is not common in social science, there may be cases in which a covariate has no main effect but interacts with other covariates to affect the outcomes, producing higher-order interactions.
If empirical data exhibiting this particular type of interaction effect exist in educational and psychological studies, a simulation condition with a different population model should be added to examine the performance of MOB in that setting. Third, the effects of the post-pruning method have not yet been fully examined in the previous literature. For large-scale datasets, post-pruning is strongly suggested to reduce the number of terminal nodes (subgroups), improving interpretability and the stability of the parameter estimates, depending on the number of parameters and the sample size. Thus, a comparative study between the two packages semtree and partykit is also desirable, because each has distinct features in terms of how subgroups are split and which options are available. Lastly, there is a need to rigorously design a comparative study of GMM and MOB for different population models. Although the third part of this study examined whether GMM also recovers the true number of subgroups, the data generation under the given population model did not fully reflect the kind of parameter heterogeneity assumed by GMM. The next step is to compare the performance of the two approaches under various population models and conditions.
APPENDIX
R CODES CONNECTING LAVAAN AND PARTYKIT
# List of packages #
# rm(list=ls())
library(MASS)
library(sandwich)
library(lavaan)
library(partykit)
library(strucchange)
library(plyr)
library(MplusAutomation)

# Quadratic model with free covariances and fixed residuals #
QLGM <- '
inter =~ 1*GPA9 + 1*GPA10 + 1*GPA11 + 1*GPA12;
slope =~ 0*GPA9 + 1*GPA10 + 2*GPA11 + 3*GPA12;
quadr =~ 0*GPA9 + 1*GPA10 + 4*GPA11 + 9*GPA12;
inter ~~ vi*inter; inter ~ mi*1;
slope ~~ vs*slope; slope ~ ms*1;
quadr ~~ vq*quadr; quadr ~ mq*1;
inter ~~ cis*slope; inter ~~ ciq*quadr; slope ~~ csq*quadr;
GPA9 ~~ res*GPA9; GPA9 ~ 0*1;
GPA10 ~~ res*GPA10; GPA10 ~ 0*1;
GPA11 ~~ res*GPA11; GPA11 ~ 0*1;
GPA12 ~~ res*GPA12; GPA12 ~ 0*1;
'

# Fit function for SEM #
lavaan_fit <- function(model) {
  function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...,
           estfun = FALSE, object = FALSE) {
    require(lavaan)
    lgcm <- lavaan::lavaan(model = model, data = y, start = start)
    list(
      coefficients = stats4::coef(lgcm),                      # coefficients
      objfun = -as.numeric(stats4::logLik(lgcm)),             # negative log-likelihood
      estfun = if (estfun) sandwich::estfun(lgcm) else NULL,  # score matrix (empirical estimating functions)
      object = if (object) lgcm else NULL
    )
  }
}

# A function for concise results #
node_format <- function(node) {
  c("", sprintf("n = %s", node$nobs),
    capture.output(print(cbind(node$coefficients[c("mi","ms","mq","vi","vs","vq","cis","ciq","csq","res")]),
                         digits = 1L))[-1L])
}

# Latent growth curve model with mob algorithm #
pop.control = mob_control(alpha = 0.05, bonferroni = TRUE, minsize = 250,
                          ytype = "data.frame", ordinal = "max") # ordinal variable option = max

# Chapter 3 model with all covariates #
pop.mod <- mob(GPA9 + GPA10 + GPA11 + GPA12 ~
                 SEX + RACE5 + LOCA + HACT + SES + MATEFF + BEHAVSCH + SCHCLI + FLUNCH + MISBEHAV,
               data = dat, fit = lavaan_fit(QLGM), control = pop.control)
plot(pop.mod, drop = TRUE, tnex = 2, FUN = node_format) # terminal nodes

R CODES FOR SIMULATION STUDY
# Parameters of population model
mean.isq = c(2.8, -0.1, 0.06)
cov.isq = matrix(c(0.471, 0.001, -0.012,
                   0.001, 0.057, -0.016,
                   -0.012, -0.016, 0.007), 3, 3)  # covariance matrix
par = c(mean.isq, diag(cov.isq), cov.isq[upper.tri(cov.isq, diag = F)], 0.084) # parameters for evaluation

# Store gmm results #
gmm.NG = list()     # number of groups (sub-populations/classes)
gmm.M = list()      # model fit information
gmm.C = list()      # classification accuracy
gmm.E = list()      # estimates of coefficients
gmm.times = list()

# Store mob results #
res.NG = list()     # number of groups (sub-populations/classes)
res.cls = list()    # classification accuracy
res.est = list()    # estimates of coefficients
res.split = list()  # split points
res.times = list()

# Simulation conditions #
size = c(1000, 2000, 5000, 10000, 20000)  # sample size = 5
effectsize = c(0.2, 0.4, 0.6, 0.8, 1.0)   # Cohen's d effect size = 5
mean.diff = round(effectsize * sqrt(cov.isq[1, 1]), 2); mean.diff # mean difference based on Cohen's effect size
ordinal.option = c("chisq", "L2", "max")  # ordinal = 3
minsize = c("min.No", "min.250")          # minimum size of node = 2
prune = c("prune.No", "prune.BIC")        # pruning = 2

# data generation #
set.seed(10)
REP = 1:100  # replication number

# Simulation start #
for (s in 1:length(mean.diff)) {
  # parameters matrix; intercept-only difference
  # Scenario 1: Four groups having different intercepts
  pars = matrix(c(c(par[1] - mean.diff[s], par[-1]),      # G1 parameters
                  c(par[1] + 0 * mean.diff[s], par[-1]),  # G2 parameters
                  c(par[1] + mean.diff[s], par[-1]),      # G3 parameters
                  c(par[1] + 2 * mean.diff[s], par[-1])), # G4 parameters
                nrow = 4, ncol = 10, byrow = T)
  colnames(pars) = c("mi","ms","mq","vi","vs","vq","cis","ciq","csq","res")
  for (i in 1:length(size)) {
    for (kk in 1:length(minsize)) {
      for (pp in 1:length(prune)) {
        for (o in 1:length(ordinal.option)) {
          # replication start #
          for (r in REP) {
            tem = NA
            N = size[i]
            factors <- mvrnorm(N/4, mean.isq, cov.isq)  # four subgroups
            G1 <- as.data.frame(matrix(NA, N/4, 4))     # four time points
            G2 <- as.data.frame(matrix(NA, N/4, 4))     # four time points
            G3 <- as.data.frame(matrix(NA, N/4, 4))     # four time points
            G4 <- as.data.frame(matrix(NA, N/4, 4))     # four time points
            colnames(G1) = c(paste0("GPA", 9:12))
            colnames(G2) = c(paste0("GPA", 9:12))
            colnames(G3) = c(paste0("GPA", 9:12))
            colnames(G4) = c(paste0("GPA", 9:12))
            # intercept (fixed effect) only different models depending on effect size
            for (t in 1:4) {
              G1[, t] = 1*(factors[,1] - mean.diff[s]) + (t-1)*factors[,2] + (t-1)^2*factors[,3] + rnorm(N/4, 0, sqrt(0.084))
              G2[, t] = 1*factors[,1] + (t-1)*factors[,2] + (t-1)^2*factors[,3] + rnorm(N/4, 0, sqrt(0.084))
              G3[, t] = 1*(factors[,1] + mean.diff[s]) + (t-1)*factors[,2] + (t-1)^2*factors[,3] + rnorm(N/4, 0, sqrt(0.084))
              G4[, t] = 1*(factors[,1] + 2*mean.diff[s]) + (t-1)*factors[,2] + (t-1)^2*factors[,3] + rnorm(N/4, 0, sqrt(0.084))
            }
            # 3 informative covariates
            G1$ORDINAL = as.ordered(sample(c(1:2), size = N/4, replace = T))
            G1$CONTINUOUS = sample(seq(from = -3.5, to = -0.7, by = 0.1), size = N/4, replace = T) # continuous
            G1$CATEGORY = as.factor(sample(c(10, 20, 30, 40), size = N/4, replace = T))
            G2$ORDINAL = as.ordered(sample(c(1:2), size = N/4, replace = T))
            G2$CONTINUOUS = sample(seq(from = -0.6, to = 3.5, by = 0.1), size = N/4, replace = T)  # continuous
            G2$CATEGORY = as.factor(sample(c(10, 20, 30, 40), size = N/4, replace = T))
            G3$ORDINAL = as.ordered(sample(c(3:5), size = N/4, replace = T))
            G3$CONTINUOUS = sample(seq(from = -3.5, to = 3.5, by = 0.1), size = N/4, replace = T)  # continuous
            G3$CATEGORY = as.factor(sample(c(10, 20), size = N/4, replace = T))
            G4$ORDINAL = as.ordered(sample(c(3:5), size = N/4, replace = T))
            G4$CONTINUOUS = sample(seq(from = -3.5, to = 3.5, by = 0.1), size = N/4, replace = T)  # continuous
            G4$CATEGORY = as.factor(sample(c(30, 40), size = N/4, replace = T))
            tem = rbind(G1, G2, G3, G4)
            # 4 NOISE VARIABLES #
            tem$N.ORDINAL = as.ordered(sample(c(1:5), size = N, replace = T))
            tem$N.CONTINUOUS1 = sample(seq(from = -3.5, to = 3.5, by = 0.1), size = N, replace = T)
            tem$N.CONTINUOUS2 = round(rnorm(N, mean = 0, sd = 1), 1)  # continuous
            tem$N.CATEGORY = as.factor(sample(c(10, 20, 30, 40, 50), size = N, replace = T))
            True.G = as.factor(rep(c("a", "b", "c", "d"), each = N/4))
            tem = data.frame(tem, True.G)  # generated data #
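            # (Added note, not in the original listing.) Optional sanity checks on the generated
            # data could be run here: the informative ordinal covariate should separate
            # G1/G2 (levels 1-2) from G3/G4 (levels 3-5), matching the true split point of 2,
            # and the wave-1 means should differ across true subgroups by the intended effect size, e.g.:
            # table(tem$ORDINAL, tem$True.G)              # ordinal covariate by true subgroup
            # aggregate(GPA9 ~ True.G, data = tem, mean)  # wave-1 means per true subgroup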
###################
# GMM using Mplus #
###################
Start=Sys.time() # start timer for the GMM run (used in gmm.times below)
cond=paste0(effectsize[s],"_",size[i],"_",ordinal.option[o],"_",minsize[kk],"_",prune[pp]) # condition label used as the list key
write.table(tem,"mydat.dat",col.names = F,row.names = T, quote=F)
runModels(dir)                 # running GMM through Mplus (input files are assumed to exist in 'dir')
out=readModels(dir,what="all") # reading Mplus output
gmm.res=do.call("rbind.fill",sapply(out,"[","summaries"))
mod.fit=gmm.res[,c("BIC","Entropy","T11_LMR_PValue","NLatentClasses")]
mod.fit.best=mod.fit[order(mod.fit$BIC),] # the lowest BIC first
bc=mod.fit.best$NLatentClasses[1]         # number of classes of the best-fitting model
gmm.M[[cond]][[r]]=mod.fit.best
gmm.NG[[cond]][r]=bc
# gmm class membership
gmm.cls=out[[bc-1]]$savedata
gmm.cls$tg=tem$True.G
# classification accuracy of gmm
cm=table(gmm.cls$tg,gmm.cls$C) # confusion matrix
n = sum(cm)                    # number of instances
nc = nrow(cm)                  # number of classes
diag = diag(cm)                # number of correctly classified instances per class
rowsums = apply(cm, 1, sum)    # number of instances per class
colsums = apply(cm, 2, sum)    # number of predictions per class
p = rowsums / n                # distribution of instances over the actual classes
q = colsums / n                # distribution of instances over the predicted classes
# Overall classification accuracy: total correct predictions divided by total predictions
accuracy = sum(diag) / n
# Per-class precision, recall, and F1
precision = diag / colsums # proportion of predictions for a class that actually belong to that class
recall = diag / rowsums    # proportion of instances of a class that are predicted as that class
f1 = 2 * precision * recall / (precision + recall) # harmonic mean of precision and recall
# Macro-averaged metrics
macroPrecision=mean(precision)
macroRecall=mean(recall)
macroF1=mean(f1) # macro F1 averages the per-class F1 values
CA=c(accuracy,macroPrecision,macroRecall,macroF1)
gmm.C[[cond]][[r]]=CA
# gmm estimates
gmm.est=out[[bc-1]]$parameters$unstandardized
gmm.est2=matrix(NA,nrow=bc,ncol=11)
for(gm in 1:bc){
m=subset(gmm.est,LatentClass==gm & paramHeader=="Means")[,3]
v=subset(gmm.est,LatentClass==gm & paramHeader=="Variances")[,3]
sw=subset(gmm.est,LatentClass==gm & paramHeader=="S.WITH")[,3]
qw=subset(gmm.est,LatentClass==gm & paramHeader=="Q.WITH")[,3]
rv=subset(gmm.est,LatentClass==gm & paramHeader=="Residual.Variances")[,3]
gmm.tem=c(m,v,sw,qw,rv[1],gm)
gmm.est2[gm,]=gmm.tem
}
gmm.est3=as.data.frame(gmm.est2[order(gmm.est2[,1]),])
gmm.E[[cond]][[r]]=gmm.est3
End=Sys.time()
gmm.times[[cond]][[r]]=difftime(End, Start, units = "secs")
rm(out)

###############
# MOB fitting #
###############
# control options: kk indexes the minimum-node-size condition, pp the post-pruning condition
if(kk==1 & pp==1) pop.control=mob_control(alpha=0.05,bonferroni=TRUE,ytype="data.frame",ordinal=ordinal.option[o],vcov="opg")                         # no minsize, no pruning
if(kk==1 & pp==2) pop.control=mob_control(alpha=0.05,bonferroni=TRUE,ytype="data.frame",prune="BIC",ordinal=ordinal.option[o],vcov="opg")             # no minsize, BIC post-pruning
if(kk==2 & pp==1) pop.control=mob_control(alpha=0.05,bonferroni=TRUE,ytype="data.frame",minsize=250,ordinal=ordinal.option[o],vcov="opg")             # minsize = 250, no pruning
if(kk==2 & pp==2) pop.control=mob_control(alpha=0.05,bonferroni=TRUE,ytype="data.frame",minsize=250,prune="BIC",ordinal=ordinal.option[o],vcov="opg") # minsize = 250, BIC post-pruning
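# --- Illustrative sketch (not part of the original script): the mob() call below relies on a
# lavaan template model `QLGM` and a `lavaan_fit()` wrapper that are assumed to be defined
# earlier in the full script, following the partykit + lavaan approach described in
# Zeileis (2020). A minimal version of such definitions might look like the following;
# the parameter labels are hypothetical and chosen only to match the names used when
# extracting coefficients below.
if (!exists("QLGM")) {
QLGM <- '
  i =~ 1*GPA9 + 1*GPA10 + 1*GPA11 + 1*GPA12   # latent intercept
  s =~ 0*GPA9 + 1*GPA10 + 2*GPA11 + 3*GPA12   # latent linear slope
  q =~ 0*GPA9 + 1*GPA10 + 4*GPA11 + 9*GPA12   # latent quadratic slope
  i ~ mi*1
  s ~ ms*1
  q ~ mq*1
  i ~~ vi*i
  s ~~ vs*s
  q ~~ vq*q
  i ~~ cis*s
  i ~~ ciq*q
  s ~~ csq*q
  GPA9  ~~ res*GPA9    # residual variances constrained equal over time
  GPA10 ~~ res*GPA10
  GPA11 ~~ res*GPA11
  GPA12 ~~ res*GPA12
'
}
if (!exists("lavaan_fit")) {
# mob() fit function: receives the node data as a data frame (ytype = "data.frame") and
# returns the coefficients, the negative log-likelihood, and the score contributions
lavaan_fit <- function(model, ...) {
  function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...,
           estfun = FALSE, object = FALSE) {
    fit <- lavaan::growth(model, data = y)
    list(coefficients = coef(fit),
         objfun = -as.numeric(logLik(fit)),
         estfun = if (estfun) lavaan::estfun.lavaan(fit) else NULL,
         object = if (object) fit else NULL)
  }
}
}
# --- end of illustrative sketch ---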
# Fit the data and run MOB
Start <- Sys.time()
tr <- mob(GPA9+GPA10+GPA11+GPA12 ~ ORDINAL+CONTINUOUS+CATEGORY+N.ORDINAL+N.CONTINUOUS1+N.CONTINUOUS2+N.CATEGORY,
          data = tem, fit = lavaan_fit(QLGM), control = pop.control)
# number of subgroups #
res.NG[[cond]][r]=width(tr)
# estimates #
if (width(tr)==1) est=round(coef(tr)[c("mi","ms","mq","vi","vs","vq","cis","ciq","csq","res")],3) else est=round(coef(tr)[,c("mi","ms","mq","vi","vs","vq","cis","ciq","csq","res")],3)
res.est[[cond]][[r]]=est
# split points #
ni=nodeids(tr)
ni_terminal=nodeids(tr, terminal = TRUE) # terminal node ids
ni_inner=ni[!ni %in% ni_terminal]        # inner node ids
a=sapply(ni_inner, function(x) split_node(node_party(tr[[x]]))$breaks)
res.split[[cond]][[r]]=unlist(a)
# Classification accuracy #
cm=table(tem[,"True.G"],predict(tr,newdata=tem,type="node")) # confusion matrix
n = sum(cm)                 # number of instances
nc = nrow(cm)               # number of classes
diag = diag(cm)             # number of correctly classified instances per class
rowsums = apply(cm, 1, sum) # number of instances per class
colsums = apply(cm, 2, sum) # number of predictions per class
p = rowsums / n             # distribution of instances over the actual classes
q = colsums / n             # distribution of instances over the predicted classes
# Overall classification accuracy: total correct predictions divided by total predictions
accuracy = sum(diag) / n
# Per-class precision, recall, and F1
precision = diag / colsums # proportion of cases assigned to a node that belong to its true group
recall = diag / rowsums    # proportion of cases of a true group assigned to the corresponding node
f1 = 2 * precision * recall / (precision + recall) # harmonic mean of precision and recall
# Macro-averaged metrics
macroPrecision=mean(precision)
macroRecall=mean(recall)
macroF1=mean(f1) # macro F1 averages the per-class F1 values
CA=c(accuracy,macroPrecision,macroRecall,macroF1)
res.cls[[cond]][[r]]=CA
End <- Sys.time()
res.times[[cond]][[r]]=difftime(End, Start, units = "secs")
rm(tr)
print(paste0(Sys.time(), " ", cond, "_REP = ", r))
} # one replication
} # ordinal.option
} # prune
} # minsize
} # sample size
} # effect size

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Allensworth, E. M., & Clark, K. (2020). High school GPAs and ACT scores as predictors of college completion: Examining assumptions about consistency across high schools. Educational Researcher, 49(3), 198-211.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61(4), 821-856.
Arnold, M., Oberski, D. L., Brandmaier, A. M., & Voelkle, M. C. (2020). Identifying heterogeneity in dynamic panel models with individual parameter contribution regression. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 613-628.
Arnold, M., Voelkle, M. C., & Brandmaier, A. M. (2021). Score-guided structural equation model trees. Frontiers in Psychology, 11, 3913. https://doi.org/10.3389/fpsyg.2020.564403
Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 329-341.
Baker, R. S., Martin, T., & Rossi, L. M. (2016). Educational data mining and learning analytics. The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications, 379-396.
Bauer, D. J., & Shanahan, M. J. (2007). Modeling complex interactions: Person-centered and variable-centered approaches. In T. Little, J. Bovaird, & N. Card (Eds.), Modeling contextual effects in longitudinal studies (pp. 255-283). Routledge.
Bowers, A. J., & Sprott, R. (2012). Examining the multiple trajectories associated with dropping out of high school: A growth mixture model analysis. The Journal of Educational Research, 105(3), 176-195.
Brandmaier, A. M., Prindle, J. J., McArdle, J. J., & Lindenberger, U. (2016). Theory-guided exploration with structural equation model forests. Psychological Methods, 21(4), 566.
Brandmaier, A. M., von Oertzen, T., McArdle, J. J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71.
Brandmaier, A. M., von Oertzen, T., McArdle, J. J., & Lindenberger, U. (2014). Exploratory data mining with structural equation model trees. Contemporary issues in exploratory data mining in the behavioral sciences, 96-127.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (2017). Classification and regression trees. Routledge.
Bürgin, R., & Ritschard, G. (2015). Tree-based varying coefficient regression for longitudinal ordinal responses. Computational Statistics & Data Analysis, 86, 65-80.
Chen, Q., Luo, W., Palardy, G. J., Glaman, R., & McEnturff, A. (2017). The efficacy of common fit indices for enumerating classes in growth mixture models when nested data structure is ignored: A Monte Carlo study. Sage Open, 7(1), 2158244017700459.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.
Cohen, P., West, S. G., & Aiken, L. S. (2014). Applied multiple regression/correlation analysis for the behavioral sciences. Psychology Press.
De Gonzalez, A. B., & Cox, D. R. (2007). Interpretation of interaction: A review. The Annals of Applied Statistics, 1(2), 371-385.
Diallo, T. M., Morin, A. J., & Parker, P. D. (2014). Statistical power of latent growth curve models to detect quadratic growth. Behavior Research Methods, 46(2), 357-371.
Dimitrov, D. M., Al-Saud, F. A. A. M., & Alsadaawi, A. S. (2015). Investigating population heterogeneity and interaction effects of covariates: The case of a large-scale assessment for teacher licensure in Saudi Arabia. Journal of Psychoeducational Assessment, 33(7), 674-686.
Eo, S. H., & Cho, H. (2014). Tree-structured mixed-effects regression modeling for longitudinal data. Journal of Computational and Graphical Statistics, 23(3), 740-760.
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
Fokkema, M., Smits, N., Zeileis, A., Hothorn, T., & Kelderman, H. (2018). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behavior Research Methods, 50(5), 2016-2034.
Fu, W., & Simonoff, J. S. (2015). Unbiased regression trees for longitudinal and clustered data. Computational Statistics & Data Analysis, 88, 53-74.
Gibson, W. A. (1959). Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis. Psychometrika, 24(3), 229-252.
Grimm, K. J., & Jacobucci, R. (2020). Reliable trees: Reliability informed recursive partitioning for psychological data. Multivariate Behavioral Research, 1-13. https://doi.org/10.1080/00273171.2020.1751028
Grün, B., Kosmidis, I., & Zeileis, A. (2012). Extended beta regression in R: Shaken, stirred, mixed, and partitioned. Journal of Statistical Software, 48(11), 1-25.
Hajjem, A., Larocque, D., & Bellavance, F. (2017). Generalized mixed effects regression trees. Statistics & Probability Letters, 126, 114-118.
Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621-638.
Hansen, B. E. (1997). Approximate asymptotic p values for structural-change tests. Journal of Business & Economic Statistics, 15(1), 60-67.
Harlow, L. L., & Oswald, F. L. (2016). Big data in psychology: Introduction to the special issue. Psychological Methods, 21(4), 447.
Hjort, N. L., & Koning, A. (2002). Tests for constancy of model parameters over time. Journal of Nonparametric Statistics, 14(1-2), 113-132.
Hothorn, T., & Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. The Journal of Machine Learning Research, 16(1), 3905-3909.
Hu, J., Leite, W. L., & Gao, M. (2017). An evaluation of the use of covariates to assist in class enumeration in linear growth mixture modeling. Behavior Research Methods, 49(3), 1179-1190.
Jacobucci, R., & Grimm, K. J. (2020). Machine learning and psychological research: The unexplored effect of measurement. Perspectives on Psychological Science, 15(3), 809-816.
Jacobucci, R., Grimm, K. J., & McArdle, J. J. (2017). A comparison of methods for uncovering sample heterogeneity: Structural equation model trees and finite mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 270-282.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36(4), 409-426.
Kim, J., & Cho, H. (2019). Seemingly unrelated regression tree. Journal of Applied Statistics, 46(7), 1177-1195.
Kleiber, C., Hornik, K., Leisch, F., & Zeileis, A. (2002). strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7(2), 1-38.
Kundu, M. G., & Harezlak, J. (2019). Regression trees for longitudinal data with baseline covariates. Biostatistics & Epidemiology, 3(1), 1-22.
Le, T. T., & Moore, J. H. (2021). treeheatr: An R package for interpretable decision tree visualizations. Bioinformatics, 37(2), 282-284.
Liu, J. (2020). Extending mixture of experts model to investigate heterogeneity of trajectories: When, where and how to add which covariates. arXiv preprint arXiv:2007.02432.
Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778.
Loh, W. Y. (2014). Fifty years of classification and regression trees. International Statistical Review, 82(3), 329-348.
Lubke, G. H., & Muthén, B. O. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1), 21.
Lubke, G. H., & Muthén, B. O. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling: A Multidisciplinary Journal, 14(1), 26-47.
Magidson, J., & Vermunt, J. (2004). Latent class models. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 175-198). Newbury Park, CA: Sage.
McCutcheon, A. C. (1987). Latent class analysis. Beverly Hills, CA: Sage.
McLachlan, G. J., & Basford, K. E. (1988). Mixture models: Inference and applications to clustering (Vol. 38). New York: M. Dekker.
McLachlan, G. J., Lee, S. X., & Rathnayake, S. I. (2019). Finite mixture models. Annual Review of Statistics and Its Application, 6, 355-378.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13(2), 127-143.
Merkle, E. C., & Zeileis, A. (2013). Tests of measurement invariance without subgroups: A generalization of classical methods. Psychometrika, 78(1), 59-82.
Merkle, E. C., Fan, J., & Zeileis, A. (2014). Testing for measurement invariance with respect to an ordinal variable. Psychometrika, 79(4), 569-584.
Morgan, J. N., & Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58(302), 415-434.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55(2), 463-469.
Muthén, L. K., & Muthén, B. O. (2017). Mplus: Statistical Analysis with Latent Variables: User's Guide (Version 8). Los Angeles, CA: Authors.
Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M., ... & Boker, S. M. (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81(2), 535-549.
Nylund, K. L., Asparouhov, T., & Muthén, B. O. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535-569.
Nylund-Gibson, K., & Masyn, K. E. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 782-797.
Nylund-Gibson, K., Grimm, R. P., & Masyn, K. E. (2019). Prediction from latent classes: A demonstration of different approaches to include distal outcomes in mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 26(6), 967-985.
Paquette, L., Ocumpaugh, J., Li, Z., Andres, A., & Baker, R. (2020). Who's learning? Using demographics in EDM research. Journal of Educational Data Mining, 12(3), 1-30.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). SAGE.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling and more. Version 0.5-12 (BETA). Journal of Statistical Software, 48(2), 1-36.
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271-282.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Serang, S. (2021). A comparison of three approaches for identifying correlates of heterogeneity in change. New Directions for Child and Adolescent Development.
Serang, S., Jacobucci, R., Stegmann, G., Brandmaier, A. M., Culianos, D., & Grimm, K. J. (2020). Mplus Trees: Structural equation model trees using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 1-11.
Stegmann, G., Jacobucci, R., Serang, S., & Grimm, K. J. (2018). Recursive partitioning with nonlinear models of change. Multivariate Behavioral Research, 53(4), 559-570.
Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289-316.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323.
Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135-153.
Su, X., Wang, M., & Fan, J. (2004). Maximum likelihood regression trees. Journal of Computational and Graphical Statistics, 13(3), 586-598.
Usami, S., Hayes, T., & McArdle, J. (2017). Fitting structural equation model trees and latent growth curve mixture models in longitudinal designs: The influence of model misspecification. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 585-598.
Usami, S., Jacobucci, R., & Hayes, T. (2019). The performance of latent growth curve model-based structural equation model trees to uncover population heterogeneity in growth trajectories. Computational Statistics, 34(1), 1-22.
van Wie, M. P., Li, X., & Wiedermann, W. (2019). Identification of confounded subgroups using linear model-based recursive partitioning. Psychological Test and Assessment Modeling, 61(4), 365-387.
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450-469.
Wang, T., Merkle, E. C., Anguera, J. A., & Turner, B. M. (2021). Score-based tests for detecting heterogeneity in linear mixed models. Behavior Research Methods, 53(1), 216-231.
Wang, T., Strobl, C., Zeileis, A., & Merkle, E. C. (2018). Score-based tests of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83(1), 132-155.
Wang, Y., Kim, E., Ferron, J. M., Dedrick, R. F., Tan, T. X., & Stark, S. (2020). Testing measurement invariance across unobserved groups: The role of covariates in factor mixture modeling. Educational and Psychological Measurement, 0013164420925122.
Zeileis, A. (2020, September 7). Structural equation model trees with partykit and lavaan. Research homepage of Achim Zeileis. https://www.zeileis.org/news/lavaantree/
Zeileis, A., & Hornik, K. (2007). Generalized M-fluctuation tests for parameter instability. Statistica Neerlandica, 61(4), 488-508.
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2), 492-514.
Zeileis, A., Strobl, C., Wickelmaier, F., Komboz, B., & Kopf, J. (2020). psychotree: Recursive partitioning based on psychometric models (Version 0.15-3) [Computer software].