A LATENT STATE TRAIT MODEL FOR MULTILEVEL MEDIATION ANALYSIS WITH 
MULTIPLE TIMEPOINTS 

By 

Lydia Bradford 

A DISSERTATION 

Submitted to 
Michigan State University 
in partial fulfillment of the requirements   
for the degree of 

Measurement and Quantitative Methods – Doctor of Philosophy

2024

ABSTRACT

In randomized control trials (RCTs), the focus has recently shifted to how an intervention yields
positive results on its intended outcome. This aligns with, but goes beyond, the recent push toward
implementation science in healthcare (Bauer et al., 2015). RCTs have moved to evaluating the
theoretical framing of the intervention as well as differing implementation effects. One example of
a typical mediation design in education research is the 3-2-1 mediation design (Pituch et al., 2009),
with random assignment at the school level, the mediator at the teacher level, and the outcome
at the student level. In such situations, it is not uncommon for the mediator to be measured at
multiple time points across the intervention period. However, current mediation models are not
equipped for a 3-2-1 model in which a longitudinal mediator does not measure growth and the
outcome is measured only once. This dissertation has three primary goals. The first is to
provide a framework to answer research questions such as the mediating effect of teacher practices
on an intervention’s impact on student achievement. In this situation, the mediator is measured at
multiple time points in a 3-2-1 design, which current methods are not equipped to handle. The
second goal is to provide potential estimation methods for the framework and to evaluate their bias
and power. The final goal is to understand how these estimation methods perform in an actual
study.

This study combines multilevel mediation with a latent state-trait (LST) framework to
provide a model that can answer mediation questions when the mediator is at level 2 (teacher level)
in a 3-level design and is measured at multiple time points. It also provides four different estimation
methods (averages of the summed mediator, averages of factor scores, factor scores from the LST
model, and the fully specified model) and the assumptions required for each of the four methods. These
include assumptions about restrictions on the latent structure, measurement error, and the
presence of state vs. trait variances. It then uses a simulation study to evaluate the four
methods under varying design conditions: sample size, factor loadings, and effect sizes. Finally,
this study investigates these methods in a project-based learning (PBL) science intervention study
(Crafting Engaging Science Environments [CESE]; Schneider et al., 2022).

The results of the simulation study show that the choice of measure for the mediator is critical
in reducing bias and increasing power in the estimation of the multilevel LST mediation model.
Mediators with low construct validity will lead to bias across estimation methods. These might
be mediating measures that are not truly measuring the mediator, that have small factor loadings,
or whose variance is otherwise not explained by the proposed underlying factors.
Additionally, mediators with more time-specific variance than trait-specific variance also lead to
more bias across the estimation methods. These are situations where the time point explains more
variance than the level 2 (teacher) general trait. If the time points are teacher practices in a given
class period, this would be the situation where practices vary widely from day to day within a
teacher rather than from teacher to teacher. The simulation also indicates that the sample sizes
required for such research questions are large (>200).

Following the simulation study, the methods were evaluated in the CESE study, a cluster
randomized control trial with 61 schools, 102 teachers, and 4238 students. The CESE intervention
included professional learning for the teachers, 3 NGSS-aligned units (in either chemistry or physics)
with driving questions and hands-on experiences for the students, and NGSS-aligned end-of-unit
assessments. During the intervention, as part of data collection, a random sample of teachers was
observed 1 to 5 times, and their PBL practices in the classroom were scored. Mediation in this study
aimed to understand how the intervention affected teacher PBL practices in the classroom and how
those practices directly affected student science achievement at the end of the study. The
estimates of these mediation effects indicate that the models can converge and provide results;
however, this empirical study reiterates the simulation study's findings of issues with small
sample sizes and low trait-specific variances. Investigating these mediation effects for the CESE
intervention also raises several additional design considerations for mediation research questions,
such as the effect of using raters, confounders, and the design of the mediating measure (in this
case, the observation protocol).

Copyright by
LYDIA BRADFORD
2024

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor and committee chair, Dr. Barbara Schneider.
She has always been my biggest supporter, constantly pushing me to be my best. I would not be
where I am without her. She has guided me and taught me so much during my PhD. I am prepared
to advance in my career because of her. I also want to thank her more specifically for her guidance
on this dissertation and all the time and energy she has put into advising me on it.

I also want to thank my committee members, Dr. Spyros Konstantopoulos, Dr. Kenneth Frank,
and Dr. Joseph Krajcik. Each of them has guided and advised me in different ways during my time
as a doctoral student and helped to make me the academic I am today. I also want to thank them
for their advice on this specific study.

I also owe a lot of thanks to my partner in crime in the measurement and quantitative methods
program, Kayla Bartz. I want to thank her for all our conversations and discussions about methods,
measurement, the study, and any other musings. I also want to thank the rest of the team in the
office of the Hannah Chair for always being my biggest supporters and encouraging me to put my
best into this study and my work in the program. I also need to thank all my friends in the MQM
program, Jiachen Liu, Shimeng Dai, and Jiawei Li, for all the critical conversations and discussions
that would eventually help to conceptualize this study.

Finally, I would like to thank my friends and family for all their support: to Colin DeWitt
for being a rock during the writing of this dissertation; to my mom, dad, and all my siblings for
bringing joy and surprises and support to me whenever I need it; to my salsa community in East
Lansing who always cheer me on; and finally, to my church and choir community who always seem
to be able to remember my research topic. I would not have been able to finish without the support
of all those around me.

TABLE OF CONTENTS

LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 LATENT TRAIT-STATE THEORY AND STATISTICAL MULTILEVEL MEDIATION
CHAPTER 3 SIMULATION STUDY
CHAPTER 4 EMPIRICAL STUDY
CHAPTER 5 DISCUSSION
BIBLIOGRAPHY
APPENDIX A R SIMULATION CODE
APPENDIX B EMPIRICAL STUDY MPLUS CODE

LIST OF ABBREVIATIONS

RCT     Randomized Control Trial
LST     Latent State-Trait
CESE    Crafting Engaging Science Environments

CHAPTER 1

INTRODUCTION

Over the past 20 years, governmental agencies have increasingly required robust mediation analysis
in randomized control trials (RCTs) in education research to understand the pathway by which the
intervention is implemented and the magnitude of that effect on the outcome. Within the context of
RCTs, mediation analysis explores how the intervention works, which can be driven by components
of the intervention (the implementation of the intervention) or an underlying theoretical mediational
pathway of an intervention. The effects of mediation are investigated across many fields, including
health, psychology, and education. One example in healthcare is an investigation into factors that
mediate the impact of cognitive functional therapy (CFT) on chronic low back pain (O’Neill et al.,
2020). This study randomized participants into CFT vs. group exercise and education. It measured
the mediators (pain self-efficacy, stress, fear of physical activity, coping, depression, and anxiety)
at the halfway point of the intervention (6 months) and then measured the outcomes (disability
and back pain) at the end (12 months). Another example, in psychology, is an investigation of
the mediating effects of mindfulness-based self-efficacy on the treatment effect of meditation or
exercise and stress (Goldstein et al., 2020). Here, the study randomized individuals into mindfulness
training, exercise training, or the control group. Randomization occurred before the intervention;
some of the mediators (mindfulness and self-reported physical activity) were measured at 4 months,
while others (mindfulness self-efficacy and exercise self-efficacy) were measured at 6 months, and
the outcome (health and mental health) was measured after 8 months. Finally, within education, an
example of a mediation study is Herman et al. (2022)’s investigation into whether student
time-on-task mediates the effects of the CHAMPS classroom management intervention on student outcomes
(student social behavior and academics, standardized academic achievement, and classroom and
homework completion). This study theorized that teachers’ use of CHAMPS would increase student
time on task, leading to the intended behavior and academic outcomes. In this investigation, baseline
measures were collected prior to randomization. The mediator (student time-on-task)
was measured at baseline and again at the end of the year, and the outcomes were collected
at baseline and at the end of the intervention. One outcome (standardized academic achievement)
was also collected as a follow-up the following year.

Depending on the research design, mediation may happen at a different level than the level of
the effects (student outcomes, an individual’s health outcomes, or an employee), such as through a
teacher, a parent, an organizational leader, or a doctor. One example of this is in education, where
higher school SES composition (level 2) affects the outcome of college choice (level 1), an effect
that is partially mediated through school practices towards preparing students for college (level
2; Palardy, 2015).

A typical mediation design in education research is the 3-2-1 mediation design (Pituch et al.,
2009), where the treatment is assigned at the highest level (the school), implemented at the middle
level (the teacher), and evaluated at the individual level (the student). Two examples are a multilevel
mediation analysis on the Building Blocks mathematics curriculum (Schenke et al., 2017) and on
the Content-Focused literacy Coaching intervention (Matsumura et al., 2013). In both these studies,
the treatment (i.e., the Building Blocks mathematics curriculum and Content-Focused Coaching)
was assigned at the school level, the mediator was measured at the second level (classroom level)
through classroom observations of classroom quality, and the outcome was measured at the student
level through achievement in mathematics and reading comprehension, respectively. (Note that Schenke
et al. (2017) use a 2-2-1 analysis, but the mediation effects of the
intervention would more properly be estimated with a 3-2-1 design, whereby the treatment is at the
school level, classroom quality is measured at the classroom level, and mathematics achievement is
measured at the individual level.)

Another example is a multilevel mediation analysis in a study on reciprocal teaching and
student self-regulated learning (Schünemann et al., 2017). In this study, the treatment was assigned
at the classroom level, and the outcome was at the student level. However, for the mediator, the
students were working in groups (2nd level), and these groups were observed using videos. (Note
again that these authors also use a 2-2-1 analysis, but the more rigorous approach would again have been
to estimate this with a 3-2-1 design.) This study had multiple mediators, with two mediators at
level two and one mediator at level one. Conceptually, the level 2 mediators (student classroom
group observations) influenced the level one mediator (strategy-related task performance), which
influenced the outcome (reading comprehension). The 3-2-1 mediation design is not only applicable
to educational research but can also be applied to the health/medical field, where a hospital is at the
highest level, the mediation is at the second level, which is the provider, and the outcome is at
the patient level (Williams et al., 2022), demonstrating the usefulness and versatility of multilevel
mediation analysis. One example of this would be a cluster randomized control trial where the
intervention is training on trauma-informed care (Reeves, 2015) for nurses/providers. The treatment
would be randomized at the hospital level, the mediator would be nurses’ use of trauma-informed
care at the nurse level, and the outcome would be patient-reported satisfaction with their care.

However, several problems have arisen from this transition from simple intent-to-treat analyses
to analyses that include the intervention’s mediation effects, including questions about the measurement,
data collection, and estimation methods to use for these mediation effects. The remainder of Chapter
1 reviews the history and different developments of mediation analysis, followed by the
current gap in the literature and how this dissertation addresses this gap.

1.1 History and Development of Mediation Analysis

Single-level mediation models

Mediation analysis dates back to before the 1980s (Ritchie and Miles, 1970; Walberg, 1969).
However, it was only in 1986 that Baron and Kenny (1986) clearly defined the differences between
mediation and moderation, explaining mediation with a path diagram still commonly used today
as well as summarizing the estimation methods and sources of issues of mediation analysis at
the time. They define moderation as partitioning the effects into subgroups and the mediator
as how or why the effects occur. In this article, they argue that a variable is a mediator under
three conditions: 1) the predictor significantly accounts for variation in the mediating variable;
2) the mediator significantly accounts for the variation in the outcome; and 3) the inclusion of
the mediator significantly decreases the variation in the outcome that the main predictor explains.
They recommend testing this using three regressions; however, they note that this assumes no
measurement error and that the outcome variable does not cause the mediator (temporal distinction
between the two). If there is measurement error, they recommend using latent variable modeling
to model the mediation (Baron and Kenny, 1986).
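The three regressions described above can be sketched as follows. This is a minimal illustration on simulated single-level data, not code from this dissertation: the variable names, the true path values (a = 0.5, b = 0.4, c' = 0.2), and the sample size are assumptions chosen only for the example.

```python
import numpy as np

def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: binary treatment x, mediator m, outcome y.
rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 2, n).astype(float)
m = 0.5 * x + rng.normal(size=n)              # assumed true a path = 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)    # assumed true b = 0.4, c' = 0.2

c = ols(y, x)[1]                  # regression of Y on X: total effect c
a = ols(m, x)[1]                  # regression of M on X: a path
_, c_prime, b = ols(y, x, m)      # regression of Y on X and M: direct effect c' and b path

indirect = a * b                  # product-of-coefficients indirect effect
print(f"a={a:.3f} b={b:.3f} c={c:.3f} c'={c_prime:.3f} a*b={indirect:.3f}")
```

In this single-level OLS setting the total effect decomposes exactly as c = c' + ab, which is why the drop from c to c' in the third condition corresponds to the indirect effect. That identity does not generally carry over to multilevel or latent variable models.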

Following Baron and Kenny (1986)’s work, methods for estimating mediation effects continued
to expand, and MacKinnon et al. (2002) used a simulation study to compare three major families of
inference methods for mediation effects: causal steps methods, difference in coefficients methods,
and product of coefficients methods. The authors’ goal was to guide researchers who
were using these methods to estimate the indirect effects in mediation analysis. The indirect effect
in mediation is the combined effect of the relationship between the predictor and the mediator
and the relationship between the mediator and the outcome (Bollen, 1987). They found that the causal
steps methods tend to have Type I error rates and statistical power that are both too low. Meanwhile,
different methods performed better depending on whether both the a and b paths were equal to zero or
only one of the two paths was equal to zero, making it difficult to choose a method to test the
inference. MacKinnon et al. (2002) recommend that researchers who want to investigate indirect
effects use two different methods: first establish whether the indirect effect (a × b) is zero (using
the empirical distribution or the distribution of the product of the z scores of a and b) and then
establish whether both paths are zero (joint significance). By
the early 2010s, bootstrapped and Monte Carlo confidence intervals had been recommended for
testing mediation inference (MacKinnon et al., 2004; Preacher and Hayes, 2008; Preacher and Selig,
2012). With new methods available and researchers again needing guidance, Hayes and
Scharkow (2013) analyzed the accuracy and confidence interval coverage of these new methods,
along with the Sobel test and the distribution of the product, across different sample sizes (n < 100,
n < 200, and n > 500). They found that the Sobel test was too conservative in general; the two bootstrap
confidence intervals performed the best when there was a true indirect effect, and the Monte Carlo
confidence intervals and distribution of the product performed the best when there was no effect.
They thus recommend the Monte Carlo confidence intervals and distribution of the product as the
best, though conservative, tests for mediation inference (Hayes and Scharkow, 2013).
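To make two of the inference approaches named above concrete, the sketch below computes the first-order Sobel standard error and a Monte Carlo confidence interval for the product a × b. It is only an illustration: the path estimates and standard errors are hypothetical, the function names are not from any package, and the two path estimates are assumed to be independent and normally distributed.

```python
import numpy as np

def sobel_se(a, se_a, b, se_b):
    """First-order (Sobel) standard error of the product a*b."""
    return np.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

def monte_carlo_ci(a, se_a, b, se_b, draws=100_000, level=0.95, seed=0):
    """Percentile interval for a*b from simulated draws of a and b.

    Assumes the sampling distributions of a and b are independent normals,
    which is a simplification made for this example.
    """
    rng = np.random.default_rng(seed)
    products = rng.normal(a, se_a, draws) * rng.normal(b, se_b, draws)
    tail = (1 - level) / 2
    lower, upper = np.quantile(products, [tail, 1 - tail])
    return lower, upper

# Hypothetical estimates of the a and b paths and their standard errors.
a_hat, se_a, b_hat, se_b = 0.5, 0.1, 0.4, 0.1

z_sobel = (a_hat * b_hat) / sobel_se(a_hat, se_a, b_hat, se_b)
lower, upper = monte_carlo_ci(a_hat, se_a, b_hat, se_b)
print(f"Sobel z = {z_sobel:.2f}; Monte Carlo 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The Sobel test treats the product as normally distributed, whereas the Monte Carlo interval lets the skew of the product distribution show up in asymmetric limits. That difference is one reason the two approaches can disagree in small samples.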

Multilevel mediation models

Moving from mediation in single-level designs to multilevel designs in randomized control
trials, Bauer et al. (2006) extend mediation that does not require two-step estimation to multilevel
models in 1-1-1 and 2-1-1 mediation designs. The 1-1-1 multilevel mediation design indicates that
the random assignment, mediation, and outcome are all at the first level, although there may be
clustering and random effects at a level higher than the first level. This type of study may occur
in a health intervention where the treatment is assigned to the patient, the implementation is at the
patient level, and the health outcomes are at the patient level; however, the patients are clustered
by their providers. The 2-1-1 mediation design is when random assignment is at the second level
while the mediator and outcome are at the first level. Bauer et al. (2006) estimate the mediation
using a multivariate system of equations and provide the expected value and variance of the
indirect effect (a × b).
The 2-1-1 design may be the case in education where a curriculum intervention is assigned at the
school level, student engagement, the mediator, is at the student level, and academic achievement,
the outcome, is also at the student level.

Pituch et al. (2009) broaden this work by providing methods for estimating five different
multilevel mediation designs: the 3-1-1 design, the 3-2-1 design, the 3-3-1 design, the 2-1-1 design,
and the 2-2-1 design. In each of these designs, the first number indicates the level where random
assignment occurs, the second the level of the mediator, and the third the level of the outcome. Thus,
the 3-1-1 design indicates randomization at the highest level (such as the school level) and both
the mediator and outcome at level one (such as the student level). Similarly, the 3-2-1 design is
randomization at the highest level, the mediator at the middle level, and the outcome at the first level;
the 3-3-1 design is randomization and mediation at the highest level and the outcome at the lowest level.
The 2-1-1 and 2-2-1 designs have only two levels, such as a classroom randomization or a school
randomization with teacher outcomes. All of these designs are estimated using multiple multilevel
models to estimate the a and b paths, followed by multiplying the paths and testing the inference
with Sobel’s standard errors (Pituch et al., 2009). Following Bauer et al. (2006) and Pituch et al.
(2009)’s expansion of mediation analysis into multilevel designs, Preacher et al. (2010) point out
issues in estimating mediation with multilevel models. These issues include bias in the estimation
of the indirect effect and the fact that multilevel models are unable to treat upper-level variables as
outcomes, so that multilevel models cannot be used for 1-1-2, 1-2-1,
1-2-2, and 2-1-2 designs. Additionally, the 2-2-1 mediation models require a two-step process. The
authors note that multilevel structural equation modeling faces none of these limitations. Thus,
they recommend using general multilevel structural equation modeling in place of multilevel
modeling, with the one limitation being the required sample size (Preacher et al., 2010).
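As a point of reference for the design notation above, a 3-2-1 design estimated with separate multilevel models can be written roughly as follows. The notation is illustrative only and is not taken from Pituch et al. (2009) or from later chapters: students are indexed by i, teachers by j, and schools by k, T_k is the school-level treatment indicator, M_jk the teacher-level mediator, and Y_ijk the student-level outcome, with random effects assumed mutually independent and covariates omitted.

```latex
\begin{align}
\text{Mediator model (teacher level):}\quad
  M_{jk}  &= \gamma_{0} + a\,T_{k} + u_{k} + r_{jk}, \\
\text{Outcome model (student level):}\quad
  Y_{ijk} &= \pi_{0} + b\,M_{jk} + c'\,T_{k} + v_{k} + w_{jk} + e_{ijk}, \\
\text{Indirect effect:}\quad
  \widehat{ab} &= \hat{a}\,\hat{b}.
\end{align}
```

Here u_k and v_k are school-level random effects, r_jk and w_jk are teacher-level residuals, and e_ijk is the student-level residual. The a path comes from the first model, the b path from the second, and their product is the quantity whose inference is tested.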

Longitudinal mediation models

Around the same time that mediation was expanding into multilevel models, methods for
longitudinal mediation also began emerging. Selig and Preacher (2009) argue for using longitudinal
mediation and provide three methods for longitudinal mediation analysis. They note causality
concerns when using contemporaneous data for mediation, such as whether the predictor affects the
mediator and whether the mediator actually affects the outcome. Jose (2016) specifically shows how
concurrent mediation often does not generalize to the proposed longitudinal mediation. The three
longitudinal methods provided by Selig and Preacher (2009) are for continuous predictor, mediator,
and outcome variables and include a cross-lagged model, a latent growth model, and a latent change
model. Jose (2016) extends longitudinal models to experimental and quasi-experimental designs
where the treatment is given at T1, the mediator is measured at T2, and the outcome is measured at
T3. Then, Goldsmith et al. (2018) provide applications of the cross-lagged model, latent growth
model, and latent change model in experimental settings where the treatment is randomly assigned
and the mediator and outcome measures are longitudinal but collected at the same time at every
time point, including baseline.

All of these longitudinal methods assume that all variables are collected at all time points.
The cross-lagged model estimates the mediation so that the predictor is lagged one time point
from the mediator and the mediator is lagged one time point from the outcome, ensuring correct
temporal sequencing for mediation causality (Selig and Preacher, 2009; Goldsmith et al., 2018).
For example, consider a science curriculum intervention where the assumed mediation path is that the
intervention increases student interest in science, which would then increase student interest in
science careers. If the intervention is assigned at T1 and the students’ interest in science and science
careers is measured at T1, T2, T3, T4, and T5, then the cross-lagged model would estimate the
mediation through the treatment effect on interest in science at T2, accounting for interest in science
at T1, and then the relationship between interest in science at T2 and interest in science careers
at T3, accounting for interest in science careers at T2. These paths would then be estimated across all
time points, consistently accounting for at least one lag. This requires at least three time points,
with the mediator and outcome measured at the same time at each one. The latent
growth model allows the change in the mediator and outcome over time to differ across individuals
through latent intercepts and slopes. It also provides for the investigation of the effect of the mediator
on the rate of change (the slope) of the outcome (Selig and Preacher, 2009; Goldsmith et al., 2018). For
example, in an intervention that teaches students about mathematical modeling, where the assumed
mediator is an increased understanding of mathematical
modeling, which then increases students’ general mathematical ability, both the mediator and
outcome variable can be modeled with a growth model. Students are expected to
improve in mathematical modeling and general mathematics over time, but the latent intercepts and
slopes allow this growth to differ across students. In this case, the student rate of
growth in mathematical modeling can mediate their rate of growth in general mathematics. In these
situations, however, the temporal relationship of the mediator occurring before the outcome cannot
be established. The latent difference score model allows the relationships between different time points
to differ; it also allows for more temporal investigations, similar to the cross-lagged model.
Here, the mediator and outcome at time t are subtracted from the mediator
and outcome at time t+1, respectively, allowing the differences to change across time points if there
are more than three time points. In the case of four time points, the mediation would be the treatment
effect on the difference in the mediator between time points 3 and 2 and then the relationship between the
mediator difference between time points 3 and 2 and the outcome difference between time points 4
and 3 (Selig and Preacher, 2009; Goldsmith et al., 2018). In the above example, with an intervention
addressing students’ science career interests, this mediation would be the treatment effect on the
change in science interest between time points 2 and 3 and then the relationship between the change
in science interest and the change in science career interest between time points 3 and 4.
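The cross-lagged and latent change paths in the science-interest example above can be sketched in equation form. This is a simplified illustration in assumed notation, not the specification used by Selig and Preacher (2009) or Goldsmith et al. (2018): X_i is treatment assignment, M_it is interest in science and Y_it is interest in science careers for student i at time point t, and the measurement models for the latent variables are omitted.

```latex
% Cross-lagged paths: treatment -> M at T2 -> Y at T3, each adjusting for the prior measure
\begin{align}
M_{i2} &= \mu_{M} + a\,X_{i} + \rho_{M}\,M_{i1} + \varepsilon_{M,i2}, \\
Y_{i3} &= \mu_{Y} + b\,M_{i2} + c'\,X_{i} + \rho_{Y}\,Y_{i2} + \varepsilon_{Y,i3}, \\
\text{indirect effect} &= a \times b .
\end{align}

% Latent change paths with four time points
\begin{align}
\Delta M_{i,32} &= M_{i3} - M_{i2}, \qquad \Delta Y_{i,43} = Y_{i4} - Y_{i3}, \\
\Delta M_{i,32} &= \alpha_{M} + a^{*}\,X_{i} + \zeta_{M,i}, \\
\Delta Y_{i,43} &= \alpha_{Y} + b^{*}\,\Delta M_{i,32} + c^{*}\,X_{i} + \zeta_{Y,i}, \\
\text{indirect effect} &= a^{*} \times b^{*} .
\end{align}
```

In both versions the mediator quantity precedes the outcome quantity in time, which is what gives these two models their temporal ordering. The latent growth model, by contrast, relates slopes estimated over the same period.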

Longitudinal and multilevel mediation models

Combining longitudinal mediation with multilevel mediation, Zhang and Phillips (2018) propose
both a 2-2-1 cross-lagged mediation and a 2-1-1 cross-lagged mediation, including random
effects for the level 2 cluster. Finally, McNeish and MacKinnon (2022) apply dynamic structural
equation modeling to intensive longitudinal mediation. This approach expands upon the cross-lagged
model but combines time-series and multilevel modeling. Using this dynamic structural equation model
allows for the estimation of stationary mediation, person-specific mediation where the mediation
varies by person, dynamic mediation where the indirect effect can change over time, and
cross-classified mediation, which allows the mediation to vary by both person and time (McNeish and
MacKinnon, 2022).

Recent measurement concerns in mediation models

Finally, recent years have seen an increase in research on measurement questions regarding
mediation. Olivera-Aguilar et al. (2018) investigate the effects of measurement noninvariance in
the mediator on the mediation effects. They note that mediation analysis assumes measurement
invariance but find that mediation estimation is robust to violations of factorial invariance where the
loadings are not equivalent across groups but not to violations of metric invariance where the latent
intercepts are not equivalent across groups. However, Olivera-Aguilar et al. (2018) note that even
though mediation analysis is not robust to violations of metric invariance, this may be a result of
the treatment (or group) effect being tested with the mediation. On a different measurement question,
Gonzalez and MacKinnon (2021) investigate different misspecifications of models when the true mediator is
a bifactor model. They compare ten different mediator models across the various factors, including
whether the general factor or a specific factor is the mediator in the estimation. They find that
the probability of finding a mediation effect decreases with measurement error and that increasing
complexity decreases the power of finding an effect (Gonzalez and MacKinnon, 2021). However,
previous work by these authors had already shown that using a unidimensional model instead
of a bifactor model when investigating a specific latent factor can lead to bias and lower power
(Gonzalez and MacKinnon, 2018), so there may also be a tradeoff between bias and efficiency in
mediation with complex latent structures.

1.2 Current Problem

Researchers interested in answering research questions that include mediators must consider
the choice, measurement, and data collection design for the mediator variables. These choices
are driven by their assumptions about what could impact the implementation of the outcome,
their theoretical mediation models, and measurement considerations for the specific mediators.
Within education, it is not uncommon to have longitudinal mediators, such as teacher practices
measured through teacher observations at multiple time points, or student emotionality measures,
such as engagement, measured through longitudinal data collection methods, such as the experience
sampling method (ESM). However, current longitudinal mediation assumes either that the mediator
and outcome variables are both being measured longitudinally or that there is growth in the mediator,
which then affects the outcome measure.

However, measures can differ from those used in the current longitudinal mediation literature
in several ways. First, it is possible that only the mediator is longitudinal, as opposed to having both
the mediator and outcome longitudinal and measured simultaneously. In essence, these are
situations where there is a start and an end to an intervention: the treatment is assigned
at the beginning; the intervention lasts a specified amount of time (for example, one academic
year); the outcome is measured at the end of the intervention; and between the beginning and end,
the mediator is measured multiple times. In these cases, neither the cross-lagged model nor the
more expansive dynamic structural equation model mediation fits the data. Additionally, in these
mediation measures, there may not be an expected growth; instead, the measures assume a
person-level trait together with time-specific variance that does not necessarily follow a trend. These
measures do not fit with the latent growth model or the latent difference models in mediation analysis.

1.3 Latent Trait-State Theory

To address the gap in estimating mediation with measures that assume a person-level trait
but also time-specific variance, this study combines multilevel mediation with latent trait-state
theory (LTS). The basis for LTS is to address the longitudinal question of how to consider the
variability of a construct between time points. Numerous latent variables measured across time
have contextual influences, particularly in education research. One such construct is student
engagement (Vongkulluksn and Xie, 2022). Student engagement may change over time, but it
may also be that the most significant influences on student engagement are differences in classroom
activities and other factors within a student’s life, such as their general emotions that
day or their interactions with friends and family at that time. Latent trait-state theory separates
a latent construct into two aspects: the trait (time-invariant), which in the above example would be
the student’s propensity toward engagement, and the state (time-dependent), which in the above
example would be the day-to-day differences in classroom activities or the effects of a student’s
general mood on any given day. Although LTS theory has been around since the 1980s
under relatively strict assumptions, including that trait constructs did not change over time
(Geiser, 2020), Steyer et al. (2015) revised the theory under less stringent assumptions that allow the
trait constructs to change over time and provided models ranging from single trait, single indicator,
to single trait, multiple indicators, to multiple states, single indicator, to multiple states, multiple
indicators. The single vs. multiple traits indicate the number of latent constructs at the trait level.
The above example of student engagement would be a single trait; however, if the researcher were
interested in both student engagement and student science ability simultaneously, those would be
multiple traits. The number of indicators is the number of items that load onto the trait constructs.
For example, if, at any given time point, there was only one question measuring student engagement,
this would be a single indicator. In contrast, if there were multiple questions measuring student
engagement, that would be a multiple indicator example.

1.4 Goals and Structure of Study

To deal with longitudinal measures of mediation that reflect occasion specificity rather than

growth, this study expands upon the current literature on longitudinal mediators by

applying a latent trait-state model to multilevel SEM mediation analysis. The goals of this study

were:

1. Provide a model to address mediation measures with multiple time points through a latent

state-trait perspective as opposed to through a growth perspective

2. Provide different methods for the estimation of these latent state-trait mediation effects

3. Compare the different estimation methods through bias, power, and convergence

4. Apply these methods to an empirical example to explore how they perform in real-world

situations

This study achieves these goals by explaining the theoretical basis of latent trait-state theory

and statistical multilevel mediation and deriving the combined model, followed by four different

ways to potentially estimate this mediation. Three of these do not use the fully specified model but

instead use estimates of the trait mediator: an average sum, average factor scores from a factor

analysis, and general factor scores from the latent state-trait model estimated separately. The

fourth estimates the full model using structural equation modeling. Then, the dissertation tests the

bias and power of the four different estimation methods using a simulation study, followed by an

empirical example that shows how these estimation methods perform with actual data by analyzing

the mediation effects of teacher practices from a project-based learning (PBL) science intervention

in high school chemistry and physics classes.

In the next chapter, the latent trait-state theory is brieﬂy explained, along with how it can be

considered in a bifactor model when there are a limited number of time points. The latent trait-state

model is then expanded into a multilevel model, allowing for random effects. Following this

explanation of the latent trait-state theory, a summary of statistical multilevel mediation is given

along with considerations for when using an outcome variable that is measured with binary items

(which could be expanded to other types of items, such as unordered or ordered categorical items).

Then, the multilevel latent trait-state model is combined with the multilevel mediation to provide

a mediation model using latent trait-state theory. Following this, the four estimation methods are

provided, as described above. Then, in Chapter 3, a simulation study explores how these four

different estimation methods perform across level 3 cluster sample sizes, general and specific

factor loadings in the latent trait-state model, and differing effect sizes for the a and b pathways in

the mediation analysis. Next, Chapter 4 explores these estimation methods in a real-world science

curriculum intervention (Schneider et al., 2022) where teacher observations measured at multiple

times are used as a mediator with an assumed latent trait-state model for teacher practices. Finally,

Chapter 5 concludes with a discussion of the proposed model and how it performed, followed by

current limitations and future research.

CHAPTER 2

LATENT TRAIT-STATE THEORY AND STATISTICAL MULTILEVEL MEDIATION

This chapter gives the theoretical basis for latent trait-state theory, followed by advances in multilevel

latent trait-state theory. Then, after delving into statistical mediation and, more speciﬁcally,

multilevel mediation, it combines the two theoretical models into the latent trait-state multilevel

mediation model. Finally, it will provide estimation methods for the latent trait-state multilevel

mediation model.

2.1 Latent Trait-State Theory

The proposed mediation model of this study considers an LTS model where there are multiple

indicators for the mediation construct so that there are multiple questions measuring the mediating

construct, which is measured over time (at multiple time points) and does not restrict time invariance

of the trait. Not restricting time invariance means that how each indicator measures the mediator

may not be the same from time to time. Although time invariance of the trait would be ideal, as

many researchers hope that the measure always measures the same construct with the same scale,

there are several instances where this may be violated in an experimental study. For example,

when a mediator is scored by an observer, having a different person observe or score at different

times may change how the items load onto the construct. Another example would be a

self-administered survey, where the individual’s perception of a question may change over time as

they gain a better understanding of what the question is asking. The proposed model does not assume time

invariance; however, a simple constraining of parameters would shift this assumption so that the

proposed model is plausible in the situation of time invariance or time non-invariance.

Starting with understanding the LTS model to be incorporated into the mediation model, this

study proposes an LTS model through a bifactor model for a single trait multistate model with

multiple indicators. The bifactor model is used in a broad set of measurement contexts, such

as personality (Chen et al., 2012), assessments (DeMars, 2006), and in the single trait multistate

LTS (Geiser, 2020). In the more general context, the bifactor model is a multidimensional model

where each item loads (or is related) to a general factor and then another more speciﬁc factor.

An example of this would be in a science assessment where the general factor would be general

science ability, but the speciﬁc factors may be content-speciﬁc factors (such as physical science,

biological science, or environmental science) or skill-specific factors, such as different science

practices (modeling, mathematical representations, or planning an investigation). These models

often ﬁt the data better and give more information about students’ general and speciﬁc abilities.

More speciﬁcally, using the bifactor model in the LTS context, each item loads onto the trait factor

and then each time-speciﬁc factor, allowing for the distinguishing of trait factors from the state

(time-speciﬁc) variance. In the example of student engagement, the general/trait factor would again

be the students’ propensity to be engaged, and the speciﬁc/state factor would be the time-speciﬁc

variations in student engagement.

Statistically, this LTS bifactor model is:

\[
Y_{itj} = \alpha_{it0} + \lambda_{it1}\,\xi_{j} + \delta_{it2}\,\zeta_{tj} + \varepsilon_{itj}
\]

Where $Y_{itj}$ represents item i at time point t for individual j, $\alpha_{it0}$ is the intercept for item i at time

point t, $\lambda_{it1}$ is the time-specific trait loading for item i at time t, $\xi_{j}$ is the common trait factor for

individual j, $\delta_{it2}$ is the factor loading for item i at time point t on the state residual factor, $\zeta_{tj}$ is the

state residual factor at time point t for individual j, and $\varepsilon_{itj}$ is the unique factor for item i at time t

for individual j (or otherwise, the measurement error; Geiser, 2020, p. 179).

In the above example of student engagement measured at random time points, if the indicators

for engagement are student interest, skill, and challenge (Schneider et al., 2016), then $Y_{itj}$ is item i

(either interest, skill, or challenge) at the specific time point t for student j; $\alpha_{it0}$ is the intercept of the

indicator (interest, skill, or challenge) at the specific time point t; $\lambda_{it1}$ is the loading of the indicator

(interest, skill, or challenge) at time point t on the student’s propensity to be engaged, $\xi_{j}$ (the trait

factor); $\delta_{it2}$ is the loading of the indicator (interest, skill, or challenge) onto the state residual factor

(the time-specific factor of the student’s engagement, $\zeta_{tj}$); and $\varepsilon_{itj}$ is the remaining unexplained variance

in the student’s response to the indicator of interest, skill, or challenge.
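The bifactor structure described above can be made concrete with a small simulation. The following is a minimal sketch, not the dissertation's simulation design: all parameter values (loadings of 0.8 and 0.5, an error standard deviation of 0.55, standard normal factors that are mutually uncorrelated) are hypothetical choices made only for illustration. It generates item responses from a trait factor, time-specific state residual factors, and measurement error, and then compares the observed variances and covariances with the values the model implies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_time, n_items = 20000, 3, 3

# Hypothetical parameter values (assumptions, not taken from the source).
alpha = np.zeros((n_items, n_time))       # intercepts, indexed [item, time]
lam = np.full((n_items, n_time), 0.8)     # trait loadings
delta = np.full((n_items, n_time), 0.5)   # state residual loadings
error_sd = 0.55                           # SD of the unique factor

xi = rng.normal(size=n_people)                     # trait factor per person
zeta = rng.normal(size=(n_people, n_time))         # state residual per person and time
eps = rng.normal(scale=error_sd, size=(n_people, n_items, n_time))

# y[j, i, t] = alpha[i, t] + lam[i, t] * xi[j] + delta[i, t] * zeta[j, t] + eps[j, i, t]
y = alpha + lam * xi[:, None, None] + delta * zeta[:, None, :] + eps

# With uncorrelated unit-variance factors, the implied item variance is
# lam^2 + delta^2 + error_sd^2.
implied_var = lam**2 + delta**2 + error_sd**2
observed_var = y.var(axis=0)

# Two items at the same time point share trait and state variance;
# the same item at two time points shares only trait variance.
cov_same_time = np.cov(y[:, 0, 0], y[:, 1, 0])[0, 1]
cov_diff_time = np.cov(y[:, 0, 0], y[:, 0, 1])[0, 1]
```

Under these assumed values, items measured at the same time point covary more strongly (trait plus state) than the same item measured at different time points (trait only), which is the separation of trait from time-specific variance that the model is meant to capture.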

In a situation with three indicators (such as the engagement example above) at three time points,

this model is visually represented by Figure 2.1.

Figure 2.1 Bifactor Latent State-Trait Model

This bifactor latent state-trait model can be expanded into a multilevel component in cases

where the observations are clustered, such as teachers or students within schools (Wang et al.,

2018). When extending the bifactor model to two levels, the trait and state factors now have within

and between-level components. For example, if level 1 is students and level 2 is schools, there are

within-school (student level) trait and state factors that represent these factors for the individual

students, but then there are between-school (school level) trait and state factors. These indicate that

variance in these trait and state factors is explained by the students belonging to the same school.

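Level 1:

\[
Y_{itjk} = \alpha_{it0k} + \lambda_{it1k}\,\xi^{w}_{jk} + \delta_{it2k}\,\zeta^{w}_{tjk} + \varepsilon_{itjk}
\]

Level 2:

\[
\begin{aligned}
\lambda_{it1k} &= \lambda_{it10} \\
\delta_{it2k} &= \delta_{it20} \\
\alpha_{it0k} &= \alpha_{it00} + \lambda_{it01}\,\xi^{b}_{k} + \delta_{it02}\,\zeta^{b}_{tk} + \phi_{it0k}
\end{aligned}
\]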

Where $\xi^{b}_{k}$ is the between-level common trait factor for cluster k; $\zeta^{b}_{tk}$ is the between-level state

factor for cluster k at time point t; and $\phi_{it0k}$ is the random intercept for item i at time point t for

cluster k. In the engagement example with students nested within schools, $\xi^{b}_{k}$ is the school-level

propensity for students to be engaged in school k (that is, whether these propensities vary by school), $\zeta^{b}_{tk}$ is the

school-level time-specific engagement factor for school k at time point t (factors such as classroom

practices that affect engagement from day to day), and $\phi_{it0k}$ is the random variance for school k on

the responses to the indicators (challenge, interest, and skill).

2.2 Statistical Multilevel Mediation

The 3-2-1 statistical multilevel mediation model is exempliﬁed in a standard education RCT

sampling and data collection framework. Schools are randomly sampled and assigned treatment.

All teachers within the school to whom the treatment applies are included in the study (or

randomly chosen if the treatment applies to all teachers). Then, all students within those teachers

are included. Treatment is assigned before the start of the school year. Teachers are randomly

observed once throughout the school year to measure teacher practices as a mediator. Finally, the

outcome is measured at the student level at the end of the year (after all observations and at the end

of the treatment).

Pituch et al. (2009) provide the following mediation model for the 3-2-1 mediation design:

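Level 1:

\[
Y_{jkl} = \pi_{0kl} + e_{jkl}
\]

Level 2:

\[
\begin{aligned}
M_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\pi_{0kl} &= \beta_{00l} + \beta_{01l}\,M_{kl} + r_{0kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \gamma^{M}_{00} + \gamma^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\beta_{00l} &= \gamma_{000} + \gamma_{001}\,T_{l} + u_{00l} \\
\beta_{01l} &= \gamma_{010}
\end{aligned}
\]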

Where $M_{kl}$ is the mediator at level 2 for the level 2 cluster k within the level 3 cluster

l; $T_{l}$ is the treatment at the level 3 cluster l; $Y_{jkl}$ is the outcome at level 1 for individual j in the level

2 cluster k in the level 3 cluster l; $e_{jkl}$ is the level 1 residual on the outcome for individual j in the

level 2 cluster k in the level 3 cluster l; $r^{M}_{kl}$ is the level 2 residual on the mediator for the level 2

cluster k within the level 3 cluster l; $r_{0kl}$ is the level 2 random intercept on the outcome for the

level 2 cluster k within the level 3 cluster l; $u^{M}_{0l}$ is the random intercept for the level 3 cluster l on

the mediator; $u_{00l}$ is the random intercept for the level 3 cluster l on the outcome; $\gamma^{M}_{00}$ is the mean of the

mediator; $\gamma_{000}$ is the mean of the outcome; $\gamma^{M}_{01}$ is pathway a in the mediation estimation; $\gamma_{010}$ is

pathway b, where the product of a and b estimates the indirect effect; and $\gamma_{001}$ is the c’ pathway,

or the treatment effect not explained by the mediator. In the example where the mediator is teacher

practice and the outcome is student achievement, $M_{kl}$ is a single indicator for teacher practice; $Y_{jkl}$ is a

single indicator for student achievement (such as a standardized test score); $\gamma^{M}_{01}$ is the effect of the

treatment on teacher practices; $\gamma_{010}$ is the relationship between student achievement and teacher

practices; and $\gamma_{001}$ is the treatment effect on student achievement not explained by the mediator.
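To illustrate how the a, b, and c’ pathways generate data in a 3-2-1 design, the following sketch simulates schools, teachers, and students and recovers the indirect effect as the product of a and b. It is a rough illustration under stated assumptions: the sample sizes, path values, and variance components are hypothetical, and the pathways are estimated with naive single-level least squares on teacher-level data rather than with the multilevel estimators discussed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools, n_teachers, n_students = 400, 5, 20

# Hypothetical path values (assumptions for illustration only).
a_path, b_path, c_prime = 0.5, 0.4, 0.2

treat = np.repeat([0.0, 1.0], n_schools // 2)               # school-level treatment
u_m = rng.normal(scale=0.2, size=n_schools)                 # school effect on mediator
u_y = rng.normal(scale=0.3, size=n_schools)                 # school effect on outcome
r_m = rng.normal(size=(n_schools, n_teachers))              # teacher residual, mediator
r_y = rng.normal(scale=0.5, size=(n_schools, n_teachers))   # teacher residual, outcome
e = rng.normal(size=(n_schools, n_teachers, n_students))    # student residual

# Mediator at the teacher level, outcome at the student level.
m = a_path * treat[:, None] + u_m[:, None] + r_m
pi = c_prime * treat[:, None] + b_path * m + u_y[:, None] + r_y
y = pi[:, :, None] + e

# Naive OLS on teacher-level data (a stand-in, not a multilevel model).
t_flat = np.broadcast_to(treat[:, None], m.shape).ravel()
m_flat = m.ravel()
y_bar = y.mean(axis=2).ravel()                              # teacher mean outcome
ones = np.ones_like(t_flat)

a_hat = np.linalg.lstsq(np.column_stack([ones, t_flat]), m_flat, rcond=None)[0][1]
coef = np.linalg.lstsq(np.column_stack([ones, t_flat, m_flat]), y_bar, rcond=None)[0]
c_hat, b_hat = coef[1], coef[2]
indirect_hat = a_hat * b_hat                                # estimate of a * b
```

The point estimates from this shortcut are reasonable in this balanced, simulated setting, but its standard errors would ignore the clustering of teachers within schools, which is one reason the multilevel formulation is used in practice.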

The 3-2-1 mediation can be applied in a situation where the mediator and outcome are latent

variables (have multiple indicators that load onto the latent variable) instead of observed variables

in a multilevel SEM framework (Silva et al., 2019). This may be a situation where the mediator,

teacher practices, is measured with multiple items in the observation, and the outcome, student

achievement, is measured with a test with multiple items. In this case, the mediator and outcome are

latent constructs that may also have measurement error (these measures are not perfectly reliable

instruments). This expansion into multilevel SEM is essential, especially in education research

trials where the outcome variable is often a student-level construct that includes some measurement

error.

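Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varepsilon_{ijkl}
\]

Level 2:

\[
\begin{aligned}
M_{ikl} &= \alpha^{M}_{i0l} + \lambda^{M}_{i1l}\,\theta^{Mw}_{kl} + \varepsilon^{M}_{ikl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{bk}_{kl} + \phi_{i0kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha^{M}_{i0l} &= \alpha^{M}_{i00} + \lambda^{M}_{i01}\,\theta^{Mb}_{l} + \phi^{M}_{i0l} \\
\lambda^{M}_{i1l} &= \lambda^{M}_{i10} \\
\theta^{Mb}_{l} &= \gamma_{0} + \gamma_{1}\,T_{l} + u^{M}_{l} \\
\alpha_{i00l} &= \alpha_{i000} + \lambda_{i001}\,\theta^{bl}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \lambda_{i010} \\
\lambda_{i10l} &= \lambda_{i100} \\
\theta^{bl}_{l} &= \beta_{0} + \beta_{1}\,T_{l} + \beta_{2}\,\theta^{Mb}_{l} + u
\end{aligned}
\]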

Where the latent mediator variable is split into a within and a between latent construct ($\theta^{Mw}_{kl}$

and $\theta^{Mb}_{l}$) and the latent outcome variable is split into a within, a between 2nd level, and a between

3rd level construct ($\theta^{w}_{jkl}$, $\theta^{bk}_{kl}$, and $\theta^{bl}_{l}$, respectively; such as within students, between teachers, and

between schools). Like the multilevel LTS model, the latent constructs are allowed to vary by each

level. In the example above, the student’s academic ability would have a within-level (student level)

ability score, a teacher-level score, and a school-level score, allowing for the measured ability to

vary across teachers and schools. Similarly, the latent variable of teacher practices is allowed to

vary at the school level, so there is a difference in the average teacher practice factor scores between

schools.

In this multilevel SEM, the treatment and mediation effects are estimated at the third level. $\gamma_{1}$

is pathway a in the mediation estimation and $\beta_{2}$ is pathway b, where the product of a and b

estimates the indirect effect. Similarly to the single indicator model above, in the example,

$\gamma_{1}$ is the effect of the treatment on the between-school teacher practices, and $\beta_{2}$ is the relationship

between the school-level teacher practices and the between-school-level student academic ability.

This procedure allows for estimating the mediation effects in the presence of measurement error

within the mediator and the outcome variable. This model is represented in Figure 2.2.

2.3 Latent Trait-State Theory in Multilevel Mediation

Combining Latent Trait-State Theory into the mediation framework, where the mediator is

a longitudinal measure that has both trait and state latent variables, such as in the case where

teacher practices or social and emotional well-being are measured at multiple time points during

the intervention, the model becomes:

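Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varepsilon_{ijkl}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta^{w}_{tkl} + \varepsilon^{M}_{itkl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{bk}_{kl} + \phi_{i0kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \alpha_{it00} + \lambda_{it01}\,\xi^{b}_{l} + \delta_{it02}\,\zeta^{b}_{tl} + \phi^{M}_{it0l} \\
\lambda_{it1l} &= \lambda_{it10} \\
\delta_{it2l} &= \delta_{it20} \\
\alpha_{i00l} &= \alpha_{i000} + \lambda_{i001}\,\theta^{bl}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \lambda_{i010} \\
\lambda_{i10l} &= \lambda_{i100} \\
\xi^{b}_{l} &= \gamma_{0} + \gamma_{1}\,T_{l} + \varepsilon \\
\theta^{bl}_{l} &= \beta_{0} + \beta_{1}\,\xi^{b}_{l} + \beta_{2}\,T_{l} + \phi
\end{aligned}
\]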

Figure 2.2 Multilevel Structural Equation Mediation Model

Where $Y_{ijkl}$ is item i for individual j in the level 2 cluster k within the level 3 cluster l on the

outcome measure and $M_{itkl}$ is the mediator item i at time point t for the level 2 cluster k within the

level 3 cluster l. The latent mediator is modeled by the mediator trait factor, which is split into

a within and a between construct ($\xi^{w}_{kl}$ and $\xi^{b}_{l}$), and the mediator state factor, which is split into a within

and a between level construct ($\zeta^{w}_{tkl}$ and $\zeta^{b}_{tl}$); the latent outcome variable is split into a within, a

between 2nd level, and a between 3rd level construct ($\theta^{w}_{jkl}$, $\theta^{bk}_{kl}$, and $\theta^{bl}_{l}$, respectively). $\alpha_{it00}$ is the

mean of the mediator item i at time t, $\lambda_{it01}$ is the loading of mediator item i at time t onto the between level 3

trait factor, $\delta_{it02}$ is the loading of mediator item i at time t onto the between level 3 state factor, $\phi^{M}_{it0l}$

is the level 3 random intercept on the mediator item i at time t, $\lambda_{it10}$ is the loading of mediator item i at time t

onto the level 2 trait factor, $\delta_{it20}$ is the loading of mediator item i at time t onto the level 2 state factor,

$\alpha_{i000}$ is the mean of the outcome item i, $\lambda_{i001}$ is the outcome item i loading onto the level 3 outcome

factor, $\nu_{i00l}$ is the level 3 random intercept for outcome item i, $\lambda_{i010}$ is the outcome item i loading for the

level 2 outcome factor, $\phi_{i0kl}$ is the level 2 random intercept for outcome item i, and $\lambda_{i100}$ is the outcome

item i loading for the level 1 outcome factor. In this multilevel SEM, the treatment and mediation

effects are estimated at the third level. $\gamma_{1}$ is pathway a in the mediation estimation and $\beta_{1}$ is

pathway b, where the product of a and b estimates the indirect effect. Finally, $\beta_{2}$ is the c’ pathway,

or the remaining treatment effect not explained by the mediator.
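As a worked check on how the a, b, and c’ pathways combine, the level 3 mediator equation can be substituted into the level 3 outcome equation. The sketch below assumes the linear level 3 structural equations described above, writing $\varepsilon$ and $\phi$ for the level 3 residuals of the mediator trait and the outcome factor (the symbol choices are assumptions of this illustration).

```latex
\begin{aligned}
\theta^{bl}_{l} &= \beta_{0} + \beta_{1}\left(\gamma_{0} + \gamma_{1} T_{l} + \varepsilon\right) + \beta_{2} T_{l} + \phi \\
                &= \left(\beta_{0} + \beta_{1}\gamma_{0}\right)
                 + \left(\beta_{2} + \gamma_{1}\beta_{1}\right) T_{l}
                 + \left(\beta_{1}\varepsilon + \phi\right)
\end{aligned}
```

Under these assumptions, the coefficient on $T_{l}$ in the reduced form is the sum of the direct effect $\beta_{2}$ (the c’ pathway) and the indirect effect $\gamma_{1}\beta_{1}$ (the product of the a and b pathways).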

Returning to the healthcare example from Chapter 1 (see page 3), if level 1 were patients,

level 2, nurses, and level 3, hospitals where the treatment is training on trauma-informed care

and assigned at the hospital level, the mediator of nurse (level 2) use of trauma-informed care is

measured longitudinally, and the outcome is patient-reported satisfaction (measured with multiple

items), then $Y_{ijkl}$ represents item i on the patient-reported satisfaction survey for patient j under

nurse k within hospital l and $M_{itkl}$ represents item i on the nurse’s measure of trauma-informed

care use at time point t for nurse k within hospital l. Then, $\xi^{w}_{kl}$ is the use of trauma-informed

care trait for nurse k in hospital l and $\xi^{b}_{l}$ is the hospital-level use of trauma-informed care trait for

hospital l (allowing for this trait to vary by different hospitals); $\zeta^{w}_{tkl}$ is trauma-informed care time

point variance at time point t for nurse k in hospital l; and $\zeta^{b}_{tl}$ is the hospital-level trauma-informed

care time point variance at time point t for hospital l (allowing hospital-wide time point factors,

such as how busy the hospital is, the number of staff out, and the general stress levels of staff, to affect the

time point variance of trauma-informed care use); $\theta^{w}_{jkl}$ is the satisfaction factor for patient j under

nurse k in hospital l, $\theta^{bk}_{kl}$ is the between-nurses patient satisfaction factor for nurse k in hospital l (so

that the patient satisfaction factor varies by nurses), and $\theta^{bl}_{l}$ is the hospital-level patient satisfaction

factor for hospital l (allowing for variance in the patient satisfaction factor at the hospital level as

well). Finally, $\gamma_{1}$ is the hospital-level effect of the training on the nurses’ trauma-informed care use

trait, $\beta_{1}$ is the relationship between the hospital-level nurses’ trauma-informed care use trait and the

hospital-level patient satisfaction, and $\beta_{2}$ is the remaining effect of the training directly on

hospital-level patient satisfaction not explained by the nurses’ trauma-informed care use trait.

Next, suppose the logit function is applied to the outcome variable, such as in situations where

the outcome measure is a student achievement test (where the items are binary). In that case, the

model becomes the one shown below. The only difference from the model above is how the outcome

items are modeled.

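Level 1:

\[
Y_{ijkl} = \frac{1}{1 + e^{-\left(u_{i0kl} + a_{i1kl}\left(\theta^{w}_{jkl} - b_{i2kl}\right)\right)}}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta^{w}_{tkl} + \varepsilon^{M}_{itkl} \\
u_{i0kl} &= u_{i00l} + a_{i01l}\left(\theta^{bk}_{kl} - b_{i02l}\right) \\
a_{i1kl} &= a_{i10l} \\
b_{i2kl} &= b_{i20l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \alpha_{it00} + \lambda_{it01}\,\xi^{b}_{l} + \delta_{it02}\,\zeta^{b}_{tl} + \phi^{M}_{it0l} \\
\lambda_{it1l} &= \lambda_{it10} \\
\delta_{it2l} &= \delta_{it20} \\
u_{i00l} &= u_{i000} + a_{i001}\left(\theta^{bl}_{l} - b_{i002}\right) \\
a_{i01l} &= a_{i010} \\
b_{i02l} &= b_{i020} \\
a_{i10l} &= a_{i100} \\
b_{i20l} &= b_{i200} \\
\xi^{b}_{l} &= \gamma_{0} + \gamma_{1}\,T_{l} + \varepsilon \\
\theta^{bl}_{l} &= \beta_{0} + \beta_{1}\,\xi^{b}_{l} + \beta_{2}\,T_{l} + \phi
\end{aligned}
\]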

The parameters for the mediator at level 2 and then level 3, as well as the mediation and

treatment effects, remain the same as those above; the parameters that differ are in

the outcome model. Here, $a_{i100}$ is the level 1 item discrimination parameter for outcome item i, $b_{i200}$

is the level 1 item difficulty parameter for outcome item i, $a_{i010}$ is the level 2 item discrimination

parameter for outcome item i, $b_{i020}$ is the level 2 item difficulty parameter for outcome item i,

$a_{i001}$ is the level 3 item discrimination parameter for outcome item i, and $b_{i002}$ is the level 3 item

difficulty parameter for outcome item i. In many cases, it is not expected that the item difficulty

will vary by level 2 or level 3, in which case these would be constrained to zero. Item difficulty is

related to how "easy" the item is or how likely someone will get it correct, with larger difficulty

parameters indicating harder items. Item discrimination is the ability of the item to differentiate

individuals based on their true ability.
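The roles of the discrimination and difficulty parameters can be seen in a small numerical sketch. The function below assumes a two-parameter logistic form in which the ability enters as a(theta - b) plus an optional intercept u; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def item_probability(theta, a, b, u=0.0):
    """Probability of a correct response: 1 / (1 + exp(-(u + a * (theta - b))))."""
    return 1.0 / (1.0 + math.exp(-(u + a * (theta - b))))

# Same ability and discrimination, different difficulty (hypothetical values).
p_low_b = item_probability(theta=0.0, a=1.2, b=-1.0)   # difficulty below ability
p_high_b = item_probability(theta=0.0, a=1.2, b=1.0)   # difficulty above ability

# When ability equals difficulty (and u = 0), the probability is one half.
p_match = item_probability(theta=0.5, a=1.2, b=0.5)

# A larger discrimination makes the probability change faster around b.
p_steep = item_probability(theta=1.0, a=2.0, b=0.5)
p_flat = item_probability(theta=1.0, a=0.5, b=0.5)
```

Under this parameterization, raising b while holding ability fixed lowers the probability of a correct response, and raising a sharpens the separation between respondents just above and just below b.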

In the education curriculum example where level 1 is the student level, level 2 is the teacher

level, and level 3 is the school level and where the curriculum is randomly assigned at the school

level, teachers implement it in their classrooms (at the second level), and the effect is measured

through a multiple choice test (binary items) administered at the student level, then $Y_{ijkl}$ represents

item i on the multiple choice test for student j in teacher k’s classroom in school l, where each item

has its own student-level, teacher-level, and school-level discrimination ($a_{i100}$, $a_{i010}$, and $a_{i001}$) and

difficulty parameters ($b_{i200}$, $b_{i020}$, and $b_{i002}$), and $M_{itkl}$ represents item i on the implementation

measure at time t for teacher k in school l. Then, $\xi^{w}_{kl}$ is the implementation trait for teacher k in school l

and $\xi^{b}_{l}$ is the school-level implementation trait for school l (allowing for this trait to vary by different

schools); $\zeta^{w}_{tkl}$ is the teacher’s time-specific implementation variance at time point t for teacher k in school l and

$\zeta^{b}_{tl}$ is the school-level time-specific implementation variance at time point t for school l (allowing school-wide

time point factors); $\theta^{w}_{jkl}$ is the student academic ability for student j in teacher k’s class in school

l, $\theta^{bk}_{kl}$ is the between-teacher student academic ability for teacher k in school l (so that student

academic ability varies by teachers), and $\theta^{bl}_{l}$ is the school-level student academic ability for school

l (allowing for variance in the student academic ability at the school level as well). Finally, $\gamma_{1}$ is the

school-level effect of the curriculum on the teacher’s implementation trait, $\beta_{1}$ is the relationship

between the school-level teacher implementation trait and the school-level student academic ability,

and $\beta_{2}$ is the remaining effect of the curriculum directly on school-level student academic ability

not explained by the teacher’s implementation.

The above model is represented in Figure 2.3.

2.4 Estimation of the LST Mediation

In estimating the mediating effects of a longitudinal variable that follows a latent state-trait

model, four estimation methods are explored in this dissertation, along with the simulation study

to examine the bias and power of each method. The four estimation methods are: (1) averages

of the mediator across time points, used as the mediator in a 3-2-1 mediation model; (2) factor

scores estimated at each time point and averaged across the time points, used as the mediator in a

3-2-1 mediation model; (3) factor scores from the latent state-trait model, used as the mediator in

the 3-2-1 mediation model; and (4) estimation of the fully specified multilevel LST mediation

model. The four estimation methods have different assumptions that are required for them to be

unbiased and recommended. In the following sections, each estimation method will be defined and

incorporated into an estimation with a single indicator outcome (not a latent variable) and a

multiple indicator outcome (latent variable). Then, the assumptions for each of the estimation

methods will be described.

Figure 2.3 LST in Multilevel Mediation Model

First Estimation: averages of the summed mediator

For the ﬁrst estimation method, the averages of the mediator across the time points will be

defined as:

\[
\bar{M}_{kl} = \frac{1}{T \times I}\sum_{t=1}^{T}\sum_{i=1}^{I} M_{itkl}
\]

T is the total number of time points, and I is the total number of items in the mediator measure.

$\bar{M}_{kl}$ is the average of the items averaged across time points. In the situation where the mediator is

teacher practices, the teacher practice item scores are first averaged at each time that the

teacher is observed. Then, these averages are averaged across the observation time points.
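The averaging step can be sketched in a few lines. The example below assumes the mediator scores are stored in an array indexed by teacher, time point, and item; the function name and the toy scores are hypothetical.

```python
import numpy as np

def average_mediator(m):
    """Average a (teachers, time points, items) array over items and time points."""
    n_teachers, n_time, n_items = m.shape
    # Sum over time points and items, then divide by T * I.
    return m.sum(axis=(1, 2)) / (n_time * n_items)

# Two teachers, two observation time points, three items (made-up scores).
scores = np.array([
    [[1, 2, 3], [2, 3, 4]],   # teacher 0
    [[4, 4, 4], [2, 2, 2]],   # teacher 1
], dtype=float)

m_bar = average_mediator(scores)

# Equivalent two-step version: item average per time point, then average over time.
m_bar_two_step = scores.mean(axis=2).mean(axis=1)
```

Because every time point here has the same number of items, averaging the per-occasion item means gives the same value as dividing the grand sum by T times I.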

Using $\bar{M}_{kl}$ in the 3-2-1 mediation analysis, the following is estimated:

Single Indicator Outcome

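Level 1:

\[
Y_{jkl} = \pi_{0kl} + e_{jkl}
\]

Level 2:

\[
\begin{aligned}
\bar{M}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\pi_{0kl} &= \beta_{00l} + \beta_{01l}\,\bar{M}_{kl} + r_{0kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\beta_{00l} &= \hat{\gamma}_{000} + \hat{\gamma}_{001}\,T_{l} + u_{00l} \\
\beta_{01l} &= \hat{\gamma}_{010}
\end{aligned}
\]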

Then, $\hat{\gamma}^{M}_{01}$ is the estimated a pathway, $\hat{\gamma}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{001}$ is the estimated

c’ pathway for the mediation analysis. With the averaged mediator included, these can

be estimated with either a multilevel model or a multilevel structural equation model with either

robust maximum likelihood or Bayesian estimation.

Multiple Indicator Outcome

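Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varepsilon_{ijkl}
\]

Level 2:

\[
\begin{aligned}
\bar{M}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{bk}_{kl} + \phi_{i0kl} \\
\theta^{bk}_{kl} &= \beta_{0l} + \beta_{1l}\,\bar{M}_{kl} + r_{kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\alpha_{i00l} &= \hat{\alpha}_{i000} + \hat{\lambda}_{i001}\,\theta^{bl}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \hat{\lambda}_{i010} \\
\lambda_{i10l} &= \hat{\lambda}_{i100} \\
\beta_{0l} &= \hat{\beta}_{00} \\
\beta_{1l} &= \hat{\beta}_{10} \\
\theta^{bl}_{l} &= \hat{\beta}_{0} + \hat{\beta}_{1}\,T_{l} + u
\end{aligned}
\]

Then, $\hat{\gamma}^{M}_{01}$ is the estimated a pathway, $\hat{\beta}_{10}$ is the estimated b pathway, and $\hat{\beta}_{1}$ is the estimated

c’ pathway for the mediation analysis. With the averaged mediator included and using

multiple indicator outcomes, these can be estimated with a multilevel structural equation model

with either robust maximum likelihood or Bayesian estimation.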

In the situation where the mediator is teacher practices and the outcome is student achievement,

$\hat{\gamma}^{M}_{01}$ is the effect of the treatment on teacher practices, $\hat{\gamma}_{010}$ is the relationship between teacher

practices and student achievement, and $\hat{\gamma}_{001}$ is the remaining effect of the treatment on student

achievement not due to teacher practices.

This estimation method of using the summed averages of the mediator assumes no time-varying

variance. It also assumes that all items are equally loading onto the construct and that there is no

measurement error. In the teacher practices example, the mediation measure perfectly measures

the teacher practices at each time point with no other outside influences affecting the estimation

of the teacher practices, and each item is weighted equally with regard to estimating the teacher

practices construct. If these assumptions are met, this estimation method will be unbiased, tend to

converge more quickly than the other methods, be easy to interpret and implement, and be the most

parsimonious.

Second Estimation: averages of the factor scores

For the second estimation method, the factor for each time point will be deﬁned and estimated

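by:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it00} + \hat{\delta}_{it10}\,\hat{\zeta}_{tkl} + \varepsilon^{M}_{itkl} \\
\bar{\zeta}_{kl} &= \frac{1}{T}\sum_{t=1}^{T}\hat{\zeta}_{tkl}
\end{aligned}
\]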

Where, once again, T is the total number of time points, and $\bar{\zeta}_{kl}$ is the average of the factor scores across

the time points. In the teacher practices example, at each time point, the observation scores for

each item would be included in a factor analysis to estimate the teacher practice factor. Then, these

estimated factor scores for teacher practice would be averaged across each time point for the teacher.
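The two steps (score each time point separately, then average the scores across time points) can be sketched as follows. Note the substitution: to keep the example self-contained, scores on the first principal component of the standardized items are used as a simple stand-in for factor scores, which is an assumption of this illustration and not the factor analysis described above. The function names, sample sizes, and loadings are hypothetical.

```python
import numpy as np

def first_component_scores(items):
    """Score units on the first principal component of one time point's items.

    items has shape (n_units, n_items). This is a stand-in for factor scores,
    not a maximum likelihood factor analysis.
    """
    z = (items - items.mean(axis=0)) / items.std(axis=0)
    # First right-singular vector of the standardized items gives the weights.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[0]
    if weights.sum() < 0:          # resolve the arbitrary sign of the component
        weights = -weights
    scores = z @ weights
    return scores / scores.std()   # unit-variance scores at each time point

def average_scores(mediator):
    """mediator has shape (n_units, n_timepoints, n_items); returns (n_units,)."""
    n_time = mediator.shape[1]
    per_time = [first_component_scores(mediator[:, t, :]) for t in range(n_time)]
    return np.mean(per_time, axis=0)

# Simulated data: one factor per time point, four items each (made-up values).
rng = np.random.default_rng(1)
n_teachers, n_time, n_items = 500, 3, 4
state = rng.normal(size=(n_teachers, n_time))
noise = rng.normal(scale=0.6, size=(n_teachers, n_time, n_items))
mediator = 0.8 * state[:, :, None] + noise

zeta_bar = average_scores(mediator)
agreement = np.corrcoef(zeta_bar, state.mean(axis=1))[0, 1]
```

In this simulated setting the averaged scores track the average of the generating time-specific factors closely, but the scores still carry measurement error, which is the assumption this estimation method ignores.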

Using $\bar{\zeta}_{kl}$ in the 3-2-1 mediation analysis, the following is estimated:

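Single Indicator Outcome

Level 1:

\[
Y_{jkl} = \pi_{0kl} + e_{jkl}
\]

Level 2:

\[
\begin{aligned}
\bar{\zeta}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\pi_{0kl} &= \beta_{00l} + \beta_{01l}\,\bar{\zeta}_{kl} + r_{0kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\beta_{00l} &= \hat{\gamma}_{000} + \hat{\gamma}_{001}\,T_{l} + u_{00l} \\
\beta_{01l} &= \hat{\gamma}_{010}
\end{aligned}
\]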

Then, $\hat{\gamma}^{M}_{01}$ is the estimated a pathway, $\hat{\gamma}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{001}$ is the estimated c’

pathway for the mediation analysis. With the inclusion of the factor scores, these can be estimated

with either a multilevel model or a multilevel structural equation model with either robust maximum

likelihood or Bayesian estimation.

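Multiple Indicator Outcome

Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varepsilon_{ijkl}
\]

Level 2:

\[
\begin{aligned}
\bar{\zeta}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{bk}_{kl} + \phi_{i0kl} \\
\theta^{bk}_{kl} &= \beta_{0l} + \beta_{1l}\,\bar{\zeta}_{kl} + r_{kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\alpha_{i00l} &= \hat{\alpha}_{i000} + \hat{\lambda}_{i001}\,\theta^{bl}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \hat{\lambda}_{i010} \\
\lambda_{i10l} &= \hat{\lambda}_{i100} \\
\beta_{0l} &= \hat{\beta}_{00} \\
\beta_{1l} &= \hat{\beta}_{10} \\
\theta^{bl}_{l} &= \hat{\beta}_{0} + \hat{\beta}_{1}\,T_{l} + u
\end{aligned}
\]

Then, $\hat{\gamma}^{M}_{01}$ is the estimated a pathway, $\hat{\beta}_{10}$ is the estimated b pathway, and $\hat{\beta}_{1}$ is the estimated c’

pathway for the mediation analysis. With the inclusion of these factor scores and using a multiple

indicator outcome, these can be estimated with a multilevel structural equation model with either

robust maximum likelihood or Bayesian estimation.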

Similarly, in the situation where the mediator is teacher practices and the outcome is student

achievement, $\hat{\gamma}^{M}_{01}$ is the effect of the treatment on teacher practices, $\hat{\gamma}_{010}$ is the relationship

between teacher practices and student achievement, and $\hat{\gamma}_{001}$ is the remaining effect of the treatment

on student achievement not due to teacher practices.

This estimation method assumes no general trait outside of the average of each time point but

does allow for time-varying variance. Similarly to the estimation method above, this estimation

also assumes no measurement error in the estimation of the time-speciﬁc factor scores. Again,

in the example with teacher practices, this assumes that at each time point, the measure perfectly

estimates the teacher practice with the given item loadings estimated in the factor analysis at that

speciﬁc time point. Under these assumptions, this estimation method is unbiased, will have fewer

convergence issues, and will be the more parsimonious method in estimating the factor scores than

the following estimation method.

Third Estimation: factor scores from the LST model

For the third estimation method, the factor from the LST model is estimated by:

\[
M_{itkl} = \hat{\alpha}_{it00} + \hat{\lambda}_{it10}\,\hat{\xi}_{kl} + \hat{\delta}_{it20}\,\zeta_{tkl} + \varepsilon_{itkl}
\]

Using $\hat{\xi}_{kl}$ in the 3-2-1 mediation analysis, the following is then estimated:

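Single Indicator Outcome

Level 1:

\[
Y_{jkl} = \pi_{0kl} + e_{jkl}
\]

Level 2:

\[
\begin{aligned}
\hat{\xi}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\pi_{0kl} &= \beta_{00l} + \beta_{01l}\,\hat{\xi}_{kl} + r_{0kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01}\,T_{l} + u^{M}_{0l} \\
\beta_{00l} &= \hat{\gamma}_{000} + \hat{\gamma}_{001}\,T_{l} + u_{00l} \\
\beta_{01l} &= \hat{\gamma}_{010}
\end{aligned}
\]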

Then, ˆW "

01 is the estimated a pathway, ˆW010 is the estimated b pathway, and ˆW001 is the estimated

c’ pathway for the mediation analysis. With the inclusion of the estimated trait factor scores, these

can be estimated with either a multilevel model or a multilevel structural equation model with either

robust maximum likelihood or Bayesian estimation.

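Multiple indicator outcome

Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varpi_{ijkl}
\]

Level 2:

\[
\begin{aligned}
\hat{\xi}_{kl} &= \beta^{M}_{0l} + r^{M}_{kl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{b_k}_{kl} + \varphi_{i0kl} \\
\theta^{b_k}_{kl} &= \beta_{0l} + \beta_{1l}\,\hat{\xi}_{kl} + r_{kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta^{M}_{0l} &= \hat{\gamma}^{M}_{00} + \hat{\gamma}^{M}_{01} T_{l} + u^{M}_{0l} \\
\alpha_{i00l} &= \hat{\alpha}_{i000} + \hat{\lambda}_{i001}\,\theta^{b_l}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \hat{\lambda}_{i010} \\
\lambda_{i10l} &= \hat{\lambda}_{i100} \\
\beta_{0l} &= \hat{\beta}_{00} \\
\beta_{1l} &= \hat{\beta}_{10} \\
\theta^{b_l}_{l} &= \hat{\beta}_{0} + \hat{\beta}_{1} T_{l} + u
\end{aligned}
\]

Then, $\hat{\gamma}^{M}_{01}$ is the estimated a pathway, $\hat{\beta}_{10}$ is the estimated b pathway, and $\hat{\beta}_{1}$ is the estimated c' pathway for the mediation analysis. With the inclusion of these estimated trait factor scores and using a multiple indicator outcome, these can be estimated with a multilevel structural equation model with either robust maximum likelihood or Bayesian estimation.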

Again, in the situation where the mediator is teacher practices and the outcome is student achievement, $\hat{\gamma}^{M}_{01}$ is the effect of the treatment on teacher practices, $\hat{\gamma}_{010}$ is the relationship between teacher practices and student achievement, and $\hat{\gamma}_{001}$ is the remaining effect of the treatment on student achievement that is not due to teacher practices.

This estimation method allows for a general trait and time-specific state constructs; however, it assumes no measurement error for the estimated general trait. Essentially, the LST model estimation is assumed to predict the mediator trait perfectly. The benefit of this estimation method is that its assumptions are not as restrictive as those of the two methods above, and it will converge at much higher rates than the fourth estimation method below.

Fourth estimation: fully specified model

The final estimation is the fully specified multilevel model proposed on pages 7-9:

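Single indicator outcome

Level 1:

\[
Y_{jkl} = \pi_{0kl} + e_{jkl}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta^{w}_{tkl} + \varpi^{M}_{itkl} \\
\pi_{0kl} &= \beta_{00l} + \beta_{01l}\,\xi^{w}_{kl} + r_{0kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \hat{\alpha}_{it00} + \hat{\lambda}_{it01}\,\xi^{b}_{l} + \hat{\delta}_{it02}\,\zeta^{b}_{tl} + \varphi^{M}_{it0l} \\
\xi^{b}_{l} &= \hat{\gamma}_{0} + \hat{\gamma}_{1} T_{l} + \varpi \\
\lambda_{it1l} &= \hat{\lambda}_{it10} \\
\delta_{it2l} &= \hat{\delta}_{it20} \\
\beta_{00l} &= \hat{\gamma}_{000} + \hat{\gamma}_{001} T_{l} + u_{00l} \\
\beta_{01l} &= \hat{\gamma}_{010}
\end{aligned}
\]

Then, $\hat{\gamma}_{1}$ is the estimated a pathway, $\hat{\gamma}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{001}$ is the estimated c' pathway for the mediation analysis. This can be estimated with a multilevel structural equation model with either robust maximum likelihood or Bayesian estimation.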

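Multiple indicator outcome

Level 1:

\[
Y_{ijkl} = \alpha_{i0kl} + \lambda_{i1kl}\,\theta^{w}_{jkl} + \varpi_{ijkl}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta^{w}_{tkl} + \varpi^{M}_{itkl} \\
\alpha_{i0kl} &= \alpha_{i00l} + \lambda_{i01l}\,\theta^{b_k}_{kl} + \varphi_{i0kl} \\
\lambda_{i1kl} &= \lambda_{i10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \hat{\alpha}_{it00} + \hat{\lambda}_{it01}\,\xi^{b}_{l} + \hat{\delta}_{it02}\,\zeta^{b}_{tl} + \varphi^{M}_{it0l} \\
\lambda_{it1l} &= \hat{\lambda}_{it10} \\
\delta_{it2l} &= \hat{\delta}_{it20} \\
\alpha_{i00l} &= \hat{\alpha}_{i000} + \hat{\lambda}_{i001}\,\theta^{b_l}_{l} + \nu_{i00l} \\
\lambda_{i01l} &= \hat{\lambda}_{i010} \\
\lambda_{i10l} &= \hat{\lambda}_{i100} \\
\xi^{b}_{l} &= \hat{\gamma}_{0} + \hat{\gamma}_{1} T_{l} + \varpi \\
\theta^{b_l}_{l} &= \hat{\beta}_{0} + \hat{\beta}_{1}\,\xi^{b}_{l} + \hat{\beta}_{2} T_{l} + \varphi
\end{aligned}
\]

Then, $\hat{\gamma}_{1}$ is the estimated a pathway, $\hat{\beta}_{1}$ is the estimated b pathway, and $\hat{\beta}_{2}$ is the estimated c' pathway for the mediation analysis. This can be estimated with a multilevel structural equation model with either robust maximum likelihood or Bayesian estimation.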


Comparably, in the situation where the mediator is teacher practices and the outcome is student achievement, $\hat{\gamma}_{1}$ is the effect of the treatment on teacher practices, $\hat{\gamma}_{010}$ is the relationship between teacher practices and student achievement, and $\hat{\gamma}_{001}$ is the remaining effect of the treatment on student achievement that is not due to teacher practices.

This model allows measurement error to be modeled directly in the mediation effects; however, it does assume that the latent state-trait model is correctly specified for the mediator. The following simulation study investigates how these four methods perform across several specifications, including effect sizes, sample sizes, and loading sizes.


CHAPTER 3

SIMULATION STUDY

This chapter investigates the bias and power of the above four estimation methods for the mediation variable under the assumption of measurement error and various conditions. Using the framework of Paxton et al. (2001) for Monte Carlo simulation studies, this chapter evaluates the four estimation methods for the mediation variable: standardized averages of the mediator across time points (3-2-1 mediation); factor scores for each time point averaged across time points (ignoring the correlation between time points; 3-2-1 mediation); factor scores from the LST model as the mediator (3-2-1 mediation); and the fully specified model (multilevel SEM). The method of data generation, the method for evaluating the simulation study, and the results of the simulation study are described below.

3.1 Method

The simulated data will be drawn from a multilevel structural equation model of a standard cluster randomized control trial in education research, where the treatment is assigned at the school level and implemented at the classroom level, and the outcome is measured at the student level. This structural equation model will include a treatment indicator, items measured at multiple time points for the mediator, and items that load onto the outcome variable.

The conditions for this simulation study varied, for all four estimation methods, across school-level sample sizes, mediation effect sizes, and loadings for the LST mediation model. These varying conditions are reported in Table 3.1.

Table 3.1 Simulation Varying Conditions

Parameter                                         Conditions
school sample size                                30, 60, 200
a path ($\gamma_{1}$)                             0.15, 0.25, 0.45
b path ($\beta_{1}$)                              0.05, 0.15, 0.25
trait-specific loadings ($\lambda_{it10}$)        0.3, 0.6, 0.9
time-specific loadings ($\delta_{it20}$)          0.3, 0.6, 0.9

The school-level sample sizes were 30, 60, and 200, which are plausible school sample sizes in education efficacy research. The a path (treatment on teacher practices) of the mediation effects will vary across small (0.15), medium (0.25), and large (0.45) effects (Smith and Sheridan, 2019). The b path (teacher practices on student achievement) will vary across small (0.05), medium (0.15), and large (0.25) effects (Kraft, 2020). The general (trait) and time-specific latent variable loadings will vary across small, medium, and large values (0.3, 0.6, 0.9). Overall, this is 324 different conditions per model estimation. With 100 replications for each condition, 32,400 different data sets were produced.

The fixed conditions of the simulation include the number of teachers per school, number of students per teacher, teacher random effects on the outcome, school random effects on the outcome, school random effects on the mediator, outcome loadings, total treatment effect, treatment/control school split, random seed, number of simulation reps, outcome variance, teacher-level mediator variance of the trait and time-specific factors, and correlation between trait and time-specific factors. These are reported in Table 3.2.

Table 3.2 Simulation Constant Conditions

Parameter                                                                                                                          Condition
number of teachers per school                                                                                                      2
number of students per teacher                                                                                                     30
student level variance of outcome factor ($\mathrm{Var}(\theta^{w})$)                                                              1
teacher random effects on outcome ($\mathrm{Var}(\theta^{b_t})$)                                                                   0.15
school random effects on outcome ($\mathrm{Var}(\theta^{b_s})$)                                                                    0.2
teacher level variance of mediator trait factor ($\mathrm{Var}(\xi^{w})$)                                                          1
teacher level variance of mediator time-specific factors ($\mathrm{Var}(\zeta^{w}_{t})$)                                           1
correlation between trait and time-specific factors ($\mathrm{Corr}(\xi^{w}, \zeta^{w}_{t})$, $\mathrm{Corr}(\zeta^{w}_{t}, \zeta^{w}_{t'})$)   0
school random effects on mediator ($\mathrm{Var}(\xi^{b})$)                                                                        0.8
outcome loadings ($\lambda_{i100}$)                                                                                                1, 0.5, 0.8, 0.3
total treatment effect ($\gamma_{1} \times \beta_{1} + \beta_{2}$)                                                                 0.2
treatment/control school split                                                                                                     0.5
random seed                                                                                                                        2024
number of simulation reps                                                                                                          100

The school random effect will be fixed at 0.2, and the outcome item loadings will be fixed at values varying between 0.1 and 0.8. The total treatment effect on the outcome will be fixed at 0.2, while the percentage of the treatment effect explained by the mediator will vary depending on the mediation effect sizes.
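Because the total treatment effect is held at 0.2 while the a and b paths vary, the share of the effect carried by the mediator differs by condition. The sketch below is only an illustration of that arithmetic, assuming the decomposition total = a × b + c' listed in Table 3.2; the function name `decompose` and the dictionary layout are hypothetical and are not taken from the dissertation's R code.

```python
# Illustrative arithmetic only: splits the fixed total effect into an
# indirect part (a * b) and a direct part (c'), assuming
# total = a * b + c' as listed in Table 3.2.

TOTAL_EFFECT = 0.2               # fixed total treatment effect
A_PATHS = (0.15, 0.25, 0.45)     # treatment -> teacher practices
B_PATHS = (0.05, 0.15, 0.25)     # teacher practices -> achievement


def decompose(a, b, total=TOTAL_EFFECT):
    """Return the indirect effect, the implied direct effect, and the share mediated."""
    indirect = a * b
    direct = total - indirect
    return {
        "indirect": indirect,
        "direct": direct,
        "proportion_mediated": indirect / total,
    }


# One entry per (a, b) combination in the varying conditions.
grid = {(a, b): decompose(a, b) for a in A_PATHS for b in B_PATHS}

for (a, b), parts in grid.items():
    print(
        f"a={a:.2f} b={b:.2f} "
        f"indirect={parts['indirect']:.4f} "
        f"direct={parts['direct']:.4f} "
        f"share={parts['proportion_mediated']:.1%}"
    )
```

Under this assumption, the smallest pair of paths implies that only a few percent of the total effect is mediated, while the largest pair implies that more than half is.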


Data Generation

The data will be drawn from a population model that assumes the 3-2-1 LST mediation model (see pages 23-24) is the true population model, with three time points with three items each for the observed mediator and four dichotomous items for the outcome, under the varying and fixed conditions above. The data generation and simulation were completed with the R package MplusAutomation (Hallquist and Wiley, 2018) in combination with Mplus (Muthen and Muthen, 2017), which provided a method for automating data generation and estimation. The R code can be found in Appendix A.
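To make the structure of the mediator data concrete, the following is a deliberately simplified sketch of generating LST-style mediator items in Python with NumPy. It is not the Mplus/MplusAutomation code from Appendix A: it covers only the teacher-level mediator items (no schools, treatment, or outcome), and the function name `simulate_lst_items` and the residual standard deviation `resid_sd` are assumptions introduced for illustration.

```python
import numpy as np


def simulate_lst_items(n_teachers, trait_loading, state_loading,
                       n_times=3, n_items=3, resid_sd=1.0, seed=2024):
    """Simulate mediator items as trait + time-specific state + residual.

    Simplifying assumptions (not from the source): a single level, equal
    loadings across items, and normally distributed residuals with
    standard deviation resid_sd.
    """
    rng = np.random.default_rng(seed)
    # Trait factor: one value per teacher, variance 1.
    xi = rng.normal(size=n_teachers)
    # Time-specific factors: one per teacher and time point, variance 1,
    # drawn independently of the trait (correlation 0).
    zeta = rng.normal(size=(n_teachers, n_times))
    # Item-level residuals.
    eps = rng.normal(scale=resid_sd, size=(n_teachers, n_times, n_items))
    items = (trait_loading * xi[:, None, None]
             + state_loading * zeta[:, :, None]
             + eps)
    return items, xi, zeta


# Example: high trait loading (0.9) and low time-specific loading (0.3).
items, xi, zeta = simulate_lst_items(n_teachers=500,
                                     trait_loading=0.9,
                                     state_loading=0.3)
print(items.shape)  # (teachers, time points, items per time point)
```

The returned array is indexed by teacher, time point, and item, which matches the three time points with three items each described above.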

Estimation Methods

Once the data have been generated, the four estimation methods presented in Chapter Two will be estimated on each simulated dataset. As the focus of interest, the a and b paths will be estimated separately for each method (standardized averages of the mediator, pages 27-28; averaged factor scores, pages 29-30; factor scores from LST, pages 31-32; fully specified model, pages 33-34).

For the first estimation method (standardized averages of the mediator), the mediator items are averaged at each time point, these averages are then averaged across the time points, and the result is standardized. For the second estimation method (averaged factor scores, pages 29-30), the factor scores are estimated using maximum likelihood with the variances of the factors at each time point constrained to 1 for identification. For the third estimation method (factor scores from LST, pages 31-32), the factor scores are estimated using maximum likelihood with the variances of the trait and time-specific factors constrained to 1 and the correlation between the trait and time-specific factors constrained to zero.
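The first estimation method is simple enough to sketch directly. The snippet below is a minimal illustration, assuming the mediator items are stored in an array indexed by teacher, time point, and item; the function name `standardized_average` and the use of the sample standard deviation (`ddof=1`) are assumptions rather than details taken from the source.

```python
import numpy as np


def standardized_average(items):
    """Average items within each time point, then across time points, then z-score.

    items: array of shape (n_teachers, n_times, n_items).
    """
    per_time = items.mean(axis=2)      # mean of the items at each time point
    overall = per_time.mean(axis=1)    # mean across time points
    # Standardize across teachers (sample standard deviation is an assumption).
    return (overall - overall.mean()) / overall.std(ddof=1)


# Example with arbitrary demo data: 50 teachers, 3 time points, 3 items.
rng = np.random.default_rng(0)
demo_items = rng.normal(size=(50, 3, 3))
mediator_score = standardized_average(demo_items)
print(mediator_score[:5])
```

The resulting score has mean 0 and standard deviation 1 across teachers and would enter the 3-2-1 mediation model as an observed mediator.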

The pathways a and b (the effect of the treatment on the mediator and the relationship between the mediator and the outcome) are estimated with multilevel Bayesian methods using non-informative priors and the Gibbs sampler (Depaoli and Clifton, 2015), with either the estimated mediators in the model or the fully specified measurement model. In the models that used estimated values of the mediator, the variance of the student-level outcome is fixed to 1 to ensure identification of the model. In the fully specified models, the variances of the student-level outcome and teacher-level mediators were fixed to 1, and the correlations between the trait and time-specific mediator factors were fixed to 0 to ensure model identification. For all models, convergence was determined using the Mplus default settings (Asparouhov and Muthen, 2010), where two MCMC chains are used, with the first half of each chain discarded and the second half used for estimating the posterior distribution and determining convergence. Convergence in this instance is determined by the Potential Scale Reduction (PSR) criterion, with the default PSR threshold for convergence set at 1.05 (Asparouhov and Muthen, 2010). The PSR compares the within-chain and between-chain variance of the parameters, where a smaller PSR (closer to 1) indicates smaller between-chain variance and therefore convergence.
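As a rough illustration of the idea behind the PSR, the sketch below computes the textbook Gelman-Rubin statistic for one parameter from several chains, discarding the first half of each chain. This is an assumption-laden stand-in: the exact formula Mplus uses may differ in its details, and the function name `potential_scale_reduction` is hypothetical.

```python
import numpy as np


def potential_scale_reduction(chains):
    """Textbook Gelman-Rubin statistic for a single parameter.

    chains: array of shape (n_chains, n_draws). The first half of each
    chain is discarded as burn-in before the statistic is computed.
    """
    chains = np.asarray(chains, dtype=float)
    kept = chains[:, chains.shape[1] // 2:]
    n = kept.shape[1]
    chain_means = kept.mean(axis=1)
    # Average within-chain variance.
    within = kept.var(axis=1, ddof=1).mean()
    # Between-chain variance, scaled by the number of retained draws.
    between = n * chain_means.var(ddof=1)
    pooled = (n - 1) / n * within + between / n
    return float(np.sqrt(pooled / within))


# Two chains sampling the same distribution versus two chains that disagree.
rng = np.random.default_rng(1)
agreeing = rng.normal(size=(2, 2000))
disagreeing = agreeing + np.array([[0.0], [5.0]])  # second chain shifted

print(potential_scale_reduction(agreeing))     # close to 1
print(potential_scale_reduction(disagreeing))  # well above 1.05
```

When the chains explore the same region, the between-chain variance is small relative to the within-chain variance and the statistic stays near 1; chains that settle in different regions push it well above the 1.05 threshold.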

After each method estimates each condition over 100 repetitions, it will be evaluated on the

relative bias, convergence rates, and power of the a and b paths.

Evaluation Criteria

The evaluation criteria for this simulation study are relative bias, convergence rates, and power. Note that for this simulation study, neither the bias of the standard error nor the mean square error is explored. These are available upon request. Each model is evaluated on both the a and b paths.

The relative bias is defined as:

\[
\text{relative bias} = \frac{\hat{\theta} - \theta}{\theta}
\]

where $\hat{\theta}$ is the estimated parameter of either path a or path b, and $\theta$ is the true parameter of either path a or path b.

The convergence rate is the percentage of replications in which the model converges; for example, if 90 out of 100 replications converge, the convergence rate is 90%. Finally, power is the percentage of replications that estimated significant effects for the a and b paths; similarly, if 90 out of 100 replications had significant a pathways, the power for pathway a would be 0.90.
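The three evaluation criteria can be sketched in a few lines. The functions below are illustrative only: the names are hypothetical, and averaging the estimates across replications before computing relative bias is an assumption about how the criterion is aggregated.

```python
import numpy as np


def relative_bias(estimates, true_value):
    """(mean estimate - true value) / true value across replications."""
    return (float(np.mean(estimates)) - true_value) / true_value


def convergence_rate(converged):
    """Share of replications flagged as converged."""
    return float(np.mean(converged))


def power(significant):
    """Share of replications in which the path was significant."""
    return float(np.mean(significant))


# Toy example for a true a path of 0.15 (values are made up for illustration).
a_estimates = [0.14, 0.16, 0.15, 0.13]
print(relative_bias(a_estimates, 0.15))

# 90 of 100 replications converged; 90 of 100 had a significant a path.
print(convergence_rate([True] * 90 + [False] * 10))
print(power([True] * 90 + [False] * 10))
```

With these definitions, an absolute relative bias below 0.05 would correspond to the entries described as bolded in the bias tables.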


Table 3.3 Bias for 30 schools

S: 0.15

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

Pathway a

M:0.25

Pathway b

M:0.15

L:0.45

L:0.25

S:0.05

M:0.15

L:0.25

a

b

a

b

a

b

a

b

a

b

a

b

a

b

a

b

Loading

b
Loading
Standardized Averages of the Mediator
0.3

a

-0.739
-0.539
-0.555
-0.208
-0.241
-0.247
-0.208
-0.177
-0.150

-0.116
-0.125
-0.194
0.207
0.136
0.075
0.170
0.146
0.118

-0.609
-0.932
-0.593 -0.080
-0.580 -0.092
0.382
-0.288
0.496
-0.259
0.918
-0.248
0.688
0.042
0.676
0.065
0.980
0.081

-0.618
-0.606
-0.598
-0.311
-0.302
-0.291
-0.008
0.007
0.022

-0.427
-0.379
-0.340
-0.063
0.023
0.057
0.325
0.395
0.443

-0.423
-0.383
-0.360
-0.095
-0.059
-0.029
0.265
0.274
0.313

-0.511
-0.463
-0.391
-0.187
-0.149
-0.077
0.158
0.202
0.241

-0.538
-0.516
-0.482
-0.194
-0.162
-0.144
0.154
0.188
0.199

-0.530
-0.515
-0.502
-0.210
-0.194
-0.177
0.098
0.122
0.147

-0.631
-0.562
-0.526
-0.290
-0.261
-0.229
0.030
0.048
0.076

-0.926
-0.074
-0.090
0.392
0.496
0.918
0.686
0.676
0.980

1.782
1.886
1.820
2.260
2.424
2.426
2.456
2.318
2.250

-0.184
-0.054
0.334
0.340
0.318
0.606
0.604
0.620
0.510

3.546
2.652
1.258
3.352
4.276
-1.000
3.166
4.136
0.432

-0.4287
-0.376
-0.387
-0.089
-0.001
0.050
0.338
0.391
0.455

-0.405
-0.372
-0.343
-0.088
-0.044
-0.011
0.248
0.296
0.333

-0.528
-0.461
-0.394
-0.197
-0.150
-0.087
0.159
0.186
0.229

-0.981
-0.991
-0.591
-0.853
-1.198
-1.061
-0.859
-1.091
-2.278

-0.135
-0.450
-0.041
0.959
-0.467
-0.533
0.401
0.334
-0.231

0.127
0.157
0.063
0.406
0.443
0.374
0.433
0.449
0.404

-0.460
-0.398
-0.329
-0.161
-0.175
-0.167
0.008
-0.076
-0.152

0.513
0.123
-0.403
0.539
0.380
-0.061
0.564
0.575
0.495

-0.431
-0.391
-0.337
-0.056
-0.004
0.027
0.321
0.381
0.398

-0.411
-0.393
-0.363
-0.065
-0.047
-0.010
0.251
0.295
0.327

-0.523
-0.465
-0.477
-0.195
-0.154
-0.109
0.136
0.163
0.204

-1.009
-1.046
-0.333
-0.918
-1.694
-1.073
-1.303
-0.863
-0.653

-0.737
-0.539
-0.555
-0.208
-0.241
-0.246
-0.160
-0.175
-0.148

-0.150
-0.194
-0.235
0.141
0.064
0.019
0.104
0.104
0.051

-0.583
-0.546
-0.590
-0.335
-0.359
-0.418
-0.226
-0.284
-0.367

-0.129
-0.100
-0.304
-0.112
-0.160
0.061
-0.262
0.198
-0.054

-0.536
-0.507
-0.484
-0.197
-0.146
-0.126
0.154
0.197
0.226

-0.539
-0.515
-0.501
-0.228
-0.204
-0.183
0.106
0.113
0.134

-0.601
-0.561
-0.526
-0.272
-0.248
-0.212
0.044
0.070
0.084

-0.964
-0.688
-1.120
-1.091
-0.922
-0.802
-0.931
-1.268
-0.824

-0.926
-0.074
-0.090
0.392
0.498
0.918
0.698
0.676
0.980

-0.086
2.126
1.996
2.588
2.720
2.564
2.730
2.592
2.608

-0.314
-0.050
0.150
0.416
0.546
0.540
0.528
0.562
0.418

3.592
2.984
4.910
2.766
2.408
3.782
2.936
4.668
2.618

-0.537
-0.506
-0.512
-0.213
-0.161
-0.130
0.163
0.195
0.233

-0.528
-0.509
-0.496
-0.222
-0.194
-0.172
0.096
0.127
0.152

-0.604
-0.560
-0.526
-0.277
-0.250
-0.218
0.032
0.052
0.078

-0.961
-0.859
-0.742
-1.076
-0.995
-0.946
-0.916
-1.048
-1.155

-0.136
-0.450
-0.040
0.961
-0.461
-0.537
0.401
0.331
-0.231

0.192
0.216
0.119
0.509
0.523
0.432
0.510
0.538
0.517

-0.458
-0.395
-0.391
-0.110
-0.113
-0.189
-0.049
-0.106
-0.181

0.569
0.940
-0.232
0.320
0.271
-0.277
0.273
0.223
0.543

-0.652
-0.588
-0.625
-0.554
-0.603
-0.586
-0.346
-0.359
-0.330
-0.348
-0.429
-0.310
-0.258 -0.031
-0.304 -0.014
-0.380
-0.006

-0.254
-0.986
-0.386
-1.013
-0.551
-0.796
-0.177
-0.931
-0.365
-0.789
-0.695
-0.923
-1.248
-0.185
-0.966 -0.114
0.090
-1.229

-0.976
-1.041
-1.854
-1.027
-1.102
-1.173
-1.067
-0.989
-0.921

-0.609
-0.592
-0.596
-0.296
-0.267
-0.250
0.046
0.064
0.085

-0.611
-0.600
-0.594
-0.307
-0.297
-0.283
-0.001
0.016
0.030

-0.135
-0.528
-0.588
-0.452 -0.597 -0.523
-0.039 -0.579 -0.528
0.962
-0.285 -0.208
-0.460 -0.268 -0.244
-0.531 -0.258 -0.245
-0.210
0.041
0.401
-0.177
0.060
0.331
-0.152
0.066
-0.231

0.299
0.325
0.233
0.805
0.699
0.615
0.672
0.711
0.677

-0.582 -0.072
-0.603 -0.055
-0.594 -0.120
0.256
-0.312
0.244
-0.294
0.181
-0.283
0.258
-0.004
0.246
0.013
0.208
0.027

-0.655
-0.493
-0.626 -0.349
-0.602
-0.349
-0.331
-0.307
-0.031
-0.018
-0.002

-0.627
-0.376
-0.630 -0.561
-0.335 -0.608 -0.589
-0.296
-0.346 -0.365
-0.166 -0.334 -0.300
-0.212 -0.313 -0.454
-0.132 -0.035 -0.216
-0.163 -0.025 -0.278
-0.263 -0.011 -0.427

-0.990
-0.964
-0.534
-1.018
-0.963
-1.111
-1.052
-0.971
-0.918

-0.992
0.599
0.013
0.116
-0.999
0.640
-1.000 -1.000
0.336
-1.001 -0.067
0.477
0.103
0.278
-1.250
-0.105 -0.943 -0.358
-0.956 -0.136
0.443
-1.112 -0.241
0.492
-1.000 -1.000
0.992

2.290
2.482
2.346
3.250
3.298
3.078
2.970
3.266
3.204

-0.406
0.094
0.342
0.234
0.304
0.370
0.132
0.202
0.376

3.150
3.176
4.406
3.378
2.846
3.280
2.664
2.578
4.872

0.6

0.9

0.6

0.9

0.6

0.9

Averaged factor scores
0.3

Factor scores of the LST Model
0.3

0.6

Fully speciﬁed model
0.3

0.3
-0.920
0.6
-0.844
0.9
-0.369
0.3
-0.880
0.6
-0.879
0.9
-1.000
0.3
-1.000
0.6
-1.222
-1.138
0.9
Note:Bias less than 0.05 is bolded

0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

3.2 Results

Bias

For 30 schools, almost none of the models meets the threshold for acceptable bias ($|\text{bias}| < 0.05$). Bias was smallest when factor scores from the LST model were used, the general (trait) loading was high, and the pathways were of medium or large magnitude. In general, the a paths tend to be less biased than the b paths, particularly when pathway b is small (0.05), where the b pathways are heavily biased. These results are reported in Table 3.3.

For 60 schools, there is less bias in general compared to 30 schools. Again, the bias is smallest with the factor scores from the LST model with high general (trait) loadings, and this holds across pathways of all sizes under the highest general loadings. However, there are now some unbiased estimates from both the averaged factor scores and the standardized


Table 3.4 Bias for 60 schools

S: 0.15

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

Pathway a

M:0.25

Pathway b

M:0.15

L:0.45

L:0.25

S:0.05

M:0.15

L:0.25

Loading

b
Loading
Standardized Averages of the Mediator
0.3

a

a

b

a

b

a

b

a

b

a

b

a

b

a

b

a

b

-0.639
-0.622
-0.593
-0.301
-0.265
-0.245
0.046
0.091
0.112

-0.631
-0.612
-0.593
-0.332
-0.298
-0.273
-0.025
0.019
0.049

-0.652
-0.603
-0.579
-0.333
-0.313
-0.269
-0.007
0.018
0.043

-1.000
-0.873
-0.853
-1.133
-1.220
-0.573
-0.869
-1.024
-0.792

-0.322
-0.484
-0.688
0.229
0.071
-0.100
0.348
0.149
-0.057

0.057
-0.204
-0.406
0.275
0.041
-0.166
0.219
0.087
-0.063

0.621
0.369
0.095
0.755
0.475
0.165
0.514
0.348
0.137

-0.135
-0.038
-0.561
-0.109
-0.267
0.003
-0.080
-0.137
-0.055

-0.653
-0.680
-0.219 -0.658 -0.010 -0.664 -0.322 -0.672 -0.219 -0.677 -0.012
-0.627 -0.449 -0.642 -0.934 -0.653 -0.484 -0.656 -0.449 -0.668 -0.936 -0.674
-0.597 -0.575 -0.619 -3.000 -0.636 -0.688 -0.638 -0.575 -0.655 -1.020 -0.664
-0.367
0.229
0.159
-0.333
-0.316
-0.355
0.071
-0.006 -0.311
-0.278
-0.348
-0.245 -0.194 -0.299
-0.051
0.141
-0.006
0.029
0.076
-0.036
0.020
0.004
-0.130
0.107
-0.029
0.036

-0.363
-0.341
-0.350
-0.319
-0.307 -0.100 -0.307 -0.194 -0.344
-0.048
-0.023
-0.012
-0.034
0.006
0.015
-0.024
0.024
0.027

0.830
0.584
0.266
1.514
1.028
0.398

0.830
0.584
0.266
1.514
1.028
0.400

-0.350
0.159
-0.327 -0.006

0.348
0.149
-0.057

0.141
0.004
-0.130

0.298

-0.662

-0.628 -0.028 -0.657
-0.660 -0.067 -0.680 -0.082 -0.681
0.015
-0.623 -0.230 -0.648 -0.188 -0.650 -0.233 -0.655 -0.258 -0.674 -0.600 -0.675
-0.602 -0.403 -0.635 -0.602 -0.638 -0.465 -0.640 -0.424 -0.667 -0.804 -0.666
0.151
-0.342
-0.381
-0.358
0.127
-0.308 -0.025 -0.337
-0.045 -0.348 -0.094 -0.369 -0.318 -0.370
-0.263 -0.071 -0.324 -0.294 -0.326 -0.222 -0.322 -0.118 -0.360 -0.900 -0.362
-0.074
-0.062
0.143
-0.076
0.168
-0.019
0.047
-0.061 -0.138 -0.063
0.013
0.009
0.028
-0.027
-0.087 -0.051 -0.526 -0.053
-0.155 -0.005
0.061

0.182
-0.048
0.068
-0.023
-0.044 -0.006

-0.056
-0.031
-0.014

0.702
0.344
0.158

-0.360
-0.339

0.608
0.092

-0.369

-0.378

0.226

0.065

-0.669
-0.618
-0.567
-0.341
-0.305
-0.251
0.006
0.033
0.071

-0.680
0.255
-0.656
0.034
-0.220 -0.627
-0.360
0.398
-0.338
0.185
-0.077 -0.311
0.257
-0.031
0.116
-0.017
-0.083
-0.012

-0.940
-0.459 -0.973
-1.101 -0.525 -0.765
-1.028 -0.668 -0.930
-1.025
-0.442 -0.986
-1.279 -0.506 -1.080
-0.358 -0.472
-1.279
-0.981
-0.419 -1.064
-0.832 -0.515 -0.859
-1.385 -0.476 -1.087

2.390
2.326
1.438
2.284
1.968
1.566
1.684
1.460
1.116

1.724
1.620
2.808
1.624
1.512
1.532
1.536
1.834
0.596

-0.682
-0.657
-0.630
-0.363
-0.344
-0.329
-0.038
-0.024
-0.014

0.573
0.421
0.053
0.667
0.433
0.179
0.438
0.293
0.110

-0.696
0.215
-0.686
-0.654
-0.675
0.005
-0.622 -0.242 -0.657
-0.382
0.346
-0.368
-0.371
0.134
-0.346
-0.317 -0.088 -0.358
-0.068
0.202
-0.040
-0.061
0.065
-0.024
-0.002 -0.101
-0.047

-0.969 -0.105 -0.991 -0.465 -0.943
-0.787 -0.189 -1.035 -0.426 -1.081
-0.738 -0.104 -0.131 -0.441 -0.750
-0.960 -0.149 -1.009 -0.498 -0.964
-1.232 -0.157 -0.976 -0.357 -1.040
-0.828 -0.131 -1.171 -0.136 -0.914
-1.062 -0.097 -0.951 -0.436 -0.999
-0.860 -0.192 -1.022 -0.375 -1.099
-1.302 -0.330 -0.534
-1.526

0.023

1.996
1.878
1.440
1.094
0.888
0.756
0.658
0.578
0.810

1.756
2.252
1.840
1.694
2.102
1.892
1.660
1.916
1.042

-0.697
-0.679
-0.658
-0.387
-0.375
-0.357
-0.072
-0.064
-0.049

-0.968
-0.930
-1.000
-1.047
-0.927
-0.704
-1.024
-0.952
-1.384

-0.322
-0.484
-0.688
0.229
0.071
-0.100
0.348
0.149
-0.057

-0.146
-0.349
-0.597
0.025
-0.202
-0.412
-0.055
-0.172
-0.334

-0.684 -0.220
-0.449
-0.676
-0.575
-0.666
-0.372
0.158
-0.359 -0.006
-0.194
-0.348
0.141
-0.057
0.004
-0.041
-0.130
-0.031

-0.683 -0.104
-0.314
-0.677
-0.670
-0.524
-0.386 -0.051
-0.374 -0.215
-0.355
-0.365
-0.091
-0.079
-0.224
-0.066
-0.296
-0.056

0.149
-0.698
0.431
-0.058
-0.678
0.310
-0.244
-0.660
0.021
0.225
-0.386
0.317
-0.374
0.114
0.047
-0.162
-0.358
0.059
-0.070
0.064
0.125
-0.062 -0.044
0.014
-0.010 -0.049 -0.180

-0.123
-0.072
-1.000
-0.052
-0.089
0.288
-0.151
-0.115
-0.093

-0.972 -0.463
-0.460
-0.956
-1.000
-1.000
-0.984 -0.460
-0.419
-0.892
-0.943
-0.588
-0.990 -0.474
-1.069
-0.390
-1.138 -0.176

-0.630
-0.604
-0.565
-0.289
-0.251
-0.232
0.056
0.099
0.127

4.000
-0.934
-1.020
0.830
0.582
0.266
1.514
1.028
0.398

-0.625
-0.603
-0.591
-0.321
-0.288
-0.269
-0.010
0.029
0.057

0.498
-0.166
-0.416
0.868
0.328
-0.060
0.954
0.636
0.326

-0.661
-0.617
-0.579
-0.325
-0.311
-0.265
-0.003
0.019
0.049

2.612
2.286
1.634
2.586
2.124
1.456
1.938
1.644
1.266

1.596
1.830
1.070
1.538
1.734
1.714
1.576
1.464
2.790

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.6

0.9

0.6

0.9

0.6

0.9

Averaged factor scores
0.3

Factor scores of the LST Model
0.3

0.6

Fully speciﬁed model
0.3

0.3
-0.948
0.6
-1.069
0.9
-0.830
0.3
-0.883
0.6
-1.058
0.9
-0.911
0.3
-0.827
0.6
-1.286
-0.641
0.9
Note:Bias less than 0.05 is bolded

0.9

average of the mediator with high general loadings. For the standardized averages of the mediator, these occur with the medium and large pathways; for the averaged factor scores, they tend to occur with the small to medium pathways. Again, pathway a tends to be less biased than pathway b. In general, the bias of the b pathway decreases as the magnitude of the true b pathway increases, which is not consistently true for the a path. These results are reported in Table 3.4.

For 200 schools, the bias is comparable to that for 60 schools. Here, the bias is smallest for both the averaged factor scores and the factor scores from the latent state-trait model under the highest trait loadings. For the averaged factor scores, only pathway a is unbiased under certain conditions, whereas the factor scores from the LST model also had unbiased b pathways under certain conditions. These results are reported in Table 3.5.

Across all of the sample sizes and conditions, the fully specified model was never unbiased, a


Table 3.5 Bias for 200 schools

S: 0.15

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

Pathway a

M:0.25

Pathway b

M:0.15

L:0.45

L:0.25

S:0.05

M:0.15

L:0.25

Loading

b
Loading
Standardized Averages of the Mediator
0.3

a

0.6

0.9

0.6

0.9

0.6

0.9

0.6

0.9

Averaged factor scores
0.3

Factor scores of the LST Model
0.3

Fully speciﬁed model
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

a

b

a

b

a

b

a

b

a

b

a

b

a

b

a

b

0.834
-0.716
0.266
-0.699
-0.167
-0.685
0.581
-0.416
0.328
-0.393
-0.375
0.044
0.210
-0.111
-0.085
0.102
-0.065 -0.047

-0.633
-0.626
-0.621
-0.329
-0.313
-0.303
-0.031
-0.008
0.007

-0.633
-0.646
-0.621
-0.329
-0.313
-0.303
-0.031
-0.008
0.007

-1.013
-0.081
-1.859
-1.041
-0.912
-1.379
-1.039
-0.665
-1.177

1.409
0.693
0.119
0.905
0.603
0.294
0.415
0.295
0.144

1.409
0.759
0.119
0.905
0.603
0.294
0.415
0.295
0.144

-0.209
-0.229
-0.223
-0.228
-0.223
-0.088
-0.217
-0.218
-0.291

-0.733
-0.709
-0.695
-0.433
-0.407
-0.387
-0.126
0.029
-0.075

-0.627
-0.620
-0.617
-0.323
-0.307
-0.299
-0.027
0.152
0.011

-0.627
-0.620
-0.617
-0.323
-0.307
-0.299
-0.027
0.152
0.011

-1.173
-0.761
-1.184
-1.200
-1.065
-1.273
-1.227
–
-1.494

0.726
0.240
-0.157
0.484
0.252
0.007
0.132
0.062
-0.102

1.122
0.530
0.026
0.690
0.442
0.170
0.251
0.304
0.020

1.122
0.530
0.026
0.690
0.442
0.170
0.251
0.304
0.022

-0.530
-0.518
-0.557
-0.531
-0.527
-0.551
-0.528
–
-0.566

-0.698
-0.691
-0.684
-0.397
-0.386
-0.376
-0.097
-0.082
-0.070

-0.666
-0.660
-0.656
-0.364
-0.353
-0.346
-0.063
-0.050
-0.039

-0.767
-0.760
-0.758
-0.463
-0.460
-0.458
-0.159
-0.158
-0.157

-1.020
-0.945
-0.433
-1.030
-0.788
-0.382
-1.049
-1.244
-0.253

1.362
0.396
-0.126
1.128
0.592
0.196
0.656
0.428
0.164

3.168
1.786
0.844
2.294
1.712
1.164
1.532
1.276
0.960

1.658
0.634
0.014
1.044
0.540
0.108
0.486
0.272
0.026

1.314
1.318
1.374
1.348
1.272
1.228
1.314
1.426
1.446

-0.709
-0.699
-0.691
-0.409
-0.396
-0.385
-0.107
-0.090
-0.079

-0.661
-0.657
-0.654
-0.360
-0.350
-0.345
-0.060
-0.047
-0.038

-0.771
-0.764
-0.759
-0.469
-0.464
-0.461
-0.164
-0.162
-0.160

-1.043
-1.066
-1.169
-1.054
-0.951
0.973
-0.997
-0.878
-0.660

-0.720
0.883
-0.706
0.302
-0.697
-0.141
-0.419
0.621
-0.404
0.363
-0.393
0.072
-0.116
0.239
0.121
-0.099
-0.047 -0.086

1.541
0.817
0.206
1.037
0.711
0.387
0.530
0.403
0.209

-0.656
-0.654
-0.650
-0.356
-0.348
-0.342
-0.059
-0.045
-0.036

-0.777
0.793
-0.766
0.037
-0.760
-0.380
-0.473
0.509
-0.467
0.109
-0.462
-0.227
0.145
-0.168
-0.041 -0.164
-0.162
-0.260

-0.218
-0.246
-0.244
-0.225
-0.190
-0.231
-0.220
-0.195
-0.277

-1.067
-0.739
-1.811
-1.068
-1.183
-0.168
-0.934
-0.982
-0.671

0.726
0.239
-0.157
0.488
0.253
0.011
0.132
0.026
-0.102

1.177
0.566
0.062
0.741
0.487
0.212
0.294
0.192
0.061

0.584
-0.102
-0.471
0.371
-0.000
-0.308
0.050
-0.123
-0.310

-0.543
-0.518
-0.526
-0.528
-0.552
-0.633
-0.517
-0.539
-0.510

-0.699
-0.695
-0.691
-0.398
-0.392
-0.387
-0.098
-0.090
-0.084

-0.683
-0.680
-0.677
-0.383
-0.377
-0.372
-0.082
-0.075
-0.070

-0.740
-0.736
-0.734
-0.438
-0.436
-0.434
-0.135
-0.134
-0.133

-0.980
-0.966
-1.083
-0.890
-0.675
-1.080
-1.025
-0.859
-1.132

1.310
0.376
-0.154
1.086
0.592
0.190
0.592
0.398
0.098

3.778
2.284
1.178
2.806
2.172
1.556
1.940
1.660
1.340

1.092
0.326
-0.150
0.614
0.232
-0.128
0.172
0.020
-0.186

1.324
1.374
1.494
1.286
1.342
1.324
1.324
1.358
1.234

-0.705
-0.666
-0.695
-0.405
-0.398
-0.392
-0.103
-0.095
-0.089

-0.680
-0.653
-0.676
-0.380
-0.374
-0.372
-0.081
-0.073
-0.068

-0.742
-0.736
-0.734
-0.441
-0.437
-0.435
-0.137
-0.136
-0.135

-0.956
-0.965
-0.928
-0.955
-0.954
-0.885
-1.042
-0.909
-0.617

0.834
1.264
-0.149
0.589
0.341
0.059
0.192
0.102
-0.039

1.682
1.668
0.283
1.1867
0.8547
0.5057
0.631
0.501
0.341

0.586
0.816
-0.444
0.343
-0.007
-0.313
0.031
-0.135
-0.315

-0.229
0
-0.270
-0.221
-0.223
-0.255
-0.201
-0.247
-0.081

0.725
-0.711
-0.704
0.251
-0.698 -0.146
0.490
-0.411
0.260
-0.402
-0.396
0.016
0.121
-0.108
-0.099
0.026
0.082
-0.044

-0.676
-0.675
-0.674
-0.378
-0.372
-0.370
-0.080
-0.072
-0.022

1.296
0.657
0.132
0.859
0.594
0.294
0.380
0.271
0.328

0.496
-0.745
-0.156
-0.739
-0.502
-0.735
-0.444
0.296
-0.440 -0.058
-0.437 -0.350
-0.140 -0.002
-0.138 -0.168
-0.077
-0.116

-0.968
-0.537
-1.163 -0.528
-0.342 -0.478
-0.992
-0.514
-0.993 -0.528
-1.465 -0.507
-0.957
-0.530
-0.818 -0.526

–

–

-0.697
-0.684
-0.673
-0.395
-0.376
-0.359
-0.095
-0.070
-0.050

-0.642
-0.633
-0.625
-0.336
-0.317
-0.308
-0.035
-0.012
0.004

-0.642
-0.633
-0.625
-0.336
-0.317
-0.308
-0.035
-0.012
0.004

-1.077
-1.359
-1.167
-1.084
-1.797
-1.722
-1.280
-0.735
-0.747

1.310
0.360
-0.210
1.068
0.552
0.168
0.626
0.400
0.140

2.778
1.496
0.646
1.960
1.440
0.924
1.230
1.014
0.728

2.778
1.496
0.646
1.960
1.440
0.924
1.230
1.014
0.728

1.368
1.454
1.534
1.346
1.304
1.468
1.288
1.416
1.248

surprising finding. When investigated, however, this was due to the level 3 (between-level) trait loadings being estimated as 0 instead of the population parameter. This may result from the level 2 sample size or the level 2 intraclass correlation coefficients. Changes in these parameters should be investigated in future work to evaluate this unforeseen issue. Additionally, using a weakly informative prior for the random variances at level 2 and level 3 may decrease bias by reducing problems in the variance-covariance matrix, which should also be investigated in future work.

An important takeaway from the bias results across the different school sample sizes is the importance of strong measures for the mediator, specifically measures that load strongly onto the trait factor of the mediator. When researching questions related to mediators with trait-state properties, investigators should invest considerable time and resources in ensuring that the measure captures the trait factor of the mediator. Then, depending on the number of schools and the strength of the mediation measure, the investigator should consider using the factor scores from the LST model (with a smaller number of schools).

Convergence

Across all school sample sizes, the standardized averages, averaged factor scores, and factor scores from the LST model converged 100% of the time. This indicates that researchers should not have convergence problems when using these estimation methods. At 30 schools, the fully specified model converged more than 70% of the time under only one condition (small general loading, small specific loadings, and medium a and b pathways). In general, convergence decreased as both the general and specific loadings increased. There is no general trend in convergence dependent on the magnitude of the true a or b pathways. Results are reported in Table 3.6.

At 60 schools, the fully specified model converged at rates higher than 70% when the general and specific loadings were small, across all pathway magnitudes, as well as with a medium general loading and small specific loadings when pathway a was small or medium. In general, convergence was higher with the larger school sample, and convergence continued to decrease as the general and specific loadings increased. Results are reported in Table 3.7.

At 200 schools, convergence rates were higher than at 30 or 60 schools. Convergence rates met the 70% threshold under small and medium trait loadings combined with small state loadings. In general, convergence rates decreased as factor loadings increased, with no pattern across the a and b pathways. Results are reported in Table 3.8. In general, when considering the fully specified model, convergence is more likely with larger sample sizes and smaller factor loadings. These convergence issues may stem from the same problems that cause bias, such as the lack of variance in the level 3 mediator. In the future, this should be investigated with more conditions.

In this simulation study, which considered 200 schools or fewer, the factor scores from the LST model (the method recommended under the bias conditions) converged under all conditions, indicating that convergence problems are unlikely with this estimation method.
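To make the 70% convergence criterion concrete, the following Python sketch tallies which pathway conditions meet it. It is an illustration only, not the code used in the simulation study: the function names are hypothetical, and the rates are the fully specified model's values for small general and small time-specific loadings at 30 schools as read from Table 3.6, assuming the columns run over pathway a (S, M, L) with pathway b (S, M, L) nested within.

```python
# Convergence rates of the fully specified model at 30 schools for the
# small general (0.3) / small time-specific (0.3) loading condition.
# Keys are (pathway a size, pathway b size); the column order is an assumption.
rates_30_schools = {
    ("S", "S"): 0.66, ("S", "M"): 0.66, ("S", "L"): 0.69,
    ("M", "S"): 0.60, ("M", "M"): 0.70, ("M", "L"): 0.65,
    ("L", "S"): 0.65, ("L", "M"): 0.61, ("L", "L"): 0.68,
}

THRESHOLD = 0.70  # convergence criterion discussed in the text


def convergence_rate(flags):
    """Share of replications that converged, given one boolean per replication."""
    flags = list(flags)
    if not flags:
        raise ValueError("at least one replication is required")
    return sum(bool(f) for f in flags) / len(flags)


def conditions_meeting_threshold(rates, threshold=THRESHOLD):
    """Return the conditions whose convergence rate is at or above the threshold."""
    return sorted(cond for cond, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    print(conditions_meeting_threshold(rates_30_schools))
    print(convergence_rate([True, False, True, True]))
```

Under these assumptions, only the medium a, medium b condition reaches the threshold, which matches the description above.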


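Table 3.6 Convergence for 30 schools

Pathway a: S = 0.15, M = 0.25, L = 0.45. Pathway b: S = 0.05, M = 0.15, L = 0.25.

General  Time-specific   Pathway a: S        Pathway a: M        Pathway a: L
loading  loading         b:S   b:M   b:L     b:S   b:M   b:L     b:S   b:M   b:L

Standardized averages of the mediator
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Averaged factor scores
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Factor scores of the LST model
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Fully specified model
0.3      0.3             0.66  0.66  0.69    0.60  0.70  0.65    0.65  0.61  0.68
0.3      0.6             0.15  0.12  0.14    0.16  0.12  0.15    0.11  0.15  0.11
0.3      0.9             0.02  0.02  0.02    0.03  0.02  0.01    0.02  0.03  0.00
0.6      0.3             0.54  0.52  0.55    0.49  0.49  0.41    0.47  0.52  0.54
0.6      0.6             0.15  0.16  0.08    0.27  0.12  0.14    0.13  0.14  0.08
0.6      0.9             0.00  0.01  0.01    0.01  0.03  0.02    0.02  0.02  0.05
0.9      0.3             0.42  0.39  0.33    0.39  0.42  0.31    0.30  0.32  0.35
0.9      0.6             0.05  0.07  0.07    0.12  0.15  0.09    0.11  0.09  0.09
0.9      0.9             0.01  0.04  0.02    0.04  0.03  0.02    0.02  0.04  0.00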

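Table 3.7 Convergence for 60 schools

Pathway a: S = 0.15, M = 0.25, L = 0.45. Pathway b: S = 0.05, M = 0.15, L = 0.25.

General  Time-specific   Pathway a: S        Pathway a: M        Pathway a: L
loading  loading         b:S   b:M   b:L     b:S   b:M   b:L     b:S   b:M   b:L

Standardized averages of the mediator
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Averaged factor scores
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Factor scores of the LST model
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Fully specified model
0.3      0.3             0.88  0.85  0.86    0.86  0.87  0.89    0.83  0.90  0.93
0.3      0.6             0.22  0.19  0.14    0.21  0.12  0.22    0.21  0.14  0.18
0.3      0.9             0.05  0.03  0.06    0.02  0.03  0.02    0.01  0.00  0.00
0.6      0.3             0.66  0.69  0.73    0.75  0.75  0.73    0.68  0.61  0.74
0.6      0.6             0.16  0.14  0.24    0.20  0.18  0.20    0.21  0.19  0.22
0.6      0.9             0.05  0.02  0.04    0.02  0.04  0.04    0.05  0.04  0.03
0.9      0.3             0.47  0.49  0.47    0.45  0.46  0.56    0.53  0.54  0.56
0.9      0.6             0.20  0.13  0.18    0.19  0.12  0.18    0.18  0.15  0.15
0.9      0.9             0.04  0.03  0.02    0.03  0.05  0.04    0.02  0.03  0.02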

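Table 3.8 Convergence for 200 schools

Pathway a: S = 0.15, M = 0.25, L = 0.45. Pathway b: S = 0.05, M = 0.15, L = 0.25.

General  Time-specific   Pathway a: S        Pathway a: M        Pathway a: L
loading  loading         b:S   b:M   b:L     b:S   b:M   b:L     b:S   b:M   b:L

Standardized averages of the mediator
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Averaged factor scores
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Factor scores of the LST model
0.3      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.3      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.6      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.3             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.6             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00
0.9      0.9             1.00  1.00  1.00    1.00  1.00  1.00    1.00  1.00  1.00

Fully specified model
0.3      0.3             0.90  0.89  0.85    0.83  0.87  0.83    0.92  0.88  0.93
0.3      0.6             0.20  0.18  0.28    0.23  0.21  0.23    0.19  0.20  0.29
0.3      0.9             0.02  0.04  0.05    0.03  0.02  0.04    0.04  0.02  0.01
0.6      0.3             0.72  0.65  0.74    0.68  0.75  0.72    0.63  0.71  0.65
0.6      0.6             0.21  0.22  0.25    0.17  0.24  0.25    0.17  0.23  0.23
0.6      0.9             0.04  0.03  0.02    0.04  0.03  0.01    0.05  0.05  0.02
0.9      0.3             0.40  0.54  0.51    0.42  0.45  0.34    0.43  0.48  0.35
0.9      0.6             0.10  0.18  0.00    0.20  0.12  0.17    0.15  0.12  0.11
0.9      0.9             0.01  0.01  0.05    0.03  0.02  0.04    0.01  0.01  0.00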

Table 3.9 Power for 30 schools

S: 0.15

Pathway a

M:0.25

Pathway b

L:0.45

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

a

b

a

b

a

b

a

b

a

b

a

b

a

a

Loading

Loading
b
Standardized Averages of the Mediator
0
0.3
0
0
0
0.011
0.011
0.022
0.011
0.011

0
0
0
0
0
0
0
0.011
0.011

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.9

0.6

0.9

0.6

Averaged factor scores
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0
0
0
0
0
0
0
0
0.010
Factor scores of the LST Model
0
0.3
0
0
0
0
0
0
0
0

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.9

0.6

Fully speciﬁed model
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.00
0
0
0
0
0
0
0
0

0.6

0.9

Power

0
0
0
0
0
0
0.021
0.010
0.010

0
0
0
0
0
0
0.010
0.010
0.010

0.030
0
0
0.019
0.067
0
0
0
0

0
0
0
0
0
0
0
0
0.011

0
0
0
0
0
0
0
0
0.010

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0.011
0.022
0.022
0.011

0
0
0
0
0
0
0.021
0.010
0.010

0
0
0
0
0
0
0.010
0.010
0.020

0.015
0
0
0.019
0
0
0.026
0
0

0
0
0
0
0
0
0
0.011
0.011

0
0
0
0
0
0
0
0
0.010

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0
0
0
0.011
0
0
0.011
0.022 0.011 0.022
0.033 0.011 0.011 0.011 0.022 0.011
0.022 0.011 0.011 0.011 0.011 0.011

0
0
0
0
0
0.011
0.022

0
0
0
0
0
0
0

0
0
0
0
0
0
0

0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0.011
0.011 0.011
0.022 0.022 0.022 0.022
0.033 0.032 0.011 0.033
0.022 0.043 0.011 0.033

b

0
0
0
0
0

a

0
0

0
0

0.011 0.011
0.022 0.022
0.022 0.033
0.011 0.033

0
0
0
0
0
0
0

0
0.021
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0.021
0.021
0.021
0.031 0.010 0.010 0.010 0.010
0.021 0.010 0.010 0.010 0.010 0.010

0
0
0
0
0
0
0
0

0
0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0.021 0.031 0.021 0.031
0.021 0.021 0.010 0.021
0.021 0.042 0.010 0.031

0.010 0.031
0.021 0.031
0.010 0.031

0
0
0
0
0
0
0
0.010
0.020 0.010 0.010 0.010 0.020 0.010

0
0
0
0
0
0
0.010
0.010

0
0
0
0
0
0
0
0.010

0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0
0

0
0
0
0
0
0
0.010 0.010 0.010
0.010 0.030 0.010 0.030
0.020 0.052 0.010 0.051

0.029
0.071
0
0.382
0
0
0
0.143
0

0
0
0
0
0
0
0
0
0

0.017
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0.014
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.032
0
0

0
0
0
0
0
0
0
0
0

0.015
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.010
0.030
0.020 0.051

0.033
0
0
0.038
0
0
0
0
0

0
0
0
0
0
0
0
0
0

b

0
0

0
0
0
0.022
0.033
0.022

0
0
0
0
0
0
0.010
0.010
0.021

0
0
0
0
0
0
0.010
0.010
0.010

0.058
0.091
0
0.019
0
0
0
0
0

Across all three school sample sizes, power never reached 80% for any of the models or estimated pathways. For 200 schools, power for the a pathway increased substantially when the a pathway was 0.45 and the trait loadings were high under the standardized averages, averaged factor scores, and factor scores of the LST model. Within the high trait loadings, power also increased as the state loadings increased. Power remained low in the fully specified model. These results are reported in Tables 3.9-3.11.
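As an illustration of how empirical power is commonly tallied in a simulation of this kind, the sketch below computes the share of replications whose interval estimate for a pathway excludes zero. The decision rule is an assumption made for the example, not a description of the code used in this study; the function name and the example intervals are hypothetical.

```python
def empirical_power(intervals):
    """Share of replications whose (lower, upper) interval excludes zero.

    `intervals` holds one (lower, upper) pair per converged replication,
    for example 95% intervals for the a or b pathway (an assumed rule).
    """
    intervals = list(intervals)
    if not intervals:
        raise ValueError("at least one replication is required")
    detections = sum(1 for lower, upper in intervals if lower > 0 or upper < 0)
    return detections / len(intervals)


if __name__ == "__main__":
    # Four hypothetical replications; two intervals exclude zero.
    example = [(0.10, 0.50), (-0.20, 0.30), (0.05, 0.40), (-0.10, 0.20)]
    print(empirical_power(example))
```

Under such a rule, the conventional benchmark of 80% power corresponds to at least 80% of replications detecting the pathway.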

In general, when researchers hope to answer research questions with this LST multilevel mediation model, power will not reach adequate levels with 200 or fewer schools or with effect sizes less than or equal to 0.45. To increase power, researchers may consider increasing the sample size, adjusting


Table 3.10 Power for 60 schools

S: 0.15

Pathway a

M:0.25

Pathway b

L:0.45

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

Loading

Loading
b
Standardized Averages of the Mediator
0.3

a

a

b

a

b

a

b

a

0
0
0
0
0
0
0
0.011
0.032

0
0
0
0
0
0
0
0.021
0.021

0
0
0
0
0
0
0
0
0.021

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.011
0
0.011

0
0
0
0
0
0
0.021
0.010
0.010

0
0
0.01
0
0
0.031
0.021
0.042
0.031

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0.011
0.021

0
0
0
0
0
0
0
0.021
0.021

0
0
0
0
0
0
0
0
0.021

0
0
0
0
0
0
0
0
0

b

0
0
0
0
0
0

a

0
0
0
0
0
0

0.032 0.011
0.043 0.022
0.021 0.021

b

0
0
0
0
0
0
0
0
0

a

b

a

b

a

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0
0
0
0.011
0
0.011
0
0.011 0.011 0.011 0.032 0.065
0.043 0.065
0.021
0.021
0
0.021 0.064
0.021 0.011 0.021

b

0
0
0
0
0
0
0
0
0

0
0
0
0
0.011
0.011
0.064
0.064
0.064

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0.021
0.021 0.021 0.011 0.021 0.010 0.021 0.021 0.053 0.021 0.053
0.010 0.074 0.021 0.074
0.031 0.032
0.063
0.021 0.074
0.021 0.031

0.031 0.010 0.031
0.031 0.010 0.031

0
0
0
0
0.011
0.021

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0
0

0
0

0

0
0
0
0
0
0

0
0
0
0
0
0

0
0
0.01
0
0
0.021

0
0
0
0
0.011
0.011
0.011
0.011
0.032
0.032
0.032
0.032 0.031 0.021 0.032
0.032 0.021 0.021 0.021 0.021 0.021 0.032 0.042 0.031 0.042
0.043 0.062 0.042 0.062
0.043 0.021 0.032 0.021 0.032 0.032
0.043 0.095 0.053 0.084
0.043 0.032 0.042 0.031 0.042 0.032

0
0
0.01
0
0.011
0.031

0
0
0.011
0.010
0.01

0
0
0
0
0
0

0
0
0
0
0

0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0.011
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0.012
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0.055
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.011
0
0

0
0
0.01
0
0
0.010
0.021
0.03
0.042

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0

0
0
0
0
0.011
0.011
0.011 0.064
0.064
0.011 0.064

0

0
0
0
0
0
0

0
0
0
0
0
0.021
0.021 0.053
0.011 0.074
0.011 0.062

0
0
0
0
0

0
0
0.011
0.010
0.010
0.032 0.032
0.042 0.043
0.052 0.064
0.063 0.085

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.032
0.043
0.021

0
0
0
0
0
0
0.011
0.011
0.010

0
0
0.011
0.011
0.021
0.043
0.032
0.043
0.043

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.011
0.011
0.021

0
0
0
0
0
0
0
0
0

0.6

0.9

0.6

0.9

0.9

0.6

Averaged factor scores
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0
0
0
0
0
0
0
0.021
0.021
Factor scores of the LST Model
0
0.3
0
0
0
0
0
0
0.01
0.021

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.6

0.9

Fully speciﬁed model
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

to larger minimum detectable effect sizes for the a and b pathways, or using weakly informative priors in the Bayesian estimation.


Table 3.11 Power for 200 schools

S: 0.15

Pathway a

M:0.25

Pathway b

L:0.45

General Time Speciﬁc

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

S:0.05

M:0.15

L:0.25

a

b

a

b

a

b

a

b

a

b

a

b

a

b

a

b

Loading

Loading
Standardized Averages of the Mediator
0.3

b

a

0.6

0.9

0.6

0.9

0.6

0.9

0.6

Averaged factor scores
0.3

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0
0
0
0
0
0
0.01
0.01
0.01
Factor scores of the LST Model
0.3

0.9

Fully speciﬁed model
0.3

0
0
0
0
0
0

0
0
0
0
0
0

0
0.03
0.03
0.08
0.05
0.05
0.01 0.087 0.01
0.01
0.06
0.01
0.01
0.06
0.01

0
0.01
0.02
0.06
0.05
0.05
0.07
0.09
0.07

0
0.01
0.02
0.06
0.05
0.05
0.07
0.09
0.07

0
0.05
0
0.014
0
0
0
0
0

0
0
0
0
0
0
0.01
0.01
0.01

0
0
0
0
0
0
0.01
0.01
0.01

0
0
0
0
0
0
0
0
0

0
0
0
0
0
0
0.01
0.01
0.01

0
0
0
0
0
0
0
0
0

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.3
0.6
0.9
0.3
0.6
0.9
0.3
0.6
0.9

0.03
0.03
0.04
0.07
0.06
0.06
0.11
0.10
0.09

0.01
0.03
0.03
0.09
0.08
0.07
0.09
0.09
0.09

0.01
0.03
0.03
0.09
0.08
0.07
0.09
0.09
0.09

0
0
0
0
0
0
0.019
0
0

0
0
0
0
0
0
0.01
0
0.01

0
0
0
0
0
0
0.01
0
0.01

0
0
0
0
0
0
0.01
0
0.01

0
0
0
0
0
0
0
–
0

0
0.03
0
0.04
0
0.05
0
0.11
0
0.1
0.01
0.07
0.12
0.03
0.086 0.04
0.06
0.13

0
0.02
0
0.04
0
0.04
0
0.12
0
0.11
0
0.09
0.12
0.03
0.083 0.05
0.120 0.08

0
0.02
0
0.04
0
0.04
0
0.12
0
0.11
0.01
0.09
0.12
0.03
0.083 0.04
0.05
0.12

0.01
0.03
0.03
0.08
0.05
0.05
0.08
0.06
0.06

0
0.02
0.03
0.06
0.05
0.06
0.09
0.11
0.08

0.03
0.06
0.08
0.05
0.06
0.08
0.08
0.09
0.10

0.012
0.036
0
0
0.04
0
0
–
0

0
0
0
0
0
0
0
0
0

0
0.043
0
0.015
0
0
0.024
0
0

0
0
0
0
0
0.01
0.03
0.04
0.06

0
0
0
0
0
0
0.03
0.05
0.08

0
0
0
0
0
0.01
0.03
0.04
0.04

0
0
0
0
0
0
0
0
0

0.02
0.03
0.04
0.06
0.06
0.06
0.11
0.10
0.09

0.02
0.03
0.03
0.09
0.08
0.08
0.09
0.10
0.10

0.03
0.08
0.08
0.08
0.1
0.10
0.13
0.12
0.12

0.011
0
0
0
0.042
0
0
0
0

0
0
0
0
0
0
0.03
0.03
0.07

0
0
0
0
0
0
0.03
0.05
0.08

0
0
0
0
0
0.01
0.02
0.04
0.05

0
0
0
0
0
0
0
0
0

0.03
0.04
0.05
0.11
0.1
0.07
0.12
0.13
0.13

0.03
0.04
0.04
0.12
0.12
0.11
0.12
0.11
0.13

0.05
0.08
0.09
0.12
0.12
0.13
0.13
0.13
0.13

0
0.043
0
0
0
0
0.029
0
0

0
0
0
0.02
0.04
0.08
0.46
0.49
0.51

0
0
0
0.03
0.05
0.08
0.48
0.52
0.57

0
0
0
0.02
0.04
0.06
0.42
0.43
0.44

0
0
0
0
0
0
0
0
0

0
0.03
0.03
0.08
0.05
0.05
0.08
0.06
0.06

0.01
0.03
0.04
0.09
0.08
0.09
0.11
0.12
0.09

0.04
0.06
0.07
0.08
0.08
0.07
0.10
0.11
0.11

0
0
0
0.016
0
0
0
0
0

0
0
0
0.02
0.03
0.08
0.43
0.47
0.50

0
0
0
0.03
0.05
0.08
0.48
0.52
0.57

0
0
0
0.02
0.04
0.05
0.41
0.43
0.44

0
0
0
0
0
0
0
0
0

0.03
0
0.04
0.07
0.06
0.06
0.11
0.11
0.08

0.03
0.053
0.04
0.09
0.09
0.08
0.10
0.11
0.11

0.03
0.05
0.08
0.10
0.13
0.10
0.13
0.13
0.13

0
0
0
0.014
0
0
0
0
0

0
0
0
0.02
0.03
0.08
0.43
0.45
0.63

0
0
0
0.02
0.06
0.08
0.48
0.052
0.679

0
0
0
0.02
0.04
0.05
0.39
0.42
0.429

0
0
0
0
0
0
0
0
–

0.03
0.04
0.05
0.11
0.10
0.07
0.12
0.13
0.11

0.06
0.05
0.07
0.14
0.11
0.10
0.15
0.13
0.179

0.06
0.09
0.09
0.13
0.13
0.13
0.13
0.13
0.179

0
0
0
0
0.043
0
0
0
–


CHAPTER 4

EMPIRICAL STUDY

Following the simulation study, data from the Crafting Engaging Science Environments (CESE) intervention are analyzed and compared using the same four estimation models. The CESE data include an intervention indicator, items from a summative assessment, and teacher observation items measured at multiple time points by different raters during the intervention. The teacher observations serve as the mediator of the CESE intervention effect.

The data come from the evaluation of the intervention, which was tested in Michigan and California in 2018-2019. The overall treatment effect and preliminary mediation analyses were reported in Schneider et al. (2022). For the treatment effect and the mediation analysis, hierarchical linear models were used with an equated (between chemistry and physics) standardized score as the outcome, controlling for the pretest with an estimated factor score. The two mediators explored were a composite score of the teacher’s self-reported use of project-based learning (PBL), measured from the post-treatment exit survey, and the students’ reporting of modeling from their exit survey. Schneider et al. (2022) found that teachers’ incorporation of PBL did not significantly mediate the treatment effect; however, students’ use of modeling did mediate the treatment effect (at a significance level of 0.10) and accounted for about 28% of the treatment effect. This empirical study reanalyzes teacher use of project-based learning using longitudinal observation scores instead of the post-treatment self-reported scores. The chapter first explains the methods (describing the sample, the measures, the assumed mediation model, and the estimation methods used) and then reports and compares the results of the mediation analysis across the four different estimation methods.

4.1 Method

As noted previously, this study draws on a cluster randomized control trial of the efficacy of the CESE intervention. Before the start of the study in 2018, the 70 participating schools were randomized into treatment and control conditions (within four different regional blocks), yielding 36 treatment and 34 control schools. Table 4.1 gives the final analytic sample of 61 schools. At the beginning of the intervention, the treatment teachers were provided with three days of professional learning on the CESE curriculum, and the control teachers were provided with one day of professional learning on the Next Generation Science Standards (NGSS). Throughout the academic year, the CESE intervention included three project-based learning, NGSS-aligned units in either chemistry or physics, along with end-of-unit formative assessments. Through the professional learning, the provided units, and the formative assessments as a holistic intervention, it was expected that teacher use of project-based learning would increase along with NGSS-aligned teaching, which would in turn increase student science achievement. Schneider et al. (2022) give evidence that the intervention was effective, increasing student achievement by about 0.20 standard deviations. The next question is whether the treatment increased teacher use of project-based learning in the classroom and whether that increase mediates the intent-to-treat effect.

Sample

Over the course of the intervention, the following data were collected: teacher background survey, student background survey, student pretest, student Experience Sampling Method (ESM) surveys, teacher ESM surveys, teacher observations, treatment student unit assessments, teacher exit survey, student exit survey, student summative assessment, and school-linked data from the Common Core of Data. Table 4.1 gives the number of beeps/observations, students, teachers, and schools for the different data collection points, as well as the analytic sample used to test the main treatment effect.

The final analytic sample included 4,238 students of 102 teachers in 61 schools. Of these 102 teachers, 55 teachers in 38 schools had one or more observations, just over half of the analytic sample. This reduction in sample size may leave the sample underpowered for both the treatment and mediating effects. Regardless of whether the sample is adequately powered, these data are used to estimate all models that converge.

Table 4.1 Data collection for Students, Teachers, and Schools

Data source                     Beep/observation   Student   Teacher   School
Student Background                                 6694      119       67
Teacher Background                                           115       66
Student Pretest                                    6720      118       66
Student ESM                     7009               546       27        21
Teacher ESM                     273                          28        21
Teacher Observations            108                          55        38
Teacher Exit Survey                                          107       63
Student Exit Survey                                5435      103       59
Student Summative Assessment                       5977      107       62
Linked CCD                                                             69
Analytic Sample                                    4238      102       61

Measures

The CESE observation protocol

During the CESE intervention, a subsample of teachers was observed one to five times throughout the three units by 14 different observers. The observation instrument had 13 items related to teacher actions/behaviors and seven items related to student actions/behaviors. Each item was scored on a scale from 1 to 4, with 1 indicating that a behavior or practice was not observed and 4 indicating that the practice was observed entirely. For each item, the observer was required to justify the score. The 13 teacher items can be divided into four different constructs. The first construct is related to teacher PBL practices and includes four items. The second construct is the teacher’s support of social and emotional learning, with three items. The third construct is a direct measure of implementation fidelity with two items, and the final construct is classroom management with four items. Table 4.2 reports the items and their constructs.

The CESE pretest

The CESE pretest was developed from 12 NAEP 8th-grade physical science items. Nine of these items were multiple-choice, and four were free-response (one item included both a multiple-choice and a free-response portion). This analysis uses only the multiple-choice items, as not all of the free-response items are available. The multiple-choice items load onto a single construct of science knowledge.

Table 4.2 CESE Teacher Observation Teacher Items and Constructs

Construct                        Item
PBL Practices                    Teacher’s use of DQ (Driving Question)
                                 Teacher’s Support for Figuring Out
                                 Teacher provides feedback to encourage students in using the SEPs and CCCs to make sense of phenomena
                                 Teacher’s use of discourse moves to engage in sensemaking
Social and Emotional Learning    Teacher’s support of agency
                                 Teacher’s support of persistence
                                 Teacher’s support for collaboration and small group work
Fidelity of Implementation       Lesson as Written
                                 PBL Practices
Classroom Management             Clear Evidence of Norms and Routines
                                 Teacher supports student work: overall monitoring of student work
                                 Instructional sequencing and pacing
                                 Behavior management

Table 4.3 CESE Pretest 1-PL vs 2-PL vs 3-PL

Model   AIC        BIC        Likelihood   LRT      df   p-value
1-PL    64165.67   64232.23   -32072.83
2-PL    63613.77   63733.60   -31788.89    567.89   8    <0.001
2-PL    63613.77   63733.60   -31788.89
3-PL    63526.74   63706.48   -31736.37    105.03   9    <0.001

The psychometrician on the project evaluated which unidimensional IRT model (1-PL, 2-PL, or 3-PL) best fit these items. The likelihood ratio test results are reported in Table 4.3.

The 2-PL model fit better than the 1-PL model, with an LRT of 567.89, df of 8, and a p-value less than 0.001. The 3-PL model fit better than the 2-PL model, with an LRT of 105.03, df of 9, and a p-value less than 0.001.
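The likelihood ratio statistics can be checked directly from the log-likelihoods in Table 4.3, since the statistic is twice the difference in log-likelihood between the more general and the more restricted model. The sketch below does this in Python. The parameter counts (10, 18, and 27 for nine items) are an assumption inferred from the reported AIC values and degrees of freedom, and the function names are hypothetical.

```python
def lrt_statistic(loglik_restricted, loglik_general):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_general - loglik_restricted)


def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * loglik


# Log-likelihoods from Table 4.3.
LOGLIK = {"1-PL": -32072.83, "2-PL": -31788.89, "3-PL": -31736.37}
# Assumed free-parameter counts for nine items (not stated in the text).
N_PARAMS = {"1-PL": 10, "2-PL": 18, "3-PL": 27}

lrt_1pl_vs_2pl = lrt_statistic(LOGLIK["1-PL"], LOGLIK["2-PL"])
lrt_2pl_vs_3pl = lrt_statistic(LOGLIK["2-PL"], LOGLIK["3-PL"])
df_1pl_vs_2pl = N_PARAMS["2-PL"] - N_PARAMS["1-PL"]
df_2pl_vs_3pl = N_PARAMS["3-PL"] - N_PARAMS["2-PL"]

if __name__ == "__main__":
    print(round(lrt_1pl_vs_2pl, 2), df_1pl_vs_2pl)
    print(round(lrt_2pl_vs_3pl, 2), df_2pl_vs_3pl)
    print(round(aic(LOGLIK["1-PL"], N_PARAMS["1-PL"]), 2))
```

Up to rounding of the reported log-likelihoods, the results agree with the LRT values of 567.89 and 105.03 and the degrees of freedom of 8 and 9.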

After determining that the 3-PL model was the best fit, the guessing, difficulty, and discrimination parameters were estimated for the nine items; these are reported in Table 4.4.

The pretest items vary in difficulty and discrimination. The guessing parameter falls between approximately zero and 0.34, with some items having almost no guessing probability and some having a guessing probability higher than would be expected from random guessing.
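To show how the three parameters combine, the following Python sketch evaluates a 3-PL item response function using the Table 4.4 estimates for Items 1 and 5. It assumes the common parameterization in which the logit is the discrimination times the distance between ability and difficulty; the software used for the pretest may scale the parameters differently, and the function name is hypothetical.

```python
import math


def three_pl_probability(theta, guessing, difficulty, discrimination):
    """Probability of a correct response under an assumed 3-PL parameterization."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# Parameter estimates taken from Table 4.4.
ITEM_1 = {"guessing": 0.29, "difficulty": 1.149, "discrimination": 1.168}
ITEM_5 = {"guessing": 0.002, "difficulty": -1.007, "discrimination": 1.011}

if __name__ == "__main__":
    for ability in (-2.0, 0.0, 2.0):
        p1 = three_pl_probability(ability, **ITEM_1)
        p5 = three_pl_probability(ability, **ITEM_5)
        print(ability, round(p1, 3), round(p5, 3))
```

When ability equals the item difficulty, the probability is halfway between the guessing parameter and one, so an item with a guessing parameter of 0.29 gives 0.645 at that point, while an item with a guessing parameter near zero gives about 0.5.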

Table 4.4 CESE Pretest 3-PL model

Item      Guessing   Difficulty   Discrimination
Item 1    0.29       1.149        1.168
Item 2    0.336      -0.576       1.544
Item 3    0.332      1.264        2.076
Item 4    0.295      1.711        1.92
Item 5    0.002      -1.007       1.011
Item 6    0.275      0.959        0.895
Item 8    0.002      -0.64        1.675
Item 9    0.289      0.401        1.384
Item 10   0.004      -1.004       1.753

Table 4.5 Unidimensionality of the CESE summative assessment

Comparison             Chemistry                                  Physics
1 factor vs 2 factor   χ2 = 437.92, df = 1, p-value < 0.001       χ2 = 142, df = 1, p-value < 0.001
1 factor vs bifactor   χ2 = 1807.55, df = 25, p-value < 0.001     χ2 = 475, df = 12, p-value < 0.001
2 factor vs bifactor   χ2 = 1343.69, df = 24, p-value < 0.001     χ2 = 273.432, df = 11, p-value < 0.001

The CESE summative assessment

The CESE summative assessment had separate forms for physics and chemistry students, both developed with items from the 11th-grade science assessment of the Michigan Department of Education. The chemistry assessment had 13 questions, some with sub-items, and the physics assessment had six questions, some also with sub-items. Because the questions with sub-items did not require the student to answer the first part correctly in order to answer the second part correctly, each sub-item is treated as its own item. The items were checked for unidimensionality separately for the chemistry and physics summative assessments. A one-factor confirmatory factor analysis was compared to a two-factor confirmatory factor analysis, and then both the one-factor and two-factor models were compared to a bifactor model. The model comparison results are reported in Table 4.5.

The bifactor model outperforms the one-factor and two-factor models for both chemistry and physics.

Table 4.6 Bifactor model fit of the CESE summative assessment

         Chemistry               Physics
RMSEA    0.036 [0.034, 0.037]    0.018 [0.017, 0.018]
CFI      0.949                   0.993
TLI      0.938                   0.926

Note: 95% confidence intervals for the RMSEA are in brackets.

Table 4.7 CESE Summative Assessment 1-PL vs 2-PL vs 3-PL

                                                    Scaling              Difference scaling
Model   AIC          BIC          Likelihood    factor   Parameters  correction          LRT        df   p-value

Chemistry
1-PL    124131.836   124310.418   -62037.918    0.983    28
2-PL    120154.937   120633.282   -60002.469    1.090    75          1.155               3525.561   47   <0.001
2-PL    120154.937   120633.282   -60002.469    1.090    75
3-PL    119696.482   120334.275   -59748.210    1.054    100         0.944               538.456    25   <0.001

Physics
1-PL    26290.527    26373.125    -13130.263    1.009    15
2-PL    25949.374    26147.612    -12938.687    1.024    36          1.035               370.129    21   <0.001
2-PL    25949.374    26147.612    -12938.687    1.024    36
3-PL    25741.935    26006.251    -12822.967    0.839    48          0.284               816.079    12   <0.001

The model fit indices for the two bifactor models are reported in Table 4.6.

The RMSEA values and confidence intervals for chemistry and physics fall below 0.05, and the CFI and TLI values are greater than 0.90, indicating good model fit for the bifactor model. After confirming that the bifactor model fit the summative assessment well, the 1-PL, 2-PL, and 3-PL bifactor IRT models were compared for chemistry and physics. This model comparison is reported in Table 4.7.
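The "difference scaling correction" and LRT columns of Table 4.7 are numerically consistent with the scaled log-likelihood difference test that is commonly used with robust maximum likelihood estimation. The Python sketch below reproduces the chemistry 1-PL versus 2-PL comparison under that assumption; the formula is inferred from the tabled numbers rather than stated in the text, and the function name is hypothetical.

```python
def scaled_lrt(loglik0, scaling0, params0, loglik1, scaling1, params1):
    """Scaled likelihood ratio test for a nested model (0) against a general model (1).

    Returns the difference scaling correction, the scaled statistic, and the df.
    """
    correction = (params0 * scaling0 - params1 * scaling1) / (params0 - params1)
    statistic = -2.0 * (loglik0 - loglik1) / correction
    return correction, statistic, params1 - params0


if __name__ == "__main__":
    # Chemistry 1-PL (nested) vs 2-PL (general), values from Table 4.7.
    correction, statistic, df = scaled_lrt(
        loglik0=-62037.918, scaling0=0.983, params0=28,
        loglik1=-60002.469, scaling1=1.090, params1=75,
    )
    print(round(correction, 3), round(statistic, 1), df)
```

Because the scaling factors in the table are rounded to three decimals, the computed correction and statistic match the tabled 1.155 and 3525.561 only approximately.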

There were estimation issues for the 3-PL models for both physics and chemistry. These led to additional constraints in the physics 3-PL model and likely incorrect estimates in the chemistry 3-PL model. Because of this, even though the 3-PL models fit better than the 2-PL models, the 2-PL models were chosen for the physics and chemistry items. Table 4.8 provides the item discriminations and difficulties for the chemistry and physics summative assessments. Using the IRT bifactor model parameters, the reliabilities were estimated (Raykov et al., 2010). The reliability estimate for the chemistry summative assessment general factor was 0.824, and the reliability for the physics general factor was 0.853.


Table 4.8 CESE Summative Assessment 2PL parameters

Physics
0⌧
1.9652
1.8462
3.1433
3.2861
1.6354
1.4093
1.1305
2.2729
1.190
1.5623
1.0812
2.1539

01
-0.0765
-0.5287
-0.0629
-0.1258
-0.2329
-0.1292
1.7425
1.3821
1.5674

02 Di culty
0.680
-0.2771
2.0009
-2.159
-0.3689
0.4114
1.5538
2.006
1.7867
0.6018
0.8993
3.4578

4.318
2.9665
0.6001

01
0.3672
7.3542
15.6859
-0.0221
0.2482
0.3893
0.1802
0.4403
0.0221
0.1921
2.5942
1.5555

Chemistry
0⌧
1.0948
4.2228
8.7057
0.816
1.6116
2.2712
0.2448
2.0451
0.5219
1.8394
3.1501
2.0638
0.5389
1.1492
1.615
0.0493
1.3702
1.5402
0.5389
-0.2686
-0.0323
-0.3128
1.5096
2.3324
3.451

02 Di culty
0.6375
-8.5085
-17.3281
-0.4267
1.0336
0.6834
0.7939
-2.1981
1.6898
-0.6987
6.6198
4.8178
1.7918
-1.4331
-0.4199
1.0064
-0.1326
2.7676
0.2346
0.8228
1.4144
1.7391
0.0748
2.8458
6.0367

0.1275
0.3502
0.0782
0.3587
0.3655
0.4471
0.3995
0.2227
2.9087
2.8866
0.2142
0.2533
0.2227

Item
Item 1
Item 2
Item 3
Item 4
Item 5
Item 6
Item 7
Item 8
Item 9
Item 10
Item 11
Item 12
Item 13
Item 14
Item 15
Item 16
Item 17
Item 18
Item 19
Item 20
Item 21
Item 22
Item 23
Item 24
Item 25

Analysis

Assumed Mediation Model

The assumed mediation model for this study is shown in Figures 4.1-4.3 with the following

assumed models to be estimated:

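Level 1:

\[ \mathrm{chem}_{ijkl} = \frac{1}{1 + e^{-\left(u^{c}_{i0kl} + a^{c}_{i1kl}\left(\theta^{cw}_{fjkl} - b^{c}_{i2kl}\right)\right)}} \]
\[ \mathrm{phy}_{ijkl} = \frac{1}{1 + e^{-\left(u^{p}_{i0kl} + a^{p}_{i1kl}\left(\theta^{pw}_{fjkl} - b^{p}_{i2kl}\right)\right)}} \]
\[ \mathrm{pretest}_{ijkl} = c_{i} + (1 - c_{i})\,\frac{1}{1 + e^{-\left(u^{g}_{i0kl} + a^{g}_{i1kl}\left(\theta^{gw}_{jkl} - b^{g}_{i2kl}\right)\right)}} \]
\[ \theta^{cw}_{Gjkl} = \beta^{cw}_{0} + \beta^{cw}_{1}\theta^{gw}_{jkl} + \varepsilon^{cw}_{jkl} \]
\[ \theta^{pw}_{Gjkl} = \beta^{pw}_{0} + \beta^{pw}_{1}\theta^{gw}_{jkl} + \varepsilon^{pw}_{jkl} \]

Level 2:

\[ M_{itkl} = \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta_{tkl} + \varepsilon^{M}_{itkl} \]
\[ u^{c}_{i0kl} = u^{c}_{i00l} + a^{c}_{i01l}\left(\theta^{cbk}_{fkl} - b^{c}_{i02l}\right) \]
\[ u^{p}_{i0kl} = u^{p}_{i00l} + a^{p}_{i01l}\left(\theta^{pbk}_{fkl} - b^{p}_{i02l}\right) \]
\[ u^{g}_{i0kl} = u^{g}_{i00l} + a^{g}_{i01l}\left(\theta^{gbk}_{kl} - b^{g}_{i02l}\right) \]
\[ a^{c}_{i1kl} = a^{c}_{i10l}, \quad a^{p}_{i1kl} = a^{p}_{i10l}, \quad a^{g}_{i1kl} = a^{g}_{i10l} \]
\[ b^{c}_{i2kl} = b^{c}_{i20l}, \quad b^{p}_{i2kl} = b^{p}_{i20l}, \quad b^{g}_{i2kl} = b^{g}_{i20l} \]
\[ \theta^{cbk}_{Gkl} = \beta^{cbk}_{0} + \beta^{cbk}_{1}\xi^{w}_{kl} + \beta^{cbk}_{2}\theta^{gbk}_{kl} + \varepsilon^{cbk}_{kl} \]
\[ \theta^{pbk}_{Gkl} = \beta^{pbk}_{0} + \beta^{pbk}_{1}\xi^{w}_{kl} + \beta^{pbk}_{2}\theta^{gbk}_{kl} + \varepsilon^{pbk}_{kl} \]

Level 3:

\[ \alpha_{it0l} = \alpha_{it00} + \lambda_{it01}\,\xi^{b}_{l} + \upsilon^{M}_{it0l} \]
\[ \lambda_{it1l} = \lambda_{it10}, \quad \delta_{it2l} = \delta_{it20} \]
\[ u^{c}_{i00l} = u^{c}_{i000} + a^{c}_{i001}\left(\theta^{cbl}_{fl} - b^{c}_{i002}\right) \]
\[ u^{p}_{i00l} = u^{p}_{i000} + a^{p}_{i001}\left(\theta^{pbl}_{fl} - b^{p}_{i002}\right) \]
\[ u^{g}_{i00l} = u^{g}_{i000} + a^{g}_{i001}\left(\theta^{gbl}_{l} - b^{g}_{i002}\right) \]
\[ a^{c}_{i01l} = a^{c}_{i010}, \quad a^{p}_{i01l} = a^{p}_{i010}, \quad a^{g}_{i01l} = a^{g}_{i010} \]
\[ b^{c}_{i02l} = b^{c}_{i020}, \quad b^{p}_{i02l} = b^{p}_{i020}, \quad b^{g}_{i02l} = b^{g}_{i020} \]
\[ a^{c}_{i10l} = a^{c}_{i100}, \quad a^{p}_{i10l} = a^{p}_{i100}, \quad a^{g}_{i10l} = a^{g}_{i100} \]
\[ b^{c}_{i20l} = b^{c}_{i200}, \quad b^{p}_{i20l} = b^{p}_{i200}, \quad b^{g}_{i20l} = b^{g}_{i200} \]
\[ \xi^{b}_{l} = \gamma_{0} + \gamma_{1}T_{l} + \varepsilon \]
\[ \theta^{cbl}_{Gl} = \beta^{c}_{0} + \beta^{c}_{1}\xi^{b}_{l} + \beta^{c}_{2}T_{l} + \beta^{c}_{3}\theta^{gbl}_{l} + \upsilon^{c} \]
\[ \theta^{pbl}_{Gl} = \beta^{p}_{0} + \beta^{p}_{1}\xi^{b}_{l} + \beta^{p}_{2}T_{l} + \beta^{p}_{3}\theta^{gbl}_{l} + \upsilon^{p} \]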

At the student level, $\mathrm{chem}_{ijkl}$ and $\mathrm{phy}_{ijkl}$ are the chemistry and physics summative assessment item i for student j in teacher k in school l. $\theta^{cw}_{fjkl}$ and $\theta^{pw}_{fjkl}$ are the vectors of latent factors for the chemistry and physics abilities within a student. $a^{c}_{i100}$ and $a^{p}_{i100}$ are the student-level vectors of discrimination parameters for the chemistry and physics summative assessment items, and $b^{c}_{i200}$ and $b^{p}_{i200}$ are the student-level chemistry and physics item difficulties for item i. $\mathrm{pretest}_{ijkl}$ is the general science pretest item i; $\theta^{gw}_{jkl}$ is the general science ability for student j in teacher k in school l; $a^{g}_{i100}$, the student-level pretest item discriminations; $b^{g}_{i200}$, the student-level pretest item difficulties; and $c_{i}$, the pretest guessing parameter for item i. Finally, at the student level, $\beta^{cw}_{1}$ is the relationship between the general science ability from the pretest and the general chemistry ability, and $\beta^{pw}_{1}$ is the relationship between the general science ability from the pretest and the general physics ability.

At the teacher level, $M_{itkl}$ is the teacher observation PBL item i at timepoint t for teacher k in school l. $\xi^{w}_{kl}$ is the teacher PBL trait for teacher k in school l, and $\zeta_{tkl}$ is the teacher PBL state at timepoint t for teacher k in school l. $\lambda_{it10}$ and $\delta_{it20}$ are the teacher-level factor loadings for the PBL trait and states, respectively. $\theta^{cbk}_{fkl}$, $\theta^{pbk}_{fkl}$, and $\theta^{gbk}_{kl}$ are the between-teacher chemistry, physics, and general science latent abilities; $a^{c}_{i010}$, $a^{p}_{i010}$, and $a^{g}_{i010}$, the between-teacher item discrimination parameters; and $b^{c}_{i020}$, $b^{p}_{i020}$, and $b^{g}_{i020}$, the between-teacher item difficulty parameters. $\beta^{cbk}_{1}$ and $\beta^{pbk}_{1}$ are the relationships between the teacher’s PBL trait and the teacher-level general chemistry and physics ability. $\beta^{cbk}_{2}$ and $\beta^{pbk}_{2}$ are the relationships between teacher general science ability and teacher general chemistry and physics ability, respectively.

At the school level, $\xi^{b}_{l}$ is the between-school teacher PBL trait for school l, and $\lambda_{it01}$ is the between-school factor loading for the teacher PBL trait. $\theta^{cbl}_{fl}$, $\theta^{pbl}_{fl}$, and $\theta^{gbl}_{l}$ are the between-school chemistry, physics, and general science latent abilities; $a^{c}_{i001}$, $a^{p}_{i001}$, and $a^{g}_{i001}$, the between-school item discrimination parameters; and $b^{c}_{i002}$, $b^{p}_{i002}$, and $b^{g}_{i002}$, the between-school item difficulty parameters. $\gamma_{1}$ is the treatment effect on the between-school teacher PBL trait. $\beta^{c}_{1}$ and $\beta^{p}_{1}$ are the relationships between the between-school teacher PBL trait and the between-school general chemistry and physics ability. $\beta^{c}_{2}$ and $\beta^{p}_{2}$ are the treatment effects on the between-school general chemistry and physics ability. $\beta^{c}_{3}$ and $\beta^{p}_{3}$ are the relationships between the between-school general science ability and the between-school general chemistry and physics ability.

Finally, the estimated mediation of the teacher PBL trait is $\gamma_{1} \times \beta^{c}_{1}$ for chemistry and $\gamma_{1} \times \beta^{p}_{1}$ for physics.

Because of the complexity of estimating a model that includes two different 2-PL bifactor outcomes, where different sets of students have different measures (chemistry vs. physics), and a 3-PL pretest measure, this study uses equated outcome scores and pretest factor scores, which corresponds to the estimation conducted in Schneider et al. (2022). The equating process is described in the online appendix of Schneider et al. (2022). This simplifies the final estimated mediation model to:

Figure 4.1 CESE multiple timepoint mediation model within

Figure 4.2 CESE multiple timepoint mediation model between teacher level

Figure 4.3 CESE multiple timepoint mediation model between school level

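Level 1:

\[
\text{scienceachievement} = \beta_{0kl} + \beta_{1kl}\,\hat{\theta}^{wg}_{jkl} + \varepsilon_{jkl}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta_{tkl} + \varepsilon^{M}_{itkl} \\
\beta_{0kl} &= \beta_{00l} + \beta_{01l}\,\xi^{w}_{kl} + \beta_{02l}\,\hat{\theta}^{bg}_{kl} + r_{0kl} \\
\beta_{1kl} &= \beta_{10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \alpha_{it00} + \lambda_{it01}\,\xi^{b}_{l} + \phi^{M}_{it0l} \\
\lambda_{it1l} &= \lambda_{it10} \\
\delta_{it2l} &= \delta_{it20} \\
\beta_{00l} &= \beta_{000} + \beta_{001}\,T_{l} + u_{00l} \\
\beta_{10l} &= \beta_{100} \\
\beta_{01l} &= \beta_{010} \\
\beta_{02l} &= \beta_{020} \\
\xi^{b}_{l} &= \gamma_{0} + \gamma_{1}\,T_{l} + \varepsilon
\end{aligned}
\]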

Where, at the student level, $\beta_{100}$ is the relationship between the general science ability factor scores from the pretest ($\hat{\theta}^{wg}_{jkl}$) and the equated physics and chemistry science scores.

At the teacher level, $M_{itkl}$ is the teacher observation PBL item $i$ at timepoint $t$ for teacher $k$ in school $l$. $\xi^{w}_{kl}$ is the teacher PBL trait for teacher $k$ in school $l$, and $\zeta_{tkl}$ is the teacher PBL state at timepoint $t$ for teacher $k$ in school $l$. $\lambda_{it10}$ and $\delta_{it20}$ are the factor loadings for the teacher-level PBL trait and states, respectively. $\beta_{010}$ is the relationship between the teacher PBL trait and the equated physics and chemistry science scores, and $\beta_{020}$ is the relationship between the between teacher general science ability factor scores and the equated physics and chemistry science scores.

At the school level, $\xi^{b}_{l}$ is the between school teacher PBL trait for school $l$, and $\lambda_{it01}$ is the between school factor loadings for the teacher PBL trait. $\beta_{001}$ is the treatment effect on the equated physics and chemistry science scores.

Finally, the estimated mediation of the teacher PBL trait is $\gamma_{1} \times \beta_{010}$. Figure 4.4 depicts this simplified mediation model.

Model Comparison

The teacher observation measure of teacher PBL practices will be compared across three different models: a unidimensional model, which assumes no state factors; a three-factor model, which assumes no trait factor; and the bifactor model, which allows for both state and trait factors of the teacher mediator. None of these models assumes time-invariant factor loadings, as is often assumed in latent state-trait theory. Time-invariant loadings are not assumed because these measures were collected by observers who may themselves be influenced by the timing of the observation, which could affect the loadings at different time points. Similarly, the trait factor loadings were not fixed to unity. Because of the small sample size (n = 55 for 36 to 48 free parameters), these measurement models will be estimated using Bayesian estimation with non-informative priors.

These models were then compared through their deviance information criterion (DIC; Spiegelhalter et al., 2002), Bayesian information criterion (BIC; Schwarz, 1978), and posterior predictive credible intervals and p-values (Gelman et al., 1996). The DIC and BIC values are used purely for model comparison, with smaller DIC and BIC values indicating better model fit. The posterior predictive credible intervals (CI) and p-values indicate general model fit, with a CI that includes zero and a p-value close to 0.5 indicating strong model fit.
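As a rough illustration of the posterior predictive criteria described above, the following Python sketch computes a posterior predictive p-value and a credible interval for the difference between observed and replicated discrepancies. It is not the Mplus implementation: the function name, the simulated chi-square draws, and the choice of discrepancy are all assumptions made only for this example.

```python
import numpy as np


def posterior_predictive_check(observed, replicated, level=0.95):
    """Summarize a posterior predictive check from per-draw discrepancies.

    observed, replicated: discrepancy values (e.g., chi-square statistics)
    computed at each MCMC draw for the real and the replicated data.
    """
    observed = np.asarray(observed, dtype=float)
    replicated = np.asarray(replicated, dtype=float)
    difference = observed - replicated
    # Share of draws where replicated data fit at least as badly as observed data.
    p_value = float(np.mean(replicated >= observed))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(difference, [tail, 1.0 - tail])
    return p_value, (float(lower), float(upper))


# Hypothetical draws: both discrepancies share one distribution, which mimics
# a model that reproduces the observed data well.
rng = np.random.default_rng(0)
observed_draws = rng.chisquare(df=40, size=5000)
replicated_draws = rng.chisquare(df=40, size=5000)
ppp, interval = posterior_predictive_check(observed_draws, replicated_draws)
print(f"posterior predictive p-value: {ppp:.3f}")
print(f"95% interval for the difference: [{interval[0]:.2f}, {interval[1]:.2f}]")
```

Under these simulated conditions the p-value should land near 0.5 and the interval should straddle zero, which is the pattern the text treats as evidence of good fit.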

Estimation

This mediation model will be estimated using the four different methods examined in the simulation study: standardized averages of the mediator across time points (3-2-1 mediation); factor scores for each time point averaged across time points, ignoring the trait factor (3-2-1 mediation); factor scores from the LST as the mediator (3-2-1 mediation); and the fully specified model (multilevel SEM).

Figure 4.4 CESE Simplified Multilevel Mediation Model

Before delving into the mediation model, the treatment effect estimated in Schneider et al. (2022) will be reestimated using Bayesian multilevel modeling to replicate the findings. This will be done with the full sample that includes all teachers, even those without the observations, and then with the limited sample, which includes only the 55 teachers who had at least one observation. After estimating the treatment effect, the following four mediation models will be estimated.

Standardized average of the mediator across time points: As was defined in Chapter 2, the averages of the mediator across the time points are defined as:

\[
\bar{M}_{kl} = \frac{1}{12} \sum_{t=1}^{3} \sum_{i=1}^{4} M_{itkl}
\]
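A minimal Python sketch of this averaging step follows. It assumes complete data arranged as a teachers by time points by items array and assumes that "standardized" means z-scoring the teacher averages across teachers; the function name and the simulated ratings are hypothetical.

```python
import numpy as np


def standardized_average(items):
    """Average all item scores per teacher, then z-score across teachers.

    items: array of shape (n_teachers, n_timepoints, n_items).
    """
    scores = np.asarray(items, dtype=float)
    if scores.ndim != 3:
        raise ValueError("expected shape (n_teachers, n_timepoints, n_items)")
    # With 3 time points and 4 items this is (1/12) times the double sum.
    teacher_means = scores.mean(axis=(1, 2))
    return (teacher_means - teacher_means.mean()) / teacher_means.std(ddof=1)


# Hypothetical ratings on a 1-4 scale for 55 teachers, 3 time points, 4 items.
rng = np.random.default_rng(1)
ratings = rng.integers(1, 5, size=(55, 3, 4))
mediator = standardized_average(ratings)
print(mediator[:5])
```

Teachers with missing observations would need separate handling (for example, averaging only over observed items), which this sketch does not attempt.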

Then, the following is estimated:

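Level 1:

\[
\text{scienceachievement} = \beta_{0kl} + \beta_{1kl}\,\hat{\theta}^{wg}_{jkl} + \varepsilon_{jkl}
\]

Level 2:

\[
\begin{aligned}
\beta_{0kl} &= \beta_{00l} + \beta_{01l}\,\bar{M}_{kl} + \beta_{02l}\,\hat{\theta}^{bg}_{kl} + \hat{r}_{0kl} \\
\beta_{1kl} &= \beta_{10l} \\
\bar{M}_{kl} &= \gamma_{0l} + \varepsilon_{kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta_{00l} &= \hat{\beta}_{000} + \hat{\beta}_{001}\,T_{l} + \hat{u}_{00l} \\
\beta_{10l} &= \hat{\beta}_{100} \\
\beta_{01l} &= \hat{\beta}_{010} \\
\beta_{02l} &= \hat{\beta}_{020} \\
\gamma_{0l} &= \hat{\gamma}_{00} + \hat{\gamma}_{01}\,T_{l} + \hat{r}_{0l}
\end{aligned}
\]

Then $\hat{\gamma}_{01}$ is the estimated a pathway, $\hat{\beta}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{01} \times \hat{\beta}_{010}$ is the indirect effect of the treatment through the PBL practices.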
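The product-of-coefficients summary can be sketched in Python as follows, assuming paired posterior draws of the a and b pathways are available from the same MCMC iterations. The function name and the simulated draws are hypothetical and do not reproduce any estimate in this study.

```python
import numpy as np


def indirect_effect_summary(a_draws, b_draws, level=0.95):
    """Posterior summary of the indirect effect a*b from paired draws."""
    a = np.asarray(a_draws, dtype=float)
    b = np.asarray(b_draws, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b draws must come from the same iterations")
    product = a * b  # one indirect effect per MCMC draw
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(product, [tail, 1.0 - tail])
    return {
        "mean": float(product.mean()),
        "sd": float(product.std(ddof=1)),
        "interval": (float(lower), float(upper)),
        "excludes_zero": bool(lower > 0.0 or upper < 0.0),
    }


# Hypothetical, fairly precise posteriors for the a and b pathways.
rng = np.random.default_rng(2)
a_samples = rng.normal(loc=0.5, scale=0.1, size=4000)
b_samples = rng.normal(loc=0.4, scale=0.1, size=4000)
summary = indirect_effect_summary(a_samples, b_samples)
print(summary)
```

Multiplying draw by draw, rather than multiplying the two posterior means, keeps the skew of the product distribution in the credible interval.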

Factor scores for each time point averaged across time points: Also defined in Chapter 2, the factor for each time point will be defined as:

\[
M_{it} = \alpha_{it} + \hat{\delta}_{it1}\,\hat{\zeta}_{tkl} + \varepsilon^{M}_{it}
\]

\[
\bar{\zeta}_{kl} = \frac{1}{3} \sum_{t=1}^{3} \hat{\zeta}_{tkl}
\]

Where $\hat{\zeta}_{tkl}$ is the estimated factor score at time point $t$ and $\bar{\zeta}_{kl}$ is the average of those factor scores across the three time points. Then, the following mediation model is estimated:

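Level 1:

\[
\text{scienceachievement} = \beta_{0kl} + \beta_{1kl}\,\hat{\theta}^{wg}_{jkl} + \varepsilon_{jkl}
\]

Level 2:

\[
\begin{aligned}
\beta_{0kl} &= \beta_{00l} + \beta_{01l}\,\bar{\zeta}_{kl} + \beta_{02l}\,\hat{\theta}^{bg}_{kl} + \hat{r}_{0kl} \\
\beta_{1kl} &= \beta_{10l} \\
\bar{\zeta}_{kl} &= \gamma_{0l} + \varepsilon_{kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta_{00l} &= \hat{\beta}_{000} + \hat{\beta}_{001}\,T_{l} + \hat{u}_{00l} \\
\beta_{10l} &= \hat{\beta}_{100} \\
\beta_{01l} &= \hat{\beta}_{010} \\
\beta_{02l} &= \hat{\beta}_{020} \\
\gamma_{0l} &= \hat{\gamma}_{00} + \hat{\gamma}_{01}\,T_{l} + \hat{r}_{0l}
\end{aligned}
\]

Where once again, $\hat{\gamma}_{01}$ is the estimated a pathway, $\hat{\beta}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{01} \times \hat{\beta}_{010}$ is the indirect effect of the treatment through the PBL practices.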

Factor scores from the LST as the mediator: Again, as was defined in Chapter 2, the factor scores from the LST are estimated by:

\[
M_{itkl} = \hat{\alpha}_{it00} + \hat{\lambda}_{it10}\,\hat{\xi}_{kl} + \hat{\delta}_{it20}\,\zeta_{tkl} + \varepsilon_{itkl}
\]

$\hat{\xi}_{kl}$ is the estimated factor score for the teacher PBL practices trait. Then, the mediation model is estimated:

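Level 1:

\[
\text{scienceachievement} = \beta_{0kl} + \beta_{1kl}\,\hat{\theta}^{wg}_{jkl} + \varepsilon_{jkl}
\]

Level 2:

\[
\begin{aligned}
\beta_{0kl} &= \beta_{00l} + \beta_{01l}\,\hat{\xi}_{kl} + \beta_{02l}\,\hat{\theta}^{bg}_{kl} + \hat{r}_{0kl} \\
\beta_{1kl} &= \beta_{10l} \\
\hat{\xi}_{kl} &= \gamma_{0l} + \varepsilon_{kl}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\beta_{00l} &= \hat{\beta}_{000} + \hat{\beta}_{001}\,T_{l} + \hat{u}_{00l} \\
\beta_{10l} &= \hat{\beta}_{100} \\
\beta_{01l} &= \hat{\beta}_{010} \\
\beta_{02l} &= \hat{\beta}_{020} \\
\gamma_{0l} &= \hat{\gamma}_{00} + \hat{\gamma}_{01}\,T_{l} + \hat{r}_{0l}
\end{aligned}
\]

Where once again, $\hat{\gamma}_{01}$ is the estimated a pathway, $\hat{\beta}_{010}$ is the estimated b pathway, and $\hat{\gamma}_{01} \times \hat{\beta}_{010}$ is the indirect effect of the treatment through the PBL practices.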

Fully speciﬁed model: The simpliﬁed model on page 42 is estimated as follows:

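Level 1:

\[
\text{scienceachievement} = \beta_{0kl} + \beta_{1kl}\,\hat{\theta}^{wg}_{jkl} + \varepsilon_{jkl}
\]

Level 2:

\[
\begin{aligned}
M_{itkl} &= \alpha_{it0l} + \lambda_{it1l}\,\xi^{w}_{kl} + \delta_{it2l}\,\zeta_{tkl} + \varepsilon^{M}_{itkl} \\
\beta_{0kl} &= \beta_{00l} + \beta_{01l}\,\xi^{w}_{kl} + \beta_{02l}\,\hat{\theta}^{bg}_{kl} + \hat{r}_{0kl} \\
\beta_{1kl} &= \beta_{10l}
\end{aligned}
\]

Level 3:

\[
\begin{aligned}
\alpha_{it0l} &= \hat{\alpha}_{it00} + \hat{\lambda}_{it01}\,\xi^{b}_{l} + \phi^{M}_{it0l} \\
\lambda_{it1l} &= \hat{\lambda}_{it10} \\
\delta_{it2l} &= \hat{\delta}_{it20} \\
\beta_{00l} &= \hat{\beta}_{000} + \hat{\beta}_{001}\,T_{l} + \hat{u}_{00l} \\
\beta_{10l} &= \hat{\beta}_{100} \\
\beta_{01l} &= \hat{\beta}_{010} \\
\beta_{02l} &= \hat{\beta}_{020} \\
\xi^{b}_{l} &= \hat{\gamma}_{0} + \hat{\gamma}_{1}\,T_{l} + \varepsilon
\end{aligned}
\]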

These models were estimated using Bayesian structural equation modeling with non-informative priors on the coefficients and informative priors on the school-level and teacher-level variances of the equated summative assessment and on the school-level variance of the teacher PBL practices, to increase power and convergence in this smaller sample. For the school-level variance of the equated summative assessment, the prior follows an inverse gamma distribution (Anderson, 2007) with $\alpha = 2.25$ and $\beta = 0.19$, which yields a mean of the distribution at 0.154 (corresponding to an ICC of 0.154 since the outcome variable is standardized with a variance of 1; Spybrook et al., 2022), and the teacher-level variance follows an inverse gamma distribution with $\alpha = 2.25$ and $\beta = 0.155$, which yields a mean of the distribution at 0.124 (corresponding to an ICC of 0.124; Spybrook et al., 2022). For the school-level variance of teacher PBL practices, the prior follows an inverse gamma distribution with $\alpha = 2.12$ and $\beta = 0.12$, which yields a mean of the distribution at 0.11 (corresponding to an ICC of 0.11 since the teacher PBL practices are either standardized with a variance of 1 or the latent variable is constrained to have a variance of 1; Westine et al., 2020). Similar to the simulation study, the Gibbs sampler was used, and the default Mplus settings were used to determine convergence (see pages 38-39). Again, as in the simulation study, in estimating the factor scores from the LST model and the fully specified model, the variances of the teacher-level PBL practices trait and time-specific factors were fixed to 1, and the correlations between them were fixed to 0 for identification of the model. The Mplus code for the four different estimation methods can be found in Appendix B.

Table 4.9 Model Comparison for the Teacher Observations

                                   Posterior Predictive Checking
Model      DIC        BIC          95% CI               p-value
1-factor   1003.311   1080.395     [-29.262, 58.977]    0.312
3-factor    974.319   1058.475     [-36.542, 43.055]    0.444
Bifactor    991.967   1089.647     [-34.509, 44.831]    0.452
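The link between the inverse gamma hyperparameters and the implied prior mean can be checked with a short Python sketch. It uses the standard result that an inverse gamma distribution with shape $\alpha > 1$ and scale $\beta$ has mean $\beta / (\alpha - 1)$; the function and dictionary names are hypothetical, and the shape and scale reading of the two hyperparameters is an assumption.

```python
def inverse_gamma_mean(shape, scale):
    """Mean of an inverse gamma distribution; defined only for shape > 1."""
    if shape <= 1.0:
        raise ValueError("the mean exists only when shape > 1")
    return scale / (shape - 1.0)


# Hyperparameters as reported in the text (shape, scale).
variance_priors = {
    "school-level outcome variance": (2.25, 0.19),
    "teacher-level outcome variance": (2.25, 0.155),
    "school-level PBL variance": (2.12, 0.12),
}

# With a total variance of 1, each prior mean can be read as an ICC.
prior_means = {
    name: inverse_gamma_mean(shape, scale)
    for name, (shape, scale) in variance_priors.items()
}
for name, mean in prior_means.items():
    print(f"{name}: prior mean = {mean:.3f}")
```

These hyperparameters give prior means of about 0.152, 0.124, and 0.107. The first is close to, though not exactly, the reported 0.154, which may reflect rounding of the reported scale parameter.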

4.2 Results

Model Comparison

Across the three models for the teacher observations, all had acceptable posterior predictive credible intervals and p-values (the intervals include 0 and the p-values are not too far from 0.5). The 3-factor solution had a p-value closer to 0.5 than the 1-factor model, indicating stronger fit, and the bifactor model had a p-value closer to 0.5 than both the 1-factor and 3-factor models, although not substantially closer than the 3-factor model. The 3-factor model had the lowest DIC and BIC values; however, these values were not substantially lower than those of the bifactor model (differences of roughly 18 and 31 points, respectively). These results are displayed in Table 4.9.
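The size of these gaps can be recomputed directly from the values in Table 4.9, as in the Python sketch below. The dictionary layout and helper names are hypothetical; only the numbers come from the table.

```python
# DIC and BIC values transcribed from Table 4.9.
fit = {
    "1-factor": {"DIC": 1003.311, "BIC": 1080.395},
    "3-factor": {"DIC": 974.319, "BIC": 1058.475},
    "Bifactor": {"DIC": 991.967, "BIC": 1089.647},
}


def best_model(table, criterion):
    """Return the model with the smallest value of the given criterion."""
    return min(table, key=lambda model: table[model][criterion])


def gap(table, criterion, model, reference):
    """Difference in a criterion between a model and a reference model."""
    return table[model][criterion] - table[reference][criterion]


dic_gap = gap(fit, "DIC", "Bifactor", "3-factor")
bic_gap = gap(fit, "BIC", "Bifactor", "3-factor")
print(best_model(fit, "DIC"), best_model(fit, "BIC"))
print(f"DIC gap: {dic_gap:.3f}, BIC gap: {bic_gap:.3f}")
```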

Because the bifactor model (which allows for a trait factor of PBL practices as the mediator) fits the assumed theoretical mediation model better than the 3-factor model (which assumes only state factors and no trait factor of PBL practices), and because the two models performed comparably, the bifactor model is deemed to be an adequate model of the teacher PBL practices. Both the 3-factor and bifactor factor loadings are reported in Table 4.10.

Here, the 3-factor loadings are similar to the state-specific factor loadings of the bifactor model (which are slightly lower in some cases). Most of the time-specific factor loadings are medium-sized, with some being low and some high. Most of the factor loadings for the trait factor are small (between 0.1 and 0.4), with some being medium. This aligns with the model comparisons

Table 4.10 Factor loadings for 3 factor and bifactor models

3 factor model

Bifactor model

Time point 1 factor Time point 2 factor Time point 3 factor Time point 1 factor Time point 2 factor Time point 3 factor Trait factor

0.590
0.749
0.647
0.738

0.499
0.737
0.434
0.990

0.582
0.714
0.527
0.589

0.512
0.558
0.346
0.707

0.816
0.681
0.216
0.731

0.133
0.216
0.322
0.569
0.256
0.545
0.286
0.621
0.285
0.408
0.256
0.056

0.636
0.638
0.183
0.708

Item
Timepoint 1 Item 1
Timepoint 1 Item 2
Timepoint 1 Item 3
Timepoint 1 Item 4
Timepoint 2 Item 1
Timepoint 2 Item 2
Timepoint 2 Item 3
Timepoint 2 Item 4
Timepoint 3 Item 1
Timepoint 3 Item 2
Timepoint 3 Item 3
Timepoint 3 Item 4

where the bifactor model does not significantly increase model fit compared to the three-factor model. Here, in this latent state-trait bifactor model, more variance is explained at the state level than at the trait level.

Estimation of Mediation Effects

The estimates of the treatment effect for the full sample and the limited sample are reported in Table 4.11. These effects are comparable to each other, with an estimated impact of 0.198 in the full sample and 0.186 in the limited sample. However, the limited sample is underpowered and unable to detect a treatment effect at the 5% level. These results are comparable to the estimated treatment effect found in Schneider et al. (2022), although slightly smaller in magnitude.

All four mediation estimation methods converged. For the most part, the estimates for a, b, and the product of a and b are consistent across the full and limited samples (except for pathways a and b under the average sum scores). Almost none of the pathways or the products of a and b were significant at the 5% level, which is unsurprising for a sample size of 61 schools. Only pathway a for the full sample under the average sum scores was significant, possibly due to chance (or bias). Outside of the average sum scores, pathway b and the product of a and b are both practically zero (although some of the credible intervals for pathway b are quite wide). These results are reported in Table 4.12.

The null mediation results may be a result of several factors. The ﬁrst is that this study is not

powered to detect these a and b pathways, as was found in Chapter 3. The second regards the

teacher observation measure itself. With small trait loadings and medium state loadings with only

Table 4.11 Treatment effects of the full and limited sample

Parameter                       Full Sample Estimate              Limited Sample Estimate
Treatment                       0.198 (0.097) [0.028, 0.384]      0.186 (0.159) [-0.064, 0.549]
Region Fixed Effects            0.002 (0.003) [-0.005, 0.008]     0.004 (0.006) [-0.009, 0.014]
Teacher level pretest factor    0.414 (0.124) [0.163, 0.683]      0.716 (0.188) [0.409, 1.191]
Chemistry                       -0.504 (0.071) [-0.24, -0.323]    -0.394 (0.121) [-0.618, -0.143]
Student level pretest factor    0.279 (0.016) [0.253, 0.313]      0.299 (0.024) [0.255, 0.348]
Student level variance          0.722 (0.016) [0.693, 0.754]      0.792 (0.024) [0.751, 0.847]
Teacher level variance          0.092 (0.021) [0.062, 0.152]      0.094 (0.030) [0.053, 0.170]
School level variance           0.032 (0.012) [0.021, 0.078]      0.042 (0.021) [0.019, 0.101]

N
School                          61                                36
Teacher                         102                               49
Student                         4238                              2113

Note: posterior standard deviations are in parentheses and 95% credible intervals are in brackets.

Table 4.12 Mediation effects of the four estimation methods

Estimation                a                                  b                                 Indirect effect (a*b)
Full Sample
  Average sum scores      -3.195 (1.396) [-6.003, -0.545]    0.262 (0.351) [-0.374, 1.060]     -0.693 (1.386) [-4.630, 1.202]
  Average factor scores   0.232 (0.204) [-0.127, 0.656]      -0.003 (0.039) [-0.081, 0.071]    0 (0.014) [-0.032, 0.028]
  Bifactor factor scores  0.055 (0.095) [-0.139, 0.241]      -0.071 (0.096) [-0.264, 0.116]    -0.002 (0.013) [-0.034, 0.021]
  Fully specified model   0.074 (0.214) [-0.370, 0.416]      -0.025 (0.099) [-0.218, 0.173]    0 (0.021) [-0.042, 0.049]
Limited Sample
  Average sum scores      17.281 (8.369) [-0.106, 35.486]    0.303 (0.415) [-0.462, 1.176]     3.952 (8.713) [-8.146, 26.531]
  Average factor scores   0.512 (0.383) [-0.186, 1.249]      -0.003 (0.065) [-0.128, 0.125]    -0.001 (0.042) [-0.087, 0.090]
  Bifactor factor scores  0.234 (0.186) [-0.137, 0.616]      -0.060 (0.147) [-0.356, 0.227]    -0.007 (0.046) [-0.120, 0.068]
  Fully specified model   0.091 (0.243) [-0.422, 0.505]      -0.011 (0.146) [-0.294, 0.279]    0 (0.036) [-0.084, 0.075]

Note: posterior standard deviations are in parentheses and 95% credible intervals are in brackets.

60 schools, the simulations from Chapter 3 indicate that there was a high likelihood of negative bias, which may be occurring here. The third issue is that this model assumes unconfoundedness after accounting for the covariates (here, student- and teacher-level pretest factor scores). There may be additional variables that confound the relationship between the teacher's PBL practice trait and the student science test score that have yet to be accounted for. It may also be the case that the PBL practices are not mediating the treatment effects, which would replicate the findings from Schneider et al. (2022), which found null effects for teacher-reported incorporation of PBL as a mediator.

CHAPTER 5

DISCUSSION

5.1 Contributions

Expanding the 3-2-1 mediation model to incorporate a latent state-trait model gives a new model for estimating mediators that are assumed to be latent variables and are measured at multiple time points. This is particularly pertinent to education studies that use teacher practices observed at multiple time points or teacher ESM data as mediators for the treatment.

The simulation study shows how feasible each estimation method is and the bias and power tradeoff involved in choosing a method for these 3-2-1 latent state-trait mediation models. In general, none of the methods were adequately powered for any of the sample sizes investigated in this simulation study. This suggests that when using longitudinal mediators in a 3-2-1 model such as this one, a researcher will need a much larger sample size (more than 200) or other ways to increase power, such as using informative priors with Bayesian methods (as was done in Chapter 4) along with sum scores for reliable unidimensional outcome measures (which have been shown to be unbiased in many situations; Widaman and Revelle, 2023) to decrease model complexity.

An additional essential consideration that arises from the simulation study is the importance of having a reliable measure with both strong construct and content validity. Essentially, this is a measure that is more consistent across time and less time-specific. When researchers plan an evaluation of longitudinal mediators, they may want to emphasize the selection of measures that meet this criterion. Additionally, it is recommended that the developers of these measures report the psychometric properties so that researchers can choose their measures appropriately. If no measure is available, researchers can follow standard protocols of field testing a measure before using it to ensure that it will be adequate to answer the research question of interest.

When using a small sample of schools (30 or 60) with a measure that has medium or high general loadings, using factor scores of the latent state-trait model is the best option. However, the b path may still be biased, especially with small b pathway effects. Researchers may want to consider using bias correction techniques such as Croon's (Kelcey et al., 2021), which were not part of the investigation in this study.

The empirical study contributes to understanding how realistic these model estimation methods are. Because of the complexity of the latent model of the outcome and pretest measures, this empirical study used a standardized equated outcome and factor scores for the pretest, which may have added bias to the four estimation models. Nevertheless, all four models converged, indicating that these methods can be used in an empirical setting. This does come with some caveats. First, with 60 schools, as was seen in the simulation study, this empirical study was not powered to detect reasonable pathway a or b effects. Second, this empirical study highlighted additional considerations when researchers plan on using longitudinal teacher observations as the mediator, such as rater effects and adequate variance in the mediator, and how these considerations may affect the estimation of the mediation.

Implications for Education Research

As education research, along with other social science fields, faces issues of replication (Wiliam, 2022), researchers are faced with the question of how the treatment effect works, often leading to an investigation into the mediation effects of the intervention. Even in reasonably non-complex multilevel models, power for these estimation methods requires either many schools (on the order of 200) or very high a and b pathways (on the order of 0.5; Kelcey et al., 2020). However, effect sizes in education research tend to be small, so such studies are even more underpowered (Wiliam, 2022). As researchers move from testing the efficacy of their study into scaling their study, they, as well as funding agencies, should expect a dramatic increase in the number of schools required to investigate how the treatment works through mediation. Additionally, because using informative Bayesian priors in the estimation of these mediation effects may increase power, researchers should aim to collect preliminary data during the development and efficacy stages of their study to estimate informative priors for their scale-up investigation of the mediators. In addition, the education research community should come together to provide appropriate priors for various parameters in differing education research studies, including but not limited to the mediator a and b pathways when available. Finally, many education intervention studies occur over a specified period (such as a semester, an academic year, or multiple academic years), which creates the need for longitudinal mediator measures. In these circumstances, researchers need to invest time and resources into considering the appropriate longitudinal model for these mediation methods and choosing and/or designing the appropriate longitudinal measure. Does an LST model theoretically fit the mediation measures the best, or does a latent change model make more sense? Are the measures loading strongly enough onto the latent factors to minimize bias in the mediation estimation? These are questions researchers should consider as they design their study with longitudinal mediation. Furthermore, resources should be invested in developing and validating strong longitudinal measures for common mediators that may be investigated across numerous education intervention studies.

5.2 Limitations

Several limitations exist to the proposed model, simulation study, and empirical study. Beginning with the proposed bifactor model for estimating the latent state-trait mediator, this model was chosen because it requires neither time invariance nor equal distance between the time points, which other longitudinal models assume; however, this model breaks down quickly with intensive longitudinal data. As the number of time points $t \to \infty$, the number of factors in the bifactor model, $n = t + 1$ (one state factor per time point plus the trait factor), also grows without bound. Depending on the sample size of individuals, this may cause identification issues rather quickly. The current model, which incorporates the bifactor latent state-trait model into multilevel structural equation modeling for mediation, may be able to be expanded to latent state-trait models that are more appropriate for intensive longitudinal data (Geiser, 2020).
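To make the growth in model size concrete, the Python sketch below counts factors and loadings for a bifactor latent state-trait measurement model. It assumes every item loads on the trait factor and on the state factor of its own time point, with the same number of items at each time point; the function name is hypothetical.

```python
def bifactor_lst_dimensions(n_timepoints, items_per_timepoint):
    """Count factors and loadings in a bifactor latent state-trait model.

    Assumes each item loads on one trait factor and on the state factor
    of the time point at which it was measured.
    """
    if n_timepoints < 1 or items_per_timepoint < 1:
        raise ValueError("both arguments must be positive integers")
    n_items = n_timepoints * items_per_timepoint
    return {
        "items": n_items,
        "factors": n_timepoints + 1,  # one state factor per time point + trait
        "loadings": 2 * n_items,      # one trait and one state loading per item
    }


# The empirical study's design (3 time points, 4 items) versus a hypothetical
# intensive longitudinal design with 50 time points.
for timepoints in (3, 10, 50):
    print(timepoints, bifactor_lst_dimensions(timepoints, 4))
```

With 3 time points and 4 items this gives 4 factors and 24 loadings, while 50 time points would already require 51 factors and 400 loadings, which illustrates why identification becomes difficult with a fixed number of teachers.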

There are numerous limitations to the simulation study that were beyond its scope. The simulation considers ideal conditions in a multilevel structural equation mediation analysis. First, it considers balanced designs with equal numbers of treatment and control clusters and equal numbers of level 2 and level 1 observations within each cluster. Additionally, this study does not vary the number of level 2 and level 1 observations to see how the different estimation methods' bias, power, or convergence may differ by the number of observations at each level (the current study only varies the level 3 sample sizes). One goal of this simulation study was to understand how bias and power are affected by different levels of the general and specific loadings in the bifactor latent state-trait model. However, this led to loadings of uniform size within the general and specific loadings, respectively, which is only partially realistic compared to what may be seen in actual studies. Studies may have measures where certain items are stronger than others, as was seen in the empirical study in Chapter 4. It may also be interesting to understand how the proportion of strong, medium, and weak items affects the bias and power of the estimation methods.

Additionally, in this simulation study, no method at any sample size was powered enough to detect the true effects. The level 3 cluster sample sizes of 30, 60, and 200 are sizes that might be expected in large cluster randomized control trials; however, given that no method was powered with these sample sizes, additional sample sizes or larger effect sizes may need to be investigated to understand at what threshold a study may be powered to detect these longitudinal mediation effects. Additionally, these simulation studies investigated these estimation methods using non-informative Bayesian multilevel structural equation methods; power may be increased by incorporating slightly informative priors. The simulations also did not include covariates in the model, which may increase the power for the a and c' pathways; however, these covariates could decrease the efficiency of the b pathway (Shen et al., 2024). This leads to another significant limitation of this simulation study: the exclusion of covariates and/or confounder variables. This mediation model assumes unconfoundedness between all the variables (conditional on any covariates). The treatment-to-mediator relationship, pathway a, is assumed to be unconfounded through random treatment assignment. However, the current simulation also assumes that the mediator-to-outcome relationship is unconfounded, which may not be reasonable in most studies. Additional research on the magnitude of added bias when including covariates but no confounders, ignoring varying levels of essential confounders, and including confounders in the model may give additional insights into bias and power for the different estimation methods.

Finally, this simulation does not consider any bias-corrected estimation methods when using the factor scores of the latent state-trait model. These methods have been shown to be unbiased and to have fewer issues with convergence (Kelcey et al., 2021). This addition may provide researchers with a more plausible method of investigating these longitudinal mediation effects.

The empirical study has several limitations. The first is the limited sample size and the teacher measure of PBL practices used as the mediator. With a cluster sample size of 61 and a measure with small trait loadings, the estimated a and b pathways were expected to be biased based on the simulation study from Chapter 3. In addition to the bias from the small sample size and small factor loadings, other sources may have added bias to the estimation for the empirical study. There may have been confounders missing from the model, which included only the student- and teacher-level pretest. Also, using equated standardized scores for the outcome measure may have introduced bias into the b and c' pathways. Finally, ignoring the effects of the raters may have added bias to the estimates of all pathways.

5.3 Future Directions

As noted in the limitations section, this work has several avenues for future research. The first would be more fine-tuned simulations focused on power and on what researchers might need in order to have enough power to investigate longitudinal mediators in a 3-2-1 design similar to this one. The addition of covariates, changes in sample size makeup at different levels, different effect sizes, and the use of bias-corrected factor scores (Kelcey et al., 2021) or plausible values from Bayesian factor analysis (Beauducel and Hilger, 2022) to simplify the model should be investigated to understand ways that researchers may be able to increase power reasonably. These questions of power and estimation also lead to examining how confounders of the b and c' pathways affect bias and causal implications.

Future research may also explore situations that are not ideal: for example, measures with a mixture of small, medium, and large loadings in different proportions of the items; unequal sample sizes, including varying first- and second-level sample sizes as well as unequal sample sizes across treatment conditions; and differing numbers of measurement occasions across treatment conditions (such as treatment second-level observations having more mediator time points than the control group). A further situation to be examined would be the presence of missing data and how missing data may impact the estimation of this model. Another problem to consider would be how this model and its estimation perform when the model is misspecified. For instance, if the true model is a cross-lagged mediation analysis, how much bias (if any) is introduced when using the latent state-trait model in the 3-2-1 design instead of a cross-lagged model?

Finally, in the future, the current model proposed in this dissertation can be expanded to other multilevel models, such as the 2-1-1 model or a cross-classified model. It may also be broadened to include rater effects for each item at each time point. In addition, this model should be developed to incorporate intensive longitudinal data that might be expected when using emotionality as a mediator with data collection methods such as the experience sampling method (ESM; Vongkulluksn and Xie, 2022).

BIBLIOGRAPHY

Anderson, J. E. (2007). Random effect modelling using Bayesian methods. International Journal

of Services Technology and Management, 8(4-5):316–328.

Asparouhov, T. and Muthén, B. (2010). Bayesian Analysis Using Mplus: Technical Implementation.

Muthén & Muthén.

Baron, R. M. and Kenny, D. A. (1986). The moderator–mediator variable distinction in social psy-
chological research: Conceptual, strategic, and statistical considerations. Journal of personality
and social psychology, 51(6):1173.

Bauer, D. J., Preacher, K. J., and Gil, K. M. (2006). Conceptualizing and testing random indirect
effects and moderated mediation in multilevel models: new procedures and recommendations.
Psychological methods, 11(2):142.

Bauer, M. S., Damschroder, L., Hagedorn, H., Smith, J., and Kilbourne, A. M. (2015). An

introduction to implementation science for the non-specialist. BMC psychology, 3:1–12.

Beauducel, A. and Hilger, N. (2022). Coefficients of factor score determinacy for mean plausible
values of Bayesian factor analysis. Educational and Psychological Measurement, 82(6):1069–
1086.

Bollen, K. A. (1987). Total, direct, and indirect effects in structural equation models. Sociological

methodology, pages 37–69.

Chen, F. F., Hayes, A., Carver, C. S., Laurenceau, J.-P., and Zhang, Z. (2012). Modeling general
and speciﬁc variance in multifaceted constructs: A comparison of the bifactor model to other
approaches. Journal of personality, 80(1):219–251.

DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model

to testlet-based tests. Journal of educational measurement, 43(2):145–168.

Depaoli, S. and Clifton, J. P. (2015). A Bayesian approach to multilevel structural equation modeling
with continuous and dichotomous outcomes. Structural Equation Modeling: A Multidisciplinary
Journal, 22(3):327–351.

Geiser, C. (2020). Longitudinal structural equation modeling with Mplus: A latent state-trait

perspective. Guilford publications.

Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model ﬁtness

via realized discrepancies. Statistica sinica, pages 733–760.

Goldsmith, K. A., MacKinnon, D. P., Chalder, T., White, P. D., Sharpe, M., and Pickles, A.
(2018). Tutorial: The practical application of longitudinal structural equation mediation models
in clinical trials. Psychological methods, 23(2):191.

Goldstein, E., Topitzes, J., Brown, R. L., and Barrett, B. (2020). Mediational pathways of meditation
and exercise on mental health and perceived stress: A randomized controlled trial. Journal of
health psychology, 25(12):1816–1830.

Gonzalez, O. and MacKinnon, D. P. (2018). A bifactor approach to model multifaceted constructs
in statistical mediation analysis. Educational and Psychological Measurement, 78(1):5–31.

Gonzalez, O. and MacKinnon, D. P. (2021). The measurement of the mediator and its influence on
statistical mediation conclusions. Psychological Methods, 26(1):1.

Hallquist, M. N. and Wiley, J. F. (2018). MplusAutomation: An R package for facilitating
large-scale latent variable analyses in Mplus. Structural Equation Modeling, pages 621–638.

Hayes, A. F. and Scharkow, M. (2013). The relative trustworthiness of inferential tests of the
indirect effect in statistical mediation analysis: does method really matter? Psychological
science, 24(10):1918–1927.

Herman, K. C., Reinke, W. M., Dong, N., and Bradshaw, C. P. (2022). Can effective classroom
behavior management increase student achievement in middle school? findings from a group
randomized trial. Journal of Educational Psychology, 114(1):144.

Jose, P. E. (2016). The merits of using longitudinal mediation. Educational Psychologist,
51(3-4):331–341.

Kelcey, B., Cox, K., and Dong, N. (2021). Croon’s bias-corrected factor score path analysis
for small- to moderate-sample multilevel structural equation models. Organizational research
methods, 24(1):55–77.

Kelcey, B., Spybrook, J., Dong, N., and Bai, F. (2020). Experimental power for cross-level mediation
in school-randomized studies of teacher development. Journal of research on educational
effectiveness.

Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational researcher,
49(4):241–253.

MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., and Sheets, V. (2002). A
comparison of methods to test mediation and other intervening variable effects. Psychological
methods, 7(1):83.

MacKinnon, D. P., Lockwood, C. M., and Williams, J. (2004). Confidence limits for the indirect
effect: Distribution of the product and resampling methods. Multivariate behavioral research,
39(1):99–128.

Matsumura, L. C., Garnier, H. E., and Spybrook, J. (2013). Literacy coaching to improve student
reading achievement: A multi-level mediation model. Learning and Instruction, 25:35–48.

McNeish, D. and MacKinnon, D. P. (2022). Intensive longitudinal mediation in Mplus.
Psychological methods.

Muthen and Muthen (1998-2017). Mplus User’s Guide. Eighth Edition. Muthen & Muthen.

Olivera-Aguilar, M., Rikoon, S. H., Gonzalez, O., Kisbu-Sakarya, Y., and MacKinnon, D. P.
(2018). Bias, Type I error rates, and statistical power of a latent mediation model in the presence
of violations of invariance. Educational and Psychological Measurement, 78(3):460–481.

O’Neill, A., O’Sullivan, K., O’Sullivan, P., Purtill, H., and O’Keeffe, M. (2020). Examining what
factors mediate treatment effect in chronic low back pain: A mediation analysis of a cognitive
functional therapy clinical trial. European Journal of Pain, 24(9):1765–1774.

Palardy, G. J. (2015). High school socioeconomic composition and college choice: Multilevel
mediation via organizational habitus, school practices, peer and staff attitudes. School Effectiveness
and School Improvement, 26(3):329–353.

Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., and Chen, F. (2001). Monte Carlo experiments:
Design and implementation. Structural Equation Modeling, 8(2):287–312.

Pituch, K. A., Murphy, D. L., and Tate, R. L. (2009). Three-level models for indirect effects in
school- and class-randomized experiments in education. The Journal of Experimental Education,
78(1):60–95.

Preacher, K. J. and Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and
comparing indirect effects in multiple mediator models. Behavior research methods, 40(3):879–891.

Preacher, K. J. and Selig, J. P. (2012). Advantages of Monte Carlo confidence intervals for indirect
effects. Communication Methods and Measures, 6(2):77–98.

Preacher, K. J., Zyphur, M. J., and Zhang, Z. (2010). A general multilevel SEM framework for
assessing multilevel mediation. Psychological methods, 15(3):209.

Raykov, T., Dimitrov, D. M., and Asparouhov, T. (2010). Evaluation of scale reliability with binary
measures using latent variable modeling. Structural Equation Modeling, 17(2):265–279.

Reeves, E. (2015). A synthesis of the literature on trauma-informed care. Issues in mental health
nursing, 36(9):698–709.

Ritchie, J. and Miles, R. E. (1970). An analysis of quantity and quality of participation as mediating
variables in the participative decision making process. Personnel Psychology, 23(3).

Schenke, K., Nguyen, T., Watts, T., Sarama, J., and Clements, D. (2017). Differential effects of the
classroom on African American and non-African American’s mathematics achievement. Journal
of Educational Psychology, 109(6):794–811.

Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Broda, M., Spicer, J., Bruner, J., Moeller,
J., Linnansaari, J., Juuti, K., and Viljaranta, J. (2016). Investigating optimal learning moments
in U.S. and Finnish science classes. Journal of Research in Science Teaching, 53(3):400–421.

Schneider, B., Krajcik, J., Lavonen, J., Salmela-Aro, K., Klager, C., Bradford, L., Chen, I.-C.,
Baker, Q., Touitou, I., Peek-Brown, D., et al. (2022). Improving science achievement—is it
possible? evaluating the efficacy of a high school chemistry and physics project-based learning
intervention. Educational Researcher, 51(2):109–121.

Schwarz, G. (1978). Estimating the dimension of a model. The annals of statistics, pages 461–464.

Schünemann, N., Spörer, N., Völlinger, V. A., and Brunstein, J. C. (2017). Peer feedback mediates
the impact of self-regulation procedures on strategy use and reading comprehension in reciprocal
teaching groups. Instructional Science, 45(4):395–415.

Selig, J. P. and Preacher, K. J. (2009). Mediation models for longitudinal data in developmental
research. Research in human development, 6(2-3):144–164.

Shen, Z., Li, W., and Leite, W. (2024). Statistical power and optimal design for randomized
controlled trials investigating mediation effects. Psychological Methods.

Silva, B. C., Bosancianu, C. M., and Littvay, L. (2019). Multilevel structural equation modeling.
Sage Publications.

Smith, T. E. and Sheridan, S. M. (2019). The effects of teacher training on teachers’
family-engagement practices, attitudes, and knowledge: A meta-analysis. Journal of Educational and
Psychological Consultation, 29(2):128–157.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures
of model complexity and fit. Journal of the royal statistical society: Series b (statistical
methodology), 64(4):583–639.

Spybrook, J., Unlu, F., Hedberg, E., Mulolli, D., and Berglund, T. (2022). Improving the design of
evaluations that include students, teachers, and schools. Paper presented at the Annual Meeting
for the American Educational Research Association, San Diego, CA.

Steyer, R., Mayer, A., Geiser, C., and Cole, D. A. (2015). A theory of states and traits—revised.
Annual review of clinical psychology, 11:71–98.

Vongkulluksn, V. W. and Xie, K. (2022). Multilevel latent state-trait models with experience
sampling data: An illustrative case of examining situational engagement. Open Education
Studies, 4(1):252–272.

Walberg, H. J. (1969). Social environment as a mediator of classroom learning. Journal of
Educational Psychology, 60(6p1):443.

Wang, Y., Kim, E. S., Dedrick, R. F., Ferron, J. M., and Tan, T. (2018). A multilevel bifactor
approach to construct validation of mixed-format scales. Educational and psychological
measurement, 78(2):253–271.

Westine, C. D., Unlu, F., Taylor, J., Spybrook, J., Zhang, Q., and Anderson, B. (2020). Design
parameter values for impact evaluations of science and mathematics interventions involving
teacher outcomes. Journal of Research on Educational Effectiveness, 13(4):816–839.

Widaman, K. F. and Revelle, W. (2023). Thinking thrice about sum scores, and then some more
about measurement and analysis. Behavior research methods, 55(2):788–806.

Wiliam, D. (2022). How should educational research respond to the replication “crisis” in the social
sciences? reflections on the papers in the special issue. Educational Research and Evaluation,
27(1-2):208–214.

Williams, N. J., Preacher, K. J., Allison, P. D., Mandell, D. S., and Marcus, S. C. (2022). Required
sample size to detect mediation in 3-level implementation studies. Implementation Science,
17(1):66.

Zhang, Q. and Phillips, B. (2018). Three-level longitudinal mediation with nested units: How does
an upper-level predictor influence a lower-level outcome via an upper-level mediator over time?
Multivariate behavioral research, 53(5):655–675.

APPENDIX A

R SIMULATION CODE

Simulation Code

setwd("Z:/Hannah Chair/PIRE/DATA/Analysis/Lydia/Dissertation")
getwd()

library(MplusAutomation)
source("sim-functions.R")

sss = c(30, 60, 200, 500) # school sample size
tps = c(2, 5, 10, 20) # teachers per school
spt = c(20, 30, 60) # students per teacher

ape = c(0.15, 0.25, 0.45) # a path effects
bpe = c(0.05, 0.15, 0.25) # b path effects

gl = c(0.3, 0.6, 0.9) # general loadings
tl = c(0.3, 0.6, 0.9) # time specific loadings

repResult <- list()
attlist <- list()
stdResult <- list()
f3Result <- list()
bfResult <- list()
setwd("Z:/Hannah Chair/PIRE/DATA/Analysis/Lydia")
t = 2 # teachers per school, fixed in this run (tps is not looped over)
s = 30 # students per teacher, fixed in this run (spt is not looped over)
for (ss in sss){
  for (a in ape){
    for (b in bpe){
      for (g in gl){
        for (l in tl){
          setwd("Z:/Hannah Chair/PIRE/DATA/Analysis/Lydia/Dissertation") # reset directory
          dir.create(paste0("ss-",ss,"t-",t,"s-",s,"a-",a,"b-",b,"g-",g,"l-",l)) # make new directory
          setwd(paste0("ss-",ss,"t-",t,"s-",s,"a-",a,"b-",b,"g-",g,"l-",l)) # change directory to new directory
          att <- list(ss, t, s, a, b, g, l)
          print(att)
          attlist <- append(attlist,att)

          # fully specified model: generates the replication data sets and fits the model
          cat(fullmodelsim(ss, t, s, a, b, g, l, processors=8),
              file = file.path(getwd(),paste0(ss,t,s,a,b,g,l,".inp")))
          runModels(file.path(getwd(),paste0(ss,t,s,a,b,g,l,".inp")))
          repResult <- append(repResult,
                              readModels(file.path(getwd(),paste0(ss,t,s,a,b,g,l,".out"))))

          # mediation with averaged observed mediator scores
          cat(stzmed(ss, t, s, a, b, g, l, processors=8),
              file = file.path(getwd(),paste0("std -",ss,t,s,a,b,g,l,".inp")))
          runModels(file.path(getwd(),paste0("std -",ss,t,s,a,b,g,l,".inp")))
          stdResult <- append(stdResult,
                              readModels(file.path(getwd(),paste0("std -",ss,t,s,a,b,g,l,".out"))))

          # save three-factor and bifactor scores for each of the 100 replications
          flist <- list()
          bflist <- list()
          for (i in 1:100){
            cat(threefactors(ss, t, s, a, b, g, l, i, processors=8),
                file = file.path(getwd(),paste0("3f -",ss,t,s,a,b,g,l,i,".inp")))
            runModels(file.path(getwd(),paste0("3f -",ss,t,s,a,b,g,l,i,".inp")))
            cat(bifactors(ss, t, s, a, b, g, l, i, processors=8),
                file = file.path(getwd(),paste0("bf -",ss,t,s,a,b,g,l,i,".inp")))
            runModels(file.path(getwd(),paste0("bf -",ss,t,s,a,b,g,l,i,".inp")))
            flist[i] <- paste0("3F", ss,t,s,a,b,g,l,"REP",i,".dat")
            bflist[i] <- paste0("BF", ss,t,s,a,b,g,l,"REP",i,".dat")
          }

          # write the lists of factor score files read by TYPE = MONTECARLO
          fdata <- data.frame(matrix(unlist(flist),nrow = 100, byrow=TRUE))
          bdata <- data.frame(matrix(unlist(bflist),nrow = 100, byrow=TRUE)) # list of bifactor score files
          write.table(fdata, file = paste0("3F", ss,t,s,a,b,g,l,"REPlist.dat"),
                      row.names=FALSE,quote=FALSE,col.names=FALSE)
          write.table(bdata, file = paste0("BF", ss,t,s,a,b,g,l,"REPlist.dat"),
                      row.names=FALSE,quote=FALSE,col.names=FALSE)

          # mediation with averaged three-factor scores
          cat(f3med(ss, t, s, a, b, g, l, processors=8),
              file = file.path(getwd(),paste0("f3med -",ss,t,s,a,b,g,l,".inp")))
          runModels(file.path(getwd(),paste0("f3med -",ss,t,s,a,b,g,l,".inp")))
          f3Result <- append(f3Result,
                             readModels(file.path(getwd(),paste0("f3med -",ss,t,s,a,b,g,l,".out"))))

          # mediation with the general factor score from the bifactor model
          cat(bfmed(ss, t, s, a, b, g, l, processors=8),
              file = file.path(getwd(),paste0("bfmed -",ss,t,s,a,b,g,l,".inp")))
          runModels(file.path(getwd(),paste0("bfmed -",ss,t,s,a,b,g,l,".inp")))
          bfResult <- append(bfResult,
                             readModels(file.path(getwd(),paste0("bfmed -",ss,t,s,a,b,g,l,".out"))))
        }
      }
    }
  }
}
save(repResult,attlist,stdResult,f3Result,bfResult, file="sim.RData")
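The nested loops above cross five design factors. As a minimal sketch, the same set of conditions can be tabulated with `expand.grid`, which makes the number of cells explicit before any Mplus run is launched. The helper `condition_dir` and the names `n_teach` and `n_stud` are hypothetical (they do not appear in the original script); the factor levels and the directory naming pattern are copied from it. Note that `expand.grid` varies the first factor fastest, so the row order differs from the loop order even though the set of conditions is the same.

```r
# Design factors copied from the simulation script
sss <- c(30, 60, 200, 500)   # school sample size
ape <- c(0.15, 0.25, 0.45)   # a path effects
bpe <- c(0.05, 0.15, 0.25)   # b path effects
gl  <- c(0.3, 0.6, 0.9)      # general loadings
tl  <- c(0.3, 0.6, 0.9)      # time specific loadings
n_teach <- 2                 # teachers per school (t in the script)
n_stud  <- 30                # students per teacher (s in the script)

# One row per simulation condition
grid <- expand.grid(ss = sss, a = ape, b = bpe, g = gl, l = tl,
                    KEEP.OUT.ATTRS = FALSE)

# Hypothetical helper: rebuilds the directory name used in the loop
condition_dir <- function(ss, t, s, a, b, g, l) {
  paste0("ss-", ss, "t-", t, "s-", s, "a-", a, "b-", b, "g-", g, "l-", l)
}

grid$dir <- condition_dir(grid$ss, n_teach, n_stud,
                          grid$a, grid$b, grid$g, grid$l)

nrow(grid)    # 4 * 3 * 3 * 3 * 3 = 324 conditions
head(grid, 3)
```

A table like this can also be joined to the saved results afterwards, which avoids recovering the condition from its position in `attlist`.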

Simulation functions

### Generates the MPLUS syntax for all the simulations

fullmodelsim <- function(ss, t, s, a, b, g, l, processors=8){
  nobs <- ss*t*s
  te <- 0.2 - a*b
  syntax <- paste0(
    "TITLE: \n",
    "Fully Specified Model Simulation - School N =", ss, " Teacher N =", t,
    " Student N =", s, " a =",a," b =", b, " trait loadings =", g,
    " state loadings =",l, "\n",
    "MONTECARLO: \n",
    "NAMES ARE T PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33;\n",
    "CATEGORICAL = PT1 PT2 PT3 PT4; \n",
    "BETWEEN = TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 (level3)T; \n",
    "GENERATE = PT1-PT4(1); \n",
    "CUTPOINTS = T(0);\n",
    "NOBSERVATIONS =",nobs,";\n",
    "NCSIZES = 1[1]; \n",
    "CSIZES =",ss,"[",t,"(",s,")]; \n",
    "NREP = 100;\n",
    "SEED = 2024; \n",
    "REPSAVE = ALL; \n",
    "SAVE =", ss,t,s,a,b,g,l,"REP*.dat; \n",
    "ANALYSIS: \n",
    "TYPE IS THREELEVEL;\n",
    "processors =",processors, ";\n",
    "ALGORITHM=GIBBS(RW); \n",
    "MODEL POPULATION: \n",
    "%WITHIN% \n",
    "OUTCOME BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3;\n",
    "OUTCOME*1;\n",
    "%BETWEEN level2% \n",
    "OUTCOMET BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3;\n",
    "OUTCOMET*.85; \n",
    "MED BY TO11*",g," TO21-TO33*",g, ";\n",
    "MEDT1 BY TO11*",l," TO21-TO31*",l,"; \n",
    "MEDT2 BY TO12*",l," TO22-TO32*",l,"; \n",
    "MEDT3 BY TO13*",l," TO23-TO33*",l,"; \n",
    "MED-MEDT3@1; \n",
    "MED-MEDT3 with MED-MEDT3@0;\n",
    "%BETWEEN level3% \n",
    "MEDB BY TO11*",g," TO21-TO33*",g, ";\n",
    "MEDB@.2;\n",
    "OUTCOMEB BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3;\n",
    "MEDB ON T*",a,"; \n",
    "OUTCOMEB ON MEDB*", b," T*",te,";\n",
    "OUTCOMEB*.8; \n",
    "MODEL: \n",
    "%WITHIN% \n",
    "OUTCOME BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOME*1;\n",
    "%BETWEEN level2% \n",
    "OUTCOMET BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3;\n",
    "OUTCOMET*.85; \n",
    "MED BY TO11*",g," TO21-TO33*",g, ";\n",
    "MEDT1 BY TO11*",l," TO21-TO31*",l,"; \n",
    "MEDT2 BY TO12*",l," TO22-TO32*",l,"; \n",
    "MEDT3 BY TO13*",l," TO23-TO33*",l,";\n",
    "MED-MEDT3@1;\n",
    "MED-MEDT3 with MED-MEDT3@0; \n",
    "%BETWEEN level3% \n",
    "MEDB BY TO11*",g," TO21-TO33*",g, "; \n",
    "MEDB@.2; \n",
    "OUTCOMEB BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3;\n",
    "MEDB ON T*",a,";\n",
    "OUTCOMEB ON MEDB*", b," T*",te,";\n",
    "OUTCOMEB*.8; \n"
  )
  return(syntax)
}
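In the generating model the coefficient of T on the outcome is set to `te <- 0.2 - a*b`, so the direct effect shrinks as the indirect effect `a*b` grows and their sum stays at 0.2 in every condition. The sketch below tabulates that decomposition for the nine a/b combinations in the design. The column names and the `total` variable are illustrative assumptions; only the a and b levels and the 0.2 constant come from the code.

```r
ape <- c(0.15, 0.25, 0.45)   # a path effects (from the simulation script)
bpe <- c(0.05, 0.15, 0.25)   # b path effects (from the simulation script)
total <- 0.2                 # constant implied by te <- 0.2 - a*b

eff <- expand.grid(a = ape, b = bpe)
eff$indirect <- eff$a * eff$b                 # mediated effect a*b
eff$direct   <- total - eff$indirect          # value passed to Mplus as te
eff$prop_mediated <- eff$indirect / total     # share of the total that is mediated

eff[order(eff$indirect), ]
```

Under these values the indirect effect ranges from 0.15 * 0.05 to 0.45 * 0.25, so the direct effect is positive in every cell.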

stzmed <- function(ss, t, s, a, b, g, l, processors=8){
  te <- 0.2 - a*b
  syntax <- paste0(
    "TITLE: \n",
    "SIMULATION OF MT-MEDIATION SCHOOL N = ", ss, ", TEACHER N = ",t,
    ", STUDENT N = ", s,", A=", a, ", B= ",b,", GENERAL LOADING = ",g,
    ", TIME LOADING = ",l," - STD MEDIATOR \n",
    "DATA: \n",
    "FILE IS ", ss,t,s,a,b,g,l,"REPlist.dat; \n",
    "TYPE = MONTECARLO; \n",
    "VARIABLE: \n",
    "NAMES ARE PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 T Tid Sid TO1 TO2 TO3 TOM;\n",
    "CATEGORICAL = PT1 PT2 PT3 PT4; \n",
    "CLUSTER = Sid Tid; \n",
    "BETWEEN = TO11 TO12\n",
    "TO13 TO21 TO22 TO23 TO31 TO32 TO33 TO1 TO2 TO3 TOM (Sid) T;\n",
    "DEFINE: \n",
    "TO1 = (TO11 + TO21 + TO31)/3;\n",
    "TO2 = (TO12 + TO22 + TO32)/3;\n",
    "TO3 = (TO13 + TO23 + TO33)/3;\n",
    "TOM = (TO1 +TO2 + TO3)/3;\n",
    "ANALYSIS: \n",
    "TYPE IS THREELEVEL;\n",
    "ESTIMATOR=Bayes;\n",
    "processors = ",processors,";\n",
    "ALGORITHM=GIBBS(RW); \n",
    "MODEL:\n",
    "%WITHIN% \n",
    "OUTCOME BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOME*1; \n",
    "%Between Tid% \n",
    "OUTCOMET BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMET*.85;\n",
    "TOM@1; \n",
    "%BETWEEN Sid% \n",
    "TOM@.2;\n",
    "OUTCOMEB BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMEB*.2;\n",
    "TOM ON T*",a,"; \n",
    "OUTCOMEB ON TOM*",b," T*",te,";\n",
    "OUTPUT: TECH9; \n"
  )
  return(syntax)
}

threefactors <- function(ss, t, s, a, b, g, l, i, processors=8){
  syntax<- paste0(
    "TITLE: \n",
    "SIMULATION OF MT-MEDIATION SCHOOL N = ", ss, ", TEACHER N = ", t,
    ",\nSTUDENT N = ", s, ", A = ",a,", B = ",b,", GENERAL LOADING = ",g,
    ", TIME LOADING = ",l," - save 3 factor scores \n",
    "DATA: \n",
    "FILE IS ", ss,t,s,a,b,g,l,"REP",i,".dat;\n",
    "VARIABLE: \n",
    "NAMES ARE PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 T Tid Sid;\n",
    "MODEL: \n",
    "MEDT1 BY TO11 TO21 TO31;\n",
    "MEDT2 BY TO12 TO22 TO32; \n",
    "MEDT3 BY TO13 TO23 TO33;\n",
    "MEDT1-MEDT3@1; \n",
    "SAVEDATA: \n",
    "save=fscores;\n",
    "FILE IS 3F", ss,t,s,a,b,g,l,"REP",i,".dat;\n"
  )
  return(syntax)
}

f3med <- function(ss, t, s, a, b, g, l, processors=8){
  te <- 0.2 - a*b
  syntax <- paste0(
    "TITLE: \n",
    "SIMULATION OF MT-MEDIATION SCHOOL N = ", ss, ", TEACHER N = ",t,
    ", STUDENT N = ", s,", A=", a, ", B= ",b,", GENERAL LOADING= ",
    g,", TIME LOADING = ",l," - 3 factor MEDIATOR \n",
    "DATA: \n",
    "FILE IS 3F", ss,t,s,a,b,g,l,"REPlist.dat; \n",
    "TYPE = MONTECARLO; \n",
    "VARIABLE: \n",
    "NAMES ARE PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 T Tid Sid TO1 TO1_SE TO2 TO2_SE\n",
    "TO3 TO3_SE TOM;\n",
    "USEVAR = PT1 PT2 PT3 PT4 T Tid Sid TO1 TO2 TO3 TOM; \n",
    "CATEGORICAL = PT1 PT2 PT3 PT4; \n",
    "CLUSTER = Sid Tid; \n",
    "BETWEEN = TO1 TO2 TO3 TOM (Sid) T;\n",
    "DEFINE: \n",
    "TOM = (TO1 +TO2 + TO3)/3;\n",
    "ANALYSIS: \n",
    "TYPE IS THREELEVEL;\n",
    "ESTIMATOR=Bayes;\n",
    "processors = ",processors,";\n",
    "ALGORITHM=GIBBS(RW); \n",
    "MODEL:\n",
    "%WITHIN% \n",
    "OUTCOME BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOME*1; \n",
    "%Between Tid% \n",
    "OUTCOMET BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMET*.85;\n",
    "TOM@1; \n",
    "%BETWEEN Sid% \n",
    "TOM@.2;\n",
    "OUTCOMEB BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMEB*.2;\n",
    "TOM ON T*",a,"; \n",
    "OUTCOMEB ON TOM*",b," T*",te,";\n",
    "OUTPUT: TECH9; \n"
  )
  return(syntax)
}

bifactors <- function(ss, t, s, a, b, g, l, i, processors=8){
  syntax<- paste0(
    "TITLE: \n",
    "SIMULATION OF MT-MEDIATION SCHOOL N = ", ss, ", TEACHER N = ", t,
    ",\nSTUDENT N = ", s, ", A = ",a,", B = ",b,", GENERAL LOADING = ",g,
    ", TIME LOADING = ",l," - save bifactor scores \n",
    "DATA: \n",
    "FILE IS ", ss,t,s,a,b,g,l,"REP",i,".dat;\n",
    "VARIABLE: \n",
    "NAMES ARE PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 T Tid Sid;\n",
    "MODEL: \n",
    "GM BY TO11 TO12\n",
    "TO13 TO21 TO22 TO23 TO31 TO32 TO33; \n",
    "MEDT1 BY TO11 TO21 TO31;\n",
    "MEDT2 BY TO12 TO22 TO32; \n",
    "MEDT3 BY TO13 TO23 TO33;\n",
    " GM-MEDT3@1;\n",
    "GM-MEDT3 with GM-MEDT3@0;\n",
    "SAVEDATA: \n",
    "save=fscores;\n",
    "FILE IS BF", ss,t,s,a,b,g,l,"REP",i,".dat;\n"
  )
  return(syntax)
}

bfmed <- function(ss, t, s, a, b, g, l, processors=8){
  te <- 0.2 - a*b
  syntax <- paste0(
    "TITLE: \n",
    "SIMULATION OF MT-MEDIATION SCHOOL N = ", ss, ", TEACHER N = ",t,
    ", STUDENT N = ", s,", A=", a, ", B= ",b,", GENERAL LOADING= ",
    g,", TIME LOADING = ",l," - bifactor MEDIATOR \n",
    "DATA: \n",
    # reads the list of bifactor score files; the NAMES below (TOG, TOG_SE, ...) match the BF files
    "FILE IS BF", ss,t,s,a,b,g,l,"REPlist.dat; \n",
    "TYPE = MONTECARLO; \n",
    "VARIABLE: \n",
    "NAMES ARE PT1 PT2 PT3 PT4 TO11 TO21\n",
    "TO31 TO12 TO22 TO32 TO13 TO23 TO33 T Tid Sid \n",
    "TOG TOG_SE TO1 TO1_SE TO2 TO2_SE TO3 TO3_SE;\n",
    "USEVAR = PT1 PT2 PT3 PT4 T Tid Sid TOG; \n",
    "CATEGORICAL = PT1 PT2 PT3 PT4; \n",
    "CLUSTER = Sid Tid; \n",
    "BETWEEN = TOG (Sid) T;\n",
    "ANALYSIS: \n",
    "TYPE IS THREELEVEL;\n",
    "ESTIMATOR=Bayes;\n",
    "processors = ",processors,";\n",
    "ALGORITHM=GIBBS(RW); \n",
    "MODEL:\n",
    "%WITHIN% \n",
    "OUTCOME BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOME*1; \n",
    "%Between Tid% \n",
    "OUTCOMET BY PT1@1 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMET*.85;\n",
    "TOG@1; \n",
    "%BETWEEN Sid% \n",
    "TOG@.2;\n",
    "OUTCOMEB BY PT1@1.0 PT2*0.5 PT3*0.8 PT4*0.3; \n",
    "OUTCOMEB*.2;\n",
    "TOG ON T*",a,"; \n",
    "OUTCOMEB ON TOG*",b," T*",te,";\n",
    "OUTPUT: TECH9; \n"
  )
  return(syntax)
}

APPENDIX B

EMPIRICAL STUDY MPLUS CODE

CESE Treatment Effect

TITLE: CESE TREATMENT - STDZ OUTCOME - WITH PRIORS
DATA: FILE IS tobs.csv;
VARIABLE: NAMES ARE STUID SID CHEM RID TID PRE1-PRE9
    PHY1-PHY12 CHEM1-CHEM25 T POSTEQ PRESUB SPRESUB
    OB1T1-OB1T11
    OB2T1-OB2T11 OB3T1-OB3T11 OB4T1-OB4T11
    OB5T1-OB5T11;
    USEVAR = SID RID TID T
    POSTEQ CHEM PRESUB SPRESUB;
    MISSING ARE .;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = (TID) CHEM SPRESUB(SID)T RID;
ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=BAYES;
    ALGORITHM=GIBBS(RW);
MODEL:
%WITHIN%
POSTEQ ON PRESUB;
%BETWEEN TID%
POSTEQ ON CHEM SPRESUB;
POSTEQ (TAU1);
%BETWEEN SID%
    POSTEQ (TAU2);
    POSTEQ ON T RID;

MODEL PRIORS:
    TAU1~IG(2.25,0.16);
    TAU2 ~ IG(2.25,0.19);

Mediation 1: Standardized averages of the mediator

TITLE: CESE MEDIATION - Standardized MED AND OUTCOME scores - WITH INFORMATIVE PRIORS
DATA: FILE IS tobs.csv;
VARIABLE: NAMES ARE STUID SID CHEM RID TID PRE1-PRE9
    PHY1-PHY12 CHEM1-CHEM25 T POSTEQ PRESUB SPRESUB
    OB1T1-OB1T11
    OB2T1-OB2T11 OB3T1-OB3T11 OB4T1-OB4T11
    OB5T1-OB5T11;
    USEVAR =SID RID TID T
    POSTEQ CHEM PRESUB SPRESUB OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4 TO1 TO2 TO3 TOM;
    MISSING ARE .;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = OB1T1-OB3T4 TO1 TO2 TO3 TOM (TID) CHEM SPRESUB (SID)T RID;
DEFINE:
TO1 = (OB1T1 + OB1T2 + OB1T3 + OB1T4)/4;
TO2 = (OB2T1 + OB2T2 + OB2T3 + OB2T4)/4;
TO3 = (OB3T1 + OB3T2 + OB3T3 + OB3T4)/4;
TOM = (TO1 +TO2 + TO3)/3;

ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=Bayes;
    ALGORITHM=GIBBS(RW);

MODEL:
%WITHIN%
POSTEQ ON PRESUB;
%BETWEEN TID%
    POSTEQ ON TOM (B1)
    CHEM SPRESUB;
    POSTEQ (TAU1);
    TOM ON CHEM SPRESUB;
%BETWEEN SID%
    POSTEQ (TAU2);
    TOM (TAU3);
    TOM ON T (A)
    RID;
    POSTEQ ON T RID;

MODEL PRIORS:
    TAU1~IG(2.25,0.16);
    TAU2 ~ IG(2.25,0.19);
    TAU3~IG(2.12,.12);

MODEL CONSTRAINT:
    NEW (IND1);
    IND1 = A*B1;

OUTPUT: STANDARDIZED CINTERVAL;
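The `MODEL CONSTRAINT` block defines the indirect effect as the product `IND1 = A*B1`, and under the Bayes estimator Mplus summarizes the posterior distribution of that product. The R sketch below illustrates the idea only: it multiplies draws of a and b and takes percentile limits. The draws are simulated from normal distributions with made-up means and standard deviations, and they are drawn independently, which real joint MCMC output would not require. None of the numbers are results from the CESE analysis.

```r
set.seed(1)
n_draws <- 5000

# Hypothetical posterior draws, for illustration only
a_draws <- rnorm(n_draws, mean = 0.25, sd = 0.10)   # T -> mediator (label A)
b_draws <- rnorm(n_draws, mean = 0.15, sd = 0.05)   # mediator -> outcome (label B1)

# Posterior draws of the indirect effect, IND1 = A*B1
ind_draws <- a_draws * b_draws

# 95% percentile interval and posterior median of the product
ci <- quantile(ind_draws, probs = c(0.025, 0.975))
post_median <- median(ind_draws)

# TRUE when the interval lies entirely on one side of zero
excludes_zero <- ci[[1]] > 0 || ci[[2]] < 0

ci
post_median
excludes_zero
```

Because the product of two roughly normal quantities is skewed, the percentile interval is generally not symmetric around the median, which is one reason the product is summarized from draws rather than from a normal approximation.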

Mediation 2: Averages of the factor scores

Factor score estimation

TITLE: CESE MEDIATION - SAVE FACTOR SCORES;
DATA: FILE IS tobs.csv;
VARIABLE: NAMES ARE STUID SID CHEM RID TID PRE1-PRE9
    PHY1-PHY12 CHEM1-CHEM25 T POSTEQ PRESUB SPRESUB
    OB1T1-OB1T11
    OB2T1-OB2T11 OB3T1-OB3T11 OB4T1-OB4T11
    OB5T1-OB5T11;
    USEVAR ARE SID RID TID T
    POSTEQ CHEM PRESUB SPRESUB OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4;
    MISSING ARE .;
    CLUSTER = TID;
    BETWEEN = OB1T1-OB3T4;
ANALYSIS: TYPE IS TWOLEVEL;
    ESTIMATOR =Bayes;

MODEL:
%WITHIN%

%BETWEEN%
MEDT1 BY OB1T1* OB1T2 OB1T3 OB1T4;
MEDT2 BY OB2T1* OB2T2 OB2T3 OB2T4;
MEDT3 BY OB3T1* OB3T2 OB3T3 OB3T4;
MEDT1-MEDT3@1;

SAVEDATA:
save=fscores(50 10);
FILE IS 3FSCORES.csv;

Estimation of Mediation

TITLE: CESE MEDIATION - AVERAGED FACTOR SCORES - INFPRIORS
DATA: FILE IS 3FSCORES.csv;
VARIABLE: NAMES ARE OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4
    SID RID T POSTEQ CHEM PRESUB SPRESUB MEDT1 MEDT1m MEDT1sd
    MEDT125 MEDT198 MEDT2 MEDT2m MEDT2sd
    MEDT225 MEDT298 MEDT3 MEDT3m MEDT3sd
    MEDT325 MEDT398 IG1-IG35 TID;
    USEVAR = SID RID T
    POSTEQ CHEM PRESUB SPRESUB MEDT1 MEDT2 MEDT3 TID
    TOM;
    MISSING ARE *;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = MEDT1 MEDT2 MEDT3 TOM (TID)CHEM SPRESUB (SID)T
    RID;
DEFINE:
    TOM = (MEDT1 + MEDT2 + MEDT3)/3;
    STANDARDIZE TOM;
ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=Bayes;
    ALGORITHM=GIBBS(RW);
MODEL:
%WITHIN%
    POSTEQ ON PRESUB;
%BETWEEN TID%
    POSTEQ ON TOM (B1)
    SPRESUB CHEM;
    TOM ON SPRESUB CHEM;
    POSTEQ (TAU1);
%BETWEEN SID%
    POSTEQ (TAU2);
    TOM (TAU3);
    TOM ON T (A)
    RID;
    POSTEQ ON T RID;
MODEL PRIORS:
    TAU1~IG(2.25,0.16);
    TAU2 ~ IG(2.25,0.19);
    TAU3~IG(2.12,.12);
MODEL CONSTRAINT:
    NEW (IND1);
    IND1 = A*B1;
OUTPUT: CINTERVAL;

Mediation 3: Factors from LST

Factor scores estimation

TITLE: CESE MEDIATION - SAVE BIFACTOR SCORES;
DATA: FILE IS tobs.csv;
VARIABLE: NAMES ARE STUID SID CHEM RID TID PRE1-PRE9
    PHY1-PHY12 CHEM1-CHEM25 T POSTEQ PRESUB SPRESUB
    OB1T1-OB1T11
    OB2T1-OB2T11 OB3T1-OB3T11 OB4T1-OB4T11
    OB5T1-OB5T11;
    USEVAR ARE SID RID TID T
    POSTEQ CHEM PRESUB SPRESUB OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4;
    MISSING ARE .;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = (TID) OB1T1-OB3T4 CHEM SPRESUB (SID)T RID;

ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=BAYES;
    ALGORITHM=GIBBS(RW);

MODEL:

%BETWEEN TID%
    MED BY OB1T1* OB1T2-OB3T4;
    MEDT1 BY OB1T1* OB1T2-OB1T4;
    MEDT2 BY OB2T1* OB2T2-OB2T4;
    MEDT3 BY OB3T1* OB3T2-OB3T4;
    MED-MEDT3@1;
    MED-MEDT3 WITH MED-MEDT3@0;

SAVEDATA:
save=fscores(50 10);
FILE IS BFSCORES.csv;

Mediation estimation

TITLE: CESE MEDIATION - BIFACTOR SCORES - INFPRIORS
DATA: FILE IS BFSCORES.csv;
VARIABLE: NAMES ARE RID T POSTEQ CHEM PRESUB SPRESUB OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4
    TOM TOMm TOMsd TOM25 TOM98 MEDT1 MEDT1m MEDT1sd
    MEDT125 MEDT198 MEDT2 MEDT2m MEDT2sd
    MEDT225 MEDT298 MEDT3 MEDT3m MEDT3sd
    MEDT325 MEDT398 IG1-IG10 SID TID;
    USEVAR = SID RID T
    POSTEQ CHEM PRESUB SPRESUB TID TOM;
    MISSING ARE *;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = TOM(TID)CHEM SPRESUB (SID)T RID;

ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=Bayes;
    ALGORITHM=GIBBS(RW);

MODEL:
%WITHIN%
POSTEQ ON PRESUB;
%BETWEEN TID%
    POSTEQ ON TOM (B1)
    SPRESUB CHEM;
    TOM ON SPRESUB CHEM;
    POSTEQ (TAU1);
%BETWEEN SID%
    POSTEQ (TAU2);
    TOM (TAU3);
    TOM ON T (A)
    RID;
    POSTEQ ON T RID;

MODEL PRIORS:
    TAU1~IG(2.25,0.16);
    TAU2 ~ IG(2.25,0.19);
    TAU3~IG(2.12,.12);

MODEL CONSTRAINT:
    NEW (IND1);
    IND1 = A*B1;

Mediation 4: Fully specified model

TITLE: CESE full mediation model - STDZ OUTCOME -inprior
DATA: FILE IS tobs.csv;
VARIABLE: NAMES ARE STUID SID CHEM RID TID PRE1-PRE9
    PHY1-PHY12 CHEM1-CHEM25 T POSTEQ PRESUB SPRESUB
    OB1T1-OB1T11
    OB2T1-OB2T11 OB3T1-OB3T11 OB4T1-OB4T11
    OB5T1-OB5T11;
    USEVAR = SID RID TID T
    POSTEQ CHEM PRESUB SPRESUB OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4;
    MISSING ARE .;
    CLUSTER = SID TID;
    WITHIN = PRESUB;
    BETWEEN = OB1T1-OB1T4
    OB2T1-OB2T4 OB3T1-OB3T4 (TID) CHEM SPRESUB (SID)T
    RID;
ANALYSIS: TYPE IS THREELEVEL;
    ESTIMATOR=BAYES;
    ALGORITHM=GIBBS(RW);
    BITERATIONS = 100000;
MODEL:
%WITHIN%
POSTEQ ON PRESUB;
%BETWEEN TID%
    POSTEQ ON MED (B1)
    CHEM SPRESUB;
    MED ON CHEM SPRESUB;
    MED BY OB1T1* OB1T2-OB3T4;
    MEDT1 BY OB1T1* OB1T2-OB1T4;
    MEDT2 BY OB2T1* OB2T2-OB2T4;
    MEDT3 BY OB3T1* OB3T2-OB3T4;
    MED-MEDT3@1;
    MED-MEDT3 WITH MED-MEDT3@0;
    POSTEQ (TAU1);
%BETWEEN SID%
    POSTEQ (TAU2);
    MEDB BY OB1T1* OB1T2-OB3T4;
    MEDB (TAU3);
    MEDB ON T (A)
    RID;
    POSTEQ ON T RID;

MODEL PRIORS:
    TAU1~IG(2.25,0.16);
    TAU2 ~ IG(2.25,0.19);
    TAU3~IG(2.12,.12);

MODEL CONSTRAINT:
    NEW (IND1);
    IND1 = A*B1;