CLARIFICATION AND IDENTIFICATION OF CAUSAL ESTIMANDS USING PRINCIPAL STRATUM STRATEGY IN CLINICAL TRIALS WITH TWO ACTIVE TREATMENTS By Hanyue Li A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Epidemiology - Doctor of Philosophy 2021 ABSTRACT CLARIFICATION AND IDENTIFICATION OF CAUSAL ESTIMANDS USING PRINCIPAL STRATUM STRATEGY IN CLINICAL TRIALS WITH TWO ACTIVE TREATMENTS By Hanyue Li The randomization process in clinical trials is disrupted with the existence of post- randomization events or intercurrent events. The “gold-standard” intention-to-treat (ITT) estimand therefore loses its clinical relevance because it compares treatment assignments instead of actual received treatments. Alternative estimands need to be defined and identified to quantify the causal effect of treatments. In this dissertation, we focus on the intercurrent event of treatment nonadherence and identify the causal estimands in clinical trials with two active treatments. We work under the Neyman-Rubin causal framework and the principal stratification framework. First, we propose a nonparametric approach which identifies the complier average causal effect (CACE) as the ratio of the ITT effect of treatment assignment on the outcome to the ITT effect of treatment assignment on the treatment received under the exclusion restriction, monotonicity, and no partial-compliers assumptions. We discuss violation of the identification assumptions and derive the corresponding bias formulas. Simulations with various degrees of assumption violations are conducted to evaluate the performance and sensitivity of the approach. The results show that the nonparametric approach can yield an unbiased estimator for CACE when sample size is 500 or above and the percentage of compliers is above or equal to 70%. In addition, increasing the number of compliers has the potential to reduce the bias to as close as zero. Second, we propose a multisite design approach which identifies the CACE under the zero correlation assumption. We derive the bias formula when measurement errors and omitted variables exist. Simulations across various scenarios are conducted for the oracle, naïve, and bootstrap estimators. The results show that multisite design approach can yield an unbiased estimator if there are no measurement errors and omitted variables. Increasing the number of people in each site can reduce the bias because it reduces the variation of the measurement errors. Increasing the number of sites, on the other hand, does not make a significant impact on the bias. We apply the two proposed approaches to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) data to identify the causal effects of the medication augmentation and the causal effects of the medication switching. We show that augmenting citalopram with sustained- release bupropion is no better than augmenting citalopram with buspiron in terms of remission but has a higher response rate. Augmenting citalopram with sustained-release bupropion also reduces the 17-item Hamilton Rating Scale for Depression (HAM-D17) score greater than augmenting citalopram with buspiron. We also show that switching to extended-release venlafaxine has a slightly better performance compared switching to sustained-release bupropion in terms of remission, HAM-D17 scores at the end of the study, and reduction of HAM-D17 scores. Copyright by HANYUE LI 2021 To Mom and Dad. Thank you for everything. I love you to the moon and back. v ACKNOWLEDGEMENTS A journey is approaching to the destination. It is a journey filled with adventures, challenges, confusions, and tears. It is also a journey full of gains, reconciliations, appreciations, and happiness. At the end of the journey, I would like to express my sincere gratitude to my advisor and dissertation committee chair, Dr. Zhehui Luo. Dr. Luo guides me throughout this dissertation as well as my Ph.D. life with her endless knowledge, thoughtful insights and unlimited patience. This dissertation will not be completed without her support and advice. I would like to thank my other committee members: Dr. Ling Wang, Dr. Ahnalee Brincks, and Dr. Joseph Gardiner for their kindness, encouragement, and the invaluable suggestions and comments they give. I would like to thank the staff and faculty in my department who create such a friendly environment for us. I would like to thank all the friends I’ve met along this journey. Lastly, thank you my husband, Shitong Gu, for always staying by my side. Thank you my mom and dad, Yun Lou and Wengang Li, for loving me and supporting me unconditionally all the time. Thank you Audrey Gu, for being my daughter. Mommy loves you, always has, always will. Thank you, beautiful journey. vi TABLE OF CONTENTS LIST OF TABLES ......................................................................................................................... ix LIST OF FIGURES ...................................................................................................................... xii CHAPTER 1 INTRODUCTION AND AIMS ............................................................................... 1 CHAPTER 2 BACKGROUND ...................................................................................................... 4 2.1 Neyman-Rubin Causal Model ............................................................................................... 4 2.2 Principal Stratification Framework ....................................................................................... 6 2.3 Identification of Causal Estimands in RCTs with Treatment Nonadherence ....................... 7 CHAPTER 3 A NONPARAMETRIC APPROACH .................................................................... 12 3.1 Study Setup and Causal Estimands ..................................................................................... 12 3.2 Identification Strategies of the Causal Estimand ................................................................ 14 3.3 Sensitivity of the Causal Estimand to Key Assumptions .................................................... 18 3.3.1 Violation of Assumption I (Exclusion Restriction) ...................................................... 19 3.3.2 Violation of Assumption II (Monotonicity) ................................................................. 20 3.3.2.1 Violation of Assumption II.a (No Irrationalists).................................................... 21 3.3.2.2 Violation of Assumption II.b (No Flip-Floppers) .................................................. 23 3.3.3 Violation of Assumption III (No Partial-Complier) ..................................................... 24 3.4 Simulation Studies............................................................................................................... 26 3.4.1 Simulation Setup........................................................................................................... 26 3.4.2 Performance Metrics..................................................................................................... 30 3.4.3 Results .......................................................................................................................... 31 3.4.3.1 Performance Analysis ............................................................................................ 31 3.4.3.2 Sensitivity Analysis ............................................................................................... 33 CHAPTER 4 A MULTISITE DESIGN APPROACH ................................................................. 39 4.1 Identification Strategies of the Causal Estimand ................................................................ 40 4.2 Estimation of the Causal Estimand ..................................................................................... 44 4.3 The Problems of Omitted Variables and Measurement Errors ........................................... 46 4.4 Simulation Studies............................................................................................................... 49 4.4.1 Simulation Setup........................................................................................................... 49 4.4.2 Results .......................................................................................................................... 53 4.4.2.1 Performance Analysis ............................................................................................ 53 4.4.2.2 Sensitivity Analysis ............................................................................................... 59 CHAPTER 5 APPLICATION ...................................................................................................... 66 5.1 Medication Augmentation: CIT+BUP vs. CIT+BUS ........................................................ 69 5.1.1 Assessment of Assumptions ......................................................................................... 72 5.1.2 Analysis ........................................................................................................................ 79 5.1.3 Results .......................................................................................................................... 82 vii 5.2 Medication Switching: BUP vs. VEN ................................................................................. 85 5.2.1 Assessment of Assumptions ......................................................................................... 88 5.2.2 Results .......................................................................................................................... 93 5.3 Discussion ........................................................................................................................... 96 CHAPTER 6 DISCUSSION AND FUTURE WORK ............................................................... 100 APPENDICES ............................................................................................................................ 103 APPENDIX A Results of the Binary Outcomes for the Nonparametric Approach ............... 104 APPENDIX B Results of the Binary Outcomes for the Multisite Design Approach ............ 109 BIBLIOGRAPHY ....................................................................................................................... 115 viii LIST OF TABLES Table 2.1 Principal strata classified by A(0) and A(1) for placebo-controlled RCTs. ................... 7 Table 3.1 Principal strata classified by A(1) and A(2) for RCTs with two active treatments…......12 Table 3.2 Principal strata classified by A(1) and A(2) for RCTs with two active treatments under Assumption II, the monotonicity assumption, and Assumption III, the no partial-compliers assumption. ................................................................................................................................... 17 Table 3.3 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption II.a, the no irrationalists assumption, is violated. ..................................................... 21 Table 3.4 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption II.b, the no flip-floppers assumption, is violated. ..................................................... 23 Table 3.5 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption III, the no partial-compliers assumption, is violated. ............................................... 25 Table 3.6 Summary of the parameter values used to generate the potential outcomes for the nonparametric approach. Scenario A is when all assumptions are satisfied. Scenario B is when Assumption I, the exclusion restriction assumption, is violated. Scenario C is when Assumption II.a, the no irrationalists assumption, is violated. Scenario D is when Assumption II.b, the no flip- floppers assumption, is violated. Scenario E is when Assumption III, the no partial-compliers assumption, is violated. ................................................................................................................. 29 Table 4.1 Principal strata classified by A(1) and A(2) for RCTs with two active treatments under Assumption II, the monotonicity assumption, and Assumption IV, the no never-takers assumption. ....................................................................................................................................................... 41 Table 4.2 Summary of the parameter values used to generate the potential outcomes for the multisite design approach. ............................................................................................................ 52 Table 4.3 Sensitivity of the multisite design estimator across the nine scenarios based on number of sites and site sizes for the continuous outcome when site size is fixed if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. ................................................................ 60 ix Table 4.4 Sensitivity of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites for the continuous outcome when site size is varied if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. .......... 62 Table 5.1 Baseline and level 2 entry characteristics of the patients in the medication augmentation strategy. ......................................................................................................................................... 70 Table 5.2 Observed treatment adherence in the medication augmentation strategy..................... 72 Table 5.3 Side effects in level 2 in the medication augmentation strategy................................... 74 Table 5.4 Population structure when Assumptions I (exclusion restriction), Assumption II (monotonicity), and Assumption III (no partial-compliers) are satisfied for the nonparametric approach. ....................................................................................................................................... 75 Table 5.5 Population structure when Assumptions I (exclusion restriction), Assumption II (monotonicity), and IV (no never-takers) are satisfied for the multisite design approach. .......... 76 Table 5.6 Reasons for missing HAM-D17 scores in the medication augmentation strategy......... 80 Table 5.7 Baseline and level 2 entry characteristics of the patients in the medication switching strategy. ......................................................................................................................................... 86 Table 5.8 Observed treatment adherence in the medication switching strategy. .......................... 88 Table 5.9 Side effects in level 2 in the medication switching strategy. ........................................ 90 Table 5.10 Reasons for missing HAM-D17 scores in the medication switching strategy. ............ 93 Table 5.11 Population structure when Assumption III (no partial-compliers) is violated for the nonparametric approach in the medication augmentation strategy: situation 1. ........................... 97 Table 5.12 Population structure when Assumption III (no partial-compliers) is violated for the nonparametric approach in the medication augmentation strategy: situation 2. ........................... 97 x Table B.1 Sensitivity of the multisite design estimator across the nine scenarios based on number of sites and site sizes for the binary outcome when site size is fixed if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. .............................................................. 111 Table B.2 Sensitivity of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites for the binary outcome when site size is varied if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. ........ 113 xi LIST OF FIGURES Figure 2.1 Causal diagram for RCTs with binary actions and binary compliance status. 𝑍 is the randomly assigned treatment. 𝐴 is the actually taken treatment. 𝑌 is the observed outcome. 𝑈 represents the confounding variables that are either observed or unobserved and affect both 𝐴 and 𝑌. ..................................................................................................................................................... 4 Figure 3.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the nonparametric estimator across proportions of compliers for the continuous outcome when all assumptions are satisfied (scenario A). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. . 32 Figure 3.2 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of compliers for the continuous outcome when Assumption I, the exclusion restriction assumption, is violated (scenario B). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. ............................ 35 Figure 3.3 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of irrationalists for the continuous outcome when Assumption II.a, the no irrationalists assumption, is violated (scenario C). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. ............................ 36 Figure 3.4 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of flip- floppers for the continuous outcome when Assumption II.b, the no flip-floppers assumption, is violated (scenario D). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. ............................ 37 Figure 3.5 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of partial- compliers for the continuous outcome when Assumption III, the no partial-compliers assumption, is violated (scenario E). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. ............................ 38 Figure 4.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the nine scenarios based on number of sites 𝑠 and site sizes 𝑛 for the continuous outcome when site size is fixed. For example, label “s=50, n=25” indicates that the scenario has 50 sites and 25 individuals in each site. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal xii strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. ...................................................................................................................... 55 Figure 4.2 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites 𝑠 for the continuous outcome when site size is varied. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. .................................... 58 Figure 5.1 Linear regressions of site-specific estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , 𝜋̂6|𝑘 on the site-level aggregated covariates in the medication augmentation strategy. .............. 78 Figure 5.2 Estimated complier average causal effects (CACEs) and their 95% confidence intervals for the primary endpoint and secondary endpoints in the medication augmentation strategy. In each panel, the horizontal solid line represents the null. The dashed vertical line separates the nonparametric estimates on the left-hand side from the multisite design estimates on the right-hand side. The blue square represents main analysis. The green circle represents sensitivity analysis 1. The red triangle represents sensitivity analysis 2. The orange diamond represents sensitivity analysis 3. ...................................................................................................................................... 83 Figure 5.3 Linear regressions of site-specific estimated principal strata proportions π ̂3|k , π ̂4|k , π ̂6|k on the site-level aggregated covariates in the medication switching strategy. ..................... 92 Figure 5.4 Estimated complier average causal effects (CACEs) and their 95% confidence intervals for the primary endpoint and secondary endpoints in the medication switching strategy. In each panel, the horizontal solid line represents the null. The dashed vertical line separates the nonparametric estimates on the left-hand side from the multisite design estimates on the right-hand side. The blue square represents main analysis. The green circle represents sensitivity analysis 1. The red triangle represents sensitivity analysis 2. The orange diamond represents sensitivity analysis 3. ...................................................................................................................................... 95 Figure A.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the nonparametric estimator across proportions of compliers for the binary outcome when all assumptions are satisfied (scenario A). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 104 Figure A.2 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of compliers for the binary outcome when Assumption I, the exclusion restriction assumption, is xiii violated (scenario B). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. .......................... 105 Figure A.3 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of irrationalists for the binary outcome when Assumption II.a, the no irrationalists assumption, is violated (scenario C). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. .......................... 106 Figure A.4 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of flip- floppers for the binary outcome when Assumption II.b, the no flip-floppers assumption, is violated (scenario D). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. ...................................... 107 Figure A.5 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of partial- compliers for the continuous outcome when Assumption III, the no partial-compliers assumption, is violated (scenario E). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. .......................... 108 Figure B.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the nine scenarios based on number of sites 𝑠 and site sizes 𝑛 for the binary outcome when site size is fixed. For example, label “s=50, n=25” indicates that the scenario has 50 sites and 25 individuals in each site. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. .................................................................................................................................... 109 Figure B.2 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites 𝑠 for the binary outcome when site size is varied. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. ................................................................... 110 xiv CHAPTER 1 INTRODUCTION AND AIMS Different estimands correspond to different types of treatment effects and can address fundamentally different research questions. In practice, however, the choices of estimands are often implied through subsequent analyses, leading to a consequence that the estimands misalign with the study objectives (Rubin, 2005). The same issue arises in clinical trials with post- randomization events. Clinical trials, usually randomized controlled trials (RCTs), are the foundation to provide valid estimates of the causal effects because randomization balances the observed and unobserved baseline confounding between treatment groups. To estimate the causal effects of treatments, the top-choice estimand is always the intention-to-treat (ITT) estimand. Nevertheless, with the existence of the post-randomization events such as loss to follow up or treatment nonadherence, the gold-standard ITT estimand is challenged as it merely compares the treatment assignments instead of the actual treatments taken, which loses clinical relevance and conflicts with the research question. In 2010, the National Research Council (NRC) published a report on “Prevention and Treatment of Missing Data in Clinical Trials”, emphasizing the importance of clarifying estimands before trial design and analyses with a focus on missing data (National Research Council, 2010). Last year, beyond the discussion of missingness, the International Council for Harmonisation (ICH) finalized the draft addendum to its E9 guideline on “Estimands and Sensitivity Analysis in Clinical Trials”, further focusing on structuring a framework to define the estimand and clarifying the treatment effects of interest (ICH E9 working group, 2020). It is essential to have a clear understanding and description of the estimands and the causal effects of interest. Alternative 1 estimands besides ITT estimand, aiming at quantifying the causal effects of treatments, need to be considered for causal inference when post-randomization events exist. In this dissertation, we will explore and clarify the potential causal estimands that can be defined in clinical trials with two active treatments when there exists treatment nonadherence and develop the identification strategies to estimate the corresponding causal effects. Our study is motivated by the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study which aims to evaluate the comparative effectiveness of various treatment options for participants with nonpsychotic major depressive disorder (MDD) who fail to have a satisfactory response to the first-line medication citalopram (Rush et al., 2004). Rush et al., (2006) compared treatment switching options: sustained-release bupropion (bupropion-SR), sertraline, or extended-release venlafaxine (venlafaxine-XR) and found insignificant differences among these treatments with respect to remission or response rates. Trivedi et al. (2006) and Bech et al. (2012) showed a slightly better performance of bupropion-SR than buspirone when comparing treatment augmentation options. These studies focused on the ITT estimand among two or more active treatments. Yet questions pertaining to nonadherence with treatments and what causal effects can be identified for which population remain to be discussed. Nonadherence has been well recognized in the placebo-controlled trials and identification of the causal effects has been widely discussed (Angrist et al., 1996; Balke & Pearl, 1997; Imbens & Angrist, 1994; Imbens & Manski, 2004; Jiang et al., 2016; Little et al., 2009; Miratrix et al., 2018; Yuan et al., 2018). The definition and identification becomes more complicated if the comparison occurs between two active treatments rather than between a treatment and a placebo. Roy et al. (2008) and Long et al. (2010) identified the CACE for trials with more than one active treatment. However, their identification was built on parametric models. Yuan et al. (2018) utilized 2 the nature of multisite design which automatically creates multiple instruments from the site-by- treatment interactions, yet their study was essentially a placebo-controlled trial. We extend the methods by Angrist et al. (1996) and Yuan et al. (2018) to our two-active-treatments setting and define CACE without assuming any parametric models. Specifically, we will point-identify our causal estimand of interest rather than partial-identification using bounds (Swanson et al., 2018). We will work under the potential outcome framework (Rubin, 1974; Splawa-Neyman et al., 1990) and the principal stratification framework (Frangakis & Rubin, 2002). Our specific aims are: Aim 1: Define and identify the causal estimand and develop the identification strategies for randomized controlled trials with two active treatments subject to treatment nonadherence. Specific aim 1.1: Define the identification strategies to estimate the corresponding causal effect through a nonparametric approach. Specific aim 1.2: Define the identification strategies to estimate the corresponding causal effect via a multisite design approach. Aim 2: Evaluate the performance of the estimand and assess its sensitivity when there are deviations from key identification assumptions through simulation studies. Specific aim 2.1: Evaluate the performance and sensitivity of the estimand in the nonparametric approach when the exclusion restriction assumption or the two structural assumptions are violated through simulation studies. Specific aim 2.2: Evaluate the performance and sensitivity of the estimand in the multisite design approach when the zero-correlation assumption is violated through simulation studies. Aim 3: Apply the two proposed approaches to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial to identify the causal effects of two medication augmentation options and the causal effects of two medication switching options. 3 CHAPTER 2 BACKGROUND 2.1 Neyman-Rubin Causal Model We work under the potential outcome framework, which is also known as the Neyman-Rubin Causal Model. For a unit, the causal effect of an action relative to the other is the difference between the outcome had the unit been assigned to one action and the outcome had the unit been assigned to the other action (Rubin, 1974). Each action of the unit corresponds to one potential outcome. However, only the potential outcome associated with the action taken is actually observed, whereas the other is not realized (Holland, 1986; Imbens & Rubin, 2015). Accordingly, we can never observe the causal effect of one action over the other at the unit level. Figure 2.1 Causal diagram for RCTs with binary actions and binary compliance status. 𝑍 is the randomly assigned treatment. 𝐴 is the actually taken treatment. 𝑌 is the observed outcome. 𝑈 represents the confounding variables that are either observed or unobserved and affect both 𝐴 and 𝑌. Formally, consider a placebo-controlled RCT with binary actions and binary compliance status whose causal diagram is shown in Figure 2.1. For unit 𝑖, let 𝑍𝑖 be the treatment randomly assigned, 𝐴𝑖 be the treatment actually taken, 𝑌𝑖 be the observed outcome, and 𝑈𝑖 be the confounding variables that are either observed or unobserved and affect both 𝐴𝑖 and 𝑌𝑖 . We denote 𝑌𝑖 (𝑎) as the potential outcome if the unit takes treatment 𝑎 where 𝑎 = 1 for treatment and 𝑎 = 0 4 for placebo. Similarly, we denote 𝐴𝑖 (𝑧) as the potential treatment taken if the unit is assigned to treatment 𝑧, for 𝑧 𝜖 {0, 1}. Each unit has two potential outcomes 𝑌𝑖 (1) and 𝑌𝑖 (0), but only one of them can be observed. The individual causal effect of treatment is defined as 𝑌𝑖 (1) − 𝑌𝑖 (0), which is generally impossible to obtain (Hernán & Robins, 2019). As we aim to identify the causal effects for a population of interest, we must rely on multiple units to estimate the average causal effects. We define the average causal effects as the difference in average potential outcomes between units taking one treatment and the same units had they taken the other for a certain population, e.g., 𝐸{𝑌(1) − 𝑌(0)}. To make causal inferences and conclude causal relationships from samples with observed variables, we must estimate the outcomes that are not factual. Some general assumptions are needed under the potential outcome framework.  SUTVA (Stable Unit Treatment Value Assumption): The potential outcomes of one unit are not affected by the treatment status of other units and there is only one form or version of each treatment.  Consistency: The potential outcome that corresponds to the treatment taken is actually the observed outcome: 𝑌𝑖 = 𝐴𝑖 𝑌𝑖 (1) + (1 − 𝐴𝑖 )𝑌𝑖 (0)  Exchangeability: The potential outcomes are independent of the treatment status given all confounding variables: (𝑌𝑖 (1), 𝑌𝑖 (0)) ⊥ 𝐴𝑖 |𝑈𝑖  Positivity: The conditional probability of taking different treatments given the confounding variables must be greater than zero and less than one: 0 < Pr(𝐴𝑖 = 𝑎|𝑈𝑖 ) < 1, 𝑤ℎ𝑒𝑟𝑒 𝑎 = 0,1 5 Nonetheless, assumptions are not facts and they can be false. Sometimes we are not even able to test some assumptions from the knowledge at hand (Imbens & Rubin, 2015). If inferences are based on false assumptions, we cannot expect valid estimates. 2.2 Principal Stratification Framework To identify and estimate the average causal effects in a certain population, we must ensure the comparison groups represent the same population. Instead of adjusting for the pretreatment variables, Balke and Pearl (1994) introduced the concept of “response variable” that substitutes the multi-dimensional pretreatment variables and divides the population into equivalent classes so that treatment groups within the same class are comparable. Frangakis and Rubin (2002) conceptualized the term “equivalence classes” as “principal strata” and proposed the principal stratification framework particularly targeting at post-treatment variables. Within their framework, units are cross-classified into different strata “based on their joint potential values of the post- treatment variable under each of the treatments being compared” without introducing post- treatment selection bias. The treatment effects within each stratum are thus indeed causal effects. Applications using principal stratification abound in the literature. Consensus has been reached on the success of this framework for nonadherence issues in RCTs (Pearl, 2011; VanderWeele, 2011). For a placebo-controlled RCT with binary treatments and binary adherence, using notations in section 2.1, the population can be partitioned into four strata based on the pair of potential treatment taken under different treatment assignment, (𝐴𝑖 (1), 𝐴𝑖 (0)), as shown in Table 2.1. 6 Table 2.1 Principal strata classified by A(0) and A(1) for placebo-controlled RCTs. Z=0 A(0)=0 A(0)=1 A(1)=0 Never-taker Defier Z=1 A(1)=1 Complier Always-taker For those that never take the treatment whichever group they are assigned, i.e., (𝐴𝑖 (1) = 0, 𝐴𝑖 (0) = 0), they are defined as never-takers. Those that always take the treatment whichever group they are assigned, i.e., (𝐴𝑖 (1) = 1, 𝐴𝑖 (0) = 1), are referred to as always-takers. Compliers are defined as those who take the treatment if randomized to treatment and take the placebo if randomized to placebo (𝐴𝑖 (1) = 1, 𝐴𝑖 (0) = 0), and defiers are those always do the opposite to what they are assigned (𝐴𝑖 (1) = 0, 𝐴𝑖 (0) = 1). The subpopulations of never-takers, always-takers and defiers are referred to as noncompliers. After the classification, we can easily define any causal effect within a specific stratum. For example, the treatment effect in the complier stratum is called local average treatment effect (LATE) or complier average causal effect (CACE). As the principal strata are defined by the potential outcomes of the post-treatment variable and we can never observe all the potential outcomes simultaneously, it is impossible for us to know exactly which stratum a unit belongs to. Nonetheless, it is still critical to use principal stratification to define causal effects because it helps us clarify which causal quantity we can identify and it is the foundation for subsequent estimation and interpretation. 2.3 Identification of Causal Estimands in RCTs with Treatment Nonadherence An estimand is “the target of estimation for a particular trial objective”. It asks the question “what is to be estimated?” (ICH E9 working group, 2020). Thus, an estimand can be of any form even 7 without a practical interpretation as long as it aligns with the scientific goals. In addition to corresponding to the research objectives, a causal estimand should also provide a reasonable causal interpretation. A fundamental attribute of a causal estimand, especially when the interest lies in estimating causal effects, is that it targets the same group of units (Frangakis & Rubin, 2002). Researchers and scientists often use terms that involve causal implication such as “causation” or “causality” with extreme caution as it is usually easy to confuse “association” which can be an estimand, with “causation” which must be a causal estimand if it is the study objective. In a RCT, the ITT estimand is a causal estimand because randomization ensures the comparison groups represent the same population. However, with the existence of treatment nonadherence, the ITT estimand fails to answer the correct research question if the causal effect of treatment rather than of assignment is of interest. The traditional per-protocol estimand and as- treated estimand, on the other hand, take into account treatment nonadherence. By definition, the traditional per-protocol estimand only includes those who are “observed to follow their treatment assignment” and the as-treated estimand ignores treatment assignment and directly compares the actual treatments received (Shrier et al., 2014). These two estimands describe some particular treatment effects. However, we are hesitant to conclude they are causal estimands because we have limited information to envision what the participants in one group would have done if they had been in the other group. Specifically, since participants are no longer randomized, the treatment groups are not guaranteed to be comparable for the per-protocol estimand and as-treated estimand. Hence, these two estimands will not be included in our subsequent discussion. We consider the simplest way to measure compliance in a RCT which is to dichotomize it as whether participants comply with their assigned treatment or not. For a placebo-controlled RCT with binary treatments and binary compliance, Angrist et al. (1996) point-identified the CACE 8 using instrumental variables (IV) without assuming any parametric models. Their identification was based on two critical assumptions: the exclusion restriction (ER) assumption and the monotonicity assumption. The ER assumption assumes that there is no direct effect of treatment assignment 𝑍 on outcome 𝑌. In other words, the effect of 𝑍 on 𝑌 can only go through the effect of treatment 𝐴 on 𝑌. The monotonicity assumption requires that assignment 𝑍 influences treatment 𝐴 monotonically, i.e., there are no defiers (Imbens & Angrist, 1994). Under these two assumptions, Angrist et al. (1996) identified the CACE as the ratio of the ITT causal effect of 𝑍 on 𝑌 and the ITT causal effect of 𝑍 on 𝐴. Using the same logic of partitioning the population into four principal strata (Table 2.1), Imbens and Rubin (1997) proposed a Bayesian approach to infer the causal estimands of CACE and defier average causal effect (DACE) using EM and data augmentation algorithms for posterior maximum likelihood estimation. Their approach can relax the ER and monotonicity assumptions from the nonparametric IV approach. However, Imbens and Rubin (1997) also found that estimation for CACE can be more accurate compared to the nonparametric method if under these two assumptions. Both Little and Yau (1998) and Little et al. (2009) compared the traditional IV approach and the maximum likelihood (ML) estimation for CACE and argued that the model-based estimator was more efficient than the IV estimator, but was sensitive to model specification and required large sample size. The identification and estimation of CACE has also been extensively studied in the presence of covariates. With covariates, some assumptions for identifying CACE can be satisfied, or removed. Hirano et al. (2000) extended the work of Imbens & Rubin (1997) by incorporating the covariates. They formulated separate ER assumptions for subgroups given covariates and examined the sensitivity to violations of different ERs. They claimed a weakly identified model without ERs which needed a more careful selection of likelihood function and prior distribution. 9 Instead of relying on choosing proper priors and likelihood functions, Jo (2002) explored the identifiability of CACE by relying on limited covariate information. Specifically, Jo (2002) modified the Bayesian approach of Imbens & Rubin (1997) by relaxing the ER assumption and adding two additional functional assumptions which were believed to be more reasonable and applicable in practice. Frangakis et al. (2002) expanded the identification problem associated with noncompliance to clustered data and proposed a Bayesian framework for the clustered encouragement design. Further, they used covariates to relax the ER assumption and predicted the principal strata defined by the potential compliance. Hitherto, the discussion only involved placebo-controlled RCTs. Attention has been paid to RCTs with nonadherence beyond placebo-controlled RCTs as well. Roy et al. (2008) considered the identification of CACE in the setting with two active treatments. Without covariate information, they partially identified CACE by setting bounds under two more assumptions, one of which is the treatment access restriction assumption which disallows subjects in one treatment group to access the treatment in the other group. The other one is the modified monotonicity assumption which assumes that the probability of compliance of one treatment assignment group is higher for those who would comply to the other treatment assignment compared to those who would not. Based on the covariates, however, Roy et al. (2008) were able to model the marginal distributions of the compliance status and point-identify the CACE. Long et al. (2010) proposed a Bayesian approach following Imbens & Rubin (1997) for the RCTs with three treatment arms. Among the treatment arms, two are active treatments and the third one is the control arm. To reduce the number of principal strata, two extra assumptions were made as well. One of the assumptions is no access to the other treatment assumption as in Roy et al. (2008). The other one is a modified monotonicity assumption which assumes that subjects who comply with one 10 treatment would always comply with the other treatment. Under these assumptions, Long et al. (2010) modeled the principal compliance directly without incorporating covariates. Yuan et al. (2018) proposed a novel way to identify certain principal causal effects utilizing the features of a multisite design. They separated their control into two different types to identify their estimands of interest. Similarly, they imposed two structural assumptions to reduce the number of principal strata. Then they made a key assumption about the site-level zero correlation between the distribution of the principal strata and the principal causal effects. Under this strong assumption, they proved that the CACE can be identified. Yuan et al. (2018) also investigated the use of covariates to relax the above strong assumption and obtained a consistent estimator of CACE by conditioning on the covariates. 11 CHAPTER 3 A NONPARAMETRIC APPROACH 3.1 Study Setup and Causal Estimands In this section, we first describe our study setup, that is, a RCT with two active treatments subject to nonadherence, under the Neyman-Rubin potential outcome and principal stratification frameworks. We extend notations introduced in section 2.1. Here, the treatment assignment 𝑍𝑖 takes values 1 and 2, with 1 for treatment 1 and 2 for treatment 2. There are therefore three possible values for the potential treatment received 𝐴𝑖 (𝑧): 0, 1, and 2. Thus, given the three values of 𝐴𝑖 (𝑧) under each treatment, we can define nine principal strata as shown in the 3×3 Table 3.1. Table 3.1 Principal strata classified by A(1) and A(2) for RCTs with two active treatments. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker 2. Defier 3. Partial-2-complier A(1)=0 (NT) (DF) (P2C) π1 π2 π3 4. Partial-1-complier 5. Always-1-taker 6. Complier Z=1 A(1)=1 (P1C) (A1T) (COMP) π4 π5 π6 7. Defier 8. Defier 9. Always-2-taker A(1)=2 (DF) (DF) (A2T) π7 π8 π9 We number the table cells row by row and denote their corresponding probabilities as 𝜋𝑔 , 𝑔 = 1, 2, … , 9. Analogous to the 2×2 Table 2.1, we define never-takers (NT) and compliers (COMP) in the same way as in section 2.2. The always-takers for the placebo-controlled RCTs have now been separated into the always-1-takers (A1T) who always take treatment 1 regardless of which treatment they are assigned, and the always-2-takers (A2T) who always take treatment 2 12 regardless of which treatment they are assigned. We name those who comply with only one treatment but do not take anything if assigned to the other treatment as partial-compliers (PC). Among the partial compliers, we further differentiate those who only comply with treatment 1 but take nothing if assigned to treatment 2 as partial-1-compliers (P1C) and those vice versa as partial- 2-compliers (P2C). Finally, we call the subpopulations in the three remaining strata defiers (DF) as they do not show any systematically consistent preference but show contradictory actions to their own decisions. For example, for those with (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 0), if assigned to treatment 1 they choose to take treatment 2, but if assigned to treatment 2 they choose to taking nothing. It seems that they ordered their preference as nothing>treatment 2>treatment 1. However, if this is the case, they would take nothing if assigned to treatment 1 at the beginning. We further subdivide defiers into two categories. Based on their similarity in decision making for the treatment received, for those with (𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 1) or (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 0), they are named as irrationalists (IR). For those with (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 1), they are flip-floppers (FF). Under the above setting, we next define our causal estimand of interest. For individual 𝑖, let 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) be the potential outcome if the individual is randomly assigned to treatment 𝑧 with 𝑧 = 1 or 2. This two-index potential outcome can be simplified to the one-index potential outcome 𝑌𝑖 (𝑧) as they both depend solely on the values of 𝑧. However, the notation 𝑌𝑖 (𝑧) might be confused with our previously defined potential outcome of treatment received 𝑌𝑖 (𝑎). Thus, we will use 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) to avoid confusion in subsequent sections and the one-index potential outcome will always represent the potential outcome under different treatment values 𝑎. It is worth mentioning that both 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) and 𝑌𝑖 (𝑎) depends merely on the values of one variable, which is 𝑧 or 𝑎. There is another potential outcome 𝑌𝑖 (𝑧, 𝑎) that depends on both values of 𝑧 and 𝑎. For example, 13 if the treatment assignment has a direct effect on the outcome, both variables 𝑍𝑖 and 𝐴𝑖 will have an impact on 𝑌𝑖 and the corresponding potential outcome will be 𝑌𝑖 (𝑧, 𝑎). We define the ITT effect comparing treatment assignment 2 with treatment assignment 1 as 𝑌𝑖 (2, 𝐴𝑖 (2)) − 𝑌𝑖 (1, 𝐴𝑖 (1)) at the individual level and as 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2)) − 𝑌𝑖 (1, 𝐴𝑖 (1))] at the population level. With nonadherence in our setting and aiming at identifying the causal effects of treatment received, the ITT estimand is clearly not the one we pursue. The average treatment effect (ATE) estimand compares the potential outcomes of treatment 2 and treatment 1, i.e., 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)] for the entire population. However, ATE is not readily identified without the strong constant effect assumption which assumes the causal effects are the same across individuals. Alternatively, we identify the average treatment effect for a specific principal stratum, the compliers, that is, we identify the average treatment effect for those who will always comply to the treatments to which they are randomly assigned. Thus, in our study setting, we aim at identifying the complier average causal effect, CACE. 3.2 Identification Strategies of the Causal Estimand In section 2.1, we reviewed several general assumptions to identify the causal effects. In this section, we carefully examine extensions of these assumptions to our study setting. First, we modify the consistency assumption that connects the observed variables with the potential variables. As we assign values 1 and 2 to our two treatments, the relationship between the observed 𝐴𝑖 and the potential 𝐴𝑖 (𝑧) is then 𝐴𝑖 = 𝐴𝑖 (1)(2 − 𝑍𝑖 ) + 𝐴𝑖 (2)(𝑍𝑖 − 1) . Similarly, the relationship between the observed 𝑌𝑖 and the potential 𝑌𝑖 (𝑎) becomes 𝑌𝑖 = 𝐴𝑖 (2 − 𝐴𝑖 )𝑌𝑖 (1) + 1 1 𝐴𝑖 (𝐴𝑖 − 1)𝑌𝑖 (2) + 2 (𝐴𝑖 − 1)(𝐴𝑖 − 2)𝑌𝑖 (0). Second, we modify the exchangeability assumption 2 to include the three potential outcomes, that is, (𝑌𝑖 (0), 𝑌𝑖 (1), 𝑌𝑖 (2)) ⊥ 𝐴𝑖 | 𝑈𝑖 . To find the 14 identification strategies for our causal estimand of interest, we make some additional assumptions as follows. Assumption I (Exclusion restriction, ER): There is no direct effect from 𝑍 to 𝑌, i.e., the effect of 𝑍 on 𝑌 can only go through the effect of 𝐴 on 𝑌, or 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) = 𝑌𝑖 (𝐴𝑖 (𝑧)) for 𝑧 = 1 or 2. Assumption II (Monotonicity or No defiers): There are no individuals with (𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 1), (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 0), or (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 2), i.e., π2 = π7 = π8 = 0. Assumption II.a (No irrationalists): There are no individuals with (𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 1) or (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 0), i.e., π2 = π7 = 0. Assumption II.b (No flip-floppers): There are no individuals with (𝐴𝑖 (1) = 2, 𝐴𝑖 (2) = 1), i.e., π8 = 0. The monotonicity assumption is plausible as we assume that when participants consent to enter a trial, it is less likely for them to completely disobey the treatment implementation procedure without showing any reasonable logic. We also separate the assumption into two sub-assumptions for our sensitivity discussion later. Assumption III (No partial-compliers): There are no individuals with (𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 2) or (𝐴𝑖 (1) = 1, 𝐴𝑖 (2) = 0), i.e., π3 = π4 = 0. This assumption states that if participants choose to comply with one treatment, it is implausible that they choose to take nothing when assigned to the other treatment. It is more specific to the study environment. There is a possibility that it can be violated and it is not straightforward to justify it. We will discuss deviation from this assumption in section 3.3.3. With the above assumptions, we now formulate the causal effect of 𝐴 on 𝑌. First, under the SUTVA, consistency and exclusion restriction assumptions, we establish a relationship 15 between the ITT effect of 𝑍 on 𝑌, i.e., 𝑌𝑖 (2, 𝐴𝑖 (2)) − 𝑌𝑖 (1, 𝐴𝑖 (1)) and the causal effect of 𝐴 on 𝑌 at the individual level. That is, 𝑌𝑖 (2, 𝐴𝑖 (2)) – 𝑌𝑖 (1, 𝐴𝑖 (1)) = 𝑌𝑖 (𝐴𝑖 (2)) – 𝑌𝑖 (𝐴𝑖 (1)) 1 1 = [𝐴𝑖 (2)(2 − 𝐴𝑖 (2))𝑌𝑖 (1) + 2 𝐴𝑖 (2)(𝐴𝑖 (2) − 1)𝑌𝑖 (2) + 2 (𝐴𝑖 (2) − 1)(𝐴𝑖 (2) − 2)𝑌𝑖 (0)] 1 1 − [𝐴𝑖 (1)(2 − 𝐴𝑖 (1))𝑌𝑖 (1) + 2 𝐴𝑖 (1)(𝐴𝑖 (1) − 1)𝑌𝑖 (2) + 2 (𝐴𝑖 (1) − 1)(𝐴𝑖 (1) − 2)𝑌𝑖 (0)] 1 1 = − 2 (𝐴𝑖 (2) − 𝐴𝑖 (1))(𝑌𝑖 (2) − 𝑌𝑖 (1)) + 2 (𝐴𝑖 (2) − 𝐴𝑖 (1))(𝐴𝑖 (2) + 𝐴𝑖 (1))(𝑌𝑖 (2) − 𝑌𝑖 (1)) 3 + (𝐴𝑖 (2) − 𝐴𝑖 (1))(2 − 𝐴𝑖 (2) − 𝐴𝑖 (1))(𝑌𝑖 (1) − 𝑌𝑖 (0)) (3.1) The first equality of equation (3.1) holds under Assumption I which reduces the two-index potential outcome 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) to the one-index potential outcome 𝑌𝑖 (𝐴𝑖 (𝑧)) for 𝑧 = 1 or 2. The second equality follows the consistency assumption. Next, we can write down the average ITT effect by taking expectations to both the left- and right-hand sides of equation (3.1): 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2)) – 𝑌𝑖 (1, 𝐴𝑖 (1))] 1 = − 2 𝐸[(𝐴𝑖 (2) − 𝐴𝑖 (1))(𝑌𝑖 (2) − 𝑌𝑖 (1))] 1 + 2 𝐸[(𝐴𝑖 (2) − 𝐴𝑖 (1))(𝐴𝑖 (2) + 𝐴𝑖 (1))(𝑌𝑖 (2) + 𝑌𝑖 (1))] 3 + 𝐸[(𝐴𝑖 (2) − 𝐴𝑖 (1))(2 − 𝐴𝑖 (2) − 𝐴𝑖 (1))(𝑌𝑖 (1) − 𝑌𝑖 (0))] 1 1 = [2 𝐸(𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = −1) − 2 𝐸(𝑌𝑖 (2) − 𝑌𝑖 (0)|𝐴𝑖 (2) − 𝐴𝑖 (1) = −1) 1 − 2 𝐸(𝑌𝑖 (1) − 𝑌𝑖 (0)|𝐴𝑖 (2) − 𝐴𝑖 (1) = −1)]𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = −1) + [−𝐸(𝑌(2) − 𝑌(1)|𝐴(2) − 𝐴(1) = 2) + 2𝐸(𝑌(2) − 𝑌(0)|𝐴(2) − 𝐴(1) = 2) − 𝐸(𝑌(1) − 𝑌(0)|𝐴(2) − 𝐴(1) = 2)]𝑃𝑟(𝐴(2) − 𝐴(1) = 2) 1 3 + [− 2 𝐸(𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1) + 2 𝐸(𝑌𝑖 (2) − 𝑌𝑖 (0)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1) 16 3 − 𝐸(𝑌𝑖 (1) − 𝑌𝑖 (0)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1)]𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = 1) 2 = 𝐸[𝑌𝑖 (0) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = −1] 𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = −1) + 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (0)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 2] 𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = 2) + 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1]𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = 1) = 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1]𝑃𝑟(𝐴𝑖 (2) − 𝐴𝑖 (1) = 1) (3.2) The second equality in equation (3.2) holds under Assumption II and the last equality is true under Assumption III. Thus, with the two structural assumptions: Assumption II and III, we can reduce the nine principal strata to four, as shown in Table 3.2. Notice that, the right-hand side of equation (3.2) is a product of the complier average causal effect and the probability of 𝐴𝑖 (2) − 𝐴𝑖 (1) = 1. This probability is actually the average ITT effect of 𝑍 on 𝐴, i.e., 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)], because 𝐴𝑖 (2) − 𝐴𝑖 (1) = 0 for NT, A1T and A2T. Therefore, we combine our findings into the following proposition. Table 3.2 Principal strata classified by A(1) and A(2) for RCTs with two active treatments under Assumption II, the monotonicity assumption, and Assumption III, the no partial-compliers assumption. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker A(1)=0 (NT) π1 5. Always-1-taker 6. Complier Z=1 A(1)=1 (A1T) (COMP) π5 π6 9. Always-2-taker A(1)=2 (A2T) π9 17 Proposition I: For a RCT with two active treatments subject to nonadherence, the complier average causal effect (CACE) can be identified under Assumption I, II, and III as the ratio of the ITT effect of 𝑍 on 𝑌 to the ITT effect of 𝑍 on 𝐴: 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2)) – 𝑌𝑖 (1, 𝐴𝑖 (1))] 𝐶𝐴𝐶𝐸 = 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐴𝑖 (2) − 𝐴𝑖 (1) = 1] = 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] To estimate CACE under Proposition I, we need therefore to estimate both the ITT effects of 𝑍 on 𝑌 and 𝑍 on 𝐴 . Due to randomization and consistency, the ITT effect of 𝑍 on 𝑌 , 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] will be equal to 𝐸(𝑌𝑖 |𝑍𝑖 = 2) − 𝐸(𝑌𝑖 |𝑍𝑖 = 1). Then, we can use the sample average to unbiasedly estimate 𝐸(𝑌𝑖 |𝑍𝑖 = 𝑧), 𝑧 = 1, 2, i.e., ∑𝑖𝜖{𝑖:𝑍𝑖 =2} 𝑌𝑖 ∑𝑖𝜖{𝑖:𝑍𝑖 =1} 𝑌𝑖 ̂𝑦 = 𝐼𝑇𝑇 − 𝑁2 𝑁1 where 𝑁𝑧 is the number of individuals that are assigned to treatment 𝑧 for 𝑧 = 1, 2. Similarly, the unbiased estimator for the ITT effect of 𝑍 on 𝐴 is ∑𝑖𝜖{𝑖:𝑍𝑖 =2} 𝐴𝑖 ∑𝑖𝜖{𝑖:𝑍𝑖 =1} 𝐴𝑖 ̂𝑎 = 𝐼𝑇𝑇 − 𝑁2 𝑁1 Thus, the estimated CACE is ∑𝑖𝜖{𝑖:𝑍 =2} 𝑌𝑖 ∑𝑖𝜖{𝑖:𝑍 =1} 𝑌𝑖 𝑖 − 𝑖 ̂𝑦 𝐼𝑇𝑇 ̂ = 𝐶𝐴𝐶𝐸 = 𝑁2 𝑁1 ∑𝑖𝜖{𝑖:𝑍 =2} 𝐴𝑖 ∑𝑖𝜖{𝑖:𝑍 =1} 𝐴𝑖 (3.3) ̂𝑎 𝐼𝑇𝑇 𝑖 𝑖 − 𝑁2 𝑁1 3.3 Sensitivity of the Causal Estimand to Key Assumptions As our identification strategies for CACE rely on several assumptions, in this section, we discuss how violations of key assumptions affect our causal estimand of interest. In other words, when these assumptions are violated, what causal effect will we be estimating using equation (3.3)? We consider deviations from these assumptions one at a time. 18 3.3.1 Violation of Assumption I (Exclusion Restriction) Under the ER assumption, there will be no direct effect of treatment assignments on the outcomes. However, in some situations, the randomization may have an impact on the outcomes by stimulating other paths that go directly to the outcomes without passing through the treatment received. For example, in a behavioral trial, knowing which treatment arms they are assigned may change the participants’ lifestyle, e.g., their smoking status, their workout plan, etc., and subsequently affect the health outcomes of interest. Thus, we assess the sensitivity of our causal estimand to the violation of the ER assumption while maintaining all the other assumptions. We rewrite the average ITT effect without the ER assumption as 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] = 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] 𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑁𝑇] 𝑃𝑟(𝑁𝑇) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐴1𝑇] 𝑃𝑟(𝐴1𝑇) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐴2𝑇] 𝑃𝑟(𝐴2𝑇) (3.4) We let the direct effect of 𝑍 on 𝑌 be 𝐻𝑖 = 𝑌𝑖 (2, 𝑎) − 𝑌𝑖 (1, 𝑎) and the direct effect of 𝐴 on 𝑌 be 𝐺𝑖 = 𝑌𝑖 (𝑧, 2) − 𝑌𝑖 (𝑧, 1). Then, for compliers, if we assume an additive effect of 𝑍 and 𝐴 on 𝑌 (Angrist et al., 1996), equation (3.4) can be rewritten as 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌(1, 𝐴𝑖 (1))] 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] = 𝑃𝑟(𝐶𝑂𝑀𝑃) 𝑃𝑟(𝑁𝑇) 𝑃𝑟(𝐴1𝑇) 𝑃𝑟(𝐴2𝑇) = 𝐸[𝐺 + 𝐻|𝐶𝑂𝑀𝑃] + 𝐸[𝐻|𝑁𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝐻|𝐴1𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝐻|𝐴2𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) 𝑃𝑟(𝐶𝑂𝑀𝑃) 𝑃𝑟(𝑁𝑇) = 𝐸[𝐺|𝐶𝑂𝑀𝑃] + 𝐸(𝐻|𝐶𝑂𝑀𝑃) 𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝐻|𝑁𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) 19 𝑃𝑟(𝐴1𝑇) 𝑃𝑟(𝐴2𝑇) + 𝐸[𝐻|𝐴1𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝐻|𝐴2𝑇] 𝑃𝑟(𝐶𝑂𝑀𝑃) 𝐸(𝐻) = 𝐶𝐴𝐶𝐸 + 𝑃𝑟(𝐶𝑂𝑀𝑃) (3.5) Note that the CACE in equation (3.5) is defined as follows which is different from the CACE when the ER assumption is satisfied. 𝐶𝐴𝐶𝐸 = 𝐸[𝐺|𝐶𝑂𝑀𝑃] = 𝐸[𝑌𝑖 (𝑧, 2) − 𝑌𝑖 (𝑧, 1)|𝐶𝑂𝑀𝑃] = 𝐸[𝑌𝑖 (1, 2) − 𝑌𝑖 (1, 1)] = 𝐸[𝑌𝑖 (2, 2) − 𝑌𝑖 (2, 1)] The ITT effect of 𝑍 on 𝐴 will remain equal to the probability of compliers because there is no change of the population composition when the ER assumption is violated. In other words, the population still contains the four subpopulations that are from strata 1, 5, 6, and 9. Either increasing the proportion of compliers or decreasing the direct effects of treatment assignments on the outcomes will reduce the bias, as shown in equation (3.5). 3.3.2 Violation of Assumption II (Monotonicity) Two of the assumptions (II and III) needed to identify our causal estimand of interest are structural assumptions. Deviation from these assumptions will result in a change of the population composition. In this section, we investigate the violation of the first structural assumption, i.e., monotonicity. Specifically, we will look at violation of the no irrationalists assumption (II.a) which will add principal strata 2 and 7 into our population and violation of the no flip-floppers assumption (II.b) which will add principal stratum 8 into our population. 20 3.3.2.1 Violation of Assumption II.a (No Irrationalists) As the structure of the principal strata has changed (Table 3.3), the ITT effect of 𝑍 on 𝐴 is no longer equal to the proportion of compliers. For irrationalists, we refer to those with (𝐴(1) = 0, 𝐴(2) = 1) as irrationalists 1 (IR1) and those with (𝐴(1) = 2, 𝐴(2) = 0) as irrationalists 2 (IR2) for the subsequent formulation. Table 3.3 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption II.a, the no irrationalists assumption, is violated. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker 2. Irrationalist 1 A(1)=0 (NT) (IR1) π1 π2 5. Always-1-taker 6. Complier Z=1 A(1)=1 (A1T) (COMP) π5 π6 7. Irrationalist 2 9. Always-2-taker A(1)=2 (IR2) (A2T) π7 π9 The ITT effect of 𝑍 on 𝐴 becomes 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] = (1 − 0) × 𝑃𝑟(𝐼𝑅1) + (0 − 2) × 𝑃𝑟(𝐼𝑅2) + (2 − 1) × 𝑃𝑟(𝐶𝑂𝑀𝑃) = 𝑃𝑟(𝐼𝑅1) − 2𝑃𝑟(𝐼𝑅2) + 𝑃𝑟(𝐶𝑂𝑀𝑃) = 𝑃𝑟(𝐶𝑂𝑀𝑃) − [2𝑃𝑟(𝐼𝑅2) − 𝑃𝑟(𝐼𝑅1)] Then, our nonparametric estimator becomes 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] 𝑃𝑟(𝐼𝑅1) = 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐼𝑅1] 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] 21 𝑃𝑟(𝐼𝑅2) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐼𝑅2] 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] 2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] [1 + 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)]] 𝑃𝑟(𝐼𝑅) = 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐼𝑅] 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] + 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐶𝑂𝑀𝑃] 2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] = 𝐶𝐴𝐶𝐸 𝑃𝑟(𝐼𝑅) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐼𝑅] 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] 2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] (3.6) 𝑃𝑟(𝐶𝑂𝑀𝑃)−[2𝑃𝑟(𝐼𝑅2)−𝑃𝑟(𝐼𝑅1)] The last two items of equation (3.6) consist of the bias which relates to both the principal strata proportions and the principal strata causal effects. Suppose the participants in the IR stata are symmetric, i.e., the proportions of IR1 and IR2 are the same. Then, under this symmetry in principal strata assumption for irrationalists, the bias from equation (3.6) can be simplified to 𝑃𝑟(𝐼𝑅1) 𝐵𝑖𝑎𝑠 = 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐼𝑅1) × {2𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐼𝑅] + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃]} (3.7) The bias formula in equation (3.7) explicitly shows that bias from violation of the no- irrationalists assumption will be inversely proportional to the relative proportion of compliers to irrationalists. Equation (3.7) also shows that the bias will always exist even if the ITT effect of irrationalists is zero as long as there are irrationalists in the population unless the ITT effect of irrationalists is exactly half of the opposite of the complier average causal effect (CACE). 22 3.3.2.2 Violation of Assumption II.b (No Flip-Floppers) The principal strata of the population after violating the no flip-floppers assumption are shown in Table 3.4. Similar as above, the ITT effect of 𝑍 on 𝐴 is 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] = (2 − 1) × 𝑃𝑟(𝐶𝑂𝑀𝑃) + (1 − 2) × 𝑃𝑟(𝐹𝐹) = 𝑃𝑟(𝐶𝑂𝑀𝑃) − 𝑃𝑟(𝐹𝐹) Thus, the nonparametric estimator becomes 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] 𝑃𝑟(𝐶𝑂𝑀𝑃) 𝑃𝑟(𝐹𝐹) = 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐶𝑂𝑀𝑃] 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐹𝐹) − 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐹𝐹] 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐹𝐹) 𝑃𝑟(𝐹𝐹) 𝑃𝑟(𝐹𝐹) = 𝐶𝐴𝐶𝐸 + 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐶𝑂𝑀𝑃] 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐹𝐹) − 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐹𝐹] 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐹𝐹) 𝑃𝑟(𝐹𝐹) = 𝐶𝐴𝐶𝐸 + 𝑃𝑟(𝐶𝑂𝑀𝑃)−𝑃𝑟(𝐹𝐹) {𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐶𝑂𝑀𝑃] − 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐹𝐹]} (3.8) Table 3.4 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption II.b, the no flip-floppers assumption, is violated. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker A(1)=0 (NT) π1 5. Always-1-taker 6. Complier Z=1 A(1)=1 (A1T) (COMP) π5 π6 8. Flip-floppers 9. Always-2-taker A(1)=2 (FF) (A2T) π8 π9 The second item of equation (3.8) forms the bias formula when the no flip-floppers assumption is violated. Similar implications can be found as for the irrationalists. Namely, the bias can be reduced if the relative proportion of compliers to flip-floppers increases. Only if there is no 23 variation in the causal effects of the treatment received for compliers and flip-floppers, that is, the causal effects of treatment received is constant across the principal strata, the bias will be zero. 3.3.3 Violation of Assumption III (No Partial-Complier) We discuss the sensitivity of our causal estimand to the violation of the second structural assumption: no partial-compliers in this section. The structure of the principal strata is changed to Table 3.5. The ITT effect of 𝑍 on 𝐴 is 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] = (2 − 0) × 𝑃𝑟(𝑃2𝐶) + (0 − 1) × 𝑃𝑟(𝑃1𝐶) + (2 − 1) × Pr(𝐶𝑂𝑀𝑃) = 2𝑃𝑟(𝑃2𝐶) − 𝑃𝑟(𝑃1𝐶) + 𝑃𝑟(𝐶𝑂𝑀𝑃) The nonparametric estimator then becomes 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))] 𝐸[𝐴𝑖 (2) − 𝐴𝑖 (1)] 𝑃𝑟(𝑃2𝐶) = 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑃2𝐶] 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) 𝑃𝑟(𝑃1𝐶) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑃1𝐶] 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) 2 𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] [1 − 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃)] 𝑃𝑟(𝑃𝐶) = 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑃𝐶] 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) + 𝐸[𝑌𝑖 (2) − 𝑌𝑖 (1)|𝐶𝑂𝑀𝑃] 𝑃𝑟(𝑃1𝐶)−2𝑃𝑟(𝑃2𝐶) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) = 𝐶𝐴𝐶𝐸 𝑃𝑟(𝑃𝐶) + 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑃𝐶] 24 𝑃𝑟(𝑃1𝐶)−2𝑃𝑟(𝑃2𝐶) + 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃] (3.9) 2𝑃𝑟(𝑃2𝐶)−𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) We can derive the bias from the last two items of equation (3.9). Similarly, under the symmetry in principal strata assumption for partial-compliers, the proportion of partial-1- compliers is identical to that of partial-2-compliers, the bias can be simplified to 𝑃𝑟(𝑃1𝐶) 𝐵𝑖𝑎𝑠 = 𝑃𝑟(𝑃1𝐶)+𝑃𝑟(𝐶𝑂𝑀𝑃) × {2𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝑃𝐶] − 𝐸[𝑌𝑖 (2, 𝐴𝑖 (2))– 𝑌𝑖 (1, 𝐴𝑖 (1))|𝐶𝑂𝑀𝑃]} (3.10) Again, the bias will decrease as the relative proportion of compliers to partial-compliers increase. Even when the ITT effect of partial-compliers is zero, the estimator is still biased. Only if the ITT effect of partial-compliers is on average half of the complier average causal effect will the bias be zero. Table 3.5 Principal strata classified by A(1) and A(2) for RCTs with two active treatments when Assumption III, the no partial-compliers assumption, is violated. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker 3. Partial-2-complier A(1)=0 (NT) (P2C) π1 π3 4. Partial-1-complier 5. Always-1-taker 6. Complier Z=1 A(1)=1 (P1C) (A1T) (COMP) π4 π5 π6 9. Always-2-taker A(1)=2 (A2T) π9 25 3.4 Simulation Studies 3.4.1 Simulation Setup In this section, we introduce a series of simulations to evaluate the performance and sensitivity of our nonparametric estimator in equation (3.3). The data generating process follows 5 steps: 1) Generate the treatment assignment. We use a Bernoulli distribution with a probability of 0.5 to generate the treatment assignment 𝑍𝑖 for individual 𝑖. 2) Generate the nine principal strata. We use a multinomial distribution with nine pre-specified probabilities 𝜋1 , 𝜋2 , … , 𝜋9 , where ∑9𝑔=1 𝜋𝑔 = 1 to generate the principal strata 𝐺𝑖 for individual 𝑖. 3) Obtain the observed treatment received. Given the principal strata 𝐺𝑖 generated in step 2), we can obtain the values of the potential treatment received (𝐴𝑖 (1), 𝐴𝑖 (2)) for individual 𝑖. Then, with the known treatment assignment 𝑍𝑖 , the observed treatment received 𝐴𝑖 will be equal to 𝐴𝑖 (1) if 𝑍𝑖 = 1 and 𝐴𝑖 (2) if 𝑍𝑖 = 2. 4) Generate the potential outcomes. We consider both the continuous outcomes and the binary outcomes. We generate the potential outcomes 𝑌𝑖 (1, 𝐴𝑖 (1)) and 𝑌𝑖 (2, 𝐴𝑖 (2)) separately. a. For the continuous potential outcomes, 𝑌𝑖 (1, 𝐴𝑖 (1)) = 𝛼0 + 𝛼1 𝑔1𝑖 + 𝛼2 𝑔2𝑖 + 𝛼3 𝑔3𝑖 + 𝛼4 𝑔4𝑖 + 𝛼5 𝑔5𝑖 + 𝛼6 𝑔6𝑖 + 𝛼7 𝑔7𝑖 + 𝛼8 𝑔8𝑖 + 𝛼9 𝑔9𝑖 + 𝛿1𝑧 𝐼𝑧𝑖 + 𝜖𝑖 𝑌𝑖 (2, 𝐴𝑖 (2)) = 𝛽0 + 𝛽1 𝑔1𝑖 + 𝛽2 𝑔2𝑖 + 𝛽3 𝑔3𝑖 + 𝛽4 𝑔4𝑖 + 𝛽5 𝑔5𝑖 + 𝛽6 𝑔6𝑖 + 𝛽7 𝑔7𝑖 + 𝛽8 𝑔8𝑖 + 𝛽9 𝑔9𝑖 + 𝛿2𝑧 𝐼𝑧𝑖 + 𝜖𝑖 where 𝑔𝑗𝑖 is an indicator variable with 𝑔𝑗𝑖 = 1 if individual 𝑖 is in stratum 𝑗 and 0 otherwise. 𝐼𝑧𝑖 indicates the treatment assignment with 𝐼𝑧𝑖 = 1 if individual 𝑖 is assigned to treatment 1 and 0 if assigned to treatment 2. The one error term 𝜖𝑖 , shared by the two counterfactuals, follows a standard normal distribution 𝑁(0, 1). 26 b. For the binary potential outcomes, 𝑙𝑜𝑔𝑖𝑡(Pr(𝑌𝑖 (1, 𝐴𝑖 (1)) = 1)) = 𝛼0 + 𝛼1 𝑔1𝑖 + 𝛼2 𝑔2𝑖 + 𝛼3 𝑔3𝑖 + 𝛼4 𝑔4𝑖 + 𝛼5 𝑔5𝑖 + 𝛼6 𝑔6𝑖 + 𝛼7 𝑔7𝑖 + 𝛼8 𝑔8𝑖 + 𝛼9 𝑔9𝑖 + 𝛿1𝑧 𝐼𝑧𝑖 𝑙𝑜𝑔𝑖𝑡(Pr(𝑌𝑖 (2, 𝐴𝑖 (2)) = 1)) = 𝛽0 + 𝛽1 𝑔1𝑖 + 𝛽2 𝑔2𝑖 + 𝛽3 𝑔3𝑖 + 𝛽4 𝑔4𝑖 + 𝛽5 𝑔5𝑖 + 𝛽6 𝑔6𝑖 + 𝛽7 𝑔7𝑖 + 𝛽8 𝑔8𝑖 + 𝛽9 𝑔9𝑖 + 𝛿2𝑧 𝐼𝑧𝑖 where the definition of the notations are the same as the continuous case. 5) Obtain the observed outcome. Given the treatment assignment 𝑍𝑖 and the potential outcomes 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)), 𝑧 = 1, 2 generated in step 4), the observed outcome 𝑌𝑖 will be 𝑌𝑖 (1, 𝐴𝑖 (1)) if 𝑍𝑖 = 1 and 𝑌𝑖 (2, 𝐴𝑖 (2)) if 𝑍𝑖 = 2. After all data are generated, we can then obtain the estimate of our causal estimand of interest from the sample quantities using equation (3.3). We run five sets of simulations for the five scenarios A to E for both the continuous outcomes and the binary outcomes. Scenario A is to evaluate the performance of the nonparametric estimator when all assumptions are met. Scenarios B to E are the sensitivity analysis when one of the assumptions described in section 3.2 is violated. We summarize the parameters used to generate the potential outcomes for each scenario in Table 3.6. The true value of CACE is set to be 6 for the continuous case and is around 0.35 for the binary case. To simplify exposition, we call the irrationalists, flip-floppers, and partial-compliers violators as their existence is due to violations of a certain assumption. For each scenario, we vary the principal strata proportion of compliers from 50% to 75% with an increment of 5%. On the other hand, for scenarios B to E, we fix the total principal strata proportions for never-takers, always-1- takers, and always-2-takers to be 20%. Then, the principal strata proportions of the violators will vary from 30% to 5% corresponding to the varying principal strata proportion of compliers. As a result, the relative proportions of compliers to violators will be increasing as the principal strata 27 proportion of compliers is increasing and the principal strata proportions of the violators are decreasing. We consider various sample sizes with 𝑛 = 500, 1000 or 2000. As there are six principal strata proportions, three sample sizes, two types of outcomes and five scenarios, we will therefore run 6 × 3 × 2 × 5 = 180 sets of simulations in total each with 10000 iterations. In addition, within each iteration of the simulation, we draw 2000 bootstrap samples to calculate the bootstrap standard errors of our nonparametric estimator. 28 Table 3.6 Summary of the parameter values used to generate the potential outcomes for the nonparametric approach. Scenario A is when all assumptions are satisfied. Scenario B is when Assumption I, the exclusion restriction assumption, is violated. Scenario C is when Assumption II.a, the no irrationalists assumption, is violated. Scenario D is when Assumption II.b, the no flip- floppers assumption, is violated. Scenario E is when Assumption III, the no partial-compliers assumption, is violated. Continuous Outcome Binary Outcome Scenario A B C D E A B C D E 𝛼0 0 0 0 0 0 0 0 0 0 0 𝛼1 -3 -3 -3 -3 -3 0.5 0.5 0.5 0.5 0.5 𝛼2 0 0 -1 0 0 0 0 -2 0 0 𝛼3 0 0 0 0 -1 0 0 0 0 -3 𝛼4 0 0 0 0 7 0 0 0 0 0.2 𝛼5 8 8 8 8 8 0.5 0.5 0.5 0.5 0.5 𝛼6 4 4 4 4 4 -0.5 -0.5 -0.5 -0.5 -0.5 𝛼7 0 0 5 0 0 0 0 0.3 0 0 𝛼8 0 0 0 6 0 0 0 0 0.2 0 𝛼9 12 12 12 12 12 0.5 0.5 0.5 0.5 0.5 𝛽0 0 0 0 0 0 0 0 0 0 0 𝛽1 -3 -3 -3 -3 -3 0.5 0.5 0.5 0.5 0.5 𝛽2 0 0 3 0 0 0 0 -0.2 0 0 𝛽3 0 0 0 0 10 0 0 0 0 0.3 𝛽4 0 0 0 0 -1 0 0 0 0 -2 𝛽5 8 8 8 8 8 0.5 0.5 0.5 0.5 0.5 𝛽6 10 10 10 10 10 1 1 1 1 1 𝛽7 0 0 -4 0 0 0 0 -3 0 0 𝛽8 0 0 0 3 0 0 0 0 -0.8 0 𝛽9 12 12 12 12 12 0.5 0.5 0.5 0.5 0.5 𝛿1𝑧 0 -1 0 0 0 0 -0.5 0 0 0 𝛿2𝑧 0 -2 0 0 0 0 -0.3 0 0 0 29 3.4.2 Performance Metrics We use the following metrics to evaluate the performance and sensitivity of our nonparametric estimator. 1. Mean bias: the average of the difference between the estimated CACE and the true CACE, ∑10000 ̂ 𝑖=1 (𝐶𝐴𝐶𝐸𝑖 − 𝐶𝐴𝐶𝐸) 10000 2. Percent bias (%): the average of the bias relative to the true CACE, ∑10000 ̂ 𝑖=1 (𝐶𝐴𝐶𝐸𝑖 − 𝐶𝐴𝐶𝐸) × 100 10000 × 𝐶𝐴𝐶𝐸 3. Mean standard error: the average of the standard errors from each iteration of the simulations. ∑10000 ̂ 𝑖=1 𝑆𝐸(𝐶𝐴𝐶𝐸𝑖 ) 10000 ̂ ) is estimated by 2000 bootstrapped samples within a simulation. where each SE(𝐶𝐴𝐶𝐸 4. Root mean squared error: the square root of the average of the squared bias, 10000 ̂ 2 √∑𝑖=1 (𝐶𝐴𝐶𝐸𝑖 − 𝐶𝐴𝐶𝐸) 10000 5. Standard error ratio: the ratio of the mean standard error to the standard deviation of the 10000 estimated CACE 6. Coverage rate (%): the percent of the 95% confidence intervals that cover the true CACE. The 95% confidence interval is calculated as ̂ − 1.96 × 𝑆𝐸, 𝐶𝐴𝐶𝐸 (𝐶𝐴𝐶𝐸 ̂ + 1.96 × 𝑆𝐸) 30 3.4.3 Results We present the results of scenarios A to E for the continuous outcomes here. Results for the binary outcomes show similar patterns and can be found in Appendix A. 3.4.3.1 Performance Analysis The top two panels of Figure 3.1 report the mean bias and percent bias of the estimator. Our nonparametric method can yield slightly biased estimators for CACE as both the bias and percent bias approach to zero when 𝑛 = 500, 1000 𝑜𝑟 2000. The estimators are less sensitive to the principal strata proportion of compliers when there is an adequate number of individuals in the sample. For example, when 𝑛 = 2000 , the estimators gather around the true CACE with approximately zero bias and percent bias for the various principal strata proportions of compliers. However, when 𝑛 = 500, the bias and percent bias are more scattered across the principal strata proportions of compliers. This is consistent with the results of mean standard errors that we discuss below. In addition, this pattern is as expected because the size of compliers will always be larger in a sample with higher percentage of compliers than in a sample of the same sample size but with lower percentage of compliers. As we are estimating the average treatment effect in compliers, the larger size of compliers will therefore produce relatively smaller bias. 31 Figure 3.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the nonparametric estimator across proportions of compliers for the continuous outcome when all assumptions are satisfied (scenario A). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 32 Similar patterns are observed for the mean standard errors and root mean squared errors as shown in the middle two panels of Figure 3.1. The larger the sample size, the higher the principal strata proportion of compliers, the smaller the mean standard errors. We anticipate that the root mean squared errors will have the same pattern as the mean standard errors. That is, the larger the sample size, the larger the complier size, the smaller the root mean squared errors, the better the estimators. We use the standard error ratios to compare the mean standard errors with the empirical standard errors of the estimator. The bottem left corner of Figure 3.1 shows that the standard error ratios are all roughly one, suggesting that the bootstrap standard errors of the nonparametric estimator can accurately estimate the empirical standard deviations of the estimated CACEs. The coverage rates, which are shown at the bottom right corner of Figure 3.1, are around 95% when 𝑛 = 1000 𝑜𝑟 2000. However, they are inflated when 𝑛 = 500, especially for samples with lower principal strata proportions of compliers. 3.4.3.2 Sensitivity Analysis From the bias formula derived in section 3.3 when violations of Assumption I, II, and III occur, in general, the bias will depend on the principal strata proportions as well as certain treatment effects including the direct effects of treatment assignments on the outcomes and the principal strata ITT effects. However, we will only focus on evaluating the consequnces from changes of the principal strata proportions because 1) the direct impact from violations of the assumptions is a change of the population composion; and 2) either the direct effects of treatment assignments or the principal strata ITT effects can be thought as inherent properties of the principal strata which will be a constant within a principal stratum. As a result, we will not be interested in the magnitudes or 33 signs of the bias. We will, on the other hand, be interested in whether there will be some trends or patterns from the changes of the principal strata proportions. We discuss the bias due to violation of ER separately from the bias due to the three types of violators because bias due to violation of ER is more sensitive to the specific values of the direct effects of treatment assignments. Although increasing the percentage of compliers in a sample will decrease the bias as shown in Figure 3.2, it will not approach to zero unless the direct effects of treatment assignments are closer to zero. However, as indicated in Figure 3.3, Figure 3.4, and Figure 3.5, the bias due to violations of the structural assumptions can be as close as zero if the relative proportion of compliers to violators is sufficiently large. Thus, we are able to minimize the bias due to the violators by maximizing the percentage of compliers and minimizing the percentage of violators. Increasing sample size will not reduce the bias, but it will make the estimators more precise as shown in the standard error panels of Figure 3.2, Figure 3.3, Figure 3.4, and Figure 3.5. The standard errors will also be smaller when there are more compliers or less violators in the sample. The root mean squared errors show the performance of an estimator. Similar to the patterns of the bias and the mean standard errors, increasing the proportion of compliers or decreasing the proportions of violators will lower the root mean squared errors and make the estimators better. 34 Figure 3.2 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of compliers for the continuous outcome when Assumption I, the exclusion restriction assumption, is violated (scenario B). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 35 Figure 3.3 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of irrationalists for the continuous outcome when Assumption II.a, the no irrationalists assumption, is violated (scenario C). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 36 Figure 3.4 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of flip- floppers for the continuous outcome when Assumption II.b, the no flip-floppers assumption, is violated (scenario D). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 37 Figure 3.5 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of partial- compliers for the continuous outcome when Assumption III, the no partial-compliers assumption, is violated (scenario E). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 38 CHAPTER 4 A MULTISITE DESIGN APPROACH The nonparametric approach identifies our causal estimand of interest by imposing several assumptions. In this chapter, we introduce a method based on the multisite design of a study. Different from the placebo-controlled trials that only has one endogenous variable, our study setup involves two endogenous variables because the treatment received 𝐴𝑖 can take three values: 0, 1, or 2. These two endogenous variables are the indicator variables generated from 𝐴𝑖 . Namely, one indicates whether individual 𝑖 takes treatment 1 or not and the other indicates whether takes treatment 2 or not. With one single instrument, i.e., the treatment assignment 𝑍𝑖 and more than one endogenous variables, it is generally not possible to identify any causal effects without further considerations. However, the identification of causal effects becomes possible in study settings with multiple sites even without looking for extra new instruments. If we assume that the casual structure will be the same across sites for a multisite study, multiple instruments are therefore generated from the site-by-instrument interactions. This idea has the potential to be used in research on causal effects. Raudenbush and Bloom (2015) underlined the importance of learning about the distribution of treatment effects through multisite trials. Reardon and Raudenbush (2013) derived assumptions required to identify the causal effects of multiple mediators using the site-by- treatment instruments. Yuan et al. (2018) proposed a method similar to Reardon and Raudenbush (2013) and estimated the principal strata causal effects rather than causal effects of mediators in a placebo-controlled multisite trial. Our study is an extension to the approach by Yuan et al. (2018) as it involves two active treatments. It also differs from the approach by Reardon and Raudenbush (2013) although there are two mediators (the aforementioned two indicators) in our study. The 39 effects of the two mediators are from comparisons of either treatment 1 or treatments 2 versus taking nothing. Even if we can take the difference of these two effects to obtain the effect of treatment 2 versus treatment 1, it targets at a population that is not of our interest. We will discuss our method in more detail in the following sections. 4.1 Identification Strategies of the Causal Estimand We divide the population into the same nine principal strata as described in section 3.1. To identify our causal estimand of interest CACE via the multisite design, we make an additional structural assumption. Assumption IV (No never-takers): There are no individuals with (𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 0), i.e., π1 = 0. This assumption states that there are no participants who choose to take nothing or to not follow the protocol as required no matter which treatment they are assigned. It can be satisfied if we assume that participants are willing to take the treatment as required when they are eligible to enroll in a trial and sign the consent form. Under Assumption II and Assumption IV, the nine principal strata are reduced to five, as shown in Table 4.1. Combined with Assumption I (Exclusion Restriction), the overall ITT effect of 𝑍 on 𝑌 in site 𝑘 is 𝐼𝑇𝑇𝑦|𝑘 = 𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 + 𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 + 𝐼𝑇𝑇5|𝑘 𝜋5|𝑘 + 𝐼𝑇𝑇6|𝑘 𝜋6|𝑘 + 𝐼𝑇𝑇9|𝑘 𝜋9|𝑘 = 𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 + 𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 + 𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 (4.1) where ITTy|k = 𝐸(𝑌(2, 𝐴(2)) − 𝑌(1, 𝐴(1))|𝑆 = 𝑘) with S representing sites and 𝑘 = 1, … , K ; ITTg|k = 𝐸(𝑌(2, 𝐴(2)) − 𝑌(1, 𝐴(1))|𝐺 = 𝑔, 𝑆 = 𝑘) is the stratum-specific ITT effect of Z on Y in site k and the principal strata 𝑔, for 𝑔 = 3, 4, 5, 6, 9; and πg|k = Pr(𝐺 = 𝑔|𝑆 = 𝑘) is the site- 40 specific principal stratum proportion for stratum 𝑔 in site 𝑘. The first equality of equation (4.1) is the decomposition of the overall ITT effect into the weighted stratum-specific ITT effects with weights equal to the principal strata proportions. The last equality is due to Assumption I under which 𝐼𝑇𝑇5|𝑘 = 𝐼𝑇𝑇9|𝑘 = 0. Table 4.1 Principal strata classified by A(1) and A(2) for RCTs with two active treatments under Assumption II, the monotonicity assumption, and Assumption IV, the no never-takers assumption. Z=2 A(2)=0 A(2)=1 A(2)=2 3. Partial-2-complier A(1)=0 (P2C) π3 4. Partial-1-complier 5. Always-1-taker 6. Complier Z=1 A(1)=1 (P1C) (A1T) (COMP) π4 π5 π6 9. Always-2-taker A(1)=2 (A2T) π9 As our causal estimand of interest is the CACE for the entire population instead of the CACE for any site, equation (4.1) can be rewritten as 𝐼𝑇𝑇𝑦|𝑘 = 𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 + 𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 + 𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 = 𝐼𝑇𝑇3 𝜋3|𝑘 + 𝐼𝑇𝑇4 𝜋4|𝑘 + 𝐶𝐴𝐶𝐸𝜋6|𝑘 + (𝐼𝑇𝑇3|𝑘 − 𝐼𝑇𝑇3 )𝜋3|𝑘 + (𝐼𝑇𝑇4|𝑘 − 𝐼𝑇𝑇4 )𝜋4|𝑘 + (𝐶𝐴𝐶𝐸𝑘 − 𝐶𝐴𝐶𝐸)𝜋6|𝑘 = 𝐼𝑇𝑇3 𝜋3|𝑘 + 𝐼𝑇𝑇4 𝜋4|𝑘 + 𝐶𝐴𝐶𝐸𝜋6|𝑘 + 𝜖𝑘 (4.2) where 𝜖𝑘 = (𝐼𝑇𝑇3|𝑘 − 𝐼𝑇𝑇3 )𝜋3|𝑘 + (𝐼𝑇𝑇4|𝑘 − 𝐼𝑇𝑇4 )𝜋4|𝑘 + (𝐶𝐴𝐶𝐸𝑘 − 𝐶𝐴𝐶𝐸)𝜋6|𝑘 ; 𝐼𝑇𝑇3 , 𝐼𝑇𝑇4 and 𝐶𝐴𝐶𝐸 are the population principal strata causal effects for principal strata 3, 4, and 6 respectively, i.e., 𝐼𝑇𝑇3 = 𝐸(𝐼𝑇𝑇3|𝑘 ), 𝐼𝑇𝑇4 = 𝐸(𝐼𝑇𝑇4|𝑘 ), and 𝐶𝐴𝐶𝐸 = 𝐸(𝐶𝐴𝐶𝐸𝑘 ). 41 Equation (4.2) shows a form of multiple linear regression with site-specific ITT effect 𝐼𝑇𝑇𝑦|𝑘 as the outcome, site-specific principal strata proportions 𝜋3|𝑘 , 𝜋4|𝑘 and 𝜋6|𝑘 as the regressors , and 𝜖𝑘 as the error term. As we have K sites, we will have K such equations. We can envision our multisite study setup as follows: suppose we draw K sites from an infinite number of ′ sites, we assume that the site-level parameters 𝐐 = (𝐼𝑇𝑇3|𝑘 𝐼𝑇𝑇4|𝑘 𝐶𝐴𝐶𝐸𝑘 𝜋3|𝑘 𝜋4|𝑘 𝜋6|𝑘 ) are independent and identically distributed and follow a distribution with the population-level parameters 𝐌 = (𝐼𝑇𝑇3 𝐼𝑇𝑇4 𝐶𝐴𝐶𝐸 𝜋3 𝜋4 𝜋6 )′ as the mean vector and a 6 × 6 variance- covariance matrix 𝜮 whose element σij denotes the covariance between ith and jth element of 𝐐. Given this setup, there is one more assumption needed to identify the CACE using a multiple linear regression. That is, the error term 𝜖𝑘 is required to be uncorrelated with the regressors 𝜋3|𝑘 , 𝜋4|𝑘 , and 𝜋6|𝑘 and has a mean of zero. Specifically, if 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜖𝑘 ) = 𝑐𝑜𝑣(𝜋4|𝑘 , 𝜖𝑘 ) = 𝑐𝑜𝑣(𝜋6|𝑘 , 𝜖𝑘 ) = 0 and E(𝜖𝑘 ) = 0, we can identify the population CACE as the coefficient of 𝜋6|𝑘 in the multiple linear regression. A sufficient condition of this requirement yields the following assumption. Assumption V (Zero correlation): The site-specific principal strata proportions are uncorrelated with the site-specific principal strata causal effects. Specifically, 𝐶𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇3|𝑘 ) = 𝐶𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇4|𝑘 ) = 𝐶𝑜𝑣(𝜋3|𝑘 , 𝐶𝐴𝐶𝐸𝑘 ) = 0, 𝐶𝑜𝑣(𝜋4|𝑘 , 𝐼𝑇𝑇3|𝑘 ) = 𝐶𝑜𝑣(𝜋4|𝑘 , 𝐼𝑇𝑇4|𝑘 ) = 𝐶𝑜𝑣(𝜋4|𝑘 , 𝐶𝐴𝐶𝐸𝑘 ) = 0, 𝐶𝑜𝑣(𝜋6|𝑘 , 𝐼𝑇𝑇3|𝑘 ) = 𝐶𝑜𝑣(𝜋6|𝑘 , 𝐼𝑇𝑇4|𝑘 ) = 𝐶𝑜𝑣(𝜋6|𝑘 , 𝐶𝐴𝐶𝐸𝑘 ) = 0. Assumption V implies that the expected value of the error term 𝜖𝑘 is 𝐸(𝜖𝑘 ) = 𝐸 ((𝐼𝑇𝑇3|𝑘 − 𝐼𝑇𝑇3 )𝜋3|𝑘 + (𝐼𝑇𝑇4|𝑘 − 𝐼𝑇𝑇4 )𝜋4|𝑘 + (𝐶𝐴𝐶𝐸𝑘 − 𝐶𝐴𝐶𝐸)𝜋6|𝑘 ) = 𝐸(𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 ) + 𝐸(𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 ) + 𝐸(𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 ) 42 − 𝐼𝑇𝑇3 𝐸(𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝐸(𝜋4|𝑘 ) − 𝐸(𝜋6|𝑘 )𝐶𝐴𝐶𝐸 = 𝑐𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇3|𝑘 ) + 𝐸(𝜋3|𝑘 )𝐸(𝐼𝑇𝑇3|𝑘 ) + 𝑐𝑜𝑣(𝜋4|𝑘 , 𝐼𝑇𝑇4|𝑘 ) + 𝐸(𝜋4|𝑘 )𝐸(𝐼𝑇𝑇4|𝑘 ) + 𝑐𝑜𝑣(𝜋6|𝑘 , 𝐶𝐴𝐶𝐸𝑘 ) + 𝐸(𝜋6|𝑘 )𝐸(𝐶𝐴𝐶𝐸𝑘 ) − 𝐼𝑇𝑇3 𝐸(𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝐸(𝜋4|𝑘 ) − 𝐸(𝜋6|𝑘 )𝐶𝐴𝐶𝐸 =0 Likewise, Assumption V also implies that 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜖𝑘 ) = 𝑐𝑜𝑣(𝜋4|𝑘 , 𝜖𝑘 ) = 𝑐𝑜𝑣(𝜋6|𝑘 , 𝜖𝑘 ) = 0. We provide the proof for 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜖𝑘 ) = 0 below. The same logic applies to the other two conditions 𝑐𝑜𝑣(𝜋4|𝑘 , 𝜖𝑘 ) = 0 and 𝑐𝑜𝑣(𝜋6|𝑘 , 𝜖𝑘 ) = 0. 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜖𝑘 ) = 𝑐𝑜𝑣(𝜋3|𝑘 , (𝐼𝑇𝑇3|𝑘 − 𝐼𝑇𝑇3 )𝜋3|𝑘 + (𝐼𝑇𝑇4|𝑘 − 𝐼𝑇𝑇4 )𝜋4|𝑘 + (𝐶𝐴𝐶𝐸𝑘 − 𝐶𝐴𝐶𝐸)𝜋6|𝑘 ) = 𝑐𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 ) + 𝑐𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 ) + 𝑐𝑜𝑣(𝜋3|𝑘 , 𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 ) − 𝐼𝑇𝑇3 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋4|𝑘 ) − 𝐶𝐴𝐶𝐸𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋6|𝑘 ) 2 = 𝐸(𝜋3|𝑘 𝐼𝑇𝑇3|𝑘 ) − 𝐸(𝜋3|𝑘 )𝐸(𝐼𝑇𝑇3|𝑘 𝜋3|𝑘 ) + 𝐸(𝜋3|𝑘 𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 ) − 𝐸(𝜋3|𝑘 )𝐸(𝐼𝑇𝑇4|𝑘 𝜋4|𝑘 ) + 𝐸(𝜋3|𝑘 𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 ) − 𝐸(𝜋3|𝑘 )𝐸(𝐶𝐴𝐶𝐸𝑘 𝜋6|𝑘 ) − 𝐼𝑇𝑇3 𝑣𝑎𝑟(𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋4|𝑘 ) − 𝐶𝐴𝐶𝐸𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋6|𝑘 ) 2 = 𝐸(𝜋3|𝑘 )𝐸(𝐼𝑇𝑇3|𝑘 ) − 𝐸(𝜋3|𝑘 ) (𝑐𝑜𝑣(𝜋3|𝑘 , 𝐼𝑇𝑇3|𝑘 ) + 𝐸(𝐼𝑇𝑇3|𝑘 )𝐸(𝜋3|𝑘 )) + 𝐸(𝜋3|𝑘 𝜋4|𝑘 )𝐸(𝐼𝑇𝑇4|𝑘 ) − 𝐸(𝜋3|𝑘 ) (𝑐𝑜𝑣(𝜋4|𝑘 , 𝐼𝑇𝑇4|𝑘 ) + 𝐸(𝐼𝑇𝑇4|𝑘 )𝐸(𝜋4|𝑘 )) + 𝐸(𝜋3|𝑘 𝜋6|𝑘 )𝐸(𝐶𝐴𝐶𝐸𝑘 ) − 𝐸(𝜋3|𝑘 ) (𝑐𝑜𝑣(𝜋6|𝑘 , 𝐶𝐴𝐶𝐸𝑘 ) + 𝐸(𝐶𝐴𝐶𝐸𝑘 )𝐸(𝜋6|𝑘 )) − 𝐼𝑇𝑇3 𝑣𝑎𝑟(𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋4|𝑘 ) − 𝐶𝐴𝐶𝐸𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋6|𝑘 ) 43 2 2 = 𝐸(𝐼𝑇𝑇3|𝑘 ) (𝐸(𝜋3|𝑘 ) − 𝐸(𝜋3|𝑘 ) ) + 𝐸(𝐼𝑇𝑇4|𝑘 ) (𝐸(𝜋3|𝑘 𝜋4|𝑘 ) − 𝐸(𝜋3|𝑘 )𝐸(𝜋4|𝑘 )) + 𝐸(𝐶𝐴𝐶𝐸𝑘 )(𝐸(𝜋3|𝑘 𝜋6|𝑘 ) − 𝐸(𝜋3|𝑘 )𝐸(𝜋6|𝑘 )) − 𝐼𝑇𝑇3 𝑣𝑎𝑟(𝜋3|𝑘 ) − 𝐼𝑇𝑇4 𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋4|𝑘 ) − 𝐶𝐴𝐶𝐸𝑐𝑜𝑣(𝜋3|𝑘 , 𝜋6|𝑘 ) =0 Assumption V is a nontrivial assumption and may not be empirically verifiable. We will explore the sensitivity of this assumption through simulation studies in section 4.4. Combining all the assumptions mentioned above results in Proposition II. Proposition II: For a RCT with two active treatments subject to nonadherence, the complier average causal effect (CACE) can be identified under Assumption I, II, IV, and V via a multisite design. 4.2 Estimation of the Causal Estimand To identify CACE using equation (4.2) , it requires us to first identify the proportions of individuals in principal strata 3, 4, and 6 for each site, i.e., π3|k , π4|k , and π6|k . In other words, we need to identify which principal stratum each individual belongs to in site 𝑘. As the principal strata are determined by the potential treatment received A(Z) and it is impossible to observe both A(1) and A(2) for one individual simultaneously, thus it is impossible to determine to which principal stratum an individual belongs. We can, however, estimate these proportions by using the observed quantities. As the observed treatment assignment Z takes two values and the observed treatment receipt A takes three values, there are in total six possible Z and 𝐴 combinations that we can observe. These six combinations are mixtures of the five principal strata shown in Table 4.1. 44 We can therefore estimate the principal strata proportions through the proportions of the observed combinations. Let pa|zk = Pr(𝐴 = 𝑎|𝑍 = 𝑧, 𝑆 = 𝑘) be the probabilities of the six observed combinations in site 𝑘 where a = 0,1,2; z = 1,2; and ∑𝑎=0,1,2 𝑝𝑎|𝑧𝑘 = 1. We can calculate pa|zk by simply taking the ratio of Na|zk and Nz|k where Na|zk is the number of individuals that are assigned to treatment z and take treatment a in site 𝑘; Nz|k is the total number of individuals that are assigned to treatment 𝑧 in site 𝑘. Then, we can immediately estimate π3|k , π4|k , π5|k and π9|k from p0|1k , p0|2k , p1|2k and p2|1k as follows N0|1k 𝜋̂3|𝑘 = p0|1k = , N1|k N0|2k 𝜋̂4|𝑘 = p0|2k = , N2|k N1|2k 𝜋̂5|𝑘 = p1|2k = , N2|k N2|1k 𝜋̂9|𝑘 = p2|1k = , N1|k because randomization ensures that the principal strata proportions are the same across treatment arms. Take estimating the proportion of partial-2-compliers (π3|k ) as an example. First, under Assumptions II and IV, we believe that those who are assigned to treatment 1 and take nothing are all partial-2-compliers. Then, we obtain the proportion of partial-2-compliers from those who are assigned to treatment 1 for site k, which is p0|1k . Due to randomization, the proportion of partial- 2-compliers in the entire population will be equal to the proportion of partial-2-compliers in each treatment assignment arm of the site, i.e., Pr(𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 2|𝑆 = 𝑘) = Pr(𝐴𝑖 (1) = 0, 𝐴𝑖 (2) = 2|𝑆 = 𝑘, 𝑍𝑖 = 𝑧) for 𝑧 = 1, 2. Therefore, we can estimate π3|k from p0|1k for site 𝑘. Similarly, to estimate the proportion of compliers, we first observe those who are assigned to 45 N2|2k treatment 2 and also take treatment 2 and calculate this quantity as p2|2k = . These individuals N2|k are a mixture of partial-2-compliers (principal stratum 3), compliers (principal stratum 6) and always-2-takers (principal stratum 9). As we have obtained 𝜋̂3|𝑘 and 𝜋̂9|𝑘 , the proportion of compliers is estimated by N2|2k N0|1k N2|1k 𝜋̂6|𝑘 = p2|2k − 𝜋̂3|𝑘 − 𝜋̂9|𝑘 = p2|2k − p0|1k − p2|1k = − − N2|k N1|k N1|k This formula also provides a way of falsifying Assumption IV. Specifically, if we use the above formula to derive the estimated population principal stratum proportion 𝜋̂6 , there is a chance that 𝜋̂6 can be negative, therefore enabling us to falsify Assumption IV if needed. After identifying the principal strata proportions, we also need to identify the ITT effect of 𝑍 on 𝑌 for each site 𝑘 . We know that 𝐼𝑇𝑇𝑦|𝑘 = 𝐸(𝑌(2, 𝐴(2)) − 𝑌(1, 𝐴(1))|𝑆 = 𝑘) = 𝐸(𝑌|𝑍 = 2, 𝑆 = 𝑘) − 𝐸(𝑌|𝑍 = 1, 𝑆 = 𝑘) where the second equality is due to randomization and consistency. Then, we can use the sample average to estimate 𝐸(𝑌|𝑍 = 𝑧, 𝑆 = 𝑘), 𝑧 = 1,2: ∑𝑖𝜖{𝑖:𝑍𝑖 =2} 𝑌𝑖|𝑘 ∑𝑖𝜖{𝑖:𝑍𝑖 =1} 𝑌𝑖|𝑘 ̂ 𝑦|𝑘 = 𝐼𝑇𝑇 − 𝑁2|𝑘 𝑁1|𝑘 where 𝑌𝑖|𝑘 is the observed outcome for individual 𝑖 in site 𝑘. With the estimated ITT effects and the estimated principal strata proportions, we can use equation (4.3) to identify our causal estimand of interest via a multiple linear regression, that is, ̂ 𝑦|𝑘 = 𝐼𝑇𝑇3 𝜋̂3|𝑘 + 𝐼𝑇𝑇4 𝜋̂4|𝑘 + 𝐶𝐴𝐶𝐸𝜋̂6|𝑘 + 𝜖̂𝑘 𝐼𝑇𝑇 (4.3) 4.3 The Problems of Omitted Variables and Measurement Errors As our multisite design approach relies on Assumption V, the zero correlation assumption, violation of this assumption will result in bias in the estimates, which is the so-called omitted- 46 variable bias. In addition, because we use the estimated site-specific principal strata proportions and the estimated site-specific ITT effects to identify CACE rather than the true principal strata proportions and the true ITT effect, the measurement errors in these quantities will also bias the estimates. For exposition, we explore the issues of measurement errors and omitted variables in matrix format in this section. Generally, consider the true model 𝒀 = 𝑿𝜽 + 𝑼𝜸 + 𝝐 𝑋11 … 𝑋1𝑝 where 𝒀 = (𝑌1 ⋯ 𝑌𝑛 )′ is a 𝑛 × 1 column vector of the true outcomes; 𝑿 = ( ⋮ ⋱ ⋮ ) 𝑋𝑛1 … 𝑋𝑛𝑝 𝑈11 … 𝑈1𝑞 is a 𝑛 × 𝑝 matrix of the regressors; 𝑼 = ( ⋮ ⋱ ⋮ ) is a 𝑛 × 𝑞 matrix of the unobserved 𝑈𝑛1 … 𝑈𝑛𝑞 confounding variables that are correlated with some or all components of 𝑿; 𝜽 = (𝜃1 ⋯ 𝜃𝑝 )′ is a 𝑝 × 1 column vector for the coefficients of 𝑿; 𝜸 = (𝛾1 ⋯ 𝛾𝑞 )′ is a 𝑞 × 1 column vector for the coefficients of 𝑼; and 𝝐 = (𝜖1 ⋯ 𝜖𝑛 )′ is a 𝑛 × 1 column vector of the error terms that follow a standard normal distribution and are independent of 𝑿 and 𝑼. Now, let 𝑽 be the observed outcome vector that measures 𝒀 with classic measurement error 𝜹, i.e., 𝑽 = 𝒀 + 𝜹, where 𝜹 = (𝛿1 ⋯ 𝛿𝑛 )′ is a 𝑛 × 1 column vector representing the measurement errors in 𝑽 and assumed to be independent of 𝒀. Similarly, let 𝑾 be the observed regressors matrix that measures 𝑿 with 𝑒11 … 𝑒1𝑝 classic measurement errors 𝒆, i.e., 𝑾 = 𝑿 + 𝒆, where 𝒆 = ( ⋮ ⋱ ⋮ ) is a 𝑛 × 𝑝 matrix 𝑒𝑛1 … 𝑒𝑛𝑝 representing the measurement errors in 𝑾 and assumed to be independent of 𝑿. Due to the unobserved confounding variables (the omitted variables) and the measurement errors in both outcomes and regressors, the actual regression model becomes 47 𝑽 = 𝑾𝜽∗ + 𝝐∗ Thus, the ordinary least square (OLS) estimator of 𝜽∗ is ̂∗ = (𝑾′ 𝑾)−1 𝑾′ 𝑽 𝜽 = (𝑾′ 𝑾)−1 𝑾′ (𝒀 + 𝜹) = (𝑾′ 𝑾)−1 𝑾′ (𝑿𝜽 + 𝑼𝜸 + 𝝐 + 𝜹) = (𝑾′ 𝑾)−1 𝑾′ 𝑿𝜽 + (𝑾′ 𝑾)−1 𝑾′ 𝑼𝜸 + (𝑾′ 𝑾)−1 𝑾′ (𝝐 + 𝜹) (4.4) Taking expectation of equation (4.4), we have 𝐸(𝜽̂∗ ) = 𝐸 (𝐸(𝜽 ̂∗ |𝑾)) = 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ 𝑿𝜽 + (𝑾′ 𝑾)−1 𝑾′ 𝑼𝜸 + (𝑾′ 𝑾)−1 𝑾′ (𝝐 + 𝜹)|𝑾)) = 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ (𝑾 − 𝒆)𝜽|𝑾)) + 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ 𝑼𝜸|𝑾)) + 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ (𝝐 + 𝜹)|𝑾)) = 𝜽 − 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ 𝒆𝜽|𝑾)) + 𝐸(𝐸((𝑾′ 𝑾)−1 𝑾′ 𝑼𝜸|𝑾)) = 𝜽 + 𝐸((𝑾′ 𝑾)−1 𝑾′ 𝑼𝜸 − (𝑾′ 𝑾)−1 𝑾′ 𝒆𝜽) = 𝜽 + 𝐸((𝑾′ 𝑾)−1 (𝑾′ 𝑼𝜸 − 𝑾′ 𝒆𝜽)) = 𝜽 + 𝐸((𝑾′ 𝑾)−1 ((𝑿 + 𝒆)′ 𝑼𝜸 − (𝑿 + 𝒆)′ 𝒆𝜽)) = 𝜽 + 𝐸((𝑾′ 𝑾)−1 (𝑿′ 𝑼𝜸 − 𝒆′ 𝒆𝜽)) (4.5) The last item of the third equality equals to zero because the conditional expectations of 𝝐 and 𝜹 are zero. The last equality holds because 𝒆 is independent of 𝑼 and 𝑿. Thus, we can derive the bias formula from equation (4.5) when there is omitted variables as well as measurement errors in regressors and outcomes. That is Bias = 𝐸(𝜽̂∗ ) − 𝜽 = 𝐸((𝑾′ 𝑾)−1 (𝑿′ 𝑼𝜸 − 𝒆′ 𝒆𝜽)) (4.6) 48 To apply the general bias formula in equation (4.6) into our study setting, first we 𝜋3|1 𝜋4|1 𝜋6|1 transform the notations in section 4.1 into matrix format. Let 𝝅 = ( ⋮ ⋮ ⋮ ) be the 𝜋3|𝐾 𝜋4|𝐾 𝜋6|𝐾 𝐾 × 3 matrix representing the true principal strata proportions and the population principal strata causal effects vector 𝜷 = (𝐼𝑇𝑇3 𝐼𝑇𝑇4 𝐶𝐴𝐶𝐸)′ is the coefficients of 𝝅. Correspondingly, 𝝅 ̂= 𝜋̂3|1 𝜋̂4|1 𝜋̂6|1 ( ⋮ ⋮ ⋮ ) will be the 𝐾 × 3 matrix for the estimated principal strata proportions. 𝜋̂3|𝐾 𝜋̂4|𝐾 𝜋̂6|𝐾 Thus, the measurement error matrix 𝒆 becomes 𝒆 = 𝝅 ̂ − 𝝅. We still use the notation 𝑼 for the unobserved site-level covariates with 𝜸 as its coefficient vector, and the dimensions of 𝑼 is 𝐾 × 1. With these notations, plugging 𝝅̂ for 𝑾 and 𝝅 for 𝑿 into equation (4.6), we have Bias = E((𝝅̂ ′𝝅 ̂ )−1 (𝛑′ 𝐔𝛄 − 𝒆′ 𝒆𝜽)) (4.7) for our multisite study. This formula implies that the bias will not only depend on the variance of 𝝅 ̂ which is the sum of the variance of 𝛑 and the variance of 𝒆 but also on the covariance between 𝛑 and 𝑼. Specifically, the bias will be impacted by the variance of the true principal strata proportions, the variance of the measurement errors in estimating the true principal strata proportions, as well as the covariance between the true principal strata proportions and the unobserved site-level covariates. These implications will be explored through our simulations. 4.4 Simulation Studies 4.4.1 Simulation Setup We design a two-stage data generating process for our simulations. For the site-level data, we use a truncated multivariate normal distribution with mean vector 𝒎 = (0.05 0.05 0.5)′ and 49 0.01 variance-covariance matrix 𝚪 = (−0.0025 0.01 ) to generate the site-specific −0.0025 −0.0025 0.01 principal strata proportions of partial-2-compliers, partial-1-compliers and compliers, i.e., 𝜋3|𝑘 , 𝜋4|𝑘 , and 𝜋6|𝑘 for 𝐾 sites. We truncate the three principal proportions with a lower limit 𝑳 = (0.01 0.01 0.1)′ and a upper limit 𝑼 = (0.15 0.15 0.7)′ . With the generated 𝜋3|𝑘 , 𝜋4|𝑘 , and 𝜋6|𝑘 , we then draw 𝜋5|𝑘 for always-1-takers from a uniform distribution (0, 1 − 𝜋3|𝑘 − 𝜋4|𝑘 − 𝜋6|𝑘 ) and calculate 𝜋9|𝑘 for always-2-takers from 1 − 𝜋3|𝑘 − 𝜋4|𝑘 − 𝜋6|𝑘 − 𝜋5|𝑘 . For the number of sites 𝐾, we consider three different values for our simulations, that is, 𝐾 = 50, 100, or 200. Given the site-level quantities, we next generate the individual-level data for each site. First, we need to determine the site size of each site, that is, the number of individuals in a site. Although site sizes usually vary by sites in practice, we explore both situations where the site size is fixed or varies across sites. We consider the fixed site size as a simple and special case for the real world and look at three values: 25, 50, or 100 individuals for every site. For varied site size, we use a Poisson distribution with 𝜆 = 10 to generate the site size for each site and scale it by 5. Then, with the known site size, we generate the principal stratum 𝐺𝑖 for each individual 𝑖. Specifically, we first generated a variable 𝑇 from a uniform distribution 𝑈(0, 1) . Then, we use 𝑇 to inverse transform a multinomial distribution with probabilities 𝜋𝑔|𝑘 , 𝑔 = 3,4,5,6,9 for the principal stratum 𝐺𝑖 . We use the same variable 𝑇 to inverse transform a binomial distribution for an individual-level covariate 𝑈𝑖 which will therefore be correlated with the principal stratum 𝐺𝑖 . We aggregate 𝑈𝑖 to obtain a site-level covariate 𝑈𝑠 for later use. For the treatment assignment status 𝑍𝑖 , we use a Bernoulli distribution with a probability of 0.5 to generate the two active randomization arms. Thus, given 𝐺𝑖 and 𝑍𝑖 , we can obtain the 50 observed treatment received 𝐴𝑖 in the same way as we did in Step 3) of the nonparametric simulation setup. To better represent the real world, we generate our potential outcomes (𝑌𝑖 (1, 𝐴𝑖 (1)), 𝑌(2, 𝐴𝑖 (2))) for each individual through random effect models. Specifically, for the continuous outcomes, we use the formula 𝑌𝑖 (1, 𝐴𝑖 (1)) = 𝛼3 𝑔3𝑖 + 𝛼4 𝑔4𝑖 + 𝛼6 𝑔6𝑖 + 𝜆𝑈𝑠 + 𝜖𝑏 + 𝜖𝑤𝑖 𝑌𝑖 (2, 𝐴𝑖 (2)) = 𝑌𝑖 (1, 𝐴𝑖 (1)) + 𝛽3 𝑔3𝑖 + 𝛽4 𝑔4𝑖 + 𝛽6 𝑔6𝑖 + 𝛾𝑈𝑠 where 𝑔𝑗𝑖 is an indicator with 𝑔𝑗𝑖 = 1 if individual 𝑖 is in principal stratum 𝑗 and 0 otherwise. 𝜖𝑏 is generated at the site level and represents the between-site variation whereas 𝜖𝑤𝑖 is an individual- level random error which is also known as the within-site variation. Both 𝜖𝑏 and 𝜖𝑤𝑖 follow a standard normal distribution 𝑁(0,1). 𝑈𝑠 is the site-level covariate that we generated previously. Now, it shows clearly that 𝑈𝑠 is a confounding variable that is associated with both the site-level principal strata proportions and the potential outcomes. Similarly, we generate the binary potential outcomes from a logistic regression: 𝑙𝑜𝑔𝑖𝑡(Pr(𝑌𝑖 (1, 𝐴(1)) = 1)) = 𝛼3 𝑔3 + 𝛼4 𝑔4 + 𝛼6 𝑔6 + 𝜆𝑈𝑠 + 𝜖𝑏 𝑙𝑜𝑔𝑖𝑡(Pr(𝑌𝑖 (2, 𝐴(2)) = 1)) = 𝑙𝑜𝑔𝑖𝑡(Pr(𝑌𝑖 (1, 𝐴(1)) = 1)) + 𝛽3 𝑔3 + 𝛽4 𝑔4 + 𝛽6 𝑔6 + 𝛾𝑈𝑠 With the treatment assignment status 𝑍𝑖 and the potential outcomes generated above, the observed outcome 𝑌𝑖 will be equal to 𝑌𝑖 (𝑧, 𝐴𝑖 (𝑧)) if z=1 or 2. After all data are generated, we can use the multiple linear regression to obtain the estimate of our causal estimand of interest. As discussed in section 4.3, our estimator may be biased when measurement errors and omitted variables exist. We investigate these problems by considering three types of estimators: oracle, naïve and bootstrap. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression in equation (4.2). The 51 naïve estimator is obtained by fitting the regression in equation (4.3) using the estimated principal strata proportions. We also include a bootstrap estimator whose estimate is the average of the naïve estimates from resampling the site-level quantities. We use the estimated ITT effects 𝐼𝑇𝑇̂ 𝑦|𝑘 as the outcome of the multiple linear regression for all three estimators for consistency as measurement errors in regressors rather than outcomes are our major concern. For varied site size, we fit both the unweighted and weighted regression. Three types of weights are considered: site size, within- site variation, and complier size per site. Table 4.2 Summary of the parameter values used to generate the potential outcomes for the multisite design approach. Parameters Outcome Analysis Type 𝛼3 𝛼4 𝛼6 𝛽3 𝛽4 𝛽6 𝜆 𝛾 Performance Continuous -3 -4 -2 8 -7 6 0 0 Analysis Binary -3 -1 -1 4 -3 1.51 0 0 Sensitivity Continuous -3 -4 -2 8 -7 6 2 1 or -1 Analysis Binary -3 -1 -1 4 -3 1.51 2 1 or -1 We conduct two analyses for both the continuous outcomes and the binary outcomes. One is the performance analysis that evaluates the performance of our estimators when all assumptions required for the identification of the causal estimand are met. The other one is the sensitivity analysis when Assumption V is violated. We set the true values of CACE to be 6 for the continuous case and 0.3 for the binary case. Parameter values that are used for simulating the data are the same for the fixed site size and the varied site size. We summarize these values in Table 4.2. For the performance analysis, we set 𝜆 = 𝛾 = 0 because in this way no confounding variable 𝑈𝑠 exists 52 and Assumption V is therefore satisfied. For sensitivity analysis, we fix 𝜆 to be 2 but vary the sign of 𝛾 as 𝛾 plays a role in the bias formula as in equation (4.7). For each scenario, we repeat our simulation 10000 times. 4.4.2 Results We discuss the performance and sensitivity of our estimator using the same performance metrics that are described in section 3.4.2. We only display the results for the continuous outcomes here. Results for the binary outcomes are similar (see Appendix B). For fixed site size, we consider nine scenarios based on the combinations of number of sites and site sizes. For varied site size, there are twelve scenarios based on the combinations of types of weights (including unweighted) and number of sites. 4.4.2.1 Performance Analysis Figure 4.1 includes the bias, percent bias, mean standard errors, root mean squared errors, ratios of mean standard errors and empirical standard errors, and coverage rates for samples with fixed site size. In each panel, the dotted lines represent the nine scenarios. For example, label “s=50, n=25” indicates that the scenario has 50 sites and 25 individuals in each site. The top two panels of Figure 4.1 show that the oracle estimator is unbiased across all scenarios. This is as expected as the OLS estimator is unbiased when there are no omitted variables as well as no measurement errors in the regressors. The naïve estimator and the bootstrap estimator are slightly biased due to the measurement errors, which can be explained by equation (4.7). As there is no omitted variable issue in the performance analysis, the term 𝛑′ 𝐔𝜸 will not exist. Then, the bias will only be associated with the variance of the true principal strata proportions and the variance of the 53 measurement errors. Specifically, the bias will decrease as the variance of the measurement errors decreases whereas it will increase as the variance of the regressors decreases. This pattern is reflected in our results. In Figure 4.1, we observe that the bias becomes smaller when we increase the site size given a fixed number of sites. This is because the estimated principal strata proportions will be closer to the true proportions when the site size becomes larger. As a result, the variance of the measurement errors will become smaller. We expect an increase in the bias if the number of sites increases while the site size is controlled. However, it seems that this pattern is not significant especially for larger site sizes. For example, the percent bias looks almost the same when the site size is 100, ranging from -0.86% when 𝑠 = 50 to -0.89% when 𝑠 = 200 for the naïve estimator. This is reasonable because 50 is a relatively big enough sample size for the linear regression, further increasing the sample size will not make a significant difference in reducing the dispersion of the true principal strata proportions. 54 Figure 4.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the nine scenarios based on number of sites 𝑠 and site sizes 𝑛 for the continuous outcome when site size is fixed. For example, label “s=50, n=25” indicates that the scenario has 50 sites and 25 individuals in each site. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. 55 A clear pattern can be observed for the mean standard errors of the three estimators. Namely, the estimators become more precise as the number of sites and the site size increases. One interesting finding is that the mean standard errors of the oracle estimator is always bigger than that of the naïve or the bootstrap estimator when the number of sites and the site size is fixed. This is plausible because the variance of the OLS estimator will be inversely related to the variance of the regressors holding everything else constant. As the variability in regressors is minimal when estimating the oracle estimator compared to the other two estimators, the mean standard errors of the oracle estimator will as a result be the largest among the three estimators within the same scenario. The root mean squared errors have a similar pattern as the mean standard errors across scenarios. The more sites, the bigger the site size, the smaller the root mean squared errors of the estimators. The mean standard errors of the naïve and the bootstrap estimators are close to the standard deviation of the estimated CACE as the standard error ratios for these two estimators lie around one as shown at the lower left corner of Figure 4.1. On the other hand, the mean standard errors of the oracle estimator slightly overestimate the empirical standard errors. This suggests that the mean standard errors of the oracle estimator may be a little conservative, which is consistent with the implications found for the mean standard errors. The lower right corner of Figure 4.1 shows the coverage rates for the three estimators. The oracle estimator has a higher coverage rate that is around 96% across the nine scenarios than the other two estimators. Neither the naïve estimator nor the bootstrap estimator can provide valid coverage rates for the effect of interest. The coverage rates of these two estimators move away from 95% as the number of sites increases given the site size, and move towards 95% as the site size increases given the number of sites. 56 Similar findings for the varied site size are shown in Figure 4.2. As in Figure 4.1, the dotted lines represent the twelve scenarios for the varied site size with UN standing for unweighted, SS for weighting by the site size, WV for weighting by the within-site variation and COMP for weighting by the number of compliers. For every type of weights, the oracle estimator is unbiased as expected. There is a slight and insignificant increase in the bias of the naïve estimator and the bootstrap estimator when there are more sites within each type of weights. Weighting by the within-site variation gives the largest bias among the four types of weights. This is explainable because weighting in this way compromises the variability of the estimated principal strata proportions and therefore results in larger bias. The mean standard errors and the root mean squared errors of the three estimators for the varied site size show the same patterns as for the fixed site size. The mean standard errors of the naïve estimator and the bootstrap estimator are always similar to each other within each scenario for the same type of weights except when using the within-site variation as weights. In that case, the mean standard errors of the naïve estimator are smaller than those of the bootstrap estimator. This is also reflected in the standard error ratio panel of Figure 4.2. The smaller mean standard errors of the naïve estimator when weighting by the within-site variation results in a severe underestimate for the empirical standard errors, leading to coverage rates that are much lower than 95%. In general, the coverage rates of the oracle estimator are around 95%, however, the coverage rates of the naïve estimator and the bootstrap estimator are all below 95% as they are biased due to measurement errors. 57 Figure 4.2 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites 𝑠 for the continuous outcome when site size is varied. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. 58 4.4.2.2 Sensitivity Analysis When there are omitted variables that are associated with the site-specific principal strata proportions and the site-specific principal strata ITT effects, Assumption V will be violated. For our simulations, we only include one omitted variable 𝑈𝑠 for simplicity. As shown in Equation (4.7), the bias will be affected by the product of the covariance between the omitted variable 𝑈𝑠 and the principal strata proportions 𝝅 and the coefficient of the omitted variable 𝛾. Therefore, as our simulations have negative covariance between 𝑈𝑠 and 𝝅 for all scenarios, we consider two values for 𝛾 with opposite signs, i.e., 𝛾 = 1 or 𝛾 = −1, to see how our estimators perform when there is deviation from the zero correlation assumption. We compare three cases in Table 4.3 for the fixed site size and Table 4.4 for the varied site size with C1 referring to the case when Assumption V holds (𝜆 = 𝛾 = 0, reference case), C2 referring to the case when 𝜆 = 2 and 𝛾 = 1, and C3 referring to the case when 𝜆 = 2 and 𝛾 = −1. The results for the oracle estimator show the influence directly from violation of Assumption V. However, the results for the naïve estimator and the bootstrap estimator are more practical because they combine the impact of the two most frequently occurred situations in the practical setting, i.e., the influence from the omitted variable and the measurement errors in estimating the principal strata proportions. 59 Table 4.3 Sensitivity of the multisite design estimator across the nine scenarios based on number of sites and site sizes for the continuous outcome when site size is fixed if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. Oracle Naïve Bootstrap # of Site Measure Sites Size C1 C2 C3 C1 C2 C3 C1 C2 C3 25 0.17 -0.69 0.70 -0.18 -3.06 -1.82 -0.17 -3.05 -1.82 50 50 -0.12 -0.63 0.62 -0.1 -2.48 -1.30 -0.09 -2.48 -1.30 100 -0.00 -0.57 0.58 -0.05 -1.85 -0.73 -0.05 -1.85 -0.72 Bias 25 0.45 -0.69 0.70 -0.19 -3.06 -1.82 -0.18 -3.06 -1.82 (× 100) 100 50 -0.22 -0.63 0.62 -0.10 -2.47 -1.29 -0.10 -2.47 -1.28 100 -0.12 -0.58 0.57 -0.05 -1.85 -0.73 -0.05 -1.85 -0.73 25 -0.16 -0.69 0.71 -0.19 -3.06 -1.82 -0.19 -3.06 -1.82 200 50 -0.11 -0.63 0.62 -0.10 -2.47 -1.29 -0.10 -2.47 -1.29 100 -0.06 -0.57 0.58 -0.05 -1.85 -0.73 -0.05 -1.85 -0.73 25 0.03 -10.7 12.7 -3.0 -47.1 -33.1 -2.9 -47.0 -33.0 50 50 -0.02 -9.7 11.3 -1.6 -38.2 -23.6 -1.6 -38.1 -23.6 100 0.00 -8.8 10.5 -0.9 -28.5 -13.2 -0.8 -28.4 -13.1 Percent 25 0.07 -10.6 12.7 -3.1 -47.1 -33.1 -3.0 -47.1 -33.1 Bias 100 50 -0.04 -9.7 11.4 -1.7 -38.0 -23.4 -1.6 -38.0 -23.3 (%) 100 -0.02 -8.9 10.4 -0.9 -28.5 -13.3 -0.9 -28.5 -13.3 25 -0.03 -10.6 12.8 -3.2 -47.1 -33.1 -3.1 -47.1 -33.1 200 50 -0.02 -9.7 11.3 -1.7 -38.0 -23.4 -1.7 -38.0 -23.4 100 -0.01 -8.8 10.5 -0.9 -28.5 -13.3 -0.9 -28.5 -13.3 25 0.46 1.14 1.14 0.30 0.60 0.63 0.29 0.61 0.63 50 50 0.37 0.92 0.92 0.26 0.56 0.58 0.25 0.56 0.58 100 0.29 0.71 0.71 0.22 0.49 0.50 0.21 0.50 0.51 Standard 25 0.32 0.80 0.80 0.21 0.42 0.44 0.20 0.42 0.44 Error 100 50 0.26 0.64 0.64 0.18 0.39 0.40 0.18 0.39 0.40 100 0.21 0.49 0.49 0.15 0.34 0.35 0.15 0.34 0.35 25 0.23 0.56 0.56 0.15 0.29 0.31 0.14 0.29 0.31 200 50 0.18 0.45 0.45 0.13 0.27 0.28 0.12 0.27 0.28 100 0.14 0.34 0.34 0.11 0.24 0.24 0.11 0.24 0.24 60 Table 4.3 (cont’d) 25 0.42 1.32 1.32 0.35 3.12 1.93 0.34 3.11 1.93 50 50 0.34 1.12 1.11 0.27 2.54 1.42 0.27 2.54 1.42 Root 100 0.28 0.92 0.92 0.22 1.92 0.89 0.22 1.91 0.88 Mean 25 0.29 1.03 1.04 0.28 3.09 1.87 0.27 3.08 1.87 100 Squared 50 0.24 0.89 0.89 0.21 2.50 1.35 0.20 2.50 1.35 Error 100 0.19 0.76 0.75 0.16 1.89 0.81 0.16 1.88 0.81 25 0.21 0.87 0.89 0.24 3.07 1.85 0.24 3.07 1.85 200 50 0.17 0.77 0.76 0.16 2.48 1.32 0.16 2.48 1.32 100 0.14 0.67 0.67 0.12 1.87 0.77 0.12 1.87 0.77 61 Table 4.4 Sensitivity of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites for the continuous outcome when site size is varied if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. Type Oracle Naïve Bootstrap of # of Measure Weight sites C1 C2 C3 C1 C2 C3 C1 C2 C3 50 0.50 -0.63 0.64 -0.10 -2.55 -1.36 -0.10 -2.54 -1.35 UN 100 0.08 -0.62 0.65 -0.11 -2.56 -1.37 -0.11 -2.55 -1.36 200 -0.07 -0.64 0.63 -0.11 -2.57 -1.38 -0.11 -2.56 -1.37 50 0.25 -0.62 0.63 -0.09 -2.46 -1.28 -0.09 -2.46 -1.28 SS 100 0.08 -0.62 0.63 -0.10 -2.46 -1.28 -0.10 -2.46 -1.28 Bias 200 -0.03 -0.62 0.63 -0.10 -2.46 -1.28 -0.10 -2.46 -1.28 (× 100) 50 0.75 -0.67 0.58 -0.15 -2.57 -1.40 -0.14 -2.56 -1.38 WV 100 0.04 -0.67 0.59 -0.17 -2.58 -1.40 -0.16 -2.57 -1.40 200 -0.14 -0.69 0.57 -0.17 -2.59 -1.42 -0.17 -2.59 -1.4 50 0.19 -0.62 0.63 -0.07 -2.48 -1.30 -0.06 -2.48 -1.30 COMP 100 0.11 -0.61 0.63 -0.07 -2.47 -1.30 -0.07 -2.47 -1.30 200 -0.00 -0.62 0.62 -0.07 -2.48 -1.30 -0.07 -2.48 -1.30 50 0.08 -9.7 11.7 -1.7 -39.3 -24.8 -1.6 -39.1 -24.6 UN 100 0.01 -9.6 11.8 -1.8 -39.4 -24.9 -1.8 -39.3 -24.8 200 -0.01 -9.8 11.5 -1.8 -39.5 -25.0 -1.8 -39.5 -25.0 50 0.04 -9.6 11.4 -1.6 -37.9 -23.3 -1.5 -37.9 -23.3 SS 100 0.01 -9.5 11.5 -1.6 -37.8 -23.3 -1.6 -37.8 -23.2 Percent 200 0.00 -9.6 11.4 -1.6 -37.9 -23.3 -1.6 -37.9 -23.3 Bias 50 0.12 -10.4 10.6 -2.5 -39.6 -25.4 -2.3 -39.4 -25.2 (%) WV 100 0.01 -10.3 10.8 -2.8 -39.7 -25.5 -2.6 -39.6 -25.4 200 -0.02 -10.6 10.4 -2.8 -39.9 -25.7 -2.8 -39.8 -25.6 50 0.03 -9.5 11.4 -1.1 -38.1 -23.7 -1.0 -38.2 -23.7 COMP 100 0.02 -9.4 11.5 -1.1 -38.1 -23.6 -1.1 -38.1 -23.6 200 0.00 -9.7 11.3 -1.1 -38.1 -23.7 -1.1 -38.1 -23.7 62 Table 4.4 (cont’d) 50 0.38 0.95 0.95 0.27 0.57 0.59 0.27 0.59 0.61 UN 100 0.27 0.66 0.66 0.19 0.39 0.41 0.19 0.41 0.42 200 0.19 0.47 0.47 0.13 0.28 0.29 0.13 0.29 0.30 50 0.37 0.92 0.92 0.26 0.56 0.58 0.25 0.57 0.58 SS 100 0.26 0.64 0.64 0.18 0.39 0.40 0.18 0.39 0.40 Standard 200 0.18 0.45 0.45 0.13 0.27 0.28 0.12 0.27 0.28 Error 50 0.35 0.88 0.87 0.26 0.56 0.58 0.29 0.62 0.66 WV 100 0.24 0.61 0.60 0.18 0.39 0.41 0.21 0.44 0.47 200 0.17 0.43 0.42 0.13 0.27 0.28 0.15 0.32 0.34 50 0.35 0.94 0.94 0.25 0.57 0.59 0.25 0.57 0.59 COMP 100 0.25 0.65 0.65 0.17 0.39 0.41 0.17 0.40 0.41 200 0.17 0.46 0.46 0.12 0.28 0.28 0.12 0.28 0.29 50 0.37 1.16 1.16 0.29 2.62 1.50 0.29 2.61 1.49 UN 100 0.26 0.93 0.94 0.22 2.59 1.44 0.22 2.59 1.43 200 0.18 0.80 0.79 0.17 2.58 1.41 0.17 2.58 1.41 50 0.35 1.11 1.11 0.27 2.53 1.41 0.27 2.53 1.41 Root SS 100 0.24 0.89 0.90 0.20 2.49 1.34 0.20 2.49 1.34 Mean 200 0.17 0.77 0.77 0.16 2.48 1.31 0.16 2.48 1.31 Squared 50 0.34 1.15 1.09 0.35 2.66 1.57 0.34 2.64 1.54 Error WV 100 0.24 0.93 0.88 0.28 2.62 1.49 0.27 2.62 1.48 200 0.17 0.82 0.73 0.23 2.61 1.46 0.23 2.61 1.46 50 0.34 1.13 1.13 0.26 2.54 1.43 0.26 2.55 1.43 COMP 100 0.24 0.90 0.92 0.19 2.51 1.36 0.19 2.51 1.36 200 0.17 0.77 0.78 0.14 2.49 1.33 0.14 2.49 1.33 63 In Table 4.3 and Table 4.4, each row within a performance measure represents one scenario that is described in the previous section. The bias of the oracle estimator is all negative in C2 and all positive in C3. This is consistent with equation (4.7) which becomes E((𝝅 ̂ ′𝝅 ̂ )−1 𝛑′ 𝐔𝛄) because there are no measurement errors for the oracle estimator. As the covariance is all negative in our simulations, the sign of the coefficient of the omitted variable dominates the direction of the bias. On the other hand, there are almost no differences on the mean standard errors if we change the sign of the coefficient of the omitted variable for the oracle estimator within each scenario for the fixed site size. This is also the case for the varied site size except those weighted by the within- site variation which is reasonable as the variability has been taken into account. When Assumption V is violated, for the naïve estimator and the bootstrap estimator, the bias formula in equation (4.7) show that the bias is related to the difference of the scaled covariance between the omitted variable and the principal strata proportions and the scaled variance of the measurement errors. The scalars are the coefficient of the omitted variable and the population principal strata ITT effects, respectively. Because of this difference, in some circumstances the magnitude of the bias of the naïve estimator or the bootstrap estimator is not significantly larger than that of the oracle estimator within the same scenario for the same case. For example, in Table 4.3, when the number of sites is 50 and the site size is 100, the bias of the oracle estimator for C3 is 0.58 whereas the bias of the native estimator is -0.73 whose absolute value is not significantly bigger than that of the oracle estimator. The mean standard errors of the naïve estimator and the bootstrap estimator are found to be smaller than those of the oracle estimator when there are omitted variables for both the fixed and varied site size, an identical finding as in the performance analysis. This is true because there is more variability in the estimated principal strata proportions than in the true principal strata 64 proportions for the naïve estimator and the bootstrap estimator, resulting in less variation in the distribution of the estimators. For the root mean squared errors, we find a consistent pattern for the three estimators across the three cases, that is, the larger the number of sites, the larger the number of individuals in each site, the smaller the root mean squared error, the better the estimators. 65 CHAPTER 5 APPLICATION Major depressive disorder (MDD) is a common medical disorder and a serious public health concern. More than 300 million people worldwide suffer from depression, corresponding to 4.4% of the global population (World Health Organization, 2017). Women are more likely to live with depression than men with a prevalence of 5.1% and 3.6%, respectively. The economic burden of MDD has increased by 21.5% to $210.5 billion in 2010 compared to that in 2005 (Greenberg et al., 2015). As one of the leading causes of non-fatal health loss, depression contributes to 7.5% of years lived with disability (YLD) (James et al., 2018). Depression is also a leading cause of suicide deaths, costing approximately 800,000 lives every year. To treat major depressive disorder, a variety of antidepressants are available (Frank et al., 1996). Among these medications, the selective serotonin reuptake inhibitors (SSRIs), serotonin norepinephrine reuptake inhibitors (SNRIs), mirtazapine, and bupropion are often taken as the first-line treatments because they are safe, more tolerable and acceptable, less costly and have less toxicity (Gelenberg et al., 2010). The effectiveness of these antidepressants, between or within medication classes, is similar. However, approximately 50% of the depressed patients fail to respond to the initial treatments and only about 30% achieve full remission (Fava et al., 2003). Lack of remission to the first-line treatments will result in persistent disabilities and impaired quality of life. Thus, next-step treatments after initial failure to boost remission rates are often in demand. The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial is a large- scale, multicenter, multistep, real-world clinical trial of patients with nonpsychotic major depressive disorder. It aims to provide evidence for the optimal choice of the next-step treatments 66 for those who have an initial failure to the first-line antidepressant. Details of the study design is described elsewhere (Rush et al., 2004). In this chapter, we apply our proposed approaches to the STAR*D data at level 2 to identify the complier average causal effect (CACE) of treatment options when treatment nonadherence exists. Specifically, we focus on the antidepressant medications at level 2 for those who did not obtain a satisfactory response from citalopram (CIT), an SSRI antidepressant at level 1. Level 2 treatments include augmentation to the previous citalopram or switching to a different medication based on their acceptability. For those who chose the medication augmentation strategy, they were randomly assigned to receive one of the two treatment options: sustained-release bupropion (BUP) or buspirone (BUS). For those who chose the medication switching strategy, they were randomly assigned to receive one of the three treatment options: sustained-release BUP, sertraline (SER), or extended-release venlafaxine (VEN). Among these three switching options, we only consider sustained-release BUP and extended-release VEN for comparison in this dissertation because they belong to different medication classes from citalopram which may provide new indications. With the study setting, we define our estimand of interest, CACE, in terms of its following attributes (ICH E9 working group, 2020). Treatment. We compare treatment options within the medication augmentation and switching strategies separately. For the medication augmentation strategy, the treatments of interest are augmenting CIT with BUP (CIT+BUP) and augmenting CIT with BUS (CIT+BUS). For the medication switching strategy, the treatments of interest are BUP and VEN. Population. Our population of interest consists of those who always adhere to whichever treatment they would be assigned regardless of the assignment status, that is, the compliers defined by the potential treatment received under the principal stratification framework in Chapter 3. 67 Endpoint. The primary endpoint of interest is the remission of symptoms which is defined as a score of seven or less on the 17-item Hamilton Rating Scale for Depression (HAM-D17) at the end (12 weeks or longer) of level 2. We also consider secondary endpoints that are based on the HAM- D17 scores. They are 1) total HAM-D17 scores at the end of level 2 which range from 0 to 52 with higher values indicating more severe depression; 2) reduction of HAM-D17 scores, i.e., (HAM- D17 scores at the beginning of level 2 − HAM-D17 scores at the end of level 2); and 3) response which is defined as a 50% or more reduction of HAM-D17 scores at the end of level 2. Population-level summary. We use the potential outcome means under each treatment in our population of interest as the population-level summaries. Intercurrent Event. We are interested in the intercurrent event of treatment nonadherence in this study. Specifically, we consider a wide range of reasons for treatment nonadherence which are collected by the protocol deviation questionnaire and the study termination questionnaire. The reasons include 1) lack of efficacy; 2) unacceptable side effects; 3) committed suicide/suicide attempt; 4) developed general medical or surgical condition that required protocol to be stopped; 5) developed symptoms requiring non-protocol treatment; 6) moved from the area; 7) found research too burdensome; 8) patient nonadherence to study procedures; and 9) patient nonadherence to study medications. With the above attributes, we can clearly define our causal estimand of interest as the mean difference at week 12 or longer between the potential outcomes under each of the two active treatments for the principal stratum of compliers when treatment nonadherence exists. The corresponding treatment effect is therefore the CACE and we use the corresponding principal stratum strategy in the ICH E9 (R1) addendum to address the intercurrent event of treatment nonadherence. 68 5.1 Medication Augmentation: CIT+BUP vs. CIT+BUS In STAR*D, randomization was stratified by clinical sites and preference stratum. After the initial failure of citalopram at level 1, 565 patients in 38 clinical sites accepted the medication augmentation strategy at level 2. Among these patients, 279 were randomly assigned to CIT+BUP and 286 were randomly assigned to CIT+BUS. Table 5.1 shows the characteristics of the patients at baseline or level 2 entry, all of which are balanced between the two treatment assignment groups except the length of illness which is defined as years from first episode to baseline of the study. Those assigned to CIT+BUP had a significantly shorter length of illness (15.2 years) than those assigned to CIT+BUS (17.7 years, 𝑃 = 0.02). Table 5.2 shows the observed treatment adherence by treatment assignment groups. Note that the treatment assignment 𝑍 takes two values with 1 representing those assigned to CIT+BUP (𝑍 = 1) and 2 representing those assigned to CIT+BUS (𝑍 = 2). The treatment receipt 𝐴 takes three values with 1 representing receiving CIT+BUP (𝐴 = 1), 2 representing receiving CIT+BUS (𝐴 = 2) and 0 representing not taking any treatment as required. We observe that 80% of the patients in the CIT+BUP assignment group adhered to what was assigned whereas in the CIT+BUS group 73% of the patients adhered to what was assigned. None of the patients in either group crossed over to receive treatment from the other group. 69 Table 5.1 Baseline and level 2 entry characteristics of the patients in the medication augmentation strategy. CIT+BUP CIT+BUS Total Characteristic (N=279) (N=286) (N=565) Setting Primary care 94 (33.7) 95 (33.2) 189 (33.5) Specialty care 185 (66.3) 191 (66.8) 376 (66.5) QIDS-C16 at level 2 entry 11.7±4.1 11.7±4.2 11.7±4.2 Monthly household income 2287±2600 2366±3320 2327±2983 Marital status Never married 84 (30.1) 79 (27.7) 163 (28.9) Married or cohabiting 112 (40.1) 117 (41.1) 229 (40.6) Separated or divorced 73 (26.2) 85 (29.8) 158 (28.0) Widowed 10 (3.6) 4 (1.4) 14 (2.5) Education Less than high school 33 (11.8) 44 (15.4) 77 (13.7) Less than college 178 (63.8) 183 (64.2) 361 (64.0) College or above 68 (24.4) 58 (20.4) 126 (22.3) Employment status Unemployed 111 (39.8) 125 (43.9) 236 (41.9) Employed 160 (57.3) 147 (51.6) 307 (54.4) Retired 8 (2.9) 13 (4.5) 21 (3.7) Insurance Private insurance 130 (48.1) 142 (51.6) 272 (49.9) Public insurance 35 (13.0) 36 (13.1) 71 (13.0) No insurance 105 (38.9) 97 (35.3) 202 (37.1) Sex Male 107 (38.4) 126 (44.1) 233 (41.2) Female 172 (61.6) 160 (55.9) 332 (58.8) Age 41.3±12.8 42.0±12.6 41.6±12.7 Race White 221 (79.2) 220 (76.9) 441 (78.1) Black/African American 48 (17.2) 47 (16.4) 95 (16.8) Other 10 (3.6) 19 (6.7) 29 (5.1) Hispanic No 243 (87.1) 245 (85.7) 488 (86.4) Yes 36 (12.9) 41 (14.3) 77 (13.6) CIRS score 4.5±4.1 4.9±4.0 4.7±4.0 Age at first MDE 26.1±14.4 24.3±13.6 25.2±14.0 Length of illness 15.2±12.6 17.7±13.6 16.5±13.2 Family history of MDD No 131 (48.0) 138 (48.6) 269 (48.3) Yes 142 (52.0) 146 (51.4) 288 (51.7) 70 Table 5.1 (cont’d) Suicide history No 266 (97.4) 274 (96.8) 540 (97.1) Yes 7 (2.6) 9 (3.2) 16 (2.9) SF-12 score Physical 48.3±12.5 48.1±12.0 48.2±12.2 Mental 25.0±8.2 26.2±7.7 25.6±8.0 WSAS score 25.8±8.1 25.6±9.1 25.7±8.6 QLESQ score 37.1±12.9 39.3±14.2 38.2±13.6 71 5.1.1 Assessment of Assumptions In this section, we discuss the validity of the key assumptions used to identify CACE, that is, Assumptions I, II, and III for the nonparametric approach and Assumption V for the multisite design approach. Specifically, we evaluate whether these assumptions can be falsified for the augmentation strategy. For the nonparametric approach, it is straightforward that Assumption II, the monotonicity assumption, is satisfied because no crossover occurs in our data as shown in Table 5.2. Table 5.2 Observed treatment adherence in the medication augmentation strategy. Treatment Received (A) None CIT+BUP CIT+BUS (A = 0) (A = 1) (A = 2) CIT+BUP (Z = 1) 56 (20) 223 (80) 0 (0) Treatment Assigned (Z) CIT+BUS (Z = 2) 78 (27) 0 (0) 208 (73) If the treatment assignment has an effect on the outcomes of interest that is not through the treatment received, Assumption I, the exclusion restriction assumption will be violated. However, the exclusion restriction assumption can never be verified from the data because the potential outcomes cannot be observed at the same time. It can, on the other hand, be testable using available data (Angrist et al., 1996). To better represent the real world, STAR*D was designed not to blind the patients and the physicians who administered the drugs, but only blind the assessors (Sinyor et al., 2010). It is likely that knowing the assigned treatment will have an effect on “experiencing” side effects and subsequently violate the exclusion restriction assumption. Table 5.3 shows the side effects in level 2 in the medication augmentation strategy. No significant difference was found 72 between those assigned to CIT+BUP and those assigned to CIT+BUS in terms of maximal frequency (𝜒 2 = 0.87, 𝑃 = 0.99), maximal intensity (𝜒 2 = 2.04, 𝑃 = 0.92), or maximal burden of side effects ( 𝜒 2 = 12.0, 𝑃 = 0.06 ). Therefore, our data do not support the violation of Assumption I. 73 Table 5.3 Side effects in level 2 in the medication augmentation strategy. CIT+BUP CIT+BUS Total Characteristic (N=279) (N=286) (N=565) Maximal frequency of side effects in level 2 No side effects 55 (19.8) 59 (20.7) 114 (20.2) 10% of the time 44 (15.8) 47 (16.5) 91 (16.2) 25% of the time 47 (16.9) 49 (17.2) 96 (17.0) 50% of the time 47 (16.9) 52 (18.3) 99 (17.6) 75% of the time 31 (11.2) 28 (9.8) 59 (10.5) 90% of the time 19 (6.8) 16 (5.6) 35 (6.2) All the time 35 (12.6) 35 (11.9) 69 (12.3) Maximal intensity of side effects in level 2 No side effects 55 (19.8) 58 (20.4) 113 (20.0) Trivial 27 (9.7) 29 (10.2) 56 (10.0) Mild 55 (19.8) 52 (18.3) 107 (19.0) Moderate 61 (22.0) 60 (21.0) 121 (21.5) Marked 44 (15.8) 52 (18.3) 96 (17.0) Severe 27 (9.7) 29 (10.2) 56 (10.0) Intolerable 9 (3.2) 5 (1.7) 14 (2.5) Maximal burden of side effects in level 2 No side effects 63 (22.7) 75 (26.3) 138 (24.5) Minimal impairment 62 (22.3) 56 (19.7) 118 (21.0) Mild impairment 65 (23.4) 52 (18.2) 117 (20.8) Moderate impairment 50 (18.0) 55 (19.3) 105 (18.6) Marked impairment 24 (8.6) 33 (11.6) 57 (10.1) Severe impairment 8 (2.9) 14 (4.9) 22 (3.9) Unable to function 6 (2.1) 0 (0.0) 6 (1.1) 74 In addition to satisfying Assumption II, the fact that the treatments are not crossed over in the medication augmentation strategy also provides evidence of the nonexistence of always-1-takers (strata 5) and always-2-takers (strata 9). Given this fact and that Assumption I is likely to hold, the population structure would become Table 5.4 if Assumption III, the no partial-compliers assumption is also satisfied for the nonparametric approach. We can therefore falsify Assumption III using the observed data. Table 5.4 Population structure when Assumptions I (exclusion restriction), Assumption II (monotonicity), and Assumption III (no partial-compliers) are satisfied for the nonparametric approach. Z=2 A(2)=0 A(2)=1 A(2)=2 1. Never-taker A(1)=0 (NT) π1 6. Complier Z=1 A(1)=1 (COMP) π6 A(1)=2 Specifically, following the discussion in section 4.2, we can estimate the proportion of never-takers 𝜋1 in two ways. One is using the probability of those who were assigned to CIT+BUP but did not take it, i.e., Pr(𝐴 = 0|𝑍 = 1) for 𝜋1 and the other is using the probability of those who were assigned to CIT+BUS but did not take it, i.e., Pr(𝐴 = 0|𝑍 = 2) . Similarly, we can use the probability of those who were assigned to CIT+BUP and took it, i.e., Pr(𝐴 = 1|𝑍 = 1) or the probability of those who were assigned to CIT+BUS and took it, i.e., Pr(𝐴 = 2|𝑍 = 2) to estimate 75 the proportion of compliers 𝜋6 . If Assumption III holds, we expect that the two estimated probabilities for 𝜋1 to be equal and that the two estimated probabilities for 𝜋6 to be equal. That is, ̂(𝐴 = 0|𝑍 = 1) = Pr 𝜋̂1 = Pr ̂(𝐴 = 0|𝑍 = 2) ̂(𝐴 = 1|𝑍 = 1) = Pr 𝜋̂6 = Pr ̂(𝐴 = 2|𝑍 = 2) (5.1) However, as shown in Table 5.2, the probability Pr ̂(𝐴 = 0|𝑍 = 1) = 0.2 differs significantly from the probability Pr ̂(𝐴 = 0|𝑍 = 2) = 0.27 , and the probability Pr ̂(𝐴 = 1|𝑍 = 1) = 0.8 differs significantly from the probability Pr ̂(𝐴 = 2|𝑍 = 2) = 0.73 ( 𝑧 = −2.01, 𝑃 = 0.04 ). Therefore, there is a possibility that Assumption III is violated. We will give further discussions later in this chapter. Table 5.5 Population structure when Assumptions I (exclusion restriction), Assumption II (monotonicity), and IV (no never-takers) are satisfied for the multisite design approach. Z=2 A(2)=0 A(2)=1 A(2)=2 3. Partial-2-complier A(1)=0 (P2C) π3 4. Partial-1-complier 6. Complier Z=1 A(1)=1 (P1C) (COMP) π4 π6 A(1)=2 For the multisite design approach, the population structure becomes Table 5.5 under Assumptions I, II, and IV in the medication augmentation strategy. We only evaluate Assumption V, the zero correlation assumption as it is the key to this approach. Specifically, Assumption V can be violated if there are site-level covariates that correlate to both the site-specific principal 76 strata proportions 𝜋3|𝑘 , 𝜋4|𝑘 , 𝜋6|𝑘 and the site-level ITT effects 𝐼𝑇𝑇𝑦|𝑘 for 𝑘 = 1, 2, … , 38. As we cannot observe 𝜋3|𝑘 , 𝜋4|𝑘 , 𝜋6|𝑘 and 𝐼𝑇𝑇𝑦|𝑘 directly, we will use the estimated quantities 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , ̂ 𝑦|𝑘 instead. The site-level variable of clinical setting which indicates whether a site 𝜋̂6|𝑘 , and 𝐼𝑇𝑇 provides primary care or specialty care could be one such site-level covariate. Patients in the specialty care setting might be more compliant to their treatment and therefore have a larger proportion of compliers. However, our data do not provide evidence of association between clinical setting and 𝜋̂3|𝑘 (𝑡 = 0.7, 𝑃 = 0.49), 𝜋̂4|𝑘 (𝑡 = −0.18, 𝑃 = 0.86), 𝜋̂6|𝑘 (𝑡 = −0.29, 𝑃 = 0.77). We also consider aggregated variables from the individual-level data as the potential confounders including site-specific proportion of females, proportion of African Americans, average age, proportion of the unemployed, average household income, average years of schooling, proportion of married individuals, proportion of those with a family history with MDD, average physical health, average mental health, average quality of life, and average baseline medical conditions. 77 Figure 5.1 Linear regressions of site-specific estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , 𝜋̂6|𝑘 on the site-level aggregated covariates in the medication augmentation strategy. 78 Figure 5.1 shows the linear regressions of the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , 𝜋̂6|𝑘 on each of the above aggregated variables. No significant association has been found for these covariates except the proportion of African Americans ( 𝜋̂3|𝑘 : 𝑡 = 4.98, 𝑃 < 0.001; 𝜋̂4|𝑘 : 𝑡 = 0.2, 𝑃 = 0.85; 𝜋̂6|𝑘 : 𝑡 = −2.8, 𝑃 = 0.008). We further evaluate the association between the site- ̂ 𝑦|𝑘 of the four endpoints and proportion of African Americans through linear regressions level 𝐼𝑇𝑇 and no significant association has been found for any of the endpoints. Therefore, our data seem to support the plausibility of Assumption V. 5.1.2 Analysis Our primary and secondary endpoints are all based on the HAM-D17 scores at the end of level 2 which have some missing data. Table 5.6 summarizes the reasons for missingness in the medication augmentation strategy. Overall, 61.3% of the patients missed the HAM-D17 scores due to loss to follow up or withdrawn consent. 19.1% of the patients are due to treatment nonadherence. 32 (18.5%) patients exited level 2 before week 12. Even though the HAM-D17 scores of the last group were collected, we ignore these scores and treat them as missing because otherwise it will misalign with our causal estimand of interest that targets endpoints at week 12 or longer. We name the reason of this type of missingness as “inadequate stay in level 2”. There were also 2 (1.1%) patients who missed the HAM-D17 scores with no reasons. 79 Table 5.6 Reasons for missing HAM-D17 scores in the medication augmentation strategy. CIT+BUP CIT+BUS Total Reasons for missing HAM-D17 scores (N=85) (N=88) (N=173) Lost to follow up/Withdrew consent 59 (69.4) 47 (53.4) 106 (61.3) Inadequate stay in level 2 10 (11.8) 22 (25.0) 32 (18.5) Treatment nonadherence 15 (17.6) 18 (20.5) 33 (19.1) Other 1 (1.2) 1 (1.1) 2 (1.1) To deal with missingness, as suggested in the National Research Council (NRC) report, we chose the most commonly used mixed effects model repeated measures (MMRM) analysis which assumes the missing endpoints are missing at random (MAR) as our main analysis following the framework in Figure 1 of the ICH E9 (R1) addendum. (National Research Council, 2010; ICH E9 working group, 2020) We included age, race, duration of level 2 treatment, and the 16-item Quick Inventory of Depressive Symptomatology-Clinician-Rated (QIDS-C16) scores at the beginning of level 2 in the MMRM analysis because these covariates differed significantly between those with and without missing endpoints. Next, we applied our proposed approaches within the MMRM analysis. Specifically, we estimated the potential outcome 𝑌𝑖 (1, 𝐴(1)) for each patient in the sample by fitting a random effect model using patients who were randomly assigned to 𝑍 = 1 only. Then, we obtained 𝐸(𝑌𝑖 (1, 𝐴(1)) by taking average of the fitted values. Similarly, we estimated 𝐸(𝑌𝑖 (2, 𝐴(2)) by taking average of the fitted values from a separate random effect model using patients who were randomly assigned to 𝑍 = 2 only. The marginal ITT effects of 𝑍 on 𝑌 for the nonparametric approach were calculated by subtracting the estimated 𝐸(𝑌𝑖 (1, 𝐴(1)) from the estimated 𝐸(𝑌𝑖 (2, 𝐴(2)). For the multisite design approach, we obtained the site-level marginal ITT effects of 𝑍 on 𝑌 by repeating the above steps for each site. We estimated the ITT effects of 𝑍 on 𝐴 from the average values 𝐴 for each value of 𝑍 for the nonparametric approach and the site- 80 specific principal strata proportions from the observed proportions of 𝐴 given 𝑍 for the multisite design approach. We bootstrapped 500 times to obtain the standard errors of our estimates. The main analysis has the potential to overestimate the missing endpoints because the underlying MAR mechanism assumes that the missing endpoints and non-missing endpoints follow the same distribution. For placebo-controlled trials, overestimation of the missing endpoints will often result in exaggerated treatment effects (Mehrotra, 2019). To address this problem, Mehrotra et al. (2017) proposed an approach that was less in favor of the treatment group for the placebo-controlled trials and therefore provided an estimator that was less likely towards the direction of treatment efficacy. Their basic idea was to use the estimated overall mean of the placebo group from MMRM as the mean of the dropouts in the treatment group. Subsequently, they estimated the overall mean of the treatment group as a weighted sum of the estimated mean of the completers from MMRM and the estimated mean of the dropouts. The weights are the proportions of completers and dropouts in the treatment group respectively. To evaluate the robustness of our estimators from the main analysis, we extend the approach in Mehrotra et al. (2017) to our setting with two active treatments and conduct three sensitivity analyses. Specifically, let 𝜇̂ 1 be the estimated overall mean in the CIT+BUP group obtained from MMRM discussed above, 𝜇̂ 2 be the estimated overall mean from MMRM in the CIT+BUS group, 𝑝1,𝑚𝑖𝑠𝑠 be the proportion of missing endpoints in the CIT+BUP group, 𝑝2,𝑚𝑖𝑠𝑠 be the proportion of missing endpoints in the CIT+BUS group, and 𝜏̂𝑗 be the estimated ITT effect of 𝑍 on 𝑌 for sensitivity analysis 𝑗, 𝑗 = 1, 2, 3 . Sensitivity analysis 1 uses CIT+BUP as the reference group and re-estimates the overall mean of the CIT+BUS group, that is, 𝜇̂ 2,𝑛𝑒𝑤 = (1 − 𝑝2,𝑚𝑖𝑠𝑠 )𝜇̂ 2 + 𝑝2,𝑚𝑖𝑠𝑠 𝜇̂ 1 (5.2) 𝜏̂1 = 𝜇̂ 2,𝑛𝑒𝑤 − 𝜇̂ 1 = (1 − 𝑝2,𝑚𝑖𝑠𝑠 )(𝜇̂ 2 − 𝜇̂ 1 ) 81 Similarly, sensitivity analysis 2 uses CIT+BUS as the reference group and re-estimates the overall mean of the CIT+BUP group, that is, 𝜇̂ 1,𝑛𝑒𝑤 = (1 − 𝑝1,𝑚𝑖𝑠𝑠 )𝜇̂ 1 + 𝑝1,𝑚𝑖𝑠𝑠 𝜇̂ 2 (5.3) 𝜏̂ 2 = 𝜇̂ 2 − 𝜇̂ 1,𝑛𝑒𝑤 = (1 − 𝑝1,𝑚𝑖𝑠𝑠 )(𝜇̂ 2 − 𝜇̂ 1 ) Sensitivity analysis 3 cross-references one group for the other and re-estimates the overall means for both groups, that is, 𝜇̂ 1,𝑛𝑒𝑤 = (1 − 𝑝1,𝑚𝑖𝑠𝑠 )𝜇̂ 1 + 𝑝1,𝑚𝑖𝑠𝑠 𝜇̂ 2 (5.4) 𝜇̂ 2,𝑛𝑒𝑤 = (1 − 𝑝2,𝑚𝑖𝑠𝑠 )𝜇̂ 2 + 𝑝2,𝑚𝑖𝑠𝑠 𝜇̂ 1 𝜏̂3 = 𝜇̂ 2,𝑛𝑒𝑤 − 𝜇̂ 1,𝑛𝑒𝑤 = (1 − 𝑝1,𝑚𝑖𝑠𝑠 − 𝑝2,𝑚𝑖𝑠𝑠 )(𝜇̂ 2 − 𝜇̂ 1 ) Note that 𝜇̂ 2 − 𝜇̂ 1 is the estimated ITT effect of 𝑍 on 𝑌 for the main analysis. 5.1.3 Results Figure 5.2 shows the estimated CACEs and their 95% confidence intervals for the primary endpoint of remission and the secondary endpoints of total HAM-D17 scores, reduction of HAM- D17 scores, and response. In each panel, the solid horizontal line represents the null CACE. The dashed vertical line separates the nonparametric estimates on the left-hand side from the multisite design estimates on the right-hand side. For the multisite design approach, we list the results with different types of weights. For remission, all of the estimated CACEs are around zero and the 95% confidence intervals include zero; thus no significant differences of the two augmented medications are found in the main analysis and the sensitivity analyses. This aligns with existing findings of similar remission rates for these two groups although the target populations are not the same (Trivedi et al., 2006). Evidently, either augmenting CIT with BUP or augmenting CIT with BUS in the medication augmentation strategy is no better than the other one on remission. 82 Figure 5.2 Estimated complier average causal effects (CACEs) and their 95% confidence intervals for the primary endpoint and secondary endpoints in the medication augmentation strategy. In each panel, the horizontal solid line represents the null. The dashed vertical line separates the nonparametric estimates on the left-hand side from the multisite design estimates on the right-hand side. The blue square represents main analysis. The green circle represents sensitivity analysis 1. The red triangle represents sensitivity analysis 2. The orange diamond represents sensitivity analysis 3. 83 The nonparametric approach provides significant estimates of CACE on the total HAM- D17 scores at the end of level 2 as shown in the upper right panel of Figure 5.2. As larger HAM- D17 scores indicate greater severity of depression, the positive estimated CACEs imply that the symptoms of those augmenting CIT with BUP are less severe at the end of level 2 than those augmenting CIT with BUS. This is also true for the multisite design approach, although no evidence of statistical significance is found except in sensitivity analysis 3 if unweighted or weighted by within-site variation. Given the definition of the reduction of HAM-D17 scores in the previous section, we are confident that a greater reduction of symptoms occurs in those augmenting CIT with BUP than those augmenting CIT with BUS because of the negative estimated CACEs across all analyses which is shown in the lower left panel of Figure 5.2. Besides, the multisite design approach also shows that the estimated CACEs are significant except in the main analysis if unweighted or weighted by site size. The estimated CACEs on response are significant in both the nonparametric approach and multisite design approach. In other words, the negative estimated CACEs indicate that those augmenting CIT with BUP have significantly higher response rates than those augmenting CIT with BUS. Our findings in the main analysis are robust and can be justified from the three sensitivity analyses. In addition, Figure 5.2 also shows that the absolute values of the estimated CACEs in the main analysis are the largest and that those in sensitivity analysis 3 are the smallest on any endpoint in the nonparametric approach. This is as expected because the nonparametric estimates are ratios of the estimated ITT effects of 𝑍 on 𝑌 and ITT effects of 𝑍 on 𝐴, and the estimated ITT effects of 𝑍 on 𝑌 in sensitivity analyses are a portion of the estimated ITT effects of 𝑍 on 𝑌 in the main analysis as shown in equations (5.2), (5.3), or (5.4). The standard errors in the main analysis are always larger than those in sensitivity analysis 3 in both approaches. This is reasonable because 84 in sensitivity analysis 3 the estimated mean of the endpoints in either group is a mixture of the non-missing endpoints from both treatment groups, which could reduce the variations. The estimated CACEs of the reduction of HAM-D17 scores in the multisite design approach if weighted by within-site variation may look odd. This is because extreme weight is placed on one particular site whose proportion of missingness in treatment assignment group 1 (𝑍 = 1) happen to be zero. Therefore, the estimated ITT effects of 𝑍 on 𝑌 in sensitivity analysis 2 are the same as the main analysis, and the absolute values of the estimated ITT effects of 𝑍 on 𝑌 in the main analysis and sensitivity analysis 2 are larger than those in sensitivity analysis 1 and sensitivity analysis 3. 5.2 Medication Switching: BUP vs. VEN For the 727 patients who failed to remit from citalopram and who accepted the medication switching strategy, 239 were randomly assigned to sustained-release bupropion (BUP), 238 were randomly assigned to sertraline (SER), and 250 were randomly assigned to extended-release venlafaxine (VEN). As we aim to find the optimal second-step treatment and sertraline falls into the same medication class SSRI as citalopram, we decided to only compare BUP and VEN for our subsequent analysis. As a result, our final sample contains 489 patients in 38 clinical sites. Table 5.7 shows the characteristics of the patients at baseline or level 2 entry in the medication switching strategy. 85 Table 5.7 Baseline and level 2 entry characteristics of the patients in the medication switching strategy. BUP VEN Total Characteristic (N=239) (N=250) (N=489) Setting Primary care 94 (39.3) 96 (38.4) 190 (38.9) Specialty care 145 (60.7) 154 (61.6) 299 (61.1) QIDS-C16 at level 2 entry 14.1±4.6 14.1±4.6 14.1±4.6 Monthly household income 2139±2737 2077±2258 2108±2508 Marital status Never married 69 (28.9) 63 (25.2) 132 (27.0) Married or cohabiting 89 (37.2) 104 (41.6) 193 (39.5) Separated or divorced 69 (28.9) 72 (28.8) 141 (28.8) Widowed 12 (5.0) 11 (4.4) 23 (4.7) Education Less than high school 25 (10.5) 27 (10.8) 52 (10.6) Less than college 157 (65.7) 165 (66.0) 322 (65.9) College or above 57 (23.8) 58 (23.2) 115 (23.5) Employment status Unemployed 90 (37.8) 100 (40.0) 190 (38.9) Employed 132 (55.5) 138 (55.2) 270 (55.3) Retired 16 (6.7) 12 (4.8) 28 (5.8) Insurance Private insurance 106 (44.5) 113 (46.9) 219 (45.7) Public insurance 45 (18.9) 33 (13.7) 78 (16.3) No insurance 87 (36.6) 95 (39.4) 182 (38.0) Sex Male 103 (43.1) 90 (36.0) 193 (39.5) Female 136 (56.9) 160 (64.0) 296 (60.5) Age 42.4±12.9 41.6±12.7 42.0±12.8 Race White 179 (74.9) 186 (74.4) 365 (74.6) Black/African American 47 (19.7) 42 (16.8) 89 (18.2) Other 13 (5.4) 22 (8.8) 35 (7.2) Hispanic No 216 (90.4) 221 (88.4) 437 (89.4) Yes 23 (9.6) 29 (11.6) 52 (10.6) CIRS score 5.1±4.2 5.2±4.1 5.2±4.2 Age at first MDE 25.7±14.6 24.4±13.8 25.0±14.2 Length of illness 16.6±13.6 17.3±13.6 17.0±13.6 Family history of MDD No 115 (48.5) 112 (45.7) 227 (47.1) Yes 122 (51.5) 133 (54.3) 255 (52.9) 86 Table 5.7 (cont’d) Suicide history No 226 (95.4) 237 (96.7) 463 (96.1) Yes 11 (4.6) 8 (3.3) 19 (3.9) SF-12 score Physical 46.7±12.4 47.1±12.3 46.9±12.3 Mental 25.8±8.5 26.0±7.7 25.9±8.1 WSAS score 25.2±8.9 25.3±8.6 25.2±8.7 QLESQ score 39.2±15.7 38.4±15.4 38.8±15.5 87 None of these characteristics are significantly different between the two treatment assignment groups. Table 5.8 shows the observed treatment adherence by treatment assignment groups for patients who accepted the medication switching strategy. For those assigned to BUP (𝑍 = 1), 33% did not take the treatment as required (𝐴 = 0) and 67% adhered to the treatment (𝐴 = 1). For those assigned to VEN (𝑍 = 2), 27% did not take the treatment as required (𝐴 = 0) and 73% adhered to the treatment (𝐴 = 2). Again, given the STAR*D design, none of the patients in each treatment assignment group crossed over to obtain the treatment from the other group. Table 5.8 Observed treatment adherence in the medication switching strategy. Treatment Received (A) None BUP VEN (A = 0) (A = 1) (A = 2) BUP (Z = 1) 79 (33) 160 (67) 0 (0) Treatment Assigned (Z) VEN (Z = 2) 67 (27) 0 (0) 183 (73) 5.2.1 Assessment of Assumptions We examine the validity of key assumptions in identifying CACE for the medication switching strategy in this section. For the nonparametric approach, we test whether treatment assignment can affect the outcomes through side effects. Table 5.9 shows the side effects in level 2 in the medication switching strategy. No significant difference was found between those assigned to BUP and those assigned to VEN in terms of maximal frequency ( 𝜒 2 = 11.2, 𝑃 = 0.08), maximal intensity (𝜒 2 = 7.74, 𝑃 = 0.26), or maximal burden of side effects (𝜒 2 = 8.5, 𝑃 = 0.2). Therefore, Assumption I cannot be falsified. Due to the design and criteria of STAR*D, Assumption II will hold directly as no crossover happens. Given Assumption I and II, if Assumption III holds, two 88 principal strata, which are the never-takers and the compliers, are left as shown in Table 5.4. We can then test Assumption III, the no partial-compliers assumption as we did in section 5.1.1. Specifically, we expect that the proportion of those assigned to BUP but did not take it did not differ significantly from the proportion of those assigned to VEN but did not take it and that the proportion of those assigned to BUP and took it did not differ significantly from the proportion of those assigned to VEN and took it. Even though Pr(𝐴 = 0|𝑍 = 1) = 0.33 is not identical to Pr(𝐴 = 0|𝑍 = 2) = 0.27, and Pr(𝐴 = 1|𝑍 = 1) = 0.67 is not identical to Pr(𝐴 = 2|𝑍 = 2) = 0.73 , these differences are not significant ( 𝑧 = 1.51, 𝑃 = 0.13 ). Therefore, we believe that Assumption III is likely satisfied. 89 Table 5.9 Side effects in level 2 in the medication switching strategy. BUP VEN Total Characteristic (N=239) (N=250) (N=489) Maximal frequency of side effects in level 2 No side effects 34 (14.2) 17 (6.8) 51 (10.4) 10% of the time 10 (4.2) 15 (6.0) 25 (5.1) 25% of the time 21 (8.8) 35 (14.0) 56 (11.4) 50% of the time 43 (18.0) 38 (15.2) 81 (16.6) 75% of the time 39 (16.3) 40 (16.0) 79 (16.2) 90% of the time 29 (12.1) 31 (12.4) 60 (12.3) All the time 63 (26.4) 74 (29.6) 137 (28.0) Maximal intensity of side effects in level 2 No side effects 31 (13.0) 18 (7.2) 49 (10.0) Trivial 5 (2.1) 7 (2.8) 12 (2.5) Mild 21 (8.8) 31 (12.4) 52 (10.6) Moderate 38 (15.9) 42 (16.8) 80 (16.4) Marked 61 (25.5) 56 (22.4) 117 (23.9) Severe 46 (19.2) 60 (24.0) 106 (21.7) Intolerable 37 (15.5) 36 (14.4) 73 (14.9) Maximal burden of side effects in level 2 No side effects 34 (14.2) 25 (10.0) 59 (12.1) Minimal impairment 27 (11.3) 16 (6.4) 43 (8.8) Mild impairment 28 (11.7) 39 (15.6) 67 (13.7) Moderate impairment 47 (19.7) 56 (22.4) 103 (21.0) Marked impairment 50 (20.9) 64 (25.6) 114 (23.3) Severe impairment 33 (13.8) 33 (13.2) 66 (13.5) Unable to function 20 (8.4) 17 (6.8) 37 (7.6) 90 We test Assumption V for the multisite design approach using the same set of site-level covariates as in section 5.1.1. The site-level covariate of clinical setting is not significantly associated with the site-specific estimated principal strata proportions 𝜋̂3|𝑘 (𝑡 = −1.08, 𝑃 = 0.29), 𝜋̂4|𝑘 (𝑡 = −0.87, 𝑃 = 0.39), 𝜋̂6|𝑘 (𝑡 = 0.95, 𝑃 = 0.35). Figure 5.3 shows the linear regression of the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , 𝜋̂6|𝑘 on each of the aggregated variables. No significant association has been found for these covariates except the proportion of African Americans ( 𝜋̂3|𝑘 : 𝑡 = 0.69, 𝑃 = 0.5; 𝜋̂4|𝑘 : 𝑡 = 2.1, 𝑃 = 0.04; 𝜋̂6|𝑘 : 𝑡 = −2.0, 𝑃 = 0.05 ), the proportion of family history with MDD ( 𝜋̂3|𝑘 : 𝑡 = 2.92, 𝑃 = 0.006; 𝜋̂4|𝑘 : 𝑡 = −0.19, 𝑃 = 0.85; 𝜋̂6|𝑘 : 𝑡 = −1.37, 𝑃 = 0.18 ), and the mean quality of life ( 𝜋̂3|𝑘 : 𝑡 = −2.62, 𝑃 = 0.01; 𝜋̂4|𝑘 : 𝑡 = −2.68, 𝑃 = 0.01; 𝜋̂6|𝑘 : 𝑡 = 2.9, 𝑃 = 0.006). We further evaluate the association between the site-level 𝐼𝑇𝑇̂ 𝑦|𝑘 of the four endpoints and these three covariates through linear regressions and no significant association has been found for any of the endpoints. Therefore, neither the site-level variable of clinical setting nor the various aggregated variables confound the associations between the estimated site-specific principal strata proportions and the estimated site- level ITT effects. We cannot falsify Assumption V using our data. 91 Figure 5.3 Linear regressions of site-specific estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 , 𝜋̂6|𝑘 on the site-level aggregated covariates in the medication switching strategy. 92 5.2.2 Results In the medication switching strategy, 186 out of 489 patients who were either randomly assigned to BUP or VEN missed their HAM-D17 scores. Out of these patients, 56.5% are due to loss to follow up or withdrawn consent and 20.4% are due to inadequate stay in level 2, as shown in Table 5.10. 39 (21.0%) patients missed their endpoints due to treatment nonadherence. There were 4 (2.1%) patients whose HAM-D17 scores were missing with no reasons reported. Table 5.10 Reasons for missing HAM-D17 scores in the medication switching strategy. BUP VEN Total Reasons for missing HAM-D17 scores (N=94) (N=92) (N=186) Lost to follow up/Withdrew consent 49 (52.1) 56 (60.9) 105 (56.5) Inadequate stay in level 2 22 (23.4) 16 (17.4) 38 (20.4) Treatment nonadherence 20 (21.3) 19 (20.6) 39 (21.0) Other 3 (3.2) 1 (1.1) 4 (2.1) We also conduct the main analysis and three sensitivity analyses for the medication switching strategy. We use covariates that differ significantly between those with and without missing endpoints in the MMRM analysis. Specifically, we use QIDS-C16 scores at the beginning of level 2, education and duration of level 2 treatment for the endpoints of remission and total HAM-D17 scores and QIDS-C16 scores at the beginning of level 2 and duration of level 2 treatment for the endpoints of change of HAM-D17 scores and response. As shown in Figure 5.4, our results provide some clinical implications even though no statistical significance of the estimates of CACE is found on any of the endpoints. The estimated CACEs on remission are all above or almost zero. The total HAM-D17 scores at the end of level 2 for those switching to VEN are no bigger than those switching to BUP given the non-positive estimated CACEs. The estimated CACEs on 93 reduction of HAM-D17 scores are non-negative, suggesting that a greater or equal reduction happens after switching to VEN compared to switching to BUP. We do not show the results on response in Figure 5.3 because the random effects models on response do not converge. Our findings are robust in all analyses with main analysis having the widest confidence intervals and sensitivity analysis 3 the narrowest. 94 Figure 5.4 Estimated complier average causal effects (CACEs) and their 95% confidence intervals for the primary endpoint and secondary endpoints in the medication switching strategy. In each panel, the horizontal solid line represents the null. The dashed vertical line separates the nonparametric estimates on the left-hand side from the multisite design estimates on the right-hand side. The blue square represents main analysis. The green circle represents sensitivity analysis 1. The red triangle represents sensitivity analysis 2. The orange diamond represents sensitivity analysis 3. 95 5.3 Discussion With the observed treatment adherence proportions in each treatment assignment group and the population structure, the population principal strata proportions can be estimated. Specifically, in the medication augmentation strategy, combining Table 5.2 with Table 5.4, 20 to 27 percent of the population are never-takers whereas the rest 73 to 80 percent are compliers for the nonparametric approach. For the multisite design approach in this strategy, with Table 5.2 and Table 5.5, the estimated proportion of partial-1-compliers is 0.27, the estimated proportion of partial-2-compliers is 0.2, and the estimated proportion of compliers is 0.53 ( 0.8 − 0.27 = 0.73 − 0.2 = 0.53). Similarly, in the medication switching strategy, for the nonparametric approach, the estimated proportion of never-takers ranges from 0.27 to 0.33 and the estimated proportion of compliers ranges from 0.67 to 0.73 given Table 5.8 and Table 5.4. For the multisite design in this strategy, the estimated proportions of partial-2-compliers, partial-1-compliers and compliers are 0.33, 0.27 and 0.4 (0.67 − 0.27 = 0.73 − 0.33 = 0.4) respectively given Table 5.8 and Table 5.5. The above proportions are estimated assuming all assumptions required in the nonparametric and multisite design approaches are satisfied. However, as mentioned in section 5.1.1, it is likely that Assumption III, the no partial-compliers assumption, is violated for the nonparametric approach in the medication augmentation strategy. We revisit this problem here. The violation of Assumption III may result from two situations which are shown in Table 5.11 and Table 5.12. In situation 1, the existence of partial-1-compliers violates Assumption III whereas no partial-2-compliers exist. Therefore, given the observed treatment adherence proportions, the estimated principal strata proportions of never-takers, partial-1-compliers and compliers are 0.2, 0.07 and 0.73 respectively. 96 Table 5.11 Population structure when Assumption III (no partial-compliers) is violated for the nonparametric approach in the medication augmentation strategy: situation 1. Z=2 A(2)=0 A(2)=1 A(2)=2 A(1)=0 𝜋̂1 = 0.2 Z=1 A(1)=1 𝜋̂4 = 0.07 𝜋̂6 = 0.73 A(1)=2 Situation 2 has the same population structure as the multisite design approach. Both partial-1- compiers and partial-2-compliers exist in this situation which violates Assumption III. Therefore, the estimated principal strata proportions of partial-2-compliers, partial-1-compliers and compliers are 0.2, 0.27 and 0.53 respectively. Table 5.12 Population structure when Assumption III (no partial-compliers) is violated for the nonparametric approach in the medication augmentation strategy: situation 2. Z=2 A(2)=0 A(2)=1 A(2)=2 A(1)=0 𝜋̂3 = 0.2 Z=1 A(1)=1 𝜋̂4 = 0.27 𝜋̂6 = 0.53 A(1)=2 97 With the estimated proportions of partial-compliers, we can then apply equation (3.9) in chapter 3 to examine the estimators. Specifically, if we assume the ITT effects of 𝑍 on 𝑌 in partial- compliers are the same as in compliers and plug in the estimated proportions, the nonparametric estimators for situation 1 and situation 2 become, 𝐸[𝑌𝑖 (2,𝐴𝑖 (2))– 𝑌𝑖 (1,𝐴𝑖 (1))] Situation 1: = 𝐶𝐴𝐶𝐸 + 0.21 ∗ 𝐶𝐴𝐶𝐸 = 1.21 ∗ 𝐶𝐴𝐶𝐸 𝐸[𝐴𝑖 (2)−𝐴𝑖 (1)] 𝐸[𝑌𝑖 (2,𝐴𝑖 (2))– 𝑌𝑖 (1,𝐴𝑖 (1))] Situation 2: = 𝐶𝐴𝐶𝐸 + 0.52 ∗ 𝐶𝐴𝐶𝐸 = 1.52 ∗ 𝐶𝐴𝐶𝐸. 𝐸[𝐴𝑖 (2)−𝐴𝑖 (1)] It is therefore evident that the nonparametric estimators in both situations overestimate the CACE. Our results also resonate with this evidence. Figure 5.2 shows that our nonparametric estimates for any endpoints are much farther away from the null than the multisite design estimates. Even though the zero correlation assumptions of the multisite design approach are likely satisfied by our data, the estimators are still biased because of the measurement errors in estimating the principal strata proportions. Specifically, since Assumption V, the zero correlation assumption is likely satisfied by our data, the term 𝛑′ 𝐔𝛄 in bias formula (4.7) will not exist. The bias due to measurement errors is then negatively proportional to the true CACE with the proportion less than 1 because it is the ratio of the variance of measurement errors and the variance of the estimated principal strata proportions. With the negative bias, the multisite design approach will therefore underestimate the CACE. Our analyses provide several meaningful implications. Augmenting citalopram with sustained-release bupropion for remission is no better than augmenting citalopram with buspirone, as indicated by the approximately zero estimated CACE. The estimated CACE on the total HAM- D17 score ranges from 0.16 to 2.4 from the main analysis, suggesting that the total HAM-D17 score at the end of level 2 if augmenting citalopram with sustained-release bupropion is 0.16 to 2.4 smaller than that if augmenting citalopram with buspirone. Similarly, augmenting citalopram with 98 sustained-release bupropion reduce the HAM-D17 score 0.58 to 2.1 more than augmenting citalopram with buspirone. The response rate if augmenting citalopram with sustained-release bupropion is significantly higher by 4% to 8% than if augmenting citalopram with buspirone. Switching to extended-release venlafaxine and switching to sustained-release bupropion will have similar effects because none of the analyses for any of the four endpoints show significant difference. Our analyses address the previous criticisms of the STAR*D findings summarized in Pigott (2015). First, corresponding to the first criticism on page 2 of Pigott (2015), instead of using the self-report 16-item Quick Inventory of Depressive Symptomatology (QIDS-SR16) scores that may introduce bias, our primary and secondary endpoints are all based on the HAM-D17 scores which are collected by the blinded research outcomes assessors. Second, Pigott (2015) criticized the exclusion of patients with missing exit HAM-D17 scores or classification of these patients as nonremitters in previous reports (Rush, Trivedi, Wisniewski, Nierenberg, et al., 2006; Rush, Trivedi, Wisniewski, Stewart, et al., 2006; Trivedi et al., 2006). Our main analysis does not exclude any patients and uses the mixed effects models repeated measures approach for the missing endpoints problem. Our main analysis may be criticized by Pigott as in his third argument because the MMRM method implicitly assumes that the missing endpoints and non-missing endpoints follow the same distribution. We address this criticism by adopting three sensitivity analyses which are described in section 5.1.2. Our results from the sensitivity analyses are consistent with the results in the main analysis. 99 CHAPTER 6 DISCUSSION AND FUTURE WORK In this dissertation, we consider treatment nonadherence in randomized controlled trials with two active treatments. We employ the Neyman-Rubin causal framework and the principal stratification framework to explore an alternative causal estimand to the intention to treat (ITT) estimand. With treatment nonadherence, the ITT estimand identifies the causal effect of assignment instead of the causal effect of treatment, which therefore loses clinical relevance. To identify the causal effect of treatment, previous work has largely focused on the placebo-controlled trials. For example, Angrist et al. (1996) identified the complier average causal effects (CACE) under the exclusion restriction assumption and the monotonicity assumption. Yuan et al. (2018) identified the CACE under a multisite design. However, little attention has been placed on randomized controlled trials with two active treatments when treatment nonadherence exists. The identification of the causal estimand becomes complicated in such setting because the population is more diverse than that in the placebo-controlled trial. Therefore, we clarify and define the identification strategies for the CACE. Our proposed nonparametric approach identifies the complier average causal effect as the ratio of the ITT effect of treatment assignment 𝑍 on the outcome 𝑌 to the ITT effect of treatment assignment 𝑍 on the treatment received 𝐴 under the exclusion restriction, monotonicity, and no partial-compliers assumptions. We also derive the bias formula and evaluate the performance of the estimator if one of the aforementioned assumptions is violated. Our simulations show that the nonparametric approach can yield a slightly biased estimator for CACE. However, when sample size is 500 or above, the nonparametric estimator becomes unbiased when the percentage of 100 compliers is above or equal to 70%. In addition, increasing the number of compliers has the potential to reduce the bias to as close as zero when there is deviation from the assumptions. Our second approach identifies the complier average causal effect via a multisite design under the zero correlation assumption. Under this assumption our simulations show that this approach can yield an unbiased estimator if there are no measurement errors. We derive the bias formula for the situations where measurement errors exist and the zero-correlation assumption is violated. We find that increasing the number of people in each site can reduce the bias because it reduces the variation of the measurement errors. Increasing the number of sites, on the other hand, does not make a significant impact on the bias. Our study can provide several meaningful insights. First, we identify the causal estimand for the causal effects of treatment, which is clinically relevant and meaningful. Second, we explicitly clarify the underlying assumptions needed to identify our causal estimand of interest. This can help researchers better understand and interpret their results. Third, our derived bias formulas allow researchers to discover potential factors that can result in the bias. Besides, researchers can also improve their study design before conducting any studies based on the bias formulas. Fourth, our results from the nonparametric approach provide evidence for researchers to take measures and increase the adherence rates of participants in a sample. Fifth, our results from the multisite design approach provide evidence for researchers to increase enrollment in each site of the studies. Sixth, our approaches can be extended to any studies that involve a valid instrumental variable and unmeasured confounding and all the above implications can be applied. Although we make suggestions on increasing the number of compliers based on the nonparametric approach, one limitation of our study is that we cannot identify the complier population in practice. However, we can model the compliance status from observed covariates 101 and predict an individual’s compliance status instead (Roy et al., 2008). Next, one consequence of the non-identifiability of the principal strata is that the measurement errors in estimating the principal strata proportions in our multisite design approach are unavoidable. Therefore, one direction of the future work can be to address the bias issue resulting from the measurement errors. For example, we can substitute a random effects model for the multiple linear regression model used in equation (4.2) to take into account the variations and reduce the bias due to measurement errors (Reardon et al., 2014; Bloom et al., 2017). Another limitation of our study is that the zero correlation assumption used in the multisite design approach is a rather strong assumption and cannot be verified empirically. Future work can be done to incorporate covariates into this assumption and relax it to be a conditional assumption (Yuan et al., 2018). Both approaches in our study make point-identification of the complier average causal effects. To justify the findings, we can also extend our work by introducing bounds for our estimators in the future. Finally, we can extend our study setting to three active treatments, four active treatments, etc. and make pairwise comparisons. For example, in a RCT with three active treatments, the population can be divided into 4 × 4 = 16 principal strata based on the potential treatment received. We can then identify the causal estimand from any pair of the three treatments by making additional assumptions. 102 APPENDICES 103 APPENDIX A Results of the Binary Outcomes for the Nonparametric Approach Figure A.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the nonparametric estimator across proportions of compliers for the binary outcome when all assumptions are satisfied (scenario A). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 104 Figure A.2 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of compliers for the binary outcome when Assumption I, the exclusion restriction assumption, is violated (scenario B). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 105 Figure A.3 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of irrationalists for the binary outcome when Assumption II.a, the no irrationalists assumption, is violated (scenario C). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 106 Figure A.4 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of flip- floppers for the binary outcome when Assumption II.b, the no flip-floppers assumption, is violated (scenario D). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 107 Figure A.5 Sensitivity (upper left: bias; upper right: percent bias; lower left: mean standard error; lower right: root mean squared error) of the nonparametric estimator across proportions of partial- compliers for the continuous outcome when Assumption III, the no partial-compliers assumption, is violated (scenario E). The blue square represents sample size 𝑛 = 500. The red circle represents sample size 𝑛 = 1000. The green triangle represents sample size 𝑛 = 2000. 108 APPENDIX B Results of the Binary Outcomes for the Multisite Design Approach Figure B.1 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the nine scenarios based on number of sites 𝑠 and site sizes 𝑛 for the binary outcome when site size is fixed. For example, label “s=50, n=25” indicates that the scenario has 50 sites and 25 individuals in each site. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. 109 Figure B.2 Performance (top left: bias; top right: percent bias; middle left: mean standard error; middle right: root mean squared error; bottom left: standard error ratio; bottom right: coverage) of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites 𝑠 for the binary outcome when site size is varied. The blue square represents the oracle estimator which uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The red circle represents the naïve estimator which uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The green triangle represents the bootstrap estimator which is the average of the naïve estimates after resampling the site-level quantities. 110 Table B.1 Sensitivity of the multisite design estimator across the nine scenarios based on number of sites and site sizes for the binary outcome when site size is fixed if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. Oracle Naïve Bootstrap # of Site Measure Sites Size C1 C2 C3 C1 C2 C3 C1 C2 C3 25 0.13 -8.1 13.2 -1.4 -19.3 -0.64 -1.4 -19.3 -0.60 50 50 0.14 -8.1 11.9 -0.90 -16.6 1.8 -0.87 -16.6 1.9 100 0.01 -8.1 11.0 -0.62 -14.0 4.2 -0.60 -14.0 4.2 Bias 25 0.04 -7.9 13.5 -1.5 -19.1 -0.38 -1.5 -19.1 -0.35 (× 100) 100 50 0.00 -8.1 12.0 -0.94 -16.7 1.8 -0.94 -16.7 1.8 100 0.01 -8.2 10.9 -0.64 -14.2 4.1 -0.63 -14.2 4.1 25 -0.01 -8.0 13.3 -1.55 -19.1 -0.51 -1.5 -19.1 -0.49 200 50 0.00 -8.0 12.0 -0.97 -16.7 1.8 -0.97 -16.7 1.8 100 0.00 -8.2 10.8 -0.65 -14.2 4.0 -0.65 -14.2 4.0 25 0.42 -23.6 66.2 -4.7 -56.3 -3.2 -4.5 -56.2 -3.0 50 50 0.47 -23.4 60.0 -3.0 -48.3 9.2 -2.9 -48.2 9.3 100 0.05 -23.4 55.2 -2.0 -40.6 21.0 -2.0 -40.6 21.1 Percent 25 0.14 -23.1 67.8 -4.9 -55.8 -1.9 -4.9 -55.7 -1.8 Bias 100 50 -0.01 -23.4 60.5 -3.1 -48.4 9.1 -3.1 -48.4 9.1 (%) 100 0.05 -23.8 55.0 -2.1 -41.1 20.5 -2.1 -41.1 20.5 25 -0.04 -23.2 67.0 -5.1 -55.8 -2.5 -5.1 -55.7 -2.5 200 50 0.00 -23.3 60.2 -3.2 -48.6 8.8 -3.2 -48.5 8.8 100 -0.01 -23.9 54.5 -2.2 -41.2 20.2 -2.2 -41.2 20.2 25 0.09 0.21 0.21 0.07 0.15 0.15 0.07 0.15 0.15 50 50 0.07 0.19 0.17 0.06 0.15 0.14 0.06 0.15 0.14 100 0.06 0.17 0.14 0.05 0.14 0.12 0.05 0.14 0.12 Standard 25 0.06 0.15 0.15 0.05 0.11 0.11 0.05 0.10 0.10 Error 100 50 0.05 0.13 0.12 0.04 0.10 0.09 0.04 0.10 0.09 100 0.04 0.12 0.10 0.04 0.10 0.08 0.04 0.10 0.08 25 0.04 0.10 0.10 0.03 0.07 0.07 0.03 0.07 0.07 200 50 0.04 0.09 0.08 0.03 0.07 0.07 0.03 0.07 0.07 100 0.03 0.08 0.07 0.02 0.06 0.06 0.02 0.07 0.06 111 Table B.1 (cont’d) 25 0.09 0.23 0.25 0.07 0.25 0.15 0.07 0.25 0.15 50 50 0.07 0.2 0.21 0.06 0.22 0.14 0.06 0.22 0.14 Root 100 0.06 0.19 0.18 0.05 0.20 0.12 0.05 0.20 0.13 Mean 25 0.06 0.17 0.20 0.05 0.22 0.11 0.05 0.22 0.11 Square 100 50 0.05 0.15 0.17 0.04 0.20 0.10 0.04 0.20 0.10 Error 100 0.04 0.14 0.15 0.04 0.17 0.09 0.04 0.17 0.10 25 0.04 0.13 0.17 0.04 0.20 0.07 0.04 0.20 0.07 200 50 0.04 0.12 0.15 0.03 0.18 0.07 0.03 0.18 0.07 100 0.03 0.12 0.13 0.03 0.16 0.07 0.03 0.16 0.07 112 Table B.2 Sensitivity of the multisite design estimator across the twelve scenarios based on types of weights (UN: unweighted; SS: weight by site size; WV: weight by within-site variation; COMP: weight by number of compliers) and number of sites for the binary outcome when site size is varied if Assumption V, the zero correlation assumption, is violated. The oracle estimator uses the true principal strata proportions π3|𝑘 , π4|𝑘 and π6|𝑘 to fit the multiple linear regression. The naïve estimator uses the estimated principal strata proportions 𝜋̂3|𝑘 , 𝜋̂4|𝑘 and 𝜋̂6|𝑘 to fit the multiple linear regression. The bootstrap estimator is the average of the naïve estimates after resampling the site-level quantities. For each of these three estimators, we consider three cases. C1 is the reference case with 𝜆 = 𝛾 = 0 indicating that Assumption V is satisfied. C2 refers to the case when 𝜆 = 2 and 𝛾 = 1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = 1. C3 refers to the case when 𝜆 = 2 and 𝛾 = −1 indicating that Assumption V is violated and the coefficient of the unobserved confounding is 𝛾 = −1. Type Oracle Naïve Bootstrap of # of Measure Weight sites C1 C2 C3 C1 C2 C3 C1 C2 C3 50 -0.03 -8.0 12.1 -1.1 -17.2 1.3 -1.1 -17.2 1.4 UN 100 0.06 -8.0 12.2 -0.97 -17.0 1.6 -0.96 -17.0 1.6 200 0.04 -8.03 12.2 -1.1 -17.1 1.4 -1.0 -17.1 1.5 50 0.01 -8.0 11.9 -0.99 -16.8 1.7 -0.97 -16.8 1.7 SS 100 0.05 -8.0 11.9 -0.92 -16.7 1.9 -0.91 -16.7 1.9 Bias 200 0.04 -8.0 12.0 -0.98 -16.7 1.8 -0.97 -16.7 1.8 (× 100) 50 -0.03 -9.8 9.3 -3.9 -18.5 -0.59 -3.6 -18.4 -0.29 WV 100 -0.03 -9.9 9.0 -3.9 -18.4 -0.57 -3.8 -18.3 -0.37 200 -0.03 -10.0 8.8 -4.1 -18.4 -0.86 -4.0 -18.4 -0.73 50 0.02 -7.9 11.9 -0.82 -16.8 1.5 -0.81 -16.9 1.5 COMP 100 0.06 -8.0 11.9 -0.75 -16.7 1.8 -0.74 -16.7 1.8 200 0.03 -8.0 12.0 -0.81 -16.8 1.7 -0.81 -16.8 1.7 50 -0.09 -23.2 60.7 -3.6 -49.8 6.8 -3.5 -49.7 6.9 UN 100 0.20 -23.2 61.3 -3.2 -49.4 8.0 -3.2 -49.3 8.0 200 0.13 -23.3 61.5 -3.5 -49.7 7.2 -3.5 -49.7 7.3 50 0.02 -23.3 59.9 -3.3 -48.8 8.5 -3.2 -48.8 8.5 SS 100 0.18 -23.4 60.0 -3.1 -48.3 9.4 -3.0 -48.4 9.4 Percent 200 0.15 -23.3 60.5 -3.2 -48.5 9.1 -3.2 -48.5 9.1 Bias 50 -10.0 -28.5 46.9 -13.0 -53.7 -3.0 -12.0 -53.3 -1.4 (%) WV 100 -10.1 -28.7 45.3 -13.0 -53.4 -2.9 -12.5 -53.2 -1.9 200 -10.6 -28.9 44.5 -13.6 -53.5 -4.3 -13.3 -53.4 -3.7 50 0.06 -22.9 60.0 -2.7 -48.8 7.8 -2.7 -48.9 7.7 COMP 100 0.21 -23.3 60.0 -2.5 -48.6 8.9 -2.5 -48.6 8.9 200 0.10 -23.3 60.2 -2.7 -48.7 8.5 -2.7 -48.7 8.5 113 Table B.2 (cont’d) 50 0.07 0.19 0.18 0.06 0.15 0.14 0.06 0.15 0.14 UN 100 0.05 0.13 0.12 0.04 0.10 0.10 0.04 0.10 0.10 200 0.04 0.09 0.09 0.03 0.07 0.07 0.03 0.07 0.07 50 0.07 0.19 0.17 0.06 0.15 0.14 0.06 0.15 0.14 SS 100 0.05 0.13 0.12 0.04 0.10 0.09 0.04 0.10 0.09 Standard 200 0.04 0.09 0.08 0.03 0.07 0.07 0.03 0.07 0.07 Error 50 0.07 0.19 0.17 0.06 0.15 0.13 0.06 0.16 0.14 WV 100 0.05 0.13 0.12 0.04 0.10 0.09 0.04 0.12 0.10 200 0.04 0.09 0.08 0.03 0.07 0.06 0.03 0.08 0.07 50 0.07 0.20 0.18 0.06 0.15 0.14 0.06 0.15 0.14 COMP 100 0.05 0.14 0.12 0.04 0.10 0.10 0.04 0.11 0.10 200 0.03 0.10 0.09 0.03 0.07 0.07 0.03 0.07 0.07 50 0.08 0.21 0.22 0.06 0.23 0.15 0.06 0.23 0.15 UN 100 0.05 0.16 0.18 0.04 0.20 0.10 0.04 0.20 0.10 200 0.04 0.12 0.15 0.03 0.19 0.07 0.03 0.19 0.07 50 0.07 0.21 0.21 0.06 0.22 0.14 0.06 0.22 0.14 Root SS 100 0.05 0.15 0.17 0.04 0.20 0.10 0.04 0.20 0.10 Mean 200 0.04 0.12 0.15 0.03 0.18 0.07 0.03 0.18 0.07 Squared 50 0.09 0.25 0.21 0.08 0.25 0.14 0.07 0.25 0.14 Error WV 100 0.07 0.19 0.16 0.06 0.22 0.10 0.06 0.22 0.10 200 0.05 0.15 0.13 0.05 0.20 0.07 0.05 0.20 0.07 50 0.08 0.22 0.22 0.06 0.23 0.14 0.06 0.23 0.14 COMP 100 0.05 0.16 0.17 0.04 0.20 0.10 0.04 0.20 0.10 200 0.04 0.13 0.15 0.03 0.18 0.07 0.03 0.18 0.07 114 BIBLIOGRAPHY 115 BIBLIOGRAPHY Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. 13. Balke, A., & Pearl, J. (1994). Counterfactual Probabilities: Computational Methods, Bounds and Applications. In Uncertainty Proceedings 1994 (pp. 46–54). Elsevier. Balke, A., & Pearl, J. (1997). Bounds on Treatment Effects from Studies with Imperfect Compliance. Journal of the American Statistical Association, 92(439), 1171–1176. Bech, P., Fava, M., Trivedi, M. H., Wisniewski, S. R., & Rush, A. J. (2012). Outcomes on the pharmacopsychometric triangle in bupropion-SR vs. buspirone augmentation of citalopram in the STAR*D trial: Outcomes on the pharmacopsychometric triangle. Acta Psychiatrica Scandinavica, 125(4), 342–348. Bloom, H. S., Raudenbush, S. W., Weiss, M. J., & Porter, K. (2017). Using Multisite Experiments to Study Cross-Site Variation in Treatment Effects: A Hybrid Approach With Fixed Intercepts and a Random Treatment Coefficient. Journal of Research on Educational Effectiveness, 10(4), 817–842. Fava, M., Rush, A. J., Trivedi, M. H., Nierenberg, A. A., Thase, M. E., Sackeim, H. A., Quitkin, F. M., Wisniewski, S., Lavori, P. W., Rosenbaum, J. F., Kupfer, D. J., & for the STAR∗D Investigators Group. (2003). Background and rationale for the Sequenced Treatment Alternatives to Relieve Depression (STAR∗D) study. Psychiatric Clinics of North America, 26(2), 457–494. Frangakis, C. E. (2002). Clustered Encouragement Designs with Individual Noncompliance: Bayesian Inference with Randomization, and Application to Advance Directive Forms. Biostatistics, 3(2), 147–164. Frangakis, C. E., & Rubin, D. B. (2002). Principal Stratification in Causal Inference. Biometrics, 58(1), 21–29. Frank, E., Karp, J. F., & Rush, A. J. (1996). Efficacy of treatments for major depression. Psychopharmacology Bulletin, 29(4), 457-475. Gelenberg, A. J., Freeman, M. P., Markowitz, J. C., Rosenbaum, J. F., Thase, M. E., Trivedi, M. H., Rhoads, R. S. V., Reus, V. I., DePaulo, J. R., Fawcett, J. A., Schneck, C. D., & Silbersweig, D. A. (2010). WORK GROUP ON MAJOR DEPRESSIVE DISORDER. 152. Greenberg, P. E., Fournier, A.-A., Sisitsky, T., Pike, C. T., & Kessler, R. C. (2015). The Economic Burden of Adults With Major Depressive Disorder in the United States (2005 and 2010). The Journal of Clinical Psychiatry, 76(02), 155–162. 116 Hernán, M. A., & Robins, J. M. (2019). Causal Inference. Hirano, K., Imbens, G. W., Rubin, D. B., & Zhou, X.-H. (2000). Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics, 1(1), 69–88. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. Imbens, G. W., & Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica, 62, 467–475. Imbens, G. W., & Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6), 1845-1857. Imbens, G. W., & Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics, 25(1), 305–327. https://doi.org/10.1214/aos/1034276631 Imbens, G. W., & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press. ICH E9 working group. (2020). ICH E9 (R1): addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. James, S. L., Abate, D., Abate, K. H., Abay, S. M., Abbafati, C., Abbasi, N., Abbastabar, H., Abd- Allah, F., Abdela, J., Abdelalim, A., Abdollahpour, I., Abdulkader, R. S., Abebe, Z., Abera, S. F., Abil, O. Z., Abraha, H. N., Abu-Raddad, L. J., Abu-Rmeileh, N. M. E., Accrombessi, M. M. K., … Murray, C. J. L. (2018). Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. The Lancet, 392(10159), 1789–1858. Jiang, Z., Ding, P., & Geng, Z. (2016). Principal causal effect identification and surrogate end point evaluation by multiple trials. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(4), 829–848. Jo, B. (2002). Estimation of Intervention Effects with Noncompliance: Alternative Model Specifications. Journal of Educational and Behavioral Statistics, 27(4), 385–409. Little, R. J., Long, Q., & Lin, X. (2009). A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance. Biometrics, 65(2), 640–649. Little, R. J., Long, Q., & Lin, X. (2009). A Comparison of Methods for Estimating the Causal Effect of a Treatment in Randomized Clinical Trials Subject to Noncompliance. Biometrics, 65(2), 640–649. 117 Long, Q., Little, R. J. A., & Lin, X. (2010). Estimating causal effects in trials involving multitreatment arms subject to non-compliance: A Bayesian framework. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(3), 513–531. Mehrotra, D. V. (2019). A note on the draft International Council for Harmonisation guidance on estimands and sensitivity analysis. Clinical Trials, 16(4), 339–344. Mehrotra, D. V., Liu, F., & Permutt, T. (2017). Missing data in clinical trials: Control-based mean imputation and sensitivity analysis. Pharmaceutical Statistics, 16(5), 378–392. Miratrix, L., Furey, J., Feller, A., Grindal, T., & Page, L. C. (2018). Bounding, An Accessible Method for Estimating Principal Causal Effects, Examined and Explained. Journal of Research on Educational Effectiveness, 11(1), 133–162. National Research Council. (2010). The Prevention and Treatment of Missing Data in Clinical Trials. National Academies Press. Pearl, J. (2011). Principal Stratification—A Goal or a Tool? The International Journal of Biostatistics, 7(1), 1–13. Pigott, H. E. (2015). The STAR*D Trial: It is Time to Reexamine the Clinical Beliefs That Guide the Treatment of Major Depression. The Canadian Journal of Psychiatry, 60(1), 9–13. Raudenbush, S. W., & Bloom, H. S. (2015). Learning About and From a Distribution of Program Impacts Using Multisite Trials. American Journal of Evaluation, 36(4), 475–499. Reardon, S. F., & Raudenbush, S. W. (2013). Under What Assumptions Do Site-by-Treatment Instruments Identify Average Causal Effects? Sociological Methods & Research, 42(2), 143–163. Reardon, S. F., Unlu, F., Zhu, P., & Bloom, H. S. (2014). Bias and Bias Correction in Multisite Instrumental Variables Analysis of Heterogeneous Mediator Effects. Journal of Educational and Behavioral Statistics, 39(1), 53–86. Roy, J., Hogan, J. W., & Marcus, B. H. (2008). Principal stratification with predictors of compliance for randomized trials with 2 active treatments. Biostatistics, 9(2), 277–289. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. Rubin, D. B. (2005). Causal Inference Using Potential Outcomes: Design, Modeling, Decisions. Journal of the American Statistical Association, 100(469), 322–331. Rush, A. J., Fava, M., Wisniewski, S. R., Lavori, P. W., Trivedi, M. H., Sackeim, H. A., Thase, M. E., Nierenberg, A. A., Quitkin, F. M., Kashner, T. M., Kupfer, D. J., Rosenbaum, J. F., Alpert, J., Stewart, J. W., McGrath, P. J., Biggs, M. M., Shores-Wilson, K., Lebowitz, B. 118 D., Ritz, L., … for the STAR*D Investigators Group. (2004). Sequenced treatment alternatives to relieve depression (STAR*D): Rationale and design. Controlled Clinical Trials, 25(1), 119–142. Rush, A. J., Trivedi, M. H., Wisniewski, S. R., Nierenberg, A. A., Stewart, J. W., Warden, D., Niederehe, G., Thase, M. E., Lavori, P. W., Lebowitz, B. D., McGrath, P. J., Rosenbaum, J. F., & Sackeim, H. A. (2006). Acute and Longer-Term Outcomes in Depressed Outpatients Requiring One or Several Treatment Steps: A STAR*D Report. Am J Psychiatry, 13. Rush, A. J., Trivedi, M. H., Wisniewski, S. R., Stewart, J. W., Nierenberg, A. A., Thase, M. E., Ritz, L., Biggs, M. M., Warden, D., Luther, J. F., Shores-Wilson, K., Niederehe, G., & Fava, M. (2006). Bupropion-SR, Sertraline, or Venlafaxine-XR after Failure of SSRIs for Depression. New England Journal of Medicine, 354(12), 1231–1242. Shrier, I., Steele, R. J., Verhagen, E., Herbert, R., Riddell, C. A., & Kaufman, J. S. (2014). Beyond intention to treat: What is the right question? Clinical Trials: Journal of the Society for Clinical Trials, 11(1), 28–37. Sinyor, M., Schaffer, A., & Levitt, A. (2010). The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) Trial: A Review. The Canadian Journal of Psychiatry, 55(3), 126– 135. Splawa-Neyman, J., Dabrowska, D. M., & Speed, T. P. (1990). On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. Statistical Science, 5(4), 465–472. Swanson, S. A., Hernán, M. A., Miller, M., Robins, J. M., & Richardson, T. S. (2018). Partial Identification of the Average Treatment Effect Using Instrumental Variables: Review of Methods for Binary Instruments, Treatments, and Outcomes. Journal of the American Statistical Association, 113(522), 933–947. Trivedi, M. H., Thase, M. E., Ritz, L., Biggs, M. M., & Rush, A. J. (2006). Medication Augmentation after the Failure of SSRIs for Depression. The New England Journal of Medicine, 10. VanderWeele, T. J. (2011). Principal Stratification—Uses and Limitations. The International Journal of Biostatistics, 7(1), 1–14. World Health Organization. (2017). Depression and other common mental disorders: global health estimates (No. WHO/MSD/MER/2017.2). World Health Organization. Yuan, L.-H., Feller, A., & Miratrix, L. W. (2018). Identifying and Estimating Principal Causal Effects in Multi-site Trials. ArXiv:1803.06048 [Stat]. 119