MULTI-MARKER GENETIC ASSOCIATION AND INTERACTION TESTS FOR SURVIVAL OUTCOMES By Di Wu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Biostatistics – Doctor of Philosophy 2022 ABSTRACT MULTI-MARKER GENETIC ASSOCIATION AND INTERACTION TESTS FOR SURVIVAL OUTCOMES By Di Wu With advancements in high-throughput technologies, studies have been conducted to investigate the role of massive genetic variants in human diseases. While multi-marker tests have been developed for binary and continuous disease outcomes, there are few such tests available for time-to-event outcomes. The existing tests have various drawbacks, including slow computation speed, being conservative in small samples, incapability of dealing with confounding, etc. To facilitate the genetic association and interaction analyses of time- to-event outcomes, we develop four suites of novel multi-marker survival tests for genetic association and interaction. The new tests address all the drawbacks of the existing tests. Furthermore, they can account for potential genetic heterogeneity to enhance power and deal with left truncation of survival data. Some of the new tests can handle competing risks, and some apply to interval-censored data. Simulation studies show that the new tests perform very well in finite samples of various sizes. When the genetic effect is heterogeneous across individuals/subpopulations, the new association tests considering genetic heterogeneity are more powerful than the existing tests, which do not account for genetic heterogeneity. Using the new methods, we performed genome-wide association analyses of 1) age to Alzheimer’s disease data from the Religious Orders Study and the Rush Memory and Aging Project (ROSMAP) and 2) age to early childhood caries data from a dbGaP study, Dental Caries: Whole Genome Association and Gene x Environment Studies. ACKNOWLEDGMENTS I am sincerely grateful to my dissertation advisors, Dr. Chenxi Li and Dr. Qing Lu for their valuable guidance and inspiring comments. Without their supervision and support, it would be impossible for me to make great progress in my academic research and complete the dissertation. Also, I want to express my appreciation to my committee members, Dr. Yuehua Cui and Dr. Honglei Chen for their precious time and comments to improve this dissertation. iii TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii CHAPTER 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Introduction and literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Organization of this dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 CHAPTER 2 Multi-marker genetic association and interaction tests for survival out- comes based on weighted V statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Association tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.2 G-G/G-E interaction test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2.1 Testing genetic association in the absence of genetic heterogeneity . . . . 14 2.2.2 Testing genetic association in the presence of genetic heterogeneity . . . 17 2.2.3 Testing G-G/G-E interaction effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.2.4 Empirical size under stringent p-value thresholds . . . . . . . . . . . . . . . . . 25 2.3 A Real Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 CHAPTER 3 Multi-marker genetic association and interaction tests with interval-censored survival outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.1.1 Association test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.1.2 G-G/G-E interaction test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2.1 Testing genetic association in the absence of genetic heterogeneity . . . . 39 3.2.2 Testing genetic association in the presence of genetic heterogeneity . . . 41 3.2.3 Testing G-G/G-E interaction effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.2.4 Empirical size under stringent p-value thresholds . . . . . . . . . . . . . . . . . 48 3.3 A Real Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 CHAPTER 4 Multi-marker genetic association and interaction tests based on the ac- celerated failure time model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.1.1 Association tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.1.2 G-G/G-E interaction test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.1.3 Small sample adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 iv 4.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.1 Association test in the absence of genetic heterogeneity . . . . . . . . . . . . 59 4.2.2 Association test in the presence of genetic heterogeneity . . . . . . . . . . . 61 4.2.3 Testing G-G/G-E interaction effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.4 Empirical size under stringent p-value thresholds . . . . . . . . . . . . . . . . . 70 4.2.5 Small-sample correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.3 A Real Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 CHAPTER 5 Multi-marker genetic association and interaction tests based on the ad- ditive hazards model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.1.1 Association test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.1.2 G-G/G-E interaction test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.1.3 Small sample adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5.2.1 Association test in the absence of genetic heterogeneity . . . . . . . . . . . . 85 5.2.2 Association test in the presence of genetic heterogeneity . . . . . . . . . . . 87 5.2.3 Testing G-G/G-E interaction effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.2.4 Empirical size under stringent p-value thresholds . . . . . . . . . . . . . . . . . 94 5.2.5 Small-sample correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.3 A Real Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .101 CHAPTER 6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .104 6.1 Remarks regarding selection of survival model . . . . . . . . . . . . . . . . . . . . . . . .104 6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .106 APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .107 APPENDIX A: Proofs for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .108 APPENDIX B: Proofs for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .111 APPENDIX C: Proofs for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .116 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .121 v LIST OF TABLES Table 2.1 Computation time (seconds) of WV, coxKM, gt and coxSKATs under the null hypothesis of no genetic association and p = 5. . . . . . . . . . . . . . . . . . . 15 Table 2.2 Empirical size and power comparison between WV and coxKM in testing association of a SNP set with right-censored survival time in the absence of genetic heterogeneity, adjusting for covariates. . . . . . . . . . . . . . . . . . . . . . . 16 Table 2.3 Empirical size and power comparison between HWV and coxKM in the presence of genetic heterogeneity across four observable sub-populations, adjusting for covariates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Table 2.4 Empirical size comparison between HWV and coxKM in testing genetic association in the presence of genetic heterogeneity across two latent sub- populations, adjusting for covariates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Table 2.5 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across two latent sub-populations, adjusting for baseline co- variates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Table 2.6 Empirical size comparison between HWV and coxKM in the presence of genetic heterogeneity across 20 latent sub-populations, adjusting for covari- ates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Table 2.7 Empirical size comparison of HWV and coxKM considering genetic hetero- geneity due to genomic profile, adjusting for covariates. . . . . . . . . . . . . . . . 23 Table 2.8 Empirical sizes and powers of WVI and LRT in testing G-G interaction without considering left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Table 2.9 Empirical sizes and powers of HWV and LRT in testing G-E interaction without considering left truncation (p = 10, q = 2, n = 1000). . . . . . . . . . . . 26 Table 2.10 Empirical size for WV and HWV that considers heterogeneity across indi- vidual genome profiles and adjusts for covariates, under several stringent p-value thresholds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Table 2.11 Top five significant genes discovered by WV/HWV from a genome-wide association analysis of ROSMAP dataset by assuming age-to-onset of AD follows Cox PH model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 vi Table 3.1 Empirical size and power of WV-IC with r = 0 in testing genetic effects in the absence of left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Table 3.2 Empirical size and power of WV-IC with r = 0 in testing genetic effects under linear confounding and no left truncation. . . . . . . . . . . . . . . . . . . . . . 41 Table 3.3 Comparison of performance of WV-IC and HWV-IC (r = 0) in testing genetic association under genetic heterogeneity across four observable sub- populations and no left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Table 3.4 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across two latent subpopulations and no left truncation. . . . . 44 Table 3.5 Power of HWV-IC with r = 0 in testing genetic effects under genetic het- erogeneity across two latent subpopulations and no left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k , including the same effect size and the same effect direction (T1), identical sizes but opposite directions (T2), no effect in one subpopulation while positive effect in the other (T3), different sizes and opposite directions (T4), and different sizes but the same direction (T5). . . . . . . . . . . . . . . . . . 44 Table 3.6 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across twenty latent subpopulations and no left truncation. . . 45 Table 3.7 Power of HWV-IC with r = 0 under genetic heterogeneity across twenty latent subpopulations and no left truncation. . . . . . . . . . . . . . . . . . . . . . . . 45 Table 3.8 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across individual genome profiles and no left truncation. . . . . 46 Table 3.9 Power of HWV-IC with r = 0 under genetic heterogeneity across individual genome profiles and no left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Table 3.10 Empirical sizes and powers of WVI-IC and LRT in testing G-G interaction under no left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Table 3.11 Empirical sizes and powers of WVI-IC and LRT in detecting G-E interaction under no left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Table 3.12 Empirical sizes of WV-IC and HWV-IC under stringent p-value thresholds and no left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 vii Table 3.13 Top five genes discovered by WV-IC and HWV-IC from a gene-based genome- wide association analysis of the DC-WGAGE dataset. PH and PO stand for the Cox proportional hazard model and the proportional odds model respectively. CP and IBS stand for cross-product kernel and IBS kernel respectively. Various types of heterogeneity were considered, including no genetic heterogeneity (S1), heterogeneity across races (S2), heterogeneity be- tween the two home water fluoride levels (S3), heterogeneity between sexes (S4), and heterogeneity across genetic backgrounds (S5). . . . . . . . . . . . . . . . 52 Table 4.1 Empirical size and power comparison of VZ and VZH in testing genetic effects under covariate adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . . 60 Table 4.2 Empirical size and power of VZ in testing genetic effects under quadratic confounding and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Table 4.3 Empirical sizes of VZH and VZ in testing genetic association under genetic heterogeneity across two observable subpopulations and left truncation. . . . 63 Table 4.4 Powers of VZH and VZ in testing genetic association under genetic hetero- geneity across two observable subpopulations and left truncation. . . . . . . . . 64 Table 4.5 Empirical sizes of VZ and VZH in testing genetic effects under genetic het- erogeneity across two latent subpopulations, covariate adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Table 4.6 Powers of VZ and VZH in testing genetic effects under genetic heterogeneity across two latent subpopulations, covariate adjustment and left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k , including the same effect size and the same effect direction (T1), identical sizes but opposite directions (T2), no effect in one subpopulation while positive effect in the other (T3), and different sizes but the same direction (T4). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Table 4.7 Empirical sizes of VZ and VZH in testing genetic effects under genetic hetero- geneity across twenty latent subpopulations, covariate adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Table 4.8 Power comparison of VZH and VZ under genetic heterogeneity across twenty latent subpopulations, covariate adjustment and left truncation. . . . . . . . . . 67 Table 4.9 Empirical sizes of VZH and VZ in testing genetic effects under genetic het- erogeneity across individual genome profiles, covariate adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 viii Table 4.10 Powers of VZH and VZ under genetic heterogeneity across individual genome profiles, covariate adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . 68 Table 4.11 Empirical sizes and powers of VI and the Wald test in testing G-G interaction under left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Table 4.12 Empirical sizes and powers of VI and the Wald test in testing G-E interaction under left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Table 4.13 Empirical sizes of VZ and VZH under stringent p-value thresholds and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Table 4.14 Empirical size and power of test VZ and VZc considering left truncation, with n = 100, p = 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Table 4.15 Empirical size and power of test VZH and VZH,c considering genetic het- erogeneity across two observable subpopulations and left truncation, with n = 200, p = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Table 4.16 Empirical size and power of test VI and VIc in testing G-G interaction effect, considering left truncation, with n = 600, p = q = 2. . . . . . . . . . . . . . . . . . . 73 Table 4.17 Empirical size and power of test VI and VIc in testing G-E interaction effect, considering left truncation, with n = 500, p = 2. . . . . . . . . . . . . . . . . . . . . . 73 Table 4.18 Top five genes discovered by VZ and VZH,c from ROSMAP dataset by as- suming the age-to-onset of Alzheimer’s disease follows AFT model. IBS, CP, Lap and Quad stand for IBS, cross-product, Laplacian and Quadratic kernel, respectively, that measures genetic similarity. Various types of het- erogeneity were considered, including no genetic heterogeneity (S1), hetero- geneity between sexes (S2), heterogeneity across education categories (S3), and heterogeneity across genetic backgrounds (S4). . . . . . . . . . . . . . . . . . . . 77 Table 4.19 Function of IGSF23, the third most frequent top gene, according to Entrez. 77 Table 5.1 Empirical size and power comparison of test VZ and test VZH in testing genetic effects considering adjustment covariates and left truncation. . . . . . . 86 Table 5.2 Empirical size and power of test VZ in testing genetic effects under quadratic confounding and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Table 5.3 Empirical sizes of test VZ and test VZH in testing genetic association un- der genetic heterogeneity across two observable subpopulations, considering adjustment covariates and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . 88 ix Table 5.4 Power of test VZ and test VZH in testing genetic association under genetic heterogeneity across two observable subpopulations, considering adjustment covariates and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Table 5.5 Empirical size of VZ and VZH in testing genetic effects under genetic hetero- geneity across two latent subpopulations, with covariates adjustment and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Table 5.6 Power of test VZ and VZH in testing genetic effects under genetic heterogene- ity across two latent subpopulations, considering adjustment covariates and left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Table 5.7 Empirical size of test VZ and VZH in testing genetic effects under genetic heterogeneity across twenty latent subpopulations considering adjustment covariates and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Table 5.8 Power of test VZH and VZ under genetic heterogeneity across twenty latent subpopulations, considering adjustment covariates and left truncation. . . . . 92 Table 5.9 Empirical size of test VZH and VZ in testing genetic effects under genetic heterogeneity across individual genome profiles, considering adjustment co- variates and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Table 5.10 Power of test VZH and VZ under genetic heterogeneity across individual genome profiles, considering adjustment covariates and left truncation. . . . . 93 Table 5.11 Empirical sizes and powers of test VI and Wald test in testing G-G interac- tion considering left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Table 5.12 Empirical sizes and powers of test VI and Wald test in detecting G-E inter- action considering left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Table 5.13 Empirical sizes of test VZ and VZH under stringent p-value thresholds and left truncation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Table 5.14 Empirical size and power of test VZ and VZc considering left truncation, with n = 100, p = 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Table 5.15 Empirical size and power of test VZH and VZH,c considering genetic het- erogeneity across two observable subpopulations and left truncation, with n = 200, p = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 x Table 5.16 Empirical size and power of test VI and VIc in testing G-G interaction effect, considering left truncation, with n = 300, p = q = 10. . . . . . . . . . . . . . . . . . 98 Table 5.17 Empirical size and power of test VI and VIc in testing G-E interaction effect, considering left truncation, with n = 200, p = 10. . . . . . . . . . . . . . . . . . . . . 99 Table 5.18 Top five genes discovered by VZc and VZH,c from ROSMAP dataset by as- suming age-to-onset of AD follows additive hazards model. CP, IBS, Lap and Quad are IBS, cross-product, Laplacian and Quadratic kernel, respec- tively, which measure genetic similarity. Various types of heterogeneity were considered, including no genetic heterogeneity (S1), heterogeneity across ed- ucation categories (S2), heterogeneity between sexes (S3), and heterogeneity across genetic backgrounds (S4). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .102 xi LIST OF FIGURES Figure 2.1 U nif orm(0, 1) Q-Q plots of p-values for WV, coxKM, gt and coxSKATs under n = 150, p = 5 and the null hypothesis of no genetic association. . . . . 16 Figure 2.2 U nif orm(0, 1) Q-Q plots of p-values for WV and coxKM under linear con- founding and the null hypothesis of no genetic association. . . . . . . . . . . . . . 18 Figure 2.3 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across twenty latent subpopulations when p = 15 and n = 1000. 24 Figure 2.4 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across individual genome profiles when p = 10 and n = 1000. . 25 Figure 4.1 Q-Q plot of the null p-value of VZ under quadratic confounding with n = 400 and p = 3 or 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Figure 4.2 Comparison of uniform QQ plot of test VZ and VZc in detecting genetic association without considering genetic heterogeneity effect, with n=100 and p=15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Figure 4.3 Uniform QQ plot of VZH and VZH,c , considering genetic heterogeneity across two observable subpopulations, with n=200 and p=10. . . . . . . . . . . . . . . . . 74 Figure 4.4 Comparison of uniform QQ plot of test VI and VIc in testing G-G interaction effect, with n=600, p=q=2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Figure 4.5 Comparison of uniform QQ plot of test VI and VIc in testing G-E interaction effect, with n=500, p=2, q=1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Figure 5.1 Comparison of uniform QQ plot of test VZ and VZc in detecting genetic association without considering genetic heterogeneity effect, with n=100 and p=15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Figure 5.2 Comparison of uniform QQ plot of test VZH and VZH,c , considering genetic heterogeneity across two observable subpopulations, with n = 200 and p = 10. 98 Figure 5.3 Comparison of uniform QQ plot of test VI and VIc in testing G-G interaction effect, with n=300, p=q=10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .100 Figure 5.4 Comparison of uniform QQ plot of test VI and VIc in testing G-E interaction effect, with n=200, p=10, q=1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .100 xii CHAPTER 1 Introduction 1.1 Introduction and literature review Advances in high throughput sequencing (HTS) technologies have been used to create large datasets, which enables researchers to comprehensively investigate the insights of a wide- ranging catalog of variants (e.g., single nucleotide polymorphisms (SNPs), gene expression, and copy number variations). Also, with HTS technologies, genome-wide association studies (GWAS), and whole-genome/exome sequencing association studies can be used to detect the association between novel genetic variants and complex human diseases. A case-control study has been a popular experiment designed in genetic association studies, by comparing the frequency of SNP alleles in two groups of individuals: 1) cases who have been diagnosed with the target disease and 2) controls who are randomly selected and disease-free. Although case-control is one of the primary tools in identifying genetic variants that are associated with disease susceptibility, there is increased interest in studying the genetic association with survival outcomes such as age-to-onset of cancer diagnosis (Nagy, Munkácsy, and Győrffy, 2021; Huang et al., 2017, 2009; Azzato et al., 2010), progressive supranuclear palsy (PSP) (Jabbari et al., 2021), and alcohol dependence (Kapoor et al., 2014). The reason behind this increasing interest is that, with the continued growth of longitudinal data in healthcare research, a cohort study is becoming more and more popular in experiment design. In cohort studies, working with survival outcomes has been shown to be more powerful in detecting the association between SNPs and the onset of a wide range of human phenotypes (van der Net et al., 2008; Staley et al., 2017; Hughey et al., 2019), because in addition to binary outcomes (i.e. disease occurrence), survival outcomes also provided information about age/time of the disease onset. Numerous studies in cancer research, for example, lung cancer (Beer et al., 1 2002; Chen et al., 2007; Yu et al., 2008) and breast cancer (Shu et al., 2012; Azzato et al., 2010), have demonstrated the prediction of survival outcomes using genetic markers allow the identification of high-risk individuals at an early stage of disease, therefore additional treatment can be given after primary treatment to prevent the recurrence of the disease. HTS technologies are able to sequence millions of DNA molecules at a time, producing 120Gb data in less than 30 hours. Despite the comprehensive insights about DNA molecules, one challenge is the high dimensionality of the sequencing dataset. Single variant tests identify important genetic variants by assessing their marginal effects individually. However, this suffers from 1) low efficiency when there exists a complex relationship between multiple genetic variants and the phenotype, 2) high computation burden due to a massive number of genetic variants, 3) low statistical power on rare variants unless sample size or effect size is large (Lee et al., 2014), 4) fewer SNPs meet the threshold after correcting for multiple testing issue due to a large number of genetic variants. To address these issues, a biologically relevant region (e.g., genes, genetic pathways) is tested instead of a single variant. Such region-based test enables the aggregation of association signals from rare variants to boost statistical testing power. Moreover, the results of region-based genetic association analyses are more interpretable and reproducible, since the effect of single variants in a biological region is integrated, therefore the test is robust for the effect of poorly annotated single variant (Subramanian et al., 2005; Beyene et al., 2009; Goeman et al., 2005). Kernel method has been extensively used in machine learning applications (Hofmann, Schölkopf, and Smola, 2008). By applying feature mapping functions, the kernel method en- ables researchers to view data from a different perspective and capture complex relationships between predictive variables and response variables. Some popular applications of the kernel 2 method are support vector machine (SVM) classifier, kernel principal component analysis (kernel PCA) for dimension reduction, and kernel K-means for clustering. Kernel machine regression (KMR) is a more general and rigorous framework that applies the kernel method in regression analysis. In the past two decades, numerous set-based genetic association tests for survival outcomes were developed based on KMR (Cai, Tonini, and Lin, 2011; Goeman et al., 2005; Sinnott and Cai, 2013; Chen et al., 2014). These tests have been shown to detect novel genes or genetic pathways associated with age-to-onset of human diseases such as breast cancer and obesity. However, the existing KMR-based tests for survival outcomes have several drawbacks: 1) the test is time-consuming for permutation-based globaltest: gt (Goeman et al., 2005), kernel machine Cox regression: coxKM (Cai, Tonini, and Lin, 2011), and kernel machine regression under accelerated failure time (AFT) model: af tKM (Sinnott and Cai, 2013), they rely on resampling procedures (1-10K permutations) to approximate the null distribution of test statistic, thereby compute p-value, which is time-consuming as shown in simulation results in Chapter 2. 2) asymptotic theory-based globaltest (Goeman et al., 2005), KMR for AFT model (Sinnott and Cai, 2013), and the score-based sequence kernel association test (SKAT) for Cox regression: CoxSKATs (Chen et al., 2014) are not reliable under moderate sample size (n=150). This is demonstrated by the uniform Quantile-Quantile (QQ) plots of p-value in simulation results in Chapter 2, it can be observed that there is an obvious deviation from the uniform distribution. 3) The major existing KMR-based association tests assume genetic homogeneous effect, where the genetic effects are identical across different individuals or subpopulations. All existing KMR-based association test are subject to power loss when the genetic variants in the target population have a heterogeneous effect, where genetic effects 3 vary across individuals and (un)observable subpopulations. Studies (McClellan and King, 2010; Galvan, Ioannidis, and Dragani, 2010) suggested that genetic heterogeneity is more suitable in capturing genetic effects in genetic research of complex human diseases, because through the evolutionary history of human genetics, the same genetic marker may possess numerous different rare mutations, and each rare mutations may lead to different phenotype in different individuals and environments. 4) coxKM (Cai, Tonini, and Lin, 2011) is not accurate when there exists observable confounder. As shown in simulation results in Chapter 2, when in the presence of adjustment covariates that are linearly correlated with genetic markers, coxKM becomes anti-conservative and the QQ plot deviates from U nif orm(0, 1). 5) the existing genetic association tests: globaltest (Goeman et al., 2005), CoxSKAT s (Chen et al., 2014), coxKM (Cai, Tonini, and Lin, 2011), and af tKM (Sinnott and Cai, 2013) all studied right-censored data, but not applicable to survival times that subject to general interval-censoring, which arise in many important studies of complex human diseases: age-related macular degeneration (AMD) (Sun and Ding, 2019) and breast cancer (Kim, Williamson, and Lin, 2016). To address the drawbacks of existing KMR-based tests and to facilitate the genetic association and interaction analyses of interval-censored time-to-event outcomes, four suites of multi-marker survival tests were developed based on weighted V statistics. 1.2 Organization of this dissertation In the rest of this dissertation, a series of novel association and interaction tests were devel- oped based on different types of survival outcomes or survival models. They are organized as follows. In Chapter 2, we assume that survival times follow Cox proportional hazards model with possible covariate adjustment and left truncation. In Chapter 3, we assume that 4 survival times follow a semiparametric transformation model and are subject to interval censoring, with possible covariate adjustment and left truncation. In Chapters 4 and 5, we assume that survival times respectively follow the accelerated failure time (AFT) model and the additive hazards model subject to competing risk and left truncation. Finally, Chapter 6 describes concluding remarks and future works based on this dissertation. 5 CHAPTER 2 Multi-marker genetic association and interaction tests for sur- vival outcomes based on weighted V statistics 2.1 Methods A set of genetic association and interaction tests are developed for survival times that follow Cox proportional hazard (PH) model, with or without considering adjustment covariates and left truncation time. 2.1.1 Association tests Consider a cohort study with n independent individuals, who are disease-free when entering the study. Let G = (G1 , . . . , Gp ) denotes p-dimentional genetic marker we are interested in. Gj = {Gij }ni=1 is the j-th genetic marker that represents either a gene expression, a SNP (coded by 0, 1, or 2) or a genetic pathway. To mimic linkage disequilibrium (LD) structure in real world application, we assume that there is moderate correlation between genetic markers of the underlying population. Let T and C denote respectively survival time and right censoring time, then the observed survival outcome for each individual is denoted as (A, T̃ , ∆), where A is truncation time (i.e., individual’s age at study entry), T̃ = min(T, C), and ∆ = I(T ≤ C) is an indicator for event (∆ = 1) or censoring (∆ = 0). When adjust for covariates, a k-dim time-invariant baseline covariates Z = (Z1 , . . . , Zk ) is considered to reduce confounding and/or increase test power. We developed two sets of association test by assuming the effect of G is either homogeneous or heterogeneous across sub-population structure indicated explicitly by d-dim variable X = (X1 , . . . , Xd ) or implicitly by observable/latent variables inferred by X. Denote the observed data as D = (A, T̃ , ∆, G, Z, X), where X exists only when considering genetic heterogeneity. We assume non-informative censoring and left truncation given G and Z (and X when considering 6 genetic heterogeneity). In the absence of genetic heterogeneity, without covariate adjustment Under the null hypothesis that G is not associated with the time-to-onset of the interesting event when do not adjust for Z, we assume that the survival time follows a Cox proportional hazard (PH) model with hazard function, λ(t) = λ0 (t), where λ0 (t) is unspecified baseline Rt hazard function. The cumulative hazard function Λ(t) = 0 λ(s)ds will be estimated non- parametrically by using the Nelson-Aalen estimator. The proposed association test is based on the following weighted V statistic, X n X n V = n−2 f˜(Gi , Gj )Mi Mj , (2.1) i=1 j=1 R∞ where Mi = Ni (∞) − 0 Yi (s)λ(s)ds is martingale residual of subject i, where Ni (t) = I(Ui ≤ t, ∆i = 1) and Yi (t) = I(Ui ≥ t)} are corresponding counting process and at-risk process. The test statistic includes two parts 1) V statistic kernel Mi Mj and 2) weight function f˜(Gi , Gj ), which is a centered genetic similarity between subject i and j, f˜(Gi , Gj ) = f (Gi , Gj ) − E[f (Gi , Gj )|Gi ] − E[f (Gi , Gj )|Gj ] + E[f (Gi , Gj )], (2.2) where f (Gi , Gj ) is a kernel function (e.g., IBS, Gaussian kernel). The test statistic is constructed such that when G is significantly associated with the event of interest, we expect a large phenotype similarity, Mi Mj , weighted by a large genetic similarity, f˜(Gi , Gj ), leading to a large V ∗ value, therefore the test will have small p-value. Now we want to derive the asymptotic null distribution of V . By Mercer’s Theorem (Cris- 7 tianini and Shawe-Taylor, 2000), under regularity conditions, we can decompose f˜(Gi , Gj ) as f˜(Gi , Gj ) = ∞ ∞ ∞ P l=1 λl ψ(Gi )ψ(Gj ), where {λl }l=1 and {ψl (·)}l=1 are eigenvalues and eigen- functions of f˜(·, ·). We then have ∞  n X 1 Xp 2 nV = √ λt ψt (Gi )Mi . (2.3) t=1 n i=1 R∞ Theorem 1. Suppose that η ≡ E(Mi2 ), where M = N (∞) − 0 Y (s)dΛ(s), is a finite- positive number and E[f˜(Gi , Gj )] < ∞. Under the null hypothesis that T is independent P∞ of G, nV ⇝ η t=1 λt χ21t , where χ21t ’s are independent χ2 random variables with degree of freedom 1. Following Theorem 1, we can show that the asymptotic null distribution of nV is a weighted sum of independent degree one χ2 random variables. Rt In real application, we estimate Λ(t) by Nelson-Aalon estimator, Λ̂(t) = 0 dN (s)/Y (s), Yi (t). f˜(Gi , Gj ) is estimated from observed data by Pn Pn where N (t) = i=1 Ni (t), Y (t) = i=1 (I − J)F (I − J), where F = {f (Gj , Gj )}n×n , I is a n × n diagonal matrix, and J is a n × n matrix with all elements equal 1. The test statistic can be written as V = M̂(I − J)T F (I − J)M̂, (2.4) R∞ ˆ where M̂ = (M̂1 , . . . , M̂n )T , with M̂i = Ni (∞) − 0 Yi (s)dΛ(s). We approximates λ = E(Mi2 ) by using λ̂ = M̂T M̂/n and ηt by using ηˆt = η̃t /n, where {η̃t }nt=1 are eigenvalues of 8 (I − J)T F (I − J). The asymptotic null distribution of V can be approximated by X n nV ∼ λ̂ η̂t χ21t , (2.5) t=1 where {χ21t }nt=1 are χ2 random variables with 1 degree of freedom. Based on this linear com- bination of independent χ2 random variables, the p-value can be calculated using Davies’ method (Davies, 1980), P (nV ≥ nVobs ), where Vobs is observed value of V . This is imple- mented using R function ”davies” in R package ”CompQuadF orm”. In the absence of genetic heterogeneity, with adjustment covariates The association test considering adjustment covariates, Z, is similar to the test without covariate adjustment. We now assume that the survival time for subject i conditional on Z follow a Cox PH model with below hazard function λ(t|Zi ) = λ0 (t) exp{β T Zi } and covariate-centered genetic similarity as follow f˜Z (Gi , Gj ) = f (Gi , Gj ) − E[f (Gi , Gk )u(Zk , Zj )|Gi , Zj ] − E[f (Gj , Gk )u(Zk , Zi )|Gj , Zi ] +E[u(Zi , Zm )f (Gm , Gk )u(Zk , Zj )|Zi , Zj ], 9 where u(Zi , Zj ) = (1, ZTi )T [E{(1, ZT )T (1, ZT )}]−1 (1, ZTj )T . The covariate-adjusted weighted V statistic can be written as n n 1 XX ˜ VZ = 2 fZ (Gi , Gj )MZ,i MZ,j , (2.6) n i=1 j=1 where MZ,i MZ,j is considered ij-th V statistic kernel that measures phenotype similarity, f˜Z (Gi , Gj ) is covariate-centered kernel matrix that measures genetic similarity. The larger (smaller) phenotype similarity weighted by larger (smaller) genetic similarity, leads to larger (smaller) V statistic value and smaller (larger) p-value. Theorem 2. Suppose that ξ ≡ E(MZ,i 2 ) is a finite positive number and E[f˜Z (Gi , Gj )] < ∞. P∞ When T is independent of G given Z and G is independent of Z, nVZ ⇝ ξ t=1 νt χ21t , where {χ21t }∞ 2 t=1 are independent χ random variables with 1 degree of freedom and {νt }t=1 are ∞ eigenvalues of f˜Z (Gi , Gj ) = ∞ P t=1 νt ϕ(Gi , Zi )ϕ(Gj , Zj ) with E(ϕs (G, Z)ϕt (G, Z)) = I(s = t). The same asymptotic null distribution holds when f (Gi , Gk ) is cross-product kernel and G is linearly related to Z through G = a + BT Z + e, where a and B are constant intercept and regression coefficient matrix and e is zero mean random error vector that is independent of Z. When G is independent of Z, both test statistics, V and VZ , are asymptotically valid, with VZ more powerful. When Z is correlated with both G and survival outcome, then Z is an observable confounder. Under the linear confounding effect, Theorem 2 implies that the covariate-adjusted weighted V test with cross-product kernel genetic similarity is asymptotically correct. In real application, we can obtain maximum partial-likelihood estimate for β and Breslow 10 estimator for cumulative baseline hazard function Λ0 (t), denoted as β̂ and Λ̂0 (t), by fitting a CoxPH model (Andersen and Gill, 1982) to the observed data D. With β̂ and Λ̂0 (t), we R∞ T have estimated martingale residual for subject i as M̂Z,i = Ni (∞)− 0 Yi (s) exp(β̂ Z)dΛ̂0 (s). The covariate-centered genetic similarity kernel f˜Z (Gi , Gj ) is approximated using observed sample by (I − H)T F (I − H). The covariate adjusted weighted V statistic can be written as VZ = M̂TZ (I − H)T F(I − H)M̂Z , (2.7) where M̂Z = (M̂Z,i , . . . , M̂Z,n )T , I is n×n diagonal matrix, H = Z̃(Z̃T Z̃)−1 Z̃T with Z̃ = (1, Z) is a n × (k + 1) matrix with elements in the first column equal 1. The asymptotic null distribution of nVZ can be approximated by Xn nVZ ∼ ξˆ ν̂t χ21t , (2.8) t=1 where ξˆ = M̂TZ M̂Z /n is an approximation of ξ = E(MZ,i 2 ), ν̂t = ν̃t /n with {ν̃t }′ s are eigenval- ues of (I − H)T F(I − H). When dimension of Z, denoted as k (k < n), is large relative to n, ξˆ n Pn formula 2.8 is replace with n−k−1 t=1 ν̂t χ21t to take into account of the projection operator (I − H) as mentioned in Wei and Lu (2017). Based on this distribution, linear combination of independent χ2 random variable with 1 degree of freedom, p-value can be achieved using Davies’ method (Davies, 1980), P (nVZ ≥ nVZ,obs ), where VZ,obs is the observed value of VZ . In the presence of genetic heterogeneity In the association tests introduced above, we assume the effect of G is homogeneous. How- ever, genetic heterogeneity is more appropriate in genetic research of complex human disease. 11 In this section, we extend the proposed association tests, V and VZ , to take into considera- tion genetic heterogeneous effect, by replacing F matrix in hazard function 2.4 and 2.7 with W = (J + K) ⊙ F, where ⊙ is element-wise matrix product and K = {k(Xi , Xj )}n×n mea- sures similarity of observable/latent subpopulation structure imposed by variable X. The resulting weighted V statistic with or without covariates adjustment are   V H = M̂T (I − J)W(I − J)M̂,     VZH = M̂TZ (I − J)W(I − J)M̂Z .  The asymptotic null distribution of V H and VZH can be derived similarly as other association tests introduced in previous sections, by replacing F with W. Using Davies’ method (Davies, 1980), the p-values can be computed, respectively, by P (nV H ≥ nVobs H ) and P (nVZH ≥ H H H nVZ,obs ), where Vobs and VZ,obs are observed values of V H and VZH . 2.1.2 G-G/G-E interaction test Let G = (G1 , . . . , Gp ) be the gene of interest. In testing G-G interaction effect, we assume the effect of G depends on another gene H = (H1 , . . . , Hq ). To test the G-G interac- tion effect, we assume that the survival time follow a Cox PH model with hazard function λ(t|G, H) = λ0 (t) exp{β T G + αT H}. The maximum partial-lkelihood estimates of (β, α) is denoted as (β̂, α̂), Λ0 (t) is estimated by using Breslow estimator, denoted as Λ̂0 (t). We then R∞ Yi (s) exp( pk=1 Gik β̂k + ql=1 Hil α̂l )dΛ̂0 (s). The G-G interaction P P have M̂i = Ni (∞) − 0 test is based on the following weighted V test statistic VI = M̂T (I − O)(F ⊙ K)(I − O)M̂, 12 where O = Q(QT Q)−1 QT with Q = (1, G, H) is a n × (1 + p + q) matrix with all element in first column equal 1. F ⊙ K is element-wise matrix product of F and K with F = {f (Gi , Gj )}n×n and K = {k(Hi , Hj )}n×n . The asymptotic null distribution of nVI can be approximated, similarly as nVZ , by linear combination of independent χ2 distribution. When p + q is large relative to n, we multiply the approximated asymptotic null distribution by n/(n − p − q + 1) to take into account the project operator (I − O). The corresponding p-value is obtained using Davies’ method (Davies, 1980), P (nVI ≥ nVI,obs ). In testing G-E interaction effect, the test can be conducted similarly as G-G interaction, except that H in replaced with an environmental variable. 2.2 Simulations We performed Monte Carlo simulation to assess the finite-sample performance of the weighted V test considering adjustment covariates, in absence of left-truncation time. In all simulation settings, unless otherwise specified, three different sample sizes, n = {500, 1000, 1500}, and three different SNP-set sizes, p = {5, 10, 15} were considered. The genetic covariates we con- sidered are SNP-set with each SNP generated from a Binomial(2, 0.2). When considering confounding effects, SNP-set will be replaced by gene expression values. Two adjustment covariates were considered for each subject, one binary Z1 ∼ Bernoulli(0.5) and one contin- uous Z2 ∼ U nif orm(0, 2). Survival times were assumed to follow the Cox PH model, which was generated from an exponential distribution with rate λ. The λ value varies in differ- ent scenarios to control the right censoring rate at moderate level (between 20% and 40%). The assessment of the empirical size and statistical power were based on, respectively, 2000 and 1000 Monte Carlo samples, and the significance level of a test was set at 0.05, unless otherwise specified in the scenario when testing genetic association under various stringent 13 p-value thresholds. 2.2.1 Testing genetic association in the absence of genetic heterogeneity In this series of simulations, empirical size and power of test VZ was assessed in detecting the association between a SNP-set and onset of target event. To demonstrate the advantages of weighted V test, the performance of VZ was compared with three existing tests: coxKM (R package ’coxKM’), asymptotic version gt (R package ’globaltest’), and coxSKAT LRT. Survival times for subject i(i = 1, . . . , n) was generated from exponential distribution with following rate p 2 X X λ(t|Gi , Zi ) = λ0 (t) exp{ βj Gij + 0.5Zik }, (2.9) j=1 k=1 where λ0 (t) and {βj }pj=1 vary between scenarios. Computation time comparison between Weighted V statistic (WV), coxKM, gt (asymptotic version), and coxSKATs We compared computation times of WV, coxKM, gt, and coxSKATs under the null hypothe- sis of no genetic association. Survival time for subject i was assumed to follow Cox PH model with hazard function 2.9 where λ0 (t) = 1 and {βj }′ s = 0. The cross-product kernel was used to measure genetic similarity. Three sample sizes n = 150, 500, 1000 and one SNP-set size p = 5 were considered. Table 2.1 shows that WV and asymptotic version gt are comparably fast, while coxKM is the slowest due to the resampling procedure to achieve the test statistic large sample null distribution and p-value. 14 Table 2.1 Computation time (seconds) of WV, coxKM, gt and coxSKATs under the null hypothesis of no genetic association and p = 5. Sample Size WV coxKM gt coxSKATs 150 0.07 0.30 0.03 0.09 500 0.24 1.78 0.23 0.36 1000 1.41 8.22 1.76 2.13 Comparison of empirical size of WV, coxKM, gt (asymptotic version), and coxSKATs under moderate sample size We investigated the empirical sizes of WV, coxKM, gt, coxSKATs under moderate sample size n = 150 and SNP-set size p = 5. Other simulation settings were the same as the simulation above. Figure 2.1 indicates that both WV and coxKM tests are asymptotically correct while gt and coxSKATs are not because the QQ plot of their p-values deviates from U nif orm(0, 1) distribution. Therefore, only WV and coxKM were investigated for the following simulations. Empirical size and power under various n’s and p’s In this simulation, the performance of WV and coxKM tests were investigated under various n and p. Survival time for subject i was assumed to follow Cox PH model with hazard function 2.9 where λ0 (t) = 0.5, {βj }’s were set to 0 or 0.1 for empirical size and power assessment. IBS kernel is used to measure genetic similarity. Table 2.2 indicates that Type I error of both WV and coxKM are close to the nominal level and have comparable statistical power under various n’s and p’s. Empirical size and power under linear confounding effect In this simulation, the genetic covariates of our interest are gene expression values G = (G1 , . . . , Gp ), where Gi = Zi1 + Zi2 + ei . ei ∼ M V N (0, Σ) is p-dim multivariate normal 15 Table 2.2 Empirical size and power comparison between WV and coxKM in testing association of a SNP set with right-censored survival time in the absence of genetic heterogeneity, adjusting for covariates. Empirical Size (Power) p=5, n=500 p=5, n=1000 p=5, n=1500 WV 0.054 (0.208) 0.049 (0.445) 0.055 (0.601) coxKM 0.052 (0.244) 0.054 (0.430) 0.045 (0.627) p=10, n=500 p=10, n=1000 p=10, n=1500 WV 0.038 (0.335) 0.052 (0.631) 0.055 (0.819) coxKM 0.041 (0.337) 0.051 (0.630) 0.040 (0.846) p=15, n=500 p=15, n=1000 p=15, n=1500 WV 0.046 (0.507) 0.049 (0.853) 0.046 (0.974) coxKM 0.038 (0.473) 0.034 (0.886) 0.045 (0.976) WV coxKM 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile gt coxSKATs 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile Figure 2.1 U nif orm(0, 1) Q-Q plots of p-values for WV, coxKM, gt and coxSKATs under n = 150, p = 5 and the null hypothesis of no genetic association. random error with covariance matrix Σ = {0.5|i−j| }p×p . Cross-product kernel was used to measure gene expression similarity for both WV and coxKM tests. Survival times were generated similarly as previous simulation, except G is replaced by the gene expression 16 values. Sample size n = 500 and SNP-set size p = 5 is considered. Type 1 error of WV is 0.0475, which is close to nominal level, while Type I error of coxKM (0.0185) is far from nominal level. This can also be observed from the U nif orm(0, 1) Q-Q plot in Figure 2.2. 2.2.2 Testing genetic association in the presence of genetic heterogeneity In this series of simulations, we investigated the performance of HWV in testing genetic association with age-to-onset of target event, considering genetic heterogeneity across four different sources: 1) heterogeneity across four observable subpopulations, 2) heterogeneity across two latent subpopulations, 3) heterogeneity across twenty latent subpopulations, and 4) heterogeneity across genome profile. Genetic heterogeneity across four observable subpopulations In this simulation, we investigated the empirical size and power of HWV and coxKM in the presence of genetic heterogeneity across four observable subpopulations inferred by X = (X1 , X2 , X3 ), a three-dimensional dummy variable that codes four equally distributed observable subpopulations. Survival time for subject i was generated from Cox PH model with the following hazard function p X λ(t|Gi , Xi ) = exp (β1 Xi1 + β2 Xi2 + β3 Xi3 + β4 )Gik k=1 +0.6Xi1 + 0.6Xi2 + 0.2Xi3 . We set (β1 , β2 , β3 , β4 ) to be (0, 0, 0, 0) and (0.5, 0.005, 0.001, 0.0005) for, empirical size assessment and power assessment, respectively. IBS kernel was used to measure genetic sim- ilarity and identity kernel, {I(Xi = Xj )}n×n , was used to measure subpopulation similarity in HWV. Two sample sizes n = {500, 750} and three SNP-set sizes p = {5, 10, 15} were 17 considered. Table 2.3 shows that both HWV and coxKM control Type I error well around nominal level under various sample sizes and SNP-set sizes. However, HWV is more powerful than coxKM in presence of genetic heterogneity across four observable subpopulations, by taking advantage of the background similarity when conducting the test. WV coxKM 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile Figure 2.2 U nif orm(0, 1) Q-Q plots of p-values for WV and coxKM under linear confounding and the null hypothesis of no genetic association. Table 2.3 Empirical size and power comparison between HWV and coxKM in the presence of genetic heterogeneity across four observable sub-populations, adjusting for covariates. Empirical Size (Power) p=5, n=500 p=5, n=750 HWV 0.054 (0.801) 0.049 (0.963) coxKM 0.047 (0.618) 0.047 (0.811) p=10, n=500 p=10, n=750 HWV 0.052 (0.942) 0.046 (0.997) coxKM 0.045 (0.799) 0.055 (0.975) p=15, n=500 p=15, n=750 HWV 0.043 (0.931) 0.050 (0.998) coxKM 0.044 (0.857) 0.049 (0.984) Genetic heterogeneity across two latent subpopulations In this simulation, we investigated the empirical size and power of HWV and coxKM in the presence of genetic heterogeneity across two latent subpopulations. The survival time for 18 subject i in subpopulation j was generated from Cox PH model with the following hazard function p X λ(t|Gij , Zij ) = 0.5 exp{ Gijk βjk + 0.5Zij1 + 0.5Zij2 }, (2.10) k=1 where βjk represents the effect of Gk (k = 1, . . . , p) in subpopulation j (j = 1, 2). One- dimensional covariate X = {Xi }ni=1 was generated to infer two latent subpopulations. Each element is Xi = ai + 1 + ei , where ai ∼ Bernoulli(0.5) and ei ∼ N ormal(0, 0.25). IBS kernel was used to measure genetic similarity and unified kernel to measure latent subpopulation similarity. Table 2.4 shows that Type I error of HWV is close to nominal level under various n’s and p’s, while coxKM could be conservative when p is large relative to n (e.g., under p=15, n=500). Table 2.5 shows power comparison of HWV and coxKM under four different scenarios: T1, T2, T3, and T4. It can be observed that in the absence of genetic heterogeneity (scenario T1 β1k = β2k ̸= 0), HWV is less powerful than coxKM since the genetic effect in the underlying population is homogeneous. However, in the presence of genetic heterogeneity (scenario T2, T3, T4), HWV is more powerful than coxKM and as heterogeneity size (|β1k − β2k |) increases, HWV gains more power. Genetic heterogeneity across twenty latent subpopulations In this simulation, we investigated empirical sizes and power of HWV and coxKM in the presence of genetic heterogeneity across twenty latent subpopulations. The simulation setting is same as above except the number of subpopulations increased to twenty. The regression coefficients, {β j = (βj1 , . . . , βjp ), s.t. βj1 = . . . , βjp }20 j=1 , in hazard function 2.10 were set to 0 for empirical size assessment. In power assessment, {β j }20 j=1 were generated from uniform distribution with mean µβ and standard deviation σβ . To infer the twenty subpopulation 19 Table 2.4 Empirical size comparison between HWV and coxKM in testing genetic association in the presence of genetic heterogeneity across two latent sub-populations, adjusting for covariates. Empirical Size p=5, n=500 p=5, n=1000 p=5, n=1500 HWV 0.050 0.051 0.056 coxKM 0.048 0.061 0.051 p=10, n=500 p=10, n=1000 p=10, n=1500 HWV 0.050 0.054 0.046 coxKM 0.034 0.049 0.046 p=15, n=500 p=15, n=1000 p=15, n=1500 HWV 0.051 0.047 0.046 coxKM 0.038 0.039 0.042 structure, we used X = (X1 , . . . , X25 ) to infer subpopulation structure. Xk = ak + ek (k = 1, . . . , 25), with ak is a length n bootstrap sample from {1, . . . , 20} and ek = (ek1 , . . . , ekn ) ∼ N ormal(0, 0.5) is random error vector. IBS kernel was used to measure genetic similarity in HWV and coxKM. Unified kernel was used to measure subpopulation similarity in HWV. Table 2.6 shows both HWV and coxKM control Type I error well around nominal level, with an minor exception that under p = 15, n = 500, coxKM is slightly conservative due to relatively small sample size issue. Figure 2.3 shows power comparison under p = 15, n = 1000. It can be observed that as average genetic effect size increases (µβ = 0, 0.02, 0.03), both HWV and coxKM gains power. However, for a fixed average genetic effect (µβ ), as genetic heterogeneity effect increases (σβ = 0.02, 0.04, 0.06), power of coxKM does not change much, while HWV gains substantially more power by considering subpopulation structure in performing the association test. 20 Table 2.5 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across two latent sub-populations, adjusting for baseline covariates. Heterogeneity Scenario† T1 T2 T3 T4 β1k +0.09 +0.13 +0.02 +0.04 0 0 +0.01 -0.01 β2k +0.09 +0.13 -0.02 -0.04 +0.02 +0.04 +0.05 +0.03 HWV 0.11 0.25 0.10 0.27 0.06 0.11 0.09 0.10 p=5, n=500 coxKM 0.28 0.60 0.04 0.06 0.06 0.07 0.08 0.04 HWV 0.24 0.59 0.18 0.48 0.08 0.17 0.18 0.15 p=5, n=1000 coxKM 0.57 0.91 0.04 0.04 0.05 0.07 0.09 0.04 HWV 0.42 0.86 0.20 0.64 0.09 0.21 0.22 0.21 p=5, n=1500 coxKM 0.84 0.99 0.05 0.04 0.06 0.08 0.11 0.05 HWV 0.10 0.25 0.27 0.77 0.10 0.30 0.29 0.27 p=10, n=500 coxKM 0.46 0.84 0.04 0.04 0.04 0.06 0.07 0.05 HWV 0.25 0.69 0.48 0.98 0.16 0.51 0.55 0.51 p=10, n=1000 coxKM 0.83 1.00 0.05 0.04 0.05 0.07 0.10 0.04 HWV 0.44 0.94 0.69 1.00 0.21 0.67 0.71 0.70 p=10, n=1500 coxKM 0.96 1.00 0.05 0.05 0.07 0.08 0.10 0.04 HWV 0.09 0.22 0.54 0.98 0.19 0.58 0.58 0.56 p=15, n=500 coxKM 0.56 0.93 0.05 0.04 0.04 0.05 0.06 0.05 HWV 0.22 0.72 0.86 1.00 0.34 0.87 0.87 0.86 p=15, n=1000 coxKM 0.93 1.00 0.03 0.05 0.05 0.08 0.12 0.05 HWV 0.43 0.98 0.94 1.00 0.46 0.95 0.96 0.95 p=15, n=1500 coxKM 0.99 1.00 0.05 0.06 0.05 0.09 0.16 0.06 † Various heterogeneity scenarios were investigated in this simulation, determined by values of β1k and β2k , including same effect size with same direction (T1), same effect size with opposite directions (T2), no effect in one sub-population while positive effect in the other (T3), and different effect sizes with same or opposite directions (T4). Genetic heterogeneity across individual genome profile In this simulation, instead of assuming a ’categorical’ subpopulation structure like previous two simulations did, we now assume that the subpopulation structure is ’continuous’. In other words, each individual is itself a subpopulation and genetic variants have varied effect across individuals. We used a genome profile of 1,000 SNPs, X = (X1 , . . . , X1000 ), to infer subpopulation structure. X was generated in three steps, 1) generate genetic effect sizes {β i = (βi1 , . . . , βip ), s.t. βi1 = . . . = βip }ni=1 from uniform distribution with mean µβ and variance σβ2 , 2) generate a n-dim vector Xd (d = 1, . . . , 1000) from M V N (0, Σ), where 21 Table 2.6 Empirical size comparison between HWV and coxKM in the presence of genetic heterogeneity across 20 latent sub-populations, adjusting for covariates. Empirical Size p=5, n=500 p=5, n=1000 p=5, n=1500 HWV 0.052 0.055 0.051 coxKM 0.042 0.053 0.045 p=10, n=500 p=10, n=1000 p=10, n=1500 HWV 0.048 0.056 0.047 coxKM 0.046 0.043 0.050 p=15, n=500 p=15, n=1000 p=15, n=1500 HWV 0.047 0.040 0.044 coxKM 0.038 0.044 0.051 Σ is a n × n identity matrix under null hypothesis and {exp(−|βi1 − βj1 |/σβ )}n×n under alternative hypothesis, 3) categorize Xd (d = 1, . . . , 1000) into SNPs coded by 0, 1, or 2 by using predetermined quantile cutoffs: a2 , 2a(1 − a), and (1 − a)2 of a standard normal distribution where a is MAF generated from Beta(2, 5), to ensure the resulting population is in HWE. Survival times were generated from Cox PH model with hazard function 2.10, with {β j = (βj1 , . . . , βjp ), s.t. βj1 = . . . = βjp }nj=1 set to 0 for empirical size assessment and sampled from uniform distribution with mean µβ and variance σβ2 for power assessment. IBS kernel was used to measure genetic and genome profile similarity. Results in Table 2.7 indicate that Type I error of both HWV and coxKM are close to nominal level under various sample sizes (n) and SNP-set sizes (p). Figure 2.4 shows power comparison of HWV and coxKM under p = 10, n = 1000. Figure 2.4 indicates that as average genetic effect (µβ ) increases, both tests gain power. However, for a fixed µβ , as genetic heterogeneity size (σβ ) increases, HWV gains substantially more power than coxKM does. 22 Table 2.7 Empirical size comparison of HWV and coxKM considering genetic heterogeneity due to genomic profile, adjusting for covariates. Empirical Size p=5, n=500 p=5, n=1000 p=5, n=1500 HWV 0.041 0.057 0.049 coxKM 0.042 0.046 0.057 p=10, n=500 p=10, n=1000 p=10, n=1500 HWV 0.044 0.054 0.057 coxKM 0.045 0.044 0.057 p=15, n=500 p=15, n=1000 p=15, n=1500 HWV 0.042 0.047 0.050 coxKM 0.039 0.040 0.045 2.2.3 Testing G-G/G-E interaction effect In this simulation, we investigated the performance of the weighted V test for G-G and G-E interaction, denoted as WVI, and compared it with the likelihood ratio test (LRT). For testing G-G interaction effect, two genes, G = (G1 , . . . , Gp ) and H = (H1 , . . . , Hq ), were generated using a two-step procedure: 1) generate n samples from M V N (0, ΣG ) and from M V N (0, ΣH ), where ΣG = {0.3|k−l| }p×p and ΣH = {0.3|k−l| }q×q , to mimic LD structure between k- and l-th SNP within gene G and H; 2) categorize each column into 0, 1 or 2 using pre-defined cut-off values to achieve Hardy-Weinberg Equilibrium (HWE) and set MAF to 0.2. We assume survival time of subject i (i = 1, . . . , n) follow Cox PH model with the following hazard function p q pq X X X λ(t|Gi , Hi ) = 0.5 exp{ 0.05Gik + 0.05Hil + βm (GH)im }, k=1 l=1 m=1 where (GH)im represents m-th interaction term of Gi and Hi . {βm }pq m=1 were set to 0 and 0.1 for empirical size and power assessment, respectively. Three sample sizes n = 500, 1000, 1500 23 and two SNP-set sizes p = q = 2, 3 were considered. Cross-product kernel was used to measure genetic similarity for both G and H. µβ = 0 µβ = 0.02 µβ = 0.03 1.0 1.0 1.0 Method HWV coxKM 0.8 0.8 0.8 0.6 0.6 0.6 Power Power Power 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 0.02 0.03 0.04 0.05 0.06 0.02 0.03 0.04 0.05 0.06 0.02 0.03 0.04 0.05 0.06 σβ σβ σβ Figure 2.3 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across twenty latent subpopulations when p = 15 and n = 1000. For testing G-E interaction effect, the simulation setting is similar to that of testing G-G interaction, except that H = (H1 , H2 ) is an two-dimensional environmental variable, with H1 ∼ Bernoulli(0.5) and H2 ∼ U nif orm(0, 2). The survival time for subject i (i = 1, . . . , n) was generated from CoxPH model with the following hazard function p 2 2p X X X λ(t|Gi , Hi ) = 0.5 exp 0.1Gik + 0.1Hil + βm (GH)m , (2.11) k=1 l=1 m=1 where (GE)m is m-th interaction term of G and H. One sample size n = 1000 and one SNP-set size p = 10 was considered (q was fixed at 2 since H has two-dimension). IBS kernel to measure genetic similarity and unified kernel for environmental similarity. 24 Table 2.8 and 2.9 indicates that, in testing G-G and G-E interaction effect, Type I error of WVI is close to nominal level while LRT could be slightly inflated under small sample size and relative large SNP-set. Under all n’s and p’s WVI is more powerful than LRT as expected because the asymptotic null distribution of WVI has a more effective degree of freedom than LRT, therefore more powerful than LRT. The power advantages of WVI over LRT increase with the strength of LD, measured by ρ in Σ = {ρ|k−l| }p×p . µβ = 0 µβ = 0.02 µβ = 0.03 1.0 1.0 1.0 Method HWV coxKM 0.8 0.8 0.8 0.6 0.6 0.6 Power Power Power 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 0.02 0.03 0.04 0.05 0.06 0.02 0.03 0.04 0.05 0.06 0.02 0.03 0.04 0.05 0.06 σβ σβ σβ Figure 2.4 Power comparison between HWV and coxKM in the presence of genetic heterogeneity across individual genome profiles when p = 10 and n = 1000. 2.2.4 Empirical size under stringent p-value thresholds So far, we have investigated the empirical size of WV and HWV under a significance level of 0.05. However, to account for multiple testing issues in GWAS, a more strict p-value (5 × 10−3 or lower) is often considered. In this simulation, we investigated the empirical size of 1) WV in absence of genetic heterogeneity, with p = 5, n = 1000 and 2) HWV in the presence of heterogeneity across observable subpopulations with p = 10, n = 1000 under 25 Table 2.8 Empirical sizes and powers of WVI and LRT in testing G-G interaction without considering left truncation. Empirical Size (Power) p=2, q=2, n=500 p=2, q=2, n=1000 p=2, q=2, n=1500 WVI 0.050 (0.255) 0.056 (0.435) 0.051 (0.629) LRT 0.055 (0.205) 0.054 (0.328) 0.052 (0.494) p=3, q=3, n=500 p=3, q=3, n=1000 p=3, q=3, n=1500 WVI 0.057 (0.488) 0.059 (0.837) 0.057 (0.952) LRT 0.072 (0.341) 0.064 (0.664) 0.052 (0.864) Table 2.9 Empirical sizes and powers of HWV and LRT in testing G-E interaction without considering left truncation (p = 10, q = 2, n = 1000). Empirical Size (Power) HWV 0.049 (0.905) LRT 0.071 (0.855) p-value thresholds that are more strict than 0.05. 350K Monte Carlo samples were generated in the same way as corresponding association tests in previous sections. Table 2.10 shows that the empirical size of WV and HWV is close to the nominal level. Table 2.10 Empirical size for WV and HWV that considers heterogeneity across individual genome profiles and adjusts for covariates, under several stringent p-value thresholds. Empirical Size† Threshold WV (p=5, n=1000) HWV (p=10, n=1000) 0.05 0.0494 0.0487 0.005 0.00474 0.00453 0.0005 0.000400 0.000489 0.00005 0.0000429 0.0000371 † Proportion of p-values in 350K replicates that are smaller or equal to the corresponding threshold. 2.3 A Real Application We applied the developed WV and HWV test on the ROSMAP dataset by assuming the age-to-onset of AD follows the Cox PH model. ROSMAP is a GWAS that includes demen- 26 tia exam data from two large longitudinal studies of aging and dementia: 1) the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP) (Bennett et al., 2018). The ROSMAP genotype dataset includes 1,639 subjects, each has 750,173 SNPs. We first performed quality control on genotype dataset 1) at SNP level by removing SNPs with MAF>0.01, HWE test’s p-value > 10−6 , or missing rate > 2%, then 2) at the subject level by removing subjects with missing genotype rate > 2%. This results in a quality-controlled genotype dataset with 1,618 subjects each having 619,061 SNPs. The missing genotypes were then imputed with random samples generated from Binomial(2, M AF ), where MAF was estimated from the genotype dataset. PCA was performed using Plink software (ver- sion 1.9) to obtain the first 20 principal components that represent the underlying ancestry information. After that, we grouped all available SNPs into 21,329 different genes by using human genome annotation (GRCh38/hg38) obtained from the UCSC Genome Browser. We also formed a new gene APOE4 which was coded as the count of the APOE4-ϵ4 allele. The ROSMAP phenotype dataset includes information from 2,376 subjects. Each subject has 1) baseline covariates: race, gender, years of education, and binary study cohort indicator (ROS or MAP) and 2) follow-up information: subject’s age and clinical cognitive diagnosis result at each examination center visit including at study registry. For our analysis, we include only subjects who are disease-free in the study registry. Age at baseline served as left-truncation time, the age at first AD diagnosis served as survival time, and the age at study termination served as right censoring time if the subject is disease-free throughout the study follow-up period. We merged preprocessed genotype and phenotype datasets by keeping only the overlap- ping subjects to generate the analysis-ready dataset, which includes 1,433 subjects, with a 27 censoring rate of 67.8%. We are interested in 1) testing genetic covariates that are associated with the time-to-onset of first AD diagnosis and 2) detecting potential genetic heterogeneity across subpopulations inferred by either baseline covariates or individual genome profiles (represented by a random sample of 200K SNPs), in the presence of left-truncation time and adjust for gender, year of education, study cohort indicator (ROS served as baseline), and first 20 principal components. Note that race is not adjusted because 1,432 out of 1,433 subjects are white. IBS kernel was used to measure genetic similarity, while the choice of similarity kernel for subpopulation similarity depends on the source of heterogeneity, which is elaborated in Table 2.11. Various types of heterogeneity have been considered including no genetic heterogeneity (S1), heterogeneity across individual genome profiles (S2), heterogeneity due to years of edu- cation (S3), heterogeneity across three education attainment categories (0: yrs of education ≤ 12; 1: yrs of education ∈ (12, 16]; 2: yrs of education > 16) with education similarity between subjects i and j measured by κij = I(Xi = Xj ) (S4), heterogeneity across the three education attainment categories with education similarity measured by IBS kernel (S5), and heterogeneity due to sex (S6). FDR-based p-value thresholds were calculated according to Benjamini and Hochberg (1995) under arbitrary dependence assumption, i.e., Thresholdi = Pmαi (i = 1, . . . , m), where m is the number of tests and α is the target m k=1 (1/k) FDR. Table 2.11 shows the analysis results of the ROSMAP dataset. To account for multiple testing issues, the p-value thresholds were obtained by controlling the FDR under 10% by Benjamini-Hochberg procedure Benjamini and Hochberg (1995). When in the absence of genetic heterogeneity (S1), APOE4 appeared to be the most significant gene (p=1.21E-10) 28 followed by APOC1 (p=1.04E-07). When in the presence of genetic heterogeneity (S2-S6), even though APOE4 and APOC1 were still the most significant genes, their p-values vary across scenarios, with notably smaller p-values in S4 that considered genetic heterogeneity across re-categorized year-of-education, with similarity measured by IBS kernel. This indi- cates the potential genetic heterogeneity due to education levels. In other words, the effect of APOE4 and APOC1 on age-to-onset of AD varies across subjects’ education levels. Table 2.11 Top five significant genes discovered by WV/HWV from a genome-wide association analysis of ROSMAP dataset by assuming age-to-onset of AD follows Cox PH model. Heterogeneity Gene names and p-values S1 APOE4 APOC1 CAMSAP1 XPR1 GRIP1 1.21e-10 1.04e-07 2.34e-05 1.48e-04 3.04e-04 S2 APOE4 APOC1 CAMSAP1 XPR1 KU.MEL.3 4.03e-10 2.31e-07 7.56e-06 6.92e-05 1.97e-04 S3 APOE4 APOC1 CAMSAP1 XPR1 PPM1A 6.72e-11 4.54e-08 7.94e-05 1.33e-04 3.93e-04 S4 APOE APOC1 CAMSAP1 XPR1 LINC00911 3.68e-11 1.68e-08 3.07e-05 4.30e-05 7.88e-05 S5 APOE APOC1 CAMSAP1 LETMD1 CSRNP2 4.30e-11 3.64e-08 1.15e-05 3.57e-05 6.17e-05 S6 APOE APOC1 CAMSAP1 XPR1 PPM1A 2.87e-10 2.29e-07 2.05e-05 9.72e-05 1.19e-04 Threshold 4.45e-07 8.89e-07 1.33e-06 1.78e-06 2.22e-06 2.4 Discussion We developed a suite of multi-variant association and interaction tests for survival pheno- types based on weighted V statistics. They have three main advantages over the state- of-the-art methods. First, the new tests are faster. Second, they can account for possible heterogeneity in genetic effects to enhance power. Third, they can handle linear confounding. Although the covariate adjustment in the (heterogeneity) weighted V tests is based on a Cox model, the tests are robust against model mis-specification. This is because, under a 29 null model that is not the Cox model, Z ∞ M̂Z,i = Ni (∞) − Yi (s) exp(ZTi γ̂)dΛ̂0 (s) Z0 ∞   E{dNi (s)} = Ni (∞) − {Yi (s) exp(ZTi γ ∗ ) + op (1)} + op (1) 0 E{Yi (s) exp(ZTi γ ∗ )} ∗ ≡ MZ,i + op (1), where γ ∗ is defined as in Lin and Wei (1989), E(MZ,i ∗ ) = 0, and thus Theorem 2 with MZ,i ∗ replaced by MZ,i still holds. The covariate adjustment proposed for the (heterogeneity) weighted V tests cannot con- trol Type I error rates around the nominal level in the presence of nonlinear confounding, i.e., the genetic covariates follow a nonlinear model about the adjustment covariates. However, our simulation results suggested that the empirical sizes of the proposed tests are smaller, not larger, than the nominal level in this situation (results not shown here due to space limi- tations). A conservative test does not violate the fundamental principle of small-probability events in hypothesis testing. The proposed G-G/G-E interaction test assumes linearity on the main effects of the two genes (or the gene and the environmental determinant). When the true main effects are nonlinear, the test may have an incorrect size. A more robust test could estimate the null model using the kernel machine Cox regression (Cai, Tonini, and Lin, 2011), which, however, significantly increases the computational cost. One may improve the power of the heterogeneity weighted V tests by changing wij to 30 (ρ+κij )f (Gi , Gj ), where ρ is a positive unknown parameter, and then using the test statistic S = sup nV H (ρ) ρ∈I or SZ = sup nVZH (ρ), ρ∈I where I is a range of ρ under consideration. The asymptotic null distribution of S or SZ is hard to derive. A workaround is to calculate S or SZ for many permutations of the n subjects’ martingale residuals to obtain a permutation-based p-value. But this permutation test requires an assumption that there is no relationship between the genetic variables and the adjustment covariates when the latter is present. We can also develop a set of weighted U tests in parallel to the weighted V tests developed herein. For example, a covariate-adjusted weighted U test for genetic association is based on the following weighted U statistic, 1 Xˆ UZ = f˜Z (gi , gj )M̂Z,i M̂Z,j , n(n − 1) i̸=j where fˆ˜Z (gi , gj ) is the ij-th element of (I − H)T F (I − H). Following the derivation of the large-sample distribution of nVZ , it is easy to show that when n is large, the null distribution of nUZ can be approximated by X n nUZ ∼ ξˆ ν̂t (χ21t − 1). t=1 31 The weighted U and V tests can be used interchangeably, as they have the same asymptotic efficiency. 32 CHAPTER 3 Multi-marker genetic association and interaction tests with interval- censored survival outcomes 3.1 Methods 3.1.1 Association test Consider a cohort study of n independent subjects, who are disease-free at the study registry. Denote genetic covariates G = (G1 , . . . , Gp ), where Gj is the j-th genetic variant that can be either a SNP (coded as 0, 1, or 2) or gene expression values. Z = (Z1 , . . . , Zd ) is a d-dim adjustment covariates. Survival times, T , are subject to the interval censoring and possible left truncation. Interval left and right endpoints are denoted as L and R, and left truncation time is denoted as A. Two sets of tests were developed to detect genetic association with interval-censored survival outcomes by assuming the effect of G is homogeneous or heteroge- neous across subpopulations indicated explicitly by X or by latent variables inferred by X. Therefore, we denote the observed data as D = {(Gi , Zi , Li , Ri , Ai , Xi )}ni=1 (Xi only used in the presence of genetic heterogeneity). We assume that examination times {(Li , Ri )}ni=1 are independent of event risk (In that case one can in the analysis ignore the distribution of the examination times, and treat the examination times as fixed.). We also assume non- informative censoring and truncation time given G and Z (and X in the presence of genetic heterogeneity). Consider the scenario when in the absence of genetic heterogeneity. Under the null hypothesis that G is independent of T given Z, we assume a semi-parametric transformation model for the cumulative hazard function of T given Z as follow Λ(t|Z) = G[Λ0 (t) exp{β T Z}], 33 where Λ0 (t) is unspecified cumulative baseline hazard function with Λ0 (0) = 0. G(x) = R∞ − log 0 e−xt f (t)dt, where f (t) is a known density function whose support is R+ . Choose f (t) to be a gamma density with mean 1 and variance r yields the logarithmic transformation: G(x) = log(1 + rx)/x, which becomes Cox proportional hazard and proportional odds model when setting r=0 and r=1. Motivated by association test proposed in Chapter 2, we introduce a subject-level frailty (h) and use a working semiparametric transformation frailty model with cumulative hazard function: Λ(t|Z, h) = G[Λ0 (t) exp{β T Z + h}]. The covariate-adjusted weighted V test statis- tic can be constructed similarly to equaltion 2.6 by replacing martingale residuals {MZ,i }ni=1 by score function {SZ,i }ni=1 , where SZ,i = ∂ log Li (Λ0 , β, hi )/∂hi |hi =0 . Li (Λ0 , β, hi ) is condi- tional likelihood function of subject i given frailty hi and left-truncation time Ai , as follows T T exp(−G[Λ0 (Li )eβ Zi +hi ]) − exp(−G[Λ0 (Ri )eβ Zi +hi ]) Li (Λ0 , β, hi ) = T . exp(−G[Λ0 (Ai )eβ Zi +hi ]) The weighted V test statistics is as follow, Xn X n VZ,IC = n−2 f˜Z (Gi , Gj )SZ,i SZ,j , (3.1) i=1 j=1 where SZ,i SZ,j is considered as V statistic kernel that measures phenotype similarity, weighted 34 by covariate-centered genetic similarity f˜Z (Gi , Gj ), f˜Z (Gi , Gj ) = f (Gi , Gj ) − E[f (Gi , Gk )u(Zk , Zj )|Gi , Zj ] −E[f (Gj , Gk )u(Zk , Zi )|Gj , Zi ] +E[u(Zi , Zm )f (Gm , Gk )u(Zk , Zj )|Zi , Zj ], where f (Gi , Gj ) is genetic similarity and u(Zi , Zj ) = (1, ZTi )[E{(1, ZT )T (1, ZT )}]−1 (1, ZTj )T . When G has effect on T , the larger (smaller) phenotype similarity, weighted by larger (smaller) genetic similarity, results in larger (smaller) weighted V statistic and more (less) significant p-value. When Z is independent of G, the asymptotic null distribution of VZ,IC can be obtained similarly following Theorem 2 in Chapter 2. Denote ξ = E(SZ2 ), {νt }∞ t=1 are eigenvalues of f˜Z (Gi , Gj ), the asymptotic null distribution of weighted V statistic is X ∞ nVZ,IC ∼ ξ νt χ21t , (3.2) t=1 where {χ21t }∞ 2 t=1 are independent χ random variables of 1 degree of freedom. When Z and G are linearly correlated, G = a + bT Z + e, where a and b are intercept and regression coefficients and e is a normally distributed random error that is independent of Z. Theorem 2 in Chapter 2 suggests that the distribution in 3.2 is still asymptotically correct. In real application the regression coefficients β̂ and cumulative baseline hazard function T Λ̂0 (t) in 3.1 are obtained by fitting model Λ(t|Z, h = 0) = G[Λ0 (t)eβ Z ] with r = 0 using 35 unpenalized version of penalized nonparametric maximum likelihood estimation (PNPMLE) for Cox regression and possibly left truncated data (Li, Pak, and Todem, 2020). When r > 0 is considered, a more general method (Zeng, Mao, and Lin, 2016) can be used to obtain NPMLE of β and Λ0 (t). Due to the lack of available estimation methods for the Porportional Odds model with left-truncation and interval-censored data, left truncation was considered only under the Cox proportional hazards model. f˜Z (Gi , Gj ) is estimated by its sample mean denoted as (I − H)F(I − H), where F = {f (Gi , Gj )}n×n with f (Gi , Gj ) the genetic similarity between any pair of subjects i and j, H = Z̃(Z̃T Z̃)−1 Z̃T with Z̃ = (1, Z). The weighted V statistic can be rewritten as VZ,IC = ŜTZ (I − H)F(I − H)ŜZ , (3.3) where ŜZ = (ŜZ,1 , . . . , ŜZ,n ). Estimate of ξ = E(SZ2 ), denoted as ξ, ˆ can be obtained by Ŝ T ŜZ /n, where ŜZ,i = Z ∂Li (Λ0 (t), β, hi )/∂hi |Λ0 (t)=Λ̂0 (t),β=β̂,hi =0 . The estimate of νt is estimated as ν̂t = ν̃t /n, where ν̃t ’s are eigenvalues of f˜Z (Gi , Gj ). The asymptotic null distribution of nVZ,IC can be ap- proximated by Xn nVZ,IC ∼ ξˆ ν̂t χ2t1 , t=1 which is linear combination of independent χ2 random variables of 1 degree of freedom. obs p-value can be computed using Davies’ method (Davies, 1980) P (nVZ,IC ≥ nVZ,IC ), where obs VZ,IC ) is the observed value of VZ,IC ). When in the absence of adjustment covariates Z, the centered genetic similarity is denoted 36 as f˜(Gi , Gj ) and is estimated by sample mean (I − J)F(I − J), where J = {1}n×n . The corresponding weighted V statistic is VIC = ŜT (I − J)F(I − J)Ŝ, and the asymptotic null distribution can be approximated similarly as above. When in the presence of genetic heterogeneity, the effect of G varies across different subpopulations, the association tests VIC and VZ,IC is extended to consider the subpopulation structure, thereby improving test power. The setting is the same as the association test in the absence of heterogeneity, except that X is now introduced to infer explicit or latent subpopulation structure. The heterogeneity weighted V statistics, with or without covariate adjustment are  H = ŜT (I − J)W(I − J)Ŝ   VIC   H  VZ,IC  = ŜT (I − H)W(I − H)Ŝ, where W = (J + K) ⊙ F. K = {k(Xi , Xj )}n×n with the ij-th element represents sub- population similarity of subject i and j. ⊙ is the element-wise matrix multiplication. The H H asymptotic null distribution of nVIC and nVZ,IC can be approximated similarly as that of nVIC and nVZ,IC using linear combination of independent χ2 random variable with 1 degree of freedom. Based on this distribution, p-value is calculated using Davies’ method (Davies, H H H H H H 1980) by P (nVIC ≥ nVIC,obs ) and P (nVZ,IC ≥ nVZ,IC,obs ). VIC,obs and VZ,IC,obs are observed H H values of VIC and VZ,IC . 3.1.2 G-G/G-E interaction test Two interaction tests were developed to detect the interaction effect between two genes (G-G) or between a gene and environmental variable (G-E). Let G = (G1 , . . . , Gp ) be the interested gene, let H = (H1 , . . . , Hq ) be a different gene or an environmental variable when there exists G-G or G-E interaction effect. 37 To test G-G interaction effect, we assume that given G and H, the survival time T for subject i follow a semi-parametric transformation model with cumulative hazard function as follow Λ(t|Gi , Hi ) = G[Λ0 (t) exp{β T Gi + αT Hi }]. (3.4) The V statistic kernel is Si Sj with Si = ∂Li (Λ0 , β, α, hi )/∂hi |Λ0 =Λ̂0 ,β=β̂,α=α̂,hi =0 . The condi- tional likelihood function for subject i, Li (Λ0 , β, α, hi ), can be written as follow T T T Gi +αT Hi +hi exp(−G[Λ0 (Li )eβ Gi +α Hi hi ]) − exp(−G[Λ0 (Ri )eβ ]) Li (Λ0 , β, α, hi ) = T . (3.5) exp(−G[Λ0 (Ai )eβ Gi +αT Hi +hi ]) I The weighted V statistic for G-G interaction is VIC = ŜT (I − O)(F ⊙ K)(I − O)Ŝ, where O = Q(QT Q)−1 QT , Q = (1, G, H). F = {f (Gi , Gj )}n×n and K = {k(Hi , Hj )}n×n are I genetic similarity matrices. The asymptotic null distribution of nVIC can be approximated similarly as that of nVZ,IC . When p + q is large relative to sample size n, the asymptotic I null distribution of VIC is . p-value can be computed using Davies’ method (Davies, 1980) I I I I P (nVIC ≥ nVIC,obs ), where VIC,obs is observed value of VIC . The test for G-E interaction effect is performed in the same way as that of G-G interaction test except that H is replaced by environmental variable. 3.2 Simulations Monte Carlo simulation was performed to assess the finite-sample performance of (hetero- geneity) weighted V statistic in detecting genetic association with time-to-onset of interval- censored and possibly left-truncated survival outcomes. In all simulation settings, unless otherwise specified, two sample sizes n = 500, 1000 and two SNP-set sizes p = 4, 8 were 38 considered. For each subject, we assume three examine center visits at ages {V1 , V2 , V3 }, where V1 ∼ U nif orm(L, R), V2 = V1 + U nif orm(L, R), and V3 = V2 + U nif orm(L, R) with predetermined positive values L and R to control the left censoring rate ∼ 25% and right censoring rate ∼ 35%. When adjust for covariates, one binary variable Z1 ∼ Binomial(0.5) and one continuous variable Z2 ∼ U nif orm(0, 2) were considered. Genetic covariates G = (G1 , . . . , Gp ) is a n × p SNP set generated using a two-step procedure 1) generate n random samples from multivariate normal distribution M V N (0, Σ), where Σ = {0.5|k−l| }p×p is covariance matrix that used to mimic LD between any pair of SNPs, 2) categorize each quantitative column into SNP with three levels labeled as 0, 1, and 2 by using predetermined quantile cutoffs: a2 , 2a(1−a), and (1−a)2 of a standard normal distribution with a = 0.2, so that the resulting population is in HWE and each SNP has MAF=0.2. When in the presence of genetic heterogeneity, X was generated accordingly to infer explicit/latent subpopulation structure. Survival time T was assumed to follow either Cox proportional hazard or propor- tional odds model, two special cases of the class of semiparametric transformation model. However, left-truncated survival time was not considered under the proportional odds model, due to the lack of an available estimation method, and the simulation results were shown in Appendix. In all simulations, 1000 Monte Carlo samples were generated and the significance level was set to 0.05. 3.2.1 Testing genetic association in the absence of genetic heterogeneity In this series of simulation, we investigated the performance of weighted V statistic in de- tecting genetic association considering adjustment covariates and left-truncation times, in the absence of genetic heterogeneity. Survival time for subject i (i = 1, . . . , n) follows Cox 39 PH model (constant baseline hazard) with the following form of hazard function, λ(t|Zi , Gi ) = exp{β T Gi + αT Zi }, (3.6) where the value of β and α vary in different simulation settings to achieve moderate left- and right-censoring rate (∼ 30%). Empirical size and power under various n’s and p’s In this simulation, we assumed that Z is independent of G, and the empirical size and power of the weighted V test were evaluated under various n′ s and p′ s. Regression coefficients in hazard function 3.6 are set to β = 0, α = 0.5 for empirical size assessment and β = α = 0.5 for power assessment. We set L = 0.1 and R = 0.6 as endpoints of uniform distribution used in follow-up time generation. IBS kernel was used to measure genetic similarity. Table 3.1 shows that the Type I error of WV-IC is close to the nominal level and the power increases with sample size. Table 3.1 Empirical size and power of WV-IC with r = 0 in testing genetic effects in the absence of left truncation. Empirical Size (Power) p=4, n=500 p=4, n=1000 WV-IC 0.047 (0.231) 0.057 (0.410) p=8, n=500 p=8, n=1000 WV-IC 0.048 (0.415) 0.054 (0.738) Empirical size and power under linear confounding In this simulation, we assumed that there exists linear confounding effect, when Z has linear relationship with G. The genetic covariate of our interest, G = (G1 , . . . , Gp ), are gene expression values with Gi = Zi1 + Zi2 + ei (i = 1, . . . , n). The random error is ei ∼ 40 M V N (0, Σ) with Σ = {0.5|k−l| }p×p . Regression coefficients in hazard function 3.6 was set to β = 0, α = 0.05 for empirical size assessment and β = α = 0.02 for power assessment. We set L = 0.1 and R = 0.6 in follow-up time generation. Cross-product kernel was used to measure gene expression similarity. Table 3.2 shows that WV-IC controls Type I error well around nominal level and gains power as sample size increases. Table 3.2 Empirical size and power of WV-IC with r = 0 in testing genetic effects under linear confounding and no left truncation. Empirical Size (Power) p=4, n=500 p=4, n=1000 WV-IC 0.050 (0.158) 0.055 (0.313) p=8, n=500 p=8, n=1000 WV-IC 0.041 (0.320) 0.056 (0.574) 3.2.2 Testing genetic association in the presence of genetic heterogeneity So far, we have assumed that the effect of G is homogeneous in the population. In this series of simulations, we introduced genetic heterogeneity, and the performance of HWV-IC and WV-IC will be evaluated under four sources of heterogeneity 1) observable subpopulations, 2) two latent subpopulations, 3) twenty latent subpopulations, and 4) individual genome profile. Genetic heterogeneity across observable subpopulations In this simulation, we investigated the performance of HWV-IC and WV-IC in the pres- ence of genetic heterogeneity across four observable subpopulations, which was inferred by Z = (Z1 = I(1 ≤ U < 2), Z2 = I(2 ≤ U < 3), Z3 = I(3 ≤ U ≤ 4)), where U = (U1 , . . . , Un ) ∼ U nif orm(0, 4) and I(·) is an indicator function. The genetic covariate G was generated following the two-step procedure, but four different sets of predetermined 41 MAFs (∼ U nif orm(0.005, 0.05)) and LD structures (measured by ρ through Σp×p = {ρ|k−l| }) were used as follows, MAFs = {0.0118, 0.0304, 0.0476, 0.0450, 0.0393, 0.0076, 0.0459, 0.0054} and ρ = 0.2 in subpopulation 1; MAFs = {0.0378, 0.0421, 0.0312, 0.0467, 0.0156, 0.0085, 0.0362, 0.0336} and ρ = 0.5 in subpopulation 2; MAFs = {0.0157, 0.0299, 0.0220, 0.0119, 0.0272, 0.0093, 0.0063, 0.0354} and ρ = 0.7 in subpopulation 3; MAFs = {0.0257, 0.0323, 0.0289, 0.0252, 0.0078, 0.0319, 0.0439, 0.0193} and ρ = 0.6 in subpopulation 4. Survival time for subject i (i = 1, . . . , n) was generated from Cox PH model with the following hazard function, p X λ(t|Gi , Zi ) = exp{ (β1 Zi1 + β2 Zi2 + β3 Zi3 + β4 )Gik + 0.6Zi1 + 0.6Zi2 + 0.2Zi3 }, k=1 where (Zi1 , Zi2 , Zi3 ) is a three-dimensional dummy variable that indicates which subpopula- tion subject i belongs to. We set β1 = β2 = β3 = β4 = 0 for empirical size assessment and β1 = 0.4, β2 = 0.002, β3 = 0.0005, β4 = 0.0005 for power assessment. L = 0.1 and R = 0.5 were the endpoints of unform distribution used in follow-up time generation. IBS kernel was used to measure genetic similarity and identity kernel for subpopulation inferred by Z. Table 3.3 indicates that Type I error of both HWV-IC and WV-IC are close to nominal level and HWV-IC is more powerful than WV-IC as expected since HWV-IC takes advantages of 42 subpopulation structure in performing the test to increase statistical power. Table 3.3 Comparison of performance of WV-IC and HWV-IC (r = 0) in testing genetic association under genetic heterogeneity across four observable subpopulations and no left truncation. Empirical Size (Power) p=4, n=500 p=4, n=1000 HWV-IC 0.040 (0.535) 0.046 (0.878) WV-IC 0.042 (0.384) 0.049 (0.647) p=8, n=500 p=8, n=1000 HWV-IC 0.046 (0.801) 0.053 (0.998) WV-IC 0.047 (0.552) 0.044 (0.889) Genetic heterogeneity across two latent subpopulations In this simulation, we investigated performance of HWV-IC in the presence of two latent subpopulations, inferred by one-dim vector X = {Xi }ni=1 , with Xi = ai + ei , where ai = Bernoulli(0.5) and ei ∼ N ormal(0, 0.5). Survival time for subject i in subpopulation j was generated from Cox PH model with the following hazard function, p X λ(t|Gij , Zij ) = exp{ Gijk βjk + 0.05(Zij1 + Zij2 )}, (3.7) k=1 where {β j = (βj1 , . . . , βjp ) s.t. βj1 = . . . = βjp }2j=1 , denoted as {β j }2j=1 , are effect sizes of G in two subpopulations. {β j }2j=1 were set to 0 for empirical assessment and set to different values power assessment under five different scenarios representing four different forms of genetic heterogeneity. We set L = 0.1 and R = 0.5 for follow-up time generation. IBS kernel was used to measure genetic similarity and Gaussian kernel for subpopulation similarity. Table 3.4 indicates that Type I error of HWV-IC is close to nominal level. Table 3.5 shows power assessment under no heterogeneity (T1) and four different forms of heterogeneity (T2 43 - T5). It can be observed that power of HWV-IC increase with sample size and size of heterogeneity (|β1k − β2k |). Table 3.4 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across two latent subpopulations and no left truncation. Empirical Size p=4, n=500 p=4, n=1000 HWV-IC 0.042 0.049 p=8, n=500 p=8, n=1000 HWV-IC 0.047 0.045 Table 3.5 Power of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across two latent subpopulations and no left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k , including the same effect size and the same effect direction (T1), identical sizes but opposite directions (T2), no effect in one subpopulation while positive effect in the other (T3), different sizes and opposite directions (T4), and different sizes but the same direction (T5). Heterogeneity Scenario T1 T2 T3 T4 T5 β1k 0.05 0.1 -0.05 -0.1 0 0 -0.025 -0.025 0.02 0.02 β2k 0.05 0.1 0.05 0.1 0.05 0.1 0.05 0.1 0.05 0.1 p=4, n=500 0.153 0.508 0.079 0.213 0.063 0.185 0.083 0.201 0.079 0.202 p=4, n=1000 0.257 0.862 0.136 0.402 0.110 0.369 0.108 0.305 0.162 0.440 p=8, n=500 0.210 0.790 0.232 0.735 0.139 0.441 0.167 0.437 0.142 0.450 p=8, n=1000 0.423 0.989 0.477 0.976 0.230 0.801 0.280 0.788 0.258 0.812 Genetic heterogeneity across twenty latent subpopulations In this simulation, we investigated the performance of HWV-IC in the presence of twenty latent subpopulations. The simulation setting is similar to simulation above except that the number of subpopulation increased from two to twenty. X = (X1 , . . . , X25 ) was generated to infer subpopulation structure with Xk = ak + ek , ak is a length n bootstrap sample from {1, 2, . . . , 20} and ek ∼ N ormal(0, 0.5). Regression coefficients, {β j }20 j=1 , in hazard function 3.7 were set to 0 for empirical size assessment. For power assessment, {β j }20 j=1 were generated 44 from uniform distribution with mean µβ and variance σβ2 . We set L = 0.1 and R = 0.6 for follow-up time generation. IBS kernel was used to measure genetic similarity and Gaussian kernel for subpopulation similarity. Table 3.6 indicates that Type I error of HWV-IC is close to nominal level under various n’s and p’s. Table 3.7 indicates the power of HWV-IC increases as mean effect size (µβ ) increases. For a mean effect size (µβ ), the power increases with size of genetic heterogeneity (σβ ). Table 3.6 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across twenty latent subpopulations and no left truncation. Empirical Size p=4, n=500 p=4, n=1000 HWV-IC 0.051 0.050 p=8, n=500 p=8, n=1000 HWV-IC 0.045 0.052 Table 3.7 Power of HWV-IC with r = 0 under genetic heterogeneity across twenty latent subpopulations and no left truncation. Power µβ 0 0 0 0.025 0.025 0.025 0.05 0.05 0.05 σβ 0.05 0.1 0.15 0.05 0.1 0.15 0.05 0.1 0.15 p=4, n=500 0.063 0.148 0.369 0.105 0.240 0.483 0.224 0.422 0.640 p=4, n=1000 0.108 0.376 0.804 0.252 0.562 0.906 0.532 0.796 0.969 p=8, n=500 0.179 0.766 0.993 0.283 0.862 0.996 0.549 0.933 0.999 p=8, n=1000 0.434 0.989 1.000 0.687 0.999 1.000 0.908 1.000 1.000 Genetic heterogeneity across individual genome profile In this simulation, instead of assuming the subpopulation structure to be ’categorical’ (two or twenty subpopulations), we investigated the performance of HWV-IC in the presence of heterogeneity across ’continuous’ subpopulation structure. In other words, each individual is a ’subpopulation’. Regression coefficients in hazard function 3.7, {β j }nj=1 , were set to 0 45 for empirical assessment and sampled from uniform distribution with mean µβ and variance σβ2 for power assessment. We generated a 1000-SNP genome profile, X = (X1 , . . . , X1000 ), to infer subpopulation structure. The generation of X follow a two-step procedure, 1) generate a length n sample from M V N (0, Σ), repeat for 1,000 times to get a n × 1000 matrix, the covariance matrix Σ was set to In×n under null hypothesis and exp{−|βil − βjl |/σβ } under alternative hypothesis. 2) categorize each quantitative column into SNP coded as 0, 1, and 2 using prespecified cutoffs in order to achieve HWE and MAFs that follow uniform distribution U nif orm(0.05, 0.2). We set (L, R) to be (0.2, 0.6) and (0.1, 0.5), respectively, for empirical size and power assessment. IBS kernel was used to measure both genetic and genome profile similarity. Table 3.8 indicates that Type I error of HWV-IC is close to nominal level. Table 3.9 indicates that HWV-IC gains power as mean effect size (µβ ) increases. For a fixed mean effect size (µβ ), HWV-IC gains power with heterogeneity size (σβ ). Table 3.8 Empirical size of HWV-IC with r = 0 in testing genetic effects under genetic heterogeneity across individual genome profiles and no left truncation. Empirical Size p=4, n=500 p=4, n=1000 HWV-IC 0.043 0.052 p=8, n=500 p=8, n=1000 HWV-IC 0.047 0.049 3.2.3 Testing G-G/G-E interaction effect In this simulation, we investigated the performance of WVI-IC and LRT in testing G-G and G-E interaction effect. For testing G-G interaction between two genes G = (G1 , . . . , Gp ), H = (H1 , . . . , Hq ). The survival time for subject i was generated from Cox PH model with 46 Table 3.9 Power of HWV-IC with r = 0 under genetic heterogeneity across individual genome profiles and no left truncation. Power µβ 0 0 0 0.02 0.02 0.02 0.04 0.04 0.04 σβ 0.04 0.08 0.12 0.04 0.08 0.12 0.04 0.08 0.12 p=4, n=500 0.062 0.076 0.103 0.089 0.116 0.149 0.170 0.190 0.252 p=4, n=1000 0.069 0.087 0.223 0.125 0.204 0.369 0.330 0.442 0.590 p=8, n=500 0.094 0.280 0.693 0.170 0.388 0.765 0.347 0.589 0.850 p=8, n=1000 0.117 0.750 0.996 0.326 0.856 0.999 0.699 0.957 1.000 the following hazard function p q pq X X X λ(t|Gi , Hi ) = exp{ 0.02Gik + 0.02Hil + βm (GH)im }, (3.8) k=1 l=1 m=1 where (GH)im is the m-th column of interaction between Gi and Hi . {βm }pq m=1 were set to 0 for empirical size assessment and 0.08 for power assessment. We set L = 0.1 and R = 0.5 for follow-up time generation. Cross-product kernel was used to measure genetic similarity for both genes. Two sets of SNP-set size were used (p = 4, q = 2) and (p = 6, q = 3). For testing G-E interaction, the simulation setting is similar to G-G interaction test, except that H = (H1 , . . . , Hn ) ∼ Binomial(0.5) is now a one-dim binary vector representing subjects’ gender. Survival time for subject i was generated from Cox proportional hazard model with hazard function 3.8. {βm }pq m=1 were set to 0 and 0.1 for empirical size and power assessment. Cross-product kernel was used to measure genetic similarity and identity kernel for gender similarity. Two sets of SNP-set size (p = 4, q = 1) and (p = 8, q = 1) were investigated. Table 3.10 and 3.11 indicate that, in testing both G-G and G-E interaction, Type I error of WVI-IC is close to nominal level while LRT is inflated. Also, in both scenarios, WVI-IC 47 is more powerful than LRT as expected because the asymptotic null distribution of WVI-IC has a more efficient degree of freedom than LRT, therefore more powerful than LRT. Table 3.10 Empirical sizes and powers of WVI-IC and LRT in testing G-G interaction under no left truncation. Empirical Size (power) p=4, q=2, n=500 p=4, q=2, n=1000 WVI-IC 0.046 (0.458) 0.052 (0.793) LRT 0.069 (0.305) 0.073 (0.519) p=6, q=3, n=500 p=6, q=3, n=1000 WVI-IC 0.060 (0.864) 0.057 (0.989) LRT 0.115 (0.590) 0.076 (0.863) Table 3.11 Empirical sizes and powers of WVI-IC and LRT in detecting G-E interaction under no left truncation. Empirical Size (power) p=4, q=1, n=500 p=4, q=1, n=1000 WVI-IC 0.051 (0.236) 0.056 (0.435) LRT 0.055 (0.204) 0.054 (0.350) p=8, q=1, n=500 p=8, q=1, n=1000 WVI-IC 0.047 (0.450) 0.043 (0.768) LRT 0.069 (0.341) 0.068 (0.575) 3.2.4 Empirical size under stringent p-value thresholds In case-control studies, there is usually a huge amount of genetic variants are tested. To account for the multiple testing issue, Bonferroni correction is used and results in p-value threshold 5 × 10−3 or smaller. In this simulation, we investigated the performance of 1) WV-IC in the absence of genetic heterogeneity and 2) HWV-IC in the presence of genetic heterogeneity across observable subpopulations under stringent p-value thresholds. The cor- responding simulation settings for significance level 0.5 were adopted, except that only sce- nario p = 4, n = 1000 was investigated and 150K Monte Carlo samples were generated. 48 The empirical size is calculated as the proportion of p-values smaller than the corresponding p-value threshold. Table 3.12 shows that the Type I error of both WV-IC and HWV-IC is close to the nominal level, indicating the capability of WV-IC and HWV-IC for large-scale GWAS. Table 3.12 Empirical sizes of WV-IC and HWV-IC under stringent p-value thresholds and no left truncation. Empirical Size Threshold WV-IC HWV-IC 0.005 0.0046 0.0047 0.0005 0.00039 0.00053 0.00005 0.000044 0.000047 3.3 A Real Application We applied the proposed WV-IC and HWV-IC tests to a GWAS Dental Caries: A Whole- Genome Association and Gene X Environment studies. The study includes 5,418 consented subjects ascertained through four study sites: PITT, IOWA, DRDR, and GEIRS. Each subject has 601,273 typed SNPs and 84 subject-level phenotypes such as age at the dental exam, gender, and number of DMFS (Decay, Missing, Filling, Sound) teeth/surfaces. Since the goal of this analysis is to detect genetic association with age-to-onset of ECC, in the absence/presence of genetic heterogeneity, we keep only subjects who were younger than 6 at the dental exam. Also, since each subject has one dental exam, the age-to-onset of ECC is current status data, one special case of interval censoring survival outcome. To avoid systematic error, we performed SNP- and subject-level quality control on the genotype data using PLINK 1.9. SNPs were removed if MAF>0.01, HWE test p-value> 10−6 , or missing rate > 2%. Subjects were removed if the missing genotype rate was > 2%. In 49 addition to quality control, we also removed subjects whose genotype or phenotype is missing. The resulting genotype data has 1,125 subjects and 553,194 SNPs. The genotype data still has ∼ 0.01% missing values, which were then imputed by random samples generated from Binomial(2, M AF ), where MAF was estimated from the phenotype dataset. PCA was performed using PLINK 1.9 to obtain the first 10 principal components. The 553,194 SNPs were then grouped into 23,008 different genes based on human UCSC human genome annotation (NCBI36/hg18). Each of the 23,008 genes was tested for genetic association and potential heterogeneity effect using WV-IC and HWV-IC adjust for the top 10 principal components, study site indicator (PITT as baseline), gender, and race. To account for the multiple testing issue, we adopted the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) to con- trol the FDR at 10%. The p-value threshold for i-th most significant gene is obtained by Pm −1 li = (iα){m k=1 (1/k)} (i = 1, . . . , 23008), where α = F DR and m=23,008. When in the presence of genetic heterogeneity, four sources were considered 1) across different races (White, Asian, Black American Indian, Bi- or multi-racial, and Other), 2) across two water fluoride levels, sufficient (>0.7 mg/L) and insufficient (≤0.7 mg/L), 3) across two sexes, and 4) across genetic background inferred by a random sample of 200,000 SNPs from the whole genome. Note that, because subjects ascertained from the DRDR study site have missing water fluoride levels, the sample size drops from 1,125 to 652 and the count of unique study sites drops from 4 to 3 when water fluoride level is included in the analysis. In all analy- sis scenarios, two kernels were used to measure genetic similarity: IBS and Cross-product, two special cases of the semiparametric transformation model were considered: the Cox proportional hazard model (r=0) and proportional odds model (r=1). 50 Table 3.13 shows the top five genes detected by WV-IC (scenario S1) and HWV-IC (scenarios S2-S5) and the corresponding p-values. Even though none of the 23,008 genes was shown to be statistically significantly associated with age-to-onset of ECC, the analysis did suggest that MPPED2 and TSPAN2 were frequently found to have the smallest p-value. MPPED2 exhibited suggestive evidence for association with ECC development in the first genome-wide association scan for dental caries in the study of 1,305 US children (Shaffer et al., 2011). The biological role of MPPED2 was later verified via a meta-analysis of five childhood samples (Staley et al., 2017). For TSPAN2, no strong evidence was found to show its biological role in ECC. 3.4 Discussion We developed the first set of multi-marker tests for genetic associations and G-G/G-E inter- actions with interval-censored and possibly left-truncated survival outcomes. The new tests can adjust for covariates based on a semiparametric transformation model. The proposed HWV-IC can also account for genetic heterogeneity to increase the power of association testing. Our methods are directly applicable to interval-censored competing risks data if the observation process is independent of the process of competing risks given the covariates. Specifically, by applying our methods to the reduced interval-censored competing risks data about the failure cause of interest (Hudgens, Li, and Fine, 2014), one can test the effect of a marker set or a G-G/G-E interaction on the cumulative incidence function of that cause. It is worthwhile to extend the proposed tests to multivariate survival phenotypes. This extension has applications to genetic studies of chronic diseases that can occur at multiple sites in the human body, such as dental caries, diabetic retinopathy, and age-related macular 51 Table 3.13 Top five genes discovered by WV-IC and HWV-IC from a gene-based genome-wide association analysis of the DC-WGAGE dataset. PH and PO stand for the Cox proportional hazard model and the proportional odds model respectively. CP and IBS stand for cross-product kernel and IBS kernel respectively. Various types of heterogeneity were considered, including no genetic heterogeneity (S1), heterogeneity across races (S2), heterogeneity between the two home water fluoride levels (S3), heterogeneity between sexes (S4), and heterogeneity across genetic backgrounds (S5). Scenario Genes and p-values MPPED2 OR2B11 NECAB3 E2F1 CYB5R2 S1 5.01E-05 7.61E-05 8.62E-05 8.62E-05 1.69E-04 MPPED2 HOXC13-AS CYB5R2 NECAB3 E2F1 S2 3.21E-05 1.29E-04 1.37E-04 1.79E-04 1.79E-04 TSPAN2 CRCT1 HOXC13-AS LCE5A TSHB PO+CP S3 1.17E-05 2.23E-05 2.55E-05 2.73E-05 4.78E-05 MPPED2 OR2B11 NECAB3 E2F1 CDK5RAP1 S4 5.88E-05 9.91E-05 1.10E-04 1.11E-04 1.65E-04 MPPED2 OR2B11 NECAB3 E2F1 CYB5R2 S5 4.94E-05 7.84E-05 8.70E-05 8.70E-05 1.69E-04 MPPED2 OR2B11 NECAB3 E2F1 CYB5R2 S1 8.23E-05 8.48E-05 8.90E-05 8.90E-05 9.26E-05 MPPED2 CYB5R2 LINC00511 HOXC13-AS NCR3LG1 S2 5.75E-05 9.12E-05 1.69E-04 1.90E-04 2.08E-04 TSPAN2 HOXC13-AS CRCT1 ECRG4 LCE5A PH+CP S3 1.27E-05 3.05E-05 4.53E-05 5.07E-05 5.50E-05 CYB5R2 MPPED2 TSPAN2 NCR3LG1 OR2B11 S4 2.16E-04 2.21E-04 2.49E-04 2.64E-04 2.73E-04 TSPAN2 MPPED2 CYB5R2 TPM3P9 NCR3LG1 S5 1.77E-04 1.93E-04 2.02E-04 2.07E-04 2.09E-04 LOC101927989 MPPED2 PAX9 NECAB3 E2F1 S1 4.68E-05 4.89E-05 7.04E-05 7.61E-05 7.61E-05 MPPED2 STARD5 LOC101927989 CCDC185 CYB5R2 S2 2.78E-05 3.56E-05 1.19E-04 1.31E-04 1.50E-04 TSPAN2 CRCT1 TSHB LCE5A HOXC13-AS PO+IBS S3 5.84E-06 5.63E-05 6.04E-05 6.23E-05 6.48E-04 LOC101927989 CCDC185 MPPED2 PAX9 NECAB3 S4 5.03E-05 6.12E-05 6.46E-05 8.93E-05 9.72E-05 LOC101927989 MPPED2 PAX9 NECAB3 E2F1 S5 4.79E-05 4.85E-05 7.13E-05 7.79E-05 7.79E-05 CCDC185 MPPED2 NECAB3 E2F1 PAX9 S1 5.76E-05 6.89E-05 7.58E-05 7.58E-05 8.95E-05 STARD5 MPPED2 CCDC185 CYB5R2 PPM1E S2 4.28E-05 4.36E-05 7.93E-05 1.07E-04 1.65E-04 TSPAN2 HOXC13-AS TSHB CRCT1 ECRG4 PH+IBS S3 6.21E-06 7.55E-05 7.83E-05 1.14E-04 1.25E-04 CCDC185 STARD5 TSPAN2 MPPED2 CYB5R2 S4 2.04E-05 1.02E-04 1.63E-04 2.01E-04 2.51E-04 CCDC185 STARD5 TSPAN2 MPPED2 CYB5R2 S5 2.79E-05 7.71E-05 1.12E-04 1.60E-04 2.14E-04 p-value threshold 4.09E-07 8.18E-07 1.23E-06 1.64E-06 2.05E-06 degeneration. Joint analysis of the failure times at the different sites could increase the power to detect the association of such a disease with genes. The extension hinges on finding an appropriate phenotype similarity for multivariate interval-censored survival endpoints. 52 CHAPTER 4 Multi-marker genetic association and interaction tests based on the accelerated failure time model 4.1 Methods 4.1.1 Association tests Consider a cohort of n independent individuals who have not experienced any competing events at baseline. We assume that each individual is subject to multiple potential sources of failure event and the occurrence of one event impedes the occurrence of all other events. Let G = (G1 , . . . , Gp ) be genetic covariates that is either a set of SNPs, gene expression values or a genetic pathway. We are interested in testing the genetic association with age-to- onset of Cause 1 (primary event) with or without considering genetic heterogeneity. Denote A as left truncation time (e.g., age at study registry), T̃ = min(T, C) is observed survival time subject to right censoring, where T is survival time and C is right censoring time. ∆ = I(T ≤ C) and ϵ is cause of failure, ϵ = 1 for primary cause, ϵ = 0 for all other causes. Baseline covariates, Z = (Z1 , . . . , Zk ) is adjusted to reduce confounding and/or increase power. When in the presence of genetic heterogeneity, X = (X1 , . . . , Xd ) is used to infer subpopulation structure. Observed data for n subjects is denoted as D = (D1 , . . . , Dn ), with {Di = (Ai , T̃i , ∆i , ∆i ϵi , GTi , ZTi )}ni=1 and also includes Xi in the presence of genetic heterogeneity. We assume the survival time T , the residual censoring time C − A and the left truncation time A are conditionally independent given G and Z (and X if considering genetic heterogeneity). When in the absence of genetic heterogeneity, under the null hypothesis that G is not associated with survival time T , we assume that, given adjustment covariate Z, the survival 53 time of subject i (i = 1, . . . , n) follows AFT model with the following CSH for Cause 1, T T λ(t|Zi ) = λ0 (te−β Zi )e−β Zi , where λ0 (·) is an unknown baseline CSH function. In real application, the regression param- eter β and cumulative baseline hazard function Λ0 (t) are estimated by applying rank-based estimation method (Chiou and Xu, 2017) to observed data D with the (working) AFT model log(T ∗ ) = β T Z + ϵ. The resulting estimators are β̂ and Λ̂0 (β, t). Then we have Rt Rt Λϵ (t) = −∞ ϵ λ (s)ds = −∞ es λ0 (s)ds, which can be estimated using Nelson-Aalon type estimator, Z t Pn i=1 dNi (β, s) Λ̂ϵ (β, t) = Pn , −∞ i=1 νi (β, s)Yi (β, s) where Ni (β, t) = I(ei (β) ≤ t, ∆i ϵi = 1), Yi (β, t) = I(ei (β) ≥ t), and νi (β, t) = I(eai (β) < t), with ei (β) = log T̃i − β T Zi and eai (β) = log Ai − β T Zi . Define martingale residual to R∞ be Mi = Mi (Λϵ , β) = Ni (β, ∞) − −∞ νi (β, t)Yi (β, t)dΛϵ (t)}, M̂i = Mi (Λ̂ϵ (β, t), β) and M̃i = Mi (Λ̂ϵ (β̂, t), β̂). The weighted V test statistics can be written as VZ = M̃T FM̃ Xn X n = f (Gi , Gj )M̃i M̃j , i=1 j=1 where M̃ = (M̃1 , . . . , M̃n ) and F = {f (Gi , Gj )}n×n . M̃i M̃j is V statistic kernel that measures phenotype similarity weighted by f (Gi , Gj ), which measures genetic similarity, for a pair of subjects (i, j). The value of weighted V statistic is the weighted sum over all pairs of (i, j). Therefore, the larger (smaller) the phenotype similarity weighted by larger (smaller) genetic 54 similarity results in larger (smaller) VZ value, therefore smaller (larger) p-value. To get the asymptotic null distribution of VZ , we decompose F = E1T E1 for some m × n matrix E1 , where m(1 ≤ m ≤ n) is the rank of F. We then have weighted V statistic in quadratic form as: VZ = (M̃T E1T )(E1 M̃), and the asymptotic null distribution can be Pm approximated by j=1 λj χ21j , which is linear combination of independent χ2 variables of 1 degree of freedom weighted by {λi }m i=1 , the eigenvalues of estimator of Cov(E1 M̃) (derivation shown in Appendix). Based on this distribution, the p-values can be computed using Davies’ method (Davies, 1980), P (VZ ≥ VZ,obs ) , where VZ,obs is the observed value of VZ . When in the presence of genetic heterogeneity inferred by d-dimensional covariate X = (X1 , . . . , Xd ), the heterogeneity weighted V test statistic is an extension of 5.3 as follow, VZH = M̃T WM̃, where W = (J + H) ⊙ F, H = {h(Xi , Xj )}n×n , h(Xi , Xj ) is a kernel function that measures subpopulation similarity between subject i and j. ⊙ is element-wise matrix multiplication. The asymptotic null distribution of VZH can be approximated similarly as that of VZ . Based on the distribution, p-value can be computed using Davies’ method (Davies, 1980), P (VZH ≥ H H VZ,obs ), where VZ,obs is observed value of VZH . 4.1.2 G-G/G-E interaction test We developed two interaction tests to detect the potential association between G-G and G-E interaction effect on age-to-onset of the primary event (Cause 1) considering left-truncation time. In testing G-G interaction effect two genes G = (G1 , . . . , Gp ) and H = (H1 , . . . , Hq ). 55 Under the null hypothesis of no G-G interaction effect, the adjustment covariate Z is replaced with (G, H) and the genetic main effect is homogeneous. We assume the cause specific hazard for subject i given G and H is, T T Gi −αT Hi Gi −αT Hi λ(t|Gi , Hi ) = λ0 (te−β )e−β , (4.1) where λ0 (·) is unspecified baseline hazard function. In real application, the regression param- eter β and cumulative baseline hazard function Λ0 (t) are estimated by applying rank-based estimation method (Chiou and Xu, 2017) to observed data D with the (working) AFT model log(T ∗ ) = β T G + αT H + ϵ. The resulting estimators are β̂, α̂, and Λ̂0 (β, α, t). Then we Rt Rt have Λϵ (t) = −∞ λϵ (s)ds = −∞ es λ0 (s)ds, which can be estimated using Nelson-Aalon type estimator, Z t Pn i=1 dNi (β, α, s) Λ̂ϵ (β, α, t) = Pn , (4.2) −∞ i=1 νi (β, α, s)Yi (β, α, s) where Ni (β, α, t) = I(ei (β, α) ≤ t, ∆i ϵi = 1), Yi (β, α, t) = I(ei (β, α) ≥ t), and νi (β, α, t) = I(eai (β, α) < t), with ei (β, α) = log T̃i −β T Gi −αT Hi and eai (β, α) = log Ai −β T Gi −αT Hi . R∞ Let Mi = Mi (Λϵ , β, α) = Ni (β, α, ∞)− −∞ νi (β, α, t)Yi (β, α, t)dΛϵ (t)} be martingale resid- ual, M̂i = Mi (Λ̂ϵ (β, α, t), β, α) and M̃i = Mi (Λ̂ϵ (β̂, α̂, t), β̂, α̂). The weighted V test statis- tics can be written as VI = M̃T (F ⊙ K)M̃ Xn X n = f (Gi , Gj )k(Hi , Hj )M̃i M̃j , (4.3) i=1 j=1 where M̃ = (M̃1 , . . . , M̃n ), F = {f (Gi , Gj )}n×n , and K = {k(Hi , Hj )}n×n . M̃i M̃j in 4.3 is 56 the V statistic kernel that measures phenotype similarity weighted by f (Gi , Gj )k(Hi , Hj ), which measures genetic similarity, for a pair of subjects (i, j). The value of weighted V statistic is the weighted sum over all pairs of (i, j). Therefore, the larger (smaller) the phenotype similarity weighted by larger (smaller) genetic similarity results in larger (smaller) VZ value, therefore smaller (larger) p-value. The asymptotic null distribution of VI can be approximated similarly as VZ . Based on the distribution, we use Davies’ method (Davies, 1980) to compute p-value, P (VI ≥ VI,obs ), where VI,obs is the observed value of VI . For testing G-E interaction effect, the weighted V test statistic and its asymptotic null distribution can be obtained in the same way, except that H is replaced by an environmental variable (e.g., gender). The proposed interaction tests are expected to be more powerful than regular tests (e.g., LRT, Wald test) when there exists strong LD because the weighted V statistic has an effective degree of freedom lower than that of regular tests. 4.1.3 Small sample adjustment The association and interaction tests introduced above work efficiently when the sample size is large. However, as shown in the QQ plot in the simulation below, the distribution might not be accurate when n is relatively small (as compared to p), which is frequently encountered in whole-exome sequencing studies (Lee et al., 2012; Emond et al., 2012). Therefore, motivated by (Chen et al., 2016), we propose a small sample correction on VZ , VZH and VI , denoted, respectively, as VZc , VZH,c and VIc as follow, M̃T FM̃ M̃T WM̃ M̃T (F ⊙ K)M̃ VZc = , VZH,c = , and VIc = , M̃T M̃ M̃T M̃ M̃T M̃ 57 where M̃T M̃ accounts for the variability of martingale residual. After accounting for that, we expect that the approximated asymptotic null distribution of corrected weighted V statistics, VZc , VZH,c , and VIc have exact mixture of χ2 distribution. For VZc , based on the distribution, we can apply Davies’ method (Davies, 1980) to compute p-values by P (VZc ≥ VZ,obs c ) = T P ( M̃ FM̃ M̃T M̃ ≥ VZ,obs ) = P (M̃T (F − VZ,obs I)M̃ ≥ 0). The p-value of VZH,c and VIc can be computed similarly. 4.2 Simulations We performed a Monte Carlo simulation to assess the finite-sample performance of (hetero- geneity) weighted V tests on survival time following the AFT model, considering adjustment covariates, left-truncation time, and each individual potentially experiencing two competing events (Cause 1 and Cause 2). In all simulations, unless otherwise specified, two sample sizes n = 400, 500 and two SNP set sizes p = 3, 5 were considered. Genetic covariates, denoted as G = (G1 , . . . , Gp ), is a set of p SNPs (coded by 0, 1, or 2), unless otherwise specified. The generation of G follows a two-step procedure that considered potential LD structure, 1) sample n vectors independently from M V N (0, Σ), where Σ = {0.5|k−l| }p×p , the kl-th element is the covariance between Gk and Gl ; 2) categorize each quantitative column into SNP with three levels 0, 1, and 2 by using predetermined quantile cutoffs: a2 , (1 − a)2 , and 2a(1 − a) of standard normal distribution with a ∼ Beta(2, 5), to ensure the resulting pop- ulation is in Hardy-Weinberg equilibrium (HWE) and MAFs follow Beta(2, 5) distribution. Two baseline covariates were adjusted, one binary Z1 ∼ Binomial(0.5) and one continuous Z2 ∼ U nif orm(0, 2). Unless otherwise specified, we assumed that there is no confounding effect, in other words, Z is independent of G. The failure of all competing events was as- sumed to follow the AFT model, and the observed survival times were generated following 58 steps in section 3.2 of Beyersmann et al. (Beyersmann et al., 2009). Left truncation times were generated from U nif orm(0, 1) and the residual censoring times from Exponential(0.3) to control the right censoring rate around 35%. In all simulations, 1000 Monte Carlo samples were generated and the significance level of a test was set at 0.05 unless otherwise specified in testing genetic association under the scenarios where stringent p-value thresholds were considered. 4.2.1 Association test in the absence of genetic heterogeneity In this series of simulation, we investigated the empirical size and power of VZ in the absence of genetic heterogeneity and in the presence of left truncation time. Left truncation time was generated from U nif orm(0, 1). For subject i, both competing events were assumed to follow AFT model with following cause specific hazard (CSH) function, p 2 p 2 X X X X λ(t|Gi , Zi ) = λ0 (t · exp{− β1j Gij − β2l Zil }) exp{− β1j Gij − β2l Zil }, j=1 l=1 j=1 l=1 where λ0 (x) = x2 + x is baseline hazard function. Empirical size and power under various n’s and p’s In this simulation, we assessed the empirical size and power of VZ and VZH . Cause specific hazard (CSH) of Cause 1 and Cause 2 competing events for subject i are as follows,  p 2 p 2  X X X X  λ1 (t|Zi , Gi ) = λ0 (t · exp{− βj Gij − 0.1Zil }) exp{− βj Gij − 0.1Zil },    j=1 l=1 j=1 l=1 p 2 p 2  X X X X  λ2 (t|Zi , Gi ) = λ0 (t · exp{− 0.2Gij − 0.2Zik }) exp{− 0.2Gij − 0.2Zik },    j=1 k=1 j=1 k=1 59 where λ1 is CSH of Cause 1 (primary event). We set β1 = . . . = βp to 0 and 0.1 for empirical size and power assessment, while CSH for Cause 2 stays the same. IBS kernel was used to measure genetic similarity in both VZ and VZH . Gaussian kernel was used to measure similarity of adjustment covariates Z for test VZH . Table 4.1 shows that the Type I error of both VZ and VZH are close to nominal level and their power are comparable under various n′ s and p′ s. This indicates that when the genetic effect is homogeneous, VZ and VZH have similar performance. Table 4.1 Empirical size and power comparison of VZ and VZH in testing genetic effects under covariate adjustment and left truncation. Empirical Size (Power) p=3, n=400 p=3, n=500 VZ 0.051 (0.688) 0.054 (0.805) VZH 0.050 (0.690) 0.054 (0.798) p=5, n=400 p=5, n=500 VZ 0.052 (0.865) 0.048 (0.945) VZH 0.045 (0.849) 0.047 (0.940) Empirical size and power under quadratic confounding In this simulation, we assessed the empirical size and power of VZ in the absence of genetic heterogeneity. We assumed that G is a quadratic function of Z, in other words, there is quadratic confounding effect. The genetic covariate for subject i was replaced by gene 2 2 expression values generated by Gi = 0.5Zi1 + 0.5Zi2 + 0.25Zi1 + 0.25Zi2 + 0.5Zi1 Zi2 + ei (i = 1, . . . , n). The random error ei ∼ M V N (0, Σ), where Σ = {0.5|k−l| }p×p . CSH for Cause 60 1 and Cause 2 competing event of subject i are as follow  p 2  X X λ1 (t|Zi , Gi ) = λ0 (t · exp{− βj Gij − 0.1Zil })        j=1 l=1     p 2  X X     exp{− βj Gij − 0.1Zil }, j=1 l=1 p 2  X X λ2 (t|Zi , Gi ) = λ0 (t · exp{− 0.1Gij − 0.2Zik })        j=1 k=1     p 2 X X 0.1Gij − 0.2Zik },     exp{−  j=1 k=1 where λ1 is CSH for primary event (Cause 1), β1 = . . . = βp are set to 0 and 0.03 for empirical size and power assessment, while CSH for Cause 2 stays the same. Gaussian kernel was used to measure gene expression similarity. Figure 4.1 shows that p-value of VZ is uniformly distributed and table 4.2 shows Type I error of VZ is close to nominal level and power increases with sample size. The results show that VZ is able to handle quadratic confounding effect. Table 4.2 Empirical size and power of VZ in testing genetic effects under quadratic confounding and left truncation. Empirical Size (Power) p=3, n=400 p=3, n=500 VZ 0.050 (0.178) 0.049 (0.220) p=5, n=400 p=5, n=500 VZ 0.047 (0.279) 0.057 (0.325) 4.2.2 Association test in the presence of genetic heterogeneity In this series of simulations, we investigated the performance of association test statistic, VZH , in the presence of genetic heterogeneity across 1) observable subpopulations, 2) two latent subpopulations, 3) twenty subpopulations, and 4) individual genome profile. The 61 advantage of VZH over VZ is obvious through the performance comparison shown below. For all simulations in this section, we assume that the effect of G exists on survival time of both competing events, but the form of effect is heterogeneous for Cause 1 (primary event) and homogeneous for Cause 2. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) n = 400, p = 3 (b) n = 400, p = 5 Figure 4.1 Q-Q plot of the null p-value of VZ under quadratic confounding with n = 400 and p = 3 or 5. Genetic heterogeneity across observable subpopulation In this simulation, we assessed the empirical size and power of heterogeneity weighted V statistic, VZH , in the presence of heterogeneity across two observable subpopulations. We used a one-dim vector to infer two equally weighted subpopulations, X = {Xi }ni=1 , where Xi = {0, 1}. Therefore, two genetic covariates (each has sample size n/2) were genertaed using two different sets of MAFs (∼ Beta(2, 5)) and LD structure (Σp×p = {ρ|k−l| } depends on ρ) as follow, MAFs = {0.144, 0.046, 0.042, 0.013, 0.166} and ρ = 0.5 in Subpopulation 1, MAFs = {0.095, 0.038, 0.450, 0.125, 0.265} and ρ = 0.3 in Subpopulation 2. 62 The first p MAFs will be used. We used a one-dimensional vector X = (X1 , . . . , Xn ) ∼ Binomial(0.5) to infer two observable subpopulations (e.g., male and female). The CSH of Cause 1 and Cause 2 for subject i are as follow,  Pp Pp  λ1 (t|Zi , Gi ) = λ0 (t·e− j=1 (β0 +β1 Xi )Gij −0.1Xi )e− j=1 (β0 +β1 Xi )Gij −0.1Xi   Pp P2 Pp P2  λ2 (t|Zi , Gi ) = λ0 (t·e− j=1 0.02Gij − k=1 0.2Zik )e− j=1 0.02Gij − k=1 0.2Zik ,   where β0 = β1 = 0 for empirical size assessment and (β0 , β1 ) = {(0.002, 0.1), (0.002, 0.2)} for power assessment under two different genetic heterogeneity sizes, measured by β1 . IBS kernel was used to measure genetic similarity and identity kernel used to measure subpopulation similarity. Performance of weighted V statistic, VZ , was also included. Table 4.3 shows that Type I error of both VZ and VZH are close to nominal level. Table 4.4 shows that VZH is more powerful than VZ . For a fixed mean effect size (β0 ), as heterogeneity size (β1 ) increases, VZH gains more power by taking advantage of the subpopulation structure in performing the test. Table 4.3 Empirical sizes of VZH and VZ in testing genetic association under genetic heterogeneity across two observable subpopulations and left truncation. Empirical Size p=3, n=400 p=3, n=500 VZH 0.054 0.058 VZ 0.054 0.057 p=5, n=400 p=5, n=500 VZH 0.046 0.049 VZ 0.045 0.051 Genetic heterogeneity across two latent subpopulations In this simulation, we assessed the empirical size and power of heterogeneity weighted V statistic in the presence of genetic heterogeneity across two latent subpopulations. A one- 63 Table 4.4 Powers of VZH and VZ in testing genetic association under genetic heterogeneity across two observable subpopulations and left truncation. Power p=3, n=400 p=3, n=500 β1 0.1 0.2 0.1 0.2 VZH 0.226 0.646 0.261 0.749 VZ 0.185 0.543 0.234 0.645 p=5, n=400 p=5, n=500 β1 0.1 0.2 0.1 0.2 VZH 0.289 0.733 0.368 0.860 VZ 0.259 0.604 0.303 0.711 dim vector X = (X1 , . . . , Xn ) was generated to infer two latent subpopulation structure. Xi = ai + 1 + ei (i = 1, . . . , n), where ai ∼ Binomial(0.5) and ei ∼ N ormal(0, 0.5). The CSH of Cause 1 and Cause 2 competing event for subject i in subpopulation j are as follows  p X λ1 (t|Zi , Gij ) = λ0 (t· exp{− Gijk βjk − 0.1Zi1 − 0.1Zi2 })·         k=1   p  X exp{− Gijk βjk − 0.1Zi1 − 0.1Zi2 }     k=1 Xp (4.4)      λ2 (t|Zi , Gij ) = λ0 (t· exp{− 0.02Gijk − 0.2Zi1 − 0.2Zi2 })·   k=1     p  X    exp{− 0.02Gijk − 0.2Zi1 − 0.2Zi2 }, k=1 where {β j = (βj1 , . . . , βjp ), s.t. βj1 = . . . = βjp }2j=1 are respectively genetic effects of p SNPs in two latent subpopulations. {β j }2j=1 were set to 0 for empirical size assessment and set to different values to indicate different genetic heterogeneity scenarios for power assessment. IBS kernel was used to measure genetic similarity and Gaussian kernel for subpopulation similarity. Table 4.5 shows that Type I error of both VZH and VZ are close to nominal level. Table 4.6 shows that when in the absence of genetic heterogeneity (T1), VZ is more powerful, 64 however, when in the presence of genetic heterogeneity, VZH is more powerful and gains more power as heterogeneity size increases. Table 4.5 Empirical sizes of VZ and VZH in testing genetic effects under genetic heterogeneity across two latent subpopulations, covariate adjustment and left truncation. Empirical Size p=3, n=400 p=3, n=500 VZH 0.047 0.050 VZ 0.052 0.058 p=5, n=400 p=5, n=500 VZH 0.053 0.044 VZ 0.045 0.049 Table 4.6 Powers of VZ and VZH in testing genetic effects under genetic heterogeneity across two latent subpopulations, covariate adjustment and left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k , including the same effect size and the same effect direction (T1), identical sizes but opposite directions (T2), no effect in one subpopulation while positive effect in the other (T3), and different sizes but the same direction (T4). Heterogeneity Scenario T1 T2 T3 T4 β1k 0.04 0.08 -0.05 -0.1 0 0 0.03 0.03 β2k 0.04 0.08 0.05 0.1 0.08 0.12 0.08 0.1 VZH 0.252 0.768 0.775 0.972 0.719 0.906 0.601 0.784 p=3,n=400 VZ 0.376 0.891 0.056 0.043 0.370 0.636 0.596 0.746 VZH 0.327 0.825 0.826 0.979 0.768 0.945 0.703 0.868 p=3,n=500 VZ 0.376 0.927 0.057 0.047 0.467 0.716 0.702 0.817 VZH 0.403 0.900 0.984 1.000 0.950 0.994 0.855 0.958 p=5,n=400 VZ 0.594 0.976 0.062 0.069 0.557 0.775 0.829 0.894 VZH 0.467 0.968 0.988 1.000 0.968 0.998 0.935 0.979 p=5,n=500 VZ 0.729 0.995 0.047 0.068 0.659 0.866 0.905 0.958 Genetic heterogeneity across twenty latent subpopulation In this simulation, we assessed empirical size and power of heterogeneity weighted V statis- tic, VZH , in the presence of twenty latent subpopulations. The simulation setting is same to 65 simulation above, except that we increased the number of latent subpopulation to twenty. We used X = (X1 , . . . , X25 ) to infer subpopulation structure. Xk = ak + ek (k = 1, . . . , 25) where ak is a length n bootstrap sample from {1, . . . , 20} that represents subgroup assign- ments. Random error ek = (ek1 , . . . , ekn ) ∼ N ormal(0, 0.5). Regression coefficients for j subpopulations, {β j = (βj1 , . . . , βjp ), s.t. βj1 = . . . = βjp }20 j=1 , in CSH function 4.4 were set to 0 for empirical size assessment. In power assessment, {β j }20 j=1 were sampled from uniform distribution with mean µβ and variance σβ2 . IBS kernel was used to measure genetic similar- ity and Gaussian kernel for subpopulation similarity. Table 4.7 shows that Type I error of both VZ and VZH are close to nominal level. Table 4.8 shows that VZH is more powerful than VZ and gains power as heterogeneity size (σβ ) increases. Table 4.7 Empirical sizes of VZ and VZH in testing genetic effects under genetic heterogeneity across twenty latent subpopulations, covariate adjustment and left truncation. Empirical Size p=3, n=400 p=3, n=500 VZH 0.045 0.043 VZ 0.046 0.042 p=5, n=400 p=5, n=500 VZH 0.043 0.044 VZ 0.042 0.048 Genetic heterogeneity across individual genome profile In this simulation, we investigated the empirical size and power of heterogeneity weighted V statistic, VZH , in the presence of genetic heterogeneity across individual genome profile. In stead of considering two or twenty ’categorical’ subpopulation structure, we now assume a ’continuous’ subpopulation structure. In other words, each individual itself is a single ’subpopulation’. We used a genome profile of 1,000 SNPs, X = (X1 , . . . , X1000 ), to infer 66 Table 4.8 Power comparison of VZH and VZ under genetic heterogeneity across twenty latent subpopulations, covariate adjustment and left truncation. Power p=3, n=400 p=3, n=500 (µβ , σβ ) (0.02, 0.04) (0.02, 0.08) (0.02, 0.04) (0.02, 0.08) VZH 0.396 0.742 0.448 0.843 VZ 0.336 0.506 0.390 0.616 p=5, n=400 p=5, n=500 (µβ , σβ ) (0.02, 0.04) (0.02, 0.08) (0.02, 0.04) (0.02, 0.08) VZH 0.679 0.955 0.785 0.988 VZ 0.473 0.634 0.581 0.754 subpopulation structure. X was generated in three steps, 1) generate genetic effect sizes in each subgroup {β i = (βi1 , . . . , βip ), s.t. βi1 = . . . = βip }ni=1 , 2) generate a n-dim vector Xd (d = 1, . . . , 1000) from M V N (0, Σ), where Σ is a n × n identity matrix under null hypothesis and {exp(−|βi1 − βj1 |/σβ )}n×n under alternative hypothesis, 3) categorize Xd (d = 1, . . . , 1000) into SNPs coded by 0, 1, or 2 by using predetermined quantile cutoffs: a2 , 2a(1 − a), and (1 − a)2 of a standard normal distribution where a is MAF generated from Beta(2, 5), to ensure the resulting population is in HWE. IBS kernel was used to measure both genetic and subpopulation similarity. Table 4.9 shows that Type I error of both VZ and VZH are close to nominal level 0.05. Table 4.10 shows that VZH is more powerful than VZ . Furthermore, for fixed mean effect size (µβ = 0.03), as heterogeneity size (σβ ) increases from 0.02 to 0.04, VZH gains substantially more power while the power of VZ does not change much as heterogeneity size increases. 4.2.3 Testing G-G/G-E interaction effect In this simulation, we investigated the performance of weighted V statistic, VI , in testing interaction effect. The performance was also compared with Wald test. For testing G-G 67 Table 4.9 Empirical sizes of VZH and VZ in testing genetic effects under genetic heterogeneity across individual genome profiles, covariate adjustment and left truncation. Empirical Size p=3, n=400 p=3, n=500 VZH 0.052 0.058 VZ 0.053 0.058 p=5, n=400 p=5, n=500 VZH 0.046 0.049 VZ 0.047 0.051 Table 4.10 Powers of VZH and VZ under genetic heterogeneity across individual genome profiles, covariate adjustment and left truncation. Power p=3, n=400 p=3, n=500 (µβ , σβ ) (0.03, 0.02) (0.03, 0.04) (0.03, 0.02) (0.03, 0.04) VZH 0.253 0.328 0.308 0.400 VZ 0.241 0.242 0.290 0.292 p=5, n=400 p=5, n=500 (µβ , σβ ) (0.03, 0.02) (0.03, 0.04) (0.03, 0.02) (0.03, 0.04) VZH 0.449 0.649 0.550 0.756 VZ 0.390 0.381 0.480 0.477 interaction effect between two genes G = (G1 , . . . , Gp ) and H = (H1 , . . . , Hq ). We assumed that two genes come from the same population, therefore have the same MAF and LD structures. CSH of Cause 1 and Cause 2 competing events for subject i are as follows  p q pq X X X 0.2Gij − 0.2Hij − βm (GH)m })·     λ1 (t|Zij , Gij ) = λ0 (t· exp{−  j=1 j=1 m=1       p q pq  X X X     exp{− 0.2Gij − 0.2Hij − βm (GH)m } j=1 j=1 m=1 p q  X X     λ2 (t|Zij , Gij ) = λ0 (t· exp{− 0.2Gij − 0.2Hij })·   j=1 j=1     p q  X X exp{− 0.2Gij − 0.2Hij },     j=1 j=1 68 where λ1 is CSH for Cause 1 competing event (primary event), (GH)m is m-th column of interaction between G and H. Two SNP-set sizes (p = 1, q = 2) and (p = 2, q = 2) and two sample sizes n = 600, 1000 were used. We set βm = 0 under null hypothesis of no G-G interaction effect to Cause 1 competing event and β=1 under alternative hypothesis, when G-G interaction effect exists only in Cause 1 (primary event). Cross-product kernel was used to measure genetic similarity for both genes. For testing G-E interaction effect, the simulation setting is the same as above ex- cept that H is replaced by a one-dim vector binary environmental variable generated from Binomial(0.5). Two SNP-set sizes p = 2, 3 and two sample sizes n = 500, 900 were used. Identity kernel was used to measure environmental variable similarity. Table 4.11 and 4.12 show that Type I error of VI is close to nominal level while Wald test is slightly inflated under small sample size (e.g., under p = 2, q = 2, n = 600). In all scenarios, VI is more powerful than the Wald test as expected, since the asymptotic null distribution of VI has a more effective degree of freedom. Table 4.11 Empirical sizes and powers of VI and the Wald test in testing G-G interaction under left truncation. Empirical Size (Power) p=1, q=2, n=600 p=1, q=2, n=1000 VI 0.054 (0.217) 0.059 (0.289) Wald 0.048 (0.139) 0.059 (0.206) p=2, q=2, n=600 p=2, q=2, n=1000 VI 0.065 (0.408) 0.060 (0.550) Wald 0.075 (0.202) 0.059 (0.375) 69 Table 4.12 Empirical sizes and powers of VI and the Wald test in testing G-E interaction under left truncation. Empirical Size (Power) p=2, q=1, n=500 p=2, q=1, n=900 VI 0.052 (0.329) 0.055 (0.561) Wald 0.047 (0.276) 0.054 (0.507) p=3, q=1, n=500 p=3, q=1, n=900 VI 0.061 (0.381) 0.053 (0.617) Wald 0.085 (0.342) 0.045 (0.529) 4.2.4 Empirical size under stringent p-value thresholds In case-control studies, there is usually a huge amount of genetic variants are tested. To account for the multiple testing issue, Bonferroni correction is used and results in p-value threshold 5 × 10−3 or smaller. In this simulation, we investigated the performance of 1) VZ in the absence of genetic heterogeneity and 2) VZH in the presence of genetic heterogeneity across observable subpopulations under stringent p-value thresholds. The corresponding simulation settings for significance level 0.05 were adopted, except that only p = 3, n = 500 was considered and 500K Monte Carlo samples were generated. The Type I errors were the proportion of p-values that are smaller than the corresponding p-value thresholds. Table 4.13 shows that Type I error of both VZ and VZH are close to nominal level, indicating the capability of VZ and VZH for large-scale GWAS. 4.2.5 Small-sample correction In this series of simulations, we investigated the performance of small sample adjusted (het- erogeneity) weighted V statistic in the following three scenarios: 1) VZ in the absence of genetic heterogeneity, 2) VZH in the presence of genetic heterogeneity across observable sub- populations, and 3) VI in the presence of G-G and G-E interaction effect. 70 Table 4.13 Empirical sizes of VZ and VZH under stringent p-value thresholds and left truncation. Empirical Size Threshold VZ VZH 0.05 0.051 0.049 0.005 0.0052 0.0047 0.0005 0.00054 0.00048 0.00005 0.000060 0.000054 In the absence of genetic heterogeneity under small sample size In this simulation, we assessed the empirical size and power of VZc and VZ in testing genetic association in the absence of genetic heterogeneity. The simulation settings were the same as that of the simulations shown above, except that we used sample size n = 100 and SNP-set size p = 15. Table 4.14 and Figure 4.2 show that small sample corrected weighted V statistic, VZc , is asymptotically correct without loss of power, while unadjusted weighted V statistic, VZ , is not. Table 4.14 Empirical size and power of test VZ and VZc considering left truncation, with n = 100, p = 15. Empirical Size (power) VZc 0.045 (0.417) VZ 0.034 (0.346) In the presence of genetic heterogeneity across observable subpopulations under a small sample size In this simulation, we assessed the empirical size and power of VZH,c and VZH in testing genetic association in the presence of genetic heterogeneity across observable subpopulation. The simulation settings were the same as that of simulations shown above, except that we used sample size n = 200 and SNP-set size p = 10. In power assessment, two genetic 71 heterogeneity sizes were considered: β1 = 0.02, 0.04. Table 4.15 and figure 4.3 show that small sample corrected heterogeneity weighted V statistic, VZH,c , is asymptotically correct without loss of power, while VZH is not. (a) VZ (b) VZc Figure 4.2 Comparison of uniform QQ plot of test VZ and VZc in detecting genetic association without considering genetic heterogeneity effect, with n=100 and p=15. Table 4.15 Empirical size and power of test VZH and VZH,c considering genetic heterogeneity across two observable subpopulations and left truncation, with n = 200, p = 10. Epirical Size Power β1 0 0.02 0.04 VZH,c 0.051 0.343 0.395 VZH 0.045 0.342 0.394 Interaction test under small sample size In this simulation, we assessed the empirical size and power of weighted V statistics, VIc and VI , in testing G-G and G-E interaction effect. The simulation settings were the same as that of simulations with large sample size in previous sections, except that we used n = 600 and p = q = 2 for testing G-G interaction effect and n = 500 and p = 2, q = 1 for testing G-E interaction effect. Table 4.16, 4.17 and figure 4.4, 4.5 show that the p-values of small sample corrected weighted V statistic, VIc , is close to U nif orm(0, 1) distribution, while VI 72 is not. Table 4.16 Empirical size and power of test VI and VIc in testing G-G interaction effect, considering left truncation, with n = 600, p = q = 2. Empirical Size (power) VIc 0.064 (0.413) VI 0.065 (0.408) Table 4.17 Empirical size and power of test VI and VIc in testing G-E interaction effect, considering left truncation, with n = 500, p = 2. Empirical Size (power) VIc 0.054 (0.334) VI 0.052 (0.329) 4.3 A Real Application We applied the developed small sample adjusted (heterogeneity) weighted V test statistics, VZc and VZH,c , on the ROSMAP dataset by assuming the age-to-onset of Alzheimer’s disease follows the AFT model. ROSMAP is a GWAS that includes dementia exam data from two large longitudinal studies of aging and dementia: 1) the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP) (Bennett et al., 2018). The ROSMAP genotype dataset includes 1,679 subjects, each has 750,173 SNPs. We first performed quality control on genotype dataset 1) at SNP level by removing SNPs with MAF>0.01, HWE test’s p- value > 10−6 , or missing rate > 2%, then 2) at the subject level by removing subjects with missing genotype rate > 2%. This results in a quality-controlled genotype dataset with 1,618 subjects each having 619,061 SNPs. There were still ∼ 0.3% missing genotypes, which were then imputed with random samples generated from Binomial(2, M AF ), where MAF was estimated from the genotype dataset. PCA was performed using Plink software (version 1.9) to obtain the first 6 principal components that represent the underlying ancestry information. 73 After that, we grouped all available SNPs into 21,285 different genes by using human genome annotation (GRCh38/hg38) obtained from UCSC Genome Browser. We also formed a new gene APOE4 which was coded as the count of the APOE4-ϵ4 allele. The ROSMAP phenotype dataset includes information from 2,543 subjects. Each subject has 1) baseline covariates: race, gender, years of education, and binary study cohort indicator (ROS or MAP) and 2) follow-up information: subject’s age and clinical cognitive diagnosis result at each examination center visit including at the study registry. For our analysis, we include only subjects who are disease-free in the study registry. Age at baseline served as left-truncation time, the age at first AD diagnosis served as survival time, and the age at study termination served as right censoring time if the subject is disease-free throughout the study follow-up period. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VZH (b) VZH,c Figure 4.3 Uniform QQ plot of VZH and VZH,c , considering genetic heterogeneity across two observable subpopulations, with n=200 and p=10. We merged preprocessed genotype and phenotype datasets by keeping only the overlap- ping subjects to generate the analysis-ready dataset, which includes 1,440 subjects, with a censoring rate of 62.5%. We are interested in 1) testing genetic covariates that are associated 74 with the time-to-onset of first AD diagnosis and 2) detecting potential genetic heterogeneity across subpopulations inferred by either baseline covariates or individual genome profiles (represented by a random sample of 200K SNPs), in the presence of left-truncation time and adjust for gender, year of education, study cohort indicator (ROS served as baseline), and first 6 principal components. Note that race is not adjusted because 1,439 out of 1,440 subjects are white. IBS kernel was used to measure genetic similarity, while the choice of similarity kernel for subpopulation similarity depends on the source of heterogeneity, which is elaborated in Table 4.18. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VI (b) VIc Figure 4.4 Comparison of uniform QQ plot of test VI and VIc in testing G-G interaction effect, with n=600, p=q=2. Table 4.18 shows the analysis results of the ROSMAP dataset. To account for multiple testing issues, the p-value thresholds were obtained by controlling the FDR under 10% using the Benjamini-Hochberg procedure Benjamini and Hochberg (1995). Four different kernels were used to measure genetic similarity. When in the absence of genetic heterogeneity (scenario S1), APOE4 appeared to be the most significant gene followed by APOC1. When in the presence of genetic heterogeneity (scenarios S2-S4), even though APOE4 and APOC1 75 were still the most significant genes, their p-values varies across scenarios and sources of heterogeneity, with notably smaller p-values when Laplacian kernel was used under scenario S2 and S4 that considered genetic heterogeneity across sexes and genetic background. In other words, the effect of APOE4 and APOC1 on age-to-onset of AD varies across subjects’ gender and genetic background. In addition to APOE4 and APOC1, IGSF23 is frequently detected as the next most significant gene. The function of IGSF23 is listed in table 4.19. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VI (b) VIc Figure 4.5 Comparison of uniform QQ plot of test VI and VIc in testing G-E interaction effect, with n=500, p=2, q=1. 4.4 Discussion We developed a suite of novel multi-marker genetic association and interaction tests based on weighted V statistics. The proposed tests have four major advantages: 1) they can handle competing risks, 2) they are accurate under a small sample size, 3) the heterogeneity weighted V statistic is powerful when the genetic effect is heterogeneous in the underlying population, and 4) they can handle quadratic confounding effect. The asymptotic null distribution of weighted V statistic can be well approximated by a rescaled χ2 distribution, c0 χ2d0 , using the Satterthwaite method (Liu, Lin, and Ghosh, 2007). 76 Table 4.18 Top five genes discovered by VZ and VZH,c from ROSMAP dataset by assuming the age-to-onset of Alzheimer’s disease follows AFT model. IBS, CP, Lap and Quad stand for IBS, cross-product, Laplacian and Quadratic kernel, respectively, that measures genetic similarity. Various types of heterogeneity were considered, including no genetic heterogeneity (S1), heterogeneity between sexes (S2), heterogeneity across education categories (S3), and heterogeneity across genetic backgrounds (S4). Kernel Scenario Genes and p-values APOE4 APOC1 IGSF23 TBCC HSBP1 S1 3.46E-14 1.72E-09 6.99E-05 7.20E-05 1.21E-04 APOE4 APOC1 PLEKHG5,TNFRSF25 TBCC GRIP1 S2 6.45E-14 2.61E-09 9.45E-05 1.41E-04 1.91E-04 IBS APOE4 APOC1 EFEMP2 MUS81 TBCC S3 6.08E-10 1.01E-07 1.58E-04 6.08E-04 9.74E-04 APOE4 APOC1 IGSF23 TBCC HSBP1 S4 7.63E-14 1.50E-09 7.09E-05 7.26E-05 1.22E-04 APOE4 APOC1 IGSF23 MTMR2 HSBP1 S1 6.66E-16 2.35E-10 4.14E-05 9.32E-05 9.91E-05 APOE4 APOC1 IGSF23 GRIP1 MTMR2 S2 1.44E-15 1.11E-09 9.46E-05 1.06E-04 1.42E-04 CP APOE4 APOC1 TBCC IGSF23 TTC1 S3 1.15E-14 1.08E-09 3.40E-05 9.31E-05 1.50E-04 APOE4 APOC1 IGSF23 MTMR2 HSBP1 S4 6.21E-15 3.19E-10 4.23E-05 9.45E-05 9.84E-05 APOE4 APOC1 IGSF23 GRIP1 TBCC S1 1.00E-12 3.83E-08 6.80E-05 8.70E-05 1.23E-04 APOE4 APOC1 GRIP1 MTMR2 IGSF23 S2 4.57E-13 2.72E-08 1.13E-04 2.07E-04 2.09E-04 Lap APOE4 APOC1 EFEMP2 TBCC DCAF12 S3 6.05E-12 1.27E-08 1.40E-04 3.15E-04 3.88E-04 APOE4 APOC1 IGSF23 GRIP1 TBCC S4 3.26E-13 5.11E-09 6.92E-05 9.29E-05 1.24E-04 APOE4 APOC1 GRIP1 EHHADH-AS1 MTMR2 S1 9.99E-16 2.52E-10 1.07E-04 1.31E-04 2.12E-04 APOE4 APOC1 EHHADH-AS1 GRIP1 C16orf54 S2 6.66E-16 1.07E-09 8.77E-05 9.69E-05 1.33E-04 Quad APOE4 APOC1 TBCC GRIP1 TTC1 S3 2.44E-15 2.07E-09 1.14E-04 1.60E-04 1.78E-04 APOE4 APOC1 GRIP1 EHHADH-AS1 GABBR1 S4 4.11E-15 1.23E-08 1.08E-04 1.29E-04 2.14E-04 p-value threshold 4.46E-07 8.91E-07 1.34E-06 1.78E-06 2.23E-06 Table 4.19 Function of IGSF23, the third most frequent top gene, according to Entrez. Gene Function This gene encodes a protein that has one immunoglobulin (Ig) domain and is a member of the immunoglobulin superfamily. Proteins in this superfamily are IGSF23 usually found on or in cell membranes and act as receptors in immune response pathways. c0 is the scaled parameter and d0 is the χ2 distribution’s degree of freedom estimated by matching the mean and variance of weighted V statistics and c0 χ2d0 . In testing G-G and G-E interaction effect, the scaled χ2 distribution of weighted V statistic has d0 degree of freedom, 77 which is smaller than pq, the number of G-G interaction terms, when there is a weak or strong correlation between SNPs. Therefore the weighted V statistic is more powerful than regular tests (e.g., LRT, Wald test). 78 CHAPTER 5 Multi-marker genetic association and interaction tests based on the additive hazards model 5.1 Methods A series of association and interaction tests were developed for survival times based on an additive hazards model possibly under competing risks and/or left truncation. 5.1.1 Association test Consider a cohort of n independent individuals who have not experienced any competing events at baseline. We assume that each individual is subject to multiple potential sources of failure event and the occurrence of one event impedes the occurrence of all other events. Let G = (G1 , . . . , Gp ) be genetic covariates that is either a set of SNPs, gene expression values or a genetic pathway. We are interested in detecting the genetic association with age- to-onset of Cause 1 competing event (primary event) with or without considering genetic heterogeneity. Denote A as left truncation time (e.g., age at study registry), T̃ = min(T, C) is observed survival time subject to right censoring, where T is survival time and C is right censoring time. ∆ = I(T ≤ C) and ϵ is cause of failure, ϵ = 1 for Cause 1 competing event (primary event) and ϵ = 0 for all other causes. Baseline covariates, Z = (Z1 , . . . , Zk ) is adjusted and when in the presence of genetic heterogeneity, X = (X1 , . . . , Xd ) is used to infer subpopulation structure. Observed data for n subjects is denoted as D = (D1 , . . . , Dn ), with Di = (Ai , T̃i , ∆i , ∆i ϵi , GTi , ZTi , XTi ) (Xi is included only when in the presence of genetic heterogeneity). We assume the survival time T , the residual censoring time C − A and the left truncation time A are conditionally independent given G and Z (and X if considering genetic heterogeneity). When in the absence of genetic heterogeneity, under the null hypothesis that G is in- 79 dependent of survival time of Cause 1 competing event (primary event), we assume that survival time for subject i (i = 1, . . . , n), given adjustment covariate Z, follows additive hazards model with the following CSH function, λ(t|Zi ) = λ0 (t) + β T Zi , (5.1) where λ0 (t) is an unknown baseline hazard function. In real application, the regression parameter β and cumulative baseline hazard function Λ0 (t) are estimated by fitting a semi- parametric additive hazards model (Lin and Ying, 1994) to observed data D. The resulting estimators are β̂ and t Pn T i=1 {dN i (u) − Yi (u)β Zi } Z Λ̂0 (β, t) = P n , (5.2) 0 j=1 Yj (u) where Ni (u) = I(T̃ ≤ u, ∆i ϵi = 1) is counting process and Yi (u) = I(Ai < u ≤ T̃ ) is at-risk R∞ process for subject i at time u. Define Mi = Mi (Λ0 , β) = Ni (∞)− 0 Yi (t){dΛ0 (t)+β T Zi dt} the martingale residual, M̂i = Mi (Λ̂0 (β, t), β). By plugging in the β̂ or Λ̂0 (β, t) in equation 5.2, we have M̃i = Mi (Λ̂0 (β̂, t), β̂). The weighted V test statistics can be written as VZ = M̃T FM̃ Xn X n = f (Gi , Gj )M̃i M̃j , (5.3) i=1 j=1 where M̃ = (M̃1 , . . . , M̃n ) and F = {f (Gi , Gj )}n×n . M̃i M̃j in 5.3 is considered as V statistic kernel that measures phenotype similarity weighted by f (Gi , Gj ), measures genetic similar- 80 ity, for a pair of subjects (i, j). The weighted V statistic value is the weighted sum over all pairs of (i, j). Therefore, the larger (smaller) the phenotype similarity weighted by larger (smaller) genetic similarity results in larger (smaller) VZ value, therefore smaller (larger) p-value. To get the asymptotic null distribution of VZ , we decompose F = E1T E1 for some m × n matrix E1 , where m(1 ≤ m ≤ n) is the rank of F. We then have weighted V statistic in quadratic form, VZ = (M̃T E1T )(E1 M̃), and the asymptotic null distribution can be approxi- Pm mated by j=1 λj χ21j , which is linear combination of independent χ2 variables of 1 degree of freedom weighted by {λi }m i=1 , the eigenvalues of estimator of Cov(E1 M̃) (derivation shown in Appendix). Based on this distribution, the p-values can be computed using Davies’ method (Davies, 1980), P (VZ ≥ VZ,obs ) , where VZ,obs is the observed value of VZ . When in the presence of genetic heterogeneity inferred explicitly or implicitly by X = (X1 , . . . , Xd ), the heterogeneity weighted V test statistic is an extension of 5.3, denoted as VZH = M̃T WM̃, where W = (J+H)⊙F, H = {h(Xi , Xj )}n×n , h(Xi , Xj ) is a kernel function that measures subpopulation similarity between subject i and j. ⊙ is element-wise matrix multiplication. The asymptotic null distribution of VZH can be approximated similarly as that of VZ introduced above by replacing F with W in equation 5.3. Based on the estimated distribution, p-value can be computed by P (VZH ≥ VZ,obs H ) using Davies’ method (Davies, H 1980), where VZ,obs is observed value of VZH . 5.1.2 G-G/G-E interaction test We developed two interaction tests to detect the potential association between G-G and G-E interaction effect on age-to-onset of Cause 1 competing event (primary event) considering left-truncation time. 81 In testing G-G interaction effect of two genes G = (G1 , . . . , Gp ) and H = (H1 , . . . , Hq ). Under the null hypothesis of no G-G interaction effect on Cause 1 competing event, the adjustment covariate Z is replaced with (G, H) and the genetic main effect is assumed to be homogeneous. Under the null hypothesis, the CSH function of subject i given G and H is as follows λ(t|Gi , Hi ) = λ0 (t) + β T Gi + αT Hi , (5.4) where λ0 (t) is unspecified baseline hazard function. In real application, we fit a semiparamet- ric additive hazards model (Lin and Ying, 1994) to observed data D and obtain the estimates of regression coefficients β, α and cumulative baseline hazard function Λ0 , denoted as β̂, α̂, and Λ̂0 , where Λ̂0 is as below t Pn T i=1 {dNi (u)P− Yi (u){β Gi + αHi } Z Λ̂0 (β, α, t) = n , (5.5) 0 j=1 Yj (u) where Ni (u) is counting process and Yi (u) is at-risk process for subject i. Define mar- R∞ tingale residual Mi = Mi (Λ0 , β, α) = Ni (∞) − 0 Yi (t){dΛ0 (t) + (β T Gi + αT Hi )dt}, M̂i = Mi (Λ̂0 (β, α, t), β, α), and M̃i = Mi (Λ̂0 (β̂, α̂, t), β̂, α̂). The weighted V test statis- tic for G-G interaction effect is VI = M̃T (F ⊙ K)M̃ Xn X n = f (Gi , Gj )k(Hi , Hj )M̃i M̃j . i=1 j=1 where F = {f (Gi , Gj )}n×n and K = {k(Hi , Hj )}n×n are genetic similarity matrices, and ⊙ is element-wise matrix multiplication. The asymptotic null distribution of VI can be 82 approximated similarly to VZ by replacing F with F ⊙ K in equation 5.3. Based on this estimated distribution, we use Davies’ method (Davies, 1980) to compute p-value, P (VI ≥ VI,obs ), where VI,obs is the observed value of VI . For testing G-E interaction effect, the weighted V test statistic is performed in the same way as testing G-G interaction effect, except that H is replaced by an environmental variable (e.g., gender). The proposed interaction tests are expected to be more powerful than regular tests (e.g., LRT, Wald test), and the advantage over regular tests increases as LD gets stronger because the asymptotic null distribution of weighted V statistic has an effective degree of freedom lower than that of regular tests. 5.1.3 Small sample adjustment The association and interaction tests introduced above work efficiently when sample size is large. However, as shown in the QQ plot in Figure 5.1 (a) and Figure 5.2 (a) in simulation below, the distribution might not be accurate when n is relative small (as compared to p), which is frequently encountered in whole-exome sequencing studies (Lee et al., 2012; Emond et al., 2012). Therefore, motivated by Chen et al. (2016), we propose a small sample correction on VZ , VZH and VI , denoted, respectively, as VZc , VZH,c and VIc as follow, M̃T FM̃ M̃T WM̃ M̃T (F ⊙ K)M̃ VZc = , VZH,c = , and VIc = , M̃T M̃ M̃T M̃ M̃T M̃ where M̃T M̃ accounts for the variability of martingale residual. We expect that the approx- imated asymptotic null distribution of corrected (heterogeneity) weighted V statistics, VZc , VZH,c , and VIc have exact mixture of χ2 distribution. For VZc , based on the distribution, we can T apply Davies’ method (Davies, 1980) to compute p-values by P (VZc ≥ VZ,obs c ) = P ( M̃ FM̃ M̃T M̃ ≥ 83 VZ,obs ) = P (M̃T (F − VZ,obs I)M̃ ≥ 0). The p-value of VZH,c and VIc can be computed similarly. 5.2 Simulations We performed a Monte Carlo simulation to assess the finite-sample performance of (hetero- geneity) weighted V tests on survival time that follow the additive hazards model, considering adjustment covariates and left-truncation time. Each individual potentially experiences two competing events with Cause 1 (primary event) and Cause 2. In all simulations, unless otherwise specified, two sample sizes n = 600, 800 and two SNP set sizes p = 8, 12 were considered. Genetic covariates, denoted as G = (G1 , . . . , Gp ), is a set of p SNPs (coded by 0, 1, or 2), unless otherwise specified in the scenario under confounding effect. The gener- ation of G follows a two-step procedure that considered potential LD structure, 1) sample n vectors independently from M V N (0, Σ), where Σ = {0.5|k−l| }p×p , the kl-th element is the covariance between Gk and Gl ; 2) categorize each quantitative column into SNP with three levels 0, 1, and 2 by using predetermined quantile cutoffs: a2 , (1 − a)2 , and 2a(1 − a) of standard normal distribution with a ∼ Beta(2, 5), to ensure the resulting population is in Hardy-Weinberg equilibrium (HWE). Two baseline covariates were adjusted Z = (Z1 , Z2 ), one binary Z1 ∼ Binomial(0.5) and one continuous Z2 ∼ U nif orm(3, 5). Unless otherwise specified, we assumed that there is no confounding effect, in other words, Z is independent of G. The failure of all competing events was assumed to follow the additive hazards model, and the observed survival times were generated following steps in section 3.2 of Beyersmann et al. (2009). Left truncation times were generated from U nif orm(0, 1) and the residual censoring times from Exponential(0.3) to control the right censoring rate around 35%. In all simulations, 1000 Monte Carlo samples were generated and the significance level of a test was set at 0.05 unless otherwise specified in testing genetic association under the stringent 84 p-value threshold scenario. 5.2.1 Association test in the absence of genetic heterogeneity In this series of simulation, we investigated the empirical size and power of weighted V statistic (VZ ) in the absence of genetic heterogeneity. For subject i, both competing events were assumed to follow additive hazards model with cause specific hazard (CSH) function of the following form, λ(t|Zi ) = ct2 + β T Gi + αT Zi , (5.6) where ct2 is baseline hazard function. Coefficients, c, β, α vary depending on simulation scenario and cause of failure. Empirical size and power under various n’s and p’s In this simulation, we assessed the empirical size and power of VZ and VZH . CSH function of both competing events for subject i (i = 1, . . . , n) are as follow,   λ1 (t|Zi , Gi ) = 0.6t2 + pj=1 βj Gij + 2k=1 0.5Zik ,   P P  λ2 (t|Zi , Gi ) = 0.3t2 + pj=1 0.05Gij + 2k=1 0.1Zik ,   P P where λ1 is CSH of Cause 1 competing event. We set β1 = . . . = βp to 0 and 0.2 for empirical size and power assessment, while CSH for Cause 2 competing event stays the same. IBS kernel was used to measure genetic similarity in both VZ and VZH . Gaussian kernel was used to measure similarity of adjustment covariates Z for test VZH . Table 5.1 shows that the Type I error of both tests are close to nominal level and their power are comparable under various n′ s and p′ s. This indicates that when the genetic effect is homogeneous, both VZ and VZH are valid and have similar statistical power. 85 Table 5.1 Empirical size and power comparison of test VZ and test VZH in testing genetic effects considering adjustment covariates and left truncation. Empirical Size (Power) p=8, n=600 p=8, n=800 VZ 0.058 (0.584) 0.052 (0.667) VZH 0.059 (0.572) 0.049 (0.674) p=12, n=600 p=12, n=800 VZ 0.050 (0.653) 0.054 (0.784) VZH 0.052 (0.640) 0.052 (0.772) Empirical size and power under quadratic confounding In this simulation, we assessed the empirical size and power of VZ in the absence of genetic heterogeneity. We assumed that Z and G are quadratically related, in other words, there is quadratic confounding effect. The genetic covariate was replaced by gene expression values 2 2 generated by Gi = Zi1 + Zi2 + Zi1 + Zi2 + Zi1 Zi2 + ei (i = 1, . . . , n), where ei ∼ M V N (0, Σ), where Σ = {0.5|k−l| }p×p the covariance between any pair of genetic variants (Gk , Gl ). CSH function for Cause 1 and Cause 2 competing events of subject i are as follow,   λ1 (t|Zi , Gi ) = 9t2 + pj=1 βj Gij + 2k=1 0.1Zik ,   P P  λ2 (t|Zi , Gi ) = 3t2 + pj=1 0.05Gij + 2k=1 0.025Zik ,   P P where λ1 is CSH for Cause 1 competing event, β1 = . . . = βp are set to 0 and 0.1 for empirical size and power assessment, while CSH for Cause 2 stays the same. Gaussian kernel was used to measure gene expression similarity. Table 5.2 shows that Type I error of VZ is close to nominal level and power increases with sample size. This indicates that weighted V statistic handles quadratic confounding effect properly. 86 Table 5.2 Empirical size and power of test VZ in testing genetic effects under quadratic confounding and left truncation. Empirical Size (Power) p=4, n=600 p=4, n=800 VZ 0.047 (0.212) 0.051 (0.320) p=6, n=600 p=6, n=800 VZ 0.051 (0.241) 0.054 (0.329) 5.2.2 Association test in the presence of genetic heterogeneity In this series of simulations, we investigated the performance of heterogeneity weighted V statistics, VZH , in the presence of genetic heterogeneity across 1) observable subpopulations, 2) two latent subpopulations, 3) twenty subpopulations, and 4) individual genome profile. The advantage of heterogeneity weighted V statistic VZH over VZ is obvious through the performance comparison shown below. For all simulations in this section, we assume that the effect of G on survival time is heterogeneous for Cause 1 and homogeneous for Cause 2. Genetic heterogeneity across observable subpopulation In this simulation, we assessed the empirical size and power of heterogeneity weighted V statistic, VZH , in the presence of heterogeneity across two observable subpopulations. There- fore, two genetic covariates (each has sample size n/2) were genertaed using two different sets of MAFs (∼ Beta(2, 5)) and LD structure (measured by ρ in Σ = {ρ|k−l| }) as follow, MAFs = {0.16187719, 0.24027357, 0.29026717, 0.42216265, 0.10528255, 0.50324631, 0.13275845, 0.48439386, 0.07259317, 0.20828786, 0.22822443, 0.39021594} and ρ = 0.5 in Subpopulation 1; MAFs = {0.17645756, 0.22244350, 0.08635246, 0.40581343, 0.49636636, 0.48130878, 0.25831106, 0.38090314, 0.14896296, 0.16374183, 0.29928099, 0.04836184} and ρ = 0.3 87 in Subpopulation 2. The first p MAFs will be used. We used a one-dimensional vector X = (X1 , . . . , Xn ) ∼ Binomial(0.5) to infer two observable subpopulations (e.g., male and female). The CSH of Cause 1 and Cause 2 for subject i are as follow,   λ1 (t|Zi , Gi ) = 9t2 + pj=1 (β0 + β1 Zi )Gij + 0.1Xi ,   P  λ2 (t|Zi , Gi ) = 3t2 + pj=1 0.001Gij + 0.1Xi ,   P where β0 = β1 = 0 for empirical size assessment and (β0 , β1 ) = {(0.002, 0.6), (0.002, 0.7)} for power assessment under different genetic heterogeneity sizes (β1 = 0.6, 0.7). IBS kernel was used to measure genetic similarity and identity kernel to measure subpopulation similarity. Performance of weighted V statistic, VZ , was also included. Table 5.3 shows that Type I error of both VZ and VZH are close to nominal level. Table 5.4 shows that VZH is more powerful than VZ by taking advantage of the subpopulation structure. Also, as size of heterogeneity increases, VZH gains more power. Table 5.3 Empirical sizes of test VZ and test VZH in testing genetic association under genetic heterogeneity across two observable subpopulations, considering adjustment covariates and left truncation. Empirical Size p=8, n=600 p=8, n=800 VZH 0.059 0.054 VZ 0.058 0.052 p=12, n=600 p=12, n=800 VZH 0.046 0.049 VZ 0.046 0.045 88 Table 5.4 Power of test VZ and test VZH in testing genetic association under genetic heterogeneity across two observable subpopulations, considering adjustment covariates and left truncation. Power p=8, n=600 p=8, n=800 β1 0.6 0.7 0.6 0.7 VZH 0.109 0.554 0.169 0.697 VZ 0.103 0.454 0.158 0.551 p=12, n=600 p=12, n=800 β1 0.6 0.7 0.6 0.7 VZH 0.135 0.594 0.192 0.777 VZ 0.114 0.472 0.176 0.633 Genetic heterogeneity across two latent subpopulations In this simulation, we assessed the empirical size and power of VZH in the presence of genetic heterogeneity across two latent subpopulations. A one-dim vector X = (X1 , . . . , Xn ) was generated to infer two latent subpopulation structure. Xi = ai + 1 + ei (i = 1, . . . , n), where ai ∼ Binomial(0.5) and ei ∼ N ormal(0, 0.5). The CSH of Cause 1 and Cause 2 for subject i in subpopulation j are as follow   λ1 (t|Zij , Gij ) = 9t2 + pk=1 Gijk βjk + 0.5Zij1 + 0.5Zij2 ,   P (5.7)  λ2 (t|Zij , Gij ) = 3t2 + pk=1 0.0025Gik + 0.1Zi1 + 0.1Zi2 ,   P where λ1 is CSH of Cause 1, which is of our primary interest and λ2 is CSH of the other competing risk. {β j = (βj1 , . . . , βjp ) s.t. βj1 = . . . = βjp }2j=1 are genetic effects of p SNPs in two latent subpopulations. {β j }2j=1 were set to 0 for empirical size assessment and set to different values for power assessment. IBS kernel was used to measure genetic similarity and Gaussian kernel for subpopulation similarity. Four heterogeneity scenarios were considered 89 determined by the values of β1k and β2k . This includes the same effect size and the same effect direction (T1), identical sizes but opposite directions (T2), no effect in one subpopulation while positive effect in the other (T3), and different sizes but the same direction (T4). Table 5.5 shows that Type I error of both VZH and VZ are close to nominal level. Table 5.6 shows that when in the absence of genetic heterogeneity (T1), VZ is more powerful, however, when in the presence of genetic heterogeneity, VZH is more powerful and gains power as heterogeneity size increases. Table 5.5 Empirical size of VZ and VZH in testing genetic effects under genetic heterogeneity across two latent subpopulations, with covariates adjustment and left truncation. Empirical Size p=8, n=600 p=8, n=800 VZH 0.058 0.055 VZ 0.051 0.059 p=12, n=600 p=12, n=800 VZH 0.052 0.058 VZ 0.054 0.052 Genetic heterogeneity across twenty latent subpopulation In this simulation, we assessed empirical size and power of heterogeneity weighted V statistic in the presence of twenty latent subpopulations. The simulation setting is same to simu- lation above, except that we increased the number of latent subpopulation to twenty. We used X = (X1 , . . . , X25 ) to infer subpopulation structure. Xk = ak + ek (k = 1, . . . , 25) where ak is a length n bootstrap sample from {1, . . . , 20}, representing subgroup assign- ments. Random error ek = (ek1 , . . . , ekn ) ∼ N ormal(0, 0.5). Regression coefficients for j subpopulations, {β j = (βj1 , . . . , βjp ), s.t. βj1 = . . . = βjp }20 j=1 , in CSH function 5.7 were all set to 0 for empirical size assessment. In power assessment, {β j }20 j=1 were sampled from 90 Table 5.6 Power of test VZ and VZH in testing genetic effects under genetic heterogeneity across two latent subpopulations, considering adjustment covariates and left truncation. Various heterogeneity scenarios were considered, determined by the values of β1k and β2k . Heterogeneity Scenario T1 T2 T3 T4 β1k 0.2 0.3 0.25 0.35 0 0 0.05 0.1 β2k 0.2 0.3 0.05 0.05 0.15 0.25 -0.05 -0.1 VZH 0.193 0.410 0.244 0.394 0.138 0.316 0.095 0.216 p=8,n=600 VZ 0.448 0.724 0.243 0.362 0.103 0.176 0.048 0.049 VZH 0.258 0.512 0.341 0.529 0.187 0.377 0.103 0.252 p=8,n=800 VZ 0.512 0.833 0.328 0.436 0.119 0.221 0.049 0.057 VZH 0.180 0.396 0.341 0.624 0.239 0.486 0.146 0.427 p=12,n=600 VZ 0.566 0.819 0.270 0.392 0.088 0.172 0.047 0.044 VZH 0.261 0.529 0.499 0.759 0.311 0.642 0.171 0.538 p=12,n=800 VZ 0.659 0.905 0.323 0.497 0.131 0.226 0.055 0.078 uniform distribution with mean µβ and variance σβ2 . IBS kernel was used to measure genetic similarity and Gaussian kernel for background similarity. Table 5.7 shows that Type I error of both VZ and VZH are close to nominal level. Table 5.8 shows that VZH is more powerful than VZ and as heterogeneity size (σβ ) increases, VZH gains more power. Table 5.7 Empirical size of test VZ and VZH in testing genetic effects under genetic heterogeneity across twenty latent subpopulations considering adjustment covariates and left truncation. Empirical Size p=8, n=600 p=8, n=800 VZH 0.050 0.057 VZ 0.052 0.053 p=12, n=600 p=12, n=800 VZH 0.040 0.044 VZ 0.041 0.042 91 Table 5.8 Power of test VZH and VZ under genetic heterogeneity across twenty latent subpopulations, considering adjustment covariates and left truncation. Power p=8, n=600 p=8, n=800 (µβ , σβ ) (0.15, 0.15) (0.15, 0.25) (0.15, 0.15) (0.15, 0.25) VZH 0.211 0.434 0.261 0.532 VZ 0.177 0.139 0.185 0.167 p=12, n=600 p=12, n=800 (µβ , σβ ) (0.15, 0.15) (0.15, 0.25) (0.15, 0.15) (0.15, 0.25) VZH 0.353 0.691 0.449 0.844 VZ 0.172 0.149 0.218 0.183 Genetic heterogeneity across individual genome profile In this simulation, we investigated the empirical size and power of heterogeneity weighted V statistic in the presence of heterogeneity across individual genome profile. In stead of considering two or twenty ’categorical’ subpopulation structure, we now assume ’continuous’ subpopulation structure. In other words, each individual is a subpopulation. We used a genome profile of 1,000 SNPs, X = (X1 , . . . , X1000 ), to infer subpopulation structure. X was generated in three steps, 1) generate genetic effect sizes {β i = (βi1 , . . . , βip ), s.t. βi1 = . . . = βip }ni=1 from uniform distribution with mean µβ and variance σβ2 , 2) generate n-dim vectors Xd (d = 1, . . . , 1000) from M V N (0, Σ), where Sigma is a n × n identity matrix under null hypothesis and {exp(−|βi1 − βj1 |/σβ )}n×n under alternative hypothesis, 3) categorize Xd (d = 1, . . . , 1000) into SNPs coded by 0, 1, or 2 by using predetermined quantile cutoffs: a2 , 2a(1 − a), and (1 − a)2 of a standard normal distribution with a is MAF generated from Beta(2, 5), to ensure the resulting population is in HWE. IBS kernel was used to measure both genetic and subpopulation similarity. Table 5.9 shows that Type I error of both VZ and VZH are close to nominal level. Table 5.10 shows that VZH is more powerful than VZ and 92 for fixed mean effect size (µβ = 0.15), as heterogeneity size (σβ ) increases from 0.15 to 0.25, VZH gains more power. Table 5.9 Empirical size of test VZH and VZ in testing genetic effects under genetic heterogeneity across individual genome profiles, considering adjustment covariates and left truncation. Empirical Size p=8, n=600 p=8, n=800 VZH 0.050 0.053 VZ 0.048 0.053 p=12, n=600 p=12, n=800 VZH 0.048 0.052 VZ 0.052 0.051 Table 5.10 Power of test VZH and VZ under genetic heterogeneity across individual genome profiles, considering adjustment covariates and left truncation. Power p=8, n=600 p=8, n=800 (µβ , σβ ) (0.15, 0.15) (0.15, 0.25) (0.15, 0.15) (0.15, 0.25) VZH 0.196 0.354 0.289 0.518 VZ 0.154 0.109 0.172 0.133 p=12, n=600 p=12, n=800 (µβ , σβ ) (0.15, 0.15) (0.15, 0.25) (0.15, 0.15) (0.15, 0.25) VZH 0.404 0.720 0.541 0.892 VZ 0.163 0.106 0.192 0.078 5.2.3 Testing G-G/G-E interaction effect In this simulation, we investigated the performance of weighted V statistic in testing interac- tion effect. The performance was also compared with Wald test. For testing G-G interaction effect between two genes (SNP set of size p and q): G = (G1 , . . . , Gp ) and H = (H1 , . . . , Hq ). We assumed that two genes come from the same population, therefore have the same MAF 93 and LD structures. CSH of Cause 1 and Cause 2 for subject i is as follow,   λ1 (t|Gi , Hi ) = 6t2 + pj=1 0.2Gij + qj=1 0.25Hij + pq  P P P l=1 βl (GH)l ,   λ2 (t|Gi , Hi ) = 3t2 + pj=1 0.1Gij + qj=1 0.15Hij ,   P P where λ1 is CSH for Cause 1 competing event (primary event), (GH)l is l-th column of interaction between gene G and H. Two SNP-set sizes p = q = 2 and p = q = 3 were used. We set βl = 0 under null hypothesis of no G-G interaction effect and βl = 0.2 under alternative hypothesis, when G-G interaction effect exists only in Cause 1 (primary event). Cross-product kernel was used to measure genetic similarity for both gene G and H. For testing G-E interaction effect, the simulation setting is the same as testing G-G interaction effect except that H is replaced by a one-dim vector binary environmental variable generated from Binomial(0.5). Two SNP-set sizes p = 4, q = 1 and p = 6, q = 1 were considered. Identity kernel was used to measure environmental variable similarity. Table 5.11 and 5.12 show that Type I error of both VI and Wald test are close to na ominal level. In all scenarios, VI is more powerful than the Wald test since the asymptotic null distribution of VI has a more effective degree of freedom than the Wald test. Furthermore, the advantage of VI over the Wald test increases as the correlation between SNPs (LD) gets stronger. 5.2.4 Empirical size under stringent p-value thresholds In case-control studies, there is usually a huge amount of genetic variants are tested. To account for the multiple testing issue, Bonferroni correction is used and results in a p-value threshold that is 5×10−3 or smaller. In this simulation, we investigated the performance of 1) 94 Table 5.11 Empirical sizes and powers of test VI and Wald test in testing G-G interaction considering left truncation. Empirical Size (power) p=2, q=2, n=600 p=2, q=2, n=800 VI 0.048 (0.208) 0.058 (0.319) Wald 0.058 (0.133) 0.052 (0.197) p=3, q=3, n=600 p=3, q=3, n=800 VI 0.053 (0.397) 0.055 (0.562) Wald 0.044 (0.184) 0.059 (0.265) Table 5.12 Empirical sizes and powers of test VI and Wald test in detecting G-E interaction considering left truncation. Empirical Size (power) p=4, q=1, n=600 p=4, q=1, n=800 VI 0.044 (0.506) 0.043 (0.673) Wald 0.044 (0.358) 0.042 (0.485) p=6, q=1, n=600 p=6, q=1, n=800 VI 0.043 (0.620) 0.048 (0.786) Wald 0.042 (0.395) 0.042 (0.573) VZ in the absence of genetic heterogeneity and 2) VZH in the presence of genetic heterogeneity across observable subpopulations under stringent p-value thresholds. The corresponding simulation settings for significance level 0.5 were adopted, except that only scenario p = 8, n = 800 was investigated and 500K Monte Carlo samples were generated and empirical size is the proportion of p-values less than the corresponding thresholds. Table 5.13 shows that Type I error of both VZ and VZH are close to nominal level, indicating the capability of VZ and VZH for large-scale GWAS. 5.2.5 Small-sample correction So far, we have shown that VZ , VZH , and VI performed well under relatively large sample sizes as compared to SNP-set size. However, we noticed that the proposed weighted V 95 Table 5.13 Empirical sizes of test VZ and VZH under stringent p-value thresholds and left truncation. Empirical Size Threshold VZ VZH 0.05 0.050 0.051 0.005 0.0051 0.0050 0.0005 0.00053 0.00049 0.00005 0.000044 0.000047 statistics could be asymptotically wrong. In this series of simulations, we investigated the performance of small sample corrected (heterogeneity) weighted V statistic in the following three scenarios: 1) test VZc in the absence of genetic heterogeneity, 2) test VZH,c in the presence of genetic heterogeneity across observable subpopulations, and 3) test VIc in the presence of G-G and G-E interaction effect. In the absence of genetic heterogeneity under small sample size In this simulation, we assessed the empirical size and power of VZc and VZ in testing genetic association in the absence of genetic heterogeneity. The simulation settings were the same as that of the simulations shown above, except that we used sample size n = 100 and SNP- set size p = 15. Table 5.14 and Figure 5.1 show that small sample corrected weighted V statistic, VZc , is asymptotically correct without loss of power, while the unadjusted weighted V statistic, VZ , is not. In the presence of observable subpopulation under small sample size In this simulation, we assessed the empirical size and power of VZH,c and VZH in testing genetic association in the presence of genetic heterogeneity across observable subpopulations. The simulation settings were the same as simulations for unadjusted heterogeneity weighted V statistics, except that we used sample size n = 200 and SNP-set size p = 10. Table 5.15 96 and figure 5.2 show that small sample corrected weighted V statistic, VZH,c , is asymptotically correct without loss of power, while the unadjusted heterogeneity weighted V statistic, VZH , is not. Table 5.14 Empirical size and power of test VZ and VZc considering left truncation, with n = 100, p = 15. Empirical Size (power) VZc 0.059 (0.141) VZ 0.051 (0.134) 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VZ (b) VZc Figure 5.1 Comparison of uniform QQ plot of test VZ and VZc in detecting genetic association without considering genetic heterogeneity effect, with n=100 and p=15. Table 5.15 Empirical size and power of test VZH and VZH,c considering genetic heterogeneity across two observable subpopulations and left truncation, with n = 200, p = 10. Epirical Size Power β1 0 0.6 0.7 VZH,c 0.046 0.211 0.249 VZH 0.043 0.196 0.237 Interaction test under small sample size In this simulation, we assessed the empirical size and power of VIc and VI in testing G-G and G-E interaction effects. The simulation settings were the same to simulation for weighted V 97 statistic without small-sample correction that test interaction effect. To mimic small sample scenario, we used smaller sample size and larger SNP-set size. Specifically, we used n = 300 and p = 10 for testing G-G interaction effect and n = 200 and p = 10 for testing G-E interaction effect. Table 5.16 and 5.17 show that type I error of small-sample corrected test statistic VIc is close to nominal level while that of VI is slightly inflated. Figure 5.3, 5.4 show that small sample corrected weighted V statistic, VIc , is asymptotically correct without loss of testing power, while VI has slightly inflated Type I error under small sample size. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VZH (b) VZH,c Figure 5.2 Comparison of uniform QQ plot of test VZH and VZH,c , considering genetic heterogeneity across two observable subpopulations, with n = 200 and p = 10. Table 5.16 Empirical size and power of test VI and VIc in testing G-G interaction effect, considering left truncation, with n = 300, p = q = 10. Empirical Size (power) VIc 0.044 (0.134) VI 0.059 (0.198) 5.3 A Real Application We applied the developed small sample adjusted (heterogeneity) weighted V test statistics, VZc and VZH,c , on ROSMAP by assuming the age-to-onset of AD follows the additive hazards 98 Table 5.17 Empirical size and power of test VI and VIc in testing G-E interaction effect, considering left truncation, with n = 200, p = 10. Empirical Size (power) VIc 0.055 (0.152) VI 0.069 (0.177) model. ROSMAP is a GWAS that includes dementia exam data from two large longitudinal studies of aging and dementia: 1) the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP) (Bennett et al., 2018). The ROSMAP genotype dataset includes 1,679 subjects, each has 750,173 SNPs. We first performed quality control on genotype dataset 1) at SNP level by removing SNPs with MAF>0.01, HWE test’s p-value > 10−6 , or missing rate > 2%, then 2) at the subject level by removing subjects with missing genotype rate > 2%. This results in a quality-controlled genotype dataset with 1,618 subjects each having 619,061 SNPs. There were still ∼ 0.3% missing genotypes, which were then imputed with random samples generated from Binomial(2, M AF ), where MAF was estimated from the genotype dataset. PCA was performed using Plink software (version 1.9) to obtain the first 6 principal components that represent the underlying ancestry information. After that, we grouped all available SNPs into 21,285 different genes by using human genome annotation (GRCh38/hg38) obtained from UCSC Genome Browser. We also formed a new gene APOE4 which was coded as the count of the APOE4-ϵ4 allele. The ROSMAP phenotype dataset includes information from 2,543 subjects. Each subject has 1) baseline covariates: race, gender, years of education, and binary study cohort indicator (ROS or MAP) and 2) follow-up information: subject’s age and clinical cognitive diagnosis result at each examination center visit including at the study registry. For our analysis, we include only subjects who are disease-free in the study registry. Age at baseline served as 99 left-truncation time, the age at first AD diagnosis served as survival time, and the age at study termination served as right censoring time if the subject is disease-free throughout the study follow-up period. 1.0 1.0 0.8 0.8 0.6 Sample Quantile Sample Quantile 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VI (b) VIc Figure 5.3 Comparison of uniform QQ plot of test VI and VIc in testing G-G interaction effect, with n=300, p=q=10. 1.0 1.0 0.8 0.8 Sample Quantile Sample Quantile 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Theoretical Quantile Theoretical Quantile (a) VI (b) VIc Figure 5.4 Comparison of uniform QQ plot of test VI and VIc in testing G-E interaction effect, with n=200, p=10, q=1. We merged preprocessed genotype and phenotype datasets by keeping only the overlap- ping subjects to generate the analysis-ready dataset, which includes 1,440 subjects, with a censoring rate of 62.5%. We are interested in 1) testing genetic covariates that are associated 100 with the time-to-onset of first AD diagnosis and 2) detecting potential genetic heterogeneity across subpopulations inferred by either baseline covariates or individual genome profiles (represented by a random sample of 200K SNPs), in the presence of left-truncation time and adjust for gender, year of education, study cohort indicator (ROS served as baseline), and first 6 principal components. Note that race is not adjusted because 1,439 out of 1,440 subjects are white. IBS kernel was used to measure genetic similarity, while the choice of similarity kernel for subpopulation similarity depends on the source of heterogeneity, which is elaborated in Table 5.18. Table 5.18 shows the results of the ROSMAP dataset analysis. To account for multiple testing issues, the p-value thresholds were obtained by controlling the FDR under 10% using the Benjamini-Hochberg procedure Benjamini and Hochberg (1995). Four different kernels were considered to measure genetic similarity: cross-product (CP), IBS, laplacian (Lap), and quadratic (Quad). When in the absence of genetic heterogeneity (scenario S1), APOE4 appeared to be the most significant gene followed by APOC1. When in the presence of genetic heterogeneity (scenarios S2-S4), even though APOE4 and APOC1 were still the most significant genes, their p-values varies across scenarios, with APOE4 having a notably smaller p-value when the Quadratic kernel was used under scenario S3 that considered genetic heterogeneity across gender. This indicates that the effect of APOE4 on the age-to-onset of AD varies between genders. 5.4 Discussion We developed a suite of multi-marker genetic association and interaction tests based on the additive hazards model. The proposed tests have three major advantages: 1) they fill the research gap that no available genetic association and interaction test for survival times 101 Table 5.18 Top five genes discovered by VZc and VZH,c from ROSMAP dataset by assuming age-to-onset of AD follows additive hazards model. CP, IBS, Lap and Quad are IBS, cross-product, Laplacian and Quadratic kernel, respectively, which measure genetic similarity. Various types of heterogeneity were considered, including no genetic heterogeneity (S1), heterogeneity across education categories (S2), heterogeneity between sexes (S3), and heterogeneity across genetic backgrounds (S4). Kernel Scenario Genes and p-values APOE4 APOC1 IGSF23 GRIP1 MIR12117 S1 3.55E-15 7.26E-10 6.05E-05 6.99E-05 1.06E-04 APOE4 APOC1 TBCC IGSF23 MIR12117 S2 1.99E-14 2.84E-09 6.75E-05 7.16E-05 8.42E-05 CP APOE4 APOC1 GRIP1 IGSF23 MIR12117 S3 3.10E-15 2.44E-09 6.12E-05 1.26E-04 1.37E-04 APOE4 APOC1 IGSF23 TBCC HSBP1 S4 7.63E-14 1.50E-09 7.09E-05 7.26E-05 1.22E-04 APOE4 APOC1 IGSF23 MIR12117 GRIP1 S1 7.97E-14 4.59E-09 8.21E-05 8.78E-05 9.65E-05 APOE4 APOC1 IGSF23 EFEMP2 DCAF12 S2 3.71E-11 2.50E-08 3.03E-04 3.42E-04 7.19E-04 IBS APOE4 APOC1 GRIP1 IGSF9B PLEKHG5 S3 1.42E-13 9.74E-09 8.82E-05 9.83E-05 1.20E-04 APOE4 APOC1 IGSF23 MIR12117 GRIP1 S4 7.53E-14 4.61E-09 8.25E-05 8.79E-05 9.66E-05 APOE4 APOC1 GRIP1 IGSF23 MIR12117 S1 4.73E-13 1.52E-08 6.27E-05 7.78E-05 8.72E-04 APOE4 APOC1 IGSF23 DCAF12 EFEMP2 S2 4.64E-12 3.86E-08 1.15E-04 2.27E-04 3.43E-04 Lap APOE4 APOC1 GRIP1 IGSF9B MIR12117 S3 8.44E-13 2.77E-08 5.53E-05 1.32E-04 1.39E-04 APOE4 APOC1 GRIP1 IGSF23 MIR12117 S4 4.69E-13 1.53E-08 6.27E-05 7.81E-05 8.74E-05 APOE4 APOC1 GRIP1 MIR12117 GABBR1 S1 1.11E-15 6.35E-10 6.66E-05 1.12E-04 1.14E-04 APOE4 APOC1 GRIP1 MIR12117 GABBR1 S2 6.43E-15 2.78E-09 8.38E-05 8.84E-05 1.10E-04 Quad APOE4 APOC1 GRIP1 EHHADH-AS1 GABBR1 S3 2.22E-16 3.10E-09 5.87E-05 7.55E-05 1.33E-04 APOE4 APOC1 GRIP1 MIR12117 GABBR1 S4 1.77E-15 6.24E-10 6.64E-05 1.11E-04 1.13E-04 p-value threshold 4.46E-07 8.91E-07 1.34E-06 1.78E-06 2.23E-06 that follow the additive hazards model, 2) they are powerful when the genetic effect is heterogeneous across subpopulations, and 3) they are robust to quadratic confounding effect and can handle competing risk. The asymptotic null distribution of weighted V statistic can be well approximated by a rescaled χ2 distribution, c0 χ2d0 , using the Satterthwaite method (Liu, Lin, and Ghosh, 2007). c0 is the scaled parameter and d0 is the χ2 distribution’s degree of freedom estimated by 102 matching the mean and variance of weighted V statistics and c0 χ2d0 . In testing G-G and G-E interaction effect, the scaled χ2 distribution of weighted V statistic has d0 degree of freedom, which is smaller than pq, the number of G-G interaction terms, when there is a weak or strong correlation between SNPs. Therefore the weighted V statistic is more powerful than regular tests (e.g., LRT, Wald test). 103 CHAPTER 6 Concluding remarks 6.1 Remarks regarding selection of survival model In this dissertation, we developed four suites of novel multi-marker association and interac- tion tests to detect genetic association and interaction with survival outcomes. Three popular survival models were studied: (conventional) Cox PH model, additive hazard model, and ac- celerated failure time (AFT) model. In a real application, the true survival model underlying the observed data is rarely known, therefore, it would be helpful to determine which model best fits the dataset and the goal of the analysis. Cox PH model is the most popular model that models the hazard rate as a function of predictor variables in a multiplicative manner as follows, λ(t|X) = λ0 (t) exp(β T X), where λ0 (t) is the non-parametric baseline hazard function and X is the predictor variable. One critical assumption of the Cox PH model is that the predictor variables analyzed should satisfy the PH assumption. In other words, for any two predictor variable values X and X ∗ , the corresponding hazard ratio is constant over time, as follows λ(t|X) λ0 (t) exp(β T X) = = exp(β T (X − X ∗ )). λ(t|X ∗ ) λ0 (t) exp(β T X ∗ ) Violation of PH assumption can be tested based on Schoenfeld residuals. When the PH assumption is violated, the additive hazard model and AFT model are alternatives to the Cox PH model. Additive hazard model assumes that the predictor affects hazard function in an additive 104 manner and also assumes constant hazard difference, instead of hazard ratio in the Cox PH model. λ(t|X) = λ0 (t) + β T X, AFT model models the direct effect of predictors on logarithmic survival time through a linear model, log(T ) = β T X + ϵ, where ϵ is an independent and identically distributed random variable and is independent of predictor X. A negative β indicates that an increase in X would accelerate the onset of the event of interest and a negative value prolongs the event onset. Two important assumptions of the AFT model are the linearity assumption and the constant time ratio assumption. Linearity assumption, similar to that in linear regression models, assumes that logarithmic survival time has a linear relationship with covariates. The constant time ratio assumption assumes that the survival time ratio is constant over time for any given two sets of covariates X and X ∗ as follows, T exp(β T X)eϵ T ∗ ∗ (β T (X−X ∗ )) ∗ = T = exp(β (X − X )) ⇒ T = T e . T exp(β X ∗ )eϵ The appropriateness of the AFT model can be assessed based on Cox-Snell residual plot. When the survival time is assumed to follow a class of semiparametric transformation model as introduced in Chapter 3. We have the flexibility to choose from different transfor- mation functions controlled by r (r ≥ 0). For example, we can vary r from 0 to 1.5 in steps of 0.05 (Zeng, Mao, and Lin, 2016), and the transformation function corresponds to the r 105 value that achieves the highest likelihood is selected. One can also choose from r = 0 and r = 1, whichever is the closest to the optimal r to achieve better model interpretability. 6.2 Future work Throughout this dissertation, the proposed four suites of novel association and interac- tion tests all assume independence between study subjects. This assumption is violated in datasets such as time-to-caries data of second molars in preschool child participants of DDHP (Detroit Dental Health Project) dataset studied in Pak, Li, and Todem (2019), where four tooth-level time-to-ECC were considered for each child participant. So far, a multi-marker genetic association and interaction test is lacking for multivariate survival data. However, the estimation methods for such multivariate survival data were developed in Lin (1994) for the multivariate Cox PH model, and in Zeng, Gao, and Lin (2017) for multivariate interval-censored data, which contribute to the development of the multi-marker test. In many applications (Kelly and Lim, 2000; Amorim and Cai, 2014), we deal with the recurrent event, where the event of interest might occur repeatedly, instead of once as as- sumed in this dissertation. A recurrent event subject to interval censoring is panel count data, which has been studied (Zeng and Lin, 2020; Sun and Wei, 2000; Wang, Ma, and Yan, 2013). It is worth developing a multi-marker association and interaction test for such panel count data since this type of data is frequently encountered in complex human diseases such as bladder tumor study (Byar, 1980) and skin cancer (Bailey et al., 2010). 106 APPENDICES 107 APPENDIX A: Proofs for Chapter 2 Proof of Theorem 1 We prove Theorem 1 following the proof of Theorem 3 in Wei and Lu (2017). By the expression (2), we have ∞ n ∗ X 1 X ∗ nV = (√ ψt (Gi )Mi )2 . (6.1) t=1 n i=1 Under the null hypothesis that T is independent of G, E[ψt∗ (Gi )Mi ] = E[ψt∗ (Gi )]EMi = 0 (6.2) and    ηt λ if t = s  E[ψt∗ (Gi )Mi ψs∗ (Gi )Mi ] = E[ψt∗ (Gi )ψs∗ (Gi )]E(Mi2 ) = (6.3)   0  otherwise These results, in conjunction with the multivariate central limit theorem, imply that for any Pn finite index set I = {i1 , . . . , iK }, the multivariate random variable { √1n i=1 ψt∗ (Gi )Mi }t∈I converges in distribution to a zero-mean multivariate normal random variable with a covari- ance matrix whose lk-th element is ληl I(l = k) (l = 1, . . . , K; k = 1, . . . , K). In addition, we have X∞ X∞ E[{ψt∗ (G1 )M1 }2 ] = λ ηt t=1 t=1 = λE[f˜(Gi , Gj )] < ∞. By Theorem 2.13.1 in van der Vaart and Wellner (1996), the countably infinite sequence of R∞ functions {ψt∗ (·)M (·)}, where M (N (·), Y (·)) ≡ N (∞) − 0 Y (s)dΛ(s) , is a Donsker 108 Pn class. Therefore, the empirical process, √1 n i=1 ψt∗ (Gi )Mi , converges weakly to a zero- mean Gaussian process Zt with covariance function, cov(Zt , Zs ) = ηt λI(t = s). By continu- ous mapping theorem, we have X∞ X∞ ∗ nV ⇝ Zt2 =λ ηt χ21t , t=1 t=1 where χ21t ’s are independent chi-square random variables with degree 1. Proof of Theorem 2 Similar to (6.1), nVZ∗ can be re-written as ∞ n X 1 X ∗ nVZ∗ = (√ ϕt (Gi , Zi )MZ,i )2 , (6.4) t=1 n i=1 where ϕ∗t (Gi , Zi ) = νt0.5 ϕt (Gi , Zi ). Under the null hypothesis that T is independent of G given Z, E[ϕ∗t (Gi , Zi )MZ,i ] = E[ϕ∗t (Gi , Zi )E{MZ,i |Gi , Zi }] = E[ϕ∗t (Gi , Zi )E{MZ,i |Zi }] = 0 We are then going to prove that E[ϕ∗t (Gi , Zi )MZ,i ϕ∗s (Gi , Zi )MZ,i ] = ξνt I(t = s). In- deed, when G is independent of Z, it is easy to show that f˜Z (Gi , Gj ) = f (Gi , Gj ) − E[f (Gi , Gj )|Gi ]−E[f (Gi , Gj )|Gj ]+E[f (Gi , Gj )]. When G = a+BT Z+e and f (Gi , Gj ) = 109 GTi Gj , f˜Z (Gi , Gj ) = GTi Gj − GTi E(Gj |Zj ) − E(GTi |Zi )Gj + E(GTi |Zi )E(Gj |Zj ) = eTi ej . In both cases, f˜Z (Gi , Gj ) is independent of (Zi , Zj ). As a result, ϕ∗t (Gi , Zi ) is also indepen- dent of Zi . Thus, E[ϕ∗t (Gi , Zi )MZ,i ϕ∗s (Gi , Zi )MZ,i ] = E[ϕ∗t (Gi , Zi )ϕ∗s (Gi , Zi )E{MZ,i 2 |Gi , Zi }] = E[ϕ∗t (Gi , Zi )ϕ∗s (Gi , Zi )E{MZ,i 2 |Zi }] = E[ϕ∗t (Gi , Zi )ϕ∗s (Gi , Zi )]E[E{MZ,i 2 |Zi }] 2 = νt I(t = s)E{MZ,i } = ξνt I(t = s). The rest of the proof follows the same arguments as in the proof of Theorem 1. 110 APPENDIX B: Proofs for Chapter 4 Derivation of the large-sample null distribution of weighted V statistic based on AFT model Write E1 as E1 = (E11 , . . . , E1n ). Then Xn E1 M̃ = E1i M̃i i=1 n ( n ) Z ∞ X νj (β̂, t)Yj (β̂, t) X = E1i dNi (β̂, t) − Pn dNj (β̂, t) i=1 −∞ j=1 νj (β̂, t)Yj (β̂, t) j=1 n Z ∞ ( Pn ) j=1 E1j νj (β̂, t)Yj (β̂, t) X = E1i − Pn dNi (β̂, t) i=1 −∞ j=1 ν j (β̂, t)Y j (β̂, t) ≡ Qn (β̂). Under the null, β̂ is n1/2 -consistent for β (Lai and Ying, 1991). Using similar arguments to the proof of Theorem 1(ii) in Lai and Ying (1991), it can be shown that n−1/2 E1 (M̃ − M̂) = n−1/2 {Qn (β̂) − Qn (β)} = Bn1/2 (β̂ − β) + op (1 + n1/2 ∥β̂ − β∥), (6.5) where B is the asymptotic slope matrix of n−1 Qn (β). Following Zeng and Lin (2006), we use a least squares method to estimate B. Specifically, let β̃ = β̂ +n−1/2 W, where W ∼ N (0, Iq ) and Iq is a q × q identity matrix. (6.5) implies that n−1/2 {Qn (β̃) − Qn (β̂)} = Bn1/2 (β̃ − β̂) + op (1) = BW + op (1). The least squares method has the following steps: 111 Step 1: Generate L, say 10,000, independent realizations of W, denoted by W1 , . . . , WL . Step 2: Calculate n−1/2 {Qn (β̂ + n−1/2 Wl ) − Qn (β̂)} (l = 1, . . . , L). Step 3: For j = 1, . . . , m, regress n−1/2 {Qn (β̂ + n−1/2 Wl ) − Qn (β̂)} onto Wl (l = 1, . . . , L) to estimate Bj , the j-th row of B, using the least squares estimation. Denote the estimator of B by B̂. Recall that β̂ is obtained from Eq. (5) in Chiou and Xu (2017) with log-rank weights, n ∞ ( Pn ) Z ν (β, t)Y (β, t) Z 1X j=1 j j j Un (β) ≡ Zi − Pn dNi (β, t) = 0. n i=1 −∞ j=1 νj (β, t)Yj (β, t) Chiou and Xu (2017) showed that n1/2 (β̂ − β) = −A−1 n1/2 Un (β) + op (1), (6.6) where A is the asymptotic slope matrix of Un (β). A can be estimated using a similar method to that for estimating B. Specifically, we carry out the following steps: Step I: Generate L̃, say 10,000, independent realizations of S ∼ N (0, Iq ), denoted by S1 , . . . , SL̃ . Step II: Calculate n1/2 Un (β̂ + n−1/2 Sl ) (l = 1, . . . , L̃). Step III: For j = 1, . . . , q, regress n1/2 Un (β̂ + n−1/2 Sl ) onto Sl (l = 1, . . . , L̃) to estimate Aj , the j-th row of A, using the least squares estimation. Denote the estimator of A by Â. Combining (6.5) and (6.6), we have E1 (M̃ − M̂) = −BA−1 nUn (β) + op (n1/2 ). (6.7) 112 Rt Let Mi (t) = Ni (β, t) − −∞ νi (β, u)Yi (β, u)dΛε (u), which a martingale under the null. It is easy to show that n Z ( Pn ) 1X ∞ j=1 Zj νj (β, t)Yj (β, t) Un (β) = Zi − Pn dMi (β, t) n i=1 −∞ j=1 νj (β, t)Yj (β, t) 1 T ∞ Z = Z (In − V (β, t)1T )dM(t), (6.8) n −∞ where 1 is a n-dimension vector of 1’s, Z = (Z1 , . . . , Zn )T , and V (β, t) is a n × 1 vec- tor V (β, t) = (ν1 (β, t)Y1 (β, t), . . . , ν1 (β, t)Yn (β, t))T { nj=1 νj (β, t)Yj (β, t)}−1 , and M(t) = P (M1 (t), . . . , Mn (t))T . It is also easy to show that Z ∞ M̂ = (In − V (β, t)1T )dM(t). (6.9) −∞ Combining (6.7), (6.8) and (6.9) , we have E1 M̃ = E1 M̂ + E1 (M̃ − M̂) Z ∞ = (E1 − BA−1 Z T )(In − V (β, t)1T )dM(t) + op (n1/2 ) −∞ Since M(t) is a vector of independent martingales under the null, whose compensators are Rt Rt Rt Λ(t) = −∞ dΛ(t) ≡ ( −∞ dΛ1 (t), . . . , −∞ dΛn (t))T , where dΛi (t) = νi (β, t)Yi (β, t)dΛε (t) (i = 1, . . . , n), we have by the martingale theory (Fleming and Harrington, 1991, Chapters 2 and 5) that Z ∞ Cov(E1 M̃) ≈ (E1 −BA−1 Z T )(In −V (β, t)1T )diag(dΛ(t))(In −1V (β, t)T )(ET1 −ZA−T B T ), −∞ 113 where diag(a) represents a diagonal matrix with a as the diagonal, and that E1 M̃ is approx- miately multivariate normal with mean zero. An estimator of Cov(E1 M̃) is Z ∞ ˆ Cov(E 1 M̃) = (E1 −B̂ Â−1 Z T )(In −V (β̂, t)1T )diag(dΛ̃(t))(In −1V (β̂, t)T )(ET1 −Z Â−T B̂ T ), −∞ where dΛ̃(t) = (dΛ̃1 (t), . . . , dΛ̃n (t))T with each element as follows, Xn Xn dΛ̃i (t) = νi (β̂, t)Yi (β̂, t) dNj (β̂, t){ νj (β̂, t)Yj (β̂, t)}−1 . (6.10) j=1 j=1 By algebra, we can simplify Cov(E ˆ 1 M̃) as Z ∞ ˆ Cov(E 1 M̃) = (E1 −B̂ Â−1 Z T ){diag(dΛ̃(t))−V (β̂, t)1T dN(β̂, t)V (β̂, t)T }(ET1 −Z Â−T B̂ T ), −∞ (6.11) where N(β̂, t) = (N1 (β̂, t), . . . , Nn (β̂, t))T . Take an eigendecomposition of Cov(E ˆ 1 M̃),   λ1    ˆ ...   T Cov(E 1 M̃) = Γ   Γ ,      λm where Γ is a m × m orthogonal matrix. Together with the asymptotic normality of E1 M̃ , we have   1/2 λ1    d  ...  E1 M̃ ≈ Γ    χ,      1/2 λm 114 where χ ≡ (χ11 , . . . , χ1m )T ∼ N (0, Im ). Thus, m T T d X VZ = M̃ KM̃ = M̃ ET1 E1 M̃ ≈ λj χ21j , j=1 where χ21j ’s are independent chi-square variables with degree 1. 115 APPENDIX C: Proofs for Chapter 5 Derivation of the large-sample null distribution of weighted V statistic based on additive hazard model Consider a general scenario where the hazard function of survival time T that is associated with a possibly time-varying d-dimensional covariate Z(t) is as follows, λ(t|Z) = λ0 (t) + β T Z(t). The proposed weighted V test statistic is VZ = M̃T FM̃, where M̃ = (M̃1 , . . . , M̃n )T and M̃i = Mi (Λ̂0 (β̂, t), β̂) (i = 1, . . . , n) as follows, Z ∞ n Pn dNj (t) T T o j=1 M̃i = Ni (∞) − Yi (t) Pn − β̂ Z̄(t)dt + β̂ Zi (t)dt , 0 j=1 Yi (t) Pn Pn where Z̄(t) = j=1 Yj (t)Zj (t)/ j=1 Yj (t) = ZT (t)V (t), with V (t) = Y (t)(1T Y (t))−1 . V (t) = (V1 (t), . . . , Vn (t))T and Y (t) = (Y1 (t), . . . , Yn (t))T . β̂ is n1/2 -consistent estimator of β as shown in equaltion (2.8) in Lin and Ying (1994). By first-order Taylor expansion of M̃ around β 0 , true parameter value of β, we have ∂ M̃ M̃ ≈ M̂ + |β=β0 (β̂ − β 0 ) (6.12) ∂β T 116 The influence function of β̂ is 1 (β̂ − β 0 ) = D−1 U (β 0 ) + op ( √ ) n X n Z ∞ ≈ D−1 {Zi (t) − Z̄(t)}dMi (t) 0 Zi=1∞ = D−1 {Z(t)T − Z̄(t)1Tn }dM(t), (6.13) 0 where Z(t) = (Zi (t), . . . , Zn (t))T is a n × d covariate matrix at time t, Z̄(t) is a d vector at time t, 1n is n × 1 vector with each element equals 1, and M(t) = (M1 (t), . . . , Mn (t)). U (β 0 ) is estimating function of β 0 according to equation (2.7) in Lin and Ying (1994). When covariate is time-independent, Z(t) = Z, we have Z ∞ −1 T (β̂ − β 0 ) = D Z (In − V (t)1T )dM(t). 0 Matrix D has the following form X n Z ∞ N 2 D = Yj (t){Zj (t) − Z̄(t)} dt j=1 0 Z ∞   = Z(t)T diag(Y (t)) − Y (t)V (t)T Z(t)dt, (6.14) 0 N 2 where a = aaT . When covariate is time-independent, Z(t) = Z, we have h Z ∞ Z ∞ i T Y (t)V (t)T dt Z  D = Z diag Y (t)dt − 0 0 = ZT WZ, 117 R∞  R∞ where W = diag 0 Y (t)dt − 0 Y (t)V (t)T dt. M̂ in equation 6.12 can be written as Z ∞ M̂ = dM̂(t) 0 ∞ Pn j=1 dMj (t) nZ on = dMi (t) − Yi (t) Pn 0 j=1 Yj (t) i=1 Z ∞ = (In − V (t)1T )dM(t) 0 The first-order derivative of M̃ around β 0 (∂ M̃/∂β T |β=β0 ) in equation 6.12 can be written as ∂ M̃ nZ ∞ on T T |β=β 0 = Y i (t)(Z̄(t) − Z i (t) )dt ∂β T i=1 Z0 ∞ = − (diag(Y (t) − Y (t)V (t)T )Z(t)dt 0 = −G. (6.15) When covariate is time-independent, we have Z ∞ ∂ M̃ |β=β0 = − (diag(Y (t)) − Y (t)V (t)T )dt Z ∂β T 0 = −WZ By plugging 6.13, 6.14, and 6.15 into equation 6.12, we can approximate M̃ as follows, Z ∞ Z ∞ T −1 M̃ ≈ (In − V (t)1 )dM(t) − GD Z(t)T (In − V (t)1T )dM(t) Z0 ∞ 0 = (In − GD−1 )(In − V (t)1T )dM(t). 0 118 When covariate is time-independent, Z(t) = Z, the approximation of M̃ can be simplified as Z ∞ −1 T M̃ ≈ (In − WZ(Z WZ) ) (In − V (t)1T )dM(t). 0 Since M(t) is a vector of independent martingales under the null hypothesis, whose com- Rt Rt Rt pensators are Λ(t) = 0 dΛ(t) = ( 0 dΛ1 (t), . . . , 0 dΛn (t))T , where dΛi (t) = Yi (t)dΛ(t) (i = 1, . . . , n). By martingale theory (Fleming and Harrington, 1991, Chapters 2 and 5), M̃ is approximately M V N (0, Cov(M̃)), where Cov(M̃) is approximated as follows, Z ∞ Cov(M̃) ≈ (In − GD−1 Z(t)T )(In − V (t)1T )diag(dΛ(t))(In − 1V (t)T )(In − Z(t)D−1 G T ), 0 where dΛ(t) is approximates Cov(dM(t)). When Z(t) is independent of time, Z(t) = Z, we have Z ∞ −1 T T (In − V (t)1T )diag(dΛ(t))(In − 1V (t)T )   Cov(M̃) ≈ (In − WZ(Z WZ) Z ) 0 (In − Z(ZT WZ)−1 ZT WT ), where dΛ(t) is a n × 1 vector with i-th element equals {Yi (t)(dΛ0 (t) + β T0 Zi (t)dt)} (i = 1, . . . , n). Replace β 0 with its estimator β̂, we can obtain an estimator of Cov(M̃) as follows, Z ∞ ˆ M̃) = (In − WZ(ZT WZ)−1 ZT ) (In − V (t)1T )diag(dΛ̂(t))(In − 1V (t)T )   Cov( 0 (In − Z(ZT WZ)−1 ZT WT ), 119 where dΛ̂(t) = Y (t)V (t)T dN (t) + (diag(Y (t)) − Y (t)V (t)T )Z(t)β̂dt. Write F = ET1 E1 for some m × n matrix E1 , where m (1 ≤ m ≤ n) is the rank of F. Then the weighted V statistic can be written as VZ = (E1 M̃)T (E1 M̃), with E1 M̃ ∼ ˆ N (0, Cov(E ˆ 1 M̃)). Take an eigendecomposition of Cov(E1 M̃),   λ1    ˆ Cov(E  ..  T 1 M̃) = Γ   . Γ ,      λm where {λi }m i=1 are eigenvalues and Γ is a m × m orthogonal matrix. Together with the asymptotic normality of E1 M̃ , we have   1/2 λ1    d  ..  E1 M̃ ≈ Γ  .  χ,      1/2 λm where χ ≡ (χ11 , . . . , χ1m )T ∼ N (0, Im ). Thus, m T T d X VZ = M̃ FM̃ = (E1 M̃) E1 M̃ ≈ λj χ21j , j=1 where χ21j ’s are independent chi-square variables with degree 1. 120 BIBLIOGRAPHY 121 BIBLIOGRAPHY Amorim, L.D., and J. Cai. 2014. “Modelling recurrent events: a tutorial for analysis in epidemiology.” International Journal of Epidemiology 44:324–333. Andersen, P.K., and R.D. Gill. 1982. “Cox’s Regression Model for Counting Processes: A Large Sample Study.” The Annals of Statistics 10:1100 – 1120. Azzato, E.M., P.D. Pharoah, P. Harrington, D.F. Easton, D. Greenberg, N.E. Caporaso, S.J. Chanock, R.N. Hoover, G. Thomas, D.J. Hunter, and P. Kraft. 2010. “A Genome-Wide Association Study of Prognosis in Breast Cancer.” Cancer Epidemiology and Prevention Biomarkers 19:1140–1143. Bailey, H.H., K. Kim, A.K. Verma, K. Sielaff, P.O. Larson, S. Snow, T. Lenaghan, J.L. Viner, J. Douglas, N.E. Dreckschmidt, M. Hamielec, M. Pomplun, H.H. Sharata, D. Puchalsky, E.R. Berg, T.C. Havighurst, and P.P. Carbone. 2010. “A Randomized, Double-Blind, Placebo-Controlled Phase 3 Skin Cancer Prevention Study of alpha- Difluoromethylornithine in Subjects with Previous History of Skin Cancer.” Cancer Pre- vention Research 3:35–47. Beer, D., S. Kardia, C.C. Huang, T. Giordano, A. Levin, D. Misek, L. Lin, G. Chen, G. Tarek, D. Thomas, M. Lizyness, R. Kuick, S. Hayasaka, J. Taylor, M. Iannettoni, M. Orringer, and S. Hanash. 2002. “Gene-Expression Profiles Predict Survival of Patients with Lung Adenocarcinoma.” Nature medicine 8:816–24. Benjamini, Y., and Y. Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B (Methodological) 57:289–300. Bennett, D.A., A.S. Buchman, P.A. Boyle, L.L. Barnes, R.S. Wilson, and J.A. Schnei- der. 2018. “Religious Orders Study and Rush Memory and Aging Project.” Journal of Alzheimer’s disease : JAD 64:S161—S189. Beyene, J., D. Tritchler, J.L. Asimit, and J.S. Hamid. 2009. “Gene- or region-based analysis of genome-wide association studies.” Genetic Epidemiology 33:S105–S110. Beyersmann, J., A. Latouche, A. Buchholz, and M. Schumacher. 2009. “Simulating compet- ing risks data in survival analysis.” Statistics in Medicine 28:956–971. Byar, D.P. 1980. The Veterans Administration Study of Chemoprophylaxis for Recurrent Stage I Bladder Tumours: Comparisons of Placebo, Pyridoxine and Topical Thiotepa, Boston, MA: Springer US. pp. 363–370. 122 Cai, T., G. Tonini, and X. Lin. 2011. “Kernel Machine Approach to Testing the Significance of Multiple Genetic Markers for Risk Prediction.” Biometrics 67:975–986. Chen, H., T. Lumley, J. Brody, N.L. Heard-Costa, C.S. Fox, L.A. Cupples, and J. Dupuis. 2014. “Sequence Kernel Association Test for Survival Traits.” Genetic Epidemiology 38:191–197. Chen, H.Y., S.L. Yu, C.H. Chen, G.C. Chang, C.Y. Chen, A. Yuan, C.L. Cheng, C.H. Wang, H.J. Terng, S.F. Kao, W.K. Chan, H.N. Li, C.C. Liu, S. Singh, W.J. Chen, J.J. Chen, and P.C. Yang. 2007. “A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer.” New England Journal of Medicine 356:11–20. Chen, J., W. Chen, N. Zhao, M.C. Wu, and D.J. Schaid. 2016. “Small Sample Kernel Associa- tion Tests for Human Genetic and Microbiome Association Studies.” Genetic Epidemiology 40:5–19. Chiou, S.H., and G. Xu. 2017. “Rank-Based Estimation for Semiparametric Accelerated Fail- ure Time Model under Length-Biased Sampling.” Statistics and Computing 27:483–500. Cristianini, N., and J. Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press. Davies, R.B. 1980. “Algorithm AS 155: The Distribution of a Linear Combination of χ2 Random Variables.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 29:323–333. Emond, M.J., T. Louie, J. Emerson, W. Zhao, R.A. Mathias, M.R. Knowles, F.A. Wright, M.J. Rieder, H.K. Tabor, D.A. Nickerson, K.C. Barnes, R.L. Gibson, M.J. Bamshad, L. National Heart, B.I.N.G.E.S. Project, and L. GO. 2012. “Exome sequencing of extreme phenotypes identifies DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis.” Nature Genetics 44:886–889. Fleming, T.R., and D.P. Harrington. 1991. Counting Processes and Survival Analysis. John Wiley & Sons. Galvan, A., J.P. Ioannidis, and T.A. Dragani. 2010. “Beyond genome-wide association stud- ies: genetic heterogeneity and individual predisposition to cancer.” Trends in Genetics 26:132 – 141. Goeman, J.J., J. Oosting, A.M. Cleton-Jansen, J.K. Anninga, and H.C. van Houwelingen. 2005. “Testing association of a pathway with survival using gene expression data.” Bioin- formatics 21:1950–1957. Hofmann, T., B. Schölkopf, and A.J. Smola. 2008. “Kernel methods in machine learning.” The Annals of Statistics 36:1171 – 1220. 123 Huang, K.L., E. Marcora, A. Pimenova, A. Narzo, M. Kapoor, S.C. Jin, O. Harari, S. Ber- telsen, B. Fairfax, J. Czajkowski, V. Chouraki, B. Grenier-Boley, C. Bellenguez, Y. Dem- ing, A. McKenzie, T. Raj, A. Renton, J. Budde, A. Smith, and A. Goate. 2017. “A common haplotype lowers PU.1 expression in myeloid cells and delays onset of Alzheimer’s disease.” Nature neuroscience 20. Huang, Y.T., R.S. Heist, L.R. Chirieac, X. Lin, V. Skaug, S. Zienolddiny, A. Haugen, M.C. Wu, Z. Wang, L. Su, K. Asomaning, and D.C. Christiani. 2009. “Genome-Wide Analysis of Survival in Early-Stage Non–Small-Cell Lung Cancer.” Journal of Clinical Oncology 27:2660–2667, PMID: 19414679. Hudgens, M.G., C. Li, and J.P. Fine. 2014. “Parametric likelihood inference for interval censored competing risks data.” Biometrics 70:1–9. Hughey, J.J., S.D. Rhoades, D.Y. Fu, L. Bastarache, J.C. Denny, and Q. Chen. 2019. “Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record.” BMC genomics 20:805. Jabbari, E., S. Koga, R.R. Valentino, R.H. Reynolds, R. Ferrari, M.M.X. Tan, J.B. Rowe, C.L. Dalgard, S.W. Scholz, D.W. Dickson, T.T. Warner, T. Revesz, G.U. Höglinger, O.A. Ross, M. Ryten, J. Hardy, M. Shoai, H.R. Morris, K.Y. Mok, D.P. Murphy, S. Al-Sarraj, C. Troakes, S.M. Gentleman, K.S. Allinson, Z. Jaunmuktane, J.L. Holton, A.J. Lees, C.M. Morris, Y. Compta, E. Gelpi, J.C. van Swieten, A. Rajput, L. Ferguson, M.R. Cookson, J.R. Gibbs, C. Blauwendraat, J. Ding, R. Chia, B.J. Traynor, A. Pantelyat, C. Viollet, B.J. Traynor, O. Pletnikova, J.C. Troncoso, L.S. Rosenthal, A.L. Boxer, G. Respondek, T. Arzberger, S. Roeber, A. Giese, D.J. Burn, N. Pavese, A. Gerhard, C. Kobylecki, P.N. Leigh, A. Church, and M. T.M. Hu. 2021. “Genetic determinants of survival in progressive supranuclear palsy: a genome-wide association study.” The Lancet Neurology 20:107–116. Kapoor, M., J.C. Wang, L. Wetherill, N. Le, S. Bertelsen, A.L. Hinrichs, J. Budde, A. Agrawal, L. Almasy, K. Bucholz, D.M. Dick, O. Harari, X. Xiaoling, V. Hesselbrock, J. Kramer, J.I. Nurnberger, J. Rice, M. Schuckit, J. Tischfield, B. Porjesz, H.J. Eden- berg, L. Bierut, T. Foroud, and A. Goate. 2014. “Genome-wide survival analysis of age at onset of alcohol dependence in extended high-risk COGA families.” Drug and Alcohol Dependence 142:56–62. Kelly, P.J., and L.L.Y. Lim. 2000. “Survival analysis for recurrent event data: an application to childhood infectious diseases.” Statistics in Medicine 19:13–33. Kim, H.Y., J.M. Williamson, and H.M. Lin. 2016. “Power and sample size calculations for interval-censored survival analysis.” Statistics in Medicine 35:1390–1400. Lai, T.L., and Z. Ying. 1991. “Rank Regression Methods for Left-Truncated and Right- Censored Data.” The Annals of Statistics 19:531 – 556. 124 Lee, S., G.R. Abecasis, M. Boehnke, and X. Lin. 2014. “Rare-variant association analysis: study designs and statistical tests.” Am J Hum Genet 95:5–23. Lee, S., M.J. Emond, M.J. Bamshad, K.C. Barnes, M.J. Rieder, D.A. Nickerson, D.C. Chris- tiani, M.M. Wurfel, and X. Lin. 2012. “Optimal Unified Approach for Rare-Variant Asso- ciation Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies.” The American Journal of Human Genetics 91:224–237. Li, C., D. Pak, and D. Todem. 2020. “Adaptive lasso for the Cox regression with inter- val censored and possibly left truncated data.” Statistical Methods in Medical Research 29:1243–1255, PMID: 31203741. Lin, D.Y. 1994. “Cox regression analysis of multivariate failure time data: The marginal approach.” Statistics in Medicine 13:2233–2247. Lin, D.Y., and L.J. Wei. 1989. “The Robust Inference for the Cox Proportional Hazards Model.” Journal of the American Statistical Association 84:1074–1078. Lin, D.Y., and Z. Ying. 1994. “Semiparametric analysis of the additive risk model.” Biometrika 81:61–71. Liu, D., X. Lin, and D. Ghosh. 2007. “Semiparametric Regression of Multidimensional Ge- netic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models.” Biomet- rics 63:1079–1088. McClellan, J., and M.C. King. 2010. “Genetic Heterogeneity in Human Disease.” Cell 141:210 – 217. Nagy, Á., G. Munkácsy, and B. Győrffy. 2021. “Pancancer survival analysis of cancer hall- mark genes.” Scientific Reports 11:6047. Pak, D., C. Li, and D. Todem. 2019. “Semiparametric analysis of correlated and interval- censored event-history data.” Statistical Methods in Medical Research 28:2754–2767. Shaffer, J., X. Wang, E. Feingold, M.K. Lee, F. Begum, D. Weeks, K. Cuenco, M. Barmada, S. Wendell, D. Crosslin, C. Laurie, K. Doheny, E. Pugh, Q. Zhang, B. Feenstra, F. Geller, H. Boyd, H. Zhang, M. Melbye, and M. Marazita. 2011. “Genome-wide Association Scan for Childhood Caries Implicates Novel Genes.” Journal of dental research 90:1457–62. Shu, X.O., J. Long, W. Lu, C. Li, W.Y. Chen, R. Delahanty, J. Cheng, H. Cai, Y. Zheng, J. Shi, K. Gu, W.J. Wang, P. Kraft, Y.T. Gao, Q. Cai, and W. Zheng. 2012. “Novel Genetic Markers of Breast Cancer Survival Identified by a Genome-Wide Association Study.” Cancer Research 72:1182–1189. Sinnott, J.A., and T. Cai. 2013. “Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling.” Biometrics 69:861–873. 125 Staley, J.R., E. Jones, S. Kaptoge, A.S. Butterworth, M.J. Sweeting, A.M. Wood, and J.M.M. Howson. 2017. “A comparison of Cox and logistic regression for use in genome- wide association studies of cohort and case-cohort design.” European journal of human genetics : EJHG 25:854—862. Subramanian, A., P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, and J.P. Mesirov. 2005. “Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide ex- pression profiles.” Proceedings of the National Academy of Sciences 102:15545–15550. Sun, J., and L.J. Wei. 2000. “Regression Analysis of Panel Count Data with Covariate- Dependent Observation and Censoring Times.” Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62:293–302. Sun, T., and Y. Ding. 2019. “Copula-based semiparametric regression method for bivariate data under general interval censoring.” Biostatistics 22:315–330. van der Net, J.B., A.C.J.W. Janssens, M.J.C. Eijkemans, J.J.P. Kastelein, E.J.G. Sijbrands, and E.W. Steyerberg. 2008. “Cox proportional hazards models have more statistical power than logistic regression models in cross-sectional genetic association studies.” European journal of human genetics : EJHG 16:1111—1116. van der Vaart, A.W., and J. Wellner. 1996. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics, Springer. Wang, X., S. Ma, and J. Yan. 2013. “AUGMENTED ESTIMATING EQUATIONS FOR SEMIPARAMETRIC PANEL COUNT REGRESSION WITH INFORMATIVE OBSER- VATION TIMES AND CENSORING TIME.” Statistica Sinica 23:359–381. Wei, C., and Q. Lu. 2017. “A generalized association test based on U statistics.” Bioinfor- matics 33:1963–1971. Yu, S.L., H.Y. Chen, G.C. Chang, C.Y. Chen, H.W. Chen, S. Singh, C.L. Cheng, C.J. Yu, Y.C. Lee, H.S. Chen, T.J. Su, C.C. Chiang, H.N. Li, Q.S. Hong, H.Y. Su, C.C. Chen, W.J. Chen, C.C. Liu, W.K. Chan, W.J. Chen, K.C. Li, J.J. Chen, and P.C. Yang. 2008. “MicroRNA Signature Predicts Survival and Relapse in Lung Cancer.” Cancer Cell 13:48 – 57. Zeng, D., F. Gao, and D. Lin. 2017. “Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data.” Biometrika 104:505–525. Zeng, D., and D. Lin. 2020. “Maximum likelihood estimation for semiparametric regression models with panel count data.” Biometrika 108:947–963. Zeng, D., and D.Y. Lin. 2006. “Efficient estimation of semiparametric transformation models for counting processes.” Biometrika 93:627–640. 126 Zeng, D., L. Mao, and D.Y. Lin. 2016. “Maximum likelihood estimation for semiparametric transformation models with interval-censored data.” Biometrika 103:253–271. 127