ARID1A MUTANT PATHOGENESIS OF THE ENDOMETRIAL EPITHELIUM

By

Jake Jordan Reske

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Genetics and Genome Sciences – Doctor of Philosophy

2021

PUBLIC ABSTRACT

Women’s health diseases represent a widespread yet understudied medical concern with historically limited treatment options. Diseases of the endometrium, the innermost lining of the uterus, impose a major public health burden: endometriosis affects roughly 1 in 10 women of reproductive age, and endometrial cancer is the most common gynecologic malignancy in the United States. Recent advances have revealed recurrent genetic causes of endometrial diseases, including mutations in genes known to play a role in cancer development. ARID1A is one such gene that is commonly mutated in endometrial diseases. It encodes a subunit of a large protein complex known as SWI/SNF, which regulates DNA packaging and activity in the cell nucleus. This dissertation aims to improve our understanding of how ARID1A mutations promote endometrial diseases at multiple biological levels, with a particular focus on how disrupted chromatin regulation affects physiologically relevant gene expression. In these works, genetic engineering techniques are applied in mice and in human cell-based models, supported by public and clinical data, to establish the consequences of ARID1A mutations in the endometrium and how they relate to other common genetic alterations in these diseases. These studies have revealed that ARID1A is a tumor suppressor in the endometrial epithelium, as ARID1A loss drives cellular invasion into nearby tissue. In combination with aggressive TP53 mutations, ARID1A mutations also promote invasive metastasis and squamous metaplasia.
At the level of chromatin, ARID1A and SWI/SNF directly regulate endometrial epithelial identity genes through chromatin interactions at both promoters and distal enhancers. Mechanistically, invasion driven by ARID1A mutation depends on the hyperactivation of super-enhancers, regulatory regions that control cell identity, and this hyperactivation can be reversed pharmacologically. Moreover, ARID1A interacts physically and genomically with other nuclear chromatin regulators to govern gene activation states by regulating variant histones. These works have advanced our understanding of how ARID1A mutations promote disease in the endometrium, including by providing pre-clinical support for using epigenetic therapies to treat invasive ARID1A mutant endometrial conditions. Future efforts will aim to further identify and understand the molecular and biochemical mechanisms linking ARID1A and SWI/SNF chromatin remodeling activity to the regulation of gene expression. Ongoing work seeks to explain the roles of ARID1A and SWI/SNF epigenetic regulation in normal physiological processes of the endometrium, such as hormone signaling across the menstrual cycle.

ABSTRACT

Subunits of the mammalian SWI/SNF chromatin remodeling complex are commonly mutated in human diseases such as cancer. The ARID1A (AT-rich interactive domain 1A; BAF250A) subunit serves as a large scaffold and DNA-binding module of certain SWI/SNF complexes, and deleterious ARID1A mutations are frequently observed in pathologies of the endometrial epithelium, including endometriosis, endometrial hyperplasia, and endometrial carcinoma.
This dissertation aims to further our understanding of ARID1A mutant pathologies of the endometrium, spanning genetic interactions between ARID1A and other disease-associated mutations, physiological consequences of ARID1A deficiency, cellular and molecular mechanisms of ARID1A mutant pathogenesis, chromatin and transcriptional alterations resulting directly from disrupted ARID1A regulation, and biochemical and genomic interactions that depend on normal ARID1A-SWI/SNF activity. In Chapter 1, endometrial pathophysiology and the role of ARID1A in chromatin regulation are reviewed as an introduction to the present works. In Chapter 2, a novel genetic mouse model of invasive endometrial hyperplasia is established through ARID1A haploinsufficiency combined with an oncogenic PI3-kinase pathway mutation, specifically in the endometrial epithelium. In this model, transcriptome and genome-wide chromatin accessibility measurements indicate that ARID1A loss alters promoter chromatin activity, leading to epithelial-to-mesenchymal transition and collective invasion in vivo. In Chapter 3, these in vivo chromatin accessibility measurements are used to demonstrate that the choice of ATAC-seq normalization method can significantly alter biological interpretation, and analytical strategies to guide genomic data interpretation are explored. In Chapter 4, a genetic interaction between ARID1A and the tumor suppressor TP53 in endometrial cancer is investigated, and p53 pathway activation is identified as a hallmark of ARID1A mutant tumors, linked to disrupted ARID1A regulation at p53 target gene chromatin. In Chapter 5, the functions of BRG1 (SMARCA4), a SWI/SNF catalytic subunit, are investigated and contrasted with those of ARID1A in the endometrial epithelium. BRG1 loss in vivo promotes spontaneous translocation of endometrial glands into the uterine myometrium, akin to adenomyosis.
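The Chapter 3 point that ATAC-seq normalization can change biological interpretation can be illustrated with a toy calculation. The Python sketch below is not taken from Chapter 3. It contrasts two generic scaling strategies, total library size and total reads in peaks, using hypothetical counts: the mutant sample is assumed to gain accessibility at every peak while sequencing depth stays fixed. All variable and function names are illustrative.

```python
import math

# Hypothetical read counts (illustrative only, not dissertation data):
# reads in five shared peaks, plus reads falling outside peaks ("background").
control_peaks = [100, 200, 150, 80, 120]
mutant_peaks = [200, 400, 300, 160, 240]  # every peak doubled
control_background = 9350  # total depth 10,000 reads
mutant_background = 8700   # total depth 10,000 reads


def log2fc(control, mutant, control_scale, mutant_scale):
    """Per-peak log2 fold change after dividing each sample by its scale factor."""
    return [math.log2((m / mutant_scale) / (c / control_scale))
            for c, m in zip(control, mutant)]


# Strategy 1: scale by total library size (reads in peaks + background).
lib_fc = log2fc(control_peaks, mutant_peaks,
                sum(control_peaks) + control_background,
                sum(mutant_peaks) + mutant_background)

# Strategy 2: scale by total reads in peaks, which implicitly assumes
# that most peaks do not change between conditions.
rip_fc = log2fc(control_peaks, mutant_peaks,
                sum(control_peaks), sum(mutant_peaks))

print("library size:", lib_fc)     # 1.0 for every peak (all doubled)
print("reads in peaks:", rip_fc)   # 0.0 for every peak (no change)
```

The two strategies analyze the same counts, yet one reports a global two-fold gain and the other reports no change. Each rests on a different assumption about what stays constant between samples, so the normalization choice needs explicit justification.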
In Chapter 6, a genome-wide chromatin state map is constructed from epigenomic assays to follow the effects of ARID1A loss, revealing that ARID1A regulates highly active enhancer-like chromatin regions. Unexpectedly, ARID1A regulates typical enhancers and super-enhancers in opposite ways: ARID1A loss causes H3K27 hyperacetylation and hyperactivation of the SERPINE1 super-enhancer locus, and this hyperactivation depends on P300 histone acetyltransferase activity and is required for invasion driven by ARID1A loss. Finally, in Chapter 7, genome-wide assays reveal that ARID1A is required to maintain the histone variant H3.3 in active chromatin. Mechanistically, ARID1A physically interacts with CHD4, the H3.3-interacting remodeler subunit of the NuRD complex that is associated with H3.3 maintenance. This H3.3 chromatin regulation is further guided by the histone reader ZMYND8, which directs repression of hyperactivation toward H4(K16)ac+ super-enhancers. Major conclusions and future directions of these studies are summarized in Chapter 8. Altogether, these works have elucidated numerous aspects of ARID1A biology, from pathophysiology and genetics to chromatin and transcriptional regulatory mechanisms. Further efforts will aim to explain how ARID1A and SWI/SNF regulate other context-specific chromatin through alternative co-factor interactions. Similar functional genomic experimental frameworks will be applied to determine how these processes are governed during normal endometrial physiology.

Copyright by
JAKE JORDAN RESKE
2021

ACKNOWLEDGMENTS

First and foremost, I would like to express my gratitude and appreciation to my mentor, Dr. Ron Chandler, for his earnest guidance, support, and willingness to pursue new and exciting areas of epigenetic regulation related to uterine pathophysiology. Ron has been a terrific advisor over my graduate career, and I would not be nearly the scientist I am today without his mentorship.
As I prepared to enter graduate school, Ron was a new faculty member at our satellite biomedical campus, and he was extremely welcoming to me as the first rotating graduate student in his laboratory. I felt a sincere connection from our first interactions, and I knew that we would work well together for a few productive years. Ron has been incredibly supportive in encouraging me to explore new laboratory techniques and to pursue questions of interest to me, such as extensive use of epigenome-wide assays and the study of biochemical and genetic interactions. Ron also secured funding for all experimentation described in this dissertation from the Mary Kay Foundation, the Ovarian Cancer Research Alliance, and the National Institutes of Health (NIH) National Institute of Child Health & Human Development (NICHD), and I thank these funding agencies for their contributions. In addition, Ron has been a terrific supporter of my career growth and helped me identify my path forward following graduate training, including allowing me to participate in two commercial internships during my short time in the lab. I will always cherish the time I spent training under Dr. Chandler, which has helped me pursue my scientific career goals in countless ways.

I would also like to acknowledge and thank Dr. Mike Wilson, a postdoctoral fellow in the Chandler laboratory and my colleague on nearly all of the presented studies. Mike entered the lab roughly a year before I joined, and he and Ron established most of the cell-based assays that I used in the works herein, as well as the mouse endometrial epithelial cell isolation protocol. Mike and I have co-authored all presented studies, including equal contributions on three published works, and he has been a fair and excellent laboratory instructor, research mentor, and friend throughout my graduate studies.
Jeanne Holladay, a research technician, also played an integral role in much of the mouse work and general laboratory maintenance during my first few years in the lab, and I thank her for her persevering assistance. Subechhya Neupane was a graduate intern in the lab around the time I was preparing for my comprehensive exam, and she also assisted with various experiments. I would also like to thank all the rotating graduate students, medical interns, and undergraduates who have come and gone in the lab and helped contribute to various aspects of these and other studies.

Additionally, I am indebted to all the scientific experts, faculty, staff, mentors, fellow trainees, and friends in the Department of Obstetrics, Gynecology and Reproductive Biology (OBGYN) at Michigan State University (MSU) for their camaraderie and support, including Dr. Rick Leach, Dr. Jose Teixeira, Dr. Asgi Fazleabas, Dr. John Risinger, Dr. Niraj Joshi, Harmony Van Valkenberg, Amanda Sterling, and Jenna French, among many others. My time in Dr. Chandler’s laboratory was funded by an OBGYN departmental fellowship, and I am extremely thankful for the opportunity to focus solely on research during my time with the department.

This dissertation would not have been possible without the mentorship I have received from my graduate guidance committee. In addition to Dr. Chandler, I would like to personally thank Dr. Jose Teixeira, Dr. Jianrong Wang, Dr. Tim Triche, Jr. (Van Andel Institute), and Dr. David Arnosti (committee chair). The individual and combined expertise of these great scientists has strengthened each area of my technical training in physiology, cellular and molecular biology, genetics and transcription, computation, and statistics. I would also like to thank my previous mentors at Michigan State University, Dr. Hua Xiao, Dr. Erik Martinez-Hackert, and Dr.
Poorna Viswanathan, for aiding my first steps into biomedical research and encouraging me to pursue graduate studies.

I sincerely thank Dr. Cathy Ernst, the director of the Genetics and Genome Sciences (GGS) program, who has been a tremendously supportive advisor and has continually accommodated remote interactions and coursework during my graduate studies. Cathy and the GGS program have financially supported my training through various forms of assistance over the years. Alaina Burghardt has also been an instrumental GGS administrator and supporter during my time with the program. I would also like to thank Dr. John LaPres and the BioMolecular Science umbrella graduate program, my entry point into graduate studies at Michigan State University, for funding my first year and generally facilitating satellite studies.

Outside of Michigan State University, I would like to thank my colleagues and the core services at the Van Andel Research Institute (VARI) in Grand Rapids, MI, located down the street from our MSU biomedical research satellite campus. I have shared my research with the Department of Epigenetics at VARI on numerous occasions during my graduate studies, and I am truly thankful to have a local team of epigenetic experts with whom to discuss our work. All sequencing data presented in this dissertation were generated in collaboration with the Van Andel Genomics Core, and I extend my immense thanks to Marie Adams and her team for their continued support. I would also like to acknowledge Dr. Galen Hostetter and the Van Andel Pathology and Biorepository Core for their routine assistance with histopathology.

Finally, I thank my family, friends, and partner for supporting me over these past years. I would not be here today without this gracious support network. Thank you for standing behind me throughout this journey.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1 INTRODUCTION
  1.1 Uterine endometrium
    1.1.1 Menstruation and pregnancy
    1.1.2 Pathologies
      1.1.2.1 Endometriosis
      1.1.2.2 Adenomyosis
      1.1.2.3 Hyperplasia
      1.1.2.4 Carcinoma
    1.1.3 Disease genetics
    1.1.4 ARID1A mutant endometrial pathologies
  1.2 ARID1A and chromatin regulation
    1.2.1 ARID1A and SWI/SNF chromatin remodeler complexes
    1.2.2 Context-specific roles of ARID1A and SWI/SNF
    1.2.3 ARID1A alterations in disease
    1.2.4 ARID1A mutant disease mechanisms
CHAPTER 2 ARID1A AND PI3-KINASE PATHWAY MUTATIONS IN THE ENDOMETRIUM DRIVE EPITHELIAL TRANSDIFFERENTIATION AND COLLECTIVE INVASION
  2.1 Abstract
  2.2 Introduction
  2.3 Results
    2.3.1 ARID1A is haploinsufficient in the endometrial epithelium
    2.3.2 Mutant endometrial epithelium show hallmarks of epithelial-to-mesenchymal transition
    2.3.3 Mouse gene signature identifies invasive patient population
    2.3.4 ARID1A loss increases promoter accessibility in vivo
    2.3.5 ARID1A functionally binds gene promoters
    2.3.6 ARID1A loss promotes mesenchymal phenotype
    2.3.7 ARID1A loss and PIK3CA^H1047R promote invasive phenotypes
  2.4 Discussion
  2.5 Methods
    2.5.1 Mice
    2.5.2 Cell lines
    2.5.3 Histology and immunohistochemistry
    2.5.4 Immunofluorescence
    2.5.5 Microscopy and imaging
    2.5.6 Cell sorting
    2.5.7 RNA isolation and qRT-PCR
    2.5.8 RNA-seq
    2.5.9 RNA-seq analysis
    2.5.10 ATAC-seq
    2.5.11 ATAC-seq analysis
    2.5.12 Analysis of TCGA-UCEC data
    2.5.13 Bioinformatics and statistics
    2.5.14 Transfection of 12Z cells with siRNA and plasmid DNA
    2.5.15 Generation of lentiviral shRNA particles
    2.5.16 Migration assay
    2.5.17 Invasion assay
    2.5.18 Western blotting
    2.5.19 Chromatin immunoprecipitation
    2.5.20 Chromatin immunoprecipitation sequencing (ChIP-seq)
    2.5.21 ChIP-seq analysis
    2.5.22 Co-immunoprecipitation (co-IP)
    2.5.23 Co-IP followed by mass spectrometry
    2.5.24 Mass spectrometry analysis
  2.6 Data availability
  2.7 Acknowledgments
CHAPTER 3 ATAC-SEQ NORMALIZATION METHOD CAN SIGNIFICANTLY AFFECT DIFFERENTIAL ACCESSIBILITY ANALYSIS AND INTERPRETATION
  3.1 Abstract
  3.2 Introduction
  3.3 Results
    3.3.1 Comparison of 8 analytical approaches to calculate ATAC-seq differential accessibility
    3.3.2 Choice of ATAC-seq analytical approach is a key step in determining differential chromatin accessibility
    3.3.3 Temporal chromatin accessibility measurements in yeast also display normalization bias
    3.3.4 Generalized ATAC-seq workflow for differential chromatin accessibility analysis
    3.3.5 Proposed workflow effectively retains ATAC-seq peak calls in an independent data set
    3.3.6 csaw differential accessibility workflow permits testing of multiple normalization methods
  3.4 Discussion
  3.5 Methods
    3.5.1 ATAC-seq and differential accessibility analysis
    3.5.2 RNA-seq analysis
    3.5.3 GM12878 gene expression microarray analysis
    3.5.4 Bioinformatics and statistics
  3.6 Data availability
  3.7 Acknowledgments
CHAPTER 4 CO-EXISTING TP53 AND ARID1A MUTATIONS PROMOTE AGGRESSIVE ENDOMETRIAL TUMORIGENESIS
  4.1 Abstract
  4.2 Introduction
  4.3 Results
    4.3.1 TP53 and ARID1A mutations rarely co-occur in endometrial cancer
    4.3.2 TP53 loss in the presence of PIK3CA^H1047R drives hyperplasia and endometrial intraepithelial carcinoma
    4.3.3 Endometrial phenotypes driven by TP53 or ARID1A loss display overlapping and distinct gene expression signatures
    4.3.4 Gene expression programs in mouse models reflect human tumor genetics
    4.3.5 ARID1A mutant tumors display p53 pathway activation
    4.3.6 p53 pathway target genes are directly regulated by ARID1A
    4.3.7 Simultaneous TP53 and ARID1A loss promotes aggressive tumorigenesis
    4.3.8 ARID1A loss results in ATF3 activation and squamous differentiation
  4.4 Discussion
  4.5 Methods
    4.5.1 Mice and animal husbandry
    4.5.2 Histology and immunohistochemistry
    4.5.3 Endometrial epithelial cell isolation and RNA-seq
    4.5.4 Cleavage Under Targets and Release Using Nuclease (CUT&RUN)
    4.5.5 Construction and sequencing of CUT&RUN libraries
    4.5.6 CUT&RUN bioinformatic analysis
    4.5.7 Clinical and public cancer data analysis
    4.5.8 Gene set enrichment analysis
    4.5.9 Bioinformatics and statistics
  4.6 Data availability
  4.7 Acknowledgments
CHAPTER 5 SWI/SNF INACTIVATION IN THE ENDOMETRIAL EPITHELIUM LEADS TO LOSS OF EPITHELIAL INTEGRITY
  5.1 Abstract
  5.2 Introduction
  5.3 Results
    5.3.1 BRG1 subunit loss in endometrial epithelial cells results in widespread transcriptional changes
    5.3.2 Promoter chromatin interaction patterns underlie transcriptional contributions of SWI/SNF subunits
    5.3.3 BRG1 and ARID1A promote transcription at epithelial identity genes
    5.3.4 SWI/SNF functions at distal open chromatin regions to regulate epithelial identity genes
    5.3.5 BRG1 loss in the mouse endometrial epithelium promotes gland translocation to the uterine myometrium
    5.3.6 Transcriptomic analysis of BRG1-null endometrial epithelium reveals actin cytoskeletal and anchoring junction defects
    5.3.7 SWI/SNF subunits commonly regulate epithelial cell adhesion and junction programs
  5.4 Discussion
  5.5 Methods
    5.5.1 Mice
    5.5.2 Cell lines
    5.5.3 Transfection of 12Z cells with siRNA
    5.5.4 Cell growth assay
    5.5.5 Histology and immunohistochemistry
    5.5.6 Cell sorting
    5.5.7 RNA isolation
    5.5.8 Construction and sequencing of directional mRNA-seq libraries
    5.5.9 RNA-seq analysis
    5.5.10 Chromatin immunoprecipitation
    5.5.11 Construction and sequencing of ChIP-seq libraries
    5.5.12 ChIP-seq analysis
    5.5.13 Western blotting
    5.5.14 Bioinformatics and statistics
  5.6 Data availability
  5.7 Acknowledgments
CHAPTER 6 ARID1A MUTATIONS PROMOTE P300-DEPENDENT ENDOMETRIAL INVASION THROUGH SUPER-ENHANCER HYPERACETYLATION
  6.1 Abstract
  6.2 Introduction
  6.3 Results
    6.3.1 ARID1A co-localizes with H3K27ac and is associated with super-enhancers
    6.3.2 ARID1A prevents super-enhancer hyperacetylation
    6.3.3 ARID1A and P300 co-occupy highly active super-enhancers
    6.3.4 P300 histone acetyltransferase activity is required for ARID1A mutant cell invasion
    6.3.5 P300 HAT inhibition reverses H3K27 hyperacetylation at a subset of super-enhancers in ARID1A-deficient endometrial cells
    6.3.6 SERPINE1 promotes ARID1A mutant cell invasion
  6.4 Discussion
  6.5 Methods
    6.5.1 Mouse care, use, and genotyping
    6.5.2 Cell lines
    6.5.3 Histology and immunohistochemistry
    6.5.4 Transfections
    6.5.5 Generation and use of lentiviral shRNA particles
    6.5.6 Histone extraction
    6.5.7 Western blotting
    6.5.8 Transwell invasion assay
    6.5.9 Matrigel viability assay
    6.5.10 Migration assay
    6.5.11 Viability assay
    6.5.12 Cell growth assay
    6.5.13 Cell suspension Caspase-Glo assay
    6.5.14 Annexin V assay
    6.5.15 Cell cycle assay
    6.5.16 Construction and sequencing of directional mRNA-seq libraries
    6.5.17 Chromatin immunoprecipitation
    6.5.18 Construction and sequencing of ChIP-seq libraries
    6.5.19 Cleavage Under Targets and Release Using Nuclease (CUT&RUN)
    6.5.20 Construction and sequencing of CUT&RUN libraries
    6.5.21 RNA-seq analysis
    6.5.22 ChIP-seq analysis
    6.5.23 CUT&RUN analysis
    6.5.24 Chromatin state analysis
    6.5.25 Bioinformatics and statistics
  6.6 Data availability
  6.7 Acknowledgments
CHAPTER 7 ARID1A MAINTAINS TRANSCRIPTIONALLY REPRESSIVE H3.3 THROUGH CHD4-ZMYND8 CHROMATIN INTERACTIONS
  7.1 Abstract
  7.2 Introduction
  7.3 Results
    7.3.1 ARID1A mutations in cancer are associated with slight loss of histone H3.3
    7.3.2 ARID1A regulates H3.3+ active chromatin
    7.3.3 ARID1A chromatin interactions maintain H3.3
    7.3.4 H3.3 depletion phenocopies transcriptional effects of ARID1A loss
    7.3.5 ARID1A interacts with CHD4 to co-regulate H3.3 with ZMYND8
    7.3.6 ARID1A-CHD4-ZMYND8 co-repress hyperactivation of H3.3+ super-enhancers
    7.3.7 ZMYND8 specifies ARID1A-CHD4 chromatin repression toward H4(K16)ac
    7.3.8 ARID1A-H3.3 repressed chromatin targets are aberrantly activated in human endometriomas
  7.4 Discussion
  7.5 Methods
    7.5.1 Cell culture, siRNA transfections, and lentiviral shRNA particle usage
    7.5.2 Cell cycle analysis
    7.5.3 Histone extraction
    7.5.4 Co-immunoprecipitation
    7.5.5 Density sedimentation via glycerol gradient ultracentrifugation
280 7.5.6 Immunoblotting ............................................................................................. 281 xii 7.5.7 mRNA-seq and analysis ................................................................................ 282 7.5.8 ChIP-seq and analysis.................................................................................... 283 7.5.9 Chromatin state modeling and optimization.................................................. 286 7.5.10 Histone peptide arrays ................................................................................... 287 7.5.11 Bioinformatics and statistics.......................................................................... 288 7.6 Data availability......................................................................................................... 288 7.7 Acknowledgments ..................................................................................................... 288 CHAPTER 8 CONCLUSION ................................................................................................... 289 8.1 Summary ................................................................................................................... 289 8.2 ARID1A as a regulator of the endometrial epithelium ............................................. 294 8.3 ARID1A as an endometrial tumor suppressor........................................................... 296 8.4 Relationship between ARID1A and other cancer-associated genes .......................... 299 8.4.1 PIK3CA.......................................................................................................... 299 8.4.2 TP53 .............................................................................................................. 301 8.4.3 SMARCA4 ...................................................................................................... 304 8.5 ARID1A as a mediator of epithelial identity and invasion ....................................... 
305 8.6 ARID1A as a transcriptional regulator ...................................................................... 309 8.7 ARID1A as a chromatin regulator ............................................................................. 312 8.7.1 Repression of chromatin at EMT and invasion genes ................................... 312 8.7.2 Association with transcription factor networks ............................................. 314 8.7.3 Relationship to other SWI/SNF subunits ...................................................... 316 8.7.4 Governance of super-enhancer activation states ........................................... 317 8.7.5 Collaboration with other chromatin regulators.............................................. 319 8.8 Future directions ........................................................................................................ 323 8.8.1 Elucidating SWI/SNF-NuRD interdependence on H3.3 ............................... 323 8.8.2 ARID1A-SWI/SNF regulation of steroid hormone signaling ....................... 326 APPENDICES ............................................................................................................................. 332 APPENDIX A Supplementary material for Chapter 2 .................................................... 333 APPENDIX B Supplementary material for Chapter 3 .................................................... 347 APPENDIX C Supplementary material for Chapter 4 .................................................... 374 APPENDIX D Supplementary material for Chapter 5 .................................................... 396 APPENDIX E Supplementary material for Chapter 6 .................................................... 402 APPENDIX F Supplementary material for Chapter 7 .................................................... 413 REFERENCES ............................................................................................................................ 
LIST OF TABLES
Table 2.1 EMT genes functionally regulated by ARID1A chromatin interactions
Table 3.1 Description of 8 approaches used to calculate ATAC-seq differential accessibility
Table A.1 Genes exhibiting direct activation by ARID1A in 12Z cells
Table A.2 Genes exhibiting direct repression by ARID1A in 12Z cells
Table C.1 Genes with ARID1A in vivo promoter binding detected in adult mouse endometrial epithelia
Table D.1 Evidence for possible driver SMARCA4 mutations within uterine endometrial cancer
LIST OF FIGURES
Figure 1.1 Frequently mutated genes in endometrial cancer
Figure 1.2 SWI/SNF mutation rates across human cancer
Figure 2.1 Development of genetic mouse models representing an allelic series of ARID1A mutations in the endometrial epithelium
Figure 2.2 RNA-seq analysis of EPCAM-positive endometrial epithelial cells isolated via magnetic sorting
Figure 2.3 LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature correlates with invasive patient gene expression
Figure 2.4 ATAC-seq analysis of differentially accessible chromatin in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia
Figure 2.5 ARID1A chromatin binding is associated with accessibility and differential gene expression driven by ARID1A loss in a human endometrial epithelial cell line
Figure 2.6 PIK3CAH1047R antagonizes ARID1A loss-induced mesenchymal phenotypes
Figure 2.7 ARID1A loss and PIK3CAH1047R promote myometrial invasion in vivo and migration in vitro
Figure 3.1 DA distributions from the same ATAC-seq data set analyzed by 8 different DA approaches
Figure 3.2 Output comparison of approaches for computing differential accessibility
Figure 3.3 Comprehensive DA analysis and gene expression comparisons of yeast osmotic time-course series
Figure 3.4 Generalized ATAC-seq data processing workflow intended for comparative analysis
Figure 3.5 Conservative and relevant peak calling by proposed framework exemplified on Buenrostro et al. data
Figure 3.6 csaw workflow for multiple differential accessibility analyses in R
Figure 4.1 TP53 and ARID1A mutations rarely co-occur in endometrial cancer
Figure 4.2 TP53 loss with oncogenic PIK3CA activation results in endometrial intraepithelial carcinoma
Figure 4.3 Endometrial epithelial TP53 and ARID1A loss results in overlapping and distinct gene expression programs
Figure 4.4 Pathway analysis of TP53 and ARID1A regulated expression programs in human disease and mouse models
Figure 4.5 ARID1A mutation is associated with p53 pathway activation
Figure 4.6 Analysis of ARID1A chromatin interactions in mouse endometrial epithelia in vivo
Figure 4.7 Co-existing TP53 and ARID1A mutations promote aggressive endometrial tumorigenesis
Figure 4.8 ARID1A loss relieves Atf3 repression associated with squamous differentiation
Figure 4.9 Model of independent and co-existing TP53 and ARID1A mutations in endometrial epithelia
Figure 5.1 BRG1 and ARID1A loss in endometrial epithelial cells lead to widespread differences in transcriptional regulation
Figure 5.2 BRG1 and ARID1A activate highly regulated gene promoters
Figure 5.3 SWI/SNF directly promotes transcription at epithelial identity genes
Figure 5.4 SWI/SNF activity at distal sites also regulates expression of epithelial identity genes
Figure 5.5 Genetically engineered mice harboring BRG1 loss in the endometrial epithelium develop adenomyosis-like phenotypes
Figure 5.6 LtfCre0/+; Brg1fl/fl endometrial epithelial cells display loss of actin cytoskeletal and cellular junction programs
Figure 5.7 Endometrial SWI/SNF mutant mouse models converge on loss of cellular adhesion and junction
Figure 6.1 ARID1A is associated with highly active regulatory elements marked by H3K27ac
Figure 6.2 ARID1A prevents H3K27-hyperacetylation at super-enhancers
Figure 6.3 P300 and ARID1A co-regulate H3K27ac at highly active super-enhancers
Figure 6.4 P300 promotes invasion and survival of ARID1A mutant endometrial epithelia
Figure 6.5 ARID1A antagonizes P300 HAT activity at a subset of active super-enhancers
Figure 6.6 Inhibition of P300 HAT activity reverses the expression of a subset of ARID1A-regulated genes
Figure 6.7 Hyperactivation of the SERPINE1 super-enhancer promotes ARID1A mutant cell invasion
Figure 7.1 Histone peptide abundance associated with ARID1A mutation in cancer cell lines
Figure 7.2 Genome-wide analysis of H3.3-ARID1A chromatin co-regulation
Figure 7.3 Genome-wide analysis of ARID1A-dependent H3.3
Figure 7.4 Transcriptional effects of H3.3 depletion and overlap with ARID1A
Figure 7.5 Biochemical and genomic characterization of ARID1A, CHD4, and ZMYND8 chromatin interactions co-regulating H3.3
Figure 7.6 H3.3 enhancer regulation by ARID1A, CHD4, and ZMYND8
Figure 7.7 ZMYND8-mediated chromatin repression is specified by H4(K16)ac
Figure 7.8 Mechanistic gene expression alterations in human endometriomas
Figure 8.1 ARID1A also physically interacts with variant histone remodeler P400
Figure 8.2 Hypothetical model of H3.3 chromatin regulation by SWI/SNF, NuRD and P400
Figure 8.3 12Z ESR1+ cells as a model for estrogen-mediated gene expression analysis
Figure 8.4 Exploring ARID1A-mediated estrogen response
Figure 8.5 ARID1A-dependent Hallmark estrogen response genes
Figure A.1 Phenotypic analysis of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G and lung metastasis
Figure A.2 LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium shows downregulation of steroid hormone receptors and unfolded protein response proteins
Figure A.3 Allelic series of ARID1A loss with (Gt)R26Pik3ca*H1047R in endometrial epithelium displays concordant gene expression and differential chromatin accessibility
Figure A.4 ARID1A loss induces EMT in mouse endometrium
Figure A.5 Characterization of 12Z siARID1A ATAC-seq and ARID1A ChIP antibody
Figure A.6 Differential gene expression as a result of PIK3CAH1047R overexpression
Figure B.1 Differential gene expression overlap and FDR thresholding analyses
Figure B.2 Negative control DA comparison of two control groups from Schep et al. yeast data
Figure B.3 Extended analysis of DA methods on Schep et al. yeast osmotic stress ATAC-seq time course series
Figure B.4 Complete statistical analysis of Schep et al. osmotic stress ATAC-seq time series DA methods
Figure B.5 Replicated analysis downstream of random subsample seeds for complexity normalization
Figure B.6 Effects of library complexity normalization by random subsampling
Figure C.1 TP53 mutations in endometrial cancers profiled by TCGA-UCEC
Figure C.2 Additional histopathological characterization of TP53/PIK3CA mutant mice
Figure C.3 EPCAM endometrial epithelial cell purification statistics
Figure C.4 Differences in epithelial-mesenchymal transition following TP53 vs. ARID1A loss
Figure C.5 Pathway analysis of distinct expression programs in TP53- and ARID1A-loss-driven hyperplasia
Figure C.6 Pathway alterations driven by TP53 and ARID1A mutations are associated with histological subtype
Figure C.7 GO Biological Process gene set overlaps between mouse and human genotypes
Figure C.8 Differential gene expression analysis of TP53 vs. ARID1A mutant mouse and human samples
Figure C.9 Extended GSEA results for disease and model genetic comparisons
Figure C.10 Enrichment of gene expression alterations at TP53 core transcriptional program genes in mouse models
Figure C.11 PARADIGM pathway activity associated with ARID1A mutation in UCEC tumors
Figure C.12 Extended analysis of ARID1A binding sites in vivo endometrial epithelia
Figure C.13 Additional TCGA-UCEC and MSK-IMPACT analysis of TP53/ARID1A co-altered tumors
Figure C.14 TP53/ARID1A/PIK3CA mutant mouse survival and marker staining
Figure C.15 Representative proliferation and caspase-mediated cell death marker IHC
Figure C.16 ARID1A loss-induced ATF3 and TP63 is associated with invasive transcriptional signatures
Figure C.17 Further representative marker staining of mutant mouse models and vaginal pseudostratified squamous epithelium
Figure D.1 Hallmark pathway enrichment among 12Z siBRG1/siARID1A DGE
Figure D.2 Transcription factor networks among direct activating SWI/SNF target genes
Figure D.3 Representative images of individual uterine glands from LtfCre0/+; Brg1fl/+ and LtfCre0/+; Brg1fl/fl mice
Figure D.4 LtfCre0/+; Brg1fl/fl endometrial epithelial cell purification statistics
Figure E.1 Genome-wide chromatin features profiled in human 12Z endometrial epithelial cells
Figure E.2 Enhancer classification and additional differential histone modification analysis
Figure E.3 ARID1A and P300 co-regulation of promoters and gene expression
Figure E.4 Additional characterization of P300-deficient phenotypes
Figure E.5 Phenotypic characterization of cells following A-485 treatment
Figure E.6 Effects of A-485 treatment on ARID1A and PIK3CA double-mutant 12Z cells
Figure E.7 SERPINE1/PAI-1 immunohistochemical staining in endometriosis patient samples
Figure F.1 H3.3 analysis in histone wild-type CCLE lines
Figure F.2 Additional ARID1A knockdown differential H3.3 data
Figure F.3 Additional H3.3 knockdown functional analysis
Figure F.4 Example locus showing additional chromatin features profiled in 12Z cells
Figure F.5 Peptide specificity of anti-acetyl H2A.Z (K4/K7)
Figure F.6 ChromHMM 12-feature chromatin state model optimization
Figure F.7 Chromatin accessibility repressed by ARID1A is associated with H4 acetylation
Figure F.8 siZMYND8 and siCHD4 gene expression analysis
Figure F.9 H4K16ac enrichment at repressed mechanistic genes
KEY TO ABBREVIATIONS
AP-1: Activator protein 1
ARID: AT-rich interactive domain
ARID1A: AT-rich interactive domain 1A; BAF250A; p270
ATAC: Assay for Transposase-Accessible Chromatin
ATAC-seq: ATAC followed by sequencing
ATF: Activating Transcription Factor
BAF: BRG1-associated factors; (mammalian) SWI/SNF
cBAF: Canonical BAF; SWI/SNF-A
nBAF: Neuron-specific BAF
ncBAF: Non-canonical BAF
npBAF: Neural progenitor BAF
PBAF: Polybromo-associated BAF; SWI/SNF-B
BAM: Binary SAM file format
bp: Base pairs (in reference to DNA)
BP: Biological Process (in reference to the GO gene set collection)
BRG1: Brahma/SWI2-related gene 1; SMARCA4
BRM: Brahma; SMARCA2
BSA: Bovine serum albumin
CCLE: Cancer Cell Line Encyclopedia
ChIP: Chromatin immunoprecipitation
ChIP-seq: ChIP followed by sequencing
CN: Copy number
CNA: Copy-number alteration
Co-IP: Co-immunoprecipitation
CPM: Counts per million
CRE: Cyclic-AMP responsive element; not to be confused with Cre recombinase
CUT&RUN: Cleavage Under Targets and Release Using Nuclease (followed by sequencing)
DA: Differential accessibility; differentially accessible
DE: Differential expression; differentially expressed
DGE: Differential gene expression
E: Embryonic day, e.g. E10
E2: Estradiol; estrogen
ECM: Extracellular matrix
EIC: Endometrial intraepithelial carcinoma
EMT: Epithelial-to-mesenchymal transition
ER: Estrogen receptor; ESR1; also refers to endoplasmic reticulum (e.g. ER stress)
ESC: Embryonic stem cell
FBS: Fetal bovine serum
FC: Fold change
FDR: False discovery rate (Benjamini-Hochberg procedure); FDR-adjusted p
FE: Fold enrichment (observed/expected ratio)
fl: Flox; loxP
FPKM: Fragments per kilobase per million mapped reads
FRiP: Fraction of reads in peaks
GDC: Genomic Data Commons
GEMM: Genetically engineered mouse model
GLM: Generalized linear model
GEO: Gene Expression Omnibus
GO: Gene Ontology
GSEA: Gene set enrichment analysis
HAT: Histone acetyltransferase
HDAC: Histone deacetylase
HMM: Hidden Markov Model
HRP: Horseradish peroxidase
IACUC: Institutional Animal Care and Use Committee
iCre: Codon-improved Cre recombinase
IF: Immunofluorescence
IHC: Immunohistochemistry
IP: Immunoprecipitation
kb: Kilobases (in reference to DNA)
LOESS: Locally estimated scatterplot smoothing
log2CPM: log2 counts per million; logCPM
log2FC: log2 fold change; logFC
logLR: log likelihood ratio
LOWESS: Locally weighted scatterplot smoothing
MNase: Micrococcal nuclease
MSU: Michigan State University
NES: Normalized enrichment score
NuRD: Nucleosome Remodeling and Deacetylase complex; Mi-2
OR: Odds ratio
ORF: Open reading frame
p: p-value; probability, under the null hypothesis, of a result at least as extreme as the one observed
RNA: Ribonucleic acid
eRNA: Enhancer RNA
mRNA: Messenger RNA
rRNA: Ribosomal RNA
RNA-seq: RNA sequencing
PAI-1: Plasminogen activator inhibitor type 1; also known as SERPINE1
PBS: Phosphate-buffered saline
PGR: Progesterone receptor; PR
PI3K: Phosphoinositide 3-kinase
PR: Precision-recall
P/S: Penicillin/streptomycin
RPK: Reads per kilobase
RPKM: Reads per kilobase per million mapped reads
rlog: Regularized logarithm (DESeq2)
SAM: Sequence Alignment/Map file format
SD: Standard deviation
SE: Super-enhancer
seq: Sequencing; high-throughput sequencing
SWI/SNF: SWItch/Sucrose Non-Fermentable; also known as BRG1-associated factors / BAF
TBS: Tris-buffered saline
TCGA: The Cancer Genome Atlas
TE: Typical enhancer
TF: Transcription factor
TMM: Trimmed mean of M values
TSS: Transcription start site
TTS: Transcription termination site
UCEC: Uterine Corpus Endometrial Carcinoma, in reference to the TCGA patient cohort
uPA: Urokinase plasminogen activator
UTR: Untranslated region
VARI: Van Andel Research Institute; VAI
CHAPTER 1 INTRODUCTION
Portions of the text from this chapter were previously published (Wilson et al. 2019; Wilson et al. 2020; Reske et al. 2020).
1.1 Uterine endometrium
The mammalian uterus is a female reproductive organ that serves as the site of fetal gestation. It is composed of three layers: 1) the inner endometrium, composed of glandular columnar epithelium and supporting connective tissue; 2) the middle myometrium, composed mostly of smooth muscle; and 3) the outer, serous perimetrium, which interfaces with the abdominopelvic cavity as peritoneum. In humans, the uterus connects to the fallopian tubes, which extend toward the ovaries. For pregnancy to occur, the ovary must release a mature oocyte that travels through the fallopian tube and is fertilized by a spermatozoon. The resulting embryo then adheres to the receptive endometrium for implantation and decidualization. The dynamic endometrium undergoes monthly proliferation, differentiation, and shedding throughout the menstrual cycle in anticipation of pregnancy in reproductive-age females (Gellersen and Brosens 2014; Mihm, Gangooly, and Muttukrishna 2011). Developmentally, the female uterus forms from the paramesonephric (Müllerian) ducts of the embryonic urogenital tract, along with the oviducts, cervix, and vagina (Mullen and Behringer 2014). Differentiation of the uterus is then specified by a spatially regulated network of homeobox (HOX) genes segmented along the anterior-posterior axis.
The adult, competent endometrium can be divided into two layers. The outer functional layer (functionalis) receives the embryo and is shed during menstruation. The inner basal layer (basalis) regenerates the functionalis after menstruation. Two main cellular compartments make up the endometrium: columnar epithelium and connective stroma. Columnar epithelial cells, sometimes ciliated, line the uterine lumen and form glands surrounded by stroma; these are referred to as luminal and glandular epithelium, respectively. Endometrial stroma is a heterogeneous population consisting of stromal fibroblasts, vascular endothelium, immune cells (lymphocytes and macrophages), and possibly smooth muscle (Wang et al. 2020). Endometrial stromal cells are known to regulate epithelial growth and differentiation through paracrine signaling (Arnold et al. 2001). The remarkable plasticity of the uterus throughout menstruation and pregnancy implies substantial regenerative capacity. Endometrial stem or progenitor cell populations likely exist, but evidence for them is still emerging and they remain incompletely identified (Gargett, Schwab, and Deane 2016; Teixeira, Rueda, and Pru 2008). Proposed identities of the regenerating cells include stromal progenitors, differentiated epithelial cells, and rare subsets with unique markers or functions. Steroid hormone receptor expression and responsiveness may be associated with stem-like properties (Chan and Gargett 2006). The Wnt target gene Axin2 was recently described as a marker of self-renewing and regenerating endometrial epithelial cells in the mouse (Syed et al. 2020). Evidence also supports contributions to endometrial regeneration from extrauterine stem cells, such as bone marrow-derived mesenchymal stem cells (Taylor 2004). Endometrial remodeling is largely governed by the steroid hormones estrogen and progesterone (Barbieri 2014).
The mitogenic effects of estradiol on the endometrial epithelium are not cell-autonomous. Instead, they are mediated by stromal estrogen receptor signaling (Cooke et al. 1997). Nonetheless, estrogen receptor activity in the endometrial epithelium is responsible for some proliferation-independent responses related to survival (Winuthayanon et al. 2010). Reports conflict on whether hormone receptor activity differs between the endometrial functionalis and basalis. However, estrogen and progesterone receptors may be more highly expressed in the basalis and maintained across the menstrual cycle (Leyendecker et al. 2002; Coppens et al. 1993), which supports regenerative theories.
1.1.1 Menstruation and pregnancy
Mammalian sexual reproduction involves complex maternal processes, such as implantation and placentation, that require coordination of numerous cell types in the ovaries and uterus. To facilitate pregnancy, hypothalamic and pituitary hormones and ovarian steroid hormones drive cyclic morphologic and physiologic changes in the ovary and endometrium, known as the menstrual cycle (Reed and Carr 2000). Cycling begins at menarche during puberty and ends at menopause. The roughly 28-day human menstrual cycle can be divided into two distinct phases separated by ovulation: the follicular or proliferative phase, and the luteal or secretory phase. After ovulation, ciliated tubal fimbriae capture the released mature oocyte complex, which then travels toward the uterus (Ezzati et al. 2014). Fertilization normally occurs in the ampulla of the fallopian tube, and the resulting embryo continues to the uterus for implantation. If implantation does not occur during the cycle, menstruation (menses) follows and the cycle begins again.
The endocrine, cellular, and physiologic features associated with menstrual cycle phases and implantation will be briefly reviewed with a focus on the endometrium (Reed and Carr 2000; Mihm, Gangooly, and Muttukrishna 2011; Gellersen and Brosens 2014; Ezzati et al. 2014; Barbieri 2014; Critchley et al. 2020; Ochoa-Bernal and Fazleabas 2020; Pijnenborg, Vercruysse, and Hanssens 2006). Neuroendocrine signaling between the hypothalamus, pituitary gland, and gonads is responsible for the physiologic responses in the ovary and uterus across the menstrual cycle that permit fertility (Barbieri 2014; Plant 2015). Hypothalamic neurons produce pulsatile gonadotropin-releasing hormone and are subject to both positive and negative estradiol feedback regulation. Gonadotropin-releasing hormone signals the pituitary to release the gonadotropins, follicle stimulating hormone and luteinizing hormone, which elicit specific, temporal activities in the ovaries. Follicle stimulating hormone promotes ovarian folliculogenesis and associated estradiol production, and a mid-cycle surge of luteinizing hormone triggers ovulation. Following menses, the follicular or proliferative phase continues until ovulation. In the ovary, this phase is characterized by ovarian follicle development and selection for ovulation, driven by pituitary follicle stimulating hormone and associated with rising estradiol (estrogen). Feedback from estradiol produced by the dominant follicle initiates a surge of pulsatile luteinizing hormone that ultimately leads to release of the mature oocyte complex, or ovulation, around day 14. Progesterone levels are mostly low during the follicular phase but begin to rise during luteinization, contributing to a mid-cycle surge of follicle stimulating hormone and, through negative feedback, to reduced estradiol levels. In the endometrium, estradiol has mitogenic effects on the glandular epithelium that induce some proliferation and pseudostratification, and the endometrial layer begins to thicken.
The luteal or secretory phase starts after ovulation. Remaining ovarian granulosa cells from the dominant follicle become luteinized and, along with theca and stroma, form the corpus luteum. The corpus luteum secretes progesterone to prepare the endometrium for implantation of the blastocyst. Estrogen is also produced, and continued luteinizing hormone secretion regulates the corpus luteum. The endometrium becomes thickest during the secretory phase, when marked changes accompany progesterone and estrogen stimulation and the differentiation of glandular epithelial and stromal cells. In the days following ovulation, endometrial glands become dilated and luminal secretions are observed, and stromal cells enlarge, divide, and become edematous (Reed and Carr 2000). Other architectural changes occur that assist implantation and pregnancy, including formation of spiral arteries for enhanced nutrient supply (Pijnenborg, Vercruysse, and Hanssens 2006). Decidualization of endometrial stromal cells begins in the mid-secretory phase regardless of implantation status. Around days 20-24 of the cycle, the window of implantation, or uterine receptivity, is observed, marked by maximal progesterone levels and the presence of microvilli on luminal endometrial epithelial cells. The process of embryo implantation can also be separated into phases: apposition, attachment, and invasion (Ochoa-Bernal and Fazleabas 2020). Apposition represents the first intercellular contact between the blastocyst and the receptive endometrium. Various receptor-mediated mechanisms govern the proper orientation and apposition of the implanting embryo. Adhesion molecules and signals then attach the blastocyst to the apical surface of the luminal endometrial epithelium.
The endometrial epithelium is then penetrated, and blastocyst trophoblasts differentiate and invade the endometrial stroma to establish chorionic villi and remodel vasculature for placentation (Knofler and Pollheimer 2013). Decidual reprogramming, or differentiation of endometrial stromal cells, is a complex, multi-compartmental process, intimately connected to endometrial leukocytes, that is essential for numerous physiological processes like implantation and embryo invasion, vascularization, maternal immune tolerance, and endometrial repair following menstruation (Gellersen and Brosens 2014). If implantation has not occurred by approximately 10 days after ovulation, the corpus luteum regresses, reducing estrogen and progesterone levels, and menses follows. In the endometrium, progesterone withdrawal notably leads to constriction of spiral arterioles, resulting in tissue ischemia and hypoxia. Various factors released from the endometrium then cause myometrial contractions, sloughing or desquamation of degraded endometrial tissue, and bleeding. Immune cells also participate in endometrial degradation during menses, and menstruation has been considered at least partially an inflammatory event (Evans and Salamonsen 2012). Repair and regeneration of the endometrial functionalis begin within two days of menstruation, while shedding is still ongoing, and involve multiple stem or progenitor cell types (Critchley et al. 2020; Reed and Carr 2000). Controlled and possibly plastic transitions between epithelial and mesenchymal phenotypes of endometrial tissue are hypothesized to be required for proper menstruation and recovery (Owusu-Akyaw et al. 2019). Menses typically lasts four to six days but is variable. Aside from those mentioned, other hormones, such as androgens and glucocorticoids, are involved in various aspects of the menstrual cycle.
The characteristic features of the endometrium across the menstrual cycle also permit histological dating, which is widely practiced in clinical and research settings (Noyes, Hertig, and Rock 2019; Murray et al. 2004). Importantly, most mammals do not menstruate but instead undergo the related estrous cycle, which involves resorption of the endometrium (Groothuis et al. 2007).
1.1.2 Pathologies
Uterine plasticity and the multiple rounds of tissue regression and regeneration that occur throughout the female reproductive years make the endometrium particularly prone to disease (Gargett, Nguyen, and Ye 2012; Syed et al. 2020; Teixeira, Rueda, and Pru 2008). As cyclical tissue breakdown, re-epithelialization, and stromal restoration occur, maintenance of proper cell identity is thought to be an important feature of a healthy endometrium (Gellersen and Brosens 2014). Progesterone resistance, leading to unopposed estrogen signaling, and loss or altered expression of steroid hormone receptors are common in endometrial pathologies (Marquardt et al. 2019; Li et al. 1996; Bulun et al. 2010; Mehasseb et al. 2011). Alterations to normal endometrial physiology can result in numerous conditions, including benign diseases such as endometrial hyperplasia (Montgomery, Daum, and Dunton 2004), adenomyosis (Maheshwari et al. 2012; Abbott 2017), and endometriosis (Chui, Wang, and Shih 2017; Zondervan et al. 2018; Zondervan, Becker, and Missmer 2020), as well as endometrial cancer (Morice et al. 2016; Sorosky 2012; Bell and Ellenson 2019) and endometriosis-associated ovarian cancer (Kurman and Shih Ie 2016). Endometriosis and the related disease adenomyosis are characterized by translocation of endometrial glands to anatomical regions outside of the endometrium. Obesity is associated with variation in menstrual cycling and is a risk factor for endometrial cancer and other obstetric disorders (Bull et al. 2019).
A major underlying theory is that adipose tissue engages in estrogen biosynthesis, so that obesity leads to increased circulating estrogen levels (Lorincz and Sukumar 2006). Pelvic pain, abnormal uterine bleeding, and subfertility are common symptoms of endometrial pathology. For management, surgical removal of the uterus by hysterectomy is a common procedure to alleviate symptoms or halt progression of uterine pathologies in women who elect it, with the obvious burden of lost fertility.
1.1.2.1 Endometriosis
Endometriosis affects roughly 10% of women and is a leading cause of infertility (Zondervan, Becker, and Missmer 2020). Affected women often experience pelvic pain, abnormal vaginal bleeding, pain during intercourse, constipation, difficulty urinating, and fatigue, but asymptomatic cases are also common. Endometriosis is commonly comorbid with other gynecologic pathologies, like the related disease adenomyosis and leiomyomas (uterine fibroids), as well as with gastrointestinal, urinary, immunological, and neurological disorders. Endometriosis is also associated with the development of certain cancers, including endometrial, ovarian clear cell, and ovarian endometrioid carcinomas (Kok et al. 2015; Pearce et al. 2012). The relationship between parity and endometriosis risk is less clear than in the case of cancer because endometriosis is itself associated with infertility. Hormonal therapies, like oral contraceptives, are widely used as the first line of treatment for pain. Hysterectomy is a common management option for women who elect it, but disease recurrence is known to occur, especially in advanced-stage cases (Rizk et al. 2014). Endometriosis is characterized by the growth and spread of abnormal endometrial tissue at sites outside of the eutopic endometrium (Bulun 2009; Giudice and Kao 2004; Zondervan, Becker, and Missmer 2020).
Ectopic lesions are frequently observed superficially in the peritoneal cavity and on the ovary (ovarian endometriomas) but can also be deep infiltrating and extrapelvic. When endometrial tissue is found within the myometrium, or when the endometrial-myometrial boundary is disrupted, the condition is referred to as adenomyosis (Abbott 2017). Endometriotic lesions have even been observed cutaneously on the eyelid (Sharghi et al. 2019). Lesions have various macroscopic presentations, including black, brown, red, or white coloring and cystic, adhesive, or fibrotic appearances. Laparoscopic inspection is the standard diagnostic procedure, with histopathology used for confirmation, because few non-invasive diagnostics such as imaging are available or effective (Hsu, Khachikyan, and Stratton 2010). The most prominent, long-standing, and widely supported theory of endometriosis etiology is retrograde menstruation (Sampson 1927). According to this theory, desquamated eutopic endometrial tissue can travel through the fallopian tubes into the peritoneal cavity during menstruation (Burney and Giudice 2012), creating an opportunity for chance seeding of ectopic lesions. Remarkably, retrograde menstruation occurs in the majority of women (D'Hooghe and Debrock 2002), but endometriosis affects only 1 in 10. Therefore, other factors must contribute to the propensity for ectopic implantation. Numerous pathophysiological factors contribute to endometriosis, including genetic predisposition, estrogen dependence, progesterone resistance, cellular survival, cellular attachment and invasion, and inflammation (Burney and Giudice 2012). There is some evidence that endometriosis and the related adenomyosis may originate specifically from the basal layer of the endometrium (Leyendecker et al. 2002). Notably, because mice do not menstruate, there is no physiological opportunity for retrograde menstruation of endometrial tissue.
Further, a key difference between mice and humans is the ovarian bursa, which encapsulates the ovary and oviduct in mice (Stewart and Behringer 2012) and thereby eliminates the opportunity for ectopic peritoneal exposure. Consequently, endometriosis does not occur naturally in mice, so experimental mouse models of endometriosis have relied on transplanting endometrial tissue by intraperitoneal injection (Bruner et al. 1997). Recent work from our group reinforced the idea of a presumed natural physical barrier to endometriosis development in mice: surgically incising the uterotubal junction to expose the oviducts to the peritoneum resulted in ectopic endometriosis lesions in the context of a genetically engineered mouse model of invasive endometrial hyperplasia (described in Chapter 2) (Wilson, Holladay, and Chandler 2020).
1.1.2.2 Adenomyosis
Adenomyosis is a uterine condition closely related to endometriosis, and their association and clinical distinction are controversial (Abbott 2017). Adenomyosis is generally defined by the invasion or presence of endometrial tissue within the myometrium and/or a disrupted endometrial-myometrial boundary. Disease severity is typically classified by the depth and extent of invasion. As with endometriosis, affected individuals commonly report pain, abnormal bleeding, and fertility issues. The epidemiological prevalence of adenomyosis is not well established and varies between studies (Yu et al. 2020; Vercellini et al. 1995), confounded by underdiagnosis and often asymptomatic presentation, but adenomyosis could be more common than endometriosis. Adenomyosis may present with other benign uterine conditions like endometriosis, endometrial hyperplasia, and leiomyomas (Ferenczy 1998). As in endometriosis, adenomyosis may also be associated with risk of endometrial and ovarian cancer development (Kok et al. 2015; Yeh et al. 2018).
Historically, adenomyosis was diagnosed only by histopathological examination after hysterectomy, but ultrasound and magnetic resonance imaging are now effective tools for its non-invasive detection (Szubert et al. 2021). Similar medical options are available for the management of endometriosis and adenomyosis, including symptomatic pain relief with NSAIDs and hormonal therapies. Hysterectomy is the gold-standard treatment and carries no chance of recurrence if fertility preservation is not desired. Endometrial ablation or resection are less invasive, generally effective procedures (Taran, Stewart, and Brucker 2013). Multiple theories of adenomyosis pathogenesis exist. Endometrial invagination into the myometrium is a widespread, historic theory (Ferenczy 1998). Abnormal invasion of endometrial tissue through the endomyometrial junction has also been hypothesized (Bergeron, Amant, and Ferenczy 2006). Another proposed mechanism is that the normal tissue injury and repair processes accompanying menstrual cycling promote infiltration of endometrium into myometrium (Leyendecker, Wildt, and Mall 2009). Others theorize that endometriosis and adenomyosis may originate from pluripotent Müllerian remnants or adult stem cells that become metaplastic, which could explain extrauterine lesions such as rectovaginal lesions (Nisolle and Donnez 1997; Garcia-Solares et al. 2018). Curiously, myometrial smooth muscle cells near adenomyotic foci show no abnormalities (Ferenczy 1998).
1.1.2.3 Hyperplasia
Endometrial hyperplasia presents in multiple forms and can be a precursor to carcinoma in some cases. Endometrial hyperplasia is defined by proliferation of endometrial glands with a decrease in stromal presence, and it is classified as either simple or complex and by the presence or absence of cytologic atypia (Montgomery, Daum, and Dunton 2004).
Glands tend to crowd and can also be dilated or cystic or have irregular outlines (Kurman, Kaminski, and Norris 1985). As with other endometrial pathologies, frequency is difficult to estimate, but hyperplasia may affect around 1% of individuals (Reed et al. 2009). Abnormal uterine bleeding is the most common symptom of endometrial hyperplasia (Montgomery, Daum, and Dunton 2004). As in endometrial cancer, unopposed estrogen treatment in postmenopausal women is strongly associated with risk of endometrial hyperplasia (Kurman et al. 2000). Complex atypical hyperplasia, marked by complex glandular crowding and cytologic atypia without invasion and also known as endometrioid intraepithelial neoplasia, is a precursor to endometrioid carcinoma (Bell and Ellenson 2019). A prominent long-term study observed that 30% of complex atypical hyperplasia cases progressed to carcinoma, compared with 1% of simple hyperplasia, 3% of complex hyperplasia, and 8% of simple atypical hyperplasia cases (Kurman, Kaminski, and Norris 1985). Cytologic atypia in hyperplasia is less common but is a key determinant of carcinogenic progression. One study found that 40% of women with atypical hyperplasia also had concomitant invasive endometrial cancer (Janicek and Rosenshein 1994). As an exception, papillary proliferation of the endometrium without cytologic atypia also occurs and can be associated with complex atypical hyperplasia or carcinoma (Ip et al. 2013). Prognostically, endometrial carcinoma with concomitant hyperplasia is less aggressive than carcinoma without hyperplasia (Gucer et al. 1998), as further reviewed in the following section. Nonetheless, hysterectomy is strongly recommended in cases of atypical hyperplasia (Montgomery, Daum, and Dunton 2004). Uncontrolled or deregulated proliferation is an etiological hallmark of hyperplasia. Because endometrial hyperplasia is related to endometrioid carcinoma, the pathogenesis of hyperplasia is likely tied to endometrioid carcinogenesis.
BCL2 is an endogenous inhibitor of apoptosis that is normally expressed in the proliferative endometrium, and oncogenic BCL2 expression has been observed in endometrial hyperplasia (Niemann et al. 1996). Shared chromosomal abnormalities have been detected in endometrial hyperplasia and endometrial cancer (Fabjani et al. 2002). Deleterious PTEN mutations are found in the majority of endometrioid adenocarcinomas and are also common in complex atypical hyperplasia (Levine et al. 1998; Mutter et al. 2000), further strengthening the connection to carcinogenesis.
1.1.2.4 Carcinoma
Endometrial cancer is the most prevalent gynecologic malignancy. In 2017, around 61,000 new diagnoses and 11,000 deaths occurred in the United States of America (Constantine et al. 2019). Endometrial cancer incidence is rising due to the increasing prevalence of obesity, which may account for up to 40% of cases (Sorosky 2012; Renehan et al. 2008; Constantine et al. 2019). Other established risk factors notably include unopposed estrogen from postmenopausal hormone replacement therapy (Grady et al. 1995). Conversely, parity and oral contraceptive use are known protective factors against endometrial cancer (Wu et al. 2015; Hinkula et al. 2002; Collaborative Group on Epidemiological Studies on Endometrial 2015). The modulation of estrogenic stimulation, which may promote continuous endometrial proliferation and associated pathologies like hyperplasia, is thought to underlie these risk and protective associations (Deligdisch 2000). Clinically, abnormal uterine bleeding is the most common symptom of malignancy. Endometrial carcinoma is heterogeneous and is often considered a set of distinct diseases.
Historically, a two-type framework was used to classify endometrial cancer: type I tumors are more common, usually well differentiated, superficially invasive of the myometrium, sensitive to hormonal treatments, usually found in obese women, and associated with favorable outcomes; type II tumors are less common, often poorly differentiated, tend to be deeply invasive and metastatic, are hormone resistant, and carry a poor prognosis (Bokhman 1983). Histopathologically, these classifications typically correspond to endometrioid (type I) and serous (type II) morphologies, though other established subtypes occur, including clear cell, mucinous, mixed, undifferentiated, and dedifferentiated tumors (Bell and Ellenson 2019). Uterine carcinosarcomas are another rare tumor type that contain both carcinoma and sarcoma features, with various debated theories of origin. Diagnosed endometrial cancer cases comprise roughly 80% endometrioid (uterine endometrioid adenocarcinoma), 5-10% serous (uterine serous carcinoma), <5% clear cell, <2% carcinosarcoma, and <1% other histological presentations (Bell and Ellenson 2019; Lu and Broaddus 2020). Both overlapping and distinct genetic signatures have been discerned for most endometrial cancer subtypes, and recent studies by consortia such as The Cancer Genome Atlas (TCGA) have suggested that a molecular or genomic subtyping framework provides more robust classification than histopathological features (see section 1.1.3) (Cancer Genome Atlas Research Network et al. 2013). The two most prevalent histopathological subtypes of endometrial cancer, endometrioid and serous, are reviewed further here because of their relevance to the present works. Uterine endometrioid adenocarcinomas are the most prevalent histological subtype and are associated with high survival rates, consistent with the favorable outcomes of complex atypical hyperplasia that progresses to carcinoma. Endometrioid tumors rarely exhibit extrauterine spread, unlike aggressive serous carcinomas.
Histologically, endometrioid tumors are marked by hyperproliferation of glandular epithelium with complex architecture and branching not normally observed in healthy endometrium (Bell and Ellenson 2019). Endometrioid tumors are graded on a scale from 1 to 3 based on the proportion of solid tumor component (Lu and Broaddus 2020), and most present as low-grade with typical nuclei. Numerous somatic gene mutations are associated with the development of endometrioid adenocarcinomas, such as those in PTEN, PIK3CA, ARID1A, and KRAS, which are reviewed in the next section. Uterine serous carcinoma, sometimes referred to as uterine papillary serous carcinoma, is the second most common histological subtype and accounts for the majority of endometrial cancer deaths (del Carmen, Birrer, and Schorge 2012). Unlike indolent endometrioid tumors, serous carcinomas are high-grade, aggressive, and frequently metastatic, marked by high-grade nuclei and atypical mitotic figures accompanying papillary growth patterns without smooth borders (Bell and Ellenson 2019). In contrast to proliferative endometrioid carcinomas, serous tumors are associated with endometrial atrophy (Ambros et al. 1995). Endometrial intraepithelial carcinoma (EIC) is a precursor lesion to uterine serous cancer (Hendrickson et al. 1982; Murali, Soslow, and Weigelt 2014). EIC presents with malignant cytologic atypia and hobnailing or sometimes papillary morphologies that resemble uterine serous carcinoma (Ambros et al. 1995; Soslow, Pirog, and Isacson 2000; Pathiraja, Dhar, and Haldar 2013). EIC precursor lesions are non-invasive but are frequently associated with peritoneal spread (Baergen et al. 2001). Etiologically, TP53 mutations are a hallmark feature of EIC and uterine serous cancer, as further described in the next section.
1.1.3 Disease genetics
Genetic association studies have identified some candidate loci associated with germline predisposition to, or heritability of, endometrial pathologies, but few have been functionally characterized, and effect sizes are small. For example, in a recent meta-analysis of endometrial cancer genome-wide association studies with over 10,000 disease cases, the most significant polymorphism identified (rs11263761, located in an intron of HNF1B) exhibited an allelic odds ratio (OR) of 1.15 (O'Mara et al. 2018). A similar number of significant loci and a similar extent of association have been found for endometriosis (Sapkota et al. 2017; Zondervan et al. 2016). Only 5% of endometrial cancers are estimated to be heritable (Gruber and Thompson 1996), though epidemiological evidence supports an increased risk of endometrial cancer for women with a first-degree family history of the disease (Win, Reece, and Ryan 2015). Lynch syndrome, a hereditary condition characterized by germline mutations in DNA mismatch repair genes, is the most common and well-established genetic link to endometrial cancer, with an estimated lifetime cumulative risk of 40-60% (Meyer, Broaddus, and Lu 2009). Endometriosis appears to be more heritable, with common genetic variation contributing an estimated 26% of risk (Lee et al. 2013). With the advent of whole-exome sequencing, the TCGA consortium profiled hundreds of human endometrial tumors to identify frequent driver mutations and establish a molecular framework for subtype classification (Cancer Genome Atlas Research Network et al. 2013). In the TCGA Uterine Corpus Endometrial Carcinoma (UCEC) data set, 373 human endometrial carcinomas (of both endometrioid and serous histology) were originally sampled and profiled for integrative (epi)genomic, transcriptomic, and proteomic characterization.
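For readers less familiar with association statistics, the allelic odds ratio cited above for rs11263761 can be written out explicitly. The symbols a, b, c, and d below are generic placeholder allele counts for illustration, not values from O'Mara et al. (2018):

```latex
\mathrm{OR}_{\text{allelic}}
  = \frac{a / b}{c / d}
  = \frac{a\,d}{b\,c},
\qquad
\begin{aligned}
a, b &: \text{risk and non-risk allele counts in cases} \\
c, d &: \text{risk and non-risk allele counts in controls}
\end{aligned}
```

Under this allelic model, an OR of 1.15 means that carrying a risk allele is associated with roughly 15% higher odds of disease, a modest effect consistent with the small effect sizes noted above.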
Four genomic subtypes of endometrial tumors were categorized by TCGA somatic exome sequencing analyses: 1) ultra-mutated endometrioid-like tumors with extremely high mutation burden associated with POLE mutations, which affect DNA repair, and high survival rates, referred to as POLE; 2) hypermutated endometrioid-like tumors with microsatellite instability, referred to as MSI; 3) microsatellite-stable endometrioid-like tumors with few somatic copy-number alterations, referred to as copy-number low or CN low; and 4) serous-like tumors with frequent genomic copy-number alterations associated with TP53 mutations, a low mutation rate, and poor survival outcomes, referred to as copy-number high or CN high. Aside from POLE and TP53, other somatic single-nucleotide gene mutations are associated with the genomic subtypes: PIK3CA mutations are frequent in all four subtypes; PTEN, PIK3R1, ARID1A, and CTCF mutations are common in all subtypes except CN high; CTNNB1 mutations are enriched in CN low tumors; and FBXW7 and PPP2R1A mutations are also frequent in CN high tumors (Fig. 1.1). Because the endometrioid histological subtype is associated with the POLE, MSI, and CN low genomic subtypes, while serous-like tumors are associated with the CN high subtype, one can infer which mutations are associated with each histological presentation. Grade 3 endometrioid tumors show an enrichment for the MSI subtype compared with lower-grade endometrioid tumors (Lu and Broaddus 2020). The results from TCGA largely aligned with the findings of other endometrial cancer exome sequencing and targeted mutation studies at the time (Kuhn et al. 2012; Le Gallo et al. 2012; Cheung et al. 2011; Liang et al. 2012; Le Gallo and Bell 2014).
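The four TCGA genomic subtypes described above were derived from integrated clustering of mutation burden, microsatellite status, and copy-number profiles, not from a fixed rule. Purely as a didactic sketch, the precedence implied by those descriptions can be expressed as a small decision function. The `Tumor` fields, the `SCNA_HIGH_THRESHOLD` cutoff, and the use of TP53 status as a proxy for the CN high subtype are hypothetical assumptions for illustration, not TCGA parameters:

```python
from dataclasses import dataclass

# Hypothetical cutoff on the fraction of the genome carrying copy-number
# alterations; TCGA did not define subtypes with a single threshold like this.
SCNA_HIGH_THRESHOLD = 0.2


@dataclass
class Tumor:
    """Hypothetical per-tumor summary used only for illustration."""
    pole_hotspot: bool              # pathogenic POLE exonuclease-domain mutation
    msi_high: bool                  # microsatellite instability
    fraction_genome_altered: float  # copy-number burden, 0.0 to 1.0
    tp53_mutant: bool


def assign_subtype(t: Tumor) -> str:
    """Return a TCGA-style label using an assumed precedence order."""
    if t.pole_hotspot:
        return "POLE"      # ultra-mutated, endometrioid-like
    if t.msi_high:
        return "MSI"       # hypermutated, endometrioid-like
    if t.tp53_mutant or t.fraction_genome_altered >= SCNA_HIGH_THRESHOLD:
        return "CN high"   # serous-like
    return "CN low"        # microsatellite stable, few copy-number changes


examples = [
    Tumor(pole_hotspot=True, msi_high=False, fraction_genome_altered=0.02, tp53_mutant=False),
    Tumor(pole_hotspot=False, msi_high=True, fraction_genome_altered=0.03, tp53_mutant=False),
    Tumor(pole_hotspot=False, msi_high=False, fraction_genome_altered=0.40, tp53_mutant=True),
    Tumor(pole_hotspot=False, msi_high=False, fraction_genome_altered=0.05, tp53_mutant=False),
]
print([assign_subtype(t) for t in examples])
# ['POLE', 'MSI', 'CN high', 'CN low']
```

Because a tumor can carry several of these features at once, the label returned depends on the order of the checks, which here is an assumption rather than a published classification rule.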
Another group further characterized the somatic copy-number alteration patterns of the TCGA-UCEC cohort along with an additional 141 endometrial tumors and found that frequent chromosome 1q32.1 amplifications were associated with poor prognosis, a feature hypothesized to result from genomic upregulation of MDM4 (Depreeuw et al. 2017).
Figure 1.1 Frequently mutated genes in endometrial cancer
Summary of somatic mutation rates (top), types of single-nucleotide variants (middle), and associated genomic subtypes (bottom) across 10 frequently mutated genes in endometrial cancer characterized by TCGA-UCEC (Cancer Genome Atlas Research Network et al. 2013), ranked by overall mutation rate. The genomic copy-number alteration high (CN high) subtype contains almost all of the serous-like tumors, while the other subtypes are mostly endometrioid-like tumors. Data are from the more recent TCGA Pan-Cancer Atlas data set of 507 profiled and subtyped endometrial carcinomas (Hoadley et al. 2018), extracted from cBioPortal (Gao et al. 2013).
TP53 is one of the most frequently mutated genes across human cancer (Lawrence et al. 2014). Before TCGA and exome sequencing studies, it had long been established that TP53 mutations are a dominant feature of uterine serous carcinoma (Tashiro et al. 1997). TP53 mutations have also been observed in the serous precursor lesion endometrial intraepithelial carcinoma (Soslow, Pirog, and Isacson 2000; Pathiraja, Dhar, and Haldar 2013). TP53 alterations in endometrial cancer often present as missense mutations marked by strong nuclear staining by immunohistochemistry, because mutant p53 proteins are often not effectively ubiquitinated and degraded (Lukashchuk and Vousden 2007), but deleterious or null mutations leading to an absence of staining are also observed (McCluggage, Soslow, and Gilks 2011; Tashiro et al. 1997; Kobel et al. 2019).
This contrasts with normal, wild-type p53 staining, which shows a heterogeneous mixture of mostly unstained cells with infrequent, scattered weak or strong nuclear staining. TP53 mutations have established ties to chromosomal instability and structural aberrations (Hanel and Moll 2012), consistent with the high copy-number alteration signature observed by TCGA. In endometrial cancer, TP53 mutant, serous-like tumors rarely exhibit ARID1A mutations despite the overall prevalence of ARID1A mutations, and an early study supported mutual exclusivity of these mutations in a cohort of 77 ovarian clear cell and uterine endometrioid carcinomas, in which all ARID1A mutant tumors contained wild-type TP53, and vice versa (Guan, Wang, and Shih Ie 2011). TP53-ARID1A mutation mutual exclusivity in endometrial cancer is explored in Chapter 4. Relatedly, mutations in CHD4, a catalytic subunit of the SWI/SNF-like chromatin remodeler complex NuRD studied in Chapter 7, are common in TP53 mutant uterine serous cancers (Le Gallo et al. 2012). Mutations leading to oncogenic phosphoinositide 3-kinase (PI3K) pathway activation are frequent in endometrial cancer (Naumann 2011), with 84% of patients displaying mutations in PIK3CA, PIK3R1, or PTEN (Cancer Genome Atlas Research Network et al. 2013). The PI3K pathway centers on the regulated phosphorylation of phosphatidylinositol lipids, which have widespread roles in signal transduction (Fruman et al. 2017). AKT phosphorylation is a common readout of PI3K pathway activation. PIK3CA (p110α) is the catalytic subunit of a class I PI3K that phosphorylates phosphatidylinositol-4,5-bisphosphate (PIP2) to phosphatidylinositol-3,4,5-trisphosphate (PIP3), and oncogenic PIK3CA mutations are among the most frequent somatic alterations in human cancers (Samuels et al. 2004; Lawrence et al. 2014).
Missense mutations in PIK3CA are common in complex atypical hyperplasia, and PIK3CA mutation has been identified as an early event in endometrial carcinogenesis (Berg et al. 2015). PIK3CA often acts as an oncogene through three major hotspot mutations (E542K, E545K, and H1047R), which affect interactions with the regulatory subunit PIK3R1 (p85α) (E542K, E545K) or kinase activity (H1047R) (Gkeka et al. 2014; Ligresti et al. 2009; Miled et al. 2007). Certain PIK3R1 mutations also stimulate PI3K activity, but their effects may depend on genetic context (Cheung et al. 2011). PTEN is a PIP3 phosphatase that reverses the activity of class I kinases, and PTEN therefore functions as a tumor suppressor that acquires deleterious or loss-of-function mutations in cancer (Fruman et al. 2017). PTEN alterations are observed at an even higher rate than PIK3CA alterations in endometrial cancer (Cancer Genome Atlas Research Network et al. 2013). Cancer-associated mutations, including those in PIK3CA, KRAS, CTNNB1, PPP2R1A, and ARID1A, are also observed in benign endometrial diseases like endometriosis (Anglesio et al. 2017; Lac, Verhoef, et al. 2019; Li et al. 2014). ARID1A mutations and loss of ARID1A expression have been observed in endometriosis (Samartzis et al. 2012; Kim et al. 2015). Recent evidence supports that some cancer-associated mutations, such as PIK3CA alterations, are also present in normal or eutopic endometrium (Suda et al. 2018; Lac, Nazeran, et al. 2019; Moore et al. 2020). Such benign genetic aberrations may function as early drivers of carcinogenesis that become neoplastic upon acquisition of a secondary tumor suppressor or oncogene mutation. Other genes, like ARID1A, are mutated only in disease states (Suda et al. 2018), suggesting that their mutation is involved in transformation.
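Mutual exclusivity, such as the TP53-ARID1A pattern described earlier in this section (Guan, Wang, and Shih Ie 2011), is often assessed by asking whether two mutations co-occur less often than expected by chance. A minimal sketch, assuming one boolean mutation call per tumor, tallies a 2x2 table and computes a one-sided Fisher's exact (hypergeometric) p-value with only the Python standard library. The function names and the ten-tumor toy cohort are hypothetical and do not reproduce data from any cited study:

```python
from math import comb
from typing import Sequence, Tuple


def cooccurrence_table(gene_a: Sequence[bool], gene_b: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Tally (both, only_a, only_b, neither) from per-tumor mutation calls."""
    if len(gene_a) != len(gene_b):
        raise ValueError("mutation vectors must cover the same tumors")
    pairs = list(zip(gene_a, gene_b))
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = len(pairs) - both - only_a - only_b
    return both, only_a, only_b, neither


def exclusivity_p_value(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test for depleted co-occurrence.

    Returns P(X <= both), where X is hypergeometric: the number of gene-A
    mutant tumors that are also gene-B mutant, given fixed row/column totals.
    """
    n = both + only_a + only_b + neither
    mut_a = both + only_a
    mut_b = both + only_b
    lowest = max(0, mut_a + mut_b - n)  # smallest overlap the totals allow
    tail = sum(
        comb(mut_b, k) * comb(n - mut_b, mut_a - k)
        for k in range(lowest, both + 1)
    )
    return tail / comb(n, mut_a)


# Hypothetical toy cohort of 10 tumors (not real data).
arid1a = [True] * 4 + [False] * 6
tp53 = [False] * 4 + [True] * 3 + [False] * 3
table = cooccurrence_table(arid1a, tp53)
print(table)                        # (0, 4, 3, 3)
print(exclusivity_p_value(*table))  # 0.16666666666666666
```

In this toy cohort, complete exclusivity still yields p = 1/6 (about 0.17), illustrating that perfect exclusivity in a small cohort is not by itself statistically significant; larger cohorts provide more power to detect such patterns.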
1.1.4 ARID1A mutant endometrial pathologies
Mutations in the SWI/SNF chromatin remodeling complex subunit ARID1A (AT-rich interactive domain 1A; BAF250A) were first reported in ovarian clear cell carcinoma and ovarian endometrioid carcinoma, two epithelial ovarian cancer subtypes associated with endometriosis (Jones et al. 2010; Wiegand et al. 2010). Inactivating ARID1A mutations were soon thereafter observed in numerous other endometrial pathologies (Guan et al. 2011; Liang et al. 2012; Mao and Shih Ie 2013; Wu, Wang, and Shih Ie 2014). ARID1A mutations are found in 40% of low-grade endometrial cancers (Guan et al. 2011), while ARID1A protein expression is lost in 26-29% of low-grade and 39% of high-grade endometrial cancers (Wiegand et al. 2011). ARID1A mutations have also been observed in endometrial clear cell tumors (Le Gallo et al. 2017). ARID1A loss is observed in focal areas of atypical endometrial hyperplasia (Mao et al. 2013), indicating clonal loss. Loss of ARID1A in complex atypical hyperplasia is associated with malignant transformation and concurrent endometrial cancer (Yen et al. 2018). ARID1A mutations are observed in 11% of endometriosis cases (ovarian endometrioma and deep-infiltrating endometriosis) and >30% of endometriosis-associated ovarian cancers, in addition to roughly 40% of endometrial carcinomas (Chui, Wang, and Shih 2017; Kurman and Shih Ie 2016; Anglesio et al. 2017; Suda et al. 2018; Samartzis et al. 2012; Cancer Genome Atlas Research Network et al. 2013). ARID1A expression is lower in the eutopic endometrium of patients with endometriosis, and ARID1A is required for embryo implantation in the uterus (Kim et al. 2015). The common cell of origin of most ARID1A mutant gynecologic pathologies is the endometrial epithelium, suggesting a cell-autonomous requirement for ARID1A in this compartment.
1.2 ARID1A and chromatin regulation
Eukaryotic genomes often consist of millions or billions of nucleotide bases, presenting a challenge both for compact nuclear storage and for precise activation or repression of genomic elements in response to biological signals. Chromatin is the collection of genomic DNA and all interacting proteins that regulate genomic activity and nuclear organization. The nucleosome, defined as ~147 bp of genomic DNA wrapped around a canonical octamer of H2A, H2B, H3, and H4 histone proteins, is the fundamental unit of eukaryotic chromatin and enables efficient packaging of genomic DNA. Three-dimensional chromatin packaging and nucleosome configuration can obscure DNA motifs from recognition by sequence-specific regulators such as transcription factors (Clapier and Cairns 2009). Numerous molecular mechanisms of chromatin regulation spatiotemporally orchestrate genomic activity in response to biological signals, including histone post-translational modifications, alterations in nucleosome composition or stability, and chromatin interactions with factors that recognize and modify these signals (see section 6.2 for further review). Chromatin remodeler complexes are a class of nuclear regulators that modify genomic DNA packaging and activity, principally through nucleosome positioning, structure, composition, and ejection. Chromatin remodelers serve essential roles in multiple nuclear processes, including transcription and transcriptional regulation, DNA replication, and DNA repair. Chromatin remodeler complexes also govern chromatin architecture through biochemical and functional interactions with other chromatin machinery. Chromatin remodelers are typically classified into four major families based on the catalytic ATPase subunit: SWI/SNF, CHD, INO80, and ISWI (Clapier and Cairns 2009). ARID1A is a core subunit of certain chromatin remodeler complexes belonging to the mammalian SWI/SNF family.
1.2.1 ARID1A and SWI/SNF chromatin remodeler complexes
The conserved, multi-subunit SWI/SNF complex was first identified through genetic screens in yeast, in which mutations in the encoding genes affected regulation of mating-type switching (Swi, switch) and sucrose fermentation (Snf, sucrose non-fermenting). The complex was later found to regulate transcription by altering chromatin structure in an ATPase-dependent manner (Hirschhorn et al. 1992; Laurent, Treich, and Carlson 1993). This led to the biochemical characterization of SWI/SNF as a nucleosome remodeler that translocates histone octamers (Whitehouse et al. 1999). Preliminary functional studies at the locus level indicated primarily activating roles of SWI/SNF (Peterson and Herskowitz 1992; Laurent, Treitel, and Carlson 1990), and early biochemical and genetic evidence supported this framework by suggesting that SWI/SNF antagonizes nucleosomal repression. SWI/SNF classically facilitates gene activation by repositioning nucleosomes in chromatin to permit binding of the transcriptional machinery (Clapier and Cairns 2009). SWI/SNF regulation has been shown to encompass many genes, with evidence suggesting approximately 5-10% of the genome (Holstege et al. 1998; Sudarsanam et al. 2000). However, an early genome-wide study in yeast showed that SWI/SNF represses more genes than it activates (Holstege et al. 1998). SWI/SNF and other chromatin remodeler complexes acquire functional specificity and genomic targeting through biochemical interactions. Aside from their signature catalytic subunits, all chromatin remodeler complexes contain subunit domains that separately bind histones and histone post-translational modifications, DNA, and other chromatin regulators like transcription factors (Clapier 2021). SWI/SNF is typically composed of 9 to 12 subunits, though fewer than half of these are essential for in vitro nucleosome remodeling (Wang, Xue, et al. 1996; Roberts and Orkin 2004).
Multiple mammalian SWI/SNF complexes with different subunit compositions, referred to as BRG1/BRM-associated factor (BAF) complexes, have been identified, including canonical BAF, polybromo-associated BAF, non-canonical BAF, and neuron-specific BAF (further reviewed in section 1.2.2). All mammalian SWI/SNF complexes are composed of core/structural subunits (e.g., SMARCC1, SMARCD1, and the ARID scaffolds ARID1A, ARID1B, and ARID2), core ATPase catalytic subunits (e.g., SMARCA4/BRG1, SMARCA2/BRM), and accessory subunits thought to confer further functional specificity. Core subunit SMARCB1 (BAF47) directly binds the nucleosome acidic patch to augment remodeling activity (Valencia et al. 2019). The SWI/SNF catalytic subunit BRG1 (Brahma-related gene 1, SMARCA4; ortholog of yeast Swi2/Snf2) contains a bromodomain that binds acetylated histone tail residues and is required for some context-dependent remodeler activities (Hassan et al. 2002; Hassan, Awad, and Prochasson 2006; Shen et al. 2007). SWI/SNF can be stabilized at promoter nucleosomes through activator recruitment (Hassan, Neely, and Workman 2001), and multiple SWI/SNF subunits have been observed to interact directly with activators (Neely et al. 2002). Some chromatin remodelers can also interact with histone deacetylases to repress transcription, as in the NuRD complex (Zhang et al. 1998). Alternative mechanisms of remodeler-mediated repression include stable positioning of nucleosomes, as has been demonstrated for SWI/SNF at the HIV long terminal repeat (Rafati et al. 2011). Chromatin remodelers are also known to antagonize or modulate the activities of other chromatin regulators; one example is SWI/SNF antagonism of H3K27me3-mediated silencing by polycomb repressive complexes (Ho et al. 2011; Stanton et al. 2017).
ARID1A (BAF250A) is the largest SWI/SNF subunit and serves as an essential core scaffold for certain SWI/SNF complexes, physically interacting with multiple subunits (Mashtalir et al. 2018; He et al. 2020). In SWI/SNF complexes, ARID1A is mutually exclusive with its homologs ARID1B (BAF250B) and ARID2 (BAF200) (Wang, Cote, et al. 1996; Wang et al. 2004), and these subunits are thought to confer complex specificity through accessory subunit composition and external protein-protein interactions (Mashtalir et al. 2018). Structurally, ARID1A contains a nominal ARID (AT-rich interactive domain) DNA-binding domain, after which it is named, though biochemical evidence indicates that it binds DNA in a sequence-independent manner (Dallas et al. 2000). Nonetheless, DNA-binding activity is essential to ARID1A function: DNA-binding defective mutant ARID1A forms stable SWI/SNF complexes with functional core remodeler catalytic activity, yet this mutation is embryonic lethal in mice (Chandler et al. 2013). ARID1A also contains a poorly characterized C-terminal domain of unknown function that is predicted to facilitate protein-protein interactions (Wu and Roberts 2013; Sandhya et al. 2018). The ARID1A C-terminal domain includes conserved, nuclear receptor-binding LXXLL motifs (Heery et al. 1997), which are thought to enable the experimentally reported interactions with multiple steroid hormone receptors, including estrogen receptor (ER/ESR1), progesterone receptor (PGR), and glucocorticoid receptor (Nie et al. 2000; Nagarajan et al. 2020; Kim et al. 2021). Structural modeling analyses predict that the C-terminal domain forms armadillo (ARM) repeats that interact with both the ARID domain and other SWI/SNF subunit proteins (Sandhya et al. 2018).
1.2.2 Context-specific roles of ARID1A and SWI/SNF
SWI/SNF complex composition is heterogeneous and cell type-dependent (Wang, Cote, et al. 1996).
The observation of tissue-specific propensities for certain SWI/SNF subunit mutations further suggests that SWI/SNF subunit architecture, complex distribution, and functions are context-dependent. Protein subunit architecture is proposed to define complex specificity through specialized cofactor interactions. Overall, the discrete contributions of accessory subunits are poorly described, but they have been shown to contribute to multiple remodeler complex functions. The increasing biological complexity observed across eukaryotic evolution has been accompanied by the acquisition of more specialized genetic regulatory machinery. Indeed, SWI/SNF subunit composition has progressively changed and increased in size and complexity from yeast to mammals (Kadoch and Crabtree 2015). Multiple architectural configurations of SWI/SNF (BAF) complexes can be present and functional within the same mammalian cell, and their respective biochemical activities depend upon numerous biological factors, including the local chromatin, tissue type, and environmental or developmental cues. The first two identified and best-studied mammalian SWI/SNF complex configurations are canonical BAF (cBAF; SWI/SNF-A) and polybromo-associated BAF (PBAF; SWI/SNF-B) (Wang, Cote, et al. 1996; Mashtalir et al. 2018; Kadoch and Crabtree 2015); both contain one of two mutually exclusive catalytic ATPases, BRG1 (SMARCA4; BAF190A) or BRM (SMARCA2; BAF190B). cBAF is characterized by the presence of either ARID1A (BAF250A) or ARID1B (BAF250B), while PBAF contains ARID2 (BAF200) and PBRM1 (BAF180). Other established accessory subunits are complex-specific, such as BRD7 in PBAF (Kaeser et al. 2008). Surprisingly, the catalytic subunit BRG1 is developmentally required while BRM is not, indicating differences in regulatory activity between the alternative ATPases (Bultman et al. 2000; Reyes et al. 1998).
Gene expression analyses have found that certain SWI/SNF subunit-encoding genes are differentially expressed across human cell types (Ho, Lloyd, and Bao 2019), and studies in various tissue and developmental contexts have identified other, distinct mammalian SWI/SNF complexes. A complex identified in embryonic stem cells (esBAF) contains BRD9 and GLTSCR1 or GLTSCR1L (Ho et al. 2009; Alpsoy and Dykhuizen 2018; Michel et al. 2018), while the related non-canonical BAF (ncBAF) complex also contains these subunits but lacks BAF47, BAF57, and ARID1A (Gatchalian et al. 2018). A neural progenitor BAF complex (npBAF) contains BAF45a, BAF53a, and SS18, which are replaced by BAF45b, BAF53b, and CREST in the mature neuron-specific BAF complex (nBAF) (Olave et al. 2002; Lessard et al. 2007; Staahl et al. 2013). Other tissue-specific or developmental SWI/SNF complex compositions have been recently identified (Bachmann et al. 2016; Hota et al. 2019). The current literature likely does not encompass all context-dependent mammalian SWI/SNF complexes present in nature. Functionally, SWI/SNF chromatin remodeling activities have been shown to promote cell type-specific chromatin and gene expression programs in numerous contexts. SWI/SNF promotes activity at lineage-specific enhancers, such that SWI/SNF deletion leads to downregulation of developmental target genes (Alver et al. 2017). In bone marrow-derived mesenchymal stem cells, BRM depletion shifts lineage determination toward osteogenic rather than adipogenic commitment (Nguyen et al. 2015). In differentiating preadipocytes, cBAF is required for adipogenesis while PBAF is not, and BRG1 is essential to adipogenic enhancer activation and induction of associated gene expression (Park et al. 2021). BAF60c is specifically expressed in the developing heart (Lickert et al. 2004), and dynamic SWI/SNF complexes govern temporal chromatin accessibility and gene expression during cardiogenic differentiation (Hota et al. 2019). Numerous studies support that SWI/SNF controls the specification and maintenance of various neural fates (Sokpor et al. 2017). The context-specific functions of mammalian SWI/SNF are widespread and continue to be reported.
ARID1A mutations are most prevalent in cancers originating from the endometrial epithelium (further reviewed in section 1.2.3), suggesting that ARID1A-containing SWI/SNF complexes have unique or critical roles in this tissue. ARID1A mutations also occur in other hormone-responsive tissues like ovary and breast, suggesting a possible relationship with steroid hormone signaling. One of the original reports describing human ARID1A observed that it enhances transcriptional activation by glucocorticoid receptor, estrogen receptor, and progesterone receptor (Inoue et al. 2002). In breast cancer, ARID1A and SWI/SNF physically interact with estrogen receptor (ER) and co-regulate chromatin at ER target genes (Nagarajan et al. 2020). ARID1A was also shown to interact with progesterone receptor (PGR) in an endometrial cancer cell line, and mice with uterine or endometrial epithelial ARID1A knockout are infertile, with defects in decidualization and endometrial gland development (Kim et al. 2015; Marquardt et al. 2021). Because the endometrial plasticity accompanying the menstrual cycle and implantation is driven by steroid hormone signals, it is reasonable to hypothesize that ARID1A helps orchestrate these hormone-dependent transcriptional programs, and that ARID1A mutant pathology arises in part from spatiotemporal disruption of normal endometrial processes.
1.2.3 ARID1A alterations in disease
Mutations in the SWI/SNF chromatin remodeling complex are widespread in various cancers, benign diseases, and developmental disorders.
SWI/SNF mutations, most frequently in ARID1B, are a common cause of Coffin-Siris syndrome, which is marked by developmental and intellectual disabilities (Wieczorek et al. 2013). SWI/SNF is mutated in >20% of all human cancers, and mutations are detected across nearly all cancer types (Wang, Haswell, and Roberts 2014; Kadoch et al. 2013; Kadoch and Crabtree 2015). In cancer, tissue-specific propensities for mutations in certain subunits are evident (Helming, Wang, and Roberts 2014; Kadoch and Crabtree 2015). ARID1A is the most frequently mutated SWI/SNF subunit in cancer (Mittal and Roberts 2020) (Fig. 1.2a), and ARID1A is particularly prone to mutation in gynecologic cancers (Kurman and Shih Ie 2016; Jones et al. 2010; Wiegand et al. 2010; Guan et al. 2011; Wiegand et al. 2011). In fact, ARID1A is one of the most frequently mutated genes in human cancer, alongside TP53 and PIK3CA (Lawrence et al. 2014). Across human cancers, ARID1A mutations are most prevalent in uterine endometrial cancer, occurring in over 40% of cases (Cancer Genome Atlas Research Network et al. 2013; Wu, Wang, and Shih Ie 2014) (Fig. 1.2b). Outside of cancer, ARID1A mutations and loss of expression are also observed in the related benign disease endometriosis, which is characterized by ectopic spread of the endometrium (reviewed in section 1.1.2.2) (Samartzis et al. 2012; Anglesio et al. 2017; Zondervan, Becker, and Missmer 2020).
Figure 1.2 SWI/SNF mutation rates across human cancer
a, Mutation frequencies for 21 canonical SWI/SNF complex subunits across 10,279 tumor samples profiled by The Cancer Genome Atlas (TCGA). The x-axis displays the total number of tumor samples containing a mutation in the given subunit-encoding gene. The y-axis displays the number of TCGA cancer types (out of 33 total) with a mutation rate >2% for that gene. TCGA data were retrieved from the NIH NCI Genomic Data Commons (GDC). b, ARID1A mutation rates across 33 cancer types profiled by TCGA, from standardized MC3 mutation data (Ellrott et al. 2018).
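The two axes of Figure 1.2a can, in principle, be derived from per-sample mutation calls. The sketch below is a minimal illustration only: it assumes a hypothetical input of (sample_id, cancer_type, gene) records plus a dictionary giving the number of profiled samples per cancer type, and the function name `mutation_summary` is invented here. It is not the pipeline used to build the figure, and the toy records are made up for demonstration.

```python
from collections import defaultdict


def mutation_summary(mutations, samples_per_type, min_rate=0.02):
    """Summarize mutation calls per gene (hypothetical input format).

    mutations: iterable of (sample_id, cancer_type, gene) tuples.
    samples_per_type: dict mapping cancer_type -> number of profiled samples.
    Returns {gene: (n_mutated_samples, n_types_with_rate_above_min_rate)}.
    """
    # Sets ensure a sample with several mutations in one gene is counted once.
    mutated = defaultdict(set)
    mutated_by_type = defaultdict(lambda: defaultdict(set))
    for sample_id, cancer_type, gene in mutations:
        mutated[gene].add(sample_id)
        mutated_by_type[gene][cancer_type].add(sample_id)

    summary = {}
    for gene, samples in mutated.items():
        # y-axis: cancer types whose per-type mutation rate exceeds min_rate
        n_types = sum(
            1
            for cancer_type, type_samples in mutated_by_type[gene].items()
            if len(type_samples) / samples_per_type[cancer_type] > min_rate
        )
        # x-axis: total mutated samples across all cancer types
        summary[gene] = (len(samples), n_types)
    return summary


if __name__ == "__main__":
    # Toy records for illustration only; these are not TCGA data.
    toy_counts = {"UCEC": 100, "BRCA": 100}
    toy_calls = [
        ("s1", "UCEC", "ARID1A"), ("s1", "UCEC", "ARID1A"),
        ("s2", "UCEC", "ARID1A"), ("s3", "UCEC", "ARID1A"),
        ("b1", "BRCA", "ARID1A"),
        ("b2", "BRCA", "SMARCA4"), ("b3", "BRCA", "SMARCA4"),
    ]
    print(mutation_summary(toy_calls, toy_counts))
```

With these toy records, ARID1A yields (4, 1): four unique mutated samples, and only the 3% rate in UCEC exceeds the 2% cutoff. SMARCA4 yields (2, 0) because its 2% rate in BRCA does not strictly exceed the threshold, mirroring the ">2%" wording of the caption.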
Among highly mutated tumor suppressor genes, ARID1A is distinctive because ARID1A knockout mice are embryonic lethal in the heterozygous state (Gao et al. 2008), whereas numerous other tumor suppressor genes (e.g., TP53) are non-essential for mouse development (Donehower 1996). ARID1A-null embryos die at embryonic day 6.5 (E6.5) (Gao et al. 2008), while DNA-binding defective ARID1AV1068G mutant embryos die around E10 (Chandler et al. 2013). ARID1A alterations in cancer typically arise as nonsense or frameshift mutations that truncate the open reading frame and result in loss of stable expression (Jones et al. 2010; Guan et al. 2012), a characteristic of many tumor suppressors. In fact, diminished ARID1A protein expression generally serves as a surrogate for inactivating mutations (Wiegand et al. 2010; Maeda et al. 2010). Yet many ARID1A mutations in cancer do not result in biallelic loss, which suggests that ARID1A is haploinsufficient (Wu and Roberts 2013). These characteristics are all hallmarks of a canonical tumor suppressor, although the effects of ARID1A alterations are likely context-dependent, as both tumor suppressive and oncogenic functions have been reported in the liver (Sun et al. 2017).
1.2.4 ARID1A mutant disease mechanisms
While deleterious ARID1A mutations are common in cancer, the cellular and molecular mechanisms underlying ARID1A tumor suppression are incompletely characterized. Foremost, loss of ARID1A almost certainly affects the genomic distribution and activity of nuclear SWI/SNF complexes, leading to numerous outcomes of misregulated chromatin (Wu and Roberts 2013). Certain ARID1A mutant cancers depend upon the activities of ARID1B-containing SWI/SNF complexes, e.g., to promote enhancer accessibility (Helming, Wang, and Roberts 2014; Kelso et al. 2017).
The methyltransferase EZH2 catalyzes H3K27me3, a mark associated with chromatin silencing that is antagonized by SWI/SNF as reviewed above, and EZH2 inhibitors are selectively effective in ARID1A mutant ovarian cancer cell lines (Bitler et al. 2015). ARID1A mutant pathogenesis is also suspected to derive from the disruption of protein-protein interactions and functions specific to ARID1A-containing SWI/SNF complexes, which cannot be compensated for by ARID1B- or ARID2-containing complexes. An early report observed increased ovarian epithelial cell proliferation upon ARID1A silencing in vitro and, conversely, slowed growth of cancer cell lines upon ectopic ARID1A re-expression (Guan, Wang, and Shih Ie 2011). The same study also observed that ARID1A directly interacts with the tumor suppressor TP53 (p53) and activates p53 target gene expression, including CDKN1A (p21) and SMAD3. ARID1A has also been observed to regulate cell cycle checkpoint genes (Nagl et al. 2005). Evidence in the liver suggests that ARID1A mutations may function differently in tumor initiation versus progression of established tumors (Sun et al. 2017). Other, debated evidence suggests that ARID1A mutations result in genomic instability by disrupting SWI/SNF-mediated DNA repair mechanisms, which may contribute to tumorigenesis (Park et al. 2006; Allo et al. 2014). However, whether ARID1A mutation is a cause or a consequence of genomic instability remains unresolved and may be context-dependent (Wu, Wang, and Shih Ie 2014). ARID1A mutations likely drive carcinogenesis through combinatorial disruption of these processes, though the molecular mechanisms are poorly understood. As a chromatin regulator, ARID1A has numerous reported nuclear functions, but this picture remains incomplete. Across a subset of the literature spanning various cellular contexts, and in addition to the works reviewed above, evidence suggests that ARID1A maintains accessibility at enhancers (Kelso et al. 2017) and enhancer-mediated gene expression (Mathur et al. 2017; Sen et al. 2019), regulates transcriptional elongation (Trizzino et al. 2018), assists in topoisomerase activity (Dykhuizen et al. 2013), and contributes to H2B monoubiquitination at lysine 120 (Li et al. 2010). ARID1A probably governs multiple nuclear processes, which may be cell type- and context-specific. This dissertation aims to elucidate how ARID1A regulates chromatin and associated gene expression in the endometrial epithelium to promote healthy uterine function, and to determine the direct molecular, cellular, and physiological consequences of ARID1A genetic disruption or nuclear depletion. These efforts are described in the following chapters. In Chapter 2, a genetically engineered mouse model of invasive endometrial hyperplasia driven by concomitant ARID1A loss and oncogenic PIK3CA activation is established and characterized through genome-wide functional assays, including chromatin accessibility and gene expression profiling. In Chapter 3, genome-wide sequencing technologies and their technical challenges are reviewed alongside a thorough examination of the presented in vivo ATAC-seq chromatin accessibility data and their possible biological interpretations. In Chapter 4, a genetic relationship between ARID1A and TP53 mutations is explored through human tumors and mouse models. In Chapter 5, the functional and disease roles of the ARID1A and BRG1 subunits of SWI/SNF in the endometrium are contrasted. In Chapter 6, a chromatin state map accompanying ARID1A loss is constructed by integrating multiple epigenomic assays, revealing that ARID1A suppresses hyperactivity of super-enhancers involved in cell identity processes. In Chapter 7, a physical and genomic interaction between ARID1A and another SWI/SNF-like remodeler complex containing co-repressors is described and functionally characterized.
Finally, in Chapter 8, the major conclusions of these studies are reviewed and discussed, and future experimental directions are outlined with preliminary data. These studies elucidate ARID1A tumor suppressive mechanisms in the endometrium through genetic models and genome-wide measurements of chromatin regulation in healthy and disease states. These works further provide rationale for preclinical development of at least one new therapeutic strategy for ARID1A mutant endometrial diseases. The research presented establishes that ARID1A governs endometrial epithelial cell identity through chromatin regulatory mechanisms that are suspected to serve parallel roles in both pathological and normal physiological processes of the endometrium.
CHAPTER 2
ARID1A AND PI3-KINASE PATHWAY MUTATIONS IN THE ENDOMETRIUM DRIVE EPITHELIAL TRANSDIFFERENTIATION AND COLLECTIVE INVASION
A modified version of this chapter was previously published (Wilson et al. 2019): Mike R. Wilson*, Jake J. Reske*, Jeanne Holladay, Genna E. Wilber, Mary Rhodes, Julie Koeman, Marie Adams, Ben Johnson, Ren-Wei Su, Niraj R. Joshi, Amanda L. Patterson, Hui Shen, Richard E. Leach, Jose M. Teixeira, Asgerally T. Fazleabas, and Ronald L. Chandler. 2019. ARID1A and PI3-Kinase pathway mutations in the endometrium drive epithelial transdifferentiation and collective invasion. Nature Communications 10: 3554.
*These authors contributed equally.
2.1 Abstract
ARID1A and PI3-Kinase (PI3K) pathway alterations are common in neoplasms originating from the uterine endometrium. Here we show that monoallelic loss of ARID1A in the mouse endometrial epithelium is sufficient for vaginal bleeding when combined with PI3K activation. Sorted mutant epithelial cells display gene expression and promoter chromatin signatures associated with epithelial-to-mesenchymal transition (EMT).
We further show that ARID1A is bound to promoters with open chromatin, but ARID1A loss leads to increased promoter chromatin accessibility and expression of EMT genes. PI3K activation partially rescues the mesenchymal phenotypes driven by ARID1A loss through antagonism of ARID1A target gene expression, resulting in partial EMT and invasion. We propose that ARID1A normally maintains endometrial epithelial cell identity by repressing mesenchymal cell fates, and that coexistent ARID1A and PI3K mutations promote epithelial transdifferentiation and collective invasion. Broadly, our findings support a role for collective epithelial invasion in the spread of abnormal endometrial tissue.
2.2 Introduction
Deleterious ARID1A mutations are prevalent in pathologies of the endometrial epithelium, supporting a tumor suppressor role for ARID1A-containing SWI/SNF complexes in endometrial neoplasms. However, the hypothesis that ARID1A functions as an endometrial tumor suppressor remains to be tested. Genetically engineered mouse models (GEMMs) offer the opportunity to study gynecologic pathologies in vivo (Perets et al. 2013; Wu et al. 2016; Zhai et al. 2017; Zhang, Fukumoto, and Magno 2018). ARID1A loss in the mouse ovarian surface epithelium drives tumorigenesis when paired with PTEN loss or the PIK3CAH1047R mutation (Guan et al. 2014; Chandler et al. 2015). As reviewed in section 1.1.3, activating PI3-kinase pathway mutations are frequent in endometrial cancer. Specifically, PIK3CA mutations commonly co-occur with ARID1A loss in endometrial cancer (Takeda et al. 2016). Therefore, in this study, we use conditional knockout alleles and the Cre-LoxP conditional allele targeting system (Kim et al. 2018) to target ARID1A mutations, with and without PIK3CAH1047R, directly to the endometrial epithelium to test tumorigenic competence.
2.3 Results
2.3.1 ARID1A is haploinsufficient in the endometrial epithelium
ARID1A has been hypothesized to function as a haploinsufficient tumor suppressor (Wu and Roberts 2013). To explore this further, we used publicly available Uterine Corpus Endometrial Carcinoma (UCEC) mutation and copy-number datasets from The Cancer Genome Atlas (TCGA). Most endometrioid endometrial cancer patients with ARID1A mutations (either single or multiple hits) show no detectable copy-number alterations at the ARID1A locus, with 33% of all patients having a single nonsense mutation and normal ploidy at ARID1A (Fig. 2.1a). As reviewed in section 1.1.3, PI3-kinase pathway mutations are common in endometrial cancer. Co-existing PIK3CA mutation was significantly associated with ARID1A mutation, and a majority (61%) of heterozygous ARID1A tumors also have PIK3CA alterations (Fig. 2.1a). These data indicate that 20% of endometrioid endometrial cancer patients are genetically heterozygous for ARID1A mutations and carry PIK3CA alterations. To induce Cre expression in the mouse endometrial epithelium for conditional allele targeting, we used LtfCre (Tg(Ltf-iCre)14Mmul). LtfCre contains a codon-improved Cre recombinase (iCre) knock-in at the endogenous lactoferrin/lactotransferrin (Ltf) promoter (Daikoku et al. 2014; Shimshek et al. 2002). Ltf expression is estrogen responsive and naturally begins at sexual maturity in female mice, becoming fully active by 60 days of age, and uterine expression is specific to the endometrial epithelium (Daikoku et al. 2014) (Fig. 2.1b). To investigate the consequences of ARID1A loss in the endometrial epithelium, we bred LtfCre0/+ mice to mice with an Arid1afl allele, permitting conditional knockout of ARID1A upon Cre expression (Chandler et al. 2015) (Fig. 2.1c). PCR genotyping confirmed the presence of each allele (Fig. 2.1d). We observed no gross phenotypes in LtfCre0/+; Arid1afl/fl mice (Fig. A.1a).
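The 20% estimate in the TCGA analysis above is consistent with multiplying the two reported fractions. The short check below is a sketch that assumes, as an interpretation rather than a statement from the analysis, that the 33% of patients with a single ARID1A nonsense mutation and normal ploidy constitute the "heterozygous ARID1A" group, of which 61% carry PIK3CA alterations. The helper name `joint_fraction` is hypothetical.

```python
def joint_fraction(frac_group, frac_within_group):
    """Fraction of all patients who are in a group AND carry a second alteration."""
    return frac_group * frac_within_group


# Fractions quoted in the text; equating the 33% group with the
# heterozygous ARID1A group is an assumption made for this check.
frac_arid1a_het = 0.33        # single ARID1A nonsense mutation, normal ploidy
frac_pik3ca_given_het = 0.61  # PIK3CA-altered among heterozygous ARID1A tumors

estimate = joint_fraction(frac_arid1a_het, frac_pik3ca_given_het)
print(f"{estimate:.1%}")  # prints 20.1%
```

Under this assumption, 0.33 × 0.61 ≈ 0.20, matching the reported proportion of patients who are both heterozygous for ARID1A mutation and PIK3CA-altered.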
As reviewed in Chapter 1, ARID1A mutations are suspected to require additional oncogenic mutations before tumorigenesis is observed. Previously, (Gt)R26Pik3ca*H1047R was found to be a potent driver of epithelial ovarian tumors when combined with Arid1afl/fl (Chandler et al. 2015). (Gt)R26Pik3ca*H1047R provides conditional expression of the oncogenic PIK3CAH1047R mutation (Adams et al. 2011) (Fig. 2.1c). No gross phenotypes were observed in LtfCre0/+; (Gt)R26Pik3ca*H1047R mice (Fig. A.1a), as previously described in the endometrial epithelium (Joshi et al. 2015). Therefore, we bred LtfCre mice with mice harboring (Gt)R26Pik3ca*H1047R, Arid1afl, and Arid1aV1068G (a DNA-binding domain defective ARID1A mutant) (Chandler et al. 2013) (Fig. 2.1c) to develop an allelic series with increasing ARID1A mutational burden in the endometrial epithelium. Abnormal vaginal bleeding is a prominent symptom of endometrial dysfunction in humans. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice were sacrificed at a median age of 14 weeks due to vaginal bleeding and uterine tumors (Fig. 2.1e, g). Surprisingly, homozygous ARID1A loss was not required for vaginal bleeding, as LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ mice developed endometrial lesions and vaginal bleeding (Fig. 2.1e, g). For both LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G mice, median uterus weight and survival were not significantly different from those of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (Fig. 2.1f-g). ARID1A loss and PI3K pathway activation (via phospho-S6 ribosomal protein, P-S6, expression) were assessed by immunohistochemistry, while Cytokeratin 8 (KRT8) labeled the endometrial epithelium (Fig. 2.1h and A.1b).
LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G, and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice showed evidence of widespread atypical endometrial hyperplasia and nuclear atypia, including glandular crowding and abnormal cytologic features (Fig. 2.1h and A.1b). Endometrial tumors were moderately to poorly differentiated, with areas of solid and cribriform architecture (Fig. 2.1h). In one mouse, we observed visible lung metastasis (Fig. A.1c), a site of metastasis in some endometrial cancer patients. In the LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelium, we observed downregulation of estrogen receptor-α (ESR1; ER) and loss of the progesterone receptor (PGR; PR), suggesting changes to steroid hormone regulation (Fig. A.2a). Impaired steroid hormone regulation is associated with poor prognosis in endometrial cancer (Zhang et al. 2015).
Figure 2.1 Development of genetic mouse models representing an allelic series of ARID1A mutations in the endometrial epithelium
a, UCEC endometrioid patient ARID1A alteration status and co-incidence with PIK3CA mutation, taken from the TCGA-UCEC dataset. b, LacZ expression (blue) is specific to the endometrial epithelium. Sections were counterstained with nuclear fast red (scale bar = 400 μm). c, Diagram of mutant alleles used in this study. d, PCR genotyping results to detect LtfCre0/+, (Gt)R26Pik3ca*H1047R, Arid1afl, and Arid1aV1068G. e, Representative gross images of mice at time of sacrifice due to vaginal bleeding. White arrows indicate tumors. Uterine tumor size varies within genotype at time of sacrifice. f, Weight of semi-dry mouse uterus by genotype. Control (n = 5), LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (n = 14), LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (n = 7), LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 6) (mean ± s.d.; *p < 0.05, two-tailed unpaired t-test).
g, Survival of LtfCre0/+ mice, based on time until vaginal bleeding. (Gt)R26Pik3ca*H1047R (n = 5), Arid1afl/fl (n = 7), (Gt)R26Pik3ca*H1047R; Arid1afl/+ (n = 17), (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (n = 7), (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 12). Mice succumb to vaginal bleeding (sample image inset) at a median (μ1/2) of 16 weeks (LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+) or 14 weeks (LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G), without a significant difference between these genotypes. LtfCre0/+ mice harboring Arid1afl/fl or (Gt)R26Pik3ca*H1047R alone did not develop vaginal bleeding. h, H&E staining and IHC for ARID1A, P-S6, and KRT8 (n ≥ 2) of the endometrium at 5X (scale bar = 200 μm) and 20X (scale bar = 50 μm) magnification; each 20X panel shows the region outlined by the black box in the adjacent 5X panel. ARID1A expression is lost in the endometrial epithelium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice. P-S6 is shown as a marker of AKT pathway activation, and KRT8 is a marker of the endometrial epithelium. Arrows indicate endometrial epithelium.
2.3.2 Mutant endometrial epithelia show hallmarks of epithelial-to-mesenchymal transition
To profile in vivo gene expression changes in mutant endometrial epithelia at an early stage of transformation, we devised an enzymatic digestion and magnetic isolation protocol to positively enrich epithelial populations (Fig. 2.2a). Endometrial epithelial cells express EPCAM (Fig. 2.2b), and EPCAM expression is not altered in the hyperplastic endometrium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (Fig. 2.2c). Following positive selection, we analyzed purified populations by flow cytometry, and purity was approximately equivalent between genotypes (Fig. A.3a-b). We then isolated RNA from control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice.
Purified LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells showed significantly reduced ARID1A messenger RNA (mRNA) expression (Fig. 2.2d). These samples were processed for RNA-seq, from which we identified 3481 differentially expressed (DE) genes between control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells (FDR < 0.05, DESeq2) (Fig. A.3c). Using stringent criteria (FDR < 10⁻⁵ and at least two-fold change), we identified a gene signature of 517 DE genes (Fig. A.3d). Differential expression overlapped among LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G cells, with 963 genes differentially expressed in all genotypes relative to control (Fig. A.3e-g). We performed Gene Set Enrichment Analysis (GSEA) on differentially expressed genes (FDR < 0.05) in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelial cells and identified epithelial-to-mesenchymal transition (EMT) as the top misregulated Hallmark pathway (Fig. 2.2e). Several Gene Ontology (GO) pathways related to cell motility, migration and adhesion were also identified (Fig. 2.2f), further suggesting EMT as a key misregulated pathway in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia; notably, mesenchymal marker overexpression in endometrial cancer correlates with poor prognosis (Kyo et al. 2006). Mak and Tong et al. identified a patient-derived EMT signature of 77 genes across 11 cancer types (Mak et al. 2016). This gene signature was significantly enriched by GSEA in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl vs. control, as well as in ARID1A mutant UCEC patients vs. ARID1A wild-type patients (NES = 1.72 and 1.88, respectively) (Fig. 2.2g), and it contained 33 genes that were differentially expressed in mutant mouse endometrial cells (Fig. 2.2h).
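The two thresholds above (FDR < 0.05 for the full DE set; FDR < 10⁻⁵ plus at least two-fold change for the 517-gene signature) amount to simple filters on a DESeq2 results table. The sketch below assumes the table has been exported as records carrying DESeq2's `padj` and `log2FoldChange` values; the `DEResult` record type, the `select_de_genes` function, and the example genes are hypothetical and only illustrate the filtering logic, not the pipeline actually used here.

```python
import math
from dataclasses import dataclass


@dataclass
class DEResult:
    """One row of an exported DESeq2 results table (hypothetical record type)."""
    gene: str
    log2_fold_change: float  # DESeq2 'log2FoldChange' column
    padj: float              # DESeq2 'padj' (BH-adjusted p-value); may be NaN


def select_de_genes(results, fdr_cutoff, min_fold_change=1.0):
    """Return genes with padj < fdr_cutoff and |fold change| >= min_fold_change.

    NaN padj values (genes DESeq2 filtered out) fail the comparison and are dropped.
    """
    # Two-fold change corresponds to |log2FC| >= 1; a cutoff of 1.0 means no fold filter.
    min_abs_lfc = math.log2(min_fold_change)
    return [
        r.gene
        for r in results
        if r.padj < fdr_cutoff and abs(r.log2_fold_change) >= min_abs_lfc
    ]


# Invented example rows, for illustration only.
rows = [
    DEResult("geneA", 2.3, 1e-8),          # strong and highly significant
    DEResult("geneB", 0.4, 1e-7),          # significant but less than two-fold
    DEResult("geneC", -1.6, 3e-6),         # more than two-fold down, significant
    DEResult("geneD", 3.0, 0.02),          # passes FDR < 0.05 only
    DEResult("geneE", 1.2, float("nan")),  # padj not computed by DESeq2
]

de_set = select_de_genes(rows, fdr_cutoff=0.05)                          # broad DE set
signature = select_de_genes(rows, fdr_cutoff=1e-5, min_fold_change=2.0)  # stringent signature
```

Working on the log2 scale keeps the fold-change cutoff symmetric, so a two-fold decrease (log2FC = -1) is treated the same as a two-fold increase.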
EMT is characterized by the loss of cell adherens junctions, tight junctions and apical-basal polarity (Nieto et al. 2016). In LtfCre0/+; Arid1afl/fl mice, we observed reduced expression of CLDN10 and tight junction protein-1 (ZO-1) by immunofluorescence (IF), while ICAM-1 expression was induced, indicating impaired tight junctions (Fig. A.4a-d). ZO-1 expression was partially restored in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (Fig. A.4a). LtfCre0/+; Arid1afl/fl endometrium showed high levels of cleaved caspase-3 (CASP3), indicating increased apoptosis in the absence of PIK3CAH1047R (Fig. A.4e). The mesenchymal marker VIM (vimentin) and the EMT transcription factor SNAI2 (Slug) were expressed in both LtfCre0/+; Arid1afl/fl and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mutant endometrial epithelia, indicating a shift toward a mesenchymal phenotype (Fig. A.4f-g). CDH1 (E-cadherin) was mislocalized in mutant endometrial epithelia, suggesting alterations in epithelial adherens junctions (Fig. A.4h). Because these EMT features were also present in LtfCre0/+; Arid1afl/fl mice lacking PIK3CAH1047R, these data suggest that the EMT phenotype observed in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia is driven primarily by ARID1A loss.

Figure 2.2 RNA-seq analysis of EPCAM-positive endometrial epithelial cells isolated via magnetic sorting. a, Schematic of EPCAM isolation using anti-EPCAM-PE antibody and anti-PE microbeads. b, EPCAM is expressed in the endometrial epithelium of a LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mouse by IHC (n = 3). Arrows indicate endometrial epithelium (scale bar = 100 μm). c, IF staining of EPCAM and ARID1A in mouse endometrium (n ≥ 3). Arrows indicate endometrial epithelium (scale bar = 25 μm). d, qPCR analysis of Arid1a gene expression in isolated control (n = 3, pooled groups of six mice) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (mutant) (n = 4, single mice) cells (mean ± s.d.; ** p < 0.01, two-tailed, unpaired t-test).
e-f, Pathway enrichment analysis on human orthologs of differentially expressed genes between LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl and control mice (FDR < 0.05; 3481 genes) for MSigDB Hallmark pathways (e) and Gene Ontology (GO) Biological Process terms (f). g, GSEA plots showing significant upregulation of the Mak and Tong et al. pan-cancer EMT signature in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl compared to control, and in UCEC ARID1Amut patients compared to ARID1Awt. h, Hierarchical clustering of the 77 genes within the Mak and Tong et al. pan-cancer EMT signature between control and mutant purified endometrial epithelial cells. Genes found in the Hallmark EMT pathway, and CDH1, are indicated.

2.3.3 Mouse gene signature identifies invasive patient population

We next sought to determine whether LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelial gene expression patterns resembled human disease. We used mutation and RNA-seq expression data from the TCGA-UCEC dataset with single-sample GSEA (ssGSEA) to rank UCEC endometrioid tumors by the similarity of their gene expression patterns to our mouse model. Based on ssGSEA scores for human orthologs of our gene signature, we segregated patients into the upper (similar to mouse) and lower (dissimilar to mouse) quartiles (Fig. 2.3a). Relative to lower quartile patients, upper quartile UCEC patients display concordant expression changes for 74% of genes within the LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature (Fig. 2.3b). Upper quartile patients show upregulation of EMT, interferon gamma (IFNγ), Notch and P53 signaling pathways, and downregulation of the unfolded protein response (UPR) (Fig. 2.3c). We confirmed downregulation of GRP94 and GRP78, two proteins critical to the UPR, in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia in vivo by IHC and IF (Fig. A.2b-c). Comparison of ARID1A mutant and wild-type UCEC patients also revealed upregulation of the EMT pathway (Fig. 2.3d).
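Segregating tumors into upper and lower ssGSEA quartiles can be sketched as follows. This assumes per-tumor ssGSEA scores have already been computed upstream and only shows the quartile split; the function name, patient IDs and scores are invented, and because the quartile interpolation method used in the original analysis is not stated, relying on the default of Python's `statistics.quantiles` is an assumption.

```python
import statistics


def split_by_quartile(scores):
    """Split {patient_id: ssGSEA score} into upper- and lower-quartile groups.

    Upper quartile: score >= Q3 (most similar to the mouse signature).
    Lower quartile: score <= Q1 (least similar).
    Cut points come from statistics.quantiles with its default 'exclusive' method.
    """
    q1, _median, q3 = statistics.quantiles(scores.values(), n=4)
    upper = {pid for pid, s in scores.items() if s >= q3}
    lower = {pid for pid, s in scores.items() if s <= q1}
    return upper, lower


# Invented ssGSEA scores for eight tumors (hypothetical IDs P1..P8).
scores = {f"P{i}": float(i) for i in range(1, 9)}
upper, lower = split_by_quartile(scores)
```

The two groups can then feed the downstream comparisons described in this section, such as fold-change concordance, clinical stage and depth of invasion.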
Clinical staging of endometrial cancer is determined by the extent of invasion into surrounding tissues, including the myometrium, cervix, vagina, and bladder, and by distant metastasis (Sorosky 2012). Upper quartile patients were diagnosed at more advanced clinical stages than UCEC patients overall, with significantly more stage III and stage IV patients (p < 0.01, Chi-squared) (Fig. 2.3e). Furthermore, upper quartile patients had significantly more invasive tumors than lower quartile patients (p < 0.05, two-tailed, unpaired Wilcoxon test) (Fig. 2.3f). These data suggest that endometrial cells from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice most closely resemble UCEC patients with advanced stage, invasive tumors.

Figure 2.3 LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature correlates with invasive patient gene expression. a, Distribution of TCGA-UCEC endometrioid patient tumors relative to ssGSEA score for human orthologs of the LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature. b, Clustered comparison of scaled fold-change values for signature genes between LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl vs. control mice and upper vs. lower quartile of UCEC endometrioid patients. EMT genes from the Hallmark pathway and the Mak and Tong pan-cancer gene signature are indicated. c, Scatter plot of Hallmark pathway GSEA Normalized Enrichment Scores (NES) from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl vs. control (human orthologs) and upper quartile of UCEC endometrioid patients vs. lower quartile. d, Scatter plot of Hallmark pathway GSEA NES from upper quartile of UCEC endometrioid patients vs. lower quartile and UCEC endometrioid ARID1Amut (frameshift/truncating alterations) vs. ARID1Awt. e, Upper quartile ssGSEA-enriched UCEC endometrioid patients present with higher stage disease relative to all patients (p < 0.01, Chi-squared).
f, Upper quartile ssGSEA-enriched UCEC endometrioid patients have more invasive tumors relative to lower quartile patients (p < 0.05, unpaired Wilcoxon test, one-tailed). Box-and-whiskers plotted in the style of Tukey without outliers.

2.3.4 ARID1A loss increases promoter accessibility in vivo

To gain insight into chromatin accessibility alterations that may drive the observed gene expression changes, we performed ATAC-seq (Buenrostro et al. 2013) on anti-EPCAM-purified cells. See Chapter 3 for a review of ATAC-seq and the technical challenges of measuring genome-wide chromatin accessibility and regulation. In general, peaks were broader in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells than in cells from control mice (p < 10⁻¹⁵, two-tailed, unpaired Wilcoxon test), potentially indicating greater chromatin accessibility in mutant cells (Fig. 2.4a-b). Among differentially accessible peaks (FDR < 0.20, csaw | edgeR), 2053 showed decreased accessibility in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice, while 1429 showed increased accessibility, suggesting a global trend toward decreasing accessibility (Fig. 2.4c). Differentially accessible peaks primarily represented mononucleosome fragments (Fig. 2.4d). Despite this global trend, promoters (defined as regions within ±3 kb of a transcription start site, TSS) showed significant increases in accessibility (p < 10⁻⁷², Chi-squared test) (Fig. 2.4e), with 470 promoter peaks increasing in accessibility and 179 decreasing (Fig. 2.4f). Genomic repeat elements trended toward decreased accessibility (80% decreasing), accounting for the global trend toward decreasing accessibility (Fig. 2.4f). Among peaks with increased accessibility, CpG islands, promoters and 5′ UTRs were the top enriched genomic features (Fig. 2.4g). Differentially accessible peaks, including promoter peaks, were generally located proximal to TSSs, with 31.2% of all peaks located within 10 kb of a TSS (Fig. 2.4h-i).
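The promoter definition (within ±3 kb of a TSS) and the distance-to-TSS summaries above can be illustrated with a small annotation routine. This is a sketch under stated assumptions: peaks and TSSs are plain coordinates, distance is measured from the peak midpoint to the nearest TSS on the same chromosome, and all coordinates are invented. The text does not specify whether peak midpoints or peak edges were used, and real analyses would typically rely on a dedicated annotation tool; the function names here are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(position, sorted_tss):
    """Absolute distance (bp) from `position` to the closest TSS in a sorted list."""
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - position)      # TSS at or after position
    if i > 0:
        candidates.append(position - sorted_tss[i - 1])  # TSS before position
    return min(candidates)


def annotate_peaks(peaks, tss_by_chrom, promoter_window=3_000):
    """Annotate (chrom, start, end) peaks with nearest-TSS distance and promoter status.

    Assumption: distance is measured from the peak midpoint, and a peak counts as a
    promoter peak if that distance is <= promoter_window (±3 kb here).
    Peaks on chromosomes without any TSS get distance None.
    """
    sorted_tss = {chrom: sorted(sites) for chrom, sites in tss_by_chrom.items()}
    annotated = []
    for chrom, start, end in peaks:
        midpoint = (start + end) // 2
        sites = sorted_tss.get(chrom)
        if not sites:
            annotated.append((chrom, start, end, None, False))
            continue
        distance = nearest_tss_distance(midpoint, sites)
        annotated.append((chrom, start, end, distance, distance <= promoter_window))
    return annotated


def fraction_within(annotated, max_distance):
    """Fraction of all peaks whose nearest TSS lies within max_distance bp."""
    hits = sum(1 for peak in annotated if peak[3] is not None and peak[3] <= max_distance)
    return hits / len(annotated)


# Invented coordinates for illustration.
tss = {"chr1": [50_000, 10_000]}
peaks = [
    ("chr1", 9_500, 10_500),   # centered on a TSS
    ("chr1", 14_000, 15_000),  # 4.5 kb from a TSS
    ("chr1", 30_000, 31_000),  # 19.5 kb from a TSS
    ("chr2", 100, 200),        # no TSS annotated on chr2
]
annotated = annotate_peaks(peaks, tss)
```

Sorting TSSs once per chromosome and using binary search keeps each lookup logarithmic in the number of TSSs, which matters when annotating tens of thousands of peaks.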
We also performed ATAC-seq on EPCAM-purified cells from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G mice and observed enrichment for differential accessibility among promoters (p < 10⁻⁵⁰⁰) (Fig. A.3h-p). Among genes with differentially accessible promoter peaks, EMT appeared as the top enriched pathway (Fig. 2.4j). We identified significant overlap between differentially accessible promoters and differentially expressed genes (p < 10⁻⁸, hypergeometric enrichment) (Fig. 2.4k). Changes in chromatin accessibility were positively correlated with changes in gene expression (p < 10⁻⁹, Spearman correlation) (Fig. 2.4l). Among these genes, EMT again appeared as a top affected pathway by enrichment analysis (Fig. 2.4m). Altogether, these data demonstrate that endometrial ARID1A loss and PI3K activation result in increased accessibility at gene promoters and differential accessibility of EMT pathway genes.

Figure 2.4 ATAC-seq analysis of differentially accessible chromatin in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia. a, ATAC-seq read density heatmap from naive overlapping peaks of control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl EPCAM-positive cells, ranked by total intensity. Reads are centered on the middle of the accessible peak ±3 kb. Control (n = 2, pooled groups of six mice) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 2, single mice). b, Peak width distributions of control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl ATAC-seq peaks, which are significantly different (p < 10⁻¹⁵, two-tailed, unpaired Wilcoxon test). c, Volcano plot for differential accessibility of ATAC-seq peaks between control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells. Red points represent significant peaks (csaw FDR < 0.20). d, Peak width distribution of differentially accessible peaks.
e, Magnitude distribution of differentially accessible peaks separated by total peaks (gray) and promoter peaks (red, within 3 kb of TSS). f, Detailed peak annotation of increasing and decreasing differentially accessible regions for total, non-repetitive and repetitive peaks based on genome annotation. g, Enrichment for significant genomic features among differentially accessible peaks, ranked by significance. Enrichment ratio is calculated as bp of feature in the ATAC peak set relative to the background genome. h, Histogram of distance to the nearest TSS for all differential ATAC peaks. Percent of peaks found within 10, 30, or 100 kb of the TSS is shown. i, Histogram of distance to the nearest TSS for differential ATAC promoter peaks. j, MSigDB Hallmark pathway enrichment of genes with differentially accessible promoter peaks. k, Clustering of differentially accessible promoter peaks based on direction and magnitude of change in gene expression and promoter accessibility. Black bars indicate significant differential gene expression by RNA-seq (FDR < 0.05). l, Scatter plot depicting the relationship between direction and magnitude of change in accessibility and gene expression for differential promoter peaks. Alterations in accessibility and expression are significantly correlated (rs = 0.26, p < 10⁻⁹, Spearman). m, MSigDB Hallmark pathway enrichment of overlapping differentially accessible promoters and differentially expressed genes.

2.3.5 ARID1A functionally binds gene promoters

To explore the role of ARID1A loss alone in the regulation of endometrial epithelial chromatin accessibility, we used an immortalized human endometrial epithelial cell line, 12Z (Zeitvogel, Baumann, and Starzinski-Powitz 2001). Transfection of 12Z cells with small interfering RNAs (siRNAs) targeting ARID1A (siARID1A) reduced ARID1A protein expression relative to cells transfected with non-targeting control siRNA (Fig. 2.5a).
Next, we performed ATAC-seq on siARID1A-transfected 12Z cells and control cells (Fig. A.5a-d). ARID1A loss led to a trend toward decreasing chromatin accessibility genome-wide, while chromatin accessibility was significantly increased at promoters compared to non-promoters (p < 10⁻⁵⁰⁰, Chi-squared test) (Fig. 2.5b). These results recapitulate our in vivo findings, suggesting that the chromatin changes observed in vivo are likely driven by ARID1A loss alone. To profile genome-wide ARID1A occupancy, we performed ARID1A ChIP-seq in 12Z cells. The specificity of the ARID1A antibody used for ChIP-seq was validated by co-immunoprecipitation (co-IP) and mass spectrometry (Fig. A.5e-f). We identified 46,180 unique sites of ARID1A occupancy genome-wide (Fig. 2.5c). Most ARID1A ChIP-seq peaks were less than 1000 bp wide (Fig. 2.5d) and were often proximal to TSSs, with roughly one-quarter of all peaks within 10 kb of a TSS (Fig. 2.5e-f). ARID1A binding was significantly enriched at promoters (Fig. 2.5g). Among ARID1A-bound sites, we observed enrichment of the activator protein 1 (AP-1) motif, both genome-wide (p < 10⁻⁸¹⁷⁰) and at promoters (p < 10⁻⁸⁰⁰) (Fig. 2.5h). ARID1A has been shown to regulate chromatin accessibility at AP-1 motifs (Kelso et al. 2017; Vierbuchen et al. 2017), and we also observed enrichment for the AP-1 motif at sites of differential accessibility in vivo and in vitro (Fig. A.5g), suggesting that ARID1A regulates chromatin at AP-1 binding sites. ARID1A-bound promoters were enriched for EMT hallmark genes (Fig. 2.5i). We also observed significant overlap between ARID1A binding and sites of accessible chromatin, and the two signals were positively correlated (p < 10⁻¹⁵, Spearman) (Fig. 2.5j-k). Among differentially accessible promoters, ARID1A was bound at 354 promoters that increased in accessibility and 124 promoters that decreased in accessibility upon ARID1A loss (Fig. 2.5l).
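The Spearman correlations reported here (for example, between ARID1A ChIP-seq signal and ATAC-seq accessibility at promoters) are Pearson correlations computed on ranks. A minimal, dependency-free sketch follows, using average ranks for tied values; it computes only the coefficient r_s, not its p-value, and the function names and input values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's r_s: Pearson correlation of average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Invented per-promoter values: ARID1A ChIP signal vs. ATAC accessibility.
chip_signal = [1.0, 2.0, 2.0, 3.0]
accessibility = [1.0, 2.0, 3.0, 4.0]
rho = spearman_rho(chip_signal, accessibility)
```

Because only ranks enter the calculation, the coefficient is insensitive to the very different scales of ChIP-seq and ATAC-seq signal, which is one reason a rank correlation suits this comparison.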
To further explore the relationship between ARID1A binding and gene expression, we performed RNA-seq on 12Z cells treated with non-targeting control siRNA or siARID1A. Differentially expressed genes (FDR < 0.0001, DESeq2) were significantly enriched for the LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature (p < 10⁻⁵, hypergeometric enrichment) (Fig. 2.5m). ARID1A promoter binding was significantly enriched among genes differentially expressed upon ARID1A knockdown (p < 10⁻²⁰⁸, hypergeometric enrichment) (Fig. 2.5n). Over 800 genes were identified as displaying such direct, functional chromatin regulation by ARID1A in 12Z cells (Tables A.1 and A.2). While ARID1A promoter binding was evenly distributed among upregulated and downregulated genes (Fig. A.5h), genes with ARID1A binding at the promoter showed a higher degree of upregulation following ARID1A loss than genes without detected ARID1A binding (p = 0.002, two-tailed, unpaired Wilcoxon test) (Fig. 2.5o). ARID1A-bound, upregulated genes are enriched for EMT pathways (Fig. 2.5p), and ARID1A binding is observed in the promoters of mesenchymal identity genes (Fig. 2.5q). In total, we observed that ARID1A directly regulates the expression of over 50 Hallmark EMT genes, most of which are repressed by ARID1A chromatin interactions (Table 2.1). These data support a mechanistic role for ARID1A in the suppression of mesenchymal gene transcription.

Figure 2.5 ARID1A chromatin binding is associated with accessibility and differential gene expression driven by ARID1A loss in a human endometrial epithelial cell line. a, Western blot of ARID1A expression in siRNA-treated 12Z cells. β-actin was used as an endogenous control. b, Annotation of differentially accessible ATAC peaks (FDR < 0.05) from 12Z siARID1A, separated into fractions by directionality and promoter vs. non-promoter.
Significant association (p < 10⁻⁵⁰⁰, Chi-squared) between increasing accessibility and promoter status. c, Annotation of ARID1A ChIP peaks in wild-type 12Z cells. d, Peak width distribution of ChIP peaks. e, Histogram of distance to the nearest TSS for all ChIP peaks. Percent of peaks found within 10, 30, or 100 kb of the TSS is shown. f, Histogram of distance to the nearest TSS for ChIP promoter peaks. g, Enrichment for significant genomic features among ChIP peaks, ranked by significance. Enrichment ratio is calculated as bp of feature in the ChIP peak set relative to the background genome. h, De novo motif enrichment of ChIP peaks genome-wide and at promoters. i, MSigDB Hallmark pathway enrichment of genes with ChIP promoter peaks. j, Read density heatmap of ARID1A ChIP-seq and ATAC-seq (control) at all gene promoters (n = 24,132), ranked by signal intensity for ARID1A ChIP-seq. k, Scatter plot depicting correlation between ARID1A binding and chromatin accessibility (rs = 0.312, p < 10⁻¹⁵, Spearman). l, Proportional Euler diagram of overlap between ARID1A binding, decreasing and increasing chromatin accessibility at promoters. m, Enrichment for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature among 12Z siARID1A differentially expressed genes (p < 10⁻⁵, hypergeometric enrichment). n, Enrichment of ARID1A binding at 12Z siARID1A differentially expressed genes (p < 10⁻²⁰⁸, hypergeometric enrichment). o, Fold-change in gene expression of siARID1A upregulated genes, segregated based on ARID1A promoter-binding status (p = 0.002, two-tailed, unpaired Wilcoxon test). Box-and-whiskers plotted in the style of Tukey without outliers. p, MSigDB Hallmark pathway enrichment of 12Z siARID1A differentially expressed genes (FDR < 0.0001). q, Example browser tracks of ARID1A binding profiles. Signal is displayed as log likelihood ratio (logLR) of ARID1A ChIP compared to input chromatin.
Single-replicate signal is shown in light green and overlapping signal in dark green. Green bars represent called peaks.

Table 2.1 EMT genes functionally regulated by ARID1A chromatin interactions
EMT genes directly repressed by ARID1A: CALD1, CD59, COL1A2, COL5A2, CXCL8, DPYSL3, EMP3, FAP, FBLN2, GADD45A, GJA1, GPC1, ID2, IL6, ITGA2, ITGA5, ITGAV, ITGB1, LAMC1, LOX, LOXL2, MYL9, NT5E, PLAUR, PMEPA1, RGS4, SCG2, SERPINH1, SPARC, TAGLN, TFPI2, TGFB1, TGM2, THBS1, THBS2, TIMP3, VCAM1
EMT genes directly activated by ARID1A: COL12A1, COL5A3, DKK1, FGF2, FN1, FSTL1, GEM, LOXL1, MEST, MMP3, OXTR, PMP22, QSOX1, VEGFC, WNT5A
Genes within the MSigDB Hallmark EMT pathway with significant ARID1A promoter binding (ChIP-seq, n = 2, MACS2 broadPeak FDR < 0.05) in human 12Z endometriotic epithelial cells that also display ARID1A-mediated transcriptional repression (i.e., upregulation following acute ARID1A siRNA knockdown, siARID1A; DESeq2 FDR < 0.0001) or transcriptional activation (i.e., downregulation with siARID1A). By this gene set analysis, ARID1A directly represses 37 EMT genes and directly activates 15 EMT genes.

2.3.6 ARID1A loss promotes mesenchymal phenotype

To further interrogate the relationship between ARID1A and PIK3CA in the regulation of the EMT pathway, we again used the 12Z cell line. EMT is regulated by several transcription factors, including SNAI1 (Snail), SNAI2 and TWIST1 (Twist) (Nieto et al. 2016). Upon ARID1A knockdown by siRNA (siARID1A), we observed upregulation of SNAI1, SNAI2, and TWIST1 protein expression (Fig. 2.6a). Transfection with a PIK3CAH1047R expression plasmid (pPIK3CAH1047R) led to AKT/mTOR pathway activation, as indicated by phosphorylation of AKT at serine 473 (P-AKT Ser473) (Fig. 2.6a). In cells transfected with both siARID1A and pPIK3CAH1047R, we observed decreased induction of TWIST1, whereas SNAI1 and SNAI2 expression was not affected by pPIK3CAH1047R (Fig. 2.6a). Moreover, pPIK3CAH1047R induced CDH1 expression and partially rescued the CDH1 downregulation observed in cells transfected with siARID1A alone (Fig. 2.6a). We next performed RNA-seq on cells transfected with siARID1A, pPIK3CAH1047R, or both. While ARID1A loss resulted in differential expression of 2565 genes, PIK3CAH1047R expression resulted in differential expression of only 233 genes (FDR < 0.0001) (Fig. 2.6b). Some genes differentially expressed with PIK3CAH1047R overlapped with those in siARID1A and siARID1A/PIK3CAH1047R samples, displaying unique patterns of gene expression (Fig. A.6). Among Hallmark pathways, siARID1A and PIK3CAH1047R converged on the NFκB pathway, as previously described in ovarian clear cell carcinoma (Kim, Lu, and Zhang 2016), and on the EMT pathway (Fig. 2.6c). Differentially expressed genes from siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R samples compared to controls were enriched for the LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl gene signature and the Mak and Tong pan-cancer EMT gene signature (Fig. 2.6d). For genes found in the Mak and Tong signature and the Hallmark EMT pathway, we identified an antagonistic relationship between siARID1A and PIK3CAH1047R, such that gene expression changes observed in siARID1A samples were reduced in siARID1A/PIK3CAH1047R samples (Fig. 2.6e-f). To further explore this antagonism, we identified a group of genes at the intersection between genes differentially expressed in siARID1A relative to control and in siARID1A/PIK3CAH1047R relative to siARID1A (Fig. 2.6g). These 127 genes are differentially expressed upon siARID1A and further altered by the addition of PIK3CAH1047R. Of these genes, 47.2% were bound by ARID1A at the promoter in wild-type 12Z cells (p < 10⁻¹⁸, hypergeometric enrichment) (Fig. 2.6h). These genes were significantly upregulated in siARID1A samples and downregulated in siARID1A/PIK3CAH1047R samples (Fig. 2.6h-i).
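The hypergeometric enrichment tests used throughout this chapter ask whether an observed overlap (for example, ARID1A-bound promoters among the 127 intersecting genes) is larger than expected by chance. A minimal sketch of the upper-tail p-value P(X ≥ k) follows, using exact integer binomial coefficients. The function name, universe size and counts in the example are invented, and the gene universe used in the original analyses is not specified here. Because the final step is a single float division, p-values below roughly 10⁻³⁰⁸ (such as p < 10⁻⁵⁰⁰) would underflow to 0.0, so a log-scale implementation would be needed for the most extreme values reported.

```python
from math import comb


def hypergeom_enrichment_p(universe, successes, draws, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe:  total number of genes considered (N)
    successes: genes carrying the annotation, e.g. ARID1A-bound promoters (K)
    draws:     size of the query set, e.g. differentially expressed genes (n)
    overlap:   observed number of query genes carrying the annotation (k)
    """
    max_overlap = min(successes, draws)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap must lie between 0 and min(successes, draws)")
    tail = sum(
        comb(successes, k) * comb(universe - successes, draws - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Exact integer arithmetic until this single division.
    return tail / comb(universe, draws)


# Invented toy numbers: 20 genes, 5 annotated, 5 drawn, 3 annotated among them.
p_toy = hypergeom_enrichment_p(universe=20, successes=5, draws=5, overlap=3)
```

For these toy numbers, P(X ≥ 3) = (C(5,3)·C(15,2) + C(5,4)·C(15,1) + C(5,5)·C(15,0)) / C(20,5) = 1126/15504, or about 0.073.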
These 127 intersecting genes were enriched for the Hallmark EMT pathway, which was the most significant result (Fig. 2.6j). Differential expression of EMT genes upon ARID1A loss was confirmed by quantitative reverse transcriptase (qRT)-PCR (Fig. 2.6k). These data provide further evidence that ARID1A loss induces a mesenchymal phenotype, which is antagonized by the PIK3CAH1047R mutation, resulting in a partial EMT phenotype.

Figure 2.6 PIK3CAH1047R antagonizes ARID1A loss-induced mesenchymal phenotypes. a, Western blot of ARID1A, β-Actin, AKT, P-AKT, CDH1, SNAI1, SNAI2, and TWIST1 following co-transfection of siNONtg and empty vector (control), siARID1A and empty vector (siARID1A), siNONtg and pPIK3CAH1047R (PIK3CAH1047R), or siARID1A and pPIK3CAH1047R (siARID1A/PIK3CAH1047R). b, Proportional Euler diagram displaying differentially expressed genes (FDR < 0.0001) from siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R relative to control. c, MSigDB Hallmark pathway enrichment for siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R differentially expressed genes. d, Enrichment for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mouse signature ortholog genes and the Mak et al. pan-cancer gene signature within differentially expressed genes from siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R relative to control. e-f, Fold-change values of experimental groups relative to control for genes in the Mak and Tong pan-cancer EMT signature (e) and the Hallmark EMT signature (f), separated based on direction of gene expression change in siARID1A. Statistic represented is a two-tailed, paired Wilcoxon test. Box-and-whiskers plotted in the style of Tukey without outliers. g, Intersection between genes differentially expressed in siARID1A relative to control and in siARID1A/PIK3CAH1047R relative to siARID1A. h, Heat map detailing relative expression of intersecting genes (n = 127, Fig. 2.6g) in control, siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R, and ARID1A promoter binding. These genes were enriched for ARID1A promoter binding (p < 10⁻¹⁸, hypergeometric enrichment). i, Expression level of intersecting genes (n = 127, Fig. 2.6g) in siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R relative to control. Statistic represented is a two-tailed, paired Wilcoxon test. Box-and-whiskers plotted in the style of Tukey without outliers. j, MSigDB Hallmark pathway enrichment for intersecting genes (n = 127, Fig. 2.6g). k, Changes in relative EMT gene expression upon ARID1A loss and PIK3CAH1047R overexpression as measured by qRT-PCR. Data represent three biological replicates.

2.3.7 ARID1A loss and PIK3CAH1047R promote invasive phenotypes

Partial EMT is associated with invasive phenotypes (Nieto et al. 2016), and EMT pathways play key roles in endometrial cancer progression by promoting the invasion of epithelial cells into the myometrium (Mirantes et al. 2013). To distinguish the effects of ARID1A loss and PIK3CAH1047R on invasive phenotypes, we co-transfected 12Z cells with a PIK3CAH1047R expression plasmid and lentivirus expressing ARID1A short-hairpin RNAs (shRNAs) (shARID1A) (Fig. 2.7a). ARID1A knockdown induced migratory and invasive phenotypes in 12Z cells, and co-transfection with pPIK3CAH1047R significantly enhanced migration and invasion (Fig. 2.7b-c). Cells treated with shARID1A displayed increased F-actin staining (Fig. 2.7c). These results suggest that co-mutation of ARID1A and PIK3CA in the endometrial epithelium promotes an invasive phenotype. In vivo, invasive phenotypes required both ARID1A loss and PI3K activation: in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice, we observed invasion of endometrial epithelium into the myometrium (Fig. 2.7d).
KRT8-positive epithelial cells migrated outside of the endometrium, invaded α-smooth muscle actin (α-SMA)-positive myometrial cells, and formed tumors (Fig. 2.7e). Invading epithelial cells displayed a narrow leading edge and strand-like morphology, suggesting collective cell migration (Friedl and Gilmour 2009). Some invasive sites formed well-differentiated adenomas (Fig. 2.7d), while others were poorly differentiated clusters of tumor cells (Fig. 2.7d-e). Invasive KRT8-positive epithelial glands were observed in direct contact with myometrial cells, often appearing as strands of epithelial cells trailing through the myometrial layers (Fig. 2.7d). These results suggest that ARID1A loss and PIK3CAH1047R expression in the endometrial epithelium result in a partial EMT phenotype, promoting lesion formation and myometrial invasion (Fig. 2.7f).

Figure 2.7 ARID1A loss and PIK3CAH1047R promote myometrial invasion in vivo and migration in vitro. a, Western blot of ARID1A, β-Actin, AKT, and P-AKT following co-transfection of shNONtg and empty vector (control), shARID1A and empty vector (shARID1A), shNONtg and pPIK3CAH1047R (PIK3CAH1047R), or shARID1A and pPIK3CAH1047R (shARID1A/PIK3CAH1047R). b, Invasion assay of 12Z cells with ARID1A loss and PIK3CAH1047R overexpression. Representative images of calcein AM-stained cells and total invaded cell counts are shown (scale bar = 500 μm). Data represent four biological replicates (mean ± s.d.; * p < 0.05, ** p < 0.01, **** p < 0.0001, two-tailed, unpaired t-test). c, Migration assay of 12Z cells with ARID1A loss and PIK3CAH1047R overexpression. Upper images are representative of cells 24 hours after removal of the insert (scale bar = 500 μm). Lower images are maximum intensity confocal projections of cells stained with fluorescent phalloidin to label F-actin (scale bar = 50 μm). Average migration represents the average distance traveled by each migration front from 0 to 24 hours.
Migrating cell counts represent the number of cells in the migration area after 24 hours. Data represent three biological replicates (mean ± s.d.; * p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001, two-tailed, unpaired t-test). d, Myometrial invasion observed in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice. H&E staining and IHC for KRT8 at 3.33–6.66X (scale bar = 300–600 μm, as stated on figure) and 20X (scale bar = 100 μm) magnification; 20X panels (right) show the region outlined by the yellow box. White arrows indicate invasive endometrial epithelium. Endo, endometrium; Myo, myometrium. e, Images of maximum intensity confocal projections of control and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium sections stained with α-smooth muscle actin (α-SMA) (red), KRT8 (green) and counter-stained with DAPI (blue) (n ≥ 3). White arrows indicate invasive endometrial epithelium (scale bar = 50 or 10 μm, as stated on figure). f, Diagram of EMT-induced invasive endometrial epithelium following ARID1A loss and PIK3CAH1047R mutation.

2.4 Discussion

In this chapter, we found that ARID1A functions as a haploinsufficient tumor suppressor in the endometrium. Mice developed invasive hyperplasia with complete penetrance when ARID1A heterozygosity or nullizygosity was combined with oncogenic PIK3CAH1047R in the endometrial epithelium. Transcriptomic and genome-wide chromatin accessibility profiling of ARID1A/PIK3CA mutant endometrial epithelial cells in vivo revealed hallmarks of epithelial-to-mesenchymal transition in both gene expression and promoter chromatin accessibility. Further, gene expression signatures in ARID1A/PIK3CA mutant endometrial epithelial cells reflected advanced stage, invasive human endometrial tumors. We then leveraged the 12Z human endometriotic epithelial cell model to discern the joint and distinct effects of ARID1A and PIK3CA alterations.
Genome-wide ARID1A binding measurements by ChIP-seq revealed that ARID1A interactions with chromatin near gene promoters are associated with transcriptional misregulation following acute ARID1A loss. We observed that ARID1A loss promotes loss of epithelial characteristics, gain of mesenchymal features, and enhanced invasion in vitro, and that PIK3CAH1047R acts antagonistically to preserve some epithelial characteristics, reminiscent of partial EMT. Together, these findings show that ARID1A/PIK3CA mutant endometrial epithelia collectively invade the myometrium in mice. Endometrial cancer survival rates are high if the disease is detected at an early stage, when tumors are still confined to the endometrium, whereas myometrial invasion or tumor dissemination to other sites in the body correlates with poor survival. The notion that collective epithelial invasion promotes endometrial cancer metastasis may therefore lead to therapeutic options for patients with disseminated disease, as identifying the pathways involved in collective invasion may enable the development of anti-metastatic drugs.

2.5 Methods

2.5.1 Mice

All mice were maintained on an outbred genetic background using CD-1 mice (Charles River). (Gt)R26Pik3ca*H1047R and LtfCre (Tg(Ltf-iCre)14Mmul) alleles were purchased from The Jackson Laboratory and identified by PCR using published methods (Daikoku et al. 2014; Adams et al. 2011). Arid1afl and Arid1aV1068G alleles were distinguished by PCR (Chandler et al. 2015; Chandler et al. 2013). To detect the Arid1aV1068G allele, the PCR product was digested with HincII at 37 °C for 1 hour. Endpoints were vaginal bleeding, severe abdominal distension, and signs of severe illness, such as dehydration, hunching, jaundice, ruffled fur, signs of infection, or non-responsiveness. Sample sizes within each genotype were chosen based on the proportions of animals with vaginal bleeding in each experimental group or on a Kaplan-Meier log-rank test for survival differences.
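Survival to the vaginal-bleeding endpoint, including the median survival times reported for Fig. 2.1g, is summarized with Kaplan-Meier estimates. A minimal sketch of the product-limit estimator and of median survival (the first time at which estimated survival falls to 50% or below) follows; the function names and follow-up times are invented, censoring is assumed to mean that a mouse left the study without reaching the endpoint, and the log-rank test itself is not implemented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per animal (e.g. weeks)
    events: True if the endpoint (e.g. vaginal bleeding) was observed,
            False if the animal was censored at that time.
    Returns [(time, survival)] at each time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all animals sharing this time point (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First event time with estimated survival <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented example: five mice, all reaching the endpoint.
curve = kaplan_meier([10, 12, 14, 16, 18], [True] * 5)
# Invented example with one censored animal at week 12.
censored = kaplan_meier([10, 12, 14, 16], [True, False, True, True])
```

If no animal in a group reaches the endpoint, as with LtfCre0/+ mice carrying only one of the two alleles, the estimated survival never drops and the median is undefined (None here).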
For weight measurements, uteri were collected at the time of sacrifice and placed immediately into neutral-buffered formalin at 4 °C. After 24 hours, tissues were washed with phosphate-buffered saline (PBS) and 50% EtOH, placed in 70% EtOH, and then weighed. Mice were housed at the Van Andel Research Institute Animal Facility and the Michigan State University Grand Rapids Research Center in accordance with protocols approved by Michigan State University.

2.5.2 Cell lines
12Z immortalized human endometriotic epithelial cells (Zeitvogel, Baumann, and Starzinski-Powitz 2001) were provided by the laboratory of Asgi Fazleabas. 12Z cells were maintained in Dulbecco's Modified Eagle Medium (DMEM)/F12 supplemented with 10% fetal bovine serum (FBS), 1% L-glutamine, and 1% penicillin/streptomycin (P/S). Lenti-X™ 293T cells (Clontech, 632180, CVCL_0063) were maintained in DMEM + 110 mg/L sodium pyruvate (Gibco) supplemented with 10% FBS, 1% L-glutamine, and 1% P/S. Cell line validation for the 12Z cell line was performed by IDEXX BioResearch: the 12Z cell line has a unique profile not found in current public databases. The 12Z and Lenti-X 293T cell lines tested negative for mycoplasma contamination using the Mycoplasma PCR Detection Kit (Applied Biological Materials). No commonly misidentified cell lines were used in this study.

2.5.3 Histology and immunohistochemistry
For indirect immunohistochemistry (IHC), 10% neutral-buffered formalin (NBF)-fixed paraffin sections were processed for heat-based antigen unmasking in 10 mM sodium citrate [pH 6.0]. Sections were incubated with antibodies at the following dilutions: 1:200 ARID1A (D2A8U) (12354, Cell Signaling); 1:400 Phospho-S6 (4585, Cell Signaling); 1:100 KRT8 (TROMA1, DSHB); 1:100 EPCAM (G8.8-s, DSHB); 1:400 PGR (SAB5500165, Sigma). TROMA-I antibody was deposited to the DSHB by Brulet, P./Kemler, R. (DSHB Hybridoma Product TROMA-I). EPCAM antibody (G8.8) was deposited to the DSHB by Farr, A.G.
(DSHB Hybridoma Product G8.8). The following biotin-conjugated secondary antibodies were used: donkey anti-rabbit IgG (711-065-152, Jackson Immuno-Research Lab) and donkey anti-rat IgG (#705-065-153, Jackson Immuno-Research Lab). Secondary antibodies were detected using the VECTASTAIN Elite ABC HRP Kit (Vector). Sections for IHC were lightly counter-stained with Hematoxylin QS or Methyl Green (Vector Labs). Routine hematoxylin and eosin (H&E) staining of sections was performed by the Van Andel Research Institute (VARI) Histology and Pathology Core. A VARI animal pathologist reviewed histological tumor assessments.

2.5.4 Immunofluorescence
For indirect immunofluorescence, tissues were fixed in 4% paraformaldehyde (PFA). Frozen samples were sectioned at 10 μm on a CM3050 S cryostat (Leica) and collected on white frosted, positively charged ultra-clear microscope slides (Denville). Frozen slides were post-fixed with 2% PFA/1x PBS, permeabilized with 0.3% TX100 in PBS, and treated with 100 mM glycine/1x PBS [pH 7.3]. Primary antibodies were applied to slides at the following dilutions: 1:200 ARID1A (D2A8U) (12354, Cell Signaling); 1:100 KRT8 (TROMA1, DSHB); 1:100 EPCAM (G8.8-s, DSHB); 1:50 ZO-1 (61-7300, ThermoFisher); 1:200 CDH1 (3195, Cell Signaling); 1:100 CLDN10 (38-8400, ThermoFisher); 1:100 VIM (5741, Cell Signaling); 1:400 PGR (SAB5500165, Sigma); 1:200 ERα (ab32063, abcam); 1:2000 SMA (Sigma, C618); 1:100 SNAI2 (9585, Cell Signaling); 1:40 ICAM-1 (AF796-SP, R&D Systems). Secondary antibodies used were: 1:500 donkey anti-rabbit IgG, Alexa Fluor 555-conjugated antibody (#A-31572, ThermoFisher); 1:500 goat anti-rabbit IgG, Alexa Fluor 555-conjugated antibody (#A-21428, ThermoFisher); 1:500 goat anti-rat IgG, Alexa Fluor 647-conjugated antibody (A-21247, ThermoFisher); 1:250 donkey anti-rat IgG, Alexa Fluor 647-conjugated antibody (712-605-153, Jackson Immuno-Research Lab); 1:250 donkey anti-goat IgG, Alexa Fluor 488-conjugated antibody (705-545-147, Jackson Immuno-Research Lab).
Phalloidin-iFluor 594 (1:1000, abcam) was used to stain F-actin. Autofluorescence was quenched using the TrueVIEW Autofluorescence Quenching Kit (Vector Laboratories). ProLong Gold Antifade Reagent with DAPI (8961, Cell Signaling) was used for DAPI staining.

2.5.5 Microscopy and imaging
Confocal images were taken on a Nikon Eclipse Ti inverted microscope using a Nikon C2+ confocal laser scanner. Confocal immunofluorescent images are representative maximum intensity projections.

2.5.6 Cell sorting
Mouse uteri were surgically removed and minced using scissors. Tissues were digested using the MACS Multi Tissue Dissociation Kit II (Miltenyi Biotec) for 80 minutes at 37 °C. Digested tissues were strained through a 40 μm nylon mesh (ThermoFisher). Red Cell Lysis Buffer (Miltenyi Biotec) was used to remove red blood cells. Dead cells were removed using the MACS Dead Cell Removal Kit (Miltenyi Biotec), and EPCAM-positive cells were positively selected and purified using a PE-conjugated EPCAM antibody and anti-PE MicroBeads (Miltenyi Biotec), per the manufacturers’ instructions. A BD Accuri C6 flow cytometer (BD Biosciences) was used to confirm the purity of the EPCAM-positive population.

2.5.7 RNA isolation and qRT-PCR
The Arcturus PicoPure RNA Isolation Kit (ThermoFisher), including an on-column DNA digestion using the RNase-Free DNase Set (Qiagen), was used to purify RNA from in vivo EPCAM-sorted endometrial epithelial cells. To confirm loss of ARID1A transcript in EPCAM-positive LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells, complementary DNA (cDNA) was synthesized from RNA, and qRT-PCR was performed using SsoFast PCR master mix (BioRad) with previously described primers (Chandler et al. 2013) on the Applied Biosystems ViiA7 real-time PCR system. ARID1A expression was normalized to GAPDH. For in vitro experiments, RNA samples were collected 72 hours post siRNA transfection using the Quick-RNA Miniprep Kit (Zymo Research).
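The text states only that ARID1A expression was normalized to GAPDH. Assuming the widely used comparative Ct (2^-ΔΔCt) method, which may differ from the calculation actually performed, relative expression could be computed as in the following sketch; the function and argument names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: Sequence[float], ref_ct: Sequence[float],
                        ctrl_target_ct: Sequence[float], ctrl_ref_ct: Sequence[float]) -> float:
    """Fold change of a target gene in a sample vs. control by the 2^-ΔΔCt method.

    Each argument holds replicate Ct values; the reference gene would be GAPDH.
    """
    delta_ct_sample = mean(target_ct) - mean(ref_ct)              # ΔCt in the sample
    delta_ct_control = mean(ctrl_target_ct) - mean(ctrl_ref_ct)   # ΔCt in the control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)
```

With hypothetical mean Ct values of 26 (ARID1A) and 18 (GAPDH) in mutant cells versus 24 and 18 in controls, ΔΔCt = 2 and the estimated relative expression is 0.25. The method assumes near-100% amplification efficiency for both primer pairs.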
cDNA was synthesized from RNA, and qRT-PCR was performed using PowerUp SYBR Green Master Mix (ThermoFisher) and the Applied Biosystems ViiA7 real-time PCR system.

2.5.8 RNA-seq
Libraries were prepared by the Van Andel Genomics Core from 100 ng of total RNA for mouse samples, and Lexogen SIRV-set2 RNAs (Lexogen GmbH, Vienna, Austria) were spiked into the RNA prior to library preparation at a concentration of 1% by mass. For human samples, 500 ng of total RNA was used as input, with no spike-in. For all samples, libraries were generated using the KAPA Stranded mRNA-Seq Kit (v4.16) (Kapa Biosystems, Wilmington, MA, USA). RNA was sheared to 250–300 bp and reverse transcribed. Prior to PCR amplification, cDNA fragments were ligated to Bioo Scientific NEXTflex Adapters (Bioo Scientific, Austin, TX, USA). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies, Inc.), QuantiFluor dsDNA System (Promega Corp., Madison, WI, USA), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). All libraries were pooled equimolarly, and single-end sequencing to a minimum depth of 30 M reads per library was performed on an Illumina NextSeq 500 sequencer using a 75 bp sequencing kit (v2) (Illumina Inc., San Diego, CA, USA). Base calling was done by Illumina NextSeq Control Software (NCS) v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

2.5.9 RNA-seq analysis
Raw 75 bp reads were trimmed with cutadapt (Martin 2011) and Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) followed by quality control analysis via FastQC (Andrews 2010). Trimmed mouse reads were aligned to the mm10 genome assembly and indexed to the GENCODE (Harrow et al. 2012; Frankish et al. 2019) vM16 GFF3 annotation via the STAR (Dobin et al.
2013) aligner with flag `--quantMode GeneCounts` for feature counting, and human reads were aligned to the GRCh38.p12 reference assembly (Schneider et al. 2017) and indexed to GENCODE v28. For mouse libraries, the SIRVome (Lexogen) was independently aligned and quantified for qualitative assessment of library concordance. Output gene count files were assembled into an experimental read count matrix in R. Genes with fewer than one count per sample on average were filtered out prior to DESeq2 (Love, Huber, and Anders 2014; Love et al. 2015) count normalization and subsequent differential expression analysis. Differential expression p-values were corrected for multiple testing by independent hypothesis weighting (IHW) (Ignatiadis et al. 2016) for downstream analysis. Differentially expressed gene thresholds were set at FDR < 0.05 for mouse data and FDR < 0.0001 for human data. All reported log2(fold-change) values from RNA-seq are shrunken with the original DESeq2 shrinkage estimator, except for TCGA-UCEC comparisons and statistical comparisons between log2(FC) values, which use unshrunken values. Principal component analysis was performed using DESeq2 on the top 500 genes by variance across samples. RNA-seq heatmaps were generated using scaled regularized-logarithm (rlog) (Love, Huber, and Anders 2014) counts for visualization, or relative to controls by subtracting the mean control rlog counts. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl signature genes were defined by FDR < 10⁻⁵ and |log2(FC)| > 1.

2.5.10 ATAC-seq
Libraries were prepared following previously described methods (Buenrostro et al. 2013; Buenrostro et al. 2015). Mouse endometrial epithelial cells were isolated using the methods described above. For purified mouse endometrial epithelium and 12Z cells, between 25,000 and 50,000 cells were resuspended in cold lysis buffer (10 mM Tris-HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40) and centrifuged at 500 x g, 4 °C for 10 minutes to isolate nuclei.
Nuclei were treated with Tn5 transposase for 30 minutes at 37 °C using the Nextera DNA Library Prep Kit (Illumina). DNA was isolated using the Qiagen MinElute Reaction Cleanup Kit. Libraries were amplified using barcoded primers for 1–8 cycles as described (Buenrostro et al. 2013). Libraries were purified using Kapa Pure Beads to remove primer dimers and fragments >1000 bp. Libraries were sequenced by the Van Andel Genomics Core. Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies, Inc.), QuantiFluor dsDNA System (Promega Corp., Madison, WI, USA), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). All libraries were pooled equimolarly, and paired-end sequencing to a minimum depth of 20 M reads per library was performed on an Illumina NextSeq 500 sequencer using a 150 bp sequencing kit (v2) (Illumina Inc., San Diego, CA, USA). Base calling was done by Illumina NextSeq Control Software (NCS) v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

2.5.11 ATAC-seq analysis
Libraries were combined across flow cells and trimmed with cutadapt (Martin 2011) and Trim Galore! followed by quality control analysis via FastQC (Andrews 2010). Trimmed reads were aligned to the mm10 mouse reference genome via Bowtie2 (Langmead and Salzberg 2012) with flags `--very-sensitive` and `-X 1000`, consistent with the library size-selection step; human reads were similarly aligned to the GRCh38.p12 reference genome assembly (Schneider et al. 2017) using the same parameters. Reads were then sorted and indexed with samtools (Li et al. 2009). Mitochondrial reads were discarded from BAMs using the Harvard ATAC-seq module removeChrom script (https://github.com/harvardinformatics/ATAC-seq), and the remaining reads were filtered to retain only properly paired reads with `samtools view -f 3`.
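`samtools view -f 3` keeps only alignments in which both SAM flag bits 0x1 (paired) and 0x2 (properly paired) are set. The sketch below mirrors that test together with removal of mitochondrial alignments; the contig names `chrM` and `MT` are assumptions, and it does not reproduce the Harvard removeChrom script itself.

```python
# Bit flags from the SAM format specification
PAIRED = 0x1       # read comes from a paired-end template
PROPER_PAIR = 0x2  # both mates aligned as a proper pair according to the aligner


def keep_alignment(flag: int, chrom: str, mito=("chrM", "MT")) -> bool:
    """Return True for properly paired, non-mitochondrial alignments.

    Like `samtools view -f 3`, a read passes only if BOTH bits 0x1 and 0x2 are set.
    """
    required = PAIRED | PROPER_PAIR
    return (flag & required) == required and chrom not in mito
```

For example, flags 99, 147, 83, and 163 (typical properly paired mates) pass, whereas flag 97, which lacks bit 0x2, is discarded.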
At this step, working library complexity was estimated with ATACseqQC::estimateLibComplexity (Ou et al. 2018; Daley and Smith 2013). To compensate for differing library complexities within an experimental design, libraries were normalized by random subsampling to a calculated fraction of the original library, as estimated by bootstrap interpolation, using samtools view with flag `-s`. After subsampling libraries to the lowest complexity, PCR duplicates were removed with Picard MarkDuplicates (http://broadinstitute.github.io/picard/), and reads were name-sorted prior to conversion to BEDPE format with bedtools bamtobed (Quinlan and Hall 2010) with flag `-bedpe`. BEDPE coordinates were then shifted by +4 bp and -5 bp to correct for Tn5 transposase integration (Buenrostro et al. 2013), and the standard BEDPE files were rewritten to the minimal BEDPE format defined in the MACS2 manual using an awk script. MACS2 (Zhang et al. 2008) was used to call broad peaks from the final minimal BEDPE fragment coordinates with an FDR < 0.05 threshold and no control input, and the resulting peaks were repeat-masked by blacklist filtering. A naive overlap peak set, as defined by ENCODE (Landt et al. 2012), was constructed for each biological condition by pooling replicates, calling broad peaks on the pooled BEDPE files, and using bedtools intersect to select peaks with at least 50% overlap with each biological replicate.

Differential accessibility was calculated by first defining a more relaxed consensus peak set,

$$P = \left( \bigcap_{i=1}^{n} E_i \right) \cup \left( \bigcap_{i=1}^{n} C_i \right),$$

where each intersection retains any partial overlap between peaks, $E_1, \dots, E_n$ are MACS2 peak sets from biological replicates of the experimental condition, and $C_1, \dots, C_n$ are peak sets from control biological replicates. This consensus peak set was used in csaw (Lun and Smyth 2016) as coordinates for counting reads within specified windows, with additional parameters set to restrict windows to standard chromosomes and non-blacklisted regions.
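The relaxed consensus peak set described above (the union of the per-condition intersections of replicate peak sets) can be illustrated with plain interval arithmetic. This sketch assumes one interpretation of "any partial intersect": each peak set is treated as the bases it covers, so every partially overlapping portion survives an intersection. csaw and the bedtools commands actually used are not called here, and the helper names are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union: sort and merge overlapping or book-ended intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection: keep every partially overlapping portion of peaks in a and b."""
    pieces = [(ca, max(sa, sb), min(ea, eb))
              for ca, sa, ea in a
              for cb, sb, eb in b
              if ca == cb and max(sa, sb) < min(ea, eb)]
    return merge(pieces)


def consensus_peaks(experimental: List[List[Interval]],
                    control: List[List[Interval]]) -> List[Interval]:
    """Union of (intersection of experimental replicates) and (intersection of controls)."""
    exp_core = merge(reduce(intersect, experimental))
    ctrl_core = merge(reduce(intersect, control))
    return merge(exp_core + ctrl_core)
```

With two experimental and two control replicates, the function returns the merged union of the regions shared by all experimental replicates and the regions shared by all control replicates, so a site open in only one condition is still counted.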
Windows >1 kb in width were filtered out, along with low-abundance windows (log2CPM < -3). To compensate for differing reaction efficiencies between libraries, a non-linear loess-based normalization approach was employed to remove trended biases. This method was empirically determined to yield the most conservative results compared with other approaches to window count normalization. csaw uses the edgeR (Robinson, McCarthy, and Smyth 2010) quasi-likelihood framework to test for differential accessibility, and FDR thresholds were used to define the final differential peak sets (FDR < 0.20 for mouse data; FDR < 0.05 for human data). Finally, windows within 500 bp of each other were merged, and the most significant window statistic was used to represent each merged region. Significant differentially accessible genomic regions were annotated by HOMER (Heinz et al. 2010), with cis-promoter classification modified to within 3000 bp of a canonical gene TSS; this definition is used consistently throughout all reported analyses. HOMER de novo motif enrichment and genome ontology analysis were performed on all significant differentially accessible genomic regions. Common differential mouse ATAC/RNA genes were selected by the presence of a differentially accessible promoter ATAC peak (FDR < 0.20) and RNA-seq differential expression (FDR < 0.05).

2.5.12 Analysis of TCGA-UCEC data
ARID1A alteration incidence was calculated using the TCGA Pan-Can UCEC (Cancer Genome Atlas Research Network et al. 2013; Hoadley et al. 2018) cohort (n = 509) retrieved from cBioPortal (Gao et al. 2013). All molecular data for subsequent analyses were obtained from the 28 January 2016 release of Broad GDAC Firehose (https://doi.org/10.7908/C11G0KM9). For molecular comparisons, patients were considered ARID1Amut if they had somatic alterations (excluding missense and synonymous mutations) and ARID1Awt if no alterations were detected at the ARID1A locus.
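The ARID1Amut/ARID1Awt grouping can be expressed as a small classification rule. This sketch assumes MAF-style `Variant_Classification` labels (`Missense_Mutation`, `Silent`) and considers only mutation calls, so copy-number alterations are not modeled. Patients whose only ARID1A changes are missense or synonymous are assigned to neither group, which is one reading of the criteria above; the function name is hypothetical.

```python
from typing import Iterable, Optional

# Assumed MAF labels for the variant classes excluded from ARID1Amut
EXCLUDED = {"Missense_Mutation", "Silent"}


def arid1a_status(variant_classes: Iterable[str]) -> Optional[str]:
    """Classify one patient from the variant classes observed at the ARID1A locus."""
    classes = list(variant_classes)
    if not classes:
        return "ARID1Awt"   # no alteration detected at ARID1A
    if any(c not in EXCLUDED for c in classes):
        return "ARID1Amut"  # at least one non-missense, non-synonymous alteration
    return None             # only missense/synonymous changes: left out of both groups
```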
RNASeqV2 RSEM (Li and Dewey 2011) normalized gene counts were quantile normalized prior to filtering low-count genes (fewer than one count per sample on average) and fitting linear models via limma (Ritchie et al. 2015) for differential expression analysis in subsets of patients. Empirical Bayes moderated statistics were computed via limma::eBayes with arguments `trend = TRUE` and `robust = TRUE`, and p-values were adjusted for multiple testing by FDR. Additional metrics for clinical staging and tumor invasion were acquired from the GDC (Grossman et al. 2016) TCGA-UCEC dataset (n = 605) in UCSC Xena (Goldman et al. 2020). Broad GSEA (Subramanian et al. 2005) for MSigDB v6.2 Hallmark pathways (Liberzon et al. 2015) was performed on ortholog-converted DESeq2 normalized counts from the mouse data generated here and on RNASeqV2 RSEM normalized counts from TCGA-UCEC data. Broad ssGSEA (Barbie et al. 2009) was also performed on RNASeqV2 RSEM normalized counts from TCGA-UCEC data. Human orthologs of the mouse gene signature established herein were used to classify UCEC endometrioid patients into ssGSEA-enriched or unenriched quartiles, reflecting similarity to the mouse model transcriptome.

2.5.13 Bioinformatics and statistics
The 77-gene Pan-Cancer EMT signature was extracted from Supplementary Table S2 of Mak and Tong et al. (Mak et al. 2016). Various clusterProfiler (Yu et al. 2012) functions were used to calculate and visualize pathway enrichment from lists of gene symbols or Entrez (Maglott et al. 2005) IDs with their respective gene universes. biomaRt was used for all gene nomenclature and ortholog conversions (Durinck et al. 2009; Durinck et al. 2005; Smedley et al. 2009). ggplot2 was used for various plotting applications (Wickham 2016). ComplexHeatmap was used for hierarchical clustering by Euclidean distance and visualization (Gu, Eils, and Schlesner 2016). eulerr was used to produce proportional Euler diagrams (Larsson 2020).
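An enrichment test against a gene universe typically uses the upper tail of the hypergeometric distribution: the probability of observing at least the measured overlap between a query gene list and an annotated gene set. clusterProfiler's over-representation analysis is based on the same distribution. A self-contained sketch with exact integer arithmetic follows; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, list_size: int, set_size: int, universe: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_size, list_size).

    universe  -- number of genes in the background (gene universe)
    set_size  -- genes in the annotated set (e.g. a pathway)
    list_size -- genes in the query list (e.g. differentially expressed genes)
    overlap   -- genes found in both the query list and the annotated set
    """
    upper = min(list_size, set_size)
    # Sum the numerators exactly as integers, then divide once.
    hits = sum(comb(set_size, x) * comb(universe - set_size, list_size - x)
               for x in range(overlap, upper + 1))
    return hits / comb(universe, list_size)
```

If SciPy is available, `scipy.stats.hypergeom.sf(overlap - 1, universe, set_size, list_size)` should return the same upper-tail probability.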
The cumulative hypergeometric distribution was used for enrichment tests throughout this manuscript. The statistical computing language R was used for many applications throughout this manuscript (R Core Team 2018). HOMER was used to annotate peaks and compute integer read counts at loci of interest for tag density heatmaps and scatter plots (Heinz et al. 2010). TxDb.Hsapiens.UCSC.hg38.knownGene was used to generate promoter regions for all standard hg38 genes (Bioconductor Core Team and Maintainer 2016).

2.5.14 Transfection of 12Z cells with siRNA and plasmid DNA
12Z cells were seeded at a density of 40,000 cells/mL in DMEM/F12 supplemented with 10% FBS and 1% L-glutamine. The following day, cells were transfected with 50 pmol/mL of siRNA (Dharmacon, ON-TARGETplus Non-targeting Pool and human ARID1A #8289 SMARTpool) using Lipofectamine RNAiMAX reagent (ThermoFisher) according to the manufacturer’s instructions at a ratio of 1:1 volume:volume in OptiMEM (Gibco). After 24 hours, the media was replaced. ATAC samples were collected after 48 hours. For plasmid co-transfection experiments, 24 hours after siRNA transfection, cells were transfected with 500 ng pBabe vector containing PIK3CAH1047R (pPIK3CAH1047R) or pBabe empty vector using FuGENE HD transfection reagent (Promega) according to the manufacturer’s instructions at a ratio of 2:1 volume:mass, and media was replaced after 4 hours. The pPIK3CAH1047R plasmid was a gift from Jean Zhao (Addgene plasmid 12524) (Zhao et al. 2005). The following day, media was replaced with DMEM/F12 supplemented with 0.5% FBS, 1% P/S, and 1% L-glutamine. Cells were collected 72 hours post siRNA transfection using the Quick-RNA Miniprep Kit (Zymo Research) for RNA or RIPA buffer (Cell Signaling) for protein.

2.5.15 Generation of lentiviral shRNA particles
Lentiviral particles expressing shRNA were produced in 293T cells according to the manufacturers’ instructions.
Briefly, Lenti-X™ 293T cells were transfected with lentiviral packaging mix (Sigma) and MISSION pLKO.1 plasmid containing non-targeting shRNA (shNONtg) or pooled ARID1A shRNAs (shARID1A) (Sigma) using polyethylenimine (PEI) in DMEM + 4.5 g/L D-glucose, 110 mg/L sodium pyruvate, 10% FBS, and 1% L-glutamine. After 4 hours, media was replaced with DMEM/F12, 10% FBS, 1% L-glutamine, and 1% P/S. Viral particles were collected after 48 and 96 hours, and viral titers were calculated using the qPCR Lentiviral Titration Kit (ABM).

2.5.16 Migration assay
12Z cells were seeded into 35 mm dishes containing four-well culture inserts at a density of 4000 cells per well. After 24 hours, cells were transfected with 125 ng pBabe vector or pPIK3CAH1047R using FuGENE HD as described above. After 4 hours, cells were treated with lentiviral particles expressing non-targeting shRNA or shARID1A at a multiplicity of infection (MOI) of 100. After 24 hours, the media was replaced. At 48 hours post transfection, media was replaced with serum-free DMEM/F12 containing 1% L-glutamine and 1% P/S. After 16 hours of serum deprivation, culture inserts were removed and serum-free media was added. At 0 and 24 hours, images were taken using a Nikon Eclipse Ti microscope. Distances between migration fronts were measured using NIS-Elements Advanced Research software at 16 points spaced 100 μm apart. Migration distance was calculated by subtracting the average distance between migration fronts at 24 hours from the average distance at 0 hours. Cell counts were conducted within a 1500 μm by 700 μm window surrounding the migration area.

2.5.17 Invasion assay
12Z cells were seeded in six-well dishes at a density of 50,000 cells per well. After 24 hours, cells were transfected with pPIK3CAH1047R or empty vector as described above. After 4 hours, cells were treated with lentiviral particles expressing shNONtg or shARID1A at an MOI of 100. Media was replaced after 24 hours.
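To treat cells at a multiplicity of infection of 100, the required virus volume follows from the cell number and the measured titer. A minimal sketch, assuming the titer is expressed as infectious units per mL and that the seeded cell number approximates the cell number at transduction (in practice cells will have divided during the intervening 24 hours); the function name is hypothetical.

```python
def virus_volume_ul(n_cells: int, moi: float, titer_per_ml: float) -> float:
    """Volume of viral supernatant (μL) delivering `moi` infectious units per cell.

    units needed = n_cells * moi; volume (mL) = units / titer; x1000 converts mL to μL.
    """
    if titer_per_ml <= 0:
        raise ValueError("titer must be positive")
    return n_cells * moi * 1000.0 / titer_per_ml
```

For example, 50,000 cells at MOI 100 with a hypothetical titer of 1 × 10^8 units/mL require 50 μL of viral supernatant.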
At 48 hours post transfection, cells were trypsinized, and 100 μL of cell mixture containing 30,000 cells and 0.3 mg/mL Matrigel was seeded into transwell plates (8 μm pore polycarbonate membrane, Corning) pre-coated with 100 μL of 0.3 mg/mL Matrigel. After 1 hour, serum-free DMEM/F12 with 1% P/S and 1% L-glutamine was added to the top chamber, and DMEM/F12 with 5% FBS, 1% P/S, and 1% L-glutamine was added to the bottom chamber. After 16 hours, transwell units were transferred to plates containing 4 μg/mL calcein AM in DMEM/F12. After 1 hour, media was aspirated from the top chamber and unmigrated cells were removed with a cotton swab. Images were collected using a Nikon Eclipse Ti microscope in five non-overlapping fields per well. ImageJ software (National Institutes of Health) was used to quantify cells based on size and intensity.

2.5.18 Western blotting
Protein lysates were quantified using the Micro BCA Protein Assay Kit (ThermoFisher) and a FlexStation 3 plate reader. Protein lysates were run on a 4–15% gradient sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) gel (BioRad) and transferred to PVDF membrane using the Trans-Blot Turbo system (BioRad). Primary antibodies were used at the following dilutions: 1:1000 ARID1A (D2A8U) (12354, Cell Signaling); 1:1000 Akt (4691, Cell Signaling); 1:1000 β-Actin (8457, Cell Signaling); E-Cadherin (3195, Cell Signaling); 1:2000 Phospho-Akt (Ser473) (4060, Cell Signaling); 1:1000 Slug (9585, Cell Signaling); 1:1000 Snail (3879, Cell Signaling); 1:1000 Twist1 (T6451, Sigma); 1:100 ARID1B (sc-32762, Santa Cruz); 1:1000 Brg1 (ab110641, Abcam); 1:1000 BRM (11966, Cell Signaling); 1:100 ARID1A (PSG3) (sc-32761, Santa Cruz). Horseradish peroxidase (HRP)-conjugated secondary antibodies (Cell Signaling) were used at a dilution of 1:2000. Clarity Western ECL Substrate (BioRad) was used for protein band visualization, and western blot exposures were captured using the ChemiDoc XRS+ imaging system (BioRad).
2.5.19 Chromatin immunoprecipitation
Wild-type 12Z cells were treated with 1% formaldehyde in DMEM/F12 media for 10 minutes at ambient temperature. Formaldehyde was quenched by the addition of 0.125 M glycine and incubation for 5 minutes at ambient temperature, followed by a PBS wash. In all, 1 × 10⁷ crosslinked cells were used per IP. Chromatin from crosslinked cells was fragmented by digestion with micrococcal nuclease using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) per the manufacturer’s instructions, followed by 30 seconds of sonication. IPs were performed using the SimpleChIP Enzymatic Chromatin IP Kit per the manufacturer’s instructions with 1:100 anti-ARID1A (D2A8U) (12354, Cell Signaling). Crosslinks were reversed with 0.4 mg/mL Proteinase K (ThermoFisher) and 0.2 M NaCl at 65 °C for 2 hours. DNA was purified using the ChIP DNA Clean & Concentrator Kit (Zymo).

2.5.20 Chromatin immunoprecipitation sequencing (ChIP-seq)
Libraries for input and IP samples were prepared by the Van Andel Genomics Core from 10 ng of input and IP material using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems, Wilmington, MA, USA). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to Bioo Scientific NEXTflex Adapters (Bioo Scientific, Austin, TX, USA). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies, Inc.), QuantiFluor dsDNA System (Promega Corp., Madison, WI, USA), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 75 bp single-end sequencing was performed on an Illumina NextSeq 500 sequencer using 75-cycle High Output (HO) sequencing kits (v2) (Illumina Inc., San Diego, CA, USA), with all libraries run across two flow cells to return a minimum read depth of 80 M reads per input library and 40 M reads per IP library.
Base calling was done by Illumina NextSeq Control Software (NCS) v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

2.5.21 ChIP-seq analysis
Technical replicate libraries were combined across flow cells and trimmed with cutadapt (Martin 2011) and Trim Galore! followed by quality control analysis via FastQC (Andrews 2010). Trimmed reads were aligned to the GRCh38.p12 reference human genome (Schneider et al. 2017) via Bowtie2 (Langmead and Salzberg 2012) with flag `--very-sensitive`. Reads were then sorted and indexed with samtools (Li et al. 2009). PCR duplicates were removed with Picard MarkDuplicates (http://broadinstitute.github.io/picard/), and reads were again sorted and indexed. MACS2 (Zhang et al. 2008) was used to call broad peaks with an FDR < 0.05 threshold on each ChIP replicate against the input control, and the resulting peaks were repeat-masked by blacklist filtering (Consortium 2012). A naive overlap peak set, as defined by ENCODE (Landt et al. 2012), was constructed by pooling replicates, calling broad peaks on the pooled BAM (binary SAM: Sequence Alignment/Map) files, and using bedtools intersect to select peaks with at least 50% overlap with each biological replicate. Naive overlapping ChIP peaks were annotated by HOMER (Heinz et al. 2010), and de novo motif enrichment and genome ontology analyses were performed on genome-wide and promoter (within 3 kb of a TSS) peak sets. Genes shared between ChIP/ATAC and ChIP/ATAC/RNA data sets were selected by the presence of a significant ChIP peak and a differentially accessible promoter ATAC peak (FDR < 0.05) located in the same promoter region (within 3 kb of the TSS).

2.5.22 Co-immunoprecipitation (co-IP)
Small-scale nuclear extracts were prepared, and co-IPs were performed, from wild-type 12Z cells.
Briefly, Protein A or Protein G Dynabeads (Invitrogen) were conjugated with anti-ARID1A (D2A8U) (12354, Cell Signaling), anti-ARID1A (PSG3) (sc-32761, Santa Cruz), or anti-ARID1B (E9J4T) (92964, Cell Signaling) in PBS + 0.5% BSA overnight at 4 °C. 400 µg of nuclear lysate was added to a final volume of 1 mL IP buffer (20 mM HEPES [pH 7.9], 250 mM KCl, 10% glycerol, 0.2 mM EDTA, 0.1% Tween-20, 0.5 mM DTT, 0.5 mM PMSF), clarified by high-speed centrifugation, added to antibody-conjugated beads (D2A8U, 1:200; PSG3, 1:40; E9J4T, 1:200), and incubated overnight at 4 °C. IP samples were washed in a series of IP buffers with varying salt concentrations as follows: 150 mM KCl, 300 mM KCl, 500 mM KCl, 300 mM KCl, 100 mM KCl. IP samples were washed a final time in 60 mM KCl IP buffer lacking EDTA and Tween-20. Proteins were eluted twice with 100 mM glycine pH 2.5 on ice and neutralized by the addition of 1:10 (v:v) 1 M Tris-HCl pH 8.0.

2.5.23 Co-IP followed by mass spectrometry
Nuclear lysates from wild-type 12Z cells were prepared as described in the previous section. Protein A Dynabeads (Invitrogen) were conjugated with 8.3 μg anti-ARID1A (D2A8U) (12354, Cell Signaling) or IgG (2729, Cell Signaling) in PBS + 0.5% BSA + 0.01% Tween-20 overnight at 4 °C. Antibody-bead conjugates were crosslinked with BS3 (ThermoFisher) following the manufacturer’s protocol, and excess unlinked antibody was removed by one wash of 0.11 M glycine followed by quenching with Tris-HCl. 4.3 mg of nuclear lysate was added to a final volume of 14 mL IP buffer (20 mM HEPES [pH 7.9], 150 mM KCl, 10% glycerol, 0.2 mM EDTA, 0.1% Tween-20, 0.5 mM DTT, 0.5 mM PMSF) and clarified by high-speed centrifugation. Diluted nuclear lysate was added to antibody-crosslinked beads and incubated overnight at 4 °C. IP samples were washed in an IP buffer series with varying salt concentrations as follows: twice with 150 mM KCl, three times with 300 mM KCl, and twice with 100 mM KCl.
IP samples were washed a final time in 60 mM KCl IP buffer lacking EDTA and Tween-20. Proteins were eluted in 2x Laemmli buffer + 100 µM DTT at 70 °C for 10 minutes. Eluates were processed for short-gel SDS-PAGE and mass spectrometry by the University of Massachusetts Mass Spectrometry core.

2.5.24 Mass spectrometry analysis
All MS/MS samples were analyzed using Mascot (Matrix Science, London, UK). Mascot was set up to search UniProtKB Swiss-Prot (Human), assuming strict trypsin as the digestion enzyme. Mascot was searched with a fragment ion mass tolerance of 0.050 Da and a parent ion tolerance of 10.0 ppm. Carbamidomethylation of cysteine was specified in Mascot as a fixed modification. Gln → pyro-Glu of N-terminal glutamine, oxidation of methionine, and acetylation of the N-terminus were specified in Mascot as variable modifications. Scaffold (Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 85.0% probability by the Peptide Prophet algorithm (Keller et al. 2002) with Scaffold delta-mass correction. Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least two identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm (Nesvizhskii et al. 2003). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters.

2.6 Data availability
The RNA-seq, ChIP-seq and ATAC-seq data generated in this study are available in the GEO database under accession code GSE121198. All other data supporting the findings of this study are available within the study and appendix and from the author upon reasonable request.

2.7 Acknowledgments
We thank Drs.
Kathy Cho, Peter Laird, John Risinger, Jeff MacKeigan, Thomas McFall, Michael Anglesio, David Huntsman, and Ie-Ming Shih for helpful discussions. We thank the Van Andel Genomics Core for providing sequencing facilities and services. We thank the Van Andel Research Institute Histology and Pathology Core for histology services, and Dr. Galen Hostetter for his assistance with mouse tumor pathology. We gratefully acknowledge the Mass Spectrometry Facility at the University of Massachusetts Medical School for assistance with the proteomic measurements. Ronald L. Chandler was supported by an Innovative Translational Grant from the Mary Kay Foundation (026-16) and a Liz Tilberis Early Career Award from the Ovarian Cancer Research Fund Alliance (OCRFA) (457446).

CHAPTER 3
ATAC-SEQ NORMALIZATION METHOD CAN SIGNIFICANTLY AFFECT DIFFERENTIAL ACCESSIBILITY ANALYSIS AND INTERPRETATION

A modified version of this chapter was previously published (Reske, Wilson, and Chandler 2020): Jake J. Reske, Mike R. Wilson, and Ronald L. Chandler. 2020. ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation. Epigenetics & Chromatin 13: 22.

3.1 Abstract
Chromatin misregulation is associated with developmental disorders and cancer. Numerous methods for measuring genome-wide chromatin accessibility have been developed in the genomic era to interrogate the function of chromatin regulators. A recent technique that has gained widespread use due to its speed and low input requirements with native chromatin is the Assay for Transposase-Accessible Chromatin, or ATAC-seq. Biologists have since used this method to compare chromatin accessibility between two cellular conditions. However, approaches for calculating differential accessibility can yield conflicting results, and little emphasis is placed on the choice of normalization method during differential ATAC-seq analysis, especially when global chromatin alterations might be expected.
Using the in vivo ATAC-seq data set generated in Chapter 2, we observed differences in chromatin accessibility patterns depending on the data normalization method used to calculate differential accessibility. This observation was further verified on published ATAC-seq data from yeast. We propose a generalized workflow for differential accessibility analysis using ATAC-seq data. We further show that this workflow identifies sites of differential chromatin accessibility that correlate with gene expression and is sensitive to differential analysis using negative controls. We argue that researchers should systematically compare multiple normalization methods before continuing with differential accessibility analysis. ATAC-seq users should be aware of potential biases within experimental data and of the assumptions of the normalization method implemented.

3.2 Introduction

In Chapter 2, we measured genome-wide chromatin accessibility in ARID1A/PIK3CA mutant mouse endometrial epithelial cells and compared it to that of control cells using ATAC-seq. A major conclusion of that experiment was that mutant cells displayed more genes with increased promoter chromatin accessibility than with decreased accessibility, and that these chromatin accessibility alterations corresponded with changes in gene expression (Wilson et al. 2019). However, we observed that the ATAC-seq measurements between the two cell populations could be quantified and interpreted differently depending on the analytical and statistical strategies used for defining query genomic regions, normalizing data across samples, and testing for differential accessibility. This chapter begins with a review of genome-wide sequencing techniques, leading to a focus on ATAC-seq analytical methodologies and differential accessibility analysis.
Centrally, we explore how the in vivo ATAC-seq data generated in Chapter 2 can be analyzed and interpreted differently using various bioinformatic strategies, with emphasis on how, and why, we selected a conservative approach for our downstream analysis. Genome-wide quantitative sequencing methods for measuring genomic features have recently been developed to address biological questions previously limited to locus-level interrogation. Current applications include gene expression (Wang, Gerstein, and Snyder 2009), DNA methylation (Laird 2010), protein-DNA interactions (Park 2009), histone post-translational modifications (O'Geen, Echipare, and Farnham 2011), 3D genome organization (Lieberman-Aiden et al. 2009), nucleosome occupancy (Henikoff et al. 2011), and chromatin accessibility (Boyle et al. 2008). Researchers frequently apply these techniques to multiple cellular states in parallel to gain insight into the biological questions being investigated. These approaches promote unbiased experimental investigation by permitting genome-wide analysis paired with robust statistical testing. Because of their widespread application, there is a clear need to benchmark and improve the statistical and analytical methods used to interpret the resulting sequencing data. The latest major technique for measuring genome-wide chromatin accessibility is the Assay for Transposase-Accessible Chromatin, or ATAC-seq. This assay uses a Tn5 transposase reaction that preferentially inserts sequencing adapters into accessible regions of the genome, duplicating 9 bp of target sequence at each insertion site (Buenrostro et al. 2013). Adapter-tagged genomic regions, which are typically nucleosome-depleted and euchromatic, can then be enriched for sequencing.
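Because each Tn5 insertion duplicates 9 bp of target sequence, many ATAC-seq workflows shift read ends so that they mark the center of the insertion event. The Python sketch below illustrates that step for a single paired-end fragment. It assumes the commonly used +4/-5 bp convention and 0-based, half-open BED coordinates; the offsets, the `Fragment` type, and `tn5_shift` are illustrative assumptions, not the contents of the bedpeTn5shift.sh script described in Appendix B.

```python
from dataclasses import dataclass

# Assumed offsets (the widely used +4/-5 convention); not read from bedpeTn5shift.sh.
PLUS_STRAND_SHIFT = 4
MINUS_STRAND_SHIFT = -5


@dataclass(frozen=True)
class Fragment:
    """A paired-end ATAC fragment in 0-based, half-open (BED-style) coordinates."""
    chrom: str
    start: int  # 5' end of the plus-strand mate
    end: int    # one past the 5' end of the minus-strand mate


def tn5_shift(frag: Fragment) -> Fragment:
    """Shift both fragment ends so they mark the center of each Tn5 insertion event."""
    start = frag.start + PLUS_STRAND_SHIFT
    end = frag.end + MINUS_STRAND_SHIFT
    if end <= start:
        raise ValueError(f"fragment too short to shift: {frag}")
    return Fragment(frag.chrom, start, end)


# Usage with a hypothetical 200-bp fragment.
shifted = tn5_shift(Fragment("chr1", 1000, 1200))
print(shifted)  # Fragment(chrom='chr1', start=1004, end=1195)
```

The shifted fragment is 9 bp shorter than the original, which matches the size of the duplicated target site.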
ATAC-seq provides a readout similar to DNase I hypersensitivity sequencing (DNase-seq) and formaldehyde-assisted isolation of regulatory elements (FAIRE-seq), which also measure accessible chromatin regions, and it is orthogonal to micrococcal nuclease digestion (MNase-seq), which measures nucleosome-occupied regions (Klemm, Shipony, and Greenleaf 2019; Tsompana and Buck 2014). However, ATAC-seq offers several benefits over comparable assays, including a lower input requirement, shorter assay time, in situ library preparation, and protocol adaptation to fresh-frozen tissue (Corces et al. 2017). These advantages have permitted precise in vivo regulatory genomic assays on small populations of sorted cells (Frerichs et al. 2019; Hilliard et al. 2019; Jen et al. 2019; Jia et al. 2018; Haines et al. 2018; Wilson et al. 2019).

ATAC-seq has been used both to identify basal accessible chromatin regions in a given cellular context and to determine regions differentially accessible (DA) between two cellular states (Schep et al. 2015; Liu et al. 2019; Corces et al. 2018). The former is analyzed through a straightforward bioinformatic process that often involves calling signal peaks throughout the genome. In this respect, the analysis framework for ATAC-seq is similar to that of ChIP-seq and DNase-seq (Yan et al. 2020), though few comprehensive analyses and best-practice reports exist. Previous work has evaluated the performance of computational methods in DNase-seq footprinting analysis, and similar efforts should be made for ATAC-seq (Gusmao et al. 2016). For DA analysis, ATAC signal at enriched regions is quantified and compared between conditions. Determining DA regions with high confidence poses a greater challenge because of variability in transposition reaction efficiency, which may be further compounded by in vivo heterogeneity and a lack of guiding literature.
As a result, outputs from different tools for calculating DA regions are often conflicting or inconsistent. Furthermore, while a small number of studies have attempted to streamline ATAC-seq data processing (Divate and Cheung 2018; Ahmed and Ucar 2017; Ou et al. 2018; Pranzatelli, Michael, and Chiorini 2018), there is little emphasis on statistical considerations of differential analysis. In this chapter, we further examine the in vivo differential ATAC-seq analyses implemented in Chapter 2 and show that different analytical strategies lead to different results and biological interpretations, likely because of inherent biases within the data.

3.3 Results

3.3.1 Comparison of 8 analytical approaches to calculate ATAC-seq differential accessibility

To determine whether the choice of ATAC-seq DA analysis method influences experimental results, we compared 8 different DA approaches (Table 3.1) using the published tools MACS2 (Zhang et al. 2008), DiffBind (Stark and Brown 2011), csaw (Lun and Smyth 2016), voom (Law et al. 2014), limma (Ritchie et al. 2015), edgeR (Robinson, McCarthy, and Smyth 2010), and DESeq2 (Love, Huber, and Anders 2014). Analyses I and II follow the DiffBind protocol, originally intended for ChIP-seq data, which constructs a consensus read count matrix of m query regions by n samples from MACS2 replicate peak sets. Briefly, MACS2 constructs an ATAC fragment pileup from aligned paired-end data, builds a local bias track from a series of parameters to estimate background noise, and finally compares ATAC signal to the local background at each genomic position using a Poisson test. Significant nearby positions are then merged into a peak.
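The local Poisson test can be illustrated with a simplified sketch. Following the MACS2 description (Zhang et al. 2008), it assumes that the local background rate is the largest of a genome-wide rate and rates estimated from surrounding windows, each rescaled to the width of the tested window. The window sizes, counts, and helper names are hypothetical, and MACS2 itself additionally works on fragment pileups, corrects for multiple testing, and merges significant positions into peaks.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k > lam:
        # Upper tail: terms shrink once i > lam, so sum them until negligible.
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total, i = 0.0, k
        while term > total * 1e-16:
            total += term
            i += 1
            term *= lam / i
        return total
    # When k <= lam the upper tail is large, so 1 - P(X < k) loses no precision.
    lower = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - lower)


def local_lambda(genome_lambda: float, window_counts: dict, test_width: int) -> float:
    """Max of the genome-wide rate and surrounding-window rates rescaled to test_width."""
    rescaled = [count * test_width / width for width, count in window_counts.items()]
    return max([genome_lambda, *rescaled])


# Hypothetical counts: 25 fragments in a 200-bp test window; 20 and 150 fragments in
# surrounding 1-kb and 10-kb windows; genome-wide expectation of 2 per 200 bp.
lam = local_lambda(2.0, {1_000: 20, 10_000: 150}, test_width=200)
p_value = poisson_sf(25, lam)
print(lam, p_value)
```

Taking the maximum of several background estimates makes the test conservative: a locally elevated background raises the expected count and therefore the p-value.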
DiffBind then calculates linear scaling factors from either the total number of reads in each library, which assumes that true global differences may exist and technical bias is small, or the total number of reads in queried peak regions, which should remove global differences in favor of reducing technical biases. The former method is applied in I and the latter in II. The count matrix with normalization factors is then subjected to the DESeq2 framework of dispersion estimation and negative binomial generalized linear model (GLM) fitting for hypothesis testing, according to the design matrix. Analyses III through VI follow approaches described in the csaw manual (Lun and Smyth 2016). III and IV count reads in query regions defined by MACS2 peak sets and then filter low-abundance windows, while V and VI use the csaw sliding-window approach to quantify ATAC signal in 300-bp query windows across the genome. The de novo query windows in V and VI then pass low-abundance filtering and are tested for signal enrichment greater than three-fold over the surrounding 2-kb local neighborhood. For normalization, III and V implement the trimmed mean of M values (TMM) method (Robinson and Oshlack 2010) to generate linear scaling factors from counts in large, 10-kb genomic bins. This method trims the bins with the most extreme fold-changes and signal abundances, then derives scaling factors that minimize the differences between samples at the remaining majority of bins. TMM assumes that most regions are not truly DA, and it treats systematic signal differences present across the genome as technical. Therefore, TMM should control for technical error better than scaling to total read depth by removing systematic biases in library ATAC distribution, while still permitting true asymmetric differences specifically at DA regions.
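To make the contrast between linear library-size scaling and TMM concrete, the following simplified Python sketch computes both. It follows the published TMM recipe (drop the most extreme log-ratios M and average log-abundances A, then take a precision-weighted mean of the remaining M values), with per-tail trim fractions modeled on the documented defaults of edgeR's `calcNormFactors` (0.3 and 0.05, as we read them). It is not edgeR's implementation: ties are handled naively, there is no reference-sample selection, and the bin counts are hypothetical.

```python
import math


def library_size_factors(lib_sizes):
    """Linear scaling factors relative to the mean library size (analysis I style)."""
    mean = sum(lib_sizes) / len(lib_sizes)
    return [n / mean for n in lib_sizes]


def tmm_factor(obs, ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM factor for sample `obs` relative to sample `ref` over shared bins."""
    n_obs, n_ref = sum(obs), sum(ref)
    rows = []  # (M, A, approximate variance of M) for each informative bin
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # log-ratios are undefined for empty bins
        p_o, p_r = y_o / n_obs, y_r / n_ref
        m = math.log2(p_o / p_r)
        a = 0.5 * (math.log2(p_o) + math.log2(p_r))
        v = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)
        rows.append((m, a, v))
    n = len(rows)
    cut_m, cut_a = math.floor(n * trim_m), math.floor(n * trim_a)  # bins dropped per tail
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("trimming removed every bin")
    num = sum(rows[i][0] / rows[i][2] for i in keep)
    den = sum(1.0 / rows[i][2] for i in keep)
    return 2.0 ** (num / den)


if __name__ == "__main__":
    # Hypothetical 10-kb bin counts: 10 of 100 bins gain signal in `obs`.
    ref = [100] * 100
    obs = [100] * 90 + [1000] * 10
    print(library_size_factors([sum(obs), sum(ref)]))
    print(tmm_factor(obs, ref))  # about 0.526, i.e. 10000 / 19000
```

In this toy case, library-size scaling spreads the gain in 10 bins across every bin, whereas the TMM factor is derived from the unchanged majority, so the effective library size of `obs` (19000 × 0.526) matches that of `ref` at the unchanged bins.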
Analyses IV and VI implement a non-linear, loess-based (locally estimated scatterplot smoothing) normalization method. This highly conservative method normalizes the signal distribution locally according to ATAC signal abundance. As a result, the loess fit assumes a symmetric global distribution in which there are no true biological global differences in ATAC reaction efficiency or distribution, and any evidence of such biases is technical and should be removed. The count matrices, with their respective normalization factors or offsets, from all four of these csaw analyses are then subjected to the edgeR statistical framework of empirical Bayes dispersion estimation and quasi-likelihood GLM fitting for hypothesis testing, according to the design matrix. Finally, analyses VII and VIII follow the same procedure as III and IV with respect to using MACS2 peak query regions filtered by abundance in csaw, but instead apply a voom transformation to log2 counts per million (log2CPM) in VII, which are further quantile normalized in VIII. The log2CPM transformation simply scales by full library size and keeps those assumptions, while quantile normalization equalizes the signal distribution across all libraries (Bolstad et al. 2003). Quantile normalization should function analogously to loess normalization by eliminating any global or trended biases, and it has previously been applied to ATAC-seq data (Corces et al. 2018). The (normalized) log-count matrices from these two analyses then undergo mean-variance estimation to generate precision weights for limma linear modeling and hypothesis testing with empirical Bayes-moderated statistics. See Appendix B for workflow details for each analytical approach.
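The two transformations compared in VII and VIII can be sketched in a few lines of Python. The log2-CPM function uses the offsets applied by limma's voom (0.5 added to each count, 1 added to the library size); the quantile normalization follows the Bolstad et al. (2003) idea of replacing each value with the mean of the values sharing its rank across samples, but, for brevity, breaks ties by input order instead of averaging them. Input values are hypothetical.

```python
import math


def log2_cpm(counts, lib_size):
    """log2 counts per million with voom-style offsets (0.5 and 1)."""
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]


def quantile_normalize(columns):
    """Give every sample (column) the same distribution: the rank-wise mean of sorted columns.

    Ties are broken by input order rather than averaged, a simplification.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)]
    normalized = []
    for col in columns:
        order = sorted(range(n_rows), key=lambda i: col[i])  # row indices by rank
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Hypothetical counts for three peaks in two libraries of different depth.
    lib1 = log2_cpm([5, 2, 3], lib_size=1_000)
    lib2 = log2_cpm([4, 1, 6], lib_size=2_000)
    print(quantile_normalize([lib1, lib2]))
```

After quantile normalization every library holds exactly the same set of values, which is why this step, like loess, removes any global shift between conditions, whether biological or technical.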
Table 3.1 Description of 8 approaches used to calculate ATAC-seq differential accessibility

#      Genomic regions   Tool          Normalization       DA testing
I      MACS2             DiffBind      Full library size   DESeq2
II     MACS2             DiffBind      Reads in peaks      DESeq2
III    MACS2             csaw          TMM                 edgeR
IV     MACS2             csaw          Loess               edgeR
V      csaw              csaw          TMM                 edgeR
VI     csaw              csaw          Loess               edgeR
VII    MACS2             csaw | voom   log2CPM             limma
VIII   MACS2             csaw | voom   Quantile            limma

Summary of computational and statistical strategies for calculating differential accessibility (DA) from ATAC-seq data. Each row represents one of the 8 tested strategies, labeled by an arbitrary Roman numeral (first column). The remaining columns give, in order, the approach used to determine query genomic regions of interest, the overall DA testing framework, the normalization strategy applied, and the statistical software used for DA modeling and testing.

3.3.2 Choice of ATAC-seq analytical approach is a key step in determining differential chromatin accessibility

In Chapter 2, we generated an ATAC-seq data set in which chromatin accessibility was compared between sorted hyperplastic LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl and control mouse endometrial epithelial cells. Mutant and wild-type endometrial epithelial cells were positively selected for the surface marker EPCAM and purified by magnetic bead separation. ATAC-seq was chosen to analyze chromatin accessibility changes in sorted cells because of its low input requirement. We next compared chromatin accessibility patterns between mutant and control endometrial epithelial populations using the 8 DA approaches. MA plot visualization revealed different patterns of DA measurements depending on the DA approach used (Fig. 3.1). MA plots are a type of Bland-Altman plot applied to genomic data and were originally used in microarray analysis (Dudoit 2002).
MA plots depict global patterns of measurements compared between two samples: each tested genomic feature is plotted by the log2 difference in signal between the two groups (M, y-axis) against its average signal intensity (A, x-axis). For example, TMM normalization of the MACS2 peak read count matrix in III shows that the majority of these genomic regions increase in accessibility in mutant cells compared to control cells, whereas a small minority decrease. The TMM approach is similar to the default DiffBind normalization in I, which scales counts by library size, but TMM considers only regions expected to be unchanged when generating normalization factors. These observations contrast with the DA results of a more conservative loess-based count adjustment in IV, which yielded more significant DA regions with decreasing rather than increasing accessibility. MA plots for DA regions from all analyses show how these global patterns are affected by different normalization methods. The DA distribution is shifted upward in the TMM-normalized windows, which may or may not be technical in nature, and this shift is corrected by loess normalization.

Figure 3.1 DA distributions from the same ATAC-seq data set analyzed by 8 different DA approaches. Example MA plots for ATAC-enriched regions of interest analyzed for differential accessibility by different approaches. I and II are from DiffBind using MACS2 peak sets, with scaling factors derived from full libraries or reads in peaks only, respectively. III and IV are from csaw using MACS2 peak sets as query regions with either TMM or loess-based normalization. Likewise, V and VI are from csaw but use de novo query regions identified through local neighborhood enrichment. VII was calculated using MACS2 peak sets transformed to log2 counts per million (log2CPM) by voom, which were further quantile normalized in VIII.
The MA plot x-axis represents average ATAC signal abundance at each region, while the y-axis is the log2 difference in ATAC signal between the two conditions. Black dots represent non-significant regions, and red dots represent significant (FDR < 0.10) DA regions. Blue lines are loess fits to each distribution, with 95% confidence intervals shaded in gray.

We next performed a detailed comparative analysis of the DA outputs from all 8 approaches. The number of significant DA regions at an FDR < 0.10 threshold ranged from 33 to 24,450, with varying proportions of regions increasing vs. decreasing in accessibility (Fig. 3.2a). Genomic annotation of these peaks by HOMER (Heinz et al. 2010) showed that gene promoters, defined as regions within 3 kb of a TSS, constituted between 6% and 51% of the DA regions (Fig. 3.2b). However, in 6 of the 8 tested approaches, gene promoters predominantly increased in accessibility overall (Fig. 3.2c). This comparative analysis supported the probable conclusion that gene promoters mostly increase in accessibility in mutant cells, even though the global patterns differ between DA methods. Knowing which gene promoters display altered accessibility is biologically informative, since promoter chromatin accessibility often correlates with transcription (Boyle et al. 2008; Li, Carey, and Workman 2007). We first asked how many genes were commonly identified as having a DA promoter region among the tested approaches. Strikingly, no gene was identified by all 8 approaches, though certain sets of genes were identified by multiple approaches (Fig. 3.2d). Pathway enrichment analysis showed that the choice of DA approach also changed the biological processes observed for promoters with altered chromatin accessibility.
Epithelial-to-mesenchymal transition (EMT) was uniquely highlighted among significant DA promoter genes from approaches IV, V, VII, and VIII, as opposed to the other methods (Fig. 3.2e). This result corroborates Chapter 2 data, in which multiple molecular and cell-based in vitro and in vivo assays revealed a role for ARID1A and PIK3CA mutations in EMT-related processes (Wilson et al. 2019).

A commonly used approach to validate chromatin accessibility observations is to compare the results with gene expression data. We next asked whether genes with a significant DA promoter region in any one of the DA approaches were associated with differential expression (DE) in mutant vs. control endometrial epithelial populations. One issue is that quantifying direct overlap between promoter DA and gene DE is sensitive to the DA FDR threshold. To overcome this, we used a precision-recall (PR) curve to predict gene DE status (binary, based on an RNA-seq FDR < 0.05 threshold) from the promoter DA FDR values produced by a given DA analysis. The PR curves showed modest predictive ability of promoter DA, though analyses III, IV, V, and VIII predicted gene expression changes better than the others (Fig. B.1a). Predictive ability was most apparent below 20% recall (i.e., 20% of all DE genes with a tested promoter ATAC region), indicating that gene expression changes observed in this experiment are not entirely determined by alterations in promoter chromatin accessibility. We further investigated how FDR thresholding could affect DA outputs. The FDR threshold at which 5% of the tested null hypotheses were rejected differed between the 8 approaches, ranging from FDR = 0.00331 (I) to FDR = 0.936 (VI) (Fig. B.1b). Accordingly, the number of DE genes with a DA promoter region (Fig. B.1c) was highly dependent on the FDR threshold.
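The precision-recall analysis can be sketched as follows: genes are ranked by promoter DA FDR, and precision and recall for the DE label are recorded as the cutoff is relaxed one gene at a time. The sketch assumes a single FDR per gene (for example, the smallest FDR among that gene's promoter regions, an assumption not specified in the text) and uses hypothetical gene names and values.

```python
def precision_recall(da_fdr, is_de):
    """(recall, precision) pairs as the DA FDR cutoff is relaxed one gene at a time.

    da_fdr: gene -> promoter DA FDR (one value per gene)
    is_de:  gene -> True if the gene is differentially expressed (e.g., RNA-seq FDR < 0.05)
    Ties in FDR keep their input order, a simplification.
    """
    genes = sorted(da_fdr, key=da_fdr.get)  # most significant promoter first
    total_de = sum(1 for g in genes if is_de[g])
    if total_de == 0:
        raise ValueError("no DE genes among the tested genes")
    curve, true_pos = [], 0
    for k, gene in enumerate(genes, start=1):
        true_pos += int(is_de[gene])
        curve.append((true_pos / total_de, true_pos / k))
    return curve


if __name__ == "__main__":
    fdr = {"geneA": 0.01, "geneB": 0.02, "geneC": 0.30, "geneD": 0.90}
    de = {"geneA": True, "geneB": False, "geneC": True, "geneD": False}
    for recall, precision in precision_recall(fdr, de):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Because the curve is traced over every possible cutoff, it summarizes how well DA predicts DE without committing to a single FDR threshold, which is the sensitivity problem the PR analysis is meant to avoid.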
These observations suggest that appropriate FDR thresholds can differ between DA testing methods, and optimizing this aspect of the analysis may also improve results and interpretation. Collectively, these analyses underscore the importance of comparing multiple DA outputs before settling on conclusions. Furthermore, choosing a conservative normalization method may reduce the need for such rigorous comparisons or for multiple independent assays.

Figure 3.2 Output comparison of approaches for computing differential accessibility. a, Output comparison of the 8 approaches described in Fig. 3.1 for calling significant DA regions in ATAC-seq data, separated into increasing vs. decreasing accessibility regions. b, Comparison of the same 8 approaches divided into significant DA promoter regions (within 3 kb of a TSS) vs. distal regions (farther than 3 kb from a TSS). c, Comparison of significant DA promoter regions in all 8 approaches segregated by increasing vs. decreasing accessibility. d, Quantification of overlapping genes associated with a significant DA promoter region between all 8 approaches. e, Gene set enrichment of Hallmark MSigDB pathways among genes with DA promoters for all 8 approaches. Enrichment is displayed as the observed/expected ratio, where red values indicate pathway overrepresentation.

3.3.3 Temporal chromatin accessibility measurements in yeast also display normalization bias

In addition to comparing the effects of genetic mutations or other treatment conditions, ATAC-seq DA analysis can be used to examine temporal changes in chromatin accessibility in cell populations. We used the Schep et al. osmotic stress time-course ATAC-seq data set from yeast (Schep et al. 2015) to determine whether the choice of DA analysis workflow yielded different results.
Yeast cells were treated with 0.6 M NaCl and harvested for ATAC-seq at four 15-minute intervals, up to 60 minutes, for comparison against control cells at 0 minutes of exposure (n = 2 per time point). An advantage of this data set is the inclusion of two control groups, one with NaCl in the wash buffer and one without, which permits comparison between two negative control groups. Schep et al. also used a published expression microarray data set by Ni et al. from the same 0.6 M NaCl treatment design, in which three patterns of gene expression (unchanged, upregulated, and downregulated) were defined based on the osmotic stress response over time (Ni et al. 2009). Schep et al. reported that a subset of genes from each expression response pattern also displayed similar profiles of promoter accessibility change (Schep et al. 2015). Our workflow detected between 1271 and 1894 genome-wide naïve overlap peaks at any given time point, resulting in between 2261 and 2601 tested DA regions depending on the DA analysis approach used. DA comparison of the two control groups with all 8 analytical approaches yielded very few statistically significant DA regions (FDR < 0.05), ranging from 0 to 85 (Fig. B.2). This is in contrast with the 15-minute (n = 2) vs. 0-minute control (n = 4) comparison, in which between 491 and 1082 regions were significantly DA (Fig. B.3a). Because the yeast genome is highly compact, densely packed with genes and containing relatively few introns or other distal regulatory elements compared to vertebrates, more than 95% of DA regions at 15 minutes of exposure were located within gene promoters (defined as -2000 to +200 bp from the TSS) with every approach (Fig. B.3b). We then determined the overlap of promoter regions displaying increasing or decreasing accessibility with each of the three gene expression patterns defined by Ni et al.
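Because the mouse analysis defines promoters as within 3 kb of a TSS while the yeast analysis uses -2000 to +200 bp, a strand-aware window parameterized by upstream and downstream distances can express both definitions. The helpers below are hypothetical illustrations in Python; the reported mouse annotations came from HOMER, whose own distance rules may differ in detail.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transcript:
    chrom: str
    tss: int     # 0-based TSS coordinate
    strand: str  # "+" or "-"


def promoter_window(tx: Transcript, upstream: int, downstream: int) -> Tuple[str, int, int]:
    """Half-open [start, end) window from `upstream` bp 5' to `downstream` bp 3' of the TSS."""
    if tx.strand == "+":
        return tx.chrom, max(0, tx.tss - upstream), tx.tss + downstream + 1
    # On the minus strand, "upstream" lies at higher genomic coordinates.
    return tx.chrom, max(0, tx.tss - downstream), tx.tss + upstream + 1


def is_promoter(chrom: str, start: int, end: int, tx: Transcript,
                upstream: int, downstream: int) -> bool:
    """True if region [start, end) overlaps the promoter window of `tx`."""
    p_chrom, p_start, p_end = promoter_window(tx, upstream, downstream)
    return chrom == p_chrom and start < p_end and p_start < end


if __name__ == "__main__":
    gene = Transcript("chrI", 10_000, "-")  # hypothetical yeast gene on the minus strand
    # Yeast-style window (-2000 to +200 bp) vs. a symmetric 3-kb window.
    print(is_promoter("chrI", 11_500, 11_600, gene, upstream=2_000, downstream=200))
    print(is_promoter("chrI", 9_000, 9_100, gene, upstream=3_000, downstream=3_000))
```

Making strand explicit matters for the asymmetric yeast definition: the same region can fall inside the promoter of a minus-strand gene but outside that of a plus-strand gene with the same TSS.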
We were able to classify DA regions by gene expression pattern for all 8 approaches, with up to 75% of the increasing-accessibility promoter regions and up to 70% of the decreasing-accessibility regions classified by expression changes (Fig. B.3c). However, we noted a wide range in the number of genes displaying concordant gene expression and promoter chromatin accessibility changes (Fig. B.3d). The most compelling observations emerged when we assessed the full spectrum of gene expression and chromatin accessibility changes across the entire time course. Certain DA analyses, such as II, IV, and VIII, showed biologically expected changes in overall accessibility that reflected expression, while the accessibility profiles from other approaches appeared asymmetrical, indicating technical bias (Fig. 3.3). With some DA analyses, strong DA statistical significance was observed among stably expressed genes, which are not expected to display accessibility changes (Fig. B.4). In most cases, the MA plot profiles were predictive of the gene expression and chromatin accessibility patterns observed throughout the time course. In the 4 DA methods with an asymmetrical MA plot trend (I, III, V, and VII), the chromatin accessibility profiles over the time course did not match the direction of gene expression change (Fig. 3.3). Most importantly, even in a highly controlled time-course experiment in yeast, certain DA analyses can yield technically discordant results that do not align with orthogonal assays.

Figure 3.3 Comprehensive DA analysis and gene expression comparisons of the yeast osmotic stress time-course series. Time-series analysis of the Schep et al. osmotic stress ATAC-seq data set in yeast with all 8 DA approaches. MA plots are shown for 15-minute exposure vs. 0-minute controls and exemplify the global effects of data normalization.
Time-course line plots depict the mean change in accessibility at each time point compared to control samples for all gene promoter ATAC regions, grouped by the respective gene expression changes. Gene expression changes following the same 0.6 M NaCl treatment reported by Ni et al. are defined as stable expression (gray line), upregulated expression (red line), and downregulated expression (blue line). See Fig. B.4 for complete data and statistical analysis of the time-course series with all 8 approaches.

3.3.4 Generalized ATAC-seq workflow for differential chromatin accessibility analysis

Various studies have described ATAC-seq quality control and data processing for non-computational scientists, but few emphasize DA analysis (Divate and Cheung 2018; Ahmed and Ucar 2017; Ou et al. 2018; Pranzatelli, Michael, and Chiorini 2018). Moreover, we noted a gap in the literature regarding the importance of standardizing molecular complexity before quantifying differences between experimental ATAC libraries. This gap, coupled with our observation that the choice of normalization method affects DA results, led us to develop a comprehensive, easy-to-follow computational workflow for differential ATAC-seq data analysis. The workflow presented in Fig. 3.4 is designed to be widely applicable to any ATAC-seq data set or experimental design. It is based on the standardized ENCODE pipeline developed by Kundaje et al. (https://libraries.io/github/kundajelab/atac_dnase_pipelines), with modifications. Example applications include calling baseline accessible regions in naïve cells and identifying DA genomic regions. The field typically accepts data sets with at least two biological replicates for standard peak calling, as established by ENCODE for ChIP-seq data (Landt et al. 2012), and at least two biological replicates are absolutely required for any DA statistical analysis. For each step, we have included a descriptive phrase along with the software tools used and example code.
A detailed description of the workflow and custom Unix scripts for certain functions are available in Appendix B. Notably, we implemented the ENCODE-defined naïve overlap to determine biological replicate peak concordance. This method calls peaks on pooled replicates and then identifies the pooled peaks that display at least 50% overlap with a peak from every individual replicate. We have supplied a Unix shell script (naiveOverlapBroad.sh) that computes naïve overlap from two broadPeak replicates and can easily be modified to support more replicates. Additionally, publicly available blacklist regions (ENCODE consortium) refer to highly repetitive or unstructured regions that display artificially high signal in genomic experiments (Amemiya, Kundaje, and Boyle 2019). A major addition to our workflow is quantifying and normalizing library molecular complexity. Often, it is desirable to quantify and compare ATAC signal at genomic loci that are not typically part of an ATAC-seq or differential ATAC-seq analysis framework. This could involve genomic tiling, i.e., quantifying signal in evenly distributed genomic intervals, or quantifying ATAC signal at regions defined through other assays, such as ChIP-seq, or through k-means clustering (Corces et al. 2018; Kelso et al. 2017; Hosoya et al. 2018; Toenhake et al. 2018). For certain analyses, it may not be appropriate to implement one of the normalization methods described herein, as the proper biological or statistical assumptions may not be determinable. Instead, a linear transformation is often implemented, such as scaling libraries by read depth. However, this method assumes no differences in global library preparation biases, such as differing ATAC reaction efficiency. One potential result of such biases is altered sequence diversity, which manifests in library molecular complexity, or the estimated number of sequenced molecules as determined from duplication rates.
If uncorrected, libraries of equivalent read depth may be confounded by complexity in downstream analysis. Instead, libraries can be normalized to the estimated number of sequenced molecules, determined from duplication rates as the library molecular complexity. This method mitigates potential biases by considering that the relative distribution of transposase integration (reads at a given feature) is most biologically informative under an equivalent number of transposase reactions (total molecules sequenced, i.e., molecular complexity). We suggest estimating the complexities of all samples in the compared conditions and then performing stochastic subsampling to normalize all samples to equivalent molecular complexity. All analyses presented in this study have undergone this step unless specifically stated otherwise. The R package preseqR and its wrapper ATACseqQC implement functions that estimate complexity by calculating a duplicate frequency matrix and then estimating the number of unique molecules sequenced (i.e., molecular complexity) in each library (Ou et al. 2018; Daley and Smith 2013). samtools view can then be used to subsample libraries based on these estimates. Supporting the expectation that stochastic subsampling should not greatly affect experimental results, a replicated analysis with two different random subsampling seeds yielded highly similar and overlapping results from peak calling (Fig. B.5a) and DA analysis (Fig. B.5b). In our sorted mouse epithelial cell ATAC-seq data set, control libraries had lower molecular complexities than mutant libraries (Fig. B.6a-b), which we corrected by subsampling (Fig. B.6c). By performing this complexity-normalization process, we gained confidence that the observed ATAC differences were biological rather than technical in nature.
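The complexity-matching step can be sketched as follows. As a stand-in for preseqR, which uses a different extrapolation model, this sketch assumes the simpler Lander-Waterman relation C = X(1 - e^(-N/X)), where N is the number of read pairs, C the number of distinct fragments observed, and X the estimated number of molecules in the library (a similar relation underlies Picard's library-size estimate). It then computes, for each library, the fraction of reads whose expected distinct-fragment count matches the least complex library's observed count; a fraction like this could then be handed to a subsampler such as `samtools view -s`. Function names and counts are hypothetical.

```python
import math


def expected_distinct(molecules: float, reads: float) -> float:
    """Lander-Waterman expectation of distinct fragments after sampling `reads` pairs."""
    return molecules * (1.0 - math.exp(-reads / molecules))


def estimate_molecules(reads: int, distinct: int) -> float:
    """Solve distinct = X * (1 - exp(-reads / X)) for X by bisection."""
    if not 0 < distinct < reads:
        raise ValueError("need 0 < distinct < reads (some duplicates) to estimate X")
    lo, hi = float(distinct), 2.0 * distinct
    while expected_distinct(hi, reads) < distinct:
        hi *= 2.0  # widen until the root is bracketed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_distinct(mid, reads) < distinct:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def subsample_fraction(reads: int, molecules: float, target_distinct: float) -> float:
    """Fraction of reads whose expected distinct-fragment count equals target_distinct."""
    if target_distinct >= molecules:
        return 1.0
    fraction = -molecules * math.log(1.0 - target_distinct / molecules) / reads
    return min(1.0, fraction)


if __name__ == "__main__":
    # Hypothetical (read pairs, distinct fragments) for two libraries.
    libraries = {"lib_a": (500_000, 393_469), "lib_b": (400_000, 362_538)}
    target = min(distinct for _, distinct in libraries.values())
    for name, (reads, distinct) in libraries.items():
        x = estimate_molecules(reads, distinct)
        print(name, round(x), round(subsample_fraction(reads, x, target), 4))
```

Under this model, thinning a library to a fraction f of its reads yields X(1 - e^(-fN/X)) expected distinct fragments, so solving for f gives the closed form used in `subsample_fraction`; the least complex library receives a fraction of about 1.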
An example of the functional effects of complexity normalization is illustrated by ATAC signal quantification at a set of significant DA promoter regions (FDR < 0.10) defined by approach IV. Thirty-five promoter regions called as significantly decreasing in accessibility by this method did not reach statistical significance when ATAC RPKM was quantified in read depth-normalized libraries (Fig. B.6d), but the decreasing accessibility pattern became more evident when we compared complexity-normalized libraries (p = 0.0196, two-tailed paired Wilcoxon test) (Fig. B.6e). This analysis highlights the potential confounding effects of library molecular complexity in comparative ATAC-seq analysis. Moreover, it supports the use of complexity-normalized libraries for certain quantitative purposes.

Figure 3.4 Generalized ATAC-seq data processing workflow intended for comparative analysis. Stepwise bioinformatics process and example commands for analyzing ATAC-seq data from raw reads to calling peaks for downstream differential accessibility analysis. Consider “treat1” as an example mouse ATAC-seq Illumina paired-end library. Blue text denotes optional or conditional steps dependent on experimental design and desired output. Users seeking only to discover replicate-concordant accessible regions in a single cellular state may wish to call naïve overlapping peaks, though this step is not necessary for differential accessibility analysis. Bash scripts for Tn5 coordinate shifting (bedpeTn5shift.sh), minimal BEDPE format conversion (bedpeMinimalConvert.sh), and calling naïve overlap broad peaks (naiveOverlapBroad.sh), as well as a machine-readable text version of this workflow, are described in Appendix B.

3.3.5 Proposed workflow effectively retains ATAC-seq peak calls in an independent data set

To further assess the effectiveness of our ATAC-seq data analysis workflow, we tested it on one of the original data sets reported by Buenrostro et al. (Buenrostro et al. 2013).
ATAC-seq libraries were generated from three replicates of 50,000 GM12878 human lymphoblastoid cells, and the reported bioinformatic analysis yielded a replicate-merged set of 99,885 accessible chromatin regions called with ZINBA (Rashid et al. 2011). These data allowed us to assess the ability of our proposed workflow to identify biologically relevant ATAC-seq peaks. With our workflow, we identified 20,945 genomic regions at which MACS2 called a significant broad peak in all three replicates, of which 20,909 (99.8%) were also retained in the naïve overlap peak set, indicating replicate peak region concordance >50% (Fig. 3.5a). To directly compare the hg38-aligned naïve overlap peaks called through our workflow with the hg19-aligned Buenrostro et al. ZINBA peak set, we lifted the hg19 coordinates over to hg38 with a 99.96% successful mapping rate (Kent et al. 2002). We found extremely strong concordance between the naïve overlap peak set and the Buenrostro et al. ZINBA peak set, with over 97% of the naïve overlap peaks intersecting (Fig. 3.5b). Thus, nearly all of the naïve overlap regions were also identified in the Buenrostro et al. ZINBA peak set, which contained nearly 80,000 additional peaks not identified in the naïve overlap peak set. When the two peak sets were annotated with HOMER (Heinz et al. 2010), we observed even stronger concordance between the genes with an identified promoter ATAC-seq peak in both peak sets. Whereas only roughly 20% of the genome-wide Buenrostro et al. ZINBA peaks intersected with naïve overlap peaks, approximately 60% of the genes with a promoter ATAC-seq peak in the ZINBA peak set were also identified in the naïve overlap peak set (Fig. 3.5c).

By leveraging a GM12878 microarray gene expression data set from Ernst et al., we next compared the expression of genes identified as having a promoter ATAC-seq peak concordantly or uniquely between the two peak sets.
Again, among genes measured for expression by microarray, we observed strong concordance between the two peak sets in which genes had a promoter ATAC-seq peak (p = 0, hypergeometric enrichment). Of these genes, 5508 were concordantly identified in both sets, 2585 were uniquely called in the Buenrostro et al. ZINBA peak set, and 161 were uniquely called in the naïve overlap peak set (Fig. 3.5d). We then compared the expression of the genes in each of these three bins. Genes concordantly identified as having a promoter ATAC peak in both peak sets, or uniquely in the naïve overlap peak set, were equivalently highly expressed. In contrast, the 2585 genes uniquely identified as having a promoter ATAC peak in the Buenrostro et al. ZINBA peak set were lowly expressed (Fig. 3.5e). Furthermore, genes with a promoter ATAC peak in either peak set were overall more highly expressed than genes that never exhibited a promoter ATAC peak. We extended this analysis by correlating promoter ATAC peak signal with the respective gene expression for the 5508 commonly identified genes with microarray expression data, as it is widely accepted that higher promoter chromatin accessibility generally corresponds to higher gene expression. Correlations between promoter ATAC peak signal and gene expression were highly significant for both peak sets and analytical approaches, and they were not significantly different from each other (Fig. 3.5f). We further observed that the naïve overlap peak set typically identified only conservative ATAC-seq peaks. However, there were also examples of robust ATAC signal that was called as significant only in the naïve overlap peak set, such as near the PPIP5K1 promoter (Fig. 3.5g). Altogether, these analyses suggest that the workflow proposed herein can identify conservative regions of significant ATAC signal that corroborate gene expression observations.
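The hypergeometric enrichment test for the gene overlap in Fig. 3.5d can be illustrated with the short Python sketch below. It is not the R code used for the analysis. The sketch makes two assumptions:

- the gene universe is the 11,369 microarray-measured genes described in section 3.5.3;
- the two tested sets are the measured genes with a promoter peak in each peak set.

The function name `hypergeom_upper_tail` is hypothetical. Exact integer arithmetic keeps the log10 p-value finite even when the double-precision p-value underflows to zero. This is one plausible reason an extreme enrichment is printed as p = 0.

```python
from math import comb, log10


def hypergeom_upper_tail(overlap: int, universe: int, set_a: int, set_b: int):
    """Return (p, log10_p) for P(X >= overlap) under the hypergeometric null.

    X counts how many of the set_b genes, drawn without replacement from
    `universe` genes, fall into set_a. Exact big-integer sums avoid rounding;
    the float p may underflow to 0.0, so log10_p is returned as well.
    """
    numer = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)  # comb() is 0 if impossible
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    denom = comb(universe, set_b)
    p = numer / denom                       # may underflow to 0.0 for tiny values
    log10_p = log10(numer) - log10(denom)   # math.log10 accepts arbitrarily large ints
    return p, log10_p


# Assumption: universe = 11,369 measured genes; set sizes are taken from Fig. 3.5d.
zinba_genes = 5508 + 2585   # measured genes with a promoter peak in the ZINBA set
naive_genes = 5508 + 161    # measured genes with a promoter peak in the naïve overlap set
p, log10_p = hypergeom_upper_tail(5508, 11369, zinba_genes, naive_genes)
expected = zinba_genes * naive_genes / 11369   # overlap expected by chance alone
print(f"expected ~{expected:.0f}, observed 5508, p = {p}, log10(p) = {log10_p:.1f}")
```

Under these assumptions, roughly 4035 shared genes would be expected by chance, far fewer than the 5508 observed.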
Figure 3.5 Conservative and relevant peak calling by proposed framework exemplified on Buenrostro et al. data
a, Overlap of MACS2 broad peaks called with the proposed workflow between independent GM12878 ATAC-seq replicates from Buenrostro et al. Naïve overlap identifies 99.8% of fully replicate-intersecting peaks. b, Genome-wide overlap of the naïve overlap peak set generated herein compared to the ZINBA peak set reported by Buenrostro et al. c, Overlap of genes with detected ATAC promoters identified in the two peak sets as in b. d, Overlap of expression-measured genes with detected ATAC promoters in the two peak sets compared to all measured genes. GM12878 expression data were obtained from a microarray data set generated by Ernst et al. e, Microarray log2 expression levels (RMA) of genes segregated by promoter ATAC peak status between the two peak sets. Genes were binned as having a detected peak in both sets, only by naïve overlap herein, only by Buenrostro et al. ZINBA, or neither. Statistic is two-tailed, unpaired Wilcoxon test. f, Correlation of promoter ATAC peak signal and gene expression for 5508 genes with a detected promoter ATAC peak in both peak sets. ATAC signal is quantified by reads in peak (log10 scale; linear values displayed on axis for clarity), and the strongest value was selected to represent promoters with multiple peaks. Correlation statistics displayed are Pearson and Spearman. Overlaid linear fit is displayed in red and loess in blue. Fisher Z-transformation was used to compare correlation coefficients between the two peak sets. g, Example ATAC-seq signal tracks showing peaks called (black bars) at different loci between the two peak sets. All three replicates are overlaid, with darker colors representing overlapping replicates. Y-axis is log likelihood ratio of peak signal.
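The Fisher Z-transformation comparison of correlation coefficients mentioned in the Fig. 3.5f legend can be illustrated with a small Python function. The sketch assumes the standard two-sample form of the test. Each coefficient is transformed with artanh, and the difference is divided by sqrt(1/(n1 - 3) + 1/(n2 - 3)).

That form treats the two correlations as independent. It is therefore only an approximation when, as here, both coefficients are computed on the same 5508 genes. The function name `compare_correlations` and the example coefficients are hypothetical.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-tailed test of H0: rho1 == rho2 using Fisher's Z-transformation.

    Assumption: the two correlations come from independent samples.
    """
    z1, z2 = atanh(r1), atanh(r2)                # variance-stabilizing transform
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2.0))                 # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical coefficients for two peak sets scored on the same 5508 genes
z, p = compare_correlations(0.45, 5508, 0.43, 5508)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```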
3.3.6 csaw differential accessibility workflow permits testing of multiple normalization methods
We suggest using csaw as a go-to toolkit for standard downstream differential accessibility analysis. csaw is a flexible R package, originally designed for ChIP-seq analysis. It accepts sorted BAM files for DA quantification via edgeR quasi-likelihood methodology, following any one of numerous implemented normalization methods suited to many biological scenarios (Lun and Smyth 2016; Robinson, McCarthy, and Smyth 2010). Furthermore, csaw can be supplied with MACS2 peak coordinates for DA analysis, or it can perform de novo detection of ATAC enrichment with sliding windows and proper type I error control. These features make csaw an attractive tool for comprehensive DA analysis. We have graphically represented a typical csaw DA workflow in R (Fig. 3.6). This workflow continues from Fig. 3.4 into complete DA analysis and helps users compute DA approaches III, IV, V, and VI for output evaluation. The sensitivity of the proposed workflows in distinguishing signal from noise is further evident in DA analysis of two independent groups of negative controls from the Schep et al. yeast ATAC-seq data set (Fig. B.2). The final steps of this graphic detail generation of an MA plot through ggplot2 (Wickham 2016) to assess normalization outcomes. The distribution of an MA plot from DA analysis provides insight into trends or potential biases within the data, which users should critically consider. An upward or downward shift in the MA distribution could indicate either a global effect or substantial technical bias, and certain normalization methods will maintain or eliminate these features in the data. If the distribution also does not appear symmetrical about the horizontal axis, then a trended bias may be present. Such a bias can be corrected with conservative normalization approaches like quantile or loess.
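To make the MA-plot reasoning concrete, the following Python sketch does three things:

- computes M (log2 ratio) and A (mean log2 abundance) for two libraries;
- fits the abundance-dependent trend with a simple local linear loess (tricube weights, no robustness iterations);
- subtracts that trend from M.

It illustrates the idea behind loess normalization and is not csaw's implementation. The function names, pseudocount, span, and simulated counts are assumptions. As the text cautions, subtracting the fitted trend also re-centers M on zero, so a true global shift would be removed along with any technical bias.

```python
import numpy as np


def ma_values(counts_a, counts_b, pseudocount=0.5):
    """M = log2 ratio and A = mean log2 abundance for paired region counts."""
    la = np.log2(np.asarray(counts_a, dtype=float) + pseudocount)
    lb = np.log2(np.asarray(counts_b, dtype=float) + pseudocount)
    return la - lb, (la + lb) / 2.0


def loess_fit(x, y, span=0.3):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(span * n)))   # neighbours contributing to each local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.partition(d, k - 1)[k - 1], 1e-12)     # distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3   # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])  # local line centered on x[i]
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]                               # intercept = fitted value at x[i]
    return fitted


def loess_normalize_m(m, a, span=0.3):
    """Remove the abundance-dependent (trended) component from M values."""
    return np.asarray(m, dtype=float) - loess_fit(a, m, span)


# Simulated (hypothetical) data: library b is relatively depleted at high abundance.
rng = np.random.default_rng(0)
base = 2.0 ** rng.uniform(4, 12, size=400)   # expected counts from 16 to 4096
trend = 0.25 * (np.log2(base) - 8.0)         # log2 bias that grows with abundance
counts_a = rng.poisson(base)
counts_b = rng.poisson(base * 2.0 ** (-trend))
m, a = ma_values(counts_a, counts_b)
m_corrected = loess_normalize_m(m, a)
```

In this simulation, M is strongly correlated with A before correction. After correction, the correlation is close to zero and the M values center on zero, which is the symmetric MA cloud that loess normalization produces.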
Ultimately, the researcher must consider prior biological knowledge and the experimental design to determine whether data trends should be accepted as biological or eliminated as technical.
Figure 3.6 csaw workflow for multiple differential accessibility analyses in R
Graphical representation of the proposed csaw workflow in R for calculating differential accessibility. Consider an experimental design with n = 2 biological replicates from two conditions: “treat” and “control”. A machine-readable text version is available in Appendix B.
3.4 Discussion
This study has revealed that differential accessibility analysis of ATAC-seq data can be sensitive to underlying biases within the data, as might be expected. The design of the primary analyzed data set involved disrupting a chromatin remodeler subunit, which probably affects genome-scale chromatin structure, and comparing chromatin accessibility to that of control cells. Analytical interpretation is further confounded by in vivo heterogeneity in sorted cell populations. As a consequence, common tools and approaches for DA analysis give vastly different results that are difficult to interpret at first glance. MA plots of DA results displayed global ATAC distribution biases that were thoroughly eliminated only by certain normalization methods, such as loess-based count adjustments. By comparing multiple DA analysis outputs, common patterns emerged that permitted high-confidence conclusions, such as gene promoters increasing in accessibility following chromatin remodeler subunit disruption. Precision-recall analysis using DA data to predict RNA-seq gene expression changes supported the biological relevance of certain DA methods, since gene expression and promoter chromatin accessibility are known to correlate. This further emphasized that pairing ATAC-seq with RNA-seq can be a useful approach to interpreting chromatin accessibility observations.
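The precision-recall evaluation discussed here, whose procedure is given in section 3.5.4, can be sketched in plain Python. Each expressed gene takes its lowest promoter DA FDR. Genes are then ranked by that value, and precision and recall are tracked against Boolean DE status.

This is a hypothetical re-implementation for illustration only, with three caveats:

- it summarizes the curve by step-wise average precision, which is not identical to the interpolated area reported by PRROC's pr.curve();
- it breaks ties by input order;
- assigning FDR = 1 to genes without a promoter peak is an assumption.

```python
def min_promoter_fdr(promoter_hits, expressed_genes):
    """Lowest promoter DA FDR per expressed gene.

    Assumption: genes without any promoter peak get FDR = 1.0.
    """
    best = {gene: 1.0 for gene in expressed_genes}
    for gene, fdr in promoter_hits:
        if gene in best and fdr < best[gene]:
            best[gene] = fdr
    return best


def precision_recall_points(fdr_by_gene, de_status):
    """(recall, precision) after each gene, ranking genes by ascending FDR."""
    ranked = sorted(fdr_by_gene, key=fdr_by_gene.get)  # stable sort: ties keep input order
    n_pos = sum(bool(de_status[g]) for g in ranked)
    tp = fp = 0
    points = []
    for gene in ranked:
        if de_status[gene]:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: precision times each recall increment."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical toy data: (gene, promoter DA FDR) pairs and DE calls from RNA-seq
hits = [("g1", 0.001), ("g1", 0.40), ("g2", 0.02), ("g3", 0.60), ("g4", 0.03)]
expressed = ["g1", "g2", "g3", "g4", "g5"]
de = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}
fdrs = min_promoter_fdr(hits, expressed)
ap = average_precision(precision_recall_points(fdrs, de))
```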
FDR thresholding analyses suggest that optimizing the DA significance threshold can also improve results and interpretation. Overall, these results indicate that naively relying on a single DA analysis approach may lead to false conclusions, particularly without assessing the data for biases. When chromatin regulators are modified or disrupted, these biases may be commonplace and should be critically considered before further interpretation of the data. Furthermore, certain methods are more conservative than others and can be selected initially to improve confidence in results without the need to perform rigorous method comparisons.
Intrinsic biases within ATAC-seq data are difficult to interpret in experimental designs where global changes may be expected. For the experimental data presented here, limited prior knowledge is available about whether disruption of this specific chromatin remodeler subunit should affect widespread chromatin structure, although it is not improbable. Thus, we were not able to determine whether the inherent biases within the data are biological or technical, and we opted to remove them as technical. However, this decision could introduce type II errors if the trends we removed as technical in fact reflect the underlying biology. Multiple downstream comparative analyses supported the loess normalization method as conservative and biologically relevant to our data set, so it was chosen as the normalization method for our published downstream analysis in Chapter 2 (Wilson et al. 2019). Nevertheless, significant information may have been lost by eliminating the global, trended biases observed within the data. Schep et al. in fact noted, in their yeast osmotic stress differential ATAC-seq report, that comparing ATAC-seq samples between two conditions may be subject to substantial experimental biases (Schep et al. 2015).
The authors specifically noted that ‘variation in the degree of enrichment of fragments with open chromatin regions can affect differential accessibility measurements between ATAC-seq samples’. As such, they followed count quantile normalization with a lowess curve fit transformation to eliminate trended biases within the data. This account further supports two points. First, differential ATAC-seq analyses are sensitive to experimental and technical biases, such as ATAC reaction efficiency. Second, it supports the rationale for using loess normalization to obtain a highly conservative DA result. Still, this normalization method is not widely used for ATAC-seq DA analysis. Similar observations have been reported for ChIP-seq analysis, where non-linear loess normalization methods were proposed and developed to eliminate systematic errors between libraries (Taslim et al. 2009). However, a loess fit assumes that the data should be symmetrical in the absence of a global change, so users should be aware that this technique may hide any true global alterations present between the two conditions. Biases inherent to quantitative genomic techniques based on chromatin feature signal enrichment have been observed and considered previously (Meyer and Liu 2014). ATAC-seq fundamentally relies on an enzymatic reaction for library construction, which is likely to be affected by the amount of enzyme, the number of nuclei, and chromatin compaction and structure. ATAC-seq was recently reported to exhibit a sequence-specific bias distinct from that of DNase I libraries (Karabacak Calviello et al. 2019). MNase digestion has previously been shown to be highly sensitive to enzymatic activity and also displays sequence specificity bias (Mieczkowski et al. 2016; Dingwall, Lomonossoff, and Laskey 1981).
In ChIP-seq libraries, potential bias in factor binding measurements is thought to derive from local transcriptional activity and chromatin structural properties (Park et al. 2013). Our current investigation has shown that quantitative comparative analysis of ATAC libraries is confounded by technical bias. When alternative methods to detect chromatin accessibility changes are unavailable (e.g., due to low cell numbers or limited input retrieval), users should empirically determine the most appropriate normalization methods. They should also employ orthogonal assays, such as gene expression, for comparison with ATAC-seq data. Calculating linear library normalization factors is a standard approach, but it is sensitive to biases. Here, we have shown that MA plots are a simple, qualitative approach to identifying systematic biases in experimental ATAC-seq libraries, as others have shown with ChIP-seq data (Shao et al. 2012). The fraction of reads in peaks (FRiP) score, as described by ENCODE (Landt et al. 2012) and streamlined by DiffBind, offers a simple way to compare ATAC efficiency between libraries and determine whether a systematic bias may be present. When ATAC efficiencies differ substantially, linear normalization factors can be derived from reads in peaks only, as in DiffBind. Alternatively, TMM can be applied to high-abundance regions only, as suggested by the authors of csaw. As we have discussed, csaw also offers a non-linear loess-based count normalization. After the above considerations, it can easily be applied to assess its effects on DA calculation. Before DA analysis, the most significant addition in our proposed standard ATAC-seq data analysis workflow is normalization of library complexity by random subsampling. If ATAC reaction efficiencies differ between libraries, we advise investigating library molecular complexity as a technical source of this error arising during sequencing.
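The FRiP check described above can be sketched in Python, together with reads-in-peaks scaling factors of the kind DiffBind derives. The sketch is hypothetical, not the ENCODE or DiffBind code, and rests on these assumptions:

- reads have already been reduced to (chromosome, position) cut sites;
- peaks are sorted, non-overlapping, half-open intervals per chromosome;
- each library is scaled to the mean reads-in-peaks count, which is only one simple convention.

```python
from bisect import bisect_right


def count_in_peaks(sites, peaks):
    """Count cut sites falling inside half-open peak intervals [start, end).

    Assumption: peaks[chrom] is a sorted list of non-overlapping intervals.
    """
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in peaks.items()}
    in_peaks = 0
    for chrom, pos in sites:
        ivs = peaks.get(chrom)
        if not ivs:
            continue
        i = bisect_right(starts[chrom], pos) - 1   # last peak starting at or before pos
        if i >= 0 and pos < ivs[i][1]:
            in_peaks += 1
    return in_peaks


def frip(sites, peaks):
    """Fraction of reads (here, cut sites) in peaks."""
    return count_in_peaks(sites, peaks) / len(sites) if sites else 0.0


def reads_in_peaks_factors(libraries, peaks):
    """Multiplicative factors that equalize reads-in-peaks counts across libraries."""
    rip = {name: count_in_peaks(sites, peaks) for name, sites in libraries.items()}
    mean_rip = sum(rip.values()) / len(rip)
    return {name: mean_rip / count for name, count in rip.items()}


# Hypothetical toy data: two peaks on chr1 and two small libraries
peaks = {"chr1": [(100, 200), (500, 650)]}
libs = {
    "treat1": [("chr1", 150), ("chr1", 520), ("chr1", 900), ("chr2", 50)],
    "ctrl1": [("chr1", 110), ("chr1", 199), ("chr1", 600), ("chr1", 640)],
}
scores = {name: frip(sites, peaks) for name, sites in libs.items()}
factors = reads_in_peaks_factors(libs, peaks)
```

Here the lower FRiP of "treat1" would flag a possible efficiency difference. The factors rescale both libraries to the same reads-in-peaks total.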
Suppose less input material is retrieved from transposition for certain samples, and more PCR amplification cycles are required as a result (Buenrostro et al. 2015). Bias dependent on GC content, fragment length, and oligonucleotide complexity is then introduced into the amplified fragments (Aird et al. 2011). If libraries within an experimental design are complexity-normalized to the same estimated number of unique molecules, then direct quantitative comparison of unique fragments between conditions is more informative, e.g., integer read counts at features identified in other assays. Like the correction of systematic biases in DA analysis, however, library complexity normalization is currently a flawed concept. If a drastic global decrease in chromatin accessibility is truly biological, then less transposed DNA fragment retrieval is expected, and these libraries might exhibit lower complexity. In this scenario, complexity normalization may not be desired, as it would confound the true chromatin biology. However, without independent knowledge, this decision is not easily made. Notably, others have approached similar problems in ChIP-seq by adding exogenous chromatin from a distinct species as a reference for IP efficiency and sequencing bias, referred to as “spike-in” controls (Orlando et al. 2014; Chen et al. 2015). More recently, this technique has been extended to ATAC-seq by incorporating a fraction of spike-in live cells prior to lysis (Stewart-Morgan, Reveron-Gomez, and Groth 2019). In principle, spike-in normalization of ATAC-seq data could help establish whether global accessibility patterns have a technical or biological basis. The presented ATAC-seq workflow and the suggested DA toolkit are not absolute and should be improved as analytical methods continue to emerge.
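The complexity-normalization step discussed above, in which libraries are randomly subsampled to a common number of unique molecules, can be sketched as follows. This Python illustration is a simplified stand-in for the workflow in Fig. 3.4, not a reproduction of it. It treats each deduplicated paired-end fragment (chromosome, start, end) as one unique molecule and uses the smallest observed unique-fragment count as the target. The actual workflow may instead rely on an estimated library complexity. The function name and the fixed seed are hypothetical.

```python
import random


def complexity_normalize(libraries, seed=1):
    """Deduplicate fragments per library, then randomly subsample every library
    without replacement to the smallest unique-fragment count.

    Assumption: observed unique fragments stand in for estimated library complexity.
    """
    unique = {name: sorted(set(frags)) for name, frags in libraries.items()}
    target = min(len(frags) for frags in unique.values())
    rng = random.Random(seed)   # fixed seed so the subsample is reproducible
    subsampled = {name: sorted(rng.sample(frags, target)) for name, frags in unique.items()}
    return subsampled, target


# Hypothetical fragments (chrom, start, end); repeated entries mimic PCR duplicates
libs = {
    "treat1": [("chr1", 10, 210), ("chr1", 10, 210), ("chr1", 400, 560), ("chr2", 5, 180)],
    "ctrl1": [("chr1", 12, 200), ("chr1", 300, 480), ("chr1", 700, 910),
              ("chr2", 40, 260), ("chr2", 40, 260)],
}
normalized, target = complexity_normalize(libs)
```

Sampling without replacement from deduplicated fragments keeps every retained molecule unique. After this step, counts at any locus are directly comparable between libraries of equal complexity.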
For example, one newly developed method implementing a hidden Markov model (HMM) showed better performance for differential ChIP-seq analysis than csaw or other sliding window approaches, which struggle to identify narrow changes within large genomic domains (Allhoff et al. 2016). At least one HMM tool has been developed specifically for calling nucleosome-free regions within ATAC-seq data, and its implementation could be extended to differential analysis (Tarbell and Liu 2019). While our framework was able to identify both broad and narrow regions of strong ATAC signal in the Buenrostro et al. GM12878 data set, the peak calling thresholds may be too strict to identify truly nucleosome-depleted regions displaying weak signal. Methods also currently exist for correcting sequence-specific biases resulting from various chromatin digestion and enrichment techniques, and the analytical effect of this correction should be evaluated (Wang, Quach, and Furey 2017; Martins et al. 2018). For now, ATAC-seq normalization and DA approaches should be carefully considered to appropriately reduce the inherent biases within each analysis. In summary, we present data indicating that ATAC-seq is sensitive to bias when comparing chromatin accessibility across multiple conditions. We applied several commonly used, published methods for calculating differential accessibility to the Chapter 2 in vivo ATAC-seq data set and to a published yeast ATAC-seq time series data set. The results conflicted depending on the normalization method used. We provide intuitive, standardized bioinformatics methodology for analyzing ATAC-seq data that is accessible to non-computational scientists. Our validated workflow also includes a critical complexity-normalization step. Altogether, we argue that researchers should properly normalize ATAC-seq data before calculating differential accessibility.
3.5 Methods
3.5.1 ATAC-seq and differential accessibility analysis
See Figs.
3.4 and 3.6, and Appendix B for complete workflow details and description. Mouse libraries were aligned to the mm10 genome assembly, and yeast libraries were aligned to the sacCer3 genome assembly. ATAC-seq peaks were not filtered for blacklisted regions in yeast, as blacklists are not defined for this organism. Presented DA analyses were computed using the R packages DiffBind, DESeq2, csaw, edgeR, and limma (including voom) as described in section 3.2 (Stark and Brown 2011; Love, Huber, and Anders 2014; Lun and Smyth 2016; Robinson, McCarthy, and Smyth 2010; Law et al. 2014; Ritchie et al. 2015). Workflows for all tools are described in detail in Appendix B. The BAM files supplied to DA tools were generated as described in Fig. 3.4. They were paired-end, coordinate-sorted and indexed, duplicate-removed, complexity-normalized, restricted to properly paired reads, and free of mitochondrial reads.
3.5.2 RNA-seq analysis
Differential gene expression results from the Chapter 2 RNA-seq data of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl vs. control sorted mouse endometrial epithelial cells were re-analyzed in this study. A total of 3481 significant DE genes were selected by FDR < 0.05 filtering, and 24,097 total expressed genes were used as the gene universe for enrichment analyses.
3.5.3 GM12878 gene expression microarray analysis
Raw data were downloaded from GEO for both GM12878 replicates generated on the Affymetrix HT Human Genome U133A Array. CEL files were read into R through the affy package and normalized via the Robust Multi-Array Average (RMA) expression measure. RMA values are reported on a log2 scale. The mean RMA value of both replicates was used for analyses in this manuscript. The 22,277 measured probes were collapsed to 11,369 genes with a unique Ensembl and symbol identifier (Zerbino et al. 2018). Expression of gene groups binned by ATAC peak status was compared by two-tailed, unpaired Wilcoxon test.
3.5.4 Bioinformatics and statistics
Mouse and human ATAC-seq peak coordinates were annotated by HOMER (Heinz et al. 2010), with cis-promoters redefined as regions within 3000 bp of a canonical gene TSS. Yeast genomic regions were annotated with the TxDb.Scerevisiae.UCSC.sacCer3.sgdGene R package using the genes() and promoters() functions with default settings (yeast promoters are defined as -2000 to +200 bp around the TSS) (Carlson 2015). Unweighted precision-recall curves were generated by the PRROC R package using the pr.curve() function (Grau, Grosse, and Keilwagen 2015). For PR predictive analysis, the strongest (lowest) promoter DA FDR value was selected for each expressed gene for each approach. This value was used to predict Boolean DE gene status, defined by an RNA-seq DESeq2 FDR < 0.05 threshold. Read counts in ATAC-seq peaks were calculated by HOMER for correlation and box-dot plot quantification. MACS2 was used to generate ATAC-seq signal tracks for display in IGV (Zhang et al. 2008; Robinson et al. 2011). MSigDB Hallmark pathway enrichment was reported as observed/expected ratios derived from expressed gene sets compared to the respective expressed gene universe (Liberzon et al. 2015). Pathway hierarchical clustering by Euclidean distance and heatmaps were generated with ComplexHeatmap (Gu, Eils, and Schlesner 2016). biomaRt was used for all gene nomenclature and mouse-human ortholog conversions (Durinck et al. 2005; Durinck et al. 2009; Smedley et al. 2009). The cumulative hypergeometric distribution was calculated in R for enrichment tests. ggplot2 was used for certain plotting applications throughout this manuscript (Wickham 2016).
3.6 Data availability
All analyzed data sets are publicly available at GEO accessions GSE121198, GSE66386, GSE47753, and GSE26312. Yeast genes with distinct expression response patterns following 0.6 M NaCl exposure were defined and extracted from Supp. Table 4 reported by Ni et al. (Ni et al. 2009).
All workflow and custom function scripts are available in Appendix B and a public GitHub repository (https://github.com/reskejak/ATAC-seq). A detailed description of the workflow commands in Figs. 3.4 and 3.6, as well as of all DA analysis methods, is available in Appendix B.
3.7 Acknowledgments
We thank Drs. Ben Johnson and Tim Triche, Jr. for helpful discussions.
CHAPTER 4 CO-EXISTING TP53 AND ARID1A MUTATIONS PROMOTE AGGRESSIVE ENDOMETRIAL TUMORIGENESIS
This chapter is not published at the time of dissertation submission. The following study is in collaboration with Mike R. Wilson, Jeanne Holladay, Hilary Skaslki, Shannon Harkins, John I. Risinger, and Ronald L. Chandler of the Department of Obstetrics, Gynecology and Reproductive Biology at Michigan State University College of Human Medicine, Marie Adams of the Van Andel Institute Genomics Core, Galen Hostetter of the Van Andel Institute Pathology and Biorepository Core, and Ken Lin of the Department of Obstetrics & Gynecology and Women’s Health at the Albert Einstein College of Medicine.
4.1 Abstract
TP53 and ARID1A are frequently mutated across cancer types but rarely in the same primary tumor. Endometrial cancer has the highest TP53-ARID1A mutual exclusivity rate. However, the functional relationship between TP53 and ARID1A mutations in the endometrium has not been elucidated. We used genetically engineered mice and in vivo genomic approaches to discern both unique and overlapping roles of TP53 and ARID1A in the endometrium. TP53 loss with oncogenic PIK3CAH1047R in the endometrial epithelium results in features of endometrial hyperplasia, adenocarcinoma, and intraepithelial carcinoma. Mutant endometrial epithelial cells were transcriptome profiled and compared to control cells and ARID1A/PIK3CA mutant endometrium.
In the context of either TP53 or ARID1A loss, PIK3CA mutant endometrium exhibited inflammatory pathway activation, but other gene expression programs, such as epithelial-to-mesenchymal transition, differed based on TP53 or ARID1A status. Gene expression patterns observed in the genetic mouse models reflect those of human tumors with each respective genetic alteration. Consistent with TP53-ARID1A mutual exclusivity, the p53 pathway is activated following ARID1A loss in the endometrial epithelium. There, ARID1A normally directly represses p53 pathway genes in vivo, including the stress-inducible transcription factor ATF3. However, co-existing TP53-ARID1A mutations led to invasive adenocarcinoma. This was associated with mutant ARID1A-driven ATF3 induction, reduced apoptosis, TP63+ squamous differentiation, and invasion. These data suggest that TP53 and ARID1A mutations drive shared and distinct tumorigenic programs in the endometrium and promote invasive endometrial cancer when present simultaneously. Hence, TP53 and ARID1A mutations may co-occur in a subset of aggressive or metastatic endometrial cancers, with ARID1A loss promoting squamous differentiation and the acquisition of invasive properties.
4.2 Introduction
In Chapter 2, we established the tumor suppressor roles of ARID1A in the endometrial epithelium and characterized ARID1A as a mediator of epithelial identity and invasion. As reviewed, TP53 mutations are also frequent among gynecologic cancers (Berchuck et al. 1994; Le Gallo et al. 2012; Le Gallo et al. 2017; Berger et al. 2018). Both ARID1A and TP53 are commonly mutated in ovarian and uterine cancers, although there are mutation-defining subtypes within each cancer type (Tashiro et al. 1997; Okuda et al. 2003; Wiegand et al. 2010; Cancer Genome Atlas Research 2011; Cancer Genome Atlas Research Network et al. 2013; Cherniack et al. 2017). In fact, TP53 (p53) and ARID1A are among the most frequently mutated tumor suppressor genes across cancer (Lawrence et al.
2014). The historic tumor suppressor roles of TP53 have been well characterized over many thousands of reports since its discovery, when it was found interacting with the transforming agent SV40 large T antigen (Lane and Crawford 1979; Kastenhuber and Lowe 2017). Meanwhile, the functions of ARID1A in cellular homeostasis and carcinogenesis have been described only recently, since exome studies revealed widespread mutation prevalence in disease (Wu and Roberts 2013; Kadoch and Crabtree 2015). Both proteins serve roles in transcriptional regulation: ARID1A is a SWI/SNF chromatin remodeling complex subunit, as has been reviewed, while TP53 is a transcription factor (Sullivan et al. 2018). Evidence suggests that TP53 and ARID1A also have other important nuclear functions, including DNA repair and cell cycle regulation (Shaw 1996; Nagl et al. 2005; Shen et al. 2015; Williams and Schumacher 2016). Despite the high mutation rates for each gene, an early mechanistic study reported two findings (Guan, Wang, and Shih Ie 2011). It showed biochemical and functional evidence linking ARID1A and TP53 regulation. It also found mutant ARID1A-TP53 mutual exclusivity in a cohort of 77 ovarian clear cell and uterine endometrioid carcinomas, where all ARID1A mutant tumors were TP53 wild-type, and vice versa. Since then, numerous reports have observed that ARID1A and TP53 alterations co-occur less frequently than expected by chance in other human cancer types, including gastric, breast, and esophageal cancers (Wang et al. 2011; Zang et al. 2012; Streppel et al. 2014; Cho et al. 2015). Among gynecologic cancers, loss of ARID1A expression by immunohistochemical staining was significantly associated with wild-type TP53 expression in high-grade endometrial tumors (Allo et al. 2014). Within the endometrioid subtype of endometrial cancer, one study observed that tumors marked by high TP53 expression, indicative of TP53 mutation, almost never displayed low/absent ARID1A expression (Bosse et al. 2013).
In this study, we aimed to contrast the effects of endometrial TP53 and ARID1A mutations and to test whether co-existing TP53-ARID1A mutations were viable using genetically engineered mouse models.
4.3 Results
4.3.1 TP53 and ARID1A mutations rarely co-occur in endometrial cancer
We quantified the co-mutation rates for TP53 and ARID1A across 10,144 primary tumor samples from 33 cancer types profiled by TCGA, using the standardized MC3 mutation data set (Ellrott et al. 2018) (Fig. 4.1a). Five of 33 cancer types displayed significant mutual exclusivity (two-tailed Fisher’s exact test, p < 0.05), indicating that these mutations co-occur less frequently than expected by chance. These were uterine corpus endometrial carcinoma (UCEC), stomach adenocarcinoma (STAD), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), and ovarian serous cystadenocarcinoma (OV). One tumor type, cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), showed the opposite: mutations in these two genes occurred more frequently than expected. Uterine corpus endometrial carcinoma (UCEC) (Cancer Genome Atlas Research Network et al. 2013) displayed the highest rate of TP53-ARID1A mutual exclusivity of all profiled cancer types (two-tailed Fisher’s exact test, OR = 0.155, p < 10⁻²⁰). We further investigated TP53 and ARID1A genetic alteration status in UCEC via the TCGA Pan-Cancer Atlas data set (Hoadley et al. 2018), which includes 509 primary tumor samples with both mutation and copy number alteration (CNA) data (Fig. 4.1b). Tumors were considered “altered” for a gene if they displayed either a mutation or a CNA event at that locus. In the combined data set of all disease subtypes, TP53 and ARID1A were co-altered in 7.5% of tumor samples, while the independent alteration frequency of each gene was above 30% (Fig. 4.1b).
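The mutual exclusivity tests above rest on a two-tailed Fisher’s exact test of a 2 × 2 table of TP53 and ARID1A alteration status. A minimal pure-Python sketch is shown below. It assumes the common two-tailed convention: the p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one.

The toy counts are hypothetical and are not TCGA data. The odds ratio returned here is the sample (cross-product) odds ratio. R’s fisher.test() instead reports a conditional maximum likelihood estimate, so the two values can differ slightly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-tailed Fisher's exact test for the contingency table

                ARID1A+  ARID1A-
        TP53+      a        b
        TP53-      c        d

    Returns (sample odds ratio, p-value); an odds ratio < 1 suggests
    mutual exclusivity, > 1 suggests co-occurrence.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Assumption: a small relative tolerance treats floating-point near-ties as ties
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p, 1.0)


# Hypothetical toy table: one co-altered tumor versus many single alterations
odds_ratio, p = fisher_exact_2x2(1, 5, 8, 2)
print(f"OR = {odds_ratio:.3f}, two-tailed p = {p:.4f}")
```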
The UCEC data set comprises three distinct histological disease subtypes (endometrioid, serous, and mixed type morphology). These subtypes have different incidence rates and are strongly associated with TP53 and ARID1A status (Le Gallo et al. 2012; Cancer Genome Atlas Research Network et al. 2013; Morice et al. 2016; Urick and Bell 2019). The original TCGA-UCEC report also classified four molecular subtypes based on integrative multi-omic analyses: POLE ultra-mutated, copy-number alteration high (CN high), copy-number alteration low (CN low), and microsatellite instable (MSI) (Cancer Genome Atlas Research Network et al. 2013). Therefore, we sought to determine whether TP53-ARID1A genetic mutual exclusivity could be attributed to sampling error by investigating alterations within each histological and molecular subtype independently. In the two predominant endometrial cancer histological subtypes, endometrioid and serous, TP53 and ARID1A alterations co-occurred less frequently than expected by chance in primary tumors (two-tailed Fisher’s exact test, Fig. 4.1c). Across the molecular subtypes, only CN low tumors displayed significant ARID1A-TP53 mutual exclusivity (OR = 0.12, Fig. 4.1d). The other molecular subtypes are characterized by heightened genomic instability and are therefore more likely to harbor passenger mutations. Supporting this, POLE ultra-mutated primary tumors are more associated with TP53/ARID1A co-alterations than other subtypes (OR = 4.6, two-tailed Fisher’s exact test, Fig. 4.1e). Overall, these analyses suggest that mutually exclusive TP53 and ARID1A alterations occur in primary uterine endometrial tumors of both endometrioid and serous subtypes and are particularly notable in CN low tumors.
Figure 4.1 TP53 and ARID1A mutations rarely co-occur in endometrial cancer
a, Pan-cancer analysis of TP53 and ARID1A mutation rates across 33 TCGA tumor types, considering only somatic single nucleotide variants.
For the heatmap, darker color indicates a greater proportion of sequenced tumor samples. Odds ratios (OR) and statistics for each tumor type accompany two-tailed Fisher’s exact tests performed on TP53 and ARID1A mutation contingency tables. Asterisks indicate significant associations between TP53 and ARID1A mutations, either co-occurring or mutually exclusive: * p < 0.05; ** p < 0.01; *** p < 0.001. b, Details within the Uterine Corpus Endometrial Carcinoma (UCEC) cohort (n = 509), further inclusive of copy number alteration (CNA) events. TP53 and ARID1A mutation classes (left) and distribution of histological subtypes (right) across the UCEC cohort. c, Left, distribution of UCEC histological subtypes: endometrioid, serous, and mixed. Right, TP53 and ARID1A alteration rates and association of co-occurrence for primary tumors within each histological subtype. d, As in c but for UCEC molecular subtypes: POLE mutant, copy-number alteration high (CN high), copy-number alteration low (CN low), and microsatellite instable (MSI). e, Association between POLE mutant molecular subtype tumors and TP53/ARID1A co-alterations. Statistic is two-tailed Fisher’s exact test.
4.3.2 TP53 loss in the presence of PIK3CAH1047R drives hyperplasia and endometrial intraepithelial carcinoma
In Chapter 2, we reported that ARID1A loss paired with PI3K activation through constitutive expression of oncogenic PIK3CAH1047R drives endometrial hyperplasia and myometrial invasion in mice (Wilson et al. 2019). In that study, we also showed that ARID1A and PIK3CA mutations frequently co-occur in UCEC tumor samples. Upon further examination of TCGA-UCEC data, we found that roughly half of TP53 mutant tumors also harbor PIK3CA mutations (Fig. C.1a). Since TP53 and PIK3CA mutations are frequently observed together, we tested whether TP53 mutations could also promote endometrial tumorigenesis in the presence of PIK3CAH1047R in mice.
TCGA-UCEC data indicate that roughly 19% of TP53 mutations in uterine serous carcinoma are predicted to truncate the TP53 protein directly through frameshift or splice site alteration, and missense vs. truncating TP53 mutations are not associated with differences in survival or tumor grade (Fig. C.1b-e). We therefore modeled the effects of TP53 loss in combination with PIK3CA mutation in the endometrium by crossing the Trp53fl allele (Marino et al. 2000) with (Gt)R26Pik3ca*H1047R (Adams et al. 2011) in LtfCre+ mice (Fig. 4.2a). The LtfCre allele drives tissue-specific Cre recombinase expression in the endometrial epithelium at the onset of puberty (Daikoku et al. 2014). Vaginal bleeding, indicating endometrial dysfunction, was observed with biallelic loss of TP53 in the presence of PIK3CAH1047R at a median of 76 days (Fig. 4.2b). Compared to control mice (Fig. 4.2c and C.2a), histological analysis of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Trp53fl/fl mice (henceforth referred to as TP53/PIK3CA mutant mice) revealed features of hyperplasia, adenocarcinoma, and endometrial intraepithelial carcinoma (EIC) within luminal and glandular areas (Fig. 4.2d and C.2b). Mutant endometrial epithelial cells expressed KRT8, a marker of endometrial epithelium, and phospho-S6, a marker of PI3K pathway activity (Fig. C.2c). As reviewed, EIC is typically considered a precursor lesion to uterine serous carcinoma, a subtype of endometrial cancer dominated by TP53 mutations (see Fig. 4.1c) (Hendrickson et al. 1982; Murali, Soslow, and Weigelt 2014). EIC is marked by high-grade cytology and often presents as non-invasive with hobnail and papillary morphologies (Ambros et al. 1995; Soslow, Pirog, and Isacson 2000; Pathiraja, Dhar, and Haldar 2013), although nuclear atypia was infrequently observed in TP53/PIK3CA mutant mice.
This non-invasive, EIC-like morphology contrasts with the ARID1A loss-driven invasive hyperplasia observed in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (henceforth referred to as ARID1A/PIK3CA mutant mice) as described in Chapter 2 (Wilson et al. 2019). Stromal or myometrial invasion was not observed in the uterus of TP53/PIK3CA mutant mice, whereas collective invasion is a critical pathological feature of ARID1A/PIK3CA mutant endometrial epithelia (Wilson et al. 2019), which further distinguishes the two genetic mouse models. In the context of mutant PIK3CA, these results indicate that TP53 mutation in the endometrial epithelium promotes a phenotype distinct from that of ARID1A mutation, suggesting distinct tumor suppressive mechanisms.
Figure 4.2 TP53 loss with oncogenic PIK3CA activation results in endometrial intraepithelial carcinoma
a, Diagram of mouse alleles used in this study. b, Survival data of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Trp53fl/fl (TP53/PIK3CA mutant) mice. Survival is measured as days to vaginal bleeding, requiring euthanasia. c, Representative H&E histology of control mouse endometrium from Cre-negative littermates. d, Representative H&E histology of endometrial intraepithelial carcinomas and hyperplastic epithelia in TP53/PIK3CA mutant uterus. Arrowheads denote dysplastic endometrial epithelia.
4.3.3 Endometrial phenotypes driven by TP53 or ARID1A loss display overlapping and distinct gene expression signatures
To dissect tumorigenic mechanisms resulting from ARID1A or TP53 mutations in the context of mutant PIK3CA, we isolated endometrial epithelial cells from TP53/PIK3CA mutant mice at onset of vaginal bleeding (n = 3), using our sorting method described in Chapter 2 (Wilson, Reske et al. 2019), and performed RNA-seq. TP53/PIK3CA mutant endometrial epithelial cells were isolated with an average purity of 84.6% (Fig. C.3).
Transcriptome analysis was performed by comparing these cells to RNA-seq data from control endometrial epithelial cells and hyperplastic ARID1A/PIK3CA mutant cells generated in Chapter 2 (Wilson, Reske et al. 2019). Unsupervised hierarchical clustering grouped the samples into distinct clades by genotype (Fig. 4.3a). Trp53 expression was significantly decreased in TP53/PIK3CA mutant cells compared to controls (FDR < 10^-37) (Fig. 4.3b). Interestingly, Trp53 expression was significantly upregulated in ARID1A/PIK3CA mutant cells (FDR = 0.0025) (Fig. 4.3b). Directly comparing TP53/PIK3CA mutant to ARID1A/PIK3CA mutant cells yielded 1799 significant differentially expressed (DE) genes at an FDR < 0.05 significance threshold (Fig. 4.3c). Compared to control cells, 1514 genes were significantly affected in TP53/PIK3CA mutant cells and 3455 genes in ARID1A/PIK3CA mutant cells (Fig. 4.3d). Of these, 470 DE genes overlapped between the two genetic mouse models (hypergeometric enrichment, p < 10^-55) (Fig. 4.3d). Among these overlapping DE genes, 92.3% were affected in the same direction in both models, which could be attributed to PIK3CAH1047R tumorigenic mechanisms shared by both (Fig. 4.3e). Next, we asked which biological processes and pathways were affected in each genetic model. We performed Broad Gene Set Enrichment Analysis (GSEA) (Subramanian et al. 2005) for MSigDB Hallmark pathways (Liberzon et al. 2015) on human orthologs from each model compared to control cells. Comparing the GSEA normalized enrichment scores (NES) revealed that certain pathways were upregulated or downregulated in both genetic models, such as interferon responses (upregulated) and estrogen response (downregulated) (Fig. 4.3f). However, many pathways were upregulated in ARID1A/PIK3CA mutant cells and downregulated in TP53/PIK3CA mutant cells (Fig. 4.3f).
These include Notch signaling, p53, epithelial-to-mesenchymal transition (EMT), and Myc targets. In Chapter 2, integrative genomic and cellular assays in vivo and in vitro showed that ARID1A transcriptionally represses mesenchymal fates (Wilson et al. 2019). Further investigation into the Hallmark EMT pathway revealed a cluster of 49 genes upregulated in both models, as well as a cluster of 60 genes upregulated in ARID1A/PIK3CA mutant cells but mostly downregulated in TP53/PIK3CA mutants (Fig. C.4a-d). Among the affected genes, the EMT master regulator SNAI2 is highly upregulated in ARID1A/PIK3CA mutants (log2FC = 4.54 vs. controls) but unaffected in TP53/PIK3CA mutant endometrial epithelial cells (Fig. C.4e). SNAI2/SLUG is a conserved transcription factor that directly represses epithelial gene transcription to regulate cellular processes such as adhesion, polarity, migration, and invasion (Zhou, Gross, and Kuperwasser 2019). EMT-promoting factors are aberrantly upregulated in ARID1A mutant cells but not in TP53 mutants, which may explain the lack of collective invasion in TP53/PIK3CA mutant endometrial epithelia. Enrichment for Gene Ontology (GO) Biological Process gene sets provided further insight into cellular processes affected in the genetic models (Fig. 4.3g). As in the GSEA analysis, interferon and immune pathways were enriched among genes commonly upregulated in both models, whereas various extracellular matrix pathways were uniquely enriched among genes upregulated in ARID1A/PIK3CA mutant cells (Fig. 4.3g). Interestingly, no pathways were enriched among genes uniquely upregulated in the TP53/PIK3CA mutant model. Further, in a direct comparison of the two genetic models, no gene sets from the Hallmark, GO, or Oncogenic Signature MSigDB collections were significantly enriched among the 603 human ortholog genes more highly expressed in TP53/PIK3CA mutant cells (Fig. C.5).
Among downregulated pathways, ARID1A/PIK3CA mutant cells downregulate endoplasmic reticulum (ER) stress response and glycosylation processes, while TP53/PIK3CA mutant cells downregulate G1/S mitotic transition transcriptional programs, apoptotic signaling pathways, and cellular differentiation pathways (Fig. 4.3g). Notably, regulation of epithelial cell differentiation was enriched among genes downregulated in TP53/PIK3CA mutants as well as among genes upregulated in ARID1A/PIK3CA mutants. Closer inspection of genes involved in this process showed that some are oppositely affected following ARID1A or TP53 loss (DLL1), while others are uniquely affected by TP53 loss (CD109, HES1), uniquely affected by ARID1A loss (MYCN), or affected in both the TP53 and ARID1A mutant models (OVOL2, ROCK2) (Fig. 4.3h). Altogether, these results highlight transcriptional programs with shared and unique regulation by ARID1A and TP53.
Figure 4.3 Endometrial epithelial TP53 and ARID1A loss results in overlapping and distinct gene expression programs
a, Unsupervised hierarchical clustering of gene-level RNA-seq data from sorted endometrial epithelial cells of TP53/PIK3CA mutant mice compared to ARID1A/PIK3CA mutant and control cells. Relative Z-score expression of targeted genes is displayed below the clustering result. b, Relative linear Trp53 expression in endometrial epithelial cell transcriptomes. c, Volcano plot depicting differential gene expression between TP53/PIK3CA mutant and ARID1A/PIK3CA mutant cells. d, Overlap of DE genes between TP53/PIK3CA mutant and ARID1A/PIK3CA mutant cells vs. controls. Statistic is hypergeometric enrichment. e, Heatmap of 470 shared misregulated genes in endometrial epithelial cells from each genetic model. 92.3% of intersecting DE genes are affected in the same direction. f, Overview of Broad GSEA results for MSigDB Hallmark pathways in endometrial epithelial cells from each genetic model.
Axes display gene set normalized enrichment score (NES) for each model compared to control cells. g, Enrichment for Gene Ontology (GO) Biological Process gene sets among genetic model DE genes, separated by directionality. h, Examples of DE genes within the GO regulation of epithelial cell differentiation gene set: DLL1, CD109, HES1, MYCN, OVOL2, ROCK2. Statistic is FDR as reported by DESeq2: * FDR < 0.05; ** FDR < 0.01; *** FDR < 0.001.
4.3.4 Gene expression programs in mouse models reflect human tumor genetics
The molecular profiling data generated by TCGA are an excellent resource for testing the human disease relevance of mouse model observations. We segregated UCEC bulk primary tumor samples by ARID1A and TP53 mutation status and histological subtype, then performed Broad GSEA for the MSigDB Hallmark pathways and GO Biological Process gene sets using RNA-seq data from each sample. The UCEC comparisons included TP53mut/ARID1Awt vs. wt/wt, ARID1Amut/TP53wt vs. wt/wt, ARID1Amut/TP53wt vs. TP53mut/ARID1Awt, and endometrioid vs. serous. The same GSEA comparison framework was applied to the transcriptomic data from isolated mutant mouse cells, compared to control cells and to each other. GSEA results were then contrasted between the human and mouse comparisons, with gene sets considered enriched at an absolute NES > 1 threshold. Hallmark pathway GSEA results for UCEC tumors corroborated certain observations in our genetic mouse models, such as upregulation of EMT, apoptosis, and p53 pathway genes in ARID1A mutant tumors but downregulation of these pathways in TP53 mutant tumors (Fig. C.6a). Consistent with TP53 mutations being a hallmark of uterine serous carcinoma and ARID1A mutations occurring in roughly half of uterine endometrioid adenocarcinomas, both Hallmark pathway and GO Biological Process GSEA results were significantly correlated between the comparison of ARID1A mutant vs. TP53 mutant tumors and that of endometrioid vs.
serous histological subtypes (Fig. C.6b-c). We further confirmed that the observed GSEA correlations were stronger than expected by chance from sampling dependency alone (Fig. C.6d), suggesting that tumor gene expression features linked to TP53 or ARID1A genetic status are associated with histological subtype. Of 3653 total GO Biological Process gene sets queried in all comparisons, 225 were mutually upregulated and 142 were mutually downregulated between TP53mut/ARID1Awt vs. wt/wt UCEC tumors and TP53/PIK3CA mutant vs. control mouse cells (Fig. 4.4a). Upregulated TP53 mutant gene sets include response to type I interferon (NES 1.80 and 1.82, respectively) and double strand break repair (NES 1.93 and 1.11, respectively). Downregulated gene sets include intrinsic apoptotic signaling pathway by p53 (NES -1.59 and -1.49) and regulation of response to extracellular stimulus (NES -1.59 and -1.02). For the ARID1A mutant comparisons, 225 gene sets were mutually upregulated and 156 were mutually downregulated between ARID1Amut/TP53wt vs. wt/wt UCEC tumors and ARID1A/PIK3CA mutant vs. control mouse cells (Fig. 4.4b). Upregulated ARID1A mutant gene sets include response to hyperoxia (NES 1.81 and 1.39, respectively) and collagen fibril organization (NES 1.75 and 1.42). Curiously, in both the ARID1A and TP53 human-mouse comparisons, mutually downregulated gene sets (compared to wild-types or controls) overlapped more than expected by chance, while upregulated gene sets did not (hypergeometric enrichment, Fig. C.7). Direct comparison of ARID1A mutant vs. TP53 mutant human tumors and mouse models further indicated that many related processes, such as extracellular matrix assembly and EMT, are distinctly affected in TP53 vs. ARID1A mutants (Fig. 4.4c).
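The human-mouse contrast above reduces to a threshold rule on paired NES values. The following is a minimal Python sketch. It assumes each comparison has already been summarized as a dictionary from gene set name to NES. That data layout and the function name `classify_concordance` are hypothetical and do not describe the chapter's actual pipeline. The first four NES pairs are the values quoted above for TP53 mutant tumors and TP53/PIK3CA mutant cells. The fifth entry is invented to illustrate a discordant gene set.

```python
from typing import Dict, List, Tuple

NES_THRESHOLD = 1.0  # |NES| > 1, the enrichment cutoff described in the text


def classify_concordance(
    human: Dict[str, float], mouse: Dict[str, float], threshold: float = NES_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Return (mutually upregulated, mutually downregulated) gene sets.

    Only gene sets scored in both comparisons are considered.
    """
    shared = sorted(set(human) & set(mouse))
    up = [g for g in shared if human[g] > threshold and mouse[g] > threshold]
    down = [g for g in shared if human[g] < -threshold and mouse[g] < -threshold]
    return up, down


# Human: TP53mut/ARID1Awt vs. wt/wt tumors; mouse: TP53/PIK3CA mutant vs. control cells.
human_nes = {
    "response to type I interferon": 1.80,
    "double strand break repair": 1.93,
    "intrinsic apoptotic signaling pathway by p53": -1.59,
    "regulation of response to extracellular stimulus": -1.59,
    "hypothetical discordant set": 1.20,  # invented example
}
mouse_nes = {
    "response to type I interferon": 1.82,
    "double strand break repair": 1.11,
    "intrinsic apoptotic signaling pathway by p53": -1.49,
    "regulation of response to extracellular stimulus": -1.02,
    "hypothetical discordant set": -1.30,  # invented example
}

up, down = classify_concordance(human_nes, mouse_nes)
print(up)    # ['double strand break repair', 'response to type I interferon']
print(down)  # ['intrinsic apoptotic signaling pathway by p53', 'regulation of response to extracellular stimulus']
```

A gene set with |NES| ≤ 1 in either species, or with opposite signs in the two species, falls into neither category.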
Notably, gene sets more highly expressed in ARID1A mutants than in TP53 mutants strongly overlapped between UCEC tumors and mouse models (p < 10^-8, hypergeometric enrichment), whereas the corresponding downregulated gene sets did not overlap significantly (Fig. C.7). At the gene level, 81 genes were significantly more highly expressed in TP53 mutant mice and UCEC tumors than in ARID1A mutants, including ESRRB, MAL, WNT7A, RASAL1, USP51, PLCXD3, and AIF1L (Fig. C.8). In contrast, 149 genes were significantly more highly expressed in ARID1A mutants, including COL17A1, KRT5, TP63, SNAI2, ZNF750, HAS3, ANKK1, WDR38, C6, and IL33 (Fig. C.8). To characterize the affected pathways in TP53 vs. ARID1A mutant disease in an unbiased manner, we identified over-represented terms among enriched GO Biological Process gene sets. Upregulated TP53 mutant gene sets were enriched for terms such as “type I interferon” and “response to virus”, and downregulated sets involved the terms “endoplasmic reticulum” and “epithelial” (Fig. 4.4d). Upregulated ARID1A mutant gene sets were enriched for the terms “cell cycle”, “cell migration”, “oxidative stress”, and “p53” (Fig. 4.4d). Human-human and mouse-mouse genetic comparisons also showed that p53 pathway-related processes are consistently upregulated in ARID1A mutant tumors and downregulated in TP53 mutant tumors (Fig. C.9).
Figure 4.4 Pathway analysis of TP53 and ARID1A regulated expression programs in human disease and mouse models
a-c, Broad GSEA results for GO Biological Process gene sets (n = 3653) comparing TP53 and ARID1A mutant human UCEC tumors and genetically engineered mouse models: (a) TP53 mutant, ARID1A wild-type vs. wild-type/wild-type UCEC tumors compared to mouse endometrial epithelial cells from TP53/PIK3CA mutants vs. controls; (b) ARID1A mutant, TP53 wild-type vs. wild-type/wild-type UCEC tumors compared to mouse endometrial epithelial cells from ARID1A/PIK3CA mutants vs.
controls; (c) ARID1A mutant, TP53 wild-type vs. TP53 mutant, ARID1A wild-type UCEC tumors compared to mouse endometrial epithelial cells from ARID1A/PIK3CA mutants vs. TP53/PIK3CA mutants. Shown are an overview of GSEA results (left) with zooms into shared upregulated (NES > 1, center) and shared downregulated (NES < -1, right) gene sets. Representative examples of highly enriched gene sets are labeled. d, Significantly over-represented terms in enriched gene sets (|NES| > 1) highlighted in a-c. Statistic is hypergeometric enrichment. See Methods section 4.5.8 for the enrichment analysis framework.
4.3.5 ARID1A mutant tumors display p53 pathway activation
p53 pathway gene signatures were upregulated in both ARID1A mutant UCEC tumors and the ARID1A/PIK3CA mutant genetic mouse model. This result suggests that tumor cell dependence on the p53 pathway itself could be a mechanism underpinning the mutual exclusivity of TP53 and ARID1A mutations. In the mouse models, analysis of gene expression within the Hallmark p53 pathway showed that certain canonical members of the p53 pathway were upregulated in ARID1A/PIK3CA mutant mice, such as TP63, TP53, MDM2, DDIT3 (CHOP), and CDKN1A (Fig. 4.5a). Because TP53 and ARID1A both regulate transcription, we next performed an unbiased analysis to determine which aspects of TP53-directed transcriptional regulation are co-regulated by ARID1A in the endometrium. We interrogated a recently reported set of 103 high-confidence TP53 target genes that are transcriptionally regulated by TP53 in multiple cell lines, known as the core TP53 transcriptional program, which were further categorized based on known functions (Andrysik et al. 2017). These genes were enriched for expression alterations in diseased endometrial epithelia from both the ARID1A/PIK3CA and TP53/PIK3CA mutant mouse models (hypergeometric enrichment, p = 0.019 and 0.0063, respectively) (Fig. C.10).
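Hypergeometric enrichment is used throughout this chapter to ask whether two gene lists overlap more than expected by chance. The test is the upper tail of the hypergeometric distribution. The following is a standard-library Python sketch; the function names are hypothetical. The background size in the usage lines is an assumption. An appropriate background, such as all expressed genes, has to be chosen for each analysis, so the printed values are not expected to reproduce the p-values reported here.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, background: int, set_a: int, set_b: int) -> float:
    """One-sided P(X >= overlap) with X ~ Hypergeometric.

    background: genes that could appear in either list (e.g. all expressed genes)
    set_a:      size of the first gene list ("successes" in the background)
    set_b:      size of the second gene list (number of draws)
    """
    if not (0 <= set_a <= background and 0 <= set_b <= background):
        raise ValueError("gene list sizes must not exceed the background size")
    upper = min(set_a, set_b)
    total = comb(background, set_b)
    # Exact integer arithmetic for the tail; Python's int / int division
    # then yields a correctly rounded float even for very large integers.
    tail = sum(
        comb(set_a, i) * comb(background - set_a, set_b - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


def expected_overlap(background: int, set_a: int, set_b: int) -> float:
    """Overlap expected by chance alone."""
    return set_a * set_b / background


# Illustrative only: DE list sizes from Fig. 4.3d with a HYPOTHETICAL background
# of 15,000 expressed genes (the background used in the chapter is not stated here).
N_HYPOTHETICAL = 15_000
print(expected_overlap(N_HYPOTHETICAL, 1514, 3455))
print(hypergeom_enrichment_p(470, N_HYPOTHETICAL, 1514, 3455))
```

The p-value depends strongly on the chosen background. A smaller background raises the expected overlap and weakens the apparent enrichment.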
We analyzed expression alterations of orthologous genes in TP53/PIK3CA mutant and ARID1A/PIK3CA mutant mice compared to controls (Fig. 4.5b). Examples of opposing regulation by ARID1A and TP53 emerged, such as the pro-apoptotic Akt repressor PHLDA3 (Kawase et al. 2009), which is upregulated in ARID1A/PIK3CA mutant mice but downregulated in TP53/PIK3CA mutant mice (Fig. 4.5b). Next, we tested whether p53 pathway activation is a hallmark feature of ARID1A mutant tumors across human cancer. We leveraged PARADIGM pathway activity data (Vaske et al. 2010) produced by a recent pan-cancer TCGA study (Hoadley et al. 2018), which infers protein and pathway regulatory activity from both gene expression and copy-number data across 9829 tumors. Among all TP53 wild-type primary tumors across cancer types, we computed the mean difference in PARADIGM scores between ARID1A mutant and ARID1A wild-type tumors (n = 492 and 4832, respectively) to identify pathway alterations associated with ARID1A mutation (Fig. 4.5c). Overall, the 36 PARADIGM pathways with keyword “p53” were more highly activated in ARID1A mutant tumors than in ARID1A wild-type tumors across cancer types (Fig. 4.5c-e; permutation test, empirical p = 0 across 50,000 random samples, i.e., p < 2 × 10^-5). This result was also recapitulated specifically in UCEC tumors (Fig. C.11), corroborating the GSEA results. Altogether, these data implicate aberrant p53-mediated transcriptional regulation as a hallmark feature of ARID1A mutant tumors. However, the functional mechanism underlying this activation remains unclear.
Figure 4.5 ARID1A mutation is associated with p53 pathway activation
a, k-means clustering and heatmap of genetic mouse model RNA-seq relative log2 gene expression data for MSigDB Hallmark p53 pathway genes (n = 181 expressed orthologs). Representative genes are highlighted on the right.
b, Differential expression of core TP53 transcriptional program gene orthologs, segregated by function, in mouse endometrial epithelial cells from ARID1A/PIK3CA mutants (green) and TP53/PIK3CA mutants (blue) compared to controls. Significant DE genes (FDR < 0.05) in each model are labeled and colored accordingly. c, Distribution of PARADIGM score differences between ARID1A mutant (n = 492) vs. wild-type (n = 4832) TCGA Pan-Cancer Atlas tumors, considering only TP53 wild-type tumors. Top, all 19,503 measured pathways; bottom, the 36 pathways with keyword “p53”. d, Empirical distribution of mean differences between ARID1A mutant vs. wild-type PARADIGM scores, based on 50,000 samples of 36 random PARADIGM pathways. The blue line represents the mean score difference for the 36 pathways with keyword “p53”, with the associated permutation statistic. e, Example violin plot for the top p53 PARADIGM pathway significantly different between ARID1A mutant vs. wild-type tumors. Statistic is FDR-adjusted, two-tailed, unpaired Wilcoxon test.
4.3.6 p53 pathway target genes are directly regulated by ARID1A
In Chapter 2, we demonstrated that ARID1A normally represses key genes involved in endometrial pathologies through chromatin interactions (Wilson et al. 2019). We hypothesized that p53 pathway activation following ARID1A loss could result from the derepression of ARID1A target genes. To test this, we profiled ARID1A binding genome-wide in sorted mouse endometrial epithelial cells through in vivo CUT&RUN (Fig. 4.6a) (Skene, Henikoff, and Henikoff 2018). ARID1A CUT&RUN in mouse endometrial epithelial cells in vivo revealed significant ARID1A binding, enriched over the IgG control, at 2146 genome-wide sites (MACS2, FDR < 0.25, Fig. 4.6b). These ARID1A bound genomic regions were primarily intergenic (38%), intronic (36%), or promoter-TSS regions (24%, defined as within 3 kilobases of a gene transcription start site, TSS) (Fig. 4.6c).
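The promoter-TSS category above is a distance rule. The following is a simplified Python sketch. It assumes peaks are represented by summit coordinates and genes by single TSS positions. The coordinates, gene names, and helper functions are hypothetical. The sketch distinguishes only the ±3 kb promoter window and the relaxed 50 kb window used in this section. Dedicated genomic annotation tools also use full gene models to separate intronic from intergenic regions.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

PROMOTER_WINDOW_BP = 3_000  # promoter-TSS: within 3 kb of a TSS (definition in the text)
RELAXED_WINDOW_BP = 50_000  # relaxed gene association: < 50 kb from a TSS

TssIndex = Dict[str, List[Tuple[int, str]]]  # chrom -> sorted [(tss_position, gene)]


def build_tss_index(tss_records: List[Tuple[str, int, str]]) -> TssIndex:
    """Group (chrom, tss_position, gene) records by chromosome, sorted by position."""
    index: TssIndex = {}
    for chrom, pos, gene in tss_records:
        index.setdefault(chrom, []).append((pos, gene))
    for sites in index.values():
        sites.sort()
    return index


def nearest_tss(index: TssIndex, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (gene, distance in bp) for the TSS closest to pos, or None."""
    sites = index.get(chrom)
    if not sites:
        return None
    i = bisect_left(sites, (pos, ""))  # first TSS at or after pos
    neighbors = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    tss, gene = min(neighbors, key=lambda s: abs(s[0] - pos))
    return gene, abs(tss - pos)


def classify_peak(index: TssIndex, chrom: str, summit: int) -> Tuple[str, Optional[str]]:
    """Assign a peak summit to 'promoter-TSS', 'within 50 kb', or 'distal'."""
    hit = nearest_tss(index, chrom, summit)
    if hit is None:
        return "distal", None
    gene, dist = hit
    if dist <= PROMOTER_WINDOW_BP:
        return "promoter-TSS", gene
    if dist < RELAXED_WINDOW_BP:
        return "within 50 kb", gene
    return "distal", gene


index = build_tss_index([("chr1", 10_000, "GeneA"), ("chr1", 80_000, "GeneB"), ("chr2", 5_000, "GeneC")])
print(classify_peak(index, "chr1", 11_500))   # ('promoter-TSS', 'GeneA')
print(classify_peak(index, "chr1", 40_000))   # ('within 50 kb', 'GeneA')
print(classify_peak(index, "chr1", 200_000))  # ('distal', 'GeneB')
```

Peaks are assigned to the single nearest TSS. This mirrors the "primary TSS" wording in the text but ignores alternative promoters.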
Gene promoters and CpG islands were the top enriched genomic features (Fig. C.12a). Sequence motif analysis of ARID1A bound genomic regions revealed strong enrichment for AP-1/bZIP family transcription factor binding sites (Fig. 4.6d and C.12b). In Chapter 2, we reported that ARID1A binds chromatin near gene promoters and AP-1 motifs in 12Z human endometriotic epithelial cells in vitro (Wilson et al. 2019). As an orthogonal experimental control supporting the validity of these ARID1A binding data, we leveraged our Chapter 2 chromatin accessibility data (ATAC-seq) from this same cell population. In Chapter 2, we also found that ARID1A binding is associated with accessible chromatin genome-wide in 12Z cells in vitro (Wilson et al. 2019). In agreement, 89% of ARID1A bound regions in vivo overlapped with accessible chromatin (Fig. 4.6e-f). We further observed that accessible chromatin regions bound by ARID1A were more highly accessible than those without detectable ARID1A binding (p < 10^-61, two-tailed, unpaired Wilcoxon test, Fig. 4.6g). These data support detection of ARID1A binding at active chromatin in mouse endometrial epithelial cells in vivo. We next investigated genes with ARID1A binding at promoter chromatin in the endometrial epithelium, where it is likely to influence nearby gene activity. In total, 587 annotated mouse genes had ARID1A binding detected within 3 kilobases of the primary TSS, suggesting likely regulation of their expression (Table C.1). In agreement with our expression data, we observed significant overlap between expressed genes with ARID1A promoter binding and DE genes in the ARID1A/PIK3CA mutant mouse model (Fig. 4.6h). Similarly, we observed significant overlap between the ARID1A/PIK3CA mutant DE genes and a more relaxed set of 3065 genes with ARID1A binding <50 kb from the gene TSS (Fig. C.12c).
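The two-tailed, unpaired Wilcoxon test used for the accessibility comparison is the Wilcoxon rank-sum (Mann-Whitney U) test. The sketch below implements it with the Python standard library. It uses the large-sample normal approximation with a tie-corrected variance and no continuity correction. That assumption is reasonable for thousands of regions but not for small samples; SciPy's `scipy.stats.mannwhitneyu`, for example, also offers exact p-values. The function names and accessibility values are hypothetical.

```python
from math import erfc, sqrt
from typing import Dict, List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("variance is zero (e.g. all values tied)")
    z = (u1 - mean_u) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


bound = [8.1, 9.4, 7.7, 10.2, 9.9, 8.8]         # hypothetical signal, ARID1A-bound regions
unbound = [5.2, 6.1, 7.9, 4.8, 6.6, 5.5, 6.0]   # hypothetical signal, unbound regions
u, p = rank_sum_test(bound, unbound)
print(u, p)
```

A U value above n1·n2/2 means values in the first group tend to be larger. Here that corresponds to greater accessibility at ARID1A-bound regions.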
Functionally, the 494 human orthologs of the ARID1A promoter-bound genes were significantly enriched for gene sets including TNFα signaling via NFκB, apoptosis, hypoxia, and response to endoplasmic reticulum stress (Fig. 4.6i). Broadly similar pathways were enriched in the larger set of genes with ARID1A binding detected within 50 kb of the TSS (Fig. C.12d). Among Hallmark p53 pathway genes, 11 had ARID1A binding detected within the ±3 kb promoter region in endometrial epithelia (p = 0.025, hypergeometric enrichment), including Atf3, Plk3, Jun, Ndrg1, Rab40c, Hbegf, and Mxd1 (Fig. 4.6j). These data indicate that ARID1A regulates numerous target genes within the p53 pathway through direct chromatin interactions, and that ARID1A loss may lead to aberrant gene expression by disrupting local chromatin regulation.
Figure 4.6 Analysis of ARID1A chromatin interactions in mouse endometrial epithelia in vivo
a, Diagram of experimental workflow for measuring ARID1A chromatin interactions in vivo. Endometrial epithelial cells are purified from mouse uterus by positive enrichment with labeling and magnetic beads. Purified cells are then subjected to CUT&RUN for ARID1A or IgG negative control. b, Correlation of ARID1A binding signal (compared to IgG) in vivo across 2146 genomic regions with significant binding detected (FDR < 0.25) in two independent experiments. c, Genomic annotation of 2146 ARID1A bound genomic regions in vivo. d, ARID1A binding and accessibility profiles at 3842 bound AP-1/bZIP motifs, using the top de novo enriched motif among ARID1A genome-wide binding sites in vivo, TGA(C/G)TCA. e, Heatmap of ARID1A binding and chromatin accessibility signal across 2146 genomic regions with significant ARID1A binding detected, segregated by accessibility. f, Overlap of ARID1A bound regions and accessible chromatin regions. g, Chromatin accessibility quantified at accessible regions with significantly detected ARID1A binding vs. not.
Statistic is two-tailed, unpaired Wilcoxon test. h, Significant overlap of genes with ARID1A promoter binding (within 3 kb of TSS) and DE genes from ARID1A/PIK3CA mutant endometrial epithelia. i, Enrichment statistics for the top 10 GO Biological Process gene sets (left) and Hallmark pathways (right) among 494 human gene orthologs with ARID1A promoter binding in mouse endometrial epithelia in vivo. j, Examples of ARID1A chromatin interactions and accessibility (ATAC) at Hallmark p53 pathway genes in vivo. y-axis is log-likelihood ratio of signal compared to background. Bars underneath signal tracks represent significant (FDR < 0.25) and reproducible (n = 2) signal detection (i.e., peaks). phyloP track represents sequence conservation across vertebrates.
4.3.7 Simultaneous TP53 and ARID1A loss promotes aggressive tumorigenesis
Although p53 pathway activation appears to be a hallmark of ARID1A mutant tumors, rare patient tumors carrying both TP53 and ARID1A mutations do exist; their etiology and characteristics, however, have not been elucidated. Intriguingly, the 38 UCEC primary tumors with alterations in both TP53 and ARID1A were associated with higher histologic tumor grade (Fig. C.13a), suggesting co-altered tumors may be more aggressive. We also found that TP53/ARID1A co-altered primary tumors were enriched for the mixed morphology subtype (hypergeometric enrichment, p = 0.015) (Fig. C.13b). To determine whether TP53 and ARID1A co-alterations were associated with metastasis, we examined the MSK-IMPACT Clinical Sequencing Cohort of 10,501 primary or metastatic tumor samples with targeted mutation data (Zehir et al. 2017). We observed that TP53/ARID1A co-mutation rates were slightly higher in metastatic tumors than in primary tumors (one-tailed Fisher’s exact test, OR = 1.20, p = 0.045) (Fig. C.13c). However, TP53 mutations are known to be more prevalent in advanced stage and metastatic tumors than in primaries (Zehir et al. 2017).
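The co-occurrence and mutual-exclusivity statistics in this chapter come from Fisher’s exact tests on 2x2 tables of mutation status. The following standard-library Python sketch computes the sample (cross-product) odds ratio and one- or two-tailed p-values from the hypergeometric distribution. Its argument names loosely follow SciPy's `scipy.stats.fisher_exact`, which is the usual library route. The first example is the classic 4x4 "tea tasting" table. The second table uses hypothetical counts, not UCEC or MSK-IMPACT data. Some software, such as R's `fisher.test`, reports a conditional maximum-likelihood odds ratio instead of the cross-product, so reported ORs may differ slightly between tools.

```python
from math import comb, inf, nan
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int,
                     alternative: str = "two-sided") -> Tuple[float, float]:
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene 1 mutant / wild-type; columns: gene 2 mutant / wild-type,
    so `a` counts co-mutated samples. Returns (sample odds ratio, p-value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def pmf(x: int) -> float:
        # Probability of x co-mutated samples with all margins held fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    if alternative == "greater":    # co-occurrence (OR > 1)
        p = sum(pmf(x) for x in range(a, hi + 1))
    elif alternative == "less":     # mutual exclusivity (OR < 1)
        p = sum(pmf(x) for x in range(lo, a + 1))
    elif alternative == "two-sided":
        p_obs = pmf(a)
        # Sum every table at most as probable as the observed one; the small
        # relative tolerance absorbs floating-point noise in tied probabilities.
        p = sum(q for q in (pmf(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    if b * c:
        odds_ratio = a * d / (b * c)
    else:
        odds_ratio = inf if a * d else nan
    return odds_ratio, min(p, 1.0)


print(fisher_exact_2x2(3, 1, 1, 3))  # odds ratio 9.0, two-sided p ~ 0.486
# Hypothetical mutual-exclusivity table: 5 co-mutated tumors among 500
print(fisher_exact_2x2(5, 95, 60, 340, alternative="less"))
```

A one-tailed test, like the metastasis comparison above, uses only the tail matching the hypothesized direction.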
When controlling for TP53 mutation status, we observed a similar but weaker, non-significant trend toward association between ARID1A mutations and metastatic TP53 mutant tumors (OR = 1.10, p = 0.21) (Fig. C.13d). As in the TCGA-UCEC cohort, POLE mutant tumors in the MSK-IMPACT cohort were also enriched for TP53/ARID1A co-mutations (Fig. C.13e). To rule out the possibility that tumor heterogeneity contributes to TP53-ARID1A co-dependencies, we examined Cancer Dependency Map (DepMap) data (Dempster et al. 2019). These data indicated that TP53 mutant cancer cell lines were not more genetically dependent on ARID1A than TP53 wild-type lines (Fig. 4.7a). On the contrary, ARID1A loss had a significantly smaller effect on cellular viability in TP53 mutant lines (two-tailed, unpaired Wilcoxon test, p = 0.0066). ARID1A is an established metastasis gene in numerous tumor contexts (Wei et al. 2014; Cho et al. 2015; Sun et al. 2017; Yates et al. 2017; Xu et al. 2019; Namjan et al. 2020), including endometrial cancer (Gibson et al. 2016). We therefore hypothesized that TP53 mutant primary tumors are not dependent on ARID1A function, and that ARID1A mutations may promote metastasis in this genetic context. To test this, we established and interrogated the tumorigenic potential of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Trp53fl/fl mice (henceforth referred to as TP53/ARID1A/PIK3CA mutant mice), simultaneously harboring endometrial epithelial PIK3CAH1047R, ARID1A loss, and TP53 loss (Fig. 4.7b). By gross observation and histopathology, TP53/ARID1A/PIK3CA mutant mice had visibly greater tumor burden than TP53/PIK3CA mutant mice (Fig. 4.7c-d). Vaginal bleeding was observed at a median of 98 days (Fig. C.14a). This latency period is significantly longer than that of TP53/PIK3CA mutant mice (Fig. C.14a). The humane survival endpoint of vaginal bleeding is therefore not predictive of tumor burden in our mouse models, as we have previously observed (Wilson et al. 2019).
Uterine histology of TP53/ARID1A/PIK3CA mutant mice showed invasive adenocarcinoma with areas of squamous differentiation (Fig. 4.7d-e). Staining for KRT8, phospho-S6, and loss of ARID1A confirmed endometrial epithelial tumor origin (Fig. 4.7d and C.14b). Proliferation and caspase-mediated cell death indices of TP53/ARID1A/PIK3CA mutant endometrial epithelial cells were characterized through IHC staining for Ki67 and Cleaved Caspase-3, respectively. Compared to TP53/PIK3CA mutants, endometrial epithelia with additional ARID1A loss displayed markedly fewer caspase-3-positive foci, while proliferation was not dramatically affected (Fig. 4.7f and C.15). These data suggest that co-occurring TP53-ARID1A mutations promote tumor cell progression toward invasive adenocarcinoma and metastatic disease.
Figure 4.7 Co-existing TP53 and ARID1A mutations promote aggressive endometrial tumorigenesis
a, Cancer Dependency Map (DepMap) data for ARID1A wild-type cell lines, measuring the viability effect of ARID1A knockout in TP53 wild-type vs. mutant lines. Statistic is unpaired, two-tailed Wilcoxon test. b, Schematic of genetically engineered mice harboring endometrial epithelial-specific PIK3CAH1047R, TP53 loss, and ARID1A loss. c, Example gross necropsy images of TP53/PIK3CA mutant (top) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Trp53fl/fl; Arid1afl/fl (TP53/ARID1A/PIK3CA mutant, bottom) mice. Arrowheads denote uterine abnormalities. d, Immunohistochemical staining of KRT8 in TP53/PIK3CA and TP53/ARID1A/PIK3CA mutant uterus. Arrowheads denote mutant endometrial epithelial cells. e, Representative H&E uterine histology images of TP53/ARID1A/PIK3CA mutant mice. Arrowheads depict mutant tumor cells with squamous differentiation. f, Uterine IHC staining for Cleaved Caspase-3 (cell death, left) and Ki67 (proliferation, right) in TP53/PIK3CA mutant (top) and TP53/ARID1A/PIK3CA mutant (bottom) mice. Arrowheads denote endometrial epithelial cells.
4.3.8 ARID1A loss results in ATF3 activation and squamous differentiation
We observed that endometrial epithelia harboring loss of both TP53 and ARID1A in addition to PIK3CA activation formed invasive tumors marked by high proliferation and minimal cell death. TP53/PIK3CA mice displayed frequent cell death, marked by cleaved caspase-3-positive foci; the addition of ARID1A loss in TP53/ARID1A/PIK3CA mutant mice drastically reduced the number of apoptotic cells. This result suggests that ARID1A may normally promote cell death and stress-related programs in the absence of TP53, further supporting ARID1A regulation of apoptotic, inflammatory, and p53 pathway genes (see Fig. 4.6). Activating Transcription Factor 3 (ATF3) is a stress-inducible transcription factor and a member of both the Hallmark p53 pathway and the apoptosis module of the core TP53 transcriptional program (Hai et al. 1999; Andrysik et al. 2017). ATF3 has context-dependent roles in tumorigenesis, cell death, and cell cycle arrest or senescence (Hartman et al. 2004; Lu, Wolfgang, and Hai 2006; Wang, Arantes, et al. 2008; Yin, Dewille, and Hai 2008; Wu et al. 2010; Taketani et al. 2012; Xie et al. 2014; Sharma et al. 2018; Borgoni et al. 2020; Azizi et al. 2021; Zhang et al. 2021). ATF3 is known to both activate and repress transcription, and it co-regulates chromatin at p53 target genes (Hai et al. 1999; Tanaka et al. 2011; Zhao et al. 2016). ATF3 has also been shown to stabilize TP53 by competing for ubiquitination by MDM2 (Li et al. 2020). Further, ATF3 binds AP-1 consensus DNA motifs (Hai and Curran 1991), the top motif enriched among ARID1A binding sites (see Fig. C.12b), suggesting that ATF3 may co-regulate ARID1A chromatin targets. We observed that Atf3 expression is directly repressed by ARID1A chromatin regulation in endometrial epithelia in vivo (Fig. C.16a and 4.6j).
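ATF3 and other AP-1/bZIP factors recognize the TGA(C/G)TCA consensus, which was the top de novo motif among ARID1A binding sites (Fig. 4.6d). As an illustration only, the sketch below counts exact consensus matches in DNA sequences with a Python regular expression. Motif enrichment analyses such as the de novo search used here score position weight matrices against a background, so exact string matching is a simplification. The peak sequences and function names are made up. TGACTCA and TGAGTCA are reverse complements of each other, so a forward-strand scan also finds sites on the minus strand.

```python
import re
from typing import Dict, List

# AP-1/TRE consensus from the text, TGA(C/G)TCA; the lookahead permits overlapping hits.
AP1_CONSENSUS = re.compile(r"(?=(TGA[CG]TCA))")


def ap1_sites(seq: str) -> List[int]:
    """0-based start positions of exact AP-1 consensus matches (either strand)."""
    return [m.start() for m in AP1_CONSENSUS.finditer(seq.upper())]


def fraction_with_site(peaks: Dict[str, str]) -> float:
    """Fraction of peak sequences containing at least one exact consensus match."""
    if not peaks:
        return 0.0
    return sum(1 for s in peaks.values() if ap1_sites(s)) / len(peaks)


# Made-up peak sequences for illustration
peaks = {
    "peak_1": "ccgTGACTCAgg",
    "peak_2": "aaTGAGTCAtttTGACTCAc",
    "peak_3": "GGGCCCAAATTTGGG",
}
for name, seq in peaks.items():
    print(name, ap1_sites(seq))
# peak_1 [3]
# peak_2 [2, 12]
# peak_3 []
print(fraction_with_site(peaks))  # 0.666...
```

Comparing this fraction between bound peaks and matched background regions gives a crude enrichment estimate. A formal analysis would use a motif discovery tool.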
Uterine ATF3 staining revealed focal nuclear staining in a subset of endometrial epithelial cells in ARID1A/PIK3CA mutant and TP53/ARID1A/PIK3CA mutant mice, but not in controls or TP53/PIK3CA mutants (Fig. 4.8a-b). Among other roles, ATF3 is misregulated in some squamous tumor types (Wu et al. 2010; Xie et al. 2014; Shi et al. 2018; Xu et al. 2021), and ATF3 has been shown to promote epithelial squamous differentiation in vivo (Wang et al. 2007; Wang, Arantes, et al. 2008). Because areas of squamous-like cells were noted during histopathological analysis of TP53/ARID1A/PIK3CA mutant uteri, and the squamous marker TP63 was induced in ARID1A/PIK3CA mutant mouse RNA-seq data (see Fig. 4.5a), we further examined markers of squamous differentiation. Nuclear TP63 staining was observed in both TP53/ARID1A/PIK3CA and ARID1A/PIK3CA mutant endometrial epithelial cells adjacent to the basement membrane and in collectively invading cells, indicating squamous differentiation, whereas TP63 expression was not observed in TP53/PIK3CA mutants or controls (Figs. 4.8c and C.17a-b). In our RNA-seq data, expression of Trp63 (mouse TP63) also correlated with that of the pro-EMT transcription factor Snai2 (Fig. C.16b), supporting an association between squamous differentiation and invasion phenotypes. Using a recently reported set of direct and functional TP63 target genes (Riege et al. 2020), we identified 30 predicted TP63 target genes among the genes associated with endometrial ARID1A mutation in our data (p < 10⁻⁴, hypergeometric enrichment), including COL17A1 (Fig. C.16c-f). In TP53/ARID1A/PIK3CA and ARID1A/PIK3CA mutant endometrial epithelia, COL17A1 expression patterns were similar to those of TP63 (Figs. 4.8d and C.17c). As further evidence that squamous phenotypes are associated with ATF3 expression, we observed expression of ATF3, TP63, and COL17A1 in normal vaginal stratified squamous epithelium (Fig. C.17d).
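The hypergeometric enrichment statistic used for this TP63 target overlap (and for other gene-set overlaps in this dissertation) asks how likely it is to observe at least k shared genes by chance. The following is a minimal standard-library Python sketch of the upper-tail probability, written for illustration only; the study computed these values in R, and the function name `hypergeom_upper_tail` and the toy counts in the test are hypothetical, not values from these data.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the gene universe (e.g., all expressed genes)
    K: universe genes belonging to the reference set (e.g., TP63 targets)
    n: genes in the query set (e.g., ARID1A-associated genes)
    k: observed overlap between the query and reference sets
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # Sum the exact probabilities of every overlap at least as large as k.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    return tail / total
```

In R, the same upper-tail probability corresponds to `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.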
Collectively, these results suggest that ARID1A-mediated repression of ATF3 promotes apoptosis in the absence of TP53, and that derepression of ATF3 following ARID1A loss suppresses proapoptotic genes and promotes squamous differentiation (Fig. 4.9). In ARID1A-deficient cells with wild-type TP53, we propose that enhanced TP53 signaling blocks proliferation and induces senescence.

Figure 4.8 ARID1A loss relieves Atf3 repression associated with squamous differentiation. a, Representative H&E histology of (from left to right) control mice, TP53/PIK3CA mutants, ARID1A/PIK3CA mutants, and TP53/ARID1A/PIK3CA mutants. Arrowheads depict endometrial epithelia. b-d, Immunohistochemical staining of ATF3 (b), TP63 (GeneTex) (c), and COL17A1 (d) in mouse sections as in a.

Figure 4.9 Model of independent and co-existing TP53 and ARID1A mutations in endometrial epithelia. Summary of endometrial disease features and hypothesized molecular mechanisms in genetically engineered mouse models from this study.

4.4 Discussion

In this chapter, we developed new models of TP53 mutant endometrial dysfunction and provided genetic and molecular evidence for unique and overlapping roles of TP53 and ARID1A. We demonstrate that concurrent TP53 and oncogenic PIK3CA mutations in the mouse endometrial epithelium drive intraepithelial carcinoma and, unexpectedly given historical accounts of presumed mutual exclusivity between these mutations, that additional ARID1A loss promotes aggressive endometrial cancer with squamous differentiation or metaplasia. Transcriptome profiling of purified mutant epithelial cell populations and comparison to human tumors with the respective genetic alterations uncovered hallmarks of endometrial TP53 and ARID1A mutations: TP53 mutations lead to epithelial dedifferentiation and loss of intrinsic apoptotic signaling, while ARID1A mutations promote EMT and activation of the p53 pathway.
Our genome-wide analysis of in vivo ARID1A chromatin interactions by CUT&RUN indicated that ARID1A directly regulates numerous TP53 target genes, notably including repression of the stress-induced transcription factor Atf3. We hypothesize that ATF3 induction is associated with invasive, TP63+ squamous differentiation in ARID1A mutant lesions, which we show may occur independently of TP53 genetic status. In addition, further loss of ARID1A prevented the caspase-mediated cell death observed in TP53/PIK3CA mutant endometrium, and, as ATF3 has known roles in apoptosis and cell death regulation (Lu, Wolfgang, and Hai 2006; Tanaka et al. 2011), endometrial epithelial ATF3 induction may serve to suppress pro-apoptotic genes in the absence of TP53. TP53/PIK3CA mutant mice develop endometrial intraepithelial carcinoma (EIC), which is typically considered a precursor lesion to aggressive, high-grade uterine serous carcinoma associated with TP53 mutations (Ambros et al. 1995). EIC lesions are non-invasive but frequently associate with extrauterine spread, e.g., to the peritoneum (Baergen et al. 2001). Myometrial invasion in endometrioid tumors is associated with metastasis and poor outcomes. Acquisition of an ARID1A mutation in EIC confined to the uterus may promote extrauterine dissemination. If TP53/PIK3CA and TP53/ARID1A/PIK3CA mutant mice were able to progress past the vaginal bleeding endpoint, we might expect to observe uterine serous carcinoma with extrauterine dissemination. Intriguingly, 5 of 57 primary tumors (8.8%) profiled in the TCGA uterine carcinosarcoma (UCS) data set harbored mutations in both TP53 and ARID1A in the absence of hypermutator signatures (Cherniack et al. 2017). A reported mouse model of uterine Fbxw7 and Pten loss developed invasive endometrioid intraepithelial neoplasia followed by late-stage uterine carcinosarcomas confirmed to be of endometrial epithelial origin (Cuevas et al. 2019).
If our mouse models were able to age past the point of vaginal bleeding, a subset of TP53/ARID1A/PIK3CA mutant mice might progress toward a uterine carcinosarcoma phenotype. The lack of invasive features in TP53/PIK3CA mutant endometrial epithelia could be attributed to the use of a Trp53fl allele, which may not fully model non-deleterious, gain-of-function TP53 mutations. In mice, Trp53 missense alleles corresponding to known gain-of-function TP53 clinical mutations are associated with enhanced metastasis (Lozano 2010). A gain-of-function mutation model may elicit different phenotypes in the endometrial epithelium than those observed with loss-of-function alleles, and these features could be cell type specific. In support of a nullizygous approach in the endometrium, Trp53 null mutations in the mouse endometrium have previously been shown to promote tumorigenesis (Wild et al. 2012; Guo et al. 2019). In TCGA-UCEC samples, we found no significant difference in tumor grade or overall survival between TP53 missense and truncating mutations, further supporting the use of either strategy to study TP53 mutant tumors. Further, the type of TP53 mutation in endometrial cancer is not associated with genomic or histologic subtype (Schultheis et al. 2016). Although vaginal bleeding is a clinical sign of endometrial cancer in women, it is a humane survival endpoint in mice, requiring euthanasia at early stages of the disease. In the absence of vaginal bleeding, it remains possible that endometrial invasion would be detected in TP53/PIK3CA mutant endometrium at later stages of the disease. Notably, these genetic mechanisms are difficult to explore in cell culture models, as the majority of immortalized cell lines have impaired p53 signaling, whether through inherent mutation (Berglind et al.
2008), selection during culture (Harvey and Levine 1991), or immortalization techniques such as SV40 large T antigen (Ali and DeCaprio 2001). In fact, half of the 30 endometrial cancer cell lines profiled by CCLE are TP53/ARID1A co-mutant (Ghandi et al. 2019). This further emphasizes the utility of in vivo characterization of TP53-ARID1A functional and genetic relationships.

4.5 Methods

4.5.1 Mice and animal husbandry

All mice were maintained on an outbred genetic background using CD-1 mice (Charles River). (Gt)R26Pik3ca*H1047R, Trp53fl, and LtfCre (Tg(Ltf-iCre)14Mmul) alleles were purchased from The Jackson Laboratory and confirmed by PCR using published methods (Marino et al. 2000; Adams et al. 2011; Daikoku et al. 2014). Age-matched, Cre-negative (LtfCre0/0) littermates were used as controls. Endpoints were vaginal bleeding, severe abdominal distension, and signs of severe illness, such as dehydration, hunching, jaundice, ruffled fur, signs of infection, or non-responsiveness. Sample sizes within each genotype were chosen based on the proportions of animals with vaginal bleeding in each experimental group or on a Kaplan-Meier log-rank test for survival differences. Mice were housed at the Van Andel Research Institute (VARI) Animal Facility and the Michigan State University Grand Rapids Research Center in accordance with protocols approved by the Michigan State University Institutional Animal Care and Use Committee (IACUC).

4.5.2 Histology and immunohistochemistry

For indirect immunohistochemistry (IHC), 10% neutral-buffered formalin (NBF)-fixed paraffin sections were processed for heat-based antigen unmasking in 10 mM sodium citrate [pH 6.0], except for ATF3, which used 10 mM Tris-HCl, 1 mM EDTA [pH 9.0].
Sections were incubated with antibodies at the following dilutions: 1:200 ARID1A (D2A8U) (12354, Cell Signaling); 1:400 Phospho-S6 (4585, Cell Signaling); 1:100 KRT8 (TROMA1, DSHB); 1:200 Cleaved Caspase-3 (9579, Cell Signaling); 1:400 Ki67 (12202, Cell Signaling); 1:200 ATF3 (GTX37776, GeneTex); 1:200 TP63 (N2C1, GeneTex); 1:200 TP63 (13109, Cell Signaling); 1:100 COL17A1 (ab184996, Abcam). The following biotin-conjugated secondary antibodies were used: donkey anti-rabbit IgG (711-065-152, Jackson ImmunoResearch) and donkey anti-rat IgG (705-065-153, Jackson ImmunoResearch). Secondary antibodies were detected using the VECTASTAIN Elite ABC HRP Kit (Vector Labs). Sections for IHC were lightly counterstained with Hematoxylin QS or Methyl Green (Vector Labs). Routine hematoxylin and eosin (H&E) staining was performed by the VARI Histology and Pathology Core. Adjacent sections were used for H&E and IHC marker comparisons, as in Fig. 4.8. At least four animals per genotype were assayed for each histological analysis and immunohistochemical marker.

4.5.3 Endometrial epithelial cell isolation and RNA-seq

Uteri from approximately 76-day-old mice were surgically removed, digested, enriched for EPCAM-positive epithelial cells by magnetic sorting, and purified for RNA as described in Chapter 2 (Wilson et al. 2019). Mouse libraries (n = 3 biological replicates) were prepared and sequenced by the Van Andel Institute Genomics Core from 100 ng of total RNA from isolated mouse cells, as previously described in Chapter 2. For analysis, briefly, raw reads were trimmed and aligned to the mm10 assembly with GENCODE (vM16) annotation via STAR (Dobin et al. 2013). Genes with fewer than one count per sample on average were filtered out prior to count normalization and differential gene expression (DGE) analysis by DESeq2 (Love, Huber, and Anders 2014), using a single-term model matrix.
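The expression filter described above keeps a gene only if its average count across samples is at least one. A minimal Python sketch of that rule follows, assuming counts are held in a plain dictionary of gene identifiers to per-sample counts; this structure and the function name are hypothetical, since the study applied the filter in R before DESeq2.

```python
def filter_low_expression(counts, min_mean=1.0):
    """Keep genes whose mean count across samples is at least `min_mean`.

    `counts` maps a gene identifier to a list of per-sample counts.
    Genes averaging fewer than `min_mean` counts per sample are dropped,
    mirroring the "fewer than one count per sample on average" rule.
    Genes with no samples at all are also dropped.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if values and sum(values) / len(values) >= min_mean
    }
```

Filtering on the mean rather than on every sample retains genes that are strongly expressed in only one condition, which matters when comparing mutant and control groups.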
Differential expression p-values were adjusted for multiple testing by independent hypothesis weighting (IHW) (Ignatiadis et al. 2016) for downstream analyses. Comparisons between TP53/PIK3CA mutant and ARID1A/PIK3CA mutant mouse DGE results and gene sets were first restricted to genes with transcripts detected in both cell populations.

4.5.4 Cleavage Under Targets and Release Using Nuclease (CUT&RUN)

EPCAM-positive endometrial epithelial cells were enriched from healthy, adult wild-type mouse uteri as described above, and 100,000 of the resulting cells were used for each CUT&RUN reaction as previously described (Skene, Henikoff, and Henikoff 2018). Briefly, Concanavalin A magnetic beads (Bangs) were washed in Binding Buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2) and incubated with either anti-ARID1A (n = 2, D2A8U, Cell Signaling) or Rabbit IgG (n = 2, 2729, Cell Signaling). Purified cells were washed in Wash Buffer (20 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 5 mM sodium butyrate) and then added to the antibody-conjugated bead slurry. After 10 minutes of incubation with nutation at ambient temperature, Antibody Buffer (20 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 5 mM sodium butyrate, 2 mM EDTA, 0.05% digitonin) was added to the cell-bead mixtures, and nuclear permeabilization was confirmed with Trypan blue dye. Reactions were then incubated overnight at 4 °C. Reactions were washed with Digitonin Buffer (20 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 5 mM sodium butyrate, 0.05% digitonin) and incubated with pAG-MNase (CUTANA, EpiCypher) for 1 hour at ambient temperature, followed by an additional wash in Digitonin Buffer and then in Low-Salt Rinse Buffer (20 mM HEPES-NaOH pH 7.5, 0.5 mM spermidine, 5 mM sodium butyrate).
Calcium Incubation Buffer (3.5 mM HEPES-NaOH pH 7.5, 10 mM CaCl2, 0.05% digitonin) was then added to activate the pAG-MNase enzyme, and the reaction was quenched after 3 minutes using EGTA-STOP buffer (170 mM NaCl, 20 mM EGTA, 0.05% digitonin, 20 µg/mL RNase A, 20 µg/mL glycogen, 0.8 pg/mL Saccharomyces cerevisiae nucleosomal DNA as spike-in). Fragments were then released into solution at 37 °C, followed by centrifugation for 5 minutes at 16,000 x g. Eluted DNA was purified with the NucleoSpin Gel and PCR Clean-up Kit (Takara).

4.5.5 Construction and sequencing of CUT&RUN libraries

Libraries for CUT&RUN samples were prepared by the Van Andel Genomics Core from 0.1-0.3 ng of IP material using the KAPA Hyper Prep Kit (v8.2) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to IDT for Illumina Unique Dual Index adapters (IDT, Coralville, IA, USA) at a concentration of 500 nM. Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies, Inc.), QuantiFluor dsDNA System (Promega Corp.), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp paired-end sequencing was performed on an Illumina NovaSeq6000 sequencer using a 100-cycle sequencing kit (Illumina Inc.). Each library was sequenced to an average depth of 50 million reads. Base calling was done by Illumina RTA3, and output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

4.5.6 CUT&RUN bioinformatic analysis

Raw paired-end reads for anti-ARID1A or IgG CUT&RUN were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality analysis via FastQC (Andrews 2010). Trimmed reads were aligned to the mm10 genome assembly with bowtie2 (Langmead and Salzberg 2012) using the flag `--very-sensitive` and filtered to retain only properly paired reads with samtools (Li et al. 2009) using the flag `-f 3`.
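The samtools `-f 3` flag keeps only alignments whose SAM FLAG field has both bit 0x1 (template has multiple segments, i.e., the read is paired) and bit 0x2 (each segment properly aligned according to the aligner) set. The sketch below reproduces that bitwise test in Python for illustration; the helper names are hypothetical, and this is not how samtools itself is implemented.

```python
PAIRED = 0x1       # template has multiple segments (paired-end read)
PROPER_PAIR = 0x2  # each segment properly aligned according to the aligner
REQUIRED = PAIRED | PROPER_PAIR  # == 3, the value passed to `samtools view -f`


def passes_required_flags(flag: int, required: int = REQUIRED) -> bool:
    """Return True if every bit in `required` is also set in the SAM FLAG.

    This mirrors the semantics of `samtools view -f INT`, which keeps
    alignments that have all of the given FLAG bits set.
    """
    return (flag & required) == required


def keep_properly_paired(alignments):
    """Filter (read_name, flag) tuples down to properly paired reads."""
    return [(name, flag) for name, flag in alignments if passes_required_flags(flag)]
```

For example, FLAG values 99 and 147 (typical mates of a proper pair) pass, while 97 (paired but not a proper pair) and 4 (unmapped) are removed.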
Picard MarkDuplicates (http://broadinstitute.github.io/picard/) was used to remove PCR duplicates. For each biological replicate, MACS2 (Zhang et al. 2008) was used to call ARID1A broad peaks against the respective IgG negative control as input, using an FDR < 0.25 threshold. The resulting peaks were filtered against the ENCODE blacklist (Amemiya, Kundaje, and Boyle 2019), and peaks on non-standard contigs were removed. A naïve replicate-overlapping peak set was constructed by calling peaks on pooled replicates, followed by bedtools intersect (Quinlan and Hall 2010) to select peaks overlapping each biological replicate by at least 50%. HOMER (Heinz et al. 2010) was used to annotate peaks, test genomic enrichment, compute motif enrichment, re-center peaks on motifs, and quantify signal profiles across regions of interest. csaw (Lun and Smyth 2016) was used to count sequencing reads in genomic loci of interest. IGV (Robinson et al. 2011) was used to visualize CUT&RUN and ATAC signal as MACS2 enrichment log-likelihood (logLR) at mm10 genomic loci.

4.5.7 Clinical and public cancer data analysis

The Cancer Genome Atlas (TCGA) Pan-Cancer Atlas (Hoadley et al. 2018) and UCEC cohort-specific data (Cancer Genome Atlas Research Network et al. 2013) were used in this study. TCGA Pan-Cancer Atlas somatic mutation data were extracted from the MC3 Public MAF (v0.2.8) analysis data set (Ellrott et al. 2018). Clinical data and ARID1A and TP53 alteration incidence rates specifically in endometrial cancer were extracted from the TCGA Pan-Can UCEC cohort (n = 509) retrieved from cBioPortal (Cerami et al. 2012; Gao et al. 2013). MSK-IMPACT 2017 data (Zehir et al. 2017) were also retrieved from cBioPortal. Cancer dependency map CRISPR knockout screen data from Achilles (DepMap Public 21Q1) were retrieved from the DepMap portal (Meyers et al. 2017; Dempster et al. 2019; DepMap 2020). PARADIGM pathway activity inference (Vaske et al.
2010) data for the TCGA Pan-Cancer Atlas were retrieved from the NCI Genomic Data Commons (Grossman et al. 2016). For TCGA-UCEC-specific molecular analyses, data were retrieved from the 28 January 2016 release of Broad GDAC Firehose (https://doi.org/10.7908/C11G0KM9). RNASeqV2 RSEM (Li and Dewey 2011) normalized gene counts were further quantile normalized prior to filtering lowly expressed genes (fewer than one normalized count per sample on average) and fitting linear models via limma (Ritchie et al. 2015) for differential expression analysis in subsets of patients, using a single-term model matrix. Empirical Bayes moderated statistics were computed via limma::eBayes with arguments `trend = TRUE` and `robust = TRUE`, and p-values were adjusted for multiple testing by Benjamini-Hochberg FDR correction (Benjamini et al. 2001). Only frameshift, nonsense, or splice site mutations were considered functional ARID1A mutations in UCEC-specific molecular analyses, e.g., GSEA and DGE comparisons. All non-synonymous or non-silent variants were considered mutations for TP53, PIK3CA, POLE, and all pan-cancer molecular analyses. “Wild-type” samples could include synonymous or silent mutations.

4.5.8 Gene set enrichment analysis

For the MSigDB Hallmark pathway (Liberzon et al. 2015) and GO Biological Process (Ashburner et al. 2000; The Gene Ontology Consortium 2019) gene set collections (v6.2), Broad GSEA (Subramanian et al. 2005) was performed via GenePattern (Reich, Liefeld et al. 2006) on ortholog-converted DESeq2 normalized counts from experimental mouse data and on RNASeqV2 RSEM normalized counts from TCGA-UCEC data. Hypergeometric enrichment tests, computed manually or with clusterProfiler (Yu et al. 2012) enrichment functions, were performed on gene sets of interest relative to the respective expressed gene universes. Significantly over-represented terms among enriched GO Biological Process gene sets were identified with a hypergeometric enrichment test framework.
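The Benjamini-Hochberg adjustment applied to the TCGA-UCEC limma p-values can be written in a few lines. The following Python sketch is an illustration intended to match the step-up procedure of R's `p.adjust(p, method = "BH")`; the function name is hypothetical and this is not the code used in this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value p_(i) is scaled by m / i, and a running minimum
    taken from the largest p-value downward keeps the adjusted values
    monotone; starting that minimum at 1.0 caps every value at 1.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```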
For the GO term analysis, universes of gene set terms were constructed from single words, word doublets, and word triplets within the 3653 measured gene sets. The number of gene sets containing a given term was then computed for the list of enriched gene sets as well as for the respective gene set universe. A hypergeometric test was then used to determine whether each term is over-represented among enriched gene sets relative to the gene set universe, and this framework was applied to all single-word, word-doublet, and word-triplet terms. For simplified visualization in Fig. 4.4d, results were manually curated to omit some duplicate terms, such as “response to virus” and “to virus”.

4.5.9 Bioinformatics and statistics

biomaRt was used for all gene nomenclature and mouse-human ortholog conversions (Smedley et al. 2009). The cumulative hypergeometric distribution was calculated in R for enrichment tests. Hierarchical clustering by Euclidean distance and heatmaps were generated with ComplexHeatmap (Gu, Eils, and Schlesner 2016). Mouse mm10 genome sequence conservation across vertebrates, computed via PHAST (Pollard et al. 2010; Siepel et al. 2005), was extracted from the UCSC browser. ggplot2 was used for some plots in this study (Wickham 2016). The statistical language R was used for various computing functions throughout this study (R Core Team 2018).

4.6 Data availability

Sequencing data generated in this study will be made publicly available upon publication. Chapter 2 RNA-seq data from isolated endometrial epithelial cells of control (n = 3) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (ARID1A/PIK3CA mutant, n = 4) mice were retrieved and reanalyzed from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) at SuperSeries accession GSE121198.

4.7 Acknowledgments

We thank the Van Andel Research Institute Genomics Core for providing sequencing facilities and services. We thank the Van Andel Research Institute Histology and Pathology Core for histology services.
CHAPTER 5 SWI/SNF INACTIVATION IN THE ENDOMETRIAL EPITHELIUM LEADS TO LOSS OF EPITHELIAL INTEGRITY

A modified version of this chapter was previously published (Reske et al. 2020): Jake J. Reske*, Mike R. Wilson*, Jeanne Holladay, Marc Wegener, Marie Adams, and Ronald L. Chandler. 2020. SWI/SNF inactivation in the endometrial epithelium leads to loss of epithelial integrity. Human Molecular Genetics 29(20): 3412-3430. *These authors contributed equally.

5.1 Abstract

Although ARID1A mutations are a hallmark feature, mutations in other SWI/SNF chromatin remodeling subunits are also observed in endometrial neoplasms. Here, we interrogated the roles of Brahma/SWI2-related gene 1 (BRG1, SMARCA4), a catalytic subunit of SWI/SNF, in the endometrial epithelium. BRG1 loss affects approximately one-third of all active genes and highly overlaps with the ARID1A gene regulatory network. Chromatin immunoprecipitation studies revealed widespread subunit-specific differences in transcriptional regulation, as BRG1 promoter interactions are generally associated with gene activation, while ARID1A binding is generally associated with gene repression. However, we identified a physiologically relevant subset of epithelial identity genes co-activated by BRG1 and ARID1A. Mice were genetically engineered to inactivate BRG1 specifically in the endometrial epithelium. Endometrial glands were observed embedded in uterine myometrium, indicating adenomyosis-like phenotypes. Molecular similarities were observed between BRG1 and ARID1A mutant endometrial cells in vivo, including loss of epithelial cell adhesion and junction genes. Collectively, these studies illustrate overlapping contributions of multiple SWI/SNF subunit mutations to the translocation of endometrium to distal sites, with loss of cell integrity being a common feature of SWI/SNF mutant endometrial epithelia.
5.2 Introduction

In Chapters 2 and 4, we characterized ARID1A mutant endometrial pathologies in vivo in genetically engineered mice and human tumors, but the roles of other mammalian SWI/SNF subunits in this cell type have not been well studied. BRG1 (Brahma/SWI2-related gene 1, SMARCA4) serves as a core catalytic subunit of the mammalian SWI/SNF complex, and the ARID1A subunit is often found within BRG1-containing SWI/SNF complexes (Clapier and Cairns 2009; Roberts and Orkin 2004). SMARCA4 mutations are observed in many cancer types and linked to some developmental disorders (Alfert, Moreno, and Kerl 2019; Hodges, Kirkland, and Crabtree 2016). Both oncogenic and tumor-suppressive roles of BRG1 have been reported in various tumor contexts (Wong et al. 2000; Bai et al. 2013; Roy et al. 2015; Muthuswami et al. 2019). Most SMARCA4 mutations in cancer affect the catalytic ATPase domain, leading to defects in DNA translocase activity (Hodges et al. 2018). In particular, SMARCA4 mutations are nearly ubiquitous in small cell carcinoma of the ovary, hypercalcemic type (Conlon et al. 2016). A recent case study reported a patient harboring a deleterious germline SMARCA4 mutation with epithelial ovarian cancer and a history of adenomyosis (Muppala et al. 2017). BRG1 silencing has also been observed across many cancer types (Marquez-Vilendrer, Thompson, et al. 2016). Moreover, loss of BRG1 expression by immunohistochemistry was observed in one-third of undifferentiated endometrial carcinomas screened in a recent report (Ramalingam, Croce, and McCluggage 2017), and BRG1 mutations have also been observed in this disease (Kolin et al. 2020). In this study, we aimed to explore the phenotypic and transcriptional effects of BRG1 mutations in the endometrial epithelium and to understand their relationship to ARID1A loss.
5.3 Results

5.3.1 BRG1 subunit loss in endometrial epithelial cells results in widespread transcriptional changes

Few reports have specifically focused on SMARCA4 mutations in the endometrial epithelium. We analyzed The Cancer Genome Atlas (TCGA) Uterine Corpus Endometrial Carcinoma (UCEC) data set and observed frameshift and nonsense SMARCA4 mutations annotated as ‘likely oncogenic’ by OncoKB (Table D.1) (Cancer Genome Atlas Research Network et al. 2013; Hoadley et al. 2018; Chakravarty et al. 2017). Some of these tumors also had a relatively low background mutation rate (<1000 total mutations), suggesting that deleterious SMARCA4 mutations may be important events in disease pathogenesis. To explore the consequences of BRG1 loss in the endometrial epithelium, we used BRG1 knockdown approaches in the immortalized 12Z human endometrial epithelial cell line (Zeitvogel, Baumann, and Starzinski-Powitz 2001). BRG1 knockdown was achieved by siRNA (siBRG1) transfection and confirmed by western blot relative to a non-targeting siRNA (control) (Fig. 5.1a-b). Knockdown of BRG1 did not significantly change 12Z cell growth (Fig. 5.1c), suggesting that cellular growth is not affected by BRG1 depletion over the assayed time frame. RNA-seq was performed following BRG1 knockdown (siBRG1), and the data were compared to our ARID1A knockdown (siARID1A) RNA-seq data set from 12Z cells generated in Chapter 2. BRG1 loss resulted in differential expression (DE) of 7708 genes compared to controls [false discovery rate (FDR) < 0.0001], and more genes were affected by BRG1 loss than by ARID1A loss (Fig. 5.1d-e). In addition to affecting more genes, BRG1 loss resulted in a greater magnitude of gene expression change than ARID1A loss, with 11.5% of siBRG1 DE genes displaying a >4-fold linear change in expression (|log2FC| > 2) compared to <1% of siARID1A DE genes (Fig. 5.1f).
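A linear fold change of more than 4 in either direction corresponds to an absolute log2 fold change greater than log2(4) = 2, so both 4-fold upregulation and 4-fold downregulation count. The short Python sketch below makes that threshold explicit when computing the fraction of DE genes beyond it; the function name and input values are hypothetical and do not come from the study's data.

```python
import math


def fraction_beyond_fold(log2fc_values, linear_fold=4.0):
    """Fraction of genes whose absolute linear fold change exceeds `linear_fold`.

    A linear fold change of 4 corresponds to |log2FC| > log2(4) = 2,
    which covers both upregulation (4x) and downregulation (1/4x).
    """
    if not log2fc_values:
        return 0.0
    threshold = math.log2(linear_fold)
    n_beyond = sum(1 for v in log2fc_values if abs(v) > threshold)
    return n_beyond / len(log2fc_values)
```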
Overall, siBRG1 DE genes had a significantly greater absolute fold change relative to control cells than siARID1A DE genes (two-tailed, unpaired Wilcoxon test, p < 10⁻¹⁵) (Fig. 5.1g). A total of 1492 DE genes were affected by both BRG1 and ARID1A loss (hypergeometric enrichment, p < 10⁻²⁸³) (Fig. 5.1h), indicating that the ARID1A and BRG1 subunits regulate shared transcriptional programs. We further examined the intersecting gene regulatory networks mediated by ARID1A and BRG1 in 12Z cells. DE genes identified in siBRG1-treated cells were more likely to be downregulated if they were also affected by ARID1A loss (two-tailed Fisher’s exact test, p < 10⁻⁵, odds ratio (OR) = 1.33) (Fig. 5.1i). Despite these differences, 60.9% of shared siBRG1 and siARID1A DE genes were affected in the same direction following BRG1 loss and ARID1A loss (Fig. 5.1j). Notably, among both upregulated and downregulated shared genes, BRG1 loss resulted in a greater magnitude of transcriptional change than ARID1A loss (Fig. 5.1k). We further compared the direction of gene expression change in both shared and unique DE gene classes (Fig. 5.1l). Enrichment for MSigDB Hallmark pathways (Liberzon et al. 2015) was then calculated among the classes of gene regulation orchestrated by ARID1A and BRG1, further separated by directionality (Fig. 5.1m). Certain processes exhibited distinct patterns of transcriptional regulation by SWI/SNF subunits: (1) pathways repressed by both BRG1 and ARID1A, such as interferon response and fatty acid metabolism; (2) pathways repressed by ARID1A but activated by BRG1, such as the unfolded protein response; and (3) pathways repressed by ARID1A but subject to both repressive and activating regulation by BRG1, such as epithelial-to-mesenchymal transition (EMT) (Figs. 5.1m and D.1).
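For readers unfamiliar with the test behind Fig. 5.1i, a two-sided Fisher's exact test on a 2 x 2 table (for example, downregulated versus upregulated siBRG1 DE genes, split by whether they are also siARID1A DE genes) can be sketched in standard-library Python as follows. This is an illustration under stated assumptions, not the study's code: it reports the sample cross-product odds ratio, whereas R's `fisher.test` reports a conditional maximum-likelihood estimate, so the two odds ratios can differ slightly. The function name and the table counts in the test are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (odds_ratio, p_value). The p-value sums the hypergeometric
    probabilities of every table with the same row and column totals that
    is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = table_prob(a)
    # A small relative tolerance treats numerically tied tables as equally likely.
    p_value = sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)
    )

    # Sample (cross-product) odds ratio; undefined when both products are zero.
    if b * c == 0:
        odds_ratio = float("nan") if a * d == 0 else float("inf")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)
```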
Figure 5.1 BRG1 and ARID1A loss in endometrial epithelial cells lead to widespread differences in transcriptional regulation. a, Western blot for BRG1 and β-actin following siRNA treatment with non-targeting control or siBRG1. b, Relative densitometry quantification of BRG1 protein levels on western blots from independent experiments (n = 2) as in a. Statistic is two-tailed, unpaired t-test. c, Cell growth assay results showing no significant difference in growth between control and siBRG1 cells from 48 to 72 hours following siRNA treatment. d, Transcriptomic visualization of DGE in BRG1 knockdown (siBRG1)-treated 12Z cells (n = 3 replicates; 23,148 expressed genes). The x-axis displays regularized-logarithm (rlog) gene expression in control cells, and the y-axis shows siBRG1-treated cells. Black dots represent stable (n.s.) genes, and red dots represent significant DE genes (FDR < 0.0001, n = 7708 genes). e, Same as in d but for ARID1A knockdown (siARID1A)-treated cells (n = 2206 significant DE genes at the same FDR threshold), based on previously reported data. f, Histogram of expression log2 fold-change (log2FC) values in DE genes from siBRG1- versus siARID1A-treated compared to control 12Z cells. g, Violin plot of DGE magnitude (absolute log2FC values) in siBRG1- versus siARID1A-treated 12Z cells. Statistic is two-tailed, unpaired Wilcoxon test. h, Significant overlap of DE genes with siBRG1 and siARID1A treatment. Statistic is hypergeometric enrichment. i, Frequency of gene upregulation versus downregulation in (top) siBRG1 DE genes that are or are not also affected by siARID1A, compared to (bottom) siARID1A DE genes that are or are not also affected by siBRG1. Statistic is two-tailed Fisher’s exact test. j, Heatmap of expression log2FC values in DE genes shared between BRG1 loss and ARID1A loss. Red values indicate gene upregulation and blue values indicate downregulation.
60.9% of shared DE genes change in the same direction upon loss of either subunit. k, Violin plots displaying directional DGE magnitude for shared upregulated genes (left) or shared downregulated genes (right). Statistic is two-tailed, unpaired Wilcoxon test. l, Classification of siBRG1/siARID1A shared DE genes into four groups based on directionality. The y-axis of bar plots displays the number of genes in each class. m, Heatmap of MSigDB Hallmark pathway enrichment, represented as observed/expected ratios relative to the expressed gene universe, for different DGE classes based on shared versus unique SWI/SNF subunit regulation and directionality. Red to black values indicate pathway over-representation among the gene set of interest. White cells represent pathways with less than 2-fold over-representation. * p < 0.05, ** p < 0.01, *** p < 0.001.

5.3.2 Promoter chromatin interaction patterns underlie transcriptional contributions of SWI/SNF subunits

We next used ChIP followed by sequencing (ChIP-seq) to discern how ARID1A and BRG1 directly regulate transcription in endometrial epithelial cells. In wild-type 12Z cells, we profiled genome-wide BRG1 binding by ChIP-seq and identified 6343 significant binding peaks (Fig. 5.2a). Intergenic and intronic regions comprised the majority of BRG1 binding loci (collectively 91.1% of peaks), while gene promoter regions, defined as within 3 kb of a gene transcription start site (TSS), comprised the next largest binding annotation class (7.3% of peaks) (Fig. 5.2a). In Chapter 2, we identified a role for ARID1A in regulating gene expression via promoter interactions, so we first focused on this element of transcriptional regulation. We identified 270 expressed genes displaying significant BRG1 promoter binding, and Gene Ontology (GO) Biological Process (BP) gene set enrichment (Ashburner et al.
2000; The Gene Ontology Consortium 2019) showed that these genes were involved in relevant epithelial processes such as locomotion, cell motility and biological adhesion (Fig. 5.2b). Further, BRG1 promoter binding status was significantly associated with gene expression misregulation following BRG1 loss (hypergeometric enrichment, p < 10^-74) (Fig. 5.2c), indicating that BRG1 promoter chromatin interactions regulate transcription, as was demonstrated for the ARID1A subunit in Chapter 2. Moreover, using our ARID1A ChIP-seq data from this cell line, generated in Chapter 2, we found that gene promoter binding by BRG1 strongly overlaps with ARID1A binding (hypergeometric enrichment, p < 10^-218) (Fig. 5.2d).

We next used our BRG1 and ARID1A ChIP-seq data in tandem to better understand how SWI/SNF promoter chromatin interactions influence subunit-specific transcriptional regulation. As a standardized measurement of promoter regulation, BRG1 and ARID1A binding were quantified across the entire 6 kb region flanking each gene TSS (i.e. ±3 kb). Intriguingly, BRG1 promoter binding was significantly stronger at genes downregulated following BRG1 depletion than at upregulated genes (two-tailed, unpaired Wilcoxon test, p < 10^-15) (Fig. 5.2e). ARID1A displayed the opposite pattern: promoter binding was significantly stronger at genes upregulated following ARID1A depletion (two-tailed, unpaired Wilcoxon test, p < 10^-15) (Fig. 5.2e). These results suggest that BRG1 primarily plays direct roles in gene activation in the endometrial epithelium, while ARID1A primarily functions in gene repression in this context. To further explore the relationship between ARID1A and BRG1, we combined the ARID1A and BRG1 ChIP-seq data with our ATAC-seq data from 12Z cells, generated in Chapter 2, and compared them with siARID1A and siBRG1 differential gene expression (DGE).
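The two-tailed, unpaired Wilcoxon tests used above to compare promoter FE between upregulated and downregulated genes can be sketched in a few lines. This is a minimal, dependency-free Python illustration, not the code used for these analyses; the function names are hypothetical, and it assumes the large-sample behavior of R's `wilcox.test` (normal approximation with tie and continuity corrections), with the input lists standing in for per-gene promoter FE values over the ±3 kb window.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided, unpaired Wilcoxon rank-sum test (normal approximation).

    Returns (W, p), where W is the rank sum of x minus n_x(n_x + 1)/2,
    the form of the statistic reported by R's wilcox.test.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return w, 1.0
    # Continuity correction of 0.5 toward the null mean.
    diff = w - mu
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, min(1.0, p)
```

In this setting `x` and `y` would hold the promoter FE values of, for example, siBRG1-downregulated and siBRG1-upregulated genes; with thousands of genes per group, the normal approximation is what R applies by default.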
ARID1A binding, BRG1 binding, and chromatin accessibility were highest at promoters of genes transcriptionally regulated by both subunits (Fig. 5.2f). This suggests that shared ARID1A-BRG1 direct target genes are more likely to be associated with accessible chromatin and transcriptional regulation by SWI/SNF. We further investigated promoter chromatin at the SWI/SNF regulatory classes of shared DE genes (described in Fig. 5.1l), which are affected by loss of either subunit and further separated by directionality. Genes normally activated by BRG1 and ARID1A, corresponding to genes downregulated in both siBRG1- and siARID1A-treated cells, displayed the strongest promoter SWI/SNF binding and chromatin accessibility (Fig. 5.2g). The RNA-seq data further indicate that these genes are more strongly downregulated with BRG1 loss than other siBRG1 DE genes (Fig. 5.2h). Overall, these analyses suggest that both BRG1 and ARID1A regulate gene expression through promoter chromatin interactions at many shared target genes, including a highly regulated class of genes normally activated by SWI/SNF.

Figure 5.2 BRG1 and ARID1A activate highly regulated gene promoters

a, Genomic annotation of significant genome-wide BRG1 binding sites in 12Z cells as measured by ChIP-seq (6343 broad peaks overlapping in n = 2 replicate IPs). Gene promoter peaks are defined as located within 3 kb of a TSS. b, Significant (FDR < 0.05) GO BP enrichment among 270 BRG1 promoter-bound and expressed genes. c, Significant association between BRG1 promoter binding and DGE following BRG1 loss. Statistic is hypergeometric enrichment. d, Significant overlap of BRG1 and ARID1A promoter binding at expressed genes in 12Z cells. Statistic is hypergeometric enrichment. e, Violin plots displaying gene promoter ChIP fold enrichment (FE) by SWI/SNF subunits at upregulated versus downregulated genes following depletion of the respective subunit.
Left is BRG1 regulation, and right is ARID1A regulation. ChIP FE is quantified as IP/input chromatin signal across the entire ±3 kb promoter region, where FE = 1 indicates no enrichment. Statistic is two-tailed, unpaired Wilcoxon test. f, Measurement of (left) BRG1 binding, (center) ARID1A binding and (right) chromatin accessibility (ATAC) at gene promoters classified by DGE following SWI/SNF subunit depletion. Promoter ATAC signal is quantified as fragments per kilobase per million mapped reads (FPKM). Statistic is two-tailed, unpaired Wilcoxon test. g, Promoter SWI/SNF binding and chromatin accessibility as in f, but at the shared DE functional gene classes described in Fig. 5.1l. Statistic is two-tailed, unpaired Wilcoxon test. h, DGE classes following (left) BRG1 loss or (right) ARID1A loss, further broken down by co-regulation status by the other subunit. Statistic is two-tailed, unpaired Wilcoxon test. * p < 0.05, ** p < 0.01, *** p < 0.001. UTR, untranslated region; TSS, transcription start site; TTS, transcription termination site.

5.3.3 BRG1 and ARID1A promote transcription at epithelial identity genes

The observation that ARID1A and BRG1 activate a similar set of highly regulated genes warranted further investigation into overlapping SWI/SNF complex roles in the endometrial epithelium. We narrowed our findings down to 35 genes that are directly activated by both BRG1 and ARID1A, defined as displaying ChIP promoter binding and downregulation following loss of either subunit (Fig. 5.3a). MSigDB gene set enrichment analysis (GSEA) of all 35 genes identified 17 significant GO BP gene sets (FDR < 0.05) (Fig. 5.3b). Enriched gene sets included biological adhesion, cell projection organization, locomotion, taxis and cell-cell adhesion (Fig. 5.3b). Further, the top enriched Gene Transcription Regulation Database (GTRD) (Yevshin et al. 2019) transcription factor (TF) network among these genes was LIM homeobox 9 (LHX9) (Fig. D.2).
SWI/SNF target genes found in these GO BP gene sets were often highly expressed in 12Z cells (Fig. 5.3c). Representative genes showed overlapping BRG1 and ARID1A promoter chromatin binding patterns (Fig. 5.3d). We further investigated the TF motifs enriched specifically at BRG1 and ARID1A co-bound ChIP-seq peaks within SWI/SNF-regulated gene promoters. Among all 332 promoter peaks co-bound by BRG1 and ARID1A, the top motifs included AP-1, TEAD, RUNX and ETS (Fig. 5.3e). At the 34 promoter peaks within direct activating SWI/SNF target genes, AP-1, ETS and TEAD motifs were also significantly enriched (Fig. 5.3f). Altogether, these data suggest that SWI/SNF-activated genes in endometrial epithelial cells are involved in cell-cell interactions and cell adhesion, processes that may be affected upon mutation or loss of expression of these subunits.

Figure 5.3 SWI/SNF directly promotes transcription at epithelial identity genes

a, Identification, by RNA-seq and ChIP-seq, of 35 genes that are directly promoter-bound by BRG1 and ARID1A and downregulated following depletion of either subunit. b, MSigDB GSEA results for GO BP, showing 17 significantly enriched (FDR < 0.05) gene sets among the 35 direct activating SWI/SNF target genes. Hash indicates the number of SWI/SNF target genes found within each GO BP gene set. The gene set constituent matrix shows the membership of each gene in the enriched GO BP gene sets, where black cells indicate that a gene belongs to that gene set. Genes are arranged from left to right based on the number of enriched sets. c, Baseline expression ranking of all 23,148 expressed genes in 12Z cells, with relevant SWI/SNF target genes of interest denoted in red. The y-axis is gene expression, represented by rlog counts as reported by DESeq2. d, Functional evidence of activating SWI/SNF target gene regulation by BRG1 and ARID1A subunits at ALCAM, MYPN, NTN4, RHOBTB3, ENAH, SHC3, NEGR1, ANXA2 and PAWR.
Left, all listed genes are downregulated with both BRG1 loss and ARID1A loss. Statistic is FDR as reported by DESeq2 and IHW. Right, all listed genes display significant overlapping SWI/SNF promoter chromatin binding. The locus snapshot y-axis is the log-likelihood ratio (logLR) of ChIP signal compared to input chromatin, and the small bars under signal traces indicate the presence of a significant (FDR < 0.05) binding peak. e, Top significant results from known motif analysis of all BRG1 promoter ChIP-seq peaks co-bound by ARID1A (n = 332), comprising the 256 genes co-bound by BRG1 and ARID1A (described in a). f, Motif analysis as in e but for the 34 BRG1 promoter peaks co-bound by ARID1A at direct activating SWI/SNF target genes defined in a.

5.3.4 SWI/SNF functions at distal open chromatin regions to regulate epithelial identity genes

While these results suggest that BRG1 indeed regulates transcription through promoter chromatin interactions, the majority of BRG1 binding sites (92.7%) were located distally in our data. As such, we further investigated the roles of BRG1 and SWI/SNF at distal binding sites located at least 3 kb from a gene TSS. Motif analysis revealed a different set of TFs enriched specifically at BRG1 distal binding sites, including STAT, GATA, ZF and SMAD (Fig. 5.4a). Genome-wide, >75% of BRG1 peaks and BRG1-bound genomic bases are co-occupied by ARID1A, a highly significant overlap (hypergeometric enrichment, p = 0, i.e. below numerical precision) (Fig. 5.4b-c). A similar extent of overlap was also observed specifically at distal binding sites (Fig. 5.4d).

We next used ATAC-seq peaks marking open chromatin as an indicator of potential regulatory activity at distal genomic regions bound by SWI/SNF, since chromatin accessibility is strongly associated with TF binding and other forms of chromatin regulation (Klemm, Shipony, and Greenleaf 2019).
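The partition of distal ATAC peaks into four SWI/SNF binding classes can be illustrated with a small interval-overlap sketch. It is hypothetical and simplified: peaks are BED-style half-open `(chrom, start, end)` tuples, a peak is called distal when its midpoint lies at least 3 kb from every annotated TSS (an assumption about how distance is measured), and any base-pair overlap with a ChIP-seq peak counts as binding. By construction each distal peak falls into exactly one class, consistent with the four class sizes reported for 12Z cells summing to the 27,206 distal ATAC peaks.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED

PROMOTER_WINDOW_BP = 3000  # assumption: promoter = peak midpoint within 3 kb of a TSS


def is_distal(peak: Interval, tss_by_chrom: Dict[str, List[int]],
              window: int = PROMOTER_WINDOW_BP) -> bool:
    """True if the peak midpoint is at least `window` bp from every TSS on its chromosome."""
    chrom, start, end = peak
    midpoint = (start + end) // 2
    return all(abs(midpoint - tss) >= window for tss in tss_by_chrom.get(chrom, []))


def overlaps_any(peak: Interval, others: Sequence[Interval]) -> bool:
    """True if the peak shares at least one base with any interval in `others`."""
    chrom, start, end = peak
    return any(c == chrom and s < end and start < e for c, s, e in others)


def classify_distal_peaks(atac: Sequence[Interval], brg1: Sequence[Interval],
                          arid1a: Sequence[Interval],
                          tss_by_chrom: Dict[str, List[int]]) -> Dict[str, int]:
    """Count distal ATAC peaks in each SWI/SNF binding class (hypothetical helper)."""
    counts = {"BRG1 + ARID1A": 0, "ARID1A only": 0, "BRG1 only": 0, "neither": 0}
    for peak in atac:
        if not is_distal(peak, tss_by_chrom):
            continue  # promoter-proximal peaks are excluded
        bound_brg1 = overlaps_any(peak, brg1)
        bound_arid1a = overlaps_any(peak, arid1a)
        if bound_brg1 and bound_arid1a:
            counts["BRG1 + ARID1A"] += 1
        elif bound_arid1a:
            counts["ARID1A only"] += 1
        elif bound_brg1:
            counts["BRG1 only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

The linear scans keep the sketch short; for genome-wide peak sets, an interval tree or a tool such as `bedtools intersect` would be the practical choice.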
Out of 27,206 distal ATAC peaks, 3233 were co-bound by BRG1 and ARID1A, 18,376 were bound by ARID1A without significant BRG1 binding, 61 were bound by BRG1 without ARID1A, and the remaining 5536 were bound by neither SWI/SNF subunit (Fig. 5.4e). Of these four classes of putative active regulatory elements, BRG1 + ARID1A co-bound distal open chromatin sites showed the greatest accessibility in terms of both magnitude and peak width (Fig. 5.4f-g). Further, BRG1 and ARID1A binding were strongest at co-bound sites compared to sites bound by either subunit alone (Fig. 5.4f-g). These results suggest that BRG1 + ARID1A co-bound distal open chromatin sites are likely highly active regulatory elements. Motif analysis was then performed on each of the four classes of distal open chromatin sites, and several patterns emerged (Fig. 5.4h): motifs enriched in BRG1-only sites (blue, e.g. STAT), enriched in all ARID1A-bound sites (green, e.g. TEAD), enriched in sites not bound by BRG1 (gray, e.g. RFX), enriched in sites bound by either subunit individually but not both (purple, e.g. CTCF), and universally enriched but most strongly in ARID1A-bound sites (black, e.g. AP-1). To associate distal open chromatin sites with potential transcriptional regulatory functions, we used the GeneHancer database (Fishilevich et al. 2017) to link distal ATAC sites to specific genes as supported by computational and experimental evidence (Fig. 5.4i). Hallmark pathway and GO BP enrichment analyses were then performed on genes associated with distal open chromatin sites segregated by SWI/SNF binding, and relevant cellular processes were highlighted in the BRG1 + ARID1A and ARID1A-only classes (Fig. 5.4i).
Notably, this analysis revealed that BRG1 and ARID1A co-regulate distal elements associated with genes involved in extracellular structure organization, cell junction organization, cell junction assembly, cell substrate adhesion and cell matrix adhesion (Fig. 5.4i), corroborating the promoter regulation results. Further evidence that SWI/SNF-bound distal open chromatin sites are functional came from segregating genes by their number of associated SWI/SNF-bound distal open chromatin sites. Genes with a greater number of associated sites were more likely to be transcriptionally misregulated following SWI/SNF subunit loss (Fig. 5.4j). For example, of the 3962 genes with exactly one associated SWI/SNF-bound distal open chromatin site, 43% were DE with siBRG1 treatment, whereas 55% of the 381 genes with six or more associated sites were DE with siBRG1 treatment (Fig. 5.4j). Baseline expression levels were also significantly higher at genes with 6+ associated SWI/SNF-bound distal open chromatin sites than at those with exactly 1 (two-tailed, unpaired Wilcoxon test, p < 10^-15) (Fig. 5.4k). Remarkably, associated distal open chromatin sites were more likely to be BRG1 + ARID1A co-bound at genes with 6+ SWI/SNF-bound distal open chromatin sites than at those with exactly 1 (two-tailed Fisher's exact test, OR = 6.16, p < 10^-15) (Fig. 5.4l). Altogether, these results suggest that SWI/SNF regulation of distal elements contributes to the expression of epithelial identity genes in the endometrial epithelium.

Figure 5.4 SWI/SNF activity at distal sites also regulates expression of epithelial identity genes

a, Top significant results from known motif analysis of BRG1 distal ChIP-seq peaks located at least 3 kb from a gene TSS (n = 5983).
b, Proportional Euler diagram displaying overlapping BRG1 and ARID1A genome-wide ChIP-seq peaks. Intersect number is an approximation. c, Significant association of ARID1A binding at BRG1-occupied genomic bases. Statistic is hypergeometric enrichment. d, Proportional Euler diagram as in b, but for only distal ChIP-seq peaks, located at least 3 kb from a gene TSS. e, Proportional Euler diagram displaying distal ATAC-seq peaks overlapping with ARID1A and/or BRG1 binding, or neither. Statistic is hypergeometric enrichment. f, Violin plots quantifying ATAC peak width (bp), accessibility magnitude (ATAC FPKM), BRG1 binding (FE) and ARID1A binding (FE) across distal open chromatin sites segregated by SWI/SNF binding as defined in e. Statistic is two-tailed, unpaired Wilcoxon test. g, Signal heatmap of ATAC, BRG1 binding and ARID1A binding across all distal open chromatin sites segregated by SWI/SNF binding as defined in e. ATAC is quantified as reads per million per bp (RPM/bp), while BRG1 and ARID1A binding are quantified by FE. h, Comprehensive motif analysis and clustering of distal open chromatin sites. Heatmap displays FE for motif presence in regions of interest compared to background genomic regions, for 103 known motifs significant (HOMER -log(p) > 60) in at least one distal open chromatin peak set. Manually curated regulatory patterns of interest were then extracted and given an arbitrary color annotation. Representative motif logos are displayed. i, Hallmark pathway and GO BP enrichment analysis for genes associated with distal open chromatin sites bound by BRG1 + ARID1A (n = 2972 genes), ARID1A only (n = 7622 genes), BRG1 only (n = 139 genes) and neither (n = 3819 genes).
j, Enrichment for siBRG1 (top) or siARID1A (bottom) DGE among genes associated with SWI/SNF-bound distal open chromatin sites via the GeneHancer database, based on the number of associated sites: all genes (n = 23,148), genes with 0 associated sites (n = 14,727), genes with 1 (n = 3962), genes with 2 (n = 2130), genes with 3 (n = 1088), genes with 4 (n = 562), genes with 5 (n = 298), or genes with 6 or more (n = 381). Enrichment statistic is hypergeometric enrichment. Pairwise statistic is two-tailed Fisher's exact test. k, Baseline expression as quantified by rlog counts for genes as described in j. Statistic is two-tailed, unpaired Wilcoxon test. l, Distribution of SWI/SNF binding classification for associated sites of genes with exactly 1 (n = 3962 genes) versus 6+ (n = 381 genes) SWI/SNF-bound distal open chromatin sites. Statistic is two-tailed Fisher's exact test. * p < 0.05, ** p < 0.01, *** p < 0.001.

5.3.5 BRG1 loss in the mouse endometrial epithelium promotes gland translocation to the uterine myometrium

To determine the consequences of BRG1 loss in vivo, we developed a GEMM of BRG1 loss specifically in the endometrial epithelium. We crossed LtfCre (Tg(Ltf-iCre)14Mmul) mice (Daikoku et al. 2014) with mice carrying the Brg1fl allele (Gebuhr et al. 2003) (Fig. 5.5a). LtfCre becomes active when mice reach sexual maturity (Daikoku et al. 2014). Inheritance of the Brg1fl allele was confirmed by polymerase chain reaction (PCR) (Fig. 5.5b). Heterozygous (LtfCre0/+; Brg1fl/+) or homozygous (LtfCre0/+; Brg1fl/fl) loss of BRG1 in the endometrial epithelium resulted in no macroscopic disease burden or deaths, and mice were aged to a maximum of 48 weeks (n = 9 and 10, respectively). Gross inspection of LtfCre0/+; Brg1fl/+ or LtfCre0/+; Brg1fl/fl uteri revealed no obvious phenotypic consequences of BRG1 loss (Fig. 5.5c). Upon histological analysis, no abnormal features were observed in the uterus of LtfCre0/+; Brg1fl/+ mice (n = 3; Fig. 5.5d).
However, endometrial glands were observed within the myometrium of all inspected LtfCre0/+; Brg1fl/fl mice (n = 4), indicating adenomyosis-like phenotypes (Fig. 5.5e and D.3). By immunohistochemistry, BRG1 expression in the glandular and luminal epithelium of LtfCre0/+; Brg1fl/fl mice was variable: some glands and epithelial cells retained BRG1 expression, while others displayed loss of BRG1 expression (Fig. 5.5e and D.3). Remarkably, endometrial glands found within the myometrium displayed uniform loss of BRG1 expression (Fig. 5.5e and D.3). The observation that only BRG1-null glands were found in the myometrium, whereas the endometrium contained both BRG1-positive and BRG1-null glands, suggests that BRG1 loss is required for gland translocation of LtfCre+ endometrial epithelial cells. Glands in both the endometrium and myometrium expressed cytokeratin 8 (KRT8), a marker of endometrial epithelia, confirming their cellular identity (Fig. 5.5e and D.3). Since adenomyosis is further characterized as a progesterone-resistant condition (Tan, Yong, and Bedaiwy 2019) and often involves silencing of the progesterone receptor (PGR) (Inoue et al. 2019), we additionally examined PGR expression. In LtfCre0/+; Brg1fl/fl mice, myometrium-localized endometrial glands displayed PGR loss (Fig. 5.5e and D.3). These results suggest that BRG1 normally prevents myometrial invasion of endometrial epithelia and that its genetic disruption promotes invasive pathology and adenomyosis-like phenotypes.

Figure 5.5 Genetically engineered mice harboring BRG1 loss in the endometrial epithelium develop adenomyosis-like phenotypes

a, Diagram of the Brg1fl allele. b, PCR verification of Brg1fl allele presence in wild-type, heterozygous and homozygous mice. c, Gross images of LtfCre0/+; Brg1fl/+ and LtfCre0/+; Brg1fl/fl mice. White arrowheads denote the uterus without visible, gross abnormalities.
d-e, Representative hematoxylin and eosin (H&E) staining and IHC for BRG1, KRT8, and PGR (only in homozygotes) in the uterus of (d) LtfCre0/+; Brg1fl/+ or (e) LtfCre0/+; Brg1fl/fl mice at 5X (scale bar = 500 μm), 10X (scale bar = 300 μm) or 20X (scale bar = 200 μm) magnification. Endometrial and myometrial images are magnified views of the regions outlined by black boxes in the whole-uterus sections. BRG1 expression is lost in the endometrial epithelium of LtfCre0/+; Brg1fl/fl mice. KRT8 is a marker of endometrial epithelium. Arrowheads indicate endometrial epithelium; high-magnification images of these glands are provided in Fig. D.3.

5.3.6 Transcriptomic analysis of BRG1-null endometrial epithelium reveals actin cytoskeletal and anchoring junction defects

To characterize the molecular programs driving the BRG1 mutant endometrial epithelial phenotype, we performed RNA-seq on endometrial epithelial cells sorted from 120-day-old (60 days post-Cre induction) LtfCre0/+; Brg1fl/fl mice using the EPCAM-positive magnetic enrichment method we developed in Chapter 2 (Wilson et al. 2019). EPCAM-positive endometrial epithelial cells were enriched from LtfCre0/+; Brg1fl/fl mice at 82% purity on average (Fig. D.4). Compared to endometrial epithelial cells from control mice, cells from LtfCre0/+; Brg1fl/fl mice displayed widespread transcriptomic misregulation, with 3145 genes significantly DE (Fig. 5.6a). As validation, BRG1 (Smarca4) expression was uniformly depleted (FDR < 10^-10) among sorted mutant cells (Fig. 5.6b). In mutant cells, similar numbers of DE genes with increased (51.6%) and decreased (48.4%) expression were observed (Fig. 5.6c). Gene expression changes in LtfCre0/+; Brg1fl/fl cells significantly overlapped with those in siBRG1-treated 12Z cells (hypergeometric enrichment, p < 10^-89) (Fig. 5.6d). Pathway analyses revealed that upregulated genes were enriched for Hallmark pathways including G2M checkpoint and E2F targets (Fig.
5.6e), while downregulated genes were enriched for TNFα signaling, p53 pathway and hypoxia response (Fig. 5.6f). Among GO BP gene sets, many downregulated pathways were related to cellular invasion, including anchoring junction, actin filament-based process, cell substrate junction and actin cytoskeleton (Fig. 5.6f). These results suggest that BRG1 normally maintains non-invasive, cell-adherent phenotypes in the endometrial epithelium in vivo by promoting transcription of cell junction and actin cytoskeletal programs, similar to observations in 12Z cells.

Figure 5.6 LtfCre0/+; Brg1fl/fl endometrial epithelial cells display loss of actin cytoskeletal and cellular junction programs

a, Volcano plot of RNA-seq differential gene expression in sorted EPCAM-positive endometrial epithelial cells purified from LtfCre0/+; Brg1fl/fl (n = 4) versus control (n = 3) mice. The x-axis is log2 fold-change (log2FC) of mutant versus control, and the y-axis is significance as −log10(FDR). 3145 genes were identified as significant at an FDR < 0.05 threshold. b, Relative Smarca4 (BRG1) linear expression in control and LtfCre0/+; Brg1fl/fl cells. Significance is FDR value reported by DESeq2 and IHW. c, Relative expression heatmap of 3145 significant DE genes. Heatmap displays expression as scaled (Z-scored) rlog counts. 51.6% of DE genes are upregulated in LtfCre0/+; Brg1fl/fl cells, and 48.4% of DE genes are downregulated. d, Significant overlap between DE genes in LtfCre0/+; Brg1fl/fl versus control sorted endometrial epithelial cells and siBRG1-treated versus control 12Z cells. Statistic is hypergeometric enrichment. e-f, Significant MSigDB Hallmark (top) and GO BP (bottom) gene set enrichment results from human orthologs of (e) upregulated and (f) downregulated genes in LtfCre0/+; Brg1fl/fl cells.
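The hypergeometric enrichment statistic used throughout this chapter for overlaps between two gene sets (for example, DE genes in LtfCre0/+; Brg1fl/fl cells versus siBRG1-treated 12Z cells) is an upper-tail probability. With N genes in the universe, K genes in set A, n genes in set B and k genes shared, the p-value is P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K,i) C(N−K, n−i) / C(N,n). The minimal Python sketch below (a hypothetical helper, not the code used here) computes it with exact integer arithmetic and also returns the observed/expected ratio k / (nK/N). The universe definition, for example expressed genes with human-mouse orthologs for cross-species comparisons, is an assumption; it sets N and strongly affects the result.

```python
from math import comb
from typing import Tuple


def hypergeometric_enrichment(N: int, K: int, n: int, k: int) -> Tuple[float, float]:
    """Upper-tail hypergeometric test for the overlap of two gene sets.

    N: genes in the universe; K: genes in set A; n: genes in set B;
    k: genes in both A and B.
    Returns (p, observed/expected), where p = P(X >= k).
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent set sizes")
    # Exact big-integer sum over all overlaps at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    # Single final division; extremely small tails underflow to 0.0 in double precision.
    p = tail / comb(N, n)
    expected = n * K / N
    ratio = k / expected if expected > 0 else float("nan")
    return p, ratio
```

Because extremely small tails underflow to 0.0 in double precision, a reported "p = 0" should be read as a p-value below the smallest representable float rather than an exact zero.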
5.3.7 SWI/SNF subunits commonly regulate epithelial cell adhesion and junction programs

In Chapter 2, we generated two GEMM of endometrial epithelial ARID1A loss, with or without PI3K activation via oncogenic PIK3CAH1047R expression. While mice with ARID1A loss alone (LtfCre0/+; Arid1afl/fl) display endometrial cell death, ARID1A/PIK3CA co-mutant mice (LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl) display myometrial and peritoneal invasion (Wilson et al. 2019; Wilson, Holladay, and Chandler 2020). To identify consistent features of SWI/SNF-mutant, invasive endometrial epithelium, we next compared gene expression changes in sorted endometrial epithelial cells from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice to those from LtfCre0/+; Brg1fl/fl mice. We observed 578 DE genes shared by both GEMM, a significant overlap (hypergeometric enrichment, p < 10^-61) (Fig. 5.7a). Moreover, 88% of these shared DE genes changed in the same direction in both models, with half of these genes downregulated in both models (Fig. 5.7b), indicating overlapping functions of BRG1 and ARID1A in vivo. Broad GSEA was performed on endometrial epithelial transcriptomes of each GEMM versus control cells for MSigDB Hallmark pathways and GO BP gene sets, and normalized enrichment score (NES) results were compared between the two GEMM. Relatively few gene sets were activated in BRG1-null endometrial epithelia, but many critical Hallmark pathways were downregulated in both GEMM, including the Hallmark estrogen and androgen responses, hypoxia and the unfolded protein response (Fig. 5.7c). Notably, EMT showed a trend toward downregulation in BRG1-null mice that did not reach significance (NES = -1.46, p = 0.07, FDR = 0.15) (Fig. 5.7c). In Chapter 2, we demonstrated that EMT activation is a feature of ARID1A mutant cells (Wilson et al. 2019), suggesting that this process may not occur following BRG1 loss. Moreover, among GO BP gene sets (Fig.
5.7d-e), several gene sets were significantly downregulated in BRG1-null mice but activated in ARID1A/PIK3CA mutant mice, including positive regulation of actin cytoskeleton reorganization (Fig. 5.7f). However, gene sets downregulated in both GEMM included epithelial cell-cell adhesion and cell junction assembly (Fig. 5.7g-h). We also observed three related cellular junction GO BP gene sets that were significantly enriched among genes downregulated in 12Z cells following subunit depletion (Fig. 5.7i). These data suggest that, while subunit-specific roles are frequently observed, loss of epithelial cell adhesion and cell junction regulation is a shared phenotype among endometrial SWI/SNF subunit mutations.

Figure 5.7 Endometrial SWI/SNF mutant mouse models converge on loss of cellular adhesion and junction

a, Proportional Euler diagram displaying significant overlap of DE genes in sorted epithelial cells from both endometrial epithelial-specific SWI/SNF mutant GEMM versus control mice. Statistic is hypergeometric enrichment. b, Directional classification of 578 overlapping DE genes between both GEMM. 88% of overlapping DE genes are affected in the same direction, with the dominant class being genes downregulated in both GEMM. c-d, Overview of Broad GSEA results for (c) MSigDB Hallmark pathways (n = 50) and (d) GO BP gene sets (n = 3879) in both GEMM. Enrichment is presented as GSEA NES, where positive values indicate pathway upregulation and negative values indicate downregulation. The x-axis displays enrichment in LtfCre0/+; Brg1fl/fl mice, and the y-axis displays enrichment in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice. e, Zoom into enriched (|NES| > 1) pathways based on directionality between the two models. Representative, highly enriched pathway titles are overlaid on each plot.
Red and bolded pathways are representative pathways predicted to be important to GEMM phenotypes; their waterfall plots and statistical results are displayed in f-h. f, Positive regulation of actin cytoskeleton reorganization is downregulated in LtfCre0/+; Brg1fl/fl cells but upregulated in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells. g-h, Epithelial cell-cell adhesion and cell junction assembly are downregulated in both GEMM. i, GO BP enrichment results for three significant (FDR < 0.10) cellular junction gene sets in 12Z cells treated with (left) siBRG1 or (right) siARID1A versus control. Hash indicates the number of DE genes in each condition belonging to that gene set.

5.4 Discussion

These studies provide evidence that subunit-specific mutations within the SWI/SNF chromatin remodeling complex contribute to endometrial pathogenesis by inducing invasion-like phenotypes. In vitro studies in immortalized human endometrial epithelial cells revealed that loss of either the ARID1A or BRG1 subunit results in misregulation of physiologically relevant gene regulatory networks. Intriguingly, widespread differences in subunit-specific target gene chromatin regulation were observed, with ARID1A appearing mostly to repress transcription and BRG1 mostly to activate it. Despite this, shared genes downregulated following loss of either BRG1 or ARID1A, which require normal SWI/SNF function for proper transcriptional activation, play roles in cell adhesion and cell junctions. BRG1 and ARID1A were also shown to directly co-regulate distal open chromatin sites associated with genes involved in epithelial identity processes. In vivo, in our newly characterized genetically engineered mouse model, BRG1 deletion in the endometrial epithelium resulted in endometrial glands within the myometrium, without obvious oncogenic transformation or hyperplasia.
Abnormal endometrial glands in the uterine myometrium are a pathological hallmark of adenomyosis. Co-existent ARID1A/PIK3CA mutations in the endometrial epithelium also promote myometrial invasion alongside hyperplasia, although ARID1A mutation alone is not sufficient for this process to occur, as we showed in Chapter 2 (Wilson et al. 2019). The BRG1 and ARID1A mutant mouse models both converge upon loss of epithelial adhesion and junctions. These results suggest that SWI/SNF subunits have overlapping roles in maintaining epithelial identity and preventing displacement of endometrial glands, while also displaying subunit-specific functions.

5.5 Methods

5.5.1 Mice

All mice were maintained on an outbred genetic background using CD-1 mice (Charles River, Wilmington, MA). The LtfCre (Tg(Ltf-iCre)14Mmul) allele was purchased from The Jackson Laboratory (Bar Harbor, ME) and verified by PCR using published methods (Daikoku et al. 2014). Inheritance of the Brg1fl allele was confirmed using published genotyping methods (Gebuhr et al. 2003). Mice were monitored for signs of distress, including vaginal bleeding and severe abdominal distension, as well as signs of severe illness, including dehydration, hunching, jaundice, ruffled fur, signs of infection or non-responsiveness. Mice were housed at the Michigan State University Grand Rapids Research Center in accordance with protocols approved by Michigan State University. Michigan State University is registered with the U.S. Department of Agriculture and has an approved Animal Welfare Assurance from the NIH Office of Laboratory Animal Welfare. Michigan State University is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care.

5.5.2 Cell lines

12Z immortalized human endometrial epithelial cells (Zeitvogel, Baumann, and Starzinski-Powitz 2001) were maintained in DMEM/F12 media supplemented with 10% fetal bovine serum (FBS), 1% L-glutamine and 1% penicillin/streptomycin (P/S).
5.5.3 Transfection of 12Z cells with siRNA

12Z cells were seeded at a density of 40,000 cells/ml in DMEM/F12 media supplemented with 10% FBS and 1% L-glutamine. After 24 hours, cells were transfected with 50 pmol/ml of siRNA (Dharmacon, ON-TARGETplus Non-targeting Pool and human SMARCA4 #6597 SMARTpool) using the RNAiMax (ThermoFisher, Waltham, MA) lipofectamine reagent at a 1:1 volume:volume ratio in OptiMEM (Gibco, ThermoFisher), and the media were replaced after 24 hours. Media were replaced again 48 hours post-transfection with DMEM/F12 media supplemented with 0.5% FBS, 1% P/S and 1% L-glutamine. Transfected 12Z cells were harvested 72 hours post-transfection using the Quick-RNA Miniprep Kit (Zymo Research, Irvine, CA) for RNA or RIPA buffer (Cell Signaling, Danvers, MA) for protein.

5.5.4 Cell growth assay

For the cell growth assay, cells were initially seeded at a density of 4000 cells per well in a 96-well plate (n = 4 wells per condition). At 48 and 72 hours post-transfection, cells were incubated with 2 μg/ml calcein-AM for 1 hour, and fluorescence was measured using a SpectraMax i3x (Molecular Devices, San Jose, CA).

5.5.5 Histology and immunohistochemistry

Mice were euthanized by carbon dioxide inhalation, and uteri were collected. For indirect immunohistochemistry (IHC), 10% neutral buffered formalin-fixed paraffin sections were processed for heat-based antigen unmasking in 10 mM sodium citrate [pH 6.0]. Sections were incubated with antibodies at the following dilutions: 1:200 BRG1 (ab110641, Abcam, Cambridge, United Kingdom); 1:400 PGR (SAB5500165, Sigma, St. Louis, MO); 1:100 KRT8 (Cytokeratin 8) (TROMA-I, DSHB, Iowa City, IA). TROMA-I antibody was deposited to the Developmental Studies Hybridoma Bank (DSHB) by Brulet, P./Kemler, R. (DSHB Hybridoma Product TROMA-I).
The following biotin-conjugated secondary antibodies were used: donkey anti-rabbit IgG (711-065-152, Jackson ImmunoResearch Laboratories, West Grove, PA) and donkey anti-rat IgG (705-065-153, Jackson ImmunoResearch Laboratories). Secondary antibodies were detected using the VECTASTAIN Elite ABC HRP Kit (Vector Laboratories, Burlingame, CA). Sections for IHC were lightly counterstained with Hematoxylin QS or Methyl Green (Vector Laboratories). Routine hematoxylin and eosin staining of sections was performed by the Van Andel Research Institute, Grand Rapids, MI (VARI) Histology and Pathology Core. A VARI animal pathologist reviewed histological sections.

5.5.6 Cell sorting

EPCAM-positive endometrial epithelial cells were purified as described in Chapter 2 (Wilson et al. 2019). Briefly, mouse uteri were surgically removed and minced using scissors, followed by digestion with the MACS Multi Tissue Dissociation Kit II (Miltenyi Biotec, Bergisch Gladbach, Germany) for 80 minutes at 37 °C. Digested tissues were strained through a 40 μm nylon mesh (ThermoFisher), red blood cells were removed using red cell lysis buffer (Miltenyi Biotec), and dead cells were removed with the MACS Dead Cell Removal Kit (Miltenyi Biotec). EPCAM-positive cells were purified using a PE-conjugated EPCAM antibody and anti-PE MicroBeads (Miltenyi Biotec). Purity of cell populations was confirmed using a BD Accuri C6 flow cytometer (BD Biosciences), and analysis was performed using FlowJo v10 software (BD Biosciences, San Jose, CA).

5.5.7 RNA isolation

The Arcturus PicoPure RNA Isolation Kit (ThermoFisher) was used to purify RNA from in vivo EPCAM-sorted endometrial epithelial cells from 120-day-old LtfCre0/+; Brg1fl/fl mice (n = 4 biological replicates), and DNA was digested on-column using the RNase-free DNase set (Qiagen). RNA samples were collected from 12Z cells 72 hours post siRNA transfection using the Quick-RNA Miniprep Kit (Zymo Research).
5.5.8 Construction and sequencing of directional mRNA-seq libraries

Libraries were prepared by the Van Andel Genomics Core from 500 ng of total RNA using the KAPA mRNA HyperPrep kit (v4.17) (Kapa Biosystems, Wilmington, MA). RNA was sheared to 300-400 bp. Prior to PCR amplification, cDNA fragments were ligated to IDT for Illumina unique dual adapters (IDT DNA Inc, Coralville, IA). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies, Santa Clara, CA), QuantiFluor® dsDNA System (Promega, Madison, WI) and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp, paired-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an S1, 100 cycle sequencing kit (Illumina, San Diego, CA). Each library was sequenced to an average raw depth of 20 M reads. Base calling was performed by Illumina RTA3, and NovaSeq Control Software (NCS) output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

5.5.9 RNA-seq analysis

Raw paired-end reads were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality analysis using FastQC (Andrews 2010) and MultiQC. Trimmed mouse reads were aligned to the mm10 genome assembly and indexed to the GENCODE (Harrow et al. 2012; Frankish et al. 2019) vM16 GFF3 annotation via the STAR aligner with flag `--quantMode GeneCounts` for feature counting. Trimmed human reads were aligned to the GRCh38.p12 reference genome (Schneider et al. 2017) and indexed to GENCODE v28. Output gene count files were assembled into an experimental read count matrix in R, where they were combined with the respective model system raw counts from control and experimental RNA-seq data generated in Chapter 2, re-analyzed from GEO accession series GSE121198.
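As a minimal illustration of the count-matrix step just described and of the low-count filter applied next, the Python sketch below parses STAR `ReadsPerGene.out.tab` output, builds a genes × samples matrix, and drops genes averaging fewer than 1 count per sample. It is not the author's R code. Assumptions: column 4 of the STAR file (counts for reverse-stranded libraries) is used because the KAPA libraries are directional; the filter keeps genes with a mean of at least 1 count per sample; and the sample names and toy counts are hypothetical.

```python
import io
from typing import Dict, List, TextIO

# STAR --quantMode GeneCounts writes these four summary rows before the per-gene rows.
STAR_SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_counts(handle: TextIO, column: int = 3) -> Dict[str, int]:
    """Parse one ReadsPerGene.out.tab stream into {gene_id: count}.

    Columns (0-based): 0 gene ID, 1 unstranded, 2 read 1 aligned with the RNA strand,
    3 read 2 aligned with the RNA strand. column=3 assumes a reverse-stranded library.
    """
    counts: Dict[str, int] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0] in STAR_SUMMARY_ROWS:
            continue  # skip blank lines and alignment summary rows
        counts[fields[0]] = int(fields[column])
    return counts


def build_matrix(samples: Dict[str, Dict[str, int]]) -> Dict[str, List[int]]:
    """Combine per-sample counts into {gene_id: [count per sample]}, keeping sample order."""
    names = list(samples)
    genes = sorted(set().union(*(samples[n].keys() for n in names)))
    return {g: [samples[n].get(g, 0) for n in names] for g in genes}


def filter_low_counts(matrix: Dict[str, List[int]], min_mean: float = 1.0) -> Dict[str, List[int]]:
    """Keep genes whose mean count per sample is at least `min_mean`."""
    return {g: row for g, row in matrix.items() if sum(row) / len(row) >= min_mean}


# Toy input with hypothetical sample names and counts.
ctrl = io.StringIO(
    "N_unmapped\t5\t5\t5\nN_multimapping\t2\t2\t2\nN_noFeature\t1\t1\t1\nN_ambiguous\t0\t0\t0\n"
    "GeneA\t10\t1\t9\nGeneB\t1\t0\t1\n"
)
cko = io.StringIO(
    "N_unmapped\t4\t4\t4\nN_multimapping\t1\t1\t1\nN_noFeature\t0\t0\t0\nN_ambiguous\t0\t0\t0\n"
    "GeneA\t13\t1\t12\nGeneB\t0\t0\t0\n"
)
matrix = build_matrix({"ctrl_1": read_star_counts(ctrl), "cko_1": read_star_counts(cko)})
kept = filter_low_counts(matrix)  # GeneB (mean 0.5) is dropped
```

DESeq2 then applies its own size factors, dispersion estimates, and Wald tests; the sketch covers only the bookkeeping before that point.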
Low-count genes were filtered (1 count per sample on average) prior to normalization factor generation, dispersion estimation, negative binomial generalized linear model fitting, and hypothesis testing in DESeq2 (Love, Huber, and Anders 2014) for differential gene expression (DGE) analysis. For linear modeling, the design matrix was constructed from a single ‘condition’ variable with the inclusion of an intercept, in the form: ~ condition. Wald test p-values were corrected for multiple testing by independent hypothesis weighting (IHW) (Ignatiadis et al. 2016) for downstream analysis. The significance threshold for differential gene expression was set at FDR < 0.0001 for 12Z cell studies and FDR < 0.05 for sorted mouse cell studies. RNA-seq expression heatmaps were generated using scaled regularized-logarithm (rlog) (Love, Huber, and Anders 2014) counts for visualization.

5.5.10 Chromatin immunoprecipitation

Wild-type 12Z cells were treated with 1% formaldehyde in fresh DMEM/F12 media without supplements for 10 minutes at ambient temperature, and crosslinking was quenched with 0.125 M glycine for 5 minutes at ambient temperature, followed by a PBS wash. A total of 1 × 10⁷ crosslinked cells were used per IP. Chromatin was fractionated by digestion of crosslinked cells with micrococcal nuclease using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) per the manufacturer’s instructions, followed by 30 seconds of sonication using a Bioruptor Pico sonicator (Diagenode, Liège, Belgium). IPs were performed in duplicate using the SimpleChIP Enzymatic Chromatin IP Kit per the manufacturer’s instructions with 1:100 anti-BRG1 antibody (ab110641, Abcam). Crosslinks were reversed with 0.4 mg/ml Proteinase K (ThermoFisher) and 0.2 M NaCl at 65 °C for 2 hours. DNA was purified using the ChIP DNA Clean & Concentrator Kit (Zymo).
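The crosslinking and quench steps above specify only final concentrations. As a worked example of the underlying dilution arithmetic, the Python sketch below solves stock·v = final·(V + v) for the volume v of stock to add to a volume V already on the plate. The 37% formaldehyde and 2.5 M glycine stock concentrations and the 10 ml medium volume are assumptions for illustration only; the protocol does not state them.

```python
def volume_to_add(final_conc: float, stock_conc: float, starting_volume: float) -> float:
    """Volume of stock to add to `starting_volume` so the mixture reaches `final_conc`.

    Solves stock_conc * v = final_conc * (starting_volume + v); units must be consistent.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * starting_volume / (stock_conc - final_conc)


medium_ml = 10.0  # hypothetical medium volume on the plate
# Assumed 37% formaldehyde stock diluted to 1% final.
formaldehyde_ml = volume_to_add(1.0, 37.0, medium_ml)
# Assumed 2.5 M glycine stock diluted to 0.125 M in the (now larger) volume.
glycine_ml = volume_to_add(0.125, 2.5, medium_ml + formaldehyde_ml)
print(f"formaldehyde: {formaldehyde_ml:.3f} ml, glycine: {glycine_ml:.3f} ml")
```

Under these assumed stocks the script prints about 0.278 ml of formaldehyde and 0.541 ml of glycine; accounting for the added formaldehyde volume before quenching is what keeps the glycine at exactly 0.125 M.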
5.5.11 Construction and sequencing of ChIP-seq libraries

Libraries for input and IP samples were prepared by the Van Andel Genomics Core from 10 ng of material using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to IDT for Illumina UDI Adapters (IDT DNA Inc.). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor® dsDNA System (Promega) and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 100 bp, single-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an S1 sequencing kit (Illumina). Each library was sequenced to a minimum read depth of 80 million reads per input library and 40 million reads per IP library. Base calling was performed by Illumina NCS v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

5.5.12 ChIP-seq analysis

Raw single-end reads for IP and inputs were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality control analysis via FastQC (Andrews 2010) and MultiQC (Ewels et al. 2016). Trimmed reads were aligned to the GRCh38.p12 reference genome (Schneider et al. 2017) via Bowtie2 (Langmead and Salzberg 2012) with flag `--very-sensitive`. Aligned reads were sorted and indexed with samtools (Li et al. 2009). Picard MarkDuplicates (http://broadinstitute.github.io/picard/) was used to remove PCR duplicates, followed by sorting and indexing. MACS2 was used to call broad peaks on each ChIP replicate against the input control with an FDR < 0.05 threshold and otherwise default settings (Zhang et al. 2008). The resulting peaks were filtered against the ENCODE blacklist (Amemiya, Kundaje, and Boyle 2019), and peaks on non-standard contigs were removed. A naïve overlapping peak set, as defined by ENCODE (Landt et al.
2012), was constructed by calling MACS2 peaks on pooled replicates, followed by bedtools intersect (Quinlan and Hall 2010) to select pooled peaks with at least 50% bp overlap with each biological replicate.

5.5.13 Western blotting

Crude protein lysates were quantified using the Micro BCA Protein Assay Kit (ThermoFisher) and a FlexStation 3 plate reader (Molecular Devices). Protein lysates were run on a 4-15% gradient SDS-PAGE gel (Bio-Rad, Hercules, CA) and transferred to PVDF membrane using the Trans-Blot Turbo system (Bio-Rad). Primary antibodies were used at the following dilutions: 1:1000 BRG1 (G-7) (sc-17796, Santa Cruz, Dallas, TX) and 1:1000 β-Actin (8457, Cell Signaling). Secondary antibodies conjugated to horseradish peroxidase (Cell Signaling) were used at a dilution of 1:2000. Protein bands were visualized using Clarity Western ECL Substrate (Bio-Rad) and a ChemiDoc XRS+ imaging system (Bio-Rad). Densitometry calculations were performed using ImageJ software (National Institutes of Health, Bethesda, MD).

5.5.14 Bioinformatics and statistics

Various HOMER (Heinz et al. 2010) functions were applied to count reads at loci of interest, perform motif analysis on peak coordinates, and annotate genomic regions, with gene promoters redefined as regions within 3 kb of a TSS. Motif logos generated by HOMER are scaled by information content. TxDb.Hsapiens.UCSC.hg38.knownGene (Bioconductor Core Team and Maintainer 2016) was used to define gene promoters for all standard hg38 genes as 3 kb regions surrounding the primary TSS. ATAC-seq data were analyzed as described in Chapters 2 and 3 (Wilson et al. 2019; Reske, Wilson, and Chandler 2020). MACS2 was used to produce genome-wide signal log-likelihood ratio tracks for IGV visualization (Zhang et al. 2008; Robinson et al. 2011). Broad GSEA (Subramanian et al. 2005) was performed via GenePattern (Reich et al.
2006) on DESeq2-normalized counts, which were converted to human orthologs in the case of mouse data. clusterProfiler was used to compute and visualize pathway and gene set enrichment from a list of gene symbols compared to the respective gene universe (Yu et al. 2012). Hallmark pathways, GO biological process (BP) terms, and GTRD TF target gene sets were retrieved from MSigDB (v7.1) (Liberzon et al. 2015), and the MSigDB GSEA ‘Compute Overlaps’ function was used for enrichment analyses and visualization (Fig. 5.3b and D.2) (Liberzon et al. 2015; Ashburner et al. 2000; The Gene Ontology Consortium 2019; Yevshin et al. 2019). The GeneHancer database (Fishilevich et al. 2017) was used to link distal open chromatin regions to specific genes, and only GeneHancer associations above an arbitrary score threshold of 1 were considered. ComplexHeatmap was used for hierarchical clustering by Euclidean distance and general heatmap visualization (Gu, Eils, and Schlesner 2016). GenomicRanges functions were used to intersect and manipulate genomic coordinates (Lawrence et al. 2013). eulerr was used to produce proportional Euler diagrams (Larsson 2020). biomaRt was used for all gene nomenclature and ortholog conversions (Durinck et al. 2005; Durinck et al. 2009; Smedley et al. 2009). ggplot2 was used for certain plotting applications (Wickham 2016). The cumulative hypergeometric distribution was used for manual enrichment tests. The statistical computing language R was used for many applications throughout this study (R Core Team 2018).

5.6 Data availability

RNA-seq and ChIP-seq data generated in this study were deposited into Gene Expression Omnibus (GEO) at accession series GSE152663.
Wild-type 12Z cell ARID1A ChIP-seq data (n = 2), non-targeting siRNA control-treated 12Z cell ATAC-seq data (n = 2), siARID1A and non-targeting siRNA control-treated 12Z cell RNA-seq data (n = 3), and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 4) and control (n = 3) GEMM sorted EPCAM-positive endometrial epithelial cell RNA-seq data generated and published in Chapter 2 were all retrieved and reanalyzed from GEO accession series GSE121198. SMARCA4 somatic mutation data from the TCGA-UCEC Pan-Cancer cohort were retrieved from cBioPortal (Hoadley et al. 2018; Cancer Genome Atlas Research Network et al. 2013; Gao et al. 2013).

5.7 Acknowledgments

We thank Dr. John Risinger for helpful discussions. The Brg1fl allele was a generous gift from Dr. Terry Magnuson. We thank the Van Andel Research Institute Genomics Core for providing library construction and sequencing facilities and services. We thank the Van Andel Research Institute Histology and Pathology Core for histology services and Dr. Galen Hostetter for his assistance with pathology.

CHAPTER 6 ARID1A MUTATIONS PROMOTE P300-DEPENDENT ENDOMETRIAL INVASION THROUGH SUPER-ENHANCER HYPERACETYLATION

A modified version of this chapter was previously published (Wilson et al. 2020): Mike R. Wilson*, Jake J. Reske*, Jeanne Holladay, Subechhya Neupane, Julie Ngo, Nina Cuthrell, Marc Wegener, Mary Rhodes, Marie Adams, Rachael Sheridan, Galen Hostetter, Fahad T. Alotaibi, Paul J. Yong, Michael S. Anglesio, Bruce A. Lessey, Richard E. Leach, Stacy A. Missmer, Jose M. Teixeira, Asgerally T. Fazleabas, and Ronald L. Chandler. 2020. ARID1A mutations promote P300-dependent endometrial invasion through super-enhancer hyperacetylation. Cell Reports 33(6): 108366. *These authors contributed equally.

6.1 Abstract

ARID1A loss in the endometrial epithelium is associated with invasive chromatin reprogramming through disrupted local chromatin interactions.
Here, to identify epigenetic dependencies driving ARID1A mutant invasion, we use an unbiased approach to map chromatin state transitions accompanying ARID1A loss in the endometrium. We show that super-enhancers marked by high H3K27 acetylation are strongly associated with ARID1A binding. ARID1A loss leads to H3K27 hyperacetylation and increased chromatin accessibility and enhancer RNA transcription at super-enhancers, but not typical enhancers, indicating that ARID1A normally prevents super-enhancer hyperactivation. ARID1A co-localizes with P300 at super-enhancers, and genetic or pharmacological inhibition of P300 in ARID1A mutant endometrial epithelia suppresses invasion and induces anoikis through the rescue of super-enhancer hyperacetylation. Among hyperactivated super-enhancers, SERPINE1 (PAI-1) is identified as an essential target gene driving ARID1A mutant endometrial invasion. Broadly, our findings provide a rationale for therapeutic strategies targeting super-enhancers in ARID1A mutant endometrium.

6.2 Introduction

A hallmark pathological feature of ARID1A mutant endometrial epithelia, as observed in mouse and cell-based models, is cellular invasion into nearby tissue. Studies in previous chapters have revealed that ARID1A normally suppresses transcriptional activation at mesenchymal genes that promote cellular migration and invasion (Wilson et al. 2019). This is achieved at least in part by ARID1A-SWI/SNF chromatin interactions near already active, accessible gene promoters (Wilson et al. 2019). Additionally, ARID1A-SWI/SNF interactions with distal chromatin also influence gene expression across large genomic distances (Reske et al. 2020). However, it remains unclear which biochemical functions allow ARID1A to regulate chromatin activity and thereby alter gene expression both proximally and distally. Further, the chromatin logic dictating active versus repressive regulation by ARID1A has not yet been determined.
Notably, a major result of the Chapter 5 chromatin immunoprecipitation studies was that the BRG1 catalytic subunit is most directly involved in gene activation, while ARID1A is most directly involved in gene repression (Reske et al. 2020), even though both subunits often function within the same complex. These data imply that SWI/SNF subunit composition, and likely other local chromatin features, cooperatively govern the functional specificity of the complex. In eukaryotes, transcriptional activity is regulated both in cis and in trans through varied chromatin mechanisms (Li, Carey, and Workman 2007). In addition to the core transcriptional machinery and general transcription factors, these include: genomic sequence and DNA modifications, e.g., 5-methylcytosine (Moore, Le, and Fan 2013); sequence-specific transcription factors that can recruit or block co-regulators at certain motifs (Lambert et al. 2018); factors that control transcription initiation, elongation, and termination by RNA polymerase (Harlen and Churchman 2017); 3D genome organization, enhancer-promoter contacts, and supporting structural factors (Gorkin, Leung, and Ren 2014); chromatin remodeling and nucleosome composition and spatial arrangement, which alter access to genomic DNA (Clapier and Cairns 2009); and transient or stable post-translational modifications to histone tails, their effects on nucleosome dynamics, and the factors that write, read, and erase these signals (Bowman and Poirier 2015). The combinatorial actions of such regulatory mechanisms, which often function flexibly toward redundant outcomes (Spivakov 2014), confer transcriptional homeostasis and intricate responses to stimuli. Histone post-translational modifications and variants play a pivotal role in coordinating transcription as well as DNA repair and replication (Bannister and Kouzarides 2011; Venkatesh and Workman 2015; van Attikum and Gasser 2005; Alabert and Groth 2012).
Histone proteins contain positively charged, disordered tails with overrepresented lysine and arginine residues (Luger and Richmond 1998). Covalent modifications to histone tail residues, such as lysine acetylation, arginine methylation, and serine phosphorylation, among others, can promote or impede nucleosome stability, higher-order chromatin structure, and the ability of chromatin factors like remodelers to act upon nucleosome substrates. Chromatin remodelers contain subunit domains that interact with specific forms of modified or unmodified histone substrates (Clapier and Cairns 2009). For example, SWI/SNF contains a bromodomain in the catalytic subunit that recognizes acetylated H3 residues, and this interaction stimulates remodeler activity (Hassan, Neely, and Workman 2001; Hassan, Awad, and Prochasson 2006). While previous efforts have described differing functional activities associated with mammalian SWI/SNF complex subunit architecture (e.g., transcriptional regulatory differences between ARID1A-, ARID1B-, and ARID2-containing complexes; Raab, Resnick, and Magnuson 2015), the chromatin and biochemical mechanisms governing such regulatory nuances have not been well characterized. In this chapter, we aimed to determine chromatin features associated with ARID1A genomic regulation and to discern the epigenomic consequences of ARID1A loss that lead to altered gene expression and endometrial pathophysiology.

6.3 Results

6.3.1 ARID1A co-localizes with H3K27ac and is associated with super-enhancers

Although ARID1A is mutated in several disorders of the endometrial epithelium, little is known about how ARID1A loss alters the epigenomic landscape in these cells. Here, in an unbiased approach, we examined chromatin features from both control and ARID1A-depleted cells and built a genome-wide segmentation model of distinct chromatin states via ChromHMM (Ernst and Kellis 2012).
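ChromHMM does not model raw coverage directly; it first binarizes each chromatin feature as present or absent in fixed-size bins (200 bp here). The Python sketch below illustrates that idea under stated assumptions: a Poisson background whose mean is the average read count per bin, and a significance threshold of 1e-4, which is our understanding of ChromHMM's defaults when no input control is supplied. It ignores read shifting and control tracks, the toy read positions are hypothetical, and it is not ChromHMM's own code.

```python
import math
from typing import List


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize_bins(read_starts: List[int], chrom_length: int,
                  bin_size: int = 200, p_threshold: float = 1e-4) -> List[int]:
    """Call each bin 1 (feature present) or 0 against a uniform Poisson background."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    lam = sum(counts) / n_bins  # mean reads per bin (no input control)
    return [1 if poisson_upper_tail(c, lam) < p_threshold else 0 for c in counts]


# Hypothetical 2 kb contig: 30 reads piled in one bin, one read in every other bin.
reads = [600 + i for i in range(30)] + [b * 200 + 10 for b in range(10) if b != 3]
calls = binarize_bins(reads, chrom_length=2000)
print(calls)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

The binary matrix produced this way (one column per feature, one row per bin) is what a multivariate HMM is then trained on to learn combinatorial chromatin states.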
We profiled several post-translational histone modifications by ChIP-seq, including H3K4me1, H3K4me3, H3K27me3, H3K27ac, and H3K18ac, following ARID1A depletion in 12Z human endometrial epithelial cells (Zeitvogel, Baumann, and Starzinski-Powitz 2001). H3K4me1 marks active and inactive/poised distal regulatory elements (Heintzman et al. 2007; Creyghton et al. 2010); H3K4me3 marks actively transcribed gene promoter regions (Heintzman et al. 2007); H3K27me3 marks transcriptional silencing and heterochromatin (Kirmizis et al. 2004; Saksouk, Simboeck, and Dejardin 2015), or poised chromatin when combined with H3K4me3, i.e., bivalent chromatin (Voigt, Tee, and Reinberg 2013); H3K27ac marks active regulatory elements, including both promoters and enhancers (Creyghton et al. 2010; Wang, Zang, et al. 2008); and H3K18ac also marks active promoter and enhancer elements (Wang, Zang, et al. 2008). These data were used in conjunction with ATAC-seq and total RNA-seq datasets from control and ARID1A-depleted 12Z cells generated in Chapter 2 (Wilson et al. 2019). Combinations of these chromatin features can be used to functionally characterize genomic regions at kilobase resolution (Fig. E.1), which allowed us to build a comprehensive model of chromatin state transitions accompanying ARID1A loss (Fig. 6.1a-b). A series of genomic feature enrichment tests allowed us to annotate the predicted biological function of each of the 18 chromatin states, including 8 distinct classes of enhancer elements segregated by combinatorial chromatin features (Fig. 6.1c-f). Using ARID1A ChIP-seq data from 12Z cells generated in Chapter 2 (Wilson et al. 2019), we observed that ARID1A binding is most strongly associated with highly active regulatory elements marked by H3K27ac, including super-enhancer chromatin states (S11-S13) and other highly active enhancer states (S14) (Fig. 6.1g-h).
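The state-level associations summarized above (Fig. 6.1g-h) are reported as fold enrichment with a hypergeometric statistic, and Section 5.5.14 notes that the cumulative hypergeometric distribution was used for manual enrichment tests. The Python sketch below shows both calculations, assuming coverage is counted in discrete units (for example 200-bp bins) so that the population is finite. The numbers are hypothetical, and the early stop in the tail sum is a pragmatic shortcut of this sketch rather than part of any published pipeline.

```python
import math


def log_comb(n: int, k: int) -> float:
    """log(n choose k) via log-gamma, so genome-scale counts do not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N units in total, K marked, n drawn)."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    if k <= lo:
        return 1.0
    if k > hi:
        return 0.0
    log_total = log_comb(N, n)
    mode = (n + 1) * (K + 1) // (N + 2)
    tail = 0.0
    for i in range(k, hi + 1):
        term = math.exp(log_comb(K, i) + log_comb(N - K, n - i) - log_total)
        tail += term
        # Past the mode the terms keep shrinking; stop once they no longer change the sum.
        if i > mode and term < tail * 1e-16:
            break
    return min(1.0, tail)


def fold_enrichment(overlap: int, state_size: int, feature_size: int, genome_size: int) -> float:
    """Fraction of the state covered by the feature, relative to the genome-wide fraction."""
    return (overlap / state_size) / (feature_size / genome_size)


# Hypothetical bins: a 1,000-bin state, 50,000 ARID1A-bound bins in a 1,000,000-bin genome,
# and 200 bound bins inside the state (about 50 expected by chance).
print(fold_enrichment(200, 1_000, 50_000, 1_000_000))  # 4.0
p = hypergeom_sf(200, N=1_000_000, K=50_000, n=1_000)
```

Working in log space with `math.lgamma` keeps the probabilities finite even for genome-scale bin counts, where direct binomial coefficients would be astronomically large.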
Super-enhancers are large enhancer clusters marked by an abundance of certain active chromatin features, like P300 and H3K27ac, that control the transcription of genes involved in cellular identity and thus play key roles in developmental and disease processes (Loven et al. 2013; Whyte et al. 2013; Parker et al. 2013). Active enhancers are characterized by abundant H3K27ac and accessible chromatin (Calo and Wysocka 2013). We used both H3K27ac ChIP-seq and ATAC-seq data to identify 18,050 putative active enhancers (Fig. E.2a). From this set of active enhancers, we used the Rank Ordering of Super-Enhancers (ROSE) algorithm (Loven et al. 2013; Whyte et al. 2013) to identify active super-enhancers and observed 413 unique super-enhancers that contained 1430 constituent regions marked by H3K27ac and ATAC (Fig. E.2a). Active distal enhancer regions (located further than 3 kb from a transcription start site [TSS]) not categorized as super-enhancers were designated as typical enhancers (TEs) (n = 16,620) (Fig. E.2a). We observed greater H3K27ac signal at super-enhancer peaks relative to typical enhancer peaks (Fig. E.2b). Super-enhancers comprised three H3K27ac peaks on average (Fig. E.2c). ARID1A associated with the majority of both super-enhancers and typical enhancers, but ARID1A was bound to a higher proportion of super-enhancers than typical enhancers (Fig. 6.1i), suggesting a role for ARID1A in the regulation of active super-enhancers.

Figure 6.1 ARID1A is associated with highly active regulatory elements marked by H3K27ac

a, Chromatin state model generated by ChromHMM. A total of 18 states were identified through genomic profiling of 7 chromatin features in control and ARID1A knockdown 12Z cells: total RNA-seq, ATAC-seq, and H3K27me3, H3K4me3, H3K4me1, H3K18ac, and H3K27ac ChIP-seq. The genome was segmented into 200-bp intervals based on state classifications.
Darker heatmap colors indicate higher relative enrichment for each chromatin feature in that state. Right-side labels are inferred biological functions of each state based on combinatorial chromatin features and genome ontology annotation.
b, Heatmap displaying chromatin state adjacency frequencies (how often two chromatin states neighbor each other). Darker color indicates more frequent state neighboring.
c, Percentage of genome coverage for each chromatin state.
d, Total RNA quantification of each chromatin state as reads per kilobase per million mapped reads (RPKM) per 200-bp genomic interval. Left, linear scale; right, log2 scale.
e, Percentage of genome coverage per chromatin state for all other measured chromatin features. Statistic is hypergeometric enrichment compared to whole genome.
f, Percentage of genome coverage per chromatin state for other genomic features. Active super-enhancers and typical enhancers are distal H3K27ac peaks marked by ATAC, as defined in Fig. E.2a. Statistic is hypergeometric enrichment.
g, Percentage of genome coverage per chromatin state for ARID1A binding. Statistic is hypergeometric enrichment.
h, Genome-wide association between ARID1A binding and profiled histone modifications. Enrichments are displayed as fold-enrichment per genomic base pair. Statistic is hypergeometric enrichment; pairwise enrichment statistics were computed by the chi-square test.
i, Association between ARID1A binding and typical enhancers versus super-enhancers, per H3K27ac peak, as defined in Fig. E.2a. Statistic is two-tailed Fisher’s exact test. *** p < 0.001.

6.3.2 ARID1A prevents super-enhancer hyperacetylation

To further understand the role of ARID1A in chromatin regulation, we analyzed the effects of ARID1A depletion on chromatin state classification and the abundance of histone modifications. ChromHMM modeling revealed that most chromatin states do not display substantial reprogramming following ARID1A loss (Fig. 6.2a).
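Fig. 6.2a summarizes each state-to-state change as ([genomic bp initial → final] / [genomic bp initial]), colored by the proportion bound by ARID1A. Because the control and knockdown segmentations share the same 200-bp bins, the base-pair ratio reduces to a bin-count ratio. A minimal Python sketch of that bookkeeping, assuming the two segmentations and a per-bin ARID1A-bound flag are already aligned (the toy labels are hypothetical):

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Transition = Tuple[str, str]  # (initial state, final state)


def transition_summary(initial: Sequence[str], final: Sequence[str],
                       bound: Sequence[bool]) -> Dict[Transition, Tuple[float, float]]:
    """Map each (initial, final) pair to (fraction of the initial state, fraction ARID1A-bound).

    All inputs are per-bin and must cover the same equally sized bins.
    """
    if not len(initial) == len(final) == len(bound):
        raise ValueError("segmentations and binding flags must cover the same bins")
    bins_per_state = Counter(initial)
    pairs = list(zip(initial, final))
    bins_per_pair = Counter(pairs)
    bound_per_pair = Counter(pair for pair, is_bound in zip(pairs, bound) if is_bound)
    return {
        pair: (n / bins_per_state[pair[0]], bound_per_pair[pair] / n)
        for pair, n in bins_per_pair.items()
    }


control = ["S14", "S14", "S14", "S14", "S10"]
knockdown = ["S14", "S13", "S13", "S15", "S10"]
arid1a_bound = [True, True, False, False, False]
summary = transition_summary(control, knockdown, arid1a_bound)
# summary[("S14", "S13")] == (0.5, 0.5): half of S14 became S13, and half of those bins are bound
```

Stable states appear as (state, state) pairs, so the fraction of a state that does not change is read off the diagonal.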
Super-enhancers and other enhancer states bound by ARID1A typically did not change state, although some highly active enhancers (S14) gained further activation characteristics (S14 → S13, S14 → S12), while others lost active marks (S14 → S15) (Fig. 6.2b). Among the histone modifications tested, H3K27ac displayed the greatest proportion of differentially regulated sites following ARID1A loss (Fig. 6.2c and E.2d-h). Interestingly, ARID1A loss did not affect H3K27me3 occupancy genome-wide (Fig. E.2f), even though SWI/SNF is known to antagonize polycomb chromatin silencing in other cellular contexts (Bracken, Brien, and Verrijzer 2019). Next, we examined the H3K27ac changes occurring in ARID1A-deficient cells. The majority of differential H3K27ac sites were found among distal elements, and among those sites we observed decreased acetylation following ARID1A loss (Fig. 6.2d). Furthermore, most H3K27ac changes occurred at super-enhancers and highly active enhancer chromatin states where ARID1A is bound (Fig. 6.2e-g). Sites that gained H3K27ac following ARID1A loss tended to become super-enhancer states (S11-S13) (Fig. 6.2f), while sites that lost H3K27ac tended to transition from super-enhancer to other enhancer states (S14-S18) (Fig. 6.2g). Consistently, H3K27ac sites at promoters (within 3 kb of a TSS) were less likely to be affected by ARID1A loss than distal intergenic and intronic elements (Fig. 6.2h and E.2i-l), and super-enhancers were marginally more likely to show changes in H3K27ac than typical enhancers (Fig. 6.2i). However, while most active typical enhancers displayed decreased H3K27ac, most active super-enhancers displayed increased H3K27ac following ARID1A loss (Fig. 6.2j), suggesting a specific role for ARID1A in preventing H3K27 hyperacetylation at super-enhancers. Among the 413 active super-enhancers, 74.1% displayed differential H3K27ac at one or more sites following ARID1A loss (Fig. 6.2k-l).
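The promoter/distal split (3 kb from a TSS) and the super-enhancer versus typical-enhancer labels used in Sections 6.3.1 and 6.3.2 come from ROSE. The Python sketch below mirrors that logic in simplified form: drop peaks within 3 kb of a TSS, stitch remaining peaks separated by at most 12.5 kb (assumed here as ROSE's usual stitching distance), rank stitched regions by summed H3K27ac signal, and place the cutoff where a line of slope (max - min) / n touches the ranked curve. It omits ROSE's read counting and input subtraction, treats peaks on chromosomes without an annotated TSS as distal, and uses hypothetical coordinates and signals throughout.

```python
from typing import List, Optional, Tuple

Peak = Tuple[str, int, int, float]            # (chrom, start, end, H3K27ac signal)
Stitched = Tuple[str, int, int, float, int]   # (chrom, start, end, summed signal, n constituents)


def tss_distance(peak: Peak, tss: List[Tuple[str, int]]) -> Optional[int]:
    """Distance from a peak's nearest edge to the closest TSS (0 if a TSS lies inside)."""
    chrom, start, end, _ = peak
    dists = [0 if start <= pos <= end else min(abs(pos - start), abs(pos - end))
             for c, pos in tss if c == chrom]
    return min(dists) if dists else None  # None: no annotated TSS on this chromosome


def distal_peaks(peaks: List[Peak], tss: List[Tuple[str, int]], min_dist: int = 3000) -> List[Peak]:
    """Keep peaks located further than `min_dist` bp from every TSS."""
    kept = []
    for p in peaks:
        d = tss_distance(p, tss)
        if d is None or d > min_dist:
            kept.append(p)
    return kept


def stitch(peaks: List[Peak], max_gap: int = 12_500) -> List[Stitched]:
    """Merge same-chromosome peaks whose gap is at most `max_gap` bp, summing their signal."""
    merged: List[Stitched] = []
    for chrom, start, end, signal in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            c, s, e, total, n = merged[-1]
            merged[-1] = (c, s, max(e, end), total + signal, n + 1)
        else:
            merged.append((chrom, start, end, signal, 1))
    return merged


def tangent_cutoff(signals: List[float]) -> float:
    """Signal where a line of slope (max - min) / n is tangent to the ranked signal curve."""
    ys = sorted(max(0.0, s) for s in signals)
    slope = (ys[-1] - ys[0]) / len(ys)
    # The tangent point minimizes y - slope * rank, leaving no ranked point below the line.
    best = min(range(len(ys)), key=lambda i: ys[i] - slope * (i + 1))
    return ys[best]


def call_super_enhancers(peaks: List[Peak], tss: List[Tuple[str, int]]):
    """Return (super-enhancers as stitched regions, typical enhancers as constituent peaks)."""
    distal = distal_peaks(peaks, tss)
    stitched = stitch(distal)
    cutoff = tangent_cutoff([r[3] for r in stitched])
    supers = [r for r in stitched if r[3] > cutoff]
    typical = [p for p in distal
               if not any(p[0] == s[0] and s[1] <= p[1] and p[2] <= s[2] for s in supers)]
    return supers, typical


peaks = [("chr1", 500, 1500, 100.0),  # overlaps a TSS, so treated as promoter-proximal
         ("chr1", 50_000, 51_000, 40.0), ("chr1", 55_000, 56_000, 35.0),
         ("chr1", 60_000, 61_000, 30.0), ("chr1", 200_000, 201_000, 5.0),
         ("chr1", 300_000, 301_000, 4.0), ("chr1", 400_000, 401_000, 6.0),
         ("chr2", 10_000, 11_000, 3.0), ("chr2", 100_000, 101_000, 5.0)]
supers, typical = call_super_enhancers(peaks, tss=[("chr1", 1000)])
```

In this toy input, the three clustered chr1 peaks stitch into a single super-enhancer with three constituents, and the five isolated distal peaks remain typical enhancers.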
H3K27ac was increased at 360 peaks within active super-enhancers following ARID1A loss (Fig. 6.2m). Compared to typical enhancers, super-enhancers also displayed a greater proportion of sites with increased chromatin accessibility upon ARID1A loss (Fig. 6.2n). Enhancer RNA (eRNA) transcription promotes enhancer activity through enhancer-promoter communication and chromatin looping, and eRNA is associated with super-enhancers. We explored eRNA at ARID1A-regulated enhancers as a marker of activity and, among 3668 intergenic enhancers with detectable eRNA expression, observed 157 differentially expressed eRNAs upon ARID1A loss (Fig. 6.2o). Among these, ARID1A binding was stronger at sites with upregulated eRNA following ARID1A loss (Fig. 6.2p). Furthermore, upregulated eRNAs were associated with increased H3K27ac (Fig. 6.2q-r). Collectively, these data support a role for ARID1A in restricting super-enhancer activity, such that ARID1A loss results in H3K27 hyperacetylation, increased chromatin accessibility, and increased eRNA expression.

Figure 6.2 ARID1A prevents H3K27 hyperacetylation at super-enhancers

a, Map of chromatin state changes following ARID1A loss. For each state-state change, circle size depicts the relative amount of that state change compared to the initial genome-wide state representation ([genomic bp initial → final] / [genomic bp initial]), and color indicates the proportion bound by ARID1A.
b, Scatterplot of the two features quantified in a for each state-state change. Each dot representing a state-state change is further colored by its initial state class: S1–S5, “else”; S6–S10, “promoter”; S11–S13, “SE” (i.e., super-enhancer); S14–S18, “enhancer.” Prominent state-state changes are labeled as [initial] > [final].
c, Proportion of genome-wide regions displaying significant (FDR < 0.05) differential abundance following ARID1A loss for each histone modification.
Tested regions are the union of replicate-overlapping peak sets per assay. Pairwise statistic is two-tailed Fisher’s exact test.
d, Gene proximity and directionality of significant differential H3K27ac sites (FDR < 0.05, n = 8,314).
e, Genomic enrichment for (left) increasing H3K27ac or (right) decreasing H3K27ac following ARID1A loss at each chromatin state compared to the whole genome. Statistic is hypergeometric enrichment.
f, Map of chromatin state changes as in a, but the overlaid color feature is the proportion of state-state base pairs displaying increasing H3K27ac.
g, Map of chromatin state changes as in f, but for decreasing H3K27ac.
h, Distribution of genomic features of all tested H3K27ac regions compared to differential regions (total, increasing, or decreasing). Statistic is the chi-squared test.
i, Enrichment of differential H3K27ac among promoters, typical enhancers (TE), or super-enhancers (SE), compared to all tested H3K27ac regions. Statistic is hypergeometric enrichment and pairwise two-tailed Fisher’s exact test.
j, Proportion of increasing versus decreasing H3K27ac at significant differential regions binned by promoter, SE, and TE, compared to all differential regions. Statistic is hypergeometric enrichment.
k, Percentage of active SE (n = 413) with at least 1 H3K27ac peak displaying differential H3K27ac upon ARID1A loss.
l, Number of differential H3K27ac regions per SE depicted as a boxplot in the style of Tukey (top) or a histogram (bottom). The median number of differentially acetylated regions per SE is 1.
m, Signal heatmap at distal H3K27ac peaks located within super-enhancers, segregated by differential H3K27ac status: increasing (n = 360), decreasing (n = 188), or stable (n = 880). Each peak subset is ranked by H3K27ac signal in the control cells. Delta corresponds to H3K27ac log2 fold change (log2FC) from shARID1A versus control: red values, increased H3K27ac; blue, decreased H3K27ac.
n, Proportion of increasing vs.
decreasing differential ATAC regions located within super-enhancers and typical enhancers following ARID1A loss. Statistic is two-tailed Fisher’s exact test.
o, Volcano plot displaying DE intergenic eRNA (n = 3668) following ARID1A loss. Intergenic eRNA regions were selected from the 18,050 distal ATAC + H3K27ac peaks (Fig. E.2a) that did not overlap gene bodies and had detectable RNA. The x-axis is eRNA log2FC expression upon ARID1A loss; the y-axis is DE significance. Significant (p < 0.05) DE eRNAs are marked in red. The pie chart displays the ratio of intergenic eRNAs significantly increasing or decreasing in expression upon ARID1A loss.
p, ARID1A binding at intergenic enhancer sites with decreasing (n = 83) or increasing (n = 74) eRNA expression. Statistic is the two-tailed, unpaired Wilcoxon test.
q, Change (log2FC) in H3K27ac abundance at intergenic sites of increasing (n = 74) or decreasing (n = 83) eRNA expression following ARID1A loss. Statistic is two-tailed, unpaired Wilcoxon test.
r, Change (log2FC) in eRNA expression at intergenic enhancer sites with increasing (n = 577) or decreasing (n = 686) H3K27ac upon ARID1A loss. Statistic is two-tailed, unpaired Wilcoxon test.
* p < 0.05, ** p < 0.01, and *** p < 0.001.

6.3.3 ARID1A and P300 co-occupy highly active super-enhancers

Having observed a role for ARID1A in preventing H3K27 hyperacetylation at the majority of super-enhancers, we next asked whether ARID1A is associated with P300, a histone acetyltransferase (HAT) that acetylates H3K27 and H3K18 residues (Jin et al. 2011; Schiltz et al. 1999) and has known roles at super-enhancers (Pott and Lieb 2015; Hnisz et al. 2013). We used the Enrichr tool to screen ENCODE (Encyclopedia of DNA Elements) ChIP-seq datasets for factors with overlapping sets of target genes (Chen, Tan, et al. 2013; Kuleshov et al. 2016) and identified P300 as the top factor likely to co-regulate DE genes following endometrial ARID1A loss (Fig. E.3a-b).
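Many of the pairwise comparisons in Figs. 6.1 and 6.2 (for example ARID1A binding at super-enhancers versus typical enhancers in Fig. 6.1i, or increasing versus decreasing ATAC sites in Fig. 6.2n) use a two-tailed Fisher's exact test on a 2×2 table. A dependency-free Python sketch follows. It sums the probabilities of all tables with the same margins that are no more likely than the observed table, with the small relative tolerance for ties that R's `fisher.test` also applies. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of the table with top-left cell x and the same margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Relative tolerance so tables tied with the observed one are counted despite rounding.
    p = sum(pr for pr in map(table_prob, range(lo, hi + 1)) if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: [[bound SE, unbound SE], [bound TE, unbound TE]].
print(round(fisher_exact_two_sided(1, 9, 11, 3), 6))  # 0.002759
```

Python's exact integer `math.comb` keeps every table probability precise for moderate counts; for tables with tens of thousands of peaks, a log-space version like the hypergeometric sketch is safer.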
We performed P300 ChIP-seq in wild-type 12Z cells and identified 25,096 P300 binding sites throughout the genome, enriched within several chromatin states (Fig. 6.3a-b). Intriguingly, P300 is more strongly associated with active TSS (S10) than ARID1A is (Fig. 6.3b, compared to Fig. 6.1g), and P300 binding was enriched at promoters, where it was frequently co-bound with ARID1A (Fig. E.3c-m). Known roles for P300 in enhancer regulation (Long, Prescott, and Wysocka 2016) led us to study ARID1A and P300 co-regulation at distal sites. We observed 2,609 distal sites with both ARID1A and P300 binding (Fig. 6.3c). Chromatin accessibility marks sites of regulatory activity (Kornberg and Lorch 1992), and ARID1A is associated with open chromatin states (Kelso et al. 2017; Wilson et al. 2019). Among accessible, P300-bound sites, ARID1A co-binding is more frequent at distal sites than at promoters (Fig. 6.3d). Chromatin remodeling enzymes regulate both the recruitment and the catalytic activity of histone-modifying enzymes (Clapier and Cairns 2009; Swygert and Peterson 2014). Given the changes in H3K27ac in ARID1A-deficient cells, we used ChIP-seq to test whether P300 localization was affected by ARID1A loss. P300 binding was unchanged following ARID1A loss at >99% of sites (Fig. 6.3e), suggesting that ARID1A loss does not substantially affect P300 recruitment. We then explored the role of ARID1A and P300 co-localization at enhancers. Among the 18,050 putative active enhancers, most were bound by ARID1A without P300 (Fig. 6.3f-g). However, enhancers co-bound by ARID1A and P300 displayed greater H3K27ac peak signal and broader H3K27ac peaks (Fig. 6.3h-i). Among enhancers displaying differential H3K27ac, ARID1A was again bound without P300 at the majority of sites (Fig. 6.3j), although the enrichment of differential H3K27ac did not differ significantly between enhancers bound by ARID1A with or without P300 (Fig. 6.3k).
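The four-way split of active enhancers by ARID1A and P300 binding (Fig. 6.3g) reduces to interval overlap. The following is a minimal sketch, assuming BED-style half-open intervals and treating any shared base pair as binding; the helper names and toy coordinates are hypothetical. The linear scan is for clarity only; genome-wide peak sets would normally be intersected with an indexed tool such as `bedtools intersect`.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Half-open genomic interval (chromosome, start, end), as in BED files.
Interval = Tuple[str, int, int]


def overlaps_any(region: Interval, peaks: Iterable[Interval]) -> bool:
    """True if `region` shares at least one base with any peak on the same chromosome."""
    chrom, start, end = region
    return any(c == chrom and p_start < end and start < p_end for c, p_start, p_end in peaks)


def classify_enhancers(
    enhancers: List[Interval], arid1a: List[Interval], p300: List[Interval]
) -> Counter:
    """Count enhancers bound by ARID1A only, P300 only, both, or neither."""
    labels = {
        (True, True): "both",
        (True, False): "ARID1A only",
        (False, True): "P300 only",
        (False, False): "neither",
    }
    return Counter(labels[(overlaps_any(e, arid1a), overlaps_any(e, p300))] for e in enhancers)


# Hypothetical toy intervals, not data from this study.
enhancers = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr2", 100, 200)]
arid1a_peaks = [("chr1", 150, 160), ("chr1", 550, 570)]
p300_peaks = [("chr1", 190, 250), ("chr1", 350, 360)]
print(classify_enhancers(enhancers, arid1a_peaks, p300_peaks))
```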
We next considered the role of P300 binding and ARID1A co-regulation at super-enhancers and found that P300 binding was observed at a greater proportion of distal super-enhancer peaks than typical enhancer peaks (Fig. 6.3l-m). Among P300-bound sites, differential H3K27ac following ARID1A loss was more frequently observed at super-enhancers than at typical enhancers (Fig. 6.3n). Furthermore, among P300-bound sites with differential H3K27ac, a greater proportion of super-enhancers than typical enhancers displayed increased H3K27ac (Fig. 6.3o). Lastly, we compared ARID1A, P300, and H3K27ac levels at super-enhancers versus typical enhancers that are either P300-bound or not bound. Among P300-bound enhancers, P300 binding is stronger at typical enhancers than at super-enhancers (Fig. 6.3p). However, ARID1A binding is stronger at P300-bound super-enhancers than at P300-bound typical enhancers (Fig. 6.3q), and H3K27ac signal is highest at P300-bound super-enhancers (Fig. 6.3r). These results collectively suggest that ARID1A normally co-regulates H3K27ac with P300 at super-enhancers in the endometrium.
Figure 6.3 P300 and ARID1A co-regulate H3K27ac at highly active super-enhancers
a, Genomic annotation of replicate-overlapping P300 ChIP broad peaks in wild-type 12Z cells (FDR < 0.05, n = 25,096 peaks). b, Enrichment for P300 binding (control cells) among chromatin states compared to the whole genome. Statistic is hypergeometric enrichment. c, Proportional Euler diagram displaying overlap between distal regions bound by ARID1A (n = 42,165) and P300 (wild-type cells, n = 17,812). d, ARID1A binding among accessible P300-bound sites. P300-bound sites (wild-type cells) were first segregated by promoter versus distal status, then filtered for accessibility (ATAC). Statistic is two-tailed Fisher’s exact test. e, Differential P300 ChIP-seq following ARID1A loss. At left is an MA plot revealing differential binding, with significant sites (FDR < 0.05) highlighted in red.
The x-axis is signal abundance quantified as log2 counts per million (log2CPM), and the y-axis is the log2FC difference of P300 binding in shARID1A versus control conditions (n = 2 ChIP replicates). At right is the ratio of tested sites binned by differential binding significance. Further analyses of P300 binding use the control condition data (f–r). f, Signal heatmap displaying chromatin accessibility (ATAC), H3K27ac, and binding of ARID1A and P300 at enhancers (n = 18,050), centered on H3K27ac peak ±3 kb. Enhancers were ranked by total H3K27ac signal. g, Proportion of active enhancers (n = 18,050) bound by ARID1A, P300, both, or neither. h, H3K27ac ChIP peak signal (fold enrichment, FE) relative to input at active enhancers segregated by ARID1A and P300 binding. Statistic is two-tailed, unpaired Wilcoxon test. i, H3K27ac ChIP peak width at active enhancers segregated by ARID1A and P300 binding. Statistic is two-tailed, unpaired Wilcoxon test. j, Ratio of enhancers (n = 18,050) displaying differential H3K27ac following ARID1A loss (left), and further segregation by ARID1A and P300 binding status (n = 4681) (right). k, Proportion of differential H3K27ac regions among enhancers bound by ARID1A, P300, both, or neither. Statistic is two-tailed Fisher’s exact test. l, P300 ChIP signal at distal super-enhancer and typical enhancer H3K27ac peaks. The x-axis is the distance to the H3K27ac peak center. The y-axis is signal as ChIP - Input RPM per base pair per peak. m, Proportion of distal super-enhancer and typical enhancer H3K27ac peaks bound by P300. Statistic is two-tailed Fisher’s exact test. n, Proportion of P300-bound super-enhancer and typical enhancer regions displaying differential H3K27ac upon ARID1A loss. Statistic is two-tailed Fisher’s exact test. o, Proportion of increasing versus decreasing H3K27ac at differential super-enhancer and typical enhancer regions bound by P300. Statistic is two-tailed Fisher’s exact test. 
p-r, Violin plots (left) of ChIP signal for P300 (p), ARID1A (q), and H3K27ac (r) at distal H3K27ac peaks in super-enhancer and typical enhancer regions further binned by P300 binding. Peak n’s from left to right: 415, 1015, 3508, and 13,112. Statistic is two-tailed, unpaired Wilcoxon test. Meta peak profiles (right) for P300 (p), ARID1A (q), and H3K27ac (r) at P300-bound super-enhancers (entire super-enhancer region, n = 329) and P300-bound typical enhancers (n = 3508). * p < 0.05, ** p < 0.01, and *** p < 0.001.
6.3.4 P300 histone acetyltransferase activity is required for ARID1A mutant cell invasion
ARID1A loss in the endometrial epithelium leads to collective invasion when combined with an oncogenic PIK3CA mutation (Wilson et al. 2019). To explore the functional relationship between ARID1A and P300, we used small interfering RNAs (siRNAs) targeting P300 (siP300) or ARID1A (siARID1A), or non-targeting siRNAs (control). Knockdown of ARID1A and/or P300 in 12Z cells (Fig. 6.4a) had no effect on cell growth or proliferation (Fig. E.4a-b). ARID1A loss increased cell invasion and P300 loss alone had no effect, but co-knockdown of ARID1A and P300 completely rescued ARID1A mutant cell invasion (Fig. 6.4b). Invasion was not observed in 12Z cells treated with broad-spectrum histone deacetylase inhibitors, suggesting that invasion does not depend solely on a global increase in histone acetylation (Fig. E.4c-d). These results demonstrate an essential role for P300 in driving invasive phenotypes in ARID1A mutant endometriotic cells. To determine whether P300 loss rescues the invasive phenotype in vivo, we crossed Ep300 conditional knockout mice (Kasper et al. 2006) with our LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl model to generate LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice (Fig. E.4e-f). LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice displayed increased survival compared to LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (Fig. 6.4c).
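The median survival values reported for these crosses (107 versus 143 days; Fig. 6.4c) are Kaplan-Meier medians. The sketch below computes a Kaplan-Meier estimate and its median from time-to-endpoint data, assuming an event code of 1 for an observed endpoint and 0 for censoring; the function names and toy times are hypothetical, and the Mantel-Cox (log-rank) p-value reported in the figure is not computed here. Exact fractions are used so that the comparison at the 0.5 threshold is not affected by floating-point rounding.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, Fraction]]:
    """Return (time, survival) steps; events[i] is 1 for an observed endpoint, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all animals sharing this time point; endpoints and censorings at t
        # are both still counted in the risk set when the survival step is computed.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - Fraction(deaths, at_risk)
            steps.append((t, survival))
        at_risk -= removed
    return steps


def median_survival(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Earliest time at which the Kaplan-Meier estimate falls to 0.5 or below (None if never)."""
    for t, s in kaplan_meier(times, events):
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical toy cohort: days to endpoint, with 0 marking a censored animal.
print(median_survival([1, 2, 2, 3, 5], [1, 0, 1, 1, 0]))  # 3
```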
LtfCre0/+; Ep300fl/fl mice displayed no phenotype (Fig. 6.4c and E.4g). Immunohistochemistry (IHC) confirmed loss of P300 expression in the endometrial epithelium of LtfCre0/+; Ep300fl/fl and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice (Fig. 6.4d). Increased expression of the apoptotic marker cleaved caspase 3 was observed in the endometrial epithelium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice, but not LtfCre0/+; Ep300fl/fl mice, indicating a specific effect of P300 loss on ARID1A and PIK3CA mutant endometrium (Fig. 6.4d). The epithelial layer in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice appeared to desquamate from the endometrial stroma, and cleaved caspase 3+ desquamated epithelial cells were observed throughout the uterine lumen (Fig. E.4h). P300 loss suppressed the proliferation occurring in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl epithelia (Fig. E.4i). Whereas LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrial epithelia invade the myometrium, endometrial glands were not observed in the myometrium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice. Similar to ARID1A-deficient 12Z cells, we observed loss of H3K27ac, but not H3K18ac, in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice, suggesting that P300 loss leads to a specific reduction of H3K27ac in endometrial epithelial cells (Fig. 6.4d-e). These results implicate P300 HAT activity in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl lesion development. We next explored the role of P300 HAT activity in promoting ARID1A mutant phenotypes. A-485 is a small-molecule inhibitor of the HAT activity of P300 and CREB-binding protein (CBP) (Lasko et al. 2017; Weinert et al. 2018). We tested the efficacy of A-485 in 12Z cells and observed a dose-dependent reduction in H3K27ac, with significant inhibition at 316 nM (Fig. 6.4f).
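Fig. 6.4f reports H3K27ac densitometry relative to total H3 and normalized to vehicle. One plausible way to carry out that normalization is sketched below, assuming each replicate's H3K27ac/H3 ratio is divided by the mean vehicle ratio; pairing each treated lane with the vehicle lane on the same blot would be an equally reasonable choice, and the dissertation does not specify which was used. The function name and band intensities are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def normalize_to_vehicle(
    bands: Dict[str, List[Tuple[float, float]]], vehicle: str = "vehicle"
) -> Dict[str, Tuple[float, float]]:
    """Express H3K27ac/H3 band ratios relative to the mean vehicle ratio.

    bands maps condition -> list of (H3K27ac intensity, total H3 intensity), one tuple per replicate.
    Returns condition -> (mean relative ratio, SD); SD is 0.0 when only one replicate exists.
    """
    # Correct each lane for loading by dividing the mark by total H3.
    ratios = {cond: [mark / h3 for mark, h3 in reps] for cond, reps in bands.items()}
    reference = mean(ratios[vehicle])
    summary = {}
    for cond, values in ratios.items():
        relative = [v / reference for v in values]
        summary[cond] = (mean(relative), stdev(relative) if len(relative) > 1 else 0.0)
    return summary


# Hypothetical band intensities (arbitrary units), not data from this study.
bands = {
    "vehicle": [(100.0, 100.0), (120.0, 100.0)],
    "A-485": [(55.0, 100.0), (44.0, 100.0)],
}
print(normalize_to_vehicle(bands))
```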
In both ARID1A-deficient and wild-type cells, A-485 had a limited effect on cell growth and viability (Fig. 6.4g and E.5a-d). These results suggest that A-485 inhibits P300 HAT activity at low concentrations without affecting cell health. Next, we tested whether A-485 inhibits P300-dependent, ARID1A mutant invasive phenotypes. A-485 significantly reduced ARID1A mutant invasion at concentrations that did not inhibit cell growth, with significant decreases in invasion at 10 nM and a complete rescue of the phenotype at 100 nM (Fig. 6.4h and E.5e), while the migration phenotype was inhibited at 31 nM and completely rescued at 316 nM (Fig. E.5f). Because apoptosis is induced in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice, we considered that P300 HAT inhibition may lead to anoikis, the apoptotic cell death triggered by loss of cell anchorage (Paoli, Giannoni, and Chiarugi 2013). We tested whether A-485 induces anoikis under non-adherent conditions and observed increased caspase 3/7 activity in ARID1A-deficient cells following A-485 treatment, suggesting that A-485 induces anoikis (Fig. 6.4i). Furthermore, we observed increased cell death among ARID1A-deficient cells embedded in Matrigel following A-485 treatment (Fig. 6.4j). In mice, coexisting mutations in ARID1A and PIK3CA are required for lesion formation (Wilson et al. 2019), so we asked whether the effect of A-485 in ARID1A-deficient cells was modulated by phosphatidylinositol 3-kinase (PI3K) activation. In ARID1A-deficient cells overexpressing PIK3CAH1047R, we observed similar inhibition of invasion and migration and induction of anoikis (Fig. E.6). These results suggest that inhibition of P300 HAT activity by low-dose A-485 blocks invasion and promotes anoikis of ARID1A-deficient endometriotic cells.
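The A-485 concentrations quoted above (10, 31, 100, and 316 nM, and 1 μM) are consistent with a half-log dilution series, in which each dose is 10^0.5 (about 3.16) times the previous one; this reading is an assumption, since the dilution scheme is not stated. The sketch generates such a series and estimates an IC50 by linear interpolation on log10(dose). It is a simpler stand-in for the nonlinear dose-response fitting typically used to derive IC50 values such as those compared in Fig. 6.4g, and the function names and example responses are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def half_log_series(start: float, n: int) -> List[float]:
    """n doses spaced by half a log10 unit: start, start * 10**0.5, start * 10, ..."""
    return [start * 10 ** (i / 2) for i in range(n)]


def ic50_log_interpolation(
    doses: Sequence[float], response_pct: Sequence[float], threshold: float = 50.0
) -> Optional[float]:
    """Dose at which the response (% of vehicle) first crosses `threshold`,
    interpolated linearly between the bracketing doses on a log10 scale."""
    points = sorted(zip(doses, response_pct))
    for (d1, r1), (d2, r2) in zip(points, points[1:]):
        crosses = (r1 - threshold) * (r2 - threshold) <= 0
        if crosses and r1 != r2:
            x1, x2 = math.log10(d1), math.log10(d2)
            return 10 ** (x1 + (threshold - r1) * (x2 - x1) / (r2 - r1))
    return None  # the response never crosses the threshold


print([round(d) for d in half_log_series(10, 5)])  # [10, 32, 100, 316, 1000]
# Hypothetical viability readings (% of vehicle) at three doses.
print(ic50_log_interpolation([1, 10, 100], [80, 60, 40]))
```

Interpolating on log10(dose) rather than on raw concentration matches how dose-response curves are usually plotted and keeps half-log steps evenly spaced.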
Figure 6.4 P300 promotes invasion and survival of ARID1A mutant endometrial epithelia
a, Western blot analysis as indicated in 12Z cells, representative of 2 independent experiments. b, Invasion of 12Z following indicated treatments. Representative images and total invaded cell numbers are shown (scale bar, 500 μm). Means ± SDs, n = 3. Statistic is unpaired, two-tailed t-test. c, Survival of mice based on time until vaginal bleeding. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 16) median (μ1/2) 107 days. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl (n = 12) median 143 days (p = 0.0018, Mantel-Cox test). LtfCre0/+; Ep300fl/fl mice were aged to 187 days, and no phenotypes were observed (n = 6). d, Histology and IHC using indicated antibodies (n ≥ 2 mice) in endometrium (scale bar, 200 μm). KRT8 was a positive control for endometrial epithelium. The arrowheads indicate epithelia. e, Quantification of H3K27ac and H3K18ac IHC, ratio of H-scores of epithelia to stroma. Means ± SDs, n = 4-8 mice. Statistic is unpaired, two-tailed t-test. f, Western blot of H3K27ac following A-485 treatment of 12Z for 24 hours and densitometry of H3K27ac relative to H3, normalized to control (vehicle). Means ± SDs, n = 3-5 independent replicates per condition. Unpaired, two-tailed t-tests were performed in comparison to the vehicle treatment condition. g, Viability assay for cells treated with A-485, normalized cell counts relative to vehicle control. Raw data are presented in Fig. E.5c. Half-maximal inhibitory concentration (IC50) values were not significantly different between 12Z untreated and control shRNA, or between control shRNA and shARID1A (unpaired, two-tailed t-test). Means ± SDs, n = 4. h, Invasion of 12Z following indicated cell treatments. Representative images and total invaded cell numbers are shown (scale bar, 500 μm). Means ± SDs, n = 4. Unpaired, two-tailed t-tests performed in comparison to siARID1A + vehicle.
i, Caspase 3/7 activity of indicated cell treatments under non-adherent culture conditions. Means ± SDs, n = 3. Statistic is unpaired, two-tailed t-test. j, Ratio of dead to live cells after 16 hours in Matrigel. Means ± SDs, n = 6. Statistic is unpaired, two-tailed t-test. * p < 0.05, ** p < 0.01, and *** p < 0.001.
6.3.5 P300 HAT inhibition reverses H3K27 hyperacetylation at a subset of super-enhancers in ARID1A-deficient endometrial cells
To explore how ARID1A and P300 co-regulate H3K27ac, we used the targeted genome profiling approach cleavage under targets and release using nuclease (CUT&RUN) (Skene, Henikoff, and Henikoff 2018). H3K27ac CUT&RUN showed significant overlap with H3K27ac ChIP-seq (Fig. 6.5a). To determine the effects of P300 loss or HAT inhibition on H3K27ac in ARID1A-deficient cells, we compared differential H3K27ac between 12Z cells treated with siARID1A versus control, and between cells co-treated with siARID1A + siP300 or siARID1A + 1 μM A-485 versus siARID1A alone (Fig. 6.5b). Notably, the genome-wide effects of 1 μM A-485 on H3K27ac in siARID1A cells highly overlapped with those of siP300, validating that A-485 affects P300 targets. We identified 6521 regions of H3K27ac that were affected by ARID1A loss and further affected by P300 loss or A-485 treatment (Fig. 6.5c). Among these 6521 intersecting regions, most H3K27ac sites showed an additive increase or decrease in H3K27ac with combination treatments: decreased acetylation following ARID1A loss and further decreases with P300 loss or inhibition (n = 3005), or increased acetylation following ARID1A loss and further increases with P300 loss or inhibition (n = 1455) (Fig. 6.5d-e). However, a subset of sites displayed increased H3K27ac following ARID1A loss that was reversed by additional P300 loss or A-485 treatment (“gain reversal,” n = 1132) (Fig. 6.5d-e). Interestingly, the gain reversal sites had the lowest levels of H3K27ac in control cells compared to other groups (Fig.
6.5f), suggesting ARID1A normally limits acetylation at these sites. Furthermore, a large proportion of gain reversal regions were bound by ARID1A, whereas “acetylation gain” sites were infrequently bound by ARID1A (Fig. 6.5g). Genomic annotation further showed that gain reversal sites were found at intergenic regions and introns, were enriched for super-enhancers and other highly active enhancer chromatin states, and contained the highest proportion of active super-enhancer regions (Fig. 6.5h-k), suggesting that gain reversal sites contain super-enhancer elements at which ARID1A antagonizes P300 HAT activity toward H3K27. To understand how increased P300 HAT activity affects transcriptional processes in ARID1A-deficient cells, we performed RNA-seq following knockdown of P300, ARID1A, or both in 12Z cells. We used the GeneHancer database (Fishilevich et al. 2017) to link differential H3K27ac regions to their putative target genes (Fig. 6.5l). Genes linked to the gain reversal cluster were enriched for genes differentially expressed following ARID1A knockdown versus control and following ARID1A and P300 co-knockdown versus ARID1A knockdown alone (Fig. 6.5m). Specifically, genes linked to gain reversal regions were more likely to be upregulated following ARID1A knockdown relative to control and downregulated following ARID1A and P300 co-knockdown relative to ARID1A knockdown alone (Fig. 6.5n-o). We reasoned that upregulated genes driving ARID1A-deficient invasion would be rescued upon P300 loss or A-485 treatment. To narrow these down to a smaller subset of genes responsible for P300-dependent invasion in ARID1A-deficient cells, we performed additional RNA-seq using 100 nM A-485, a lower dose that neither affects cell health nor reduces global H3K27ac, but significantly inhibits invasion and migration (Fig. 6.4f-h, E.4b, and E.5f).
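The four H3K27ac classes (Fig. 6.5d-e) are defined by the direction of change after ARID1A loss and after additional P300 loss or A-485 treatment. The study grouped regions by clustering log2FC values; the sketch below approximates that grouping with a simple sign rule and is illustrative only. The function name and the “discordant” and “unclassified” labels are hypothetical additions for regions that a sign rule cannot place.

```python
from typing import Sequence


def h3k27ac_class(lfc_arid1a: float, lfc_followups: Sequence[float]) -> str:
    """Assign a region to a directionality class by the signs of its log2FCs.

    lfc_arid1a: H3K27ac log2FC for siARID1A vs. control.
    lfc_followups: log2FCs for each follow-up comparison vs. siARID1A alone
                   (e.g., siARID1A + siP300 and siARID1A + A-485).
    """
    if lfc_arid1a == 0 or not lfc_followups or any(x == 0 for x in lfc_followups):
        return "unclassified"
    if len({x > 0 for x in lfc_followups}) > 1:
        return "discordant"  # the follow-up treatments disagree in direction
    gained_after_arid1a_loss = lfc_arid1a > 0
    followup_increases = lfc_followups[0] > 0
    return {
        (False, False): "acetylation loss",   # down with siARID1A, further down
        (True, True): "acetylation gain",     # up with siARID1A, further up
        (True, False): "gain reversal",       # up with siARID1A, reversed by P300 loss/inhibition
        (False, True): "loss reversal",       # down with siARID1A, reversed by P300 loss/inhibition
    }[(gained_after_arid1a_loss, followup_increases)]


print(h3k27ac_class(1.2, [-0.8, -0.5]))  # gain reversal
```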
While P300 loss resulted in the differential expression of 2657 genes (FDR < 0.0001), 100 nM A-485 treatment resulted in the differential expression of only 566 genes, suggesting a more specific effect (Fig. 6.6a). Genes concordantly misregulated by siP300 and A-485 overlapped significantly, providing additional validation of this approach (Fig. 6.6a). To determine gene regulation by super-enhancers, we defined 3 groups of super-enhancer-regulated genes: active genes with a promoter directly within a super-enhancer (Fig. 6.6b) (Whyte et al. 2013), active genes with a promoter within 50 kb of a super-enhancer (Fig. 6.6c) (Sanyal et al. 2012), and active genes linked to super-enhancers through the GeneHancer database (Fig. 6.6d). In all cases, super-enhancer-regulated genes were enriched among DE genes with ARID1A loss and further P300 loss or HAT inhibition (Fig. 6.6b-d). To identify genes implicated in ARID1A mutant invasion, we compared overlapping gene sets from the siARID1A versus control, siARID1A + siP300 versus siARID1A, and siARID1A + 100 nM A-485 versus siARID1A comparisons and identified a set of 138 “triple intersect” genes (Fig. 6.6e). These correspond to genes affected by ARID1A loss and further affected by P300 loss or inhibition of P300 HAT activity. This gene set was enriched for the hallmark epithelial-to-mesenchymal transition pathway and Gene Ontology (GO) gene sets related to invasive phenotypes (Fig. 6.6f-g). Among the 138 triple intersect genes, we identified 50 genes that were upregulated by ARID1A loss and further suppressed by P300 loss or low-dose A-485-mediated HAT inhibition (Fig. 6.6h). Of these, 16 genes were associated with H3K27ac gain reversal enhancers, and 3 gene loci had associated super-enhancer elements. Only SERPINE1 met all three criteria: reversal of ARID1A loss-driven upregulation by 100 nM A-485, association with H3K27ac gain reversal enhancer elements, and regulation by a super-enhancer.
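Two pieces of gene-set logic in this paragraph can be illustrated with a short sketch: assigning a gene to a super-enhancer by the position of its TSS (inside the super-enhancer, or within 50 kb of it), and taking the triple intersect of DE gene sets before keeping genes whose ARID1A loss-driven upregulation is reversed by A-485. The sketch assumes BED-style half-open coordinates and a single TSS per gene; the function names and gene labels are hypothetical, and the GeneHancer-based linkage is not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def se_category(tss: Tuple[str, int], super_enhancers: List[Interval], window: int = 50_000) -> Optional[str]:
    """Label a TSS as inside a super-enhancer, within `window` bp of one, or neither (None)."""
    chrom, pos = tss
    nearest = None
    for c, start, end in super_enhancers:
        if c != chrom:
            continue
        if start <= pos < end:
            return "inside SE"
        # Distance from the TSS to the nearest base of this super-enhancer.
        gap = start - pos if pos < start else pos - (end - 1)
        nearest = gap if nearest is None else min(nearest, gap)
    if nearest is not None and nearest <= window:
        return f"within {window // 1000} kb"
    return None


def triple_intersect_reversed(
    de_arid1a: Set[str], de_sip300: Set[str], de_a485: Set[str],
    lfc_arid1a: Dict[str, float], lfc_a485: Dict[str, float],
) -> Tuple[Set[str], Set[str]]:
    """Return (genes DE in all three comparisons, subset up with siARID1A and down with A-485)."""
    triple = de_arid1a & de_sip300 & de_a485
    reversed_up = {g for g in triple if lfc_arid1a[g] > 0 and lfc_a485[g] < 0}
    return triple, reversed_up


se = [("chr7", 100_000, 120_000)]  # hypothetical super-enhancer
print(se_category(("chr7", 160_000), se))  # within 50 kb
```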
Figure 6.5 ARID1A antagonizes P300 HAT activity at a subset of active super-enhancers
a, Comparison of H3K27ac CUT&RUN and ChIP-seq. Left, pie chart displaying the proportion of H3K27ac ChIP-seq replicate-overlapping peaks (n = 40,019) identified by CUT&RUN vs. not identified. Center, CUT&RUN signal at replicate-overlapping peaks quantified by -log10(FDR), displayed as a boxplot in the style of Tukey with outliers. CUT&RUN peaks are further segregated by whether they were also identified by ChIP-seq. Statistic is unpaired, two-tailed Wilcoxon test. Right, correlation of CUT&RUN vs. ChIP signal at 37,803 consensus peaks identified by ChIP used for differential analysis. RPKM signal values are further log2-transformed for plotting. Statistics are Pearson and Spearman correlations. b, MA plots for H3K27ac CUT&RUN comparisons: left, siARID1A vs. control; center, siARID1A + siP300 vs. siARID1A; right, siARID1A + 1 μM A-485 vs. siARID1A. A total of 37,803 consensus peaks previously identified by H3K27ac ChIP were used for differential testing, and significant (FDR < 0.01) regions are marked in red. c, Proportional Euler diagrams displaying overlapping differential H3K27ac regions between the comparisons in b. Statistic is hypergeometric enrichment. d, Clustering of H3K27ac log2FC values among 6521 intersect regions shown in c. H3K27ac classes are defined by directionality patterns. e, Diagrammatic explanation of H3K27ac classes identified in (d). “Acetylation loss” sites (n = 3005) display decreasing H3K27ac with siARID1A and further decrease with siP300 or 1 μM A-485 treatment. “Acetylation gain” sites (n = 1455): increasing H3K27ac with siARID1A and further increase with siP300 or 1 μM A-485 treatment. “Gain reversal” sites (n = 1132): increasing H3K27ac with siARID1A and decrease with further siP300 or 1 μM A-485 treatment. “Loss reversal” sites (n = 629): decreasing H3K27ac with siARID1A that increase with siP300 or 1 μM A-485 treatment.
f, H3K27ac ChIP-seq signal quantification at intersect regions vs. all other regions, and at the five intersect region classes defined in d-e. Statistic is unpaired, two-tailed Wilcoxon test. g, Genomic enrichment for ARID1A binding at H3K27ac intersect regions and subclasses. Statistic is hypergeometric enrichment. h, Genomic annotation of H3K27ac regions and intersect subclasses. Statistic is the chi-squared test. i, Genomic enrichment for H3K27ac intersect region classes at each chromatin state, compared to the whole genome. Statistic is hypergeometric enrichment. j, Map of chromatin state changes following ARID1A loss overlaid by the proportion of state-state base pairs displaying acetylation gain reversal as the color feature. k, Enrichment for H3K27ac intersect regions and subclasses at (top) active distal super-enhancer peaks and (bottom) active typical enhancer peaks. Statistic is hypergeometric enrichment. l, Diagram of GeneHancer database usage to associate H3K27ac enhancer regions with genes. m, Enrichment for differential gene expression following (top) siARID1A or (bottom) siP300 (in siARID1A cells) treatment among expressed genes associated with H3K27ac enhancer regions by GeneHancer. Statistic is hypergeometric enrichment. n, Enrichment for (left) upregulated vs. (right) downregulated genes following (top) siARID1A or (bottom) siP300 (in siARID1A cells) treatment among enhancer-associated genes as in m. The statistic is hypergeometric enrichment. o, Distribution of upregulated vs. downregulated genes in enhancer-associated gene classes as in m-n for (left) siARID1A or (right) siP300 (in siARID1A cells) DE genes. Statistic is hypergeometric enrichment.
Figure 6.6 Inhibition of P300 HAT activity reverses the expression of a subset of ARID1A-regulated genes
a, Proportional Euler diagram displaying concordant, overlapping DE genes (FDR < 0.0001) by siP300 or 100 nM A-485 treatment (p < 10^-186). Statistic is hypergeometric enrichment.
b-d, Enrichment of DE genes affected by ARID1A loss, P300 loss, or A-485 treatment for (b) genes with active promoters directly inside of super-enhancers (n = 164), (c) promoters within 50 kb of a super-enhancer (n = 496), or (d) genes linked to super-enhancers by the GeneHancer database (n = 1599). Statistic is hypergeometric enrichment. e, Proportional Euler diagram displaying overlap of DE genes (FDR < 0.0001) in indicated comparisons. “Triple intersect” genes refer to the full intersection of all noted gene expression comparisons. f-g, Gene set enrichment analysis for (f) MSigDB Hallmark pathways and (g) Gene Ontology (GO) biological process terms on the DE gene subsets identified in a and e. h, Heatmap for relative expression of triple intersect genes (n = 138, as in e), highlighting genes in which 100 nM A-485 reverses ARID1A loss-driven upregulation (right, n = 50). Red values: increased expression relative to control; blue: decreased expression relative to control. The rightmost columns indicate association with acetylation gain reversal enhancers (Fig. 6.5d-e) or regulation by super-enhancer, in purple. *** p < 0.001.
6.3.6 SERPINE1 promotes ARID1A mutant cell invasion
The serine protease inhibitor SERPINE1 (also known as plasminogen activator inhibitor type 1, PAI-1) is a member of the urokinase plasminogen activator (uPA) system (Smith and Marshall 2010). This system regulates extracellular fibrin proteolysis and influences cell invasion, migration, and extracellular matrix (ECM) remodeling (Duffy 2004). SERPINE1 is a biomarker for endometriosis, with high levels of expression observed in ovarian and deep infiltrating endometriosis (Alotaibi et al. 2019; Gilabert-Estelles et al. 2003; Ramon et al. 2005; Ye et al. 2017).
We examined a published RNA-seq dataset of human endometrial organoids and observed that SERPINE1 was upregulated in organoids derived from ectopic endometrial tissue compared to healthy endometrial tissue (log2FC = 3.86, FDR = 0.051) (Boretto et al. 2019). In 12Z cells, the SERPINE1 super-enhancer was ranked in the top 5% of active super-enhancers (Fig. 6.7a) and displayed H3K27 hyperacetylation upon ARID1A loss, which was reversed by further P300 loss or A-485 treatment (Fig. 6.7b). Notably, SERPINE1 was the most significantly upregulated gene upon ARID1A loss (Fig. 6.7c), and P300 co-knockdown or HAT inhibition reversed the increase in SERPINE1 expression (Fig. 6.7d-e). SERPINE1 was also upregulated in the endometrial epithelium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice (Fig. 6.7f). LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl eutopic endometrial epithelia and ectopic lesions showed increased SERPINE1 by IHC, which was not observed in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice (Fig. 6.7g-h). Among a cohort of deep infiltrating and ovarian endometriosis tissue samples, samples with loss of ARID1A expression displayed the highest expression of SERPINE1 by IHC (Fig. E.7). To determine whether SERPINE1 promotes the invasion of ARID1A-deficient cells, we depleted SERPINE1 via siRNA transfection in 12Z cells (Fig. 6.7i). While SERPINE1 loss alone did not change invasion, it suppressed the invasive phenotype of ARID1A-deficient cells (Fig. 6.7j). SERPINE1 loss had no effect on adherent cell growth (Fig. 6.7k and E.4b). In non-adherent conditions, ARID1A and SERPINE1 co-knockdown resulted in increased caspase 3/7 activity (Fig. 6.7l) and increased death in cells suspended in Matrigel (Fig. 6.7m), indicating that SERPINE1 is required for anoikis resistance in ARID1A mutant cells.
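The IHC quantifications in Fig. 6.4e and Fig. 6.7h are expressed as ratios of epithelial to stromal H-scores. The sketch uses the conventional H-score definition (1 × % weakly stained + 2 × % moderately stained + 3 × % strongly stained cells, range 0 to 300); the dissertation does not state how intensity bins were assigned, so the inputs here are assumed percentages and the function names are hypothetical.

```python
from typing import Tuple

Percentages = Tuple[float, float, float]  # (% weak, % moderate, % strong)


def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Conventional IHC H-score: 1*%weak + 2*%moderate + 3*%strong (range 0-300)."""
    pcts = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in pcts) or sum(pcts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def epithelium_to_stroma(epithelium: Percentages, stroma: Percentages) -> float:
    """Ratio of epithelial to stromal H-score; raises ZeroDivisionError if the stroma is unstained."""
    return h_score(*epithelium) / h_score(*stroma)


# Hypothetical scoring of one section, not data from this study.
print(epithelium_to_stroma((10, 20, 30), (10, 0, 0)))  # 14.0
```

Normalizing to the stroma of the same section partly controls for section-to-section differences in staining intensity, which is presumably why a ratio rather than a raw epithelial H-score is reported.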
These results suggest that ARID1A prevents hyperacetylation of the SERPINE1 super-enhancer in the wild-type state, while ARID1A loss results in P300-dependent hyperacetylation and increased activity of the SERPINE1 super-enhancer, increased SERPINE1 transcription, and the acquisition of invasive phenotypes. P300 inhibition in ARID1A-deficient cells suppresses H3K27 hyperacetylation of the SERPINE1 SE, resulting in decreased SERPINE1 expression and anoikis.
Figure 6.7 Hyperactivation of the SERPINE1 super-enhancer promotes ARID1A mutant cell invasion
a, ROSE ranking of active super-enhancers in 12Z cells (n = 413). The SERPINE1 super-enhancer locus is ranked 20th out of 413 based on basal H3K27ac levels. b, Genomic snapshot of ChIP and ATAC signals alongside differential H3K27ac and chromatin state annotations at the SERPINE1 super-enhancer locus. For signal tracks, the y-axis represents assay signal-to-noise presented as log-likelihood ratio (logLR) as reported by MACS2, and small bars below the tracks represent replicate-overlapping peaks. H3K27ac log2FC colored bars denote significant differential H3K27ac regions (FDR < 0.05 for ChIP, FDR < 0.01 for CUT&RUN). The ROSE-defined active super-enhancer locus is represented by the black bar. c, Significance (log10FDR, y-axis) of DE genes following ARID1A loss, ranked by FDR value (x-axis). SERPINE1 is the most significantly upregulated gene (arrow). d, Expression of SERPINE1 (RNA-seq) following indicated 12Z cell treatments. Means ± SDs, n = 3. The statistic is DESeq2 Wald test, FDR-adjusted. e, Western blot analysis as indicated in 12Z cells, representative of two independent experiments. f, Relative expression of SERPINE1 by RNA-seq. Means ± SDs, n = 3 control mice and n = 4 mutant mice. The statistic is DESeq2 Wald test, FDR-adjusted. g, IHC of SERPINE1 in endometrium of indicated genotypes; n = 4-5 mice per condition.
h, Quantification of IHC staining, ratio of H-scores of epithelia to stroma. Means ± SDs, n = 4-5 mice, unpaired, two-tailed t-test. i, Western blot analysis as indicated in 12Z cells, representative of two independent experiments. j, Invasion of 12Z cells following indicated treatment. Representative images and total invaded cell numbers are shown (scale bar, 500 μm). Means ± SDs, n = 5, unpaired, two-tailed t-test. k, Measurement of 12Z cell growth following indicated treatments. Means ± SDs, n = 4. No significant differences, unpaired, two-tailed t-test. l, Caspase-Glo assay of 12Z cells in suspension following indicated treatments. Means ± SDs, n = 5, unpaired, two-tailed t-test. m, Ratio of dead-to-live cells after 24 hours in Matrigel. Means ± SDs, n = 6, unpaired, two-tailed t-test. * p < 0.05, ** p < 0.01, and *** p < 0.001.
6.4 Discussion
From these experiments, we have demonstrated that ARID1A-mediated prevention of super-enhancer hyperactivation plays an essential physiological role in maintaining endometrial tissue homeostasis and preventing cell invasion. Despite ARID1A enrichment at gene promoters, our genome-wide chromatin state model indicated that ARID1A frequently regulates highly active enhancer-like regions, both proximal and distal to genes. We observed that P300 is required for ARID1A mutant invasion, and that ARID1A normally antagonizes P300 histone acetyltransferase activity at many super-enhancers, but less frequently at typical enhancers, with consequences for local gene expression. ARID1A mutant cell invasion has been described in other diseases and malignancies (Lakshminarasimhan et al. 2017; Li et al. 2017; Sun et al. 2017; Yan et al. 2014), but, to our knowledge, the functional link between ARID1A loss, super-enhancer hyperactivation, and the subsequent acquisition of P300-dependent invasiveness has so far been described only in the endometrium. Retrograde menstruation is thought to play a role in the spread of abnormal endometrial tissue to ectopic sites.
ARID1A mutations may predispose displaced endometrial cells to forming endometriotic lesions by promoting the acquisition of invasive phenotypes in a cell-autonomous manner (Wilson et al. 2019; Wilson, Holladay, and Chandler 2020). Our findings suggest that epigenetic misregulation of super-enhancers promotes endometrial invasion and survival at ectopic sites. Alterations in super-enhancer activity may be an important pathophysiological feature of endometriotic epithelium. Recently, there has been interest in the therapeutic inhibition of super-enhancer activity in several diseases. Small-molecule inhibitors of super-enhancer factors, particularly BET bromodomain inhibitors such as JQ1 and its derivatives, have undergone clinical trials for multiple cancer types (Shin 2018). BRD4 interacts with H3K27ac-rich super-enhancer regions, and the disruption of BRD4 bromodomain-super-enhancer interactions using small molecules can decrease oncogene expression (Sengupta et al. 2015). The inhibition of histone acetylation represents a growing area of interest in small-molecule therapeutics (Simon et al. 2016). Targeted disruption of P300 HAT activity at super-enhancers may have therapeutic utility in endometrial diseases.
6.5 Methods
6.5.1 Mouse care, use, and genotyping
All mice were maintained on an outbred genetic background using CD-1 mice (Charles River). (Gt)R26Pik3ca*H1047R, LtfCre (Tg(Ltf-iCre)14Mmul), and Ep300fl alleles were purchased from The Jackson Laboratory and identified by PCR using published methods (Adams et al. 2011; Daikoku et al. 2014; Kasper et al. 2006). The Arid1afl allele was distinguished by PCR as previously described (Chandler et al. 2015). Genotyping primers are listed in the Key Resources Table. Endpoints were vaginal bleeding, severe abdominal distension, and signs of severe illness including dehydration, hunching, jaundice, ruffled fur, signs of infection, or non-responsiveness.
Sample sizes for each genotype were chosen based on the proportions of animals with vaginal bleeding in each experimental group and on Kaplan-Meier log-rank tests for survival differences. All mice analyzed in the study were between 6 and 32 weeks old. In cases where a morbidity endpoint occurred, tissues were collected at the time of vaginal bleeding, including LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (μ1/2 = 107 days) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl (μ1/2 = 143 days) mice. In cases where the animal did not reach a morbidity endpoint or show reduced survival, tissues were collected at comparable time points (between 90 and 150 days) from age-matched littermate control mice from the mutant crosses. Uteri were collected at the time of sacrifice and placed immediately into neutral-buffered formalin at 4 °C. After 24 hours, tissues were washed with PBS and 50% EtOH and placed in 70% EtOH, and weights were recorded. Mice were housed at the Michigan State University Grand Rapids Research Center in accordance with protocols approved by Michigan State University. Michigan State University is registered with the U.S. Department of Agriculture (USDA) and has an approved Animal Welfare Assurance from the NIH Office of Laboratory Animal Welfare (OLAW). MSU is accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC).

6.5.2 Cell lines

12Z immortalized human endometrial epithelial cells (Zeitvogel, Baumann, and Starzinski-Powitz 2001) were maintained in DMEM/F12 medium supplemented with 10% fetal bovine serum (FBS), 1% L-glutamine, and 1% penicillin/streptomycin (P/S). The 12Z cells were provided by the laboratory of Asgi Fazleabas, and cell line validation was performed by IDEXX BioResearch, which found that the 12Z cell line has a unique profile not found in current public databases.
A recent study found 12Z cells to be an authentic and pure endometriosis cell line based on marker analysis and short tandem repeat profiling (Romano et al. 2020). Lenti-X™ 293T cells (Clontech) were maintained in DMEM + 110 mg/L sodium pyruvate (GIBCO) supplemented with 10% FBS, 1% L-glutamine, and 1% P/S. 12Z and Lenti-X 293T cells were regularly tested for mycoplasma using the Mycoplasma PCR Detection Kit (Applied Biological Materials). No commonly misidentified cell lines were used in this study.

6.5.3 Histology and immunohistochemistry

For indirect immunohistochemistry (IHC), 10% neutral buffered formalin (NBF)-fixed paraffin sections were processed for heat-based antigen unmasking in 10 mM sodium citrate [pH 6.0]. Sections were incubated with antibodies at the following dilutions: 1:200 ARID1A (D2A8U) (12354, Cell Signaling); 1:1000 P300 (86377, Cell Signaling); 1:400 Phospho-S6 (4585, Cell Signaling); 1:100 KRT8 (TROMA-I, DSHB); 1:200 Cleaved Caspase-3 (9579, Cell Signaling); 1:400 Ki67 (12202, Cell Signaling); 1:200 H3K27ac (39133, Active Motif); 1:200 H3K18ac (ab1191, Abcam); 1:1000 PAI-1 (SERPINE1) (ab66705, Abcam). The TROMA-I antibody was deposited to the DSHB by Brulet, P./Kemler, R. (DSHB Hybridoma Product TROMA-I). Biotin-conjugated secondary antibodies were donkey anti-rabbit IgG (711-065-152, Jackson ImmunoResearch) and donkey anti-rat IgG (705-065-153, Jackson ImmunoResearch). The VECTASTAIN Elite ABC HRP Kit (Vector) was used for secondary antibody detection. Sections for IHC were lightly counterstained with Hematoxylin QS or Methyl Green (Vector Labs). Routine hematoxylin and eosin (H&E) staining of sections was performed by the Van Andel Research Institute (VARI) Histology and Pathology Core. A VARI animal pathologist reviewed histological tumor assessments.
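The H-scores used for epithelium-to-stroma comparisons combine a 0 to 3 staining intensity with the percentage of cells scored at each intensity. The sketch below assumes the conventional definition, H = 1 × (% low) + 2 × (% moderate) + 3 × (% strong), which ranges from 0 to 300; the function names and example percentages are hypothetical and not taken from this study.

```python
def h_score(percent_by_intensity):
    """Conventional H-score: sum of intensity (0-3) times percent of cells.

    percent_by_intensity maps intensity 0, 1, 2 or 3 to the percentage of
    cells scored at that intensity; the percentages must sum to 100.
    """
    if not set(percent_by_intensity) <= {0, 1, 2, 3}:
        raise ValueError("intensities must be 0, 1, 2 or 3")
    if sum(percent_by_intensity.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return sum(i * pct for i, pct in percent_by_intensity.items())

def epithelial_to_stromal_ratio(epithelium, stroma):
    """Ratio of epithelial to stromal H-scores for one field of view."""
    stromal = h_score(stroma)
    if stromal == 0:
        raise ZeroDivisionError("stromal H-score is 0; ratio undefined")
    return h_score(epithelium) / stromal

# Hypothetical scoring of one 20X field (illustrative values only).
epi = {0: 10, 1: 20, 2: 30, 3: 40}     # H = 20 + 60 + 120 = 200
stroma = {0: 50, 1: 30, 2: 20, 3: 0}   # H = 30 + 40 = 70
print(h_score(epi), h_score(stroma))   # 200 70
```

Using integer percentages keeps the score exact; the ratio is the only floating-point step.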
To determine H-scores from mouse slides, one 20X field of view per mouse was imaged on a Nikon Eclipse Ni-U upright microscope from a slide stained with the relevant antibody (SERPINE1, H3K27ac, or H3K18ac). Epithelial and stromal cells were assigned a staining intensity value from 0 to 3 (no staining = 0, low staining = 1, moderate staining = 2, strong staining = 3), and the proportion of cells at each staining intensity was determined. For Ki67, each cell was assigned a value of 1 (positive staining) or 0 (negative staining) to determine the number of Ki67+ cells. For human endometriosis tissue samples, SERPINE1 (PAI-1) IHC was carried out as published (Alotaibi et al. 2019). Briefly, IHC was performed using the EnVision+ Dual Link system (Dako) and 3,3′-diaminobenzidine (DAB), with the mouse monoclonal PAI-1 antibody C-9 (sc-5297, Santa Cruz). PAI-1 expression was evaluated in endometriotic epithelium and stroma using the Histoscore calculation. Areas of endometriotic epithelium and stroma were first scanned at low power (10X) and then analyzed at high power (40X) to evaluate the staining intensity and estimate the proportion of positive cells. ARID1A IHC was used as a surrogate for loss-of-function alterations (Khalique et al. 2018), using a Dako Omnis automated immunostainer (Agilent Technologies) and the anti-ARID1A rabbit monoclonal antibody D2A8U (Cell Signaling).

6.5.4 Transfections

12Z cells were seeded at a density of 30,000 cells/mL in DMEM/F12 medium supplemented with 10% FBS and 1% L-glutamine. The following day, cells were transfected with 50 nM siRNA (Dharmacon ON-TARGETplus Non-targeting Pool; human ARID1A #8289, human P300 #3486, and human SERPINE1 #19376 SMARTpools) using Lipofectamine RNAiMAX (ThermoFisher) according to the manufacturer's instructions, at a 1:1 volume:volume ratio in Opti-MEM (GIBCO). After 24 hours, the media was replaced.
For plasmid co-transfection experiments, cells were transfected the following day with 500 ng of pBabe vector containing PIK3CAH1047R (pPIK3CAH1047R) or pBabe empty vector using FuGENE HD transfection reagent (Promega) according to the manufacturer's instructions at a 2:1 volume:mass ratio, and media was replaced after 4 hours. The pPIK3CAH1047R plasmid was a gift from Jean Zhao (Addgene plasmid 12524) (Zhao et al. 2005). In A-485 co-treatment studies, A-485 was added to the media in 0.1% DMSO 24 hours post-transfection. At 48 hours after transfection, media was replaced with DMEM/F-12 supplemented with 0.5% FBS, 1% P/S, and 1% L-glutamine. Cells were collected 72 hours after siRNA transfection using the Quick-RNA Miniprep Kit (Zymo Research) for RNA, in RIPA buffer (Cell Signaling) for whole cell lysate, or for histone extraction.

6.5.5 Generation and use of lentiviral shRNA particles

Lentiviral particles expressing shRNAs were produced in 293T cells according to the manufacturer's instructions. Lenti-X™ 293T cells were transfected using polyethylenimine (PEI) with a lentiviral packaging mix composed of pNHP and pVSVG (generous gifts from Dr. Fredric Manfredsson) and MISSION pLKO.1 plasmid containing non-targeting shRNA (control) or pooled ARID1A shRNAs (shARID1A) (Sigma), in DMEM + 4.5 g/L D-glucose, 110 mg/L sodium pyruvate, 10% FBS, and 1% L-glutamine. After 24 hours, media was replaced with DMEM/F12, 10% FBS, 1% L-glutamine, and 1% P/S. Viral particles were collected after 48 and 96 hours, and viral titers were calculated using the qPCR Lentiviral Titration Kit (ABM). For lentiviral transduction of 12Z cells, cells were treated at a multiplicity of infection (MOI) of 100 units per cell. After 24 hours, media was replaced.
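Transducing at a fixed MOI requires converting the measured titer into a volume of viral supernatant: infectious units needed = MOI × number of cells, and volume = infectious units / titer. This sketch assumes the qPCR titer is expressed in infectious units (IU) per mL; the cell number and titer in the example are hypothetical because the source does not state them.

```python
def virus_volume_ul(n_cells, moi, titer_iu_per_ml):
    """Microliters of viral stock needed to reach the target MOI.

    infectious units required = moi * n_cells
    volume (uL) = infectious units / titer (IU per uL)
    """
    if n_cells <= 0 or moi <= 0 or titer_iu_per_ml <= 0:
        raise ValueError("all inputs must be positive")
    titer_iu_per_ul = titer_iu_per_ml / 1000
    return moi * n_cells / titer_iu_per_ul

# Hypothetical example: 30,000 cells at MOI 100 with a 1e8 IU/mL stock.
print(virus_volume_ul(30_000, 100, 1e8))  # 30.0
```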
For plasmid co-transfection experiments, cells were transfected the following day with 500 ng of pBabe vector containing PIK3CAH1047R (pPIK3CAH1047R) or pBabe empty vector using FuGENE HD transfection reagent (Promega) according to the manufacturer's instructions at a 2:1 volume:mass ratio, and media was replaced after 4 hours. In A-485 co-treatment studies, A-485 was added to the media in 0.1% DMSO 24 hours post-transfection. To generate stable expression cell lines, transduced cells were selected with 600 ng/mL puromycin (Sigma) for three weeks.

6.5.6 Histone extraction

Cells were washed with PBS and scraped in PBS containing 5 mM sodium butyrate. Cells were centrifuged, resuspended in TEB buffer (phosphate-buffered saline supplemented with 0.5% Triton X-100, 5 mM sodium butyrate, 2 mM phenylmethylsulfonyl fluoride, and 1X protease inhibitor cocktail), and incubated on a 3D spindle nutator at 4 °C for 10 minutes. Cells were centrifuged at 3000 RPM for 10 minutes at 4 °C, and the TEB wash step was repeated once. Following the second wash, the pellet was resuspended in 0.2 N HCl and incubated on a 3D spindle nutator at 4 °C overnight. The following day, samples were neutralized with 1:10 volume of 1 M Tris-HCl pH 8.3. Samples were centrifuged at 3000 RPM for 10 minutes at 4 °C, and the supernatant containing histone proteins was collected.

6.5.7 Western blotting

Whole cell lysates and histone extracts were quantified using the Micro BCA Protein Assay Kit (ThermoFisher) and a FlexStation 3 plate reader. Protein lysates were run on 4%–15% gradient SDS-PAGE gels (BioRad) and transferred to PVDF membranes using the Trans-Blot Turbo system (BioRad). Primary antibody dilutions were 1:1000 ARID1A (D2A8U) (12354, Cell Signaling); 1:100 P300 (NM11) (sc-32244, Santa Cruz); 1:1000 β-Actin (8457, Cell Signaling); 1:100 PAI-1 (sc-5297, Santa Cruz); 1:1000 Akt (4691, Cell Signaling); 1:2000 Phospho-Akt (Ser473) (4060, Cell Signaling).
Horseradish peroxidase (HRP)-conjugated secondary antibodies (Cell Signaling) were used at a dilution of 1:2000. Clarity Western ECL Substrate (BioRad) was used for protein band visualization, and western blot exposures were captured using the ChemiDoc XRS+ imaging system (BioRad). For histone extracts, samples were run on a 15% SDS-PAGE gel and transferred to a nitrocellulose membrane in 20 mM sodium phosphate pH 6.7 at 400 mA for 90 minutes. Primary antibody dilutions were 1:2000 Histone H3 (4499, Cell Signaling) and 1:1000 H3K27ac (39133, Active Motif). IRDye 800CW-conjugated donkey anti-rabbit IgG secondary antibody (LI-COR Biosciences) was used at a dilution of 1:10,000, and fluorescence imaging was performed using the LI-COR Odyssey CLx imaging system (LI-COR Biosciences).

6.5.8 Transwell invasion assay

12Z cells were seeded in 6-well dishes at a density of 50,000 cells per well. After 24 hours, cells were transfected with siRNA as described above. For drug treatment experiments, cells were treated with drug 24 hours after transfection. At 48 hours post-transfection, cells were trypsinized, and 100 μL of a cell mixture containing 30,000 cells and 0.3 mg/mL Matrigel was seeded into transwell plates (8 μm pore polycarbonate membrane, Corning) pre-coated with 100 μL of 0.3 mg/mL Matrigel. After 1 hour, serum-free DMEM/F12, 1% P/S, 1% L-glutamine medium was added to the top chamber, and DMEM/F12, 5% FBS, 1% P/S, 1% L-glutamine medium was added to the bottom chamber. For drug studies, drug was included in both top and bottom chamber media. After 16 hours, transwell units were transferred to plates containing 2 μg/mL calcein-AM in DMEM/F12. After 1 hour, media was aspirated from the top chamber and non-invaded cells were removed with a cotton swab. Images were collected using a Nikon Eclipse Ti microscope from five non-overlapping fields per well. ImageJ software (National Institutes of Health) was used to quantify cells based on size and intensity.
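Counting calcein-stained cells "based on size and intensity" amounts to thresholding the image, grouping bright pixels into connected objects, and keeping objects within a plausible cell-size range. The Python sketch below illustrates that idea on a toy pixel grid. It is not the ImageJ procedure used here, and the intensity threshold, size limits, and 4-connectivity are all assumptions.

```python
from collections import deque

def count_cells(image, intensity_min, size_min, size_max):
    """Count 4-connected bright objects whose pixel area is in [size_min, size_max]."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < intensity_min:
                continue
            # Flood-fill one object and measure its area in pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and image[ny][nx] >= intensity_min):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size_min <= area <= size_max:
                count += 1
    return count

# Toy 5 x 6 calcein image in arbitrary intensity units (hypothetical).
image = [
    [0, 200, 200, 0, 0, 0],
    [0, 200, 200, 0, 0, 90],
    [0, 0, 0, 0, 0, 0],
    [180, 0, 0, 210, 210, 210],
    [0, 0, 0, 210, 210, 210],
]
print(count_cells(image, intensity_min=100, size_min=2, size_max=8))  # 2
```

The size window discards single-pixel debris (the isolated 180 pixel), and the intensity threshold discards dim signal (the 90 pixel).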
6.5.9 Matrigel viability assay

12Z cells were seeded in 6-well dishes at a density of 50,000 cells per well. After 24 hours, cells were transfected with siRNA as described above. For drug treatment experiments, cells were treated with drug 24 hours after transfection. At 48 hours post-transfection, cells were trypsinized, and 50 μL of a cell mixture containing 10,000 cells and 0.3 mg/mL Matrigel was seeded into 96-well plates pre-coated with 100 μL of 0.3 mg/mL Matrigel. After 1 hour, 50 μL of serum-free DMEM/F12, 1% P/S, 1% L-glutamine medium was added. For drug studies, 100 nM A-485 or vehicle was included in the medium. After 16 or 24 hours, 2 μg/mL calcein-AM and 4 μg/mL ethidium homodimer III were added. Wells were imaged using a Nikon Eclipse Ti microscope, and ImageJ software (National Institutes of Health) was used to quantify cells based on size and intensity.

6.5.10 Migration assay

12Z cells were seeded into 35 mm dishes containing 4-well culture inserts at a density of 4000 cells per well. At 24 hours after seeding, cells were treated with lentiviral particles expressing non-targeting shRNA (control) or shARID1A at a multiplicity of infection of 100. After 24 hours, media was replaced with serum-free DMEM/F12 containing 1% L-glutamine, 1% P/S, and drug or vehicle. After 16 hours, culture inserts were removed and serum-free media containing vehicle or drug was replenished. At 0 and 24 hours of migration, images were taken using a Nikon Eclipse Ti microscope. Distances between migration fronts were measured using NIS-Elements Advanced Research software at 16 points spaced 100 μm apart. Migration distance was calculated by subtracting the average distance between migration fronts at 24 hours from the average distance at 0 hours. Cells were counted within a window surrounding the 1050 mm² migration area.

6.5.11 Viability assay

Cells were seeded in 6-well plates at a density of 100 cells/well.
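In this viability assay, cells are treated with A-485 at concentrations from 10 nM to 100 μM. The spacing of the dose series is not stated, so the sketch below assumes ten-fold steps and expresses crystal violet counts as a percentage of a vehicle-treated well; the function names and counts are hypothetical.

```python
def tenfold_series_nm(start_nm=10, stop_nm=100_000):
    """Ten-fold dose series in nM, from 10 nM up to 100 uM inclusive (assumed spacing)."""
    doses = []
    dose = start_nm
    while dose <= stop_nm:
        doses.append(dose)
        dose *= 10
    return doses

def percent_of_vehicle(counts, vehicle_count):
    """Express treated-well counts as a percentage of the vehicle-treated well."""
    if vehicle_count <= 0:
        raise ValueError("vehicle count must be positive")
    return [100 * n / vehicle_count for n in counts]

doses = tenfold_series_nm()          # [10, 100, 1000, 10000, 100000]
counts = [80, 76, 60, 20, 0]         # hypothetical crystal violet counts
for dose, pct in zip(doses, percent_of_vehicle(counts, vehicle_count=80)):
    print(f"{dose} nM: {pct:.0f}% of vehicle")
```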
After 24 hours, cells were treated with A-485 at concentrations from 10 nM to 100 μM. After 6 days, cells were stained with crystal violet and counted.

6.5.12 Cell growth assay

Cells were seeded at a density of 4000 cells per well in a 96-well plate. After 24 hours, cells were transfected as described above. At 24 hours after transfection, cells were treated with drugs for 48-72 hours. Cells were incubated with 2 μg/mL calcein-AM for 1 hour, and fluorescence was measured using a SpectraMax i3x (Molecular Devices).

6.5.13 Cell suspension Caspase-Glo assay

The Caspase-Glo 3/7 Assay (Promega) was used according to the manufacturer's instructions. Following transfection (48 hours) and drug treatment (24 hours), cells were seeded at 10,000 cells per well in a 96-well CELLSTAR Cell-Repellent plate (Greiner Bio-One) in serum-free DMEM/F12, 1% L-glutamine, 1% P/S containing A-485 or vehicle. After 24 hours, Caspase-Glo reagent was added at a 1:1 volume ratio, and cells were incubated at 37 °C for 1 hour. Cells were then transferred to a white 96-well plate (Costar), and luminescence was measured using a SpectraMax i3x (Molecular Devices).

6.5.14 Annexin V assay

Annexin V binding was measured by flow cytometry using the Annexin V-FITC Kit (Miltenyi Biotec) according to the manufacturer's instructions. Flow cytometry was performed using a BD Accuri C6 flow cytometer (BD Biosciences), and data were analyzed using FlowJo v10 software (BD Biosciences).

6.5.15 Cell cycle assay

The Click-iT Plus EdU Flow Cytometry Assay Kit (Invitrogen) was used for cell cycle assays. Cells were treated with 10 μM EdU for 2 hours in culture media. Cells were harvested by trypsinization and washed in 1% BSA in PBS. Cells were resuspended in 100 μL of ice-cold PBS, and 900 μL of ice-cold 70% ethanol was added dropwise while vortexing. Cells were incubated on ice for two hours.
Cells were washed with 1% BSA in PBS and then incubated with the Click-iT Plus reaction cocktail, including Alexa Fluor 488 picolyl azide, for 30 minutes according to the manufacturer's instructions. Cells were washed with 1X Click-iT permeabilization and wash reagent and then stained with 5 μM Vybrant DyeCycle Ruby stain (ThermoFisher) diluted in 1% BSA in PBS for 30 minutes at 37 °C. Flow cytometry was performed using a BD Accuri C6 flow cytometer (BD Biosciences), and data were analyzed using FlowJo v10 software (BD Biosciences).

6.5.16 Construction and sequencing of directional mRNA-seq libraries

RNA samples were collected 72 hours following siRNA transfection using the Quick-RNA Miniprep Kit (Zymo Research). Libraries were prepared by the VARI Genomics Core from 500 ng of total RNA using the KAPA mRNA HyperPrep Kit (v4.17) (Kapa Biosystems). RNA was sheared to 300-400 bp. Prior to PCR amplification, cDNA fragments were ligated to IDT for Illumina unique dual adapters (IDT DNA Inc). The quality and quantity of the finished libraries were assessed using a combination of the Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor® dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 100 bp single-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an SP, 100 cycle sequencing kit (Illumina); each library was sequenced to an average raw depth of 35M reads. Base calling was done by Illumina RTA3, and the output of NCS was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

6.5.17 Chromatin immunoprecipitation

For differential P300, H3K18ac, H3K27ac, H3K27me3, H3K4me3, and H3K4me1 ChIP-seq, 12Z cells were crosslinked 72 hours after transduction with lentiviral particles containing control shRNAs or ARID1A-targeting shRNAs; for wild-type P300 ChIP-seq, untreated cells were used.
For crosslinking, cells were treated with 1% formaldehyde in cell culture media for 15 minutes at ambient temperature. Formaldehyde was quenched by adding glycine to 0.125 M, and cells were washed with PBS. For H3K4me3 and H3K27me3, 4 × 10⁶ crosslinked cells were used per IP, and 1 × 10⁷ crosslinked cells were used per IP for all other antibodies. Chromatin from crosslinked cells was fragmented by micrococcal nuclease digestion using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) per the manufacturer's instructions. IPs were performed in duplicate per antibody and condition by adapting established methods (Boyd and Farnham 1997). For P300 IPs, nuclei were resuspended in nuclear lysis buffer (50 mM Tris-HCl [pH 8.0], 10 mM EDTA [pH 8.0], 1% SDS) and sonicated for 30 seconds. Protein G magnetic beads (Cell Signaling) were preconjugated with antibody overnight at 4 °C in wash buffer (1X PBS, 0.5% BSA, 0.02% Tween-20). The antibody used was 5 μg P300 (sc-32244, Santa Cruz). Fragmented chromatin was diluted into IP buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA [pH 8.0], 16.7 mM Tris-HCl [pH 8.0], 167 mM NaCl) and incubated with the antibody-conjugated beads overnight at 4 °C. Samples were washed at 4 °C with high-salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA [pH 8.0], 20 mM Tris-HCl [pH 8.0], 0.5 M NaCl), low-salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA [pH 8.0], 20 mM Tris-HCl [pH 8.0], 150 mM NaCl), dialysis buffer (0.2% Sarkosyl, 2 mM EDTA [pH 8.0], 50 mM Tris-HCl [pH 8.0]), IP wash buffer (0.25 M LiCl, 1% NP-40, 1% sodium deoxycholate, 1 mM EDTA [pH 8.0], 10 mM Tris-HCl [pH 8.0]), and TE (10 mM Tris-HCl [pH 8.0], 1 mM EDTA [pH 8.0]). IP chromatin was eluted for 30 minutes at 37 °C with elution buffer (1% SDS, 0.1 M NaHCO3). Crosslinks were reversed with 0.4 mg/mL Proteinase K (ThermoFisher) and 0.2 M NaCl at 65 °C for 2 hours. DNA was purified using the ChIP DNA Clean & Concentrator Kit (Zymo).
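The crosslinking and quenching steps specify final concentrations (1% formaldehyde, 0.125 M glycine) rather than added volumes. Adding volume v of a stock at concentration C to volume V gives C·v = c_final·(V + v), so v = c_final·V / (C − c_final). The sketch below applies this, assuming for illustration only a 37% formaldehyde stock, a 2.5 M glycine stock, and 10 mL of medium; none of these values are stated in the source.

```python
def volume_to_add(current_volume, stock_conc, final_conc):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (current_volume + v) for v.
    Volumes share one unit; concentrations share another.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("need 0 < final_conc < stock_conc")
    return final_conc * current_volume / (stock_conc - final_conc)

media_ml = 10.0                                       # hypothetical volume
fa_ml = volume_to_add(media_ml, 37.0, 1.0)            # assumed 37% stock -> 1%
gly_ml = volume_to_add(media_ml + fa_ml, 2.5, 0.125)  # assumed 2.5 M stock -> 0.125 M
print(round(fa_ml, 3), round(gly_ml, 3))              # 0.278 0.541
```

The glycine volume is computed on the medium plus formaldehyde volume, since the quench is added to the crosslinking mixture.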
For H3K27ac, H3K18ac, H3K27me3, H3K4me3, and H3K4me1, IPs were performed using the SimpleChIP Enzymatic Chromatin IP Kit per the manufacturer's instructions. For H3K27ac and H3K18ac, 5 mM sodium butyrate was added to Buffer A and ChIP Buffer. Antibodies used per IP were 10 μg H3K27ac (39133, Active Motif), 5 μg H3K18ac (ab1191, Abcam), 10 μL H3K27me3 (9733, Cell Signaling), 10 μL H3K4me3 (9751, Cell Signaling), or 4 μg H3K4me1 (ab8895, Abcam). DNA was purified as described above.

6.5.18 Construction and sequencing of ChIP-seq libraries

Libraries for input and IP samples were prepared by the VARI Genomics Core from 10 ng of input or IP material when available, or from all material when less than 10 ng was available, using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to Bioo Scientific NEXTflex Adapters (Bioo Scientific). The quality and quantity of the finished libraries were assessed using a combination of the Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor® dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled. For P300 ChIP in wild-type cells, 100 bp single-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an SP, 100 cycle sequencing kit (Illumina), and each library was sequenced to a minimum read depth of 100M reads per input library and 50M reads per IP library. Base calling was done by Illumina NextSeq Control Software (NCS) v2.0. For differential P300 ChIP, 75 bp paired-end sequencing was performed on an Illumina NextSeq 500 sequencer using 150 cycle High Output and Mid Output sequencing kits (v2) (Illumina), with all libraries run across two flowcells to return a minimum read depth of 80M reads per input library and 40M reads per IP library. Base calling was done by NCS v2.0.
For differential H3K27ac, H3K18ac, and H3K4me1 ChIP-seq IPs, 50 bp paired-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an S1, 100 cycle sequencing kit, and each library was sequenced to a minimum read depth of 50M reads per IP library. Input samples were sequenced using 100 bp single-end sequencing to a minimum read depth of 100M reads. Base calling was done by Illumina RTA3. For H3K27me3 and H3K4me3 ChIP, 75 bp single-end sequencing was performed on an Illumina NextSeq 500 sequencer using 75 cycle High Output sequencing kits (v2), with all libraries run across two flowcells to return a minimum read depth of 80M reads per input library and 40M reads per IP library. Base calling was done by Illumina NextSeq Control Software (NCS) v2.0. For all experiments, output data were demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

6.5.19 Cleavage Under Targets and Release Using Nuclease (CUT&RUN)

The CUT&RUN protocol was adapted from established methods (Skene, Henikoff, and Henikoff 2018). BioMag Plus Concanavalin A-coated magnetic beads (Bangs Laboratories) were washed in Binding Buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2). At 72 hours following siRNA transfection, 500,000 12Z cells were harvested, resuspended in Wash Buffer (20 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 1X protease inhibitor cocktail), washed twice by centrifugation at 600 × g for 3 minutes, and then added to the concanavalin A bead suspension and mixed on a tube rotator for 10 minutes at ambient temperature. Cell/bead conjugates were resuspended in 500 μL of Antibody Buffer (Wash Buffer with 0.05% digitonin and 2 mM EDTA) containing 5 μg of H3K27ac antibody (39133, Active Motif) or rabbit IgG (2729, Cell Signaling) and incubated on a tube nutator overnight at 4 °C.
The following day, cells were washed three times in Digitonin Buffer (Wash Buffer with 0.05% digitonin) and resuspended in 250 μL of Digitonin Buffer, and 12.5 μL of CUTANA pAG-MNase (EpiCypher, 15-1016) was added. Cells were mixed on a nutator at ambient temperature for 1 hour, followed by two washes in Digitonin Buffer and one wash with Low-Salt Rinse Buffer (20 mM HEPES-NaOH pH 7.5, 0.5 mM spermidine, 1X protease inhibitor cocktail). Tubes were chilled on ice, 1 mL of Calcium Incubation Buffer (3.5 mM HEPES-NaOH pH 7.5, 10 mM CaCl2, 0.05% digitonin) was added, and tubes were nutated at 4 °C. After 2.5 minutes, beads were bound to a magnet, the supernatant was removed, and 250 μL of EGTA-STOP Buffer (170 mM NaCl, 20 mM EGTA, 0.05% digitonin, 20 μg/mL RNase A, 20 μg/mL glycogen, 0.8 pg/mL Saccharomyces cerevisiae fragmented nucleosomal DNA) was added. Beads were nutated at 37 °C for 30 minutes, followed by centrifugation at 16,000 × g for 5 minutes at 4 °C. DNA was purified using the NucleoSpin Gel and PCR Clean-up Kit (Takara, 740609.50).

6.5.20 Construction and sequencing of CUT&RUN libraries

Libraries for CUT&RUN samples were prepared by the Van Andel Genomics Core from 0.5-1 ng of IP material using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to Bioo Scientific NEXTflex Adapters (Bioo Scientific) at a concentration of 500 nM. The quality and quantity of the finished libraries were assessed using a combination of the Agilent DNA High Sensitivity chip (Agilent Technologies, Inc.), QuantiFluor® dsDNA System (Promega Corp.), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp paired-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using an S1, 100 cycle sequencing kit (Illumina Inc.). Each library was sequenced to an average depth of 75M reads.
Base calling was done by Illumina RTA3, and output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

6.5.21 RNA-seq analysis

For standard mRNA gene-level expression analysis, single-end raw reads were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality control analysis via FastQC (Andrews 2010) and MultiQC (Ewels et al. 2016). Trimmed reads were aligned with the STAR aligner (Dobin et al. 2013) to the GRCh38.p12 human reference genome assembly (Schneider et al. 2017), indexed with the GENCODE v28 GFF3 annotation (Harrow et al. 2012; Frankish et al. 2019), using the flag `--quantMode GeneCounts` for feature counting. Reverse-stranded, gene-level counts were extracted from the STAR output files and assembled into an experimental read count matrix in R. Low-count genes (fewer than 1 count per sample on average) were removed prior to DESeq2 (Love, Huber, and Anders 2014; Love et al. 2015) count normalization and differential expression analysis. Design matrices were constructed with a single 'condition' variable and included an intercept term. Differential expression p-values were corrected for multiple testing by independent hypothesis weighting (IHW) (Ignatiadis et al. 2016) for downstream analysis. The threshold for differential expression significance was set at FDR < 0.0001. Relative expression heatmaps were produced from regularized-logarithm (rlog) counts (Love, Huber, and Anders 2014) by subtracting the mean rlog counts of the control group. Relative linear expression bar plots were produced from the DESeq2 normalized counts table. Re-analyzed GEMM expression data from Chapter 2 (Wilson et al. 2019) were retrieved from GEO at accession GSE129784. Intergenic eRNA expression and differential expression were analyzed similarly. Briefly, paired-end total RNA sequencing data from 12Z cells treated with siARID1A or non-targeting siRNA control generated in Chapter 2 (Wilson et al.
2019) were re-analyzed from GEO accession GSE129782. Reads were trimmed and aligned as described above. Aligned BAM files were supplied to HOMER (Heinz et al. 2010) to count integer RNA-seq reads at each of the 18,050 distal, putatively active enhancer elements described in this study. Counted regions that overlapped any genic region, including introns, as defined by the genes() function of the TxDb.Hsapiens.UCSC.hg38.knownGene R package (Bioconductor Core Team and Maintainer 2016), were then excluded. An eRNA locus was considered expressed if it had at least 1 count per sample on average, resulting in 3668 expressed intergenic eRNAs. The filtered eRNA counts table was then normalized and modeled for differential expression testing by DESeq2 (Love, Huber, and Anders 2014) as described above.

6.5.22 ChIP-seq analysis

Wild-type P300 and differential H3K27ac, H3K18ac, H3K27me3, and H3K4me3 ChIP-seq experiments were analyzed as single-end libraries, while differential P300 and H3K4me1 ChIP-seq were analyzed as paired-end libraries. Raw reads for IPs and inputs were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality control analysis via FastQC (Andrews 2010) and MultiQC (Ewels et al. 2016). Trimmed reads were aligned to the GRCh38.p12 reference genome (Schneider et al. 2017) via Bowtie2 (Langmead and Salzberg 2012) with the flag `--very-sensitive`. Aligned reads were sorted and indexed with samtools (Li et al. 2009). For paired-end analyses, only properly paired read fragments were retained using samtools view with the flag `-f 3`, followed by sorting and indexing. For libraries used in differential comparisons, molecular complexity was then estimated from duplicate rates by ATACseqQC (Ou et al. 2018) and preseqR (Daley and Smith 2013), and libraries within an experimental design were subsampled with samtools to equivalent molecular complexity based on these estimates.
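The samtools `-f 3` filter keeps only records in which both the 0x1 (read paired) and 0x2 (read mapped in proper pair) bits of the SAM FLAG field are set. The Python sketch below reproduces that bit test on toy SAM lines to explain the flag; it is not a replacement for samtools, and the records are hypothetical.

```python
PAIRED = 0x1        # template has multiple segments (read paired)
PROPER_PAIR = 0x2   # each segment properly aligned (proper pair)
REQUIRED = PAIRED | PROPER_PAIR  # the bits requested by samtools view -f 3

def passes_f3(flag):
    """True when every bit in REQUIRED is set in the SAM FLAG value."""
    return (flag & REQUIRED) == REQUIRED

def keep_proper_pairs(sam_lines):
    """Yield header lines and alignment records that pass the -f 3 test."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
        elif passes_f3(int(line.split("\t")[1])):  # column 2 is FLAG
            yield line

records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t42\t4M\t=\t300\t204\tACGT\tIIII",   # paired, proper pair
    "r2\t97\tchr1\t500\t42\t4M\tchr2\t800\t0\tACGT\tIIII",  # paired, not proper
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",               # unmapped, unpaired
]
print([line.split("\t")[0] for line in keep_proper_pairs(records)])  # ['@HD', 'r1']
```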
Picard MarkDuplicates (http://broadinstitute.github.io/picard/) was used to remove PCR duplicates, followed by sorting and indexing. MACS2 was used to call peaks on each ChIP replicate against the respective input control (Zhang et al. 2008). For P300 IPs, MACS2 called broad peaks with an FDR < 0.05 threshold and otherwise default settings. For H3K4me3, H3K4me1, H3K27ac, and H3K18ac IPs, MACS2 called narrow peaks with an FDR < 0.05 threshold and the flags `--nomodel --extsize 146` to bypass model building. For H3K27me3, MACS2 called broad peaks with an FDR < 0.05 threshold and the flags `--nomodel --extsize 146` to bypass model building. The resulting peaks were filtered against the ENCODE blacklist and for non-standard contigs (Amemiya, Kundaje, and Boyle 2019). A naive overlapping peak set, as defined by ENCODE (Landt et al. 2012), was constructed by calling peaks on pooled replicates, followed by bedtools intersect (Quinlan and Hall 2010) to select peaks with at least 50% overlap with each biological replicate. ChIP-seq differential binding or abundance analysis was performed with csaw (Lun and Smyth 2016). Briefly, a consensus peak set was constructed for each differential experiment from the union of replicate-intersecting, filtered MACS2 peaks called in each condition. The replicate intersection criterion used here is less stringent than the naive overlap: any partial intersection between ChIP replicates was accepted as a query region for differential binding/abundance testing. ChIP reads were counted in these query regions by csaw and then filtered to remove low-abundance peaks, retaining those with average log2 CPM > -3. When comparing ChIP libraries, any global differences in IP efficiency observed between the two conditions were treated as technical bias to ensure a highly conservative analysis. Accordingly, we applied a non-linear loess-based normalization to the peak count matrix, as implemented in csaw (Lun and Smyth 2016), which assumes a symmetrical MA distribution.
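The naive overlap step keeps a pooled-replicate peak only if it overlaps a peak in every biological replicate by at least 50%. The sketch below assumes half-open (chrom, start, end) coordinates and treats the 50% criterion as met when the overlap covers half of either the pooled peak or the replicate peak; the exact denominator used with bedtools here is not stated, so that choice is an assumption, and the peaks are hypothetical.

```python
def overlap_bp(a, b):
    """Overlap in bp between half-open (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def supported_by(peak, replicate_peaks, min_frac=0.5):
    """True if some replicate peak overlaps `peak` by >= min_frac of either width."""
    for other in replicate_peaks:
        ov = overlap_bp(peak, other)
        if ov == 0:
            continue
        if ov / (peak[2] - peak[1]) >= min_frac or ov / (other[2] - other[1]) >= min_frac:
            return True
    return False

def naive_overlap(pooled_peaks, replicates, min_frac=0.5):
    """Keep pooled peaks that are supported by every biological replicate."""
    return [p for p in pooled_peaks
            if all(supported_by(p, rep, min_frac) for rep in replicates)]

pooled = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150)]
rep1 = [("chr1", 120, 220), ("chr1", 1000, 1100), ("chr2", 140, 400)]
rep2 = [("chr1", 90, 180), ("chr2", 60, 150)]
print(naive_overlap(pooled, [rep1, rep2]))  # [('chr1', 100, 200)]
```

The second pooled peak fails because replicate 2 has no overlapping peak, and the third fails because its only replicate 1 overlap is 10 bp.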
A design matrix was then constructed from a single 'condition' variable without an intercept term. The count matrix and loess offsets were then supplied to edgeR (Robinson, McCarthy, and Smyth 2010) for estimating dispersions and fitting quasi-likelihood generalized linear models for hypothesis testing. Nearby query regions up to 500 bp apart were then merged, with a maximum merged region width of 5 kb, and the most significant p-value was used to represent each merged region. An FDR < 0.05 threshold was used to define significantly differentially bound/abundant regions.

6.5.23 CUT&RUN analysis

Analysis of CUT&RUN data closely followed the paired-end ChIP-seq procedure. Briefly, raw paired-end reads for H3K27ac or IgG CUT&RUN were trimmed, aligned, and filtered for properly paired reads; molecular complexity was then estimated, and libraries were subsampled to equal complexity based on these estimates. PCR duplicates were removed, and MACS2 was used to call narrow peaks against the IgG negative control as input, with an FDR < 0.05 threshold and the flags `--nomodel --extsize 146` to bypass model building. Peaks were then blacklist-filtered, and a naive overlapping peak set was constructed as described above. Differential H3K27ac CUT&RUN analysis was computed with csaw. To promote comparability between the differential H3K27ac CUT&RUN and ChIP-seq experiments, we used the consensus peak set defined by our differential H3K27ac ChIP-seq experiment for the CUT&RUN analysis. Briefly, H3K27ac CUT&RUN reads were counted in these query regions by csaw and then filtered to remove low-abundance peaks, retaining those with average log2 CPM > -3. When comparing CUT&RUN libraries, any global differences in CUT&RUN reaction efficiency observed between the two conditions were treated as technical bias to ensure a highly conservative analysis.
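In both the ChIP-seq and CUT&RUN analyses, nearby query regions up to 500 bp apart are merged into clusters no wider than 5 kb, and each cluster is represented by its most significant p-value. The greedy pass below is a simplified illustration of that rule, not csaw's implementation: csaw handles over-wide clusters differently and may adjust the best p-value for the number of windows in a cluster, which this sketch does not do. The regions and p-values are hypothetical.

```python
def merge_regions(regions, tol=500, max_width=5000):
    """Greedily merge (chrom, start, end, pvalue) regions.

    A region joins the current cluster when it is on the same chromosome,
    starts no more than `tol` bp after the cluster ends, and the merged span
    stays within `max_width` bp. Each cluster keeps its smallest p-value.
    """
    merged = []
    for chrom, start, end, p in sorted(regions):
        if merged:
            c, s, e, best = merged[-1]
            if c == chrom and start - e <= tol and max(e, end) - s <= max_width:
                merged[-1] = (c, s, max(e, end), min(best, p))
                continue
        merged.append((chrom, start, end, p))
    return merged

regions = [
    ("chr1", 0, 1000, 0.2),
    ("chr1", 1400, 2000, 0.01),   # 400 bp gap: joins the first cluster
    ("chr1", 2600, 3000, 0.5),    # 600 bp gap: starts a new cluster
    ("chr1", 3200, 5500, 0.03),
    ("chr2", 0, 500, 0.04),
]
for cluster in merge_regions(regions):
    print(cluster)
```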
We therefore applied a loess-based normalization to the peak count matrix, as implemented in csaw, which assumes a symmetrical MA distribution. A design matrix was then constructed from one ‘condition’ variable, without an intercept term. The count matrix and loess offsets were then supplied to edgeR for estimating dispersions and fitting the quasi-likelihood generalized linear model for hypothesis testing. Nearby query regions up to 500 bp apart were then merged, with a maximum merged region width of 5 kb, and the most significant p-value was used to represent each merged region.
6.5.24 Chromatin state analysis
ChromHMM (Ernst and Kellis 2012, 2017) was used to segment the hg38 genome based on combinatorial chromatin features in control and ARID1A-depleted 12Z cells. Briefly, all seven chromatin features (total RNA, ATAC, H3K4me1, H3K4me3, H3K27me3, H3K27ac, and H3K18ac) were binarized from aligned BAM files, and chromatin features were modeled in both conditions simultaneously through the “concatenated” option. The concatenated option was selected because it creates a unified model for direct comparison between control and ARID1A-depleted conditions, which is used to identify chromatin state changes. Chromatin state models were built with 5 to 25 states, and each model was manually curated based on inferred biological function to select one that balanced unique and overlapping combinatorial features. We selected 18 states as our final model for downstream analysis. State emissions were then manually reordered to group states by inferred biological function. BED files containing coordinates for each chromatin state in each condition were converted into non-overlapping GenomicRanges objects in R for downstream enrichment analyses, differential chromatin state analysis, and plotting (Lawrence et al. 2013).
6.5.25 Bioinformatics and statistics
For RNA-seq experiments, three biological replicates were analyzed (n = 3).
For ChIP-seq and CUT&RUN experiments, two independent IPs were used (n = 2) and were compared against a condition-respective input chromatin sample or IgG negative control, respectively. For in vivo experiments, n represents the number of mice. For cell-based assays, n represents biological replicates or independent experiments as indicated in the figure legends. Multiple hypothesis test corrections via FDR were employed when appropriate to reduce type I errors. Presented probability (p) values correspond to the statistical tests indicated in the figure legends. All boxplots presented for genomic analyses are in the style of Tukey, with outliers not shown. The ROSE algorithm (Whyte et al. 2013; Loven et al. 2013) was used to define active super-enhancers from H3K27ac peaks that overlapped with accessible chromatin (ATAC) in control 12Z cells. The GeneHancer database (Fishilevich et al. 2017) was used to associate enhancers with genes using a score > 1 threshold. Various HOMER (Heinz et al. 2010) functions were applied to annotate genomic regions of interest, quantify signal, and count reads at sites of interest for tag density heatmaps and meta-peak plots. Chromatin analyses involving ChIP signal quantification at regions of interest used pooled reads from both IP replicates, per feature. TxDb.Hsapiens.UCSC.hg38.knownGene (Bioconductor Core Team and Bioconductor Package Maintainer 2016) was used to define gene promoters for all standard hg38 genes as 3 kilobase regions surrounding the primary TSS. MACS2 was used to produce genome-wide signal log-likelihood ratio (logLR) tracks for IGV visualization (Zhang et al. 2008; Robinson et al. 2011). ClusterProfiler was used to compute and visualize pathway enrichment from a list of gene symbols with respective gene universes (Yu et al. 2012). Hallmark pathways and GO Biological Process gene sets were retrieved from MSigDB (Liberzon et al. 2015).
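The super-enhancer cutoff used by ROSE can be sketched in a few lines. In the approach described by Whyte et al. (2013), enhancer regions are ranked by H3K27ac signal and the cutoff is placed where a line of slope 1 (after scaling both axes to the 0 to 1 range) is tangent to the ranked curve; regions with signal above that point are called super-enhancers. The sketch below assumes enhancer stitching and background subtraction have already been done, and the function names are hypothetical; it illustrates the geometric idea only and is not the ROSE script itself.

```python
from typing import List, Sequence


def rose_cutoff(signals: Sequence[float]) -> float:
    """Signal value at which a line of slope (max - min) / n touches the ranked curve."""
    ranked = sorted(signals)
    n = len(ranked)
    # A slope of 1 after scaling both axes to [0, 1] equals (max - min) / n on raw axes.
    slope = (ranked[-1] - ranked[0]) / n
    # For a convex ranked curve, the tangent point minimises y - slope * x (x = 1-based rank).
    best = min(range(n), key=lambda i: ranked[i] - slope * (i + 1))
    return ranked[best]


def call_super_enhancers(signals: Sequence[float]) -> List[int]:
    """Indices of regions whose signal exceeds the tangent-point cutoff."""
    cutoff = rose_cutoff(signals)
    return [i for i, s in enumerate(signals) if s > cutoff]
```

With toy signals [3, 100, 1, 2, 40, 5, 1, 4, 3, 2], the cutoff falls at 5 and the two regions with signal 100 and 40 (indices 1 and 4) are called; all remaining regions would be treated as typical enhancers.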
ComplexHeatmap was used for hierarchical clustering by Euclidean distance and general heatmap visualization (Gu, Eils, and Schlesner 2016). GenomicRanges functions were frequently used to intersect and manipulate genomic coordinates (Lawrence et al. 2013), e.g., for genome-wide association tests. eulerr was used to produce proportional Euler diagrams (Larsson 2020). biomaRt was used for all gene nomenclature and ortholog conversions (Durinck et al. 2005; Durinck et al. 2009; Smedley et al. 2009). ggplot2 was used for certain plotting applications (Wickham 2016). The statistical computing language R was used for many applications throughout this manuscript (R Core Team 2018). Mantel-Cox tests and t-tests were performed using GraphPad Prism 8 software.
6.6 Data availability
Data generated in this chapter are available at GEO at accession GSE148474. Re-analyzed data sets generated in previous chapters were retrieved from GEO SuperSeries accession GSE121198.
6.7 Acknowledgments
We thank Drs. John Risinger, Jeff MacKeigan, Peter Laird, Fredric Manfredsson, and Jae-Wook Jeong for helpful discussions. We thank the Van Andel Genomics Core for providing library construction and sequencing facilities and services, and the Van Andel Histology and Pathology Core for histology services. R.L.C. was supported by an Innovative Translational Grant from the Mary Kay Foundation (026-16), a Liz Tilberis Early Career Award from the Ovarian Cancer Research Fund Alliance (OCRFA) (457446), and the NIH National Institute for Child Health and Human Development (HD099383-01).
CHAPTER 7
ARID1A MAINTAINS TRANSCRIPTIONALLY REPRESSIVE H3.3 THROUGH CHD4-ZMYND8 CHROMATIN INTERACTIONS
This chapter is not published at the time of dissertation submission. The following study is in collaboration with Mike R. Wilson and Ronald L.
Chandler of the Department of Obstetrics, Gynecology and Reproductive Biology at Michigan State University College of Human Medicine, Joel Hrit and Scott Rothbart of the Van Andel Institute Department of Epigenetics, and Marie Adams of the Van Andel Institute Genomics Core.
7.1 Abstract
ARID1A normally suppresses mesenchymal gene expression and associated cell fate in the endometrial epithelium. This is achieved at least in part by monitoring super-enhancer activation states associated with H3K27 acetylation and its catalyst, P300, notably at the SERPINE1 gene. However, the precise molecular mechanisms by which ARID1A governs co-regulator activity and super-enhancers are not well understood. ARID1A mutant cancer cell lines display slightly lower global H3.3 levels. In human endometrial epithelial cells, more than half of ARID1A binding sites genome-wide are marked by H3.3, and ARID1A loss causes direct reduction of H3.3 through disrupted local chromatin interactions. H3.3 depletion relieves repression at target genes co-repressed by ARID1A. Mechanistically, ARID1A interacts with the H3.3 remodeler CHD4 (NuRD) to regulate H3.3 chromatin. The H4K16ac reader ZMYND8 further specifies target regulatory activity, including suppression of super-enhancer hyperactivation. Altogether, ARID1A, H3.3, CHD4, and ZMYND8 co-repress expression of genes governing extracellular matrix, motility, adhesion, and epithelial-to-mesenchymal transition. Moreover, these gene expression alterations are observed in human endometriomas. ARID1A collaborates with the CHD4-NuRD remodeler complex and co-regulator ZMYND8 to maintain transcriptionally repressive H3.3 chromatin at genes related to cellular invasion and ectopic spread of the endometrial epithelium. Loss of any of these factors disrupts this complex regulatory orchestration, resulting in aberrant gene upregulation and associated pathology.
These studies also reveal an example of how chromatin remodelers cooperate to govern transcriptional activity, further aided by histone readers.
7.2 Introduction
In previous chapters, we characterized the normal and pathological roles of ARID1A in the endometrial epithelium. ARID1A normally promotes epithelial identity and suppresses expression of mesenchymal genes, at least in part through chromatin interactions that affect transcriptional activity (Wilson et al. 2019; Reske et al. 2020). In vivo, ARID1A and concomitant PI3K pathway mutations lead to collective invasion of endometrial epithelial cells through partial EMT (Wilson et al. 2019). Mechanistically, in Chapter 6 we revealed that ARID1A loss-driven invasion is dependent on P300, a histone acetyltransferase and transcriptional co-activator, as P300 is required to induce pro-invasive transcriptional changes (Wilson et al. 2020). P300 loss in ARID1A mutant cells suppresses invasion and induces cell death upon loss of anchorage, i.e., anoikis (Wilson et al. 2020). Antagonism between ARID1A and P300 occurs most notably at super-enhancers, cis-regulatory genomic regions that govern cell identity processes and are ubiquitously regulated by ARID1A (Whyte et al. 2013; Wilson et al. 2020). At a subset of super-enhancers, ARID1A maintains lower basal acetylation levels, such that ARID1A loss leads to H3K27 hyperacetylation by P300 and consequent transcriptional hyperactivation (Wilson et al. 2020). In particular, the SERPINE1 super-enhancer is governed by ARID1A-P300 antagonism, where loss of ARID1A results in P300-dependent induction of SERPINE1/PAI-1 expression that is necessary for ARID1A loss-mediated cellular invasion (Wilson et al. 2020). Despite these functional studies, it remains unknown how ARID1A antagonizes P300 histone acetyltransferase activity at the level of chromatin.
In this chapter, we aimed to further elucidate the chromatin mechanisms by which ARID1A suppresses transcriptional hyperactivation through in silico screens with public data, genome-wide chromatin profiling, and integrative transcriptomics.
7.3 Results
7.3.1 ARID1A mutations in cancer are associated with slight loss of histone H3.3
To reveal potential molecular mechanisms governing chromatin regulation by ARID1A, we leveraged the Cancer Cell Line Encyclopedia (CCLE) global chromatin profiling data set of bulk histone H3 peptide measurements by mass spectrometry (Ghandi et al. 2019). In total, 896 assayed cancer cell lines also had genetic mutation profiling, 27 of which are endometrial cancer lines. Cancer cell lines were segregated by ARID1A mutation status, and relative bulk H3 peptide abundances were compared between ARID1A mutant and wild-type cancer cell lines (Fig. 7.1a). This resource and analysis allowed us to identify histone peptide abundances associated with ARID1A mutation. Across all cancers, and specifically within endometrial cancer, ARID1A mutant lines showed overall higher levels of H3K27ac1K36me0 and H3K79me2 and lower levels of H3K4me1, H3K4me2, and H3.3 (H3.3K27me0K36me0) (Fig. 7.1b). SWI/SNF (BAF) was previously shown to biophysically and functionally regulate H3K4me1 chromatin (Local et al. 2018), so this association may be expected. The roles of SWI/SNF in remodeling and regulating H3.3-containing nucleosomes are less well characterized (Pillidge and Bray 2019; Gehre et al. 2020). In contrast with H3.3, abundance of the respective canonical H3 (H3.1) peptide did not differ between ARID1A mutant and wild-type lines (Fig. 7.1c-d). To rule out the possibility that cancer-associated histone gene mutations could cause this phenomenon, we recapitulated these results in cell lines that are wild-type for all 74 human histone genes (Nacev et al. 2019) (Fig. F.1).
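The mutant vs. wild-type comparisons in Fig. 7.1c-d use Welch's unpaired t-test, which does not assume equal variances between groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom is shown below; the function name is hypothetical, and converting t to a p-value additionally requires a Student's t CDF (for example, scipy.stats.ttest_ind with equal_var=False reports both).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic (a minus b) and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(group_a) / len(group_a)  # squared standard error of the mean, group a
    se2_b = variance(group_b) / len(group_b)  # squared standard error of the mean, group b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df
```

Because each group contributes its own variance, a few highly variable ARID1A wild-type lines cannot inflate the apparent precision of the comparison, which is why the Welch form suits groups of very different sizes such as these.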
Although the effect size is small, the negative H3.3 association suggests that H3.3 chromatin deposition or stabilization might be mediated by ARID1A-SWI/SNF remodeler activity, and that H3.3 disruption could be linked to pathogenic chromatin alterations following ARID1A mutation.
Figure 7.1 Histone peptide abundance associated with ARID1A mutation in cancer cell lines
a, Representation of 42 distinct histone H3 peptide measurements by mass spectrometry in public CCLE global chromatin profiling data, specifically for endometrial cancer cell lines stratified by ARID1A mutation status. Heatmap values are relative peptide abundance Z-scores, where 0 (white) is the mean across all pan-cancer cell lines. H3 peptides were then ranked (y-axis) by differential abundance between ARID1A mutant vs. wild-type lines. b, Summary of ARID1A mutant peptide associations across all cancer cell lines (n = 896, y-axis) or specifically endometrial cancer lines (n = 27, x-axis). c-d, Box dot plots showing relative abundance of H3.3 vs. canonical H3.1 peptides in ARID1A wild-type and mutant lines. c, pan-cancer lines; d, endometrial lines. Statistic is two-tailed, unpaired Welch’s t-test.
7.3.2 ARID1A regulates H3.3+ active chromatin
We next interrogated the chromatin regulatory roles of H3.3 in human 12Z endometrial epithelial cells. As discussed in Chapter 2, 12Z cells express wild-type ARID1A, unlike most endometrial cancer cell lines. Further, we showed that ARID1A promotes epithelial characteristics in 12Z cells at both transcriptional and phenotypic levels, such that ARID1A loss leads to EMT and enhanced migration and invasion (Wilson et al. 2019). H3.3 ChIP-seq in control 12Z cells (n = 2 IP replicates) detected significant H3.3 enrichment at 40,006 genomic regions (Fig. 7.2a). Intronic, intergenic, and promoter regions comprised the vast majority of H3.3 enrichment sites (Fig.
7.2a), which may be expected as H3.3 is known to mark active regulatory elements such as enhancers and gene promoters (Chen, Zhao, et al. 2013). H3.3 ChIP-seq peaks had a median width of 1830 bp and ranged from <500 bp to >10 kilobases (Fig. 7.2b). Intersecting H3.3 ChIP-seq peaks with our ARID1A ChIP-seq data from these cells, generated in Chapter 2 studies, revealed that over half of each peak set overlapped, indicating widespread co-regulation (Fig. 7.2c). In Chapter 6, we constructed a genome-wide chromatin state map accompanying ARID1A loss in 12Z cells via ChromHMM (Ernst and Kellis 2012) by measuring seven chromatin features associated with transcriptional regulation: total RNA, ATAC (accessibility), H3K27ac, H3K18ac, H3K4me1, H3K4me3, and H3K27me3 (Wilson et al. 2020). Consistent with our previous observations of ARID1A-regulated chromatin states, genomic H3.3 enrichment was highly associated with all active, euchromatic transcriptional features, but not with heterochromatic H3K27me3 (Fig. 7.2d). Annotating H3.3 enrichment in each of our characterized chromatin states revealed that H3.3 is associated with regulatory chromatin states similar to those bound by ARID1A, most notably super-enhancers and active typical enhancers (Fig. 7.2e, left). In agreement, co-regulation by H3.3 and ARID1A was most prominently observed at these same chromatin states (Fig. 7.2e, center). Co-regulation was further dissected by measuring ARID1A binding at H3.3+ vs. H3.3- chromatin sub-states, and ARID1A binding was most associated with H3.3 at promoter and genic super-enhancers and active transcription start sites (TSS) (Fig. 7.2e, right). ARID1A-H3.3 chromatin co-regulation was examined further. Reciprocally, genome-wide ARID1A peaks showed overall stronger ARID1A binding when H3.3 was also present, and H3.3 was overall more abundant at genome-wide H3.3 peaks additionally bound by ARID1A (two-tailed, unpaired Wilcoxon test) (Fig. 7.2f).
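The two-tailed, unpaired Wilcoxon (Mann-Whitney) rank-sum test used for Fig. 7.2f compares ranks rather than raw fold-enrichment values, which makes it insensitive to the skewed distributions typical of ChIP signal. A minimal sketch using the normal approximation with a tie-corrected variance follows; it omits the continuity correction and exact small-sample p-values that R's wilcox.test may apply, so p-values can differ slightly, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Mann-Whitney U for x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))  # tie-corrected variance
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

For completely separated toy groups [1, 2, 3] and [4, 5, 6], U is 0 and the approximate two-sided p-value is just under 0.05.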
ARID1A and H3.3 ChIP-seq peaks were then broadly classified as promoter-proximal (located within 3 kb of a TSS) or distal (located further than 3 kb from a TSS). Interestingly, promoter-proximal ARID1A peaks were more likely to show H3.3 co-regulation than distal ARID1A peaks, whereas distal H3.3 peaks were more likely to show ARID1A co-binding than promoter-proximal H3.3 peaks (two-tailed Fisher’s exact test) (Fig. 7.2g). Further, ARID1A and H3.3 peaks were segregated into four classes based on co-regulation status and promoter vs. distal annotation. ARID1A binding was strongest at distal peaks co-regulated by H3.3, while H3.3 was most abundant at promoter-proximal peaks co-regulated by ARID1A (two-tailed, unpaired Wilcoxon test) (Fig. 7.2h). These genome-wide binding patterns indicate that ARID1A and H3.3 often co-regulate active chromatin elements like enhancers and promoters, but these features may display nuanced regulation dependent upon other local chromatin activity. In Chapter 2, we reported that ARID1A chromatin binding near gene promoters is associated with transcriptional regulation, such that ARID1A loss leads to aberrant gene expression (Wilson et al. 2019). Our H3.3 data further revealed that ARID1A promoter binding is highly enriched among genes marked by promoter H3.3 (hypergeometric enrichment test) (Fig. 7.2i, top), indicating that ARID1A transcriptional regulation may be coupled with H3.3. Moreover, the 2037 genes co-marked by ARID1A and H3.3 in the promoter region were more likely to show differential expression (DE) following ARID1A loss than genes without promoter H3.3 (two-tailed Fisher’s exact test) (Fig. 7.2i, bottom). In addition, locus-level investigation showed that ARID1A and H3.3 often co-mark active chromatin regulatory elements, occasionally accompanied by H3.3 coating of the gene body (Fig. 7.2j).
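The promoter-proximal vs. distal split used for Fig. 7.2g-h reduces to a nearest-TSS distance lookup. The sketch below measures the distance from each peak's midpoint to the closest TSS on the same chromosome by binary search. Using the midpoint (rather than the nearest peak edge or summit) and treating exactly 3 kb as proximal are assumptions, since the exact convention is not specified here, and the function names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Sequence, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end)


def nearest_tss_distance(chrom: str, pos: int,
                         tss_sorted: Dict[str, List[int]]) -> Optional[int]:
    """Distance in bp from pos to the closest TSS on chrom, or None if chrom has none."""
    sites = tss_sorted.get(chrom, [])
    if not sites:
        return None
    i = bisect.bisect_left(sites, pos)
    # Only the TSSs immediately left and right of pos can be the nearest one.
    neighbours = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    return min(abs(s - pos) for s in neighbours)


def classify_peaks(peaks: Sequence[Peak], tss_by_chrom: Dict[str, List[int]],
                   window: int = 3000) -> List[str]:
    """Label each peak promoter-proximal (midpoint within window of a TSS) or distal."""
    tss_sorted = {chrom: sorted(sites) for chrom, sites in tss_by_chrom.items()}
    labels = []
    for chrom, start, end in peaks:
        d = nearest_tss_distance(chrom, (start + end) // 2, tss_sorted)
        labels.append("promoter-proximal" if d is not None and d <= window else "distal")
    return labels
```

Sorting TSS positions once per chromosome keeps each lookup logarithmic, which matters when classifying tens of thousands of peaks against every annotated TSS.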
The data collectively suggest that H3.3 may be linked to transcriptional regulatory activity by ARID1A at the level of chromatin.
Figure 7.2 Genome-wide analysis of H3.3-ARID1A chromatin co-regulation
a, Genomic annotation of 40,006 genome-wide H3.3 ChIP-seq peaks in 12Z cells. b, Distribution of H3.3 peak widths. Median H3.3 peak width is 1830 bp. c, Genome-wide overlap of ARID1A and H3.3 ChIP-seq peaks. d, Genome-wide association between H3.3 and other previously measured chromatin features, per genomic bp. e, Enrichment for H3.3 and ARID1A co-regulation across 18 chromatin states modeled in Chapter 6 studies via ChromHMM. Left, enrichment of H3.3 peaks; center, enrichment of H3.3+ARID1A binding; right, enrichment of ARID1A binding at sites with or without H3.3. f, Left, ARID1A binding levels (ChIP/input fold-enrichment) at H3.3+ vs. H3.3- ARID1A peaks. Right, H3.3 abundance (ChIP/input fold-enrichment) at ARID1A+ vs. ARID1A- H3.3 peaks. Statistic is two-tailed, unpaired Wilcoxon’s test. g, Association between ARID1A and H3.3 co-binding at promoter-proximal (<3 kb from a TSS) vs. distal (>3 kb from a TSS) peaks. Statistic is two-tailed Fisher’s exact test. h, Left, ARID1A binding levels at promoter vs. distal peaks with or without H3.3. Right, H3.3 abundance at promoter vs. distal peaks with or without ARID1A. Statistic is two-tailed, unpaired Wilcoxon’s test. i, Top, enrichment of H3.3 at genes promoter-bound by ARID1A. Bottom, enrichment of ARID1A+H3.3 co-binding at genes DE following ARID1A loss (siARID1A treatment). Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher’s exact test. j, Example hg38 browser shots of genes and regulatory elements co-regulated by H3.3 and ARID1A. * p < 0.05, ** p < 0.01, *** p < 0.001.
7.3.3 ARID1A chromatin interactions maintain H3.3
CCLE data suggested that ARID1A loss might affect global levels of H3.3.
To test this hypothesis, we used shRNA particles targeting ARID1A (shARID1A) for acute depletion in 12Z cells and then measured H3.3 dynamics by ChIP-seq. Our differential H3.3 ChIP-seq analysis (shARID1A vs. non-targeting shRNA control, n = 2) indicated that nearly one-third of tested H3.3 regions showed significantly perturbed H3.3 abundance (csaw/edgeR FDR < 0.05) at 72 hours following ARID1A knockdown (Fig. 7.3a and F.2a). Notably, ARID1A knockdown in 12Z cells does not result in an obvious change in global H3.3 levels by immunoblotting, suggesting that any effects are likely context specific (Fig. F.2b). In particular, genomic regions that showed decreasing H3.3 following ARID1A loss had overall higher baseline levels of H3.3 compared to increasing H3.3 regions and to stable H3.3 regions that were not affected by ARID1A loss (two-tailed, unpaired Wilcoxon test) (Fig. 7.3b). We then investigated how ARID1A chromatin binding may be directly associated with the observed changes in H3.3 following ARID1A loss. Strikingly, H3.3 regions directly dependent on ARID1A almost exclusively lost H3.3 and rarely gained H3.3 (Fig. 7.3c-d). While 33% of all tested H3.3 regions had detected ARID1A binding, 81% of the 8418 shARID1A decreasing H3.3 regions were normally bound by ARID1A, as opposed to only 3% of the 11,059 shARID1A increasing H3.3 regions (Fig. 7.3d). This phenomenon contrasts with our context-dependent observations of direct H3K27ac regulation by ARID1A in Chapter 6, which included both agonistic and antagonistic behavior (Wilson et al. 2020). These results indicate that ARID1A interactions with H3.3 chromatin may serve to promote H3.3 incorporation or maintain its stability, supporting the observed CCLE association between ARID1A mutations and H3.3 loss. We further characterized the changes in H3.3 occurring following ARID1A loss.
Globally, we found that typical enhancers (distal regions marked by H3K27ac and ATAC) were most enriched for shARID1A-driven H3.3 alterations as compared to gene promoters and super-enhancers (Fig. F.2c). Remarkably, gene promoters displayed both decreasing and increasing H3.3, whereas typical enhancers and super-enhancers almost exclusively lost H3.3 if affected (Fig. F.2d). Among genomic H3.3 regions directly affected by ARID1A loss, decreasing H3.3 regions tended to display greater H3.3 flux than increasing H3.3 regions (two-tailed, unpaired Wilcoxon test) (Fig. 7.3e). Regions that displayed shARID1A decreasing H3.3 tended to have overall wider footprints than increasing or stable H3.3 regions (two-tailed, unpaired Wilcoxon test) (Fig. 7.3f). In alignment with where ARID1A-H3.3 co-regulation is most frequently observed (promoter and genic super-enhancers), genomic annotation indicated that decreasing H3.3 regions were more likely to be found in intronic (i.e., gene body) regions and less likely to be intergenic (Fig. 7.3g). Among the 412 genes we identified with shARID1A direct decreasing promoter H3.3, we found significant enrichment for inflammatory, hypoxia, apoptosis, locomotion, and EMT pathways (Fig. 7.3h-i).
Figure 7.3 Genome-wide analysis of ARID1A-dependent H3.3
a, MA plot of shARID1A vs. control differential H3.3 ChIP-seq, n = 67,502 tested genomic regions. Regions are colored based on shARID1A differential H3.3 significance. FDR < 0.05 was used as the significance threshold for all downstream analyses. b, Baseline H3.3 abundance (log2CPM) at regions with stable H3.3 following ARID1A knockdown, regions that display increasing H3.3, and regions that display decreasing H3.3. Statistic is two-tailed, unpaired Wilcoxon’s test. c, shARID1A differential H3.3 regions segregated by detection of normal ARID1A binding. Left, MA plot with all genome-wide H3.3 tested regions, colored by ARID1A binding status.
Right, box plot quantification of shARID1A log2FC H3.3 abundance, segregated by ARID1A binding status. Statistic is two-tailed, unpaired Wilcoxon’s test. d, Enrichment of ARID1A binding detection at regions with decreasing H3.3 following ARID1A loss compared to all tested H3.3 regions. Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher’s exact test. e, Magnitude of H3.3 change (log2FC) among ARID1A-bound (direct) shARID1A significantly decreasing vs. increasing H3.3 regions. Statistic is two-tailed, unpaired Wilcoxon’s test. f, Distribution of H3.3-enriched region widths among shARID1A stable vs. increasing vs. decreasing H3.3 regions. Statistic is two-tailed, unpaired Wilcoxon’s test. g, Genomic annotation of all tested, shARID1A increasing, and shARID1A decreasing H3.3 regions. h, Top 10 significant (FDR < 0.05) enriched Hallmark pathways (left) and GO Biological Process gene sets (right) among genes with shARID1A direct (ARID1A-bound) decreasing promoter H3.3. i, Representative hg38 locus (CCL2) displaying ARID1A-dependent H3.3 maintained by ARID1A chromatin interactions.
7.3.4 H3.3 depletion phenocopies transcriptional effects of ARID1A loss
We next sought to determine the transcriptional consequences of H3.3 loss in endometrial epithelial cells. In 12Z cells, H3F3B (H3-3B) is the predominantly expressed H3.3-encoding gene (Fig. 7.4a). We hypothesized that H3F3B could be knocked down to reduce H3.3 levels for acute transcriptome evaluation without affecting cell health. Using siRNA targeting H3F3B (siH3F3B), we observed substantial H3.3 loss by immunoblotting without effects on the cell cycle, used here as a barometer for cell health (Fig. 7.4b and F.3a-b). RNA-seq transcriptome analysis 72 hours following siRNA transfection showed clear loss of H3F3B, but not H3F3A (H3-3A), expression, accompanied by 1608 total significant DE genes (DESeq2, FDR < 0.001), including both upregulated (repressed by H3.3) and downregulated (activated by H3.3) genes (Fig. 7.4c-e).
As expected, we also observed highly significant enrichment for H3.3-dependent transcriptional changes among genes marked by promoter H3.3 (Fig. F.3c). Similar to our previous observations with acute ARID1A loss in Chapters 2, 5, and 6, tolerable depletion of H3.3 led to mostly minor alterations in gene expression, with the majority of DE genes displaying <0.5 log2FC expression change (Fig. 7.4e). These data indicate that H3.3 serves both activating and repressing roles in transcriptional regulation of endometrial epithelial cells. Comparing the gene expression changes following H3.3 loss with those following ARID1A loss identified 682 overlapping genes, a highly significant overlap (hypergeometric enrichment test) (Fig. 7.4f). These 682 genes were then directionally stratified to identify genes with the same or different expression patterns following ARID1A vs. H3.3 loss. A significant association was observed between the effects of H3.3 and ARID1A loss, indicating shared transcriptional consequences (Chi-squared test) (Fig. 7.4g). Gene expression changes were also positively correlated transcriptome-wide (Pearson and Spearman correlations) (Fig. 7.4h). Intriguingly, the 682 genes affected by loss of both H3.3 and ARID1A were more likely to be repressed by H3.3 (hypergeometric enrichment test) (Fig. 7.4i). A total of 196 genes were identified as mutually repressed by both ARID1A and H3.3, including PLAU, ADAMTS15, CD82, CCL2, and CLSTN2 (Fig. 7.4j). In agreement with the chromatin observations, these 196 co-repressed genes were enriched for gene sets similar to those observed among the shARID1A direct decreasing promoter H3.3 genes, including EMT, TNFα signaling, estrogen response, apoptosis, adhesion, migration, extracellular matrix, and collagens (Fig. 7.4k). Altogether, these data suggest that ARID1A and H3.3 co-regulate expression of similar target genes in endometrial epithelial cells.
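The transcriptome-wide concordance in Fig. 7.4h is summarized by Pearson's r on the log2FC values and Spearman's rs, which is Pearson's r computed on ranks and is therefore less sensitive to a few extreme fold changes. A self-contained sketch of both coefficients follows; it illustrates the textbook definitions rather than the code used to produce the figure, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation rs, i.e. Pearson r of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear toy relationship such as x = [1, 2, 3, 4, 5] and y = x squared, rs is exactly 1 while r is slightly below 1, which is why reporting both coefficients is informative.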
At the level of chromatin, ARID1A loss-mediated depletion or destabilization of H3.3 may notably relieve repression of a physiologically relevant set of EMT and invasion genes.
Figure 7.4 Transcriptional effects of H3.3 depletion and overlap with ARID1A
a, Baseline relative linear expression of H3F3A (H3-3A) and H3F3B (H3-3B) gene isoforms encoding H3.3, as measured by RNA-seq. b, Western blot for H3.3 vs. total H3 in control vs. siH3F3B treated cells. c, Global transcriptomic effects of 24,192 genes following H3.3 knockdown via siH3F3B treatment. Red dots represent significant DE genes (DESeq2, FDR < 0.001). d, Relative linear expression of H3F3A and H3F3B by RNA-seq in control and siH3F3B cells. e, Volcano plot depicting siH3F3B vs. control differential gene expression. Top significant genes are labeled. f, Significant overlap in DE genes following H3.3 knockdown (siH3F3B) vs. ARID1A knockdown (siARID1A). Statistic is hypergeometric enrichment. g, Directional segregation of siH3F3B/siARID1A overlapping DE genes. A positive association is observed by Chi-squared test, i.e., genes are more likely to be upregulated or downregulated in both conditions as opposed to antagonistic regulation. h, Scatter plot of siH3F3B vs. siARID1A expression log2FC (with shrinkage correction) for all 19,900 transcriptome-wide commonly detected genes. Statistics are Pearson (r) and Spearman (rs) correlation coefficients. Colored dots indicate significant DE genes (FDR < 0.001) in both treatment conditions. i, Enrichment for H3.3 repression (siH3F3B upregulation) among siH3F3B genes which are also affected by ARID1A loss (siARID1A). Statistic is hypergeometric enrichment test. j, Scatter plot of 196 shared DE genes upregulated following both H3.3 knockdown and ARID1A knockdown. These genes are mutually repressed by H3.3 and ARID1A. k, Top significant (FDR < 0.05) enriched gene sets among the 196 ARID1A-H3.3 mutually repressed genes.
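The overlap tests in Fig. 7.4f and 7.4i ask whether two gene lists share more genes than expected by chance given a common gene universe. A minimal sketch of a one-sided hypergeometric enrichment test is shown below, returning the upper-tail p-value P(X ≥ observed overlap) and the fold enrichment over the expected overlap, computed with exact integer binomial coefficients. The function name is hypothetical and the example counts are toy values, not data from this chapter.

```python
from math import comb
from typing import Tuple


def hypergeom_enrichment(overlap: int, set_a: int, set_b: int,
                         universe: int) -> Tuple[float, float]:
    """Upper-tail p-value P(X >= overlap) and observed/expected overlap ratio.

    X counts how many of the set_b genes fall in set_a when set_b genes are drawn
    without replacement from a universe that contains set_a genes.
    """
    upper = min(set_a, set_b)
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    p_value = tail / comb(universe, set_b)  # exact integers, one final division
    expected = set_a * set_b / universe
    return p_value, overlap / expected
```

For a toy universe of 10 genes with lists of 4 and 3 genes sharing 2, the upper-tail p-value is 1/3 and the fold enrichment is about 1.67; with real gene counts the same call works unchanged, because Python integers do not overflow.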
7.3.5 ARID1A interacts with CHD4 to co-regulate H3.3 with ZMYND8
These data implicate H3.3 in ARID1A loss-mediated pathogenesis through transcriptional misregulation. However, the molecular mechanism underpinning this chromatin regulation remains unclear, as few reports have linked SWI/SNF activity specifically to H3.3 nucleosomes (Pillidge and Bray 2019; Gehre et al. 2020). To gain insight into possible context-specific chromatin regulators mediating H3.3 regulation by ARID1A, we used the comprehensive ReMap2020 database of 165 million peak regions extracted from various DNA-binding genomic assays (Cheneby et al. 2020). For all 1135 transcriptional regulators included in this database, we calculated genome-wide associations between each factor's peak set and H3.3+ vs. H3.3- ARID1A binding. This analysis revealed that two zinc finger MYND-type proteins, ZMYND11 (BS69) and ZMYND8 (PRKCBP1, RACK7), were among the top co-regulators associated with H3.3+ ARID1A chromatin binding (Fig. 7.5a), suggesting that H3.3 regulation by ARID1A may be mediated by these co-regulators. ZMYND11 and ZMYND8 are multivalent chromatin readers proposed to function as interfaces between histones and other chromatin regulator complexes such as remodelers, writers, and erasers (Guo et al. 2014; Savitsky et al. 2016). Both proteins have been biochemically and biophysically characterized to interact with H3/H4 acetylated tails through bromodomains and may show specificity toward or against H3.3-containing nucleosomes (Guo et al. 2014; Wen et al. 2014; Adhikary et al. 2016; Li et al. 2016). Numerous studies have now shown that ZMYND8 interacts with and can recruit the SWI/SNF-like repressive NuRD (Nucleosome Remodeling and Deacetylase complex; Mi2-β) chromatin remodeler complex (Gong et al. 2015; Adhikary et al. 2016; Savitsky et al. 2016; Spruijt et al. 2016), which contains histone deacetylases (HDACs) and interacts with H3.3 (Kraushaar et al. 2018; Tong et al. 1998).
Like ARID1A, ZMYND8 has also been shown to suppress super-enhancer hyperactivation (Shen, Xu, et al. 2016), and more recently, ZMYND8 and ARID1A were identified in the same screen as master regulators of EMT (Serresi et al. 2021). Therefore, we investigated the potential roles of ZMYND8 and possible NuRD co-factors in biochemically mediating ARID1A-H3.3 co-regulation (Fig. 7.5b). ARID1A co-immunoprecipitation (co-IP) was first used to detect high-stringency nuclear interactions with ZMYND8. While ZMYND8 was not detected in the ARID1A pulldown at high salt (300 mM KCl), the NuRD catalytic subunit CHD4 was evident (Fig. 7.5c). We then hypothesized that CHD4-NuRD may serve as an interface between ARID1A and ZMYND8. A reciprocal CHD4 co-IP confirmed nuclear interactions with both ARID1A and ZMYND8 (Fig. 7.5d). To orthogonally assess whether these proteins reside in native complexes, nuclear extracts were subjected to density sedimentation via 10-30% glycerol gradient ultracentrifugation. Native fractions were observed that included ZMYND8 and members of both SWI/SNF and NuRD (Fig. 7.5e). These data suggest a physical interaction between ARID1A and CHD4 that could regulate H3.3 chromatin with support from the reader ZMYND8. Genome-wide binding profiles were measured by ChIP-seq to query the chromatin regulatory activity of ZMYND8 and CHD4 in relation to ARID1A and H3.3. Roughly 2000 genomic regions were identified with H3.3 and all three chromatin regulators co-localized, such as across the ARID5B locus (Fig. 7.5f-g). Across H3.3 peaks genome-wide, we found that ARID1A binding was nearly ubiquitously detected at CHD4-bound sites, and particularly so at ZMYND8-CHD4 co-bound sites (Fig. 7.5h). However, the presence of ZMYND8 at H3.3 peaks in the absence of CHD4 was not associated with ARID1A binding (Fig. 7.5h). These data hint that CHD4 may be primarily responsible for ARID1A recruitment to H3.3 chromatin, in support of the biochemical data.
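Several comparisons in this section (for example, Fig. 7.5h) are two-tailed Fisher's exact tests on 2x2 tables, such as H3.3 peaks cross-classified by CHD4/ZMYND8 binding and by ARID1A co-binding. The sketch below computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, the convention used by R's fisher.test. It reports the sample odds ratio ad/bc, whereas fisher.test reports a conditional maximum-likelihood estimate, so the odds ratios can differ; the function name is hypothetical.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Sample odds ratio and two-sided Fisher p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Probability of each possible top-left count given fixed margins (hypergeometric).
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more likely than the observed one (small tolerance for float ties).
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    odds = a * d / (b * c) if b * c else float("inf")
    return odds, min(1.0, p_value)
```

For the toy table [[3, 1], [1, 3]], the sample odds ratio is 9 and the two-sided p-value is 34/70 (about 0.486), illustrating that a large odds ratio from few observations need not be significant.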
We further investigated ARID1A binding strength and H3.3 abundance across H3.3 peaks segregated by detection of CHD4/ZMYND8 binding. ARID1A binding was again strongest at H3.3 peaks co-bound by CHD4 as opposed to those without CHD4 (Fig. 7.5i). H3.3 abundance was similarly highest at CHD4-bound peaks, although CHD4+ZMYND8 peaks showed the highest H3.3 levels overall (Fig. 7.5i). Among H3.3 regions directly dependent on ARID1A, we observed that, of the regions that lost H3.3 following ARID1A knockdown, those co-occupied by CHD4 or ZMYND8 had significantly higher baseline H3.3 levels; this was not observed at regions that gained H3.3 following ARID1A loss (two-tailed, unpaired Wilcoxon test) (Fig. 7.5j). Intriguingly, H3.3 regions directly maintained by ARID1A chromatin interactions genome-wide were associated with CHD4 but not ZMYND8 (Fig. 7.5k). In fact, regions that lose H3.3 following ARID1A loss due to disrupted ARID1A chromatin interactions tend to have higher baseline levels of ARID1A, CHD4, and H3.3, but lower levels of ZMYND8, compared with stable H3.3 regions (Fig. 7.5l). These results suggest that ARID1A-H3.3 co-regulation may depend on CHD4 interactions, whereas the possible co-regulatory role of ZMYND8 remains unclear.
Figure 7.5 Biochemical and genomic characterization of ARID1A, CHD4, and ZMYND8 chromatin interactions co-regulating H3.3
a, Genome-wide associations between ARID1A binding (peaks) at H3.3+ vs. H3.3- regions for all 1135 transcriptional regulator peak sets included in the ReMap2020 peak database. ZMYND11 and ZMYND8 are two of the top factors most associated with H3.3+ ARID1A binding (OR = 2.93 and 2.73, respectively). b, Chromatin model schematic depicting the hypothesized relationship between ARID1A-SWI/SNF and ZMYND8 co-regulation of H3.3, possibly mediated by co-factors.
c, ARID1A co-immunoprecipitation detecting a high-stringency interaction with NuRD catalytic subunit CHD4, but not ZMYND8. d, CHD4 co-immunoprecipitation detecting high-stringency interactions with both ARID1A and ZMYND8. e, Nuclear extract density sedimentation via 10-30% glycerol gradient. Fractions display native protein complexes transitioning from low molecular weight (left) to high molecular weight (right). Underlined fractions highlight potential interacting native complexes containing members of SWI/SNF, NuRD, and ZMYND8. f, Genome-wide ChIP-seq peak overlaps between ARID1A, CHD4, ZMYND8, and H3.3. Peak numbers within the Euler diagram are approximate and not mutually exclusive. g, Example locus displaying ARID1A, CHD4, ZMYND8, and H3.3 co-regulation across ARID5B. h, Enrichment for ARID1A co-regulation of H3.3 peaks bound by CHD4 and/or ZMYND8. Statistic is two-tailed Fisher's exact test. i, Average ChIP-seq signal density histograms for ARID1A (left) and H3.3 (right) at H3.3 peaks bound by CHD4 and/or ZMYND8. y-axis signal is quantified as ChIP/input fold-enrichment. j, H3.3 abundance (ChIP FPKM) at ARID1A directly-dependent H3.3 regions co-bound by CHD4 and/or ZMYND8. Abundance is quantified as log2CPM across the tested region. Statistic is two-tailed, unpaired Wilcoxon's test. k, Association of CHD4 binding (top, positive) and ZMYND8 binding (bottom, negative) with ARID1A direct maintenance of H3.3 chromatin, genome-wide. Statistic is two-tailed Fisher's exact test. l, Average ChIP-seq signal density histograms for ARID1A, H3.3, CHD4, and ZMYND8 across ARID1A-bound H3.3 regions that decreased or were stable with shARID1A, quantified as in i.
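The two-tailed, unpaired Wilcoxon tests used in this section (e.g., Fig. 7.5j) are Wilcoxon rank-sum (Mann-Whitney U) tests comparing two independent groups of signal values. The standard-library Python sketch below illustrates the computation, using average ranks for ties and a normal approximation with a tie-corrected variance; it is an illustration, not the implementation used here. Exact small-sample p-values and the continuity correction that some tools apply by default (for example R's `wilcox.test`) are deliberately omitted, so p-values may differ slightly from those tools.

```python
from math import erfc, sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum / Mann-Whitney U test (normal approximation).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum over groups of tied values of (t^3 - t).
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a shift
        return u_x, 1.0
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


if __name__ == "__main__":
    cobound = [5.1, 4.8, 6.0, 5.5, 5.9]    # hypothetical log2CPM values
    other = [3.2, 4.1, 3.9, 4.5, 3.0]
    u, p = rank_sum_test(cobound, other)
    print(f"U = {u}, p = {p:.3g}")
```

Because the test works on ranks, it compares the distributions of two region classes without assuming normally distributed ChIP-seq signal.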
7.3.6 ARID1A-CHD4-ZMYND8 co-repress hyperactivation of H3.3+ super-enhancers
While genome-wide analyses indicate that ZMYND8 is not associated with ARID1A functional maintenance of H3.3, there may be context-specific dependencies on ZMYND8 co-regulation, as Chapter 6 studies revealed that ARID1A differentially regulates H3K27ac at super-enhancers vs. typical enhancers (Wilson et al. 2020). We next examined ARID1A-CHD4-ZMYND8 co-regulation of H3.3 at enhancers. At ARID1A-bound active enhancers, defined as accessible H3K27ac peaks located >3 kb from an annotated TSS (Fig. 7.6a), ZMYND8 binding is associated with the presence of CHD4, as expected (Fig. 7.6b). Moreover, ZMYND8 binding was detected at 84.6% of ARID1A+CHD4 super-enhancers (n = 507) as opposed to 63.2% of ARID1A+CHD4 typical enhancers (n = 2282) (Fig. 7.6b). Importantly, ARID1A, CHD4, and ZMYND8 co-binding at active enhancers is associated with the presence of H3.3, and this association is greatest at super-enhancers (Fig. 7.6c). These results indicate that ZMYND8 is associated with CHD4 and ARID1A most frequently at active super-enhancers. In Chapter 6, we observed that ARID1A suppresses P300-mediated H3K27 hyperacetylation at a subset of active super-enhancers (Wilson et al. 2020). ARID1A and CHD4 binding levels are not substantially different between suppressed super-enhancers that become hyperacetylated following ARID1A loss and those with stable acetylation (Fig. 7.6d). Strikingly, ZMYND8 binding and H3.3 abundance are significantly higher at suppressed super-enhancers that become hyperacetylated following ARID1A loss (two-tailed, unpaired Wilcoxon test) (Fig. 7.6d). These data implicate high levels of H3.3 and ZMYND8 as marks of super-enhancers suppressed by ARID1A.
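The active-enhancer definition used above (accessible H3K27ac peaks located >3 kb from an annotated TSS) amounts to a distance filter against TSS coordinates. The Python sketch below shows one way to apply it with a binary search over sorted TSS positions. The `Peak` fields, the half-open 0-based coordinate convention, and the precomputed `accessible` flag (overlap with an ATAC-seq peak) are assumptions for illustration, not details of the study's pipeline.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int        # 0-based start (inclusive)
    end: int          # 0-based end (exclusive)
    accessible: bool  # True if the peak overlaps an ATAC-seq peak (assumed precomputed)


def distance_to_nearest_tss(peak: Peak, tss: List[int]) -> float:
    """Base-pair gap between the peak and the closest TSS (0 if a TSS lies inside).

    `tss` must hold sorted 0-based TSS positions on the peak's chromosome;
    an empty list yields inf, so peaks on chromosomes without annotated
    TSSs are treated as distal.
    """
    i = bisect_left(tss, peak.start)  # index of the first TSS at or after peak.start
    if i < len(tss) and tss[i] < peak.end:
        return 0
    best = float("inf")
    if i < len(tss):  # nearest TSS downstream of the peak
        best = min(best, tss[i] - (peak.end - 1))
    if i > 0:         # nearest TSS upstream of the peak
        best = min(best, peak.start - tss[i - 1])
    return best


def active_distal_enhancers(peaks: List[Peak], tss_by_chrom: Dict[str, List[int]],
                            min_distance: int = 3000) -> List[Peak]:
    """Keep accessible H3K27ac peaks located more than `min_distance` bp from any TSS."""
    kept = []
    for p in peaks:
        tss = tss_by_chrom.get(p.chrom, [])
        if p.accessible and distance_to_nearest_tss(p, tss) > min_distance:
            kept.append(p)
    return kept


if __name__ == "__main__":
    tss = {"chr1": [10_000, 50_000]}  # hypothetical sorted TSS positions
    peaks = [
        Peak("chr1", 12_000, 12_500, True),   # 2,000 bp from a TSS: proximal
        Peak("chr1", 20_000, 21_000, True),   # 10,000 bp from a TSS: distal
        Peak("chr1", 30_000, 30_500, False),  # distal but not accessible
    ]
    print(active_distal_enhancers(peaks, tss))
```

Super-enhancer vs. typical-enhancer labels would then be attached to the retained peaks by a separate step (such as overlap with called super-enhancer regions), which this sketch does not attempt.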
Figure 7.6 H3.3 enhancer regulation by ARID1A, CHD4, and ZMYND8
a, Heatmap of chromatin features at 15,925 active typical enhancers and 1374 distal super-enhancer constituents (ATAC+ H3K27ac peaks) segregated by ARID1A±CHD4±ZMYND8 binding. Enhancers are centered on the H3K27ac peak, and signal is displayed for the flanking 5 kb in either direction. b, ZMYND8 binding detection at ARID1A-bound typical and super-enhancers with or without CHD4 co-binding. Statistic is two-tailed Fisher's exact test. c, Association between ARID1A+CHD4+ZMYND8 co-binding at enhancers and presence of H3.3. H3.3+ super-enhancers show the most frequent co-binding. d, Chromatin features at active super-enhancer constituents segregated by ARID1A loss-driven H3K27 acetylation dynamics: hyperacetylated, deacetylated, or stably acetylated. Left, average ChIP-seq signal density histograms across enhancer classes. Right, violin plots quantifying signal (ChIP/input fold-enrichment) across enhancer classes. Statistic is two-tailed, unpaired Wilcoxon's test.
7.3.7 ZMYND8 specifies ARID1A-CHD4 chromatin repression toward H4(K16)ac
While the ZMYND8 module appears to be associated with repressive chromatin targeting at super-enhancers, the underlying mechanism has not yet been empirically established. A plausible explanation is the histone tail reader function of ZMYND8 through its BRD, PWWP, and PHD domains, which interact with acetylated H3/H4 residues and methylated H3 residues (Savitsky et al. 2016). In particular, the ZMYND8 bromodomain was recently described to interact with acetylated H4 tails and recruit CHD4 to repress transcription following DNA damage (Gong et al. 2015). However, NuRD also interacts with other histone substrates such as H2A.Z, which has been linked to H3.3-mediated transcriptional poising (Chen, Zhao, et al. 2013).
Of further relevance, P300 was recently shown to acetylate H2A.Z when stimulated by recognition of acetylated H4 residues through its bromodomain (Colino-Sanguino et al. 2019). To better resolve our model of ARID1A-CHD4-ZMYND8-mediated H3.3 chromatin repression, we updated our genome-wide chromatin state model to include 5 additional features related to reported ZMYND8 and NuRD activity: H3.3, pan-H4ac (K5/K8/K12/K16), H4K16ac, H2A.Z, and H2A.Z-acetyl (K4/K7; H2A.Zac) (Fig. 7.7a and F.4). We also confirmed high specificity of the anti-H2A.Zac antibody by histone peptide array (Rothbart et al. 2012; Cornett, Dickson, and Rothbart 2017) (Fig. F.5). The new 12-feature chromatin state model was quantitatively optimized at 25 chromatin states (Fig. F.6; see Methods section 7.5.9 for details). The enhanced resolution of our model revealed new state identities such as H3.3+ poised bivalent promoters (S8), H2A.Zac+ poised enhancers (S14), and notably the segregation of upstream active promoter super-enhancers into H4(K16)ac+ (S2) and H4(K16)ac- (S5) classes (Fig. 7.7a). We next determined which states were enriched for ARID1A-CHD4-ZMYND8 co-binding and for shARID1A direct decreasing H3.3 (Fig. 7.7b). Both upstream active promoter super-enhancer states (S2 and S5) showed the strongest enrichment for shARID1A direct decreasing H3.3, while ZMYND8 binding and ARID1A-CHD4-ZMYND8 co-binding were most strongly enriched at the H4(K16)ac+ upstream active promoter super-enhancer class (S2) (Fig. 7.7b). We further investigated chromatin state identities at reference annotated gene promoters (±3 kb flanking the TSS). As expected, the highly active TSS state (S1) is the most prevalent promoter chromatin identity at genes transcriptionally regulated by ARID1A (siARID1A DE genes), while upstream active promoter super-enhancer states (S2, S5) are relatively rare (Fig. 7.7c, left). However, comparing the promoter chromatin state identities across siARID1A DE vs.
stable genes revealed that upstream active promoter super-enhancer states are the chromatin states most highly enriched for association with ARID1A transcriptional regulation (Fig. 7.7c, center). We further segregated promoter chromatin by genes upregulated with siARID1A (i.e., repressed by ARID1A) vs. downregulated with siARID1A (i.e., activated by ARID1A). The H4(K16)ac+ active promoter super-enhancer state marked by high ARID1A-CHD4-ZMYND8 co-binding (S2) showed stronger enrichment for ARID1A transcriptional repression than the corresponding state without H4(K16)ac (S5) (Fig. 7.7c, right). In agreement, chromatin accessibility directly repressed by ARID1A was associated with the presence of H4 acetylation (Fig. F.7). Promoter super-enhancer H3K27ac peaks were next directly segregated by detection of H4K16ac (H4K16ac+, n = 112; H4K16ac-, n = 115). ARID1A binding and H3.3 abundance were not significantly different between H4K16ac-stratified super-enhancers, but ZMYND8 binding was stronger at H4K16ac+ regions, while CHD4 binding was actually lower at H4K16ac+ regions (Fig. 7.7d). Collectively, these analyses suggest that H4(K16)ac marking promoter super-enhancers may recruit repressive ARID1A-CHD4-ZMYND8 regulation of H3.3. To identify genes targeted by ARID1A-CHD4-ZMYND8 regulation of repressive H3.3, we also used siRNA to deplete CHD4 (siCHD4) and ZMYND8 (siZMYND8), followed by RNA-seq (Fig. 7.7e, F.8a-f). As expected, gene expression alterations following loss of ZMYND8 or CHD4 were enriched among genes with detected promoter binding by the respective factor (Fig. F.8g-h). To further test whether ZMYND8 is associated with transcriptional repression by ARID1A and H3.3, we investigated ZMYND8-dependent gene expression within the siARID1A/siH3F3B DGE directional classes (refer to Fig. 7.4g-h).
Indeed, siZMYND8 gene expression alterations were most strongly enriched at genes mutually repressed by ARID1A and H3.3 compared with other gene classes, where ZMYND8 also functioned mostly as a repressor (Fig. F.8i). Differential gene expression again revealed highly overlapping sets of genes transcriptionally regulated by ARID1A, CHD4, ZMYND8, and H3.3 (Fig. F.8j-l), including 603 genes affected by each of the four knockdowns at FDR < 0.05 (Fig. 7.7f and F.8l). These included 60 genes mutually repressed by ARID1A, CHD4, ZMYND8, and H3.3 (Fig. 7.7g). These mechanistic co-repressed genes were enriched for EMT, adhesion, development, locomotion, collagen, and extracellular matrix gene sets (Fig. 7.7h). Further, 68% of these genes were marked by gene body H4K16ac, an enrichment compared with less than half of all expressed genes (Fig. F.9). Two physiologically relevant mechanistic target genes identified through integrative epigenomic analysis are PLAU and TRIO, both of which are located within broad H4K16ac+ domains and near active H3.3+ super-enhancers co-bound by ARID1A, CHD4, and ZMYND8 (Fig. 7.7i). Through this cooperative repressive chromatin mechanism, ARID1A loss leads to decreased promoter H3.3 abundance and transcriptional hyperactivation of PLAU and TRIO (Fig. 7.7i).
Figure 7.7 ZMYND8-mediated chromatin repression is specified by H4(K16)ac
a, Heatmap of clustered, normalized feature emission probabilities and associated functional annotation of the new 12-feature, genome-wide 25-state chromatin model. States (S_) are labeled based on normalized emission probability clustering. See Methods for details on optimal model selection. b, Genomic enrichment for ARID1A, CHD4, ZMYND8, co-binding, and shARID1A direct decreasing H3.3 among the 25 chromatin states. Statistic is hypergeometric enrichment test. c, Modeled chromatin states among reference gene promoter regions (±3 kb around annotated TSS).
Left, proportion of promoter chromatin for siARID1A DE genes (DESeq2, FDR < 0.0001) belonging to each of the 25 states. Center, ratio of promoter chromatin states associated with siARID1A DE genes (FDR < 0.0001) compared to stable genes (FDR > 0.05). Right, ratio of promoter chromatin states associated with ARID1A transcriptional repression (i.e., siARID1A upregulation) compared to activation (i.e., siARID1A downregulation). d, Violin plots quantifying chromatin feature signal at H4K16ac+ (purple) vs. H4K16ac- (gray) promoter super-enhancer constituent H3K27ac peaks. Statistic is two-tailed, unpaired Wilcoxon's test. e, Principal component analysis of RNA-seq expression log2FC (shrinkage-corrected) values for siCHD4, siZMYND8, siH3F3B, and siARID1A treatment conditions vs. controls. f, Schematic of identifying mechanistic genes co-repressed by ARID1A, H3.3, CHD4, and ZMYND8, i.e., upregulated (DESeq2, FDR < 0.05) with siARID1A, siH3F3B, siCHD4, and siZMYND8 treatments. See Fig. F.8j-l for data. g, Clustered heatmap of expression log2FC values for 60 co-repressed genes upregulated in all 4 knockdown conditions. Rightmost column demarcates presence of H4K16ac peaks over promoter or gene body. h, Top gene sets enriched (hypergeometric enrichment test, FDR < 0.05) among the 60 ARID1A-CHD4-ZMYND8-H3.3 co-repressed genes from various gene set databases. i, Example mechanistic gene loci, PLAU and TRIO, marked by H3.3+, H4K16ac+ super-enhancers co-bound by ARID1A, CHD4, and ZMYND8, where ARID1A loss leads to significant reduction of H3.3 (FDR < 0.05) and knockdown of ARID1A, H3.3, CHD4, or ZMYND8 leads to significant expression upregulation (FDR < 0.05).
7.3.8 ARID1A-H3.3 repressed chromatin targets are aberrantly activated in human endometriomas
Our studies in 12Z human endometriotic epithelial cells have revealed a mechanism of cooperative chromatin regulation by ARID1A-CHD4-ZMYND8 maintenance of repressive H3.3.
To support the relevance of these chromatin regulatory networks to pathologically related gene expression, we used an expression microarray data set comparing human endometriomas, i.e., ovarian endometriosis lesions, with control endometrial tissue samples (Hawkins et al. 2011). Endometriomas result from ectopic spread of endometrial tissue onto the ovary, forming cysts associated with cancer development (Grandi et al. 2015; Zondervan, Becker, and Missmer 2020). Three empirically derived gene sets were investigated for relevance to human endometrioma gene expression alterations: 1) shARID1A direct decreasing promoter H3.3 genes (n = 412), 2) ARID1A-H3.3 co-repressed genes (i.e., siARID1A/siH3F3B upregulated, FDR < 0.001, n = 196), and 3) ARID1A-H3.3-CHD4-ZMYND8 co-repressed genes (i.e., upregulated with all four knockdowns, FDR < 0.05, n = 60). All three gene sets were significantly enriched among human endometrioma DE genes (hypergeometric enrichment test) (Fig. 7.8a, left). Moreover, the overlapping DE genes were more likely to be upregulated in endometriomas than expected by chance (hypergeometric enrichment test), indicating that relief of repression is also observed in pathology (Fig. 7.8a, right). Similarly, endometrioma vs. control endometrium expression log2FC values indicated that each gene set tended to be upregulated overall in the pathological, pre-cancerous state (Fig. 7.8b). Mechanistic genes aberrantly activated in endometriomas that could be attributed to disruption of ARID1A-H3.3 chromatin repression mechanisms include C1S, SCARB1, GYPC, WWC3, COL6A2, and MAP4K4 (Fig. 7.8c).
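Two set operations underlie this comparison: defining a co-repressed gene set as the genes upregulated at FDR < 0.05 in every knockdown, and testing whether a gene set overlaps endometrioma DE genes more than expected by chance, with all unique measured genes as the universe (Fig. 7.8a). The standard-library Python sketch below illustrates both as a one-sided (over-representation) hypergeometric test. The data layout, a hypothetical `{gene: (log2FC, adjusted p)}` dictionary per knockdown, and the function names are assumptions for illustration; this is not the code used to produce Fig. 7.8.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple

DEResults = Dict[str, Tuple[float, float]]  # gene -> (log2FC, adjusted p-value)


def co_repressed(results: Dict[str, DEResults], fdr: float = 0.05) -> Set[str]:
    """Genes significantly upregulated (log2FC > 0, padj < fdr) in every knockdown."""
    up_sets = [{g for g, (lfc, padj) in res.items() if lfc > 0 and padj < fdr}
               for res in results.values()]
    return set.intersection(*up_sets) if up_sets else set()


def hypergeom_enrichment(gene_set: Iterable[str], de_genes: Iterable[str],
                         universe: Iterable[str]) -> Tuple[int, float, float]:
    """One-sided hypergeometric test for over-representation of DE genes in a gene set.

    Returns (overlap size k, fold enrichment, P(overlap >= k)).
    """
    universe = set(universe)
    gs = set(gene_set) & universe  # restrict both sets to measured genes
    de = set(de_genes) & universe
    M, n, N = len(universe), len(de), len(gs)
    k = len(gs & de)
    # Upper tail: probability of drawing k or more DE genes in N draws
    # from M genes, n of which are DE.
    upper = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    p = upper / comb(M, N)
    fold = (k / N) / (n / M) if N and n else float("nan")
    return k, fold, p


if __name__ == "__main__":
    # Hypothetical knockdown results for three genes (not data from this study).
    results = {
        "siARID1A": {"A": (1.2, 0.01), "B": (0.8, 0.20), "C": (-1.0, 0.001)},
        "siH3F3B": {"A": (0.5, 0.04), "B": (1.0, 0.01), "C": (2.0, 0.01)},
    }
    print(co_repressed(results))
    universe = [f"g{i}" for i in range(10)]
    print(hypergeom_enrichment(["g0", "g1", "g2", "g3"], ["g0", "g1", "g2"], universe))
```

Restricting both sets to the measured universe before counting matters: genes absent from the microarray can never be called DE, and leaving them in would bias the enrichment toward non-significance.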
Our data indicate that ARID1A cooperatively maintains transcriptionally repressive H3.3 chromatin through protein-protein interactions with CHD4 and ZMYND8, likely recruited to H4(K16)ac+ chromatin, such that loss of any of these factors leads to alleviation of H3.3-mediated repression and consequent aberrant gene activation associated with pathophysiological outcomes, such as invasion and ectopic tissue spread.
Figure 7.8 Mechanistic gene expression alterations in human endometriomas
a, Left, enrichment for ARID1A-H3.3 co-repressive chromatin mechanistic gene sets among human endometrioma (ovarian endometriosis) vs. control endometrium DE genes reported by Hawkins et al. (2011), compared to all unique measured genes. Right, proportion of overlapping DE genes that are upregulated vs. downregulated in endometriomas, compared to all unique measured genes. Statistic is hypergeometric enrichment. b, Box plots displaying endometrioma expression log2FC values for probes annotated to genes within mechanistic gene sets, compared to all measured probes. Statistic is two-tailed, unpaired Wilcoxon's test. c, Relative expression box-dot plots of 6 genes upregulated in endometriomas vs. control endometrium that are co-repressed by ARID1A, H3.3, CHD4, and ZMYND8. Statistic is limma FDR-adjusted p. * p < 0.05, ** p < 0.01, *** p < 0.001.
7.4 Discussion
We have provided evidence that SWI/SNF and NuRD cooperatively maintain histone variant H3.3 in active chromatin, likely through ARID1A-CHD4 biochemical interactions. We first identified H3.3 as an ARID1A-regulated chromatin feature through CCLE data, which showed a slight global reduction of H3.3 in ARID1A mutant lines. We then demonstrated that ARID1A and H3.3 are co-localized at active chromatin genome-wide, including super-enhancers suppressed by ARID1A, and that H3.3 depletion leads to gene expression consequences similar to those of ARID1A loss.
While SWI/SNF has not been shown to directly interact with H3.3 nucleosomes, we observed a physical interaction between ARID1A and the H3.3-interacting remodeler CHD4 (NuRD) associated with genomic H3.3 regulation by ARID1A. Target regulatory activity is further specified by CHD4 interactions with the histone reader ZMYND8, which is recruited to H4(K16)ac chromatin through its bromodomain. This ZMYND8-mediated targeting of ARID1A-CHD4 to H3.3 is associated with repressive chromatin regulation, such that disruption of this chromatin mechanism causes relief of repression and subsequent transcriptional activation. At the root of H3.3 regulation, we hypothesize a cooperative mechanism in which ARID1A may remodel or remove canonical H3 in favor of H3.3 incorporation, and CHD4 may stabilize H3.3 nucleosomes. Other histone chaperones and co-factors are likely also involved in this process. SWI/SNF-NuRD interactions present a plausible mechanism underlying P300 antagonism. NuRD canonically contains HDAC1/2 (Kraushaar et al. 2018; Tong et al. 1998; Xue et al. 1998), which likely promote lower acetylation states at active regulatory elements. When SWI/SNF is depleted, such as through ARID1A mutation, NuRD may be unable to properly regulate target gene chromatin in certain contexts like super-enhancers, leading to higher acetylation states from local HAT activity. Ongoing experimentation seeks to further characterize the interdependence of SWI/SNF and NuRD in H3.3 maintenance and to more completely understand the biochemical and genomic interactions between ARID1A and CHD4.
7.5 Methods
7.5.1 Cell culture, siRNA transfections, and lentiviral shRNA particle usage
Adherent human 12Z endometriotic epithelial cells (Zeitvogel, Baumann, and Starzinski-Powitz 2001) were cultured in the presence of 10% FBS, 1% L-glutamine, and 1% P/S. Cells were seeded in antibiotic-free media the day before siRNA transfection.
siRNAs (50 nM; Dharmacon, ON-TARGETplus) were transfected into cells using Lipofectamine RNAiMAX (ThermoFisher Scientific) in OptiMEM (Gibco) according to the manufacturer's protocol. Growth medium was replaced with antibiotic-free medium 24 hours after transfection. At 48 hours after transfection, low-serum (0.5% FBS) growth medium containing antibiotics was added. Cells were harvested 72 hours after siRNA transfection. Lentiviral shRNA particles were prepared with Lenti-X 293T cells (Takara) and MISSION pLKO.1 plasmids (Sigma-Aldrich) as previously described (Wilson, Reske et al. 2020). Lentiviral shRNA particles were titered using the qPCR Lentiviral Titration Kit (ABM). shRNA particles were transduced into 12Z cells at a multiplicity of infection (MOI) of 100, and media was replaced 24 hours later. Cells were harvested 72 hours after shRNA transduction.
7.5.2 Cell cycle analysis
The Click-iT Plus EdU Flow Cytometry Assay Kit (Invitrogen) was used for cell cycle assays. Cells were treated with 10 mM of EdU for 2 hours in culture media. Cells were harvested by trypsinization and washed in 1% BSA in PBS. Cells were resuspended in 100 µL of ice-cold PBS, and 900 µL of ice-cold 70% ethanol was added dropwise while vortexing. Cells were incubated on ice for two hours. Cells were washed with 1% BSA in PBS and then treated with the Click-iT Plus reaction cocktail including Alexa Fluor 488 picolyl azide for 30 minutes according to the manufacturer's instructions. Cells were washed with 1X Click-iT permeabilization and wash reagent, and then treated with 5 mM of Vybrant DyeCycle Ruby Stain (ThermoFisher) diluted in 1% BSA in PBS for 30 minutes at 37 °C. Flow cytometry was performed using a BD Accuri C6 flow cytometer (BD Biosciences) and analyzed using FlowJo v10 software (BD Biosciences).
7.5.3 Histone extraction
Cells were washed with PBS and scraped in PBS containing 5 mM sodium butyrate.
Cells were centrifuged and resuspended in TEB buffer (PBS supplemented with 0.5% Triton X-100, 5 mM sodium butyrate, 2 mM phenylmethylsulfonyl fluoride, 1X protease inhibitor cocktail) and incubated on a 3D spindle nutator at 4 °C for 10 minutes. Cells were centrifuged at 3000 RPM for 10 minutes at 4 °C. The TEB wash step was repeated once. Following the second wash, the pellet was resuspended in 0.2 N HCl and incubated on a 3D spindle nutator at 4 °C overnight. The following day, samples were neutralized with a 1:10 volume of 1 M Tris-HCl pH 8.3. Samples were centrifuged at 3000 RPM for 10 minutes at 4 °C, and the supernatant containing histone proteins was collected.
7.5.4 Co-immunoprecipitation
Nuclear extracts were prepared as previously described (Chandler et al. 2013), dialyzed overnight into glycerol-free buffer (25 mM HEPES, 0.1 mM EDTA, 12.5 mM MgCl2, 100 mM KCl, 1 mM DTT) using a Slide-A-Lyzer G2 Dialysis Cassette (10 kDa cutoff, ThermoFisher Scientific), and quantified with the BCA Protein Assay Kit (Pierce, ThermoFisher Scientific). Primary antibodies (anti-ARID1A, D2A8U, Cell Signaling; anti-CHD4, D8B12, Cell Signaling) were conjugated to Protein A Dynabeads (Invitrogen) overnight at 4 °C in 1X PBS + 0.5% BSA. Normal rabbit IgG (Cell Signaling) IPs were performed in parallel at equivalent masses as negative controls. 500 µg of nuclear lysate was diluted into IP buffer (20 mM HEPES, 150 mM KCl, 10% glycerol, 0.2 mM EDTA, 0.1% Tween-20, 0.5 mM DTT) to a final volume of 1 mL and clarified by centrifugation. After overnight IP at 4 °C, bead slurries were washed with a series of IP buffers at different KCl concentrations: 2X washes at 150 mM, 3X washes at 300 mM, 2X washes at 100 mM, and 1X wash at 60 mM. Immunoprecipitates were eluted in 2X Laemmli buffer + 100 mM DTT at 70 °C for 10 minutes with agitation.
7.5.5 Density sedimentation via glycerol gradient ultracentrifugation
Nuclear extracts were prepared, dialyzed, and quantified as described in the co-IP methods section.
Density sedimentation was performed and probed similarly to published reports (Mashtalir et al. 2018). Briefly, 4.5 mL 10-30% linear glycerol gradients were prepared using an ÄKTA start (Cytiva) from density sedimentation buffer (25 mM HEPES, 0.1 mM EDTA, 12.5 mM MgCl2, 100 mM KCl, 1 mM DTT) additionally containing 30% and 10% glycerol for the initial and target concentrations, respectively. 200 µg of nuclear lysate was overlaid on the glycerol gradient, followed by ultracentrifugation at 40,000 rpm in an AH-650 swinging bucket rotor (ThermoFisher Scientific) for 16 hours at 4 °C. Gradient fractions of 225 µL were collected and concentrated using StrataClean resin (Agilent). Concentrated fractions were eluted in 1.5X Laemmli buffer + 37.5 mM DTT and run on SDS-PAGE for immunoblotting.
7.5.6 Immunoblotting
Whole-cell protein lysates and histone extracts were prepared as previously described (Wilson et al. 2020). Proteins were quantified with the BCA Protein Assay Kit (Pierce, ThermoFisher Scientific). Protein samples in Laemmli buffer + DTT were denatured at 94 °C for 3 minutes prior to running on SDS-PAGE gels (6% gels for co-IP and glycerol gradients, 15% gels for histone extracts, and 4-15% gradient gels for whole-cell protein lysates). Gels containing histone extracts were wet transferred to nitrocellulose membranes at 4 °C for 3 hours at 400 mA, then dried at ambient temperature, rehydrated in Tris-buffered saline (TBS) + 0.1% Tween-20 (TBS-T), and blocked with Odyssey blocking buffer (LI-COR). All other gels were semi-dry transferred to PVDF using a Trans-Blot Turbo (Bio-Rad) according to the manufacturer's protocol for high-molecular-weight proteins and blocked with either 5% BSA or 5% milk in TBS.
The following primary antibodies were used: anti-ARID1A (D2A8U, Cell Signaling), anti-CHD4 (D4B7, Cell Signaling), anti-ZMYND8 (A302-089, Bethyl), anti-ZMYND8 (Atlas), anti-BRG1 (ab110641, abcam), anti-BAF155 (D7F8S, Cell Signaling), anti-HDAC1 (10E2, Cell Signaling), anti-histone H3.3 (ab176840, abcam), and anti-histone H3 (D1H2, Cell Signaling). IRDye fluorescent secondary antibodies (LI-COR) were used for fluorescence-based visualization of histones. Horseradish peroxidase (HRP)-conjugated secondary antibodies (Cell Signaling) were used for chemiluminescence-based visualization of all other targets. Clarity Western ECL substrate (Bio-Rad) was used to activate HRP for chemiluminescence, captured by a ChemiDoc XRS+ imaging system (Bio-Rad).
7.5.7 mRNA-seq and analysis
At 72 hours after initial siRNA transfection, and 24 hours after low-serum conditioning, RNA was purified from 12Z cells using the Quick-RNA Miniprep Kit (Zymo Research). Transcriptome libraries (n = 3 replicates) were prepared by the Van Andel Research Institute Genomics Core from 500 ng of total RNA using the KAPA mRNA HyperPrep kit (v4.17) (Kapa Biosystems). RNA was sheared to 300-400 bp. Prior to PCR amplification, cDNA fragments were ligated to IDT for Illumina unique dual adapters (IDT DNA Inc). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp, paired-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using a 100-cycle sequencing kit (Illumina). Each library was sequenced to an average raw depth of 20-25 million reads. Base calling was done by Illumina RTA3, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.
For analysis, raw reads were trimmed and aligned to the hg38 assembly, indexed with GENCODE (v28) annotation, with gene feature counting performed by STAR (Dobin et al. 2013). Low-count genes with fewer than 1 count per sample on average were filtered prior to count normalization and differential gene expression (DGE) analysis by DESeq2 with empirical Bayes shrinkage for fold-change estimation (Love et al. 2015; Love, Huber, and Anders 2014). Wald test p-values were corrected for multiple testing by independent hypothesis weighting (IHW) (Ignatiadis et al. 2016) for downstream analyses. Pairwise comparisons between different DGE analyses and gene sets were initially filtered for genes with transcripts commonly detected in both cell populations.
7.5.8 ChIP-seq and analysis
Wild-type and lentiviral shRNA particle-transduced 12Z cells (72 hours after transduction) were treated with 1% formaldehyde in growth media for 10 minutes at ambient temperature. Formaldehyde was quenched by the addition of 0.125 M glycine and incubation for 5 minutes at ambient temperature, followed by a PBS wash and scraping. 1 × 10⁷ crosslinked cells were used for each chromatin immunoprecipitation (ChIP) sample. Chromatin from crosslinked cells was fragmented by digestion with micrococcal nuclease using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) according to the manufacturer's protocol, followed by 30 seconds of sonication. ChIP was then performed according to the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) with the addition of 5 mM sodium butyrate to preserve histone acetylation. For each 1.25 mL IP, the following antibodies were used: 1:125 anti-histone H3.3 (2D7-H1, abnova); 1:50 anti-histone H2A.Z-acetyl (K4/K7) (D3V1I, Cell Signaling); 1:250 anti-histone H2A.Z (ab4174, abcam); 1:50 anti-acetyl-histone H4 (06-866, Millipore); 1:125 anti-histone H4K16ac (Active Motif); 1:50 anti-CHD4 (D4B7, Cell Signaling); 1:250 anti-ZMYND8 (A302-089, Bethyl).
Crosslinks were reversed with 0.4 mg/mL Proteinase K (ThermoFisher) and 0.2 M NaCl at 65 °C for 2 hours. DNA was purified using the ChIP DNA Clean & Concentrator Kit (Zymo). Libraries for input and IP samples were prepared by the Van Andel Research Institute Genomics Core. 10 ng of material was used for input samples, and the entire precipitated sample was used for IPs. Libraries were generated using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to IDT for Illumina UDI Adapters (IDT DNA Inc.). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor® dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp, paired-end sequencing (for ZMYND8, H3.3, H2A.Zac, and H4K16ac) or 100 bp, single-end sequencing (for CHD4, H2A.Z, and pan-H4ac) was performed on an Illumina NovaSeq 6000 sequencer using a 100-cycle sequencing kit (Illumina). Each library was sequenced to a minimum depth of 80 million reads per input library and 40 million reads per IP library. Base calling was performed by Illumina NCS v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0. (Differential) ChIP-seq experiments were analyzed as described in Chapter 6 (Wilson et al. 2020). ChIP-seq data re-analyzed from Chapters 2 and 6 were processed as described in the respective chapters. Briefly, wild-type CHD4 and differential H2A.Z and pan-H4ac ChIP-seq experiments were analyzed as single-end libraries, while wild-type ZMYND8 and differential H3.3, H2A.Zac, and H4K16ac ChIP-seq experiments were analyzed as paired-end libraries. Raw reads for IPs and inputs were trimmed with cutadapt (Martin 2011) and Trim Galore!, followed by quality control analysis via FastQC (Andrews 2010) and MultiQC (Ewels et al. 2016).
Trimmed reads were aligned to the GRCh38.p12 reference genome (Schneider et al. 2017) via Bowtie2 (Langmead and Salzberg 2012) with flag `--very-sensitive`. Aligned reads were sorted and indexed with samtools (Li et al. 2009). For paired-end libraries, only properly paired read fragments were retained via samtools view with flag `-f 3`, followed by sorting and indexing. For libraries intended for differential analyses, molecular complexity was then estimated from duplicate rates by ATACseqQC (Ou et al. 2018) and preseqR (Daley and Smith 2013), and libraries within an experimental design were subsampled with samtools to equivalent molecular complexity based on these estimates. Picard MarkDuplicates (http://broadinstitute.github.io/picard/) was used to remove PCR duplicates, followed by sorting and indexing. MACS2 (Zhang et al. 2008) was used to call peaks on each ChIP replicate against the respective input control. For CHD4 and ZMYND8 IPs, MACS2 called broadPeaks with an FDR < 0.05 threshold and otherwise default settings. For H2A.Z and H2A.Zac IPs, MACS2 called narrowPeaks with an FDR < 0.05 threshold and flags `--nomodel --extsize 146` to bypass model building. For H3.3, pan-H4ac, and H4K16ac IPs, MACS2 called broadPeaks with an FDR < 0.05 threshold and flags `--nomodel --extsize 146` to bypass model building. The resulting peaks were filtered against the ENCODE blacklist (Amemiya, Kundaje, and Boyle 2019), and peaks on non-standard contigs were removed. A naive overlapping peak set, as defined by ENCODE (Landt et al. 2012), was constructed by calling peaks on pooled replicates, followed by bedtools intersect (Quinlan and Hall 2010) to select peaks with at least 50% overlap with each biological replicate. ChIP-seq differential histone abundance analysis was performed with csaw (Lun and Smyth 2016). First, a consensus peak set was constructed for each differential experiment from the union of replicate-intersecting, filtered MACS2 peak regions called in each condition.
ChIP reads were counted in these consensus regions with csaw. Low-abundance regions were filtered out, retaining those with average log2CPM > -3. When comparing ChIP libraries, any global difference in IP efficiency between the two conditions was treated as technical bias, to ensure a highly conservative analysis. Accordingly, we applied a loess-based local normalization to the peak count matrix, as implemented in csaw (Lun and Smyth 2016), which assumes a symmetrical MA distribution. A design matrix was then constructed from a single ‘condition’ variable without an intercept term. The count matrix and loess offsets were supplied to edgeR (Robinson, McCarthy, and Smyth 2010) for dispersion estimation and for fitting quasi-likelihood generalized linear models for hypothesis testing. Nearby query regions up to 500 bp apart were then merged, with a maximum merged region width of 5 kb. The most significant p-value was used to represent each merged region. Finally, an FDR < 0.05 threshold was used to define significant differentially abundant regions.

7.5.9 Chromatin state modeling and optimization

Chapter 6 constructed a genome-wide 18-state chromatin map of 12Z cells with or without ARID1A depletion. It was built with ChromHMM (Ernst and Kellis 2012, 2017) using total RNA, ATAC, H3K4me1, H3K4me3, H3K18ac, H3K27ac, and H3K27me3 data (Wilson et al. 2020), and the same map was re-analyzed in section 7.3.2 of this chapter. A refined ChromHMM model was constructed with the further addition of H3.3, H2A.Z, H2A.Zac (K4/K7), pan-H4ac (K5/K8/K12/K16), and H4K16ac features, along with some procedural modifications. To reduce technical confounders in differential chromatin state analysis between control and ARID1A-depleted cell types, we adopted the equalized binarization framework described by Fiziev et al. (Fiziev et al. 2017). Briefly, the ChromHMM chromosomal signal intermediate files generated during BAM binarization were saved and imported into R.
Feature signal values were then background-subtracted using the respective control signals when available (e.g., input chromatin for ChIP; no background subtraction was performed for ATAC). For each feature and cell type, the background-subtracted signal values were ranked, and the top n ranked binarization calls were selected. Here n is the smaller number of calls between the two cell types for that feature. The result is a new equalized binarization in which each feature has the same number of ‘present’ region calls in both cell types, per chromosome. For example, suppose H3K18ac called 27,000 present regions on chromosome 1 in control cells and 35,000 present regions in ARID1A-depleted cells. The top 27,000 regions would then be retained in both cell types. Chromatin state models from 5 to 40 states were then computed using the “concatenated” approach, which unifies both cell types for differential state comparisons. The new chromatin state model was optimized at 25 states using a strategy devised by Gorkin et al. (Gorkin et al. 2020), which has two parts:

- the ChromHMM CompareModels function, which compares feature emission parameters from the 40-state (most complex) model against all simpler models;
- k-means clustering of emission probabilities pooled from all models, used to assess goodness of fit.

See Fig. F.6 for related analyses. Across both strategies, 25 states was identified as the threshold for >95% median maximal state correlation and goodness of fit (between-cluster vs. total sum of squares) relative to the most complex model. Therefore, the 25-state model was selected for downstream interpretation.

7.5.10 Histone peptide arrays

Anti-acetyl-H2A.Z (K4/K7) antibody specificity was analyzed with histone peptide microarrays as previously described (Cornett, Dickson, and Rothbart 2017), with minor modifications. Arrays were designed in ArrayNinja (Dickson et al. 2016) and printed using a 2470 Arrayer (Quanterix). All hybridization and wash steps were performed at ambient temperature.
Slides were blocked with hybridization buffer (1X PBS [pH 7.6], 0.1% Tween, 5% BSA) for 30 minutes and then incubated for 1 hour with primary antibody diluted 1:1000 in hybridization buffer. Slides were washed three times for 5 minutes with PBS, then probed for 30 minutes with Alexa647-conjugated secondary antibody diluted 1:5000 in hybridization buffer. Slides were washed three times for 5 minutes with PBS, dipped in 0.1X PBS to remove salt, and spun dry. Slides were scanned on an InnoScan 1100 microarray scanner (Innopsys), and images were analyzed and quantified using ArrayNinja. Plots were generated in Prism (GraphPad). Each peptide antigen is printed six times per array, and each antibody was screened on two separate arrays.

7.5.11 Bioinformatics and statistics

Genomic data from Chapters 2 and 6 were re-analyzed as described in those chapters. biomaRt was used for all gene nomenclature and mouse-human ortholog conversions. HOMER was used to quantify sequencing reads across sets of genomic regions, including for heatmaps (Heinz et al. 2010). GenomicRanges functions were used to intersect and manipulate genomic coordinates (Lawrence et al. 2013). Hierarchical clustering by Euclidean distance and heatmaps were computed and produced with ComplexHeatmap (Gu, Eils, and Schlesner 2016). ggplot2 was used for some plots in this study (Wickham 2016). The statistical language R was used for various computing and statistical functions throughout this study, such as cumulative hypergeometric distribution calculations for enrichment testing (R Core Team 2018).

7.6 Data availability

Sequencing data generated in this study will be made publicly available upon publication. Re-analyzed data from previous chapters were retrieved and processed as described in those chapters.

7.7 Acknowledgments

We thank the Van Andel Research Institute Genomics Core for providing sequencing facilities and services. We thank Dr.
Jeremy Prokop for the use of his ÄKTA start FPLC to prepare linear glycerol gradients.

CHAPTER 8 CONCLUSION

Portions of the text from this chapter were previously published (Wilson et al. 2019; Wilson et al. 2020; Reske et al. 2020; Reske, Wilson, and Chandler 2020).

8.1 Summary

This dissertation has centrally investigated the roles of ARID1A, an essential subunit of the SWI/SNF chromatin remodeling complex, in the endometrium. It examined the consequences of ARID1A disruption at various levels of biological organization. From molecular biology to physiology, these studies have established principles of ARID1A function and ARID1A mutant pathogenesis, using the endometrial epithelium as a disease-relevant model. ARID1A is mutated in nearly half of endometrial cancers, and loss of ARID1A expression is observed in endometriosis. The experiments presented here used two kinds of models to interrogate ARID1A functions and disease processes: genetically engineered mice in vivo, and human cell-based models in vitro, namely the 12Z human endometriotic epithelial cell line. Early chapters uncovered pathophysiological and cellular hallmarks of ARID1A mutant disease. Later chapters shifted increasingly toward deciphering how ARID1A governs physiologically relevant gene expression through chromatin-based mechanisms.

In Chapter 2, we used Arid1afl and Arid1aV1068G mice to develop an allelic series of loss-of-function ARID1A mutations in the endometrium, each with increasing severity. We revealed that ARID1A functions as a haploinsufficient tumor suppressor in the endometrial epithelium when combined with oncogenic PIK3CAH1047R. We then used genome-wide approaches to profile gene expression and chromatin accessibility of sorted endometrial epithelial cells in vivo. Following ARID1A loss, we identified chromatin accessibility changes at gene promoters that correlate with changes in expression.
Supporting clinical relevance, gene expression alterations in ARID1A/PIK3CA mutant mouse endometrial epithelia reflected those of invasive human endometrial tumors profiled by The Cancer Genome Atlas (TCGA). Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we showed that ARID1A binding correlates with chromatin accessibility and is associated with gene expression changes upon loss of ARID1A. We then used a human endometriotic epithelial cell model, 12Z, to elucidate the consequences of ARID1A loss and PIK3CAH1047R in vitro. We observed an antagonistic relationship between the two mutations. Together, ARID1A and PIK3CA mutations produce a partial epithelial-to-mesenchymal transition (EMT) phenotype capable of collective invasion into the uterine myometrium, and ARID1A loss is responsible for the acquisition of mesenchymal and invasive features. In this context, we characterized the role of ARID1A in endometrial epithelial cell identity.

In Chapter 3, we used the in vivo sorted mouse ATAC-seq data generated in Chapter 2 to compare tools and normalization methods for calculating significant differentially accessible (DA) regions. These tools and methods yielded distinct results, and we explored strategies to empirically improve data interpretation. We demonstrated that qualitative techniques such as MA plots can be used to identify and address global accessibility patterns, which may or may not be technical in nature. Across the DA outputs from different analytical approaches, we observed vastly different numbers of significant genome-wide DA regions and promoter DA regions, and different global accessibility trends. By cross-comparing DA method outputs, we defined genes commonly identified as having a DA promoter across multiple approaches. We then analyzed a historic yeast ATAC-seq data set by Schep et al. (Schep et al. 2015) and showed that the choice of DA method can alter biological interpretation in this context.
We further provided a computational methodology that serves as a “one size fits all” guideline for ATAC-seq data analysis, from reads to differential accessibility analysis, including testing of multiple normalization methods. Altogether, we presented evidence that different normalization methods can lead to differing biological interpretations. Researchers should be aware of this, and of biases that may not be considered or eliminated during analysis.

In Chapter 4, we explored a historic clinical observation that ARID1A mutations rarely co-occur with TP53 mutations in cancer, despite both genes being frequently mutated across human cancers. We examined over 10,000 human tumors profiled by TCGA. Among these, endometrial cancer displays the highest rate of mutual exclusivity between TP53 and ARID1A mutations, irrespective of histological subtype. We developed a genetically engineered mouse model with co-existent TP53 loss of function and oncogenic PIK3CAH1047R activation specifically in the endometrial epithelium. To discern overlapping and distinct molecular features associated with TP53 or ARID1A mutation, we isolated endometrial epithelial cells from this model and profiled them by RNA-seq. We then compared them to control cells and to the ARID1A/PIK3CA mutant cell data generated in Chapter 2. These mouse model data were compared with human tumor data to determine cross-species gene expression signatures associated with TP53 and ARID1A mutation status. We revealed that ARID1A mutant tumors display p53 pathway activation in endometrial cancer and across cancers, and that ARID1A directly regulates TP53 target genes in vivo. Finally, we developed mice simultaneously harboring endometrial mutations in TP53, ARID1A, and PIK3CA. These mice developed invasive adenocarcinoma despite presumed mutual exclusivity.
We further showed that ARID1A directly represses promoter chromatin at the TP53 target gene Atf3. ATF3 induction in ARID1A mutant cells is associated with invasive squamous differentiation independent of TP53 mutation status. Collectively, these studies suggest that co-existing TP53 and ARID1A mutations promote invasive endometrial cancer.

In Chapter 5, we examined the consequences of deleterious mutations in BRG1 (SMARCA4), the SWI/SNF catalytic subunit, in the endometrium and compared them to those of ARID1A mutations. In human and mouse model systems, we determined both shared and unique roles of BRG1 and ARID1A in the endometrial epithelium. In 12Z cells, BRG1 loss has a greater effect on gene expression than ARID1A loss. ChIP-seq studies revealed that BRG1 and ARID1A co-regulate target chromatin at both gene promoters and distal regulatory elements. However, we also identified subunit-specific differences in transcriptional regulation: BRG1 is mainly associated with gene activation, whereas ARID1A is mainly associated with gene repression. We then developed a genetically engineered mouse model in which BRG1 is specifically deleted in the endometrial epithelium. In these mice, endometrial glands were observed in the myometrium without obvious hyperplasia or oncogenic transformation. We isolated BRG1-null endometrial epithelia and profiled them by RNA-seq, and comparative analyses support that both BRG1 and ARID1A mutations disrupt epithelial integrity. These experiments suggest that endometrial SWI/SNF mutations contribute to pathogenesis through some shared mechanisms, despite global differences in transcriptional regulation.

In Chapter 6, we identified a chromatin mechanism by which ARID1A represses invasive phenotypes in the endometrial epithelium. ARID1A does so by antagonizing P300 histone acetyltransferase (HAT) hyperactivity toward H3K27 at super-enhancers.
We first constructed a genome-wide chromatin state map accompanying ARID1A loss in 12Z cells by measuring genomic profiles of seven chromatin features associated with transcriptional regulation. ARID1A regulation is most associated with highly active cis-regulatory elements marked by H3K27ac and accessibility, such as enhancers and super-enhancers. Notably, ARID1A is normally responsible for maintaining typical enhancer activity while suppressing super-enhancer hyperactivation. Functionally, ARID1A loss leads to P300-dependent hyperacetylation of the SERPINE1 super-enhancer. The resulting upregulation of SERPINE1/PAI-1 expression is essential for ARID1A mutant invasion. In genetically engineered mice, genetic P300 loss reduced disease burden and promoted cell death in the ARID1A/PIK3CA mutant model. Further, therapeutic targeting of P300 by HAT inhibition rescues SERPINE1 hyperacetylation in ARID1A-deficient cells, preventing invasion phenotypes. These studies provide pre-clinical rationale for epigenetic therapy in the treatment of invasive ARID1A mutant endometrial diseases.

Finally, in Chapter 7, we found that ARID1A mutant cancer cell lines display slightly lower global abundance of the histone variant H3.3. This suggests that regulation of nucleosome composition may be a molecular function of ARID1A-SWI/SNF. We then observed that ARID1A co-localizes genome-wide with H3.3 at active chromatin, and that H3.3 depletion phenocopies the transcriptional effects of ARID1A loss. Mechanistically, ARID1A directly promotes the maintenance of H3.3. This function is associated with biochemical interactions with the SWI/SNF-like CHD4 (NuRD) remodeler complex. The regulation is further directed toward repression of transcriptional hyperactivation through CHD4 interactions with the histone multi-reader ZMYND8. ZMYND8 targets H4(K16)-acetylated regions, including a subset of super-enhancers.
Finally, this mechanism of transcriptional co-repression by ARID1A, H3.3, CHD4, and ZMYND8 targets pathophysiologically relevant genes involved in EMT and cellular invasion. These genes are aberrantly upregulated in pre-cancerous human endometriomas.

Altogether, these works have identified key functions of ARID1A and hallmarks of ARID1A mutant disease pathogenesis in the endometrium:

- ARID1A is a bona fide regulator of the endometrial epithelium.
- ARID1A is an endometrial tumor suppressor that cooperates with other tumor suppressors and oncogenes.
- ARID1A maintains endometrial epithelial cell identity and prevents cellular migration and invasion in the endometrial epithelium.
- ARID1A regulates transcription through chromatin-based activities that affect the epigenome.

In the following sections, major concepts derived from these studies are elaborated.

8.2 ARID1A as a regulator of the endometrial epithelium

High ARID1A mutation rates in diseases originating from the endometrial epithelium, such as endometriosis, endometrial cancer, and endometrioid ovarian cancer, support a cell-autonomous role for ARID1A. In this dissertation, maintenance of cell identity is established as a hallmark of ARID1A and SWI/SNF function in the endometrial epithelium. ARID1A mutations may increase the risk of endometriosis and malignant transformation by giving a selective advantage to endometrial cells displaced during retrograde menstruation (Suda et al. 2018). Alterations in endometrial cell identity, such as transdifferentiation of the endometrial epithelium, promote the acquisition of invasive cell properties, a feature often observed in mesenchymal cells (Bartley et al. 2014; Bilyk et al. 2017; Yang and Yang 2017). Cellular invasion requires cells to migrate, degrade the extracellular matrix, and survive under anchorage-independent conditions (Kalluri and Weinberg 2009; Mareel and Leroy 2003).
These properties allow abnormal endometrial cells to spread locally or colonize distal sites. The phenotypes observed following ARID1A or SWI/SNF disruption in the endometrial epithelium likely reflect the functions of this complex in normal endometrial physiology. ARID1A deficiency leads to loss of epithelial features such as E-cadherin and tight junction protein-1 (ZO-1) and gain of mesenchymal markers such as SNAI1/SNAI2 and Twist. ARID1A may therefore normally regulate endometrial plasticity by limiting the differentiation capacity of epithelial cells. In this sense, ARID1A may be a critical mediator of processes such as menstrual cycling and implantation, in which endometrial remodeling must be tightly controlled to prevent aberrant pathology. Throughout the menstrual cycle, steroid hormone cycling stimulates constant tissue remodeling to permit blastocyst implantation (Reed and Carr 2000). Proper uterine receptivity during the window of implantation requires extracellular matrix remodeling and loss of cell-cell interactions to facilitate embryo invasion (Kim and Kim 2017). If implantation does not occur, progesterone withdrawal promotes endometrial tissue desquamation and sloughing into the lumen during menstruation, followed by regeneration in the next cycle (Reed and Carr 2000). This process also requires extracellular matrix remodeling and loss of adhesion (Gaide Chevronnay et al. 2009; Tabibzadeh 1996). Genetic SWI/SNF loss causes a loss of cellular junctions and adhesion, which might suggest that SWI/SNF is instrumental in governing these processes in healthy endometrial physiology. SWI/SNF activity may be altered during certain periods of the menstrual cycle to promote such behaviors appropriately under normal physiological conditions.
As will be discussed in future directions (section 8.8), current efforts aim to understand how ARID1A and SWI/SNF govern such physiological processes by properly orchestrating EMT and invasive chromatin programs in a non-pathological state.

8.3 ARID1A as an endometrial tumor suppressor

In the mouse, Chapter 2 studies revealed that ARID1A is haploinsufficient in the context of oncogenic PIK3CAH1047R. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (ARID1A heterozygosity with PIK3CAH1047R) is sufficient to drive tumorigenesis. It is nearly identical to LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (ARID1A nullizygosity with PIK3CAH1047R) with respect to tumor burden, survival, gene expression, and chromatin accessibility changes. This is consistent with the spectrum of single-hit ARID1A mutations observed in endometrial cancer, in which a majority of patients have only a single ARID1A nonsense mutation. Previous studies suggested that ARID1A functions as a haploinsufficient tumor suppressor (Wu and Roberts 2013) in ovarian (Jones et al. 2010; Wiegand et al. 2010), breast (Mamo et al. 2012), gastric (Zang et al. 2012), and liver cancer (Guichard et al. 2012). Because ARID1A is haploinsufficient, ARID1A expression or mutation status alone may not predict disease status. Heterozygous mutations or epigenetic silencing may be sufficient to drive ARID1A loss-dependent changes in gene expression or pathophysiological transformation. Additionally, heterozygous loss of ARID1A may promote metastasis at late stages of tumor progression, as observed in liver cancer (Sun et al. 2017). ARID1A levels may be regulated throughout the menstrual cycle and may mediate dissociation of the decidua from the uterus. In that case, ARID1A heterozygosity may suffice for oncogenesis at points of low ARID1A expression. This could account for the ARID1A genetic differences observed between the present mouse model and epithelial ovarian cancer models (Guan et al. 2014; Chandler et al. 2015).
This model would be consistent with the high ARID1A mutation rates in endometrial cancer. However, despite haploinsufficiency, we showed that in vivo ARID1A loss causes endometrial epithelial cells to undergo EMT. EMT in turn triggers apoptosis, probably through anchorage-dependent cell death, i.e., anoikis (Paoli, Giannoni, and Chiarugi 2013). The addition of the PIK3CA mutation promotes a partial EMT state, resulting in collective invasion, and likely allows ARID1A mutant cells to bypass anoikis. Recent evidence shows that clonal PIK3CA mutations, but not ARID1A mutations, are observed in non-malignant, eutopic endometrium (Suda et al. 2018). Based on these data, endometrial PIK3CA mutations likely occur before deleterious ARID1A mutations during endometriosis or endometrial cancer development, and ARID1A loss is unlikely to serve as an initial tumorigenic event. Deleterious mutations in PTEN, the PIP3 phosphatase and negative regulator of PI3K signaling, are also frequent in endometrial cancer. We hypothesize that ARID1A/PTEN mutant mice would develop endometrial phenotypes similar to those of ARID1A/PIK3CA mutants. As reviewed, ARID1A has also been observed as a metastatic gene in numerous tissue contexts. Chapter 4 studies support that ARID1A mutations may promote invasive or metastatic dissemination of TP53 mutant endometrial tumors. To date, one large-scale sequencing study has focused specifically on endometrial cancer metastasis, and it observed frequent, subclonal ARID1A mutations in metastatic lesions (Gibson et al. 2016). In that study, ARID1A mutations were more associated with late-occurring metastatic endometrial cancer clones. Other frequently mutated driver genes, such as TP53, were associated with early tumorigenesis. These results further support an additional, genetic context-dependent role for ARID1A mutations as metastatic drivers in endometrial cancer.
In Chapter 4, we further characterized the effects of endometrial ARID1A mutations in the mouse. We noted that a subset of ARID1A/PIK3CA mutant endometrial epithelial cells differentiate toward a TP63+ squamous phenotype associated with collective invasion. TP63 is a classical marker of squamous differentiation in cancer (Kaufmann et al. 2001; Blanco et al. 2013). Our studies suggest that ATF3 induction is associated with TP63 expression and squamous phenotypes in invading endometrial epithelial cells following ARID1A loss. Our in vivo ARID1A binding data indicated that ARID1A normally directly represses Atf3 chromatin in the endometrial epithelium. Previous studies have identified roles for ATF3 in squamous tumors (Wu et al. 2010; Xie et al. 2014; Shi et al. 2018; Xu et al. 2021), and ATF3 promotes epithelial squamous differentiation in vivo (Wang et al. 2007; Wang, Arantes, et al. 2008). Squamous differentiation is observed in 25% of human endometrial cancers and was recently associated with disease recurrence. It has also been observed in mouse models (Zaino and Kurman 1988; Shen, Yao, et al. 2016; Andrade et al. 2019). ARID1A and SWI/SNF mutations have been shown to promote carcinogenesis of various squamous tumor types (Achenbach et al. 2020; Luo et al. 2020). A recent mouse model of ARID1A deletion paired with oncogenic KRASG12D developed invasive vaginal squamous cell carcinoma (Wang et al. 2021). TP63+ squamous differentiation in invasive endometrial cancer was also recently reported in a subset of Fbxw7/Pten knockout mice (Cuevas et al. 2019). Another mouse model with uterine-specific β-catenin deletion showed increased TP63 expression and squamous differentiation in endometrial epithelia (Jeong et al. 2009). TP63 marks basal endometrial epithelial cells during fetal life (O'Connell et al. 2001), and TP63+ basal cells are reactivated in GATA2 knockout and SOX17 knockout mouse uteri (Rubel et al. 2016; Wang et al. 2018).
Transcriptional regulation by TP63 has been previously linked to cellular migration and invasion (Gu et al. 2008). In our study, a subset of direct TP63 target genes, including COL17A1, were activated following ARID1A mutation in both genetic mouse models and human tumors. ATF3 has reported links to TP53 post-translational regulation, stability, and target gene regulation. It therefore remains possible that ATF3 also regulates TP63. Additional work will be required to determine the functional relationship between ATF3 and TP63 in endometrial squamous differentiation.

In addition to EMT and invasion, ARID1A and SWI/SNF are implicated in regulating genomic stability, and ARID1A inactivation is associated with increased DNA damage. However, the roles of ARID1A in apoptotic signaling are not well characterized and may be context dependent (Luo et al. 2008; Shen et al. 2015; Hiramatsu et al. 2019; Zhao et al. 2019). Genetic experiments in Chapter 4 suggest that TP53 mutant endometrial epithelia display increased caspase-mediated apoptosis. Further ARID1A loss in TP53/ARID1A mutant mice suppresses this cell death. ATF3 has well-established ties to apoptosis and cell death regulation, although these are context dependent (Lu, Wolfgang, and Hai 2006; Tanaka et al. 2011). ATF3 induction in ARID1A-deficient endometrial epithelia may suppress pro-apoptotic genes in the absence of TP53. TP63 has also been previously shown to antagonize BCL-2-related, pro-apoptotic transcriptional programs (Rocco et al. 2006). ATF3 induction and the associated TP63 expression could be a mechanism by which ARID1A loss allows cells to bypass cell death. This may explain why ARID1A mutant tumors are dependent on p53 pathway function but not on TP53 itself. In addition, the Chapter 4 genome-wide in vivo ARID1A binding data from mouse endometrial epithelial cells indicated that ARID1A directly regulates chromatin at numerous apoptosis genes.
The TNF-mediated extrinsic apoptotic pathway is also regulated by the p53 pathway to some extent but is less well characterized (Aubrey et al. 2018).

8.4 Relationship between ARID1A and other cancer-associated genes

8.4.1 PIK3CA

ARID1A and PIK3CA mutations frequently co-occur in endometrial cancer, as evidenced by TCGA-UCEC data. As reviewed, PIK3CA is a member of the PI3K pathway and is highly mutated in human cancer. In Chapter 2, mice with endometrial epithelial ARID1A deletion underwent EMT. They did not develop hyperplasia or invade the myometrium unless paired with oncogenic PIK3CAH1047R. Without the support of a PIK3CA mutation, EMT driven by ARID1A loss likely triggers anoikis, as caspase-mediated cell death is observed in the LtfCre0/+; Arid1afl/fl endometrium. Interestingly, our functional analysis of the independent and combined effects of ARID1A loss and oncogenic PIK3CAH1047R indicates an antagonistic relationship. ARID1A loss is the major driver of EMT and invasion, while PIK3CA mutations temper EMT and allow cells to bypass apoptosis. The retention of some epithelial characteristics upon PIK3CAH1047R expression may facilitate the establishment of epithelial tumors (Polyak and Weinberg 2009). Epithelial transdifferentiation is a proposed mechanism by which normal epithelia convert into abnormal epithelia without passing through a mesenchymal cell intermediate (Zeisberg and Neilson 2009). PIK3CA mutation is an early event in atypical hyperplasia (Berg et al. 2015), whereas loss of ARID1A immunoreactivity correlates with malignant transformation in endometrial cancer (Yen et al. 2018). A recent study found PIK3CA to be commonly mutated in endometrial glands, often without transformation (Suda et al. 2018). This suggests that PIK3CA mutation is an early event in the progression of endometriosis and ARID1A mutation a later one. ARID1A mutations have previously been implicated in invasion during metastasis (Sun et al. 2017; Li et al.
2017; Yan et al. 2014; Lakshminarasimhan et al. 2017). In the ARID1A/PIK3CA mutant endometrial epithelium, PI3K activation may partially suppress the full acquisition of mesenchymal phenotypes upon ARID1A loss. The result would be an abnormal epithelial state with invasive properties. PI3K activation may also allow cells to bypass the endometrial epithelial cell apoptosis observed in LtfCre0/+; Arid1afl/fl mice. This may be another reason why ARID1A mutations are commonly observed alongside activating PI3K mutations in neoplasms originating from the endometrial epithelium. As discussed above, we speculate that mutations in other PI3K pathway genes, such as PTEN, may act similarly to oncogenic PIK3CAH1047R, promoting partial EMT and permitting cells to bypass anoikis.

8.4.2 TP53

In Chapter 4, we explored the relationship between endometrial ARID1A and TP53 mutations in mouse models and in human tumors profiled by TCGA. As reviewed, numerous reports have found that TP53 and ARID1A alterations rarely co-exist in various cancer types. In our study, the gene expression changes associated with each mutation point to mutation-specific programs that differentially affect cellular function, though some changes are shared. Extensive pathway analysis in genetic mouse models and human tumors summarized distinct hallmarks of TP53 and ARID1A mutant tumors:

- TP53 mutations are associated with epithelial dedifferentiation and loss of intrinsic apoptosis and other p53-mediated cellular processes.
- ARID1A mutations are associated with gene expression signatures related to EMT, cell cycle, cell migration, and the p53 pathway.

In the endometrium, ARID1A directly regulates promoter chromatin at p53 target genes, notably including repression of Atf3. As reviewed, stress-induced ATF3 has numerous characterized roles in p53 signaling, cell death, and senescence, and our data support it as a marker of squamous epithelium.
In human TCGA-UCEC samples, ARID1A mutant tumors have higher TP63 gene expression than TP53 mutant tumors. We hypothesize that ATF3 induction is associated with squamous differentiation and anti-apoptotic mechanisms in the endometrium. Mutual exclusivity has been reported in primary tumors. Even so, our genetic mouse model experiments indicate that co-existing TP53 and ARID1A mutations are tolerated in vivo and promote more aggressive cancer phenotypes than either mutation alone. Our data suggest that myometrial invasion is driven at least partially by squamous differentiation following ARID1A loss, since the cells at the invasive front were TP63+/COL17A1+. Given the invasiveness observed in TP53/ARID1A/PIK3CA co-mutant mice, we hypothesize that a human TP53 mutant endometrial cancer cell that acquires an ARID1A mutation will gain metastatic properties. This hypothesis may explain the rarity of TP53/ARID1A co-mutant samples in clinical cohorts such as TCGA, which have mostly focused on sequencing primary tumors. TP53/ARID1A co-alterations are enriched among POLE mutant ultra-mutators. This could be interpreted to mean that these are often non-functional passenger mutations. However, our data suggest that a functional deleterious ARID1A mutation in a TP53 mutant primary tumor may promote metastatic dissemination. In addition, 3.3% of POLE wild-type tumors profiled by MSK-IMPACT are TP53/ARID1A co-mutant, and these could be functional mutations. Outside of POLE ultra-mutators, our analysis of the TCGA molecular subtype framework suggests that TP53/ARID1A co-mutated primary UCEC tumors are most likely to present as microsatellite unstable (MSI). A recent study of synchronous endometrial and epithelial ovarian cancers found that 4 of 7 TP53 mutant tumors also harbored ARID1A mutations (Hajkova et al. 2019).
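Statements about the rarity or enrichment of TP53/ARID1A co-mutation are judgments relative to what chance would predict. One common way to quantify depletion of co-mutated tumors is a one-sided hypergeometric test. This test is equivalent to a one-sided Fisher's exact test on the 2x2 co-mutation table, and it resembles the cumulative hypergeometric calculations used for enrichment testing in Chapter 7. The Python sketch below is an illustrative assumption, not necessarily the statistic used in Chapter 4. The function names and the cohort counts in the usage lines are hypothetical.

```python
from math import comb


def expected_comutated(n_total: int, n_a: int, n_b: int) -> float:
    """Expected number of tumors mutated in both genes if the mutations were independent."""
    return n_a * n_b / n_total


def exclusivity_pvalue(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """One-sided P(X <= n_both) for X ~ Hypergeometric(n_total, n_a, n_b).

    n_total: tumors profiled; n_a: tumors mutated in gene A;
    n_b: tumors mutated in gene B; n_both: tumors mutated in both.
    Small values indicate fewer co-mutated tumors than expected by chance.
    """
    lowest = max(0, n_b - (n_total - n_a))   # smallest feasible overlap
    highest = min(n_both, n_a, n_b)          # clamp to the feasible range
    tail = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(lowest, highest + 1))
    return tail / comb(n_total, n_b)


if __name__ == "__main__":
    # Hypothetical cohort counts, for illustration only (not TCGA or MSK-IMPACT data).
    n_total, n_tp53, n_arid1a, n_both = 500, 150, 200, 30
    print(expected_comutated(n_total, n_tp53, n_arid1a))  # 60.0
    print(exclusivity_pvalue(n_total, n_tp53, n_arid1a, n_both))
```

A test like this only shows that co-mutation is rarer than chance would predict. As the surrounding discussion notes, cohort composition (e.g., primary tumors only, or POLE ultra-mutators) can shift both the observed and expected counts.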
Additional clinical sequencing of matched primary tumors and metastases from endometrial cancer patients would further test these findings. We and others have now shown in various tissue contexts that deleterious ARID1A alterations promote metastatic molecular and cellular signatures, such as EMT induction (Yan et al. 2014; He et al. 2015; Zhang et al. 2018; Wilson et al. 2019; Yang et al. 2019). However, TP53 has also long been considered a metastasis gene in addition to displaying general tumor suppressor properties. Early studies found that dedifferentiation and high-grade status of some tumor types were associated with TP53 loss and highly malignant behavior (Aloni-Grinstein et al. 2014). TP53 has been shown to promote epithelialization through cell-cell adhesion and maintenance of the extracellular matrix, and it also regulates cell migration and invasion, stemness, EMT, and anoikis (Powell, Piwnica-Worms, and Piwnica-Worms 2014). Classically, uterine endometrioid adenocarcinomas, which are most often associated with ARID1A mutations, are indolent, low-grade tumors not generally associated with metastatic risk (Morice et al. 2016). In contrast, uterine serous carcinomas associated with TP53 mutations are generally higher grade and more aggressive. Numerous genetically engineered mouse model studies have demonstrated that TP53 functions as a tumor suppressor in the endometrium (Daikoku et al. 2008; Wild et al. 2012; Akbay et al. 2013; Guo et al. 2019). Gene expression pathway analysis from this chapter supports the idea that endometrial TP53 loss leads to epithelial dedifferentiation. This phenotype is distinct from that of ARID1A deficient lesions, which also lose epithelial features like cell adhesion and cell-cell junctions but acquire invasive, mesenchymal characteristics akin to canonical EMT or transdifferentiation (Wilson et al. 2019), in addition to a subset of cells undergoing squamous metaplasia in our ARID1A mutant mouse model. In TP53 mutant serous primary tumor cells, an additional ARID1A mutation may trigger metastatic progression. In ARID1A mutant cells with intact TP53 signaling, we suspect that TP53 suppresses proliferation and induces senescence, given the observed phenotypes and the upregulation of TP53 target genes such as p21 (CDKN1A) and CHOP (DDIT3). In addition, the senescence-associated secretory phenotype (SASP) has been associated with metastatic invasion (Coppe et al. 2010). Further experiments will be required to fully understand the phenotypic effects of ARID1A mutations in TP53 mutant cells, and vice versa. The mutual exclusivity of these mutations could be rooted in interconnected transcriptional regulation by ARID1A and TP53. Our in vivo binding data revealed that ARID1A interacts with chromatin near TP53 target genes, notably including direct repression of the stress-induced transcription factor Atf3. ATF3 has also been shown to bind 20-40% of TP53 genomic targets (Tanaka et al. 2011; Zhao et al. 2016), suggesting that ARID1A loss probably affects multiple aspects of TP53-regulated chromatin. Recently, integrative transcriptomic analyses in mouse tumors and human cell-based models of ARID1A mutant endometrial tumorigenesis identified the p53 pathway as a key ARID1A-mediated signaling network (Suryo Rahmanto et al. 2020). Our analyses suggest that ARID1A mutant tumors are dependent on p53 pathway function. In TP53 mutant tumors, ATF3 induction and TP63+ squamous differentiation following ARID1A loss could be a compensatory mechanism that partially restores p53 pathway function.
Endoplasmic reticulum (ER) stress-related gene sets were the most highly enriched GO terms among ARID1A promoter-bound genes in in vivo endometrial epithelia, and ER stress crosstalk between ARID1A and TP53 could contribute to interrelated transcriptional and cell death regulation (Sano and Reed 2013; Gonzalez-Quiroz et al. 2020).

8.4.3 SMARCA4

As reviewed, SMARCA4 (BRG1) mutations are also observed in various cancers, including endometrial cancer. In the Chapter 5 mouse studies, BRG1-nullizygous endometrial epithelia were observed in the uterine myometrium, a hallmark of adenomyosis, without obvious carcinogenic transformation or hyperplasia. Unlike in ARID1A mutant endometrium, PIK3CA mutations are not required for BRG1 mutant cell invasion or displacement of endometrial glands into the myometrium. Therefore, BRG1-null states likely confer additional cellular properties that do not arise from ARID1A disruption alone. This is supported by the transcriptomic studies in 12Z cells described here, which demonstrate that BRG1 loss has a greater effect on gene expression than ARID1A loss. This may be expected, as BRG1 functions as a core catalytic subunit of additional SWI/SNF complexes containing other ARID subunits (Wilson and Roberts 2011). We speculate that BRG1 loss allows the endometrial epithelium to undergo matrix detachment or loss of cell adhesion and to bypass anoikis (anchorage-dependent cell death) in the absence of PIK3CA alteration, such that PIK3CA mutations may not be required for myometrial translocation in the context of SMARCA4 mutations. Notably, adenomyosis is a reported risk factor for endometrial cancer development, so some BRG1 mutant endometrial cancers may originate from adenomyotic lesions.
A whole-exome sequencing study of adenomyotic lesions detected possible ARID1A mutations, but these could not be verified by targeted sequencing, potentially due to insufficient read depth to detect rare variant alleles (Inoue et al. 2019). In these studies, we did not generate SMARCA4/PIK3CA mutant mice, so we do not know whether such mice would develop hyperplasia or carcinogenic phenotypes akin to ARID1A/PIK3CA mutants, although we speculate that they might. Further, BRG1-null endometrial epithelia neither express EMT transcriptional signatures nor collectively invade the myometrium, unlike ARID1A/PIK3CA mutants. The presence or absence of collective invasion in SMARCA4/PIK3CA mutant mice would determine whether ARID1A loss indeed confers different phenotypes than BRG1 loss with respect to this key pathological feature. It is possible that BRG1-null endometrial epithelia instead passively invade the myometrium through loss of cellular junctions and adhesion. This hypothesis aligns with the dedifferentiated and undifferentiated features observed in some SMARCA4-deficient endometrial carcinomas (Kolin et al. 2020).

8.5 ARID1A as a mediator of epithelial identity and invasion

In Chapter 2, we demonstrated a cell-autonomous role for ARID1A in preserving endometrial epithelial cell identity and regulating EMT using genetically engineered mouse models and human cell-based assays. We showed that LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl cells gain VIM and ICAM-1 expression and invade the myometrium but retain CDH1, EPCAM and KRT8 expression, suggesting an incomplete EMT phenotype (Nieto et al. 2016). VIM expression is upregulated in epithelial tumors of uterine corpus origin, but not in epithelial tumors of ovarian origin (Desouki et al. 2014). ICAM-1 is expressed in migratory endometrial cancer (Menkhorst et al. 2018) and is linked to increased peritoneal adhesion in endometriosis (Rutherford, Hill, and Hopkins 2018).
VIM and ICAM-1 may serve as markers of ARID1A-negative tumors of endometrial origin. Partial or incomplete EMT is associated with invasive phenotypes in various cancers (Nieto et al. 2016; Friedl and Gilmour 2009). In endometrial cancer, EMT is thought to play a role in myometrial invasion (Mirantes et al. 2013). In this study, we found that ARID1A loss and PI3K activation in the endometrial epithelium lead to enhanced migration and invasion in vitro and myometrial invasion in vivo, reflecting the myometrial invasion observed clinically. Myometrial invasion in endometrial cancer correlates with distal metastases, disease recurrence, and adenomyosis (Euscher et al. 2013; Morice et al. 2016; Ismiil et al. 2007). Endometrial cancer patients with gene expression signatures most similar to LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl had greater tumor invasion and higher tumor grade. Collective migration of mutant endometrial epithelium undergoing partial EMT may increase the invasive potential of endometrial cancer cells and permit myometrial invasion. The expression of EMT factors is increased at the myoinvasive front of endometrial cancers (Mirantes et al. 2013), and endometriotic lesions retain CDH1 expression (Bartley et al. 2014), suggesting collective migration rather than single-cell metastasis (Lambert, Pattabiraman, and Weinberg 2017). Within primary tumors, adjacent cells may reach different intermediate stages along the EMT spectrum owing to differing stimuli within the tumor microenvironment, including the surrounding stroma (Nieto et al. 2016). Invasive, mesenchymal-like cells may lead the way for cohorts of epithelial cells with which they retain some cell-cell junctions (Lambert, Pattabiraman, and Weinberg 2017).
Upon arrival at metastatic sites, the absence of stromal signals present at the site of origin may allow epithelial gland formation (Polyak and Weinberg 2009). This may explain the formation of endometrial glands outside of the endometrium from cells with mesenchymal-like invasiveness. In Chapter 4 we revealed that ARID1A directly represses Atf3 promoter chromatin in the endometrial epithelium in vivo, and that ARID1A loss induces ATF3 expression associated with TP63+ squamous differentiation, independent of TP53 mutation status. Similar to our findings, ARID1A downregulation in testicular germ cell tumors also leads to ATF3 upregulation (Kurz et al. 2020). We also showed that ATF3 expression in the vaginal squamous epithelium is strongest near basal epithelial cells marked by TP63 and COL17A1, indicating that ATF3 may be linked to the squamous differentiation or metaplasia observed in ARID1A-deficient endometrial epithelia. Notably, previous reports have shown that ATF3 functions are context-dependent: ATF3 may either promote apoptosis or suppress proapoptotic genes, depending on whether it is activated in healthy or tumor cells (Lu, Wolfgang, and Hai 2006; Tanaka et al. 2011). In Chapter 5, we proposed that the loss of cellular adhesion and cellular junction programs may be a common mechanism by which invasive phenotypes manifest in SWI/SNF mutant endometrial pathologies. Cell-cell adhesion and cell-matrix interactions are characteristic properties of epithelial cells required for various cellular processes (Jandial 2013). The extracellular matrix and basement membrane anchor epithelial cells, supporting their spatial stability and functional membrane polarization, and degradation of these structures permits tissue dissociation and stromal invasion (Jandial 2013). In the endometrium, matrix degradation and disruption of adhesion normally occur with endometrial sloughing during menstruation (Gaide Chevronnay et al. 2009; Tabibzadeh 1996).
However, malignant cells also disrupt these processes to metastasize from the primary tissue site by acquiring mesenchymal characteristics through EMT (Fouad and Aanei 2017). In Chapter 2, we showed that ARID1A mutant endometrial epithelia undergo EMT and are capable of collective myometrial invasion in the presence of PIK3CAH1047R (Wilson et al. 2019). However, we have shown that BRG1 mutant endometrial epithelia do not display canonical EMT transcriptional features but are still displaced into the nearby myometrium. These findings suggest that disruption of other epithelial integrity processes related to cell-cell adhesion and cell junctions, which are shared features of BRG1 and ARID1A mutant endometrial epithelia, likely contributes to invasive phenotypes. Previous studies have demonstrated that SWI/SNF alterations modulate extracellular matrix and cellular adhesion (Marquez-Vilendrer, Rai, et al. 2016; Hill et al. 2004; Saladi et al. 2010). Other features of disease pathogenesis, such as abnormal steroid hormone signaling, may promote endometrial invasion through similar mechanisms (Dai et al. 2002). In Chapter 6, we demonstrated that ARID1A suppresses SERPINE1 hyperactivation, which is associated with super-enhancer hyperacetylation and is essential for ARID1A mutant invasion (Wilson et al. 2020). Several studies have demonstrated a relationship between SERPINE1/PAI-1 expression and endometriosis (Bruse et al. 1998; Bruse, Radu, and Bergqvist 2004). SERPINE1 promoter polymorphisms linked to high levels of PAI-1 expression have been reported in endometriosis (Bedaiwy et al. 2006; Ramon et al. 2005). PAI-1 expression is increased in deep infiltrating (Alotaibi et al. 2019) and ovarian endometriosis (Gilabert-Estelles et al. 2003; Ramon et al. 2005) relative to other types of endometriosis. ARID1A mutations exist in both deep infiltrating and ovarian endometriosis (Anglesio et al. 2017; Suda et al. 2018), and our clinical cohort supports that ARID1A loss leads to PAI-1 overexpression in endometriosis. Elevated plasma levels of PAI-1, a secreted factor, have been observed in women with recurrent pregnancy loss or preeclampsia, and secretion of PAI-1 from endometriotic lesions may contribute to endometriosis-associated infertility and pain (Ye et al. 2017). PAI-1 negatively regulates fibrinolysis and plays a role in endometrial hemostasis during menstruation (Davies and Kadir 2012; Mehta and Shapiro 2008). In addition to its roles in cell adhesion and migration, increased PAI-1 could affect menstrual clotting and promote fibrosis or scar tissue formation in endometriosis. Lastly, PAI-1 may serve as a biomarker for invasive ARID1A mutant endometriosis.

8.6 ARID1A as a transcriptional regulator

In Chapter 2, we measured genome-wide ARID1A binding paired with chromatin accessibility and gene expression alterations accompanying acute ARID1A loss in 12Z cells to elucidate how disrupted ARID1A chromatin activities may drive disease processes. In this model, ARID1A loss affected >10% of the expressed transcriptome. However, ARID1A-dependent expression alterations were relatively modest in effect size, with <10% of the affected genes displaying at least a 2-fold change in expression in either direction. Roughly equal numbers of genes were downregulated and upregulated following ARID1A loss, corresponding to genes activated and repressed by ARID1A, respectively. Further, ARID1A promoter binding was observed equally at genes activated and repressed by ARID1A. However, genes directly repressed by ARID1A were most enriched for known disease processes like TNFα signaling via NFκB, inflammatory response, epithelial-to-mesenchymal transition, hypoxia, and apoptosis.
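The effect-size summary above (>10% of expressed genes affected, but <10% of those changing at least 2-fold) is a pair of nested proportions. The Python sketch below shows one way such proportions could be computed from a differential expression table. It is a minimal illustration under stated assumptions, not the analysis pipeline used in this work: the input layout (gene mapped to a log2 fold change for ARID1A knockdown vs. control and an adjusted p-value), the 0.05 significance cutoff, the function name `summarize_de`, and the toy values are all hypothetical.

```python
import math

def summarize_de(results, expressed, padj_cutoff=0.05, min_fold=2.0):
    """Summarize differential expression as nested proportions.

    Assumed (hypothetical) input layout:
      results:   dict gene -> (log2_fold_change, adjusted_p_value),
                 where log2FC compares ARID1A knockdown to control
      expressed: set of genes considered expressed
    Returns (fraction of expressed genes significantly changed,
             fraction of changed genes with |fold change| >= min_fold,
             number upregulated, number downregulated).
    """
    lfc_cutoff = math.log2(min_fold)  # a 2-fold change corresponds to |log2FC| >= 1
    changed = {g: lfc for g, (lfc, padj) in results.items()
               if g in expressed and padj < padj_cutoff}
    strong = [g for g, lfc in changed.items() if abs(lfc) >= lfc_cutoff]
    up = sum(1 for lfc in changed.values() if lfc > 0)    # i.e., repressed by ARID1A
    down = sum(1 for lfc in changed.values() if lfc < 0)  # i.e., activated by ARID1A
    frac_changed = len(changed) / len(expressed) if expressed else 0.0
    frac_strong = len(strong) / len(changed) if changed else 0.0
    return frac_changed, frac_strong, up, down

# Toy example with made-up values (not data from this study)
toy = {"A": (0.4, 0.01), "B": (-0.5, 0.001), "C": (1.3, 0.0001),
       "D": (0.1, 0.6), "E": (-0.2, 0.2)}
print(summarize_de(toy, set(toy)))
```

Reporting both proportions separately makes the distinction in the text explicit: a perturbation can touch many genes while shifting few of them by large amounts.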
While ARID1A clearly plays roles in transcriptional repression associated with chromatin interactions, such as at mesenchymal genes, in Chapter 5 we found that ARID1A also promotes activation of epithelial identity genes alongside the SWI/SNF catalytic subunit BRG1. We identified 35 genes directly activated by both ARID1A and BRG1, and these genes are involved in processes like cell adhesion, cell projections, and taxis. In addition, our ChIP-seq studies indicated that distal ARID1A-SWI/SNF binding (>3 kb from a gene TSS) was also associated with active gene expression. Despite this physiologically relevant overlap in function, ARID1A and BRG1 uniquely or oppositely regulate the majority of their respective transcriptional programs. Further, BRG1 showed less direct repression of gene expression than ARID1A. Opposing changes in gene expression following ARID1A or BRG1 knockdown may be attributed to differences in SWI/SNF combinatorial subunit composition (Wu, Lessard, and Crabtree 2009). For example, BRG1 is mutually exclusive with the alternate SWI/SNF catalytic subunit BRM (Brahma; SMARCA2), while the ARID1B and ARID2 subunits are mutually exclusive with ARID1A (Wu, Lessard, and Crabtree 2009; Wang et al. 2004; Yan et al. 2005). In the Chapter 2 study, we observed that ARID1A interacts with both BRG1 and BRM by co-immunoprecipitation in the 12Z cell model (Wilson et al. 2019). In Chapter 5, we demonstrated that while ARID1A binding occurs preferentially at genes that are repressed by ARID1A (and upregulated upon ARID1A loss), ARID1A/BRG1 co-binding is strongest at genes that are normally activated by ARID1A/BRG1 complexes. A recent study observed that ARID1A degradation leads to rapid loss of enhancer accessibility (Blumli et al. 2021), supporting the primarily activating roles of SWI/SNF observed here.
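The distinction between promoter-proximal and distal binding used above (>3 kb from a gene TSS) can be expressed as a simple distance rule. The sketch below labels binding peaks by the distance from the peak center to the nearest annotated TSS on the same chromosome. It is an illustration under assumptions, not the annotation method used in this work: measuring from the peak center (rather than the nearest peak edge), the input tuples, and the function names are hypothetical choices.

```python
from bisect import bisect_left

def nearest_tss_distance(position, sorted_tss):
    """Absolute distance from a position to the nearest TSS in a sorted list."""
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(abs(sorted_tss[i] - position))      # first TSS at or after position
    if i > 0:
        candidates.append(abs(sorted_tss[i - 1] - position))  # last TSS before position
    return min(candidates)

def classify_peaks(peaks, tss_by_chrom, distal_cutoff=3000):
    """Label each peak 'proximal' or 'distal' relative to annotated TSSs.

    peaks:        list of (chrom, start, end)  (hypothetical layout)
    tss_by_chrom: dict chrom -> list of TSS coordinates
    A peak whose center lies more than distal_cutoff bp from every TSS is 'distal'.
    """
    sorted_tss = {c: sorted(p) for c, p in tss_by_chrom.items()}
    labels = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        tss = sorted_tss.get(chrom)
        if not tss:
            labels.append("distal")  # no annotated TSS on this chromosome
            continue
        d = nearest_tss_distance(center, tss)
        labels.append("distal" if d > distal_cutoff else "proximal")
    return labels

# Toy coordinates (hypothetical)
tss = {"chr1": [10_000, 50_000]}
peaks = [("chr1", 9_500, 10_500), ("chr1", 20_000, 20_400), ("chr1", 52_500, 53_500)]
print(classify_peaks(peaks, tss))
```

Using a binary search over sorted TSS positions keeps each lookup logarithmic in the number of genes, which matters when tens of thousands of peaks are annotated.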
Our data suggest that ARID1A/BRM complexes could play an important role in repressing transcription at co-bound genes, and future studies will interrogate the role of BRM in regulating ARID1A-dependent phenotypes. In Chapter 6, an in silico screen of ENCODE ChIP data sets through Enrichr suggested that direct, functional ARID1A target genes are co-regulated by the histone acetyltransferase (HAT) P300. Indeed, P300 ChIP-seq in 12Z cells showed strong overlap of promoter binding with ARID1A target genes, and P300 knockdown also altered expression of hundreds of ARID1A target genes. While ARID1A and P300 synergized to promote the activity of many genes, we identified 50 invasion and EMT-associated genes, including SERPINE1, that are normally repressed by ARID1A and whose induction upon ARID1A loss depends on P300 HAT activity. As discussed, P300 knockdown reversed the ARID1A mutant invasion phenotype, suggesting that ARID1A-P300 antagonism is pathologically relevant. We further described this mechanism at the level of chromatin as specific antagonism of H3K27ac, an active chromatin mark catalyzed by P300. The mechanism by which ARID1A antagonizes P300 HAT-dependent gene activation is not yet understood. In Chapter 7, we further explored chromatin mechanisms governing ARID1A-mediated transcriptional repression vs. activation that involve other nuclear co-regulators and specific combinations of epigenetic marks, such as H4(K16)ac-mediated repression associated with ZMYND8 reader function. Notably, ARID1A rarely appears to switch genes fully on or off; rather, it generally functions as a rheostat, modifying activation states and likely activation potential. In 12Z cells, genes that are repressed by ARID1A, which we have demonstrated are relevant to disease phenotypes, are often already expressed to some extent.
For example, SERPINE1/PAI-1 is already highly expressed, with detectable protein, in wild-type 12Z cells, as it is marked by an active super-enhancer, but SERPINE1 hyperexpression is required for ARID1A mutant invasion. In vivo, gene expression alterations in ARID1A/PIK3CA mutant endometrial epithelial cells are often more dynamic than those observed in vitro; however, that comparison involved hyperplastic mutant vs. healthy cells months after genetic depletion. The in vivo data capture disease states rather than acute genetic consequences, which confounds mechanistic interpretation.

8.7 ARID1A as a chromatin regulator

8.7.1 Repression of chromatin at EMT and invasion genes

Previously, Raab et al. identified preferential ARID1A binding at promoters in HepG2 liver cancer cells (Raab, Resnick, and Magnuson 2015). In Chapter 2, we showed ARID1A binding enrichment at promoters, which was significantly correlated with chromatin accessibility. Upon ARID1A loss, we observed mostly increased promoter accessibility in human 12Z endometrial epithelial cells and, in vivo, in sorted mouse endometrial epithelial cells. In addition, genes with detected ARID1A promoter binding were more strongly activated following ARID1A loss than genes without detected ARID1A binding. These data suggest that ARID1A-containing SWI/SNF complexes maintain endometrial epithelial cell identity by repressing genes required for transdifferentiation of epithelial cells into mesenchyme, at least partially through promoter chromatin interactions. Among direct ARID1A target genes, we also observed significant correlations between increasing promoter accessibility and increasing expression of mesenchymal genes upon ARID1A loss. In fact, ARID1A interactions with promoter chromatin suppress expression of numerous EMT genes in 12Z cells, including VCAM1, VIM, LOXL1, and LOXL2 (refer to Table 2.1 and Fig. 2.5q).
In Chapter 4, we used CUT&RUN to measure genome-wide ARID1A chromatin binding in vivo in adult mouse endometrial epithelial cells and observed direct regulation at the EMT genes Jun, Plaur, Lamc1, and Pmp22 (refer to Table C.1). Interestingly, most of the increased promoter chromatin accessibility following ARID1A loss in 12Z cells occurred in the absence of detected ARID1A promoter binding. While >90% of the ~2000 gene promoters with chromatin accessibility alterations gained accessibility following ARID1A loss, fewer than 20% of these promoters had detected ARID1A binding. This contrasts sharply with the 150 promoters with decreasing accessibility, of which >80% had detected ARID1A binding. A first potential explanation is that, in the Chapter 6 epigenome-wide chromatin state modeling studies, ARID1A was shown to function mostly at active cis-regulatory elements, e.g. enhancers, both distal and proximal to gene promoters. Second, this phenomenon is likely related to the H3K27ac alterations observed in Chapter 6, where narrow, typical enhancers regulated by ARID1A are mostly maintained by ARID1A chromatin remodeling activities, while ARID1A suppresses H3K27 hyperacetylation across large distances at super-enhancers. It remains unclear how ARID1A governs large chromatin domains in the endometrial epithelium, but we suspect this involves 3D chromatin maintenance of topologically associating domains (TADs) and loops and the associated structural machinery (Wu et al. 2019). In Chapter 7, we extended these regulatory mechanisms by showing that ARID1A chromatin interactions can function repressively through association with HDAC-containing NuRD-ZMYND8 complexes. This function is associated with maintenance of the histone variant H3.3 by cooperative remodeling activities of ARID1A-SWI/SNF and H3.3-interacting NuRD.
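The contrast described above (fewer than 20% of promoters gaining accessibility are ARID1A-bound, versus more than 80% of those losing accessibility) comes from cross-tabulating the direction of the accessibility change against detected binding. The sketch below shows that tabulation for a list of per-promoter records. The record layout, the function name, and the toy values are hypothetical and are not data from this study; promoters with no net change are skipped because they fall outside the altered set.

```python
from collections import Counter

def bound_fraction_by_direction(promoters):
    """Fraction of ARID1A-bound promoters among those gaining vs. losing accessibility.

    promoters: iterable of (gene, log2_accessibility_change, arid1a_bound)
               (hypothetical layout; change is ARID1A loss vs. control)
    Returns a dict such as {"gained": fraction_bound, "lost": fraction_bound}.
    """
    totals = Counter()
    bound = Counter()
    for gene, change, is_bound in promoters:
        if change == 0:
            continue  # not an altered promoter
        direction = "gained" if change > 0 else "lost"
        totals[direction] += 1
        if is_bound:
            bound[direction] += 1
    return {d: bound[d] / totals[d] for d in totals}

# Toy records (made up for illustration)
toy = [("g1", 1.2, False), ("g2", 0.8, False), ("g3", 0.5, True),
       ("g4", 1.0, False), ("g5", -0.9, True), ("g6", -0.7, True)]
print(bound_fraction_by_direction(toy))
```

Computing the fraction within each direction, rather than over all altered promoters, is what exposes the asymmetry: gained promoters are mostly unbound, while lost promoters are mostly bound.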
8.7.2 Association with transcription factor networks

A common theme in the epigenome-wide chromatin analyses from these studies is SWI/SNF regulation of genomic AP-1/bZIP motifs, notably the canonical TGA(C/G)TCA. In Chapter 2, we observed that ARID1A binds and regulates chromatin accessibility most prevalently at AP-1 motifs in 12Z cells, and that chromatin accessibility alterations in ARID1A/PIK3CA mutant mouse endometrial epithelia are most enriched for AP-1 motifs. Consistent with this, we found in Chapter 4 that ARID1A binds accessible AP-1 motifs in mouse endometrial epithelial cells in vivo. In Chapter 5, we further showed that BRG1-ARID1A co-bound genomic regions are enriched for AP-1 motifs. ARID1A and SWI/SNF have previously been shown to regulate chromatin at AP-1 sites (Vierbuchen et al. 2017; Alver et al. 2017; Kelso et al. 2017), supporting our findings. The related TGACGTCA cyclic AMP-responsive element (CRE)/ATF binding site is also enriched among ARID1A promoter binding sites in 12Z cells. In Chapter 4, we identified the stress-inducible transcription factor Atf3 as directly repressed by ARID1A in mouse endometrial epithelial cells in vivo, and we hypothesize that ATF3 induction in ARID1A mutants is associated with squamous differentiation and altered apoptotic signaling. ATF transcription factors are known to form cross-family heterodimers with AP-1 factors and bind both ATF/CRE and AP-1 sites (Hai and Curran 1991). Our in vivo ARID1A binding data in mouse endometrial epithelial cells also indicated that ARID1A regulates chromatin near the AP-1 subunit-encoding genes Jun and Fosl2 (refer to Table C.1). In 12Z cells, ARID1A directly represses expression of the AP-1 subunit-encoding genes JUNB, FOSL1, and FOSL2 (refer to Table A.2). These data suggest that ARID1A normally regulates chromatin at AP-1 motifs genome-wide, and that ARID1A loss alters expression of certain bZIP transcription factors that bind these regions.
Thus, ARID1A probably monitors AP-1 activity through various, possibly redundant, regulatory mechanisms. Notably, Chapter 5 analyses of distal accessible chromatin sites, used as surrogates for putative active cis-regulatory elements, revealed that AP-1 was also the top enriched motif among such regions without detected SWI/SNF binding. This result could be interpreted to mean that AP-1 motifs are a hallmark feature of active chromatin regulation in the endometrial epithelium. The widespread AP-1 motif enrichment across these studies suggests that bZIP-interacting transcription factors and their associated regulatory machinery are critical to this cell type. The estrogen receptor is well documented to enhance AP-1 transcriptional activity through interactions with coactivators like P300, as a mechanism of estrogen receptor-mediated transcriptional regulation (Kushner et al. 2000; Bjornstrom and Sjoberg 2005). Accurate hormone response in the endometrium may require interplay between steroid hormone signaling and AP-1 factors, which is further augmented by ARID1A and SWI/SNF. Unique and overlapping chromatin regulatory activities of ARID1A- and BRG1-containing SWI/SNF complexes were also assessed in Chapter 5. At directly regulated gene promoters and distal elements, BRG1 and SWI/SNF were found to interact with chromatin specifically at ETS and TEAD motifs, in addition to AP-1. Previous studies have shown that SWI/SNF complexes are recruited to regulatory elements containing these motifs, where they can interact with the associated transcription factors to promote target gene expression (Sandoval et al. 2018; Skibinski et al. 2014; Alver et al. 2017).
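To make the motifs discussed above concrete, the sketch below counts exact matches to the AP-1 consensus TGA(C/G)TCA and the CRE consensus TGACGTCA in DNA sequences, such as peak regions. It is deliberately simplified and is not the motif analysis used in these studies: dedicated motif enrichment tools typically score position weight matrices against a matched background set, whereas this sketch only finds exact consensus matches. Both consensus patterns map onto themselves under reverse complementation, so scanning one strand is sufficient. The function names are hypothetical.

```python
import re

# Exact consensus sites as written in the text. The degenerate AP-1 pattern
# (TGACTCA / TGAGTCA) and the CRE (TGACGTCA) are each closed under reverse
# complementation, so a single-strand scan finds every site.
AP1 = re.compile(r"(?=(TGA[CG]TCA))")  # lookahead so overlapping sites are counted
CRE = re.compile(r"(?=(TGACGTCA))")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def count_sites(seq):
    """Count exact consensus AP-1 and CRE sites in one sequence."""
    s = seq.upper()
    return {"AP-1": len(AP1.findall(s)), "CRE": len(CRE.findall(s))}

def fraction_with_site(sequences, motif="AP-1"):
    """Fraction of sequences (e.g., peak regions) containing at least one site."""
    hits = sum(1 for s in sequences if count_sites(s)[motif] > 0)
    return hits / len(sequences) if sequences else 0.0

# Toy sequence containing two AP-1 sites and one CRE
seq = "AATGACTCAGGTGACGTCATTGAGTCAA"
print(count_sites(seq))
```

Note that the CRE differs from the AP-1 site by a single inserted base at the center, which is consistent with the cross-family ATF/AP-1 heterodimers binding both elements.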
Across entire promoter regions, LHX9 was also predicted as the top transcription factor regulating direct activating SWI/SNF target genes in 12Z cells; this transcription factor has previously been shown to be essential for gonadogenesis and has been implicated in cancer and in in vitro migration and invasion (Vladimirova et al. 2009; Birk et al. 2000). ANXA2 is a SWI/SNF target gene involved in cellular adhesion, with a predicted regulatory LHX9 binding site, that has previously been linked to proper embryo implantation in the endometrial epithelium (Wang et al. 2015).

8.7.3 Relationship to other SWI/SNF subunits

In Chapter 5, we revealed subunit-specific differences in chromatin regulation and transcriptional governance by SWI/SNF. As supported by our co-immunoprecipitation studies in 12Z cells (see Fig. A.5e), ARID1A and BRG1 are well known to often operate in the same SWI/SNF complex. We observed that these two subunits globally regulate common cellular processes, as measured by direct transcriptional regulation at both proximal gene promoters and distal elements. At the level of chromatin interactions, ~95% of gene promoters significantly bound by BRG1 were also bound by ARID1A, suggesting that SWI/SNF complexes at the strongest BRG1 promoter targets likely also contain ARID1A. These SWI/SNF gene targets were involved in processes such as cell motility and adhesion, consistent with the invasive phenotypes observed in vivo. Critical subunit-specific distinctions were observed, however: BRG1 ChIP enrichment was significantly higher at genes normally activated by BRG1, whereas ARID1A binding was higher at genes normally repressed by ARID1A. This result suggests that these SWI/SNF subunits serve primarily activating and repressive roles in the endometrial epithelium, respectively, which may contribute to their phenotypic disparities.
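The ~95% figure above is a directional overlap: it is measured relative to the BRG1-bound promoter set, so it does not imply that most ARID1A-bound promoters are also bound by BRG1. The minimal sketch below, using made-up gene sets and hypothetical function names, shows how the directional fraction differs from a symmetric measure such as the Jaccard index.

```python
def overlap_fraction(query, reference):
    """Fraction of genes in `query` that are also in `reference` (directional)."""
    query, reference = set(query), set(reference)
    return len(query & reference) / len(query) if query else 0.0

def jaccard(a, b):
    """Symmetric overlap: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up promoter sets: every BRG1-bound promoter is ARID1A-bound,
# but ARID1A binds twice as many promoters.
brg1 = {"A", "B", "C", "D"}
arid1a = {"A", "B", "C", "D", "E", "F", "G", "H"}
print(overlap_fraction(brg1, arid1a), overlap_fraction(arid1a, brg1), jaccard(brg1, arid1a))
```

In this toy case the BRG1-relative overlap is complete while the ARID1A-relative overlap is only one half, which is why the direction of comparison should be stated whenever such percentages are reported.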
Because ARID1A-mediated transcriptional repression of mesenchymal and invasion genes is critical to ARID1A mutant pathogenesis, one may hypothesize that such regulation is achieved by BRM-containing complexes. Future work should investigate the functional roles of ARID1A in BRM- vs. BRG1-containing complexes in the endometrial epithelium, alongside the plethora of other ARID1A-containing and ARID1A-absent SWI/SNF subunit configurations, as has been partially described in one context (Raab, Resnick, and Magnuson 2015; Raab et al. 2017).

8.7.4 Governance of super-enhancer activation states

In Chapter 6, we found that ARID1A is ubiquitously associated with super-enhancers marked by high levels of H3K27ac (Wilson et al. 2020). At the level of chromatin, ARID1A loss had a profound, context-dependent effect on genomic H3K27ac: ARID1A normally promotes typical enhancer accessibility and activity but suppresses hyperactivation of super-enhancers. ARID1A antagonizes P300 HAT activity at some super-enhancers, notably at SERPINE1, such that ARID1A loss results in H3K27 hyperacetylation, increased chromatin accessibility, and increased eRNA transcription accompanying heightened gene expression in a P300-dependent manner. Functionally, SERPINE1 induction through this mechanism is essential for ARID1A loss-driven invasion in 12Z cells. Exposure to low-dose A-485, a HAT inhibitor, reverses SERPINE1 hyperactivation and ARID1A mutant invasion in vitro. Supporting the clinical relevance of this mechanism, we noted upregulation of SERPINE1 in invasive ARID1A/PIK3CA mutant endometrial epithelia as well as in ARID1A mutant human deep infiltrating endometriosis samples. SWI/SNF-mediated regulation of super-enhancers may be cell type-specific.
The SWI/SNF subunit SMARCB1 can antagonize chromatin accessibility at super-enhancers in mouse embryonic stem cells (Langer, Ward, and Archer 2019), while SMARCB1 loss in rhabdoid tumors impairs SWI/SNF binding to typical enhancers, not super-enhancers (Wang et al. 2017). SWI/SNF can regulate Myc expression in acute myeloid leukemia through interactions with a lineage-specific super-enhancer (Shi et al. 2013), as has been described for other enhancers (Alver et al. 2017). In mouse embryonic fibroblasts, deletion of SWI/SNF family members has been shown to reduce H3K27ac at enhancers. SWI/SNF can promote chromatin accessibility at enhancers (Kelso et al. 2017; Vierbuchen et al. 2017). In embryonic stem cells, mutations in the SWI/SNF catalytic subunit Brahma (BRM) result in enhancer reprogramming (Gao et al. 2019). In breast cancer, ARID1A binds and represses enhancers containing estrogen receptor-binding elements through co-recruitment of HDAC1, and ARID1A loss results in H4 acetylation, BRD4 recruitment, and subsequent transcription (Nagarajan et al. 2020). However, in that study, ARID1A knockout did not result in differential H3K27ac (Nagarajan et al. 2020). Our data suggest a unique role for ARID1A-P300 antagonism in regulating super-enhancer chromatin accessibility and H3K27ac deposition or maintenance in the endometrial epithelium. Although ARID1A is enriched at promoters and we observed ARID1A-P300 co-binding at these sites, we demonstrated in Chapter 6 that ARID1A-P300 antagonism occurs uniquely at super-enhancers, leading to hyperacetylation and hyperactivation when ARID1A is deficient. Both SERPINE1 and SDC4 have large super-enhancers spanning the promoter region, although we also showed that super-enhancer regulation by ARID1A, P300, and A-485 affects the expression of genes located up to 50 kb away.
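Identifying genes that could be influenced by a super-enhancer at a distance, such as the genes up to 50 kb away noted above, amounts to a window search around the super-enhancer interval. The sketch below returns genes whose TSS lies within a given window of an interval, reporting a distance of 0 for TSSs inside it. The coordinate convention (0-based, half-open intervals), the input layout, the function name, and the gene names and coordinates in the example are hypothetical placeholders, not annotations from this study.

```python
def genes_near_interval(interval, genes, window=50_000):
    """Return (gene, distance) pairs for TSSs within `window` bp of an interval.

    interval: (chrom, start, end), assumed 0-based and half-open
    genes:    iterable of (name, chrom, tss)
    Distance is 0 if the TSS falls inside the interval, otherwise the gap to
    the nearest interval edge. Results are sorted by distance.
    """
    chrom, start, end = interval
    hits = []
    for name, g_chrom, tss in genes:
        if g_chrom != chrom:
            continue
        if start <= tss < end:
            dist = 0
        elif tss < start:
            dist = start - tss
        else:
            dist = tss - (end - 1)  # last base covered by the half-open interval
        if dist <= window:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical 41 kb super-enhancer interval and placeholder genes
se = ("chr7", 100_000, 141_000)
genes = [("GENE_IN", "chr7", 120_000), ("GENE_UP", "chr7", 60_000),
         ("GENE_FAR", "chr7", 200_000), ("GENE_OTHER", "chr1", 120_000)]
print(genes_near_interval(se, genes))
```

Measuring from the interval edge rather than its midpoint matters for large super-enhancers: with a 41 kb interval, a midpoint-based distance would understate how close flanking genes are to the acetylated domain.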
Interestingly, although ARID1A is bound at the SERPINE1 promoter and at multiple sites throughout the 41 kb SERPINE1 super-enhancer, increased H3K27ac is observed throughout most of the super-enhancer interval in ARID1A-deficient cells, suggesting that ARID1A regulates large chromatin domains. However, while we have demonstrated ARID1A-P300 antagonism at super-enhancers in endometrial epithelia, the biochemical or molecular mechanism of HAT antagonism by ARID1A remains unclear. In Chapter 7, we presented a mechanism that could explain this phenomenon, involving ARID1A interaction with HDAC-containing NuRD associated with H3.3 nucleosomes at active chromatin, including super-enhancers. Importantly, we observed a strong association between high levels of H3.3 and ZMYND8 at super-enhancers that were suppressed by ARID1A, i.e. that became hyperacetylated following ARID1A loss, suggesting that these factors are key determinants of super-enhancer regulatory behavior in endometrial epithelia. ZMYND8-mediated repression was further associated with H4(K16)ac, and other histone H4 acetyltransferases may be involved in marking these regions for joint regulation.

8.7.5 Collaboration with other chromatin regulators

ARID1A likely antagonizes the activating effects of P300 HAT activity through association with repressive chromatin co-regulator machinery. One possible mechanism is introduced in Chapter 7, in which we observed that ARID1A is directly involved in maintaining the histone variant H3.3 at active chromatin associated with both activating and repressive transcriptional regulation. H3.3 has historically been regarded as an active chromatin mark associated with transcriptional activation.
However, our data and those of others have demonstrated that H3.3 can also function in transcriptional repression, as well as in transcriptional poising and higher-order chromatin regulation, although the mechanisms governing these functional specificities remain unclear (Shi, Wen, and Shi 2017). A simple hypothesized mechanism explaining how H3.3 can function repressively is through associations with CHD4 and the NuRD complex, as we have studied. NuRD has historically been considered a repressor because its subunits include the histone deacetylases HDAC1/2, although activating roles of NuRD are also known (Tong et al. 1998; Xue et al. 1998; Basta and Rauchman 2015). An early study of H3.3 chromatin dynamics indicated that NuRD components were associated with active genomic regions marked by high H3.3 turnover (Ha, Kraushaar, and Zhao 2014). More recently, NuRD has been shown to directly interact with H3.3 nucleosomes (Kraushaar et al. 2018). Further, we have shown that ARID1A strongly interacts with CHD4 in endometrial epithelial cells, and chromatin co-binding by ARID1A and CHD4 is associated with H3.3 maintenance and transcriptional repression. However, it is not clear how the ARID1A-CHD4 protein-protein interaction contributes to the regulation of H3.3 nucleosomes. Remarkably, CHD4/NuRD was also recently shown to control super-enhancer accessibility and maintain lower acetylation levels through complexed HDAC activity (Marques et al. 2020), similar to our finding that ARID1A antagonizes P300, as described in Chapter 6 (Wilson et al. 2020). The authors also observed CHD4 interactions with SWI/SNF, in support of our data. Recently, NuRD and SWI/SNF recruitment to active TSSs and enhancers was shown to be reduced in H3.3K4A mutant mouse ESCs (Gehre et al. 2020), suggesting that H3.3 chromatin and remodeler activity may be mutually interdependent.
Corroborating our findings, the same study reported that gene-associated H3.3 levels were depleted following CHD4 or BRG1 (SWI/SNF) depletion. Loss of H3.3 following disruption of ARID1A chromatin interactions could therefore destabilize repressive NuRD. One hypothesized cooperative mechanism that could explain how ARID1A and CHD4 maintain H3.3 in active chromatin involves distinct remodeler activities conferred by each complex. ARID1A could remodel or remove canonical H3 nucleosomes, and CHD4 could stabilize H3.3 through biochemical interactions. In this case, ARID1A loss may impair removal of canonical H3 nucleosomes, leading to heightened canonical H3 levels, and CHD4 loss could result in an inability to stabilize H3.3. Notably, H3.3 has a defined chaperone associated with transcriptional activity, HIRA, which is likely to play a role in H3.3 co-regulation by CHD4 and ARID1A (Shi, Wen, and Shi 2017). P400 is another SWI/SNF-like remodeler, recently shown to exchange H3.3 nucleosomes, that could also collaborate with SWI/SNF and NuRD (Pradhan et al. 2016). In silico analyses using the ReMap 2020 transcriptional regulator peak database (Cheneby et al. 2020) predicted that the histone multi-reader ZMYND8 is highly associated with H3.3 chromatin regulation by ARID1A. While we did not detect high-stringency nuclear interactions between ARID1A and ZMYND8, we did observe ARID1A-CHD4 and CHD4-ZMYND8 interactions. ZMYND8 has been shown to interact with NuRD in numerous contexts (Gong et al. 2015; Adhikary et al. 2016; Savitsky et al. 2016; Spruijt et al. 2016). Intriguingly, one recent study reported that ZMYND8 directly recognizes mutant H3.3G34R (Jiao et al. 2020). Our data indicate that ZMYND8 may be an additional requirement for repressive transcriptional regulation linked to H3.3 maintenance and associated with H4 acetylation. In support, the ZMYND8 bromodomain directly interacts with acetylated H4 tails (Adhikary et al.
2016), and TIP60-mediated H4 acetylation can functionally recruit ZMYND8 through this mechanism to repress transcription with CHD4 in response to DNA damage (Gong et al. 2015). Our data also indicate that ARID1A directly suppresses chromatin accessibility at sites marked by H4 acetylation, suggesting that SWI/SNF chromatin remodeler activity may be involved in ZMYND8-NuRD-mediated transcriptional repression. Interestingly, ZMYND8 has been observed to regulate transcription independently of NuRD, in addition to NuRD-dependent regulation (Adhikary et al. 2016). Positive transcriptional regulation by ZMYND8 has been shown to occur through association with the P-TEFb elongation complex (Ghosh et al. 2018). ZMYND8-NuRD repression in response to DNA damage was previously shown to rely on KDM5A demethylase activity (Gong et al. 2017), further suggesting that other factors may orchestrate this regulatory activity. ZMYND8 has previously been described as a super-enhancer factor that suppresses hyperactivation (Shen, Xu, et al. 2016). Corroborating our results, ZMYND8 was previously shown to associate with NuRD at super-enhancers (Spruijt et al. 2016). Indeed, we observed that super-enhancers that become hyperacetylated following ARID1A loss are normally associated with the highest levels of H3.3 and ZMYND8 binding. In our proposed model, ZMYND8 bromodomain interactions with acetylated H4 tails recruit or direct CHD4 and ARID1A for transcriptional repression at active chromatin. This occurs most notably at H3.3+ super-enhancers, where all three factors co-localize most frequently. Further work will seek to explain how ZMYND8 and H4(K16)ac cooperatively specify this repressive activity, as CHD4 and ARID1A can also function in transcriptional activation. It is also worth considering alternative hypotheses for how transcriptional homeostasis could be altered following disruption of these chromatin regulators and features.
In addition to promoter and enhancer chromatin regulation, SWI/SNF, NuRD, and ZMYND8 have each been shown to mediate transcriptional pausing and elongation by RNA polymerase II and associated machinery (Schwabish and Struhl 2007; Bottardi et al. 2014; Ghosh et al. 2018; Trizzino et al. 2018), as well as DNA repair (Park et al. 2006; Smeenk et al. 2010; Gong et al. 2015). Super-enhancers mark critical cell identity genes (Whyte et al. 2013), and recent evidence suggests that chromatin mechanisms coupling transcription and DNA repair operate at super-enhancers to control transcriptional hyperactivation (Hazan et al. 2019). Super-enhancer chromatin co-regulation by ARID1A, CHD4, and ZMYND8 to fine-tune transcriptional activation states may reflect such a mechanism at the intersection of transcriptional regulation and DNA repair.

8.8 Future directions

8.8.1 Elucidating SWI/SNF-NuRD interdependence on H3.3

While an attractive mechanism, the interplay between ARID1A-SWI/SNF and NuRD in transcription-associated regulation of histone variant H3.3 is not fully resolved by the cell-based studies presented in Chapter 7. Our data suggest that ARID1A is involved in maintaining H3.3 at active regulatory elements, such that ARID1A loss leads to H3.3 depletion and associated transcriptional consequences. SWI/SNF has not been reported to remodel H3.3 chromatin, suggesting that associated factors may be involved in this process. The H3.3-interacting CHD4 catalytic remodeler of NuRD was shown to interact biochemically and genomically with ARID1A, as a potential mechanism that may be further specified by other co-factors such as ZMYND8. We hypothesize that the combined functions of ARID1A-SWI/SNF and CHD4-NuRD are responsible for the observed H3.3 maintenance in active chromatin. In this context, ARID1A may serve to remodel or eject canonical H3.1 nucleosomes in favor of H3.3 incorporation by other factors.
While CHD4 is known to interact with H3.3 nucleosomes, it has not been considered or demonstrated to function as a canonical H3.3 chaperone or remodeler. As reviewed, much of the functional work on NuRD-H3.3 regulation has been genomic (Ha, Kraushaar, and Zhao 2014; Kraushaar et al. 2018). Hypothetically, CHD4-NuRD could function to stabilize H3.3-containing nucleosomes in the proposed mechanism. In this case, we hypothesize that CHD4 loss may also lead to H3.3 depletion at sites similar to those affected by SWI/SNF loss. This result would further support a cooperative mechanism between SWI/SNF and NuRD in H3.3 regulation. As discussed, some evidence suggests that SWI/SNF and NuRD recruitment may depend upon H3.3 (Gehre et al. 2020). We aim to test whether this aspect of H3.3 interdependence is also observed in our 12Z cell model through genomic localization studies following H3.3 knockdown. It remains possible that NuRD is not responsible for establishing H3.3 nucleosomes but rather supports their maintenance. In this scenario, another H3.3 remodeler or chaperone, such as HIRA or P400, may be responsible (Shi, Wen, and Shi 2017; Pradhan et al. 2016). In 12Z cells, we have also observed physical interactions between ARID1A-SWI/SNF and the P400 complex (Fig. 8.1). This supports a possible link between P400 and H3.3 regulation by SWI/SNF, leading to our proposed hypothetical model that may involve all three discussed SWI/SNF-like remodeler complexes (Fig. 8.2). However, P400 is historically established as a remodeler of H2A.Z variant histones (Giaimo et al. 2019), while its H3.3 activity has only recently been described (Pradhan et al. 2016). We have not yet investigated known transcription-associated H3.3 chaperones such as HIRA. Intriguingly, the SUPT16H subunit of the FACT complex was the single most significant factor associated with H3.3+ ARID1A binding in our ReMap 2020 in silico screen (refer to Fig. 7.5a).
FACT supports elongating RNA polymerase II in overcoming nucleosomal barriers and maintaining chromatin (Winkler and Luger 2011; Hsieh et al. 2013). FACT could be involved in fine-tuning transcriptional activation states at highly transcribed regions such as super-enhancers. Future experiments will be designed to identify potential co-regulatory activities of such factors.

Figure 8.1 ARID1A also physically interacts with variant histone remodeler P400
a, Normalized protein counts for observed subunits within the SWI/SNF and P400 complexes from anti-ARID1A (D2A8U) or IgG co-immunoprecipitation followed by mass spectrometry in human 12Z endometriotic epithelial cell nuclear extracts. BAF53A is a shared subunit found in both complexes. b, The P400 catalytic subunit, EP400, is detected in ARID1A co-immunoprecipitation samples by Western blot. c, Reciprocally, ARID1A and other SWI/SNF subunits (BRG1 and BAF155) are detected in EP400 co-immunoprecipitation samples by Western blot.

Figure 8.2 Hypothetical model of H3.3 chromatin regulation by SWI/SNF, NuRD and P400
Hypothesized mechanism of how ARID1A-containing SWI/SNF complexes maintain H3.3 in active chromatin in biochemical collaboration with other SWI/SNF-like complexes: H3.3-interacting CHD4-NuRD and the variant histone remodeler P400.

8.8.2 ARID1A-SWI/SNF regulation of steroid hormone signaling

As reviewed, ARID1A is reported to interact biochemically and functionally with steroid hormone receptors, likely through LXXLL motifs in its poorly characterized C-terminal domain. High ARID1A mutation rates in hormone-responsive tissues such as the endometrium and breast suggest a pathophysiological relationship. In Chapter 2, we noted reduced expression of estrogen receptor (ER) and progesterone receptor (PR/PGR) in hyperplastic ARID1A/PIK3CA mutant mouse endometrial epithelial cells (refer to Fig. A.2a) (Wilson et al. 2019).
Collectively, these data warrant further investigation into the roles of ARID1A and SWI/SNF in steroid hormone receptor genomic regulation. Human 12Z endometriotic epithelial cells constitutively expressing estrogen receptor (ESR1, ERα) were a gift from Dr. Niraj Joshi (Dr. Asgi Fazleabas). To engineer these cells, 12Z cells were transfected with a Precision LentiORF (Horizon) plasmid containing the human ESR1 open reading frame (ORF) and selected with blasticidin. These 12Z ESR1+ cells were then treated with estrogen (estradiol, E2) or vehicle for 24 hours, with or without ARID1A knockdown by siRNA (siARID1A), followed by RNA-seq analysis (Fig. 8.3a). The progesterone receptor (PR) encoding gene, PGR, is a classical estrogen-responsive gene (Kastner et al. 1990), and PR protein expression was highly induced following E2 treatment (Fig. 8.3b). However, PR induction was ARID1A-dependent, with substantial reduction in the siARID1A + E2 condition (Fig. 8.3b). Despite constitutive expression at the transcript level, ERα protein expression was also markedly reduced in ARID1A knockdown conditions (Fig. 8.3b). Because we showed in the present work that ARID1A regulates various unfolded protein response and endoplasmic reticulum stress-related molecular programs (e.g., Fig. 4.6i and A.2b-c), it is possible that ARID1A loss affects ERα protein stability or degradation post-translationally.

Figure 8.3 12Z ESR1+ cells as a model for estrogen-mediated gene expression analysis
a, Schematic of experimental design in 12Z cells engineered to constitutively express ESR1. b, Western blot analysis of ARID1A, PR, ERα, and β-actin (loading control) in ESR1+ cells. c, Unsupervised clustering of 2032 genes (rows) selected by variance across the experimental design. Gene expression values are Z-scored rlog counts. Red values indicate high relative gene expression, and blue values indicate low expression.
d, Examples of gene expression alterations following estrogen treatment and ARID1A loss. The y-axis represents linear relative gene expression with respect to the control condition.

Transcriptomic analysis by RNA-seq followed by unsupervised clustering revealed that ARID1A loss had a more dominant effect on gene expression than E2 (Fig. 8.3c). However, numerous patterns of estrogen-regulated gene expression were observed, including genes that depend upon ARID1A for estrogen-mediated induction (PGR), genes whose induction is not augmented by ARID1A loss (C3), and genes whose induction is augmented by ARID1A loss, i.e., where ARID1A normally antagonizes induction (CCNA1) (Fig. 8.3c-d). We further explored estrogen-regulated gene expression that is also regulated by ARID1A. Of the 1699 differentially expressed genes observed with estrogen treatment (DESeq2, FDR < 0.0001), 939 were also affected by ARID1A loss, a highly significant overlap (p < 10^-282, hypergeometric enrichment test) (Fig. 8.4a). These 939 overlapping genes were then segregated by direction according to the E2-ARID1A regulatory relationship, as measured by E2/siARID1A DGE. The majority of overlapping genes (64.4%) were cooperatively regulated by E2 and ARID1A (i.e., ARID1A loss and E2 treatment perturbed expression in opposite directions), with the remaining 35.6% classified as exhibiting antagonistic regulatory activity (Fig. 8.4b). Gene set enrichment analysis was then performed for MSigDB Hallmark pathways and GO Biological Processes. Strikingly, Hallmark estrogen target genes were most enriched among genes cooperatively activated by E2 and ARID1A, while EMT, cell adhesion, and wound healing responses were often cooperatively downregulated in response to estrogen (Fig. 8.4c). These results suggest ARID1A mostly facilitates estrogen-mediated gene expression, both among genes induced and repressed by estrogen signaling.
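The overlap test and directional split described above can be sketched in Python. This is a minimal illustration, not the analysis code used here: it assumes each differential expression result has already been filtered at its FDR threshold and reduced to a dict of gene to log2 fold change, and the gene universe size and siARID1A set size are hypothetical inputs because they are not stated in this section. Function names and the toy values are placeholders.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, universe: int, set_a: int, set_b: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    X counts how many of `set_b` genes drawn without replacement from `universe`
    genes fall within a fixed set of `set_a` genes (e.g. E2-responsive genes).
    Exact integer arithmetic avoids underflow in the individual terms.
    """
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    # Python's int / int true division is correctly rounded even for huge ints
    return tail / comb(universe, set_b)


def classify_e2_arid1a(e2_lfc: dict, si_lfc: dict) -> dict:
    """Split genes significant in both comparisons by direction of change.

    cooperative:  E2 and siARID1A shift expression in opposite directions,
                  i.e. ARID1A normally supports the estrogen response.
    antagonistic: both shift expression in the same direction,
                  i.e. ARID1A normally opposes the estrogen response.
    """
    classes = {"cooperative": [], "antagonistic": []}
    for gene in sorted(set(e2_lfc) & set(si_lfc)):
        same_direction = (e2_lfc[gene] > 0) == (si_lfc[gene] > 0)
        classes["antagonistic" if same_direction else "cooperative"].append(gene)
    return classes


if __name__ == "__main__":
    # Hypothetical toy inputs; gene names and values are placeholders, not data
    e2 = {"geneA": 2.0, "geneB": 1.5, "geneC": -1.0, "geneD": 0.9}
    si = {"geneA": -1.1, "geneB": 0.7, "geneC": 0.8}
    print(classify_e2_arid1a(e2, si))
    print(hypergeom_upper_tail(3, universe=20, set_a=4, set_b=3))
```

Summing exact binomial products keeps very small tail probabilities, such as those in the 10^-282 range, accurate without a statistics library; in practice a library survival function would give the same upper tail.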
Known pathological invasive processes that we have described in ARID1A mutant endometrial models, such as EMT, migration, adhesion, and wound healing, appear to be normally suppressed by estrogen signaling with support from ARID1A. We hypothesize that such genes might be involved in estrogen-stimulated re-epithelialization following menses.

Figure 8.4 Exploring ARID1A-mediated estrogen response
a, Overlap of 1699 empirical estrogen response genes and siARID1A DGE in 12Z ESR1+ cells. b, Directional classification of perturbed genes mutually regulated by E2 and ARID1A. c, Gene set enrichment analysis for MSigDB Hallmark pathways (top) and GO Biological Processes (bottom) among E2-ARID1A regulated classes. d, Rank visualization of top estrogen-regulated genes in 12Z ESR1+ cells. e, Visualization of H3K4me3 and H3K27me3 chromatin features at the bivalent PGR locus in 12Z cells.

This dissertation has focused heavily on gene expression programs related to ARID1A mutant pathology, such as mesenchymal reprogramming and invasion, but these data also implicate wild-type ARID1A in activating estrogen response genes. Of the 1699 genes empirically defined as estrogen-mediated in 12Z ESR1+ cells, PGR is one of the most highly induced estrogen target genes (Fig. 8.4d). Intriguingly, analysis of our profiled chromatin features in wild-type 12Z cells revealed that PGR is a bivalent gene marked by H3K4me3 and polycomb-mediated H3K27me3 (Fig. 8.4e). As reviewed, bivalent chromatin often marks poised genes primed for activation, and SWI/SNF has been previously implicated in antagonizing polycomb-mediated H3K27me3 silencing (Ho et al. 2011). Estrogen receptor-mediated recruitment of ARID1A-SWI/SNF to PGR might antagonize polycomb silencing, leading to PR expression. We further identified 43 Hallmark estrogen response genes that display similar ARID1A-dependent estrogen induction (Fig. 8.5), supporting a physiological role for ARID1A in activating estrogen response genes.
Note that these studies suggest ARID1A might affect estrogen receptor stability or degradation, which could confound interpretation of genomic estrogen signaling alterations. Future efforts should leverage in vivo cell populations with intact estrogen signaling and may consider exploring direct estrogen receptor regulation through genome-wide binding analyses and dynamics.

Figure 8.5 ARID1A-dependent Hallmark estrogen response genes
43 genes within the MSigDB Hallmark early or late estrogen response pathways that are significantly induced by estrogen (E2) treatment, but to a significantly lesser extent in the context of ARID1A knockdown (siARID1A + E2), in 12Z ESR1+ cells. Left, clustered heatmap of expression log2FC values in siARID1A, E2, or siARID1A + E2 conditions compared to control cells. siARID1A Δ represents the change in expression between siARID1A + E2 vs. siARID1A conditions. Blue cells indicate downregulation, and red cells indicate upregulation. Black cells in the right columns indicate membership in the Hallmark estrogen response early and late pathway gene sets.
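The siARID1A Δ used in Figure 8.5 can be sketched as a difference of log2 fold changes. Because every condition is compared with the same control, subtracting the siARID1A log2FC from the siARID1A + E2 log2FC gives log2(siARID1A + E2 / siARID1A), i.e. the E2 response measured within knockdown cells. The selection function below is a hypothetical illustration only: the genes in Figure 8.5 were selected by significance testing, whereas this sketch substitutes fixed, assumed log2FC cutoffs (`min_e2_lfc`, `min_gap`) to show the logic.

```python
from dataclasses import dataclass


@dataclass
class GeneLFC:
    """log2 fold changes vs. control for one gene (hypothetical input layout)."""
    si: float     # siARID1A vs. control
    e2: float     # E2 vs. control
    si_e2: float  # siARID1A + E2 vs. control


def sirna_delta(g: GeneLFC) -> float:
    """E2 response within ARID1A-knockdown cells, on the log2 scale.

    The shared control cancels: log2(siE2/ctrl) - log2(si/ctrl) = log2(siE2/si).
    """
    return g.si_e2 - g.si


def arid1a_dependent_induction(genes: dict, min_e2_lfc: float = 1.0,
                               min_gap: float = 0.5) -> list:
    """Genes induced by E2 whose induction is blunted after ARID1A knockdown.

    Cutoffs are illustrative assumptions, not the thresholds used in the study.
    """
    hits = []
    for name, g in genes.items():
        induced = g.e2 >= min_e2_lfc
        blunted = (g.e2 - sirna_delta(g)) >= min_gap
        if induced and blunted:
            hits.append(name)
    return sorted(hits)
```

For example, a placeholder gene with log2FC values of -0.2 (siARID1A), 3.0 (E2), and 0.8 (siARID1A + E2) has a Δ of 1.0, well below its E2-alone induction, and would be flagged as ARID1A-dependent.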
APPENDICES

APPENDIX A Supplementary material for Chapter 2

Table A.1 Genes exhibiting direct activation by ARID1A in 12Z cells
AAK1 ABL2 ABLIM3 ACSL4 ADAM33 ADAMTS1 ADAMTS12 ADAMTS3 ADAMTS5 ADAMTS6 ADD3 ADIRF ADM AJUBA AKR1B1 ALCAM ALDH1A2 ALDH3A2 AMOTL2 AMPH ANAPC13 ANKRD13A ANKRD13C ANKRD33B ANXA2 ANXA7 AP1S3 AP5S1 APOLD1 ARAP2 ARHGAP18 ARHGAP5 ARHGEF28 ARID1A ARID5B ARL4D ARMH4 ARMT1 ARRB2 ASAP1 ASB1 ASPH ATG5 AXL BCAT1 BEX1 BICC1 BLVRB BNC1 BNC2 BOK C6orf141 CAMK2D CBX1 CCDC50 CCDC9B CCND1 CCNI CCNYL1 CD63 CDC42 CDCA7L CDH4 CDKN1B CELF2 CFL2 CHML CHMP2B CHST3 CLCN4 CLMP CLOCK CLTB CNKSR3 CNN3 COL12A1 COL4A5 COL4A6 COL5A3 COL8A1 COPZ1 CORO1C CPA4 CPED1 CRABP2 CRY1 CSPG4 CSRP2 CXADR CYLD CYTH3 DAPK1 DCBLD2 DEK DIAPH3 DIO2 DKK1 DNAJC22 DNAJC6 DOCK5 DOK5 DPY19L1 DPYSL2 DSC3 DTWD2 DUSP4 ECI2 EEF1A2 EFNA5 EIF2AK4 EIF4G2 ELF4 EMC6 ENAH EPB41L2 ERBIN ERCC1 ETFA EXOC6B EXT1 EZR F2RL1 F3 FADS3 FBXO9 FBXW11 FER FGF2 FGF5 FGF7 FHL2 FIG4 FIGN FLNC FN1 FOXK1 FRMD5 FSTL1 FTL FUNDC2 FYN G3BP2 GAB2 GABRE GEM GFM2 GLS GNG12 GNPDA1 GOLGA4 GPC6 GPN1 GPR176 GRB10 GSTM4 GULP1 HAS2 HBEGF HMGA1 HMGN4 HMMR HNRNPA0 HOXA10 HOXA11 HOXA3 HOXA5 HOXA6 HOXA9 HOXB3 HOXB9 HSPB8 HTR7 ICE2 IER5 IFT20 IGF2BP2 IL11 IL7R IPPK ITGA7 ITGB8 ITPKC ITPR3 JUP KANK2 KCTD3 KIF1B KIF5B KITLG KLF5 KLF7 KRT19 KRT7 LAMA5 LARP6 LBH LMBRD2 LMCD1 LOXL1 LRRC7 LRRC8C LRRC8D LRRFIP2 LRRN4CL LUZP1 LYPD1 MAGI2-AS3 MAGI3 MALSU1 MAP2K3 MAP7D1 MAPRE2 MARCKS MAVS MB21D2 MBD3 MBOAT2 MEDAG MEIS2 MELTF MEST MFSD1 MGST1 MGST3 MICALL1 MIDN MINDY2 MMP3 MT2A MTMR1 MTMR4 MYBL2 MYH10 MYO1B MYPN MYRF NAA40 NABP1 NACAD NAP1L1 NAT14 NBR1 NCKAP1 NCKAP5L NDFIP2 NEAT1 NEGR1 NEIL3 NEK7 NEXN NINJ1 NLRP1 NOC3L NPC2 NPPB NPTN NPTXR NR1D2 NR2F1 NR2F2 NRIP1 NTN4 NTNG1 NUAK1 OBSL1 OGFRL1 OPA1 OSBPL8 OXTR PAK2 PAPSS2 PARD6B PAWR PCGF5 PDE4A PDGFC PDPN PERP PHLDB2 PIP5K1A PITPNM3 PKP4 PLCB4 PLCXD3 PLEKHA3 PLS1 PM20D2 PMP22 PNMA2 PNPLA2 PNRC2 PPA1 PPFIBP1 PPM1K PPP1R15A PPP2CA PRKAA1 PRR11 PSG6 PSMD2 PTBP3 PTPN11
QSOX1 RAB13 RAB30 RAB32 RAB34 RAB38 RAB8B RAD18 RANGAP1 RAPGEF2 RARG RASGRF2 RASGRP3 RASSF8 RBPMS REPS1 RGMB RHOBTB3 RIPK1 RIPK2 RIPOR3 RNF144B RPL36AL RTKN2 RUNX1T1 RUNX2 S100A16 S100A2 S100A6 SAMD12 SAMD9 SBF2-AS1 SCFD1 SCN8A SCN9A SCOC SDCBP SEC22B SEMA3B SERAC1 SERPINA9 SERTAD2 SGMS2 SH3BP4 SH3RF1 SHC3 SKA2 SKP1 SLAIN2 SLC16A12 SLC16A5 SLC20A2 SLC25A3 SLC2A4RG SLC39A13 SLC39A14 SLC41A2 SLC7A2 SLC9A1 SLFN5 SMUG1 SNHG1 SNHG3 SNX11 SNX6 SOCS5 SPRED1 SPTLC3 SRPX2 SSH2 SSR3 STAMBP STAT5B STK10 STX11 STX1B STXBP6 SUPT4H1 SYNC SYTL2 TBC1D23 TBC1D9 TBL1X TBX3 TEAD1 TENT4A TES TFAP2A TGFB2 TINAGL1 TLN1 TMEM154 TMEM158 TMEM171 TMEM200A TMEM65 TMTC1 TNFRSF19 TOB1 TPMT TPX2 TRAM1 TRAM2 TRIB3 TRIM25 TRIM58 TRIM59 TRIP6 TRNP1 TSC22D2 TSPYL4 TTLL7 TUBA4A TXNRD1 UBE4B UPK1B USP12 USP53 VEGFC VEZF1 VGLL3 VIT VOPP1 WASF2 WDFY1 WDR44 WNT5A WWTR1 YWHAZ ZBED3 ZBED5 ZDHHC17 ZFHX4 ZFYVE9 ZNF22 ZNF516 ZNF598 ZNF699 ZNF804A ZRANB1 ZYX

443 genes exhibiting direct, functional activating regulation by ARID1A in human 12Z endometriotic epithelial cells. All genes are marked by detected ARID1A binding (ARID1A ChIP-seq, n = 2, MACS2 broad FDR < 0.05) at promoter chromatin (within 3 kb of TSS) and significant downregulation of expression following acute ARID1A knockdown by siRNA (siARID1A RNA-seq, n = 3, DESeq2 FDR < 0.0001).
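The inclusion criteria for Tables A.1 and A.2 amount to intersecting promoter binding with knockdown differential expression, which can be sketched as below. This is a hypothetical illustration, not the pipeline used for the tables: it assumes peaks are BED-style half-open intervals, each gene has a single annotated TSS, and DESeq2 results are already reduced to a log2 fold change and adjusted p-value per gene. The 3 kb window and FDR < 0.0001 cutoff follow the table captions. Genes downregulated after knockdown are classed as directly activated (Table A.1) and upregulated genes as directly repressed (Table A.2).

```python
from typing import Dict, List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open (BED convention)
    end: int


class DEResult(NamedTuple):
    log2fc: float  # siARID1A vs. control
    padj: float    # FDR-adjusted p-value


def distance_to_peak(pos: int, peak: Peak) -> int:
    """Base-pair distance from a TSS position to a half-open interval (0 if inside)."""
    if pos < peak.start:
        return peak.start - pos
    if pos >= peak.end:
        return pos - peak.end + 1
    return 0


def promoter_bound(tss: Dict[str, Tuple[str, int]], peaks: List[Peak],
                   window: int = 3000) -> set:
    """Genes with any peak within `window` bp of their TSS (simple genes x peaks scan)."""
    bound = set()
    for gene, (chrom, pos) in tss.items():
        if any(p.chrom == chrom and distance_to_peak(pos, p) <= window for p in peaks):
            bound.add(gene)
    return bound


def direct_targets(tss: Dict[str, Tuple[str, int]], peaks: List[Peak],
                   de: Dict[str, DEResult], fdr: float = 1e-4) -> Dict[str, List[str]]:
    """Split bound, significantly changed genes into activated and repressed targets."""
    out = {"activated": [], "repressed": []}
    for gene in sorted(promoter_bound(tss, peaks)):
        res = de.get(gene)
        if res is None or res.padj >= fdr:
            continue
        # Down after knockdown -> ARID1A normally activates; up -> normally represses
        out["activated" if res.log2fc < 0 else "repressed"].append(gene)
    return out
```

The nested scan is kept for readability; genome-scale peak sets would normally be intersected with a sorted or indexed interval method instead.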
Table A.2 Genes exhibiting direct repression by ARID1A in 12Z cells
ABCC4 ACOT13 ADAM17 ADGRG6 ADGRL2 AEN AFAP1 AHCY AHNAK2 AKAP13 ALDH3B1 ALDH7A1 ALDH9A1 ANGPTL4 ANKMY1 ANKRD28 ANP32B ANTXR1 ANTXR2 ANXA1 ANXA6 APOBEC3B APOL6 ARHGDIB ARID3A ARL6IP5 ARRDC3 ATF5 ATP13A2 ATP8B1 ATP8B3 ATXN1 ATXN2 AVL9 B4GALT1 B4GALT5 BACH1 BCL3 BDKRB1 BHLHE40 BHLHE41 BIRC2 BIRC3 BRWD1 BTBD19 BTBD3 C1orf198 C1QTNF6 CALD1 CAPRIN2 CASP8 CAVIN2 CBX5 CCDC117 CCDC80 CCL2 CCL7 CD59 CD82 CDC42EP4 CDCP1 CDK6 CDV3 CEMIP CFLAR CHD2 CHD3 CITED2 CLDN1 CLIC4 CMBL CNN2 CNTNAP1 COL1A2 COL5A2 COTL1 CPPED1 CREB3L2 CSNK1E CUEDC1 CXCL2 CXCL8 CYBRD1 DAW1 DDAH1 DDB2 DDIT4 DDR2 DEPP1 DHRS3 DNMBP DOCK1 DPYSL3 DRAM1 DSTN DUSP5 E2F3 EDN1 EHBP1L1 ELL EMP1 EMP3 ENC1 ENO1 ENTPD7 EPHA2 ERCC6 ERF ERRFI1 ETS2 ETV6 EVA1C FAF2 FAM160B1 FAM216A FAM3C FAP FAT1 FBLN2 FBXL14 FBXL5 FBXO32 FCHO2 FCHSD2 FNDC3A FNDC3B FOSL1 FOSL2 FRS2 GADD45A GAPDH GFPT1 GJA1 GPC1 GPR39 GPX8 GRAMD2B GTDC1 GXYLT1 HACD2 HEG1 HIPK3 HMGB2 HRH1 HSPA5 HSPB1 HSPG2 IBTK ICAM1 ID1 ID2 IDE IFIT2 IFIT3 IGFBP6 IKBKE IL1R1 IL1RAP IL6 IL6ST IRF1 ISCA1 ISG20 ITGA2 ITGA5 ITGAV ITGB1 ITGB4 ITPRIP JARID2 JUNB KBTBD7 KCNK2 KDM2B KDM3A KEAP1 KIAA0895 KLF10 KLF4 KLHL5 KMT2A KRT18 KRT8 LAMB1 LAMC1 LAT2 LATS1 LATS2 LDHA LEPROT LIFR LMNB1 LMO4 LOX LOXL2 LPAR1 LPP LRFN4 LRIG3 LRP11 LRRC58 LTBR LYPD3 MAP3K7CL MAP4K3 MAP4K4 MAPRE1 MBNL2 MED28 MET MFAP2 MFGE8 MKNK2 MTF2 MTHFD1L MXRA8 MYL9 MYO10 MYOF NAB2 NAMPT NAV1 NCEH1 NEMP1 NEMP2 NFKBIA NMI NPC1 NT5DC3 NT5E NUCB2 OSMR OTUD1 P3H2 P4HA1 P4HA2 PAFAH1B2 PANX1 PAPPA PARP10 PBX3 PDCD1LG2 PDE7B PDK1 PEAR1 PFKFB4 PHF10 PHGDH PHLDA1 PIK3IP1 PIKFYVE PIR PKP3 PLAT PLAU PLAUR PLCE1 PLD1 PLEKHA8 PLEKHB2 PLEKHN1 PLIN3 PLPP5 PLXNB2 PMEPA1 POLE3 PPP1R3C PRDM11 PRDM8 PRNP PRUNE2 PSAT1 PSD3 PTPN3 PTPRG PTPRJ PTTG1IP RAB31 RAB3D RAB7B RALB RAP1A RASA2 RBM43 RBM7 RELA RGS4 RHBDD3 RIMKLB RNF145 RNF20 ROR1 RPN2 RPRD1A RRBP1 RSF1 S100A10 S100A11 SBNO2 SCG2 SEC14L1 SEMA3C SERINC3 SERPINB8 SERPINH1 SFXN3 SGK1 SGMS1
SH2D4A SH3GLB1 SH3KBP1 SH3RF2 SH3RF3 SH3TC2 SIPA1L1 SIX2 SKIL SLC16A7 SLC20A1 SLC22A4 SLC25A37 SLC25A45 SLC26A11 SLC2A1 SLC2A3 SLC30A7 SLC38A1 SLC38A2 SLC39A10 SLC39A8 SLC7A5 SLTM SMG9 SMIM10 SMIM13 SOCS3 SOS2 SOWAHC SOX12 SP100 SPARC SPATS2 SPATS2L SPTAN1 SSR1 STAMBPL1 STAT3 STC1 STK40 STOM SUN1 SYNJ2 SYNPO2L TAGLN TBC1D2 TBK1 TFPI2 TGFB1 TGFBR1 TGIF1 TGIF2 TGM2 THBS1 THBS2 THSD4 TIMP3 TIPARP TLR6 TM4SF1 TMED5 TMEM132A TMEM245 TMEM263 TMEM268 TMSB4X TNFAIP2 TNFRSF10B TNFRSF1A TNKS1BP1 TNPO1 TOR4A TP53 TRAF1 TRIB2 TRIM47 TRIM6 TRIM65 TSPAN4 TUBB6 TWIST1 TXN UACA UBE2L6 UBN2 VCAM1 VCL VDAC1 VKORC1L1 VNN1 VXN WEE1 WNT5B WSB1 XAF1 XBP1 XPOT ZBTB1 ZCCHC3 ZFP36L1 ZMYND8 ZNF217 ZNF367 ZNF488 ZNF609 ZNF674 ZNF697 ZNF850

417 genes exhibiting direct, functional repressive regulation by ARID1A in human 12Z endometriotic epithelial cells. All genes are marked by detected ARID1A binding (ARID1A ChIP-seq, n = 2, MACS2 broad FDR < 0.05) at promoter chromatin (within 3 kb of TSS) and significant upregulation of expression following acute ARID1A knockdown by siRNA (siARID1A RNA-seq, n = 3, DESeq2 FDR < 0.0001).

Figure A.1 Phenotypic analysis of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G and lung metastasis
a, Representative gross images of mice at time of sacrifice due to vaginal bleeding. White arrows indicate tumors. Uterine tumor size varies dramatically within genotype at time of sacrifice. b, H&E staining and IHC for ARID1A, P-S6 and KRT8 (n = 2) of the endometrium at 5X (scale bar = 200 µm) and 20X (scale bar = 50 µm) magnification; the 20X images (right) show the region outlined by the black box. ARID1A expression is retained in the endometrial epithelium of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G.
P-S6 is shown as a marker of AKT pathway activation, and KRT8 as a marker of endometrial epithelium. Arrows indicate endometrial epithelium. c, Gross image, histology and IHC of LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G lung metastasis observed in one mouse. White arrow indicates lung metastasis. Black arrows indicate mutant epithelium (scale bar = 400 µm).

Figure A.2 LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium shows downregulation of steroid hormone receptors and unfolded protein response proteins
a, IF and IHC images detailing changes in hormone receptor expression in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 2). KRT8 identifies endometrial epithelium. Confocal IF images represent maximum intensity projections. Arrows indicate endometrial epithelium (scale bar = 200 µm). b, IF images detailing changes in unfolded protein response protein GRP94 in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium, counterstained with KRT8 and DAPI (n = 2). Arrows indicate endometrial epithelium (scale bar = 50 µm). c, IHC images detailing changes in unfolded protein response protein GRP78 in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium (n = 2). Arrows indicate endometrial epithelium (scale bar = 200 µm).

Figure A.3 Allelic series of ARID1A loss with (Gt)R26Pik3ca*H1047R in endometrial epithelium displays concordant gene expression and differential chromatin accessibility
a, Flow-cytometry analysis of EPCAM purity (based on PE labeling) before and after sorting. b, Purity of EPCAM-isolated cell populations by genotype (n = 3 for control, n = 4 for other genotypes). c, Volcano plot of the RNA-seq differential gene expression dataset from EPCAM-isolated cells. Purple dots represent genes with significance FDR < 10^-5 and represent the 517-gene signature used for further analysis (genes with |log2FC| < 1 were subsequently filtered). d, Clustering of 517 differentially expressed signature genes.
EMT genes from the Hallmark pathway and the Mak & Tong pan-cancer gene signature are identified. e, Principal component analysis of gene expression data from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl, and control EPCAM-sorted endometrial epithelial cells. f, Proportional Euler diagram displaying differentially expressed genes from LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+. Significant overlap was observed between differentially expressed gene comparisons (FDR < 0.05) for individual experimental genotypes vs. control. Only significant genes with concordant directionality of change between comparisons were considered intersecting. g, Hierarchical clustering of expression log2FC (vs. control) values for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl signature genes (d) within each genotypic comparison. h-i, Peak width distribution of differentially accessible peaks (FDR < 0.20) for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (h) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (i). j-k, Magnitude distribution of differentially accessible peaks separated into total peaks (gray) and promoter peaks (red) for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (j) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (k). l-m, Histogram of all differential ATAC peaks for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (l) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (m) depicting distance to nearest TSS. Percentages of peaks found within +/- 10, 30, or 100 kb of the TSS are shown. n-o, Histogram of differential ATAC promoter peaks for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+ (n) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G (o) depicting distance to nearest TSS.
p, Significant enrichment for promoters among differentially accessible peaks in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G, and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl. Enrichment ratio is calculated as the proportion of promoter bp in the ATAC peak set relative to that in the background genome.

Figure A.4 ARID1A loss induces EMT in mouse endometrium
a-h, Images of maximum intensity confocal projections of control, LtfCre0/+; Arid1afl/fl and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl endometrium sections stained with ZO-1 (a), ICAM-1 (b-c), CLDN10 (d), Cleaved CASP3 (e), VIM (f), SNAI2 (g), CDH1 (h) (n = 2). To label endometrial epithelium, the slides were counterstained with KRT8. White arrows indicate endometrial epithelium (scale bar = 50 µm).

Figure A.5 Characterization of 12Z siARID1A ATAC-seq and ARID1A ChIP antibody
a, Peak width distribution of significant differentially accessible peaks for 12Z siARID1A vs. control (FDR < 0.05). b, Magnitude distribution of differentially accessible peaks separated into genome-wide peaks (gray) and promoter peaks (red) for 12Z siARID1A. c, Histogram of all differential ATAC peaks for 12Z siARID1A depicting distance to nearest TSS. Percentages of peaks found within +/- 10, 30, or 100 kb of the TSS are shown. d, Histogram of differential ATAC promoter peaks for 12Z siARID1A depicting distance to nearest TSS, in kilobases. e, Immunoprecipitation of ARID1A and SWI/SNF subunits using anti-ARID1A (D2A8U) (12354, Cell Signaling), anti-ARID1A (PSG3) (sc-32761, Santa Cruz) or anti-ARID1B (E9J4T) (92964, Cell Signaling) antibodies. f, Mass spectrometry analysis of SWI/SNF subunits found in the anti-ARID1A (D2A8U) immunoprecipitation sample. g, HOMER de novo motif enrichment of differentially accessible peaks found in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/+, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/V1068G, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl, and 12Z siARID1A samples.
h, Proportional Euler diagram of overlap between ARID1A promoter binding and genes with decreasing and increasing expression following ARID1A knockdown (siARID1A).

Figure A.6 Differential gene expression as a result of PIK3CAH1047R overexpression
a, Principal component analysis of gene expression data from 12Z control, siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R samples. b, Proportional Euler diagram displaying differentially expressed genes from siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R. c, Hierarchical clustering of relative gene expression within each overlapping group (b) segregated by differential expression in siARID1A, PIK3CAH1047R, and siARID1A/PIK3CAH1047R.

APPENDIX B Supplementary material for Chapter 3

Supplementary Methods

ATAC-seq analysis. This section provides a detailed explanation of the Figure 3.4 workflow graphic. Throughout the workflow, steps should be performed on all replicates and conditions in parallel. A maintained and updated, machine-readable text version of this workflow, with further comments and custom Unix scripts for certain workflow functions, is available in this appendix and at the following GitHub repository: https://github.com/reskejak/ATAC-seq. Note that this workflow was originally written for mouse analysis, and certain steps should be changed as necessary for application to other organisms. In brief, the workflow begins with concatenation of any library read technical replicates, if applicable. Reads should then be trimmed and analyzed for quality control measures prior to genome alignment via Bowtie2 (Langmead and Salzberg 2012). To avoid issues with downstream filtering commands or manipulation, it is good practice to coordinate-sort and index BAM intermediates after each step via samtools (Li et al. 2009).
Reads mapping to the mitochondrial genome are then removed from the aligned BAM, for example with the removeChrom python script developed by Harvard Informatics (https://github.com/harvardinformatics/ATAC-seq). Mitochondrial read contamination is a well-documented issue with ATAC libraries because mtDNA lacks histones, and improved protocols have gone as far as including additional detergents to reduce contamination (Corces et al. 2017). A further filtering step then retains only properly-paired reads for downstream use.

At this point, we suggest estimating the complexity of every sample in the compared conditions and then randomly subsampling libraries so that all samples are standardized to equivalent molecular complexity. The R package preseqR (Daley and Smith 2013) and its wrapper ATACseqQC (Ou et al. 2018) implement functions that estimate complexity by calculating a duplicate frequency matrix and then estimating the number of unique molecules sequenced in each library. samtools view can then be used to subsample libraries based on these estimates. After standardizing library complexity across the experimental design, remove PCR duplicates, e.g. with Picard MarkDuplicates (http://broadinstitute.github.io/picard/).

The next steps convert the BAMs into paired-end BED format (BEDPE) for downstream peak calling. First, sort the duplicate-removed BAMs by read name and fix the associated read mate information. Conversion to BEDPE is then achieved via bedtools (Quinlan and Hall 2010). In this format, coordinates can be shifted to account for the 9 bp duplication created at each Tn5 transposase adapter insertion site, which in practice means shifting Watson (plus) strand coordinates by +4 bp and Crick (minus) strand coordinates by -5 bp. The Tn5 shift should only slightly affect peak calling but is likely more important for high-resolution analyses such as motif footprinting. This step is largely a historical convention that follows the original ATAC-seq report (Buenrostro et al. 2013), and a bash script (bedpeTn5shift.sh) that performs this adjustment via awk is included in this appendix. Finally, MACS2 (Zhang et al. 2008) requires a minimal 4-column BEDPE format that is collapsed from the standard 10-column bedtools format; a bash script (bedpeMinimalConvert.sh) that performs this conversion is also included in this appendix.

Significant broad peaks can then be called from the minimal BEDPE files of individual replicates via MACS2 (Zhang et al. 2008) without a supplied input sample. Peaks should then be filtered with bedtools to remove low-mappability, highly repetitive "blacklisted" genomic regions, which were previously identified in a comprehensive analysis of ENCODE data (Amemiya, Kundaje, and Boyle 2019). Removing peaks mapped to unplaced chromosome contigs is also suggested. At this stage in the workflow, one could proceed directly to our suggested differential accessibility (DA) analysis method with the individual replicate peak sets. However, it is often desirable to identify regions that consistently display ATAC peaks in all replicates of a given condition. For this purpose, we implemented the ENCODE-defined naïve overlap to determine biological replicate peak concordance. This method calls peaks on pooled replicates and then retains the pooled peaks that overlap a peak in every individual replicate, where each overlap must cover at least 50% of either of the two overlapping peaks. A bash script (naiveOverlapBroad.sh) is supplied in this appendix to compute the naïve overlap from two broadPeak replicates, and it may be easily modified to support more replicates.

Differential accessibility analysis. Workflows for all DA tools implemented in Chapter 3 are described in detail in this section.
The csaw (Lun and Smyth 2016) portion of this section describes the Figure 3.6 workflow, for which a machine-readable R script is available in this appendix and maintained at the following GitHub repository: https://github.com/reskejak/ATAC-seq. The BAM files supplied to DA tools are the coordinate-sorted and indexed, duplicate-removed, complexity-normalized, properly-paired, non-mitochondrial paired-end BAM files generated as described in the previous ATAC-seq analysis section and the Figure 3.4 workflow graphic.

For DiffBind (Stark and Brown 2011), an experimental design sample table can be generated in R or a text editor in the format described in the manual. This includes the columns “ID”, “Factor”, “Condition”, “Replicate”, “bamReads”, “Peaks”, and “PeakCaller”; the field “bamControl” is omitted for ATAC-seq analysis. The experiment DBA object is constructed through dba() using this “sampleSheet” table, and DESeq2 (Love, Huber, and Anders 2014) analysis is specified via `AnalysisMethod=DBA_DESEQ2` with option `minOverlap=2`. The average fragment size parameter was supplied as a list covering all replicates, using the values reported in each MACS2 .xls output file. dba.count() is then used on the experiment DBA object to count reads in peaks, followed by dba.contrast() with parameters `minMembers=2` and `categories=DBA_FACTOR`. The DBA object then holds a consensus peak matrix for further DA interrogation. DA is calculated via dba.analyze(), where the two normalization methods reported correspond to the two settings of the Boolean parameter `bFullLibrarySize`. With `bFullLibrarySize=TRUE`, counts are scaled by each sample's total library read depth, whereas `bFullLibrarySize=FALSE` uses only the reads located within the consensus peak matrix for scalar normalization.
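The practical difference between the two settings is easiest to see when two libraries have the same depth but different fractions of reads in peaks. The awk sketch below is a simplified illustration, not DiffBind's internal computation: it scales each library by its total reads or by its reads in consensus peaks, in each case divided by the mean across samples. The three-column input layout and the function name `scale_factors` are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of scalar normalization factors.
# Input (tab-separated): sample  total_reads  reads_in_peaks
# Output (tab-separated): sample  full_library_factor  reads_in_peaks_factor
# Each factor is the sample's count divided by the mean count across samples.
scale_factors() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             name[NR] = $1; total[NR] = $2; inpeak[NR] = $3
             sum_total += $2; sum_inpeak += $3
         }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%s\t%.3f\t%.3f\n", name[i],
                        total[i] / (sum_total / NR),
                        inpeak[i] / (sum_inpeak / NR)
         }' "$@"
}

# usage: two libraries of equal depth, B with half the reads in peaks
# printf 'A\t20000000\t6000000\nB\t20000000\t3000000\n' | scale_factors
```

In this toy case both libraries receive the same full-library factor, so B's lower peak counts remain visible as an apparent loss of accessibility, whereas the reads-in-peaks factors differ two-fold and largely absorb that global difference. Which behavior is appropriate depends on whether global shifts in accessibility are expected to be biological.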
dba.report() then outputs the DA results for significant regions only by default (FDR < 0.05), or the parameter `th=1` can be used to report all regions tested within the consensus peak matrix with their associated statistics.

For csaw (Lun and Smyth 2016), DA can be computed either by supplying a pre-defined peak set, such as from MACS2 (Zhang et al. 2008), or by calling enriched regions de novo through the implemented sliding window method. Both approaches are outlined here. When starting with a pre-defined peak set, first import the peak set BED files and construct GRanges objects (Lawrence et al. 2013), then define the consensus peak set desired for further interrogation. For example, the consensus peak set, $P$, could be derived from 1) the union of all replicate peak sets for both conditions, $P = \left(\bigcup_{i=1}^{n} E_i\right) \cup \left(\bigcup_{i=1}^{n} C_i\right)$; 2) the union of only the naïve overlap peaks for both conditions, $P = E_{NO} \cup C_{NO}$; or 3) the union, across conditions, of the regions shared by all replicates within each condition, $P = \left(\bigcap_{i=1}^{n} E_i\right) \cup \left(\bigcap_{i=1}^{n} C_i\right)$, where $E_1, \ldots, E_n$ are replicate peak sets for the experimental condition, $C_1, \ldots, C_n$ are replicate peak sets for the control condition, $E_{NO}$ is the naïve overlap peak set for the experimental condition, and $C_{NO}$ is the naïve overlap peak set for the control condition. The last of these (3) was selected for the presented analysis and is implemented in Figure 3.6. Read parameters should then be defined through readParam(), specifying paired-end data and option `max.frag=1000` to remove fragments over 1 kilobase, in concordance with the library size-selection step. The `discard` parameter should be supplied with the blacklisted regions described earlier, and `restrict` should be set to the standard chromosomes. Then, count reads in the consensus peak set windows with regionCounts(), and subsequently filter low-abundance peaks, e.g. by the logCPM > -3 threshold used in this analysis.
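As a rough sense of scale for the logCPM > -3 cutoff, ignoring the prior count and library-size adjustments that edgeR's aveLogCPM() applies, a peak with average count $y$ in libraries of size $N$ passes the filter when:

```latex
\log_2\!\left(\frac{y}{N}\times 10^{6}\right) > -3
\iff y > \frac{N}{2^{3}\times 10^{6}} = \frac{N}{8\times 10^{6}},
\qquad \text{e.g. } N = 2\times 10^{7} \;\Rightarrow\; y > 2.5
```

For a hypothetical library of 20 million counted fragments, a peak therefore needs only about 2.5 fragments on average, consistent with the note in the csaw workflow script that few or no peaks should be removed at this step.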
For normalization, TMM (trimmed mean of M values) (Robinson and Oshlack 2010) first requires counting reads in large background bins through windowCounts() with `bin=TRUE`; a standard setting here is `width=10000` for 10 kb bins. normFactors() then generates TMM scaling factors from the background binned counts. Alternatively, loess-based normalization can be performed with normOffsets() using parameters `type="loess"` and `se.out=TRUE`, which writes log-scale offsets into the peak count object. Then, for DA analysis through edgeR (Robinson, McCarthy, and Smyth 2010), build a design model matrix, estimate dispersions with estimateDisp() (which stabilizes estimates by empirical Bayes shrinkage), fit the quasi-likelihood negative binomial model with glmQLFit(), specify the contrast with makeContrasts(), and compute quasi-likelihood F-tests with glmQLFTest(). Refer to the edgeR manual for more information on constructing the design matrix. Finally, merge nearby tested windows with mergeWindows(); a typical analysis merges windows up to 500 bp apart with a maximum merged window size of 5 kb, as was performed in the Figure 3.6 analysis. Then, use the most significant window as the statistical representative of each merged region with getBestTest(). The final region set can be filtered by a desired FDR threshold to determine significant DA regions.

If instead it is desired to identify locally enriched windows de novo in csaw without prior peak calling, first assess the fragment length distribution with getPESizes() to select an optimal window size. The window size is a critical parameter and should be set larger than the majority of fragments; see the csaw manual for more details. 300 bp was the optimal window size selected for the data analyzed here, so reads were counted in windows throughout the genome via windowCounts() with `width=300`. Next, there are numerous ways to filter uninteresting windows, one of which is local enrichment.
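The idea behind local enrichment can be sketched as simple arithmetic: a window's count is compared with the read density of its surrounding neighborhood, rescaled to the window width. The awk function below is a simplified stand-in, not csaw's implementation (csaw compares log2 abundances and uses prior counts); the 300 bp window, 2 kb neighborhood, and 3-fold cutoff mirror the values used in this analysis, while the input columns, the pseudocount, and the function name `local_enrich` are hypothetical.

```shell
#!/bin/sh
# Simplified local-enrichment filter (hypothetical helper, not csaw itself).
# Input (tab-separated): window_id  window_count  neighborhood_count
# The neighborhood is a wider region centred on the window that also contains
# the window's own reads. Widths (bp) and the fold cutoff can be overridden
# with the WIN, NEIGH and FOLD shell variables.
# Prints the ID and enrichment of windows whose enrichment exceeds the cutoff.
local_enrich() {
    awk -v w="${WIN:-300}" -v n="${NEIGH:-2000}" -v fold="${FOLD:-3}" '
        BEGIN { FS = OFS = "\t" }
        {
            flank = $3 - $2                 # reads in the flanks only
            # expected reads in a window-sized region at flank density;
            # a pseudocount of 1 avoids division by zero for empty flanks
            bg = (flank + 1) * w / (n - w)
            enrich = $2 / bg
            if (enrich > fold) printf "%s\t%.2f\n", $1, enrich
        }' "$@"
}

# usage:
# printf 'win_a\t12\t28\nwin_b\t6\t39\n' | local_enrich
```

Windows with very few reads can still pass a fold-change rule when their flanks are nearly empty, which is one reason to pair such a filter with an abundance criterion.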
We used a 2 kilobase neighborhood as the local background estimator and retained only windows showing a 3-fold increase in enrichment over the neighborhood abundance. This was achieved by widening windows with resize(), counting neighborhood reads with regionCounts(), and filtering low-enrichment windows with filterWindows(). The locally enriched windows can then be subjected to DA analysis as described above: applying a normalization method, building a model, stabilizing estimates, fitting the model, and so forth. The csaw workflow R script in this appendix (and the GitHub repository) details commands for the entire DA process described here, for both TMM and loess normalization and using either a predefined MACS2 peak set or de novo locally enriched windows.

For voom (Law et al. 2014) methods (VII and VIII), the analyses presented in the manuscript were conducted by first reading MACS2 peak sets into csaw for counting and filtering as described above, though voom methods could equally be applied to csaw de novo locally enriched windows. The window counts table was extracted from csaw by assay() and converted into a data frame for further manipulation. After setting up the model matrix and contrasts, normalization and mean-variance estimation were computed by voom(). Quantile normalization was applied with the voom option `normalize.method="quantile"`, which applies the Bolstad et al. quantile normalization method that has also been used for ATAC-seq by other groups (Bolstad et al. 2003; Corces et al. 2018). Through limma (Ritchie et al. 2015), linear models were then fit with lmFit(), followed by contrasts.fit() and eBayes() for moderated statistics and hypothesis testing. topTable() was used to extract full DA results with option `n=Inf`. See the limma manual for more details on model matrix and contrast design.

ATACseq_workflow.txt
Example machine-readable Fig. 3.4 workflow including stepwise Unix and R commands for ATAC-seq data processing.

# ATACseq_workflow.txt
# ATAC-seq workflow from reads to peaks, in Unix and R
# Jake Reske
# Michigan State University
# reskejak@msu.edu
# https://github.com/reskejak
# Machine-readable version of Figure 3.4 workflow for ATAC-seq data analysis
# beginning with raw, paired-end Illumina reads
# example experimental design: n=2 mouse ATAC-seq biological replicates for two conditions: treat and control
#######################################
# Unix dependencies, in order of use:
# FastQC
# Trim Galore!
# MultiQC
# Bowtie2
# samtools
# Harvard ATAC-seq module (for removeChrom python script)
# Picard (for MarkDuplicates tool)
# bedtools
# MACS2
# R dependencies:
# ATACseqQC (preseqR)
# additional unix bash scripts used for custom functions:
# bedpeTn5shift (https://raw.githubusercontent.com/reskejak/ATAC-seq/master/bedpeTn5shift.sh)
# bedpeMinimalConvert (https://raw.githubusercontent.com/reskejak/ATAC-seq/master/bedpeMinimalConvert.sh)
# naiveOverlapBroad (https://raw.githubusercontent.com/reskejak/ATAC-seq/master/naiveOverlapBroad.sh)
########################################
########################################
########################################
### QUALITY CONTROL
# in unix
# Concatenate technical replicates between flowcells
cat flowcell1/treat1_R1.fastq.gz flowcell2/treat1_R1.fastq.gz > cat/treat1_R1.fastq.gz # treat1_R1
cat flowcell1/treat1_R2.fastq.gz flowcell2/treat1_R2.fastq.gz > cat/treat1_R2.fastq.gz # treat1_R2
cat flowcell1/treat2_R1.fastq.gz flowcell2/treat2_R1.fastq.gz > cat/treat2_R1.fastq.gz # treat2_R1
cat flowcell1/treat2_R2.fastq.gz flowcell2/treat2_R2.fastq.gz > cat/treat2_R2.fastq.gz # treat2_R2
cat flowcell1/control1_R1.fastq.gz flowcell2/control1_R1.fastq.gz > cat/control1_R1.fastq.gz # control1_R1
cat flowcell1/control1_R2.fastq.gz flowcell2/control1_R2.fastq.gz > cat/control1_R2.fastq.gz # control1_R2
cat flowcell1/control2_R1.fastq.gz flowcell2/control2_R1.fastq.gz > cat/control2_R1.fastq.gz # control2_R1
cat flowcell1/control2_R2.fastq.gz flowcell2/control2_R2.fastq.gz > cat/control2_R2.fastq.gz # control2_R2
# QC raw reads
fastqc treat1_R1.fastq.gz
fastqc treat1_R2.fastq.gz
fastqc treat2_R1.fastq.gz
fastqc treat2_R2.fastq.gz
fastqc control1_R1.fastq.gz
fastqc control1_R2.fastq.gz
fastqc control2_R1.fastq.gz
fastqc control2_R2.fastq.gz
# Trim reads
trim_galore --paired -o trimmed/ treat1_R1.fastq.gz treat1_R2.fastq.gz
trim_galore --paired -o trimmed/ treat2_R1.fastq.gz treat2_R2.fastq.gz
trim_galore --paired -o trimmed/ control1_R1.fastq.gz control1_R2.fastq.gz
trim_galore --paired -o trimmed/ control2_R1.fastq.gz control2_R2.fastq.gz
# Trim Galore! names paired outputs e.g. treat1_R1_val_1.fq.gz;
# rename output files e.g. treat1_R1_trimmed.fq.gz or modify script
# QC trimmed reads
fastqc treat1_R1_trimmed.fq.gz
fastqc treat1_R2_trimmed.fq.gz
fastqc treat2_R1_trimmed.fq.gz
fastqc treat2_R2_trimmed.fq.gz
fastqc control1_R1_trimmed.fq.gz
fastqc control1_R2_trimmed.fq.gz
fastqc control2_R1_trimmed.fq.gz
fastqc control2_R2_trimmed.fq.gz
# Combine QC data
multiqc .
# in the fastqc output directory; repeat for raw and trimmed
########################################
### Align
# Align to reference genome
# first have to prepare genome index; see bowtie2 manual for details
bowtie2 --very-sensitive -X 1000 -x mm10_bt2_index -1 treat1_R1_trimmed.fq.gz -2 treat1_R2_trimmed.fq.gz | samtools view -bS - > treat1.bam
bowtie2 --very-sensitive -X 1000 -x mm10_bt2_index -1 treat2_R1_trimmed.fq.gz -2 treat2_R2_trimmed.fq.gz | samtools view -bS - > treat2.bam
bowtie2 --very-sensitive -X 1000 -x mm10_bt2_index -1 control1_R1_trimmed.fq.gz -2 control1_R2_trimmed.fq.gz | samtools view -bS - > control1.bam
bowtie2 --very-sensitive -X 1000 -x mm10_bt2_index -1 control2_R1_trimmed.fq.gz -2 control2_R2_trimmed.fq.gz | samtools view -bS - > control2.bam
# Coordinate sort and index
samtools sort -o treat1.sorted.bam treat1.bam; samtools index treat1.sorted.bam
samtools sort -o treat2.sorted.bam treat2.bam; samtools index treat2.sorted.bam
samtools sort -o control1.sorted.bam control1.bam; samtools index control1.sorted.bam
samtools sort -o control2.sorted.bam control2.bam; samtools index control2.sorted.bam
########################################
### Filter
# Remove mtDNA reads
# using Harvard ATAC-seq module (https://github.com/jsh58/harvard)
# python script removeChrom.py
samtools view -h treat1.sorted.bam | python removeChrom.py - - chrM | samtools view -bh - > treat1.noMT.bam
samtools view -h treat2.sorted.bam | python removeChrom.py - - chrM | samtools view -bh - > treat2.noMT.bam
samtools view -h control1.sorted.bam | python removeChrom.py - - chrM | samtools view -bh - > control1.noMT.bam
samtools view -h control2.sorted.bam | python removeChrom.py - - chrM | samtools view -bh - > control2.noMT.bam
# Coordinate sort and index
samtools sort -o treat1.sorted.noMT.bam treat1.noMT.bam; samtools index treat1.sorted.noMT.bam
samtools sort -o treat2.sorted.noMT.bam treat2.noMT.bam; samtools index treat2.sorted.noMT.bam
samtools sort -o control1.sorted.noMT.bam control1.noMT.bam; samtools index control1.sorted.noMT.bam
samtools sort -o control2.sorted.noMT.bam control2.noMT.bam; samtools index control2.sorted.noMT.bam
# Restrict to properly-paired reads only
# -f 3 specifies only properly-paired reads
samtools view -bh -f 3 treat1.sorted.noMT.bam > treat1.filt.noMT.bam
samtools view -bh -f 3 treat2.sorted.noMT.bam > treat2.filt.noMT.bam
samtools view -bh -f 3 control1.sorted.noMT.bam > control1.filt.noMT.bam
samtools view -bh -f 3 control2.sorted.noMT.bam > control2.filt.noMT.bam
# Coordinate sort and index
samtools sort -o treat1.sorted.filt.noMT.bam treat1.filt.noMT.bam; samtools index treat1.sorted.filt.noMT.bam
samtools sort -o treat2.sorted.filt.noMT.bam treat2.filt.noMT.bam; samtools index treat2.sorted.filt.noMT.bam
samtools sort -o control1.sorted.filt.noMT.bam control1.filt.noMT.bam; samtools index control1.sorted.filt.noMT.bam
samtools sort -o control2.sorted.filt.noMT.bam control2.filt.noMT.bam; samtools index control2.sorted.filt.noMT.bam
########################################
### Complexity
# in R
library(ATACseqQC)
# Define sample files
treat1 <- "treat1.sorted.filt.noMT.bam"
treat2 <- "treat2.sorted.filt.noMT.bam"
control1 <- "control1.sorted.filt.noMT.bam"
control2 <- "control2.sorted.filt.noMT.bam"
treat1.bai <- "treat1.sorted.filt.noMT.bam.bai"
treat2.bai <- "treat2.sorted.filt.noMT.bam.bai"
control1.bai <- "control1.sorted.filt.noMT.bam.bai"
control2.bai <- "control2.sorted.filt.noMT.bam.bai"
# Calculate duplication frequency matrix
treat1.dups <- readsDupFreq(treat1, index=treat1.bai)
treat2.dups <- readsDupFreq(treat2, index=treat2.bai)
control1.dups <- readsDupFreq(control1, index=control1.bai)
control2.dups <- readsDupFreq(control2, index=control2.bai)
# Estimate library complexity
treat1.complexity <- estimateLibComplexity(treat1.dups, times=100, interpolate.sample.sizes=seq(0.1, 1, by=0.01))
treat2.complexity <- estimateLibComplexity(treat2.dups, times=100, interpolate.sample.sizes=seq(0.1, 1, by=0.01))
control1.complexity <- estimateLibComplexity(control1.dups, times=100, interpolate.sample.sizes=seq(0.1, 1, by=0.01))
control2.complexity <- estimateLibComplexity(control2.dups, times=100, interpolate.sample.sizes=seq(0.1, 1, by=0.01))
# notes on interpretation:
# $relative.size = relative library size, i.e. =1 is the full library size supplied
# $values = number of unique fragments sequenced at a given $relative.size
# downstream:
# from all libraries in experiment, identify lowest $values integer at $relative.size==1
# estimate fraction of library to subsample to achieve uniform number of unique fragments (molecular complexity)
# back in Unix
# Subsample based on library complexity estimates
# dummy example
samtools view -h -b -s 1.45 treat1.sorted.filt.noMT.bam > treat1_sub.filt.noMT.bam # -s 1.45 indicates 45% subsample, seed=1
samtools view -h -b -s 1.78 treat2.sorted.filt.noMT.bam > treat2_sub.filt.noMT.bam # -s 1.78 indicates 78% subsample, seed=1
# do not subsample control1; e.g. sample with lowest complexity at given read depth
samtools view -h -b -s 1.91 control2.sorted.filt.noMT.bam > control2_sub.filt.noMT.bam # -s 1.91 indicates 91% subsample, seed=1
samtools sort -o treat1_sub.sorted.filt.noMT.bam treat1_sub.filt.noMT.bam; samtools index treat1_sub.sorted.filt.noMT.bam
samtools sort -o treat2_sub.sorted.filt.noMT.bam treat2_sub.filt.noMT.bam; samtools index treat2_sub.sorted.filt.noMT.bam
# no need to sort/index control1 again; did not subsample
samtools sort -o control2_sub.sorted.filt.noMT.bam control2_sub.filt.noMT.bam; samtools index control2_sub.sorted.filt.noMT.bam
# will proceed forward with the subsampled BAMs
########################################
### Filter
# Remove PCR duplicates with Picard (https://broadinstitute.github.io/picard/)
java -jar picard.jar MarkDuplicates I=treat1_sub.sorted.filt.noMT.bam O=treat1_sub.noDups.filt.noMT.bam M=treat1_sub_dups.txt REMOVE_DUPLICATES=true
java -jar picard.jar MarkDuplicates I=treat2_sub.sorted.filt.noMT.bam O=treat2_sub.noDups.filt.noMT.bam M=treat2_sub_dups.txt REMOVE_DUPLICATES=true
java -jar picard.jar MarkDuplicates I=control1.sorted.filt.noMT.bam O=control1.noDups.filt.noMT.bam M=control1_dups.txt REMOVE_DUPLICATES=true
java -jar picard.jar MarkDuplicates I=control2_sub.sorted.filt.noMT.bam O=control2_sub.noDups.filt.noMT.bam M=control2_sub_dups.txt REMOVE_DUPLICATES=true
# Coordinate sort and index
samtools sort -o treat1_sub.sorted.noDups.filt.noMT.bam treat1_sub.noDups.filt.noMT.bam; samtools index treat1_sub.sorted.noDups.filt.noMT.bam
samtools sort -o treat2_sub.sorted.noDups.filt.noMT.bam treat2_sub.noDups.filt.noMT.bam; samtools index treat2_sub.sorted.noDups.filt.noMT.bam
samtools sort -o control1.sorted.noDups.filt.noMT.bam control1.noDups.filt.noMT.bam; samtools index control1.sorted.noDups.filt.noMT.bam
samtools sort -o control2_sub.sorted.noDups.filt.noMT.bam control2_sub.noDups.filt.noMT.bam; samtools index control2_sub.sorted.noDups.filt.noMT.bam
########################################
### Format
# Sort by read name
samtools sort -n -o treat1_sub.namesorted.noDups.filt.noMT.bam treat1_sub.sorted.noDups.filt.noMT.bam
samtools sort -n -o treat2_sub.namesorted.noDups.filt.noMT.bam treat2_sub.sorted.noDups.filt.noMT.bam
samtools sort -n -o control1.namesorted.noDups.filt.noMT.bam control1.sorted.noDups.filt.noMT.bam
samtools sort -n -o control2_sub.namesorted.noDups.filt.noMT.bam control2_sub.sorted.noDups.filt.noMT.bam
# Fix read mates
samtools fixmate treat1_sub.namesorted.noDups.filt.noMT.bam treat1_sub.fixed.bam
samtools fixmate treat2_sub.namesorted.noDups.filt.noMT.bam treat2_sub.fixed.bam
samtools fixmate control1.namesorted.noDups.filt.noMT.bam control1.fixed.bam
samtools fixmate control2_sub.namesorted.noDups.filt.noMT.bam control2_sub.fixed.bam
# BEDPE conversion
samtools view -bf 0x2 treat1_sub.fixed.bam | bedtools bamtobed -i stdin -bedpe > treat1_sub.fixed.bedpe
samtools view -bf 0x2 treat2_sub.fixed.bam | bedtools bamtobed -i stdin -bedpe > treat2_sub.fixed.bedpe
samtools view -bf 0x2 control1.fixed.bam | bedtools bamtobed -i stdin -bedpe > control1.fixed.bedpe
samtools view -bf 0x2 control2_sub.fixed.bam | bedtools bamtobed -i stdin -bedpe > control2_sub.fixed.bedpe
# Tn5 shift
bash bedpeTn5shift.sh treat1_sub.fixed.bedpe > treat1_sub.tn5.bedpe
bash bedpeTn5shift.sh treat2_sub.fixed.bedpe > treat2_sub.tn5.bedpe
bash bedpeTn5shift.sh control1.fixed.bedpe > control1.tn5.bedpe
bash bedpeTn5shift.sh control2_sub.fixed.bedpe > control2_sub.tn5.bedpe
# Minimal conversion (from standard BEDPE format to that accepted by MACS2)
bash bedpeMinimalConvert.sh treat1_sub.tn5.bedpe > treat1_sub.minimal.bedpe
bash bedpeMinimalConvert.sh treat2_sub.tn5.bedpe > treat2_sub.minimal.bedpe
bash bedpeMinimalConvert.sh control1.tn5.bedpe > control1.minimal.bedpe
bash bedpeMinimalConvert.sh control2_sub.tn5.bedpe > control2_sub.minimal.bedpe
########################################
### Peak calling
# Call significant broad peaks (FDR < 0.05) on each individual replicate
macs2 callpeak -t treat1_sub.minimal.bedpe -f BEDPE -n treat1 -g mm --broad --broad-cutoff 0.05 --keep-dup all
macs2 callpeak -t treat2_sub.minimal.bedpe -f BEDPE -n treat2 -g mm --broad --broad-cutoff 0.05 --keep-dup all
macs2 callpeak -t control1.minimal.bedpe -f BEDPE -n control1 -g mm --broad --broad-cutoff 0.05 --keep-dup all
macs2 callpeak -t control2_sub.minimal.bedpe -f BEDPE -n control2 -g mm --broad --broad-cutoff 0.05 --keep-dup all
# -g mm refers to Mus musculus genome size; use -g hs for Homo sapiens, or see MACS2 manual for manually setting genome size e.g. for other organisms
# for simplicity, the "_sub" suffix was omitted from MACS2 output names here
# Filter ENCODE-defined blacklist regions and unplaced contigs
# Blacklist regions refer to highly repetitive/unstructured regions with artificially high signal in genomic experiments. Available for certain model organisms. See publication for details. https://github.com/Boyle-Lab/Blacklist/
# Amemiya, H.M., Kundaje, A. & Boyle, A.P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep 9, 9354 (2019). (PMID: 31249361)
bedtools intersect -v -a treat1_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > treat1_peaks.filt.broadPeak
bedtools intersect -v -a treat2_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > treat2_peaks.filt.broadPeak
bedtools intersect -v -a control1_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > control1_peaks.filt.broadPeak
bedtools intersect -v -a control2_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > control2_peaks.filt.broadPeak
########################################
# Compute naive overlap
# Definition from ENCODE / Kundaje et al.: "Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap wrt any one of the overlapping peak pairs >= 0.5".
# Call MACS2 broad peaks on pooled replicates and filter blacklist regions
# (input files are the subsampled minimal BEDPE files created above)
macs2 callpeak -t treat1_sub.minimal.bedpe treat2_sub.minimal.bedpe -f BEDPE -n treat_pool -g mm --broad --broad-cutoff 0.05 --keep-dup all
macs2 callpeak -t control1.minimal.bedpe control2_sub.minimal.bedpe -f BEDPE -n control_pool -g mm --broad --broad-cutoff 0.05 --keep-dup all
bedtools intersect -v -a treat_pool_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > treat_pool_peaks.filt.broadPeak
bedtools intersect -v -a control_pool_peaks.broadPeak -b mm10.blacklist.bed | grep -P 'chr[\dXY]+[ \t]' > control_pool_peaks.filt.broadPeak
# Compute naive overlap
bash naiveOverlapBroad.sh treat1_peaks.filt.broadPeak treat2_peaks.filt.broadPeak treat_pool_peaks.filt.broadPeak > treat_overlap_peaks.filt.broadPeak
bash naiveOverlapBroad.sh control1_peaks.filt.broadPeak control2_peaks.filt.broadPeak control_pool_peaks.filt.broadPeak > control_overlap_peaks.filt.broadPeak

bedpeTn5shift.sh
Bash script for shifting coordinates in standard 10-column format BEDPE files to compensate for Tn5 adapter insertion as described in Buenrostro et al. See Fig. 3.4 for usage.
#!/bin/bash
##########################
# bedpeTn5shift.sh
# Jake Reske
# Michigan State University
# reskejak@msu.edu
# https://github.com/reskejak
# intended to shift BEDPE coordinates +4 and -5 bp, respectively, to compensate for Tn5 insertion in ATAC data as described by Buenrostro et al. 2013 (PMID: 24097267).
# usage: bash bedpeTn5shift.sh treat1.bedpe > treat1.tn5.bedpe
# first make executable by: chmod u+x bedpeTn5shift.sh
##########################
FILE=$1
# bedtools BEDPE columns: 1-3 = mate 1 (chrom, start, end), 4-6 = mate 2, 9 = strand of mate 1
# if mate 1 is on +, shift its start by +4 and the (minus-strand) mate 2 end by -5;
# if mate 1 is on -, shift its end by -5 and the (plus-strand) mate 2 start by +4
awk -F $'\t' 'BEGIN {OFS = FS}{ \
if ($9 == "+") {$2 = $2 + 4; $6 = $6 - 5} \
else if ($9 == "-") {$3 = $3 - 5; $5 = $5 + 4} \
print $0}' $FILE

bedpeMinimalConvert.sh
Bash script for converting standard 10-column format BEDPE to the “minimal” format defined by MACS2. See Fig. 3.4 for usage.

#!/bin/bash
##########################
# bedpeMinimalConvert.sh
# Jake Reske
# Michigan State University
# reskejak@msu.edu
# https://github.com/reskejak
# convert standard bedtools BEDPE format to "minimal" BEDPE format accepted by MACS2 for peak calling
# note, can take 20-40 minutes or longer per large (>10 million reads) BEDPE file
# usage: bash bedpeMinimalConvert.sh treat1.bedpe > treat1.minimal.bedpe
# first make executable by: chmod u+x bedpeMinimalConvert.sh
##########################
FILE=$1
# define coordSort function: sorts the four mate coordinates on each line in ascending order
function coordSort() {
  while read line; do
    ary=(${line})
    for((i=0; i!=${#ary[@]}; ++i)); do
      for((j=i+1; j!=${#ary[@]}; ++j)); do
        if (( ary[i] > ary[j] )); then
          ((key=ary[i]))
          ((ary[i]=ary[j]))
          ((ary[j]=key))
        fi
      done
    done
    echo ${ary[@]}
  done < ${1}
}
# get paired-end coordinates from BEDPE file
awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\t%s\n",$2,$3,$5,$6}' $FILE > coords.$FILE
# sort coordinates and arrange in minimal format:
# chromosome, leftmost fragment coordinate, rightmost fragment coordinate, read name
coordSort coords.$FILE | paste - $FILE | awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\t%s\n",$5,$1,$4,$11}' -
# remove intermediate coordinates file
rm coords.$FILE

naiveOverlapBroad.sh
Bash script for calculating the naïve overlap broad peak set from two individual replicate peak sets and a pooled replicate peak set. Can be modified to accept more replicates as desired. See Fig. 3.4 for usage.

#!/bin/bash
##########################
# naiveOverlapBroad.sh
# Jake Reske
# Michigan State University
# reskejak@msu.edu
# https://github.com/reskejak
# Computing ENCODE-defined naive overlapping broadPeak set from two ATAC-seq (or ChIP-seq etc.) individual replicate broadPeak and replicate-pooled broadPeak sets
# Definition from Kundaje et al.: "Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap [with respect to] any one of the overlapping peak pairs >= 0.5".
# usage: bash naiveOverlapBroad.sh treat1_peaks.broadPeak treat2_peaks.broadPeak treat_pool_peaks.broadPeak > treat_overlap_peaks.broadPeak
# first make executable by: chmod u+x naiveOverlapBroad.sh
# dependencies: bedtools (for intersectBed / bedtools intersect)
# can extend function to accept more replicates as desired
##########################
REP1=$1
REP2=$2
POOL=$3
# broadPeak
# intersectBed -wo appends the 9 replicate columns (10-18) and the overlap length in bp (column 19);
# keep pooled peaks whose overlap covers >= 50% of either the pooled peak (columns 2-3)
# or the replicate peak (columns 11-12), first against REP1 and then against REP2
intersectBed -wo \
-a ${POOL} -b ${REP1} | \
awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$12-$11; if (($19/s1 >= 0.5) || ($19/s2 >= 0.5)) {print $0}}' | \
cut -f 1-9 | sort | uniq | \
intersectBed -wo \
-a stdin -b ${REP2} | \
awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$12-$11; if (($19/s1 >= 0.5) || ($19/s2 >= 0.5)) {print $0}}' | \
cut -f 1-9 | sort | uniq

csaw_workflow.R
Example R workflow for differential accessibility analysis with csaw, as graphically displayed in Fig. 3.6, with minor package compatibility updates. Describes the process for both TMM and loess normalizations, either supplying MACS2 peak sets as query regions or identifying de novo locally enriched windows.
# csaw_workflow.R
# csaw workflow for ATAC-seq differential accessibility analysis, in R
# Jake Reske
# Michigan State University
# reskejak@msu.edu
# https://github.com/reskejak
# Updated, machine-readable version of Figure 3.6 workflow for ATAC-seq DA analysis with csaw
# will describe methods for using 1) pre-defined peaks from MACS2 as well as 2) csaw de novo enriched window calling by local enrichment,
# and normalization methods including 1) TMM on binned counts and 2) loess-based for trended biases
# brief aside: use gc() to help clear memory after intensive commands if crashes/errors occur
# example experimental design: n=2 mouse ATAC-seq biological replicates for two conditions: treat and control
library(GenomicRanges)
library(csaw)
########################################
########################################
########################################
# starting from MACS2 filtered broadpeaks
# read replicate broadPeak files
treat1.peaks <- read.table("treat1_peaks.filt.broadPeak", sep="\t")[,1:3]
treat2.peaks <- read.table("treat2_peaks.filt.broadPeak", sep="\t")[,1:3]
control1.peaks <- read.table("control1_peaks.filt.broadPeak", sep="\t")[,1:3]
control2.peaks <- read.table("control2_peaks.filt.broadPeak", sep="\t")[,1:3]
colnames(treat1.peaks) <- c("chrom", "start", "end")
colnames(treat2.peaks) <- c("chrom", "start", "end")
colnames(control1.peaks) <- c("chrom", "start", "end")
colnames(control2.peaks) <- c("chrom", "start", "end")
# read naive overlap broadPeak files
treat.overlap.peaks <- read.table("treat_overlap_peaks.filt.broadPeak", sep="\t")[,1:3]
control.overlap.peaks <- read.table("control_overlap_peaks.filt.broadPeak", sep="\t")[,1:3]
colnames(treat.overlap.peaks) <- c("chrom", "start", "end")
colnames(control.overlap.peaks) <- c("chrom", "start", "end")
# convert to GRanges objects
treat1.peaks <- GRanges(treat1.peaks)
treat2.peaks <- GRanges(treat2.peaks)
# treatn.peaks <- GRanges(treatn.peaks) # uncomment and read in as above for additional replicates
treat.overlap.peaks <- GRanges(treat.overlap.peaks)
control1.peaks <- GRanges(control1.peaks)
control2.peaks <- GRanges(control2.peaks)
# controln.peaks <- GRanges(controln.peaks) # uncomment and read in as above for additional replicates
control.overlap.peaks <- GRanges(control.overlap.peaks)
# define consensus peak set
# choose ONE of the methods below (each overwrites all.peaks);
# the intersect-based method was used for the presented analysis
# one method: union of all replicate peak sets for both conditions
treat.peaks <- union(treat1.peaks, treat2.peaks)
control.peaks <- union(control1.peaks, control2.peaks)
all.peaks <- union(treat.peaks, control.peaks)
# another method: intersect between biological replicates; union between both experimental conditions
treat.peaks <- intersect(treat1.peaks, treat2.peaks)
control.peaks <- intersect(control1.peaks, control2.peaks)
all.peaks <- union(treat.peaks, control.peaks)
# yet another method: union between naive overlapping peak sets
all.peaks <- union(treat.overlap.peaks, control.overlap.peaks)
##############################
# specify paired-end BAMs
pe.bams <- c("control1.sorted.noDups.filt.noMT.bam",
             "control2.sorted.noDups.filt.noMT.bam",
             "treat1.sorted.noDups.filt.noMT.bam",
             "treat2.sorted.noDups.filt.noMT.bam")
##############################
# read mm10 blacklist
# keep the first three columns (some blacklist versions include a fourth label column)
blacklist <- read.table("~/ref_genome/mm10.blacklist.bed", sep="\t")[,1:3]
colnames(blacklist) <- c("chrom", "start", "end")
blacklist <- GRanges(blacklist)
# define read parameters
standard.chr <- paste0("chr", c(1:19, "X", "Y")) # only use standard chromosomes
param <- readParam(max.frag=1000, pe="both", discard=blacklist, restrict=standard.chr)
##############################
# count reads in windows specified by MACS2
peak.counts <- regionCounts(pe.bams, all.peaks, param=param)
##############################
# MACS2 peaks only: filter low abundance peaks
library("edgeR")
peak.abundances <- aveLogCPM(asDGEList(peak.counts))
peak.counts.filt <- peak.counts[peak.abundances > -3, ] # only use peaks logCPM > -3
# few or no peaks should be removed; modify as desired
##############################
# get paired-end fragment size distribution
control1.pe.sizes <- getPESizes("control1.sorted.noDups.filt.noMT.bam")
control2.pe.sizes <- getPESizes("control2.sorted.noDups.filt.noMT.bam")
treat1.pe.sizes <- getPESizes("treat1.sorted.noDups.filt.noMT.bam")
treat2.pe.sizes <- getPESizes("treat2.sorted.noDups.filt.noMT.bam")
gc()
# plot
hist(treat1.pe.sizes$sizes) # repeat for all replicates and conditions
# for analysis with csaw de novo enriched query windows, select a window size that is greater than the majority of fragments

##############################
# count BAM reads in windows, e.g. 300 bp
counts <- windowCounts(pe.bams, width=300, param=param) # set width as desired from the fragment length distribution analyses

# csaw de novo peaks only: filter windows by local enrichment
# (background from a 2 kb neighborhood centered on each window; keep windows > 3-fold above background; modify as desired)
neighbor <- suppressWarnings(resize(rowRanges(counts), width=2000, fix="center"))
wider <- regionCounts(pe.bams, regions=neighbor, param=param)
filter.stat <- filterWindowsLocal(counts, wider)
counts.local.filt <- counts[filter.stat$filter > log2(3), ]

# count reads in large (10 kb) genomic bins, used below to compute TMM normalization factors
binned <- windowCounts(pe.bams, bin=TRUE, width=10000, param=param)

##########################################
# NORMALIZATION
# method 1: MACS2 peaks only, TMM normalization based on binned counts
peak.counts.tmm <- peak.counts.filt
peak.counts.tmm <- normFactors(binned, se.out=peak.counts.tmm)

# method 2: MACS2 peaks only, csaw loess-normalization
peak.counts.loess <- peak.counts.filt
peak.counts.loess <- normOffsets(peak.counts.loess, se.out=TRUE) # type="loess" is now default
# from vignette: "For type="loess", a numeric matrix of the same dimensions as counts, containing the log-based offsets for use in GLM fitting."

# method 3: csaw de novo peaks by local enrichment, TMM normalization based on binned counts
counts.local.tmm <- counts.local.filt
counts.local.tmm <- normFactors(binned, se.out=counts.local.tmm)

# method 4: csaw de novo peaks by local enrichment, csaw loess-normalization
counts.local.loess <- counts.local.filt
counts.local.loess <- normOffsets(counts.local.loess, se.out=TRUE) # type="loess" is now default
# from vignette: "For type="loess", a numeric matrix of the same dimensions as counts, containing the log-based offsets for use in GLM fitting."
#########################################
# DIFFERENTIAL ACCESSIBILITY ANALYSIS

# set working windows for the desired analysis
working.windows <- peak.counts.tmm # MACS2 peaks only, standard TMM normalization based on binned counts
# working.windows <- peak.counts.loess # MACS2 peaks only, for trended biases
# working.windows <- counts.local.tmm # csaw de novo peaks by local enrichment, standard TMM normalization based on binned counts
# working.windows <- counts.local.loess # csaw de novo peaks by local enrichment, for trended biases
# SEE THE CSAW MANUAL FOR MORE INFO ON NORMALIZATION METHODS

###########
# setup design matrix
# see edgeR manual for more information
y <- asDGEList(working.windows)
colnames(y$counts) <- c("control1", "control2", "treat1", "treat2")
rownames(y$samples) <- c("control1", "control2", "treat1", "treat2")
y$samples$group <- c("control", "control", "treat", "treat")
design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- c("control", "treat")
# design

# stabilize dispersion estimates with empirical Bayes
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust=TRUE)

# testing for differentially-accessible windows
results <- glmQLFTest(fit, contrast=makeContrasts(treat-control, levels=design))
# head(results$table)
rowData(working.windows) <- cbind(rowData(working.windows), results$table) # combine GRanges rowdata with differential statistics
# working.windows@rowRanges

# merge nearby windows
# up to "tol" distance apart: 500 bp in this case; max merged window width: 5000 bp
merged.peaks <- mergeWindows(rowRanges(working.windows), tol=500L, max.width=5000L)
# summary(width(merged.peaks$region))
# should merge some peaks; change as desired

# use most significant window as statistical representation for p-value and FDR for merged windows
tab.best <- getBestTest(merged.peaks$id, results$table)

# concatenating all relevant statistical data for final merged windows (no redundant columns)
final.merged.peaks <- GRanges(cbind(as.data.frame(merged.peaks$region),
                                    results$table[tab.best$rep.test, -4],
                                    as.data.frame(tab.best)[,-c(7:8)]))

# sort by FDR
final.merged.peaks <- final.merged.peaks[order(final.merged.peaks$FDR), ]
final.merged.peaks

# filter by FDR threshold
FDR.thresh <- 0.05 # set as desired
final.merged.peaks.sig <- final.merged.peaks[final.merged.peaks$FDR < FDR.thresh, ]
final.merged.peaks.sig # significant differentially-accessible windows

write.table(as.data.frame(final.merged.peaks), "treat_vs_control_csaw_DA-windows_all.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)
write.table(as.data.frame(final.merged.peaks.sig), "treat_vs_control_csaw_DA-windows_significant.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)

###########################################
# Generate MA plot
library(ggplot2)
final.merged.peaks$sig <- "n.s."
final.merged.peaks$sig[final.merged.peaks$FDR < FDR.thresh] <- "significant"

ggplot(data=data.frame(final.merged.peaks), aes(x = logCPM, y = logFC, col = factor(sig, levels=c("n.s.", "significant")))) +
  geom_point() +
  scale_color_manual(values = c("black", "red")) +
  geom_smooth(inherit.aes=FALSE, aes(x = logCPM, y = logFC), method = "loess") + # smoothed loess fit; can add span=0.5 to reduce computation load/time
  geom_hline(yintercept = 0) +
  labs(col = NULL)

Figure B.1 Differential gene expression overlap and FDR thresholding analyses
Color legend for the 8 DA approaches (top) applies to the entire figure. a, Precision-recall curve (higher is better) for predicting RNA-seq gene differential expression (DE) from promoter DA FDR values. The zoomed inset depicts differences in approach specificity at low recall. Overall AUC values are similar for all curves. b, Distribution of DA FDR values for all 8 approaches. Horizontal lines depict the approach-specific FDR thresholds meeting a 5% hypothesis rejection rate (vertical line). c, FDR thresholding analysis of the number of RNA-seq DE genes overlapping DA promoters.
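Figure B.1a-b rests on two simple computations: ranking genes by promoter DA FDR to trace a precision-recall curve against RNA-seq DE calls, and finding, for each DA approach, the FDR value at which 5% of tested hypotheses are rejected. The Python sketch below illustrates one reading of both, assuming one FDR value and one boolean DE label per gene; the function names and toy values are hypothetical and are not taken from the original analysis code.

```python
from typing import List, Tuple


def precision_recall(fdr: List[float], is_de: List[bool]) -> List[Tuple[float, float]]:
    """(recall, precision) after each gene, calling genes DA in order of increasing FDR.

    Assumes at least one gene is labeled DE; ties in FDR keep their input order.
    """
    order = sorted(range(len(fdr)), key=lambda i: fdr[i])
    total_de = sum(is_de)
    true_pos = 0
    points = []
    for rank, i in enumerate(order, start=1):
        if is_de[i]:
            true_pos += 1
        points.append((true_pos / total_de, true_pos / rank))
    return points


def rejection_rate_threshold(fdr: List[float], rate: float = 0.05) -> float:
    """FDR value at which the lowest-FDR `rate` fraction of tests would be rejected."""
    ranked = sorted(fdr)
    n_reject = max(1, int(rate * len(ranked)))
    return ranked[n_reject - 1]


# toy example: two of four genes are DE, and both have the lowest DA FDR values
points = precision_recall([0.01, 0.2, 0.03, 0.5], [True, False, True, False])
print(points[1])  # (1.0, 1.0): both DE genes recovered with no false positives
```

Applying rejection_rate_threshold separately to each approach's FDR vector gives approach-specific cutoffs analogous to the horizontal lines in Figure B.1b; an AUC could then be obtained by integrating the returned points, for example with the trapezoidal rule.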
Figure B.2 Negative control DA comparison of two control groups from Schep et al. yeast data
MA plots from all 8 DA methods applied to a negative control comparison of two 0-minute control groups (n = 2 each) from the Schep et al. yeast osmotic stress ATAC-seq data set.

Figure B.3 Extended analysis of DA methods on Schep et al. yeast osmotic stress ATAC-seq time course series
a, Genome-wide significant DA regions (FDR < 0.05) called by each of the 8 DA analyses at 15 minutes exposure vs. 0-minute control, separated by increasing vs. decreasing accessibility. b, Breakdown of promoter (-2000 to +200 bp around TSS) vs. distal (non-promoter) annotation of significant DA regions from each of the 8 DA analyses, again at 15 minutes exposure. c, Classification of significantly increasing (left) vs. decreasing (right) accessibility promoter regions at 15 minutes exposure based on Ni et al. gene expression response to the same osmotic stress conditions. d, Number of genes displaying concordant expression response and promoter accessibility changes at 15 minutes exposure, classified as in c.

Figure B.4 Complete statistical analysis of Schep et al. osmotic stress ATAC-seq time series DA methods
Tukey-style boxplots (outliers not shown) of all promoter DA log2FC measurements at each time point compared to 0-minute control, regardless of significance, separated by Ni et al. gene expression response classification. Top (gray), genes with stable expression; middle (red), genes with upregulated expression; bottom (blue), genes with downregulated expression. Statistic is paired, two-tailed Wilcoxon test.

Figure B.5 Replicated analysis downstream of random subsample seeds for complexity normalization
a, Statistics of blacklist-filtered MACS2 broadPeak (FDR < 0.05) calls per library following two random subsamples to normalize molecular complexity.
Each library retained over 99% of overlapping peaks between the two random subsample replicates. b, Proportional Euler diagram overlap of significant DA regions (FDR < 0.10) by method IV following the two random subsamples.

Figure B.6 Effects of library complexity normalization by random subsampling
a-c, Library statistics from the in vivo mouse ATAC-seq data set from Chapter 2 before normalization (a), after read depth normalization (b), and after complexity normalization by subsampling (c). The left plots display the number of properly paired, non-mitochondrial fragments in each library, and the right plots display library complexity as the estimated number of unique molecules sequenced. Percentages listed next to each sample in c represent the portion of each library retained during subsampling. The LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl replicate #1 sample was estimated as the least complex library at current read depth and therefore was not subsampled. The complexity-normalized libraries were used for all subsequent analyses. d, Quantification of ATAC signal by RPKM (i.e., read depth normalization) in control and mutant libraries at a set of promoter ATAC regions determined to be significantly DA by analysis IV. DA regions are further segregated by increasing and decreasing accessibility. Statistic is paired, two-tailed Wilcoxon test. e, ATAC signal at regions as in d, but instead quantified by RPK (reads per kilobase) in libraries of equivalent molecular complexity.
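The contrast between panels d and e of Figure B.6 comes down to the normalization formulas. RPKM divides a region's read count by both region length (in kb) and library size (in millions of reads), whereas RPK divides by length only, so RPK values are directly comparable only between libraries that have already been matched, here by subsampling to equal molecular complexity. A minimal Python sketch of the two standard formulas (the function names and toy numbers are illustrative):

```python
def rpk(reads: int, length_bp: int) -> float:
    """Reads per kilobase of region length."""
    return reads / (length_bp / 1_000)


def rpkm(reads: int, length_bp: int, total_reads: int) -> float:
    """Reads per kilobase per million library reads (length and read-depth normalized)."""
    return rpk(reads, length_bp) / (total_reads / 1_000_000)


# a 500 bp promoter region with 50 reads in a library of 20 million reads
print(rpk(50, 500))               # 100.0
print(rpkm(50, 500, 20_000_000))  # 5.0
```

What counts as total_reads (for example, properly paired non-mitochondrial fragments, as tallied in Figure B.6a-c) is an analysis choice and should match the library statistics being compared.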
APPENDIX C Supplementary material for Chapter 4

Table C.1 Genes with ARID1A in vivo promoter binding detected in adult mouse endometrial epithelia
0610009E02Rik 1110028F18Rik 1700016L21Rik 1700042G07Rik 1700047M11Rik 1700056E22Rik 1700096J18Rik 1700113A16Rik 1700123O20Rik 2200002D01Rik 2310069B03Rik 2810013P06Rik 2810414N06Rik 3300005D01Rik 4732471J01Rik 4930447K03Rik 4930554I06Rik 4933406C10Rik 4933425B07Rik 5730435O14Rik 5830416I19Rik 5830428M24Rik 6030440G07Rik 6430548M08Rik 8430426J06Rik 9330104G04Rik A230028O05Rik A630095N17Rik A930005H10Rik Aamp Abca4 Abcb8 Abhd13 Abhd2 Abtb1 Acadvl Acat1 Acin1 Acot7 Add2 Aebp2 Aldh3a2 Aldoa Als2cl Amz2 Ankrd11 Ankrd22 Ankrd35 Ankrd50 Anxa1 Ap5s1 Arf1 Arf2 Arhgap21 Arhgap40 Arih1 Arl16 Arl6ip5 Arrdc2 Atf3 Atg2a Atg9b Atl3 Atp5d Atp6v1a Atxn1 Avpi1 AW549877 B230216N24Rik B3galt6 B3gntl1 BC005624 Bcar1 Bcl2l1 Birc6 Blnk Bsdc1 Btbd19 Bvht Cadm4 Capn2 Capn5 Capns1 Capzb Card19 Casc1 Casp9 Cbarp Ccdc12 Ccdc47 Ccdc63 Ccm2 Cd164l2 Cd68 Cdc27 Cdc42ep1 Cdh1 Cdk14 Cdk19 Cdt1 Cep57 Cfl1 Cflar Chmp6 Chpf Cic Clcf1 Cldn3 Clec2f Clic1 Clip2 Clk3 Clu Cmip Cntd1 Coa3 Coa4 Cops3 Coq10b Coq8b Cpeb2 Crk Crocc Crybg2 Ctsd Ctsz Cttnbp2nl Cxcl2 Cyb561 Dact3 Dcaf11 Ddx39a Ddx42 Ddx54 Deaf1 Dhx58 Dip2a Dkkl1 Dlg4 Dmac2l Dmpk Dnajc16 Dock9 Dolpp1 Dot1l Dph5 Drc3 Dstn Dusp2 Dynlt4 Edem3 Eif2ak3 Eif4a1 Eif5 Eif5a Ell Eno1 Ensa Enthd1 Erbb3 Ercc1 Esrra Etfrf1 Etv5 F3 Fam187b Fam220a Fam43a Fam72a Fam76a Fam76b Fbrsl1 Fbxo11 Fbxo24 Fbxo4 Fkbp2 Fnbp1l Fosl2 Fzd5 Fzr1 G3bp1 Gabarapl2 Gal3st3 Gatad2a Gfpt1 Glrp1 Glrx5 Glud1 Gm10646 Gm11563 Gm15612 Gm15941 Gm2093 Gm21284 Gm21992 Gm38670 Gm4651 Gm7854 Gm9895 Gmfg Gns Gpr3 Grcc10 Gsk3b Gstm7 Gsto1 Gtdc1 Gtpbp2 Gys1 H2-Ab1 H2ac7 H2ac8 H2bc7 H2bc8 H3c4 Hbegf Hdac3 Hdac4 Helz Herpud2 Hgs Higd1a Hmg20b Hp1bp3 Hsd3b7 Hsp90aa1 Hspb6 Hyal1 Hyal3 Ifnar1 Igsf3 Il1f6 Il23a Il6ra Il6st Ilf3 Ing1 Ino80c Ippk Iqcd Iqgap1 Irf2bp2 Isg20 Itpkc Izumo4 Jarid2 Jun Jup Kat8 Kcnn1 Kctd11 Kif17 Klf10
Klhl26 Klhl6 Kpnb1 Krt12 Krt25 Krt8 L2hgdh Lactb2 Lad1 Lamc1 Lce3a Lig4 Limk1 Liph Lipm Lmna Lpo Lrch4 Lrp8 Lsm14b Lsr Ly6g6c Maff Map2k3 Map2k3os Map3k11 Map3k12 Map3k14 Map7d1 Mapk8 Marchf4 Mark2 Max Mboat7 Mcm4 Mdc1 Mgat4b Mgst3 Mia Midn Mien1 Mir129-1 Mir1946b Mir210 Mir21a Mir22 Mir22hg Mir3069 Mir3082 Mir3100 Mir3101 Mir3109 Mir324 Mir5123 Mir6924 Mir6926 Mir6945 Mir6962 Mir6965 Mir7008 Mir7039 Mir7671 Mir7678 Mir8098 Mirlet7c-1 Mknk2 Mks1 Mllt11 Mms19 Mnt Mob3a Mpig6b Mrps28 Msh5 Mtmr11 Muc21 Mus81 Mxd1 Myl12a Myl12b Myl4 Mylpf Myom1 Naa50 Naa80 Nbeal2 Ndrg1 Ndufs8 Nfe2l1 Nipal3 Noct Nop53 Npas2 Nr1d1 Nt5m Nub1 Nup98 Obp2a Odad3 Opa3 Orai2 Osbpl2 Oxsr1 Pa2g4 Pafah1b2 Pard6b Park7 Parp3 Pawr Paxx Pccb Pcif1 Pcmtd1 Pcnx3 Pcsk4 Pcx Pdhb Pdlim7 Pex7 Pgap2 Pgs1 Phf12 Phf2 Piezo1 Pigz Pik3r2 Pik3r3 Pim1 Pip5k1a Pip5k1c Pkm Pla2g2e Pla2g5 Pla2g6 Plaur Plch1 Plek2 Plekhg6 Plekhm1 Plet1 Plk3 Pmp22 Pnkd Pnrc2 Ppl Ppp1r15b Ppp1r2 Ppp2r5a Prdm10 Prkcsh Prkdc Proser3 Prrc2a Prss22 Prss56 Prss8 Psma3 Ptgfrn Ptpn6 Ptprh Ptprs Pxmp4 Pycrl Rab11b Rab11fip1 Rab40c Rab4b Ranbp10 Rap2a Rasal1 Rassf1 Rbbp4 Rbm14 Rbm15 Rccd1 Reep6 Rell2 Rfx1 Rheb Rhof Rhpn2 Riok3 Rita1 Rmi2 Rmnd5a Rnf224 Rnft2 Rp1l1 Rpl26 Rpl41 Rplp0 Rps6ka5 Rrp9 Rsph3a Rtn4 S100a10 S100a11 S100a3 Sap130 Scarna13 Sdcbp2 Sdf2l1 Sdf4 Sec11c Sec16a Senp3 Sertad2 Sf3b2 Sfr1 Sgk2 Sgms2 Sgsm3 Sh2d5 Sh3bgrl2 Sh3pxd2b Shld2 Sik3 Ski Slc22a5 Slc25a42 Slc25a48 Slc27a1 Slc35b1 Slc39a4 Slc66a2 Smarca4 Smg9 Snai3 Snd1 Snhg10 Snx11 Snx33 Sp1 Spag7 Spring1 Sprr1a Spsb2 Sptan1 Spty2d1 Sqstm1 Srgap2 Srgap3 Srxn1 Stam2 Stambp Stpg1 Strn3 Stub1 Suco Sypl Syt8 Tarbp2 Tatdn2 Tbc1d10b Tbl2 Tcf7l2 Tdrd7 Tead1 Tex2 Tff1 Tgif1 Tiam1 Ticam1 Tinagl1 Tjp3 Tlcd2 Tlcd3b Tle4 Tlr4 Tlr5 Tmc6 Tmc8 Tmco6 Tmem198 Tmem259 Tmem40 Tmem80 Tmem95 Tmie Tmpo Tnfrsf9 Tnfsf13 Tom1l2 Tpcn1 Trib1 Trim58 Trim6 Trim8 Tsen34 Tsnaxip1 Tspan1 Tubb5 Txnrd3 U2af2 U2surp Uap1l1 Ubald2 Ubap1l Ubc Ube2m Ube2v1 Ubtd1 Uox Upk3bl Urah Usp20 Usp22
Usp32 Vasn Vill Vps72 Wbp2 Wdfy2 Wdr1 Wdr19 Wdr47 Wdr89 Wdsub1 Wfdc2 Wfs1 Wwc1 Yipf6 Ypel3 Ywhag Zbtb5 Zbtb8os Zc3h10 Zc3h4 Zc3hav1 Zfp1 Zfp367 Zfp687 Zfp703 Zswim4
587 genes with significant ARID1A in vivo promoter binding (CUT&RUN, n = 2, MACS2 FDR < 0.25 broadPeaks compared to IgG control) within 3 kb of the TSS. ARID1A binding was profiled in vivo in 100,000 sorted adult mouse EPCAM+ endometrial epithelial cells.

Figure C.1 TP53 mutations in endometrial cancers profiled by TCGA-UCEC
a, PIK3CA co-alteration rate among ARID1A- and TP53-altered UCEC tumors. b, Lollipop plot of TP53 mutations across TCGA-UCEC data (Pan-Can cohort). c, Distribution of TP53 mutations by type in TCGA-UCEC serous subtype primary tumors. d, Kaplan-Meier overall survival curves for TP53 mutant UCEC tumors segregated by type of TP53 mutation: missense vs. truncating. Statistic is log-rank test. e, Distribution of tumor grade among TP53 mutant UCEC tumors segregated by type of TP53 mutation: missense vs. truncating. Statistic is chi-squared test.

Figure C.2 Additional histopathological characterization of TP53/PIK3CA mutant mice
a, Representative low-magnification H&E histology of control mouse uterus. b, Additional representative H&E histology of TP53/PIK3CA mutant uterus (approximately 76 days old) at varying magnifications. Arrowheads depict endometrial epithelia. c, Staining for KRT8 (left), a marker of endometrial epithelium, and phospho-S6 (right), a marker of activated PI3K signaling. Arrowheads depict mutant endometrial epithelia.

Figure C.3 EPCAM endometrial epithelial cell purification statistics
a, Example flow cytometry analysis of EPCAM purity from magnetically sorted LtfCre0/+; (Gt)R26Pik3ca*H1047R; Trp53fl/fl (TP53/PIK3CA mutant) mouse endometrial epithelial cells. b, Purity of EPCAM-isolated cell populations for each sample sequenced by RNA-seq. Mean purity ± SD (%) among sequenced samples was 84.6 ± 6.9.
Purity of the previously reported control group was 87.7 ± 5.4.

Figure C.4 Differences in epithelial-mesenchymal transition following TP53 vs. ARID1A loss
a, Broad GSEA waterfall plots for the MSigDB Hallmark epithelial-mesenchymal transition (EMT) pathway in cells from each genetic mouse model compared to controls. b, k-means clustering (k = 3) of differential gene expression in TP53/PIK3CA mutant and ARID1A/PIK3CA mutant endometrial epithelial cells compared to controls for 183 mouse orthologs within the Hallmark EMT pathway. Red values indicate gene upregulation in mutant cells, and blue values indicate downregulation. c, Relative expression box-dot plots summarizing gene expression changes in each of the k = 3 clusters for each genetic mouse model compared to control cells. Statistic is unpaired, two-tailed Wilcoxon test. *** p < 0.001. d-e, Zoom into cluster 1 genes (n = 60) labeled by human ortholog. Red, bolded genes are further displayed in e as a box-dot plot. Statistic is FDR as reported by DESeq2 Wald test: * FDR < 0.05; ** FDR < 0.01; *** FDR < 0.001.

Figure C.5 Pathway analysis of distinct expression programs in TP53- and ARID1A-loss-driven hyperplasia
Enrichment for Hallmark pathways, GO Biological Process gene sets, and oncogenic signatures (all retrieved from MSigDB) among genes DE between TP53/PIK3CA mutant vs. ARID1A/PIK3CA mutant endometrial epithelial cells, separated by directionality.

Figure C.6 Pathway alterations driven by TP53 and ARID1A mutations are associated with histological subtype
a, Broad GSEA results for Hallmark pathways between TCGA-UCEC tumors: ARID1A mutant / TP53 wild-type vs. wild-type / wild-type compared to TP53 mutant / ARID1A wild-type vs. wild-type / wild-type. b, Broad GSEA results for Hallmark pathways between TCGA-UCEC tumors: ARID1A mutant / TP53 wild-type vs. TP53 mutant / ARID1A wild-type compared to endometrioid vs. serous.
Significant correlation of pathway enrichment is observed between genetics and subtype by Pearson (r) and Spearman (rs) correlations. c, Correlation of GO Biological Process gene sets as in b. d, Top, phi correlation and associated statistic for dependent samples classified both by histotype (endometrioid vs. serous) and by genotype (TP53mut/ARID1Awt vs. ARID1Amut/TP53wt). Bottom, Fisher’s Z-transformations comparing the Pearson correlation coefficients between the phi correlation and GSEA results.

Figure C.7 GO Biological Process gene set overlaps between mouse and human genotypes
Overlap of enriched gene sets (|NES| > 1) from the Fig. 4.4 GSEA for the mouse and human genetic comparisons displayed, further segregated by upregulated vs. downregulated gene sets. Significant overlap indicates that more enriched gene sets were observed in both comparisons than expected by chance. Statistic is hypergeometric enrichment.

Figure C.8 Differential gene expression analysis of TP53 vs. ARID1A mutant mouse and human samples
a, 81 genes with significantly higher expression (human: limma FDR < 0.05; mouse: DESeq2 FDR < 0.05) in TP53 mutant samples compared to ARID1A mutants. b, 149 genes with significantly higher expression in ARID1A mutant samples compared to TP53 mutants, as in a. c, Top 10 enriched Hallmark pathways and GO Biological Process gene sets among the 81 genes identified in a. Gray text indicates non-significance (FDR > 0.05). d, Top enriched gene sets among the 149 genes identified in b, as in c.

Figure C.9 Extended GSEA results for disease and model genetic comparisons
a, Detailed Broad GSEA results for TCGA-UCEC ARID1A mutant / TP53 wild-type vs. wild-type / wild-type compared to TP53 mutant / ARID1A wild-type vs. wild-type / wild-type. Representative examples of highly enriched gene sets are labeled for each quadrant. b, Significantly over-represented terms in enriched gene sets (|NES| > 1) highlighted in a. Statistic is hypergeometric enrichment.
See Methods section 4.5.8 for the analysis framework. c-d, GSEA results for mouse TP53/PIK3CA mutant vs. control cells compared to ARID1A/PIK3CA mutant vs. control cells, as in a-b.

Figure C.10 Enrichment of gene expression alterations at TP53 core transcriptional program genes in mouse models
Proportion of significantly differentially expressed (DE) genes among mouse orthologs of TP53 core transcriptional program genes compared to all expressed genes, for (top) ARID1A/PIK3CA mutant and (bottom) TP53/PIK3CA mutant endometrial epithelia compared to cells from control mice. Statistic is hypergeometric enrichment test.

Figure C.11 PARADIGM pathway activity associated with ARID1A mutation in UCEC tumors
a, Distribution of PARADIGM score differences between ARID1A mutant (n = 174) vs. wild-type (n = 128) TCGA-UCEC tumors, considering only TP53 wild-type tumors, as in Fig. 4.5c. Top, all 19,503 measured pathways; bottom, the 36 pathways with keyword “p53”. b, Empirical distribution of mean differences of ARID1A mutant vs. wild-type PARADIGM scores, based on 50,000 samples of 36 random PARADIGM pathways, as in Fig. 4.5d. The blue line represents the mean score difference for the 36 pathways with keyword “p53”, with the associated permutation statistic.

Figure C.12 Extended analysis of ARID1A binding sites in in vivo endometrial epithelia
a, Genomic feature enrichment among 2146 ARID1A in vivo binding sites. b, Top significant (p < 10^-30) known motifs from HOMER sequence analysis of ARID1A in vivo binding sites compared to the background genome. Motif sequence logos are scaled by information content for each nucleotide base. c, Overlap of ARID1A/PIK3CA mutant DE genes (RNA-seq, DESeq2 FDR < 0.05, n = 3481) and genes with ARID1A binding detected within 50 kb of the TSS. Statistic is hypergeometric enrichment test. d, Top 10 (left) GO Biological Process gene sets and (right) Hallmark pathways enriched among 2887 human orthologs of genes with ARID1A binding within 50 kb of the TSS.
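Several captions in this appendix (for example Figures C.7, C.10 and C.12c) report hypergeometric enrichment for the overlap between two sets. With a universe of N items, K of which belong to the first set, and n items in the second set, the p-value for an observed overlap of k is the upper tail P(X >= k). A stdlib-only Python sketch follows; the function name and toy numbers are illustrative, and which genes or gene sets form the universe is the analysis choice that matters most and is not fixed by this code.

```python
from math import comb


def hypergeom_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    top = min(K, n)  # the overlap cannot exceed either set size
    # math.comb returns 0 when asked for more items than exist, so infeasible terms vanish
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, top + 1))
    return tail / comb(N, n)


# toy example: 10 expressed genes, 5 DE, 5 bound; all 5 bound genes are DE
print(hypergeom_enrichment_p(5, 10, 5, 5))  # 1/252, about 0.00397
```

If SciPy is available, the same tail should be given by scipy.stats.hypergeom.sf(k - 1, N, K, n), which is a convenient cross-check.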
Figure C.13 Additional TCGA-UCEC and MSK-IMPACT analysis of TP53/ARID1A co-altered tumors
a, Distribution of histologic grade among TCGA-UCEC primary tumors, segregated by TP53/ARID1A co-altered (n = 38) vs. all other tumors (n = 471). Statistic is two-tailed Fisher’s exact test. b, Frequency of mixed morphology subtype tumors across all TCGA-UCEC primary tumors (n = 509) compared to specifically TP53/ARID1A co-altered tumors (n = 38). Statistic is hypergeometric enrichment. c, Contingency table of TP53/ARID1A co-mutation rate in all primary vs. metastatic tumors from the MSK-IMPACT Clinical Sequencing Cohort. Statistic is one-tailed Fisher’s exact test. d, Contingency table of ARID1A co-mutation rate in TP53 mutant primary vs. metastatic tumors from MSK-IMPACT, as in c. e, TP53/ARID1A co-mutation rates of all MSK-IMPACT tumor samples segregated by presence or absence of POLE mutations. Statistic is two-tailed Fisher’s exact test.

Figure C.14 TP53/ARID1A/PIK3CA mutant mouse survival and marker staining
a, Survival curves for LtfCre0/+; (Gt)R26Pik3ca*H1047R; Trp53fl/fl mice with and without an additional Arid1afl/fl allele (TP53/PIK3CA mutant and TP53/ARID1A/PIK3CA mutant mice). Statistic is Cox log-rank test. b, Representative IHC marker analysis in TP53/ARID1A/PIK3CA mutant mice. Left, phospho-S6; right, ARID1A staining. Arrowheads denote mutant endometrial epithelia.

Figure C.15 Representative proliferation and caspase-mediated cell death marker IHC
Additional representative IHC images of Ki67 (proliferation) and cleaved caspase-3 (cell death) in TP53/PIK3CA mutant (a and c) and TP53/ARID1A/PIK3CA mutant (b and d) mouse uterus, respectively, accompanying Fig. 4.7f. Arrowheads denote mutant endometrial epithelia.

Figure C.16 ARID1A loss-induced ATF3 and TP63 are associated with invasive transcriptional signatures
a, Atf3 gene expression (linear) in mutant mouse endometrial epithelial cell RNA-seq data.
Statistic is FDR as reported by DESeq2. b, Significant correlation of Trp63 and Snai2 gene expression in experimental mouse endometrial epithelial cells. Expression is quantified as log2(normalized counts + 1). Statistics are Pearson (r) and Spearman (rs) coefficients. c, Workflow for identification of the TP63 target gene network induced by ARID1A mutation (n = 30 genes), beginning with 180 high-confidence TP63 target genes (Riege et al. 2020). d, RNA-seq relative expression heatmap for genetic mouse models and UCEC primary tumor samples for the TP63 target gene network induced by ARID1A mutation. Red, bolded genes display similar expression patterns in genetic mouse models and human UCEC tumors. e, Top GO Biological Process gene sets enriched (hypergeometric enrichment p < 0.001) in the TP63 target gene network induced by ARID1A mutation. Hash (#) represents the number of target genes found within each gene set. f, Broad GSEA results for the TP63 target gene network induced by ARID1A mutation among human UCEC primary tumors segregated by ARID1A and TP53 genetic status. Significant enrichment was observed in ARID1Amut/TP53wt vs. wt/wt tumors.

Figure C.17 Further representative marker staining of mutant mouse models and vaginal pseudostratified squamous epithelium
a, Representative uterine TP63 (Cell Signaling) staining in mutant mouse models. Arrowheads denote endometrial epithelium. b-c, Further representative uterine (b) TP63 (GeneTex) and (c) COL17A1 staining in TP53/ARID1A/PIK3CA mutant mice. Arrowheads denote mutant endometrial epithelia. d, Representative H&E and IHC staining for ATF3, TP63 (Cell Signaling), and COL17A1 in vaginal pseudostratified squamous epithelium of wild-type CD-1 mice. Arrowheads denote basal epithelial cells expressing markers.
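Figure C.16b reports both Pearson (r) and Spearman (rs) coefficients for Trp63 vs. Snai2 expression quantified as log2(normalized counts + 1). Spearman's coefficient is Pearson's r computed on ranks, with tied values sharing their average rank, so it responds only to monotonic association. A stdlib-only Python sketch (function names and toy values are illustrative, and significance testing is omitted):

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation rs: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log2_plus_one(counts: Sequence[float]) -> List[float]:
    """log2(normalized counts + 1) transform applied before correlating expression."""
    return [math.log2(c + 1) for c in counts]


# a monotonic but non-linear relationship: rs is exactly 1, r is slightly lower
x, y = [1, 2, 3, 4], [1, 4, 9, 16]
print(round(pearson(x, y), 3))  # 0.984
print(spearman(x, y))           # 1.0
```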
APPENDIX D Supplementary material for Chapter 5

Table D.1 Evidence for possible driver SMARCA4 mutations within uterine endometrial cancer
Sample ID | Cancer Type | Protein Change | OncoKB annotation | Cancer Hotspot | Mutation Type | Copy # | Chr. | Position | Ref. | Var. | # Mut in sample
TCGA-BS-A0V4-01 | Endometrioid | R1243W | Predicted Oncogenic | Yes | Missense | Diploid | 19 | 11144146 | C | T | 323
TCGA-AX-A06F-01 | Endometrioid | R1192H | Predicted Oncogenic | Yes | Missense | Diploid | 19 | 11143994 | G | A | 7,282
TCGA-BS-A0UV-01 | Endometrioid | R1192H | Predicted Oncogenic | Yes | Missense | Diploid | 19 | 11143994 | G | A | 8,968
TCGA-FI-A2D5-01 | Endometrioid | R1192H | Predicted Oncogenic | Yes | Missense | Diploid | 19 | 11143994 | G | A | 13,865
TCGA-BS-A0TJ-01 | Endometrioid | G1232S | Likely Oncogenic | Yes | Missense | Diploid | 19 | 11144113 | G | A | 700
TCGA-FI-A2D0-01 | Endometrioid | G1232S | Likely Oncogenic | Yes | Missense | Diploid | 19 | 11144113 | G | A | 6,445
TCGA-B5-A1MW-01 | Endometrioid | G271Rfs*16 | Likely Oncogenic | No | Frameshift | Diploid | 19 | 11097624 | - | C | 538
TCGA-E6-A2P8-01 | Mixed | M272Cfs*31 | Likely Oncogenic | No | Frameshift | Diploid | 19 | 11097625 | C | - | 561
TCGA-PG-A6IB-01 | Serous | M272Cfs*31 | Likely Oncogenic | No | Frameshift | Diploid | 19 | 11097625 | C | - | 434
TCGA-A5-A0G1-01 | Serous | D1638= | Likely Oncogenic | No | Splice | Diploid | 19 | 11172462 | C | T | 10,820
TCGA-AX-A06F-01 | Endometrioid | X758_splice | Likely Oncogenic | No | Splice | Diploid | 19 | 11121208 | G | A | 7,282
TCGA-AX-A1CF-01 | Endometrioid | Y994* | Likely Oncogenic | No | Nonsense | Shallow deletion | 19 | 11135015 | C | G | 92
TCGA-DI-A1BU-01 | Mixed | X415_splice | Likely Oncogenic | No | Splice | Diploid | 19 | 11100121 | T | C | 7,642
Summary of possible BRG1 driver mutations within TCGA-UCEC (Uterine Corpus Endometrial Carcinoma) data retrieved from cBioPortal (Pan-Cancer Atlas data, n = 509 tumors with copy-number and mutation data) annotated as “predicted oncogenic” or “likely oncogenic” by OncoKB. Four bolded tumor samples display deleterious BRG1 mutations with a relatively low background mutation rate (# Mut in sample).
Figure D.1 Hallmark pathway enrichment among 12Z siBRG1/siARID1A DGE
Gene set enrichment analysis results for (top) MSigDB Hallmark pathways and (bottom) GO BP gene sets among upregulated and downregulated DE genes following siBRG1 or siARID1A treatment compared to control 12Z cells.

Figure D.2 Transcription factor networks among direct activating SWI/SNF target genes
MSigDB GSEA results for GTRD transcription factor (TF) target gene sets, showing 18 significantly enriched (FDR < 0.05) gene sets among the 35 direct activating SWI/SNF target genes (described in Fig. 5.3a). Hash (#) indicates the number of SWI/SNF target genes found within that TF target gene set. The gene set constituent matrix shows the pairwise membership of each gene in the enriched TF target gene sets, where black cells indicate that a gene belongs to that gene set. Genes are arranged from left to right by number of enriched gene sets.

Figure D.3 Representative images of individual uterine glands from LtfCre0/+; Brg1fl/+ and LtfCre0/+; Brg1fl/fl mice
Pictured glands are those denoted by arrowheads in Fig. 5.5.

Figure D.4 LtfCre0/+; Brg1fl/fl endometrial epithelial cell purification statistics
a, Flow cytometry analysis of EPCAM purity (based on PE labeling) of a representative LtfCre0/+; Brg1fl/fl endometrial epithelial cell population. b, Percent purity of EPCAM-isolated cell populations for each sample. The mean purity ± SD (%) among all LtfCre0/+; Brg1fl/fl samples was 82.3 ± 3.7. Purity of the previously reported control group was 87.7 ± 5.4.

APPENDIX E Supplementary material for Chapter 6

Figure E.1 Genome-wide chromatin features profiled in human 12Z endometrial epithelial cells
Examples of chromatin features near the LOXL1 gene (hg38 chr15: 73.880-74.080 Mb), including transcription (total RNA), chromatin accessibility (ATAC), histone post-translational modifications (H3K27me3, H3K4me3, H3K4me1, H3K27ac, H3K18ac), and ARID1A binding.
The y-axis represents assay signal or likelihood of detection. Small bars beneath signal tracks indicate significant detection, i.e., signal peaks (MACS2 FDR < 0.05, n = 2). Feature combinations can be used to infer function or activity at a given genomic region, such as active (H3K27ac+ ATAC+ H3K4me1+) vs. poised (H3K4me1+) enhancers, active promoters (H3K4me3+ ATAC+), super-enhancers (clusters of H3K27ac+++ regions defined through the ROSE algorithm), and repressed or silenced regions (H3K27me3+).

Figure E.2 Enhancer classification and additional differential histone modification analysis
a, Identification of super-enhancers (SE) and typical enhancers (TE) using ROSE. 26,963 H3K27ac regions with overlapping accessibility (ATAC) were used as input for ROSE, leading to identification of 413 active super-enhancers. Typical enhancers were defined as the remaining H3K27ac peaks after removing these super-enhancer regions and any overlapping gene promoters. b, Meta peak plots for H3K27ac at distal peaks within super-enhancers or typical enhancers, centered on the H3K27ac peak. The y-axis is signal as ChIP - Input reads per million (RPM) per bp per peak. c, Number of H3K27ac peaks per super-enhancer depicted as a Tukey-style boxplot (left) or a histogram (right). The median number of peaks per super-enhancer is 3. d-h, Differential (d) H3K27ac, (e) H3K18ac, (f) H3K27me3, (g) H3K4me3 and (h) H3K4me1 ChIP-seq following ARID1A loss. MA plots display differential abundance with significant sites (FDR < 0.05) highlighted in red. The x-axis is signal abundance quantified as log2 counts per million (log2CPM), and the y-axis is the log2 fold-change (log2FC) difference of shARID1A vs. control conditions (n = 2 ChIP replicates per condition).
i-l, Genomic feature enrichment for all differential, increasing, or decreasing H3K27ac sites, compared to all tested H3K27ac regions, in (i) promoters, (j) exons, (k) introns, or (l) intergenic regions. Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher's exact test.

Figure E.3 ARID1A and P300 co-regulation of promoters and gene expression
a-b, Differentially expressed genes from (a) LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl mice compared to controls (FDR < 0.05, n = 3008 human orthologs) or (b) genes bound by ARID1A and differentially expressed upon ARID1A loss in 12Z cells (FDR < 0.05, n = 980) were analyzed using the Enrichr web tool for overlap with cofactor binding data from the ENCODE database. Plots show the significance (log10FDR, y-axis) of overlapping datasets, ranked by FDR (x-axis). P300 is the most significant cofactor in both datasets (arrow). c, Enrichment of significant genomic features among P300 ChIP peaks, ranked by significance. The enrichment ratio is calculated as the bp of each feature in the ChIP peak set compared to the background genome. d, Heatmap of ARID1A and P300 ChIP-seq signal at 18,277 gene promoters, arranged into groups based on significant binding of ARID1A, P300, both ARID1A and P300, or neither. e, Euler diagram of the overlap between expressed gene promoters bound by ARID1A (n = 2954) and P300 (n = 7097). Statistic is hypergeometric enrichment test. f, Euler diagram of the overlap between direct, functional target genes of ARID1A (n = 486) and P300 (n = 1580). Direct, functional target genes were defined as genes with ChIP promoter binding whose expression changed significantly upon knockdown (siARID1A or siP300). Statistic is hypergeometric enrichment test. g-j, Percentage of genes bound by ARID1A, EP300, or both among all genes (g) or among genes differentially expressed following knockdown of ARID1A (h) or EP300 (i), or following 100 nM A-485 treatment (j). Statistic is two-tailed Fisher's exact test.
k-m, Meta peak profiles of ARID1A binding (k), P300 binding (l), or chromatin accessibility (ATAC) (m) at promoters with ARID1A binding, P300 binding, both, or neither. The y-axis is ChIP - Input reads per bp per peak (k-l) or ATAC reads per bp per peak (m).

Figure E.4 Additional characterization of P300-deficient phenotypes
a, Measurement of 12Z cell growth 72 hours post-transfection. No significant differences were observed (unpaired, two-tailed t-test). Mean ± SD, n = 4. b, Cell cycle analysis of cells following siRNA co-treatment or A-485 treatment. Analysis was performed 72 hours after transfection and 48 hours after drug treatment. EdU incorporation and DyeCycle Ruby staining were measured by flow cytometry and used to determine cell cycle phase. No significant differences in S-phase (EdU) incorporation, a marker of proliferation, were observed (right plot). Statistic is unpaired, two-tailed Wilcoxon test. c, Measurement of 12Z cell growth following 72 hours of SAHA or TSA treatment. Cells were stained with calcein-AM for 1 hour prior to imaging. Data represent normalized fluorescence values relative to control (vehicle). Mean ± SD, n = 4. d, Invasion of 12Z cells following treatment with 316 nM SAHA or 10 nM TSA. Representative images of calcein-AM-stained cells and total invaded cell numbers are shown (scale bar = 500 μm). No significant differences were observed (two-tailed, unpaired t-test). Mean ± SD, n = 4. e, Illustration of mutant alleles used in this study. f, PCR genotyping results to detect LtfCre0/+, (Gt)R26Pik3ca*H1047R, Arid1afl, and Ep300fl. g, Representative gross images of mouse uteri and uterine tumors. White arrowheads indicate tumors. LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl (n = 16) and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice (n = 12) were sacrificed when vaginal bleeding was observed. LtfCre0/+; Ep300fl/fl mice were aged to 187 days (n = 6).
h, Additional histology and IHC staining for cleaved caspase-3 in LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mouse endometrium. Arrowheads indicate endometrial epithelium. Scale bars are as indicated (between 200 μm and 500 μm). i, Ki67 IHC staining and quantification. Representative images of Ki67 staining in control, LtfCre0/+; Ep300fl/fl, LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl, and LtfCre0/+; (Gt)R26Pik3ca*H1047R; Arid1afl/fl; Ep300fl/fl mice. Ki67+ cells and total cells were counted in the epithelium and stroma, and the ratio of % positive cells in the epithelium to % positive cells in the stroma is plotted. Mean ± SD, n = 3-5 mice, two-tailed, unpaired t-test, *** p < 0.001.

Figure E.5 Phenotypic characterization of cells following A-485 treatment
a, Measurement of 12Z cell growth following 72 hours of A-485 treatment. Data represent normalized fluorescence values relative to vehicle control. Mean ± SD, n = 4. b, Western blot depicting ARID1A knockdown in the shARID1A stable cell line. β-actin was used as a loading control. c, Viability assay for cells treated with A-485. Statistical analysis is presented in Fig. 6.4g. d, Annexin-V staining of cells treated with A-485 following siRNA treatment. Cells were treated with A-485 for 24 hours. Annexin-V-FITC signal was measured by flow cytometry, and H2O2 treatment was used as a positive control. The histogram represents the percentage of Annexin-V+ cells in each sample. Mean ± SD, n = 2. e, Invasion of 12Z cells following treatment with non-targeting siRNA or siARID1A and A-485 from 1 μM to 10 μM. Representative images of calcein-AM-stained cells and total invaded cell numbers are shown (scale bar = 500 μm). Mean ± SD, n = 4, two-tailed, unpaired t-test. f, Migration assay of 12Z cells expressing shARID1A and treated with A-485 from 10 nM to 1 μM. Images are representative of cells 24 hours after removal of the insert (scale bar = 500 μm).
The migration distance represents the average difference in the position of each migration front between 0 and 24 hours. Mean ± SD, n = 4. Two-tailed, unpaired t-tests were performed in comparison to the shARID1A + vehicle condition. * p < 0.05, ** p < 0.01.

Figure E.6 Effects of A-485 treatment on ARID1A and PIK3CA double-mutant 12Z cells
a, Western blot of ARID1A, β-actin, AKT, and P-AKT following co-transfection of non-targeting siRNA (control) + empty vector, siARID1A + empty vector, non-targeting siRNA + PIK3CAH1047R plasmid, or siARID1A + PIK3CAH1047R plasmid. b, Measurement of 12Z cell growth following 48 hours of A-485 treatment. Data represent normalized fluorescence values relative to vehicle control. Mean ± SD, n = 4. c, Western blot of ARID1A, β-actin, AKT, and P-AKT following co-transfection of non-targeting shRNA (control) + empty vector or shARID1A + PIK3CAH1047R plasmid. d, Invasion of 12Z cells following co-transfection with shARID1A and PIK3CAH1047R plasmid and A-485 treatment from 31 nM to 1 μM. Representative images of calcein-AM-stained cells and total invaded cell numbers are shown (scale bar = 500 μm). Mean ± SD, n = 4. Two-tailed, unpaired t-tests were performed in comparison to the shARID1A, PIK3CAH1047R + vehicle condition. e, Migration assay of 12Z cells following co-transfection with shARID1A and PIK3CAH1047R plasmid and A-485 treatment from 10 nM to 1 μM. Images are representative of cells 24 hours after removal of the insert (scale bar = 500 μm). The migration distance represents the average difference in the position of each migration front between 0 and 24 hours. Mean ± SD, n = 4. Two-tailed, unpaired t-tests were performed in comparison to the shARID1A, PIK3CAH1047R + vehicle condition. f, Caspase-Glo assay of 12Z cells in suspension following co-transfection with shARID1A and PIK3CAH1047R plasmid. Mean ± SD, n = 4, two-tailed, unpaired t-test. * p < 0.05, ** p < 0.01, *** p < 0.001.
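Several legends above (Figures E.4c, E.5a, and E.6b) report growth as fluorescence normalized to the vehicle control, and Figures E.5f and E.6e report an average migration distance. The short Python sketch below shows one way these two quantities could be computed. It is illustrative only and rests on two assumptions not stated in the legends: normalization is taken to mean division by the mean vehicle reading, and each front's migration is taken as the absolute change in its position between the 0 h and 24 h images. The function names `normalize_to_vehicle` and `migration_distance` are hypothetical and are not taken from the original analysis.

```python
from statistics import mean


def normalize_to_vehicle(readings, vehicle_readings):
    """Express fluorescence readings relative to the mean vehicle-control reading.

    Assumption: "normalized relative to vehicle" is taken to mean division by
    the mean of the vehicle wells, so the vehicle group averages 1.0.
    """
    baseline = mean(vehicle_readings)
    if baseline == 0:
        raise ValueError("vehicle baseline must be non-zero")
    return [r / baseline for r in readings]


def migration_distance(fronts_0h, fronts_24h):
    """Average distance moved by the migration fronts between 0 and 24 hours.

    fronts_0h[i] and fronts_24h[i] are the positions (e.g., in micrometers) of
    the same front at the two time points. Assumption: each front's distance is
    the absolute change in position, so fronts closing a gap from opposite
    sides both contribute positive distances.
    """
    if len(fronts_0h) != len(fronts_24h) or not fronts_0h:
        raise ValueError("need one 0 h and one 24 h position per front")
    return mean(abs(end - start) for start, end in zip(fronts_0h, fronts_24h))


# Hypothetical numbers: two fronts on either side of a 1000 um gap.
print(normalize_to_vehicle([2.0, 4.0], [2.0, 2.0]))  # [1.0, 2.0]
print(migration_distance([0.0, 1000.0], [300.0, 700.0]))  # 300.0
```

Taking the absolute displacement keeps the two opposing fronts of a wound-closure gap from cancelling each other out. If positions were instead recorded as gap width, the distance would simply be the change in width divided by two.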
Figure E.7 SERPINE1/PAI-1 immunohistochemical staining in endometriosis patient samples
a, IHC staining for SERPINE1/PAI-1 in human deep infiltrating and ovarian endometriosis patient tissue samples. Scale bars = 200 μm. b, IHC quantification of SERPINE1/PAI-1 as epithelial H-score, comparing ARID1A-expressing (n = 10) vs. ARID1A-deficient (n = 2) lesions. Statistic is two-tailed, unpaired t-test. * p < 0.05.

APPENDIX F
Supplementary material for Chapter 7

Figure F.1 H3.3 analysis in histone wild-type CCLE lines
Repeated analysis of H3.3 and H3.1 abundance in CCLE lines that are wild-type for all 74 human histone genes annotated by Nacev et al. (2019). Statistic is two-tailed, unpaired Welch's t-test. a, H3.3 (left) and H3.1 (right) peptide abundances in ARID1A mutant vs. wild-type pan-cancer lines, as in Fig. 7.1c. b, H3.3 (left) and H3.1 (right) peptide abundances in ARID1A mutant vs. wild-type endometrial cancer lines, as in Fig. 7.1d.

Figure F.2 Additional ARID1A knockdown differential H3.3 data
a, Immunoblot for ARID1A, compared to β-actin loading control, for 12Z cells transduced with non-targeting control shRNA or shARID1A (ARID1A knockdown). The wild-type condition is shown as a reference. b, Immunoblots for H3.3 compared to total H3 in siARID1A-treated (ARID1A knockdown by siRNA) vs. non-targeting control siRNA-treated 12Z cells. No visual difference in H3.3 levels is observed following ARID1A knockdown. c-d, Additional characterization of differential H3.3 abundance following ARID1A knockdown. (c) Typical enhancers are enriched for H3.3 alterations compared to all tested regions, whereas H3.3+ regions within gene promoters and super-enhancers are affected less than expected by chance. (d) However, both typical enhancers and super-enhancers that show significant alterations in H3.3 almost universally show decreased, rather than increased, H3.3 abundance. Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher's exact test.
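Many legends in these appendices report a "hypergeometric enrichment test" (for example, for Euler diagram overlaps) and a "two-tailed Fisher's exact test". As a reference point, a minimal standard-library Python sketch of both tests follows. It assumes the textbook definitions: the enrichment p-value is the upper tail P(X >= k) of the hypergeometric distribution for the overlap of two sets drawn from a shared gene universe, and the two-tailed Fisher p-value sums the probabilities of all 2x2 tables with the same margins that are no more likely than the observed table (the convention used by R's `fisher.test`). The function names are hypothetical; this is not the code used to produce these figures.

```python
from math import comb


def hypergeom_enrichment_p(overlap, universe, size_a, size_b):
    """One-sided hypergeometric enrichment p-value, P(X >= overlap).

    X is the number of items shared by a set of size_a and a set of size_b,
    both drawn from `universe` items, under the null of no association.
    """
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe - size_a, size_b - i)
        for i in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Probability of a table whose top-left cell is x, with margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = table_prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Toy checks with small, hypothetical counts.
print(hypergeom_enrichment_p(5, 10, 5, 5))  # 1/252, about 0.00397
print(fisher_exact_two_tailed(3, 1, 1, 3))  # 34/70, about 0.4857
```

In practice, library routines such as SciPy's `scipy.stats.hypergeom.sf(k - 1, M, n, N)` and `scipy.stats.fisher_exact` would typically be used. The exact-integer version here only makes the definitions explicit, including the role of the gene universe (e.g., the reduced 18,077-gene universe in Figure F.8j).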
Figure F.3 Additional H3.3 knockdown functional analysis
a, Example cell cycle analysis following 72 hours of siRNA transfection. EdU incorporation and DyeCycle Ruby staining were quantified by flow cytometry as measures of cell cycle phase. b, Quantification of cell cycle phases (G1, S, and G2/M) in control non-targeting siRNA-treated and siH3F3B-treated (H3.3 knockdown) 12Z cells. c, Euler diagram displaying the overlap of active H3.3-marked gene promoters and gene expression alterations following H3.3 knockdown (siH3F3B). Statistic is hypergeometric enrichment test.

Figure F.4 Example locus showing additional chromatin features profiled in 12Z cells
Example IGV screenshot displaying chromatin features profiled in 12Z cells. The features newly added to our revised 12-feature ChromHMM chromatin state model are H2A.Z, acetyl-H2A.Z (H2A.Zac), pan-acetyl H4 (pan-H4ac), H4K16ac, and H3.3. ARID1A, CHD4, and ZMYND8 are included for further context and as examples of chromatin co-regulation.

Figure F.5 Peptide specificity of anti-acetyl H2A.Z (K4/K7)
Hybridization results for two histone peptide array replicates probed with anti-acetyl H2A.Z (K4/K7) (Cell Signaling, D3V1I), demonstrating clear specificity for acetylated H2A.Z peptides.

Figure F.6 ChromHMM 12-feature chromatin state model optimization
Determination of the optimal number of chromatin states, as described by Gorkin et al. (2020). See Methods section 7.5.9 for further details. a-b, Examples of correlating state feature emission probabilities from the most complex model (40-state) with those from less complex models (the first optimization strategy). In these examples, feature emissions from state 20 of the 40-state model are correlated with feature emissions of states from less complex models. a, Comparison with state 7 (left, Pearson r = 0.796) and state 9 (right, Pearson r = -0.454) of the 10-state model.
The maximum correlation between state 20 of the 40-state model and any 10-state model state is Pearson r = 0.796. b, Comparison with state 22 (left, Pearson r = 0.981) and state 7 (right, Pearson r = 0.267) of the 22-state model. The maximum correlation between state 20 of the 40-state model and any 22-state model state is Pearson r = 0.981. c, Clustered heatmaps of all emission probabilities from all tested models (left, n = 810 total modeled chromatin states), followed by example k-means clustering (right) (the second optimization strategy). k-means clustering was performed from k = 5 to k = 40 clusters (matching the numbers of states in the tested chromatin state models) to measure goodness of fit by sum-of-squares (between-cluster vs. total). Within-cluster variance decreases as the number of clusters k increases. d, First-strategy model selection plot based on the maximum correlation of emissions from the most complex model (40-state) with those of all simpler models. The y-axis is the median, across all 40-state model chromatin states, of the correlation with the best-matching state of each simpler model. For example, when each 40-state model state is matched to its best-correlated 20-state model state, the median Pearson r is ~0.95. The red line represents the 95% threshold. e, Second-strategy model selection plot based on the goodness of fit of all simpler models relative to the most complex model (40-state). The y-axis is the between-cluster vs. total sum-of-squares relative to that of the most complex model (40-state). The red line represents the 95% threshold.

Figure F.7 Chromatin accessibility repressed by ARID1A is associated with H4 acetylation
a, Euler diagram displaying overlapping genome-wide ChIP-seq peaks (called by MACS2) for pan-H4ac and ARID1A. 48.6% of pan-H4ac peaks are co-marked by ARID1A binding. b, Euler diagram displaying overlapping gene promoters marked by pan-H4ac (H4ac+) and ARID1A binding. Statistic is hypergeometric enrichment test.
c, Heatmap displaying ChIP-seq signal across 10,558 genome-wide pan-H4ac ChIP-seq peaks. Signal is quantified as ChIP - Input. Peaks are ranked by overall pan-H4ac signal and stratified by ARID1A binding and by promoter (<3 kb from a TSS) vs. distal (>3 kb from a TSS) location. d, Annotation and directional breakdown of 6,205 significant (FDR < 0.05) differentially accessible genomic regions, measured by ATAC-seq, following ARID1A depletion by siRNA (siARID1A). Regions are further segregated by ARID1A binding status. e, Association of pan-H4ac with differential ATAC regions directly bound by ARID1A, separated by direction of accessibility change, genome-wide (left), at promoters (center), and at distal elements (right). Statistics are hypergeometric enrichment test and two-tailed Fisher's exact test.

Figure F.8 siZMYND8 and siCHD4 gene expression analysis
a, Immunoblot for (top) ZMYND8 compared to (bottom) β-actin loading control in 12Z cells treated with non-targeting control siRNA or siZMYND8 (ZMYND8 knockdown). b, RNA-seq expression of ZMYND8 in control and siZMYND8 cells. Statistic is FDR-corrected DESeq2 Wald test. c, Volcano plot of differential gene expression between siZMYND8 and control cells. FDR < 0.001 was used as the significance threshold. Top significant ZMYND8-dependent genes are labeled. d-f, CHD4 knockdown and RNA-seq analyses, as in a-c. g, Euler diagram displaying the overlap between siZMYND8 DGE and ZMYND8 promoter-bound genes. Statistic is hypergeometric enrichment test. h, Euler diagram displaying the overlap between siCHD4 DGE and CHD4 promoter-bound genes. Statistic is hypergeometric enrichment test. i, Left, association of siZMYND8 DGE with the ARID1A-H3.3 co-regulated gene classes described in Fig. 7.4f-h. Right, distribution of significantly upregulated (ZMYND8-repressed) vs. downregulated (ZMYND8-activated) siZMYND8 DE genes among ARID1A-H3.3 co-regulated gene classes. Statistic is two-tailed Fisher's exact test.
j, RNA-seq DGE (FDR < 0.001) overlap and enrichment across the four analyzed knockdown conditions: siARID1A, siH3F3B, siCHD4, and siZMYND8. The black diagonal cells represent the total number of significant DE genes in each condition. The bottom-left triangle displays the number of overlapping DE genes for each pair of knockdowns. The upper-right triangle displays the significance of overlap enrichment by hypergeometric enrichment test. A reduced universe of 18,077 genes expressed in all analyzed conditions was used. k, Overlap of FDR < 0.001 DGE sets across the four analyzed knockdown conditions. l, Overlap of FDR < 0.05 DGE sets across the four analyzed knockdown conditions, corresponding to Fig. 7.7f.

Figure F.9 H4K16ac enrichment at repressed mechanistic genes
Enrichment of H4K16ac at gene promoters or gene bodies of ARID1A-H3.3-ZMYND8-CHD4 co-repressed genes, i.e., genes that are upregulated (FDR < 0.05) following each of the siARID1A, siH3F3B, siZMYND8, and siCHD4 treatments, compared to all expressed genes with chromatin data. Statistic is hypergeometric enrichment test.

REFERENCES

Abbott, J. A. 2017. 'Adenomyosis and Abnormal Uterine Bleeding (AUB-A)-Pathogenesis, diagnosis, and management', Best Pract Res Clin Obstet Gynaecol, 40: 68-81. Achenbach, F., M. Rose, N. Ortiz-Bruchle, L. Seillier, R. Knuchel, V. Weyerer, A. Hartmann, R. Morsch, A. Maurer, T. H. Ecke, S. Garczyk, and N. T. Gaisa. 2020. 'SWI/SNF Alterations in Squamous Bladder Cancers', Genes (Basel), 11. Adams, J. R., K. Xu, J. C. Liu, N. M. Agamez, A. J. Loch, R. G. Wong, W. Wang, K. L. Wright, T. F. Lane, E. Zacksenhaus, and S. E. Egan. 2011. 'Cooperation between Pik3ca and p53 mutations in mouse mammary tumor formation', Cancer Res, 71: 2706-17. Adhikary, S., S. Sanyal, M. Basu, I. Sengupta, S. Sen, D. K. Srivastava, S. Roy, and C. Das. 2016.
'Selective Recognition of H3.1K36 Dimethylation/H4K16 Acetylation Facilitates the Regulation of All-trans-retinoic Acid (ATRA)-responsive Genes by Putative Chromatin Reader ZMYND8', J Biol Chem, 291: 2664-81. Ahmed, Z., and D. Ucar. 2017. 'I-ATAC: interactive pipeline for the management and pre-processing of ATAC-seq samples', PeerJ, 5: e4040. Aird, D., M. G. Ross, W. S. Chen, M. Danielsson, T. Fennell, C. Russ, D. B. Jaffe, C. Nusbaum, and A. Gnirke. 2011. 'Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries', Genome Biol, 12: R18. Akbay, E. A., C. G. Pena, D. Ruder, J. A. Michel, Y. Nakada, S. Pathak, A. S. Multani, S. Chang, and D. H. Castrillon. 2013. 'Cooperation between p53 and the telomere-protecting shelterin component Pot1a in endometrial carcinogenesis', Oncogene, 32: 2211-9. Alabert, C., and A. Groth. 2012. 'Chromatin replication and epigenome maintenance', Nat Rev Mol Cell Biol, 13: 153-67. Alfert, A., N. Moreno, and K. Kerl. 2019. 'The BAF complex in development and disease', Epigenetics Chromatin, 12: 19. Ali, S. H., and J. A. DeCaprio. 2001. 'Cellular transformation by SV40 large T antigen: interaction with host proteins', Semin Cancer Biol, 11: 15-23. Allhoff, M., K. Sere, F. Pires J, M. Zenke, and G. Costa I. 2016. 'Differential peak calling of ChIP-seq signals with replicates with THOR', Nucleic Acids Res, 44: e153. Allo, G., M. Q. Bernardini, R. C. Wu, M. Shih Ie, S. Kalloger, A. Pollett, C. B. Gilks, and B. A. Clarke. 2014. 'ARID1A loss correlates with mismatch repair deficiency and intact p53 expression in high-grade endometrial carcinomas', Mod Pathol, 27: 255-61. Aloni-Grinstein, R., Y. Shetzer, T. Kaufman, and V. Rotter. 2014. 'p53: the barrier to cancer stem cell formation', FEBS Lett, 588: 2580-9. Alotaibi, F. T., B. Peng, C. Klausen, A. F. Lee, A. O. Abdelkareem, N. L. Orr, H. Noga, M. A. Bedaiwy, and P. J. Yong. 2019.
'Plasminogen activator inhibitor-1 (PAI-1) expression in endometriosis', PLoS One, 14: e0219064. Alpsoy, A., and E. C. Dykhuizen. 2018. 'Glioma tumor suppressor candidate region gene 1 (GLTSCR1) and its paralog GLTSCR1-like form SWI/SNF chromatin remodeling subcomplexes', J Biol Chem, 293: 3892-903. Alver, B. H., K. H. Kim, P. Lu, X. Wang, H. E. Manchester, W. Wang, J. R. Haswell, P. J. Park, and C. W. Roberts. 2017. 'The SWI/SNF chromatin remodelling complex is required for maintenance of lineage specific enhancers', Nat Commun, 8: 14648. Ambros, R. A., M. E. Sherman, C. M. Zahn, P. Bitterman, and R. J. Kurman. 1995. 'Endometrial intraepithelial carcinoma: a distinctive lesion specifically associated with tumors displaying serous differentiation', Hum Pathol, 26: 1260-7. Amemiya, H. M., A. Kundaje, and A. P. Boyle. 2019. 'The ENCODE Blacklist: Identification of Problematic Regions of the Genome', Sci Rep, 9: 9354. Andrade, D. A. P., V. D. da Silva, G. M. Matsushita, M. A. de Lima, M. A. Vieira, Cemc Andrade, R. L. Schmidt, R. M. Reis, and R. Dos Reis. 2019. 'Squamous differentiation portends poor prognosis in low and intermediate-risk endometrioid endometrial cancer', PLoS One, 14: e0220086. Andrews, S. 2010. 'FastQC: A Quality Control Tool for High Throughput Sequence Data'. http://www.bioinformatics.babraham.ac.uk/projects/fastqc. Andrysik, Z., M. D. Galbraith, A. L. Guarnieri, S. Zaccara, K. D. Sullivan, A. Pandey, M. MacBeth, A. Inga, and J. M. Espinosa. 2017. 'Identification of a core TP53 transcriptional program with highly distributed tumor suppressive activity', Genome Res, 27: 1645-57. Anglesio, M. S., N. Papadopoulos, A. Ayhan, T. M. Nazeran, M. Noe, H. M. Horlings, A. Lum, S. Jones, J. Senz, T. Seckin, J. Ho, R. C. Wu, V. Lac, H. Ogawa, B. Tessier-Cloutier, R. Alhassan, A. Wang, Y. Wang, J. D. Cohen, F. Wong, A. Hasanovic, N. Orr, M. Zhang, M. Popoli, W. McMahon, L. D. Wood, A. Mattox, C. Allaire, J. Segars, C. Williams, C. Tomasetti, N. Boyd, K.
W. Kinzler, C. B. Gilks, L. Diaz, T. L. Wang, B. Vogelstein, P. J. Yong, D. G. Huntsman, and I. M. Shih. 2017. 'Cancer-Associated Mutations in Endometriosis without Cancer', N Engl J Med, 376: 1835-48. Arnold, J. T., D. G. Kaufman, M. Seppala, and B. A. Lessey. 2001. 'Endometrial stromal cells regulate epithelial cell growth in vitro: a new co-culture model', Hum Reprod, 16: 836-45. Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. 2000. 'Gene ontology: tool for the unification of biology. The Gene Ontology Consortium', Nat Genet, 25: 25-9. Aubrey, B. J., G. L. Kelly, A. Janic, M. J. Herold, and A. Strasser. 2018. 'How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression?', Cell Death Differ, 25: 104-13. Azizi, N., J. Toma, M. Martin, M. F. Khalid, F. Mousavi, P. W. Win, M. T. Borrello, N. Steele, J. Shi, M. P. di Magliano, and C. L. Pin. 2021. 'Loss of activating transcription factor 3 prevents KRAS-mediated pancreatic cancer', Oncogene, 40: 3118-35. Bachmann, C., H. Nguyen, J. Rosenbusch, L. Pham, T. Rabe, M. Patwa, G. Sokpor, R. H. Seong, R. Ashery-Padan, A. Mansouri, A. Stoykova, J. F. Staiger, and T. Tuoc. 2016. 'mSWI/SNF (BAF) Complexes Are Indispensable for the Neurogenesis and Development of Embryonic Olfactory Epithelium', PLoS Genet, 12: e1006274. Baergen, R. N., C. D. Warren, C. Isacson, and L. H. Ellenson. 2001. 'Early uterine serous carcinoma: clonal origin of extrauterine disease', Int J Gynecol Pathol, 20: 214-9. Bai, J., P. Mei, C. Zhang, F. Chen, C. Li, Z. Pan, H. Liu, and J. Zheng. 2013. 'BRG1 is a prognostic marker and potential therapeutic target in human breast cancer', PLoS One, 8: e59772. Bannister, A. J., and T. Kouzarides. 2011.
'Regulation of chromatin by histone modifications', Cell Res, 21: 381-95. Barbie, D. A., P. Tamayo, J. S. Boehm, S. Y. Kim, S. E. Moody, I. F. Dunn, A. C. Schinzel, P. Sandy, E. Meylan, C. Scholl, S. Frohling, E. M. Chan, M. L. Sos, K. Michel, C. Mermel, S. J. Silver, B. A. Weir, J. H. Reiling, Q. Sheng, P. B. Gupta, R. C. Wadlow, H. Le, S. Hoersch, B. S. Wittner, S. Ramaswamy, D. M. Livingston, D. M. Sabatini, M. Meyerson, R. K. Thomas, E. S. Lander, J. P. Mesirov, D. E. Root, D. G. Gilliland, T. Jacks, and W. C. Hahn. 2009. 'Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1', Nature, 462: 108-12. Barbieri, R. L. 2014. 'The endocrinology of the menstrual cycle', Methods Mol Biol, 1154: 145-69. Bartley, J., A. Julicher, B. Hotz, S. Mechsner, and H. Hotz. 2014. 'Epithelial to mesenchymal transition (EMT) seems to be regulated differently in endometriosis and the endometrium', Arch Gynecol Obstet, 289: 871-81. Basta, J., and M. Rauchman. 2015. 'The nucleosome remodeling and deacetylase complex in development and disease', Transl Res, 165: 36-47. Bedaiwy, M. A., T. Falcone, E. J. Mascha, and R. F. Casper. 2006. 'Genetic polymorphism in the fibrinolytic system and endometriosis', Obstet Gynecol, 108: 162-8. Bell, D. W., and L. H. Ellenson. 2019. 'Molecular Genetics of Endometrial Carcinoma', Annu Rev Pathol, 14: 339-67. Benjamini, Y., D. Drai, G. Elmer, N. Kafkafi, and I. Golani. 2001. 'Controlling the false discovery rate in behavior genetics research', Behav Brain Res, 125: 279-84. Berchuck, A., M. F. Kohler, J. R. Marks, R. Wiseman, J. Boyd, and R. C. Bast, Jr. 1994. 'The p53 tumor suppressor gene frequently is altered in gynecologic cancers', Am J Obstet Gynecol, 170: 246-52. Berg, A., E. A. Hoivik, S. Mjos, F. Holst, H. M. Werner, I. L. Tangen, A. Taylor-Weiner, W. J. Gibson, K. Kusonmano, E. Wik, J. Trovik, M. K. Halle, A. M. Oyan, K. H. Kalland, A. D. Cherniack, R. Beroukhim, I. Stefansson, G. B. Mills, C. Krakstad, and H. B.
Salvesen. 2015. 'Molecular profiling of endometrial carcinoma precursor, primary and metastatic lesions suggests different targets for treatment in obese compared to non-obese patients', Oncotarget, 6: 1327-39. Berger, A. C., A. Korkut, R. S. Kanchi, A. M. Hegde, W. Lenoir, W. Liu, Y. Liu, H. Fan, H. Shen, V. Ravikumar, A. Rao, A. Schultz, X. Li, P. Sumazin, C. Williams, P. Mestdagh, P. H. Gunaratne, C. Yau, R. Bowlby, A. G. Robertson, D. G. Tiezzi, C. Wang, A. D. Cherniack, A. K. Godwin, N. M. Kuderer, J. S. Rader, R. E. Zuna, A. K. Sood, A. J. Lazar, A. I. Ojesina, C. Adebamowo, S. N. Adebamowo, K. A. Baggerly, T. W. Chen, H. S. Chiu, S. Lefever, L. Liu, K. MacKenzie, S. Orsulic, J. Roszik, C. S. Shelley, Q. Song, C. P. Vellano, N. Wentzensen, Network Cancer Genome Atlas Research, J. N. Weinstein, G. B. Mills, D. A. Levine, and R. Akbani. 2018. 'A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers', Cancer Cell, 33: 690-705 e9. Bergeron, C., F. Amant, and A. Ferenczy. 2006. 'Pathology and physiopathology of adenomyosis', Best Pract Res Clin Obstet Gynaecol, 20: 511-21. Berglind, H., Y. Pawitan, S. Kato, C. Ishioka, and T. Soussi. 2008. 'Analysis of p53 mutation status in human cancer cell lines: a paradigm for cell line cross-contamination', Cancer Biol Ther, 7: 699-708. Bilyk, O., M. Coatham, M. Jewer, and L. M. Postovit. 2017. 'Epithelial-to-Mesenchymal Transition in the Female Reproductive Tract: From Normal Functioning to Disease Pathology', Front Oncol, 7: 145. Bioconductor Core Team, and BP Maintainer. 2016. 'TxDb.Hsapiens.UCSC.hg38.knownGene: Annotation package for TxDb object(s). R package version 3.4.0.'. Birk, O. S., D. E. Casiano, C. A. Wassif, T. Cogliati, L. Zhao, Y. Zhao, A. Grinberg, S. Huang, J. A. Kreidberg, K. L. Parker, F. D. Porter, and H. Westphal. 2000. 'The LIM homeobox gene Lhx9 is essential for mouse gonad formation', Nature, 403: 909-13. Bitler, B. G., K. M. Aird, A. Garipov, H. Li, M. Amatangelo, A. V.
Kossenkov, D. C. Schultz, Q. Liu, M. Shih Ie, J. R. Conejo-Garcia, D. W. Speicher, and R. Zhang. 2015. 'Synthetic lethality by targeting EZH2 methyltransferase activity in ARID1A-mutated cancers', Nat Med, 21: 231-8. Bjornstrom, L., and M. Sjoberg. 2005. 'Mechanisms of estrogen receptor signaling: convergence of genomic and nongenomic actions on target genes', Mol Endocrinol, 19: 833-42. Blanco, L. Z., Jr., D. E. Heagley, J. C. Lee, A. M. Gown, P. Gattuso, J. Rotmensch, A. Guirguis, S. Dewdney, and P. Bitterman. 2013. 'Immunohistochemical characterization of squamous differentiation and morular metaplasia in uterine endometrioid adenocarcinoma', Int J Gynecol Pathol, 32: 283-92. Blumli, S., N. Wiechens, M. Y. Wu, V. Singh, M. Gierlinski, G. Schweikert, N. Gilbert, C. Naughton, R. Sundaramoorthy, J. Varghese, R. Gourlay, R. Soares, D. Clark, and T. Owen-Hughes. 2021. 'Acute depletion of the ARID1A subunit of SWI/SNF complexes reveals distinct pathways for activation and repression of transcription', Cell Rep, 37: 109943. Bokhman, J. V. 1983. 'Two pathogenetic types of endometrial carcinoma', Gynecol Oncol, 15: 10-7. Bolstad, B. M., R. A. Irizarry, M. Astrand, and T. P. Speed. 2003. 'A comparison of normalization methods for high density oligonucleotide array data based on variance and bias', Bioinformatics, 19: 185-93. Boretto, M., N. Maenhoudt, X. Luo, A. Hennes, B. Boeckx, B. Bui, R. Heremans, L. Perneel, H. Kobayashi, I. Van Zundert, H. Brems, B. Cox, M. Ferrante, I. H. Uji, K. P. Koh, T. D'Hooghe, A. Vanhie, I. Vergote, C. Meuleman, C. Tomassetti, D. Lambrechts, J. Vriens, D. Timmerman, and H. Vankelecom. 2019. 'Patient-derived organoids from endometrial disease capture clinical heterogeneity and are amenable to drug screening', Nat Cell Biol, 21: 1041-51. Borgoni, S., E. Sofyali, M. Soleimani, H. Wilhelm, K. Muller-Decker, R. Will, A. Noronha, L. Beumers, P. J. Verschure, Y. Yarden, L. Magnani, A. H. C. van Kampen, P. D. Moerland, and S. Wiemann. 2020.
'Time-Resolved Profiling Reveals ATF3 as a Novel Mediator of Endocrine Resistance in Breast Cancer', Cancers (Basel), 12. Bosse, T., N. T. ter Haar, L. M. Seeber, P. J. v Diest, F. J. Hes, H. F. Vasen, R. A. Nout, C. L. Creutzberg, H. Morreau, and V. T. Smit. 2013. 'Loss of ARID1A expression and its relationship with PI3K-Akt pathway alterations, TP53 and microsatellite instability in endometrial cancer', Mod Pathol, 26: 1525-35. Bottardi, S., L. Mavoungou, H. Pak, S. Daou, V. Bourgoin, Y. A. Lakehal, B. Affar el, and E. Milot. 2014. 'The IKAROS interaction with a complex including chromatin remodeling and transcription elongation activities is required for hematopoiesis', PLoS Genet, 10: e1004827. Bowman, G. D., and M. G. Poirier. 2015. 'Post-translational modifications of histones that influence nucleosome dynamics', Chem Rev, 115: 2274-95. Boyd, K. E., and P. J. Farnham. 1997. 'Myc versus USF: discrimination at the cad gene is determined by core promoter elements', Mol Cell Biol, 17: 2529-37. Boyle, A. P., S. Davis, H. P. Shulha, P. Meltzer, E. H. Margulies, Z. Weng, T. S. Furey, and G. E. Crawford. 2008. 'High-resolution mapping and characterization of open chromatin across the genome', Cell, 132: 311-22. Bracken, A. P., G. L. Brien, and C. P. Verrijzer. 2019. 'Dangerous liaisons: interplay between SWI/SNF, NuRD, and Polycomb in chromatin regulation and cancer', Genes Dev, 33: 936-59. Bruner, K. L., L. M. Matrisian, W. H. Rodgers, F. Gorstein, and K. G. Osteen. 1997. 'Suppression of matrix metalloproteinases inhibits establishment of ectopic lesions by human endometrium in nude mice', J Clin Invest, 99: 2851-7. Bruse, C., A. Bergqvist, K. Carlstrom, A. Fianu-Jonasson, I. Lecander, and B. Astedt. 1998. 'Fibrinolytic factors in endometriotic tissue, endometrium, peritoneal fluid, and plasma from women with endometriosis and in endometrium and peritoneal fluid from healthy women', Fertil Steril, 70: 821-6. Bruse, C., D. Radu, and A. Bergqvist. 2004.
'In situ localization of mRNA for the fibrinolytic factors uPA, PAI-1 and uPAR in endometriotic and endometrial tissue', Mol Hum Reprod, 10: 159-66. Buenrostro, J. D., P. G. Giresi, L. C. Zaba, H. Y. Chang, and W. J. Greenleaf. 2013. 'Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position', Nat Methods, 10: 1213-8. Buenrostro, J. D., B. Wu, H. Y. Chang, and W. J. Greenleaf. 2015. 'ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide', Curr Protoc Mol Biol, 109: 21.29.1-21.29.9. Bull, J. R., S. P. Rowland, E. B. Scherwitzl, R. Scherwitzl, K. G. Danielsson, and J. Harper. 2019. 'Real-world menstrual cycle characteristics of more than 600,000 menstrual cycles', NPJ Digit Med, 2: 83. Bultman, S., T. Gebuhr, D. Yee, C. La Mantia, J. Nicholson, A. Gilliam, F. Randazzo, D. Metzger, P. Chambon, G. Crabtree, and T. Magnuson. 2000. 'A Brg1 null mutation in the mouse reveals functional differences among mammalian SWI/SNF complexes', Mol Cell, 6: 1287-95. Bulun, S. E. 2009. 'Endometriosis', N Engl J Med, 360: 268-79. Bulun, S. E., Y. H. Cheng, M. E. Pavone, Q. Xue, E. Attar, E. Trukhacheva, H. Tokunaga, H. Utsunomiya, P. Yin, X. Luo, Z. Lin, G. Imir, S. Thung, E. J. Su, and J. J. Kim. 2010. 'Estrogen receptor-beta, estrogen receptor-alpha, and progesterone resistance in endometriosis', Semin Reprod Med, 28: 36-43. Burney, R. O., and L. C. Giudice. 2012. 'Pathogenesis and pathophysiology of endometriosis', Fertil Steril, 98: 511-9. Calo, E., and J. Wysocka. 2013. 'Modification of enhancer chromatin: what, how, and why?', Mol Cell, 49: 825-37. Cancer Genome Atlas Research Network. 2011. 'Integrated genomic analyses of ovarian carcinoma', Nature, 474: 609-15. Cancer Genome Atlas Research Network, C. Kandoth, N. Schultz, A. D. Cherniack, R. Akbani, Y. Liu, H. Shen, A. G. Robertson, I. Pashtan, R. Shen, C. C. Benz, C. Yau, P. W. Laird, L. Ding, W. Zhang, G. B. Mills, R.
Kucherlapati, E. R. Mardis, and D. A. Levine. 2013. 'Integrated genomic characterization of endometrial carcinoma', Nature, 497: 67-73. Carlson, M., and Bioconductor Package Maintainer. 2015. 'TxDb.Scerevisiae.UCSC.sacCer3.sgdGene: Annotation package for TxDb object(s). R package version 3.2.2.'. Cerami, E., J. Gao, U. Dogrusoz, B. E. Gross, S. O. Sumer, B. A. Aksoy, A. Jacobsen, C. J. Byrne, M. L. Heuer, E. Larsson, Y. Antipin, B. Reva, A. P. Goldberg, C. Sander, and N. Schultz. 2012. 'The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data', Cancer Discov, 2: 401-4. Chakravarty, D., J. Gao, S. M. Phillips, R. Kundra, H. Zhang, J. Wang, J. E. Rudolph, R. Yaeger, T. Soumerai, M. H. Nissan, M. T. Chang, S. Chandarlapaty, T. A. Traina, P. K. Paik, A. L. Ho, F. M. Hantash, A. Grupe, S. S. Baxi, M. K. Callahan, A. Snyder, P. Chi, D. Danila, M. Gounder, J. J. Harding, M. D. Hellmann, G. Iyer, Y. Janjigian, T. Kaley, D. A. Levine, M. Lowery, A. Omuro, M. A. Postow, D. Rathkopf, A. N. Shoushtari, N. Shukla, M. Voss, E. Paraiso, A. Zehir, M. F. Berger, B. S. Taylor, L. B. Saltz, G. J. Riely, M. Ladanyi, D. M. Hyman, J. Baselga, P. Sabbatini, D. B. Solit, and N. Schultz. 2017. 'OncoKB: A Precision Oncology Knowledge Base', JCO Precis Oncol, 2017. Chan, R. W., and C. E. Gargett. 2006. 'Identification of label-retaining cells in mouse endometrium', Stem Cells, 24: 1529-38. 432 Chandler, R. L., J. Brennan, J. C. Schisler, D. Serber, C. Patterson, and T. Magnuson. 2013. 'ARID1a-DNA interactions are required for promoter occupancy by SWI/SNF', Mol Cell Biol, 33: 265-80. Chandler, R. L., J. S. Damrauer, J. R. Raab, J. C. Schisler, M. D. Wilkerson, J. P. Didion, J. Starmer, D. Serber, D. Yee, J. Xiong, D. B. Darr, F. Pardo-Manuel de Villena, W. Y. Kim, and T. Magnuson. 2015. 'Coexistent ARID1A-PIK3CA mutations promote ovarian clear-cell tumorigenesis through pro-tumorigenic inflammatory cytokine signalling', Nat Commun, 6: 6118. 
Chen, E. Y., C. M. Tan, Y. Kou, Q. Duan, Z. Wang, G. V. Meirelles, N. R. Clark, and A. Ma'ayan. 2013. 'Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool', BMC Bioinformatics, 14: 128.
Chen, K., Z. Hu, Z. Xia, D. Zhao, W. Li, and J. K. Tyler. 2015. 'The Overlooked Fact: Fundamental Need for Spike-In Control for Virtually All Genome-Wide Analyses', Mol Cell Biol, 36: 662-7.
Chen, P., J. Zhao, Y. Wang, M. Wang, H. Long, D. Liang, L. Huang, Z. Wen, W. Li, X. Li, H. Feng, H. Zhao, P. Zhu, M. Li, Q. F. Wang, and G. Li. 2013. 'H3.3 actively marks enhancers and primes gene transcription via opening higher-ordered chromatin', Genes Dev, 27: 2109-24.
Cheneby, J., Z. Menetrier, M. Mestdagh, T. Rosnet, A. Douida, W. Rhalloussi, A. Bergon, F. Lopez, and B. Ballester. 2020. 'ReMap 2020: a database of regulatory regions from an integrative analysis of Human and Arabidopsis DNA-binding sequencing experiments', Nucleic Acids Res, 48: D180-D88.
Cherniack, A. D., H. Shen, V. Walter, C. Stewart, B. A. Murray, R. Bowlby, X. Hu, S. Ling, R. A. Soslow, R. R. Broaddus, R. E. Zuna, G. Robertson, P. W. Laird, R. Kucherlapati, G. B. Mills, Cancer Genome Atlas Research Network, J. N. Weinstein, J. Zhang, R. Akbani, and D. A. Levine. 2017. 'Integrated Molecular Characterization of Uterine Carcinosarcoma', Cancer Cell, 31: 411-23.
Cheung, L. W., B. T. Hennessy, J. Li, S. Yu, A. P. Myers, B. Djordjevic, Y. Lu, K. Stemke-Hale, M. D. Dyer, F. Zhang, Z. Ju, L. C. Cantley, S. E. Scherer, H. Liang, K. H. Lu, R. R. Broaddus, and G. B. Mills. 2011. 'High frequency of PIK3R1 and PIK3R2 mutations in endometrial cancer elucidates a novel mechanism for regulation of PTEN protein stability', Cancer Discov, 1: 170-85.
Cho, H. D., J. E. Lee, H. Y. Jung, M. H. Oh, J. H. Lee, S. H. Jang, K. J. Kim, S. W. Han, S. Y. Kim, H. J. Kim, S. B. Bae, and H. J. Lee. 2015. 'Loss of Tumor Suppressor ARID1A Protein Expression Correlates with Poor Prognosis in Patients with Primary Breast Cancer', J Breast Cancer, 18: 339-46.
Chui, M. H., T. L. Wang, and I. M. Shih. 2017. 'Endometriosis: benign, malignant, or something in between?', Oncotarget, 8: 78263-64.
Clapier, C. R. 2021. 'Sophisticated Conversations between Chromatin and Chromatin Remodelers, and Dissonances in Cancer', Int J Mol Sci, 22.
Clapier, C. R., and B. R. Cairns. 2009. 'The biology of chromatin remodeling complexes', Annu Rev Biochem, 78: 273-304.
Colino-Sanguino, Y., E. M. Cornett, D. Moulder, G. C. Smith, J. Hrit, E. Cordeiro-Spinetti, R. M. Vaughan, K. Krajewski, S. B. Rothbart, S. J. Clark, and F. Valdes-Mora. 2019. 'A Read/Write Mechanism Connects p300 Bromodomain Function to H2A.Z Acetylation', iScience, 21: 773-88.
Collaborative Group on Epidemiological Studies on Endometrial Cancer. 2015. 'Endometrial cancer and oral contraceptives: an individual participant meta-analysis of 27 276 women with endometrial cancer from 36 epidemiological studies', Lancet Oncol, 16: 1061-70.
Conlon, N., A. Silva, E. Guerra, P. Jelinic, B. A. Schlappe, N. Olvera, J. J. Mueller, C. Tornos, A. A. Jungbluth, R. H. Young, E. Oliva, D. Levine, and R. A. Soslow. 2016. 'Loss of SMARCA4 Expression Is Both Sensitive and Specific for the Diagnosis of Small Cell Carcinoma of Ovary, Hypercalcemic Type', Am J Surg Pathol, 40: 395-403.
Consortium, Encode Project. 2012. 'An integrated encyclopedia of DNA elements in the human genome', Nature, 489: 57-74.
Constantine, G. D., G. Kessler, S. Graham, and S. R. Goldstein. 2019. 'Increased Incidence of Endometrial Cancer Following the Women's Health Initiative: An Assessment of Risk Factors', J Womens Health (Larchmt), 28: 237-43.
Cooke, P. S., D. L. Buchanan, P. Young, T. Setiawan, J. Brody, K. S. Korach, J. Taylor, D. B. Lubahn, and G. R. Cunha. 1997. 'Stromal estrogen receptors mediate mitogenic effects of estradiol on uterine epithelium', Proc Natl Acad Sci U S A, 94: 6535-40.
Coppe, J. P., P. Y. Desprez, A. Krtolica, and J. Campisi. 2010. 'The senescence-associated secretory phenotype: the dark side of tumor suppression', Annu Rev Pathol, 5: 99-118.
Coppens, M. T., M. A. Dhont, J. G. De Boever, R. F. Serreyn, D. A. Vandekerckhove, and H. J. Roels. 1993. 'The distribution of oestrogen and progesterone receptors in the human endometrial basal and functional layer during the normal menstrual cycle. An immunocytochemical study', Histochemistry, 99: 121-6.
Corces, M. R., J. M. Granja, S. Shams, B. H. Louie, J. A. Seoane, W. Zhou, T. C. Silva, C. Groeneveld, C. K. Wong, S. W. Cho, A. T. Satpathy, M. R. Mumbach, K. A. Hoadley, A. G. Robertson, N. C. Sheffield, I. Felau, M. A. A. Castro, B. P. Berman, L. M. Staudt, J. C. Zenklusen, P. W. Laird, C. Curtis, Cancer Genome Atlas Analysis Network, W. J. Greenleaf, and H. Y. Chang. 2018. 'The chromatin accessibility landscape of primary human cancers', Science, 362.
Corces, M. R., A. E. Trevino, E. G. Hamilton, P. G. Greenside, N. A. Sinnott-Armstrong, S. Vesuna, A. T. Satpathy, A. J. Rubin, K. S. Montine, B. Wu, A. Kathiria, S. W. Cho, M. R. Mumbach, A. C. Carter, M. Kasowski, L. A. Orloff, V. I. Risca, A. Kundaje, P. A. Khavari, T. J. Montine, W. J. Greenleaf, and H. Y. Chang. 2017. 'An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues', Nat Methods, 14: 959-62.
Cornett, E. M., B. M. Dickson, and S. B. Rothbart. 2017. 'Analysis of Histone Antibody Specificity with Peptide Microarrays', J Vis Exp.
Creyghton, M. P., A. W. Cheng, G. G. Welstead, T. Kooistra, B. W. Carey, E. J. Steine, J. Hanna, M. A. Lodato, G. M. Frampton, P. A. Sharp, L. A. Boyer, R. A. Young, and R. Jaenisch. 2010. 'Histone H3K27ac separates active from poised enhancers and predicts developmental state', Proc Natl Acad Sci U S A, 107: 21931-6.
Critchley, H. O. D., J. A. Maybin, G. M. Armstrong, and A. R. W. Williams. 2020. 'Physiology of the Endometrium and Regulation of Menstruation', Physiol Rev, 100: 1149-79.
Cuevas, I. C., S. S. Sahoo, A. Kumar, H. Zhang, J. Westcott, M. Aguilar, J. D. Cortez, S. A. Sullivan, C. Xing, D. N. Hayes, R. A. Brekken, V. L. Bae-Jump, and D. H. Castrillon. 2019. 'Fbxw7 is a driver of uterine carcinosarcoma by promoting epithelial-mesenchymal transition', Proc Natl Acad Sci U S A, 116: 25880-90.
D'Hooghe, T. M., and S. Debrock. 2002. 'Endometriosis, retrograde menstruation and peritoneal inflammation in women and in baboons', Hum Reprod Update, 8: 84-8.
Dai, D., D. M. Wolf, E. S. Litman, M. J. White, and K. K. Leslie. 2002. 'Progesterone inhibits human endometrial cancer cell growth and invasiveness: down-regulation of cellular adhesion molecules through progesterone B receptors', Cancer Res, 62: 881-6.
Daikoku, T., Y. Hirota, S. Tranguch, A. R. Joshi, F. J. DeMayo, J. P. Lydon, L. H. Ellenson, and S. K. Dey. 2008. 'Conditional loss of uterine Pten unfailingly and rapidly induces endometrial cancer in mice', Cancer Res, 68: 5619-27.
Daikoku, T., Y. Ogawa, J. Terakawa, A. Ogawa, T. DeFalco, and S. K. Dey. 2014. 'Lactoferrin-iCre: a new mouse line to study uterine epithelial gene function', Endocrinology, 155: 2718-24.
Daley, T., and A. D. Smith. 2013. 'Predicting the molecular complexity of sequencing libraries', Nat Methods, 10: 325-7.
Dallas, P. B., S. Pacchione, D. Wilsker, V. Bowrin, R. Kobayashi, and E. Moran. 2000. 'The human SWI-SNF complex protein p270 is an ARID family member with non-sequence-specific DNA binding activity', Mol Cell Biol, 20: 3137-46.
Davies, J., and R. A. Kadir. 2012. 'Endometrial haemostasis and menstruation', Rev Endocr Metab Disord, 13: 289-99.
del Carmen, M. G., M. Birrer, and J. O. Schorge. 2012. 'Uterine papillary serous cancer: a review of the literature', Gynecol Oncol, 127: 651-61.
Deligdisch, L. 2000. 'Hormonal pathology of the endometrium', Mod Pathol, 13: 285-94.
Dempster, J. M., J. Rossen, M. Kazachkova, J. Pan, G. Kugener, D. E. Root, and A. Tsherniak. 2019. 'Extracting Biological Insights from the Project Achilles Genome-Scale CRISPR Screens in Cancer Cell Lines', bioRxiv.
DepMap, Broad. 2020. 'DepMap 21Q1 Public'.
Depreeuw, J., E. Stelloo, E. M. Osse, C. L. Creutzberg, R. A. Nout, M. Moisse, D. A. Garcia-Dios, M. Dewaele, K. Willekens, J. C. Marine, X. Matias-Guiu, F. Amant, D. Lambrechts, and T. Bosse. 2017. 'Amplification of 1q32.1 Refines the Molecular Classification of Endometrial Carcinoma', Clin Cancer Res, 23: 7232-41.
Desouki, M. M., S. J. Kallas, D. Khabele, M. A. Crispens, O. Hameed, and O. Fadare. 2014. 'Differential vimentin expression in ovarian and uterine corpus endometrioid adenocarcinomas: diagnostic utility in distinguishing double primaries from metastatic tumors', Int J Gynecol Pathol, 33: 274-81.
Dickson, B. M., E. M. Cornett, Z. Ramjan, and S. B. Rothbart. 2016. 'ArrayNinja: An Open Source Platform for Unified Planning and Analysis of Microarray Experiments', Methods Enzymol, 574: 53-77.
Dingwall, C., G. P. Lomonossoff, and R. A. Laskey. 1981. 'High sequence specificity of micrococcal nuclease', Nucleic Acids Res, 9: 2659-73.
Divate, M., and E. Cheung. 2018. 'GUAVA: A Graphical User Interface for the Analysis and Visualization of ATAC-seq Data', Front Genet, 9: 250.
Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, and T. R. Gingeras. 2013. 'STAR: ultrafast universal RNA-seq aligner', Bioinformatics, 29: 15-21.
Donehower, L. A. 1996. 'The p53-deficient mouse: a model for basic and applied cancer studies', Semin Cancer Biol, 7: 269-78.
Dudoit, S., Y. H. Yang, M. J. Callow, and T. P. Speed. 2002. 'Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments', Statistica Sinica, 12: 111-39.
Duffy, M. J. 2004. 'The urokinase plasminogen activator system: role in malignancy', Curr Pharm Des, 10: 39-49.
Durinck, S., Y. Moreau, A. Kasprzyk, S. Davis, B. De Moor, A. Brazma, and W. Huber. 2005. 'BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis', Bioinformatics, 21: 3439-40.
Durinck, S., P. T. Spellman, E. Birney, and W. Huber. 2009. 'Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt', Nat Protoc, 4: 1184-91.
Dykhuizen, E. C., D. C. Hargreaves, E. L. Miller, K. Cui, A. Korshunov, M. Kool, S. Pfister, Y. J. Cho, K. Zhao, and G. R. Crabtree. 2013. 'BAF complexes facilitate decatenation of DNA by topoisomerase IIalpha', Nature, 497: 624-7.
Ellrott, K., M. H. Bailey, G. Saksena, K. R. Covington, C. Kandoth, C. Stewart, J. Hess, S. Ma, K. E. Chiotti, M. McLellan, H. J. Sofia, C. Hutter, G. Getz, D. Wheeler, L. Ding, MC3 Working Group, and Cancer Genome Atlas Research Network. 2018. 'Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines', Cell Syst, 6: 271-81 e7.
Ernst, J., and M. Kellis. 2012. 'ChromHMM: automating chromatin-state discovery and characterization', Nat Methods, 9: 215-6.
———. 2017. 'Chromatin-state discovery and genome annotation with ChromHMM', Nat Protoc, 12: 2478-92.
Euscher, E., P. Fox, R. Bassett, H. Al-Ghawi, R. Ali-Fehmi, D. Barbuto, B. Djordjevic, E. Frauenhoffer, I. Kim, S. R. Hong, D. Montiel, E. Moschiano, A. Roma, E. Silva, and A. Malpica. 2013. 'The pattern of myometrial invasion as a predictor of lymph node metastasis or extrauterine disease in low-grade endometrial carcinoma', Am J Surg Pathol, 37: 1728-36.
Evans, J., and L. A. Salamonsen. 2012. 'Inflammation, leukocytes and menstruation', Rev Endocr Metab Disord, 13: 277-88.
Ewels, P., M. Magnusson, S. Lundin, and M. Kaller. 2016. 'MultiQC: summarize analysis results for multiple tools and samples in a single report', Bioinformatics, 32: 3047-8.
Ezzati, M., O. Djahanbakhch, S. Arian, and B. R. Carr. 2014. 'Tubal transport of gametes and embryos: a review of physiology and pathophysiology', J Assist Reprod Genet, 31: 1337-47.
Fabjani, G., E. Kucera, E. Schuster, M. Minai-Pour, K. Czerwenka, G. Sliutz, S. Leodolter, A. Reiner, and R. Zeillinger. 2002. 'Genetic alterations in endometrial hyperplasia and cancer', Cancer Lett, 175: 205-11.
Ferenczy, A. 1998. 'Pathophysiology of adenomyosis', Hum Reprod Update, 4: 312-22.
Fishilevich, S., R. Nudel, N. Rappaport, R. Hadar, I. Plaschkes, T. Iny Stein, N. Rosen, A. Kohn, M. Twik, M. Safran, D. Lancet, and D. Cohen. 2017. 'GeneHancer: genome-wide integration of enhancers and target genes in GeneCards', Database (Oxford), 2017.
Fiziev, P., K. C. Akdemir, J. P. Miller, E. Z. Keung, N. S. Samant, S. Sharma, C. A. Natale, C. J. Terranova, M. Maitituoheti, S. B. Amin, E. Martinez-Ledesma, M. Dhamdhere, J. B. Axelrad, A. Shah, C. S. Cheng, H. Mahadeshwar, S. Seth, M. C. Barton, A. Protopopov, K. Y. Tsai, M. A. Davies, B. A. Garcia, I. Amit, L. Chin, J. Ernst, and K. Rai. 2017. 'Systematic Epigenomic Analysis Reveals Chromatin States Associated with Melanoma Progression', Cell Rep, 19: 875-89.
Fouad, Y. A., and C. Aanei. 2017. 'Revisiting the hallmarks of cancer', Am J Cancer Res, 7: 1016-36.
Frankish, A., M. Diekhans, A. M. Ferreira, R. Johnson, I. Jungreis, J. Loveland, J. M. Mudge, C. Sisu, J. Wright, J. Armstrong, I. Barnes, A. Berry, A. Bignell, S. Carbonell Sala, J. Chrast, F. Cunningham, T. Di Domenico, S. Donaldson, I. T. Fiddes, C. Garcia Giron, J. M. Gonzalez, T. Grego, M. Hardy, T. Hourlier, T. Hunt, O. G. Izuogu, J. Lagarde, F. J. Martin, L. Martinez, S. Mohanan, P. Muir, F. C. P. Navarro, A. Parker, B. Pei, F. Pozo, M. Ruffier, B. M. Schmitt, E. Stapleton, M. M. Suner, I. Sycheva, B. Uszczynska-Ratajczak, J. Xu, A. Yates, D. Zerbino, Y. Zhang, B. Aken, J. S. Choudhary, M. Gerstein, R. Guigo, T. J. P. Hubbard, M. Kellis, B. Paten, A. Reymond, M. L. Tress, and P. Flicek. 2019. 'GENCODE reference annotation for the human and mouse genomes', Nucleic Acids Res, 47: D766-D73.
Frerichs, A., J. Engelhorn, J. Altmuller, J. Gutierrez-Marcos, and W. Werr. 2019. 'Specific chromatin changes mark lateral organ founder cells in the Arabidopsis inflorescence meristem', J Exp Bot, 70: 3867-79.
Friedl, P., and D. Gilmour. 2009. 'Collective cell migration in morphogenesis, regeneration and cancer', Nat Rev Mol Cell Biol, 10: 445-57.
Fruman, D. A., H. Chiu, B. D. Hopkins, S. Bagrodia, L. C. Cantley, and R. T. Abraham. 2017. 'The PI3K Pathway in Human Disease', Cell, 170: 605-35.
Gaide Chevronnay, H. P., C. Galant, P. Lemoine, P. J. Courtoy, E. Marbaix, and P. Henriet. 2009. 'Spatiotemporal coupling of focal extracellular matrix degradation and reconstruction in the menstrual human endometrium', Endocrinology, 150: 5094-105.
Gao, F., N. J. Elliott, J. Ho, A. Sharp, M. N. Shokhirev, and D. C. Hargreaves. 2019. 'Heterozygous Mutations in SMARCA2 Reprogram the Enhancer Landscape by Global Retargeting of SMARCA4', Mol Cell, 75: 891-904 e7.
Gao, J., B. A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S. O. Sumer, Y. Sun, A. Jacobsen, R. Sinha, E. Larsson, E. Cerami, C. Sander, and N. Schultz. 2013. 'Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal', Sci Signal, 6: pl1.
Gao, X., P. Tate, P. Hu, R. Tjian, W. C. Skarnes, and Z. Wang. 2008. 'ES cell pluripotency and germ-layer formation require the SWI/SNF chromatin remodeling component BAF250a', Proc Natl Acad Sci U S A, 105: 6656-61.
Garcia-Solares, J., J. Donnez, O. Donnez, and M. M. Dolmans. 2018. 'Pathogenesis of uterine adenomyosis: invagination or metaplasia?', Fertil Steril, 109: 371-79.
Gargett, C. E., H. P. Nguyen, and L. Ye. 2012. 'Endometrial regeneration and endometrial stem/progenitor cells', Rev Endocr Metab Disord, 13: 235-51.
Gargett, C. E., K. E. Schwab, and J. A. Deane. 2016. 'Endometrial stem/progenitor cells: the first 10 years', Hum Reprod Update, 22: 137-63.
Gatchalian, J., S. Malik, J. Ho, D. S. Lee, T. W. R. Kelso, M. N. Shokhirev, J. R. Dixon, and D. C. Hargreaves. 2018. 'A non-canonical BRD9-containing BAF chromatin remodeling complex regulates naive pluripotency in mouse embryonic stem cells', Nat Commun, 9: 5139.
Gebuhr, T. C., G. I. Kovalev, S. Bultman, V. Godfrey, L. Su, and T. Magnuson. 2003. 'The role of Brg1, a catalytic subunit of mammalian chromatin-remodeling complexes, in T cell development', J Exp Med, 198: 1937-49.
Gehre, M., D. Bunina, S. Sidoli, M. J. Lubke, N. Diaz, M. Trovato, B. A. Garcia, J. B. Zaugg, and K. M. Noh. 2020. 'Lysine 4 of histone H3.3 is required for embryonic stem cell differentiation, histone enrichment at regulatory regions and transcription accuracy', Nat Genet, 52: 273-82.
Gellersen, B., and J. J. Brosens. 2014. 'Cyclic decidualization of the human endometrium in reproductive health and failure', Endocr Rev, 35: 851-905.
Ghandi, M., F. W. Huang, J. Jane-Valbuena, G. V. Kryukov, C. C. Lo, E. R. McDonald, 3rd, J. Barretina, E. T. Gelfand, C. M. Bielski, H. Li, K. Hu, A. Y. Andreev-Drakhlin, J. Kim, J. M. Hess, B. J. Haas, F. Aguet, B. A. Weir, M. V. Rothberg, B. R. Paolella, M. S. Lawrence, R. Akbani, Y. Lu, H. L. Tiv, P. C. Gokhale, A. de Weck, A. A. Mansour, C. Oh, J. Shih, K. Hadi, Y. Rosen, J. Bistline, K. Venkatesan, A. Reddy, D. Sonkin, M. Liu, J. Lehar, J. M. Korn, D. A. Porter, M. D. Jones, J. Golji, G. Caponigro, J. E. Taylor, C. M. Dunning, A. L. Creech, A. C. Warren, J. M. McFarland, M. Zamanighomi, A. Kauffmann, N. Stransky, M. Imielinski, Y. E. Maruvka, A. D. Cherniack, A. Tsherniak, F. Vazquez, J. D. Jaffe, A. A. Lane, D. M. Weinstock, C. M. Johannessen, M. P. Morrissey, F. Stegmeier, R. Schlegel, W. C. Hahn, G. Getz, G. B. Mills, J. S. Boehm, T. R. Golub, L. A. Garraway, and W. R. Sellers. 2019. 'Next-generation characterization of the Cancer Cell Line Encyclopedia', Nature, 569: 503-08.
Ghosh, K., M. Tang, N. Kumari, A. Nandy, S. Basu, D. P. Mall, K. Rai, and D. Biswas. 2018. 'Positive Regulation of Transcription by Human ZMYND8 through Its Association with P-TEFb Complex', Cell Rep, 24: 2141-54 e6.
Giaimo, B. D., F. Ferrante, A. Herchenrother, S. B. Hake, and T. Borggrefe. 2019. 'The histone variant H2A.Z in gene regulation', Epigenetics Chromatin, 12: 37.
Gibson, W. J., E. A. Hoivik, M. K. Halle, A. Taylor-Weiner, A. D. Cherniack, A. Berg, F. Holst, T. I. Zack, H. M. Werner, K. M. Staby, M. Rosenberg, I. M. Stefansson, K. Kusonmano, A. Chevalier, K. K. Mauland, J. Trovik, C. Krakstad, M. Giannakis, E. Hodis, K. Woie, L. Bjorge, O. K. Vintermyr, J. A. Wala, M. S. Lawrence, G. Getz, S. L. Carter, R. Beroukhim, and H. B. Salvesen. 2016. 'The genomic landscape and evolution of endometrial carcinoma progression and abdominopelvic metastasis', Nat Genet, 48: 848-55.
Gilabert-Estelles, J., A. Estelles, J. Gilabert, R. Castello, F. Espana, C. Falco, A. Romeu, M. Chirivella, E. Zorio, and J. Aznar. 2003. 'Expression of several components of the plasminogen activator and matrix metalloproteinase systems in endometriosis', Hum Reprod, 18: 1516-22.
Giudice, L. C., and L. C. Kao. 2004. 'Endometriosis', Lancet, 364: 1789-99.
Gkeka, P., T. Evangelidis, M. Pavlaki, V. Lazani, S. Christoforidis, B. Agianian, and Z. Cournia. 2014. 'Investigating the structure and dynamics of the PIK3CA wild-type and H1047R oncogenic mutant', PLoS Comput Biol, 10: e1003895.
Goldman, M. J., B. Craft, M. Hastie, K. Repecka, F. McDade, A. Kamath, A. Banerjee, Y. Luo, D. Rogers, A. N. Brooks, J. Zhu, and D. Haussler. 2020. 'Visualizing and interpreting cancer genomics data via the Xena platform', Nat Biotechnol, 38: 675-78.
Gong, F., L. Y. Chiu, B. Cox, F. Aymard, T. Clouaire, J. W. Leung, M. Cammarata, M. Perez, P. Agarwal, J. S. Brodbelt, G. Legube, and K. M. Miller. 2015. 'Screen identifies bromodomain protein ZMYND8 in chromatin recognition of transcription-associated DNA damage that promotes homologous recombination', Genes Dev, 29: 197-211.
Gong, F., T. Clouaire, M. Aguirrebengoa, G. Legube, and K. M. Miller. 2017. 'Histone demethylase KDM5A regulates the ZMYND8-NuRD chromatin remodeler to promote DNA repair', J Cell Biol, 216: 1959-74.
Gonzalez-Quiroz, M., A. Blondel, A. Sagredo, C. Hetz, E. Chevet, and R. Pedeux. 2020. 'When Endoplasmic Reticulum Proteostasis Meets the DNA Damage Response', Trends Cell Biol, 30: 881-91.
Gorkin, D. U., I. Barozzi, Y. Zhao, Y. Zhang, H. Huang, A. Y. Lee, B. Li, J. Chiou, A. Wildberg, B. Ding, B. Zhang, M. Wang, J. S. Strattan, J. M. Davidson, Y. Qiu, V. Afzal, J. A. Akiyama, I. Plajzer-Frick, C. S. Novak, M. Kato, T. H. Garvin, Q. T. Pham, A. N. Harrington, B. J. Mannion, E. A. Lee, Y. Fukuda-Yuzawa, Y. He, S. Preissl, S. Chee, J. Y. Han, B. A. Williams, D. Trout, H. Amrhein, H. Yang, J. M. Cherry, W. Wang, K. Gaulton, J. R. Ecker, Y. Shen, D. E. Dickel, A. Visel, L. A. Pennacchio, and B. Ren. 2020. 'An atlas of dynamic chromatin landscapes in mouse fetal development', Nature, 583: 744-51.
Gorkin, D. U., D. Leung, and B. Ren. 2014. 'The 3D genome in transcriptional regulation and pluripotency', Cell Stem Cell, 14: 762-75.
Grady, D., T. Gebretsadik, K. Kerlikowske, V. Ernster, and D. Petitti. 1995. 'Hormone replacement therapy and endometrial cancer risk: a meta-analysis', Obstet Gynecol, 85: 304-13.
Grandi, G., A. Toss, L. Cortesi, L. Botticelli, A. Volpe, and A. Cagnacci. 2015. 'The Association between Endometriomas and Ovarian Cancer: Preventive Effect of Inhibiting Ovulation and Menstruation during Reproductive Life', Biomed Res Int, 2015: 751571.
Grau, J., I. Grosse, and J. Keilwagen. 2015. 'PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R', Bioinformatics, 31: 2595-7.
Groothuis, P. G., H. H. Dassen, A. Romano, and C. Punyadeera. 2007. 'Estrogen and the endometrium: lessons learned from gene expression profiling in rodents and human', Hum Reprod Update, 13: 405-17.
Grossman, R. L., A. P. Heath, V. Ferretti, H. E. Varmus, D. R. Lowy, W. A. Kibbe, and L. M. Staudt. 2016. 'Toward a Shared Vision for Cancer Genomic Data', N Engl J Med, 375: 1109-12.
Gruber, S. B., and W. D. Thompson. 1996. 'A population-based study of endometrial cancer and familial risk in younger women. Cancer and Steroid Hormone Study Group', Cancer Epidemiol Biomarkers Prev, 5: 411-7.
Gu, X., P. J. Coates, L. Boldrup, and K. Nylander. 2008. 'p63 contributes to cell invasion and migration in squamous cell carcinoma of the head and neck', Cancer Lett, 263: 26-34.
Gu, Z., R. Eils, and M. Schlesner. 2016. 'Complex heatmaps reveal patterns and correlations in multidimensional genomic data', Bioinformatics, 32: 2847-9.
Guan, B., M. Gao, C. H. Wu, T. L. Wang, and M. Shih Ie. 2012. 'Functional analysis of in-frame indel ARID1A mutations reveals new regulatory mechanisms of its tumor suppressor functions', Neoplasia, 14: 986-93.
Guan, B., T. L. Mao, P. K. Panuganti, E. Kuhn, R. J. Kurman, D. Maeda, E. Chen, Y. M. Jeng, T. L. Wang, and M. Shih Ie. 2011. 'Mutation and loss of expression of ARID1A in uterine low-grade endometrioid carcinoma', Am J Surg Pathol, 35: 625-32.
Guan, B., Y. S. Rahmanto, R. C. Wu, Y. Wang, Z. Wang, T. L. Wang, and M. Shih Ie. 2014. 'Roles of deletion of Arid1a, a tumor suppressor, in mouse ovarian tumorigenesis', J Natl Cancer Inst, 106.
Guan, B., T. L. Wang, and M. Shih Ie. 2011. 'ARID1A, a factor that promotes formation of SWI/SNF-mediated chromatin remodeling, is a tumor suppressor in gynecologic cancers', Cancer Res, 71: 6718-27.
Gucer, F., O. Reich, K. Tamussino, A. A. Bader, D. Pieber, W. Scholl, J. Haas, and E. Petru. 1998. 'Concomitant endometrial hyperplasia in patients with endometrial carcinoma', Gynecol Oncol, 69: 64-8.
Guichard, C., G. Amaddeo, S. Imbeaud, Y. Ladeiro, L. Pelletier, I. B. Maad, J. Calderaro, P. Bioulac-Sage, M. Letexier, F. Degos, B. Clement, C. Balabaud, E. Chevet, A. Laurent, G. Couchy, E. Letouze, F. Calvo, and J. Zucman-Rossi. 2012. 'Integrated analysis of somatic mutations and focal copy-number changes identifies key genes and pathways in hepatocellular carcinoma', Nat Genet, 44: 694-8.
Guo, H., W. Kong, L. Zhang, J. Han, L. H. Clark, Y. Yin, Z. Fang, W. Sun, J. Wang, T. P. Gilliam, D. Lee, L. Makowski, C. Zhou, and V. L. Bae-Jump. 2019. 'Reversal of obesity-driven aggressiveness of endometrial cancer by metformin', Am J Cancer Res, 9: 2170-93.
Guo, R., L. Zheng, J. W. Park, R. Lv, H. Chen, F. Jiao, W. Xu, S. Mu, H. Wen, J. Qiu, Z. Wang, P. Yang, F. Wu, J. Hui, X. Fu, X. Shi, Y. G. Shi, Y. Xing, F. Lan, and Y. Shi. 2014. 'BS69/ZMYND11 reads and connects histone H3.3 lysine 36 trimethylation-decorated chromatin to regulated pre-mRNA processing', Mol Cell, 56: 298-310.
Gusmao, E. G., M. Allhoff, M. Zenke, and I. G. Costa. 2016. 'Analysis of computational footprinting methods for DNase sequencing experiments', Nat Methods, 13: 303-9.
Ha, M., D. C. Kraushaar, and K. Zhao. 2014. 'Genome-wide analysis of H3.3 dissociation reveals high nucleosome turnover at distal regulatory regions of embryonic stem cells', Epigenetics Chromatin, 7: 38.
Hai, T., and T. Curran. 1991. 'Cross-family dimerization of transcription factors Fos/Jun and ATF/CREB alters DNA binding specificity', Proc Natl Acad Sci U S A, 88: 3720-4.
Hai, T., C. D. Wolfgang, D. K. Marsee, A. E. Allen, and U. Sivaprasad. 1999. 'ATF3 and stress responses', Gene Expr, 7: 321-35.
Haines, R. R., B. G. Barwick, C. D. Scharer, P. Majumder, T. D. Randall, and J. M. Boss. 2018. 'The Histone Demethylase LSD1 Regulates B Cell Proliferation and Plasmablast Differentiation', J Immunol, 201: 2799-811.
Hajkova, N., I. Ticha, J. Hojny, K. Nemejcova, M. Bartu, R. Michalkova, M. Zikan, D. Cibula, J. Laco, T. Geryk, G. Mehes, and P. Dundr. 2019. 'Synchronous endometrioid endometrial and ovarian carcinomas are biologically related: A clinico-pathological and molecular (next generation sequencing) study of 22 cases', Oncol Lett, 17: 2207-14.
Hanel, W., and U. M. Moll. 2012. 'Links between mutant p53 and genomic instability', J Cell Biochem, 113: 433-9.
Harlen, K. M., and L. S. Churchman. 2017. 'The code and beyond: transcription regulation by the RNA polymerase II carboxy-terminal domain', Nat Rev Mol Cell Biol, 18: 263-73.
Harrow, J., A. Frankish, J. M. Gonzalez, E. Tapanari, M. Diekhans, F. Kokocinski, B. L. Aken, D. Barrell, A. Zadissa, S. Searle, I. Barnes, A. Bignell, V. Boychenko, T. Hunt, M. Kay, G. Mukherjee, J. Rajan, G. Despacio-Reyes, G. Saunders, C. Steward, R. Harte, M. Lin, C. Howald, A. Tanzer, T. Derrien, J. Chrast, N. Walters, S. Balasubramanian, B. Pei, M. Tress, J. M. Rodriguez, I. Ezkurdia, J. van Baren, M. Brent, D. Haussler, M. Kellis, A. Valencia, A. Reymond, M. Gerstein, R. Guigo, and T. J. Hubbard. 2012. 'GENCODE: the reference human genome annotation for The ENCODE Project', Genome Res, 22: 1760-74.
Hartman, M. G., D. Lu, M. L. Kim, G. J. Kociba, T. Shukri, J. Buteau, X. Wang, W. L. Frankel, D. Guttridge, M. Prentki, S. T. Grey, D. Ron, and T. Hai. 2004. 'Role for activating transcription factor 3 in stress-induced beta-cell apoptosis', Mol Cell Biol, 24: 5721-32.
Harvey, D. M., and A. J. Levine. 1991. 'p53 alteration is a common event in the spontaneous immortalization of primary BALB/c murine embryo fibroblasts', Genes Dev, 5: 2375-85.
Hassan, A. H., S. Awad, and P. Prochasson. 2006. 'The Swi2/Snf2 bromodomain is required for the displacement of SAGA and the octamer transfer of SAGA-acetylated nucleosomes', J Biol Chem, 281: 18126-34.
Hassan, A. H., K. E. Neely, and J. L. Workman. 2001. 'Histone acetyltransferase complexes stabilize swi/snf binding to promoter nucleosomes', Cell, 104: 817-27.
Hassan, A. H., P. Prochasson, K. E. Neely, S. C. Galasinski, M. Chandy, M. J. Carrozza, and J. L. Workman. 2002. 'Function and selectivity of bromodomains in anchoring chromatin-modifying complexes to promoter nucleosomes', Cell, 111: 369-79.
Hawkins, S. M., C. J. Creighton, D. Y. Han, A. Zariff, M. L. Anderson, P. H. Gunaratne, and M. M. Matzuk. 2011. 'Functional microRNA involved in endometriosis', Mol Endocrinol, 25: 821-32.
Hazan, I., J. Monin, B. A. M. Bouwman, N. Crosetto, and R. I. Aqeilan. 2019. 'Activation of Oncogenic Super-Enhancers Is Coupled with DNA Repair by RAD51', Cell Rep, 29: 560-72 e4.
He, F., J. Li, J. Xu, S. Zhang, Y. Xu, W. Zhao, Z. Yin, and X. Wang. 2015. 'Decreased expression of ARID1A associates with poor prognosis and promotes metastases of hepatocellular carcinoma', J Exp Clin Cancer Res, 34: 47.
He, S., Z. Wu, Y. Tian, Z. Yu, J. Yu, X. Wang, J. Li, B. Liu, and Y. Xu. 2020. 'Structure of nucleosome-bound human BAF complex', Science, 367: 875-81.
Heery, D. M., E. Kalkhoven, S. Hoare, and M. G. Parker. 1997. 'A signature motif in transcriptional co-activators mediates binding to nuclear receptors', Nature, 387: 733-6.
Heintzman, N. D., R. K. Stuart, G. Hon, Y. Fu, C. W. Ching, R. D. Hawkins, L. O. Barrera, S. Van Calcar, C. Qu, K. A. Ching, W. Wang, Z. Weng, R. D. Green, G. E. Crawford, and B. Ren. 2007. 'Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome', Nat Genet, 39: 311-8.
Heinz, S., C. Benner, N. Spann, E. Bertolino, Y. C. Lin, P. Laslo, J. X. Cheng, C. Murre, H. Singh, and C. K. Glass. 2010. 'Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities', Mol Cell, 38: 576-89.
Helming, K. C., X. Wang, and C. W. M. Roberts. 2014. 'Vulnerabilities of mutant SWI/SNF complexes in cancer', Cancer Cell, 26: 309-17.
Hendrickson, M., J. Ross, P. J. Eifel, R. S. Cox, A. Martinez, and R. Kempson. 1982. 'Adenocarcinoma of the endometrium: analysis of 256 cases with carcinoma limited to the uterine corpus. Pathology review and analysis of prognostic variables', Gynecol Oncol, 13: 373-92.
Henikoff, J. G., J. A. Belsky, K. Krassovsky, D. M. MacAlpine, and S. Henikoff. 2011. 'Epigenome characterization at single base-pair resolution', Proc Natl Acad Sci U S A, 108: 18318-23.
Hill, D. A., S. Chiosea, S. Jamaluddin, K. Roy, A. H. Fischer, D. D. Boyd, J. A. Nickerson, and A. N. Imbalzano. 2004. 'Inducible changes in cell size and attachment area due to expression of a mutant SWI/SNF chromatin remodeling enzyme', J Cell Sci, 117: 5847-54.
Hilliard, S., R. Song, H. Liu, C. H. Chen, Y. Li, M. Baddoo, E. Flemington, A. Wanek, J. Kolls, Z. Saifudeen, and S. S. El-Dahr. 2019. 'Defining the dynamic chromatin landscape of mouse nephron progenitors', Biol Open, 8.
Hinkula, M., E. Pukkala, P. Kyyronen, and A. Kauppila. 2002. 'Grand multiparity and incidence of endometrial cancer: a population-based study in Finland', Int J Cancer, 98: 912-5.
Hiramatsu, Y., A. Fukuda, S. Ogawa, N. Goto, K. Ikuta, M. Tsuda, Y. Matsumoto, Y. Kimura, T. Yoshioka, Y. Takada, T. Maruno, Y. Hanyu, T. Tsuruyama, Z. Wang, H. Akiyama, S. Takaishi, H. Miyoshi, M. M. Taketo, T. Chiba, and H. Seno. 2019. 'Arid1a is essential for intestinal stem cells through Sox9 regulation', Proc Natl Acad Sci U S A, 116: 1704-13.
Hirschhorn, J. N., S. A. Brown, C. D. Clark, and F. Winston. 1992. 'Evidence that SNF2/SWI2 and SNF5 activate transcription in yeast by altering chromatin structure', Genes Dev, 6: 2288-98.
Hnisz, D., B. J. Abraham, T. I. Lee, A. Lau, V. Saint-Andre, A. A. Sigova, H. A. Hoke, and R. A. Young. 2013. 'Super-enhancers in the control of cell identity and disease', Cell, 155: 934-47.
Ho, L., E. L. Miller, J. L. Ronan, W. Q. Ho, R. Jothi, and G. R. Crabtree. 2011. 'esBAF facilitates pluripotency by conditioning the genome for LIF/STAT3 signalling and by regulating polycomb function', Nat Cell Biol, 13: 903-13.
Ho, L., J. L. Ronan, J. Wu, B. T. Staahl, L. Chen, A. Kuo, J. Lessard, A. I. Nesvizhskii, J. Ranish, and G. R. Crabtree. 2009. 'An embryonic stem cell chromatin remodeling complex, esBAF, is essential for embryonic stem cell self-renewal and pluripotency', Proc Natl Acad Sci U S A, 106: 5181-6.
Ho, P. J., S. M. Lloyd, and X. Bao. 2019. 'Unwinding chromatin at the right places: how BAF is targeted to specific genomic locations during development', Development, 146.
Hoadley, K. A., C. Yau, T. Hinoue, D. M. Wolf, A. J. Lazar, E. Drill, R. Shen, A. M. Taylor, A. D. Cherniack, V. Thorsson, R. Akbani, R. Bowlby, C. K. Wong, M. Wiznerowicz, F. Sanchez-Vega, A. G. Robertson, B. G. Schneider, M. S. Lawrence, H. Noushmehr, T. M. Malta, Cancer Genome Atlas Network, J. M. Stuart, C. C. Benz, and P. W. Laird. 2018. 'Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer', Cell, 173: 291-304 e6.
Hodges, C., J. G. Kirkland, and G. R. Crabtree. 2016. 'The Many Roles of BAF (mSWI/SNF) and PBAF Complexes in Cancer', Cold Spring Harb Perspect Med, 6.
Hodges, H. C., B. Z. Stanton, K. Cermakova, C. Y. Chang, E. L. Miller, J. G. Kirkland, W. L. Ku, V. Veverka, K. Zhao, and G. R. Crabtree. 2018. 'Dominant-negative SMARCA4 mutants alter the accessibility landscape of tissue-unrestricted enhancers', Nat Struct Mol Biol, 25: 61-72.
Holstege, F. C., E. G. Jennings, J. J. Wyrick, T. I. Lee, C. J. Hengartner, M. R. Green, T. R. Golub, E. S. Lander, and R. A. Young. 1998. 'Dissecting the regulatory circuitry of a eukaryotic genome', Cell, 95: 717-28.
Hosoya, T., R. D'Oliveira Albanus, J. Hensley, G. Myers, Y. Kyono, J. Kitzman, S. C. J. Parker, and J. D. Engel. 2018. 'Global dynamics of stage-specific transcription factor binding during thymocyte development', Sci Rep, 8: 5605.
Hota, S. K., J. R.
Johnson, E. Verschueren, R. Thomas, A. M. Blotnick, Y. Zhu, X. Sun, L. A. Pennacchio, N. J. Krogan, and B. G. Bruneau. 2019. 'Dynamic BAF chromatin remodeling complex subunit inclusion promotes temporally distinct gene expression programs in cardiogenesis', Development, 146. Hsieh, F. K., O. I. Kulaeva, S. S. Patel, P. N. Dyer, K. Luger, D. Reinberg, and V. M. Studitsky. 2013. 'Histone chaperone FACT action during transcription through chromatin by RNA polymerase II', Proc Natl Acad Sci U S A, 110: 7654-9. Hsu, A. L., I. Khachikyan, and P. Stratton. 2010. 'Invasive and noninvasive methods for the diagnosis of endometriosis', Clin Obstet Gynecol, 53: 413-9. Ignatiadis, N., B. Klaus, J. B. Zaugg, and W. Huber. 2016. 'Data-driven hypothesis weighting increases detection power in genome-scale multiple testing', Nat Methods, 13: 577-80. Inoue, H., T. Furukawa, S. Giannakopoulos, S. Zhou, D. S. King, and N. Tanese. 2002. 'Largest subunits of the human SWI/SNF chromatin-remodeling complex promote transcriptional activation by steroid hormone receptors', J Biol Chem, 277: 41674-85. Inoue, S., Y. Hirota, T. Ueno, Y. Fukui, E. Yoshida, T. Hayashi, S. Kojima, R. Takeyama, T. Hashimoto, T. Kiyono, M. Ikemura, A. Taguchi, T. Tanaka, Y. Tanaka, S. Sakata, K. Takeuchi, A. Muraoka, S. Osuka, T. Saito, K. Oda, Y. Osuga, Y. Terao, M. Kawazu, and H. Mano. 2019. 'Uterine adenomyosis is an oligoclonal disorder associated with KRAS mutations', Nat. Commun., 10: 5785. Ip, P. P., J. A. Irving, W. G. McCluggage, P. B. Clement, and R. H. Young. 2013. 'Papillary proliferation of the endometrium: a clinicopathologic study of 59 cases of simple and complex papillae without cytologic atypia', Am J Surg Pathol, 37: 167-77. 448 Ismiil, N. D., G. Rasty, Z. Ghorab, S. Nofech-Mozes, M. Bernardini, G. Thomas, I. Ackerman, A. Covens, and M. A. Khalifa. 2007. 'Adenomyosis is associated with myometrial invasion by FIGO 1 endometrial adenocarcinoma', Int J Gynecol Pathol, 26: 278-83. Jandial, Rahul. 
2013. Metastatic cancer : clinical and biological perspectives (Landes Bioscience: Austin, Texas, USA). Janicek, M. F., and N. B. Rosenshein. 1994. 'Invasive endometrial cancer in uteri resected for atypical endometrial hyperplasia', Gynecol Oncol, 52: 373-8. Jen, H. I., M. C. Hill, L. Tao, K. Sheng, W. Cao, H. Zhang, H. V. Yu, J. Llamas, C. Zong, J. F. Martin, N. Segil, and A. K. Groves. 2019. 'Transcriptomic and epigenetic regulation of hair cell regeneration in the mouse utricle and its potentiation by Atoh1', Elife, 8. Jeong, J. W., H. S. Lee, H. L. Franco, R. R. Broaddus, M. M. Taketo, S. Y. Tsai, J. P. Lydon, and F. J. DeMayo. 2009. 'beta-catenin mediates glandular formation and dysregulation of beta-catenin induces hyperplasia formation in the murine uterus', Oncogene, 28: 31-40. Jia, G., J. Preussner, X. Chen, S. Guenther, X. Yuan, M. Yekelchyk, C. Kuenne, M. Looso, Y. Zhou, S. Teichmann, and T. Braun. 2018. 'Single cell RNA-seq and ATAC-seq analysis of cardiac progenitor cell transition states and lineage settlement', Nat Commun, 9: 4877. Jiao, F., Z. Li, C. He, W. Xu, G. Yang, T. Liu, H. Shen, J. Cai, J. N. Anastas, Y. Mao, Y. Yu, F. Lan, Y. G. Shi, C. Jones, Y. Xu, S. J. Baker, Y. Shi, and R. Guo. 2020. 'RACK7 recognizes H3.3G34R mutation to suppress expression of MHC class II complex components and their delivery pathway in pediatric glioblastoma', Sci Adv, 6: eaba2113. Jin, Q., L. R. Yu, L. Wang, Z. Zhang, L. H. Kasper, J. E. Lee, C. Wang, P. K. Brindle, S. Y. Dent, and K. Ge. 2011. 'Distinct roles of GCN5/PCAF-mediated H3K9ac and CBP/p300- mediated H3K18/27ac in nuclear receptor transactivation', EMBO J, 30: 249-62. Jones, S., T. L. Wang, M. Shih Ie, T. L. Mao, K. Nakayama, R. Roden, R. Glas, D. Slamon, L. A. Diaz, Jr., B. Vogelstein, K. W. Kinzler, V. E. Velculescu, and N. Papadopoulos. 2010. 'Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma', Science, 330: 228-31. Joshi, A., C. Miller, Jr., S. J. Baker, and L. 
H. Ellenson. 2015. 'Activated mutant p110alpha causes endometrial carcinoma in the setting of biallelic Pten deletion', Am J Pathol, 185: 1104-13. 449 Kadoch, C., and G. R. Crabtree. 2015. 'Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics', Sci Adv, 1: e1500447. Kadoch, C., D. C. Hargreaves, C. Hodges, L. Elias, L. Ho, J. Ranish, and G. R. Crabtree. 2013. 'Proteomic and bioinformatic analysis of mammalian SWI/SNF complexes identifies extensive roles in human malignancy', Nat Genet, 45: 592-601. Kaeser, M. D., A. Aslanian, M. Q. Dong, J. R. Yates, 3rd, and B. M. Emerson. 2008. 'BRD7, a novel PBAF-specific SWI/SNF subunit, is required for target gene activation and repression in embryonic stem cells', J Biol Chem, 283: 32254-63. Kalluri, R., and R. A. Weinberg. 2009. 'The basics of epithelial-mesenchymal transition', J Clin Invest, 119: 1420-8. Karabacak Calviello, A., A. Hirsekorn, R. Wurmus, D. Yusuf, and U. Ohler. 2019. 'Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling', Genome Biol, 20: 42. Kasper, L. H., T. Fukuyama, M. A. Biesen, F. Boussouar, C. Tong, A. de Pauw, P. J. Murray, J. M. van Deursen, and P. K. Brindle. 2006. 'Conditional knockout mice reveal distinct functions for the global transcriptional coactivators CBP and p300 in T-cell development', Mol Cell Biol, 26: 789-809. Kastenhuber, E. R., and S. W. Lowe. 2017. 'Putting p53 in Context', Cell, 170: 1062-78. Kastner, P., A. Krust, B. Turcotte, U. Stropp, L. Tora, H. Gronemeyer, and P. Chambon. 1990. 'Two distinct estrogen-regulated promoters generate transcripts encoding the two functionally different human progesterone receptor forms A and B', EMBO J, 9: 1603-14. Kaufmann, O., E. Fietze, J. Mengs, and M. Dietel. 2001. 
'Value of p63 and cytokeratin 5/6 as immunohistochemical markers for the differential diagnosis of poorly differentiated and undifferentiated carcinomas', Am J Clin Pathol, 116: 823-30. Kawase, T., R. Ohki, T. Shibata, S. Tsutsumi, N. Kamimura, J. Inazawa, T. Ohta, H. Ichikawa, H. Aburatani, F. Tashiro, and Y. Taya. 2009. 'PH domain-only protein PHLDA3 is a p53- regulated repressor of Akt', Cell, 136: 535-50. 450 Keller, A., A. I. Nesvizhskii, E. Kolker, and R. Aebersold. 2002. 'Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search', Anal Chem, 74: 5383-92. Kelso, T. W. R., D. K. Porter, M. L. Amaral, M. N. Shokhirev, C. Benner, and D. C. Hargreaves. 2017. 'Chromatin accessibility underlies synthetic lethality of SWI/SNF subunits in ARID1A-mutant cancers', Elife, 6. Kent, W. J., C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. 2002. 'The human genome browser at UCSC', Genome Res, 12: 996-1006. Khalique, S., K. Naidoo, A. D. Attygalle, D. Kriplani, F. Daley, A. Lowe, J. Campbell, T. Jones, M. Hubank, K. Fenwick, N. Matthews, A. G. Rust, C. J. Lord, S. Banerjee, and R. Natrajan. 2018. 'Optimised ARID1A immunohistochemistry is an accurate predictor of ARID1A mutational status in gynaecological cancers', J Pathol Clin Res, 4: 154-66. Kim, H. I., T. H. Kim, J. Y. Yoo, S. L. Young, B. A. Lessey, B. J. Ku, and J. W. Jeong. 2021. 'ARID1A and PGR proteins interact in the endometrium and reveal a positive correlation in endometriosis', Biochem Biophys Res Commun, 550: 151-57. Kim, H., M. Kim, S. K. Im, and S. Fang. 2018. 'Mouse Cre-LoxP system: general principles to determine tissue-specific roles of target genes', Lab Anim Res, 34: 147-59. Kim, M., F. Lu, and Y. Zhang. 2016. 'Loss of HDAC-Mediated Repression and Gain of NF- kappaB Activation Underlie Cytokine Induction in ARID1A- and PIK3CA-Mutation- Driven Ovarian Cancer', Cell Rep, 17: 275-88. Kim, S. M., and J. S. Kim. 2017. 
'A Review of Mechanisms of Implantation', Dev. Reprod., 21: 351-59. Kim, T. H., J. Y. Yoo, Z. Wang, J. P. Lydon, S. Khatri, S. M. Hawkins, R. E. Leach, A. T. Fazleabas, S. L. Young, B. A. Lessey, B. J. Ku, and J. W. Jeong. 2015. 'ARID1A Is Essential for Endometrial Function during Early Pregnancy', PLoS Genet, 11: e1005537. Kirmizis, A., S. M. Bartley, A. Kuzmichev, R. Margueron, D. Reinberg, R. Green, and P. J. Farnham. 2004. 'Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27', Genes Dev, 18: 1592-605. 451 Klemm, S. L., Z. Shipony, and W. J. Greenleaf. 2019. 'Chromatin accessibility and the regulatory epigenome', Nat Rev Genet, 20: 207-20. Knofler, M., and J. Pollheimer. 2013. 'Human placental trophoblast invasion and differentiation: a particular focus on Wnt signaling', Front Genet, 4: 190. Kobel, M., B. M. Ronnett, N. Singh, R. A. Soslow, C. B. Gilks, and W. G. McCluggage. 2019. 'Interpretation of P53 Immunohistochemistry in Endometrial Carcinomas: Toward Increased Reproducibility', Int J Gynecol Pathol, 38 Suppl 1: S123-S31. Kok, V. C., H. J. Tsai, C. F. Su, and C. K. Lee. 2015. 'The Risks for Ovarian, Endometrial, Breast, Colorectal, and Other Cancers in Women With Newly Diagnosed Endometriosis or Adenomyosis: A Population-Based Study', Int J Gynecol Cancer, 25: 968-76. Kolin, D. L., C. M. Quick, F. Dong, C. D. M. Fletcher, C. J. R. Stewart, A. Soma, J. L. Hornick, M. R. Nucci, and B. E. Howitt. 2020. 'SMARCA4-deficient Uterine Sarcoma and Undifferentiated Endometrial Carcinoma Are Distinct Clinicopathologic Entities', Am J Surg Pathol, 44: 263-70. Kornberg, R. D., and Y. Lorch. 1992. 'Chromatin structure and transcription', Annu Rev Cell Biol, 8: 563-87. Kraushaar, D. C., Z. Chen, Q. Tang, K. Cui, J. Zhang, and K. Zhao. 2018. 'The gene repressor complex NuRD interacts with the histone variant H3.3 at promoters of active genes', Genome Res, 28: 1646-55. Kuhn, E., R. C. Wu, B. Guan, G. Wu, J. Zhang, Y. Wang, L. 
Song, X. Yuan, L. Wei, R. B. Roden, K. T. Kuo, K. Nakayama, B. Clarke, P. Shaw, N. Olvera, R. J. Kurman, D. A. Levine, T. L. Wang, and M. Shih Ie. 2012. 'Identification of molecular pathway aberrations in uterine serous carcinoma by genome-wide analyses', J Natl Cancer Inst, 104: 1503-13. Kuleshov, M. V., M. R. Jones, A. D. Rouillard, N. F. Fernandez, Q. Duan, Z. Wang, S. Koplev, S. L. Jenkins, K. M. Jagodnik, A. Lachmann, M. G. McDermott, C. D. Monteiro, G. W. Gundersen, and A. Ma'ayan. 2016. 'Enrichr: a comprehensive gene set enrichment analysis web server 2016 update', Nucleic Acids Res, 44: W90-7. Kurman, R. J., J. C. Felix, D. F. Archer, N. Nanavati, J. Arce, and D. L. Moyer. 2000. 'Norethindrone acetate and estradiol-induced endometrial hyperplasia', Obstet Gynecol, 96: 373-9. 452 Kurman, R. J., P. F. Kaminski, and H. J. Norris. 1985. 'The behavior of endometrial hyperplasia. A long-term study of "untreated" hyperplasia in 170 patients', Cancer, 56: 403-12. Kurman, R. J., and M. Shih Ie. 2016. 'The Dualistic Model of Ovarian Carcinogenesis: Revisited, Revised, and Expanded', Am J Pathol, 186: 733-47. Kurz, L., A. Miklyaeva, M. A. Skowron, N. Overbeck, G. Poschmann, T. Becker, K. Eul, T. Kurz, S. Schonberger, G. Calaminus, K. Stuhler, E. Dykhuizen, P. Albers, and D. Nettersheim. 2020. 'ARID1A Regulates Transcription and the Epigenetic Landscape via POLE and DMAP1 while ARID1A Deficiency or Pharmacological Inhibition Sensitizes Germ Cell Tumor Cells to ATR Inhibition', Cancers (Basel), 12. Kushner, P. J., D. A. Agard, G. L. Greene, T. S. Scanlan, A. K. Shiau, R. M. Uht, and P. Webb. 2000. 'Estrogen receptor pathways to AP-1', J Steroid Biochem Mol Biol, 74: 311-7. Kyo, S., J. Sakaguchi, S. Ohno, Y. Mizumoto, Y. Maida, M. Hashimoto, M. Nakamura, M. Takakura, M. Nakajima, K. Masutomi, and M. Inoue. 2006. 'High Twist expression is involved in infiltrative endometrial cancer and affects patient survival', Hum Pathol, 37: 431-8. Lac, V., T. M. Nazeran, B. 
Tessier-Cloutier, R. Aguirre-Hernandez, A. Albert, A. Lum, J. Khattra, T. Praetorius, M. Mason, D. Chiu, M. Kobel, P. J. Yong, B. C. Gilks, M. S. Anglesio, and D. G. Huntsman. 2019. 'Oncogenic mutations in histologically normal endometrium: the new normal?', J Pathol. Lac, V., L. Verhoef, R. Aguirre-Hernandez, T. M. Nazeran, B. Tessier-Cloutier, T. Praetorius, N. L. Orr, H. Noga, A. Lum, J. Khattra, L. M. Prentice, D. Co, M. Kobel, V. Mijatovic, A. F. Lee, J. Pasternak, M. C. Bleeker, B. Kramer, S. Y. Brucker, F. Kommoss, S. Kommoss, H. M. Horlings, P. J. Yong, D. G. Huntsman, and M. S. Anglesio. 2019. 'Iatrogenic endometriosis harbors somatic cancer-driver mutations', Hum Reprod, 34: 69- 78. Laird, P. W. 2010. 'Principles and challenges of genomewide DNA methylation analysis', Nat Rev Genet, 11: 191-203. Lakshminarasimhan, R., C. Andreu-Vieyra, K. Lawrenson, C. E. Duymich, S. A. Gayther, G. Liang, and P. A. Jones. 2017. 'Down-regulation of ARID1A is sufficient to initiate neoplastic transformation along with epigenetic reprogramming in non-tumorigenic endometriotic cells', Cancer Lett, 401: 11-19. 453 Lambert, A. W., D. R. Pattabiraman, and R. A. Weinberg. 2017. 'Emerging Biological Principles of Metastasis', Cell, 168: 670-91. Lambert, S. A., A. Jolma, L. F. Campitelli, P. K. Das, Y. Yin, M. Albu, X. Chen, J. Taipale, T. R. Hughes, and M. T. Weirauch. 2018. 'The Human Transcription Factors', Cell, 175: 598-99. Landt, S. G., G. K. Marinov, A. Kundaje, P. Kheradpour, F. Pauli, S. Batzoglou, B. E. Bernstein, P. Bickel, J. B. Brown, P. Cayting, Y. Chen, G. DeSalvo, C. Epstein, K. I. Fisher-Aylor, G. Euskirchen, M. Gerstein, J. Gertz, A. J. Hartemink, M. M. Hoffman, V. R. Iyer, Y. L. Jung, S. Karmakar, M. Kellis, P. V. Kharchenko, Q. Li, T. Liu, X. S. Liu, L. Ma, A. Milosavljevic, R. M. Myers, P. J. Park, M. J. Pazin, M. D. Perry, D. Raha, T. E. Reddy, J. Rozowsky, N. Shoresh, A. Sidow, M. Slattery, J. A. Stamatoyannopoulos, M. Y. Tolstorukov, K. P. White, S. Xi, P. 
J. Farnham, J. D. Lieb, B. J. Wold, and M. Snyder. 2012. 'ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia', Genome Res, 22: 1813-31. Lane, D. P., and L. V. Crawford. 1979. 'T antigen is bound to a host protein in SV40-transformed cells', Nature, 278: 261-3. Langer, L. F., J. M. Ward, and T. K. Archer. 2019. 'Tumor suppressor SMARCB1 suppresses super-enhancers to govern hESC lineage determination', Elife, 8. Langmead, B., and S. L. Salzberg. 2012. 'Fast gapped-read alignment with Bowtie 2', Nat Methods, 9: 357-9. Larsson, J. 2020. 'eulerr: Area-Proportional Euler and Venn Diagrams with Ellipses. R package version 6.1.0'. https://cran.r-project.org/package=eulerr. Lasko, L. M., C. G. Jakob, R. P. Edalji, W. Qiu, D. Montgomery, E. L. Digiammarino, T. M. Hansen, R. M. Risi, R. Frey, V. Manaves, B. Shaw, M. Algire, P. Hessler, L. T. Lam, T. Uziel, E. Faivre, D. Ferguson, F. G. Buchanan, R. L. Martin, M. Torrent, G. G. Chiang, K. Karukurichi, J. W. Langston, B. T. Weinert, C. Choudhary, P. de Vries, J. H. Van Drie, D. McElligott, E. Kesicki, R. Marmorstein, C. Sun, P. A. Cole, S. H. Rosenberg, M. R. Michaelides, A. Lai, and K. D. Bromberg. 2017. 'Discovery of a selective catalytic p300/CBP inhibitor that targets lineage-specific tumours', Nature, 550: 128-32. Laurent, B. C., I. Treich, and M. Carlson. 1993. 'The yeast SNF2/SWI2 protein has DNA- stimulated ATPase activity required for transcriptional activation', Genes Dev, 7: 583-91. 454 Laurent, B. C., M. A. Treitel, and M. Carlson. 1990. 'The SNF5 protein of Saccharomyces cerevisiae is a glutamine- and proline-rich transcriptional activator that affects expression of a broad spectrum of genes', Mol Cell Biol, 10: 5616-25. Law, C. W., Y. Chen, W. Shi, and G. K. Smyth. 2014. 'voom: Precision weights unlock linear model analysis tools for RNA-seq read counts', Genome Biol, 15: R29. Lawrence, M., W. Huber, H. Pages, P. Aboyoun, M. Carlson, R. Gentleman, M. T. Morgan, and V. J. Carey. 2013. 
'Software for computing and annotating genomic ranges', PLoS Comput Biol, 9: e1003118. Lawrence, M. S., P. Stojanov, C. H. Mermel, J. T. Robinson, L. A. Garraway, T. R. Golub, M. Meyerson, S. B. Gabriel, E. S. Lander, and G. Getz. 2014. 'Discovery and saturation analysis of cancer genes across 21 tumour types', Nature, 505: 495-501. Le Gallo, M., and D. W. Bell. 2014. 'The emerging genomic landscape of endometrial cancer', Clin Chem, 60: 98-110. Le Gallo, M., A. J. O'Hara, M. L. Rudd, M. E. Urick, N. F. Hansen, N. J. O'Neil, J. C. Price, S. Zhang, B. M. England, A. K. Godwin, D. C. Sgroi, N. I. H. Intramural Sequencing Center Comparative Sequencing Program, P. Hieter, J. C. Mullikin, M. J. Merino, and D. W. Bell. 2012. 'Exome sequencing of serous endometrial tumors identifies recurrent somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes', Nat Genet, 44: 1310-5. Le Gallo, M., M. L. Rudd, M. E. Urick, N. F. Hansen, S. Zhang, Nisc Comparative Sequencing Program, F. Lozy, D. C. Sgroi, A. Vidal Bel, X. Matias-Guiu, R. R. Broaddus, K. H. Lu, D. A. Levine, D. G. Mutch, P. J. Goodfellow, H. B. Salvesen, J. C. Mullikin, and D. W. Bell. 2017. 'Somatic mutation profiles of clear cell endometrial tumors revealed by whole exome and targeted gene sequencing', Cancer, 123: 3261-68. Lee, S. H., D. Harold, D. R. Nyholt, A. NZGene Consortium, Consortium International Endogene, Genetic, Consortium Environmental Risk for Alzheimer's disease, M. E. Goddard, K. T. Zondervan, J. Williams, G. W. Montgomery, N. R. Wray, and P. M. Visscher. 2013. 'Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis', Hum Mol Genet, 22: 832-41. Lessard, J., J. I. Wu, J. A. Ranish, M. Wan, M. M. Winslow, B. T. Staahl, H. Wu, R. Aebersold, I. A. Graef, and G. R. Crabtree. 2007. 'An essential switch in subunit composition of a chromatin remodeling complex during neural development', Neuron, 55: 201-15. 
455 Levine, R. L., C. B. Cargile, M. S. Blazes, B. van Rees, R. J. Kurman, and L. H. Ellenson. 1998. 'PTEN mutations and microsatellite instability in complex atypical hyperplasia, a precursor lesion to uterine endometrioid carcinoma', Cancer Res, 58: 3254-8. Leyendecker, G., M. Herbertz, G. Kunz, and G. Mall. 2002. 'Endometriosis results from the dislocation of basal endometrium', Hum Reprod, 17: 2725-36. Leyendecker, G., L. Wildt, and G. Mall. 2009. 'The pathophysiology of endometriosis and adenomyosis: tissue injury and repair', Arch Gynecol Obstet, 280: 529-38. Li, B., M. Carey, and J. L. Workman. 2007. 'The role of chromatin during transcription', Cell, 128: 707-19. Li, B., and C. N. Dewey. 2011. 'RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome', BMC Bioinformatics, 12: 323. Li, C., Z. L. Xu, Z. Zhao, Q. An, L. Wang, Y. Yu, and D. X. Piao. 2017. 'ARID1A gene knockdown promotes neuroblastoma migration and invasion', Neoplasma, 64: 367-76. Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and Subgroup Genome Project Data Processing. 2009. 'The Sequence Alignment/Map format and SAMtools', Bioinformatics, 25: 2078-9. Li, N., Y. Li, J. Lv, X. Zheng, H. Wen, H. Shen, G. Zhu, T. Y. Chen, S. S. Dhar, P. Y. Kan, Z. Wang, R. Shiekhattar, X. Shi, F. Lan, K. Chen, W. Li, H. Li, and M. G. Lee. 2016. 'ZMYND8 Reads the Dual Histone Mark H3K4me1-H3K14ac to Antagonize the Expression of Metastasis-Linked Genes', Mol Cell, 63: 470-84. Li, S. F., T. Shiozawa, K. Nakayama, T. Nikaido, and S. Fujii. 1996. 'Stepwise abnormality of sex steroid hormone receptors, tumor suppressor gene products (p53 and Rb), and cyclin E in uterine endometrioid carcinoma', Cancer, 77: 321-9. Li, X., M. Guo, L. Cai, T. Du, Y. Liu, H. F. Ding, H. Wang, J. Zhang, X. Chen, and C. Yan. 2020. 'Competitive ubiquitination activates the tumor suppressor p53', Cell Death Differ, 27: 1807-18. Li, X. S., P. Trojer, T. 
Matsumura, J. E. Treisman, and N. Tanese. 2010. 'Mammalian SWI/SNF- -a subunit BAF250/ARID1 is an E3 ubiquitin ligase that targets histone H2B', Mol Cell Biol, 30: 1673-88. 456 Li, X., Y. Zhang, L. Zhao, L. Wang, Z. Wu, Q. Mei, J. Nie, X. Li, Y. Li, X. Fu, X. Wang, Y. Meng, and W. Han. 2014. 'Whole-exome sequencing of endometriosis identifies frequent alterations in genes involved in cell adhesion and chromatin-remodeling complexes', Hum Mol Genet, 23: 6008-21. Liang, H., L. W. Cheung, J. Li, Z. Ju, S. Yu, K. Stemke-Hale, T. Dogruluk, Y. Lu, X. Liu, C. Gu, W. Guo, S. E. Scherer, H. Carter, S. N. Westin, M. D. Dyer, R. G. Verhaak, F. Zhang, R. Karchin, C. G. Liu, K. H. Lu, R. R. Broaddus, K. L. Scott, B. T. Hennessy, and G. B. Mills. 2012. 'Whole-exome sequencing combined with functional genomics reveals novel candidate driver cancer genes in endometrial cancer', Genome Res, 22: 2120-9. Liberzon, A., C. Birger, H. Thorvaldsdottir, M. Ghandi, J. P. Mesirov, and P. Tamayo. 2015. 'The Molecular Signatures Database (MSigDB) hallmark gene set collection', Cell Syst, 1: 417-25. Lickert, H., J. K. Takeuchi, I. Von Both, J. R. Walls, F. McAuliffe, S. L. Adamson, R. M. Henkelman, J. L. Wrana, J. Rossant, and B. G. Bruneau. 2004. 'Baf60c is essential for function of BAF chromatin remodelling complexes in heart development', Nature, 432: 107-12. Lieberman-Aiden, E., N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, R. Sandstrom, B. Bernstein, M. A. Bender, M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L. A. Mirny, E. S. Lander, and J. Dekker. 2009. 'Comprehensive mapping of long-range interactions reveals folding principles of the human genome', Science, 326: 289-93. Ligresti, G., L. Militello, L. S. Steelman, A. Cavallaro, F. Basile, F. Nicoletti, F. Stivala, J. A. McCubrey, and M. Libra. 2009. 
'PIK3CA mutations in human solid tumors: role in sensitivity to various therapeutic approaches', Cell Cycle, 8: 1352-8. Liu, C., M. Wang, X. Wei, L. Wu, J. Xu, X. Dai, J. Xia, M. Cheng, Y. Yuan, P. Zhang, J. Li, T. Feng, A. Chen, W. Zhang, F. Chen, Z. Shang, X. Zhang, B. A. Peters, and L. Liu. 2019. 'An ATAC-seq atlas of chromatin accessibility in mouse tissues', Sci Data, 6: 65. Local, A., H. Huang, C. P. Albuquerque, N. Singh, A. Y. Lee, W. Wang, C. Wang, J. E. Hsia, A. K. Shiau, K. Ge, K. D. Corbett, D. Wang, H. Zhou, and B. Ren. 2018. 'Identification of H3K4me1-associated proteins at mammalian enhancers', Nat Genet, 50: 73-82. Long, H. K., S. L. Prescott, and J. Wysocka. 2016. 'Ever-Changing Landscapes: Transcriptional Enhancers in Development and Evolution', Cell, 167: 1170-87. 457 Lorincz, A. M., and S. Sukumar. 2006. 'Molecular links between obesity and breast cancer', Endocr Relat Cancer, 13: 279-92. Love, M. I., S. Anders, V. Kim, and W. Huber. 2015. 'RNA-Seq workflow: gene-level exploratory analysis and differential expression', F1000Res, 4: 1070. Love, M. I., W. Huber, and S. Anders. 2014. 'Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2', Genome Biol, 15: 550. Loven, J., H. A. Hoke, C. Y. Lin, A. Lau, D. A. Orlando, C. R. Vakoc, J. E. Bradner, T. I. Lee, and R. A. Young. 2013. 'Selective inhibition of tumor oncogenes by disruption of super- enhancers', Cell, 153: 320-34. Lozano, G. 2010. 'Mouse models of p53 functions', Cold Spring Harb Perspect Biol, 2: a001115. Lu, D., C. D. Wolfgang, and T. Hai. 2006. 'Activating transcription factor 3, a stress-inducible gene, suppresses Ras-stimulated tumorigenesis', J Biol Chem, 281: 10473-81. Lu, K. H., and R. R. Broaddus. 2020. 'Endometrial Cancer', N Engl J Med, 383: 2053-64. Luger, K., and T. J. Richmond. 1998. 'The histone tails of the nucleosome', Curr Opin Genet Dev, 8: 140-6. Lukashchuk, N., and K. H. Vousden. 2007. 
'Ubiquitination and degradation of mutant p53', Mol Cell Biol, 27: 8284-95. Lun, A. T., and G. K. Smyth. 2016. 'csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows', Nucleic Acids Res, 44: e45. Luo, B., H. W. Cheung, A. Subramanian, T. Sharifnia, M. Okamoto, X. Yang, G. Hinkle, J. S. Boehm, R. Beroukhim, B. A. Weir, C. Mermel, D. A. Barbie, T. Awad, X. Zhou, T. Nguyen, B. Piqani, C. Li, T. R. Golub, M. Meyerson, N. Hacohen, W. C. Hahn, E. S. Lander, D. M. Sabatini, and D. E. Root. 2008. 'Highly parallel identification of essential genes in cancer cells', Proc Natl Acad Sci U S A, 105: 20380-5. Luo, Q., X. Wu, W. Chang, P. Zhao, Y. Nan, X. Zhu, J. P. Katz, D. Su, and Z. Liu. 2020. 'ARID1A prevents squamous cell carcinoma initiation and chemoresistance by 458 antagonizing pRb/E2F1/c-Myc-mediated cancer stemness', Cell Death Differ, 27: 1981- 97. Maeda, D., T. L. Mao, M. Fukayama, S. Nakagawa, T. Yano, Y. Taketani, and M. Shih Ie. 2010. 'Clinicopathological significance of loss of ARID1A immunoreactivity in ovarian clear cell carcinoma', Int J Mol Sci, 11: 5120-8. Maglott, D., J. Ostell, K. D. Pruitt, and T. Tatusova. 2005. 'Entrez Gene: gene-centered information at NCBI', Nucleic Acids Res, 33: D54-8. Maheshwari, A., S. Gurunath, F. Fatima, and S. Bhattacharya. 2012. 'Adenomyosis and subfertility: a systematic review of prevalence, diagnosis, treatment and fertility outcomes', Hum Reprod Update, 18: 374-92. Mak, M. P., P. Tong, L. Diao, R. J. Cardnell, D. L. Gibbons, W. N. William, F. Skoulidis, E. R. Parra, J. Rodriguez-Canales, Wistuba, II, J. V. Heymach, J. N. Weinstein, K. R. Coombes, J. Wang, and L. A. Byers. 2016. 'A Patient-Derived, Pan-Cancer EMT Signature Identifies Global Molecular Alterations and Immune Target Enrichment Following Epithelial-to-Mesenchymal Transition', Clin Cancer Res, 22: 609-20. Mamo, A., L. Cavallone, S. Tuzmen, C. Chabot, C. Ferrario, S. Hassan, H. Edgren, O. Kallioniemi, O. 
Aleynikova, E. Przybytkowski, K. Malcolm, S. Mousses, P. N. Tonin, and M. Basik. 2012. 'An integrated genomic approach identifies ARID1A as a candidate tumor-suppressor gene in breast cancer', Oncogene, 31: 2090-100. Mao, T. L., L. Ardighieri, A. Ayhan, K. T. Kuo, C. H. Wu, T. L. Wang, and M. Shih Ie. 2013. 'Loss of ARID1A expression correlates with stages of tumor progression in uterine endometrioid carcinoma', Am J Surg Pathol, 37: 1342-8. Mao, T. L., and M. Shih Ie. 2013. 'The roles of ARID1A in gynecologic cancer', J Gynecol Oncol, 24: 376-81. Mareel, M., and A. Leroy. 2003. 'Clinical, cellular, and molecular aspects of cancer invasion', Physiol Rev, 83: 337-76. Marino, S., M. Vooijs, H. van Der Gulden, J. Jonkers, and A. Berns. 2000. 'Induction of medulloblastomas in p53-null mutant mice by somatic inactivation of Rb in the external granular layer cells of the cerebellum', Genes Dev, 14: 994-1004. 459 Marquardt, R. M., T. H. Kim, J. H. Shin, and J. W. Jeong. 2019. 'Progesterone and Estrogen Signaling in the Endometrium: What Goes Wrong in Endometriosis?', Int J Mol Sci, 20. Marquardt, R. M., T. H. Kim, J. Y. Yoo, H. E. Teasley, A. T. Fazleabas, S. L. Young, B. A. Lessey, R. Arora, and J. W. Jeong. 2021. 'Endometrial epithelial ARID1A is critical for uterine gland function in early pregnancy establishment', FASEB J, 35: e21209. Marques, J. G., B. E. Gryder, B. Pavlovic, Y. Chung, Q. A. Ngo, F. Frommelt, M. Gstaiger, Y. Song, K. Benischke, D. Laubscher, M. Wachtel, J. Khan, and B. W. Schafer. 2020. 'NuRD subunit CHD4 regulates super-enhancer accessibility in rhabdomyosarcoma and represents a general tumor dependency', Elife, 9. Marquez-Vilendrer, S. B., S. K. Rai, S. J. Gramling, L. Lu, and D. N. Reisman. 2016. 'Loss of the SWI/SNF ATPase subunits BRM and BRG1 drives lung cancer development', Oncoscience, 3: 322-36. Marquez-Vilendrer, S. B., K. Thompson, L. Lu, and D. Reisman. 2016. 'Mechanism of BRG1 silencing in primary cancers', Oncotarget, 7: 56153-69. 
Martin, M. 2011. 'Cutadapt removes adapter sequences from high-throughput sequencing reads.', EMBnet. J., 17: 10-12. Martins, A. L., N. M. Walavalkar, W. D. Anderson, C. Zang, and M. J. Guertin. 2018. 'Universal correction of enzymatic sequence bias reveals molecular signatures of protein/DNA interactions', Nucleic Acids Res, 46: e9. Mashtalir, N., A. R. D'Avino, B. C. Michel, J. Luo, J. Pan, J. E. Otto, H. J. Zullow, Z. M. McKenzie, R. L. Kubiak, R. St Pierre, A. M. Valencia, S. J. Poynter, S. H. Cassel, J. A. Ranish, and C. Kadoch. 2018. 'Modular Organization and Assembly of SWI/SNF Family Chromatin Remodeling Complexes', Cell, 175: 1272-88 e20. Mathur, R., B. H. Alver, A. K. San Roman, B. G. Wilson, X. Wang, A. T. Agoston, P. J. Park, R. A. Shivdasani, and C. W. Roberts. 2017. 'ARID1A loss impairs enhancer-mediated gene regulation and drives colon cancer in mice', Nat Genet, 49: 296-302. McCluggage, W. G., R. A. Soslow, and C. B. Gilks. 2011. 'Patterns of p53 immunoreactivity in endometrial carcinomas: 'all or nothing' staining is of importance', Histopathology, 59: 786-8. 460 Mehasseb, M. K., R. Panchal, A. H. Taylor, L. Brown, S. C. Bell, and M. Habiba. 2011. 'Estrogen and progesterone receptor isoform distribution through the menstrual cycle in uteri with and without adenomyosis', Fertil Steril, 95: 2228-35, 35 e1. Mehta, R., and A. D. Shapiro. 2008. 'Plasminogen activator inhibitor type 1 deficiency', Haemophilia, 14: 1255-60. Menkhorst, E., M. Griffiths, M. Van Sinderen, K. Rainczuk, K. Niven, and E. Dimitriadis. 2018. 'Galectin-7 is elevated in endometrioid (type I) endometrial cancer and promotes cell migration', Oncol Lett, 16: 4721-28. Meyer, C. A., and X. S. Liu. 2014. 'Identifying and mitigating bias in next-generation sequencing methods for chromatin biology', Nat Rev Genet, 15: 709-21. Meyer, L. A., R. R. Broaddus, and K. H. Lu. 2009. 'Endometrial cancer and Lynch syndrome: clinical and pathologic considerations', Cancer Control, 16: 14-22. 
Meyers, R. M., J. G. Bryan, J. M. McFarland, B. A. Weir, A. E. Sizemore, H. Xu, N. V. Dharia, P. G. Montgomery, G. S. Cowley, S. Pantel, A. Goodale, Y. Lee, L. D. Ali, G. Jiang, R. Lubonja, W. F. Harrington, M. Strickland, T. Wu, D. C. Hawes, V. A. Zhivich, M. R. Wyatt, Z. Kalani, J. J. Chang, M. Okamoto, K. Stegmaier, T. R. Golub, J. S. Boehm, F. Vazquez, D. E. Root, W. C. Hahn, and A. Tsherniak. 2017. 'Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells', Nat Genet, 49: 1779-84. Michel, B. C., A. R. D'Avino, S. H. Cassel, N. Mashtalir, Z. M. McKenzie, M. J. McBride, A. M. Valencia, Q. Zhou, M. Bocker, L. M. M. Soares, J. Pan, D. I. Remillard, C. A. Lareau, H. J. Zullow, N. Fortoul, N. S. Gray, J. E. Bradner, H. M. Chan, and C. Kadoch. 2018. 'A non-canonical SWI/SNF complex is a synthetic lethal target in cancers driven by BAF complex perturbation', Nat Cell Biol, 20: 1410-20. Mieczkowski, J., A. Cook, S. K. Bowman, B. Mueller, B. H. Alver, S. Kundu, A. M. Deaton, J. A. Urban, E. Larschan, P. J. Park, R. E. Kingston, and M. Y. Tolstorukov. 2016. 'MNase titration reveals differences between nucleosome occupancy and chromatin accessibility', Nat Commun, 7: 11485. Mihm, M., S. Gangooly, and S. Muttukrishna. 2011. 'The normal menstrual cycle in women', Anim Reprod Sci, 124: 229-36. 461 Miled, N., Y. Yan, W. C. Hon, O. Perisic, M. Zvelebil, Y. Inbar, D. Schneidman-Duhovny, H. J. Wolfson, J. M. Backer, and R. L. Williams. 2007. 'Mechanism of two classes of cancer mutations in the phosphoinositide 3-kinase catalytic subunit', Science, 317: 239-42. Mirantes, C., I. Espinosa, I. Ferrer, X. Dolcet, J. Prat, and X. Matias-Guiu. 2013. 'Epithelial-to- mesenchymal transition and stem cells in endometrial cancer', Hum Pathol, 44: 1973-81. Mittal, P., and C. W. M. Roberts. 2020. 'The SWI/SNF complex in cancer - biology, biomarkers and therapy', Nat Rev Clin Oncol, 17: 435-48. Montgomery, B. E., G. S. Daum, and C. 
J. Dunton. 2004. 'Endometrial hyperplasia: a review', Obstet Gynecol Surv, 59: 368-78.

Moore, L. D., T. Le, and G. Fan. 2013. 'DNA methylation and its basic function', Neuropsychopharmacology, 38: 23-38.

Moore, L., D. Leongamornlert, T. H. H. Coorens, M. A. Sanders, P. Ellis, S. C. Dentro, K. J. Dawson, T. Butler, R. Rahbari, T. J. Mitchell, F. Maura, J. Nangalia, P. S. Tarpey, S. F. Brunner, H. Lee-Six, Y. Hooks, S. Moody, K. T. Mahbubani, M. Jimenez-Linan, J. J. Brosens, C. A. Iacobuzio-Donahue, I. Martincorena, K. Saeb-Parsy, P. J. Campbell, and M. R. Stratton. 2020. 'The mutational landscape of normal human endometrial epithelium', Nature, 580: 640-46.

Morice, P., A. Leary, C. Creutzberg, N. Abu-Rustum, and E. Darai. 2016. 'Endometrial cancer', Lancet, 387: 1094-108.

Mullen, R. D., and R. R. Behringer. 2014. 'Molecular genetics of Mullerian duct formation, regression and differentiation', Sex Dev, 8: 281-96.

Muppala, R., T. Donenberg, M. S. Huang, and M. P. Schlumbrecht. 2017. 'SMARCA4 germline gene mutation in a patient with epithelial ovarian: A case report', Gynecol Oncol Rep, 22: 45-47.

Murali, R., R. A. Soslow, and B. Weigelt. 2014. 'Classification of endometrial carcinoma: more than two types', Lancet Oncol, 15: e268-78.

Murray, M. J., W. R. Meyer, R. J. Zaino, B. A. Lessey, D. B. Novotny, K. Ireland, D. Zeng, and M. A. Fritz. 2004. 'A critical analysis of the accuracy, reproducibility, and clinical utility of histologic endometrial dating in fertile women', Fertil Steril, 81: 1333-43.

Muthuswami, R., L. Bailey, R. Rakesh, A. N. Imbalzano, J. A. Nickerson, and J. W. Hockensmith. 2019. 'BRG1 is a prognostic indicator and a potential therapeutic target for prostate cancer', J Cell Physiol.

Mutter, G. L., M. C. Lin, J. T. Fitzgerald, J. B. Kum, J. P. Baak, J. A. Lees, L. P. Weng, and C. Eng. 2000. 'Altered PTEN expression as a diagnostic marker for the earliest endometrial precancers', J Natl Cancer Inst, 92: 924-30.

Nacev, B. A., L. Feng, J. D.
Bagert, A. E. Lemiesz, J. Gao, A. A. Soshnev, R. Kundra, N. Schultz, T. W. Muir, and C. D. Allis. 2019. 'The expanding landscape of 'oncohistone' mutations in human cancers', Nature, 567: 473-78.

Nagarajan, S., S. V. Rao, J. Sutton, D. Cheeseman, S. Dunn, E. K. Papachristou, J. G. Prada, D. L. Couturier, S. Kumar, K. Kishore, C. S. R. Chilamakuri, S. E. Glont, E. Archer Goode, C. Brodie, N. Guppy, R. Natrajan, A. Bruna, C. Caldas, A. Russell, R. Siersbaek, K. Yusa, I. Chernukhin, and J. S. Carroll. 2020. 'ARID1A influences HDAC1/BRD4 activity, intrinsic proliferative capacity and breast cancer treatment response', Nat Genet.

Nagl, N. G., Jr., A. Patsialou, D. S. Haines, P. B. Dallas, G. R. Beck, Jr., and E. Moran. 2005. 'The p270 (ARID1A/SMARCF1) subunit of mammalian SWI/SNF-related complexes is essential for normal cell cycle arrest', Cancer Res, 65: 9236-44.

Namjan, A., A. Techasen, W. Loilome, P. Sa-Ngaimwibool, and A. Jusakul. 2020. 'ARID1A alterations and their clinical significance in cholangiocarcinoma', PeerJ, 8: e10464.

Naumann, R. W. 2011. 'The role of the phosphatidylinositol 3-kinase (PI3K) pathway in the development and treatment of uterine cancer', Gynecol Oncol, 123: 411-20.

Neely, K. E., A. H. Hassan, C. E. Brown, L. Howe, and J. L. Workman. 2002. 'Transcription activator interactions with multiple SWI/SNF subunits', Mol Cell Biol, 22: 1615-25.

Nesvizhskii, A. I., A. Keller, E. Kolker, and R. Aebersold. 2003. 'A statistical model for identifying proteins by tandem mass spectrometry', Anal Chem, 75: 4646-58.

Nguyen, K. H., F. Xu, S. Flowers, E. A. Williams, J. C. Fritton, and E. Moran. 2015. 'SWI/SNF-Mediated Lineage Determination in Mesenchymal Stem Cells Confers Resistance to Osteoporosis', Stem Cells, 33: 3028-38.

Ni, L., C. Bruce, C. Hart, J. Leigh-Bell, D. Gelperin, L. Umansky, M. B. Gerstein, and M. Snyder. 2009. 'Dynamic and complex transcription factor binding during an inducible response in yeast', Genes Dev, 23: 1351-63.

Nie, Z., Y.
Xue, D. Yang, S. Zhou, B. J. Deroo, T. K. Archer, and W. Wang. 2000. 'A specificity and targeting subunit of a human SWI/SNF family-related chromatin-remodeling complex', Mol Cell Biol, 20: 8879-88.

Niemann, T. H., T. L. Trgovac, V. R. McGaughy, and L. Vaccarello. 1996. 'bcl-2 expression in endometrial hyperplasia and carcinoma', Gynecol Oncol, 63: 318-22.

Nieto, M. A., R. Y. Huang, R. A. Jackson, and J. P. Thiery. 2016. 'EMT: 2016', Cell, 166: 21-45.

Nisolle, M., and J. Donnez. 1997. 'Peritoneal endometriosis, ovarian endometriosis, and adenomyotic nodules of the rectovaginal septum are three different entities', Fertil Steril, 68: 585-96.

Noyes, R. W., A. T. Hertig, and J. Rock. 2019. 'Reprint of: Dating the Endometrial Biopsy', Fertil Steril, 112: e93-e115.

O'Connell, J. T., G. L. Mutter, A. Cviko, M. Nucci, B. J. Quade, H. P. Kozakewich, E. Neffen, D. Sun, A. Yang, F. D. McKeon, and C. P. Crum. 2001. 'Identification of a basal/reserve cell immunophenotype in benign and neoplastic endometrium: a study with the p53 homologue p63', Gynecol Oncol, 80: 30-6.

O'Geen, H., L. Echipare, and P. J. Farnham. 2011. 'Using ChIP-seq technology to generate high-resolution profiles of histone modifications', Methods Mol Biol, 791: 265-86.

O'Mara, T. A., D. M. Glubb, F. Amant, D. Annibali, K. Ashton, J. Attia, P. L. Auer, M. W. Beckmann, A. Black, M. K. Bolla, H. Brauch, H. Brenner, L. Brinton, D. D. Buchanan, B. Burwinkel, J. Chang-Claude, S. J. Chanock, C. Chen, M. M. Chen, T. H. T. Cheng, C. L. Clarke, M. Clendenning, L. S. Cook, F. J. Couch, A. Cox, M. Crous-Bous, K. Czene, F. Day, J. Dennis, J. Depreeuw, J. A. Doherty, T. Dork, S. C. Dowdy, M. Durst, A. B. Ekici, P. A. Fasching, B. L. Fridley, C. M. Friedenreich, L. Fritschi, J. Fung, M. Garcia-Closas, M. M. Gaudet, G. G. Giles, E. L. Goode, M. Gorman, C. A. Haiman, P. Hall, S. E. Hankinson, C. S. Healey, A. Hein, P. Hillemanns, S. Hodgson, E. A. Hoivik, E. G. Holliday, J. L. Hopper, D. J. Hunter, A. Jones, C. Krakstad, V.
N. Kristensen, D. Lambrechts, L. L. Marchand, X. Liang, A. Lindblom, J. Lissowska, J. Long, L. Lu, A. M. Magliocco, L. Martin, M. McEvoy, A. Meindl, K. Michailidou, R. L. Milne, M. Mints, G. W. Montgomery, R. Nassir, H. Olsson, I. Orlow, G. Otton, C. Palles, J. R. B. Perry, J. Peto, L. Pooler, J. Prescott, T. Proietto, T. R. Rebbeck, H. A. Risch, P. A. W. Rogers, M. Rubner, I. Runnebaum, C. Sacerdote, G. E. Sarto, F. Schumacher, R. J. Scott, V. W. Setiawan, M. Shah, X. Sheng, X. O. Shu, M. C. Southey, A. J. Swerdlow, E. Tham, J. Trovik, C. Turman, J. P. Tyrer, C. Vachon, D. VanDen Berg, A. Vanderstichele, Z. Wang, P. M. Webb, N. Wentzensen, H. M. J. Werner, S. J. Winham, A. Wolk, L. Xia, Y. B. Xiang, H. P. Yang, H. Yu, W. Zheng, P. D. P. Pharoah, A. M. Dunning, P. Kraft, I. De Vivo, I. Tomlinson, D. F. Easton, A. B. Spurdle, and D. J. Thompson. 2018. 'Identification of nine new susceptibility loci for endometrial cancer', Nat Commun, 9: 3166.

Ochoa-Bernal, M. A., and A. T. Fazleabas. 2020. 'Physiologic Events of Embryo Implantation and Decidualization in Human and Non-Human Primates', Int J Mol Sci, 21.

Okuda, T., J. Otsuka, A. Sekizawa, H. Saito, R. Makino, M. Kushima, A. Farina, Y. Kuwano, and T. Okai. 2003. 'p53 mutations and overexpression affect prognosis of ovarian endometrioid cancer but not clear cell cancer', Gynecol Oncol, 88: 318-25.

Olave, I., W. Wang, Y. Xue, A. Kuo, and G. R. Crabtree. 2002. 'Identification of a polymorphic, neuron-specific chromatin remodeling complex', Genes Dev, 16: 2509-17.

Orlando, D. A., M. W. Chen, V. E. Brown, S. Solanki, Y. J. Choi, E. R. Olson, C. C. Fritz, J. E. Bradner, and M. G. Guenther. 2014. 'Quantitative ChIP-Seq normalization reveals global modulation of the epigenome', Cell Rep, 9: 1163-70.

Ou, J., H. Liu, J. Yu, M. A. Kelliher, L. H. Castilla, N. D. Lawson, and L. J. Zhu. 2018. 'ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data', BMC Genomics, 19: 169.

Owusu-Akyaw, A., K.
Krishnamoorthy, L. T. Goldsmith, and S. S. Morelli. 2019. 'The role of mesenchymal-epithelial transition in endometrial function', Hum Reprod Update, 25: 114-33.

Paoli, P., E. Giannoni, and P. Chiarugi. 2013. 'Anoikis molecular pathways and its role in cancer progression', Biochim Biophys Acta, 1833: 3481-98.

Park, D., Y. Lee, G. Bhupindersingh, and V. R. Iyer. 2013. 'Widespread misinterpretable ChIP-seq bias in yeast', PLoS One, 8: e83506.

Park, J. H., E. J. Park, H. S. Lee, S. J. Kim, S. K. Hur, A. N. Imbalzano, and J. Kwon. 2006. 'Mammalian SWI/SNF complexes facilitate DNA double-strand break repair by promoting gamma-H2AX induction', EMBO J, 25: 3986-97.

Park, P. J. 2009. 'ChIP-seq: advantages and challenges of a maturing technology', Nat Rev Genet, 10: 669-80.

Park, Y. K., J. E. Lee, Z. Yan, K. McKernan, T. O'Haren, W. Wang, W. Peng, and K. Ge. 2021. 'Interplay of BAF and MLL4 promotes cell type-specific enhancer activation', Nat Commun, 12: 1630.

Parker, S. C., M. L. Stitzel, D. L. Taylor, J. M. Orozco, M. R. Erdos, J. A. Akiyama, K. L. van Bueren, P. S. Chines, N. Narisu, NISC Comparative Sequencing Program, B. L. Black, A. Visel, L. A. Pennacchio, and F. S. Collins. 2013. 'Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants', Proc Natl Acad Sci U S A, 110: 17921-6.

Pathiraja, P., S. Dhar, and K. Haldar. 2013. 'Serous endometrial intraepithelial carcinoma: a case series and literature review', Cancer Manag Res, 5: 117-22.

Pearce, C. L., C. Templeman, M. A. Rossing, A. Lee, A. M. Near, P. M. Webb, C. M. Nagle, J. A. Doherty, K. L. Cushing-Haugen, K. G. Wicklund, J. Chang-Claude, R. Hein, G. Lurie, L. R. Wilkens, M. E. Carney, M. T. Goodman, K. Moysich, S. K. Kjaer, E. Hogdall, A. Jensen, E. L. Goode, B. L. Fridley, M. C. Larson, J. M. Schildkraut, R. T.
Palmieri, D. W. Cramer, K. L. Terry, A. F. Vitonis, L. J. Titus, A. Ziogas, W. Brewster, H. Anton-Culver, A. Gentry-Maharaj, S. J. Ramus, A. R. Anderson, D. Brueggmann, P. A. Fasching, S. A. Gayther, D. G. Huntsman, U. Menon, R. B. Ness, M. C. Pike, H. Risch, A. H. Wu, A. Berchuck, and Ovarian Cancer Association Consortium. 2012. 'Association between endometriosis and risk of histological subtypes of ovarian cancer: a pooled analysis of case-control studies', Lancet Oncol, 13: 385-94.

Perets, R., G. A. Wyant, K. W. Muto, J. G. Bijron, B. B. Poole, K. T. Chin, J. Y. Chen, A. W. Ohman, C. D. Stepule, S. Kwak, A. M. Karst, M. S. Hirsch, S. R. Setlur, C. P. Crum, D. M. Dinulescu, and R. Drapkin. 2013. 'Transformation of the fallopian tube secretory epithelium leads to high-grade serous ovarian cancer in Brca;Tp53;Pten models', Cancer Cell, 24: 751-65.

Peterson, C. L., and I. Herskowitz. 1992. 'Characterization of the yeast SWI1, SWI2, and SWI3 genes, which encode a global activator of transcription', Cell, 68: 573-83.

Pijnenborg, R., L. Vercruysse, and M. Hanssens. 2006. 'The uterine spiral arteries in human pregnancy: facts and controversies', Placenta, 27: 939-58.

Pillidge, Z., and S. J. Bray. 2019. 'SWI/SNF chromatin remodeling controls Notch-responsive enhancer accessibility', EMBO Rep, 20.

Plant, T. M. 2015. '60 YEARS OF NEUROENDOCRINOLOGY: The hypothalamo-pituitary-gonadal axis', J Endocrinol, 226: T41-54.

Pollard, K. S., M. J. Hubisz, K. R. Rosenbloom, and A. Siepel. 2010. 'Detection of nonneutral substitution rates on mammalian phylogenies', Genome Res, 20: 110-21.

Polyak, K., and R. A. Weinberg. 2009. 'Transitions between epithelial and mesenchymal states: acquisition of malignant and stem cell traits', Nat Rev Cancer, 9: 265-73.

Pott, S., and J. D. Lieb. 2015. 'What are super-enhancers?', Nat Genet, 47: 8-12.

Powell, E., D. Piwnica-Worms, and H. Piwnica-Worms. 2014. 'Contribution of p53 to metastasis', Cancer Discov, 4: 405-14.

Pradhan, S. K., T. Su, L.
Yen, K. Jacquet, C. Huang, J. Cote, S. K. Kurdistani, and M. F. Carey. 2016. 'EP400 Deposits H3.3 into Promoters and Enhancers during Gene Activation', Mol Cell, 61: 27-38.

Pranzatelli, T. J. F., D. G. Michael, and J. A. Chiorini. 2018. 'ATAC2GRN: optimized ATAC-seq and DNase1-seq pipelines for rapid and accurate genome regulatory network inference', BMC Genomics, 19: 563.

Quinlan, A. R., and I. M. Hall. 2010. 'BEDTools: a flexible suite of utilities for comparing genomic features', Bioinformatics, 26: 841-2.

R Core Team. 2018. 'R: A language and environment for statistical computing'.

Raab, J. R., S. Resnick, and T. Magnuson. 2015. 'Genome-Wide Transcriptional Regulation Mediated by Biochemically Distinct SWI/SNF Complexes', PLoS Genet, 11.

Raab, J. R., J. S. Runge, C. C. Spear, and T. Magnuson. 2017. 'Co-regulation of transcription by BRG1 and BRM, two mutually exclusive SWI/SNF ATPase subunits', Epigenetics Chromatin, 10: 62.

Rafati, H., M. Parra, S. Hakre, Y. Moshkin, E. Verdin, and T. Mahmoudi. 2011. 'Repressive LTR nucleosome positioning by the BAF complex is required for HIV latency', PLoS Biol, 9: e1001206.

Ramalingam, P., S. Croce, and W. G. McCluggage. 2017. 'Loss of expression of SMARCA4 (BRG1), SMARCA2 (BRM) and SMARCB1 (INI1) in undifferentiated carcinoma of the endometrium is not uncommon and is not always associated with rhabdoid morphology', Histopathology, 70: 359-66.

Ramon, L., J. Gilabert-Estelles, R. Castello, J. Gilabert, F. Espana, A. Romeu, M. Chirivella, J. Aznar, and A. Estelles. 2005. 'mRNA analysis of several components of the plasminogen activator and matrix metalloproteinase systems in endometriosis using a real-time quantitative RT-PCR assay', Hum Reprod, 20: 272-8.

Rashid, N. U., P. G. Giresi, J. G. Ibrahim, W. Sun, and J. D. Lieb. 2011. 'ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment, even within amplified genomic regions', Genome Biol, 12: R67.

Reed, B. G., and B. R.
Carr. 2000. 'The Normal Menstrual Cycle and the Control of Ovulation.' in K. R. Feingold, B. Anawalt, A. Boyce, G. Chrousos, K. Dungan, A. Grossman, J. M. Hershman, G. Kaltsas, C. Koch, P. Kopp, M. Korbonits, R. McLachlan, J. E. Morley, M. New, L. Perreault, J. Purnell, R. Rebar, F. Singer, D. L. Trence, A. Vinik and D. P. Wilson (eds.), Endotext (South Dartmouth (MA)).

Reed, S. D., K. M. Newton, W. L. Clinton, M. Epplein, R. Garcia, K. Allison, L. F. Voigt, and N. S. Weiss. 2009. 'Incidence of endometrial hyperplasia', Am J Obstet Gynecol, 200: 678 e1-6.

Reich, M., T. Liefeld, J. Gould, J. Lerner, P. Tamayo, and J. P. Mesirov. 2006. 'GenePattern 2.0', Nat Genet, 38: 500-1.

Renehan, A. G., M. Tyson, M. Egger, R. F. Heller, and M. Zwahlen. 2008. 'Body-mass index and incidence of cancer: a systematic review and meta-analysis of prospective observational studies', Lancet, 371: 569-78.

Reske, J. J., M. R. Wilson, and R. L. Chandler. 2020. 'ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation', Epigenetics Chromatin, 13: 22.

Reske, J. J., M. R. Wilson, J. Holladay, M. Wegener, M. Adams, and R. L. Chandler. 2020. 'SWI/SNF inactivation in the endometrial epithelium leads to loss of epithelial integrity', Hum Mol Genet, 29: 3412-30.

Reyes, J. C., J. Barra, C. Muchardt, A. Camus, C. Babinet, and M. Yaniv. 1998. 'Altered control of cellular proliferation in the absence of mammalian brahma (SNF2alpha)', EMBO J, 17: 6979-91.

Riege, K., H. Kretzmer, A. Sahm, S. S. McDade, S. Hoffmann, and M. Fischer. 2020. 'Dissecting the DNA binding landscape and gene regulatory network of p63 and p53', Elife, 9.

Ritchie, M. E., B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, and G. K. Smyth. 2015. 'limma powers differential expression analyses for RNA-sequencing and microarray studies', Nucleic Acids Res, 43: e47.

Rizk, B., A. S. Fischer, H. A. Lotfy, R. Turki, H. A. Zahed, R. Malik, C. P. Holliday, A. Glass, H. Fishel, M. Y.
Soliman, and D. Herrera. 2014. 'Recurrence of endometriosis after hysterectomy', Facts Views Vis Obgyn, 6: 219-27.

Roberts, C. W., and S. H. Orkin. 2004. 'The SWI/SNF complex--chromatin and cancer', Nat Rev Cancer, 4: 133-42.

Robinson, J. T., H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, and J. P. Mesirov. 2011. 'Integrative genomics viewer', Nat Biotechnol, 29: 24-6.

Robinson, M. D., D. J. McCarthy, and G. K. Smyth. 2010. 'edgeR: a Bioconductor package for differential expression analysis of digital gene expression data', Bioinformatics, 26: 139-40.

Robinson, M. D., and A. Oshlack. 2010. 'A scaling normalization method for differential expression analysis of RNA-seq data', Genome Biol, 11: R25.

Rocco, J. W., C. O. Leong, N. Kuperwasser, M. P. DeYoung, and L. W. Ellisen. 2006. 'p63 mediates survival in squamous cell carcinoma by suppression of p73-dependent apoptosis', Cancer Cell, 9: 45-56.

Romano, A., S. Xanthoulea, E. Giacomini, B. Delvoux, E. Alleva, and P. Vigano. 2020. 'Endometriotic cell culture contamination and authenticity: a source of bias in in vitro research?', Hum Reprod, 35: 364-76.

Rothbart, S. B., K. Krajewski, B. D. Strahl, and S. M. Fuchs. 2012. 'Peptide microarrays to interrogate the "histone code"', Methods Enzymol, 512: 107-35.

Roy, N., S. Malik, K. E. Villanueva, A. Urano, X. Lu, G. Von Figura, E. S. Seeley, D. W. Dawson, E. A. Collisson, and M. Hebrok. 2015. 'Brg1 promotes both tumor-suppressive and oncogenic activities at distinct stages of pancreatic cancer formation', Genes Dev, 29: 658-71.

Rubel, C. A., S. P. Wu, L. Lin, T. Wang, R. B. Lanz, X. Li, R. Kommagani, H. L. Franco, S. A. Camper, Q. Tong, J. W. Jeong, J. P. Lydon, and F. J. DeMayo. 2016. 'A Gata2-Dependent Transcription Network Regulates Uterine Progesterone Responsiveness and Endometrial Function', Cell Rep, 17: 1414-25.

Rutherford, E. J., A. D. K. Hill, and A. M. Hopkins. 2018.
'Adhesion in Physiological, Benign and Malignant Proliferative States of the Endometrium: Microenvironment and the Clinical Big Picture', Cells, 7.

Saksouk, N., E. Simboeck, and J. Dejardin. 2015. 'Constitutive heterochromatin formation and transcription in mammals', Epigenetics Chromatin, 8: 3.

Saladi, S. V., B. Keenen, H. G. Marathe, H. Qi, K. V. Chin, and I. L. de la Serna. 2010. 'Modulation of extracellular matrix/adhesion molecule expression by BRG1 is associated with increased melanoma invasiveness', Mol Cancer, 9: 280.

Samartzis, E. P., N. Samartzis, A. Noske, A. Fedier, R. Caduff, K. J. Dedes, D. Fink, and P. Imesch. 2012. 'Loss of ARID1A/BAF250a-expression in endometriosis: a biomarker for risk of carcinogenic transformation?', Mod Pathol, 25: 885-92.

Sampson, J. A. 1927. 'Metastatic or Embolic Endometriosis, due to the Menstrual Dissemination of Endometrial Tissue into the Venous Circulation', Am J Pathol, 3: 93-110 43.

Samuels, Y., Z. Wang, A. Bardelli, N. Silliman, J. Ptak, S. Szabo, H. Yan, A. Gazdar, S. M. Powell, G. J. Riggins, J. K. Willson, S. Markowitz, K. W. Kinzler, B. Vogelstein, and V. E. Velculescu. 2004. 'High frequency of mutations of the PIK3CA gene in human cancers', Science, 304: 554.

Sandhya, S., A. Maulik, M. Giri, and M. Singh. 2018. 'Domain architecture of BAF250a reveals the ARID and ARM-repeat domains with implication in function and assembly of the BAF remodeling complex', PLoS One, 13: e0205267.

Sandoval, G. J., J. L. Pulice, H. Pakula, M. Schenone, D. Y. Takeda, M. Pop, G. Boulay, K. E. Williamson, M. J. McBride, J. Pan, R. St Pierre, E. Hartman, L. A. Garraway, S. A. Carr, M. N. Rivera, Z. Li, L. Ronco, W. C. Hahn, and C. Kadoch. 2018. 'Binding of TMPRSS2-ERG to BAF Chromatin Remodeling Complexes Mediates Prostate Oncogenesis', Mol Cell, 71: 554-66 e7.

Sano, R., and J. C. Reed. 2013. 'ER stress-induced cell death mechanisms', Biochim Biophys Acta, 1833: 3460-70.

Sanyal, A., B. R. Lajoie, G. Jain, and J. Dekker. 2012.
'The long-range interaction landscape of gene promoters', Nature, 489: 109-13.

Sapkota, Y., V. Steinthorsdottir, A. P. Morris, A. Fassbender, N. Rahmioglu, I. De Vivo, J. E. Buring, F. Zhang, T. L. Edwards, S. Jones, D. O, D. Peterse, K. M. Rexrode, P. M. Ridker, A. J. Schork, S. MacGregor, N. G. Martin, C. M. Becker, S. Adachi, K. Yoshihara, T. Enomoto, A. Takahashi, Y. Kamatani, K. Matsuda, M. Kubo, G. Thorleifsson, R. T. Geirsson, U. Thorsteinsdottir, L. M. Wallace, iPSYCH-SSI-Broad Group, J. Yang, D. R. Velez Edwards, M. Nyegaard, S. K. Low, K. T. Zondervan, S. A. Missmer, T. D'Hooghe, G. W. Montgomery, D. I. Chasman, K. Stefansson, J. Y. Tung, and D. R. Nyholt. 2017. 'Meta-analysis identifies five novel loci associated with endometriosis highlighting key genes involved in hormone metabolism', Nat Commun, 8: 15539.

Savitsky, P., T. Krojer, T. Fujisawa, J. P. Lambert, S. Picaud, C. Y. Wang, E. K. Shanle, K. Krajewski, H. Friedrichsen, A. Kanapin, C. Goding, M. Schapira, A. Samsonova, B. D. Strahl, A. C. Gingras, and P. Filippakopoulos. 2016. 'Multivalent Histone and DNA Engagement by a PHD/BRD/PWWP Triple Reader Cassette Recruits ZMYND8 to K14ac-Rich Chromatin', Cell Rep, 17: 2724-37.

Schep, A. N., J. D. Buenrostro, S. K. Denny, K. Schwartz, G. Sherlock, and W. J. Greenleaf. 2015. 'Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions', Genome Res, 25: 1757-70.

Schiltz, R. L., C. A. Mizzen, A. Vassilev, R. G. Cook, C. D. Allis, and Y. Nakatani. 1999. 'Overlapping but distinct patterns of histone acetylation by the human coactivators p300 and PCAF within nucleosomal substrates', J Biol Chem, 274: 1189-92.

Schneider, V. A., T. Graves-Lindsay, K. Howe, N. Bouk, H. C. Chen, P. A. Kitts, T. D. Murphy, K. D. Pruitt, F. Thibaud-Nissen, D. Albracht, R. S. Fulton, M. Kremitzki, V. Magrini, C. Markovic, S. McGrath, K. M. Steinberg, K. Auger, W. Chow, J. Collins, G. Harden, T. Hubbard, S. Pelan, J. T.
Simpson, G. Threadgold, J. Torrance, J. M. Wood, L. Clarke, S. Koren, M. Boitano, P. Peluso, H. Li, C. S. Chin, A. M. Phillippy, R. Durbin, R. K. Wilson, P. Flicek, E. E. Eichler, and D. M. Church. 2017. 'Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly', Genome Res, 27: 849-64.

Schultheis, A. M., L. G. Martelotto, M. R. De Filippo, S. Piscuoglio, C. K. Ng, Y. R. Hussein, J. S. Reis-Filho, R. A. Soslow, and B. Weigelt. 2016. 'TP53 Mutational Spectrum in Endometrioid and Serous Endometrial Cancers', Int J Gynecol Pathol, 35: 289-300.

Schwabish, M. A., and K. Struhl. 2007. 'The Swi/Snf complex is important for histone eviction during transcriptional activation and RNA polymerase II elongation in vivo', Mol Cell Biol, 27: 6987-95.

Sen, M., X. Wang, F. H. Hamdan, J. Rapp, J. Eggert, R. L. Kosinsky, F. Wegwitz, A. P. Kutschat, F. S. Younesi, J. Gaedcke, M. Grade, E. Hessmann, A. Papantonis, P. Ströbel, and S. A. Johnsen. 2019. 'ARID1A facilitates KRAS signaling-regulated enhancer activity in an AP1-dependent manner in colorectal cancer cells', Clin Epigenetics, 11: 92.

Sengupta, D., A. Kannan, M. Kern, M. A. Moreno, E. Vural, B. Stack, Jr., J. Y. Suen, A. J. Tackett, and L. Gao. 2015. 'Disruption of BRD4 at H3K27Ac-enriched enhancer region correlates with decreased c-Myc expression in Merkel cell carcinoma', Epigenetics, 10: 460-6.

Serresi, M., S. Kertalli, L. Li, M. J. Schmitt, Y. Dramaretska, J. Wierikx, D. Hulsman, and G. Gargiulo. 2021. 'Functional antagonism of chromatin modulators regulates epithelial-mesenchymal transition', Sci Adv, 7.

Shao, Z., Y. Zhang, G. C. Yuan, S. H. Orkin, and D. J. Waxman. 2012. 'MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets', Genome Biol, 13: R16.

Sharghi, K. G., N. A. Ramey, P. S. Rush, and D. J. Grider. 2019.
'Endometriosis of the Eyelid, an Extraordinary Extra-abdominal Location Highlighting the Spectrum of Disease', Am J Dermatopathol, 41: 593-95.

Sharma, K., T. T. Vu, W. Cook, M. Naseri, K. Zhan, W. Nakajima, and H. Harada. 2018. 'p53-independent Noxa induction by cisplatin is regulated by ATF3/ATF4 in head and neck squamous cell carcinoma cells', Mol Oncol, 12: 788-98.

Shaw, P. H. 1996. 'The role of p53 in cell cycle regulation', Pathol Res Pract, 192: 669-75.

Shen, H., W. Xu, R. Guo, B. Rong, L. Gu, Z. Wang, C. He, L. Zheng, X. Hu, Z. Hu, Z. M. Shao, P. Yang, F. Wu, Y. G. Shi, Y. Shi, and F. Lan. 2016. 'Suppression of Enhancer Overactivation by a RACK7-Histone Demethylase Complex', Cell, 165: 331-42.

Shen, J., Y. Peng, L. Wei, W. Zhang, L. Yang, L. Lan, P. Kapoor, Z. Ju, Q. Mo, M. Shih Ie, I. P. Uray, X. Wu, P. H. Brown, X. Shen, G. B. Mills, and G. Peng. 2015. 'ARID1A Deficiency Impairs the DNA Damage Checkpoint and Sensitizes Cells to PARP Inhibitors', Cancer Discov, 5: 752-67.

Shen, J., L. Yao, Y. G. Lin, F. J. DeMayo, J. P. Lydon, L. Dubeau, and A. S. Lee. 2016. 'Glucose-regulated protein 94 deficiency induces squamous cell metaplasia and suppresses PTEN-null driven endometrial epithelial tumor development', Oncotarget, 7: 14885-97.

Shen, W., C. Xu, W. Huang, J. Zhang, J. E. Carlson, X. Tu, J. Wu, and Y. Shi. 2007. 'Solution structure of human Brg1 bromodomain and its specific binding to acetylated histone tails', Biochemistry, 46: 2100-10.

Shi, B., W. Yan, G. Liu, and Y. Guo. 2018. 'MicroRNA-488 inhibits tongue squamous carcinoma cell invasion and EMT by directly targeting ATF3', Cell Mol Biol Lett, 23: 28.

Shi, J., W. A. Whyte, C. J. Zepeda-Mendoza, J. P. Milazzo, C. Shen, J. S. Roe, J. L. Minder, F. Mercan, E. Wang, M. A. Eckersley-Maslin, A. E. Campbell, S. Kawaoka, S. Shareef, Z. Zhu, J. Kendall, M. Muhar, C. Haslinger, M. Yu, R. G. Roeder, M. H. Wigler, G. A. Blobel, J. Zuber, D. L. Spector, R. A. Young, and C. R. Vakoc. 2013.
'Role of SWI/SNF in acute leukemia maintenance and enhancer-mediated Myc regulation', Genes Dev, 27: 2648-62.

Shi, L., H. Wen, and X. Shi. 2017. 'The Histone Variant H3.3 in Transcriptional Regulation and Human Disease', J Mol Biol, 429: 1934-45.

Shimshek, D. R., J. Kim, M. R. Hubner, D. J. Spergel, F. Buchholz, E. Casanova, A. F. Stewart, P. H. Seeburg, and R. Sprengel. 2002. 'Codon-improved Cre recombinase (iCre) expression in the mouse', Genesis, 32: 19-26.

Shin, H. Y. 2018. 'Targeting Super-Enhancers for Disease Treatment and Diagnosis', Mol Cells, 41: 506-14.

Siepel, A., G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L. W. Hillier, S. Richards, G. M. Weinstock, R. K. Wilson, R. A. Gibbs, W. J. Kent, W. Miller, and D. Haussler. 2005. 'Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes', Genome Res, 15: 1034-50.

Simon, R. P., D. Robaa, Z. Alhalabi, W. Sippl, and M. Jung. 2016. 'KATching-Up on Small Molecule Modulators of Lysine Acetyltransferases', J Med Chem, 59: 1249-70.

Skene, P. J., J. G. Henikoff, and S. Henikoff. 2018. 'Targeted in situ genome-wide profiling with high efficiency for low cell numbers', Nat Protoc, 13: 1006-19.

Skibinski, A., J. L. Breindel, A. Prat, P. Galvan, E. Smith, A. Rolfs, P. B. Gupta, J. LaBaer, and C. Kuperwasser. 2014. 'The Hippo transducer TAZ interacts with the SWI/SNF complex to regulate breast epithelial lineage commitment', Cell Rep, 6: 1059-72.

Smedley, D., S. Haider, B. Ballester, R. Holland, D. London, G. Thorisson, and A. Kasprzyk. 2009. 'BioMart--biological queries made easy', BMC Genomics, 10: 22.

Smeenk, G., W. W. Wiegant, H. Vrolijk, A. P. Solari, A. Pastink, and H. van Attikum. 2010. 'The NuRD chromatin-remodeling complex regulates signaling and repair of DNA damage', J Cell Biol, 190: 741-9.

Smith, H. W., and C. J. Marshall. 2010. 'Regulation of cell signalling by uPAR', Nat Rev Mol Cell Biol, 11: 23-36.

Sokpor, G., Y. Xie, J.
Rosenbusch, and T. Tuoc. 2017. 'Chromatin Remodeling BAF (SWI/SNF) Complexes in Neural Development and Disorders', Front Mol Neurosci, 10: 243.

Sorosky, J. I. 2012. 'Endometrial cancer', Obstet Gynecol, 120: 383-97.

Soslow, R. A., E. Pirog, and C. Isacson. 2000. 'Endometrial intraepithelial carcinoma with associated peritoneal carcinomatosis', Am J Surg Pathol, 24: 726-32.

Spivakov, M. 2014. 'Spurious transcription factor binding: non-functional or genetically redundant?', Bioessays, 36: 798-806.

Spruijt, C. G., M. S. Luijsterburg, R. Menafra, R. G. Lindeboom, P. W. Jansen, R. R. Edupuganti, M. P. Baltissen, W. W. Wiegant, M. C. Voelker-Albert, F. Matarese, A. Mensinga, I. Poser, H. R. Vos, H. G. Stunnenberg, H. van Attikum, and M. Vermeulen. 2016. 'ZMYND8 Co-localizes with NuRD on Target Genes and Regulates Poly(ADP-Ribose)-Dependent Recruitment of GATAD2A/NuRD to Sites of DNA Damage', Cell Rep, 17: 783-98.

Staahl, B. T., J. Tang, W. Wu, A. Sun, A. D. Gitler, A. S. Yoo, and G. R. Crabtree. 2013. 'Kinetic analysis of npBAF to nBAF switching reveals exchange of SS18 with CREST and integration with neural developmental pathways', J Neurosci, 33: 10348-61.

Stanton, B. Z., C. Hodges, J. P. Calarco, S. M. Braun, W. L. Ku, C. Kadoch, K. Zhao, and G. R. Crabtree. 2017. 'Smarca4 ATPase mutations disrupt direct eviction of PRC1 from chromatin', Nat Genet, 49: 282-88.

Stark, R., and G. Brown. 2011. 'DiffBind: differential binding analysis of ChIP-seq peak data'. http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf.

Stewart, C. A., and R. R. Behringer. 2012. 'Mouse oviduct development', Results Probl Cell Differ, 55: 247-62.

Stewart-Morgan, K. R., N. Reveron-Gomez, and A. Groth. 2019. 'Transcription Restart Establishes Chromatin Accessibility after DNA Replication', Mol Cell, 75: 408-14.

Streppel, M. M., S. Lata, M. DelaBastide, E. A. Montgomery, J. S. Wang, M. I. Canto, A. M. Macgregor-Das, S. Pai, F. H. Morsink, G. J. Offerhaus, E.
Antoniou, A. Maitra, and W. R. McCombie. 2014. 'Next-generation sequencing of endoscopic biopsies identifies ARID1A as a tumor-suppressor gene in Barrett's esophagus', Oncogene, 33: 347-57.

Subramanian, A., P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, and J. P. Mesirov. 2005. 'Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles', Proc Natl Acad Sci U S A, 102: 15545-50.

Suda, K., H. Nakaoka, K. Yoshihara, T. Ishiguro, R. Tamura, Y. Mori, K. Yamawaki, S. Adachi, T. Takahashi, H. Kase, K. Tanaka, T. Yamamoto, T. Motoyama, I. Inoue, and T. Enomoto. 2018. 'Clonal Expansion and Diversification of Cancer-Associated Mutations in Endometriosis and Normal Endometrium', Cell Rep, 24: 1777-89.

Sudarsanam, P., V. R. Iyer, P. O. Brown, and F. Winston. 2000. 'Whole-genome expression analysis of snf/swi mutants of Saccharomyces cerevisiae', Proc Natl Acad Sci U S A, 97: 3364-9.

Sullivan, K. D., M. D. Galbraith, Z. Andrysik, and J. M. Espinosa. 2018. 'Mechanisms of transcriptional regulation by p53', Cell Death Differ, 25: 133-43.

Sun, X., S. C. Wang, Y. Wei, X. Luo, Y. Jia, L. Li, P. Gopal, M. Zhu, I. Nassour, J. C. Chuang, T. Maples, C. Celen, L. H. Nguyen, L. Wu, S. Fu, W. Li, L. Hui, F. Tian, Y. Ji, S. Zhang, M. Sorouri, T. H. Hwang, L. Letzig, L. James, Z. Wang, A. C. Yopp, A. G. Singal, and H. Zhu. 2017. 'Arid1a Has Context-Dependent Oncogenic and Tumor Suppressor Functions in Liver Cancer', Cancer Cell, 32: 574-89 e6.

Suryo Rahmanto, Y., W. Shen, X. Shi, X. Chen, Y. Yu, Z. C. Yu, T. Miyamoto, M. H. Lee, V. Singh, R. Asaka, G. Shimberg, M. I. Vitolo, S. S. Martin, D. Wirtz, R. Drapkin, J. Xuan, T. L. Wang, and I. M. Shih. 2020. 'Inactivation of Arid1a in the endometrium is associated with endometrioid tumorigenesis through transcriptional reprogramming', Nat Commun, 11: 2717.

Swygert, S. G., and C. L. Peterson. 2014.
'Chromatin dynamics: interplay between remodeling enzymes and histone modifications', Biochim Biophys Acta, 1839: 728-36.

Syed, S. M., M. Kumar, A. Ghosh, F. Tomasetig, A. Ali, R. M. Whan, D. Alterman, and P. S. Tanwar. 2020. 'Endometrial Axin2(+) Cells Drive Epithelial Homeostasis, Regeneration, and Cancer following Oncogenic Transformation', Cell Stem Cell, 26: 64-80 e13.

Szubert, M., E. Kozirog, O. Olszak, K. Krygier-Kurz, J. Kazmierczak, and J. Wilczynski. 2021. 'Adenomyosis and Infertility-Review of Medical and Surgical Approaches', Int J Environ Res Public Health, 18.

Tabibzadeh, S. 1996. 'The signals and molecular pathways involved in human menstruation, a unique process of tissue destruction and remodelling', Mol Hum Reprod, 2: 77-92.

Takeda, T., K. Banno, R. Okawa, M. Yanokura, M. Iijima, H. Irie-Kunitomi, K. Nakamura, M. Iida, M. Adachi, K. Umene, Y. Nogami, K. Masuda, Y. Kobayashi, E. Tominaga, and D. Aoki. 2016. 'ARID1A gene mutation in ovarian and endometrial cancers (Review)', Oncol Rep, 35: 607-13.

Taketani, K., J. Kawauchi, M. Tanaka-Okamoto, H. Ishizaki, Y. Tanaka, T. Sakai, J. Miyoshi, Y. Maehara, and S. Kitajima. 2012. 'Key role of ATF3 in p53-dependent DR5 induction upon DNA damage of human colon cancer cells', Oncogene, 31: 2210-21.

Tan, J., P. Yong, and M. A. Bedaiwy. 2019. 'A critical review of recent advances in the diagnosis, classification, and management of uterine adenomyosis', Curr Opin Obstet Gynecol, 31: 212-21.

Tanaka, Y., A. Nakamura, M. S. Morioka, S. Inoue, M. Tamamori-Adachi, K. Yamada, K. Taketani, J. Kawauchi, M. Tanaka-Okamoto, J. Miyoshi, H. Tanaka, and S. Kitajima. 2011. 'Systems analysis of ATF3 in stress response and cancer reveals opposing effects on pro-apoptotic genes in p53 pathway', PLoS One, 6: e26848.

Taran, F. A., E. A. Stewart, and S. Brucker. 2013. 'Adenomyosis: Epidemiology, Risk Factors, Clinical Phenotype and Surgical and Interventional Alternatives to Hysterectomy', Geburtshilfe Frauenheilkd, 73: 924-31.
Tarbell, E. D., and T. Liu. 2019. 'HMMRATAC: a Hidden Markov ModeleR for ATAC-seq', Nucleic Acids Res, 47: e91.
Tashiro, H., C. Isacson, R. Levine, R. J. Kurman, K. R. Cho, and L. Hedrick. 1997. 'p53 gene mutations are common in uterine serous carcinoma and occur early in their pathogenesis', Am J Pathol, 150: 177-85.
Taslim, C., J. Wu, P. Yan, G. Singer, J. Parvin, T. Huang, S. Lin, and K. Huang. 2009. 'Comparative study on ChIP-seq data: normalization and binding pattern characterization', Bioinformatics, 25: 2334-40.
Taylor, H. S. 2004. 'Endometrial cells derived from donor stem cells in bone marrow transplant recipients', JAMA, 292: 81-5.
Teixeira, J., B. R. Rueda, and J. K. Pru. 2008. 'Uterine stem cells.' in, StemBook (Cambridge (MA)).
The Gene Ontology Consortium. 2019. 'The Gene Ontology Resource: 20 years and still GOing strong', Nucleic Acids Res, 47: D330-D38.
Toenhake, C. G., S. A. Fraschka, M. S. Vijayabaskar, D. R. Westhead, S. J. van Heeringen, and R. Bartfai. 2018. 'Chromatin Accessibility-Based Characterization of the Gene Regulatory Network Underlying Plasmodium falciparum Blood-Stage Development', Cell Host Microbe, 23: 557-69 e9.
Tong, J. K., C. A. Hassig, G. R. Schnitzler, R. E. Kingston, and S. L. Schreiber. 1998. 'Chromatin deacetylation by an ATP-dependent nucleosome remodelling complex', Nature, 395: 917-21.
Trizzino, M., E. Barbieri, A. Petracovici, S. Wu, S. A. Welsh, T. A. Owens, S. Licciulli, R. Zhang, and A. Gardini. 2018. 'The Tumor Suppressor ARID1A Controls Global Transcription via Pausing of RNA Polymerase II', Cell Rep, 23: 3933-45.
Tsompana, M., and M. J. Buck. 2014. 'Chromatin accessibility: a window into the genome', Epigenetics Chromatin, 7: 33.
Urick, M. E., and D. W. Bell. 2019. 'Clinical actionability of molecular targets in endometrial cancer', Nat Rev Cancer, 19: 510-21.
Valencia, A. M., C. K. Collings, H. T. Dao, R. St Pierre, Y. C. Cheng, J. Huang, Z. Y. Sun, H. S. Seo, N. Mashtalir, D. E. Comstock, O. Bolonduro, N. E. Vangos, Z. C. Yeoh, M. K. Dornon, C. Hermawan, L. Barrett, S. Dhe-Paganon, C. J. Woolf, T. W. Muir, and C. Kadoch. 2019. 'Recurrent SMARCB1 Mutations Reveal a Nucleosome Acidic Patch Interaction Site That Potentiates mSWI/SNF Complex Chromatin Remodeling', Cell, 179: 1342-56 e23.
van Attikum, H., and S. M. Gasser. 2005. 'The histone code at DNA breaks: a guide to repair?', Nat Rev Mol Cell Biol, 6: 757-65.
Vaske, C. J., S. C. Benz, J. Z. Sanborn, D. Earl, C. Szeto, J. Zhu, D. Haussler, and J. M. Stuart. 2010. 'Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM', Bioinformatics, 26: i237-45.
Venkatesh, S., and J. L. Workman. 2015. 'Histone exchange, chromatin structure and the regulation of transcription', Nat Rev Mol Cell Biol, 16: 178-89.
Vercellini, P., F. Parazzini, S. Oldani, S. Panazza, T. Bramante, and P. G. Crosignani. 1995. 'Adenomyosis at hysterectomy: a study on frequency distribution and patient characteristics', Hum Reprod, 10: 1160-2.
Vierbuchen, T., E. Ling, C. J. Cowley, C. H. Couch, X. Wang, D. A. Harmin, C. W. M. Roberts, and M. E. Greenberg. 2017. 'AP-1 Transcription Factors and the BAF Complex Mediate Signal-Dependent Enhancer Selection', Mol Cell, 68: 1067-82 e12.
Vladimirova, V., T. Mikeska, A. Waha, N. Soerensen, J. Xu, P. C. Reynolds, and T. Pietsch. 2009. 'Aberrant methylation and reduced expression of LHX9 in malignant gliomas of childhood', Neoplasia, 11: 700-11.
Voigt, P., W. W. Tee, and D. Reinberg. 2013. 'A double take on bivalent promoters', Genes Dev, 27: 1318-38.
Wang, A., S. Arantes, C. Conti, M. McArthur, C. M. Aldaz, and M. C. MacLeod. 2007. 'Epidermal hyperplasia and oral carcinoma in mice overexpressing the transcription factor ATF3 in basal epithelial cells', Mol Carcinog, 46: 476-87.
Wang, A., S. Arantes, L. Yan, K. Kiguchi, M. J. McArthur, A. Sahin, H. D. Thames, C. M. Aldaz, and M. C. Macleod. 2008. 'The transcription factor ATF3 acts as an oncogene in mouse mammary tumorigenesis', BMC Cancer, 8: 268.
Wang, B., T. M. Ye, K. F. Lee, P. C. Chiu, R. T. Pang, E. H. Ng, and W. S. Yeung. 2015. 'Annexin A2 Acts as an Adhesion Molecule on the Endometrial Epithelium during Implantation in Mice', PLoS One, 10: e0139506.
Wang, J. R., B. Quach, and T. S. Furey. 2017. 'Correcting nucleotide-specific biases in high-throughput sequencing data', BMC Bioinformatics, 18: 357.
Wang, K., J. Kan, S. T. Yuen, S. T. Shi, K. M. Chu, S. Law, T. L. Chan, Z. Kan, A. S. Chan, W. Y. Tsui, S. P. Lee, S. L. Ho, A. K. Chan, G. H. Cheng, P. C. Roberts, P. A. Rejto, N. W. Gibson, D. J. Pocalyko, M. Mao, J. Xu, and S. Y. Leung. 2011. 'Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer', Nat Genet, 43: 1219-23.
Wang, W., J. Cote, Y. Xue, S. Zhou, P. A. Khavari, S. R. Biggar, C. Muchardt, G. V. Kalpana, S. P. Goff, M. Yaniv, J. L. Workman, and G. R. Crabtree. 1996. 'Purification and biochemical heterogeneity of the mammalian SWI-SNF complex', EMBO J, 15: 5370-82.
Wang, W., F. Vilella, P. Alama, I. Moreno, M. Mignardi, A. Isakova, W. Pan, C. Simon, and S. R. Quake. 2020. 'Single-cell transcriptomic atlas of the human endometrium during the menstrual cycle', Nat Med, 26: 1644-53.
Wang, W., Y. Xue, S. Zhou, A. Kuo, B. R. Cairns, and G. R. Crabtree. 1996. 'Diversity and specialization of mammalian SWI/SNF complexes', Genes Dev, 10: 2117-30.
Wang, X., J. R. Haswell, and C. W. Roberts. 2014. 'Molecular pathways: SWI/SNF (BAF) complexes are frequently mutated in cancer--mechanisms and potential therapeutic insights', Clin Cancer Res, 20: 21-7.
Wang, X., R. S. Lee, B. H. Alver, J. R. Haswell, S. Wang, J. Mieczkowski, Y. Drier, S. M. Gillespie, T. C. Archer, J. N. Wu, E. P. Tzvetkov, E. C. Troisi, S. L. Pomeroy, J. A. Biegel, M. Y. Tolstorukov, B. E. Bernstein, P. J. Park, and C. W. Roberts. 2017. 'SMARCB1-mediated SWI/SNF complex function is essential for enhancer regulation', Nat Genet, 49: 289-95.
Wang, X., X. Li, T. Wang, S. P. Wu, J. W. Jeong, T. H. Kim, S. L. Young, B. A. Lessey, R. B. Lanz, J. P. Lydon, and F. J. DeMayo. 2018. 'SOX17 regulates uterine epithelial-stromal cross-talk acting via a distal enhancer upstream of Ihh', Nat Commun, 9: 4421.
Wang, X., N. G. Nagl, D. Wilsker, M. Van Scoy, S. Pacchione, P. Yaciuk, P. B. Dallas, and E. Moran. 2004. 'Two related ARID family proteins are alternative subunits of human SWI/SNF complexes', Biochem J, 383: 319-25.
Wang, X., M. S. L. Praca, J. R. H. Wendel, R. E. Emerson, F. J. DeMayo, J. P. Lydon, and S. M. Hawkins. 2021. 'Vaginal Squamous Cell Carcinoma Develops in Mice with Conditional Arid1a Loss and Gain of Oncogenic Kras Driven by Progesterone Receptor Cre', Am J Pathol, 191: 1281-91.
Wang, Z., M. Gerstein, and M. Snyder. 2009. 'RNA-Seq: a revolutionary tool for transcriptomics', Nat Rev Genet, 10: 57-63.
Wang, Z., C. Zang, J. A. Rosenfeld, D. E. Schones, A. Barski, S. Cuddapah, K. Cui, T. Y. Roh, W. Peng, M. Q. Zhang, and K. Zhao. 2008. 'Combinatorial patterns of histone acetylations and methylations in the human genome', Nat Genet, 40: 897-903.
Wei, X. L., D. S. Wang, S. Y. Xi, W. J. Wu, D. L. Chen, Z. L. Zeng, R. Y. Wang, Y. X. Huang, Y. Jin, F. Wang, M. Z. Qiu, H. Y. Luo, D. S. Zhang, and R. H. Xu. 2014. 'Clinicopathologic and prognostic relevance of ARID1A protein loss in colorectal cancer', World J Gastroenterol, 20: 18404-12.
Weinert, B. T., T. Narita, S. Satpathy, B. Srinivasan, B. K. Hansen, C. Scholz, W. B. Hamilton, B. E. Zucconi, W. W. Wang, W. R. Liu, J. M. Brickman, E. A. Kesicki, A. Lai, K. D. Bromberg, P. A. Cole, and C. Choudhary. 2018. 'Time-Resolved Analysis Reveals Rapid Dynamics and Broad Scope of the CBP/p300 Acetylome', Cell, 174: 231-44 e12.
Wen, H., Y. Li, Y. Xi, S. Jiang, S. Stratton, D. Peng, K. Tanaka, Y. Ren, Z. Xia, J. Wu, B. Li, M. C. Barton, W. Li, H. Li, and X. Shi. 2014. 'ZMYND11 links histone H3.3K36me3 to transcription elongation and tumour suppression', Nature, 508: 263-8.
Whitehouse, I., A. Flaus, B. R. Cairns, M. F. White, J. L. Workman, and T. Owen-Hughes. 1999. 'Nucleosome mobilization catalysed by the yeast SWI/SNF complex', Nature, 400: 784-7.
Whyte, W. A., D. A. Orlando, D. Hnisz, B. J. Abraham, C. Y. Lin, M. H. Kagey, P. B. Rahl, T. I. Lee, and R. A. Young. 2013. 'Master transcription factors and mediator establish super-enhancers at key cell identity genes', Cell, 153: 307-19.
Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York).
Wieczorek, D., N. Bogershausen, F. Beleggia, S. Steiner-Haldenstatt, E. Pohl, Y. Li, E. Milz, M. Martin, H. Thiele, J. Altmuller, Y. Alanay, H. Kayserili, L. Klein-Hitpass, S. Bohringer, A. Wollstein, B. Albrecht, K. Boduroglu, A. Caliebe, K. Chrzanowska, O. Cogulu, F. Cristofoli, J. C. Czeschik, K. Devriendt, M. T. Dotti, N. Elcioglu, B. Gener, T. O. Goecke, M. Krajewska-Walasek, E. Guillen-Navarro, J. Hayek, G. Houge, E. Kilic, P. O. Simsek-Kiper, V. Lopez-Gonzalez, A. Kuechler, S. Lyonnet, F. Mari, A. Marozza, M. Mathieu Dramard, B. Mikat, G. Morin, F. Morice-Picard, F. Ozkinay, A. Rauch, A. Renieri, S. Tinschert, G. E. Utine, C. Vilain, R. Vivarelli, C. Zweier, P. Nurnberg, S. Rahmann, J. Vermeesch, H. J. Ludecke, M. Zeschnigk, and B. Wollnik. 2013. 'A comprehensive molecular study on Coffin-Siris and Nicolaides-Baraitser syndromes identifies a broad molecular and clinical spectrum converging on altered chromatin remodeling', Hum Mol Genet, 22: 5121-35.
Wiegand, K. C., A. F. Lee, O. M. Al-Agha, C. Chow, S. E. Kalloger, D. W. Scott, C. Steidl, S. M. Wiseman, R. D. Gascoyne, B. Gilks, and D. G. Huntsman. 2011. 'Loss of BAF250a (ARID1A) is frequent in high-grade endometrial carcinomas', J Pathol, 224: 328-33.
Wiegand, K. C., S. P. Shah, O. M. Al-Agha, Y. Zhao, K. Tse, T. Zeng, J. Senz, M. K. McConechy, M. S. Anglesio, S. E. Kalloger, W. Yang, A. Heravi-Moussavi, R. Giuliany, C. Chow, J. Fee, A. Zayed, L. Prentice, N. Melnyk, G. Turashvili, A. D. Delaney, J. Madore, S. Yip, A. W. McPherson, G. Ha, L. Bell, S. Fereday, A. Tam, L. Galletta, P. N. Tonin, D. Provencher, D. Miller, S. J. Jones, R. A. Moore, G. B. Morin, A. Oloumi, N. Boyd, S. A. Aparicio, M. Shih Ie, A. M. Mes-Masson, D. D. Bowtell, M. Hirst, B. Gilks, M. A. Marra, and D. G. Huntsman. 2010. 'ARID1A mutations in endometriosis-associated ovarian carcinomas', N Engl J Med, 363: 1532-43.
Wild, P. J., K. Ikenberg, T. J. Fuchs, M. Rechsteiner, S. Georgiev, N. Fankhauser, A. Noske, M. Roessle, R. Caduff, A. Dellas, D. Fink, H. Moch, W. Krek, and I. J. Frew. 2012. 'p53 suppresses type II endometrial carcinomas in mice and governs endometrial tumour aggressiveness in humans', EMBO Mol Med, 4: 808-24.
Williams, A. B., and B. Schumacher. 2016. 'p53 in the DNA-Damage-Repair Process', Cold Spring Harb Perspect Med, 6.
Wilson, B. G., and C. W. Roberts. 2011. 'SWI/SNF nucleosome remodellers and cancer', Nat Rev Cancer, 11: 481-92.
Wilson, M. R., J. Holladay, and R. L. Chandler. 2020. 'A mouse model of endometriosis mimicking the natural spread of invasive endometrium', Hum Reprod, 35: 58-69.
Wilson, M. R., J. J. Reske, J. Holladay, S. Neupane, J. Ngo, N. Cuthrell, M. Wegener, M. Rhodes, M. Adams, R. Sheridan, G. Hostetter, F. T. Alotaibi, P. J. Yong, M. S. Anglesio, B. A. Lessey, R. E. Leach, J. M. Teixeira, S. A. Missmer, A. T. Fazleabas, and R. L. Chandler. 2020. 'ARID1A Mutations Promote P300-Dependent Endometrial Invasion through Super-Enhancer Hyperacetylation', Cell Rep, 33: 108366.
Wilson, M. R., J. J. Reske, J. Holladay, G. E. Wilber, M. Rhodes, J. Koeman, M. Adams, B. Johnson, R. W. Su, N. R. Joshi, A. L. Patterson, H. Shen, R. E. Leach, J. M. Teixeira, A. T. Fazleabas, and R. L. Chandler. 2019. 'ARID1A and PI3-kinase pathway mutations in the endometrium drive epithelial transdifferentiation and collective invasion', Nat Commun, 10: 3554.
Win, A. K., J. C. Reece, and S. Ryan. 2015. 'Family history and risk of endometrial cancer: a systematic review and meta-analysis', Obstet Gynecol, 125: 89-98.
Winkler, D. D., and K. Luger. 2011. 'The histone chaperone FACT: structural insights and mechanisms for nucleosome reorganization', J Biol Chem, 286: 18369-74.
Winuthayanon, W., S. C. Hewitt, G. D. Orvis, R. R. Behringer, and K. S. Korach. 2010. 'Uterine epithelial estrogen receptor alpha is dispensable for proliferation but essential for complete biological and biochemical responses', Proc Natl Acad Sci U S A, 107: 19272-7.
Wong, A. K., F. Shanahan, Y. Chen, L. Lian, P. Ha, K. Hendricks, S. Ghaffari, D. Iliev, B. Penn, A. M. Woodland, R. Smith, G. Salada, A. Carillo, K. Laity, J. Gupte, B. Swedlund, S. V. Tavtigian, D. H. Teng, and E. Lees. 2000. 'BRG1, a component of the SWI-SNF complex, is mutated in multiple human tumor cell lines', Cancer Res, 60: 6171-7.
Wu, J. I., J. Lessard, and G. R. Crabtree. 2009. 'Understanding the words of chromatin regulation', Cell, 136: 200-6.
Wu, J. N., and C. W. Roberts. 2013. 'ARID1A mutations in cancer: another epigenetic tumor suppressor?', Cancer Discov, 3: 35-43.
Wu, Q. J., Y. Y. Li, C. Tu, J. Zhu, K. Q. Qian, T. B. Feng, C. Li, L. Wu, and X. X. Ma. 2015. 'Parity and endometrial cancer risk: a meta-analysis of epidemiological studies', Sci Rep, 5: 14243.
Wu, R. C., T. L. Wang, and M. Shih Ie. 2014. 'The emerging roles of ARID1A in tumor suppression', Cancer Biol Ther, 15: 655-64.
Wu, R., Y. Zhai, R. Kuick, A. N. Karnezis, P. Garcia, A. Naseem, T. C. Hu, E. R. Fearon, and K. R. Cho. 2016. 'Impact of oviductal versus ovarian epithelial cell of origin on ovarian endometrioid carcinoma phenotype in the mouse', J Pathol, 240: 341-51.
Wu, S., N. Fatkhutdinov, L. Rosin, J. M. Luppino, O. Iwasaki, H. Tanizawa, H. Y. Tang, A. V. Kossenkov, A. Gardini, K. I. Noma, D. W. Speicher, E. F. Joyce, and R. Zhang. 2019. 'ARID1A spatially partitions interphase chromosomes', Sci Adv, 5: eaaw5294.
Wu, X., B. C. Nguyen, P. Dziunycz, S. Chang, Y. Brooks, K. Lefort, G. F. Hofbauer, and G. P. Dotto. 2010. 'Opposing roles for calcineurin and ATF3 in squamous skin cancer', Nature, 465: 368-72.
Xie, J. J., Y. M. Xie, B. Chen, F. Pan, J. C. Guo, Q. Zhao, J. H. Shen, Z. Y. Wu, J. Y. Wu, L. Y. Xu, and E. M. Li. 2014. 'ATF3 functions as a novel tumor suppressor with prognostic significance in esophageal squamous cell carcinoma', Oncotarget, 5: 8569-82.
Xu, L., T. Zu, T. Li, M. Li, J. Mi, F. Bai, G. Liu, J. Wen, H. Li, C. Brakebusch, X. Wang, and X. Wu. 2021. 'ATF3 downmodulates its new targets IFI6 and IFI27 to suppress the growth and migration of tongue squamous cell carcinoma cells', PLoS Genet, 17: e1009283.
Xu, N., L. Wang, P. Sun, S. Xu, S. Fu, and Z. Sun. 2019. 'Low Arid1a Expression Correlates with Poor Prognosis and Promotes Cell Proliferation and Metastasis in Osteosarcoma', Pathol Oncol Res, 25: 875-81.
Xue, Y., J. Wong, G. T. Moreno, M. K. Young, J. Cote, and W. Wang. 1998. 'NURD, a novel complex with both ATP-dependent chromatin-remodeling and histone deacetylase activities', Mol Cell, 2: 851-61.
Yan, F., D. R. Powell, D. J. Curtis, and N. C. Wong. 2020. 'From reads to insight: a hitchhiker's guide to ATAC-seq data analysis', Genome Biol, 21: 22.
Yan, H. B., X. F. Wang, Q. Zhang, Z. Q. Tang, Y. H. Jiang, H. Z. Fan, Y. H. Sun, P. Y. Yang, and F. Liu. 2014. 'Reduced expression of the chromatin remodeling gene ARID1A enhances gastric cancer cell migration and invasion via downregulation of E-cadherin transcription', Carcinogenesis, 35: 867-76.
Yan, Z., K. Cui, D. M. Murray, C. Ling, Y. Xue, A. Gerstein, R. Parsons, K. Zhao, and W. Wang. 2005. 'PBAF chromatin-remodeling complex requires a novel specificity subunit, BAF200, to regulate expression of selective interferon-responsive genes', Genes Dev, 19: 1662-7.
Yang, Y. M., and W. X. Yang. 2017. 'Epithelial-to-mesenchymal transition in the development of endometriosis', Oncotarget, 8: 41679-89.
Yang, Y., X. Wang, J. Yang, J. Duan, Z. Wu, F. Yang, X. Zhang, and S. Xiao. 2019. 'Loss of ARID1A promotes proliferation, migration and invasion via the Akt signaling pathway in NPC', Cancer Manag Res, 11: 4931-46.
Yates, L. R., S. Knappskog, D. Wedge, J. H. R. Farmery, S. Gonzalez, I. Martincorena, L. B. Alexandrov, P. Van Loo, H. K. Haugland, P. K. Lilleng, G. Gundem, M. Gerstung, E. Pappaemmanuil, P. Gazinska, S. G. Bhosle, D. Jones, K. Raine, L. Mudie, C. Latimer, E. Sawyer, C. Desmedt, C. Sotiriou, M. R. Stratton, A. M. Sieuwerts, A. G. Lynch, J. W. Martens, A. L. Richardson, A. Tutt, P. E. Lonning, and P. J. Campbell. 2017. 'Genomic Evolution of Breast Cancer Metastasis and Relapse', Cancer Cell, 32: 169-84 e7.
Ye, Y., A. Vattai, X. Zhang, J. Zhu, C. J. Thaler, S. Mahner, U. Jeschke, and V. von Schonfeldt. 2017. 'Role of Plasminogen Activator Inhibitor Type 1 in Pathologies of Female Reproductive Diseases', Int J Mol Sci, 18.
Yeh, C. C., F. H. Su, C. R. Tzeng, C. H. Muo, and W. C. Wang. 2018. 'Women with adenomyosis are at higher risks of endometrial and thyroid cancers: A population-based historical cohort study', PLoS One, 13: e0194011.
Yen, T. T., T. Miyamoto, S. Asaka, M. H. Chui, Y. Wang, S. F. Lin, R. L. Stone, A. N. Fader, R. Asaka, H. Kashima, T. Shiozawa, T. L. Wang, I. M. Shih, and E. J. Tanner, 3rd. 2018. 'Loss of ARID1A expression in endometrial samplings is associated with the risk of endometrial carcinoma', Gynecol Oncol, 150: 426-31.
Yevshin, I., R. Sharipov, S. Kolmykov, Y. Kondrakhin, and F. Kolpakov. 2019. 'GTRD: a database on gene transcription regulation-2019 update', Nucleic Acids Res, 47: D100-D05.
Yin, X., J. W. Dewille, and T. Hai. 2008. 'A potential dichotomous role of ATF3, an adaptive-response gene, in cancer development', Oncogene, 27: 2118-27.
Yu, G., L. G. Wang, Y. Han, and Q. Y. He. 2012. 'clusterProfiler: an R package for comparing biological themes among gene clusters', OMICS, 16: 284-7.
Yu, O., R. Schulze-Rath, J. Grafton, K. Hansen, D. Scholes, and S. D. Reed. 2020. 'Adenomyosis incidence, prevalence and treatment: United States population-based study 2006-2015', Am J Obstet Gynecol, 223: 94 e1-94 e10.
Zaino, R. J., and R. J. Kurman. 1988. 'Squamous differentiation in carcinoma of the endometrium: a critical appraisal of adenoacanthoma and adenosquamous carcinoma', Semin Diagn Pathol, 5: 154-71.
Zang, Z. J., I. Cutcutache, S. L. Poon, S. L. Zhang, J. R. McPherson, J. Tao, V. Rajasegaran, H. L. Heng, N. Deng, A. Gan, K. H. Lim, C. K. Ong, D. Huang, S. Y. Chin, I. B. Tan, C. C. Ng, W. Yu, Y. Wu, M. Lee, J. Wu, D. Poh, W. K. Wan, S. Y. Rha, J. So, M. Salto-Tellez, K. G. Yeoh, W. K. Wong, Y. J. Zhu, P. A. Futreal, B. Pang, Y. Ruan, A. M. Hillmer, D. Bertrand, N. Nagarajan, S. Rozen, B. T. Teh, and P. Tan. 2012. 'Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes', Nat Genet, 44: 570-4.
Zehir, A., R. Benayed, R. H. Shah, A. Syed, S. Middha, H. R. Kim, P. Srinivasan, J. Gao, D. Chakravarty, S. M. Devlin, M. D. Hellmann, D. A. Barron, A. M. Schram, M. Hameed, S. Dogan, D. S. Ross, J. F. Hechtman, D. F. DeLair, J. Yao, D. L. Mandelker, D. T. Cheng, R. Chandramohan, A. S. Mohanty, R. N. Ptashkin, G. Jayakumaran, M. Prasad, M. H. Syed, A. B. Rema, Z. Y. Liu, K. Nafa, L. Borsu, J. Sadowska, J. Casanova, R. Bacares, I. J. Kiecka, A. Razumova, J. B. Son, L. Stewart, T. Baldi, K. A. Mullaney, H. Al-Ahmadie, E. Vakiani, A. A. Abeshouse, A. V. Penson, P. Jonsson, N. Camacho, M. T. Chang, H. H. Won, B. E. Gross, R. Kundra, Z. J. Heins, H. W. Chen, S. Phillips, H. Zhang, J. Wang, A. Ochoa, J. Wills, M. Eubank, S. B. Thomas, S. M. Gardos, D. N. Reales, J. Galle, R. Durany, R. Cambria, W. Abida, A. Cercek, D. R. Feldman, M. M. Gounder, A. A. Hakimi, J. J. Harding, G. Iyer, Y. Y. Janjigian, E. J. Jordan, C. M. Kelly, M. A. Lowery, L. G. T. Morris, A. M. Omuro, N. Raj, P. Razavi, A. N. Shoushtari, N. Shukla, T. E. Soumerai, A. M. Varghese, R. Yaeger, J. Coleman, B. Bochner, G. J. Riely, L. B. Saltz, H. I. Scher, P. J. Sabbatini, M. E. Robson, D. S. Klimstra, B. S. Taylor, J. Baselga, N. Schultz, D. M. Hyman, M. E. Arcila, D. B. Solit, M. Ladanyi, and M. F. Berger. 2017. 'Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients', Nat Med, 23: 703-13.
Zeisberg, M., and E. G. Neilson. 2009. 'Biomarkers for epithelial-mesenchymal transitions', J Clin Invest, 119: 1429-37.
Zeitvogel, A., R. Baumann, and A. Starzinski-Powitz. 2001. 'Identification of an invasive, N-cadherin-expressing epithelial cell type in endometriosis using a new cell culture model', Am J Pathol, 159: 1839-52.
Zerbino, D. R., P. Achuthan, W. Akanni, M. R. Amode, D. Barrell, J. Bhai, K. Billis, C. Cummins, A. Gall, C. G. Giron, L. Gil, L. Gordon, L. Haggerty, E. Haskell, T. Hourlier, O. G. Izuogu, S. H. Janacek, T. Juettemann, J. K. To, M. R. Laird, I. Lavidas, Z. Liu, J. E. Loveland, T. Maurel, W. McLaren, B. Moore, J. Mudge, D. N. Murphy, V. Newman, M. Nuhn, D. Ogeh, C. K. Ong, A. Parker, M. Patricio, H. S. Riat, H. Schuilenburg, D. Sheppard, H. Sparrow, K. Taylor, A. Thormann, A. Vullo, B. Walts, A. Zadissa, A. Frankish, S. E. Hunt, M. Kostadima, N. Langridge, F. J. Martin, M. Muffato, E. Perry, M. Ruffier, D. M. Staines, S. J. Trevanion, B. L. Aken, F. Cunningham, A. Yates, and P. Flicek. 2018. 'Ensembl 2018', Nucleic Acids Res, 46: D754-D61.
Zhai, Y., R. Wu, R. Kuick, M. S. Sessine, S. Schulman, M. Green, E. R. Fearon, and K. R. Cho. 2017. 'High-grade serous carcinomas arise in the mouse oviduct via defects linked to the human disease', J Pathol, 243: 16-25.
Zhang, C., X. Zhang, L. Huang, Y. Guan, X. Huang, X. L. Tian, L. Zhang, and W. Tao. 2021. 'ATF3 drives senescence by reconstructing accessible chromatin profiles', Aging Cell, 20: e13315.
Zhang, L., C. Wang, S. Yu, C. Jia, J. Yan, Z. Lu, and J. Chen. 2018. 'Loss of ARID1A Expression Correlates With Tumor Differentiation and Tumor Progression Stage in Pancreatic Ductal Adenocarcinoma', Technol Cancer Res Treat, 17: 1533034618754475.
Zhang, R., T. Fukumoto, and E. Magno. 2018. 'SWI/SNF Complexes in Ovarian Cancer: Mechanistic Insights and Therapeutic Implications', Mol Cancer Res.
Zhang, Y., G. LeRoy, H. P. Seelig, W. S. Lane, and D. Reinberg. 1998. 'The dermatomyositis-specific autoantigen Mi2 is a component of a complex containing histone deacetylase and nucleosome remodeling activities', Cell, 95: 279-89.
Zhang, Y., T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nusbaum, R. M. Myers, M. Brown, W. Li, and X. S. Liu. 2008. 'Model-based analysis of ChIP-Seq (MACS)', Genome Biol, 9: R137.
Zhang, Y., D. Zhao, C. Gong, F. Zhang, J. He, W. Zhang, Y. Zhao, and J. Sun. 2015. 'Prognostic role of hormone receptors in endometrial cancer: a systematic review and meta-analysis', World J Surg Oncol, 13: 208.
Zhao, B., J. Lin, L. Rong, S. Wu, Z. Deng, N. Fatkhutdinov, J. Zundell, T. Fukumoto, Q. Liu, A. Kossenkov, S. Jean, M. G. Cadungog, M. E. Borowsky, R. Drapkin, P. M. Lieberman, C. T. Abate-Shen, and R. Zhang. 2019. 'ARID1A promotes genomic stability through protecting telomere cohesion', Nat Commun, 10: 4067.
Zhao, J. J., Z. Liu, L. Wang, E. Shin, M. F. Loda, and T. M. Roberts. 2005. 'The oncogenic properties of mutant p110alpha and p110beta phosphatidylinositol 3-kinases in human mammary epithelial cells', Proc Natl Acad Sci U S A, 102: 18443-8.
Zhao, J., X. Li, M. Guo, J. Yu, and C. Yan. 2016. 'The common stress responsive transcription factor ATF3 binds genomic sites enriched with p300 and H3K27ac for transcriptional regulation', BMC Genomics, 17: 335.
Zhou, W., K. M. Gross, and C. Kuperwasser. 2019. 'Molecular regulation of Snai2 in development and disease', J Cell Sci, 132.
Zondervan, K. T., C. M. Becker, K. Koga, S. A. Missmer, R. N. Taylor, and P. Vigano. 2018. 'Endometriosis', Nat Rev Dis Primers, 4: 9.
Zondervan, K. T., C. M. Becker, and S. A. Missmer. 2020. 'Endometriosis', N Engl J Med, 382: 1244-56.
Zondervan, K. T., N. Rahmioglu, A. P. Morris, D. R. Nyholt, G. W. Montgomery, C. M. Becker, and S. A. Missmer. 2016. 'Beyond Endometriosis Genome-Wide Association Study: From Genomics to Phenomics to the Patient', Semin Reprod Med, 34: 242-54.