COMPARATIVE GENOMICS IN BREAST CANCER: INSIGHTS FROM MURINE MODELS TO HUMAN DISEASE MECHANISMS By Carson Dennis Broeker A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Biochemistry and Molecular Biology—Doctor of Philosophy 2025 ABSTRACT Breast cancer is the most diagnosed cancer in women in the United States and throughout the world. Current estimates have approximately 1 out of every 8 women will get breast cancer at some point in their lifetime, while 1 out of every 39 women will ultimately die from breast cancer in the United States. Breast cancer is a heterogeneous disease, with no two patients’ breast cancers being exactly alike. The research and medical communities have spent a significant amount of effort to characterize and classify breast cancers into meaningful subtypes that represent different possible treatment modalities. Despite treatment advances in targeting specific breast cancer subtypes, we still lack effective treatment options for various subtypes of breast cancer and patients with advanced/metastatic breast cancer. A large domain of cancer research centers around attempting to mimic human breast cancer subtypes in mice to mechanistically study oncogenes and tumor suppressors in controlled environments. Identification of mouse models representative of human breast cancer subtypes would enable sensitive and specific therapeutic development targeting breast cancer subtypes while limiting systemic toxicity. One of the ways researchers classify breast cancers into meaningful categories is through intrinsic subtypes, where breast cancer subtypes are defined by differential expression of key genes known to be involved in oncogenic signaling pathways unique to each subtype. The differential expression of these genes often stems from acquired genomic alterations. I hypothesize that comparative genomics between human breast cancer subtypes and mouse models of breast cancer would allow identification of additional genomic oncogenic driver events, potentially establishing links between specific mouse models of breast cancer to human breast cancer subtypes. Follow up in vitro and in vivo mechanistic studies can then be used to validate computational findings to confirm putative oncogenic drivers. To this end, I analyzed the genomes of spontaneous primary mammary gland tumors from different mouse models of breast cancer, focusing on the MMTV-Cre E2F5 conditional knockout, MMTV-Myc, and MMTV-Neu mouse models. In summary, I found that MMTV-Cre driven E2F5 conditional knockout in the murine mammary gland resulted in highly heterogeneous genomic alterations, but most tumors independently developed amplification events centered on Wnt2, Met, Cav1, and Cav2, while also possessing the same highly impactful splice acceptor mutation in Fbxo15. The MMTV-Myc mouse model of breast cancer generates several distinct histological subtypes, from which I identified that the EMT histological subtype coalesces on heterogeneous activation of the Kras pathway through Kras activating mutations or Fgfr2 amplification, while the microacinar subtype develops co-occurring mutations in Kit and Rara and amplifications over chromosomes 11 and 15. Finally, I interrogated the previously discovered 17q21.33 amplicon centered over putative oncogenes COL1A1, CHAD, and PHB1 involved in metastasis in HER2+ breast cancer. I found increased breast cancer metastasis signatures and increased estrogen signaling but failed to identify the specific role each gene played in the metastatic cascade. These results underscore the potential impact of genomic sequencing in mouse models of breast cancer and their applicability to studying human breast cancer development and treatment. This work is dedicated to: My wife: May. My mother: Roxann. My late grandmothers: Lou Ellen and Erma. My brothers and sisters-in-law: Austin, Kate, Troy, Katherine, Trent, and Heather. iv ACKNOWLEDGMENTS None of the work outlined in this dissertation would have been possible without the assistance and support of a great number of people; they have helped me grow as both a scientist and person, for which I would like to express my sincerest thanks. There are a great many more people who have been instrumental in shaping the trajectory of my research path than I can put into words, but rest assured, I am deeply thankful to you all. First, I would like to thank my mother, Roxann Broeker, for your endless love and support; you have given me more than I could ever know or ever return. You have encouraged me every step of the way and I can truly say I have only made it this far today because of you. I would also like to thank my brothers—Austin, Troy, and Trent Broeker— and my sisters-in-law—Kate, Katherine, and Heather Broeker; some of my fondest memories are with you, and I am grateful for the time we get to spend together. I hope to be around much more often than I have been for the past many years. I want to thank my late grandmothers, Lou Ellen Kisker and Erma Broeker; you are dearly missed, and thank you for some of my most cherished memories. Lou Ellen, you are proof that breast cancer is curable and research like mine makes a positive difference in the world. To the rest of my family—aunts, uncles, cousins, nieces, nephews, and all others—thank you. I can’t overstate the importance that my friends, both old and new, played during this time in my life. To Doug Guzior, my former roommate and forever one of my best friends, thank you for the many late nights of laughter and mutually shared disgruntled musings. Convincing you that St. Louis is midwestern was perhaps the most difficult challenge I face during my PhD. To Erik Debye and Ally Paine, thank you for being there throughout our time in East Lansing and beyond, especially during the most important v time of our lives—Stardew Valley nights—and also for being the best man and matron of honor at our wedding. To Halle Flate, Jasper Gomez, Beth Ottosen, Chris and Jenn Speicher, Matt, Leah, Ty, Charles, and the rest of the Misfits, and of course the TNF group, especially Ryan; you have been constant sources of support, advice, and companionship. Thank you all. I want to thank my supervisor, Eran Andrechek, for his guidance and mentorship over the many years that have made me mature as a scientist. I may not have fully understood your decisions and intentions in the moment at times, in retrospect I do believe you had my development as a scientist in mind. Thank you to my dissertation committee members—Sophia Lunt, David Arnosti, Hua Xiao, and Bill Henry—for your direction and guidance during my PhD. Thank you to Sophia Lunt and David Arnosti for allowing me to do rotations in their labs. Thank you again to David Arnosti for always taking the time to have discussions with me and provide reassurance. I would like to thank my colleagues in the Andrechek lab, both past and present. Briana To and Matt Swiatnicki; you provided sage advice and mentorship at the beginning of my tenure in the Andrechek lab. To John Vusich, Jesus Garcia-Lerena, Mylena Ortiz, and Anthony Schulte; thank you for the discussions, advice, banter, and help in my projects. Thank you to all undergraduates, rotation students, and summer students that helped me to develop as a mentor. Thank you to my former undergraduate mentor labs of Ruthie Angelovici and Chiswili Yves Chabu at the University of Missouri-Columbia; you were the first to inspire my love of research science and taught me how to rigorously apply the scientific method. vi I would be remiss if I did not thank all those who supported me at the scientific and administrational level: Amy Porter, Sandra O’Reilly, Jessica Lawrence, Jessie Bennett, Jessica Spitzley, Jasmine Jackson, Nichole Daly, Brad Robinson, Jake Wier, and Inna McNamara. Additionally, I want to thank the many colleagues and friends I found outside of the lab: the MSU Covid-19 Early Detection Program team; Dirk Colbry, Jim Leikert, and all others at the MSU Institute for Cyber Enabled Research; Estelle Maes and the rest of the team at AbbVie in the Quantitative Translational ADME Sciences department. Thank you to the MSU cancer research community for your discussions and advice regarding my project. I want to thank Lauren Aitch and the Aitch Foundation for believing in the value of my project and awarding me a fellowship in 2022. Thank you to the government funding sources, the Department of Defense CDMRP (W81XWH-21-1-0002) and NCI F31 (F31CA271689-01A1 and F31CA271689-02), for funding the research that went into this project. Funding is a crucial part of the scientific process, and we wouldn’t be able to make these advances without the financial support of both public and private institutions that believe in the importance of basic cancer research. Finally, I want to thank my lovely wife, May Broeker. You have been my rock and greatest motivator throughout these years. I surely would not have made it this far without your unwavering support. You have endured and sacrificed more than I could ever repay; I am eternally grateful to you. Thank you for sticking it out with me even in the hardest of times. We will always be there for each other. I look forward to the rest of our lives together. vii PREFACE This dissertation will frequently use gene and protein names for both mice and humans. This dissertation will break with the convention of italicizing gene names and instead write both gene names and protein names in plain text. Another convention is that all letters in human gene names are capitalized, while just the first letter in mouse gene names are capitalized—this convention will be followed. While the protein product and gene name are often the same between species, there are some exceptions which may confound understanding at points in this dissertation. TP53 refers to the human gene tumor protein 53, while Trp53 refers to the mouse gene transformation-related protein 53; these genes are orthologs of each other and serve the same purpose in both species as major tumor suppressors. Similarly, Col1A1, Chad, Phb1, and Cd31 refer to mouse orthologs of the human genes COL1A1, CHAD, PHB1, and CD31. Instances of PHB/Phb also refer to PHB1/Phb1—both instances are commonly used in the literature and refer to the same gene. However, p53 is the protein product name of both TP53 in humans and Trp53 in mice. To complicate matters further, a significant portion of this dissertation will discuss HER2+ breast cancer, which is driven by amplification of the gene ERBB2. HER2 is usually used in context as the oncogenic protein product of the gene ERBB2 in humans. Erbb2 is the orthologous gene name in mice, but the protein naming convention in murine animals is Neu due to the protein first being identified as playing a role in neuroblastoma in rodents. MMTV-Neu refers to a mouse model of breast cancer that is meant to mimic human HER2+ breast cancer. The work described herein was only possible due to the contributions of a great number of people. Carson Broeker is solely responsible for chapter 1 of this dissertation. viii For chapter 2, Briana To and Eran Andrechek are responsible for the planning, acquisition of data, and figure generation for Figures 2.4.1B and 2.4.1C. Briana To is responsible for writing non-computational materials and methods in chapter 2. Carson Broeker is responsible for all other aspects of chapter 2 of this thesis, including hypothesis generation, experimental procedures, data acquisition, interpretation, figure generation, and writing. For chapter 3, Carson Broeker wrote the manuscript, performed whole genome DNA extraction and processing of MMTV-Myc samples, performed all bioinformatic analyses, performed all statistical analyses, performed all machine learning, and contributed to the theoretical design of the study; Mylena Ortiz performed mutation verification by Sanger sequencing and ssGSEA processing of microarray gene expression samples from GSE15904; Michael Murillo contributed to the theoretical and technical design of the machine learning components. Eran Andrechek performed original MMTV-Myc tumor extraction and cryopreservation, performed microarray analysis on MMTV-Myc tumors, and lead the theoretical design of the study; all authors contributed to the revision of the manuscript; Eran Andrechek is the corresponding author for chapter 3. For chapter 4, Eran Andrechek is the corresponding author, collected MMTV-Neu tumor FFPE samples used in Figure 4.4.2, and provided minor revisions to the manuscript; the MSU histopathology core provided FFPE, IHC, and H&E services. Carson Broeker is responsible for all other aspects of the article, including generating hypotheses, experimental methods, data collection, data interpretation, data visualization, and writing of the manuscript. Chapter 5 was solely written and conceptualized by Carson Broeker. All chapters of this dissertation had minor changes made at the behest of committee members. ix TABLE OF CONTENTS LIST OF ABBREVIATIONS ............................................................................................. xi CHAPTER 1: INTRODUCTION ....................................................................................... 1 CHAPTER 2: CONDITIONAL KNOCKOUT OF E2F5 IN THE MURINE MAMMARY GLAND RESULTS IN HETEROGENEOUS TUMOR DEVELOPMENT ........................ 30 CHAPTER 3: MMTV-MYC TUMOR HISTOLOGICAL HETEROGENEITY REFLECTS HETEROGENEITY IN HUMAN BREAST CANCER ...................................................... 57 CHAPTER 4: AMPLIFICATION OF COL1A1, CHAD, AND PHB1 INCREASES BREAST CANCER METASTASIS ............................................................................................... 94 CHAPTER 5: FUTURE DIRECTIONS ......................................................................... 128 BIBLIOGRAPHY ......................................................................................................... 140 APPENDIX A: FIGURES ............................................................................................. 167 APPENDIX B: TABLES ............................................................................................... 197 x LIST OF ABBREVIATIONS ABC ATP-Binding Cassette ADC Antibody Drug Conjugate AML Acute Myeloid Leukemia ANOVA Analysis of Variance BRCA1 Breast Cancer Type 1 Susceptibility Protein BRCA2 Breast Cancer Type 2 Susceptibility Protein BRG Brahma-Related Gene-1 CHAD Chondroadherin cKO Conditional Knockout CNV Copy Number Variation COL1A1 Collagen, Type I, Alpha-1 CRISPR Clustered Regularly Interspaced Short Palindromic Repeats CRISPRa CRISPR Activation DBD DNA Binding Domain DCIS Ductal Carcinoma In Situ DNA Deoxyribonucleic Acid DP Dimerization Partner DREAM Dimerization Partner, RB-Like, E2F and Multi-Vulval Class B E2F Early 2 Factor EGFR Epidermal Growth Factor Receptor EMT Epithelial-to-Mesenchymal Transition xi ER Estrogen Receptor 1 FFPE Formalin Fixed Paraffin Embedded FVB Friend Leukemia Virus B GEMM Genetically Engineered Mouse Model GEO Gene Expression Omnibus GRB7 Growth Factor Receptor-Bound Protein 7 GSEA Gene Set Enrichment Analysis HER2 Human Epidermal Growth Factor Receptor 2 HR IBC IDC IHC ILC ISH KD KM KO Hormone Receptor Invasive Breast Cancer Invasive Ductal Carcinoma Immunohistochemistry Invasive Lobular Carcinoma In Situ Hybridization Knockdown Kaplan-Meier Knockout LCIS Lobular Carcinoma In Situ LTR Long-Terminal Repeat mAb Monoclonal Antibody MAPK Mitogen Activated Protein Kinase MAPQ Mapping Quality xii MCC Matthews Correlation Coefficient MECs Mammary Epithelial Cells MMTV Mouse Mammary Tumor Virus NBF Neutral Buffered Formalin NSCLC Non-Small Cell Lung Cancer OS Overall Survival PAM50 Prediction Analysis of Microarray 50 PARP Poly ADP-Ribose Polymerase PCA Principal Component Analysis PcG Polycomb Group PHB1 Prohibitin 1 PR Progesterone Receptor PyMT Polyomavirus Middle T Antigen RB Retinoblastoma RBF Radial Basis Function RFECV Recursive Feature Elimination with Cross-Validation RFS Relapse Free Survival RNA Ribonucleic Acid RTK Receptor Tyrosine Kinase SAPK Stress Activated Protein Kinase shRNA Short Hairpin RNA SNV Single Nucleotide Variant xiii ssGSEA Single Sample Gene Set Enrichment Analysis SVM Support Vector Machine SWI/SNF SWItch/Sucrose Non-Fermentable TCGA The Cancer Genome Atlas TKI Tyrosine Kinase Inhibitor TNBC Triple Negative Breast Cancer t-SNE t-distributed Stochastic Neighbor Embedding VAF Variant Allele Frequency WAP Whey Acidic Protein WGS Whole Genome Sequencing xiv CHAPTER 1: INTRODUCTION 1 1.1 Origins of cancer Cancer at its core is a group of complex and malignant diseases defined by uncontrolled cell proliferation that disrupt normal tissue organization and interfere with the body’s normal functioning. Some of our earliest known references to cancer date back to around 4000-2000 B.C. from the Egyptian physician Imhotep and in the Smith and Ebers papyri. In Greece, Hippocrates would later take detailed accounts of patient tumors around 400 B.C., then coining the term karkinoma, Greek for “crab”, and where we ultimately derive the term carcinoma and cancer from1,2. While the root causes of these malignancies at the time were erroneously believed to be an imbalance of humors or bile in the body, trying to define the causal mechanisms for what causes cancer to form has preoccupied the minds of scientists and medical professionals for millennia. Only in the 19th century did our modern understanding of cancer begin to emerge, with the idea emerging that all cells must arise from already existing cells, forming a crucial tenet of cell theory and purporting cancer development arises from abnormal cellular processes3. In the early 20th century, it was proposed that chromosomal abnormalities can lead to cancer development4—the first written record proposing a link between chromosomes and cancer. Later in the 20th century, it became firmly established that cancer is precipitated by genetic abnormalities, linking the development of retinoblastoma (RB) to mutations in the RB1 gene, now known as the “two hit hypothesis”5, and colorectal cancer development precipitating from the sequential accrual of somatic mutations that would cause gain of function changes in oncogenes and loss of function changes in tumor suppressors6,7. 2 Along with the supporting work of thousands of other scientists, the genetic basis for cancer had been firmly established in the late 20th century. Knowing that many of the hallmarks of cancer culminate in uncontrolled cellular proliferation stemming from normal homeostatic and developmental processes gone awry8, it is crucial to understand the normal function of the cellular processes and genes that are perturbed along the way to carcinogenesis. 1.2 Genetics in cancer Multicellular organisms require a coordinated symphony of different cellular processes to work closely together to maintain homeostasis and proper functioning of the organism. Essential to our understanding of why genetics are the primary target through which carcinogenesis acts lies in the central dogma of molecular biology, where deoxyribonucleic acid (DNA) acts as the lasting blueprint from which ribonucleic acid (RNA) is transcribed, then translated into proteins which act as the main molecular machinery that enables the vast majority of cellular and organismal processes9. Inherent to the central dogma of molecular biology is that information flow is unidirectional, so proteins cannot encode RNA or DNA, and RNA does not encode DNA—although the later discovery of RNA retroviruses and reverse transcription did prove there are exceptions to this. However, in purely eukaryotic and mammalian systems thus far there are no instances where this unidirectional flow has been broken. Cancer exploits this usual flow of information in the cells by making alterations directly to DNA where those changes later reverberate down to RNA and proteins. When undergoing mitosis, these changes to the DNA are heritable to daughter cells. If these changes to the DNA ultimately result in an evolutionary advantage for the cell—such as allowing it to replicate and grow much faster 3 than its neighboring cells or inhibit apoptosis signals—this naturally leads to a greater proportion of this population harboring the DNA responsible for the evolutionary advantages. Herein lies the essential problem in carcinogenesis, where cancer onset is akin to a speciation event where this malignant population no longer recognizes playing a small, discrete, and specialized role as part of a larger multicellular organism, but rather this population is now in competition with surrounding non-malignant cells for resources and space10. If mutations or other genetic changes can result in the ability of a cell to increase its evolutionary fitness in its specific environment, over time this population of cells present within this environment will come to dominate its makeup if there are no counteracting selection mechanisms11. 1.3 Oncogenes In all cells there are subsets of genes that are responsible for cellular processes including cell growth, DNA replication, differentiation, and more. Essential in normal development and homeostasis, these genes are often exploited in carcinogenesis to increase the evolutionary fitness of the malignant population of cells. These genes that have the capacity to cause cancer when altered are referred to as proto-oncogenes in their normal state but referred to as oncogenes when in their malignant state. These gain of function genetic changes are a key component of driving cancer initiation and progression. Confirmed oncogenes may often be referred to as drivers of tumorigenesis in this dissertation and in the literature. Drivers of tumorigenesis are referred to as such because they confer a selective growth advantage, and without them tumorigenesis would not occur or cannot be sustained12. 4 Commonly altered genes in cancer can be disparate in their exact cellular functions but coalesce at increasing the evolutionary fitness of the malignant population. An example of one of these commonly altered genes is epidermal growth factor receptor (EGFR). EGFR is commonly mutated in non-small cell lung cancer (NSCLC), with meta- analyses finding prevalence of mutations ranging from greater than 38% of NSCLC populations in Asia to 14% of NSCLC cases in Europe13. These mutations in EGFR are selected for based on their ability to activate constitutive downstream signaling of EGFR. Signaling downstream of EGFR includes many other known proto-oncogenic pathways including the likes of the c-RAF serine/threonine-protein kinase pathway for cell growth and division, the PI3K/AKT pathway—which induces cell growth, differentiation, proliferation, motility, survival, and intracellular trafficking—and signal transducer and activator of transcription pathways14. In contrast to NSCLC, EGFR is preferentially amplified in glioblastoma multiforme, effectively increasing downstream signaling from EGFR by simply increasing the gene copies of EGFR leading to higher transcription and translation of EGFR15. As previously stated, EGFR is preferentially mutated in NSCLC rather than amplified. These mutations in NSCLC typically affect the kinase domain, allowing for inhibition of dephosphorylation, and thus constitutive activation of EGFR signaling16. The differing evolutionary selective pressures for EGFR amplification versus mutation are not well understood. However, these disparate DNA alterations converge on increasing EGFR downstream signaling, suggesting a selective pressure for malignant tumors that emerge at each of these respective tissues to increase EGFR signaling specifically. While only genetically altered at low single digit percentiles in all breast cancer cases, EGFR is overexpressed in nearly 5 half of triple negative breast cancer (TNBC)17, known as the most aggressive and difficult to treat breast cancer subtype due to its lack of targetable markers18. All of this is to demonstrate that across cancer types, there are selective pressures that drive convergence on genetic alterations of key proto-oncogenes, such as EGFR, that a select population of cells will acquire. This genetic alteration provides an evolutionary competitive advantage over surrounding cells, driving growth and proliferation of this malignant population of cells, ultimately leading to carcinogenesis. While activation of certain oncogenes is necessary for malignant transformation of cells, there are other key steps needed for these cells to overcome inherited limitations meant to explicitly prevent uncontrolled proliferation. This section has intentionally been kept limited to discuss the oncogenes relevant to this dissertation in later chapters. EGFR is genetically altered in only ~3% of total breast cancer cases19, but it represents the diversity of possible genomic alterations possible, selected for based on either tissue of origin or even extending to further subtypes of different cancers from a single tissue of origin. 1.4 Tumor suppressors If oncogenes are the accelerators of cell proliferation and outgrowth, then tumor suppressors are the brakes on these processes. Tumor suppressors are aptly named due to Alfred Knudson’s work of the RB protein suppressing retinoblastoma tumor formation5. The RB protein is meant to act as a brake on the cell cycle, sequestering E2F transcription factors and preventing the transition from G1 to S phase in the cell cycle20. In the case of Knudson’s work, as a physician he encountered children who presented with a rare form of ocular cancer called retinoblastoma. Knudson saw that certain families had a hereditary 6 predisposition of developing retinoblastoma, commonly in both eyes, while others whose families did not have a hereditary predisposition to developing retinoblastoma often presented with the disease later in life and only in one eye. The discovery of tumor suppressors came from the realization that those who developed retinoblastoma in both eyes inherited a mutation in the RB (gene name RB1), which rendered one of the two genomic copies of RB defective. For a time, the single functioning copy of RB was able to compensate and prevent tumor formation, but it only took one additional deleterious mutation in the other allele of RB to spontaneously develop tumors. In the case of patients with the hereditary mutation in RB, the probability of one cell in each eye developing a deleterious mutation in the functioning allele of RB is much more likely than multiple deleterious mutations occurring on both alleles within the same cell, thus why patients with hereditary RB mutations tend to develop retinoblastoma tumors in both eyes rather than just one5. While RB plays an important role in the cell cycle, there are other, more prototypical tumor suppressors that play roles in many different areas antagonistic to cancer development and progression. The best known tumor suppressor is p53 (gene name TP53 in humans, Trp53 in mice), which plays a critical role in responding to external stresses on the cell. In response to various stress signals, p53 can coordinate DNA damage repair, lead the cell into senescence, arrest the cell cycle, or even cause the cell to head down the irreversible path of apoptosis21. For its role in coordinating DNA damage repair, p53 is colloquially referred to as “the guardian of the genome”. Since genomic alterations are required to activate oncogenes, these genetic perturbations are often correctly seen as DNA damage to the cell, where p53 will then coordinate the DNA 7 damage response to correct these alterations. Indeed, p53 is possibly the most important gene in suppressing tumor formation across all cancer types, which is why in nearly half of all cancers p53 is mutated to the point of loss of function22. Additional evidence of p53’s critical role as a tumor suppressor is p53 homozygous null mice develop tumors with close to 100% penetrance with virtually no other ill effects on development23. Many of these mutations manifest as missense mutations in the DNA binding domain of p53 primarily at codons 175, 213, 245, 248, 273, and 282; these mutations can act in a dominant negative fashion by inhibiting the functional tetramer of p53 from recognizing and binding DNA, where it would normally activate transcription of other genes involved in DNA damage repair24. Mutations outside the DNA binding domain are often truncating mutations and simply inhibit wildtype (WT) function of p5322. The dependence on loss of function of p53 varies widely by tissue of origin for carcinogenesis to occur. Only 15% of acute myeloid leukemia (AML) cases have mutant p53 while over 90% of ovarian cancer cases have p53 mutations19, highly suggestive that there is significant evolutionary pressure to have non-functional p53 in ovarian cancer but complete loss of function of p53 in AML is not necessary. For breast cancer, p53 is mutated or lost in nearly 34% of breast cancer cases19; There is clearly more to breast cancer initiation than loss of p53 or activation of one of the previously mentioned oncogenes. 1.5 Breast cancer Broadly, breast cancer is a type of solid cancer where the breast epithelium is the primary induction site of carcinogenesis. Breast cancer is a notoriously heterogeneous disease and can be classified in many ways in terms of histopathology, grade, nodal 8 status, intrinsic subtype, or degree of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression25. Breast cancer most often begins in the ductal cells that line the milk duct, comprising 85% or more of breast cancer cases, but breast cancer can begin in the lobular cells that make up the milk producing lobules at the end of the terminal duct lobular units, accounting for the remaining 8-15% of cases. When the premalignant cells have yet to escape the confines of either the lobule or duct from which they originated, the cancer is referred to as lobular carcinoma in situ (LCIS) or ductal carcinoma in situ (DCIS), respectively. If these cells turn malignant, they will invade through the basement membrane into the surrounding fibrovascular tissue; the disease has then advanced to invasive lobular carcinoma (ILC) or invasive ductal carcinoma (IDC), collectively referred to as invasive breast cancer (IBC). However, DCIS/LCIS does not necessarily lead to the development of IDC. Less than half of women diagnosed with DCIS go on to develop IDC26,27. A problem in breast cancer treatment is overtreatment of what would likely be benign DCIS cases that would have never progressed to IDC, but at a population level, more intensive, early treatment of DCIS improves patient odds of preventing DCIS progression to IDC28. Recently, however, it was found we can predict DCIS progression probability with gene signatures even better than clinical markers29, which may help us determine which DCIS cases to aggressively treat and which can continue under the ‘watchful waiting’ of a physician. Other than histopathology, another common method for stratifying breast cancers is through receptor expression. The most common type of breast cancer defined by receptor expression is hormone receptor (HR) + and HER2-, where HR+ represents ER/PR positivity in the tumor since these are hormonally activated receptors. The 9 HR+/HER2- subtype makes up about 68-70% of all breast cancers30,31. The next most prevalent receptor-based subtype at 11% of all breast cancer cases is the HR-/HER2-, synonymously referred to as TNBC. Following is the HR+/HER2+ subtype at 10% and the HR-/HER2+ subtype at 4-5%, with the other remaining percentage of women not having well defined subtypes30,31. Stratifying breast cancers by receptor expression or lack thereof is especially useful in clinical settings because our standard of care therapies rely on the presence of ER, PR, or HER2 protein within the cells of the tumor to be effective at selectively killing tumor cells. Physicians often base treatment decisions on the staging system for breast cancer which relies on 7 key metrics: size of the tumor, spread of the cancer to nearby lymph nodes, presence of distant metastasis, grade of the cancer, ER status, PR status, and HER2 status32. Options for treating each breast cancer subtype will be explored in a later section. In bioinformatics-based breast cancer research, gene expression signatures are typically used to classify breast cancer patients into meaningful subtypes. Derived from a small breast cancer cohort in the early 2000s33, the prediction analysis of microarray 50 (PAM50) method is widely used to stratify breast cancer patients into ‘intrinsic’ breast cancer subtypes: luminal A, luminal B, HER2 enriched, basal-like, normal-like, and claudin-low—although, the claudin-low subtype has recently been called into question as to whether or not it represents an additional phenotype on top of the other intrinsic breast cancer subtypes rather than an intrinsic subtype itself34. The intrinsic subtypes of breast cancer, while largely overlapping with the receptor-based classification approach, are not synonymous with each other. Luminal A breast cancer is very similar to the HR+/HER2- receptor classification, as both are generally ER+/PR+, lack HER2 expression, and have 10 low expression of proliferation markers like Ki-67. Luminal B breast cancer is more tenuously defined but is generally similar to HR+/HER2+ or can be HR+/HER2- with high Ki-67 expression; luminal B breast cancer is generally faster proliferating and more aggressive than luminal A breast cancer with worse clinical outcomes to boot. HER2 enriched breast cancer is essentially identical to HR-/HER2+ breast cancer, but also includes expression of associated genes like growth factor receptor-bound protein 7 (GRB7), which bind to receptor tyrosine kinases (RTKs) like HER2 to aid in downstream signal transduction. Basal-like breast cancer is often likened to TNBC, because both subtypes are HR-/HER2-, although basal-like breast cancer must also include expression of additional genes such as EGFR and cytokeratin 5 or cytokeratin 635. Claudin-low has a lot of overlap with basal-like breast cancer, being HR-/HER2-, but will also have low expression of claudins that would usually aid in establishing tight junctions between cells and will also have high epithelial-to-mesenchymal (EMT) marker expression like vimentin, fibronectin, and N-cadherin. Normal-like breast cancer can be the trickiest subtype to define. The gene expression pattern of normal-like breast cancer is, as its name implies, similar to normal breast tissue. This usually means lower ER and PR expression than the luminal subtypes and lower cell proliferation markers. Because of the heterogeneity present within the normal-like subtype, it can often be difficult to classify and treat effectively in the clinical setting36. Receptor based classifications of breast cancer subtypes aid physicians in finding the best possible treatment options for patients, while gene expression based subtyping like PAM50 enables researchers to tie prognostic value from each intrinsic subtype to cellular pathways and ultimately therapeutic implications. Intrinsic subtyping is not 11 typically used in the clinic, but there is clear prognostic value from intrinsic subtypes that is not captured by receptor based subtyping alone37. Expression based subtyping contextualizes oncogenic pathways, which may inform future therapeutic development. 1.6 Breast cancer statistics It is estimated that in the United States there will be 313,510 new cases of breast cancer diagnosed in 2024 alone—2,790 in men, and 310,720 in women—representing approximately 15.7% of all total cancers and 30% of all cancers diagnosed in women in 202438. It is no surprise that breast cancer is the most commonly diagnosed cancer in woman, excluding non-melanoma of the skin30. Proportionally, the individual breast cancer patient has never had a better time for survival odds in the history of the United States; the highest reported female age adjusted breast cancer death rate being at 26.6 deaths per 100,000 women in 1999, and the death rate falling steadily each year to 18.7 deaths per 100,000 women in 202239. Despite the progress of the breast cancer death rate decreasing steadily over time over the past 20 years, there has been an increasing number of total breast cancer cases diagnosed each year, increasing by 0.5% per year over the same period30. Combining this data, in total there were 41,144 deaths attributed to breast cancer in 1999 and now 42,211 deaths attributed to breast cancer in 2022 in the United States, with this number oscillating between a low of 40,589 in 2008 and a high of 42,465 in 201839. Currently, women have a 1 in 8 chance of developing breast cancer at some point in their lifetime, with this probability peaking in septuagenarians30. The breast cancer mortality rate has steadily decreased to the point where only 1 in 39 women will die from breast cancer as of 202230. Breast cancer is the second leading cause of cancer death in 12 women behind lung cancer overall, although 80-90% of lung cancer cases are initiated from modifiable risk factors, such as smoking38, whereas only up to 30% of breast cancers are attributable to modifiable risk factors, like obesity, activity level, and alcohol consumption40. Taken together, breast cancer is the cancer type with the highest number of annual deaths in women that are due to non-modifiable risk factors—most breast cancer diagnoses and deaths are due to factors outside the control of the patient themselves and their lifestyle. 1.7 Breast cancer metastasis Metastasis is the process in which the primary tumor spreads from its initial induction site to a more distant site in the body. The metastatic cascade usually involves the initial transformation of cells into malignant cells, growth of these cells into a primary tumor, optionally stimulating angiogenesis during this process, and then invading the surrounding tissue. However, metastasis is different from local tissue invasion. Only once the malignant cells intravasate into either the blood supply or lymphatic system and extravasate at a distant site in the body, colonize this site, and form another tumor would this be called metastasis41. In solid cancers, metastasis accounts for the large majority of mortality associated with the disease, ranging between 66–90% of all cancer mortality42. A recent pan-cancer analysis also found that of those living with metastatic cancer, ~83% of these patients ultimately die from their diagnosed cancer, with the remaining ~17% dying from alternative causes43. Thankfully, the proportion of people dying of metastatic breast cancer has been decreasing over the decades and now represents one of the cancer patient populations proportionally least likely to die from metastatic disease along with 13 prostate cancer patients43. However, while an individual patient may be less likely to die from metastatic breast cancer, given the prevalence of breast cancer diagnoses this still translates into 42,000+ deaths in the United States alone from breast cancer every year38, rivaling the fatalities from liver disease, kidney disease44,45, and eclipsing motor vehicle fatalities46. Having established breast cancer as a prevalent disease with a high number of fatalities, mostly due to metastatic breast cancer, it is clear why there is concerted interest in understanding the mechanisms of breast cancer metastasis. To complicate matters further, there is ample evidence showing that the different subtypes of breast cancer preferentially metastasize to different sites of the body47–49. For instance, it has been found that IDCs preferentially metastasize to the central nervous system, distant lymph nodes, and the lungs, while ILCs metastasize more often to the ovaries, gastrointestinal tract, and the abdominal peritoneum50. For intrinsic breast cancer subtypes, all subtypes can seed bone metastases, but luminal B especially so. HER2 enriched tumors are more prone than HER2- subtypes to metastasize to the liver, brain, and lungs. Basal-like tumors often preferentially metastasize to the lungs35,51. Uncovering novel mechanisms of breast cancer metastasis can unlock new treatment modalities that can lessen or even prevent metastasis from occurring in the first place. Of the 4.1 million women in the United States with a history of breast cancer, only 4% of these women are currently living with metastatic breast cancer52. The small percentage of breast cancer patients living with metastasis is reflective of the poor 5-year overall survival (OS) rate at 29%30. Breast cancer treatment is more effective prior to metastatic dissemination. 14 1.8 Treatment of breast cancer In the clinic, physicians use the histopathology of the tumor, size and location of the tumor, ER/PR and HER2 status of the tumor, degree of lymph node involvement, determination of distant metastasis, genetic risks or other comorbidities, and other patient specific factors when deciding on the best treatment plan for breast cancer. Initial factors that can influence all other decisions made by a physician include determining how advanced the cancer is and how healthy the patient is to determine if certain treatments could be tolerated. If the cancer is in stage IV and the patient is in poor health, the physician may not seek curative treatment and instead focus on palliative care and ameliorating pain53,54. Care for stage IV breast cancer would likely also include therapeutics specific to the patient’s cancer if they can be tolerated to slow progress of the cancer or encouraging participation in a clinical trial for new cancer therapeutics. Applicable to all stages of IBC, one of the first steps is surgery to remove the primary tumor for stages I-III55, but may be applicable in some stage IV cases54. Radiotherapy often follows surgery to kill any remaining malignant cells in the area of interest. Systemic chemotherapy can be administered in the neoadjuvant or adjuvant setting depending on circumstances56. Beyond these steps, additional treatments are based on the breast cancer subtype. For HR+ cancers, regardless of HER2 status, patients may be given aromatase inhibitors to prevent androgen conversion to estrogen by the aromatase enzyme, or patients could be given alternatively/combinatorically selective estrogen receptor modulators, like tamoxifen56,57. Knowing estrogen’s primary role is to promote epithelial cell proliferation in the mammary gland and uterus during puberty, for HR+ cancers these mechanisms of cell proliferation in response to estrogen 15 are still in effect, so blocking this pathway effectively reduces tumor growth and may even induce apoptosis in malignant cells58. Tamoxifen is often prescribed for premenopausal women while aromatase inhibitors are frequently prescribed for postmenopausal women. This is due to aromatase inhibitors not effectively targeting ovary production of estrogen in premenopausal women, so an inhibitor downstream of estrogen production needs to be used. But in postmenopausal women, ovary production of estrogen has ceased, so aromatase inhibitors with more efficacious profiles to prevent breast cancer recurrence can be used instead59. Separately, if tumors have overexpression of HER2, other treatments can be used regardless of HR status. HER2, the human protein product of the gene erythroblastic leukemia viral oncogene homolog 2/ Erb-B2 receptor tyrosine kinase 2 (ERBB2), is often overexpressed in breast cancer and has been found to be so consequential to patient clinical outcomes that it earns its own subtype discussed previously (HER2 enriched). HER2 is often amplified at the genomic level in 20-30% of IBCs19 and contributes to elevated mitogen-activated protein kinase (MAPK) signaling as well as PI3K-AKT signaling, leading to increased protein translation, cell survival, cell invasiveness, angiogenesis, cell motility, and cell proliferation60. HER2 lacks an extracellular ligand to activate receptor dimerization. Instead, HER2 can homodimerize or heterodimerize with other HER protein family members to activate MAPK signaling by simply encountering other HERs in the plasma membrane. The literature shows downstream signaling can be significantly altered depending on homodimerization or heterodimerization status of HER2. The consensus is that homodimerization results in stronger downstream signaling for both MAPK and PI3K-AKT pathways61,62. The genomic amplification and 16 consequential overexpression of HER2 increases the probability of HER2 homodimerization given the increased concentration of HER2 within the membrane relative to other HER members, giving the cells with this genetic event an evolutionary competitive advantage given the more robust downstream signaling from HER2 homodimerization. The current standard of care therapy for HER2+ breast cancer at any stage is treatment with monoclonal antibodies (mAbs) directed against HER2 to prevent dimerization with HER members acting in the extracellular space in combination with tyrosine kinase inhibitors (TKIs) that can permeate the cytosolic space of the cell to stop ATP binding to the RTK domain61. Different monoclonal antibodies exist to target different domains of HER2; trastuzumab is the oldest mAb targeted against HER2 and binds domain IV, while the only other mAb available is pertuzumab which targets domain II of HER261. More recently, therapeutics have combined these mAbs for their targeting specificity to HER2 and linked them to cytotoxic poisons that are released upon receptor internalization, then releasing the cytotoxic payload. These combinatorial drugs are referred to as antibody-drug conjugates (ADCs). Trastuzumab-emtansine is one of these ADCs that is the current standard of care for treating recurrent HER2+ breast cancer, but recently the DESTINY-Breast03 clinical trial showed trastuzumab-deruxtecan, a different ADC, significantly improved progression free survival relative to trastuzumab- emtansine63,64, suggesting that the cytotoxic linked drug plays an extremely significant role in inhibiting the growth of HER2+ breast cancer. Other targeted therapies often depend on genomic profiling of the tumor or more advanced methods. Immunotherapy like pembrolizumab for TNBC or basal-like breast cancer has become increasingly common due to the lack of other targets in these breast 17 cancer subtypes. A physician may look at PD-L1 expression levels or total tumor mutational burden to see if the patient would be a good candidate for immunotherapy, as high PD-L1 expression or high tumor mutational burden are good prognostic indicators for immunotherapy efficacy. Immunotherapy is often used in combination with chemotherapy especially for recurrent TNBC65. Other established targeted therapies in breast cancer depend on if the patient has mutations in DNA repair genes, particularly in breast cancer type 1 susceptibility protein (BRCA1) and BRCA2. BRCA genes are critical to double-stranded DNA repair using the error free homologous recombination pathway. A small subset of women (0.25%) will inherit monoallelic mutated BRCA1 and BRCA2 that causes loss of function, preventing error free DNA repair in these tissues in a dominant negative manner66. This results in accumulation of DNA damage that can act as evolutionary fodder for carcinogenesis initiation. In a landmark association study of over 113,000 women, it was found that loss of function of BRCA1 gave women a greater than 50% absolute risk of developing breast cancer by the age of 80 and a lifetime risk somewhere between 60-80%67, much greater odds than the lifetime risk of 12.5% of the general female population30. For BRCA mutations and other DNA repair protein mutations, poly ADP-ribose polymerase (PARP) inhibitors have proven effective in selectively killing malignant cells. PARP is instrumental in detecting and initiating single strand break repair and base excision repair, which are error prone solutions providing the evolutionary fodder for carcinogenesis in this case. PARP inhibitors prevent error prone DNA repair pathways from occurring, which would normally repair single strand breaks before they can progress to double strand breaks. With PARP inhibition, double strand breaks begin to accumulate within the genome. Accumulation of DNA double 18 strand breaks triggers p53 induced apoptosis or cell cycle arrest from which the cell is unable to recover68. However, newer evidence has come to light that instead suggests the mechanism for PARP inhibitors killing malignant cells is instead trapping PARP at a stalled replication fork in S phase of the cell, which leads to fork collapse and eventually apoptosis due to the inability of the cell to resolve these stalled replication forks69. Complicating matters is the frequent genetic divergence of the metastatic site compared to the primary tumor. The primary tumor initiation site and sites of distant metastases reside under different selective pressures, and thus the genetic composition that provides the best fitness in each of these locations is likely to be different. Indeed, multiple groups have found that metastatic sites differ greatly from the primary site they disseminated from. In a case study, gastric metastasis stemming from a primary breast tumor resembled markers of what is traditionally primary gastric cancer, suggesting the gastric microenvironment selects for specific cellular composition that can thrive in this location70. In a comprehensive analysis of primary breast tumors and matched metastases, the matched primary tumors and metastases were more closely associated with each other than primary tumors were to other primary tumors or metastases to other metastases. However, common differences were seen between patients primary tumors and metastases, such as deletion of ATM in metastases, loss of PIK3CA activating mutations in metastases, and increased mutations in TP53 in metastases71. While in this limited breast cancer cohort metastases mostly resemble the primary tumor, that is not the case for all cancer types, as genomic divergence in metastases from primary hepatocellular carcinoma (HCC) is common72. 19 Clearly, from the extreme heterogeneity present within breast cancer, distinct subtypes require treatment specific to the individual patient’s type of breast cancer to effectively block the cancer’s ability to proliferate and grow. 1.9 Obstacles in breast cancer treatment Cancer is nothing if not an evolutionarily complex microcosm of competition between malignant cells and the host. Just as when we treat bacterial infections with antibacterials we run the risk of drug resistance, so too does drug resistance crop up when treating breast cancer. If you don’t kill the entire population of cancer cells with treatment, then almost certainly the cells that survived had an increased level of fitness to deal with the chemotherapy or other therapeutic at hand. These resistant cells can then go on to repopulate the tumor, creating an even larger resistant population. Just as we would switch to different medications when bacterial drug resistance had developed, the same occurs in the clinic with cancer treatment. Polychemotherapy with multiple agents has been used to some success in breast cancer treatment73. When you only have one or two total options for a targeted therapy, though, how can you overcome the inevitable rise of resistant clones within your breast cancer population? Trying to account for therapeutic toxicity like bone loss/degradation from continued aromatase inhibition74 or cardiotoxicity/GI toxicity from HER2 mAb based therapies64 surely complicates treatment regimen decisions, but toxicities of treatments will not be discussed here. A common mechanism of drug resistance in breast cancer as with all cancers is the increased expression of ATP-binding cassette (ABC) transporter proteins. These membrane channel proteins are one of the main drug efflux mechanisms, binding both positively and negatively charged chemotherapeutics and transporting them from the 20 cytosol to the extracellular space75. Another general aspect applicable to many solid cancers is that these are often hypoxic environments with poor vasculature, limiting the penetration and drug availability to tumor cells76. In addition, a tightly packed microenvironment with stromal cells and extracellular proteins like collagen can make for difficult environment for immune cells to aid in targeting the solid tumor77, clarifying why some tumors create dense microenvironments of extracellular proteins. In contexts related just to breast cancer, some mechanisms of resistance can include additional genetic alterations that prevent previously given inhibitors from binding their intended targets in the first place. An example of this is in the ER ligand binding domain, where induced missense mutations are common after extended treatment to hormonal therapies like aromatase inhibitors78,79. A majority of acquired mutations (~60%) in luminal A (HR+/HER2-) patients occur at amino acid Y537 in the ligand binding domain. The effect of this mutation is to stabilize ER in its agonist form, allowing it to continuously confer constitutive activity independent of estrogen ligand binding78,80. These mutations render aromatase inhibitors and estrogen receptor modulators like tamoxifen ineffective at preventing ER signaling. Endocrine treatment induced acquired mutations are present in the majority of sequenced recurrent metastatic breast cancer from HR+ patients19,80. Unsurprisingly, similar genetic mechanisms are at play with multi drug resistance in HER2+ breast cancer. Under selective pressure, malignant cells can start producing p95HER2, a truncated form of HER2 that lacks an extracellular domain in order to thwart mAb therapies from inhibiting dimerization. Masking of the extracellular domain by increasing production of mucins or hyaluronans can also prevent mAb binding64. Co- occurring mutations at HER2L755S and HER3E928G have been shown to reduce RTK 21 antagonist effectiveness and increase the heterodimerization efficiency and downstream signaling of HER2/HER381. Increased selective pressure on HER2 specifically can cause malignant cells to seek adjacent pathways to keep up their cell proliferation and growth demands. PIK3CA mutations, most often associated with HR+ breast cancer, have been shown to reduce the efficacy of HER2 therapies, suggesting an escape mechanism or off-ramp for tumors once dependent on HER2 signaling for growth to shunt metabolic resources into the PI3K pathway82. Other evidence suggests bidirectional crosstalk between HER2 pathways and HR pathways. Studies have found ER signaling can promote resistance of HER2+ tumors to HER2 mAb therapies83,84, with clinical trials finding the combination of hormonal therapy and trastuzumab without the addition of chemotherapy in HR+/HER2+ metastatic breast cancer much more effective than hormonal treatment alone85. These findings suggest that one must inhibit multiple possible off-ramp pathways, like those of ER/PR, when the primary objective is HER2 pathway inhibition. Recent evidence suggests that there may be a new clinically significant breast cancer subtype, that of the HER2-low subtype. Breast cancers that have immunohistochemistry (IHC) staining of +1 or +2 IHC staining for HER2 and negative in situ hybridization (ISH) results are defined as HER2-low86,87. Historically, this subtype had been treated as HR+/HER2-, which is consistent with the finding that therapies specifically directed against HER2 are much less effective in these HER2-low patients88–90. However, results from the DESTINY-Breast04 clinical trial show that the ADC trastuzumab- deruxtecan doubled the PFS and significantly increased overall OS in advanced HER2- low breast cancer compared to standard care91. 22 PARP inhibition eventually leads to resistance in a majority of patients on long term treatment. Upregulation of ABC drug efflux pumps, particularly ABCB192,93, is common in patients with long treatment regimens of PARP inhibitors. An interesting mechanism of resistance is reversion mutations in BRCA1 and BRCA2 that restore homologous recombination functionality94. Two separate research groups found this mechanism simultaneously, where protein truncating variants of BRCA1 and BRCA2 could be restored to WT functionality after long term treatment of BRCA1/2 deficient cell lines with PARP inhibitors, with homologous recombination thus restored thereafter95,96. Since homologous recombination deficiency had already lead to the accrual of genetic damage in these cell lines and in BRCA1/2 deficient patients, further genetic damage is of little advantage to the tumor, so when a selective pressure like PARP inhibitors are applied, cells with ways to circumvent the stalled replication fork are quickly selected for69,97. Plenty of obstacles in breast cancer treatment remain, from severe side effects to currently available therapies or tumors acquiring resistance to these therapies. However, possibly the biggest obstacle in treatment of the different subtypes of breast cancer is the lack of available therapies that can specifically target all of the different genetic drivers an individual patient has. When discussing cancer drivers, this connotation often refers to gain of function genetic alterations to oncogenes. However, inactivation of tumor suppressors is of equal importance regarding cancer drivers. Cancer driver genetic alterations may refer to mutations, copy number variations, or structural rearrangements that can confer selective growth or survival advantages to the tumorigenic population, without which tumorigenesis would not occur12. Known drivers of tumorigenesis in breast cancer that are either overexpressed or genetically activated in some way include: ESR1, 23 GATA3, PIK3CA, AKT1, CCND1, FGFR1, ERBB2, MYC, MAP3K1, STK11, and more. Tumorigenesis driven by loss of tumor suppressors also include: BRCA1/2, TP53, PTEN, RB1, ATM, RAD51, BARD1, CHEK2, ARID1A, and many others98,99. Currently, there are 633 known cancer driver genes across both oncogenes and tumor suppressors determined by complex mathematical analysis of variational mutational rates between certain genes and background mutational rates, bolstered by experimental validation of certain genes. From this dataset, it is estimated there are 99 potential driver genes in breast adenocarcinoma, the most driver genes of any cancer type100. While the average number of non-synonymous genetic alterations in a single breast adenocarcinoma is 60-70 events, it is estimated that the average number of oncogenic drivers in an individual tumor is between 2-6 genes affected12,100. As stated before, if an alternative pathway exists that tumor cells can take advantage of in the presence of a selective pressure, activation of said pathway and increased signaling flux down this pathway would confer a fitness advantage to this tumor population. Therefore, we need to potentially treat all different genetic drivers in individual patients to ensure a complete response with no possible escape pathways for growth and proliferation. 1.10 Mouse models of breast cancer Genetically engineered mouse models (GEMMs) of breast cancer have been extensively used to study the mechanisms by which activation of oncogenes or loss of tumor suppressors contribute to carcinogenesis101. While many GEMMs exist to test the effects of exogenous substances, of focus for this thesis are the mouse mammary tumor virus (MMTV) derived GEMMs. Historically, it was found that certain inbred mouse strains have higher incidence of mammary tumors compared to other inbred mouse strains. 24 When cross breeding these inbred strains, only the offspring of the maternal parent of the high incidence strain showed continued high incidence of mammary tumors102,103. It was shown that the factor behind this was transmitted through milk by fostering low incidence mammary tumor mice with high mammary tumor incidence mothers, finding that these supposed low mammary tumor incidence foster pups then drastically increased mammary tumor incidence once nursed by mothers with high incidence of mammary tumors104. This factor in the milk was later found to be MMTV105. Others later found that in the induction of carcinogenesis, MMTV preferentially integrated into the genome at specific sites located next to endogenous oncogenes and increased their expression106, in this case, Int-1 (later renamed to Wnt-1). This revelation further cemented the genetic underpinnings of cancer development and then offered a tool to be used by researchers to overexpress oncogenes or tumor suppressors of choice in GEMMs to observe their effects in isolation. The MMTV long terminal repeat (LTR) promoter/enhancer section of the virus that was responsible for overexpression of endogenous oncogenes was isolated and could then be cloned with other DNA segments, often corresponding to different oncogenes, that could then be inserted into mouse embryos where this transgenic DNA would then incorporate into the genome randomly and overexpress the transgenic DNA of interest. Also of benefit was that the MMTV-LTR had specific expression in the mammary gland epithelium that was hormonally induced at pregnancy, although there was some minimal leaky expression elsewhere in the body101. The first instance of cloning a transgenic oncogene with the MMTV-LTR was that of Myc, a master transcriptional regulator strongly implicated at the time of being an oncogene. The only problem was that before there was no way to expressly isolate Myc 25 in vivo as being the single causative agent of carcinogenesis. Using the MMTV-LTR GEMM, it was found that overexpression of Myc alone in the mammary epithelium was sufficient to induce spontaneous mammary adenocarcinomas107. This was the first known instance of using a GEMM to show that a single oncogene can lead to spontaneous tumor formation. Under the control of a tissue specific, hormonally driven, constitutively active promoter, genes of interest could now be tested in isolation to examine effects on mammary gland development and oncogenesis. Additional promoters for constitutive expression of oncogenes have been developed since, including whey acidic protein (WAP) and C3(1)/Tag which upregulates genes under the control of rat prostatic steroid binding protein108. For the purposes of this thesis and projects undertaken in the main chapters, we will focus on MMTV driven GEMMs. The advent of using the MMTV-LTR to drive constitutive expression of transgenes specifically to the mammary gland epithelium created new possibilities in breast cancer research. Followed immediately after Myc, transgenic Ras and Neu (rat ortholog to human HER2) were used again to induce spontaneous mammary tumors from a single oncogene109–111. Thus far, the models used were simply transgenic endogenous oncogenes, but later there came another important MMTV model expressing a highly transformative virally derived protein. From its publication in 1992, the MMTV- polyomavirus middle T antigen (MMTV-PyMT) mouse model has been used widely to examine methods of breast cancer metastasis, as this model forms multifocal mammary tumors with low latency which inevitably develop lung metastases at nearly 100% penetrance112. Another advancement came during 1997, when the MMTV-Cre and WAP- Cre mouse models showed that you can use MMTV or WAP to activate the Cre-Lox 26 recombination system specifically in the mammary gland epithelium to delete regions of DNA engineered between loxP sites113. Now researchers were able to test single tumor suppressor or single gene knockout (KO) in vivo rather than strictly single oncogene activation/overexpression. With a variety of in vivo models and modern molecular biology tools at hand, breast cancer researchers sought to determine mechanistically each gene’s role in breast cancer. However, there is a gap in the research knowledge of these GEMMs that we are now only beginning to understand and investigate fully. 1.11 Dissertation rationale It is unquestionable that human breast cancer is heterogeneous, often being driven by activation of a variety of oncogenes and loss of function of multiple tumor suppressors; breast cancer induction is never truly a single step process. Often, a multi-step, sequential process of activating oncogenes and loss of function of tumor suppressors leads to breast cancer that can invade the surrounding tissue, evade the immune system, and metastasize to distant sites within the body. Vogelstein, who discovered the sequential genetic mutations for colorectal cancer induction7, has this to say on cancer development, “Unlike diseases such as cystic fibrosis or muscular dystrophy, wherein mutations in one gene can cause disease, no single gene defect 'causes' cancer. Mammalian cells have multiple safeguards to protect them against the potentially lethal effects of cancer gene mutations, and only when several genes are defective does an invasive cancer develop. Thus it is best to think of mutated cancer genes as contributing to, rather than causing, cancer.”114 27 All mouse models we have discussed thus far likely have other underlying genetic alterations that enable induction of mammary tumors. Only recently has the technology and cost of whole genome sequencing (WGS) and whole exome sequencing technologies been accessible to enable identification of other genetic drivers accrued in these GEMMs. Further uncovering the sequential, multi-step processes of activation of oncogenes and loss of function of tumor suppressors from single gene activation/overexpression or knockout presents an unprecedented opportunity in understanding tumor biology, uncovering previously unknown interactions between genes, and identifying the causative agents of heterogeneity in mouse models of breast cancer that could be applied to human breast cancer. All of these elements coalesce in advancing our understanding of human breast cancer biology and how we can better treat the individual patient based on the genetics of their breast cancer. Indeed, this is exactly what our research group has been examining115–118. Our previous work has previously examined the popular MMTV-Neu, MMTV- PyMT, and E2F knockout variants crossed to MMTV-PyMT118,119. Other groups have also found utility in examining GEMMs of breast cancer at the whole genome or whole exome level to identify similar genetic alterations that occur in human breast cancer. These include additional scrutiny on PyMT/Neu driven spontaneous mouse models of breast cancer120 and the ER+ NRL-PRL driven by Kras activating mutations121. In this dissertation, I will discuss my work on parsing the WGS information found in the MMTV- Cre conditional knockout (cKO) of E2F5 mouse model of breast cancer, WGS results found for the differing histological subtypes of the MMTV-Myc model, and finally characterizing knockdowns (KDs) of genes previously implicated by WGS in the MMTV- 28 Neu model as contributing to breast cancer metastasis. Sequencing many different mouse models of breast cancer using integrative multi-omics will culminate in the construction of a murine version of The Cancer Genome Atlas (TCGA)122,123. Compiling the different genetic, epigenetic, and gene expression changes that occur within each mouse model of breast cancer would prove an invaluable resource to breast cancer researchers in choosing a GEMM appropriate for their studies and allowing for large scale comparative genetics between mice and humans. 29 CHAPTER 2: CONDITIONAL KNOCKOUT OF E2F5 IN THE MURINE MAMMARY GLAND RESULTS IN HETEROGENEOUS TUMOR DEVELOPMENT 30 2.1 Preface The introduction concerning early 2 factor (E2F) transcription factors are adapted from the following book chapter. Permission was obtained from the publisher (Elsevier, license # 5881941336678) for reuse and adaptation of the entire book chapter: Broeker, C. D. & Andrechek, E. R. 6.16 - E2F Transcription Factors in Cancer, More than the Cell Cycle. in Comprehensive Pharmacology (ed. Kenakin, T.) 277–311 (Elsevier, Oxford, 2022). doi:10.1016/B978-0-12- 820472-6.00102-X. Most WGS results that appear in this chapter have previously been published in the following manuscript. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Figures and text have been adapted and slightly altered from their original work: To, B., Broeker, C., Jhan, JR. et al. Insight into mammary gland development and tumor progression in an E2F5 conditional knockout mouse model. Oncogene (2024). https://doi.org/10.1038/s41388-024- 03172-4 31 2.2 Abstract Breast cancer development is closely linked to disruptions in mammary gland developmental processes, underscoring the importance of defining normal mammary gland development. While E2Fs 1-4 have well-defined roles in mammary gland development, the role of E2F5 remains less clear. Exploring single-cell RNA sequencing data in the murine mammary gland, it was observed that E2F5 expression fluctuates significantly between stages of mammary gland development. To investigate the role of E2F5 in mammary gland development, the research community created a mammary- specific E2F5 knockout mouse model. Although initial mammary gland development was modestly affected, after an extended latency period, E2F5 cKO mice developed highly metastatic mammary tumors with significant intratumoral and intertumoral heterogeneity. Whole genome sequencing of these tumors revealed conserved amplifications of known oncogenes Wnt2 and Met across the majority of samples. Additionally, impactful splice acceptor mutations in Fbxo15 were identified. These findings suggest that E2F5 loss leads to the amplification of key oncogenes and mutations that drive the development of metastatic mammary tumors. This study highlights the role of E2F5 as a critical transcription factor regulating a network of genes with tumor suppressor functions and underscores its importance in maintaining mammary gland integrity and preventing tumorigenesis. 32 2.3 Introduction E2Fs have been well established to play central roles in cellular proliferation for many eukaryotes. E2Fs, complexed with dimerization partner (DP) proteins and RB family pocket proteins, are regulated in a cell cycle dependent manner124–127. While certain E2Fs can exhibit high specificity binding to different pocket proteins, others are more promiscuous and will bind to multiple pocket proteins. This complexity is increased by a dependance on the current stage of the cell cycle or cellular context125. Advances in chromatin immunoprecipitation (ChIP)-chip and ChIP-seq have allowed the identification of the E2F cistrome, which shows that E2Fs have targets other than genes associated with the established G1-S phase transition role in the cell cycle (Figure 2.3A). These functions can include DNA repair, checkpoints for G2-M transition, replication, metabolism, autophagy, apoptosis, and other previously unknown targets of E2Fs124,128,129 (Figure 2.3B). E2Fs can exhibit considerable redundancy in binding to target promoters, making it difficult to assign specific functions to each individual E2F130. Both the established roles and newly emerging roles of E2Fs come into play in an area that has gained considerable attention in recent history: the role of E2Fs in cancer124,126. E2Fs are highly upregulated across a variety of tumor types. Using TCGA for a pan-cancer analysis, the tumor types with generally upregulated E2F activity compared to normal tissue include stomach adenocarcinoma, liver hepatocellular carcinoma, and lung adenocarcinoma, while also being upregulated generally in many more tumor types. Gene set enrichment analysis (GSEA) also confirms these findings131,132. For specific mechanisms contributing to tumorigenesis by E2Fs, both activator and repressor E2Fs have been shown to bind at adjacent sites of the BRCA1 promoter under hypoxic 33 conditions, resulting in the repression of the BRCA1 gene and subsequent depression of homologous directed repair133. Depletion of certain E2Fs can also promote oncogenesis, for instance loss of E2F3 results in aberrant mitotic spindle formation, aneuploidy, and abnormal centrosome duplication which can result in genetic instability and increase metastatic potential134. Indeed, this E2F mediated genetic instability is a well-documented mechanism leading to cancer development133,135,136. Similarly, depletion of E2Fs as exemplified by E2F2 nullizygous mice in the Eµ-Myc background drastically reduced time to onset of B cell lymphoma, without altering Myc-mediated proliferative ability or apoptosis. Interestingly, these E2F2 null tumors were fairly homogeneous and resembled early onset like B cell tumors, while E2F4 null tumors, in the same Eµ-myc background, produce heterogeneous tumors that can resemble late or early onset like B cell tumors137. E2F’s role in cancer development and progression has not gone unnoticed, with many groups now trying to specifically target E2Fs for clinical benefit. For example, E2F1 is often overexpressed in gastric carcinoma and contributes to multidrug resistance via upregulation of MDR1, MRP, Tap73, ZEB1, and ZEB2, while simultaneously downregulating expression of GAX138. Coupled with the previous data, downregulation of E2F1 by exogenous short hairpin RNA (shRNA) supplementation has been shown to abrogate multidrug resistance, suggesting that E2F1 could be clinically targeted139. In identifying E2F expression patterns across cancer types, TCGA has proven a valuable repository for identification of clinical targets124,140,141, although, some inconsistencies have arisen depending on the tissue of origin, with even more differences by cancer subtype, adding complexity to the role of E2Fs in carcinogenesis. From TCGA data, it was found that overexpression of E2F3 in NSCLC was correlated with better 34 prognosis, but in colon cancer overexpression of E2F3 was associated with worse overall survival and other clinical outcomes140,142. These inconsistencies by cancer type exemplify the need for specific targeted therapies for E2Fs individualized to each patient. At the time of this article’s publication, almost 5,900 papers mentioning E2Fs in the title have been published, according to PubMed, since their inception with increasing frequency in recent years, indicative of their increasing relevance. Here we focus on various aspects of E2Fs, from their basic structure, interactors, regulation, established canonical functions, nascently identified noncanonical functions, and especially detail how E2Fs contribute to cancer and how we can therapeutically target E2Fs in the clinic. E2Fs can be separated into two main groups: transcriptional activators and repressors. E2F1, E2F2, and E2F3a are activators, while E2F3b, E2F4, E2F5, E2F6, E2F7a, E2F7b, and E2F8 are transcriptional repressors, although they can be categorized into subgroups further. Despite being a truncated isoform of E2F3a, E2F3b exhibits canonical repressor activity. Structurally, all E2Fs share a highly conserved DNA binding domain (DBD). E2F1–E2F6 also share a highly conserved dimerization domain, with heterodimerization occurring with members of the DP protein family124,143. DP proteins are obligate partners to E2Fs, facilitating binding to DNA at E2F consensus regions. In humans there are three dimerization partner proteins, DP1, DP2, and the more recently identified DP3. DP3 has been found to heterodimerize with E2Fs and exerts a transcriptional repressive effect to impair cell cycle progression but is unable to bind to E2F consensus sequences144. The E2F1–6-DP heterodimers can bind directly to DNA and exert their respective functions, whether related to activation or repression145. However, many other mechanisms for E2Fs exist beyond directly binding to DNA, with 35 E2Fs playing integral roles in transcription factor complexes that will be discussed later in this review. E2F1–E2F3a/b contain N-terminal nuclear localization sequences and are the only E2Fs to possess cyclin A regulatory domains, while E2F4 and E2F5 contain nuclear export sequences contained within their respective DNA binding domains. E2F1– E2F5 contain C-terminal binding sites for pocket proteins, also frequently called RB proteins, which in humans are pRB, p107, and p130 where the pocket domain of RB proteins binds to the transactivation domain between helical subdomains on the E2F C- terminal region, thereby preventing any transcriptional activator activity from E2Fs146. Both highly conserved A and B pocket domains of the RB proteins are required for repressive activity, although the spacer region separating the two domains appears functionally unnecessary. Strangely, the two pocket domains of RB proteins can be expressed on entirely different proteins and still exert their repressive activity147. E2F1–3 only bind to pRB, while E2F4, the most abundant E2F in the cell, can bind to any pocket protein but preferentially associates with different pocket proteins depending on the stage of the cell cycle. Despite being a similar repressor, E2F5 can only associate with p130124,143,148. E2F6 contains a partial transactivation domain for regulation by polycomb group (PcG) proteins but cannot bind to pocket proteins149. PcG proteins are known especially for their transcriptional repression of homeobox genes for proper body segmentation150, but have also been observed complexed with E2F6 at both consensus and non-consensus E2F target promoter sites151. E2F7a, E2F7b, and E2F8 lack all aforementioned domains except for DNA binding domains and are considered to be atypical repressor E2Fs for their duplication of DNA binding domains and ability to bind to DNA without forming heterodimers with DP family proteins145,152–154. 36 The first five E2F proteins have pocket protein binding domains. When not bound by pocket proteins, these E2Fs are free to bind to DNA and exert their transactivation functions typically by binding to promoter sequences. For activation of genes, activator E2Fs complex with histone acetyl transferases such as p300/CBP, Tip60 and PCAF/GCN5 to remodel chromatin by acetylating lysine residues on the H3 and H4 subunits of the histone octamer, thus inhibiting higher order chromatin structure and relaxing coiled DNA. This relaxation ensures that the DNA is more amenable to transcription145,155,156. In a similar fashion, other E2Fs, mainly E2F4 and the pocket proteins p107 and p130, bind to promoters of genes to repress transcription through various mechanisms. These mechanisms can prevent association of transactivation domains and subsequent activation of gene expression, promote recruitment of histone deacetylases to remodel chromatin structure and make it more inaccessible for transcriptional machinery, and recruitment of chromatin remodeling proteins such as the transcriptional coregulator Brahma-related gene-1 (BRG), a subunit of the SWItch/Sucrose Non-Fermentable (SWI/SNF) complex, also called the BRG/Brahma associated factor complex in mammals145,155,157–159. E2Fs are canonically regulated by the phosphorylation of RB proteins by CDKs. E2Fs—excluding E2F6, E2F7, and E2F8—contain a C-terminal transactivation domain which contains a site for RB proteins to bind and thus repress E2F activity. RB proteins exhibit their repressive effects when bound to E2Fs, but when mitogenic signals accumulate within the cell, CDKs become increasingly active, hyperphosphorylate RB, and cause the release of E2Fs125,160. Specific residues for CDK phosphorylation include serine and threonine residues that lie within the C-terminal domain of RB proteins. 37 Phosphorylation at S788 and S795 directly disrupts RB core binding to the E2F1-DP1 heterodimer by 22 fold compared to unphosphorylated RB, while phosphorylation at T821 and T826 induce interactions between the RB core and the RB pocket domain, indirectly destabilizing the interaction between pRB and the E2F-DP heterodimer. CDK4/6-Cyclin D is primarily responsible for the phosphorylation of S788 and S795, with these same CDK complexes in addition to CDK2-Cyclin E/A phosphorylating T821 and T826, giving cell cycle specificity to RB-E2F-DP interactions and transcriptional repression148. Specific CDK levels and activity oscillate with the cell cycle and each CDK has specific targets to maintain temporal control over the cell cycle. The kinase subunits of CDKs are present in roughly constant amounts throughout the cell cycle; the oscillation of specific CDKs occurs from degradation or inhibition of the cyclin subunits throughout the cell cycle phases. The cyclin E-CDK2 complex is most prevalent in the G1 phase, with cyclin E degradation or inhibition occurring during the transition to S phase. Concurrently, cyclin A production occurs, where it associates with CDK1 and CDK2, lasting from S phase to the end of the G2 phase. The transition to M phase is marked by rapid degradation/inhibition of cyclin A, with cyclin B production ramping up quickly and associating with CDK1. After M phase, cyclin B is degraded, with the cycle to start anew in G1 phase with the production of cyclin E. However, this is the classical model of CDK activity in the cell and conflicting information has since been found161. The complexity, nuances, and caveats of CDK signaling are beyond the scope of this review, and thus will not be discussed. For E2Fs, CDKs are important insofar as their ability or lack thereof to phosphorylate RB proteins. The high conservation of the C-terminal domain containing phosphorylatable residues at certain positions in pRB and related pocket proteins across 38 species is indicative of CDK importance in RB regulation, which in turn affects E2F activity. As was mentioned previously, the C-terminal domains of human p107 and p130 contain IE regions that act as degrons for regulation of transcriptional potency and a functional IE region exists on pRB when complexed with DP proteins162. These are sites for phosphorylation by CDKs which ultimately act to target the pocket proteins for degradation by the proteasome. While the C-terminal domain of RB proteins are what we typically associate with phosphorylation, the N-terminal domain also possesses residues that can be phosphorylated by CDKs and seemingly confers the opposite effect of C- terminal residue phosphorylation. It has been found that under cellular stress conditions, including oxidative stress, DNA damage, osmotic stress, and more, stress activated protein kinases (SAPKs) become activated to regulate cell cycle checkpoint genes. It was found in MEFs that p38-MAPK/SAPK becomes activated under previously mentioned stress conditions and phosphorylates the N-terminal domain of pRB, increasing its potency as a transcriptional repressor. Interestingly, phosphorylation on the N-terminal domain is dominant to any effects of phosphorylation on the C-terminal domain by CDKs; p38 N-terminal phosphorylated pRB was found to be resistant to inactivation by CDKs. As a potential clinical application of this knowledge, a phosphomimetic pRB mutant that mimics p38 phosphorylation of the N-terminal domain was found to drastically reduce MCF7 breast cancer cells’ ability to proliferate163. This is a promising approach to overcome the limitations of CDK inhibition in cancer cells for therapeutic effect. The canonical and historical context of E2Fs labeled them as regulators of the cell cycle and cell differentiation. A role for E2Fs in apoptosis was established shortly thereafter their discovery and link to cellular proliferation, expanding the group of 39 canonical functions for E2Fs164. As more E2Fs became known, and their functions expanded to repressive roles while still regulating cell cycle and proliferation, it seemed E2Fs had established and highly conserved, albeit limited, roles in the cell. Activator E2Fs became grouped together for their ability to induce quiescent cells into S phase and to prevent cell cycle arrest mediated by cyclin dependent kinase inhibitors164–168. At this point, the literature began to confine the E2F members into well-defined groups: the transcriptional activators, the passive repressors, and the later identified active repressors. However, the E2Fs prove to be more dynamic, with these definitions failing under certain contexts124,169. Unlike Drosophila, which have a strict dichotomy with dE2F1 being the sole transcriptional activator and dE2F2 being the sole transcriptional repressor, it has been noted that five of the human E2F members have been observed serving both activator and repressor roles145. While it was initially thought that activator E2Fs were necessary for cellular proliferation, with in vitro data of MEFs supporting this hypothesis, redundancy was noted among the E2F activators170. However, others have shown individual or even combination knockouts of various E2Fs did not significantly affect normal widespread cellular proliferation, only specific lineages of differentiated cells were affected124,171–173. Binding and interplay between E2Fs in cellular pathways is complex and not easily tractable, as promoters bound by activator E2Fs in G1 phase of the cell cycle could be bound by repressive E2Fs at a different stage in the cell cycle. Moreover, the activator E2F could recruit other factors to confer transient repressive activity, but perhaps only in a certain tissue type with factors such as degree of chromosome compaction or presence of mitogenic signals conferring different effects. 40 Focusing on E2F5 specifically, this protein is constitutively expressed at all stages of the cell cycle, although its expression can be tuned depending on the stage of differentiation or development. E2F5 was found to be highly expressed in actively proliferating, undifferentiated neuronal precursor cells and in the suprabasal layer of epithelial cells in murine gestational development once stratification from precursor cells emerges174. E2F5 complexes predominantly with p130 but is capable of binding to p107127. E2F5 is known to associate with the p130 pocket protein when moving into the nucleus and is not complexed to a pocket protein when its NES facilitates translocation to the cytoplasm via CRM1175. As mentioned previously, knockout of E2F5 combined with E2F4 knockout results in neonatal lethality176, stemming from an inability to arrest cycling cells in G1 leading to aberrant cell proliferation that is ultimately deleterious to the organism. Overlapping functions with E2F4 are not surprising given their similarity in functional motifs and overall high sequence identity at 74.44% for 70% query cover (via BLAST of Homo sapiens E2F4 vs E2F5). E2F5 is also a noted subunit of the MuvB and Dimerization Partner, RB-Like, E2F and Multi-Vulval Class B (DREAM) complex where it exhibits transcriptionally repressive activities177,178, as is the case with E2F4. Similar to E2F4, E2F5 is involved in the formation of repressive SMAD-p107-E2F complexes for inhibition of c-Myc expression179,180. E2F5 is not known to bind pRB with high affinity, in contrast to E2F1, E2F2, and E2F3. However, despite sharing many cell functions with E2F4, E2F5 does confer some specific functions. Early evidence for a specific role of E2F5 comes from E2F5 null mice, where embryonic development occurs normally in mice, but soon after birth the pups develop hydrocephalus, significantly decreasing lifespan181. This hydrocephalus is caused by increased production of cerebrospinal fluid 41 by the choroid plexus, already a major producer of cerebrospinal fluid. Expression profiles of E2Fs in embryonic tissues and E2F5 knockouts of embryo fibroblasts in vitro revealed that E2F5 was not essential and does not greatly affect cell proliferation, but contributes to differentiation and development of tissues. While highly similar to the properties of E2F4, the expression profiles of E2F4 in these embryonic tissues shows constitutive levels of E2F4 that cannot compensate for the loss of E2F5. This indicates that either E2F5 plays a specific role in certain tissue development and differentiation, or that E2F4 and E2F5 contribute to development and differentiation through dissimilar mechanisms181. E2F5, like all E2Fs before, has numerous exceptions to its main role as a transcriptional repressor. Overexpression of E2F5 in primary baby rat kidney cells has been found to promote morphological transformation. Together with a screen of 442 breast tumor samples, increased copy number for E2F5 was observed in the sample set as well as increased E2F5 transcript expression. Using functional assays to support their hypothesis, the authors proposed that E2F5 cooperates with oncogenes such as Myc182. Overexpression of E2F5 has also been observed to be common in hepatocellular carcinoma with ablation of E2F5 reducing proliferation183. MYCN driven neuroblastoma was found to positively regulate E2F5. E2F5 knockdown in this model was observed to inhibit cell cycle progression and cell proliferation, presenting E2F5 as a cooperating oncogenic factor in MYCN driven neuroblastoma184. However, other studies have implicated lower expression of E2F5 in chemoresistance in NSCLC. In NSCLS a mutation in p53 (p53R175H) induced higher expression of miRNA-128-2, which post- transcriptionally targets and represses E2F5 translation. With E2F5 no longer able to 42 repress p21waf1, this allowed p21waf1 to localize to the cytoplasm where it prevented pro-caspase-3 cleavage, conferring an antiapoptotic effect condoning tumor proliferation and resistance to apoptosis185. Other groups have also linked increased repression of E2F5 to oncogenic activity through post-transcriptional mechanisms by miRNAs186,187. Conversely, other studies found that miRNAs targeting E2F5 act to suppress glioblastoma growth188. When these disparate studies are considered together, both reduced expression and overexpression of E2F5 can contribute to oncogenic activity depending on the context, but the nuances of how this occurs is not well understood. Indeed, many functions attributed to E2F5 are listed in combination with E2F4, as their roles appear to overlap in certain contexts. There is a clear need to address the dearth of information available on E2F5 relative to the other E2Fs. In the context of mammary gland development and breast cancer, E2Fs 1-4 have all been characterized at some level. E2F1 has been found to regulate breast cancer metastasis in GEMMs through multiple different pathways and has a canonical role of progressing the cell cycle, thereby increasing proliferation115,119. Despite its status as an activator E2F, loss of E2F2 in MMTV-Myc GEMMs results in increased metastasis189 and predicts worse clinical outcomes in humans with low E2F2 activity190. E2F3 is commonly overexpressed in breast cancer and leads to aneuploidy and increase EMT191. High E2F4 expression is highly correlated with worse overall survival in humans192. E2F1, E2F3, and E2F4 KO mice all showed delayed mammary gland development and branching compared to WT mice193. We know the effects of perturbing E2Fs 1-4 in the context of mammary gland development, but we know little about the role of E2F5 in mammary 43 gland development. Herein we sought to examine the direct mechanistic roles of E2F5 on mammary gland development by E2F5 KO. 2.4 Results 2.4.1 Conditional knockout of E2F5 delays mammary gland development and leads to latent highly metastatic tumor growth Publicly available scRNAseq datasets for murine mammary epithelial cells (MECs) at nulliparous, pregnant, lactating, and involution stages of mammary gland development showed high differential expression of E2F5 between stages, in contrast to E2F1 for comparison194. Similar differential expression of E2F5 between mammary gland cell types was seen in the single-cell mouse cell atlas195. Given the observation of E2F5 expression changing drastically between different developmental stages and cell types within the mammary gland, coupled with the previous lack of a mouse model to study E2F5, our research group created the MMTV-Cre cKO E2F5 strain of mice on the Friend leukemia virus B (FVB) background to study the effects of E2F5 depletion in the murine mammary gland. It is known that full body KO of E2F5 leads to hydrocephalus in adolescent mice181, so we opted for the mammary specific MMTV-Cre GEMM to avoid this complication. Our group engineered flanking loxP (flox) sites encompassing exons 2 and 3 of E2F5, containing a critical portion of the E2F5 DBD for functional activity (Figure 2.4.1A). Upon entering puberty, the hormonally activated MMTV LTR promoter/enhancer expresses high amounts of Cre recombinase, excising the floxed DNA, and rendering a complete functional KO of E2F5 in the murine mammary gland. Our group observed a statistically significant decrease in mammary ductal outgrowth in the MMTV-Cre E2F5 cKO group compared to MMTV-Cre control (Figure 2.4.1B). Unexpectedly, cKO E2F5 mice 44 developed highly penetrant mammary tumors after a long latency period while MMTV- Cre mice did not. Mammary tumor onset was earlier in E2F5 cKO multiparous mice than nulliparous mice (Figure 2.4.1C)196, potentially due to extensive proliferation and apoptosis of alveolar cells in the mammary glands of multiparous mice during lactation and involution, adding additional mitotic stress and DNA damage to these cells. Given the long latency period before tumor onset and the unexpected nature that KO of E2F5 would result in mammary tumors, I hypothesized that E2F5 depletion started a cascade of multiple additional genetic alterations that must have occurred before tumor initiation. 2.4.2 Confirmation of E2F5 knockout by whole genome sequencing I utilized WGS for an unbiased approach to comprehensively examine 5 E2F5 cKO tumors of diverse histology and pathology. While odds were low, there was a possibility nonetheless that mammary tumor formation in the E2F5 cKO mice arose from factors outside the purview of E2F5 cKO, with some tumors potentially containing functional E2F5. Coupled with establishing tumor causality, examining the variant allele frequency (VAF) of normal E2F5 present within the WGS data can be used to establish how large the normal tissue or immune cell component of each tumor is. To validate E2F5 KO in these tumors via WGS data, I examined the aligned bam file of each tumor using Integrative Genomics Viewer (IGV)197 to show KO of exons 2 and 3 of E2F5 for individual 150 base pair-reads. Indeed, paired read signatures within IGV suggest successful deletion of E2F5 exons 2 and 3 in all 5 tumors examined (Figure 2.4.2A). However, WGS paired reads also suggest a fair amount of normal tissue contamination, especially in samples 3790 and 3790_2. With confirmation of complete E2F5 KO in all 5 tumors at the 45 WGS level, we can examine downstream genetic events with certainty that these events are the ultimate result of E2F5 depletion. 2.4.3 Heterogeneous copy number alterations and mutational profiles across tumors Significant heterogeneity was present within all 5 E2F5 cKO tumors examined. WGS data included copy number variations (CNVs), single nucleotide variants (SNVs) and translocations/inversions. Circos198 plots displaying all genetic events for tumors 3790 (Figure 2.4.3A), 3790_2 (Figure 2.4.3B), 3976 (Figure 2.4.3C), 5492 (Figure 2.4.3D), and 5691 (Figure 2.4.3E) exemplify the intertumoral heterogeneity. The outermost ring of each Circos plot is an ideogram representing each mouse chromosome proportional in size to its number of base pairs within the genome. The next ring in represents SNVs, where blocks represent mutations predicted by Mutect2199 to be of low, moderate, or high impact. Low impact mutations can include synonymous variants, where a nucleotide change would result in the same amino acid incorporation during translation. Moderate impact mutations are often missense mutations where a single amino acid change occurs. High impact mutations are rarer but vary widely within the tumors—high impact mutations within tumor 5492 alone include a truncating mutation in Trp53, loss of a stop codon in Vmn2r89, and a splice acceptor/intron variant mutation in Fbxo15. The next ring in the Circos plots indicates copy number gains with red bars and copy number losses with blue bars, with size proportional to the read evidence support for each event. The innermost ring indicates both translocations and inversions. While heterogeneity is clearly present across all tumors, I sought to examine overlapping events between tumors, as independently arising genetic alterations suggest identical selective pressures 46 and convergence on specific oncogenic pathways. To accomplish this, I created an OncoPrint200 that details commonly mutated, amplified, or deleted genes across all tumors (Figure 2.4.3F). Commonly altered genes that have previously been implicated in human cancers include Fbxo15201 (5/5 tumors), Plec202 (4/5), Tshz1203 (4/5), Itgb1204 (4/5) Arid1a205 (3/5), Kmt2c206. It was noted that a majority of the mutations in all tumors were C>T missense mutations. To quantitatively examine the origin of the mutational mechanisms within these tumors, I utilized deconstructSigs207 to predict the etiology of mutations within these tumors. Our observations were quantitatively confirmed to show that the major mutational mechanism within these tumors was from aging (Figure 2.4.3G), which spontaneously deaminates 5-methylcytosine leading to the formation of thymine. The second most prevalent mutation source was identified as homologous recombination repair (HRR) deficiency. Mutational signatures from E2F5 cKO tumors are more similar to MMTV-Neu than to MMTV-PyMT tumors. 2.4.4 All tumors possess mutations causing a novel splice acceptor site in Fbxo15 Given the ubiquity of the Fbxo15 splice acceptor mutation among all 5 tumors examined, I wanted to examine the potential relevance of this mutation in human cancers. All 5 MMTV-Cre E2F5 cKO tumors displayed G>T splice acceptor mutations on position chr18:84,977,366 (GRCm38) in Fbxo15 to varying degrees of penetrance (Figure 2.4.4A). Most tumors had only heterozygous somatic mutation of this region, while tumor 3976 had 100% penetrance with no WT allele remaining. This mutation consequently disrupts the AG-3’ site of an intron, which is essential for recognition and subsequent splicing out of introns during mRNA processing208,209. While the significance of this splice acceptor site mutation is not known, it may result in the inclusion of intron material during 47 translation and lead to a frameshift of the coding sequence, leading to a nonfunctional Fbxo15 protein. This may be consequential as Fbxo15—an F-box protein and critical substrate recognition component of the Skp, Cullin, F-box containing complex E3 ubiquitin ligase—is likely playing a tumor suppressive role in breast cancer201. Other F- box proteins like FBXW7 have known roles for substrate recognition of oncoproteins like Notch and c-Myc for ubiquitination and subsequent degradation by the proteasome210. To examine the potential human relevance of this mutation in Fbxo15, I used Clustal Omega211 multiple sequence alignment to align the entirety of mouse Fbxo15 DNA (Figure 2.4.4B) to human FBXO15 DNA. I found the homologous region of human FBXO15 to be chr18:74,082,052 (GRCh38) (Figure 2.4.4C), also an AG-3’ intronic splice acceptor site right before amino acid 380. While all mutations for FBXO15 are of unknown significance in terms of effect on FBXO15 function, when doing a pan-cancer analysis212– 214, it was found that the same splice acceptor site mutation has been recorded in human cancers (Figure 2.4.4D). While the independent coalescence of the same somatic G>T mutation at the same position in Fbxo15 among all 5 E2F5 cKO tumors examined would suggest the same strong selective pressure to ablate Fbxo15 function in all samples, without further validation the significance of this mutation cannot be known for certain. However, copy number data from the E2F5 cKO tumors identified independent amplification of oncogenes where significance is apparent. 2.4.5 Independent convergence on inversions and amplifications of Cav1, Cav2, Wnt2, and Met I identified heterogeneous copy number alterations over all tumors (Figure 2.4.5A). Called amplifications or deletions of genes had a minimum threshold absolute 48 copy number change of +/- 1. A region of strong amplification was identified on chromosome 6 between base pairs 15,000,000 and 20,000,000 (Figure 2.4.5B). This region is amplified in 3 out of the 5 tumors examined. Genes within this region that have known tumor suppressive or oncogenic roles are Cav1215,216, Cav2217, Met218, and Wnt2219. In tumor 3976, all genes within this region had an additional 92 copies of each gene, while in tumor 5691 there were 35 additional copies of each gene. Tumor 3790_2 had only 1 additional copy of each gene within this region. A possible factor contributing to or stemming from this amplified region in these tumors is an inversion event present in a majority of these 3 tumors that shares a boundary with the amplification event (Figure 2.4.5C). The degree to which these genes are amplified in these tumors suggests a strong selective pressure to amplify this region. The borders of the amplification event with key genes highlighted convey this well (Figure 2.4.5D), with the borders of the amplification event being slightly different between tumors, but encompassing the known oncogenes within this region. It is of note that tumor 3976 had such a high degree of amplification over this region and also had the most complete excision signal for E2F5 (Figure 2.4.2A). There is clearly convergence on certain genetic events following E2F5 KO. 2.5 Discussion E2F5 is canonically known as a tumor suppressor176,220, and results described herein further support that role in the context of mammary gland development. We found only mild perturbances in mammary gland ductal outgrowth and development, but highly penetrant metastatic mammary tumors after long latency periods of 18-21 months on average. Given the latency in tumor development, I hypothesized additional genetic alterations would have accrued over time for tumor development to occur. I ultimately 49 found a variety of heterogeneous alterations between the 5 tumors examined, such as the Kras activating mutation and Trp53 truncating mutation in tumor 5492. More interesting are the shared events that occurred independently. Independent convergence on the same genetic alterations would suggest a high degree of similarity in selective pressures within each tumor’s microenvironment and dependence on the same oncogenic pathways. Upon examination of WGS data for 5 E2F5 cKO tumors, we indeed found convergence upon similar genetic alterations between tumors. The overarching mutational mechanisms for each tumor result from aging, which results from the spontaneous or enzymatic deamination of 5-methylcytosine, creating thymine residues in their stead. These results are corroborated by the average tumor latency period in these mice approaching two years. Minor mutational signatures include HRR and MMR deficiency, which are difficult to validate with the current limited dataset. HRR deficiency is strongly associated with BRCA1/2 mutations or promoter methylation221. However, I found no mutations in either BRCA1 or BRCA2 in any of the tumors sequenced, and we lack epigenetic data on these tumors to confirm promoter hypermethylation of these genes. MMR deficiency is often associated with mutations in MLH1, MSH2, MSH6, and others222, but I did not find support for these mutations either. It is possible other genes in mice that have not been identified as playing roles in HRR or MMR are the culprits, but more likely that promoter methylation or post-transcriptional regulation mechanisms are at play, but their data has not been captured. A more robust integrative multi-omic approach would resolve these quandaries. 50 All 5 tumors converging on the same Fbxo15 splice acceptor putatively high impact mutation is indicative of a strong selective pressure at play that is not well understood. Fbxo15 is characterized mainly by its ~40 amino acid long F-box motif occurring from amino acids 77-117. While this mutation is of unknown clinical significance, we may be able to draw comparisons to other F-box proteins. Fbxw7 is another F-box protein that has a missense mutation at chr3:84,974,498 exclusively in tumor 5691. F-box proteins like Fbxo15 and Fbxw7 are a critical subunit in the ubiquitin protein ligase complex SKP1- cullin-F-box (SCF). F-box proteins are critical to proteasome-mediated degradation of various proteins within the cell, particularly oncoproteins223. FBXW7 has known pathogenic mutations that aid or abet tumorigenesis to occur in human cancers210,212,224. There are multiple hotspot missense mutations of arginine residues to histidine residues at amino acids 465, 479, and 505 in FBXW7 that are confirmed to be loss-of-function pathogenic in human cancers. The missense mutation in tumor 5691 at position chr3:84,974,498 also results in a residue change from arginine to histidine at amino acid 508 in mice, orthologous to amino acid 505 in humans. It is highly likely that there is a selective pressure driving loss of function of different F-box proteins in E2F5 cKO tumors. However, the link between loss of E2F5 and subsequent loss of function of F-box proteins is not well understood. Similarly, there is convergence on amplification of the region surrounding Cav1, Cav2, Met, and Wnt2 in most mouse tumors examined in this study. While only amplified in 1-4% in human cancers, there is ample evidence that a similar amplification event in humans, with copy number gains seemingly peaking over MET. Indeed, MET is the only oncogene of this group with confirmed oncogenic activating mutations in human 51 cancers212. WNT2 is a strong activator of canonical WNT/β-catenin signaling, ostensibly leading to increased malignant progression and anchorage independent survival of cancer cells225. CAV1 and CAV2 have conflicting roles in cancer progression, with evidence suggesting that they play tumor suppressive roles early on in cancer progression by limiting cell proliferation, preventing loss of cell polarity, and suppressing MAPK, PI3K-AKT, and WNT/ β-catenin signaling. However, after advanced cancer progression CAV1 and CAV2 may flip to promoting tumorigenesis as their upregulated expression is associated with increased metastasis, increased migration, and drug resistance216,226–228. It is unclear whether there is cooperation between multiple genes within this amplification event, or if it is the case of selective amplification of one gene in this region, likely Met or Wnt2, and other genes are merely passengers. Additional studies on the links between E2F5 and these consistent genetic alterations are needed to establish connections between them, potentially informing therapeutic options for human cancer patients in the future. 2.6 Materials and methods 2.6.1 scRNAseq E2F5 expression in mammary development stages To analyze E2F5 expression in a single cell RNA-seq mammary development dataset, the following website was used: https://marionilab.cruk.cam.ac.uk/mammaryGland/. To analyze E2F5 expression in various mammary cell types, the following website was used: https://bis.zju.edu.cn/MCA/. The following published microarray datasets were used for analysis: terminal end bud and duct (GSE2988) and mammary gland developmental stages (GSE12247). 52 2.6.2 Animal generation and housing All animal husbandry and animal use protocols followed local, national and institutional guidelines. Ethical approval for the study was approved by Michigan State University Animal Care & Use Committee (IACUC) under AUF 06/18-084-00. E2F5 cKO mice were generated by flanking exons 2 and 3 of the E2F5 gene with loxP sites. E2F5flox/flox mice were interbred with MMTV-Cre mice229. Mice were monitored weekly for tumor development. The endpoint for primary tumor was 2000 mm3. 2.6.3 WGS dataset generation and access Newly generated WGS data has been deposited at the NIH NLM under BioProject PRJNA887715. To generate WGS data, flash frozen tumor samples stored at -80 °C were ground using a sterilized mortar and pestle under liquid nitrogen until a homogeneous powder. DNA from ground tumor samples was immediately extracted using a Qiagen Genomic-tip 2-/G Kit according to the manufacturer’s guidelines for tissue weighing up to 15 mg. WGS was performed in the Research Technology Support Facility (RTSF) at Michigan State University and sequenced to a depth of 40x. TruSeq Nano DNA library preparation was performed by trained personnel at the RTSF. Instrumentation used was the Illumina HiSeq 2500 and generated paired end 150 base pair reads. 2.6.4 WGS data processing WGS sample reads were concatenated and quality checked using FastQC/0.11.7230. Sample reads had adapter ends trimmed off using Trimmomatic/0.38231 and again checked for quality using FastQC. Paired WGS reads were aligned to the GRCm38/mm10 reference genome using Burrows-Wheeler Aligner/0.7.17232. Read groups were added using Picard/2.22.1 from the Broad Institute 53 (http://broadinstitute.github.io/picard). Reads were then indexed and sorted using SAMtools/1.11233. Duplicate reads were removed using Picard. 2.6.5 WGS somatic genetic alteration calling Once WGS reads had been checked for quality, trimmed poor quality reads, aligned to the mm10 reference genome, duplicated reads removed, sorted, and indexed were the resulting bam files used for somatic genetic alteration identification. To reduce false positives, I ran our same pipeline on the Sanger Mouse Genomes Project (https://www.sanger.ac.uk/data/mouse-genomes-project/) FVB/NJ background release 1604 bam file. Each tumor bam file had SNVs called using the consensus of Mutect2 under GATK/4.0.5.1234 and VarScan/2.4.1235. Translocations, inversions, and CNVs were called using the consensus of Delly/0.7.8236 and Lumpy/0.2.13237 for each tumor sample. Annotations of all samples were done using SnpEff/4.1d238. Tabular data from tumor VCF files were sorted and processed using a custom Python script using Python 3.8.8, Pandas 1.2.4, and NumPy 1.20.1. Unaltered and script data can be accessed from the original publication196. 2.6.6 KM plot visualization The Kaplan-Meier (KM) plot of % tumor free mice by age in Figure 2.4.1C was generated by Briana To using data obtained by Briana To. The plotting software used was GraphPad Prism 9.0. 2.6.7 Circos plot visualizations Circos plots were generated using Circos/0.69-6198 using the somatic consensus of SNV, CNV, inversion, and translocation calls. 54 2.6.8 CNVkit processing and visualization CNVkit plots were generated on CNVKit/0.9.9239. The “cnvkit.py batch –method wgs” option was used to generate the copy number ratios and copy segments files for downstream visualization. The “cnvkit.py scatter” command was used to generate the scatter plots with all default setting maintained. Heatmaps were generated using the “cnvkit.py heatmap -d” options. All SNVs found in the FVB background were filtered out from the analysis. Only SNVs with a somatic p-value <= 0.05 were included in all downstream analyses as determined by VarScan235. CNVs and inversions found in the FVB/NJ background that are on the same chromosome and have a difference of less than 100 bps from tumor CNVs and inversions are excluded from analysis. For translocations, all events on either FVB background that are within 1 kb of either the donor or receiver position on each chromosome are dropped. These steps were accomplished using a custom Python script196. 2.6.9 Mutational signature generation and visualization The Catalogue of Somatic Mutations in Cancer (COSMIC) single base substitution (SBS) signatures for E2F5 cKO tumors were derived using the deconstructSigs R package207. The mouse mm10 reference assembly was used for trinucleotide context when calling SBS signatures. The resulting weights were plotted using matplotlib in Python. 2.6.10 Oncoprint visualization The oncoprint visualization in Figure 2.4.3F was generated using WGS calls for E2F5 cKO tumors in the ComplexHeatmap R package200. 55 2.6.11 Statistical considerations Except otherwise noted, all statistical comparisons are performed with an unpaired students two-tailed, unpaired t-test. 2.7 Acknowledgments Eran Andrechek provided minor revisions and overall story guidance for this chapter. 2.8 Author contributions Briana To and Eran Andrechek are responsible for the planning, acquisition of data, and figure generation for Figures 2.4.1B and 2.4.1C. Briana To is responsible for writing non-computational materials and methods. Carson Broeker is responsible for all other aspects of chapter 2 of this thesis, including hypothesis generation, experimental procedures, data acquisition, interpretation, figure generation, and writing. 56 CHAPTER 3: MMTV-MYC TUMOR HISTOLOGICAL HETEROGENEITY REFLECTS HETEROGENEITY IN HUMAN BREAST CANCER 57 3.1 Preface This chapter has previously been published as the following manuscript. Modifications to the text and figures have been made: Broeker, C.D., Ortiz, M.M.O., Murillo, M.S. et al. Integrative multi-omic sequencing reveals the MMTV-Myc mouse model mimics human breast cancer heterogeneity. Breast Cancer Res 25, 120 (2023). https://doi.org/10.1186/s13058-023-01723-3 This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 58 3.2 Abstract Breast cancer is a complex and heterogeneous disease with distinct subtypes and molecular profiles corresponding to different clinical outcomes. Mouse models of breast cancer are widely used, but their relevance in capturing the heterogeneity of human disease is unclear. Previous studies have shown the heterogeneity at the gene expression level for the MMTV-Myc model but have only speculated on the underlying genetics. Herein, I examined three common histological subtypes of the MMTV-Myc model through whole genome sequencing and have integrated these results with gene expression data. Significantly, key genomic alterations driving cell signaling pathways were well conserved within histological subtypes. Genomic changes included frequent, co-occurring mutations in proto-oncogene c-KIT (Kit) and retinoic acid receptor-α (Rara) in the microacinar histological subtype as well as Scrib mutations in the EMT subtype. EMT tumors additionally displayed strong Kras activation signatures downstream of genetic activating events primarily ascribed to Kras activating mutations, but also Fgfr2 amplification. Orthologous genetic events in human breast cancer showed stark decreases in overall survival. In further analyzing transcriptional heterogeneity of the MMTV-Myc model, I report a supervised machine learning model that classifies MMTV- Myc histological subtypes and other mouse models as being preferentially representative of different human intrinsic breast cancer subtypes. I conclude that the well-established MMTV-Myc mouse model presents further opportunities for investigation of human breast cancer heterogeneity. 59 3.3 Introduction MYC is a master transcriptional regulator that is amplified in approximately 15– 20% of breast cancers and overexpressed in up to 35% of breast cancers240. All MYC family genes (c-MYC, N-MYC, L-MYC, B-MYC) contain basic helix-loop-helix domains, which allow for heterodimerization with MYC-associated factor X to bind consensus E- box sequence motifs on core gene promoter regions241; this allows for the initiation of transcription of MYC responsive genes, including regulation of proliferation, cell growth, differentiation, nucleotide biosynthesis, DNA replication, RNA levels, and apoptosis242,243. MYC amplification is often enriched in high-grade breast cancers244,245, triple-negative breast cancers246, and basal-like breast cancers247. This is supported by evidence showing that MYC occupies the promoters of most active genes in tumorigenesis and that MYC acts as a non-specific amplifier of whole cell gene expression248. TNBC and basal-like breast cancer subtypes carry poor prognostic clinical outcomes relative to other breast cancer subtypes249. In mouse models, several groups have demonstrated that overexpression of MYC alone is sufficient for the development of spontaneous mammary tumors with high penetrance107,250,251. There is a clear, concerted interest to develop mouse models of breast cancer that recapitulate the heterogeneous features of human breast cancer to better understand disease dynamics. In response, numerous mouse models of breast cancer have been created, characterized by a wide variety of genetic drivers including constitutive overexpression of endogenous oncogenes specifically in the mouse mammary gland (MMTV-Neu, MMTV-Cyclin D1, MMTV-Akt1)252, conditional overexpression of oncogenes in the mammary gland (MMTV-rtTA/TetO-NeuNT253, MTB/TWNT254), 60 nonfunctional or conditional loss of tumor-suppressor genes (Stat1−/−, MMTV- Cre/BRCA1fl/fl)252, or overexpression of exogenous transforming oncoproteins (MMTV- PyMT)112. Various mouse models of breast cancer have had their gene expression profiles characterized255,256, with most models clustering largely within one human intrinsic breast cancer subtype. While many mouse models have been generated that model a specific aspect of human breast cancer biology, few realize the full spectrum of heterogeneity present within human breast tumors or accurately model clinical features257. Previous studies revealed the MMTV-Myc model of breast cancer mimics many human disease parameters, including substantial histological and transcriptional heterogeneity250. These analyses revealed a close association between the transcriptional profile and histological subtype among MMTV-Myc tumors. Other studies have also shown that tumors derived from different mouse models share the same histological patterns cluster together at the gene expression level better than they do with tumors of the same genotype but with different histological features256. Importantly, MMTV-Myc samples have one of the most varied gene expression patterns amongst models, with distinct subsets of MMTV-Myc samples clustering closely with all intrinsic human breast cancer subtypes256. While the contributions of the tumor microenvironment, epigenetics, tumor metabolome, immune composition, and other factors are imported and have been highlighted recently for their roles in cancer258–260, all cancers at their core are driven by genetic aberrations by activation of oncogenes and inactivation of tumor-suppressor genes. The advent of anti-estrogenic compounds such as tamoxifen in the treatment of hormone-receptor positive breast cancers261 and the monoclonal antibody trastuzumab 61 to treat HER2+ breast cancer262 have demonstrated the utility in targeting breast cancer with therapies based on genomic, transcriptomic, and immunohistochemical data. In realization of the complex molecular aberrations driving all cancers, a concerted effort in the form of TCGA was created and serves as a critical repository of integrated molecular cancer data for researchers263. Despite the clear relevance of genetic aberrations in human breast cancer progression and the wide use of mouse models to study human breast cancer in vivo, relatively few mouse models of breast cancer have been characterized at the genome level, barring MMTV-PyMT118,120, MMTV-Neu118, and NRL- PRL121 models. Previous WGS of GEMMs has revealed important human disease parallels, as sequencing of the MMTV-Neu mouse model revealing a conserved interspecies coamplification event that exists in 25% of human HER2+ breast cancers and 9% of all breast cancers. Preclinical functional studies showed the presence of this coamplification event increases migration in vitro, increases metastasis in vivo, and reduces distant metastasis free survival in humans118. In the MMTV-PyMT model, it was found that mutations in the phosphatase PTPRH led to increased EGFR phosphorylation and EGFR signaling118, with clustered regularly interspaced short palindromic repeats (CRISPR) ablation and rescue experiments demonstrating that PTPRH was normally dephosphorylating EGFR264. In each model, genomic alterations drove key aspects of cellular signaling, altering tumor biology, with key similarities to human cancer. These studies underscore the importance of characterizing genomic alterations in mouse models of cancer and the utility of integrating gene expression and DNA sequencing of GEMMs to find clinical parallels in human cancers. 62 Herein, I report our findings of interrogating the genomic and transcriptional heterogeneity within the MMTV-Myc mouse model. I performed WGS on three common, distinct histological subtypes that arose in this model: microacinar, squamous, and EMT. I hypothesized genomic heterogeneity would be present across subtypes but would be conserved within each histological subtype. Importantly, each tumor used for WGS has matched gene expression data. Integration of both genomic and transcriptomic data enabled the identification of active signaling pathways exploited in each histological subtype and their respective genetic origins. From this, targeted in vivo preclinical studies can be designed for the highest potential clinical translation into humans. 3.4 Results 3.4.1 Heterogeneous and conserved genomic alterations determined by histological subtype Based on the diverse transcriptional and histological subtypes observed in the MMTV-Myc tumors250, I hypothesized these phenotypes were due to a divergence in genomic changes conserved between each histological subtype. To ascertain putative conserved genomic changes within each histological subtype, short read WGS was performed on randomly selected tumors of microacinar, squamous, and EMT histological subtypes, later integrated with gene expression data obtained from matched tumor samples. A great deal of genomic heterogeneity was observed between the histological subtypes as shown by representative Circos plots for the microacinar (Figure 3.4.1A), squamous (Figure 3.4.1B), and EMT (Figure 3.4.1C) tumors. Few inversions and translocations were found across all tumors, consistent with rates of large structural 63 rearrangements in human breast cancer265. Most differences in genetic aberrations between subtypes were confined to SNVs and CNVs. Strikingly, all microacinar tumors sequenced shared the same whole chromosome amplification events on chromosomes 15 and 11 as revealed by copy number segmentation calls (Figure 3.4.1D). Estimated total ploidy gain for chromosome 15 is 2, for a total of 4 gene copies, and ploidy gain of 1 for chromosome 11, for a total of 3 gene copies. The predicted integer copy number gains were consistent across all microacinar tumors. While many whole chromosome amplifications and deletions were observed in EMT tumors, none were consistent across the histological subtype. Interestingly, there were very few focal and whole chromosome CNVs in individual squamous tumors and none shared across the subtype. The majority of CNVs across all tumors sequenced were broad and affected the entire chromosome rather than a discrete region within the chromosome, with a notable exception discussed in a later figure. This is consistent with human breast cancer CNVs, which often affect a whole chromosome arm on either side of the centromere266 except in the cases of strong selective pressure over a particular region, such as the cases of ERBB2 in breast cancer267 or AR in prostate cancer268. It is important to note that mouse chromosomes in their entirety are telocentric or acrocentric, with vanishingly small p arms, so the q arms of these chromosomes are de facto the entire functional chromosome. This contrasts with human chromosomes, which span metacentric, submetacentric, and acrocentric. Thus, mouse genetic aberrations affecting the entire chromosome do not show deletion in one arm and amplification in another chromosome arm that often occurs in human breast cancer51. Differential gains and losses of chromosome arms in cancer are largely the result of isochromosome formation, 64 where instead of dividing longitudinally, the centromere of a chromosome divides transversely during mitosis, producing one daughter cell with two long arms and one daughter cell with two short arms of the chromosome269,270. 3.4.2 Gene copy number correlates with gene expression It is well established that CNVs typically correlate strongly with changes in gene expression across cancer types271 and have been used to target cancer dependencies for therapeutic intervention272. Despite this, I sought to establish correlation between gene expression and CNVs in the MMTV-Myc mouse model tumors empirically within our own dataset. When correlating the estimated absolute integer copy number gain or loss for each gene with their matched gene expression sample and extending these results to 33 additional samples profiled by microarray, I obtained high Pearson’s r and Kendall’s τ correlation coefficients broadly across the genome for CNV sites (Figure 3.4.2A). Of note, the highest and most consistent correlation for both Pearson’s r and Kendall’s τ occurs on chromosomes 15 and 11, suggesting the highly conserved ploidy gains across microacinar samples translates well into increased gene expression. After establishing the link between CNVs and gene expression in the histological subtypes, I wanted to examine whether gene expression differences localized to human defined cytogenetic bands could stratify the histological subtypes. Indeed, when performing single sample GSEA (ssGSEA) on gene expression data using the MSigDB C1 positional gene set, I find that unsupervised hierarchical clustering of these data post normalization largely clusters histological subtypes separately (Figure 3.4.2B), recapitulating clustering from raw gene expression data250. These data suggest CNVs or other events confined to specific genomic regions are largely responsible for gene 65 expression differences between histological subtypes. Principal component analysis (PCA) on these same C1 ssGSEA values reveals a similar clustering pattern among subtypes (Figure 3.4.2C). Principal component 1 was able to explain over 25% of the variance in this population, which largely separated the EMT and microacinar subtypes, with squamous tumors infiltrating both clusters or being separated mostly by principal component 2. Subsequent principal components were responsible for low single digit percentages in explanation of variance or less (data not shown). 3.4.3 Mutational signatures vary dramatically between mouse models of breast cancer, but not within MMTV-Myc histological subtypes Having established CNVs as being a critical factor in determining gene expression in MMTV-Myc tumors, I sought to investigate whether differing mutations within these tumors played a role in gene expression changes. After limiting gene expression data to genes predicted to have moderate or high impact mutations as determined by Mutect2, I did not see a statistically significant difference in the proportion of genes that were differentially regulated compared to the overall proportion of all genes that are differentially regulated between subtypes (data not shown). This is unsurprising as mutations typically do not affect self-gene expression, except in the case of truncating mutations273. Next, I examined wholistic mutation patterns between each histological subtype and compared them to mutation patterns in previously sequenced MMTV-Neu and MMTV-PyMT tumors118. I found no large differences in overall mutational burden between MMTV-Myc subtypes, although I found trends between mouse models (Figure 3.4.3A). Investigating these trends, I find mutational burden did not diverge significantly between 66 MMTV-Myc subtypes but varied considerably in the MMTV-Neu mouse model. The MMTV-PyMT model exhibited the highest overall mutational burden of mouse models analyzed, but also demonstrated considerable variability between individual tumors. I hypothesized these differences in mutational burden could arise from separate oncogenic drivers and mutational processes within each mouse model given the extensive characterization of mutational processes in human cancers221,274. To examine mutational processes in our mouse models, I utilized deconstructSigs207 to generate different COSMIC single base substitution (SBS) signatures that take adjacent nucleotides into context and ascribes etiologies to different trinucleotide mutational patterns. I found that all MMTV-Myc tumors regardless of subtype were dominated by the impaired HRR signature (Figure 3.4.3B). The HRR signature is strongly associated with germline and somatic mutations in BRCA1 and BRCA2 mutations in human breast cancer221. No mutations in BRCA1 or BRCA2 were found in any of the MMTV-Myc tumors analyzed, with these signatures possibly the result of BRCA1/BRCA2 promoter hypermethylation or other factors not analyzed. MMTV-Neu tumors were dominated by the clock-like aging signatures, while MMTV-PyMT tumors demonstrated large tobacco smoking signatures. Given that all mouse models were raised in similar controlled environments, these data suggest an unidentified endogenous C > A mutational mechanism present in MMTV-PyMT tumors. While overall mutational burden and mutational signatures cannot parse between histological subtypes of the MMTV-Myc tumors, I reasoned that specific mutations may be associated with each subtype. Indeed, I find a small number of conserved (≥ 66% of tumors in each subtype) and impactful mutations within each subtype (Figure 3.4.3C). 67 Notable conserved mutations in the EMT subtype include Kras G12D activating mutations and splice variants in Scrib. This may be significant as others have found SCRIB cooperates with MYC for transformation and mislocalization of SCRIB within the cell, sufficient to promote cell transformation275. Interestingly, the squamous subtype had no discernible conserved and impactful mutations between tumors. There may be heterogeneous conservation of signaling pathways activated at the transcriptional level, though, as each squamous tumor had impactful mutations in transcription factors: zinc- finger and BTB domain containing genes, zinc-finger protein genes, or both. For the microacinar tumors, I observed conserved missense mutations resulting in amino acid change A538E in Kit and amino acid A255D for Rara. KIT is a well-established oncogene, particularly in AML276 where mutations play a large role in pathology, and RARA is involved in embryonic development whose disruption is well established in carcinogenesis277. Interestingly, Kit and Rara mutations were found to be mutually inclusive in the tumors sequenced. Of the 9 tumors sequenced at the genome level, 5 were found to have both the A538E Kit and A255D Rara mutations, while the other 4 tumors contained no discernible mutations whatsoever in either Kit or Rara. To validate our WGS findings and evaluate the extent of these mutations further in other MMTV-Myc tumors, Mylena Ortiz performed Sanger sequencing on the tumors that previously underwent WGS and an additional two tumors from each histological subtype. From these data, we confirmed that these Kit and Rara mutations were mutually inclusive in a larger population of 7 of the 15 tumors that underwent Sanger sequencing. From the total populations analyzed, these cooccurring mutations were present in 60% of both the microacinar and squamous 68 subtypes, while only 20% in EMT. These data may suggest a link between Kit and Rara given their cooccurrence patterns; however, the exact functional implications of these mutations are not known. To gain a better understanding of the potential functional impact of the co-occurring Kit and Rara mutations, I performed multiple sequence alignment on Kit and Rara amino acid sequences from their most prevalent isoform using Clustal Omega278. Comparing species, Kit A538 in the mouse maps to KIT M535 in humans with an overall interspecies amino acid identity of 82.77% for the full protein. Both mutations reside in exon 10 of their respective species, where the majority of this exon codes for the transmembrane domain in topological space. According to TCGA PanCancer Atlas data shown in cBioPortal, the KIT transmembrane domain in humans spans amino acids 525–545, suggesting that the analogous KIT A538E mutation in the mouse occurs directly in the middle of the transmembrane domain. Additionally, while TCGA PanCancer data do not show confirmed pathogenic mutations in the transmembrane domain of KIT, the transmembrane domain resides between two confirmed mutational hotspots labeled as likely oncogenic. I speculate that the A538E mutation will destabilize the transmembrane domain of Kit leading to dysregulated Kit signaling. Follow up functional studies will need to be performed to ascertain the causal effects of this mutation in mice and in humans. Similarly, Rara A255 in the mouse maps directly to RARA A255 in humans. Importantly, exon 6, where the mutations reside in both species, shares 100% identity, while the whole protein sequences share 89.98% identity. Exon 6 in both species comprises the beginning portion of the hormone receptor ligand binding region of RARA. The TCGA PanCancer cohort does not contain RARA A255D mutations and mutations 69 do not confirm nor deny pathogenicity. I hypothesize that the A255D mutation interferes with retinoic acid binding and heterodimerization with the retinoid X receptor, which thus inhibits the transcriptional activation of downstream genes that would lead to cell differentiation and cell cycle control279,280. 3.4.4 Overexpression of Kras signaling in the EMT subtype is accomplished through either Kras activating mutations or Fgfr2 amplification Previous studies on the MMTV-Myc model have shown that the EMT subtype often possessed activating Kras mutations and increased Ras signaling250, whereas the squamous and microacinar subtypes largely did not (Figure 3.4.4A). This was the case for EMT tumors 222 and 1938 that underwent WGS, where both had activating Kras G12D missense mutations, while the squamous tumor 1066 acquired the less transforming but still activating Kras G13R mutation (Figure 3.4.4B). Consequently, when comparing ssGSEA values for the C6 oncogenic signature gene sets between the three histological subtypes, EMT tumors consistently had upregulation of various Kras signaling pathways across tissue types (Figure 3.4.4C). Investigation of a representative Kras signaling pathway shows the probability of Kras activation is considerably higher in EMT than both microacinar and squamous samples (Figure 3.4.4D). Despite high ssGSEA Kras activity scores, EMT tumor 1356 possessed no Kras mutations or mutations in genes elsewhere in the Ras pathway. For this tumor, copy number segmentation data obtained pointed to a high-level amplification event on chromosome 7, encompassing Fgfr2 and Ate1 (Figure 3.4.4E). Of particular significance is the estimated copy number gain of this region. EMT tumor 222 had a similarly bounded focal amplification event over Fgfr2 and Ate1 that increased both gene copy numbers by 70 1, but EMT tumor 1356 has an estimated copy number gain of 471 over this region and correlates well with gene expression levels of Fgfr2. The scale of this amplification event suggests a strong selective pressure for focal amplification of Fgfr2 in EMT tumor 1356. When examining the pathways of FGFR2 in the literature, FGFR2 acts directly upstream of RAS-MAPK, PI3K-AKT, and JAK-STAT signaling pathways281 (Figure 3.4.4F). Thus, I propose that the increase in Kras signaling seen in EMT tumor 1356 is due to the large, focal amplification of Fgfr2 and subsequent activation of Ras signaling. Stemming from the previous assessment, it is likely that the EMT histological phenotype is dependent on increased Ras signaling, ostensibly through heterogeneous genetic mechanisms. 3.4.5 Squamous tumors possess an intermediate gene expression state between microacinar and EMT tumors To this point, there are strong associations between copy number amplifications specific to the microacinar subtype and heterogeneous genetic events that lead to KRAS pathway activation in the EMT subtype. However, there are no readily identifiable conserved genetic features that can explain the squamous phenotype. Despite this, squamous tumors consistently occupy a gene expression state between that of the EMT and microacinar subtypes. PCA of C2 curated gene sets (Figure 3.4.5A) and C6 oncogenic signature gene sets (Figure 3.4.5B) show clustering of squamous samples between microacinar and EMT, with some squamous samples invading both the microacinar and EMT clusters. From the previous results, I sought to examine the relationship between individual pathways in each gene set that could explain these differences. To accomplish this, I chose the top three differentially expressed representative pathways from EMT and 71 microacinar pathway signatures for C2 (Figure 3.4.5C) and C6 (Figure 3.4.5D) MSigDB gene sets. While there were no statistically significant correlations within each histological subtype (data not shown), likely due to limited numbers of samples, pairwise correlations of pathway signatures across all three subtypes showed both significant positive and negative correlations between ssGSEA pathway activities. Beyond this, top differentially expressed pathway activities appear to lie on a continuum, with squamous tumors routinely spanning between and infiltrating the microacinar and EMT clusters. These data are consistent with the squamous histology occupying an intermediate phenotype between that of EMT and microacinar subtypes. 3.4.6 Genomic alterations identified in MMTV-Myc EMT tumors are conserved in human breast cancer and portend poor clinical outcomes It is clear that the MMTV-Myc mouse model produces primary mammary tumors that are heterogeneous in histology, gene expression, metastatic variance250, and now somatic genomic perturbations that can explain many of the transcriptional differences seen in this model. However, the significance of these events and translational potential to humans is not immediately obvious. To evaluate whether the events seen in MMTV-Myc mouse model could have predictive power in clinical outcomes, I utilized an integrative approach, combining gene expression data and somatic genetic events to examine how they can parse human breast cancer subtypes. To this end, I took all genes that were differentially expressed between MMTV-Myc histological subtypes that were also present in conserved copy number gain or loss events to obtain the integrative gene set (Table 3.4.6.1). Subsequently, I performed unsupervised hierarchical clustering on human gene 72 expression from the METABRIC breast cancer dataset282–284 limited to the integrative mouse gene set (Figure 3.4.6A). I found distinct clusters emerged that well represented the intrinsic subtypes of breast cancer. Importantly, clustering from the same number of genes used but instead randomly selected failed to resolve intrinsic breast cancer subtype clusters to the same degree. This suggests that the integrated mouse gene set represents a diverse set of informative genes across all intrinsic subtypes of human breast cancer that can effectively differentiate between subtypes rather than contributing noise. Clustering by the PAM50 gene set showed improved subtype clustering overall relative to the mouse integrative gene set but did not resolve the luminal A and luminal B subtypes to the same extent (Figure 3.4.6B). It is important to note that the PAM50 gene set is curated specifically to be able to differentiate between human intrinsic breast cancer subtypes, so seeing improved performance relative to the integrative mouse gene set is expected. It is clear that genetic events in the MMTV-Myc mouse model and their resulting gene expression differences can resolve human breast cancer intrinsic subtypes. However, it is unclear to what extent these genetic events in the MMTV-Myc mouse model represent genetic events occurring in human breast cancer. To address this, I assessed publicly available TCGA datasets on breast cancer available through cBioPortal. I assessed prevalence of genetic events in human breast cancer first identified to be conserved both in genomic alteration status and differential gene expression in the MMTV-Myc mouse model. Subsequently, I examined the effects of these genes on human breast cancer overall survival clinical outcomes through Kaplan–Meier analysis. 73 All of the genetic events examined were found to be co-occurring with MYC amplification in human breast cancer (Table 3.4.6.2), especially supporting the use of the MMTV-Myc mouse model in studying MYC-driven human breast cancers. However, because of the co-occurring nature of these events and the limited number of patient samples with genetic events and matched clinical outcomes, statistical significance is difficult to achieve in mutually exclusive populations of MYC amplification and identified genetic events. MYC amplification and overexpression are well described in TNBC and basal-like subtypes of breast cancer240,285, known as the deadliest subtypes of breast cancer currently286,287, which must be accounted for in survival analysis. To remedy this, I look at relative differences in overall survival from MYC amplification compared to analogous genetic events in humans identified in the mouse model. Kaplan–Meier analysis on breast cancer patients revealed MYC amplification was present in 18.2% of patients surveyed, with median overall survival at 135.2 months (95% CI: 113.7–148.1) compared to the unaltered population with median overall survival at 171.3 months (95% CI: 161.2–182.9) (Figure 3.4.6C). When comparing the patient population of those harboring KRAS activating mutations or amplifications, which account for 2.2% of all breast cancer cases, I find a marked decrease in overall survival compared to the unaltered cohort (Figure 3.4.6D) with median overall survival at 77.7-months (95% CI: 61.8–146.4) for the KRAS altered population compared to median overall survival of 164.3 months (95% CI: 154.3–173.0) for the unaltered cohort. While it cannot be ruled out that MYC amplification co-occurrence in the KRAS altered cohort reduces overall survival, the KRAS altered population maintains a 57.5-month median overall survival deficit to MYC amplification alone. Thus, MYC amplification may only have a modest 74 additive effect to KRAS amplification or activating mutations in this cohort. It is worth noting that the unaltered population in the KRAS cohort has reduced overall survival compared to the MYC unaltered cohort, suggesting most patients with MYC amplification are largely shunted to the unaltered group. Similarly, patients with focal FGFR2 amplifications, accounting for 1.5% of all breast cancer cohort patients, also exhibit a marked decrease in overall survival compared to the unaltered cohort (Figure 3.4.6E) with median overall survival at 87.7 months (95% CI: 62.4–191.0), contrasting with the unaltered cohort at 163.5 months (95% CI: 154.0–172.9). Again, patients with FGFR2 amplification fare significantly worse than those with MYC amplification, standing at a difference of 47.5-month median overall survival difference. While KRAS and FGFR2 alterations are infrequent in breast cancer, these alterations may be extremely significant in the prognosis and treatment of their disease for the few patients with these genetic alterations. Altogether, these analyses reveal conserved genetic events between both human MYC-driven breast cancer and the MMTV-Myc mouse model of breast cancer. Importantly, these somatic genetic events impact overall survival in human breast cancer patients in large excess of what MYC amplification causes. 3.4.7 Supervised machine learning can predict corresponding human breast cancer subtypes to both murine models of breast cancer and histological subtypes within each murine breast cancer model Thus far, I have shown that there exist tightly linked transcriptional and genomic perturbations in the MMTV-Myc model, which are heterogeneous across its histological subtypes with clinical implications in humans. However, whether these histological 75 subtypes are representative of different human breast cancer intrinsic subtypes is unclear. Unsupervised clustering by the integrative mouse gene set shows modest ability to parse human intrinsic breast cancer subtypes, but it falls short of being able to discriminate between genes that can best stratify patients by intrinsic subtype, and thus which MMTV-Myc histological subtype is representative of a given human intrinsic subtype. To this end, I employed an ensemble machine learning model classifier trained on human METABRIC gene expression data to predict which human breast cancer intrinsic subtype each MMTV-Myc histological subtype best represents. To accomplish this, I first combined the raw METABRIC microarray gene expression data284 with the MMTV-Myc microarray data (GSE15904), along with additional mouse microarray cohorts for variations of the MMTV-Neu mouse model (GSE42533) and the MMTV-PyMT mouse model (GSE104397) for comparison. PCA of normalized data revealed strong batch effects between datasets that would distort causal biological interpretation (Figure S 3.4.7.1). In removing batch effects, I employed the parametric empirical Bayes shrinkage adjustment available from ComBat288, effectively eliminating non-biological differences between datasets (Figure S 3.4.7.2). I then sought to narrow down the number of genes/features in this dataset to avoid overfitting the model and make it more generalizable to new patient data for intrinsic subtype prediction. PAM50 is a well-established scoring metric of gene expression data for 50 genes to stratify breast cancer patients into intrinsic subtypes and offer clinical prognoses289. From here, I employed recursive feature elimination with cross-validation (RFECV) with a support vector machine (SVM) radial basis function (RBF) kernel estimator to determine the optimal features in the dataset. Surprisingly, 32 specific genes 76 (Figure S 3.4.7.3) (Table 3.4.7) from the PAM50 subset gave higher cross-validation scores than utilizing all 50 genes. In testing various estimators for the machine learning classifier model limited to this 32 gene subset, I found a great deal of variability between models in hard classification predictions between model instantiations. To alleviate this, I utilized a soft voting classifier composed of logistic regression, SVM with RBF kernel, random forest, XGBoost, and multi-layer perceptron estimators averaged over 15 k-fold instantiations. Pooling multiple estimators together was done to reduce bias from any one particular estimator and to increase overall accuracy. Subsequently, the highest average probability of a given class will decide which intrinsic subtype a mouse tumor belongs to. The motivation for developing a supervised machine learning model classifier is evident after examining the distribution of the combined human and mouse gene expression dataset through a scatterplot of t-distributed stochastic neighbor embedding290 (t-SNE) data in two-dimensional space (Figure 3.4.7A). There are clusters of human breast cancer samples forming the intrinsic subtypes characterized by distinct biological pathways33, evolutionary progenitors of each subtype291, and different clinical outcomes for each breast cancer subtype292. However, claudin-low spans across multiple other subtypes within the image, which supports the argument that claudin-low breast cancers represent an additional complex phenotype rather than an intrinsic subtype of breast cancer34. The mouse samples are well dispersed throughout the plot even within the same mouse model, pointing to substantial gene expression heterogeneity within each model. However, problems arise in that it is not immediately obvious which cluster each mouse sample belongs to. When using t-SNE, there is also necessarily a loss of information even when performing nonlinear dimensionality reduction, in this case from 77 32 to 2 dimensions. This is nothing to say of the tendency for gradient descent algorithms such as t-SNE to become stuck in local optima. Additionally, other unsupervised methods such as hierarchical clustering equally weight all features, which decreases accuracy of the model. For these reasons and more, there was a clear need to develop a supervised machine learning classifier using the 32 gene feature set. Accuracy scores and F1 scores are often used to evaluate the efficacy of supervised machine learning models. However, it has been shown that these metrics can be overinflated293, and so I also report the more robust Matthew’s correlation coefficient (MCC) metric, which proportionally accounts for true positives, true negatives, false positives, and false negatives. For the soft voting classifier used to predict mouse-to- human subtypes, I obtained an average accuracy score of 80.0%, a weighted F1 score of 80.0%, and an MCC score of 74.0% after 15 k-fold stratified and shuffled cross- validation using a 70% train and 30% test split (Figure 3.4.7B). All metrics used to evaluate the machine learning model are in high agreement and show the model is effective at predicting human breast cancer intrinsic subtypes. To better visualize these predictions on the test data, I constructed a confusion matrix showing true classes on the vertical axis and predicted class on the horizontal axis (Figure 3.4.7C). Many of the most confused classes correspond well to the t-SNE visualization where clusters overlap, including the overlap of luminal A and luminal B classes, overlap of luminal A and normal classes, and the overlap of claudin-low and basal classes. All metrics combined show that the initial METABRIC gene expression data limited to the 32 most predictive genes shared with the mouse microarray data has significant power in discriminating between intrinsic subtypes of breast cancer using the soft voting classifier. 78 When applying the classification model to mouse gene expression data, I obtain heterogeneous subtype predictions across the various MMTV-Myc-, MMTV-Neu-, and MMTV-PyMT-based mouse models (Figure 3.4.7D). Our initial hypothesis postulated that the MMTV-Myc model would enrich for claudin-low and basal-like tumors, similar to how human breast cancer patients with amplified MYC are preferentially basal-like or claudin-low. Indeed, for MMTV-Myc tumors overall, 30% are predicted as claudin-low, 24% are luminal A, 16% are luminal B, 11% are basal, 11% are HER2+, and 8% are normal-like. Large differences in intrinsic subtype proportions are evident across all mouse models tested in comparison with human intrinsic breast cancer subtype proportions, with METABRIC intrinsic subtype proportions at 35% luminal A, 24% luminal B, 11% HER2+, 11% claudin-low, 11% basal, and 8% normal-like. The claudin-low subtype is especially enriched across all mouse models examined. Though, ratios of each intrinsic subtype vary considerably between histological subtypes for the MMTV-Myc model (Figure 3.4.7E). Adenocarcinoma tumors are primarily categorized as claudin-low at 57%, with luminal B and basal subtypes trailing at 15% and 14%, respectively. Papillary tumors maintain roughly similar proportions of basal (17%), claudin-low (25%), HER2+ (19%), and luminal B (24%) subtypes. EMT tumors are overwhelmingly claudin-low (44%) and luminal B (33%). Microacinar tumors show the most enrichment for the HER2+ subtype at 21% and tied for most basal enriched with papillary at 17%. Squamous tumors are considerably variable, with 33% classified as luminal A, 22% as claudin-low, and 21% as normal-like, corroborating previous pathway signatures showing squamous tumors are not confined to a specific localized cluster. 79 These results have the potential to inform mouse model use when investigating different subtypes of breast cancer or examining breast cancer heterogeneity more generally. For instance, the MMTV-Neu mouse model is often used as a model of HER2+ breast cancer, but most MMTV-Neu tumors show gene expression similar to luminal A and claudin-low human tumors. While other groups have predicted MMTV-PyMT tumors to correspond to the luminal B subtype255, I predict only 13% of PyMT based mouse models fall into this category, with PyMT tumors overall being quite heterogeneous. In summary, I have created an accurate supervised machine learning classification model that can stratify human breast cancer intrinsic subtypes. When applied to batch effect corrected mouse transcriptional data, I observe diverse intrinsic subtype profiles assigned to different histological subtypes from the MMTV-Myc mouse model. Creation of deep learning models that can incorporate additional multi-omic data and patient metadata for more accurate subtype classification are areas of active development within the breast cancer research community. Development of improved subtype classification models in the future will serve to better identify the biological pathways activated or depressed within each patient to direct them to the appropriate therapeutic route. 3.5 Discussion The utility of mouse models of breast cancer has progressed from overexpression of a driving oncogene to questions as to whether they accurately mimic the heterogeneity and progression of human breast cancer. Here, I have described the utility of the MMTV- Myc GEMM in recapitulating the histological, transcriptional, and genomic heterogeneity of human breast cancer. 80 Conserved somatic genetic changes across MMTV-Myc histological subtypes are associated with negative effects on overall survival when applied to human breast cancer patients. These somatic events have largely been overlooked previously due to their low prevalence in human breast cancer and resulting lack of statistically significant differences in clinical outcomes. Given that MYC is frequently amplified in basal-like and TNBCs240, and the current lack of targeted therapies in TNBCs294, it is likely there is no consistent driver of oncogenesis across all TNBC patients. Instead, I hypothesize there may be low prevalence oncogenic drivers that lead to similar transcriptional profiles ultimately, but each patient’s tumor maintains different genetic drivers. An example that arises in this paper is that of activating Kras mutations or significant ploidy gain of Fgfr2. Activation of either proto-oncogene will result in increased MAPK signaling, but treating the root oncogenic driver will require different therapeutic strategies. The most prevalent KRAS mutations in breast cancer are G12C/D/V/A mutations, with some of these G12C mutant patients potentially benefitting from treatment with the recently developed sotorasib295 or adagrasib296 therapies. However, patients with amplified FGFR2 would not respond to these therapies and instead would more likely benefit from highly selective FGFR inhibitors such as AZD4547. A recent clinical study found that treating endocrine therapy-resistant breast cancer patients with AZD4547 achieved partial response in some patients, with differentially expressed genes involved in FGFR signaling able to distinguish between responders and non-responders297. Although FGFR1 is amplified more often in breast cancer, both FGFR1 and FGFR2 amplified and overexpressing breast cancers could likewise benefit from AZD4547 treatment. 81 Obscn was identified with conserved mutations across all histological subtypes, with OBSCN mutations correlating poorly with overall survival in humans. However, OBSCN is a large gene where mutations often co-occur with other large genes, such as TTN, MUC family genes, and FAT family genes. Therefore, mutations in OBSCN are likely biomarkers of hypermutational burden rather than OBSCN playing a tumor suppressive or oncogenic role. This could still be useful information, as hypermutant individuals are more likely to respond to immune checkpoint inhibitors298. Aside from direct clinical implications in humans, these results reveal human intrinsic subtype analogs in the MMTV-Myc, MMTV-Neu, and MMTV-PyMT mouse model histological subtypes. Others have classified mouse models of breast cancer and ascribed them to different human intrinsic subtypes of breast cancer previously in unsupervised methods255,256. However, these analyses are not weighted for genes that are able to discriminate between intrinsic subtypes; they may be biased toward noise in the dataset rather than predictive signals and do not produce metrics for scoring accuracy of the model. To rectify this, I created an accurate machine learning classification model trained on human gene expression data and applied to batch effect corrected gene expression data of different mouse models of breast cancer. From these results, I see there is substantial heterogeneity within the histological subtypes of the MMTV-Myc model. However, the proportions in MMTV-Myc tumors that match human intrinsic subtypes are skewed relative to their occurrence in humans. I see a general decrease in luminal tumors and an increase in claudin-low and normal-like tumors. While this is true overall for the MMTV-Myc mouse model, it is highly dependent on the histology of the 82 tumor. For example, adenocarcinoma tumors are 57% claudin-low, while microacinar tumors are 11% claudin-low. EMT tumors are largely mapped to claudin-low and luminal B intrinsic subtypes. The EMT subtype maintains great variability in CNVs, similar to that of human luminal B breast cancer299, but the gene expression signatures of EMT tumors overlap with the canonical signatures associated with human claudin-low breast cancer: high expression of markers for cytotoxic T-cell and natural killer cell infiltration (Granzymes C, D, E, F, and G), high expression of dormancy markers (NR2F1), and low expression of cell-adhesion proteins (CLDN2, GJB1, CEACAM1)300. These observations may be useful to cancer researchers when selecting a mouse model for studying a specific subtype of breast cancer, particularly the adenocarcinoma or EMT histological subtypes as there are no established spontaneous models of breast cancer that exclusively mimic claudin-low breast cancer255. These data suggest that the adenocarcinoma or EMT histological subtypes from the MMTV-Myc GEMM could be a reliable immunocompetent model for claudin-low breast cancer. The copy number changes identified among the MMTV-Myc tumors sequenced, particularly the conserved ploidy gains on chromosomes 11 and 15 in the microacinar tumors, correlate well with gene expression changes. Erbb2 and related genes lie on chromosome 11, which may explain the propensity of microacinar tumors to be HER2- like than other histological subtypes. It is likely these CNVs are causal in driving gene expression changes between histological subtypes, although I cannot confirm that with these limited data. CRISPR knockout followed by gene addback experiments could be used to validate these findings. The highly conserved copy number gains seen in the microacinar tumors suggest a strong selective pressure for amplification and 83 overexpression of genes in these regions. Upon examining the human homologs and their syntenic chromosomal regions for the integrated mouse gene set, I find the two largest high synteny regions are the entirety of chromosome 17 following from the mouse chromosome 11 amplification and the long arm of chromosome 8 (8q) following from the mouse chromosome 15 amplification. It is interesting to note that human chromosome 8q contains the MYC locus and chromosome 17q contains the ERBB2 locus, with amplification of these two loci highly correlated in human breast cancer. Regions 8q and 17q are among the most frequently amplified regions in human breast cancer301,302. However, given the small sample size of microacinar tumors used and experimental setup, it is impossible to determine whether either of these amplifications has causal implications for the other. It should also be noted that chromosome 17p is often deleted in many human breast cancers while the entirety of chromosome 17 in humans is able to be mapped to mouse chromosome 11. Mouse chromosomes are telocentric, and given that these frequent large amplifications lie on either side of the centromere in humans, it is possible that amplification of the analogous genes from region 17q is of higher selective consequence than deletion of the analogous genes from region 17p. Comparative genomics between the MMTV-Myc histological subtypes and MYC-driven human breast cancers may be an important area of future study. In summary, a significant hurdle in the study of breast cancer in vivo has been the limitations of mouse models recapitulating the heterogeneity found in human breast cancer. I report that the MMTV-Myc GEMM recapitulates the histological, transcriptional, and genomic heterogeneity found in human breast cancer, with important clinical parallels identified. I find different MMTV-Myc histological subtypes preferentially represent 84 different human intrinsic breast cancer subtypes, further solidifying the MMTV-Myc model as an appropriate in vivo method for examining the multi-faceted aspects of human breast cancer heterogeneity even down to the gene expression level. 3.6 Materials and methods 3.6.1 Whole genome sequencing Flash frozen MMTV-Myc tumors250 stored at − 80 °C were ground using a sterile mortar and pestle under liquid nitrogen. Three randomly selected tumors each from the microacinar, squamous, and EMT MMTV-Myc histological subtypes underwent DNA extraction. DNA was extracted for each tumor using a Qiagen Genomic-tip 20/G KIT according to manufacturer specifications. Whole genome sequencing was performed at Michigan State University RTSF. DNA was sequenced at 40 × depth using Illumina HiSeq 2500 paired end 150 base pair reads after TruSeq Nano DNA library construction. 3.6.2 Transcriptomics Gene expression data used in this study have previously been published250 on MMTV-Myc and MMTV-Neu tumors and are publicly available in the Gene Expression Omnibus (GEO) under accession number GSE15904. MMTV-PyMT and MMTV-PyMT E2F knockout tumor transcriptional data have been published119 and are available under GEO accession number GSE104397. MMTV-Neu E2F knockout transcriptional data were previously published303 and are available under accession number GSE42533. 3.6.3 Genome sequence processing Raw whole genome sequencing paired end fastq files first underwent initial quality control using FASTQC230. Quality and adapter trimming of reads were performed using Trimmomatic231, with reads reassessed for quality afterward again using FASTQC. Reads 85 were then aligned to the mm10 reference Mus musculus genome using BWA-mem304 with option -M selected for compatibility with Picard. After alignment, read groups were added using Picard305. SAMtools233 was then used to sort bam files, mark PCR duplicated sequences, and index bam files. Discordant and splitter read files were also generated and sorted using SAMtools for downstream somatic variant callers. 3.6.4 Computational somatic mutation determination Somatic mutations were called using the consensus of Mutect2234 (GATK suite) and VarScan235 calls based on chromosome, position, reference base, and variant base. Variants were then annotated using SnpEff238. To reduce false positives, filtering included subtracting FVB-specific mutations from the mm10 C57BL/6 background as indicated in the FVB_NJ.mgp.v5.snps.dbSNP142.vcf file from the Wellcome Sanger Institute. In addition to this, HaplotypeCaller on GATK was used to call germline mutations on the FVB_NJ genome REL-1604-BAM available from the Sanger Mouse Genomes Project306 FTP server. After converting the FVB_NJ.bam WT reference to fastq files using SAMtools, these fastq files underwent the same whole genome sequence processing as MMTV-Myc tumors. Germline variants from this wiltdtype FVB background against the mm10 reference genome were subtracted from somatic variants in each tumor. 3.6.5 Verification of mutations Mutations observed in the whole genome sequencing analysis for Apc, Rara, and Kit genes were screened and confirmed by Sanger sequencing for matched tumor samples and other tumors for a total of 15 samples (5 EMT, 5 squamous, and 5 microacinar tumors). Mylena Ortiz performed all aspects of Sanger sequencing verification and writing materials and methods for Sanger sequencing verification. Briefly, 86 RNA was extracted from mammary tumors using the Qiagen RNeasy Midi KIT. cDNA was generated using the AppliedBiosystems High-Capacity cDNA Reverse Transcription KIT (#4368814). Genes were amplified under PCR using the following primers: c-KIT 5′ ATAGACTCCAGCGTCTTCCG 3′ and 5′ GCTCCCAATGTCTTTCCAAAACT 3′; RARA 5′ TTGTGCATCTGAGTCCGGTT 3′ and 5′ TGGGCAAGTACACTACGAACA 3′; APC 5′ CCGCTCGTATTCAGCAGGT 3′ and 5′ CCTGCAGCCTATTCTGTGCT 3′. PCR parameters are standard parameters as listed on New England BioLabs PCR protocol (M0273) V1. PCR fragments were purified using QIAquick PCR Purification KIT or QIAquick Gel Extraction KIT. Sanger sequencing was performed by GENEWIZ/Azenta under standard premixed conditions, with one of the PCR primers used as the sequencing primer. Sequence alignment and visualization were performed with Geneious Prime- 2020.0.5 software. 3.6.6 Computational somatic copy number determination Somatic copy number variations were determined using multiple methods; discrete copy number variations are based on the consensus of Delly236 and Lumpy237 calls, while whole chromosome ploidy count and segmentations were determined using CNVKIT239. For discrete CNVs, only those with length above 10,000 base pairs, no evidence in the WT background, and a mapping quality (MAPQ) score at 60 or above were included for analysis. Delly and Lumpy calls were combined by similar genome starting and ending positions within a 100 base pair margin of error for difference in CNV length using a custom Python script. Inversions were taken as the consensus between Delly and Lumpy with the same restrictions and filtering steps implemented for CNVs. Translocations were called under 87 similar approaches as CNVs and inversions. The differences are no length minimums, position differences between Lumpy and Delly calls being a maximum of 1,000 base pairs, and MAPQ scores of 50 or greater are included for analysis. To reduce false positives, these translocation calls were then merged with gene break calls made by CNVKIT by gene name. CNVKIT gene breaks were called using the “breaks” option under default options. 3.6.7 GSEA pathway analysis Previously published gene expression data from MMTV-Myc and MMTV-Neu mouse model tumors were downloaded from the GEO DataSets under the accession number GSE15904. The gct converted file containing gene symbols as row names was used as input for ssGSEA available as a module in GenePattern307. Pathway analysis was conducted using the MSigDB gene set databases c1 (positional), c2 (curated), and c6 (oncogenic signature) with default settings. The resulting output data matrix with pathway enrichment scores for each sample was employed for downstream hierarchical clustering and graphical representations. 3.6.8 Circos plots Representative mouse Circos plots for each histological subtype were generated using Circos198 as a software package available on MSU’s high performance computing center. All mutations, CNVs, inversions, and translocations were mapped at exact mm10 genomic coordinates of each event. Working from the outermost region of each plot, it begins with a labeled mouse ideogram with chromosomes presented in ascending order. The next innermost rings consist of SNVs color coded to represent low (yellow), moderate (orange), or high (red) 88 predicted impact as determined by Mutect2. Following, the next inner ring depicts copy number variations as whole integer copy number changes, with height corresponding to copy number integers of one or two, determined by CNVKIT. Copy number gains are depicted in red, while copy number losses are depicted in blue. In the final innermost circle, translocations are colored randomly to one of the two chromosomes involved in the translocation event. Inversions are colored in black. Only somatic variants that satisfy the previous requirements for somatic variant calling are included for each tumor. 3.6.9 Copy number and gene expression correlation Correlation of copy number and gene expression was done on CNVKIT. After generating copy ratios and copy segments under the “batch” WGS method, discrete copy number segments were generated using the “segment” method. The “cnv_expression_correlate” method was used to generate Kendall rank correlation coefficients and Pearson correlation coefficients on discrete copy number data for all 9 tumors that underwent WGS and gene expression data for all 42 tumors that underwent transcriptomic profiling. Correlation coefficients for each gene were then mapped onto Circos plots, with significant Kendall’s τ coefficients (≥ 0.3) colored blue and in the outermost ring. In the innermost ring, significant Pearson’s r coefficients (≥ 0.7) are colored red. Non-significant values for both metrics were colored black. Correlation coefficients could only be generated for genes with discrete copy number changes at ± 1 of ploidy level or greater differences, so a majority of the genome will show no correlation. 89 3.6.10 Unsupervised clustering All unsupervised hierarchical clustering was performed using the clustermap function as part of the Seaborn Python library. Distance between clusters was computed using the Ward variance minimization algorithm. 3.6.11 PCA Principal component analysis was performed using the scikit-learn308 library, utilizing the StandardScaler and PCA packages. The number of principal components analyzed was 2 in every instance. Results were visualized using a custom scatter plot in matplotlib. 3.6.12 Copy number heatmaps All heatmaps displaying log2 fold change of copy number segmentation data were generated using CNVKIT, both for initial copy number segmentation and copy number ratio file generation, as well as visualization. 3.6.13 Mutational signatures Mutational burden plots were generated in matplotlib. Mutational signatures were derived using the deconstructSigs207 R package and plotted using matplotlib. 3.6.14 Mutational burdens WGS data for MMTV-Neu and MMTV-PyMT previously analyzed in the laboratory and used in mutation plots are available under the NIH SRA with BioProject number PRJNA541842. Processing of Neu and PyMT WGS fastq reads was done according to the same methods as the MMTV-Myc samples. 90 3.6.15 Volcano plot of ssGSEA values The volcano plot of ssGSEA c6 oncogenic signatures for the EMT subtype compared to both squamous and microacinar signatures was plotted in Python using BioinfoKIT309. The log-fold-change threshold is at 0.4, and the p-value threshold is set at 0.05. 3.6.16 Kaplan-Meier analysis Kaplan–Meier plots were made with using the Survminer R package. Publicly available and deidentified TCGA non-redundant breast cancer patient data were used to construct Kaplan–Meier curves using criteria stated for each figure. 3.6.17 t-SNE plot The t-SNE diagram was created using the t-SNE implementation inside of scikit- learn. t-SNE was performed on the optimized 32 gene subset expression data of the PAM50 gene set as determined by RFECV using a support vector machine classifier. The resulting scatterplot of data is generated using matplotlib and colored according to the legend. 3.6.18 Supervised machine learning The supervised machine learning soft voting classifier implemented using scikit- learn consists of logistic regression, support vector machine, random forest, XGBoost, and multi-layer perceptron classifier probabilities merged together. METABRIC gene expression data for the 32 genes with matched PAM50 subtypes were shuffled and split 70–30 into training and test data sets, respectively. To combat overfitting and training on biased data, PAM50 subtype proportions were kept the same between training and test sets. Probability predictions were averaged over 15 instantiations. 91 3.6.19 Human data usage Clustering was performed on human breast cancer transcriptional data matched with intrinsic subtypes from the anonymized METABRIC dataset, which is readily available through cBioPortal. Kaplan–Meier analyses were performed on non-redundant human breast cancer patients from all breast cancer studies available in cBioPortal as of time of publication that match the criteria specified in each plot. 3.6.20 Statistics Unless otherwise stated, statistical tests with p-value displayed are done using a two-sided Student’s t-test. 3.6.21 Software versions BioinfoKIT-2.1.0, Bokeh-2.4.3, BWA-mem-0.7.17, Circos-0.69.6, CNVKIT-0.9.9, Delly-0.7.8, FASTQC-0.11.7, GATK-4.1.4.1, Lumpy-0.2.13, Matplotlib-3.4.3, Mutect2- 2.1, NumPy-1.20.3, Pandas-1.3.4, Panel-0.13.1, Picard-2.18.1, Python-3.9.7, SAMtools- 1.9, scikit-learn-0.24.2, SciPy-1.9.0, Seaborn-0.11.2, SnpEff-4.3, ssGSEA-10.0.11, Trimmomatic-0.38, VarScan-2.4.1, and Yellowbrick-1.5. 3.6.22 Availability of data and materials Mouse MMTV-Myc whole genome sequence data obtained in this study is available under BioProject PRJNA945899. Python and R code used in this study to process data and generate figures is available on GitHub post publication (https://github.com/CarBroke). Other data related to primary and supplementary figure generation are included in supplemental materials. 92 3.7 Acknowledgments The authors thank Jesus Alberto Garcia-Lerena, John Vusich, Marcelio Shammami, Reham Ammar, and James Lord of Michigan State University for insightful discussion of various attributes of the manuscript. Whole genome sequencing of MMTV- Myc tumors was financially possible from an internal MSU and Illumina joint discovery grant. Stipend support for the first author during the course of this study was provided under a US Department of Defense CDMRP #W81XWH-21-1-0002 and a fellowship from the Aitch Foundation, a 501(c)(3) charitable organization located in Lansing, MI. Funding sources had no impact on the study design, sample collection, data analysis, data interpretation, or manuscript preparation. 3.8 Author contributions CDB wrote the manuscript, performed whole genome DNA extraction and processing of MMTV-Myc samples, performed all bioinformatic analyses, performed all statistical analyses, performed all machine learning, and contributed to the theoretical design of the study. MMOO performed the mutation verification by Sanger sequencing and ssGSEA processing. MSM contributed to the theoretical and technical design of the machine learning components. ERA performed MMTV-Myc tumor extraction and preservation, performed microarrays on mouse transcriptional data, and lead the theoretical design of the study. All authors contributed to the revision of the manuscript. ERA is the corresponding author. 93 CHAPTER 4: AMPLIFICATION OF COL1A1, CHAD, AND PHB1 INCREASES BREAST CANCER METASTASIS 94 4.1 Preface This chapter represents the main focus of my thesis research. This chapter has been adapted into a first author original research article currently in progress. 95 4.2 Abstract Approximately one fifth of breast cancer patients’ tumors overexpress HER2 stemming from a focal amplification event on the long arm of human chromosome 17, leading to high copy number gains of the HER2 gene, ERBB2. To study the mechanisms of metastasis in HER2+ breast cancer, we utilize a transgenic mouse model which overexpresses Neu, the rat isoform of HER2, under control of the MMTV promoter, known as MMTV-Neu. Previously analyzed gene expression and WGS evidence from our lab suggested high expression of genes near cytoband 11qD. Examining the syntenic human region of 17q21.33, our group identified a similar amplification event present in 25% of HER2+ patients, 9% of all breast cancers, and contains a number of candidate genes that putatively mediate metastasis. Previous preliminary experiments examine the potential for these candidate genes in mediating metastasis, but we know little about how these genes are mediating metastasis. I hypothesize that upregulation of three genes in the 17q21 cytoband—collagen type 1 alpha 1 chain (COL1A1), chondroadherin (CHAD), and prohibitin 1 (PHB1)—act to increase migration and proliferation, thus increasing metastasis. To this end, I utilized computational, in vitro, and in vivo methods to examine how these genes affect each stage of metastasis. We identified the HER2+ BT-474 breast cancer cell line with confirmed amplification over the 17q21 cytoband; shRNA KD cell lines for COL1A1, CHAD, and PHB1 were engineered to test KD effects on metastasis. I found no difference in migratory capacity, invasive capacity, or growth rate both in vitro and in vivo. No statistically significant difference in response to PI3K/AKT was found, but altered signaling pathways indicate increased estrogen signaling, increased propensity for bone metastases, and patients with this amplicon are likely in the luminal B subtype. 96 4.3 Introduction ERBB2, the gene which produces the protein product HER2, is genetically altered in 15-20% of all breast cancer cases. Activating mutations in ERBB2 account for single digit percentages of genetic alterations in this region, but the vast majority of the genetic alterations over the 17q12 locus are. HER2 positivity in patient tumors is most often classified by IHC staining: IHC-0 finds zero to minimal staining of HER2 in <10% of biopsied tumor cells and is considered HER2-; IHC-1+ finds barely perceptible to faint staining of HER2 in >10% of biopsied tumor cells and is still considered HER2-, but with the recent advent of trastuzumab-deruxtecan treatment, it is now clinically relevant to distinguish IHC-1+ from IHC-0310,311; IHC-2+ finds weak to moderate staining of HER2 in >10% of tumor cells and is considered equivocal in HER2 positivity/negativity; and finally, IHC-3+ finds intense and complete staining of HER2 in >10% of tumor cells and is considered HER2+311. IHC-1+ and IHC-2+ may have additional studies, such as ISH of ERBB2 gene copies to look at amplification levels relative to CEP17 to aid in determining eligibility for HER2 targeted therapy311. ERBB2 is an RTK and the most frequently altered gene in the MAPK-RAS pathway in breast cancer, acting to broadly alter cell proliferation, cell survival, protein translation, migration, cell adhesion, and cell differentiation19,312,313. Only beginning in the 1980s did we begin to ascertain the relevance of HER2/Neu in cancer, as it became clear ERBB2 bore striking homology to the recently discovered EGFR314,315, which had a known role in phosphorylation and stimulating cell growth316. ERBB2 was soon demonstrated to be a potent oncogene, with its overexpression inducing NIH 3T3 cell transformation, resulting in anchorage independent cell growth and loss of contact inhibition317,318. At the same time, it was becoming increasingly evident 97 that ERBB2 played an important role in breast cancer development. In an initial investigation of 189 primary breast tumors, 30% of those showed amplification of ERBB2 and HER2 protein levels 2-fold to 20-fold higher than in non-amplified samples. Amplification of ERBB2 led to worse relapse free survival (RFS) and OS compared to patients without amplification of ERBB2, establishing ERBB2 amplification as a prognostic biomarker in breast cancer319. Shortly thereafter, a concerted effort by researchers and clinicians to identify therapeutics that can target ERBB2/HER2 ensued. Preliminary experiments found that addition of mAbs reactive to HER2 cause rapid and reversible internalization of HER2, reverting NIH 3T3 cells to their non-transformed state320. A promising mAb candidate, humAb4D5-8, later renamed trastuzumab, was found to strongly inhibit proliferation and promote antibody-dependent cellular toxicity in breast cancer cell lines expressing high amounts of HER2321. Trastuzumab binds tightly to domain IV, part of the extracellular domain of HER2, preventing both homodimerization and heterodimerization with EGFR and other HER family members322, thus preventing MAPK and PI3K signaling cascades, halting cancer progression323. Development of additional mAbs targeted against the extracellular domain of HER2324, TKIs to target the intracellular RTK domain of HER2325– 327, and ADCs that can target cells even with very low expression of HER291 all combined have dramatically increased the survival rate of HER2+ breast cancer patients over time313,328,329. However, HER2+ metastatic breast cancer is still almost invariably fatal330. While the main oncogenic driver of HER2+ breast cancer is amplification of ERBB2, there are likely many other accrued oncogenic drivers that remain untreated, contributing to the poor OS for HER2+ metastatic disease. 98 While overexpression of HER2 can cause transformation of cells in vitro, observing spontaneous HER2+ primary mammary tumors in an immunocompetent in vivo model can recapitulate human HER2+ breast cancer more closely. Indeed, GEMMs have been engineered expressing either activated Neu110 or endogenous Neu111 under control of the MMTV promoter/enhancer. MMTV-Neu models are able to create spontaneous mammary tumors capable of metastasizing to distant organs. These models have enabled researchers to study HER2+ breast cancer in vivo with metastasis in unprecedented detail. However, it has become increasingly relevant that cancer is a complex disease that is the result of multiple genetic alterations accrued over an extended period of time67,114,331. While over expression of HER2/Neu is sufficient for transformation of NIH 3T3 in vitro, the prolonged latency period for onset of tumors within the MMTV- Neu model suggests overexpression of Neu alone is not sufficient for tumorigenesis in vivo. Additional genetic alterations are accrued as the evolutionary and fitness landscape of the tumor and its microenvironment changes over time. Our group hypothesized that the MMTV-Neu mouse model of HER2+ breast cancer accrues additional genetic alterations over time, similar to human HER2+ breast cancer. Finding additional genetic drivers in HER2+ breast cancer while being able to test therapeutics in an immunocompetent mouse model would serve to further enhance our understanding and treatment of HER2+ breast cancer. Thus, we randomly chose three MMTV-Neu tumors for WGS sequencing to identify heterogeneous and conserved genetic alterations. Initial examination of WGS data by Jon Rennhack from MMTV-Neu GEMM tumors showed amplification over mouse cytoband 11qD, syntenic to human cytoband 17q21.33118. Indeed, expression of genes 99 in the 11qD cytoband was found to be upregulated in a majority of MMTV-Neu tumor samples tested118. In humans, amplification of the syntenic cytoband 17q21 is associated with higher breast cancer grade and poor prognosis332–335. However, my own reexamination of MMTV-Neu tumor WGS failed to recapitulate the previously found amplification over cytoband 11qD. In my own analysis, I find no amplification event over 11qD in any of the three tumors utilized for WGS (Figure 4.3A, Figure 4.3B, and Figure 4.3C). This is in contrast to my WGS analysis of MMTV-Myc microacinar tumors with large amplifications occurring over the entirety of chromosomes 11 and 15 (Figure 4.3D). I could also find no evidence of focal amplifications occurring over specific regions of 11qD in MMTV-Neu tumors. This is in contrast to finding focal amplification of Fgfr2 in the EMT histological subtype of MMTV-Myc tumors (Figure 4.3E). I do not dispute gene expression upregulation in the 11qD cytoband within MMTV-Neu tumors, but there is conflicting evidence as to the validity of gene copy number amplification over genes in this cytoband. However, the amplification over the syntenic 17q21.33 cytoband in human breast cancer is well documented in thousands of breast cancer samples and without dispute19,284—subsequently, 17q21.33 will be shortened to 17q21 for brevity. Indeed, other groups have come to very similar conclusions using array comparative genomic hybridization defining similar boundaries for a 17q21 amplicon, while also reporting additional genetic firestorm events elsewhere on chromosome 17336. Preliminary KO and KD of putative oncogenes in this cytoband in both mouse and human breast cancer cell lines found defects of migration and metastasis118. Identification of putative oncogenic drivers within the 17q21 cytoband amplification event spanning over 40 genes and 4 megabases began with filtering down to genes that 100 showed differential gene expression between breast tumors with or without the 17q21 cytoband amplification, limiting further analysis to 19 genes. Genes of interest were further limited to those that were differentially expressed in HER2+ breast cancer and showed tangible differences in clinical outcomes in HER2+ breast cancer patients in cBioPortal. Candidate genes were assessed for viability against the Achilles shRNA library to identify non-lethal drug targets337 and cross-referenced with genes that responded to high throughput patient derived xenograft drug screens for therapeutic vulnerabilities338. Final candidate gene selection occurred after an extensive literature search for previous implications in tumorigenesis. COL1A1 was identified as a putative oncogene within this cytoband due to mounting evidence that collagen is crucial part of the tumor stroma for limiting immune cell penetration, metabolic fuel for cancer cells, providing proliferative signals for cancer cells, and providing migration tracks for cancer cells to disseminate away from the primary tumor339–343. Cancer associated fibroblasts have received increased attention recently as the importance of collagen in the tumor microenvironment has become appreciated for patient disease prognosis344, but recent evidence shows that cancer cells themselves are able to produce oncogenic collagen variants343. CHAD was identified as another potential oncogene given its nature in facilitating adhesion between cell surface integrins and the extracellular matrix, particularly collagen345–347, which aids in the migration of cancer cells. PHB1 has been noted for its association with high breast cancer grade and poor clinical outcomes332, with PHB1 potentially playing a driving role in cancer progression as PHB1 is necessary for Ras activation by c-Raf in vivo348. PHB1 is evolutionarily conserved between species and is essential for cell survival, as it plays a critical role in biogenesis, bioenergetics, and 101 regulation of oxidative phosphorylation in the mitochondria348–351. There is substantial evidence from these data that COL1A1, CHAD, and PHB1 are putative oncogenes in this context, but we do not know mechanistically how these genes are affecting migration, metastasis, and other oncogenic functions. To this end, I sought to characterize the roles of COL1A1, CHAD, and PHB1 at each stage of metastasis using computational, in vitro, and in vivo methods. For in vitro and in vivo methods, I utilized shRNA KDs of COL1A1, CHAD, and PHB1 engineered into the BT-474 cell line—a HER2+ and HR+ cell line confirmed to have amplification of cytoband 17q21. The cell lines were engineered by VectorBuilder, characterized and validated internally, and then characterized and validated before experimental procedures. I also report integrated computational analysis of hundreds of HER2+ breast cancer patient samples with or without the 17q21 amplification event. Given the conflicting evidence over the presence of the 11qD amplification in MMTV-Neu mice, we focused solely on performing experimental procedures using the human cell line BT-474 as it recapitulates the 17q21 cytoband amplification, is HER2+, and is HR+, representative of the human breast cancer samples seen with 17q21 cytoband amplification. 4.4 Results 4.4.1 The COL1A1 amplicon worsens clinical outcomes in breast cancer patients Adverse clinical outcomes in patients harboring amplification of genes in the 17q21 cytoband is well documented in breast cancer333–335 and in neuroblastoma352–354. To examine the frequency of amplification of this region in breast cancer, I utilized breast cancer datasets available within cBioPortal212,214. Genetic alterations occur in ERBB2 in ~15% of all breast cancer cases while genetic alterations occur in COL1A1, CHAD, and 102 PHB1 in ~6% of all breast cancer cases (Figure 4.4.1A). Given the overlapping results of ERBB2 amplification and amplification of COL1A1/CHAD/PHB1, I sought to tease apart the mutual exclusivity of these events and how each individually might be contributing to clinical outcomes. ERBB2 resides in cytoband 17q12 while COL1A1, CHAD, and PHB1 reside near to each other in cytoband 17q21; each cytoband is often focally amplified independently of the other, but these amplifications often cooccur. The most common overlap is between COL1A1 and CHAD coamplification, while the least common is ERBB2 and CHAD coamplification (Figure 4.4.1B). All possible combinations of coamplification events are statistically significant (q<0.001) by Fisher exact test and Benjamini-Hochberg multiple hypothesis correction. Therefore, I sought to examine the 17q21 cytoband amplification independently of ERBB2 amplification, since ERBB2 amplification is well established in its role for increasing metastasis and worsening clinical outcomes in breast cancer319. I only included breast cancer patients with ERBB2 alteration excluding PHB1 alteration, patients with PHB1 alteration excluding ERBB2 alteration, or patients that lacked genetic alterations to either ERBB2 or PHB1—PHB1 alteration is representative of the 17q21 cytoband as a whole in this analysis, as examining multiple genes within the 17q21 cytoband independently and simultaneously would eliminate >90% of 17q21 amplified patients from consideration in the analysis. When interrogated, I found median months RFS for breast cancer patients without alteration to ERBB2 or PHB1 was 252.27 months, dropping to 151.77 months for patients with only altered ERBB2, and dropping further to 98.20 months for patients with only altered PHB1 (Figure 4.4.1C). This represents a hazard ratio of 1.41 (95% CI: 1.15-1.74) for ERBB2 alteration relative to unaltered and a hazard ratio of 1.68 (95% CI: 1.07-2.63) 103 for PHB1 alteration relative to unaltered. A similar trend is observed for OS, where median months OS was 163.17 months for unaltered patients, 141.73 months for patients with altered ERBB2, and 114.23 months for patients with altered PHB1 (Figure 4.4.1D). Patients with ERBB2 alterations have a hazard ratio of 1.27 (95% CI: 1.11-1.45) relative to the unaltered cohort while patients with PHB1 alterations have a hazard ratio of 1.31 (95% CI: 0.95-1.80) relative to the unaltered cohort. These data clearly show a strong association between amplification of PHB1 and poor clinical outcomes independent of ERBB2 status. However, causal determinations cannot be made from these data. It is possible PHB1 amplification correlates highly with genetic alteration of the true oncogenic driver. Mechanistic studies that attempt to identify the causal relationship between 17q21 amplification and oncogenic traits will be explored in later figures. However, many of the human samples examined at the genetic level do not have histopathology available for tumors. In our previous work, we found that MMTV-Neu tumors that have high expression of Col1A1 also show increased collagen staining in IHC118. I hypothesized that we would similarly observe increased Chad and Phb1 staining in MMTV-Neu tumors with increased gene expression of 17q21 cytoband genes. There is additional evidence that PHB1 is critical for maintaining angiogenic and migratory capacity of endothelial cells that comprise blood vessels355. Angiogenesis is one of the hallmarks of cancer8, so I additionally hypothesized that increased Phb1 staining in MMTV-Neu tumors would correlate with increased Cd31 staining, as Cd31 is a marker of endothelial cells. 104 4.4.2 Staining of Col1A1, Chad, or Phb1 does not correlate with differing structural characteristics of MMTV-Neu tumors To look at the relationship between Cd31 and Col1A1/Chad/Phb1 gene expression impacts on histopathology, I sought to examine associations between Col1A1, Chad, Phb1, and Cd31 IHC in a collection of MMTV-Neu tumors with high gene expression of 17q21 cytoband genes. 15 historical MMTV-Neu samples were tested for gene expression of Col1A1, Chad, Phb1, and Cd31 using quantitative real time polymerase chain reaction (qRT-PCR). MMTV-Neu tumors for the highest Phb1 gene expression (tumor 503), middle Phb1 expression (tumor 281), and the lowest Phb1 gene expression (tumor 842) of the 15 MMTV-Neu tumors examined underwent IHC for Col1A1, Chad, Phb1, and Cd31, with representative images from those samples shown (Figure 4.4.2). Heterogeneous spatial localization of proteins was seen between tumors. Tumor 281 saw weak but dispersed Col1A1 staining, while tumors 503 and 842 saw large, aligned deposits of collagen running through central regions of each tumor. Interestingly, Chad displayed disperse punctate staining in tumors examined. Tumor 281 has weak punctate staining of Chad, while tumor 503 has little to no staining for Chad, in sharp contrast to tumor 842 which realizes strong punctate staining for Chad. Phb1 expression appears more diffuse throughout all tumors but is more apparent in certain regions, such as in the region shown for tumor 503. Tumor 842 had some of the strongest staining for Col1A1 and Chad, but also had noticeably lesser staining for Phb1. No trend or correlation could be identified between Col1A1, Chad, or Phb1 expression and localization. Cd31 staining of endothelial cells appeared roughly similar between all tumors examined both in terms of overall staining and spatial organization. 105 Even if trends or correlations were seen between Col1A1, Chad, Phb1, and Cd31 staining in MMTV-Neu tumors, this could not establish causality between high 17q21 cytoband gene expression and changes to the histopathology of the tumor, but merely provide further correlative evidence if a later causal relationship is shown to exist. Therefore, I sought to ablate gene expression of COL1A1, CHAD, and PHB1 in a human breast cancer cell line to determine causal effects on tumor cell behavior and characteristics. 4.4.3 Knockdown of COL1A1, CHAD, or PHB1 in the BT-474 cell line does not affect proliferation in vitro BT-474 is human breast cancer cell that is HER2+ and has an amplification event over cytoband 17q21356,357, making it a good candidate for gene KD of COL1A1, CHAD, and PHB1 as this is the most common genetic profile seen in human breast cancer patients with 17q21 cytoband amplification19. Custom shRNA vectors for COL1A1, CHAD, PHB1, and a universal scramble control were stably transduced into BT-474 populations using lentivirus within 3 passages after obtaining the cell line from ATCC. I chose a shRNA gene KD approach as CRISPR KO of genes becomes increasingly lethal to cells the more gene copies there are358, activating the p53-mediated DNA damage response pathway359. Previous work in the lab to generate COL1A1, CHAD, and PHB1 CRISPR KO in BT-474 wasn’t able to be achieved, likely because of CRISPR induced lethality for cutting multiple times in the genome for each gene due to amplification358. Gene KD in the BT-474 cell line with 17q21 amplification was chosen instead of gene KD in a cell line diploid for 17q21 as the clinical outcome data suggest overexpression of genes with the 17q21 cytoband increase metastasis and decrease overall survival. I 106 hypothesized that gene expression of COL1A1, CHAD, and PHB1 after KD in the BT-474 cell line would be comparable to expression of these genes in cell lines diploid for 17q21 genes, potentially reverting to the less metastatic phenotype. Stable pooled populations of BT-474 WT, BT-474 Scramble KD, BT-474 COL1A1 KD, BT-474 CHAD KD, and BT- 474 PHB1 KD were single cell filtered and plated in cell culture in complete DMEM. shRNA pooled populations was selected for using puromycin over 3 days. Media was replaced at least every 72 hours until BT-474 cell line populations reached 80-90% confluence. RNA was extracted from each cell line and quantified using single step qRT- PCR. Gene KD of COL1A1 (Figure 4.4.3A), CHAD (Figure 4.4.3B), and PHB1 (Figure 4.4.3C) are reported for each respective cell line relative to the WT and scramble controls. Statistically significant gene KD was reported in all shRNA KD lines relative to WT and scramble controls, with mean gene KD expression levels between 20-25% of WT and scramble expression for COL1A1, CHAD, and PHB1. This decrease in gene expression levels is commensurate with naturally occurring differences in gene expression levels seen for human breast cancer patients with or without the 17q21 cytoband amplification284. Having confirmed gene KD for COL1A1, CHAD, and PHB1 in these cell lines, I first wanted to examine if there were any differences in proliferation caused by gene KD. To answer this, I plated ~50,000 single cell filtered cells into each well in 6-well plates for each cell line. I counted the number of cells in each well at 24-hour timepoints with three replicates for each timepoint for 8 days after initial plating. I found no statistically significant difference in the number of cells at any timepoint between the WT or scramble lines and the COL1A1, CHAD, or PHB1 cell lines (Figure 4.4.3.D), indicating no difference in proliferation between cell lines. While no difference in proliferation between 107 cell lines was found, previous preliminary evidence found that gene KO of COL1A1, CHAD, or PHB1 in the NDL2-5 cell line led to migration defects118. 4.4.4 BT-474 has poor migration and invasive capability and knockdown of COL1A1, CHAD, or PHB1 does not affect these capabilities To examine the migration and invasion capabilities of cells with amplified 17q21 genes or gene KDs, I plated 100,000 single cell filtered cells from the BT-474 WT, BT- 474 Scramble KD, BT-474 COL1A1 KD, BT-474 CHAD KD, and BT-474 PHB1 KD cell lines into 8 µm transwell membranes for migration analysis or 8 µm transwell membranes with solidified Matrigel for invasion analysis360. To ensure proliferation was not a contributing factor in analysis, I used 24-hour and 72-hour timepoints for measuring migration and invasion. Representative images taken at 4x magnification of crystal violet stained cells shows very little migration at 24-hours for all cell lines, with highly variable differences in migration between cell lines at 72-hours (Figure 4.4.4A). No cells were detected in either the 24-hour or 72-hour timepoints for any of the invasion analyses. Quantification of the number of cells that had migrated/invaded to the bottom of the membrane reveal no statistically significant differences between any of the different cell lines (Figure 4.4.4B). While no direct migration or invasion differences were seen, potential downstream transcriptional changes from gene KD could render one of these cell lines more susceptible to therapeutic intervention. 4.4.5 Inhibition of the PI3K-AKT pathway does not selectively affect BT-474 KD cell lines KD of COL1A1, CHAD, or PHB1 may introduce vulnerabilities into BT-474 cells, making them either dependent on already elevated oncogenic pathways or forcing them 108 to shift to alternative oncogenic pathways. To find potential gene expression pathway perturbations as a result of gene copy number differences, I utilized gene expression data from TCGA METABRIC284 limited to two different patient groups: one group has ERBB2, COL1A1, CHAD, and PHB1 amplification, while the other group only has ERBB2 amplification. I performed differential gene expression analysis between these two groups and found PHB1 to be the most upregulated gene in the ERBB2/COL1A1/CHAD/PHB1 amplified group (Figure 4.4.5A). PHB1 is an evolutionarily conserved and essential gene often implicated in cancers332,348,351,361, but there are no confirmed therapeutics that can target PHB1 specifically. To find perturbagens that are associated with changes in this gene expression signature, I utilized Connectivity Map362 queries on the 50 most upregulated and 50 most downregulated genes from the differential expression analysis. Top perturbagens that can preferentially target the 17q21 amplified cytoband had normalized connectivity score and q-value plotted (Figure 4.4.5B). Many potentially statistically significant perturbagens were identified, but the third top perturbagen, A- 443654, stood out to us for several reasons. A-443654 is a top perturbagen from Connectivity Map and is known to have a selective and strong inhibition of the AKT pathway363. PHB1 is known to have an important role in the PI3K/AKT pathway and the MAPK/ERK/RAS pathway348,350 while also being the most differentially expressed gene overall in these two patient populations. When I examine total protein AKT levels and phospho-AKT levels by western blot of all cell lines, I find that the BT-474 PHB1 KD cell line does not have different total levels of AKT compared to any of the other cell lines, but has significantly less phospho-AKT levels than the other cell lines (Figure 4.4.5C). I hypothesized that A-443654 drug treatment will more selectively target the BT-474 PHB1 109 KD than other BT-474 cell lines. Contrary to my hypothesis, I found that A-443654 had no statistically significant difference on cell survival between all cell lines tested at any concentration (Figure 4.4.5D). It is possible none of the BT-474 cell lines accurately recapitulate the gene expression pathways of 17q21 cytoband amplified human breast cancer patients in vitro. BT-474 cell lines may better recapitulate human 17q21 cytoband amplified breast cancer pathways in vivo. At the very least, orthotopic mammary tumors generated from BT-474 cells could be used for RNAseq to provide global gene expression pathway differences initiated by gene KD. 4.4.6 Cell line growth rates are not statistically significantly different in vivo, but all show development of contralateral mammary gland metastases To this end, I performed orthotopic mammary fat pad injections of 1x106 single cell filtered BT-474 WT, Scramble KD, COL1A1 KD, CHAD KD, and PHB1 KD cells into the right #4 mammary gland of BALB/c CAnN.Cg-Foxn1nu/Crl immunodeficient female mice between 8-10 weeks in age with 8 mice per group. Tumors became palpable in most mice by three weeks post injection. As of 97 days post injection, only one mouse in the BT-474 PHB1 KD cohort has been dropped from the study due to health issues. All cohorts of mice slowly developed primary tumors over time, peaking in volume 69 days post injection, but then regressing in almost all cases between day 69 and day 97 (Figure 4.4.6A). No statistically significant differences in growth rates between cell lines were observed in vivo up to an including post tumor regression. Interestingly, at the same time regression was noticed, nodules became apparent in the left #4 mammary gland of a majority of the mice within the study. While this cannot be confirmed until necropsy, it is 110 possible these solid nodules represent contralateral mammary gland metastases. Indeed, volumes of the nodules in the contralateral mammary gland of these mice are often on par if not larger than the tumors present at the orthotopic injection site. To test whether gene KD had effects on metastasis, independent of primary tumor development, I performed tail vein injections of BT-474 WT, Scramble KD, COL1A1 KD, CHAD KD, and PHB1 KD into 16 BALB/c nude mice per condition, 8-10 weeks in age. Mice were necropsied at 9 weeks post injection. Lungs were perfused and fixed in 10% neutral buffered formalin (NBF). After sectioning, H&E staining, and microscope assisted imaging, samples largely showed normal lung physiology with branching bronchi, bronchioles, blood vessels, and alveolar ducts clearly present in every section (Figure 4.4.6B). Only two tumors were seen between all 80 samples examined—a large, densely packed acinar tumor in BT-474 WT sample 1 (Figure 4.4.6C), and a smaller, more loosely packed acinar tumor in BT-474 CHAD KD sample 11 (Figure 4.4.6D). Statistically significant differences cannot be concluded from these limited data. 4.4.7 Patients with the COL1A1 amplicon have increased metastasis gene expression signatures and fall into the luminal B subtype of breast cancer Despite the lack of conclusive evidence from gene KD of COL1A1, CHAD, and PHB1 in the BT-474 cell line both in vitro and in vivo, the 17q21 cytoband amplification is likely to have impact on clinical outcomes in human breast cancer patients. To explore the signaling pathway differences between patients with ERBB2, COL1A1, CHAD, and PHB1 amplification compared to ERBB2 amplification alone, I used unsupervised hierarchical clustering on the TCGA METABRIC breast cancer gene expression dataset limited to PAM50 genes. I found that patients largely reside within the HER2+ intrinsic 111 subtype and the luminal B subtype (Figure 4.4.7A). To examine differences in signaling pathways more quantitatively, I employed GSEA on total TCGA METABRIC gene expression data, comparing ERBB2/COL1A1/CHAD/PHB1 amplified patients to ERBB2 amplified patients. Indeed, I find the 17q21 amplified population gene expression data has the 17q21_q25 amplicon signature as its most highly upregulated C2 gene set (Figure 4.4.7B). Confirming the visual association from the cluster heatmap, I find a strong luminal B signature present in the 17q21 amplified population (Figure 4.4.7C). Consistent with the clinical observations observed from breast cancer datasets19, patients with 17q21 amplification have a higher propensity for metastasis than patients without (Figure 4.4.7D). While at first unexpected but corroborating the increased luminal B signature, I find that patients with 17q21 cytoband amplification are likely to highly express estrogen as noted by increased tamoxifen sensitivity (Figure 4.4.7E) and both a C2 curated and hallmark gene set implicating increased estrogen signaling (Figure 4.4.7F, G). Overinterpretation is cautioned against, as this dataset has been intentionally limited to only patients with ERBB2 amplification. When comparing 17q21 cytoband amplification patients to all other breast cancer patients, canonical HER2+ signatures tend to dominate (data not shown). Since 25% of HER2+ breast cancer patients have an additional 17q21 cytoband amplification, much more prevalent than the single digit percentages of all breast cancer patients possessing a 17q21 cytoband amplification, I limited our in-depth analysis to only HER2+ patients. 4.5 Discussion In this work, I employed computational gene expression analysis and gene KD of COL1A1, CHAD, and PHB1 for in vitro and in vivo experiments designed to probe what 112 stages of metastasis does the 17q21 cytoband affect and through what pathways it affects metastasis. To this end, I deduced that HER2+ patients with the 17q21 cytoband amplification are much more likely to be HR+ and have more metastasis. Patients with PHB1 amplification, and thus having amplification over cytoband 17q21, fare just the same or worse in RFS and OS than patients with ERBB2 amplification. From our use of engineered BT-474 COL1A1, CHAD, and PHB1 KD cell lines, I found that KD of PHB1 lowers the amount of phospho-AKT present within the cells without reducing the total amount of AKT present, indicative that PHB1 is indeed directly contributing to AKT signaling. However, I was unable to make any definitive conclusions regarding stage of metastasis altered by COL1A1, CHAD, or PHB1. I observed no differences in growth rates between gene KD and WT or scramble control, both in vitro and in vivo. There was no observable difference in migration or invasion between cell lines, mostly due to BT- 474’s known poor basal level of migration and invasive capabilities364. No difference was observed between cell line responses to the AKT inhibitor A-443654, despite it being a highly predicted effective inhibitor. It has been observed by others that HER2+ breast cancers can switch oncogenic driver pathways in response to therapy365, which may be the case in this circumstance. BT-474 may be able to effectively switch any dependence on the PI3K/AKT pathway to another oncogenic pathway like MAPK in response to AKT inhibition. Unexpectedly, the scramble KD control and CHAD KD also showed increased phospho-AKT relative to total AKT and vinculin loading controls by western blot. Lack of upregulation in the COL1A1 KD line suggests this is not a transfection induced artifact. A possible explanation for increased phospho-AKT in the scramble and CHAD KD cell lines 113 may lie in non-specific off target effects of the shRNA, or differences in confluency of each cell line at time of protein extraction. Increased cell density in scramble or CHAD KD may have increased focal adhesion kinase signaling through integrin mediated signal transduction, with PI3K being a direct response factor to focal adhesion kinase366. According to The Human Protein Atlas, BT-474 oncogenic signaling pathways significantly upregulated relative to over 1,000 different pan-cancer cell lines tested in order of highest to lowest z-score PROGENy signatures367 are: estrogen signaling (2.249), VEGF signaling (0.479), androgen signaling (0.358), PI3K signaling (0.269), p53 signaling (0.075), and hypoxia signaling (0.058)368. It is worth noting that most breast cancer cell lines examined have downregulated PI3K signaling relative to most other pan- cancer cell lines while BT-474 has a slightly upregulated PI3K signaling score. However, BT-474 estrogen signaling is more upregulated than estrogen signaling in other breast cancer cell lines by a significant margin368. It could be the case that despite amplification over the 17q21 cytoband and increased gene expression from this region, estrogen signaling is nearly the sole driver of oncogenic signaling and proliferation in BT-474. RNAseq on orthotopically injected cell lines forming mammary tumors was planned to ascertain any differentially regulated pathways between BT-474 cell lines, but unexpectedly, virtually all tumors began shrinking after 69 post injection timepoint, precluding RNAseq. Initial palpation of established tumors approximately three weeks post injection suggests that BT-474 cells injected into the mammary gland were viable. BT-474 is tumorigenic in nude mice but is known to grow exceptionally slowly and is prone to spontaneous regressions369. Others have reported that BT-474 grows more rapidly in the extremely immunodeficient NSG strain of mice370. Nude BALB/c female mice were 114 used in this study, which still retain natural killer cells, B cells, neutrophils, dendritic cells, and macrophages. I propose that immune infiltration within the BALB/c nude strain is likely the cause of tumor regression. Future directions could include extracting tumor at initially palpable time points to stain for presence of different immune cells by IHC. However, it is possible initial palpations of tumors included significant inflammation swelling that contributed to tumor size. While initial tumor palpations were believed to be evidence of viable cells, if these palpable regions were indeed fluid accumulation due to inflammation, that may support spontaneous tumor regressions beginning around day 69 post injection, with immune cells infiltrating the mammary gland to clear sheared cellular debris. Possible sources of cell line viability reduction may stem from over trypsinization, single cell filtering through 70 µm cell filters prior to suspension in PBS, and even shearing during injection. Further supporting evidence of immune infiltration causing tumor regression is found in the The Human Protein Atlas CytoSig371 scores for BT-474, showing that the highest cytokine signaling detected in BT-474 is that of interleukin 4, with a z-score of 4.33 when compared to pan-cancer cell lines. Interleukin 4 is responsible for macrophage differentiation, inhibiting migration of neutrophils, inducing interferon γ expression by natural killer cells, B cell proliferation and major histocompatibility II expression, and many other immunogenic activities372. Future in vivo experiments with BT-474 may be best suited for use with NSG mice, which further lack B cells and NK cells. Additional positive control (e.g. MDA-MB-231) and negative control cell lines (e.g. 293T) should be used alongside BT-474 injections to control for experimental procedures performed to cell lines. However, previous preliminary work from the lab was able to generate BT-474 orthotopic 115 mammary tumors in nude BALB/c mice118, although no metadata regarding tumor growth rates were recorded. Even so, BT-474 is poorly migratory and possessed no invasive capability by 72 hours when tested in vitro. It is important to note that experiments performed herein using the BT-474 cell line were done using a new stock obtained from American Type Culture Collection and did not use the same BT-474 cells used in previous preliminary experiments within the lab118. Altogether, these data suggest that different HR+/HER2+ cell lines should be used for further study. I could not identify other HR+/HER2+ established cell lines that also have the 17q21 cytoband amplification, despite its notable prevalence in human breast cancer. However, KD or KO of the 17q21 cytoband genes in breast cancer cell lines that do not have the 17q21 amplification may be used to compare dependency on these genes for oncogenic growth. In summary, I find further computational evidence supporting the 17q21 cytoband amplification in breast cancer, particularly HER2+ breast cancer, as playing an important role in metastasis and clinical outcomes. However, I was unable to mechanistically confirm why the 17q21 cytoband amplification increases metastasis or worsens clinical outcomes. Further studies are needed to identify specific mechanisms and pathways associated with differences in clinical outcomes. 4.6 Materials and methods 4.6.1 Oncoprint, upset plot, and KM curves The oncoprint in Figure 4.4.1A was generated using all available breast cancer studies as of November 2024 within cBioPortal in the oncoprint tab. The upset plot was generated using a custom Python script from mutual exclusivity data obtained from cBioPortal for all available breast cancer studies comparing ERBB2, COL1A1, CHAD, 116 and PHB1. KM curves were generated in cBioPortal from the comparison/survival tab using all available breast cancer studies, comparing nonoverlapping patients with ERBB2 amplifications, patients with PHB1 amplifications, and no amplification of either gene for both RFS and OS. 4.6.2 WGS scatterplots WGS scatterplots shown in Figure 4.3 were generated using previously processed MMTV-Neu and MMTV-Myc tumor bam files373. Plots were generated using the normal ‘batch’ method of CNVkit239 to process WGS bam files and the ‘scatterplot’ function was used to generate the WGS scatterplots for each bam file. 4.6.3 Microscopy imaging Microscope images of crystal violet stained transwell membranes, IHC sections, and H&E sections were taken on an Olympus BX41. Reference bars are included in the bottom left of each image for measurements. Light and focus were adjusted for each image to achieve optimal clarity. 4.6.4 Cell culture All cell culture was performed in a BSL-2 rated biosafety cabinet. BT-474 were cultured in high glucose DMEM (Millipore Sigma D6429) supplemented with 10% FBS (Life Technologies A5256701) and 1% antibiotic-antimycotic (Life Technologies 15240062). Cells were cultured at 37 °C and 5% CO2 in sterilized incubators with ample autoclaved reverse osmosis treated water in the water pan to maintain humidity. Cell lines were tested using a mycoplasma PCR detection kit (ab289834) and were all found to be mycoplasma negative before use in experiments. 117 4.6.5 Cell line generation shRNA cell lines were created by VectorBuilder. The human COL1A1 shRNA target sequence is: GGCAAGACAGTGATTGAATAC. The human CHAD shRNA target sequence is: ACGCTGAAACACGTCCATTTG. The human PHB1 shRNA target sequence is: CCGTGGGTACAGAAACCAATT. The scramble shRNA target sequence is: CCTAAGGTTAAGTCGCCCTCG. All cell lines used except for BT-474 WT are puromycin resistant. Pooled engineered cell lines were selected for with 2 µg/mL puromycin and then cultured in 0.5 µg/mL puromycin to maintain selection for shRNA populations. Puromycin treatment ceased 1 week in advance prior to experimental procedures such as RNA extraction, migration or invasion experiments, orthotopic injection, and all other experiments. The basic vector map used in each cell line is available (Figure S 4.6). All cell lines were of similar confluency when used for experimental procedures. 4.6.6 RNA extraction RNA was extracted from MMTV-Neu tumors and BT-474 cell lines using the QIAGEN RNeasy Midi Kit (#75144) according to the manufacturers protocol for tissue or cell line RNA extraction when appropriate. Homogenization of solutions prior to RNA extraction was accomplished via a Fisher handheld homogenizer. RNA was collected in DNase and RNase free water, diluted to a concentration 1 ng/µL, and stored at -80 °C in aliquots to avoid freeze-thawing. 4.6.7 Protein extraction Approximately 600 µL of NP-40 cell lysis buffer (50 mM Tris, pH 7.4, 250 mM NaCl, 5 mM EDTA, 50 mM NaF, 1 mM Na3VO4, 1% NP-40, 0.02% NaN3) was used per 1.0x106 cells for protein extraction of BT-474 cell lines, followed by gentle mixing for 1 hour at 4 118 °C to lyse cells, and centrifugation at 10,000 G to pellet debris. Sodium orthovanadate (Na3VO4) from NEB (P0758S) and 1 tab of cOmplete™, Mini, EDTA-free Protease Inhibitor Cocktail from Roche (11836170001) was added directly to 10 mL of NP-40 lysis buffer directly before use. Tumor tissue was ground using mortar and pestle under liquid nitrogen, where 1 mL of NP-40 lysis buffer was then added per 0.1 mg of tumor tissue. Gentle rotation/mixing occurred for 1 hour at 4 °C to lyse cells, and then centrifuged at 10,000 G to pellet debris. Aliquots of 30 µg of protein had Laemmli SDS 6x reducing sample buffer from Themo Fisher (J61337.AD) added at the appropriate volume, boiled at 100 °C for 10 minutes, and then stored at -80 °C before use in single use aliquots. 4.6.8 Quantitative real time PCR qRT-PCR was performed using the Stratagene Mx3000p on the SYBR Green setting. The Luna® Universal One-Step RT-qPCR Kit from NEB (E3005L) was utilized for all qRT-PCR reactions. qRT-PCR reactions followed the manufacturers suggestions and used 1 ng total of starting RNA for each reaction. All WT and scramble reactions were done in octuplicate, while KD cell line values were done in quadruplicate. Relative fold values for reporting by bar chart were obtained using the Delta-Delta Ct method within Microsoft Excel374. qRT-PCR primers were obtained from PrimerBank375 and previously validated for use with each gene. COL1A1 primers are forward 5’- GTGCGATGACGTGATCTGTGA-3’ and reverse 5’-CGGTGGTTTCTTGGTCGGT-3’. CHAD primers are forward 5’-CGCGGCCTCAAGCAACTTA-3’ and reverse 5’- TAGGTCAGCTCGGTCAGGTC-3’. PHB1 primers are forward 5’- GACCACGTAATGTGCCAGTCA-3’ and reverse 5’-CATCATAGTCCTCTCCGATGCT-3’. GAPDH primers are forward 5’-GGAGCGAGATCCCTCCAAAAT-3’ and reverse 5’- 119 GGCTGTTGTCATACTTCTCATGG-3’. Primer efficiency was tested using a standard curve of RNA from MMTV-Neu tumors samples from 10 ng total RNA to 1 pg total RNA in each reaction. Primer efficiency was found to be between 90-110% for each primer pair before quantitation of RNA in BT-474 cell lines. All primers were ordered from Integrated DNA Technologies. 4.6.9 Western blotting Gels used are 10% Mini-PROTEAN® TGX™ Precast Protein Gels, 10-well, 30 µl #4561033 from Bio-Rad. Gels were run in SDS running buffer at 30V until samples passed the stacking layer, then run at 70V in the separating layer. Approximately 30 µg of protein for each sample was loaded into each well. Wet transfer overnight at 4 °C in methanol containing transfer buffer was used. Polyvinylidene fluoride (PVDF) membrane used in transfer is cut to size from Alkali Scientific BrightStar membrane (#XR730). Primary antibodies used include CST Vinculin (E1E9V) XP® Rabbit mAb (#13901), CST Akt (pan) (11E7) Rabbit mAb (#4685), and CST Phospho-Akt (Ser473) (D9E) XP® Rabbit mAb (#4060). All primary antibodies were diluted 1:1,000 in tris buffered saline (TBS) + 5% bovine serum albumin (BSA) and incubated on membranes overnight (12+ hours) at 4 °C. Secondary fluorescent antibody staining was accomplished using a 1:10,000 dilution of LiCor IRDye® 800CW Goat anti-Rabbit IgG Secondary Antibody (926-32211) in TBS + 5% BSA + 0.1% Tween20 for 1 hour at ambient temperature (22 °C). Blots were dried prior to imaging on a LiCor ODYSSEY® M Imaging System using the 785 and 685 nm laser light sources for fluorescent excitation. Green and red channels were converted to black and white using the LiCor acquisition software provided with the imager. 120 4.6.10 In vitro growth curve Approximately 50,000 single cell filtered BT-474 WT, scramble KD, COL1A1 KD, CHAD KD, and PHB1 KD cells were plated in 6-well plates. Cells were washed in warmed sterile phosphate buffered saline (PBS) twice, then lightly trypsinized with 1% trypsin for 5-10 minutes until cell detachment. Cells were added to 15 mL conical tubes, then centrifuged at 500 G for 5 minutes at ambient temperature, the supernatant aspirated, and then the cell pellet resuspended in 1 mL of PBS. 10 µL of each resuspension was mixed with 10 µL of trypan blue and measured in duplicate on ThermoFisher Countess 2 automated cell counter slides, with average cell counts reported for each well. Triplicates of each cell line for each time point were performed for statistical power. Cell counts were normalized relative to the initial plating number of cells for each cell line and plotted in Python using the matplotlib package. 4.6.11 Transwell migration and invasion assays Corning 8 µm BioCoat polycarbonate transwell control inserts (#354578) were used for both migration and invasion assays. Invasion assays additionally had 75 µL of Corning Matrigel (#354234) diluted 1:10 with PBS added to the top of the membrane and allowed to solidify for 2 hours at 37 °C and 5% CO2. 100 µL of each cell line at a concentration of 1x107 cells/mL was added to the top of each membrane. Experimental procedures for both migration and invasion assays were performed according to the following protocol376. Time points were 24 hours and 72 hours for both migration and invasion assays. Each cell line for each timepoint was done in triplicate for migration and invasion. Python was used to create the bar plot showing aggregate cell counts. Cell 121 counts were manually verified under microscope given the small number of cells present in each membrane, if any at all. 4.6.12 Volcano plot The volcano plot in Figure 4.4.5 was made using a custom Python script. TCGA METABRIC gene expression z-scored data284 was downloaded, limited to only patients with ERBB2 amplifications, and then split into groups where patients either had COL1A1, CHAD, and PHB1 amplified, or none of the three amplified. Differential gene expression analysis was then performed on these data and plotted using a custom Python script. The false discovery rate (FDR) q-value significance cutoff is 0.05, with q-value calculated by Bonferroni multiple hypothesis correction of p-values. 4.6.13 Connectivity map plot Following from the differential gene expression analysis, I took the top 50 upregulated genes from the 17q21 cytoband population and the 50 most downregulated genes from the same population and applied these to the Connectivity Map362 query web tool. Drug candidates identified to preferentially treat the 17q21 upregulated gene signature are plotted in a volcano plot format similar to the differential gene expression. 4.6.14 A-443654 dose response curve 10,000 cells of each respective cell line were plated in 96-well white opaque microplates (Thermo Scientific #15042) and allowed to attach overnight in complete DMEM at 37 °C and 5% CO2. The following day, media was aspirated and replaced with varying concentrations of A-443654 from 1 mM to 1 fM. Column A of the microplate was complete DMEM alone with no cells, column B was 10,000 cells with complete DMEM and no drug or vehicle. The remaining columns of each microplate were split between 122 using vehicle (DMSO) or A-443654 at varying concentrations mentioned. Treatments were performed over 72 hours, where 100 µL of ambient temperature Cell Titer Glo 2.0 (Promega #G9241) was added to each well containing 100 µL of media. Plates were gently rotated at ambient temperature for 10 minutes to homogenize the signal within each well. Luminescence was then recorded using a Tecan Spark microplate reader between 30 minutes up to 1 hour after addition of Cell Titer Glo 2.0 reagent. Values had average negative control/background luminescence subtracted (Column A), then normalized to positive control average (Column B). A-443654 treated wells had luminescence values then divided by respective DMSO treated wells to normalize to DMSO effects on cell viability. Percent viability is then converted to percent response. Mean and standard deviation were plotted using matplotlib in Python with a fitted logistic curve. Negative control values were done in replicates of 8, positive control values were done in replicates of 8, and all remaining values were done in replicates of 4. 4.6.15 In vivo orthotopic BT-474 injection mammary tumor growth curve 1x106 cells from each cell line were single cell filtered and then orthotopically injected into the right mammary fat pad of nude female BALB/c mice in 100 µL of PBS, with 8 mice per BT-474 cell line used (40 mice total). Tumors were palpable 20 days post injection. Measurement of tumors was done using a Chicago Brand 6” Stainless Vernier Caliper (#50011). Length and width measurements were taken at each timepoint with volume computed using an ellipsoid formula (4/3 * Pi * Length * Width2) due to difficulty in measuring depth of tumors. Timepoints were initially taken one week apart, but due to the slow growth of the tumors timepoints were extended to two weeks apart to reduce oscillation of tumor volume averages by noise captured at each timepoint. As of the time 123 of this recording, only one mouse (#39, PHB1 KD) had to be euthanized early due to health concerns from an ulcerating metastasis near the left axillary lymph node. 4.6.16 In vivo tail vein injection of BT-474 cell lines 1x106 single cell filtered cells from each cell line was injected into the tail vein of nude female BALB/c mice suspended in 100 µL of PBS, with 16 mice used per cell line (80 mice total). Mice were euthanized following IACUC protocols using CO2 and subsequently necropsied at 9 weeks post injection. Immediately post euthanasia, lungs were perfused with 10% NBF at a rate of no more than 200 µL per second through the trachea using a 25 G needle until back pressure is detected, up to 1 mL total377. After back pressure is detected, surgical string is used to tie off the trachea, and lungs are excised and allowed to fix in ~10-20 volumes of NBF for 4 days. Subsequently, the fixed lungs are moved to 50% ethanol for intermediate storage. Lungs are then sectioned and formalin fixed, paraffin embedded (FFPE). 4.6.17 Unsupervised hierarchical clustering TCGA METABRIC z-scored gene expression data and intrinsic subtype classification of samples were limited to only PAM50 genes. The Python seaborn clustermap function was used to generate the plot. Data was normalized across rows and down columns prior to plotting values. 4.6.18 Gene set enrichment analysis TCGA METABRIC z-scored gene expression data was limited to ERBB2 amplified patients either with or without COL1A1/CHAD/PHB1 amplified. GSEApy378 was used on gene expression data for MSigDB C2 or Hallmark gene sets. GSEA enrichment plots are automatically generated for top differentially regulated pathways for each gene set. 124 4.6.19 Statistical considerations Unless otherwise specified, statistical analyses were performed in python using a two-tailed student’s t-test for determination of significance for comparisons of two groups. For comparison of multiple groups simultaneously, analysis of variance (ANOVA) was performed with Tukey’s range test performed if significant differences were found with ANOVA to identify which specific groups within the multiple comparison test are significantly different. 4.6.20 Immunohistochemistry Primary antibodies used include Cell Signaling Technology (CST) COL1A1 (E8F4L) XP® Rabbit mAb #72026, Novus Biological Chondroadherin Rabbit Polyclonal NBP1-87031, CST PHB1 (E8R3V) Rabbit mAb #69683, and Anti-CD31 (Ms,Sw) from Rat (SZ31) for mouse FFPE tissue #DIA-310. Sections placed on charged slides and dried at 56°C overnight. Slides were subsequently deparaffinized in Xylene and hydrated through descending grades of ethyl alcohol to distilled water and placed in Tris Buffered Saline pH 7.4 (Scytek Labs – Logan, UT) - 5 minutes for pH adjustment. Heat Induced Epitope Retrieval (Scytek) was performed in microprocessor-controlled pressure cooker 125°C – 30 seconds followed by a cool down to 90°C OR vegetable steamer at 100°C – 30 minutes followed by a cool down for 10 minutes at room temperature, removed rinsed several times in distilled water. Endogenous peroxidase block - 30 minutes at 25°C in 3% Hydrogen Peroxide/Methanol bath followed by running tap water and several rinses in distilled water; and incubation in Tris buffered saline + Tween 20 (Scytek) for 5 minutes. Following pretreatment standard Micro-polymer staining steps performed at room temperature on the Biocare intelliPATH 125 automated stainer. All staining steps are followed by rinses in TBS Autowash Buffer + surfactant (Biocare – Concord, CA). After blocking for non-specific protein with Rodent Block M (Biocare) - 10 minutes. Sections were incubated with primaries as shown in table below in Normal Antibody Diluent (Scytek) or SignalStain® antibody diluent (Cell Signaling – Danvers, MA) for 1 hour OR overnight at 4°C. Rat on Mouse HRP – Probe incubation for 30 minutes (Biocare). Rat on Mouse HRP Polymer (Biocare) incubated for 30 minutes, or Rabbit on Rodent HRP Polymer (Biocare) incubated for 30 minutes. Reaction development utilized Betazoid DAB (Biocare) - 5 minutes followed by enhancement with DAB Sparkle (Biocare) – 1 minute. Counterstained in CATHE Hematoxylin diluted 1:5 – 5 minutes, followed by dehydration in ethanol, clearing in Xylene and coverslipping with permanent mounting media. Additional methods for each antibody used are listed (Table 4.6.20). 4.7 Acknowledgments I thank Mylena Ortiz, Jesus Garcia-Lerena, John Vusich, Anthony Schulte, Srijana Shrestha, and Olivia Morris for discussion of the manuscript and figures. I thank Morgan Atkins for transcribing mouse orthotopic injection mammary tumor sizes on October 21st, 2024. I thank the lab of Anna Moore for allowing use of their Tecan Spark plate reader. I thank Brett Trombley and Anthoney Schulte for assistance in use of the Tecan Spark plate reader. The authors also extend their deepest gratitude to Sandra O’Reilly for animal training and animal surgery techniques, and to Amy Porter and the MSU histopathology laboratory for FFPE embedding, sectioning, and staining/IHC of tumor samples acquired during the study and providing a description of procedures performed for materials and methods. 126 4.8 Author Contributions ERA is the corresponding author, collected MMTV-Neu tumor FFPE samples used in Figure 4.4.2, and provided minor revisions to the manuscript. CDB is responsible for all other aspects of the articles, including generating hypotheses, all experimental methods, data collection, data interpretation, data visualization, and writing of the manuscript. 127 CHAPTER 5: FUTURE DIRECTIONS 128 5.1 MMTV-Cre E2F5 conditional knockout: determining mechanistic effects of Cav1/Cav2/Met/Wnt2 amplification and Fbxo15 splice acceptor mutation Generation of MMTV-Cre E2F5 cKO mice was accomplished with the original intention to study the effects of depletion of E2F5 on mammary gland development. Unexpectedly, these mice developed highly penetrant and metastatic mammary tumors after long latency periods196. While E2F5 is known to primarily play a tumor suppressive role in the cell, no studies before had implicated loss of E2F5 being sufficient for oncogenic transformation. Given the long latency periods before tumor development, I hypothesized that additional genetic alterations must have occurred before oncogenic transformation can occur. We sought to perform WGS on select tumors from E2F5 cKO mice to ascertain the additional genetic alterations that occur and ultimately lead to tumorigenesis. WGS of MMTV-Cre E2F5 cKO mice revealed many homogeneous somatic events that are conserved in human breast cancer and may prove consequential for future treatment. However, the mechanistic implications of these somatic events are not well understood. To better understand the mechanisms for how these additional genetic alterations contribute to tumorigenesis and metastasis in the MMTV-Cre E2F5 cKO GEMM, I propose performing the base-specific in situ sequencing (BaSISS)379 workflow at different developmental stages of MMTV-Cre E2F5 cKO development to determine at what stage of development the recurrent Cav1/Cav2/Met/Wnt2 amplification and Fbxo15 splice acceptor mutation occurs. Given the prevalence of each of these events, I hypothesize these are genetic driving events, rather than passenger events. Determining the stage of development at which these genetic events occur will aid in determining the causal flow 129 of somatic genetic events. Determination of the chronological order of genetic alterations occurring can provide evidence for the ultimate link back to how depletion of E2F5 is responsible for this cascade of events initially. It could be the case that there are evolutionary clonal sweeps occurring, with certain clonal populations having a more competitive fitness advantage at certain stages of tumor development. Met and Wnt2 are known oncogenes219,225, while the status Cav1 and Cav2 play in tumorigenesis are less certain216,217,226–228. Fbxo15 is part of the larger F-box protein family, one of three necessary components in the SCF complex used for protein ubiquitination for targeting to 26S proteasomal degradation380. In addition to Fbxo15, tumor 5691 contains an R508H missense mutation in Fbxw7, another F-box family member with a confirmed tumor suppressive role of ubiquitinating various oncoproteins210—this is significant as multiple arginine to histidine or cysteine mutations in FBXW7 in humans in this syntenic region are confirmed to be oncogenic. These data together suggest an important role of ubiquitination deregulation in E2F5 cKO tumors. I hypothesize that E2F5 normally plays a tumor suppressive role for Met and Wnt2. Once E2F5 ablation has occurred, excess Met and Wnt2 proteins are found within the cell, contributing to increased oncogenic signaling. However, functional SCF complexes would ubiquitinate these oncoproteins, leading to their degradation. This leads to a selective pressure on the cells, where F-box protein ablation would allow for increased proliferation, thus increasing evolutionary fitness of these proto-cancerous cells within the E2F5 cKO mammary gland environment. This may explain why Fbxo15 deleterious splice acceptor mutations occur with such high penetrance across all tumors sequenced, in addition to Fbxw7 deleterious missense mutations. Extracting protein from early E2F5 cKO 130 mammary glands and comparing to E2F5 cKO tumor protein by western blot for quantitation of ubiquitin tagged proteins or proteasomal activity assay would determine if differing levels of ubiquitination are present in F-box mutant cells or not. While none of the tumors sequenced herein were without an Fbxo15 mutation, it is possible other E2F5 cKO tumors might be without it. In such a case, non-mutant E2F5 mammary gland protein could serve as a positive control for ubiquitination and protease activity. If ubiquitination is confirmed to be impaired in F-box mutant cells, I then hypothesize that amplification over Met and Wnt2 occurs at this time in clonal evolution due to a lack of negative selective pressure. To test this hypothesis, I propose taking MECs and generating a complete Fbxo15 KO using CRISPR to examine the effects on downstream ubiquitination. I hypothesize that total ubiquitination will be lower, but since E2F5 is still intact in the model, there will not be a selective pressure to develop Met or Wnt2 amplification. I propose ChIP-seq or ATAC-seq to identify E2F5 repressive sites within the genome. I have hypothesized that E2F5 is responsible for repressing Wnt2 and Met, but this is not known currently. If E2F5 is identified as repressing Wnt2 or Met, this would complement the hypothesis that E2F5 ablation would allow for increased fitness via amplification of Wnt2 and Met, with preventing oncoprotein ubiquitination via F-box protein ablation being a critical step in overcoming this obstacle to increased fitness, thus increasing selective pressure on deleterious mutations for F-box proteins. The downstream ubiquitination targets of Fbxo15 are not known, so I also propose comparing Fbxo15 KO MECs to WT MECs, enriching for ubiquitinated proteins using ubiquitin-affinity purification chromatography, and analyzing the proteome of each cell line using mass spectrometry. All experiments together would serve to elucidate the importance of Wnt2 131 and Met amplification in breast cancer, Fbxo15 mutations/ablation in breast cancer, and identify how E2F5 is related to Wnt2, Met, and Fbxo15 as a tumor suppressor and transcriptional regulator. 5.2 MMTV-Myc histological subtypes: identification of evolutionary selective pressures determining genomic and gene expression states It is known that the MMTV-Myc GEMM produces various distinct histological subtypes, with varying rates of metastasis associated with each histological subtype250. Observed by WGS was the recurrent amplification over chromosomes 11 and 15 in the microacinar subtype of the MMTV-Myc model, along with co-occurring mutations in Rara and Kit. The convergence on these specific events in the microacinar subtype suggests a strong fitness advantage for these tumors to adopt these genetic alterations. We currently do not know the importance of these whole chromosome amplifications or co- occurring mutations in Kit and Rara. After combing the literature, no previous linkage between KIT and RARA in humans has been directly established. All-trans retinoic acid treatment in combination with immune checkpoint therapies has been used to some success at treating metastatic melanoma381. RARA is a nuclear hormone receptor, which may be regulating KIT after being activated by retinoic acid. Validation of the mechanistic interplay between KIT, RARA, and whole chromosome amplification in MYC driven breast cancer is needed to determine if these data could represent new treatment modalities in MYC driven breast cancer. Herein, I propose experiments aimed at determining the interactions between these genetic alterations. No matter the histological subtype being examined, each MMTV-Myc tumor is initially transformed by overexpressing Myc, driven by the MMTV promoter. Importantly, 132 the microacinar subtype is the only epithelial like tumor examined373, while squamous and EMT are non-epithelial. Critically, the MMTV promoter/enhancer is only active in epithelial cell types within the mammary gland. The MMTV promoter/enhancer would thus only be expressed in the microacinar subtype, which is the only subtype that maintains high Myc gene expression and also maintains amplification over chromosome 15 and 11 of the MMTV-Myc tumors examined. Therefore, I hypothesize that Myc overexpression is directly responsible for the amplification of chromosomes 15 and 11 in the microacinar subtype by increasing flux within the cell cycle, leading to aberrant centromere division and isochromosome formation, with only dosage increases in chromosomes 15 and 11 providing a fitness advantage and being further propagated to daughter cells. To test this hypothesis, I propose transfecting Myc cDNA into MECs and culturing the resultant transformed cells. After multiple passages, I hypothesize that chromosomes 15 and 11 will be amplified compared to the rest of the genome, which can be analyzed using comparative genomic hybridization or fluorescence ISH. If MECs fail to transform after Myc overexpression, we can utilize a tetracycline inducible MMTV-Myc system where Myc is selectively overexpressed. I propose multiple conditions of tetracycline inducible MMTV-Myc mice with tetracycline fed to them until various timepoints during tumor development (e.g. one group taken off tetracycline immediately at tumor palpation, one group taken off tetracycline 2 weeks post tumor palpation, etc.). All mice would be necropsied and have WGS performed at tumor endpoint of 2,000 mm3. I hypothesize that mice taken off tetracycline closer to endpoint will be more likely to have microacinar histology and amplification of chromosomes 15 and 11. I hypothesize that microacinar histology is the first to develop within MMTV-Myc tumors as these are largely Myc 133 overexpression driven, but over time as additional somatic mutations and CNVs accrue, clonal diversification occurs, and clonal populations start competing with each other. Over time, other histological subtypes can start to outcompete microacinar cells, which may explain where the apparent continuum of gene expression between histological subtypes originates373. Given the co-occurrence of Rara and Kit mutations, it suggests a selective pressure for generating these mutations in Myc induced tumors. There is currently no known link between Kit and Rara in normal physiology or cancer development. I propose taking cell lines derived from MMTV-Myc tumors that are confirmed to not have Kit or Rara mutations, and utilize CRISPR to create A538E amino acid coding mutations in Kit and A255D amino acid coding mutations in Rara in separate populations. I hypothesize that one of these mutations must happen first in the tumor, creating a favorable environment for the other mutation to occur. Given that the Rara VAF is close to 100% in almost all cases of mutation presence and Kit mutation VAF is typically 50%, I hypothesize that Rara mutations typically occur first and then lead to the development of Kit mutations. This would bear out in the data as A255D mutated Rara proteins leading to the development of A538E Kit mutations, while A538E Kit mutations would not lead to the development of A255D Rara mutations. This would provide further evidence of a linkage between Kit and Rara. Given the prevalence of these co-occurring mutations in the microacinar subtype (>60%), this may implicate Myc overexpression as being an important catalyst in this mutational process. We can examine WGS to find copy number changes after Kit and Rara mutation, but attempting to upregulate/amplify entire chromosomes to then examine for Kit or Rara mutation is not possible with current 134 biotechnology approaches. Therefore, we cannot be certain if Kit and Rara mutations occur before chromosome 11 and 15 amplification, or if the reverse is true. Another potential future direction would be that of clonal evolution and competition. MMTV-Myc tumors with mixed histological subtypes are prevalent in terms of proportional abundance250, showing that many histological subtypes could be in competition with each other in spontaneously occurring MMTV-Myc tumors. Squamous and EMT histological subtypes lost Myc overexpression during tumor development, suggesting a selective pressure to shift away from Myc overexpression to other oncogenic traits that would yield greater fitness advantages. To test this hypothesis, I would label different MMTV-Myc derived cell lines from microacinar, squamous, EMT, and papillary histologies with fluorescent proteins unique to each histology. Populations of each cell line would be mixed in various proportions (e.g. 25% to 75%, 50% to 50%, etc.) and injected into the mammary fat pad of syngeneic MMTV-Cre mice. I hypothesize that there would be competition between histological subtype cell lines for colonization of the mammary gland, and that the non-epithelial cell types would often colonize the mammary gland faster and represent a greater proportion of the resultant tumor, if not entirely outcompeting epithelial cell types. This could be expanded to all three or four histological subtype cell lines injected simultaneously to observe ultimate population frequency. Beyond cell lines, clonal evolution and competition could be extended to MMTV- Myc mice themselves for therapeutic development or insight on therapeutic application for human breast cancers. It has been established that the EMT and papillary cell lines derived from MMTV-Myc tumors utilize different metabolic pathways to proliferate382. I hypothesize that spontaneous MMTV-Myc tumors or combined microacinar, EMT, 135 squamous, and papillary tumors can be evolutionarily steered using therapeutics. It has been established that papillary cell lines prefer de novo nucleotide biosynthesis and is responsive to methotrexate treatment. EMT cell lines, however, prefer scavenging nucleotides rather than de novo biosynthesis. I hypothesize that treating MMTV-Myc mice with methotrexate during tumor development would dramatically reduce the number of papillary tumors seen, and instead that these mice would preferentially develop another histological subtype, such as EMT. These results could have important implications for the treatment of human cancers, as we often use a maximum tolerated dose approach in treating cancer in the clinical setting. Maximum tolerated dose is effective in many cancers, but prostate cancer383,384, melanoma385, breast cancer64, and many other cancer types experience therapeutic resistance development often due to consistent selective pressure enabling resistant populations the chance to flourish. There is promising evidence that current treatment paradigms can be more effective if treating patients in an approach that considers the prospect of clonal evolution in heterogeneous cancers and actively avoids creating an environment favorable to resistant population development386. Breast cancer is a notoriously heterogeneous disease, in which one continuous treatment may select for a specific resistant population61,69,75,83,84,297. In this scenario, I hypothesize that methotrexate treatment upon first palpation of MMTV-Myc tumors may eliminate tumor development altogether in some cases, but often will lead to development of EMT like tumors which have higher rates of metastasis. EMT primary tumors have higher rates of metastasis in MMTV-Myc mice250, so human equivalent treatments would rather focus on creating a favorable tumor microenvironment for more easily treatable subtypes of breast cancer. These experiments will serve as a proof of concept that spontaneous, 136 heterogeneous tumors can have their clonal evolution steered, with the implication for human breast cancer treatment being that we can steer the tumor’s evolution away from resistant or hard to treat subtypes. A final potential future direction for utilizing the MMTV-Myc mouse model begins with the knowledge of conserved orthologous blocks of DNA shared between species. Mouse chromosomes 11 and 15 are largely syntenic with human chromosomes 17 and 8—these large blocks of DNA have been conserved over evolution, merely segregated to different chromosomes in each species over time387. In MYC induced breast cancer in humans, chromosomes 17 and 8 are often amplified, similar to the amplification of chromosomes 11 and 15 for microacinar tumors in the MMTV-Myc mouse model. It is important to remember that the microacinar subtype is the only MMTV-Myc histological subtype sequenced that still maintains high Myc expression, while the squamous and EMT cell lines which do not overexpress Myc do not have conserved amplification events. If all Myc overexpressing breast cancers show selective amplification over the same syntenic regions of DNA between species, this suggests amplified and overexpressed genes within this evolutionarily shared region provide a competitive advantage compared to populations without this amplification. I propose creating a rat model of Myc overexpression, for which none currently exists. I hypothesize a similar syntenic amplification event would occur in Myc overexpressing tumors in rats. Rat chromosomes 2 and 8 are syntenic to mouse chromosome 15, and rat chromosomes 10 and 14 are syntenic to mouse chromosome 11. Human chromosomes often experience amplification or deletion over arms of each chromosome as they are often metacentric or submetacentric. But rat and mouse chromosomes are all acrocentric, with the long arm 137 of the chromosome functionally being the whole chromosome. For non-focal amplification events, this would dictate that the entirety of each chromosome in rats or mice must be amplified, as is the case in the microacinar subtype for chromosomes 11 and 15. I hypothesize that rat chromosome 2, 8, 10, and 14 would be wholly amplified in response to Myc overexpression. However, if only some chromosomes are amplified, this further narrows down the overlapping syntenic region between humans, mice, and rats that Myc would be directly responsible for amplifying. If Myc overexpression directly induces amplification of chromosomes in other species, these data can be used to further narrow down the essential region of amplification for putative oncogene identification. 5.3 17q21 cytoband amplification: overexpression of 17q21 cytoband genes to observe differential outcomes in gene expression and metastasis The cell line BT-474 was selected to be used for experimental procedures as it accurately mimics the genetic and gene expression state of human breast cancers that amplify the 17q21 cytoband. BT-474 is a HR+/HER2+ breast cancer cell line with confirmed amplification over cytoband 17q21, which includes genes COL1A1, CHAD, and PHB1. Previous preliminary work in the lab has found that KD of COL1A1 or CHAD results in decreased migration in vitro, while KD of COL1A1 or CHAD reduces metastasis in vivo118. However, results performed herein could not recapitulate the preliminary results. The BT-474 cell line proved to be poorly migratory, invasive, and tumorigenic in my own experiments. There are no other readily available breast cancer cell lines that are HR+/HER2+ and possess an amplification of the 17q21 cytoband similar to BT-474. However, upregulating COL1A1, CHAD, or PHB1 independently may better show how each gene causally affects metastasis and clinical outcomes. In the case of 138 upregulating individual genes, causal inference on gene upregulation causing downstream effects is easily ascertainable. To this end, I propose using CRISPR activation (CRISPRa) to upregulate COL1A1, CHAD, and PHB1 in the CAMA1 luminal A (HR+/HER2-) breast cancer cell line and in the BT-20 basal-like (HR-/HER2-) breast cancer cell line. Both of these cell lines are known to be tumorigenic in nude mice and do not possess amplification over the 17q21 cytoband. Thus, these cell lines would be ideal candidates to upregulate COL1A1, CHAD, and PHB1 in to examine effects on metastasis independent even of ERBB2 status. Experiments performed in chapter 4 of this thesis would be repeated for these additional cell lines. I hypothesize that upregulating COL1A1, CHAD, and PHB1 in both the CAMA1 and BT-20 cell line would result in increased metastasis, migration, and invasion. If similar effects are seen in both CAMA1 and BT-20 derived CRISPRa cell lines, this would strongly implicate upregulation of target genes being responsible for phenotypic changes, rather than cell line specific effects. Altogether, these future directions will provide meaningful investigation into the mechanistic effects of oncogenes and tumor suppressors in breast cancer, providing avenues for therapeutic development and new treatment modalities in the future. 139 BIBLIOGRAPHY 1. Lagay, F. The Legacy of Humoral Medicine. AMA J. Ethics 4, 206–208 (2002). 2. Hajdu, S. I. Greco-Roman thought about cancer. Cancer 100, 2048–2051 (2004). 3. Virchow, R. Cellular Pathology. (Ann Arbor, Mich. : Edwards Bros., 1940). 4. Boveri, T. Concerning the Origin of Malignant Tumours by Theodor Boveri. Translated and annotated by Henry Harris. J. Cell Sci. 121, 1–84 (2008). 5. Knudson, A. G. Mutation and Cancer: Statistical Study of Retinoblastoma. Proc. Natl. Acad. Sci. 68, 820–823 (1971). 6. Fearon, E. R., Hamilton, S. R. & Vogelstein, B. Clonal Analysis of Human Colorectal Tumors. Science 238, 193–197 (1987). 7. Vogelstein, B. et al. Genetic Alterations during Colorectal-Tumor Development. N. Engl. J. Med. 319, 525–532 (1988). 8. Hanahan, D. & Weinberg, R. A. The Hallmarks of Cancer. Cell 100, 57–70 (2000). 9. Crick, F. Central Dogma of Molecular Biology. Nature 227, 561–563 (1970). 10. Cancer through the Lens of Evolution and Ecology. (CRC Press, Boca Raton, 2024). doi:10.1201/9781003307921. 11. Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006). 12. Vogelstein, B. et al. Cancer Genome Landscapes. Science 339, 1546–1558 (2013). 13. Zhang, Y.-L. et al. The prevalence of EGFR mutation in patients with non-small cell lung cancer: a systematic review and meta-analysis. Oncotarget 7, 78985–78993 (2016). 14. Bethune, G., Bethune, D., Ridgway, N. & Xu, Z. Epidermal growth factor receptor (EGFR) in lung cancer: an overview and update. J. Thorac. Dis. 2, 48–51 (2010). 15. Brennan, C. W. et al. The Somatic Genomic Landscape of Glioblastoma. Cell 155, 462–477 (2013). 16. Uribe, M. L., Marrocco, I. & Yarden, Y. EGFR in Cancer: Signaling Mechanisms, Drugs, and Acquired Resistance. Cancers 13, 2748 (2021). 17. Masuda, H. et al. Role of epidermal growth factor receptor in breast cancer. Breast Cancer Res. Treat. 136, 331–345 (2012). 140 18. Derakhshan, F. & Reis-Filho, J. S. Pathogenesis of Triple-Negative Breast Cancer. Annu. Rev. Pathol. 17, 181–204 (2022). 19. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020). 20. Dick, F. A. & Rubin, S. M. Molecular mechanisms underlying RB protein function. Nat. Rev. Mol. Cell Biol. 14, 297–306 (2013). 21. Hafner, A., Bulyk, M. L., Jambhekar, A. & Lahav, G. The multiple mechanisms that regulate p53 activity and cell fate. Nat. Rev. Mol. Cell Biol. 20, 199–210 (2019). 22. Kennedy, M. C. & Lowe, S. W. Mutant p53: it’s not all one and the same. Cell Death Differ. 29, 983–987 (2022). 23. Donehower, L. A. et al. Mice deficient for p53 are developmentally normal but susceptible to spontaneous tumours. Nature 356, 215–221 (1992). 24. Olivier, M., Hollstein, M. & Hainaut, P. TP53 Mutations in Human Cancers: Origins, Consequences, and Clinical Use. Cold Spring Harb. Perspect. Biol. 2, a001008 (2010). 25. Johansson, A. L. V. et al. In modern times, how important are breast cancer stage, grade and receptor subtype for survival: a population-based cohort study. Breast Cancer Res. 23, 17 (2021). 26. Grimm, L. J., Rahbar, H., Abdelmalak, M., Hall, A. H. & Ryser, M. D. Ductal Carcinoma in Situ: State-of-the-Art Review. Radiology 302, 246–255 (2021). 27. van Seijen, M. et al. Ductal carcinoma in situ: to treat or not to treat, that is the question. Br. J. Cancer 121, 285–292 (2019). 28. Mannu, G. S. et al. Invasive breast cancer and breast cancer mortality after ductal carcinoma in situ in women attending for breast screening in England, 1988-2014: population based observational cohort study. BMJ 369, m1570 (2020). 29. Strand, S. H. et al. Molecular classification and biomarkers of clinical outcome in breast ductal carcinoma in situ: Analysis of TBCRC 038 and RAHBT cohorts. Cancer Cell 40, 1521-1536.e7 (2022). 30. Giaquinto, A. N. et al. Breast Cancer Statistics, 2022. CA. Cancer J. Clin. 72, 524– 541 (2022). 31. Female Breast Cancer Subtypes - Cancer Stat Facts. SEER https://seer.cancer.gov/statfacts/html/breast-subtypes.html. 141 32. Stages of Breast Cancer | Understand Breast Cancer Staging. https://www.cancer.org/cancer/types/breast-cancer/understanding-a-breast-cancer- diagnosis/stages-of-breast-cancer.html. 33. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747– 752 (2000). 34. Fougner, C., Bergholtz, H., Norum, J. H. & Sørlie, T. Re-definition of claudin-low as a breast cancer phenotype. Nat. Commun. 11, 1787 (2020). 35. Chen, W., Hoffmann, A. D., Liu, H. & Liu, X. Organotropism: new insights into molecular mechanisms of breast cancer metastasis. Npj Precis. Oncol. 2, 1–12 (2018). 36. Dai, X. et al. Breast cancer intrinsic subtype classification, clinical use and future trends. Am. J. Cancer Res. 5, 2929–2943 (2015). 37. Ohnstad, H. O. et al. Prognostic value of PAM50 and risk of recurrence score in patients with early-stage breast cancer with long-term follow-up. Breast Cancer Res. BCR 19, 120 (2017). 38. Siegel, R. L., Giaquinto, A. N. & Jemal, A. Cancer statistics, 2024. CA. Cancer J. Clin. 74, 12–49 (2024). 39. U.S. Cancer Statistics Working Group. U.S. Cancer Statistics Data Visualizations Tool. (2024). 40. Islami, F. et al. Proportion and number of cancer cases and deaths attributable to potentially modifiable risk factors in the United States. CA. Cancer J. Clin. 68, 31–54 (2018). 41. Valastyan, S. & Weinberg, R. A. Tumor Metastasis: Molecular Insights and Evolving Paradigms. Cell 147, 275–292 (2011). 42. Dillekås, H., Rogers, M. S. & Straume, O. Are 90% of deaths from cancer caused by metastases? Cancer Med. 8, 5574–5576 (2019). 43. Mani, K. et al. Causes of death among people living with metastatic cancer. Nat. Commun. 15, 1519 (2024). 44. Ahmad, F. B. & Anderson, R. N. The Leading Causes of Death in the US for 2020. JAMA 325, 1829–1830 (2021). 45. Ahmad, F. B. Provisional Mortality Data — United States, 2022. MMWR Morb. Mortal. Wkly. Rep. 72, (2023). 46. Yellman, M. A. Motor Vehicle Crash Deaths — United States and 28 Other High- Income Countries, 2015 and 2019. MMWR Morb. Mortal. Wkly. Rep. 71, (2022). 142 47. Chikarmane, S. A., Tirumani, S. H., Howard, S. A., Jagannathan, J. P. & DiPiro, P. J. Metastatic patterns of breast cancer subtypes: What radiologists should know in the era of personalized cancer medicine. Clin. Radiol. 70, 1–10 (2015). 48. Guo, Y. et al. Different Breast Cancer Subtypes Show Different Metastatic Patterns: A Study from A Large Public Database. Asian Pac. J. Cancer Prev. APJCP 21, 3587– 3593 (2020). 49. Xiao, W. et al. Breast cancer subtypes and the risk of distant metastasis at initial diagnosis: a population-based study. Cancer Manag. Res. 10, 5329–5338 (2018). 50. Arpino, G., Bardou, V. J., Clark, G. M. & Elledge, R. M. Infiltrating lobular carcinoma of the breast: tumor characteristics and clinical outcome. Breast Cancer Res. 6, R149 (2004). 51. Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012). 52. Gallicchio, L., Devasia, T. P., Tonorezos, E., Mollica, M. A. & Mariotto, A. Estimation of the Number of Individuals Living With Metastatic Cancer in the United States. JNCI J. Natl. Cancer Inst. 114, 1476–1483 (2022). 53. Burguin, A., Diorio, C. & Durocher, F. Breast Cancer Treatments: Updates and New Challenges. J. Pers. Med. 11, 808 (2021). 54. Treatment of Stage IV (Metastatic) Breast Cancer. https://www.cancer.org/cancer/types/breast-cancer/treatment/treatment-of-breast- cancer-by-stage/treatment-of-stage-iv-advanced-breast-cancer.html. 55. Treatment of Breast Cancer Stages I-III. https://www.cancer.org/cancer/types/breast-cancer/treatment/treatment-of-breast- cancer-by-stage/treatment-of-breast-cancer-stages-i-iii.html. 56. Treatment of Breast Cancer by Stage. https://www.cancer.org/cancer/types/breast- cancer/treatment/treatment-of-breast-cancer-by-stage.html. 57. Breast Cancer Hormone Receptor Status | Estrogen Receptor. https://www.cancer.org/cancer/types/breast-cancer/understanding-a-breast-cancer- diagnosis/breast-cancer-hormone-receptor-status.html. 58. Fuentes, N. & Silveyra, P. Estrogen receptor signaling mechanisms. Adv. Protein Chem. Struct. Biol. 116, 135–170 (2019). 59. Wendler, R. Diagnosed with breast cancer after menopause? Aromatase inhibitors Center can https://www.mdanderson.org/cancerwise/diagnosed-with-breast-cancer-after- menopause--aromatase-inhibitors-can-help.h00-159542112.html. Anderson Cancer help. MD 143 60. Breast Cancer Status? https://www.cancer.org/cancer/types/breast-cancer/understanding-a-breast-cancer- diagnosis/breast-cancer-her2-status.html. | What Status HER2 HER2 is 61. Waks, A. G. et al. Dual HER2 inhibition: mechanisms of synergy, patient selection, and resistance. Nat. Rev. Clin. Oncol. 1–15 (2024) doi:10.1038/s41571-024-00939- 2. 62. Raghav, K. P. S. & Moasser, M. M. Molecular Pathways and Mechanisms of HER2 in Cancer Therapy. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 29, 2351– 2361 (2023). 63. Cortés, J. et al. Trastuzumab Deruxtecan versus Trastuzumab Emtansine for Breast Cancer. N. Engl. J. Med. 386, 1143–1154 (2022). 64. Swain, S. M., Shastry, M. & Hamilton, E. Targeting HER2-positive breast cancer: advances and future directions. Nat. Rev. Drug Discov. 22, 101–126 (2023). 65. Immunotherapy Treatment. https://www.cancer.org/cancer/types/breast-cancer/treatment/immunotherapy.html. Breast Cancer Breast Cancer for | 66. Friedenson, B. The BRCA1/2 pathway prevents hematologic cancers in addition to breast and ovarian cancers. BMC Cancer 7, 152 (2007). 67. null, null. Breast Cancer Risk Genes — Association Analysis in More than 113,000 Women. N. Engl. J. Med. 384, 428–439 (2021). 68. Cortesi, L., Rugo, H. S. & Jackisch, C. An Overview of PARP Inhibitors for the Treatment of Breast Cancer. Target. Oncol. 16, 255–282 (2021). 69. Dias, M. P., Moser, S. C., Ganesan, S. & Jonkers, J. Understanding and overcoming resistance to PARP inhibitors in cancer therapy. Nat. Rev. Clin. Oncol. 18, 773–791 (2021). 70. Dilawar, H. et al. Gastric Metastasis from Invasive Lobular Breast Cancer, Resembling Primary Gastric Cancer. J. Nucl. Med. Technol. 52, 68–70 (2024). 71. Bertucci, F. et al. Comparative genomic analysis of primary tumors and metastases in breast cancer. Oncotarget 7, 27208–27219 (2016). 72. Sun, Y. et al. Integrated multi-omics profiling to dissect the spatiotemporal evolution of metastatic hepatocellular carcinoma. Cancer Cell 42, 135-156.e17 (2024). 73. Bonadonna, G. et al. Combination Chemotherapy as an Adjuvant Treatment in Operable Breast Cancer. N. Engl. J. Med. 294, 405–410 (1976). 144 74. Amir, E., Seruga, B., Niraula, S., Carlsson, L. & Ocaña, A. Toxicity of Adjuvant Endocrine Therapy in Postmenopausal Breast Cancer Patients: A Systematic Review and Meta-analysis. JNCI J. Natl. Cancer Inst. 103, 1299–1309 (2011). 75. Kinnel, B., Singh, S. K., Oprea-Ilies, G. & Singh, R. Targeted Therapy and Mechanisms of Drug Resistance in Breast Cancer. Cancers 15, 1320 (2023). 76. Chen, Z., Han, F., Du, Y., Shi, H. & Zhou, W. Hypoxic microenvironment in cancer: molecular mechanisms and therapeutic interventions. Signal Transduct. Target. Ther. 8, 1–23 (2023). 77. Binnewies, M. et al. Understanding the tumor immune microenvironment (TIME) for effective therapy. Nat. Med. 24, 541–550 (2018). 78. Clusan, L., Le Goff, P., Flouriot, G. & Pakdel, F. A Closer Look at Estrogen Receptor Mutations in Breast Cancer and Their Implications for Estrogen and Antiestrogen Responses. Int. J. Mol. Sci. 22, 756 (2021). 79. Schiavon, G. et al. Analysis of ESR1 mutation in circulating tumor DNA demonstrates evolution during therapy for metastatic breast cancer. Sci. Transl. Med. 7, 313ra182- 313ra182 (2015). 80. Spoerke, J. M. et al. Heterogeneity and clinical significance of ESR1 mutations in ER-positive metastatic breast cancer patients receiving fulvestrant. Nat. Commun. 7, 11579 (2016). 81. Hanker, A. B. et al. Co-occurring gain-of-function mutations in HER2 and HER3 modulate HER2/HER3 activation, oncogenesis, and HER2 inhibitor sensitivity. Cancer Cell 39, 1099-1114.e8 (2021). 82. Loibl, S. et al. PIK3CA mutations are associated with reduced pathological complete response rates in primary HER2-positive breast cancer: pooled analysis of 967 patients from five prospective trials investigating lapatinib and trastuzumab†. Ann. Oncol. 27, 1519–1525 (2016). 83. Xia, W. et al. A model of acquired autoresistance to a potent ErbB2 tyrosine kinase inhibitor and a therapeutic strategy to prevent its onset in breast cancer. Proc. Natl. Acad. Sci. 103, 7795–7800 (2006). 84. Wang, Y.-C. et al. Different mechanisms for resistance to trastuzumab versus lapatinib in HER2- positive breast cancers - role of estrogen receptor and HER2 reactivation. Breast Cancer Res. 13, R121 (2011). 85. Kaufman, B. et al. Trastuzumab Plus Anastrozole Versus Anastrozole Alone for the Treatment of Postmenopausal Women With Human Epidermal Growth Factor Receptor 2–Positive, Hormone Receptor–Positive Metastatic Breast Cancer: Results From the Randomized Phase III TAnDEM Study. J. Clin. Oncol. 27, 5529– 5537 (2009). 145 86. Tarantino, P. et al. HER2-Low Breast Cancer: Pathological and Clinical Landscape. J. Clin. Oncol. 38, 1951–1962 (2020). 87. Schettini, F. et al. Clinical, pathological, and PAM50 gene expression features of HER2-low breast cancer. Npj Breast Cancer 7, 1–13 (2021). 88. Gradishar, W. J. et al. Breast Cancer Version 2.2015. J. Natl. Compr. Canc. Netw. 13, 448–475 (2015). 89. Giuliano, A. E. et al. Breast Cancer—Major changes in the American Joint Committee on Cancer eighth edition cancer staging manual. CA. Cancer J. Clin. 67, 290–303 (2017). 90. Fehrenbacher, L. et al. NSABP B-47/NRG Oncology Phase III Randomized Trial Comparing Adjuvant Chemotherapy With or Without Trastuzumab in High-Risk Invasive Breast Cancer Negative for HER2 by FISH and With IHC 1+ or 2+. J. Clin. Oncol. 38, 444–453 (2020). 91. Modi, S. et al. Trastuzumab Deruxtecan in Previously Treated HER2-Low Advanced Breast Cancer. N. Engl. J. Med. 387, 9–20 (2022). 92. Rottenberg, S. et al. High sensitivity of BRCA1-deficient mammary tumors to the PARP inhibitor AZD2281 alone and in combination with platinum drugs. Proc. Natl. Acad. Sci. 105, 17079–17084 (2008). 93. Jaspers, J. E. et al. BRCA2-Deficient Sarcomatoid Mammary Tumors Exhibit Multidrug Resistance. Cancer Res. 75, 732–741 (2015). 94. Pettitt, S. J. et al. Clinical BRCA1/2 Reversion Analysis Identifies Hotspot Mutations and Predicted Neoantigens Associated with Therapy Resistance. Cancer Discov. 10, 1475–1488 (2020). 95. Sakai, W. et al. Secondary mutations as a mechanism of cisplatin resistance in BRCA2-mutated cancers. Nature 451, 1116–1120 (2008). 96. Edwards, S. L. et al. Resistance to therapy caused by intragenic deletion in BRCA2. Nature 451, 1111–1115 (2008). 97. Li, H. et al. PARP inhibitor resistance: the underlying mechanisms and clinical implications. Mol. Cancer 19, 1–16 (2020). 98. Osborne, C., Wilson, P. & Tripathy, D. Oncogenes and Tumor Suppressor Genes in Breast Cancer: Potential Diagnostic and Therapeutic Applications. The Oncologist 9, 361–377 (2004). 99. Rajendran, B. K. & Deng, C.-X. Characterization of potential driver mutations involved in human breast cancer by computational approaches. Oncotarget 8, 50252–50272 (2017). 146 100. Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020). 101. Cardiff, R. D. & Kenney, N. Mouse Mammary Tumor Biology: A Short History. in Advances in Cancer Research vol. 98 53–116 (Academic Press, 2007). 102. Oberling, C. & GuéRin, M. The Role of Viruses in the Production of Cancer. in Advances in Cancer Research (eds. Greenstein, J. P. & Haddow, A.) vol. 2 353–423 (Academic Press, 1954). 103. Medina, D. Of Mice and Women: A Short History of Mouse Mammary Cancer Research with an Emphasis on the Paradigms Inspired by the Transplantation Method. Cold Spring Harb. Perspect. Biol. 2, a004523 (2010). 104. Bittner, J. J. Some Possible Effects of Nursing on the Mammary Gland Tumor Incidence in Mice. Science 84, 162–162 (1936). 105. Pitelka, D. R., Bern, H. A., Nandi, S. & DeOme, K. B. On the Significance of Virus- Like Particles in Mammary Tissues of C3Hf Mice2. JNCI J. Natl. Cancer Inst. 33, 857–885 (1964). 106. Nusse, R. & Varmus, H. E. Many tumors induced by the mouse mammary tumor virus contain a provirus integrated in the same region of the host genome. Cell 31, 99–109 (1982). 107. Stewart, T. A., Pattengale, P. K. & Leder, P. Spontaneous mammary adenocarcinomas in transgenic mice that carry and express MTV/myc fusion genes. Cell 38, 627–637 (1984). 108. Park, M. K., Lee, C. H. & Lee, H. Mouse models of breast cancer in preclinical research. Lab. Anim. Res. 34, 160–165 (2018). 109. Sinn, E. et al. Coexpression of MMTV/v-Ha-ras and MMTV/c-myc genes in transgenic mice: Synergistic action of oncogenes in vivo. Cell 49, 465–475 (1987). 110. Muller, W. J., Sinn, E., Pattengale, P. K., Wallace, R. & Leder, P. Single-step induction of mammary adenocarcinoma in transgenic mice bearing the activated c- neu oncogene. Cell 54, 105–115 (1988). 111. Guy, C. T. et al. Expression of the neu protooncogene in the mammary epithelium of transgenic mice induces metastatic disease. Proc. Natl. Acad. Sci. 89, 10578– 10582 (1992). 112. Guy, C. T., Cardiff, R. D. & Muller, W. J. Induction of Mammary Tumors by Expression of Polyomavirus Middle T Oncogene: A Transgenic Mouse Model for Metastatic Disease. Mol. Cell. Biol. 12, 954–961 (1992). 147 113. Wagner, K.-U. et al. Cre-mediated gene deletion in the mammary gland. Nucleic Acids Res. 25, 4323–4330 (1997). 114. Vogelstein, B. & Kinzler, K. W. Cancer genes and the pathways they control. Nat. Med. 10, 789–799 (2004). 115. Swiatnicki, M. R. & Andrechek, E. R. Metastasis is altered through multiple processes regulated by the E2F1 transcription factor. Sci. Rep. 11, 9502 (2021). 116. Swiatnicki, M. R. & Andrechek, E. R. How to Choose a Mouse Model of Breast Cancer, a Genomic Perspective. J. Mammary Gland Biol. Neoplasia 24, 231–243 (2019). 117. Rennhack, J., To, B., Wermuth, H. & Andrechek, E. R. Mouse Models of Breast Cancer Share Amplification and Deletion Events with Human Breast Cancer. J. Mammary Gland Biol. Neoplasia 22, 71–84 (2017). 118. Rennhack, J. P. et al. Integrated analyses of murine breast cancer models reveal critical parallels with human disease. Nat. Commun. 10, 3261 (2019). 119. Hollern, D. P. et al. E2F1 Drives Breast Cancer Metastasis by Regulating the Target Gene FGF13 and Altering Cell Migration. Sci. Rep. 9, 10718 (2019). 120. Ross, C. et al. The genomic landscape of metastasis in treatment-naïve breast cancer models. PLOS Genet. 16, e1008743 (2020). 121. Campbell, K. M. et al. A Spontaneous Aggressive ERα+ Mammary Tumor Model Is Driven by Kras Activation. Cell Rep. 28, 1526-1537.e4 (2019). 122. Hutter, C. & Zenklusen, J. C. The Cancer Genome Atlas: Creating Lasting Value beyond Its Data. Cell 173, 283–285 (2018). 123. Shi, X. et al. Building a translational cancer dependency map for The Cancer Genome Atlas. Nat. Cancer 5, 1176–1194 (2024). 124. Chen, H.-Z., Tsai, S.-Y. & Leone, G. Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nat. Rev. Cancer 9, 785–797 (2009). 125. Dyson, N. The regulation of E2F by pRB-family proteins. Genes Dev. 12, 2245–2262 (1998). 126. Kent, L. N. & Leone, G. The broken cycle: E2F dysfunction in cancer. Nat. Rev. Cancer 19, 326–338 (2019). 127. Dimova, D. K. & Dyson, N. J. The E2F transcriptional network: old acquaintances with new faces. Oncogene 24, 2810–2826 (2005). 148 128. Cam, H. et al. A Common Set of Gene Regulatory Networks Links Metabolism and Growth Inhibition. Mol. Cell 16, 399–411 (2004). 129. Ren, B. et al. E2F integrates cell cycle progression with DNA repair, replication, and G2/M checkpoints. Genes Dev. 16, 245–256 (2002). 130. Xu, X. et al. A comprehensive ChIP–chip analysis of E2F1, E2F4, and E2F6 in normal and tumor cells reveals interchangeable roles of E2F family members. Genome Res. 17, 1550–1561 (2007). 131. Wang, H., Wang, X., Xu, L., Zhang, J. & Cao, H. Integrated analysis of the E2F transcription factors across cancer types. Oncol. Rep. 43, 1133–1146 (2020). 132. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545– 15550 (2005). 133. Bindra, R. S. et al. Hypoxia-induced down-regulation of BRCA1 expression by E2Fs. Cancer Res. 65, 11597–11604 (2005). 134. Saavedra, H. I. et al. Inactivation of E2F3 results in centrosome amplification. Cancer Cell 3, 333–346 (2003). 135. Cahill, D. P., Kinzler, K. W., Vogelstein, B. & Lengauer, C. Genetic instability and darwinian selection in tumours. Trends Cell Biol. 9, M57–M60 (1999). 136. Fodde, R., Smits, R. & Clevers, H. APC, Signal transduction and genetic instability in colorectal cancer. Nat. Rev. Cancer 1, 55–67 (2001). 137. Rempel, R. E. et al. A Role for E2F Activities in Determining the Fate of Myc-Induced Lymphomagenesis. PLOS Genet. 5, e1000640 (2009). 138. Yan, L.-H. et al. Reversal of Multidrug Resistance in Gastric Cancer Cells by E2F-1 Downregulation In Vitro and In Vivo. J. Cell. Biochem. 115, 34–41 (2014). 139. Yan, L.-H. et al. Overexpression of E2F1 in human gastric carcinoma is involved in anti-cancer drug resistance. BMC Cancer 14, 904 (2014). 140. Gao, Z., Shi, R., Yuan, K. & Wang, Y. Expression and prognostic value of E2F activators in NSCLC and subtypes: a research based on bioinformatics analysis. Tumor Biol. 37, 14979–14987 (2016). 141. Suh, D. S., Yoon, M. S., Choi, K. U. & Kim, J. Y. Significance of E2F-1 overexpression in epithelial ovarian cancer. Int. J. Gynecol. Cancer 18, (2008). 142. Yao, H., Lu, F. & Shao, Y. The E2F family as potential biomarkers and therapeutic targets in colon cancer. PeerJ 8, e8562 (2020). 149 143. Liban, T. J., Thwaites, M. J., Dick, F. A. & Rubin, S. M. Structural Conservation and E2F Binding Specificity within the Retinoblastoma Pocket Protein Family. J. Mol. Biol. 428, 3960–3971 (2016). 144. Qiao, H. et al. Human TFDP3, a Novel DP Protein, Inhibits DNA Binding and Transactivation by E2F *. J. Biol. Chem. 282, 454–466 (2007). 145. Rowland, B. D. & Bernards, R. Re-Evaluating Cell-Cycle Regulation by E2Fs. Cell 127, 871–874 (2006). 146. Rubin, S. M. Deciphering the Rb phosphorylation code. Trends Biochem. Sci. 38, 12–19 (2013). 147. Chow, K. N. B. & Dean, D. C. Domains A and B in the Rb Pocket Interact To Form a Transcriptional Repressor Motif. Mol. Cell. Biol. 16, 4862–4868 (1996). 148. Rubin, S. M., Gall, A.-L., Zheng, N. & Pavletich, N. P. Structure of the Rb C-Terminal Domain Bound to E2F1-DP1: A Mechanism for Phosphorylation-Induced E2F Release. Cell 123, 1093–1106 (2005). 149. Trimarchi, J. M., Fairchild, B., Wen, J. & Lees, J. A. The E2F6 transcription factor is a component of the mammalian Bmi1-containing polycomb complex. Proc. Natl. Acad. Sci. 98, 1519–1524 (2001). 150. Lewis, E. B. A gene complex controlling segmentation in Drosophila. Nature 276, 565–570 (1978). 151. Storre, J. et al. Homeotic transformations of the axial skeleton that accompany a targeted deletion of E2f6. EMBO Rep. 3, 695–700 (2002). 152. Shen, W.-H. The plant E2F–Rb pathway and epigenetic control. Trends Plant Sci. 7, 505–511 (2002). 153. Kosugi, S. & Ohashi, Y. E2Ls, E2F-like Repressors of Arabidopsis That Bind to E2F Sites in a Monomeric Form. J. Biol. Chem. 277, 16553–16558 (2002). 154. Rauber, R., Cabreira, C., de Freitas, L. B., Turchetto-Zolet, A. C. & Margis-Pinheiro, M. The evolutionary history of the E2F and DEL genes in Viridiplantae. Mol. Phylogenet. Evol. 99, 225–234 (2016). 155. Taubert, S. et al. E2F-Dependent Histone Acetylation and Recruitment of the Tip60 Acetyltransferase Complex to Chromatin in Late G1. Mol. Cell. Biol. 24, 4546–4556 (2004). 156. Glozak, M. A. & Seto, E. Histone deacetylases and cancer. Oncogene 26, 5420– 5432 (2007). 150 157. Kadoch, C. & Crabtree, G. R. Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Sci. Adv. 1, e1500447 (2015). 158. Harbour, J. W. & Dean, D. C. The Rb/E2F pathway: expanding roles and emerging paradigms. Genes Dev. 14, 2393–2409 (2000). 159. Trotter, K. W. & Archer, T. K. The BRG1 transcriptional coregulator. Nucl. Recept. Signal. 6, nrs.06004 (2008). 160. Weinberg, R. A. The retinoblastoma protein and cell cycle control. Cell 81, 323–330 (1995). 161. Hochegger, H., Takeda, S. & Hunt, T. Cyclin-dependent kinases and cell-cycle transitions: does one fit all? Nat. Rev. Mol. Cell Biol. 9, 910–916 (2008). 162. Sengupta, S. et al. The Evolutionarily Conserved C-terminal Domains in the Mammalian Retinoblastoma Tumor Suppressor Family Serve as Dual Regulators of Protein Stability and Transcriptional Potency *. J. Biol. Chem. 290, 14462–14475 (2015). 163. Gubern, A. et al. The N-Terminal Phosphorylation of RB by p38 Bypasses Its Inactivation by CDKs and Prevents Proliferation in Cancer Cells. Mol. Cell 64, 25– 36 (2016). 164. DeGregori, J., Leone, G., Miron, A., Jakoi, L. & Nevins, J. R. Distinct roles for E2F proteins in cell growth control and apoptosis. Proc. Natl. Acad. Sci. 94, 7245–7250 (1997). 165. Lukas, J., Petersen, B. O., Holm, K., Bartek, J. & Helin, K. Deregulated Expression of E2F Family Members Induces S-Phase Entry and Overcomes p16INK4A- Mediated Growth Suppression. Mol. Cell. Biol. 16, 1047–1057 (1996). 166. DeGregori, J., Leone, G., Ohtani, K., Miron, A. & Nevins, J. R. E2F-1 accumulation bypasses a G1 arrest resulting from the inhibition of G1 cyclin-dependent kinase activity. Genes Dev. 9, 2873–2887 (1995). 167. Qin, X. Q., Livingston, D. M., Kaelin, W. G. & Adams, P. D. Deregulated transcription factor E2F-1 expression leads to S-phase entry and p53-mediated apoptosis. Proc. Natl. Acad. Sci. 91, 10918–10922 (1994). 168. Johnson, D. G., Schwarz, J. K., Cress, W. D. & Nevins, J. R. Expression of transcription factor E2F1 induces quiescent cells to enter S phase. Nature 365, 349– 352 (1993). 169. Johnson, J. et al. Targeting the RB-E2F pathway in breast cancer. Oncogene 35, 4829–4835 (2016). 151 170. Wu, L. et al. The E2F1–3 transcription factors are essential for cellular proliferation. Nature 414, 457–462 (2001). 171. Yamasaki, L. et al. Tumor Induction and Tissue Atrophy in Mice Lacking E2F-1. Cell 85, 537–548 (1996). 172. Field, S. J. et al. E2F-1 Functions in Mice to Promote Apoptosis and Suppress Proliferation. Cell 85, 549–561 (1996). 173. Humbert, P. O. et al. E2f3 is critical for normal cellular proliferation. Genes Dev. 14, 690–703 (2000). 174. Dagnino, L. et al. Expression patterns of the E2F family of transcription factors during mouse nervous system development. Mech. Dev. 66, 13–25 (1997). 175. Trimarchi, J. M. & Lees, J. A. Sibling rivalry in the E2F family. Nat. Rev. Mol. Cell Biol. 3, 11–20 (2002). 176. Gaubatz, S. et al. E2F4 and E2F5 Play an Essential Role in Pocket Protein–Mediated G1 Control. Mol. Cell 6, 729–735 (2000). 177. Litovchick, L. et al. Evolutionarily Conserved Multisubunit RBL2/p130 and E2F4 Protein Complex Represses Human Cell Cycle-Dependent Genes in Quiescence. Mol. Cell 26, 539–551 (2007). 178. Sadasivam, S. & DeCaprio, J. A. The DREAM complex: master coordinator of cell cycle-dependent gene expression. Nat. Rev. Cancer 13, 585–595 (2013). 179. Chen, C.-R., Kang, Y., Siegel, P. M. & Massagué, J. E2F4/5 and p107 as Smad Cofactors Linking the TGFβ Receptor to c-myc Repression. Cell 110, 19–32 (2002). 180. Kowalik, T. F. Smad About E2F: TGFβ Repressionof c-Myc via a Smad3/E2F/p107 Complex. Mol. Cell 10, 7–8 (2002). 181. Lindeman, G. J. et al. A specific, nonproliferative role for E2F-5 in choroid plexus function revealed by gene targeting. Genes Dev. 12, 1092–1098 (1998). 182. Polanowska, J. et al. Human E2F5 gene is oncogenic in primary rodent cells and is amplified in human breast tumors. Genes. Chromosomes Cancer 28, 126–130 (2000). 183. Jiang, Y. et al. A potential oncogenic role of the commonly observed E2F5 overexpression in hepatocellular carcinoma. World J. Gastroenterol. 17, 470–477 (2011). 184. Liu, Y., Liu, D. & Wan, W. MYCN-induced E2F5 promotes neuroblastoma cell proliferation through regulating cell cycle progression. Biochem. Biophys. Res. Commun. 511, 35–40 (2019). 152 185. Donzelli, S. et al. MicroRNA-128-2 targets the transcriptional repressor E2F5 enhancing mutant p53 gain of function. Cell Death Differ. 19, 1038–1048 (2012). 186. Zou, C. et al. Up-regulated MicroRNA-181a induces carcinogenesis in Hepatitis B virus-related hepatocellular carcinoma by targeting E2F5. BMC Cancer 14, 97 (2014). 187. Cai, C., Huo, Q., Wang, X., Chen, B. & Yang, Q. SNHG16 contributes to breast cancer cell migration by competitively binding miR-98 with E2F5. Biochem. Biophys. Res. Commun. 485, 272–278 (2017). 188. Fang, D.-Z. et al. MicroRNA-129-3p suppresses tumor growth by targeting E2F5 in glioblastoma. Eur. Rev. Med. Pharmacol. Sci. 22, 1044–1050 (2018). 189. Yuwanita, I., Barnes, D., Monterey, M. D., O’Reilly, S. & Andrechek, E. R. Increased metastasis with loss of E2F2 in Myc -driven tumors. Oncotarget 6, 38210–38224 (2015). 190. Fujiwara, K., Yuwanita, I., Hollern, D. P. & Andrechek, E. R. Prediction and Genetic Demonstration of a Role for Activator E2Fs in Myc-Induced Tumors. Cancer Res. 71, 1924–1932 (2011). 191. Jusino, S. et al. E2F3 drives the epithelial-to-mesenchymal transition, cell invasion, and metastasis in breast cancer. Exp. Biol. Med. 246, 2057–2071 (2021). 192. Khaleel, S. S., Andrews, E. H., Ung, M., DiRenzo, J. & Cheng, C. E2F4 regulatory program predicts patient survival prognosis in breast cancer. Breast Cancer Res. 16, 486 (2014). 193. Andrechek, E. R., Mori, S., Rempel, R. E., Chang, J. T. & Nevins, J. R. Patterns of that characterize mammary development. cell signaling pathway activation Development 135, 2403–2413 (2008). 194. Bach, K. et al. Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing. Nat. Commun. 8, 2128 (2017). 195. Han, X. et al. Mapping the Mouse Cell Atlas by Microwell-Seq. Cell 172, 1091- 1107.e17 (2018). 196. To, B. et al. Insight into mammary gland development and tumor progression in an (2024) knockout mouse model. Oncogene conditional 1–14 E2F5 doi:10.1038/s41388-024-03172-4. 197. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011). 198. Krzywinski, M. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009). 153 199. Auwera, G. A. V. der & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. (O’Reilly Media, Inc., 2020). 200. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016). 201. Zhao, Y. et al. FBXO15 plays a critical suppressive functional role in regulation of breast cancer progression. Signal Transduct. Target. Ther. 6, 1–4 (2021). 202. Buckup, M. et al. Plectin is a regulator of prostate cancer growth and metastasis. Oncogene 40, 663–676 (2021). 203. Expression of TSHZ1 in cancer - Summary - The Human Protein Atlas. https://www.proteinatlas.org/ENSG00000179981-TSHZ1/pathology. 204. Fei, Y., Wu, Y., Chen, L., Yu, H. & Pan, L. Comprehensive pan-carcinoma analysis of ITGB1 distortion and its potential clinical significance for cancer immunity. Discov. Oncol. 15, 47 (2024). 205. Xu, G. et al. ARID1A determines luminal identity and therapeutic response in estrogen-receptor-positive breast cancer. Nat. Genet. 52, 198–207 (2020). 206. Gala, K. et al. KMT2C mediates the estrogen dependence of breast cancer through regulation of ERα enhancer function. Oncogene 37, 4692–4710 (2018). 207. Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016). 208. Konarska, M. M., Grabowski, P. J., Padgett, R. A. & Sharp, P. A. Characterization of the branch site in lariat RNAs produced by splicing of mRNA precursors. Nature 313, 552–557 (1985). 209. RNA Splicing | Learn Science at Scitable. http://www.nature.com/scitable/topicpage/rna-splicing-introns-exons-and- spliceosome-12375. 210. Yeh, C.-H., Bellon, M. & Nicot, C. FBXW7: a critical tumor suppressor of human cancers. Mol. Cancer 17, 1–19 (2018). 211. Madeira, F. et al. The EMBL-EBI Job Dispatcher sequence analysis tools framework in 2024. Nucleic Acids Res. 52, W521–W525 (2024). 212. de Bruijn, I. et al. Analysis and Visualization of Longitudinal Genomic and Clinical Data from the AACR Project GENIE Biopharma Collaborative in cBioPortal. Cancer Res. 83, 3861–3867 (2023). 154 213. Gao, J. et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 6, pl1–pl1 (2013). 214. Cerami, E. et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2, 401–404 (2012). 215. Qian, X.-L. et al. Caveolin-1: a multifaceted driver of breast cancer progression and its application in clinical treatment. OncoTargets Ther. 12, 1539 (2019). 216. Nwosu, Z. C., Ebert, M. P., Dooley, S. & Meyer, C. Caveolin-1 in the regulation of cell metabolism: a cancer perspective. Mol. Cancer 15, 71 (2016). 217. Wang, Y. et al. CAV2 promotes the invasion and metastasis of head and neck squamous cell carcinomas by regulating S100 proteins. Cell Death Discov. 8, 1–11 (2022). 218. Wood, G. E., Hockings, H., Hilton, D. M. & Kermorgant, S. The role of MET in chemotherapy resistance. Oncogene 40, 1927–1941 (2021). 219. Zhan, T., Rindtorff, N. & Boutros, M. Wnt signaling in cancer. Oncogene 36, 1461– 1473 (2017). 220. Attwooll, C., Denchi, E. L. & Helin, K. The E2F family: specific functions and overlapping interests. EMBO J. 23, 4709–4716 (2004). 221. Nik-Zainal, S. et al. Mutational Processes Molding the Genomes of 21 Breast Cancers. Cell 149, 979–993 (2012). 222. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). 223. Kipreos, E. T. & Pagano, M. The F-box protein family. Genome Biol. 1, 1–7 (2000). 224. Bailey, M. H. et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371-385.e18 (2018). 225. Yu, F. et al. Wnt/β-catenin signaling in cancers and targeted therapies. Signal Transduct. Target. Ther. 6, 1–24 (2021). 226. Hwang, N. et al. Caveolin-1 mediates the utilization of extracellular proteins for survival in refractory gastric cancer. Exp. Mol. Med. 55, 2461–2472 (2023). 227. Ren, L. et al. Caveolin-1 is a prognostic marker and suppresses the proliferation of breast cancer. Transl. Cancer Res. 10, 3797 (2021). 228. Díaz, M. I. et al. Caveolin-1 suppresses tumor formation through the inhibition of the unfolded protein response. Cell Death Dis. 11, 1–16 (2020). 155 229. Andrechek, E. R. et al. Amplification of the neu/erbB-2 oncogene in a mouse model of mammary tumorigenesis. Proc. Natl. Acad. Sci. 97, 3444–3449 (2000). 230. Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. 231. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). 232. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009). 233. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). 234. McKenna, A. et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). 235. Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012). 236. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012). 237. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014). 238. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6, 80–92 (2012). 239. Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLOS Comput. Biol. 12, e1004873 (2016). 240. Xu, J., Chen, Y. & Olopade, O. I. MYC and Breast Cancer. Genes Cancer 1, 629– 640 (2010). 241. Walhout, A. J. M., Gubbels, J. M., Bernards, R., van der Vliet, P. C. & Timmers, H. Th. M. c-Myc/Max Heterodimers Bind Cooperatively to the E-Box Sequences Located in the First Intron of the Rat Ornithine Decarboxylase (ODC) Gene. Nucleic Acids Res. 25, 1493–1501 (1997). 242. Perna, D. et al. Genome-wide mapping of Myc binding and gene regulation in serum- stimulated fibroblasts. Oncogene 31, 1695–1709 (2012). 243. Dang, C. V. MYC on the Path to Cancer. Cell 149, 22–35 (2012). 156 244. Deming, S. L., Nass, S. J., Dickson, R. B. & Trock, B. J. C-myc amplification in breast cancer: a meta-analysis of its occurrence and prognostic relevance. Br. J. Cancer 83, 1688–1695 (2000). 245. Blancato, J., Singh, B., Liu, A., Liao, D. J. & Dickson, R. B. Correlation of amplification and overexpression of the c-myc oncogene in high-grade breast cancer: FISH, in situ hybridisation and immunohistochemical analyses. Br. J. Cancer 90, 1612–1619 (2004). 246. Casciano, J. C. et al. MYC regulates fatty acid metabolism through a multigenic program in claudin-low triple negative breast cancer. Br. J. Cancer 122, 868–884 (2020). 247. Chandriani, S. et al. A Core MYC Gene Expression Signature Is Prominent in Basal- Like Breast Cancer but Only Partially Overlaps the Core Serum Response. PLOS ONE 4, e6693 (2009). 248. Lin, C. Y. et al. Transcriptional Amplification in Tumor Cells with Elevated c-Myc. Cell 151, 56–67 (2012). 249. Alluri, P. & Newman, L. Basal-like and Triple Negative Breast Cancers: Searching For Positives Among Many Negatives. Surg. Oncol. Clin. N. Am. 23, 567–577 (2014). 250. Andrechek, E. R. et al. Genetic heterogeneity of Myc-induced mammary tumors reflecting diverse phenotypes including metastatic potential. Proc. Natl. Acad. Sci. 106, 16387–16392 (2009). 251. D’Cruz, C. M. et al. c-MYC induces mammary tumorigenesis by means of a preferred pathway involving spontaneous Kras2 mutations. Nat. Med. 7, 235–239 (2001). 252. Sakamoto, K., Schmidt, J. W. & Wagner, K.-U. Mouse Models of Breast Cancer. in Mouse Models of Cancer: Methods and Protocols (eds. Eferl, R. & Casanova, E.) 47–71 (Springer, New York, NY, 2015). doi:10.1007/978-1-4939-2297-0_3. 253. Moody, S. E. et al. Conditional activation of Neu in the mammary epithelium of transgenic mice results in reversible pulmonary metastasis. Cancer Cell 2, 451–461 (2002). 254. Gunther, E. J. et al. Impact of p53 loss on reversal and recurrence of conditional Wnt-induced tumorigenesis. Genes Dev. 17, 488–501 (2003). 255. Pfefferle, A. D. et al. Transcriptomic classification of genetically engineered mouse models of breast cancer identifies human subtype counterparts. Genome Biol. 14, R125 (2013). 256. Hollern, D. P. & Andrechek, E. R. A genomic analysis of mouse models of breast cancer reveals molecular features ofmouse models and relationships to human breast cancer. Breast Cancer Res. 16, R59 (2014). 157 257. Manning, H. C., Buck, J. R. & Cook, R. S. Mouse Models of Breast Cancer: Platforms for Discovering Precision Imaging Diagnostics and Future Cancer Medicine. J. Nucl. Med. 57, 60S-68S (2016). 258. Pagès, F. et al. Immune infiltration in human tumors: a prognostic factor that should not be ignored. Oncogene 29, 1093–1102 (2010). 259. Schmidt, D. R. et al. Metabolomics in cancer research and emerging applications in clinical oncology. CA. Cancer J. Clin. 71, 333–358 (2021). 260. Wu, S., Zhu, W., Thompson, P. & Hannun, Y. A. Evaluating intrinsic and non-intrinsic cancer risk factors. Nat. Commun. 9, 3490 (2018). 261. Fisher, B. et al. Tamoxifen for Prevention of Breast Cancer: Report of the National Surgical Adjuvant Breast and Bowel Project P-1 Study. JNCI J. Natl. Cancer Inst. 90, 1371–1388 (1998). 262. Piccart-Gebhart, M. J. et al. Trastuzumab after Adjuvant Chemotherapy in HER2- Positive Breast Cancer. N. Engl. J. Med. 353, 1659–1672 (2005). 263. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013). 264. Swiatnicki, M. R. et al. Elevated phosphorylation of EGFR in NSCLC due to mutations in PTPRH. PLOS Genet. 18, e1010362 (2022). 265. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020). 266. Baslan, T. et al. Novel insights into breast cancer copy number genetic heterogeneity revealed by single-cell genome sequencing. eLife 9, e51480 (2020). 267. Staaf, J. et al. High-resolution genomic and expression analyses of copy number alterations in HER2-amplified breast cancer. Breast Cancer Res. 12, R25 (2010). 268. Ulz, P. et al. Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer. Nat. Commun. 7, 12008 (2016). 269. Mertens, F., Johansson, B. & Mitelman, F. Isochromosomes in neoplasia. Genes. Chromosomes Cancer 10, 221–230 (1994). 270. Mennuti, M. T. Chapter 5 - Cytogenetics: Part 2, Structural Chromosome Rearrangements and Reproductive Impact. in Perinatal Genetics (eds. Norton, M. E., Kuller, J. A. & Dugoff, L.) 39–48 (Elsevier, Philadelphia, 2019). doi:10.1016/B978- 0-323-53094-1.00005-9. 271. Shao, X. et al. Copy number variation is highly correlated with differential gene expression: a pan-cancer study. BMC Med. Genet. 20, 175 (2019). 158 272. Ohshima, K. et al. Integrated analysis of gene expression and copy number identified potential cancer driver genes with amplification-dependent overexpression in 1,454 solid tumors. Sci. Rep. 7, 641 (2017). 273. Rivas, M. A. et al. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348, 666–669 (2015). 274. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020). 275. Zhan, L. et al. Deregulation of Scribble Promotes Mammary Tumorigenesis and Reveals a Role for Cell Polarity in Carcinoma. Cell 135, 865–878 (2008). 276. Paschka, P. et al. Adverse Prognostic Significance of KIT Mutations in Adult Acute Myeloid Leukemia With inv(16) and t(8;21): A Cancer and Leukemia Group B Study. J. Clin. Oncol. 24, 3904–3911 (2006). 277. di Masi, A. et al. Retinoic acid receptors: From molecular mechanisms to cancer therapy. Mol. Aspects Med. 41, 1–115 (2015). 278. Sievers, F. et al. Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011). 279. Wang, G., Tian, Y., Hu, Q., Xiao, X. & Chen, S. PML/RARa blocks the differentiation and promotes the proliferation of acute promyelocytic leukemia through activating MYB expression by transcriptional and epigenetic regulation mechanisms. J. Cell. Biochem. 120, 1210–1220 (2019). 280. Kin Ting Kam, R., Deng, Y., Chen, Y. & Zhao, H. Retinoic acid synthesis and functions in early embryonic development. Cell Biosci. 2, 1–14 (2012). 281. Ornitz, D. M. & Itoh, N. The Fibroblast Growth Factor signaling pathway. WIREs Dev. Biol. 4, 215–266 (2015). 282. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016). 283. Rueda, O. M. et al. Dynamics of breast-cancer relapse reveal late-recurring ER- positive genomic subgroups. Nature 567, 399–404 (2019). 284. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012). 285. Horiuchi, D. et al. MYC pathway activation in triple-negative breast cancer is synthetic lethal with CDK inhibition. J. Exp. Med. 209, 679–696 (2012). 286. Yin, L., Duan, J.-J., Bian, X.-W. & Yu, S. Triple-negative breast cancer molecular subtyping and treatment progress. Breast Cancer Res. 22, 61 (2020). 159 287. Bertucci, F., Finetti, P. & Birnbaum, D. Basal Breast Cancer: A Complex and Deadly Molecular Subtype. Curr. Mol. Med. 12, 96–110 (2012). 288. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007). 289. Parker, J. S. et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. J. Clin. Oncol. 27, 1160–1167 (2009). 290. Maaten, L. van der & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). 291. Polyak, K. Breast cancer: origins and evolution. J. Clin. Invest. 117, 3155–3163 (2007). 292. Sørlie, T. et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. 98, 10869–10874 (2001). 293. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020). 294. Newton, E. E., Mueller, L. E., Treadwell, S. M., Morris, C. A. & Machado, H. L. Molecular Targets of Triple-Negative Breast Cancer: Where Do We Stand? Cancers 14, 482 (2022). 295. Skoulidis, F. et al. Sotorasib for Lung Cancers with KRAS p.G12C Mutation. N. Engl. J. Med. 384, 2371–2381 (2021). 296. Jänne, P. A. et al. Adagrasib in Non–Small-Cell Lung Cancer Harboring a KRASG12C Mutation. N. Engl. J. Med. 387, 120–131 (2022). 297. Coombes, R. C. et al. Results of the phase IIa RADICAL trial of the FGFR inhibitor AZD4547 in endocrine resistant breast cancer. Nat. Commun. 13, 3246 (2022). 298. Campbell, B. B. et al. Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042-1056.e10 (2017). 299. Creighton, C. J. The molecular profile of luminal B breast cancer. Biol. Targets Ther. 6, 289–297 (2012). 300. Prat, A. et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 12, R68 (2010). 301. Pariyar, M., Johns, A., Thorne, R. F., Scott, R. J. & Avery-Kiejda, K. A. Copy number variation in triple negative breast cancer samples associated with lymph node metastasis. Neoplasia 23, 743–753 (2021). 160 302. Shadeo, A. & Lam, W. L. Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res. 8, R9 (2006). 303. Andrechek, E. R. HER2/Neu tumorigenesis and metastasis is regulated by E2F activator transcription factors. Oncogene 34, 217–225 (2015). 304. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA- MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013). 305. Picard Tools - By Broad Institute. https://broadinstitute.github.io/picard/. 306. Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011). 307. Reich, M. et al. GenePattern 2.0. Nat. Genet. 38, 500–501 (2006). 308. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. Mach. Learn. PYTHON. 309. Bedre, R. reneshbedre/bioinfokit. (2024). 310. Wolff, A. C. et al. American Society of Clinical Oncology/College of American Pathologists Guideline Recommendations for Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer. J. Clin. Oncol. 25, 118–145 (2006). 311. Wolff, A. C. et al. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: American Society of Clinical Oncology–College of American Pathologists Guideline Update. Arch. Pathol. Lab. Med. 147, 993–1000 (2023). 312. Yarden, Y. & Sliwkowski, M. X. Untangling the ErbB signalling network. Nat. Rev. Mol. Cell Biol. 2, 127–137 (2001). 313. Marra, A., Chandarlapaty, S. & Modi, S. Management of patients with advanced- stage HER2-positive breast cancer: current evidence and future perspectives. Nat. Rev. Clin. Oncol. 21, 185–202 (2024). 314. Schechter, A. L. et al. The neu oncogene: an erb-B-related gene encoding a 185,000-Mr tumour antigen. Nature 312, 513–516 (1984). 315. Coussens, L. et al. Tyrosine Kinase Receptor with Extensive Homology to EGF Receptor Shares Chromosomal Location with neu Oncogene. Science 230, 1132– 1139 (1985). 316. Carpenter, G., King, L. & Cohen, S. Epidermal growth factor stimulates phosphorylation in membrane preparations in vitro. Nature 276, 409–410 (1978). 317. Di Fiore, P. P. et al. erbB-2 Is a Potent Oncogene When Overexpressed in NIH/3T3 Cells. Science 237, 178–182 (1987). 161 318. Hudziak, R. M., Schlessinger, J. & Ullrich, A. Increased expression of the putative growth factor receptor p185HER2 causes transformation and tumorigenesis of NIH 3T3 cells. Proc. Natl. Acad. Sci. 84, 7159–7163 (1987). 319. Slamon, D. J. et al. Human Breast Cancer: Correlation of Relapse and Survival with Amplification of the HER-2/neu Oncogene. Science 235, 177–182 (1987). 320. Drebin, J. A., Link, V. C., Stern, D. F., Weinberg, R. A. & Greene, M. I. Down- modulation of an oncogene protein product and reversion of the transformed phenotype by monoclonal antibodies. Cell 41, 695–706 (1985). 321. Carter, P. et al. Humanization of an anti-p185HER2 antibody for human cancer therapy. Proc. Natl. Acad. Sci. 89, 4285–4289 (1992). 322. Cho, H.-S. et al. Structure of the extracellular region of HER2 alone and in complex with the Herceptin Fab. Nature 421, 756–760 (2003). 323. Hudis, C. A. Trastuzumab — Mechanism of Action and Use in Clinical Practice. N. Engl. J. Med. 357, 39–51 (2007). 324. Minckwitz, G. von et al. Adjuvant Pertuzumab and Trastuzumab in Early HER2- Positive Breast Cancer. N. Engl. J. Med. 377, 122–131 (2017). 325. Park, J. W. et al. Adaptive Randomization of Neratinib in Early Breast Cancer. N. Engl. J. Med. 375, 11–22 (2016). 326. Murthy, R. K. et al. Tucatinib, Trastuzumab, and Capecitabine for HER2-Positive Metastatic Breast Cancer. N. Engl. J. Med. 382, 597–609 (2020). 327. Geyer, C. E. et al. Lapatinib plus Capecitabine for HER2-Positive Advanced Breast Cancer. N. Engl. J. Med. 355, 2733–2743 (2006). 328. Geurts, S. M. E. et al. Time trends in real-world treatment patterns and survival in patients diagnosed with de novo HER2+ metastatic breast cancer: an analysis of the SONABRE registry. Breast Cancer Res. Treat. 205, 287–302 (2024). 329. Slamon, D. J. et al. Use of Chemotherapy plus a Monoclonal Antibody against HER2 for Metastatic Breast Cancer That Overexpresses HER2. N. Engl. J. Med. 344, 783– 792 (2001). 330. Giaquinto, A. N. et al. Breast cancer statistics 2024. CA. Cancer J. Clin. n/a,. 331. McFarland, C. D. et al. The Damaging Effect of Passenger Mutations on Cancer Progression. Cancer Res. 77, 4763–4772 (2017). 332. Webster, L. R. et al. Prohibitin expression is associated with high grade breast cancer but is not a driver of amplification at 17q21.33. Pathology (Phila.) 45, 629– 636 (2013). 162 333. Naylor, T. L. et al. High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res. 7, R1186 (2005). 334. Gaughran, G., Chipman, M., Shackleton, M., Moore, M. & Iddawela, M. 26P Impact of 17q21 locus gene co-amplification on overall survival in HER2-positive breast cancer. Ann. Oncol. 31, S1225 (2020). 335. Bilal, E. et al. Amplified Loci on Chromosomes 8 and 17 Predict Early Relapse in ER-Positive Breast Cancers. PLoS ONE 7, e38575 (2012). 336. Staaf, J. et al. High-resolution genomic and expression analyses of copy number alterations in HER2-amplified breast cancer. Breast Cancer Res. BCR 12, R25 (2010). 337. Cowley, G. S. et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data 1, 140035 (2014). 338. Gao, H. et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat. Med. 21, 1318–1325 (2015). 339. Barcus, C. E. et al. Elevated collagen-I augments tumor progressive signals, intravasation and metastasis of prolactin-induced estrogen receptor alpha positive mammary tumor cells. Breast Cancer Res. 19, 9 (2017). 340. Conklin, M. W. et al. Aligned Collagen Is a Prognostic Signature for Survival in Human Breast Carcinoma. Am. J. Pathol. 178, 1221–1232 (2011). 341. Su, H. & Karin, M. Collagen architecture and signaling orchestrate cancer development. Trends Cancer 9, 764–773 (2023). 342. Hsu, K.-S. et al. Cancer cell survival depends on collagen uptake into tumor- associated stroma. Nat. Commun. 13, 7078 (2022). 343. Chen, Y. et al. Oncogenic collagen I homotrimers from cancer cells bind to α3β1 integrin and impact tumor microbiome and immunity to promote pancreatic cancer. Cancer Cell 40, 818-834.e9 (2022). 344. Ping, Q. et al. Cancer-associated fibroblasts: overview, progress, challenges, and directions. Cancer Gene Ther. 28, 984–999 (2021). 345. Batista, M. A. et al. Nanomechanical phenotype of chondroadherin-null murine articular cartilage. Matrix Biol. 38, 84–90 (2014). 346. Camper, L., Heinegård, D. & Lundgren-Åkerlund, E. Integrin α2β1 Is a Receptor for the Cartilage Matrix Protein Chondroadherin. J. Cell Biol. 138, 1159–1167 (1997). 163 347. Haglund, L. et al. Identification and Characterization of the Integrin α2β1 Binding Motif in Chondroadherin Mediating Cell Attachment *. J. Biol. Chem. 286, 3925–3934 (2011). 348. Rajalingam, K. et al. Prohibitin is required for Ras-induced Raf–MEK–ERK activation and epithelial cell migration. Nat. Cell Biol. 7, 837–843 (2005). 349. Signorile, A., Sgaramella, G., Bellomo, F. & Rasmo, D. D. Prohibitins: A Critical Role in Mitochondrial Functions and Implication in Diseases. Cells 8, 71 (2019). 350. Zhu, B., Fukada, K., Zhu, H. & Kyprianou, N. Prohibitin and Cofilin Are Intracellular Effectors of Transforming Growth Factor β Signaling in Human Prostate Cancer Cells. Cancer Res. 66, 8640–8647 (2006). 351. Yang, J., Li, B. & He, Q.-Y. Significance of prohibitin domain family in tumorigenesis and its implication in cancer diagnosis and treatment. Cell Death Dis. 9, 1–10 (2018). 352. Mlakar, V. et al. 17q Gain in Neuroblastoma: A Review of Clinical and Biological Implications. Cancers 16, 338 (2024). 353. Nagy, Z. et al. An ALYREF-MYCN coactivator complex drives neuroblastoma tumorigenesis through effects on USP3 and MYCN stability. Nat. Commun. 12, 1881 (2021). 354. Bown, N. et al. Gain of Chromosome Arm 17q and Adverse Outcome in Patients with Neuroblastoma. N. Engl. J. Med. 340, 1954–1961 (1999). 355. Schleicher, M. et al. Prohibitin-1 maintains the angiogenic capacity of endothelial cells by regulating mitochondrial function and senescence. J. Cell Biol. 180, 101 (2008). 356. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). 357. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). 358. Aguirre, A. J. et al. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting. Cancer Discov. 6, 914–929 (2016). 359. Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response. Nat. Med. 24, 927– 930 (2018). 360. Justus, C. R., Leffler, N., Ruiz-Echevarria, M. & Yang, L. V. In vitro Cell Migration and Invasion Assays. J. Vis. Exp. JoVE 51046 (2014) doi:10.3791/51046. 164 361. Liu, J. et al. Targeting PHB1 to inhibit castration-resistant prostate cancer progression in vitro and in vivo. J. Exp. Clin. Cancer Res. 42, 128 (2023). 362. Lamb, J. et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313, 1929–1935 (2006). 363. Han, E. K.-H. et al. Akt inhibitor A-443654 induces rapid Akt Ser-473 phosphorylation independent of mTORC1 inhibition. Oncogene 26, 5655–5661 (2007). 364. van Slooten, H.-J. et al. Outgrowth of BT-474 human breast cancer cells in immune- deficient mice: a new in vivo model for hormone-dependent breast cancer. Br. J. Cancer 72, 22–30 (1995). 365. Smith, A. E. et al. HER2 + breast cancers evade anti-HER2 therapy via a switch in driver pathway. Nat. Commun. 12, 6667 (2021). 366. Zhao, X. & Guan, J.-L. Focal adhesion kinase and its signaling pathways in cell migration and angiogenesis. Adv. Drug Deliv. Rev. 63, 610–615 (2011). 367. Schubert, M. et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat. Commun. 9, 20 (2018). 368. The human cell line - breast cancer - The Human Protein Atlas. https://www.proteinatlas.org/humanproteome/cell+line/breast+cancer. 369. BT-474 - HTB-20 | ATCC. https://www.atcc.org/products/htb-20. 370. BT-474 Human ductal breast carcinoma: New life for an old model. https://oncology.labcorp.com/bt-474-human-ductal-breast-carcinoma-new-life-old- model. 371. Jiang, P. et al. Systematic investigation of cytokine signaling activity at the tissue and single-cell levels. Nat. Methods 18, 1181–1191 (2021). 372. Keegan, A. D., Leonard, W. J. & Zhu, J. Recent advances in understanding the role of IL-4 signaling. Fac. Rev. 10, 71 (2021). 373. Broeker, C. D., Ortiz, M. M. O., Murillo, M. S. & Andrechek, E. R. Integrative multi- omic sequencing reveals the MMTV-Myc mouse model mimics human breast cancer heterogeneity. Breast Cancer Res. 25, 120 (2023). 374. PhD, S. B. How To Perform The Delta-Delta Ct Method. Top Tip Bio https://toptipbio.com/delta-delta-ct-pcr/ (2019). 375. Wang, X., Spandidos, A., Wang, H. & Seed, B. PrimerBank: a PCR primer database for quantitative gene expression analysis, 2012 update. Nucleic Acids Res. 40, D1144–D1149 (2012). 165 376. Justus, C. R., Marie, M. A., Sanderlin, E. J. & Yang, L. V. Transwell In Vitro Cell Migration and Invasion Assays. in Cell Viability Assays: Methods and Protocols (eds. Friedrich, O. & Gilbert, D. F.) 349–359 (Springer US, New York, NY, 2023). doi:10.1007/978-1-0716-3052-5_22. 377. Davenport, M. L., Sherrill, T. P., Blackwell, T. S. & Edmonds, M. D. Perfusion and inflation of the mouse lung for tumor histology. J. Vis. Exp. JoVE 10.3791/60605 (2020) doi:10.3791/60605. 378. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023). 379. Lomakin, A. et al. Spatial genomics maps the structure, nature and evolution of cancer clones. Nature 611, 594–602 (2022). 380. Cardozo, T. & Pagano, M. The SCF ubiquitin ligase: insights into a molecular machine. Nat. Rev. Mol. Cell Biol. 5, 739–751 (2004). 381. Tobin, R. P. et al. Targeting MDSC Differentiation Using ATRA: A Phase I/II Clinical Trial Combining Pembrolizumab and All-Trans Retinoic Acid for Metastatic Melanoma. Clin. Cancer Res. 29, 1209–1219 (2023). 382. Ogrodzinski, M. P., Teoh, S. T. & Lunt, S. Y. Targeting Subtype-Specific Metabolic Preferences in Nucleotide Biosynthesis Inhibits Tumor Growth in a Breast Cancer Model. Cancer Res. 81, 303–314 (2021). 383. Jin, Z. et al. GATA2 promotes castration-resistant prostate cancer development by suppressing IFN-β axis-mediated antitumor immunity. Oncogene 43, 2595–2610 (2024). 384. Teo, M. Y., Rathkopf, D. E. & Kantoff, P. Treatment of Advanced Prostate Cancer. Annu. Rev. Med. 70, 479–499 (2019). 385. Sun, C. et al. Reversible and adaptive resistance to BRAF(V600E) inhibition in melanoma. Nature 508, 118–122 (2014). 386. Zhang, J., Cunningham, J. J., Brown, J. S. & Gatenby, R. A. Integrating evolutionary dynamics into treatment of metastatic castrate-resistant prostate cancer. Nat. Commun. 8, 1816 (2017). 387. Rangwala, S. H. et al. The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments. PLOS Biol. 22, e3002405 (2024). 166 APPENDIX A: FIGURES Figure 2.3: Paradigm of E2F functions. (A) Canonically, E2Fs play small but critical roles in the cell cycle. Heterodimerized E2F and DP pairs are sequestered by RB proteins until phosphorylation of RB proteins occurs by CDKs, thus releasing RB from the trimeric complex. E2F DP heterodimers are then available to bind DNA and induce transcription of cell cycle and cell fate genes. (B) However, more recently E2Fs have been found to play roles extending beyond cell cycle control. One such mechanism is inducing angiogenesis through E2F7 and E2F8 cooperating with HIF1α to increase transcription of VEGFA. Another mechanism includes E2Fs being mediators of the DNA damage response through increased transcription of E2F7 and E2F8, while simultaneously phosphorylating E2F1 to halt progression of the cell cycle. 167 Figure 2.4.1: Schematic of E2F5 cKO and impact on mammary gland outgrowth and tumor formation. (A) Overview of the outcome of loxP excision by MMTV-Cre expression upon hormonal activation of the MMTV promoter/enhancer during puberty. E2F5 exons 2 and 3 reside between loxP oriented in the same direction, which upon Cre mediated recombination, this will result in the excision of DNA contained between the loxP sites. After recombination, E2F5 exons 1 and 4 will reside next to each other. (B) Compared to control MMTV-Cre mice mammary gland outgrowth, the overall distance growth to the lymph node was impaired in E2F5 cKO mice in a statistically significant manner. Briana To is responsible for generating Figure 2.4.1B . (C) MMTV-Cre mice were tumor free during the duration of the study, while homozygous multiparous E2F5 cKO developed tumors at an accelerated rate compared to homozygous nulliparous E2F5 cKO mice. Briana To is responsible for generating Figure 2.4.1C. 168 Figure 2.4.2: Confirmation of deletion of exons 2 and 3 in E2F5 tumors. (A) At the bottom of the figure in blue, pictured from left to right, are exons 2 and 3 respectively of E2F5. Paired end 150 base pair aligned WGS reads for each tumor, separated by black bars, are stacked above E2F5 exons 2 and 3 in IGV. Each gray pentagon represents a normal WGS read, with the point of the pentagon indicating directionality of the read. Red WGS paired end reads indicate presence of a deletion event. There is a deletion present in all tumors starting at position Chr3:14,586,714 and ending at Chr3:14,588,445 on the mm10 genome, the exact positions where the loxP sites are engineered. 169 Figure 2.4.3: Circos plot preview of WGS events by tumor, specific genes altered across tumors, and mutational driving mechanisms within each tumor. Circos plots illustrate genetic events for each tumor. The outermost ring is an ideogram of the mouse chromosomes, proportional in size to their base pair length within the mm10 genome. The next ring displays mutations colored yellow, orange, and red. Yellow mutation blocks are predicted to be of low impact by Mutect2. Orange mutation blocks are predicted to be of moderate impact by Mutect2. Red mutation blocks are predicted to be of high impact by Mutect2. The next inner ring depicts copy number changes within each tumor. Red blocks on this track indicate amplifications, while blue blocks indicate deletions. The length of each block is commensurate with the base pair length of the observed CNV event within the genome, while the height of the event is proportional to the amount of paired read evidence for each CNV event (e.g. one amplification event with 40 pieces of paired read 170 Figure 2.4.3 (cont’d) evidence will be a third the height of another amplification event with 120 pieces of paired read evidence). The innermost ring contains inversions, colored in black, and translocations, randomly assigned one of the colors of the chromosomes the translocation is present on. Circos plots are shown for tumors 3790 (A), 3790_2 (B), 3976 (C), 5492 (D), and 5691 (E). (F) The most frequently altered genes across all tumors are presented in oncoprint format. The top bar chart represents the total number of events per tumor of frequently altered oncogenes and tumor suppressors, while the side bar chart represents the total number of alterations per gene across all tumors. (G) DeconstructSigs determined etiologies of trinucleotide mutational base pair contexts for MMTV-Neu, MMTV-PyMT, and MMTV-Cre E2F5 cKO tumors. The most prevalent mutational signature in E2F5 cKO tumors is the clock-like aging signature noted by spontaneous deamination of 5-methylcytosine (SBS1). The second most prevalent signature is that of HRR deficiency (SBS3). Small contributions from mismatch repair (MMR) deficiency were noted, with other signatures unidentifiable or unknown. MMTV-Neu tumors display similar mutational profiles to E2F5 cKO tumors but with less HRR deficiency. MMTV-PyMT tumors have major contributions from aging signatures and tobacco smoking (SBS4) signatures, while a large number of mutations are of unknown origin. 171 Figure 2.4.4: Fbxo15 splice acceptor mutation is present in human cancer at the same site. (A) The G>T predicted high impact splice acceptor variant mutation at Chr18: 84,977,366 (mm10 genome) at the start of exon 9/10 of Fbxo15 in all 5 mouse MMTV- Cre E2F5 cKO tumors is shown. VAF vary from 0.5, indicative of heterozygosity of the mutation or contamination from normal tissue, to 1 in sample 3976, indicative of the mutation. (B) Highlight of mouse Chr18:84,977,366 and homozygosity of transcriptional direction. (C) DNA pairwise alignment of human and mouse Fbxo15 using Clustal Omega showed human chr18:74,082,052 was the homologous nucleotide in humans to the mutation identified in the mouse tumors. (D) Examining cBioPortal datasets of cancer, it was found that the same splice acceptor variant mutation has occurred in human cancer, with a cluster of other prevalent truncating and missense mutations around this site. 172 Figure 2.4.5: Conserved amplification of known oncogenes in E2F5 cKO tumors. (A) Overview of log2 fold change in copy number for all chromosomes of each sample. White indicates no change from normal, blue indicates lower copy number than normal, and red indicates increased copy number. (B) Zoomed in log2 fold change of chromosome 6 base pairs 15,000,000–20,000,000. This region encompasses genes Cav1, Cav2, Met, and Wnt2. (C) Further examination of WGS revealed that in a majority of the tumors with amplification over this region possessed an inversion event within the amplification event. One boundary of the amplification event is shared with the inversion event in these tumors. (D) Further zooming in on Cav1, Met, and Wnt2 (Cav2 omitted due to overlapping with Cav1) displayed in yellow bars, log2 fold copy number ratios for these genes are presented in context of immediately adjacent non-amplified DNA. Each gray dot represents an individual WGS read, while orange dots represent averages for WGS at a given genomic region. 173 Figure 3.4.1: Heterogeneous and conserved somatic features revealed by whole genome sequencing. Representative Circos plots are shown for (A) microacinar, (B) squamous, and (C) EMT histological subtypes. The outermost ring of each Circos plot depicts an ideogram for the mouse chromosomes proportionate to the actual chromosome length. The next inner ring shows mutations in genes as stacked blocks at their corresponding genomic locations, color coded to their predicted impact by Mutect2199—yellow for low impact, orange for moderate impact, and red for high impact. The next inner ring shows discrete copy number changes as analyzed by CNVKIT; red regions indicate amplification and blue regions indicate deletions. The height of each copy number alteration corresponds to the predicted change in copy number, with the lowest level change being and showing a max copy number change of  2. The innermost ring 174 Figure 3.4.1 (cont’d) reveals inversions and translocations as determined by the consensus of Delly and Lumpy somatic variant callers. Inversions are colored black, while translocations match the color of the ideogram of one of the two chromosomes involved in the translocation event. (D) A CNVKIT heatmap shows the log2 fold change of the estimated normalized copy number segments of each chromosome for each tumor sample relative to the wildtype reference. 175 Figure 3.4.2: Copy number changes drive gene expression changes. (A) Circos plot of correlation of copy number changes and gene expression across all tumor samples. Significant Kendall’s rank correlation coefficient (> = 0.3) shown in blue and significant Pearson correlation coefficient (> = 0.7) shown in red. (B) Unsupervised hierarchical clustering of the MSigDB C1 positional dataset for ssGSEA values closely recapitulates the stratification of histological subtypes by gene expression clustering. (C) Principal component analysis (PCA) of C1 positional ssGSEA values reveals distinct clustering by histological subtype. 176 Figure 3.4.3: Oncogenic drivers determine mutational heterogeneity. (A) Total counts of overall somatic mutational burden of MMTV-Myc tumors compared to mutational burden of MMTV-Neu and MMTV-PyMT tumors as shown by bar plot. 177 Figure 3.4.3 (cont’d) (B) Weights of Catalogue of Somatic Mutations in Cancer (COSMIC) mutational signatures derived from each tumor using DeconstructSigs207 depicted by stacked bar plot. (C) Venn diagram of conserved mutations (≥ 66% of tumors) between histological subtypes of moderate or high impact predicted by Mutect2. Putatively impactful oncogenes with Sanger sequencing confirmed mutations are listed by their representative histological subtype in which those mutations are conserved. 178 Figure 3.4.4: Heterogeneous activation of KRAS pathway in EMT histological subtype. (A) Proportion of each tumor histological subtype with activating mutations in 179 Figure 3.4.4 (cont’d) KRAS in bar plot format. (B) Sequence variation in KRAS between all tumors sequenced as shown by a logo plot illustrates canonically activating mutations in KRAS. (C) A volcano plot of the ssGSEA values from the MSigDB C6 oncogenic signature gene set, showing EMT upregulated or downregulated gene sets compared to microacinar and squamous. (D) Violin plot of a representative KRAS pathway signature from the ssGSEA values of the C6 oncogenic signature gene set, showing distinct upregulation of KRAS signaling in the EMT subtype. (E) Heatmap of log2 fold change in copy number segmentation values showing high-level amplification of FGFR2 in EMT. (F) Canonical molecular pathway signaling reveals FGFR2 lies directly upstream of KRAS (Created with BioRender.com) 180 intermediate phenotype between Figure 3.4.5: Squamous represents an microacinar and EMT. (A) PCA of the MSigDB C2 curated and (B) C6 oncogenic signature gene sets for ssGSEA values of the microacinar, squamous, and EMT tumors recapitulates C1 clustering and explains more variance in the data. (C) Pairwise relationship plots for representative C2 gene set and (D) C6 gene set ssGSEA values are shown with linear regression lines and a 95% CI. Pearson R correlation values are shown with p-values determined from a two-sided t-test. 181 Figure 3.4.6: Mouse model genetic and transcriptional events inform human clinical outcomes. (A) Unsupervised hierarchical clustering of METABRIC gene expression values by a list of 453 homologous genes that are in a conserved amplification/deletion event and are differentially expressed between MMTV-Myc histological subtypes. (B) Unsupervised hierarchical clustering of METABRIC gene expression values by the PAM50 gene set. (C) Overall survival (OS) KM analysis of non-redundant TCGA breast cancer patients, accessed through cBioPortal, stratified by Myc amplification status. (D) OS KM curve of non-redundant TCGA breast cancer patients stratified by KRAS alteration status. (E) OS KM curve of non-redundant TCGA breast cancer patients stratified by FGFR2 amplification status. 182 Figure 3.4.7: MMTV-Myc histological subtypes are representative of different human breast cancer intrinsic subtypes. (A) t-SNE performed on human METABRIC and mouse model gene expression samples using the 32 homologous gene subset of PAM50 as determined by recursive feature elimination with cross-validation. (B) Normalized scoring metrics for the soft voting classifier including accuracy for measuring true positives, a weighted F1 scoring metric for balancing precision and recall, and an MCC metric for taking into account false positives and false negatives even in the case of unbalanced classes. (C) A confusion matrix where true positives lie along the diagonal 183 Figure 3.4.7 (cont’d) from top left to bottom right and false values occupy all other boxes. (D) Bar chart of proportional probabilities of each model representing human intrinsic breast cancer subtypes as determined by the soft voting classifier. Human intrinsic subtype proportion was determined directly from proportions of METABRIC breast cancer patients subtyped. (E) Bar chart of MMTV-Myc histological subtypes and proportional probabilities of each subtype corresponding to different human breast cancer intrinsic subtypes determined by the soft voting classifier. 184 Figure S 3.4.7.1: PCA of pre batch effect removal data. PCA plot of Z-scaled data for METABRIC gene expression data, and also MMTV-Myc, MMTV-Neu, and MMTV-PyMT primary mammary mouse tumors and visualized using a matplotlib scatterplot in Python. The first principal component accounts for 44.6% of the variance within the dataset, with three distinct clusters that correspond to different sampling groups appearing. The second principal component accounts for 8.6% of variance within the dataset. This figure serves to demonstrate that the main differences between datasets lie not between subtypes, but between the datasets themselves. 185 Figure S 3.4.7.2: PCA of post batch effect removal data. PCA plot of Z-scaled and batch corrected METABRIC, MMTV-Myc, MMTV-Neu, and MMTV-PyMT primary mammary tumor gene expression data. Batch correction was performed using ComBat with settings defined in the main text methods section. Principal components are plotted using a scatterplot in matplotlib within Python. The first principal component in the batch corrected dataset accounts for 5.9% of variance, while the second principal component accounts for 4.5% of variance. 186 Figure S 3.4.7.3: RFECV determines only 32 of PAM50 genes are optimal for differentiation between intrinsic subtypes of breast cancer. 10 k-fold RFECV using a SVM classifier with RBF kernel was performed on Z-scored and batch corrected METABRIC gene expression data—all mouse data was removed after batch correction and Z-scoring but before RFECV. Gene expression data was limited to the remaining PAM50 genes after combining human and mouse datasets, which was 45 genes. The vertical dashed line was placed at the optimal number of features, which is the number of features which give the highest average accuracy score over the 10 iterations the test is performed. The light blue shaded area represents one standard deviation away from the mean accuracy score. RFECV was performed using the Yellowbrick package in Python. 187 Figure 4.3: Reexamination of WGS read evidence does not support an amplification event over Col1A1, Chad, and Phb1 in previously sequenced MMTV-Neu tumors. Although previously purported to have amplification over the 11qD cytoband in most MMTV-Neu primary mammary tumor samples sequenced, reexamination of the WGS data for these same tumors (A, B, and C) found no strong amplification or deletion support over the entire genome of each tumor. This contrasts with the strong WGS read support for amplification of chromosome 11 (+1 copy) and 15 (+2 copies) in microacinar MMTV-Myc samples (D) or the very strong evidence for a focal amplification on chromosome 7 over Fgfr2 in one of the MMTV-Myc EMT tumor samples (E). 188 Figure 4.4.1: The 17q21 cytoband amplification is associated with worse clinical outcomes independent of ERBB2 coamplification. (A) An oncoprint displays the total number of cBioPortal breast cancer cases and the frequency of genetic alteration for each gene. (B) An upset plot illustrates the number of cooccurrences of genetic alteration for each gene compared to every other gene. (C) RFS KM plot is shown for all cBioPortal breast cancer cases with no alterations to ERBB2 or PHB1, ERBB2 alterations excluding PHB1 alterations, and PHB1 alterations excluding ERBB2 alterations. (D) OS KM plot is shown for all cBioPortal breast cancer cases with no alterations to ERBB2 or PHB1, ERBB2 alterations excluding PHB1 alterations, and PHB1 alterations excluding ERBB2 alterations. 189 Figure 4.4.2: Expression of 11qD genes is spatially heterogeneous between tumors. Three representative MMTV-Neu tumor samples are stained with Col1A1, Chad, Phb1, or Cd31 primary antibodies for use in IHC. Protein expression for Col1A1, Chad, and Phb1 and location of protein expression is heterogeneous between tumors presented. Staining of Cd31, indicative of endothelial cells and blood vessels, is relatively consistent throughout all tumors 18 tumors imaged. 190 Figure 4.4.3: shRNA knockdown of COL1A1, CHAD, or PHB1 in BT-474 reduced gene expression but did not affect proliferation. Delta-delta Ct comparison of qRT- PCR data found statistically significantly reduced gene expression of COL1A1 (A), CHAD (B), and PHB1 (C) in their corresponding shRNA treatment relative to both WT BT-474 and universal scramble shRNA BT-474. Gene expression was normalized to expression of GAPDH in each cell line. (D) Plating approximately 50,000 cells of each cell line, growth rates of knockdown cell lines were not statistically significantly different compared to WT or scramble KD cell lines. 191 Figure 4.4.4: All BT-474 cell lines have poor migration and invasion capabilities in vitro. (A) Representative images of 8 µm transwell films were stained with crystal violet and then examined under a microscope at 4x optical zoom. (B) Quantification of number of colonies for panels shown previously. The mean of triplicates for unaltered transwell films (migration assays) and transwell films with diluted Matrigel (invasion assays) were plotted for each group with standard deviations also shown. No colonies/cells were evident for either the 24 hour or 72 hour invasion time points. 192 Figure 4.4.5: KD of PHB1 does not alter cell response to AKT inhibition. (A) Differential gene expression analysis between HER2+ either with or without the 17q21 cytoband amplification showed that PHB1 is the most significantly upregulated gene for the 17q21 cytoband amplification population. (B) The 50 most upregulated and the 50 most downregulated genes in the 17q21 cytoband amplified population were processed using Connectivity Map Query tool. Top perturbagens are plotted by normalized connectivity score and false discovery rate log base 10 q-value. (C) Western blot analysis shows 30 µg of MMTV-Neu tumor positive control and all BT-474 cell lines incubated against vinculin, pan-AKT, and Serine 473 phospho-AKT antibodies. (D) Cell death dose- response curve for all BT-474 cell lines incubated with various A-446354 doses for 72 hours. 100% response is complete cell death, while 0% response represents the average of untreated BT-474 WT cell growth after 72 hours. 193 Figure 4.4.6: BT-474 is poorly tumorigenic in BALB/c nude mice through orthotopic tail vein injection or tail vein injection. (A) Tumor volume is measured via calipers over time on select days post orthotopic mammary fat pad injection of 100 µL containing 1x10^6 cells of the respective BT-474 cell line into the #4 right mammary gland of BALB/c nude female mice. Measurement intervals were increased over time due to the slow growth rate of tumors. (B) Representative H&E stained lung section at 10x magnification after 9 weeks of growth of BT-474 cell lines from tail vein injection. (C) A tumor lodged in lung parenchyma from BALB/c nude female mouse injected with BT-474 WT cell line at 10x magnification. (D) A tumor lodged in lung parenchyma from BALB/c nude female mouse injected with BT-474 CHAD KD cell line at 10x magnification. 194 Figure 4.4.7: Patients with 17q21 cytoband amplification are luminal B and have increased breast cancer metastasis. (A) TCGA METABRIC gene expression data underwent unsupervised hierarchical clustering with ERBB2 amplification status, COL1A1 amplification status, CHAD amplification status, PHB1 amplification status, and intrinsic molecular subtype displayed. Gene expression data is colored by Z score ranking of all genes within the limited PAM50 gene set. Patient samples are separated on the x- axis, while PAM50 genes are separated by the y-axis. (B-G) When limiting to patients with ERBB2 amplification, and then separating gene expression of patients into those that have COL1A1/CHAD/PHB1 amplification or not, GSEA was performed with normalized enrichment scores shown for significant MSigDB C2 curated gene sets. 195 Figure S 4.6: Vector map of shRNA plasmids. Elements of the plasmid vector stayed the same between lentiviral transduction of different BT-474 populations, except for the specific shRNA target sequence specific to the gene of interest. The Vector Builder base plasmid is pLV[shRNA]-Puro-U6 meant for mammalian shRNA KD. 196 APPENDIX B: TABLES Table 3.4.6.1: Integrative gene set for clustering. Integrative mouse gene set found by taking all genes that were differentially expressed between EMT, squamous, and microacinar histological subtypes that also had copy number differences present from the 197 AAASBTG1DTNBP1GGPS1LGALS8NAA35PHYKPLREV3LSLC46A1TNK1ZFYVE16AARSD1C1QBPDYNLL2GLI3LIFRNAB2PICK1RFFLSLC48A1TNRC6BZFYVE26AATFCACNB3EBAG9GOLPH3LIG3NAGAPIM3RFNGSLC52A2TOB1ZMYND11ABCB7CAND1EDIL3GPAA1LIMS1NBR1PLA2G6RGS17SLU7TOM1L1ZWINTACADVLCANXEEA1GPD1LMAN2NCALDPMM1RHBDD3SMAD5TOM1L2ACO2CARD14EEF1DGPRC5CLPONCOR1PNPT1RNF167SMARCD2TOMM22ACOX1CASC3EFR3AGPS1LTA4HNEDD1POFUT2RNF185SNF8TOP1MTACTR10CBX1EGR2GPTLUC7L3NF2POLD2RNF41SNX3TPGS1ADCY6CBX4EIF2S3XGPX8LYZ2NFYBPOLDIP2RNF43SOAT2TRAF4ADCY8CBX5EIF3DGRHL2MAN1ANHP2POLG2RPA1SOS2TRIB1ADSLCCDC47EIF3EGRK6MAP1BNID1POLR2FRPL27SOX10TRIM16AFTPHCCT4EIF3HGTF2A1MAP4K5NIPAL2POLR3HRPS6KB1SPDL1TRIM25AHI1CCT5EIF3LGTPBP1MAPK9NLE1PPARARRM2BSREBF1TRIM37AKAP1CDH23EIF4A1GUK1MAT2BNME1PPM1ARTCBSREBF2TRMT10AAKAP10CELA1EIF5AH1F0MATKNME2PPP1R1BRTKN2SRGNTRPS1ALG12CENPVELAC2HABP4MBTD1NOL11PRDM1RUFY1SRP68TSR1AMACRCHADELK3HDAC2MCATNOVA1PRLRSAMM50SRSF2TSTANK3CHRAC1ELP5HEXBMCRS1NPM1PRMT2SAP30BPSSBP2TUBB2AANKRD54CMBLENPP1HFEMDH1NQO2PRPSAP2SAR1AST13TUBB2BAOAHCNOT2ENY2HID1MED1NR2F1PRR5SAR1BST6GALNAC1TUBD1AP3B1CNOT6EPDR1HINT1MED16NR4A1PSMB3SCAF11ST6GALNAC2TUBG2APAF1CNOT8ERAL1HIVEP2MED30NSMCE2PSMB6SCN8ASTARD3TXN2APOBEC3COA3EXOSC4HMGA2MED9NT5C3BPSMC3IPSCO2STAT5ATXNDC5AQP5COASYF13A1HMMRMEF2CNUDT4PSMD11SCRIBSTAT5BUBE2G2ARF3COILFABP6HNRNPA1METAP1NUP155PSMD3SCRN2STC2UIMC1ARFGAP3COPS3FAM161AHNRNPABMETTL23NUP50PSME3SDHASUB1UNC5BARHGAP18COX6CFANCLHSF1MICU1NUP85PSME4SEC14L4SUZ12UQCRQARHGAP39CPDFBXO4HSP90B1MID2OGFOD3PTGR2SEC63SYKUTRNARHGAP8CPEB4FDXRHUS1MIEN1OMDPTP4A3SENP1SYNE1VCANARID4BCPSF1FGFR2IFT20MLXOS9PTPRRSERHLSYNJ2BPVDAC1ARL1CTDSP2FIGNL1IFT27MOSPD1OTULINPTRH2SERINC1TAB1VPS25ARL16CTLA2AFKBP11IL6STMPGOVCA2PTTG1IPSERPINA3GTADA2AVPS26AARL4DCXCL14FLCNINHBAMPSTOXCT1PUF60SF3A1TARSVPS28ASB8CYB561FN3KINPP5JMRPL10OXR1PYCR1SGK1TBCEVPS53ASPNCYC1FOXF2INPP5KMRPL12P4HA1RAB32SGPL1TBK1VPS54ATG101DBN1FOXI1ITGB2MRPL13PABPC1RAB40BSGSM3TBPL1WARSATG7DCAF13FRKJKAMPMRPL22PAPOLARABEP1SHARPINTBRG4WASF1ATP2A3DCXRFTCDJMJD1CMRPL27PATZ1RACGAP1SHMT1TCF20WWC1ATP5G1DDX5FTSJ3JMJD6MRPL55PCBP2RAD1SHPRHTCF3TAZATP5G2DDX56FUCA2KANSL2MRPS23PCMT1RAD21SIRT1TCN2XRCC4ATP5HDEPTORFZD6KAT2AMRPS24PDK2RAD50SKP2TEFXRCC6ATP6V1C1DERL1G3BP1KHDRBS3MRPS7PDXPRAD51DSLC11A2TEFMYAF2ATPAF2DERL2G6PC3KIF21AMSI2PELI1RALASLC1A3TIMM22YWHAEAUHDESI1GABRPKIF2AMTDHPELP1RANBP17SLC22A5TIMP3ZFP131B9D1DGAT1GALNT10KITLMTIF2PEMTRANGAP1SLC25A10TLE2ZFP2BCL11ADHRS7BGALNT6KLF6MTSS1PES1RANGRFSLC25A11TMED10ZFP622BCLAF1DIO2GAMTKRT18MXD3PEX12RAP1BSLC25A17TMED4ZFP672BIKDPYSGAS1LAMA2MYBBP1APEX13RARSSLC38A1TMEM101ZFP7BOP1DRG1GDI2LAMA4MYCPFDN5RASA1SLC39A11TMEM11ZFP740BRIX1DRG2GDPD1LAS1LMYO10PHF5ARECQL4SLC41A2TMEM97ZFP830 Table 3.4.6.1 (cont’d) whole genome sequencing data. The rationale behind this is that if the MMTV-Myc histological subtype preferentially represent different human intrinsic breast cancer subtypes, the combination of genomic and transcriptomic data will limit the number of genes used in unsupervised hierarchical clustering to those that matter most. 198 A B Neither A Not B B Not A Both Log2 Odds Ratio p-Value q-Value ERBB2 COL1A1 4095 460 181 275 >3 <0.001 <0.001 ERBB2 MYC 6299 727 1081 332 1.412 <0.001 <0.001 MYC KRAS 6914 1314 COL1A1 MYC 3666 COL1A1 KRAS 4428 ERBB2 KRAS 7234 269 399 994 MYC FGFR2 6921 1371 KRAS FGFR2 8092 COL1A1 FGFR2 4482 200 439 112 889 127 146 105 136 73 99 2.218 <0.001 <0.001 187 1.519 <0.001 <0.001 57 2.316 <0.001 <0.001 65 1.696 <0.001 <0.001 42 1.014 <0.001 <0.001 11 1.71 0.001 0.001 17 1.249 0.003 0.003 ERBB2 FGFR2 7262 1030 118 29 0.793 0.008 0.008 Tendency Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Co- occurrence Table 3.4.6.2: Contingency table of TCGA breast cancer patient copy number data by listed genes available in cBioPortal. An event for each gene is the presence of either a copy number gain or copy number loss. Separate pairwise contingency tables are listed for all possible combinations of genes considered: ERBB2, MYC, COL1A1, KRAS, and FGFR2. Odds ratios and p-values were determined using Fisher’s exact test for each gene combination. Q-values are Bonferroni multiple hypothesis corrected p- values. 199 ACTR3B ANLN BCL2 BIRC5 BLVRA CCNB1 CDC20 CDH3 CEP55 CXXC5 EGFR ESR1 EXO1 FGFR4 FOXA1 FOXC1 GRB7 KRT14 KRT17 KRT5 MAPT MELK MIA MKI67 MLPH MMP11 PHGDH PTTG1 SFRP1 SLC39A6 TYMS UBE2T Table 3.4.7: Optimized 32 gene list. List of the 32 most important genes as a subset of PAM50 for distinguishing between intrinsic subtypes of breast cancer as determined through RFECV using an SVM classifier with RBF kernel. 200 Retrieval: Temperature and Device: Primary: Species: Citrate pH 6.0 Pressure Cooker 125 °C CD31 (SZ31) Rat Citrate pH 6.0 Pressure Cooker 125 °C CHAD Rabbit Citrate pH 6.0 Vegetable Steamer 100 °C Collagen 1A1 (E8F4L) Rabbit Citrate pH 6.0 Pressure Cooker 125 °C PHB1 Rabbit Dilution & Incubation: 1:20 – 1 hour RT Scytek Normal Antibody Diluent 1:100 – 1 hour RT Scytek Normal Antibody Diluent 1:150 – Overnight 4°C / Cell Signaling Diluent 1:100 – Overnight 4°C / Cell Signaling Diluent Table 4.6.20: Immunohistochemistry details. Information on buffer used, temperature, device, species of origin, dilution ratio, and amount of time for incubation used on each antibody in Chapter 4 is found here. 201