GENETICALLY ENGINEERED MOUSE MODELS PREDICT ACTIONABLE MUTATIONS IN HUMAN CANCERS

By

Matthew Richard Swiatnicki

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Microbiology and Molecular Genetics – Doctor of Philosophy

2021

ABSTRACT

In the United States alone, cancer claims the lives of over 600,000 people a year. While progress has been made in understanding this complex set of diseases, more work is needed if we are to end our struggle with cancer. Bioinformatics analysis and genetically engineered mice are important tools for understanding the biological complexities of cancer. When combined, these approaches can uncover disrupted cellular pathways that contribute to cancer formation. While genetically engineered mouse models are important for the study of cancer, genome sequence analysis of many of these models is lacking. In this work, we sequenced the whole genomes of tumors from two genetically engineered mouse models of cancer, MMTV-Neu and MMTV-PyMT. From these sequence data, we identified numerous disruptions to pathways that contribute to the metastatic cascade. These include tumor signatures associated with defective mismatch repair, as well as numerous genomic mutations within cell adhesion genes. More importantly, we uncovered a conserved V483M missense mutation within the protein tyrosine phosphatase receptor type H (Ptprh) gene. In mice, tumors harboring a Ptprh mutation correlate with increased phosphorylation of the epidermal growth factor receptor (EGFR).

EGFR is a known oncogene that is mutated in numerous cancers, including non-small cell lung cancer (NSCLC). Lung cancer is the leading cause of cancer death in the United States, and its prognosis is often poor, largely due to late diagnosis. NSCLC patients with mutations in EGFR typically have a more favorable prognosis because they can be treated with tyrosine kinase inhibitors (TKIs). More research is needed to improve survival rates of lung cancer patients who do not present with mutations in EGFR. In NSCLC, 5% of patients have mutations in PTPRH, and many of these mutations correlate with increased EGFR activity as well as PI3K/AKT activity. If PTPRH mutant patients have increased EGFR activation and would benefit from TKI therapy, this presents a unique opportunity to treat a large subset of cancer patients with an FDA-approved therapy. CRISPR knockout (KO) of PTPRH in the H23 lung cancer cell line resulted in increased phosphorylation of EGFR and downstream AKT. Furthermore, the PTPRH mutant NSCLC cell lines H1155 and H2228 respond to the tyrosine kinase inhibitor osimertinib. In vivo osimertinib treatment of nude mice injected with H2228 cells also produced a partial response, suggesting that PTPRH mutant patients may benefit from EGFR-targeted therapy.

This work is dedicated to my family, especially to my grandmother Rita Swiatnicki, who passed away from breast cancer in 2003. May we one day find a cure to ease the suffering of all those who toil with the affliction of cancer.

ACKNOWLEDGEMENTS

There are a number of people without whom I would not have achieved my degree. First, I would like to thank my family and friends, especially my soon-to-be wife Jenna, and my parents. Their support over these last few years has made this possible. I would like to thank Dr. Eran Andrechek for being a great mentor, and the rest of the Andrechek lab for all their support. I would especially like to thank Dr. Jon Rennhack and Dr. Sean Misek for their help and advice with my persistent questions about research and experimental design. My thesis committee, including Dr. Kathy Meek, Dr. Susan Conrad, Dr. Hua Xiao, and Dr. Kefei Yu, also deserves a large thank you for their support over the years.
Special thanks to Kathy, who graciously allowed me to hunt on her property and keep my sanity. Finally, I would like to thank the Microbiology and Molecular Genetics Department for all of their support and numerous funding sources over the years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
INTRODUCTION
  CANCER AS A GENOMIC DISEASE
  EFFICACY OF MOUSE MODELS
    I. MICE AS A CANCER MODEL
      CARCINOGEN BASED MODELS
      TRANSPLANT MOUSE MODELS
      GENETICALLY ENGINEERED MOUSE MODELS
      MOUSE PHENOTYPES
      GENE EXPRESSION DATA
      GENOMIC COPY NUMBER ALTERATIONS
      PATHWAY ANALYSIS
      SEQUENCING
      OTHER CONSIDERATIONS – METABOLOMICS AND PROTEOMICS
      CHOOSING A MODEL
      DISCUSSION
    II. MICE AS MODELS FOR TREATMENT
  BIOINFORMATICS AS A MEANS TO INVESTIGATE CANCER
    SEQUENCING
    GENE EXPRESSION
    PATHWAY ANALYSIS
    DATA ANALYSIS
  THE FUTURE OF CANCER TREATMENT
CHAPTER 1 ALTERED METASTASIS IN E2F1 KNOCKOUT MODELS OF HUMAN BREAST CANCER
  PREFACE
  ABSTRACT
  INTRODUCTION
  RESULTS
    ANALYSIS OF GENE EXPRESSION DATA IN NEU AND PYMT TUMORS
    MUTATION ANALYSIS THROUGH WHOLE GENOME SEQUENCING
    MUTATION SIGNATURES GENERATED FROM SNV PROFILES
    EXAMINING TUMOR CLONALITY
    COPY NUMBER AND TRANSLOCATION EVENTS
    ANALYSIS OF DISRUPTED PATHWAYS
  DISCUSSION
  MATERIALS AND METHODS
    GENE EXPRESSION ANALYSIS
    WHOLE GENOME SEQUENCING AND PROCESSING
    VARIANT CALLING
    MUTATION SIGNATURES
    TUMOR CLONALITY
    CIRCOS PLOTS
    TRANSLOCATION VERIFICATION
  APPENDIX
CHAPTER 2 PTPRH MUTATIONS IN PYMT MOUSE TUMORS
  ABSTRACT
  INTRODUCTION
    PHOSPHATE SIGNALING WITHIN THE CELL
    RECEPTOR TYROSINE KINASES
    EPIDERMAL GROWTH FACTOR RECEPTOR
    PHOSPHATASES
    PROTEIN TYROSINE PHOSPHATASE RECEPTOR TYPE H
  RESULTS
    DISCOVERY OF PTPRH MUTATIONS IN MOUSE PYMT TUMORS
    PTPRH MUTANT TUMORS CORRELATE WITH HIGH EGFR ACTIVITY
  DISCUSSION
  MATERIALS AND METHODS
    TARGETED RESEQUENCING OF PYMT TUMORS
    ANALYSIS OF PTPRH MUTATIONS IN WES DATA
    WESTERN BLOTTING
  APPENDIX
CHAPTER 3 RELATIONSHIP OF PTPRH AND EGFR IN HUMAN CANCER
  ABSTRACT
  INTRODUCTION
  RESULTS
    PTPRH MUTATIONS IN HUMAN CANCER
    BIOINFORMATICS PREDICTS ACTIVATION OF EGFR AND DOWNSTREAM PATHWAYS
    PTPRH TARGETS EGFR IN HUMAN LUNG CANCER LINE
    TARGETING OF OTHER KINASES BY PTPRH
    NUCLEAR EGFR WITHIN PTPRH MUTANT TUMORS
  DISCUSSION
  MATERIALS AND METHODS
    DETERMINING PTPRH MUTATIONS IN HUMAN CANCERS
    MUTUAL EXCLUSIVITY
    DEMOGRAPHICS OF PTPRH MUTATIONS
    EGFR ACTIVITY AND PATHWAY ACTIVITY PREDICTION
    CRISPR KNOCKOUT
    CRISPR KNOCK-IN MUTATION
    WESTERN BLOTTING
    OVEREXPRESSION EXPERIMENTS
    RECEPTOR TYROSINE KINASE ARRAY
    IHC NUCLEAR EGFR
  APPENDIX
CHAPTER 4 TREATMENT OPPORTUNITIES FOR PTPRH MUTATIONS IN NON-SMALL CELL LUNG CANCER
  ABSTRACT
  INTRODUCTION
    PTPRH DEREGULATION IN HUMAN CANCERS
    NON-SMALL CELL LUNG CANCER
    TYROSINE KINASE INHIBITORS
  RESULTS
    POOLED PTPRH KNOCKOUTS HAVE INCREASED GROWTH
    PTPRH MUTANT CELL LINES ARE SENSITIVE TO TYROSINE KINASE INHIBITION THROUGH OSIMERTINIB TREATMENT
    TREATING MICE WITH HUMAN PTPRH MUTANT TUMORS
  DISCUSSION
  MATERIALS AND METHODS
    POOLED CRISPR KNOCKOUT
    MTT ASSAY
    GROWTH CURVES
    DOSE RESPONSE CURVES
    IN VIVO MOUSE TREATMENT
  APPENDIX
CHAPTER 5 FUTURE DIRECTIONS
  METASTASIS IN E2F1 KNOCKOUT MOUSE MODELS
  PTPRH MUTATIONS IN HUMAN CANCERS
WORKS CITED

LIST OF TABLES

Table 1.1: Mouse tumor signature etiology
Table 1.2: Supporting reads for 20 randomly selected translocations from the tumor in figure 6
Table 1.3: Read support for 20 randomly drawn translocations within each of the 12 mouse tumors
Table 1.4: COSMIC-associated genes
Table 2.1: Mammary gland Ptprh mutation status in PyMT mice

LIST OF FIGURES

Figure 1.1: Altered phenotypic characteristics in E2F1−/− tumors
Figure 1.2: Gene expression changes in E2F1−/− mouse tumors, and E2F1 low human breast cancer
Figure 1.3: Filtering background strain to remove artifacts that have potential to confound analysis
Figure 1.4: SNV mutation burden in Neu and PyMT tumors
Figure 1.5: Mutation profiles
Figure 1.6: Clonal heterogeneity in Neu and PyMT tumors
Figure 1.7: Mutation burden in Neu and PyMT tumors
Figure 1.8: Verification of translocation calls
Figure 1.9: Mutations in basement membrane genes
Figure 2.1: Ptprh mutations in PyMT mouse tumors
Figure 2.2: Increased p-EGFR in Ptprh mutant mouse tumors
Figure 2.3: Downstream pathway activity in Ptprh mutant mouse tumors
Figure 3.1: PTPRH mutations within human cancers
Figure 3.2: Pathway activation predictions in PTPRH mutant tumors
Figure 3.3: PTPRH knockout cells have increased p-EGFR
Figure 3.4: Downstream signaling of H23 PTPRH KO cells
Figure 3.5: PTPRH regulates other kinases
Figure 3.6: Localization of EGFR to the nucleus in PTPRH ablated tumors
Figure 4.1: Variable growth of PTPRH KO clones
Figure 4.2: Increased cellular growth and proliferation upon pooled PTPRH knockdown
Figure 4.3: Tyrosine kinase inhibitor treatment of PTPRH mutant cell lines
Figure 4.4: In vivo treatment of H2228 PTPRH mutant tumors
Figure 4.5: TUNEL and KI67 staining in PTPRH mutant tumors treated with osimertinib
KEY TO ABBREVIATIONS

APC        Adenomatous polyposis coli
ATRS       A/T-rich sequences
BCR        Breakpoint cluster region protein
C-ABL      Abelson tyrosine kinase
ChIP-seq   Chromatin immunoprecipitation sequencing
CNV        Copy number variant
COSMIC     Catalog of somatic mutations in cancer
CRISPR     Clustered regularly interspaced short palindromic repeats
DCIS       Ductal carcinoma in situ
DMBA       7,12-dimethylbenz[a]anthracene
DSB        Double-strand break
EGFR       Epidermal growth factor receptor
ER         Estrogen receptor
FGFR1      Fibroblast growth factor receptor 1
GEF        Guanine nucleotide exchange factor
GEMM       Genetically engineered mouse model
GEO        Gene Expression Omnibus
GFP        Green fluorescent protein
GO         Gene ontology
(SS)GSEA   (Single sample) gene set enrichment analysis
HER2       Human epidermal growth factor receptor 2
ICGC       International Cancer Genome Consortium
IGF1R      Insulin-like growth factor 1 receptor
ILC        Invasive lobular carcinoma
In/del     Insertion/deletion
KRAS       Kirsten rat sarcoma
KEGG       Kyoto encyclopedia of genes and genomes
MAPK       Mitogen-activated protein kinase
MCA        3-methylcholanthrene
MIND       Mammary intraductal
MITE-seq   Mutagenesis by integrated tiles sequencing
MMR        Mismatch repair
MMTV       Mouse mammary tumor virus promoter
MNU        N-methyl-N-nitrosourea
NCI        National Cancer Institute
Neu        Erb-B2 receptor tyrosine kinase 2
NSCLC      Non-small cell lung cancer
PDX        Patient-derived xenograft
PTB        Phosphotyrosine binding
PTP        Protein tyrosine phosphatase
PTPRH      Protein tyrosine phosphatase receptor type H
PyMT       Polyoma middle T antigen
RB         Retinoblastoma
RCAS-TVA   Replication-competent avian sarcoma-leukosis virus – tumor virus A receptor
rPTP       Receptor protein tyrosine phosphatase
RTK        Receptor tyrosine kinase
SB         Sleeping Beauty
SC         Small cell lung cancer
SCID       Severe combined immunodeficiency
ScRNA-seq  Single cell RNA sequencing
SNV        Single nucleotide variant
SAP-1      Stomach cancer-associated phosphatase 1 (PTPRH)
TALEN      Transcription activator-like effector nucleases
TCGA       The Cancer Genome Atlas
TKI        Tyrosine kinase inhibitor
VAF        Variant allele frequency
WAP        Whey acidic protein
WES        Whole exome sequencing
WGS        Whole genome sequencing

INTRODUCTION

CANCER AS A GENOMIC DISEASE

The current scientific paradigm for the onset of cancer holds that gene mutations lead to dysregulation of cellular pathways controlling proliferation, apoptosis, and cellular maintenance. Often, mutations in a few oncogenes or tumor suppressor genes lead to oncogenic transformation of a cell [1–4]. This is exemplified by the current model of colorectal cancer, in which malignancy often develops through mutations in the tumor suppressor APC (Adenomatous polyposis coli), followed by mutations in the proto-oncogene KRAS (Kirsten rat sarcoma) [5–9].

With recent cost reductions in sequencing technologies, whole genome or whole exome sequencing has been completed on hundreds of thousands of human tumors. This has revealed differing mutation burdens across cancer types, with some cancers, such as glioblastomas, harboring few mutations, and others, such as colorectal cancers, harboring a large number of mutations [4]. Analysis of whole genome sequence data has also revealed the importance of non-exonic mutations in cancer formation. Mutations in regions important for gene regulation, such as gene promoters, can impact tumor formation and growth, and analyzing the impact of non-exonic mutations is a rapidly growing area of cancer research.

Mutations to the genetic code can be broadly classified into two categories. The first includes small changes such as single base pair mutations (single nucleotide variants, SNVs) and small insertions and deletions (in/dels); the second includes larger structural changes such as amplification or deletion events (copy number variants, CNVs) and translocations. While most single base pair mutations do not alter protein structure, either because they fall outside coding regions or because they are synonymous, nonsynonymous (missense) and nonsense mutations can lead to amino acid substitutions or truncated proteins that alter protein function.
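Whether a coding SNV is synonymous, missense, or nonsense follows directly from the genetic code. The short Python sketch below is illustrative only and is not part of the analysis pipeline used in this work; it assumes an in-frame coding DNA sequence and the standard genetic code, and the function name `classify_snv` and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering in which
# the third codon position varies fastest ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution in a coding sequence (hypothetical helper).

    cds: in-frame coding DNA sequence starting at the first base of a codon.
    pos: 0-based position of the substituted base within cds.
    alt: the substituted base.
    """
    cds, alt = cds.upper(), alt.upper()
    if cds[pos] == alt:
        raise ValueError("alt base matches the reference base")
    offset = pos % 3                      # position within the affected codon
    start = pos - offset                  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"               # protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"                 # premature stop, truncated protein
    if ref_aa == "*":
        return "stop-loss"                # stop codon replaced by an amino acid
    return "missense"                     # single amino acid substitution


# Hypothetical toy sequence: ATG CTG TGG TAA -> Met Leu Trp Stop
toy = "ATGCTGTGGTAA"
print(classify_snv(toy, 4, "G"))  # CTG -> CGG (Leu -> Arg): missense
print(classify_snv(toy, 5, "A"))  # CTG -> CTA (Leu -> Leu): synonymous
print(classify_snv(toy, 7, "A"))  # TGG -> TAG (Trp -> Stop): nonsense
```

The first call is a leucine-to-arginine change, the same class of event as the EGFR L858R missense mutation, although the toy sequence itself is not EGFR. Real variant annotation tools also handle splice sites, in/dels, and alternative transcripts, which this sketch ignores.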
Examples of SNVs contributing to cancer formation include EGFR L858R mutations in lung cancer and various amino acid-altering KRAS mutations that occur in numerous cancers [10, 11]. Large amplification or deletion events within the genome can cause cancer through disruption of a single gene or multiple genes. This is evident in HER2-positive breast cancer, where amplification of the human epidermal growth factor receptor 2 (HER2) oncogene contributes to oncogenic transformation [12]. Translocations can also induce cancer through a number of mechanisms, including fusing active gene promoters to known oncogenes, or simply truncating a gene. For example, in chronic myelogenous leukemia, a translocation between chromosomes 9 and 22 fuses the C-ABL (Abelson tyrosine kinase) oncogene with an active BCR (Breakpoint cluster region protein) promoter [13–15]. Mutations and structural changes affecting gene promoters, splice sites, and other regions important for gene regulation are increasingly appreciated for their ability to cause cancer [16–18].

With the realization that cancer is largely a disease of underlying genetic mutations, much debate has centered on the factors that give rise to these mutations. Three sources of genetic insults are currently recognized: heritable germline mutations, random mutations arising from DNA replication errors, and errors introduced by environmental mutagens. Heritable germline mutations are perhaps the easiest to trace, but account for the smallest share of cancer incidence. Heritable BRCA1/2 mutations account for approximately 25% of hereditary breast cancer cases, while heritable mutations in RB1 account for 40% of retinoblastoma cases [19, 20].
Across all cancers, however, the National Cancer Institute (NCI) estimates that germline mutations account for approximately five to ten percent of cases. The remaining cancers, approximately 90%, arise from environmental factors and random chance, although the relative contribution of these two factors to cancer incidence has been much debated. Samuel Epstein’s ‘The Politics of Cancer’ attributes the majority of cancers to increasing environmental pollution by carcinogens [21]. This notion is supported by a body of epidemiological evidence, such as the cancer rates of migrants shifting toward those of their adopted countries [22], and higher cancer incidence in areas close to heavy industry [23–25]. However, the picture is more complicated on closer inspection. For instance, other studies have shown that migrants adopt the cancer rates of their adopted country only for certain cancers, while rates of other cancers remain similar to those of their country of origin [26, 27].

The best-known environmental factor contributing to increased cancer rates is smoking. By the 1960s, numerous epidemiological and animal studies had shown a link between smoking and lung cancer [28–30]. Since then, an overwhelming body of evidence has linked smoking to cancer, including genomic studies showing differing mutation profiles between lung cancer patients who smoked and those who never smoked [31, 32].

In a shift from Epstein’s line of thinking, some recent evidence suggests that the majority of cancers occur by chance, due to random mutations within the genome [33]. This evidence was based on correlations between the number of stem cell divisions occurring within particular tissues and the cancer incidence within those tissues.
This paper has been criticized for several reasons. Notably, another group showed that the same correlation held in a hypothetical scenario where cancer incidence was high due to environmental effects [34]. A series of letters to Science also pointed out that the original Vogelstein study did not include breast and prostate cancer in its analysis, two cancers thought to be highly impacted by environmental factors [35–37]. These letters also identified a potential flaw in the Vogelstein statistical analysis, showing that the confidence limits would actually be ±30-fold, meaning incidence could be 30 times lower or higher than the predicted value. More recent data from Vogelstein and other groups have reiterated the importance of random DNA replication errors in the formation of cancer [38, 39]. They prudently pointed out, however, that these studies do not diminish the impact environmental mutagens have on the formation of cancer.

Questions surrounding the impact of specific gene mutations are important considerations for cancer biologists. While tumors often carry a high mutational burden, it has traditionally been thought that only a few mutations, dubbed ‘driver mutations’, contribute to tumor progression. The vast majority of the remaining mutations, dubbed ‘passenger mutations’, are thought to have little impact on tumor progression [4]. More recent data, however, suggest that, whether collectively or individually, passenger mutations may have more impact on tumor progression than previously thought [40–42]. With the development of MITE-seq (mutagenesis by integrated tiles sequencing), the effect of every possible amino acid substitution within an individual gene can be determined [43, 44]. While this technology shows promise for investigating passenger mutations within individual genes, completing this assay for every gene in the exome remains a tall order. Many important questions surrounding passenger mutations remain.
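To give a sense of why saturating every gene in the exome is a tall order: if each residue can be replaced by any of the 19 other standard amino acids, a protein of length L has 19L possible single substitutions, so a 1,000-residue protein already has 19,000. The Python sketch below enumerates that variant space using the same shorthand as the V483M Ptprh mutation described in the abstract (reference residue, 1-based position, alternate residue). The function name and the toy peptide are hypothetical, and the code illustrates only the design space such an assay must cover, not the MITE-seq protocol itself.

```python
# Hypothetical sketch: enumerate every single amino acid substitution in a
# protein, i.e. the variant space a saturation mutagenesis assay aims to cover.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def all_substitutions(protein: str) -> list[str]:
    """Return variants written as reference residue, 1-based position, alternate
    residue (for example "V483M"). Each standard residue gets 19 alternatives."""
    variants = []
    for position, ref in enumerate(protein.upper(), start=1):
        for alt in STANDARD_AMINO_ACIDS:
            if alt != ref:
                variants.append(f"{ref}{position}{alt}")
    return variants


toy_protein = "MKV"  # hypothetical three-residue peptide
variants = all_substitutions(toy_protein)
print(len(variants))   # 19 alternatives x 3 positions = 57
print(variants[:3])    # ['M1A', 'M1C', 'M1D']
```

Because the count grows linearly with protein length and is then summed over every gene, the number of variants to synthesize and measure for a whole exome quickly becomes very large.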
Do these passenger mutations arise within pre-neoplastic tissue, or after tumor formation? Furthermore, how do these mutations contribute to the metastatic process of cancer? Future studies involving single cell and MITE sequencing may resolve some of these questions.

EFFICACY OF MOUSE MODELS

In cancer research, mouse models typically serve two purposes. The first is studying cancer-associated oncogenes, pathways, and histology; in other words, studying cancer itself. The second is evaluating the safety and efficacy of drug treatments. In the two subsections below, I will touch on both of these uses.

I. MICE AS A CANCER MODEL

This subsection of the introduction has previously been published as a review in the Journal of Mammary Gland Biology and Neoplasia titled “How to Choose a Mouse Model of Breast Cancer, a Genomic Perspective”. Portions of the review not applicable to this thesis were excluded. While the review focuses specifically on mouse models of breast cancer, the principles can be applied to mouse models of various other cancers.

CARCINOGEN BASED MODELS

Mouse model systems are a common method for modeling breast cancer. Numerous systems, each with advantages and disadvantages, are currently used to generate these models. Modeling cancer in animals began with the application of coal tar to rabbits and mice, which led to the formation of tumors [45]. Since then, a wide array of carcinogens has been used in mice to study cancer, including N-methyl-N-nitrosourea (MNU), 3-methylcholanthrene (MCA), and, perhaps most widely, 7,12-dimethylbenz[a]anthracene (DMBA) [46, 47]. Tumors in mice treated with carcinogens often carry a variety of genomic alterations, including mutations in PTEN, increased expression of CCND1 and MYC, and activation of important cellular pathways including NF-κB, Wnt, and PI3K/AKT [48, 49].
Histologically, these tumors vary greatly between models, with medroxyprogesterone acetate (MPA) treated mice often exhibiting type-B adenocarcinomas, and DMBA treated mice often developing tumors with adenomyoepithelial and myoepithelial histologies [46, 50]. TRANSPLANT MOUSE MODELS To study facets of human cancers in a more biologically relevant setting, transplantable mouse models have been developed. These include the mammary intraductal (MIND) model, in addition to the previously mentioned cell line-derived xenograft and patient derived xenograft models. To study the progression of human cancers from ductal carcinoma in situ (DCIS), the MIND model mimics human DCIS through injection of human DCIS cells into the ducts of severe combined immunodeficiency (SCID)-beige mice [51]. Indeed, this method allows the subtypes of DCIS to be maintained in a mouse model [51, 52]. However, despite their clear strengths, these models are not readily amenable to the modification or manipulation that would allow quick and easy genetic testing of hypotheses. GENETICALLY ENGINEERED MOUSE MODELS The complexity of human cancer may best be modeled through the various forms of genetically engineered mice, including transposon based, transgenic, knock-in, knock-out, and inducible mouse systems. One of the largest advantages of these models is their acquisition of impactful mutations [53, 54], analogous to the development and progression of human breast cancer. One method of generating mice with mammary tumors is through the use of transposable elements [55–57]. These systems are used for germline transmission, as well as for generating somatic mutations for the study of cancer [58]. Use of these systems has allowed mice with mutations in key genes to be characterized. As mentioned above, patients with invasive lobular carcinoma (ILC) tend to have loss of E-Cadherin. Using the Sleeping Beauty (SB) transposon system, Kas et al.
showed the importance of particular genes, including Myh9 and Ppp1r12b, in tumor formation in mice with ablated E-Cadherin [59]. To study potential oncogenes, transgenic mice are developed to determine whether overexpression of a particular gene results in tumor formation. In these mice, tissue-specific promoters direct oncogene expression to a particular organ or tissue. Promoters used to study breast cancer in mice include the commonly used mouse mammary tumor virus (MMTV) and whey acidic protein (WAP) promoters, as well as others including keratin promoters [60–62]. Overexpression of a number of important oncogenes with these promoters has illustrated the importance of key genes, including c-MYC, RAS, and ERBB2 [60, 63, 64]. Beyond simple overexpression systems, work from the Chodosh lab introduced numerous inducible systems in which expression of key oncogenes could be turned on or off in the mammary gland by adding doxycycline to the drinking water [53, 65–67]. These systems revealed that while tumors were initially dependent upon the initiating oncogene and regressed when its expression was withdrawn, they eventually relapsed, having accumulated enough additional mutations to grow without the primary driving gene. Other studies have combined inducible and standard transgenic systems to demonstrate oncogene dominance, in which only one oncogene in a two-oncogene system is needed to maintain tumor viability [68, 69]. In addition to transgenic models that overexpress various oncogenes, knock-in models have been generated to express oncogenes in their native genomic location. Knock-in strategies have also allowed oncogenes to be expressed under the control of the Rosa26 promoter, resulting in lower levels of transgene expression [70]. Other groups have placed a lox-stop-lox cassette between the endogenous promoter and an oncogene.
The advantage of this system is that normal temporal and spatial control of gene expression is retained [71], but depending on the timing of the excision event, mice can adapt to oncogene expression [72]. Importantly, with the lox-stop-lox system, erbB2 knock-in mice developed amplification and overexpression of the oncogene, analogous to HER2+ve breast cancer [71]. Numerous other knock-in models have been created to study breast cancer genes, including R273H, R248W, and R175H Tp53 mutant mice, as well as H1047R Pik3ca mutant mice [73, 74]. Alongside oncogene overexpression models, knock-out mice permit the study of tumor suppressor genes in vivo. TP53, the most frequently mutated gene in breast cancer, and BRCA1, which carries germline mutations in 5-10 percent of human breast cancers, have been studied extensively using knockout models [75]. Combining knockout models with transgenic models in which Cre expression is linked to the transgene has also allowed specific facets of tumor development to be studied in the absence of particular signaling pathways [76, 77]. In addition to standard transgenic and knock-in / knockout systems, engineered nuclease systems, including TALEN (transcription activator-like effector nucleases) and CRISPR (clustered regularly interspaced short palindromic repeats), are used to generate mouse models. These systems allow the deletion, addition, and replacement of desired DNA sequences in numerous organisms, including mice. Although TALEN systems can edit genes anywhere in the genome, whereas CRISPR requires a nearby PAM motif, CRISPR has become the more widely used tool due to its simplicity and cost effectiveness. Studies harnessing TALEN and CRISPR systems have investigated numerous genes important to breast cancer, including BRCA1 and CDH1 [78, 79].
These systems can be employed through manipulation of mouse embryonic cells, or through direct injection of the system components into wildtype mice or into mice expressing Cas9 under control of the Cre-lox system [80, 81]. Gene specificity is achieved in these systems through the use of guide RNAs. These systems are reviewed further in [82]. With CRISPR systems easing the transgenic process, it will also be interesting to see whether there is a resurgence in the use of estrogen receptor (ER)+ rat models. Another tool potentially capable of faithfully recapitulating human breast cancer progression is the replication-competent avian sarcoma-leukosis virus – tumor virus A receptor (RCAS-TVA) system, reviewed in [83]. This system can be used to deliver oncoproteins and dominant negative tumor suppressors in a timely manner, but it is often limited to small insertions into the virus. With the heterogeneity of human breast cancer and the large number of mouse models available to study the disease, the central question becomes: which model is the best fit for a particular study? This obviously depends on the experimental question, but the characterization of the models and their relation to human breast cancer should be considered at the phenotypic, genomic, and gene expression levels. MOUSE PHENOTYPES At the phenotypic level, there is substantial variation between mouse models of breast cancer. In terms of latency, models range from the rapid MMTV-PyMT on the FVB background to the prolonged GR/J, with tumors appearing at 45 days and 12 months, respectively. Other notable models with strikingly different latency periods include MMTV-NeuNT (Erbb2) transgenics and mice with conditional expression of NeuNT under the control of the endogenous promoter, in which tumors appear at 89 days and 15 months, respectively [84, 85].
Tumor growth rate also varies among strains. While MMTV-Neu mouse tumors grow to 2500 mm³ from first palpation in approximately 45 days [86], other models, such as MMTV-Myc mice with Stat3 ablated, can take as long as 109 days to grow to 2500 mm³ from first palpation [87]. Tumor latency and growth rate are also context dependent, relying on differentially activated signaling pathways. This is exemplified by ablation of the E2F1 transcription factor in two different mouse models. Loss of E2F1 in the MMTV-Neu model increases both tumor latency and growth rate, whereas in the MMTV-PyMT model it decreases latency and leaves growth rate unaltered [86, 88]. These differences illustrate the importance of selecting particular models for a study. Previous research has also shown histological differences between the primary tumors of various mouse models. Genetically engineered mouse models (GEMMs), mice harboring specific genome alterations introduced through a number of genome editing techniques, have been important tools for cancer researchers. A 2000 review of GEMMs by a panel of experts found that the majority of genetically engineered mouse tumors had a set of histological forms distinct from those of non-GEMM tumors, such as carcinogen-induced models [89]. Some GEMM tumors, such as those from models expressing the neu and src transgenes, have also been found to have histologies similar to those of tumors from human patients [90]. Much like human breast cancer, a large amount of histological variation is seen within certain GEMMs. MMTV-Myc mice have been shown to harbor multiple tumor histologies, including papillary, microacinar, and squamous tumors [54]. Similar pathologies were noted in MMTV-Met mice [91].
In MMTV-PyMT mice, while approximately 40 percent of tumors have a microacinar histology, tumors also display a wide array of histological patterns, including adenosquamous, glandular, and mixed histologies [88]. More recently, certain GEMM tumor histological subtypes have been shown to correlate with particular transcriptional profiles within the model, much like the human disease. In fact, gene expression signatures have been generated that can predict histological patterns in mouse tumors [92]. The study of metastasis is also heavily reliant on mouse models. While expression of some oncoproteins, such as PyMT and Neu, results in a heavy metastatic burden in mice, other transgenic models with potent oncogenes, such as WAP-Ras and MMTV-Myc, have lower metastatic rates or fail to metastasize at all [61, 64, 84]. Strain background is also an important consideration in the ability of the primary tumor to metastasize: expression of PyMT in FVB mice results in nearly all tumor-bearing mice developing lung metastases, whereas crossing the same transgenic line onto RF/J, C58/J, and other backgrounds dramatically reduced the metastatic burden [93]. Among GEMMs that metastasize, most metastasize to the lungs. However, select models can metastasize to other organs. MT-Met mice have demonstrated metastasis to the heart and kidney as well as the lung, and tumors from p53fp/fp MMTV-Cre mice are able to metastasize to the liver [94, 95]. GENE EXPRESSION DATA The advent of microarray and sequencing technologies has made it possible to perform large-scale gene analysis on large numbers of samples. In breast cancer, conserved gene expression patterns led to the definition of the intrinsic subtypes of breast cancer [96]. Since the initial work on human breast tumor expression data, numerous studies have applied microarrays to study GEMM mammary tumors.
This has been done for individual models [54, 91, 97–103], as well as in a broader survey approach across models. When individual models are examined using array analysis, a surprising amount of molecular heterogeneity has been a recurring finding. Not surprisingly, this heterogeneity was present in tumors with long latency (MMTV-Myc) and correlated with histological subtypes. It would seem logical to predict that tumors with a short latency would be less heterogeneous; however, tumors with extremely short latency, driven by PyMT, also show a surprising level of tumor-to-tumor heterogeneity. Together these studies suggest that both models depend upon the accumulation of other events for tumor formation and progression. Not all models have extensive heterogeneity: models such as Wap-Myc, C3(1)-Tag, and MMTV-Neu show less heterogeneity in their gene expression profiles. Comparison of these individual models to human breast cancer has revealed that the C3(1)-Tag and Wap-Myc models have expression patterns similar to basal-like human tumors (a highly aggressive molecular subtype of breast tumors), including high expression of CRYAB, a known human basal-like tumor marker [104]. Expression signatures from other tumor types, such as luminal, do not correlate as well between mouse models and human tumors, although they still share some features, such as positive staining for the K8/18 marker [104]. While the MMTV-Neu model does not reflect human HER2+ breast cancer at the gene expression level, this may simply be due to the altered expression of other genes within the large HER2 amplicon. A mouse model with amplification of the endogenous erbB2 locus [71] should therefore be assayed for similarities to human HER2+ve breast cancer. In addition to papers that have profiled individual models, several publications have compared various models.
Herschkowitz et al. examined 13 different models of breast cancer, identifying models with similarities to luminal tumors despite being ER-negative and having heterogeneous expression patterns. They also identified other GEMMs more closely resembling basal-like tumors [104]. Hollern et al. increased the number of samples analyzed (1,156) and profiled numerous additional models, examining 26 major models with several additional variants (wild type Myc, T58A Myc, etc.). This unsupervised approach demonstrated substantial heterogeneity in the majority of mouse models. Using both a gene expression and a signaling pathway approach, they also noted several similarities between the intrinsic subtypes of human breast cancer and subsets of various mouse models. Importantly, only a portion of tumors from an individual model reflected each of the intrinsic subtypes [105]. Further, Pfefferle et al. examined 356 samples from 27 models to identify 17 distinct mouse mammary tumor intrinsic subtypes, eight of which reflected subtypes in human breast cancer. However, this analysis used an intrinsic approach, a supervised method of clustering that may introduce bias. Each of these three publications provides an important examination of the diversity of mouse models of breast cancer and is an essential starting point when choosing a mouse model for analysis. GENOMIC COPY NUMBER ALTERATIONS In tumor cells, regions of the genome are often deleted or amplified to dozens of copies, potentially driving tumor formation or modifying tumor progression. A prime example of copy number variation (CNV) in cancer is amplification of human epidermal growth factor receptor 2 (HER2), resulting in uncontrolled activation of downstream signaling cascades, including the mitogen activated protein kinase (MAPK) pathway [12, 106].
While extensive CNV data from mouse tumor models have not been generated, an algorithm that predicts CNV from gene expression data has been developed and validated [107]. When applied to mouse models of breast cancer, the predicted CNV varied across numerous models. However, genes from some CNV regions, such as Gsn, are conserved among some models [107]. A similar trend was seen within individual mouse models: some CNV events showed little conservation between mice in a given model, while other events were present in more than 50 percent of mice in a given model [107]. More interestingly, integrated clustering of CNV events from mouse and human tumors showed conservation of some CNV events between the two species [107], indicating that mouse models can accurately reflect human breast tumors in terms of copy number alterations. PATHWAY ANALYSIS Research has shown that complex networks of proteins work together in regulatory pathways that control cellular function. These signaling pathways, including the MAPK/ERK and PI3K/AKT pathways, are often dysregulated in cancer [108, 109]. Expression data from the genes that constitute these pathways and their downstream targets can predict activation or inactivation of particular pathways, making these pathway signatures an important tool for the study of breast cancer. To uncover pathway use, gene expression analysis has been coupled with bioinformatic tools such as Gene Set Enrichment Analysis (GSEA), which has been widely applied to many models. Likewise, a Bayesian regression pathway signature approach [110] has been applied to mouse models of breast cancer to predict cell signaling pathway activity [86–88]. Like differential gene expression data, pathway signatures often vary within GEMMs, the most prominent example perhaps being the Myc model [105].
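To show what GSEA, mentioned above, computes at its core, the Python sketch below implements a running-sum enrichment score, assuming genes have already been ranked by a differential expression statistic between two sample groups. Walking down the ranked list, the score steps up at each gene-set member (weighted by the magnitude of its statistic) and down at each non-member; the signed maximum deviation indicates whether the set is concentrated among up- or downregulated genes. The function name and toy data are hypothetical, and the sketch omits the permutation testing and normalization that the GSEA software uses to judge significance.

```python
def enrichment_score(ranked_genes, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA (hypothetical helper).

    ranked_genes: list of (gene, statistic) pairs sorted from most up- to most
    downregulated in group A versus group B.
    gene_set: set of gene names to test.
    p: weight exponent; p=0 gives an unweighted Kolmogorov-Smirnov-like walk.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight_total = sum(abs(stat) ** p for (_, stat), hit in zip(ranked_genes, hits) if hit)
    if n_miss == 0 or hit_weight_total == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")

    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked_genes, hits):
        if hit:
            running += abs(stat) ** p / hit_weight_total  # step up at a gene-set member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
    print(enrichment_score(ranked, {"A", "B"}))  # positive: set sits among upregulated genes
    print(enrichment_score(ranked, {"D", "E"}))  # negative: set sits among downregulated genes
```

Weighting steps by the statistic means a set dominated by strongly changed genes scores higher than one whose members only just make the top of the list.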
In mice, pathway signatures have shown a correlation with histological subtypes, most notably the microacinar histology, which is associated with amplification events on chromosomes 11 and 15 [107]. Pathway signatures from mouse mammary tumors have also been found to correlate with those of human breast tumors. A set of highly expressed pathways found in tumors from Myc mice was also highly expressed in basal-like human tumors [111]. This trend has been seen in a number of pathway signature sets between mouse and human tumors. SEQUENCING Sequencing of human breast cancer samples has led both to the discovery of novel mutations important to breast cancer, such as FOXP1 [112], and to further characterization of genes already known to be important to cancer development, including HER2 and PI3K [113, 114]. In mouse models, sequencing studies in lung cancer have shown the mutational burden of GEMM tumors to be lower than that of human lung tumors. Tumors from Kras- and Egfr-driven mice carry a mutational burden of ~0.05 non-synonymous mutations per megabase, while human tumors harbor ~4.1 non-synonymous mutations per megabase [115, 116]. While numerous publications have examined gene expression in mouse models of breast cancer, very few models have been examined at the sequence level. Recently, whole genome sequencing (WGS) of mouse mammary tumors (MMTV-Neu and MMTV-PyMT) has led to the discovery of alterations in genes potentially important to human breast cancer, including Col1a1 and Phb [117]. The potential impact of these mutations on tumor behavior in such well characterized tumor models underscores the need to complete WGS on mouse models of breast cancer [118]. Researchers are now beginning to appreciate the cellular and genetic heterogeneity of tumors not only between patients, but within single tumors [119].
Intra-tumoral and metastatic site heterogeneity complicate tumor treatment, as targeted therapies may be effective against only part of the tumor. Single cell RNA sequencing (scRNA-seq) is beginning to address these challenges by revealing the differences present within a primary tumor and across metastatic sites. Investigation of copy number alterations through single cell sequencing of two triple negative human breast tumors found four distinct populations of cells, with some CNV regions shared between the cell populations [120]. In mice, scRNA-seq has begun to reveal the distinct gene expression profiles of mammary epithelial cells at different developmental stages. In the mammary gland, a shift in gene expression from a basal-like transcriptional profile to a more luminal profile occurs around 5 weeks of age [121]. While more scRNA-seq studies are needed, key insights into the single cell heterogeneity of cancer should continue to emerge as this technology develops. OTHER CONSIDERATIONS - METABOLOMICS AND PROTEOMICS While cancer metabolomics is not a new area of study, recent years have seen a surge in metabolic profiling of both human and mouse tumors. A 2018 study from Dai et al. examined the metabolic profiles of a number of mouse models, including PyMT, Wnt1, and Neu [122]. This study not only found metabolomic differences between tumor and normal breast tissue for each model, but also found that tumors driven by each oncogene had a unique metabolomic profile. Furthermore, the C3(1)-Tag model was found to have metabolites of prognostic value, illustrating the importance of these studies. Advances in mass spectrometry have also led to a rise in large scale proteomic analysis. In breast cancer mouse models, these analyses have allowed comparisons to the human disease and enhanced the search for biomarkers capable of early cancer detection.
Indeed, proteins found to be upregulated in the plasma of tumor-bearing PyMT mice have been found to coincide with those of multiple human breast cancer cell lines, including MCF7 and BT474 [123]. In some cases, such as the conditionally activated Neu mouse model, entire proteomic profiles have been made publicly accessible in the hope of enhancing the search for novel cancer biomarkers [124]. CHOOSING A MODEL Choosing the correct mouse model to investigate human breast cancer is an important experimental decision. As reviewed above, numerous categories stratify the various models. Rather than simply using a model based on availability, investigators should carefully consider the choice of model. First, if the research question relates to a particular signaling pathway, this may dictate the choice of model. Numerous models have been profiled in comparison to each other in several reports [105, 111], and both GSEA and Bayesian pathway predictions have been reported for these models [105]. These data may be downloaded and searched to identify models with high or low activity for a pathway of interest. However, given the gene expression heterogeneity seen in various models [111, 125, 126], the proportion of tumors carrying the signaling pathway alterations in question should be considered when calculating the number of experimental subjects required. If the primary consideration is a phenotype, such as metastatic progression, then the model choice will be constrained by that characteristic. While a majority of studies use the MMTV-PyMT strain for metastasis research, other strains that metastasize are available. The short tumor latency and extensive metastasis are attractive characteristics of PyMT transgenic mice, but if their gene expression profile and signaling pathways do not match those of interest, other metastatic strains can be used.
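Returning to the sample-size consideration raised above for pathway-based model choice: if only a fraction f of tumors from a model carry the pathway alteration of interest, the naive estimate of tumors needed to obtain k informative tumors is k / f, but animal-to-animal variation means more are required to be reasonably confident. The Python sketch below finds the smallest cohort for which at least k tumors carry the feature with a chosen probability, assuming each tumor independently carries it with probability f (a simple binomial model). The fraction, target, confidence level, and function names are hypothetical illustrations, not values from the cited studies.

```python
from math import comb


def prob_at_least(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f): chance that at least k of n tumors carry the feature."""
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))


def tumors_needed(k, f, confidence=0.9, max_n=10_000):
    """Smallest cohort size n with P(at least k feature-positive tumors) >= confidence.

    Assumes (hypothetically) that each tumor independently carries the pathway
    alteration with probability f.
    """
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    for n in range(k, max_n + 1):
        if prob_at_least(k, n, f) >= confidence:
            return n
    raise ValueError("no cohort size found up to max_n")


if __name__ == "__main__":
    # Hypothetical: half of the tumors in a model show the pathway activation,
    # and 10 such tumors are wanted. The naive estimate would be 10 / 0.5 = 20.
    print(tumors_needed(10, 0.5))
```

Because the probability of reaching k positives grows with cohort size, a simple upward search returns the minimal cohort; the result exceeds the naive k / f whenever f is below 1.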
Other characteristics, from tumor latency to promoter system, can be considered when choosing a mouse model. For investigators simply asking which mouse model most closely resembles a subtype of human breast cancer, there is unfortunately no easy answer or single best choice. Examining co-clustering of human and mouse model tumors by gene expression [92] or predicted CNV [107] has revealed that many different models cluster with each of the subtypes of human breast cancer. MMTV-Myc is particularly instructive, with varied histological and gene expression subtypes that individually cluster with most of the major subtypes of human breast cancer [92]. While this confounds the choice of model system, it underscores how the sample-to-sample heterogeneity of gene expression in human breast cancer is reflected in the majority of mouse model systems. Ultimately, the choice of mouse model system is multifactorial. It must take into account the initiating oncogene, latency, progression characteristics, gene expression similarities to human cancer, cell signaling pathway use, and whether copy number variation is relevant. Moreover, once a model is chosen, the resulting tumors must be characterized to determine how the tumor-to-tumor heterogeneity present in the model has been altered by the experimental manipulations. DISCUSSION Numerous genomic perturbations, along with a cascade of protein interactions and regulatory pathways, function together to initiate and maintain oncogenic transformation. Given this complexity, the mouse model is well suited to studying breast cancer. The in vivo nature of mouse models allows the complexity of cancer to be studied more accurately than with cell culture and other in vitro experiments alone. Numerous types of mouse models, including carcinogen induced models, patient derived xenografts (PDXs), and GEMMs, recapitulate certain aspects of the disease.
While their usefulness depends on the research question, GEMMs are perhaps the most comprehensive due to their ability to closely mimic the initiating oncogenic event that occurs in a number of cancers while maintaining an appropriate tumor microenvironment and a functioning immune system. At the expression and histological levels, GEMM tumors are as complex as the human tumors they attempt to mimic. Just as a wide array of histologies is seen within human tumors, histological differences can be seen among tumors from a single GEMM. Classifying histological subtypes by their expression profiles also shows relevance to human breast cancer. Since the initial classification of human breast cancer into intrinsic subtypes, an increasing amount of data has identified mouse subtypes that mimic each. While little whole genome sequencing data has been generated for GEMM tumors, the available data show that, like human tumors, mouse tumors display a large array of genomic alterations, including single nucleotide variants, copy number alterations, and translocations. The histological, expression, and sequencing similarities between human and mouse breast tumors show that, when used correctly, genetically engineered mouse models can be an accurate system for studying human breast cancer. Given the complexity of both human breast cancer and the numerous mouse models used to study it, choosing the correct mouse model for the experimental question is essential. Initial examination of expression-based analyses, and of the human subtypes mimicked in large scale gene expression experiments, is critical [96, 105]. Where copy number alterations in a gene of interest are relevant, it is also beneficial to examine the mouse models for similar changes [107]. Whether through GSEA or a signature based approach, signaling pathways should also be examined [105, 111] to ensure that the appropriate model is used.
Recent examples of drug screening in mouse models have taken these parameters into account [127, 128], providing important demonstrations of the integration of bioinformatic analysis of mouse models with wet lab experiments. II. MICE AS MODELS FOR TREATMENT Clinical trials act as controlled experiments, allowing researchers and doctors to determine the safety and efficacy of cancer drugs before they are widely prescribed. While there are a variety of clinical trials in oncology, drug trials are used to study drug safety and efficacy. Typically, drug trials consist of five phases (0-4), with phases 0 and 1 focusing on pharmacokinetics and safety, respectively [129]. Phases 2 and 3 incorporate larger numbers of participants to determine the efficacy of the drug while continuing to monitor safety. Finally, phase 4 evaluates the long-term effects and outcomes of the drug. With the advent of numerous types of mouse models to study oncogenesis at the molecular, cellular, and histological levels, there has also been an uptick in the use of these models as pre-clinical indicators of the safety and efficacy of new cancer drugs. Experiments on drug safety and efficacy are often completed in mice before a drug can be taken to clinical trials. There is a question, however, of whether these models are good indicators of how a drug will perform in the clinic. Before genetically engineered mouse models were used in pre-clinical studies, in vitro data and patient derived xenograft models were widely used. Data from the National Cancer Institute (NCI), however, showed that results from these models did not correlate well with phase II clinical trial results [130]. It was therefore hoped that the use of GEMMs would better predict clinical trial results [131]. More recent studies, however, have shown a continued failure of mouse models to predict safety and efficacy outcomes in the clinic [132–134].
Elongated telomeres found in laboratory mice may have implications for using mice as models in cancer and clinical studies [135, 136]. It is plausible that long telomeres in lab mice result in an increased ability to repair tissue and resist toxicity, as well as enhanced tumor promotion. However, most carefully designed studies in mice involve normalized controls, which would seemingly circumvent questions surrounding tumorigenesis. It is also important to note that many mouse studies that fail to predict drug toxicity and efficacy may reflect poorly designed experiments. Careful consideration must be given to the mouse model’s histology, the gene expression patterns driving that histology, the molecular driver, the immune microenvironment, and other factors [137, 138]. From the available data, it seems the use of mouse models for pre-clinical studies must be reconsidered. Under the current paradigm, however, they will likely remain a staple as preclinical models. BIOINFORMATICS AS A MEANS TO INVESTIGATE CANCER The last few decades have seen an explosion of bioinformatics methods used to study cancer. These technologies have vastly improved our understanding of cancer on a molecular and epidemiological level. A large array of new approaches now allows researchers to study gene sequences and expression, cellular pathways, proteins, tumor-stromal interactions, epidemiological trends, and many other facets of carcinogenesis. These new technologies have also brought new hope for improved targeted treatments and, more recently, a more serious look at pan-cancer therapies. The following paragraphs briefly describe some of the more common technologies and methods that have revolutionized the study of cancer biology. SEQUENCING Since the advent of Sanger sequencing in the 1970s and the first draft of the human genome in the early 2000s, sequencing technologies have come a long way.
Sequencing a human genome, which initially took 3 billion dollars and 13 years, can now be done in a couple of days for a few thousand dollars [139]. This is a testament to the newly available next generation sequencing technologies. Sequencing technologies now appear able to address most facets of gene regulation. Genome sequencing can uncover large and small changes to the genetic code, chromatin immunoprecipitation sequencing (ChIP-seq) can detect changes in protein binding at regulatory regions such as promoters, and RNA sequencing can determine changes in gene expression. Even more impressive are the advances in single-cell sequencing, which can be applied at the DNA and RNA level and holds promise for tackling questions surrounding intra-tumor heterogeneity [140]. At the clinical level, targeted sequencing and exome sequencing have become important for determining treatment regimens. The use of targeted therapies has increased in recent years, but these therapies rely on genetic information to make sound treatment decisions. This is evident in the treatment of non-small cell lung cancer patients with tyrosine kinase inhibitors (TKIs). While TKIs work effectively in patients whose tumors harbor activating mutations in the oncogene EGFR, they show little to no benefit in patients without these mutations [141]. Targeted sequencing of lung tumor biopsies can provide clinicians with this information. GENE EXPRESSION While sequence analysis plays an important role in research and clinical therapy, gene expression analysis is another necessary piece of the puzzle. Mutations in a gene often do not alter its expression, and gene expression may change in the absence of mutations. Whether through microarray technology or RNA sequencing, large scale shifts in gene expression can be measured relatively simply across a large number of samples.
Although RNA sequencing is now more widely used, microarray technology is still available and is more cost-effective. The main difference between the two technologies is that microarrays depend on transcript-specific oligonucleotide probes immobilized on a chip, whereas RNA-seq does not rely on such probes. When the technologies are compared, RNA-seq appears to have an advantage in detecting low-abundance transcripts [142]. Like sequencing, gene expression patterns are often used both to study cancer and to determine the clinical course of action. In the laboratory, gene expression is often used to determine genome-wide expression changes across sample groups subjected to gene knockouts, drug treatment, or other experimental conditions [105, 143]. In the clinic, gene expression patterns are often used to classify patient tumors and determine a course of treatment [96, 104].

PATHWAY ANALYSIS

Cellular processes are often organized into complicated pathways and protein networks. A pertinent example is the Ras/Raf/Mek/Erk signaling pathway, which stems from receptor tyrosine kinase (RTK) stimulation and leads to eventual transcription factor activation or repression [144]. Given the complicated nature of these pathways and their key role in stimulating and maintaining oncogenesis, researchers have developed a number of tools to investigate them. Often, these tools rely on gene expression data gathered with microarray or RNA-seq technologies. One such tool is the set of pathway signatures developed for human breast cancer [110, 145]. Pathway signatures are often developed by overexpressing either an oncogene or a green fluorescent protein (GFP) control in a particular cell line. Expression data are then gathered from the oncogene- and GFP-overexpressing lines and used as a training dataset for classifying future samples. The classification is given as a score that predicts whether the pathway in question is active.
Another pathway prediction tool is Gene Set Enrichment Analysis (GSEA) [146]. Briefly, GSEA uses gene expression data to compare two groups of samples and determine whether particular gene sets or pathways are up- or downregulated in one group relative to the other. This analysis is often useful when first exploring expression data from two sample groups, such as drug-treated and untreated groups. Overall, these programs can serve as good predictors of which pathways may be activated or repressed within a tumor, allowing researchers to narrow their focus for laboratory validation.

DATA ANALYSIS

The wealth of data generated by the above technologies demands expert data analysis. Over the years, a number of regulations, programs, and analysis methods have been put forth to handle the large amount of incoming data. To ensure public access to data produced with federal grant money, authors are required to submit datasets to online portals such as the Gene Expression Omnibus (GEO). Large databases have also been developed to allow the public to analyze large datasets. An example is The Cancer Genome Atlas (TCGA), a resource for accessing genomic mutation data for thousands of human tumors across various cancer types. Hundreds, if not thousands, of programs have been written to deal with the influx of data. Some are developed by individual labs, while others are the product of coordinated efforts by multiple groups. For sequence analysis, there are programs that “clean and prep” data, programs that align data to reference genomes, and numerous programs that identify genomic variants within the data. Once initial data processing is complete, many other programs perform specialized analyses, such as determining tumor heterogeneity or tumor mutation signatures.
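The core calculation behind GSEA, introduced at the start of this section, can be sketched as a running sum over a ranked gene list. In this minimal, unweighted version, genes are ordered from most up- to most downregulated between two groups, the sum steps up at each gene in the set and down at each gene outside it, and the largest deviation from zero is the enrichment score. The published method [146] additionally weights steps by the ranking statistic and assesses significance by permutation, neither of which is shown; the function name and the toy ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    ranked_genes: gene names ordered from most up- to most downregulated.
    gene_set: collection of gene names defining a pathway.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for hit in in_set:
        # Step up for genes in the set, down for genes outside it.
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking: the three pathway genes sit at the top of the list.
ranked = ["Wnt1", "Ctnnb1", "Axin2", "GeneA", "GeneB", "GeneC"]
print(enrichment_score(ranked, {"Wnt1", "Ctnnb1", "Axin2"}))
```

A score near +1 means the set's genes cluster at the top of the ranking (upregulated in the first group), while a score near -1 means they cluster at the bottom.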
There are also dedicated programs for RNA, ChIP, and single-cell sequencing analysis. Although programs designed for the same task aim to achieve the same result, many take different approaches, making the choice of program dependent on the biological question. For instance, in genome sequence analysis, some programs can uncover rare mutations but produce more false positives, while others produce fewer false positives but may miss low-frequency mutations. Overall, analysis methods have improved drastically, increasing statistical power and removing confounding effects. This is exemplified by RNA sequencing, where data normalization now corrects for potential sources of error such as transcript number and length. In most cases these advancements are beneficial, but in some cases they pose new challenges. For example, microarray technology was the go-to method for obtaining gene expression data in the early 2000s. Although microarrays are still used, RNA sequencing has become the standard for many labs conducting large gene expression studies. Despite the wealth of available data, integrating microarray and RNA-seq datasets remains challenging.

THE FUTURE OF CANCER TREATMENT

Cancer therapy has made many strides since the 19th and 20th centuries, but there is still a long way to go. This is evident when examining the treatment regimens and survival rates for breast cancer. Once commonplace, radical mastectomies are now considered barbaric, as less invasive surgeries combined with adjuvant therapy have been found equally effective [147]. Drug treatments have also advanced tremendously, from early mustard gas derivatives [148] to more advanced chemotherapies and targeted therapies [149–151]. These treatments have been accompanied by greatly increased 5-year survival rates and decreased observed mortality [152].
Late-stage metastatic and triple-negative breast cancers still carry a poor prognosis, showing the need for improved therapies. Treatment success has also varied widely across cancer types. Some cancers, such as breast cancer and skin melanoma, are treated with high success, while others, such as lung and pancreatic cancer, carry a poor prognosis [153, 154]. However, as with breast cancer, overall 5-year survival rates do not tell the whole story. Treatment success can vary widely within a given cancer depending on molecular phenotype, genetic mutations, and stage at diagnosis. Melanoma, for instance, has an extremely high success rate when caught early, but a poor prognosis after metastasis has occurred [155]. Current research has focused on characterizing the molecular and histological profiles of tumors in order to develop new therapies. In the clinic, tumor biopsies undergo sequence, molecular, and histological analysis to identify applicable targeted therapies. These targeted therapies have improved survival rates, but resistance mechanisms remain a challenging issue. Some clinical trials, such as the ongoing SMMART trial [156], are attempting to circumvent these resistance mechanisms by closely monitoring tumor growth and performing new biopsies once resistance begins to develop. This allows a new treatment regimen to begin, aiming for a further reduction in tumor volume. Although these avenues show a lot of promise, they have issues as well. For instance, some patients cannot be enrolled in the SMMART trial due to a lack of actionable mutations. Furthermore, multiple biopsies can be burdensome for the patient and infeasible in certain cancers. Finally, this approach is extremely costly in terms of both money and manpower. It is fair to point out that these issues may be solved with further research and technology development.
Further characterization of cancer genomes and molecular profiles may lead to a greater number of actionable mutations. Improvements in our understanding and sequencing of extracellular vesicles and other biomarkers may eliminate the need for invasive biopsies [157]. Technological advancements may also reduce costs and labor. Nevertheless, the financial challenges and patient burdens described above may make it impossible to apply this approach to every cancer patient, and thus other options need to be considered. While the heterogeneous nature of cancer has cast doubt on finding a ‘universal cure’, a universal cure is an endeavor we should still pursue, even if that cure is more akin to a universal process than to treatment with a single drug. The biggest obstacle in such an approach would surely be distinguishing tumor cells from normal cells. If this were achieved, however, a number of targeting approaches could foreseeably be taken. One would be treatment with existing drugs that target particular pathways. Perhaps more intriguing would be using CRISPR technology in conjunction with inhibitors of DNA repair pathways. Hypothetically, this could damage cancer cells enough that, unable to repair the DNA damage, they would undergo cell cycle arrest and apoptosis. While these treatments may seem far off, they are surely worth investigating.

CHAPTER 1

ALTERED METASTASIS IN E2F1 KNOCKOUT MODELS OF HUMAN BREAST CANCER

PREFACE

While this chapter is not directly related to the bulk of the work in this thesis, its importance is twofold. First, this chapter showcases many of the important bioinformatics methods I learned during my time in the Andrechek lab; these methods are now vital for success as a cancer researcher. Second, the whole genome sequencing completed in this study directly led to the discovery of a mutation in the Ptprh gene.
The characterization of this PTPRH mutation and its relevance to human non-small cell lung cancer constitutes the bulk of my thesis work and illustrates the importance of pan-cancer research. This chapter is adapted, with additional data, from a manuscript previously published in Scientific Reports as “Metastasis is altered through multiple processes regulated by the E2F1 transcription factor” (DOI: 10.1038/s41598-021-88924-y).

ABSTRACT

The E2F family of transcription factors is important for many cellular processes, from their canonical role in cell cycle regulation to roles in angiogenesis and metastasis. Alteration of the Rb/E2F pathway occurs in various forms of cancer, including breast cancer. E2F1 ablation has been shown to significantly decrease metastasis in the MMTV-Neu and MMTV-PyMT transgenic mouse models of breast cancer. Here we take a bioinformatics approach to determine the impact of E2F1 loss on the genomic landscape of these tumors, looking specifically at genes related to the metastatic cascade in both the Neu and PyMT models. Gene expression analysis revealed few transcriptome changes in non-metastatic E2F1-/- tumors relative to transgenic tumor controls. However, whole genome sequencing of these models found numerous differences between them, including differences in the proposed tumor etiology between E2F1-/- and E2F1+/+ tumors induced by Neu or PyMT. For example, loss of E2F1 within the Neu model led to an increased contribution of the inefficient double-strand break repair signature to the proposed etiology of the tumors. While the single nucleotide variant (SNV) mutation burden was higher in PyMT mouse tumors than in Neu mouse tumors, there were no statistically significant differences between E2F1 WT and E2F1 KO mice.
Investigating mutated genes through gene set analysis also revealed a significant number of mutated genes in the cell adhesion pathway in E2F1-/- tumors, suggesting that this may be a route by which metastasis is disrupted in E2F1-/- tumors. Overall, these findings illustrate the complicated nature of uncovering drivers of the metastatic process.

INTRODUCTION

Breast cancer is the most commonly diagnosed cancer in women. To study genomic events contributing to breast cancer, numerous genetically engineered mouse models have been generated, including MMTV-Neu [158], which recapitulates HER2+ve breast cancer, and MMTV-Polyoma virus Middle T antigen (PyMT) [84]. The PyMT model relies on overexpression of the PyMT oncogene, leading to downstream activation of the SRC and AKT pathways. The PyMT model is highly aggressive, with tumors appearing at 45 days of age. Metastasis to the lung occurs in over 90% of tumor-bearing mice, resulting in wide use of PyMT for metastasis studies. Similar to human breast cancers, both the Neu and PyMT models show striking heterogeneity at the histological and gene expression levels [89, 92, 104, 105], reinforcing the importance of these models as tools for the study of breast cancer. Previous studies using the Neu and PyMT models predicted a key role for the E2F1 transcription factor through pathway signature analysis, suggesting that mechanisms beyond overexpression of the Neu or PyMT oncogene were contributing to tumor biology [86, 88]. The E2F family of transcription factors is involved in numerous cellular processes and is best known for its role in cell cycle control. Normally sequestered by the retinoblastoma protein (Rb), E2F1 is released to act on downstream targets upon Rb phosphorylation [159]. While mutations in E2F1 are not common in human breast cancer, mutations within the E2F pathway occur in over 25% of breast cancer patients, illustrating the importance of the pathway [160–164].
To test the hypothesis that E2F1 regulates key events in Neu and PyMT tumors, E2F1 knockout (KO) mice [163] were interbred with the Neu and PyMT models [86, 88]. The resulting mammary tumors showed changes in latency and growth rate, and a significant decrease in metastasis to the lung. Metastasis is the ultimate cause of mortality in cancer, with an estimated 90% of cancer deaths resulting from the spread of cancer cells to distal sites within the body [165]. To complete the metastatic cascade, cancer cells typically pass through numerous important steps. These include escape from the primary tumor, intravasation, extravasation, and seeding of the distal site [166], as reviewed by Welch [167]. An important contributor to the metastatic capability of a tumor is its microenvironment. Various collagens and other proteins integral to cellular and tissue structure can affect metastatic potential. Indeed, proteins within the extracellular matrix, including collagen IV, have been found to regulate metastasis within the liver [168]. Collagen IV is a major component of the basement membrane, an important barrier to tumor invasion, and breaching this barrier has been shown to be a critical early step in tumor invasion and metastasis [169, 170]. Interestingly, a previous report demonstrated a decrease in the number of circulating tumor cells in PyMT E2F1-/- mice [88], suggesting a disruption of the early steps of the metastatic cascade. Other data show that remodeling of the extracellular matrix at pre-metastatic lesion sites is important for eventual seeding of distant metastases [171]. Recent advances in bioinformatics methods have facilitated the investigation of cancer biology. Publicly available transcriptomic datasets have allowed comparisons between primary tumors and distant metastatic lesions [172, 173]. Next generation sequencing has furthered our understanding of cancer genomics.
Studies sequencing human tumors have described the mutation rate of solid tumors [174] and demonstrated that numerous genomic events are required for metastasis [120, 175, 176]. To determine the underlying genomic events behind the altered metastatic characteristics of E2F1 KO tumors, gene expression and sequence data were analyzed. Here, we characterize the genomic landscape of E2F1 WT and E2F1 KO tumors from both the Neu and PyMT models and uncover new targets that may be critical to tumor development and progression.

RESULTS

ANALYSIS OF GENE EXPRESSION DATA IN NEU AND PYMT TUMORS

We previously demonstrated altered phenotypic characteristics upon ablation of E2F1 within the Neu and PyMT models, including changes in growth rate and latency of the primary tumors (Figure 1.1A). Given the already short latency of PyMT mice, it was surprising to observe that tumor latency in PyMT mice decreased significantly with E2F1 loss, while growth rate remained unaffected. Interestingly, the opposite effect on latency was seen in Neu E2F1-/- mice, where latency was significantly increased, and growth rate was also significantly increased. However, the most striking phenotype was a significant reduction of metastasis with loss of E2F1 in both strains (Figure 1.1B and 1.1C). To determine whether gene expression differences underlay the phenotypic changes in E2F1 knockout tumors, fold change differences were examined. Volcano plots revealed few genes with major gene expression changes between E2F1 WT and E2F1 KO primary tumors (Figure 1.2A). While some genes had a fold change between 1 and 1.5, very few genes had a fold change greater than 1.5. To test whether this is recapitulated in human breast cancer, data from The Cancer Genome Atlas (TCGA) were analyzed. E2F1 activity in HER2+ve samples was determined using pathway signature analysis. Samples were stratified into quartiles for E2F1 activity, and differential gene expression was determined.
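The stratification step above can be sketched as follows, assuming a per-sample E2F1 pathway score has already been computed and that expression values are non-negative normalized values. The sketch takes the samples at or below the 25th percentile and at or above the 75th percentile of the score and reports a log2 fold change for one gene between those extreme quartiles. Comparing only the top and bottom quartiles, the pseudocount of 1, the function names, and the toy values are illustrative assumptions rather than the exact analysis performed here.

```python
import math
from statistics import mean, quantiles


def extreme_quartiles(scores):
    """Return indices of samples in the lowest and highest quartiles of a score."""
    q1, _, q3 = quantiles(scores, n=4)  # 25th, 50th, and 75th percentiles
    low = [i for i, s in enumerate(scores) if s <= q1]
    high = [i for i, s in enumerate(scores) if s >= q3]
    return low, high


def log2_fold_change(expr, high, low, pseudocount=1.0):
    """log2(mean expression in high group / mean in low group) for one gene."""
    hi = mean(expr[i] for i in high) + pseudocount
    lo = mean(expr[i] for i in low) + pseudocount
    return math.log2(hi / lo)


# Toy data: hypothetical E2F1 activity scores and one gene's expression per sample.
e2f1_scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.6, 0.3, 0.7]
gene_expr = [10, 31, 20, 31, 10, 22, 14, 24]
low, high = extreme_quartiles(e2f1_scores)
print(low, high, round(log2_fold_change(gene_expr, high, low), 2))
```

Repeating the fold-change step for every gene, and testing each difference statistically, would give the genome-wide comparison summarized in a volcano plot.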
As shown in Figure 1.2B, human breast tumors resemble mouse mammary tumors in that low E2F1 activity does not lead to large gene expression changes. To test for genetic pathways affected by loss of E2F1, Gene Set Enrichment Analysis (GSEA) was completed on Neu and PyMT tumors with and without E2F1. GSEA revealed several differentially regulated pathways, including WNT signaling and nucleotide excision repair (Figure 1.2C). Importantly, WNT signaling has been shown to regulate the epithelial to mesenchymal transition, a process involved in the metastatic cascade [177, 178].

MUTATION ANALYSIS THROUGH WHOLE GENOME SEQUENCING

Given that the gene expression analysis did not identify a mechanism altering metastatic potential, we examined genomic events occurring in Neu and PyMT tumors with and without E2F1. Whole genome sequencing was completed, and single nucleotide variant (SNV) profiles were called for each tumor using TCGA best practices. Initial analysis of the SNV data revealed an unexpectedly high proportion of SNVs on chromosome 2 in the E2F1 knockout tumors (Figure 1.3A-D). Notably, E2F1 is located within band qH1 of chromosome 2, which corresponded to where the excess SNVs were observed (Figure 1.3E). Although E2F1 knockout mice were backcrossed to FVB for 12 generations, we hypothesized that the excess SNV calls arose from residual background-strain DNA retained around the targeted E2F1 locus from the original E2F1 knockout strain. Because E2F1 knockout mice were generated on the SV129 background, while Neu and PyMT mice are on the FVB background, we filtered SNV calls using a list of SNVs generated by comparing the SV129 background against C57BL/6, the background of the standard mouse reference genome (Figure 1.3F). As a result, the majority of chromosome 2 SNV calls were filtered out, and the proportion of SNVs was roughly equal across the 19 autosomal mouse chromosomes in E2F1 WT and E2F1 KO PyMT tumors (Figure 1.3G).
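The background-strain filter described above amounts to discarding any tumor SNV call that also appears in a list of known strain-specific variants, then checking how the remaining calls are distributed across chromosomes. A minimal sketch, assuming variants are (chromosome, position, ref, alt) tuples on the same reference build; the tuple layout, helper names, and toy coordinates are hypothetical, and a real pipeline would read both lists from VCF files.

```python
from collections import Counter


def filter_background(tumor_snvs, strain_snvs):
    """Drop tumor SNV calls that match a known background-strain variant.

    Variants are (chrom, pos, ref, alt) tuples on the same reference build.
    """
    background = set(strain_snvs)
    return [v for v in tumor_snvs if v not in background]


def fraction_per_chromosome(snvs):
    """Fraction of SNV calls on each chromosome, to spot regional excesses."""
    counts = Counter(chrom for chrom, _, _, _ in snvs)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


# Toy calls (hypothetical coordinates): two chr2 calls are strain variants.
tumor = [("chr2", 157_000_100, "C", "T"), ("chr2", 157_050_200, "G", "A"),
         ("chr2", 157_090_300, "A", "G"), ("chr7", 30_000_000, "C", "A")]
strain = [("chr2", 157_000_100, "C", "T"), ("chr2", 157_050_200, "G", "A")]

print(fraction_per_chromosome(tumor))                              # chr2-heavy before filtering
print(fraction_per_chromosome(filter_background(tumor, strain)))  # balanced after
```

Exact matching on all four fields is the simplest rule; in practice, variant representations (for example, left-normalized indels) would need to agree before comparison.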
Filtering had the same effect on E2F1 KO Neu tumors (data not shown). Residual background-strain DNA is therefore an important caveat when sequencing mouse models. Interestingly, the SNV mutation burden was higher in PyMT mice than in Neu mice (p-value = 0.05), which was surprising given the short latency of PyMT tumors (Figure 1.4A). Except for one PyMT E2F1 knockout tumor, the rate of exonic SNVs ranged from 0.005 to 0.08 mutations per megabase. This mutation rate is similar to rates previously reported for mouse tumors [179] and lower than the 1 mutation per megabase exonic mutation rate commonly observed in human breast cancer [174]. Surprisingly, the percentages of exonic, intronic, and intergenic SNVs shifted significantly in PyMT E2F1 KO tumors compared with WT tumors (Figure 1.4A). In PyMT WT tumors, approximately 1% of mutations were exonic and 30% intronic, compared with approximately 2% and 38%, respectively, in E2F1 KO tumors. These increases in E2F1 KO tumors (p-value = 0.05 for exonic and 0.03 for intronic) corresponded to a decrease in the percentage of intergenic SNVs (p-value = 0.03). These shifts were not seen in Neu tumors.

MUTATION SIGNATURES GENERATED FROM SNV PROFILES

To analyze the distinct types of SNVs occurring within our tumors, and to investigate potential mechanisms driving these differences, a mutation signature approach was taken [180]. While trinucleotide signatures showed similarities between Neu and PyMT tumors, there were also striking differences, such as T>G mutations occurring almost exclusively in Neu tumors of either E2F1 status (Figure 1.4B). The signatures for all 12 tumors are shown in Figure 1.5. Principal component analysis (PCA) using the mutation signatures of all 12 tumors showed distinct clustering of Neu and PyMT tumors (Figure 1.4C).
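Trinucleotide signatures rest on classifying each SNV by its base change and its immediate 5' and 3' neighbours. By convention the change is written relative to the pyrimidine (C or T) of the mutated base pair, reverse-complementing the context when the reference base is a purine, which gives 6 substitution classes across 16 flanking contexts, or 96 channels. The sketch below assumes each SNV is supplied as its reference trinucleotide (mutated base in the middle) plus the alternate base; the function names are hypothetical. Because the spectrum is returned as proportions, multiplying every count by the same factor leaves it unchanged, which is the property that lets a sample with many more SNVs cluster with a sample whose spectrum is similar.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def snv_channel(ref_trinuc, alt):
    """Map an SNV to one of 96 channels, written like 'A[C>T]G'.

    ref_trinuc: reference bases at positions -1, 0, +1 (mutated base in the middle).
    alt: alternate base at position 0.
    """
    ref_trinuc, alt = ref_trinuc.upper(), alt.upper()
    if len(ref_trinuc) != 3 or alt == ref_trinuc[1]:
        raise ValueError("expected a 3-base context and a different alternate base")
    if ref_trinuc[1] in "AG":  # purine reference: switch to the pyrimidine strand
        ref_trinuc, alt = revcomp(ref_trinuc), alt.translate(COMPLEMENT)
    return f"{ref_trinuc[0]}[{ref_trinuc[1]}>{alt}]{ref_trinuc[2]}"


def spectrum_proportions(snvs):
    """Fraction of SNVs in each observed channel (unobserved channels are zero)."""
    counts = Counter(snv_channel(t, a) for t, a in snvs)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


print(snv_channel("TGA", "A"))  # a G>A change in TGA is reported as T[C>T]A
```

Stacking such proportion vectors for all tumors into a 96-column matrix would give the input for a PCA of mutation profiles.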
Furthermore, apart from a single E2F1 KO PyMT tumor, PCA separated E2F1 WT and E2F1 KO tumors into distinct clusters within the Neu and PyMT models. Although PyMT E2F1 KO sample 2 has a 6-fold increase in the number of SNVs, this is not reflected in the PCA clustering, because PCA was performed on the mutation signatures of the samples, which describe the relative spectrum of SNV types rather than their absolute number. For example, if sample X had more SNVs than sample Y, but the overall mutation profile of those SNVs was similar between the two samples, they would cluster together. The contributions of the 30 known COSMIC (Catalogue Of Somatic Mutations In Cancer) signatures to each Neu and PyMT tumor were then determined [180]. While all Neu and PyMT tumors had some contribution from signature 18, there were stark differences in the other COSMIC signatures contributing to Neu and PyMT tumors (Figure 1.4D). For example, Neu tumors had contributions from signatures 1 and 3, while PyMT tumors were associated with signatures 4 and 20. There were also signature differences between E2F1 WT and E2F1 KO tumors within the Neu and PyMT models. For example, Neu E2F1 WT tumors were associated with signatures 5 and 9, while Neu E2F1 KO tumors lacked these associations. Neu E2F1 KO tumors were also associated with signature 12, which was absent from Neu E2F1 WT tumors. Among the proposed etiologies for these signatures (Table 1.1), Neu tumor signatures are associated with age, while PyMT tumor signatures have no age association, consistent with the longer latency of Neu tumors relative to PyMT tumors. Interestingly, Neu tumors were also associated with inefficient double-strand break (DSB) repair, with E2F1 KO tumors more highly associated than E2F1 WT tumors. E2F1 has been found to recruit DSB processing factors, particularly NBS1, to DSB sites, which offers a possible explanation for this signature [181].
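Estimating how much each known signature contributes to a tumor is commonly posed as non-negative least squares: find weights w >= 0 so that the weighted sum of signature profiles approximates the tumor's mutation profile. The sketch below solves that problem with cyclic coordinate descent in pure Python. It is an illustration under that assumption, not the algorithm of the tool cited here [180], and the three-channel toy signatures are hypothetical stand-ins for the 96-channel COSMIC matrix.

```python
def nnls_coordinate_descent(signatures, profile, n_iter=500):
    """Fit profile ~ sum_k w[k] * signatures[k] with every w[k] >= 0.

    signatures: list of K signature vectors, each of length C (e.g. C = 96).
    profile: observed mutation vector of length C (counts or proportions).
    """
    w = [0.0] * len(signatures)
    residual = list(profile)  # profile minus current fit; the fit starts at zero
    norms = [sum(x * x for x in s) for s in signatures]
    for _ in range(n_iter):
        for j, s in enumerate(signatures):
            if norms[j] == 0:
                continue
            # Exact minimizer along coordinate j, then clip at zero.
            step = sum(r * x for r, x in zip(residual, s)) / norms[j]
            new_w = max(0.0, w[j] + step)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, s)]
                w[j] = new_w
    return w


# Toy 3-channel "signatures" (hypothetical; real COSMIC signatures have 96 channels).
sigs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
tumor = [3 * a + 1 * b for a, b in zip(sigs[0], sigs[1])]  # 3 parts A plus 1 part B
weights = nnls_coordinate_descent(sigs, tumor)
print([round(x, 3) for x in weights])
```

Dividing the fitted weights by their sum gives each signature's relative contribution, which is the form in which such results are usually plotted.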
PyMT E2F1 KO tumor signatures were not associated with inefficient DSB repair, but were highly associated with smoking signature 4 and defective DNA mismatch repair (MMR) signature 20. While it may seem counterintuitive that PyMT E2F1 KO tumors would be associated with one MMR signature and not the others (signatures 6, 15, and 26), this is entirely possible: multiple mutational profiles can be associated with a particular etiology, even though the profiles themselves are distinct from each other. Together, these data suggest that E2F1 loss drives differences in DNA repair and tumor etiology.

EXAMINING TUMOR CLONALITY

A wealth of evidence has shown that tumors have intra-tumoral heterogeneity at the histological and molecular levels [119, 182–186]. Previous research demonstrated a shift in histological heterogeneity in E2F1-/- PyMT mice, whereas no shift in histology was seen in E2F1-/- Neu mice [86, 88]. To assess molecular intra-tumoral heterogeneity in PyMT and Neu tumors, variant allele frequencies (VAF) were investigated. Briefly, the VAF is the number of reads containing a particular SNV divided by the total number of reads at that position. In a single-clone tumor, the VAF for every mutation is expected to be 0.5, since half of the reads will carry the mutation (assuming that only one of the two DNA copies is mutated and ignoring contamination by normal cells). When analyzing the clonality of Neu and PyMT tumors, 5 of 6 Neu tumors had two clones, and all PyMT tumors had one clone (Figure 1.6). This is unsurprising given the fast growth of PyMT tumors. E2F1 status had no effect on the clonality of Neu or PyMT tumors.

COPY NUMBER AND TRANSLOCATION EVENTS

Multiple programs were also used to determine copy number variants (CNVs) and translocations occurring within Neu and PyMT tumors (Figure 1.7A-D).
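A consensus between two copy number callers, as used in the analysis that follows, can be sketched as the intersection of their segments, keeping a region only where both programs report a change in the same direction; consensus segments are then classed as small (under 1 Mb) or large, mirroring the size split reported here. This assumes half-open (chromosome, start, end, state) segments with state "gain" or "loss"; the representation, the same-direction rule, the helper names, and the toy coordinates are illustrative assumptions, not the exact consensus procedure used.

```python
def consensus_segments(calls_a, calls_b):
    """Intersect CNV segments from two callers, keeping same-direction overlaps.

    Segments are (chrom, start, end, state) with half-open [start, end)
    coordinates and state in {"gain", "loss"}.
    """
    consensus = []
    for chrom_a, start_a, end_a, state_a in calls_a:
        for chrom_b, start_b, end_b, state_b in calls_b:
            if chrom_a != chrom_b or state_a != state_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # the two segments genuinely overlap
                consensus.append((chrom_a, start, end, state_a))
    return sorted(consensus)


def size_class(segment, cutoff=1_000_000):
    """Label a segment 'small' (shorter than 1 Mb by default) or 'large'."""
    _, start, end, _ = segment
    return "small" if end - start < cutoff else "large"


# Toy calls (hypothetical): the chr4 calls disagree in direction and are dropped.
caller_1 = [("chr11", 1_000_000, 1_400_000, "gain"), ("chr4", 5_000_000, 8_000_000, "loss")]
caller_2 = [("chr11", 1_100_000, 1_500_000, "gain"), ("chr4", 5_500_000, 7_000_000, "gain")]
for seg in consensus_segments(caller_1, caller_2):
    print(seg, size_class(seg))
```

The nested loop is quadratic in the number of segments, which is fine for illustration; sorting segments by chromosome and start would allow a linear sweep on genome-scale call sets.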
Based on consensus CNV calls from two programs, over 98% of the copy number events were small (under 1 Mb), while relatively few larger events (above 1 Mb) were observed. Surprisingly, there was substantial overlap between E2F1 WT and E2F1 KO tumors in the genes affected by copy number events (Figure 1.7E). This large number of shared genes may indicate that E2F1 loss is not a primary driver of these events.

There was also a surprisingly large number of translocations within the Neu and PyMT tumors. When comparing the average number of translocations per sample across the two models, there were significantly more translocations in Neu tumors than in PyMT tumors, regardless of E2F1 status. When comparing E2F1 status within each model, there was no statistically significant difference (Figure 1.7F). To confirm the translocation calls made by Delly and Lumpy, 20 translocations from each tumor were chosen at random, and read evidence for these translocations was analyzed using Genome Ribbon [187]. Translocation read data for one tumor are shown in Table 1.2. All tumors had at least 75% of translocations with some read support, and 9 of 12 tumors had at least 85% of translocations with some read support (Table 1.3). Interestingly, all translocation events analyzed had varying levels of wild-type reads present. Because care was taken to exclude normal tissue when the primary tumor was collected for sequencing, and because wild-type reads were fairly abundant at many of the translocation sites, this suggests a large amount of heterogeneity within the tumors. While some normal tissue (vasculature, immune cells, etc.) is present in any tumor, the prevalence of wild-type reads is far below that observed for mutations. To verify one of the translocation events from Table 1.2, PCR was completed with primers flanking the translocation junction.
Both translocated and wild-type sequences were present at the breakpoint, confirming the existence of the translocation (Figure 1.8). Based on this evidence, upwards of 80% of the translocations were predicted to be real events.

ANALYSIS OF DISRUPTED PATHWAYS

To determine whether cancer- and metastasis-related genes were mutated within E2F1 WT and E2F1 KO tumors, the mutation list was filtered against known cancer genes from COSMIC. This analysis found mutations in a number of cancer-associated genes (Table 1.4). While a few of the genes listed in Supplemental Table 2 have known metastatic implications, they were not consistently mutated within the sample groups, or were mutated exclusively within E2F1 wild-type tumors. To identify whether mutations were enriched within particular pathways in E2F1 knockout compared with wild-type tumors, a database mining approach was taken using Gather [188]. First, genes with potentially impactful mutations were divided into two distinct gene lists: those from E2F1-/- tumors and those from E2F1+/+ tumors. Potentially impactful mutations included SNVs causing stop-gain or nonsynonymous changes, translocations causing truncated or fusion genes, and copy number segments resulting in the amplification or deletion of genes. These two gene lists were then submitted to Gather to determine whether any Gene Ontology (GO) terms or KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were significantly enriched for mutated genes. This analysis identified a number of significant GO terms for the E2F1-/- tumor gene list, but not for the E2F1+/+ list. In fact, the top three GO terms associated with E2F1 KO tumors were involved in cell adhesion (GO:0007155, p < 0.0001; GO:0007156, p < 0.0001; GO:0016337, p = 0.0001). Genes in those cell adhesion GO annotations included various collagens, integrins, and cadherins (Figure 1.9). Previous research has shown collagens to be important for tumor maintenance, angiogenesis, and metastasis [168].
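Tools such as Gather ask whether a gene list contains more members of a GO term or KEGG pathway than expected by chance. One common way to express this is a one-sided hypergeometric test: among N annotated genes, K belong to the term; what is the probability that a list of n genes contains at least k term genes? The sketch below computes that tail probability exactly with the standard library. Gather's own scoring and multiple-testing handling may differ [188], and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: term genes in the mutated-gene list; n: size of that list;
    K: genes annotated to the term; N: all annotated genes.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    upper = min(n, K)
    # Exact integer arithmetic; Python ints do not overflow.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 40 of 500 mutated genes fall in a 900-gene term out of 20,000.
p = hypergeom_upper_tail(k=40, n=500, K=900, N=20_000)
print(f"{p:.2e}")
```

When many terms are tested, the resulting p-values would normally be corrected for multiple testing before terms are called significant.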
Collagen IV is the major component of the basement membrane and is composed of heterotrimers assembled from the products of six COL4A genes. Three collagen IV genes were found mutated in different PyMT E2F1 KO tumors. Other mutated genes in PyMT E2F1 KO tumors included COL5A2 (collagen V is a component of the interstitial matrix), COL6A1-3 (collagen VI is abundant at the tumor invasive front) [168–170], and several integrin and cadherin genes. Interestingly, closer examination of the gene expression data revealed that the integrin pathway was upregulated in E2F1 WT tumors, but not in E2F1 KO tumors. There was also an abundance of intronic and synonymous mutations within these genes, suggesting they may be hypermutated due to the disruption of E2F1 within the model, although this has not been statistically verified. Indeed, of the 64 mutated genes within the cell adhesion GO term GO:0007155, half were found to have an E2F1 binding motif using TRANSFAC (p-value = 0.003, data not shown). Because E2F1 regulates the cell cycle as well as a number of genes involved in DNA repair and adhesion, it is plausible that E2F1 loss, and the corresponding disruption of the cell cycle and of DNA repair during S phase, could increase the mutation burden, potentially within E2F-regulated genes. E2F1 has also been shown to recruit nucleotide excision repair and double-strand break repair factors to sites of DNA damage [181, 189, 190]. Loss of recruitment of these factors could lead to inefficient DNA repair and an increased mutational burden, although this requires further investigation.

DISCUSSION

Ablation of E2F1 in PyMT and Neu transgenic mice results in a significant decrease in pulmonary metastasis.
To determine whether gene expression changes were responsible for the altered phenotypes, transcriptomic data were analyzed, but they showed no large changes in gene expression between E2F1+/+ and E2F1-/- tumors. This was recapitulated in human HER2+ breast cancers stratified into E2F1-activity quartiles. GSEA revealed several pathways differentially regulated between E2F1+/+ and E2F1-/- tumors, but without obvious implications for regulating metastasis. To test for genomic alterations affecting metastasis, we completed whole genome sequencing (WGS) of E2F1+/+ and E2F1-/- tumors in the Neu and PyMT models. Trinucleotide mutation signatures showed differences in the proposed etiology of Neu and PyMT tumors, as well as between E2F1 knockout and WT tumors. Neu tumors were more closely associated with defective double-strand break repair, while PyMT tumors were associated with defective DNA mismatch repair. As noted, Neu E2F1 KO tumors were more closely associated with defective double-strand break repair than Neu E2F1 wild-type tumors. An interesting question that warrants further investigation is whether this results from increased alterations within double-strand break repair genes upon loss of E2F1, or from some other transcriptional function of E2F1. Pathway analysis of mutated genes using GO and KEGG revealed alterations in cell adhesion. Further analysis of these genes uncovered roles in the basement membrane and interstitial matrix, which could represent a mechanism for disruption of the metastatic cascade.

Sequencing data from genetically engineered mouse models are largely lacking, with only a few models having been sequenced [115, 179, 191, 192]. SNV mutation rates in our study are similar to those of previous studies, and small discrepancies may be explained by differences in data processing methods. For copy number variation, prior research has shown numerous small copy number events and a few larger events [115], although this was estimated from whole exome sequencing data.
This was recapitulated in our data, with the exception that large events were not prevalent after taking the consensus of two structural variant callers. We also noted a substantially greater number of translocations within the mouse tumors than in a previous study comparing Neu and PyMT wildtype tumors, although the same trend of Neu tumors having more translocations than PyMT tumors held. This increase in called translocations is likely due to differences in calling methods. Overall, the field would benefit from a large comparison of mouse tumor sequencing data in which all tumors are analyzed under the same parameters. After analyzing mutated genes using a pathway approach, many genes involved in cell adhesion, including various collagens, integrins and cadherins, were found to have potentially impactful mutations in E2F1 knockout tumors, but not in E2F1 wild type tumors. Among these cell adhesion genes, genes such as Col4a1 encode important components of the basement membrane and are involved in tumor progression. Disruptions to the basement membrane and collagen formation have the potential to disrupt the metastatic process. This theory is supported by our previous data, which found a significant decrease in circulating tumor cells [88]. Interestingly, we have also previously noted amplification of Col1a1 in Neu E2F1 WT tumors, which impacted the metastatic process [193]. Combined, these data suggest that collagens and basement membrane proteins are important to the metastatic process in Neu and PyMT tumors.

SNV profiling of human tumors has utility for both discovery and treatment purposes. Sequencing of human breast tumors has revealed larger genomic trends as well as mutation rates for oncogenes and tumor suppressors [194]. The importance of determining SNVs within mouse models is evidenced by previous research from our lab and others [115, 179].
Potential sources of error when determining SNVs include differences in genetic background between mice, even after backcrossing, as well as filtering that is too loose or too stringent. Interestingly, our prior work identified and validated an SNV in Ptprh in PyMT tumors [179], but this mutation was not present in this sequence analysis. While the initial paper required an SNV call to pass 3 of 4 SNV calling programs, the work herein required a call to pass all 3 programs used, leading to the discrepancy. When the SNV data from each program were examined, a Ptprh SNV was called by SomaticSniper and VarScan, but not by Mutect2. This suggests that combining multiple SNV callers is most useful for discovery when a call does not need to be supported by every program, and that less stringent filtering parameters may be beneficial. When analyzing copy number alterations and translocations within the models, there was a surprising lack of differences across E2F1 status, suggesting E2F1 loss is not a primary driver of these events. Furthermore, the varying read support seen for confirmed translocations suggests a high degree of tumor heterogeneity in both models, regardless of E2F1 status. While numerous COSMIC-associated genes were mutated within the models, no mutations conserved between E2F1 knockout tumors (within or across models) were immediately apparent as important to the metastatic process. Analyzing gene expression between E2F1 WT and E2F1 KO tumors showed no major changes upon E2F1 loss. This was recapitulated among human HER2+ve breast cancer tumors stratified into low and high E2F1 activity. The lack of large gene expression changes may indicate that numerous small changes result in phenotypic alterations, or that genomic mutations are leading to altered protein function or localization. Interestingly, the gene encoding Transcription Factor AP-2 Beta was significantly upregulated in Neu E2F1 KO mice.
This, combined with the data showing a lack of major gene expression changes between E2F1 WT and E2F1 KO tumors, suggests possible compensation by Transcription Factor AP-2 Beta, as well as by other members of the E2F family [86, 105]. The sequencing data from E2F1-/- Neu and PyMT mice indicate that phenotypic changes may be due to an abundance of mutations in particular pathways, in addition to minor expression changes. Because the metastatic process likely originates from a small population of metastatic cells within the primary tumor, key events in these few cells may be lost within the noise of bulk primary tumor gene expression or sequencing data. Future work will address these issues through single cell sequencing and gene expression analysis of matched primary and metastatic tumors.

MATERIALS AND METHODS

GENE EXPRESSION ANALYSIS

Gene expression data were described previously [86, 92]. Volcano plots for Neu and PyMT tumors were generated after removing outliers from each sample group using Nowaclean (Holsbø, Einar. 2017. “nowaclean”); samples more than 3.0 standard deviations away when constructing PCA plots were removed. Data were log2 transformed, and the mean for each gene was calculated within the four sample groups. Fold change was calculated by subtracting the E2F1 KO mean from the E2F1 WT mean for each gene. P-values were calculated, and data were plotted using EnhancedVolcano (Blighe, Kevin. 2018. “EnhancedVolcano”) in R. Human RSEM normalized RNAseq breast cancer data from TCGA were downloaded from UCSC Xena, filtered to HER2+ samples, and sorted by E2F1 expression. The lower and upper quartiles were kept, and data were processed for volcano plots as above. GSEA plots were generated by combining the Neu and PyMT gene expression datasets. Datasets were collapsed and batch corrected with ComBat to remove batch effects. GSEA was run using GenePattern [195].
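The per-gene fold change and the quartile stratification described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the R workflow used here: the helper names and sample values are hypothetical, expression values are assumed to be positive before the log2 transformation, and outlier removal (Nowaclean) and p-value calculation (EnhancedVolcano) are not reproduced. Because the data are log2 transformed before averaging, subtracting the E2F1 KO mean from the E2F1 WT mean yields a log2 fold change.

```python
import math
from statistics import mean


def log2_fold_change(wt_values, ko_values):
    """Hypothetical helper: log2-transform positive expression values for one
    gene, then subtract the E2F1 KO group mean from the E2F1 WT group mean."""
    wt_mean = mean([math.log2(v) for v in wt_values])
    ko_mean = mean([math.log2(v) for v in ko_values])
    return wt_mean - ko_mean


def passes_cutoffs(fold_change, p_value, fc_cutoff, p_cutoff):
    """Volcano-style filter; treating the fold change cutoff as inclusive
    is an assumption."""
    return abs(fold_change) >= fc_cutoff and p_value < p_cutoff


def extreme_quartiles(samples):
    """Hypothetical helper: sort (sample_id, e2f1_expression) pairs by E2F1
    expression and return (lowest quartile, highest quartile). The middle
    half of the samples is discarded."""
    ranked = sorted(samples, key=lambda s: s[1])
    q = len(ranked) // 4
    if q == 0:
        return [], []
    return ranked[:q], ranked[-q:]


# Made-up values: three WT and three KO measurements for one gene.
lfc = log2_fold_change([8.0, 8.0, 8.0], [2.0, 2.0, 2.0])
print(lfc)  # 2.0, i.e. log2(8) - log2(2)
print(passes_cutoffs(lfc, 0.01, fc_cutoff=0.5, p_cutoff=0.05))  # True (Neu cutoffs)

# Made-up E2F1 expression for eight HER2+ samples.
low, high = extreme_quartiles([("s1", 1.0), ("s2", 5.0), ("s3", 3.0), ("s4", 7.0),
                               ("s5", 2.0), ("s6", 8.0), ("s7", 4.0), ("s8", 6.0)])
print([s for s, _ in low], [s for s, _ in high])  # ['s1', 's5'] ['s4', 's6']
```

Keeping only the two outer quartiles discards samples with intermediate E2F1 levels, so the comparison contrasts clearly high and clearly low E2F1 tumors.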
WHOLE GENOME SEQUENCING AND PROCESSING

Raw whole genome sequencing data from mouse tumors were previously obtained [28]. Briefly, three samples from each group (12 in total) were used, and DNA was extracted from flash-frozen tumor tissue following the manufacturer's protocol for the Qiagen Genomic-tip 20/G kit. Sequencing was completed at a depth of 40x with paired-end, 150 base pair reads. DNA was prepared and sequenced using Illumina TruSeq Nano DNA library preparation and an Illumina HiSeq 2500. For this study, raw fastq files were assessed for quality using FASTQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using Trimmomatic [196]. Files were aligned to the mm10 mouse reference using BWA MEM [197] with standard parameters. Picard tools (“Picard Toolkit.” 2019) was used to add read groups and remove duplicates. Samtools [198] was used to sort and index files.

VARIANT CALLING

Somatic SNVs were called using SomaticSniper [199], Mutect2 [200], and VarScan [201]. Consensus calls were merged using base R programming (R Core Team (2018)), and mutations were only kept if called by all three programs. SNV calls were filtered using base R to account for differences between the FVB strain and the mm10 alignment reference (C57BL/6), as well as differences between the SV129 strain (the original E2F1 mouse background) and C57BL/6. SNVs were annotated using Annovar [202]. CNVs were determined by keeping the consensus of Lumpy [203] and Delly [204]. Consensus was determined using Intansv (Yao W 2019) at a threshold of .2, and events smaller than 10,000 bp were filtered out. Intansv was also used to annotate CNV events. Translocations were called using Lumpy and Delly and filtered based on read evidence. Lumpy calls were kept if they had at least 20 supporting split and paired-end reads, and Delly calls were kept if there was both split-read and paired-end read evidence for the call. WT FVB mouse sequence was used as a normal control.
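The consensus and read-evidence rules above can be sketched in plain Python. This is a simplified illustration, not the base R code used here: the variant keys, helper names, and coordinates are hypothetical, the strain background set is assumed to be precomputed, and the Lumpy rule assumes the 20-read threshold applies to the combined split-read and paired-end count, which the text does not specify. The `min_callers` parameter also shows why the 3-of-3 rule can drop a real call: a variant reported only by SomaticSniper and VarScan, like the Ptprh SNV discussed earlier, is lost at 3 of 3 but kept at 2 of 3.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]], min_callers: int) -> Set[Variant]:
    """Keep variants reported by at least `min_callers` of the callers."""
    counts: Dict[Variant, int] = {}
    for calls in calls_by_caller.values():
        for variant in calls:
            counts[variant] = counts.get(variant, 0) + 1
    return {v for v, n in counts.items() if n >= min_callers}


def remove_strain_background(calls: Set[Variant], background: Set[Variant]) -> Set[Variant]:
    """Drop calls that reflect strain differences (e.g. FVB or SV129 vs. C57BL/6)
    rather than somatic mutations; `background` is an assumed precomputed set."""
    return calls - background


def keep_cnv(start: int, end: int, min_size: int = 10_000) -> bool:
    """Discard copy number events smaller than 10,000 bp."""
    return end - start >= min_size


def keep_lumpy(split_reads: int, paired_reads: int, min_support: int = 20) -> bool:
    """Assumption: the 20-read threshold applies to split + paired-end reads combined."""
    return split_reads + paired_reads >= min_support


def keep_delly(split_reads: int, paired_reads: int) -> bool:
    """Keep a Delly call only if both split-read and paired-end evidence exist."""
    return split_reads > 0 and paired_reads > 0


# Hypothetical calls: the site at position 1000 is missed by Mutect2.
calls = {
    "SomaticSniper": {("chr1", 1000, "G", "A"), ("chr1", 2000, "C", "T")},
    "Mutect2": {("chr1", 2000, "C", "T")},
    "VarScan": {("chr1", 1000, "G", "A"), ("chr1", 2000, "C", "T")},
}
print(sorted(consensus_calls(calls, min_callers=3)))  # [('chr1', 2000, 'C', 'T')]
print(len(consensus_calls(calls, min_callers=2)))  # 2
```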
MUTATION SIGNATURES

Trinucleotide mutation signatures were generated using the Musica [205] shiny app in R. The Musica code was altered to allow the use of the mouse mm10 reference genome.

TUMOR CLONALITY

Clonality for each tumor was determined individually using the MAGOS program in R [206]. An updated R script was acquired through email correspondence with the author. Base R was used to extract VAFs from the consensus SNV calls and to prepare files for use in MAGOS. VAFs of 0 and 1 were removed as per the author's suggestion.

CIRCOS PLOTS

Circos plots were generated for each sample using CIRCOS version 0.69 [207]. Genetic variants were plotted according to the mm10 reference genome.

TRANSLOCATION VERIFICATION

Read evidence for 20 randomly selected translocations from each of the 12 sequenced samples was examined using GenomeRibbon [187]. For PCR verification, primers were designed with at least 400 bp flanking the predicted breakpoint.

APPENDIX

Figure 1.1: Altered phenotypic characteristics in E2F1−/− tumors. A) E2F1−/− mice were crossed with MMTV-Neu and MMTV-PyMT mice on the FVB background to create E2F1 knockouts in both models. B) Phenotypic changes seen in PyMT E2F1−/− mice and (C) Neu E2F1−/− mice, summarizing changes in latency, growth rate, and number of metastases. H&E staining of E2F1+/+ mouse lung shows a large number of metastases, while E2F1−/− mice have little to no metastasis. Histology of the lungs was obtained at primary tumor endpoint.

Figure 1.2: Gene expression changes in E2F1−/− mouse tumors and E2F1 low human breast cancer. A) Two volcano plots show significant fold changes in genes from Neu and PyMT mouse tumors, respectively. Fold change was determined by subtracting the E2F1 KO mean from the E2F1 WT mean for each gene. Fold change and p-value cutoffs for Neu tumors were .5 and .05, respectively. Fold change and p-value cutoffs for PyMT tumors were 1.0 and .001, respectively.
B) Diagram represents data processing steps for human TCGA data. A volcano plot shows genes with significant fold changes in E2F1 high vs. E2F1 low human HER2+ve tumors. Fold change was determined by subtracting the mean of samples in the lowest E2F1 quartile from the mean of samples in the highest E2F1 quartile for each gene. Fold change and p-value cutoffs for human tumors were 2.0 and 10e−60, respectively. C) GSEA plots generated for E2F1 WT vs E2F1 KO tumors (Neu and PyMT combined) show enrichment of nucleotide excision repair and WNT signaling pathways in E2F1 KO tumors.

Figure 1.3: Filtering background strain to remove artifacts that have potential to confound analysis. A) Pie chart from an E2F1+/+ PyMT tumor represents the normalized (SNVs/chromosome size) percentage of SNVs within each chromosome. B) Pie chart from an E2F1−/− PyMT tumor represents the normalized percentage of SNVs within each chromosome. An abundance of SNVs within chromosome 2 is observed. C) The banding pattern of mouse chromosome 2. The arrow highlights the location of E2F1, and the yellow box marks the bands shown in D and E. D) Manhattan plot shows the number of SNVs occurring within the 2qF3-2qH3 bands of chromosome 2 in the E2F1+/+ sample from A. E) Manhattan plot shows the number of SNVs occurring within the 2qF3-2qH3 bands of chromosome 2 in the E2F1−/− sample from B. F) Top pie chart is the same as in B. Bottom pie chart represents the percentage of SNVs across each chromosome of the same sample after filtering on the sv129 background.

Figure 1.4: SNV mutation burden in Neu and PyMT tumors. A) The first two bar graphs represent the number of total or exonic mutations per megabase in all 12 sequenced tumors. The third graph represents the percentage shift of exonic, intronic, and intergenic mutations in PyMT E2F1+/+ and PyMT E2F1−/− tumors. B) Representative mutation profiles for each of the four classes of samples sequenced.
Mutation profiles are derived from the 96 trinucleotide mutation classes originally developed by Alexandrov et al. The four classes of samples are Neu E2F1+/+, Neu E2F1−/−, PyMT E2F1+/+, and PyMT E2F1−/−. C) PCA plots derived from trinucleotide signatures show clustering of all 12 samples sequenced. D) Heatmap of cancer signatures for the 12 sequenced tumors, as well as various cancers.

Figure 1.5: Mutation profiles. Mutation profiles for all 12 Neu and PyMT mouse tumors corresponding to the four classes in Figure 1.4B. Mutation profiles are derived from the 96 trinucleotide mutation classes originally developed by Alexandrov et al.

Figure 1.6: Clonal heterogeneity in Neu and PyMT tumors. Graphs showing clonal populations in representative Neu and PyMT tumors. Each dot represents a specific mutation, with the Y-axis showing the total number of reads covering that mutation and the X-axis showing the variant allele frequency of that mutation. Each color represents a different predicted clone. E2F1 status did not affect clonality.

Figure 1.7: Mutation burden in Neu and PyMT tumors. A) Circos plot for a representative Neu E2F1+/+ sample. B) Circos plot for a representative Neu E2F1−/− sample. C) Circos plot for a representative PyMT E2F1+/+ sample. D) Circos plot for a representative PyMT E2F1−/− sample. For the Circos plots in A-D, the outermost ring represents the mouse chromosomes. The four successive inner rings represent the following mutation types: total SNVs, exonic SNVs, copy number variation (green, amplification; red, deletion), and translocations. E) Venn diagram showing the overlap of genes within copy number events. Consensus copy number events were generated for each of the three samples within the four sample classes. Genes were then extracted and compared across the sample classes. F) Venn diagram showing the overlap of translocations occurring within the four sample classes.
Consensus translocation calls from each of the three samples within each class were generated, and the four classes were then compared.

Figure 1.8: Verification of translocation calls. A) Example of a GenomeRibbon plot where no structural variation occurs. The top colored bands represent each chromosome of the mouse, and the red box below represents the location searched within a sample's bam file. Each line within that box represents a different read. B) A GenomeRibbon plot representing translocation number 13 from Table 1.2. Translocated reads are shown between chromosome 9 and chromosome 8. C) Gel image of the chromosome 8/9 translocation from the GenomeRibbon plot above. DNA was from a PyMT E2F1−/− tumor. Both translocation and wild type tumor DNA were amplified. Translocated reads were amplified using a primer set flanking the region where the two translocated ends ligate.

Figure 1.9: Mutations in basement membrane genes. Diagram shows various mutations occurring in genes that code for proteins making up the basement membrane and interstitial matrix. Circles at top indicate genes, with colors indicating which of the 3 sequenced E2F1−/− PyMT tumors has a mutation in that gene. Image on left represents a breast tumor with surrounding basement membrane. Image on right represents the basement membrane and interstitial matrix on the outer edge of a tumor.

Table 1.1: Mouse tumor signature etiology. Table showing the contribution of each proposed tumor etiology for each of the 12 mouse tumors. Numbers represent a proportion of the whole.

Table 1.2: Supporting reads for 20 randomly selected translocations from the tumor in Figure 1.8. Random translocations were selected by entering all translocations from the tumor into Excel and using the RAND() function to assign each a random number. The 20 translocations with the highest random numbers were then selected. Positions 1 and 2 represent the translocation breakpoint. GenomeRibbon was used to analyze read evidence.
Translocation #   Position 1     Position 2     Supporting Reads (approximate)   Total Reads (exact)   % Support
1                 3_65552053     2_20941004     10                               61                    16.39
2                 2_161669843    18_78292047    10                               119                   8.40
3                 15_43944218    1_112318855    6                                89                    6.74
4                 13_23307876    11_88303305    11                               104                   10.58
5                 5_56091821     8_56166475     14                               77                    18.18
6                 5_7058436      2_8931319      4                                49                    8.16
7                 16_83532604    14_96841358    15                               83                    18.07
8                 3_153288050    17_10818207    15                               88                    17.05
9                 16_83532819    14_96841377    19                               94                    20.21
10                16_18368004    12_80665928    10                               85                    11.76
11                8_102861241    17_67932443    0                                57                    0.00
12                14_21312516    11_9863013     16                               234                   6.84
13                9_55224433     8_85188141     13                               82                    15.85
14                X_38480149     9_55983052     12                               70                    17.14
15                6_73162349     16_96121815    3                                34                    8.82
16                4_43262573     3_113857214    4                                68                    5.88
17                5_62573934     13_86796353    2                                77                    2.60
18                3_135929183    1_139635092    9                                95                    9.47
19                6_67680744     4_147419479    0                                158                   0.00
20                7_79199005     19_40536086    14                               81                    17.28

Table 1.3: Read support for 20 randomly drawn translocations within each of the 12 mouse tumors. To pick 20 random translocations for each tumor, all translocation events were imported into Excel and a random number was assigned to each using the RAND() function. These were then sorted from highest to lowest, and the 20 translocations with the highest random numbers were taken. Translocation read support was analyzed using GenomeRibbon.
Extensive, Low, and None give the number of translocations with extensive, low, or no read support.

Tumor           Extensive*  Low*  None  % Extensive Support  % with at Least Some Support  Average Read Support (%)
Neu_E2F1KO_1    18          1     1     90                   95                            14.5
Neu_E2F1KO_2    17          0     3     85                   85                            14.78
Neu_E2F1KO_3    13          2     5     65                   75                            8.82
Neu_WT_1        17          1     2     85                   90                            15.83
Neu_WT_2        13          3     4     65                   80                            10.22
Neu_WT_3        18          1     1     90                   95                            12.42
PyMT_E2F1KO_1   16          2     2     80                   90                            13.21
PyMT_E2F1KO_2   18          1     1     90                   95                            13.48
PyMT_E2F1KO_3   17          1     2     85                   90                            10.8
PyMT_WT_1       14          4     2     70                   90                            13.62
PyMT_WT_2       13          3     5     65                   80                            14.94
PyMT_WT_3       15          4     1     75                   95                            14.06

*Extensive read support is defined as more than 5% of reads supporting the translocation.
*Low read support is defined as more than 0% but less than 5% of reads supporting the translocation.

Cosmic Cancer Genes Exclusive Mutation Mutations Exclusive to E2F1 Mutation Cosmic Cancer Gene to E2F1 KO Tumors Type WT Tumors ABL1 AFF1 AKT2 ALK ANK1 AR ARHGEF10 ARID1A ATM ATRX AXIN1 BAZ1A BCL11A BCL9 BCL9L BRD4 CAMTA1 CASP9 CBLB CCDC6 CD274 CD79A CDKN1A CNTRL CREB1 DNMT3A ELF4 ELK4 ELN EPS15 ERCC2 ERCC3 ERCC4 ETV5 EZR FAM47C FAT3 FGFR2 FLNA FLT3 FOXP1 AFF4 SNV ATP1A1 SNV BAP1 SNV BCL2 SNV BCL7A SNV CARD11 SNV CASP3 SNV CHEK2 SNV CPEB3 SNV CTNNB1 SNV SNV ETV4 Translocation FLI1 SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV FOXO4 LHFP LMNA MSH2 NRG1 PIK3R1 PLAG1 POLD1 PREX2 RANBP2 ROBO2 RSPO3 SF3B1 SMAD4 SUZ12 TGFBR2 ZBTB16 ZEB1 Table 1.4: Cosmic associated genes Type SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV Translocation SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV Table 1.4 (cont’d) Table shows Cosmic cancer associated genes that are mutated exclusively within E2F1 KO or E2F1 WT mouse tumors.
GAS7 GPC5 GRM3 H3F3A HOXD11 IL6ST JAK2 KAT7 KCNJ5 KDM6A KDSR KEAP1 KMT2A KMT2C KMT2D LZTR1 MAF MALT1 MAP2K4 MAP3K13 MITF MLLT1 MLLT10 MSN MUTYH NACA NBEA NF1 NFKB2 NIN NTRK3 NUP98 NUTM1 PAX8 PDGFRA PDGFRB PHOX2B PICALM POU2AF1 PTCH1 SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV Translocation SNV SNV SNV SNV SNV Translocation SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV Table 1.4 (cont’d) PTK6 PTPN6 PTPRT PWWP2A RARA REL RET RMI2 RNF213 ROS1 SDHAF2 SETD2 SFPQ SIRPA SIX1 SKI SMARCE1 SOCS1 SPEN SRC SRGAP3 STAG1 STK11 STRN TAF15 TBX3 TCF3 TEC TET1 TET2 TFEB THRAP3 TMPRSS2 TRAF7 TRIM24 TRIM27 TRIP11 TSC1 TSHR VAV1 VHL WT1 ZFHX3 SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV SNV

CHAPTER 2

PTPRH MUTATIONS IN PYMT MOUSE TUMORS

ABSTRACT

Genetically engineered mouse models are an important means for investigating a variety of cancers. While their relevance to human cancer has been well documented at the histological and molecular levels, sequencing of mouse tumors has been less common. Through whole genome sequencing of PyMT mouse mammary tumors, we have uncovered a mutation in the protein tyrosine phosphatase receptor type H gene (Ptprh). This conserved mutation is present in approximately 80% of PyMT tumors and correlates with increased phosphorylation of EGFR, a known target of PTPRH. Interestingly, Ptprh mutations also correlated with increased p-AKT, an important signaling molecule downstream of EGFR.

INTRODUCTION

PHOSPHATE SIGNALING WITHIN THE CELL

The human body is a highly organized, functional system. To achieve this high degree of functionality, cells need to communicate effectively with their neighbors and within themselves. Communication with neighboring cells is usually accomplished through a variety of extracellular ligands that act as messages travelling from cell to cell and throughout the body.
Within each cell, signaling is achieved through a complicated network of specialized proteins, often activated through a series of reactions that use ATP to phosphorylate amino acid residues on target substrates. Many of these proteins are kinases, which are divided into two large groups based on the amino acid residues they phosphorylate: serine/threonine kinases and tyrosine kinases. Within these cascades are a number of other proteins, including guanine nucleotide exchange factors (GEFs), which act as intermediaries by activating small GTPases such as RAS through the exchange of bound GDP for GTP. Some downstream targets of these cascades are transcription factors, and their activation or repression by signaling cascades ultimately alters the transcription of various genes. While these signaling pathways are complicated, they are often initiated through receptor tyrosine kinases (RTKs).

RECEPTOR TYROSINE KINASES

Receptor tyrosine kinases are perhaps some of the most important signaling molecules in cellular communication and cancer. Important work elucidating this class of proteins had already been done before the name ‘receptor tyrosine kinase’ was coined in the late 1970s. Experiments in the early 1960s were responsible for the discovery of epidermal growth factor (EGF), the ligand eventually found to stimulate the epidermal growth factor receptor (EGFR) and other RTKs. EGF was found to prompt early tooth eruption and eyelid opening in 8-day-old mice [208, 209]. Work in the 1970s also demonstrated the ability of the protein SRC and the growth factor EGF to stimulate serine and threonine phosphorylation, culminating in the seminal paper showing phosphorylation of tyrosine residues by the SRC kinase [210–214]. Shortly afterward, the tyrosine kinase activity of EGFR was also identified [215].
Decades later, we have a much clearer understanding of RTKs and how they operate within the cell. While there are numerous classes of RTKs that can be activated in a number of ways, canonical RTK activation typically relies on dimerization of RTK monomers residing within the plasma membrane [216]. These RTKs consist of an extracellular ligand-binding domain, a transmembrane domain, and an intracellular catalytic domain. The basic RTK activation process, consisting of ligand binding, dimerization, and phosphorylation of tyrosine residues on the C-terminal tail, is conserved across RTK families; however, the details of the process can differ significantly between individual receptors. While extracellular ligand binding appears necessary for RTK activation, and is usually associated with dimerization of RTK monomers, it is not always required for dimerization [217]. In the early RTK paradigm, applicable to a number of RTKs, dimerization is driven by ligands that are themselves dimers, as is the case with VEGF and Axl [218–220]. Dimerization of other RTKs is driven by monomeric ligands, such as FGF [221]. Certain RTKs also require accessory molecules to aid in dimerization [222, 223]. With certain RTKs, the ligands directly facilitate dimerization by binding to each other. However, some RTKs dimerize through direct receptor-receptor contacts, with ligand binding facilitating activation by inducing a conformational shift. In some cases, RTKs are capable of dimerizing with other members of their family, which is common within the ERBB family of RTKs [224]. After ligand binding and dimerization, the RTK dimer is activated, usually through a conformational shift that releases cis-autoinhibition in the intracellular domain. In most cases, the conformational shift opens up the active site, allowing ATP binding to occur.
Interestingly, while most RTKs have vastly different crystal structures in an inactive state, the structures of active RTK catalytic domains are strikingly similar [225]. Other modes of activation are seen, including relief of allosteric inhibition and of inhibition by C-terminal sequences [226]. After the active site is made accessible, tyrosine residues near the C-terminal end of the intracellular domain become phosphorylated. Many RTKs have numerous tyrosine residues capable of being phosphorylated, and some evidence shows the residues are phosphorylated in a specific order [227]. Once tyrosine residues on the C-terminal tail become phosphorylated, a number of signaling molecules are recruited to propagate downstream signaling. Many of these molecules have SRC homology 2 (SH2) or phosphotyrosine binding (PTB) domains [228, 229]. These signaling cascades can drive deregulated cell growth through numerous mechanisms, including activation or repression of transcription factors capable of altering cellular programming, as well as altered control of the cell cycle. Numerous mechanisms act as negative feedback to keep RTK signaling in check, including RTK degradation through ubiquitination and phosphate removal by protein tyrosine phosphatases (PTPs) [230, 231]. Overall, RTK signaling is highly complex, and disruptions to any facet of these processes can contribute to tumor formation. Disruptions to RTKs themselves include chromosomal rearrangements, amplification events, and gain-of-function gene mutations [226], all of which are commonly seen in the epidermal growth factor receptor.

EPIDERMAL GROWTH FACTOR RECEPTOR

EGFR plays a role in numerous cancers, including glioma and lung cancer. EGFR is a member of the ERBB family of RTKs and is involved in numerous signaling pathways responsible for increased cellular growth, proliferation, and evasion of apoptotic signals.
Pathways stimulated by EGFR activation include PI3K/AKT and RAS/RAF/MEK/ERK. While EGFR follows the basic RTK activation process, there are notable differences compared to more ‘canonical’ receptor tyrosine kinases. For example, some evidence has shown EGF is capable of activating pre-existing EGFR dimers [232, 233], and further evidence has shown increased expression of EGFR can stimulate ligand-independent dimerization [234]. Although ligand binding stimulates dimerization by inducing conformational shifts, the EGFR dimer interface is formed entirely by the receptors' extracellular domains rather than by ligand-ligand contacts [235, 236]. Furthermore, EGFR differs in that activation of its kinase domain does not require trans-autophosphorylation. Instead, the intracellular kinase domain of EGFR contains an N-lobe and a C-lobe, and upon dimerization the C-lobe of one kinase domain contacts the N-lobe of the other, disrupting the autoinhibited state and allowing an active conformation to be adopted [237]. Once the active conformation is adopted, phosphorylation can occur on the many tyrosine residues in EGFR's C-terminal tail [238–240]. Interestingly, different ligands seem capable of inducing differential tyrosine phosphorylation and distinct downstream signaling [241, 242]. Gene mutations are also capable of inducing EGFR's active state, and these mutations are common in multiple cancers. Common mutations leading to constitutively active EGFR in non-small cell lung cancer (NSCLC) include deletions in exon 19 and the L858R point mutation [243, 244]. EGFR stimulation can lead to transcription of numerous genes, from immediate early genes, such as the transcription factors FOS and JUN, within minutes, to secondary late response genes more than 120 minutes after stimulation [245]. After signaling, EGFR is internalized and either recycled to the cell surface or targeted for degradation [246, 247].
Some research has indicated that the cell’s ‘decision’ regarding the fate of internalized EGFR is pH dependent [248]. EGFR has also been detected in the nucleus of regenerating liver tissue [249] and of various cancers, including ovarian and bladder cancer [250, 251]. Furthermore, EGFR has been found to act as a transcriptional activator via direct binding to A/T-rich sequences (ATRS) in the promoters of certain genes, such as cyclin D1 [252]. Nuclear EGFR is also capable of acting as a co-activator through interactions with transcription factors such as STAT3, which recruits nuclear EGFR to the iNOS gene [253]. Accordingly, nuclear EGFR has prognostic value for a variety of cancers, including breast and non-small cell lung cancer [254, 255]. Overall, EGFR is extensively involved in cancer progression through a variety of mechanisms. Its importance is illustrated by the successful treatment of EGFR mutant cancers with tyrosine kinase inhibitors, which will be discussed in more detail below.

PHOSPHATASES

Just as RTKs are responsible for propagating phosphate signaling within the cell, phosphatases regulate these signaling pathways by removing phosphate groups from target residues. While conventional wisdom suggested kinases were the most important players in intracellular signaling, phosphatases are just as important in that regard. Since some of the earliest work on tyrosine phosphatases [256–260], the field has expanded rapidly, reflecting a growing appreciation of how important these proteins are in the regulation of cellular pathways. Phosphatases are broadly classified into two groups: serine/threonine phosphatases and tyrosine phosphatases. These groups are further divided into a number of classes depending on the subcellular location and substrate specificity of the phosphatase. Here I will focus on protein tyrosine phosphatases (PTPs), and more specifically on receptor-like PTPs (RPTPs).
RPTPs consist of a variable extracellular region, a transmembrane domain, and a largely conserved intracellular phosphatase domain [261]. Often, the extracellular regions of RPTPs are composed of a number of immunoglobulin-like or fibronectin type III domains, which are thought to mediate substrate binding and cell-cell contacts [261]. The intracellular phosphatase domains of RPTPs contain the highly conserved HC-(X5)-R motif responsible for catalytic activity, as well as nine other conserved motifs that play a role in the selectivity and catalysis of target substrates [262]. Many PTPs contain a cleft within the conserved catalytic motif that is responsible for recognition of phosphorylated tyrosine [263]. This cleft is too deep for the shorter phosphorylated serine and threonine side chains to reach the catalytic site, which is thought to mediate the selectivity of PTPs for pTyr. Dephosphorylation of phosphotyrosine occurs through a two-step chemical process involving a conformational shift of the PTP active site, which makes the PTP catalytically competent [261]. While PTPs are generally thought to shut down pathway signaling, their impact is context dependent: depending on the setting, PTP regulation of a pathway can result in either activation or repression. In the case of the RPTP CD45, dephosphorylation of SRC results in activation of signaling downstream of SRC, rather than inhibition of the pathway [264, 265]. These context dependent processes complicate the narrative of PTPs, allowing them to be viewed as having tumor suppressor or oncogenic properties depending on their cellular location and target pathways.

PROTEIN TYROSINE PHOSPHATASE RECEPTOR TYPE H

PTPRH, otherwise known as Stomach Cancer-Associated Phosphatase 1 (SAP-1), is a member of the receptor-like protein tyrosine phosphatases. Like many other RPTPs, PTPRH has an extracellular region consisting of fibronectin domains, a transmembrane domain, and an intracellular phosphatase domain.
The structure of PTPRH is largely conserved between humans and mice: human PTPRH has eight fibronectin domains and mouse PTPRH has six [266]. PTPRH was first cloned in the early 1990s (as SAP-1) from human gastrointestinal cancers, and much of its characterization has been in that context [267]. Like CD45, PTPRH can activate pathways downstream of SRC in a context-dependent manner [266]. Interestingly, PTPRH becomes inactive upon dimerization, which is regulated by the extracellular fibronectin domains [266]. Although the literature on PTPRH is small, a 2017 screening study identified it as a regulator of EGFR [268]. In that study, Yao et al. found PTPRH to be a negative regulator of EGFR, specifically at EGFR tyrosine residue 1197. PTPRH-deficient ovarian cell lines were also found to have ERK activation downstream of EGFR [269]. Overall, these findings indicate that PTPRH mutations may contribute to deregulated cellular pathways within tumors through multiple mechanisms.

RESULTS

DISCOVERY OF PTPRH MUTATIONS IN MOUSE PYMT TUMORS

Previous research in the lab discovered a Ptprh mutation within the mammary tumors of PyMT FVB mice [193]. Upon targeted resequencing, 81% of PyMT mice (n = 45) were found to have a conserved V483M mutation within Ptprh. Adding 22 further samples to this dataset did not change this frequency, with Ptprh mutations occurring in 82% of PyMT tumors (Figure 2.1A). Interestingly, mammary tumors that arose within the same mouse shared the same Ptprh mutation status. If one tumor from a given mouse had a heterozygous Ptprh mutation, the other tumors from that mouse had heterozygous mutations as well (Table 2.1). A mechanism for this has yet to be explored, but previous work has ruled out these mutations being germline [193]. Previous analysis of whole exome sequencing (WES) data from a collaborator examined PyMT mice on backgrounds other than FVB. In those mice, Ptprh mutations occurred throughout the Ptprh coding sequence.
This is in contrast to FVB mice, where the Ptprh mutation always results in a valine to methionine substitution at amino acid 483 (Figure 2.1B). Analysis of the WES data also found Ptprh mutation status to be conserved between primary tumors and their metastases (Figure 2.1C). A two-tailed Student's t-test (unequal variance) found no significant difference (p = 0.39) between the number of Ptprh exonic mutations in primary tumors and in metastatic tumors. These data suggest that, when Ptprh mutations occur, they arise early in primary tumor progression.

PTPRH MUTANT TUMORS CORRELATE WITH HIGH EGFR ACTIVITY

As mentioned above, PTPRH has known interactions with the epidermal growth factor receptor. We therefore hypothesized that PyMT tumors with a Ptprh mutation would have increased phosphorylation of EGFR, specifically at EGFR residue Y1197. To correlate mouse Ptprh mutations with increased p-EGFR, western blots were run using an antibody specific for phospho-Y1197 EGFR [193] (Figure 2.2A). These blots show a clear correlation between mutated Ptprh and increased phosphorylation of EGFR. Heterozygous mutant tumors already show increased p-EGFR, suggesting that a dominant negative mechanism may be occurring. Homozygous mutant tumors show a further increase.

To further explore the relationship between mutated Ptprh and signaling pathways downstream of EGFR, western blots were run for phosphorylated AKT, ERK, and the transcription factor STAT3. These blots used lysates from mouse tumors that were either wild type for Ptprh or homozygous mutant (Figure 2.3A/B). AKT, ERK, and STAT3 are all important regulators of pathways downstream of EGFR [270–274]. Ptprh mutant PyMT tumors have a clear increase in phosphorylated AKT, but not in phosphorylated ERK or STAT3. This suggests that mutated Ptprh affects only some of the tyrosine residues on the C-terminal tail of EGFR.
Based on our data, we believe PTPRH may specifically target EGFR residue Y1197. However, previous characterization of the tyrosine residues on the C-terminal tail of EGFR has shown how complicated these signaling pathways are. The situation may not be as simple as PTPRH targeting a single residue.

DISCUSSION

Through whole genome sequencing of PyMT mammary tumors from FVB mice, we uncovered a conserved V483M mutation within the Ptprh gene. This gene was mutated in 82% of tumors (n = 67), and the mutations were determined not to be germline. Further analysis of WES data found Ptprh mutation status to be conserved between primary mammary tumors and their matched metastases, suggesting that Ptprh mutations occur in the early stages of tumor progression. Correlative western blot analysis found increased phosphorylation of EGFR at residue Y1197. It also found increased phosphorylation of AKT further downstream, but not of ERK or STAT3.

In the future, it would be valuable to determine directly whether Ptprh mutations arise early in tumor formation, which may indicate whether PTPRH can be a driving force of tumor progression. The fact that 82% of tumors harbor Ptprh mutations also raises the question of whether this mutation is selected for in this particular oncogenic model. These questions were not addressed in this work, but the answers could provide valuable insight into the role of PTPRH in PyMT carcinogenesis. We have yet to uncover the mechanism by which mutant PTPRH fails to dephosphorylate EGFR. However, the increased p-EGFR in heterozygous mutants suggests a dominant negative mechanism. Dimerization of PTPRH is known to inactivate it. One possibility, therefore, is that the V483M mutation, which lies in a fibronectin domain, increases the ability of PTPRH to bind to itself.
Future work could uncover this mechanism, potentially through co-immunoprecipitation and other biochemical assays. This could give valuable insight into how the V483M mutation affects the ability of PTPRH to dephosphorylate EGFR. EGFR is a well-known regulator of numerous cellular signaling pathways, so Ptprh mutant tumors could have deregulated cellular growth dynamics through some of these pathways. In fact, we saw increased phosphorylated AKT within Ptprh mutant tumors that also had increased p-EGFR. AKT is an important regulator of pathways that promote cellular proliferation and evasion of pro-apoptotic signals. PTPRH mutations that increase AKT activation could therefore result in increased cellular proliferation and enhanced tumor growth. Enhanced growth has in fact been noted in Ptprh mutant PyMT tumors [193], which is a striking phenotype given the already fast growth of PyMT tumors. Exploring how increased p-AKT is linked to increased p-EGFR would be interesting. The link could occur through canonical signaling mechanisms, such as through GRB2 and PI3K as intermediaries, or through another mechanism. Overall, further exploration of the contribution of V483M mutant Ptprh to tumorigenesis in PyMT mice would provide valuable insights into phosphatase biology.

MATERIALS AND METHODS

TARGETED RESEQUENCING OF PYMT TUMORS

DNA was extracted from flash frozen tumors using lysis buffer (50 mL Tris HCl, 5 mL 500 mM EDTA, 10 mL 10% SDS, 20 mL 5 M NaCl, H2O up to 500 mL), or from FFPE tissue using the Qiagen FFPE extraction kit. The region flanking V483M was PCR amplified using the following primers: F = GGCCTTAGGTTCAATTGTGAATAC, R = CCTTAGCTTCCCGAGTATTGGTT. Amplified DNA was sent to GeneWiz for Sanger sequencing with the primer TCATCCAAACTACATCTATGATCCA. Geneious software was used for alignment to the reference sequence.

ANALYSIS OF PTPRH MUTATIONS IN WES DATA

Pre-annotated VCF files for 64 tumors were downloaded from GEO (accession number GSE142387).
Data were processed in R by reading in the VCF files and keeping only variants within chr7:4,548,992–4,604,041 (the location of Ptprh in the mouse genome). These files were then converted to ANNOVAR format, exported, and annotated using ANNOVAR. Statistical analysis used a two-tailed Student's t-test with unequal variance. It compared the metastasis group (mutations per metastatic sample) with the primary group (mutations per primary tumor).

WESTERN BLOTTING

Tumor lysates were harvested from flash frozen tumors by crushing with a mortar and pestle, then dissolving in TNE lysis buffer (5 mL 1 M Tris HCl pH 8, 3 mL 5 M NaCl, 1 mL NP40, 400 µL 0.5 M EDTA, 2.0 mL 0.5 M NaF, H2O to 100 mL). Roche mini protease inhibitor tablets and sodium orthovanadate were used as protease and phosphatase inhibitors, respectively. Sample concentrations were measured by BCA assay, and samples were diluted to the same concentration with additional lysis buffer. SDS was added and samples were heated to 95 °C for 10 min. Samples were loaded onto an 8% gel and run for ~2 hours, then transferred onto 0.45 µm PVDF at 70 V for 2 hours. Blocking was performed for 1 hour at room temperature in 5% BSA. Primary antibodies were incubated overnight in blocking buffer. Blots were rinsed with TBST and incubated with secondary antibody for 1 hour at room temperature, then rinsed and imaged. Antibodies were as follows: total EGFR (Cell Signaling D38B1), p-EGFR (Invitrogen PA5-37553), AKT (Cell Signaling 11E7), p-AKT (Cell Signaling D9E), STAT3 (Cell Signaling 79D7), p-STAT3 (Cell Signaling D3A7), β-tubulin (Proteintech 10094-1-AP), vinculin (Cell Signaling E1E9V).

APPENDIX

Figure 2.1: Ptprh mutations in PyMT mouse tumors. A) Ptprh V483M mutation frequency in PyMT mammary tumors of FVB background mice. B) Lollipop plot of PTPRH showing the location of the V483M mutation within a predicted PTPRH fibronectin domain. C) Table of Ptprh exonic mutations seen in primary PyMT FVB tumors and their matched metastases. WES data obtained from a collaborator.
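The WES step described in the methods above can be sketched in Python as follows. It keeps only variants in chr7:4,548,992–4,604,041, then compares per-sample mutation counts between primaries and metastases with an unequal-variance (Welch's) t-test. This is not the original R code, and the function names are hypothetical. The VCF parsing is deliberately minimal and assumes tab-separated data lines on the same genome build as the stated coordinates. The ANNOVAR step that would restrict counts to exonic variants is omitted. The code computes only the t statistic and Welch-Satterthwaite degrees of freedom. The two-tailed p-value would come from a t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, variance

# Ptprh interval as given in the methods (mouse chr7, bp 4,548,992-4,604,041);
# assumes the VCF files use the same genome build.
PTPRH_CHROM = "7"
PTPRH_START, PTPRH_END = 4_548_992, 4_604_041


def in_ptprh(vcf_line: str) -> bool:
    """True if a tab-separated VCF data line lies inside the Ptprh interval."""
    if vcf_line.startswith("#"):  # skip header lines
        return False
    chrom, pos = vcf_line.split("\t")[:2]
    if chrom.startswith("chr"):  # accept both "7" and "chr7"
        chrom = chrom[3:]
    return chrom == PTPRH_CHROM and PTPRH_START <= int(pos) <= PTPRH_END


def count_ptprh_variants(vcf_lines) -> int:
    """Number of variant records in one sample's VCF that fall in Ptprh."""
    return sum(in_ptprh(line) for line in vcf_lines)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values and non-zero total variance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df
```

In use, count_ptprh_variants would be applied to each primary and each metastasis VCF, and the two lists of counts passed to welch_t.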
Rennhack 2018 [179]

Figure 2.2: Increased p-EGFR in Ptprh mutant mouse tumors. A) Increased phosphorylation of EGFR Y1197 in heterozygous and homozygous Ptprh mutant PyMT mouse tumors.

Figure 2.3: Downstream pathway activity in Ptprh mutant mouse tumors. A) Increased phosphorylation of AKT S473 in homozygous Ptprh mutant PyMT mouse tumors. B) No differences in phosphorylated ERK1/2 or STAT3 Y705.

Table 2.1: Mammary gland Ptprh mutation status in PyMT mice

Mouse # | Tumor # (mammary gland #) | Ptprh status
2139 | 2 | Heterozygous
2139 | 6 | Heterozygous
273 | 1 | WT
273 | 5 | WT
273 | 6 | WT
273 | 8 | WT
274 | 3 | WT
274 | 7 | WT
274 | 8 | WT
2831 | 4 | Homozygous
2831 | 5 | Homozygous
2831 | 6 | Homozygous
300 | 3 | Heterozygous
300 | 7 | Heterozygous
300 | 8 | Heterozygous
300 | 9 | Heterozygous
3146 | 6 | Homozygous
3146 | 7 | Homozygous
3304 | 1 | Heterozygous
3304 | 8 | Heterozygous
3720 | 2 | Homozygous
3720 | 8 | Homozygous
379 | 2 | WT
379 | 5 | WT
455 | 1 | Heterozygous
455 | 2 | Heterozygous
455 | 6 | Heterozygous
456 | 1 | Heterozygous
456 | 2 | Heterozygous
456 | 3 | Heterozygous
456 | 4 | Heterozygous
547 | 2 | Homozygous
547 | 5 | Homozygous
547 | 6 | Homozygous
563 | 1 | Heterozygous
563 | 2 | Heterozygous
563 | 4 | Heterozygous
563 | 6 | Heterozygous
592 | 1 | Heterozygous
592 | 3 | Heterozygous
616 | 1 | Homozygous
616 | 2 | Homozygous
618 | 1 | WT
618 | 4 | WT
618 | 6 | WT
628 | 5 | Heterozygous
628 | 6 | Heterozygous
693 | 3 | Heterozygous
693 | 4 | Heterozygous

Ptprh mutation status is conserved among tumors arising in different mammary glands of the same mouse.

CHAPTER 3

RELATIONSHIP OF PTPRH AND EGFR IN HUMAN CANCER

ABSTRACT

Mouse models of cancer can be beneficial tools for studying the disease, but not all genomic mutations found within mouse tumors are relevant to human tumor development. Here we investigate the importance of PTPRH mutations in human cancer, finding that 5% of NSCLC cases have mutations within PTPRH. Many tumors with these mutations are predicted to have increased EGFR activity and activation of the PI3K/AKT pathway downstream of EGFR. We show that CRISPR-mediated PTPRH ablation leads to increased phosphorylation of EGFR, as well as of AKT.
A phospho-receptor tyrosine kinase array also identified other RTKs potentially targeted by PTPRH. Increased phosphorylation of FGFR1 upon loss of PTPRH was confirmed by western blot. Interestingly, Ptprh mutant mouse tumors and PTPRH KO lung cancer cells also display increased localization of EGFR to the nucleus, which has been noted in other cancers and in regenerating liver tissue.

INTRODUCTION

Previous data identified a Ptprh mutation in PyMT mammary tumors, and these tumors exhibited increased p-EGFR and p-AKT compared with Ptprh WT tumors. These data were striking, but they do not show whether PTPRH mutations are relevant in human cancers. Genetic aberrations found in mouse models of cancer are not always applicable to human forms of the disease [275, 276]. This is especially the case for certain oncogenic drivers in mice, such as the PyMT oncogene used to drive carcinogenesis in the PyMT model [277]. Tumor induction in the PyMT model relies on the activation of pathways known to be important for carcinogenesis, such as PI3K/AKT. However, the main oncogenic driver (PyMT) is not found in human cancers. Determining whether genetic mutations found in mouse tumors are applicable to human cancers can also be complicated by large numbers of passenger mutations, whose effects on tumor progression can be ambiguous [278]. Sorting through mutations found via whole genome sequencing is often accomplished by applying numerous filtering steps, including but not limited to the following:

1. Annotating variants to determine their coding classification (nonsynonymous, etc.)
2. Correlating particular mutations with survival data or another phenotype across multiple samples
3. Analyzing human datasets to determine whether the mutation is present in human tumors
4. Cross-referencing mutation lists with known oncogenes and tumor suppressor genes
5. Using pathway or gene set databases (such as Gather) to find relationships between lists of mutated genes
6. Pairing mutations with transcriptomic data to check for a corresponding alteration in gene expression
7. Determining whether the mutation results in a potential protein conformational shift

Overall, many resources are available to help determine whether a particular mouse model gene mutation is relevant. Several should be combined to sift through the noise in the mutational landscape of mouse model tumors. In this chapter, some of these resources are applied to show the relevance of PTPRH to human cancers, and the relationship between PTPRH and EGFR is fleshed out.

RESULTS

PTPRH MUTATIONS IN HUMAN CANCER

To determine whether PTPRH mutations are present in human tumors, we took a pan-cancer data mining approach. Data from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were analyzed. Initial analysis of these two data collections showed high rates of PTPRH mutations in skin, uterine, and lung cancers (Figure 3.1A). Interestingly, in the ICGC data, the percentage of patients with PTPRH mutations within the same cancer type varied across datasets from different countries. For instance, a higher percentage of melanoma patients in Australia had PTPRH mutations than in the United States. A closer look at the individual datasets, however, revealed that differences in data processing and reporting accounted for most of these discrepancies. In lung cancer, by contrast, a higher percentage of patients in South Korea had PTPRH mutations than in the United States (Figure 3.1B). This analysis only considered exonic PTPRH mutations.
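The point about data processing can be made concrete. A per-cohort mutation frequency should count patients rather than variant records and restrict to exonic variants. It should also divide by the number of sequenced patients, including those without any PTPRH variant. The Python sketch below shows this under an assumed, hypothetical input format. It is not the calculation used by cBioPortal or the ICGC portal, and the function name is illustrative only.

```python
from collections import defaultdict


def patient_mutation_frequency(variants, cohort_sizes):
    """Percent of patients per cohort with at least one exonic PTPRH variant.

    variants: iterable of (cohort, patient_id, region) tuples (a hypothetical
        format; real portal exports use their own column names).
    cohort_sizes: {cohort: number of sequenced patients}, which must include
        patients without any PTPRH variant.
    """
    mutated = defaultdict(set)
    for cohort, patient, region in variants:
        if region == "exonic":
            mutated[cohort].add(patient)  # a set counts each patient once
    return {cohort: 100 * len(mutated[cohort]) / n for cohort, n in cohort_sizes.items()}


toy_variants = [
    ("cohort_A", "p1", "exonic"),
    ("cohort_A", "p1", "exonic"),    # second hit in the same patient
    ("cohort_A", "p2", "intronic"),  # excluded: not exonic
    ("cohort_B", "p9", "exonic"),
]
print(patient_mutation_frequency(toy_variants, {"cohort_A": 10, "cohort_B": 4}))
# {'cohort_A': 10.0, 'cohort_B': 25.0}
```

Counting variant records instead of patients, or using only mutated patients as the denominator, would inflate the percentages. These are the kinds of reporting differences that can make cohorts look discrepant.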
Because of the known relationship between PTPRH and EGFR, we focused on PTPRH mutations in non-small cell lung cancer (NSCLC) patients, since EGFR activating mutations occur in a large subset of those patients. This gave us a patient group that has already been characterized in the context of increased EGFR signaling and treatment with EGFR inhibitors. Since PTPRH is known to target EGFR, we hypothesized that a mutation in PTPRH could increase EGFR signaling in patient tumors that lack canonical activating mutations in EGFR. Importantly, PTPRH mutations and activating EGFR mutations are mutually exclusive in NSCLC patients (Figure 3.1C). This raises the possibility that patients with PTPRH mutations have increased EGFR activation but are not classified as such. These patients may therefore be missing out on potentially efficacious EGFR therapies. Analysis of the TCGA NSCLC dataset found no statistically significant differences in age, overall survival, sex, or race for PTPRH mutant patients (Figure 3.1D). Interestingly, EGFR mutant lung cancers are not typically associated with smoking, whereas PTPRH mutant tumors have previously been associated with smoking [279].

BIOINFORMATICS PREDICTS ACTIVATION OF EGFR AND DOWNSTREAM PATHWAYS

To further explore whether PTPRH mutations in NSCLC tumors lead to increased EGFR activity, several bioinformatic predictions were used. First, we predicted EGFR activity in PTPRH mutant NSCLC tumors by applying single sample gene set enrichment analysis (ssGSEA) to RNA-sequencing data from these tumors (Figure 3.2A). This analysis showed that certain PTPRH mutant tumors had high predicted EGFR activity. In fact, there were three 'hotspot' regions within the PTPRH coding sequence where mutant tumors were predicted to have increased EGFR activity.
Two of these hotspot regions occur within PTPRH fibronectin domains, where our mouse Ptprh mutation was also found, and the third occurs within the phosphatase domain. Interestingly, phosphatase domain PTPRH mutations with high predicted EGFR activity are located just downstream of the conserved HC(X5)R activity motif, but not within it. Because ssGSEA predicted increased EGFR activity in PTPRH mutant lung tumors, we wanted to determine whether pathways downstream of EGFR were also affected in those tumors. To begin, ssGSEA was performed on 12 tumors in each of three groups. These were PTPRH mutants with high predicted EGFR activity from the previous analysis, EGFR L858R mutants, and tumors WT for both PTPRH and EGFR. The resulting pathway predictions were then clustered into a heatmap using hierarchical K-means clustering (Figure 3.2B). In this analysis, certain PTPRH mutant tumors clustered with EGFR mutant tumors, suggesting they have a similar pathway activation profile. To further investigate pathways downstream of EGFR, GSEA was performed on the same 12 tumors per group used for the ssGSEA and pathway clustering. GSEA predicted activation of the PI3K/AKT pathway, which matches the increased p-AKT seen in Ptprh mutant mouse tumors (Figure 3.2C).

PTPRH TARGETS EGFR IN A HUMAN LUNG CANCER CELL LINE

Bioinformatic analysis showed PTPRH mutations in 5% of NSCLC tumors and predicted activation of EGFR and EGFR pathways in those tumors. We therefore wanted to determine whether loss of PTPRH function could indeed lead to EGFR activation. CRISPR knockouts were created in the H23 NSCLC cell line, targeting exon four of PTPRH. Sanger sequencing of CRISPR clones confirmed a disruption of the PTPRH sequence a few base pairs upstream of the PAM sequence, where a single adenine insertion occurred (Figure 3.3A). This insertion shifts the reading frame and introduces multiple premature stop codons, which is expected to truncate the PTPRH protein.
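The Python sketch below illustrates why a single-base insertion truncates the protein. It translates a hypothetical toy open reading frame before and after a +1 A insertion. The sequence is invented for illustration and is not PTPRH. It only shows how the shifted reading frame reaches a stop codon early.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon (stop not included)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_base(seq: str, position: int, base: str) -> str:
    """Insert one base before the 0-based index `position`."""
    return seq[:position] + base + seq[position:]


# Hypothetical toy open reading frame (not the PTPRH sequence).
wild_type = "ATGGCCGCTGAAAAACCCTGGTAA"
mutant = insert_base(wild_type, 6, "A")  # +1 A insertion after codon 2

print(translate(wild_type))  # MAAEKPW
print(translate(mutant))     # MAS  (frameshift exposes TGA at codon 4)
```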
Because we could not find a working antibody for PTPRH, we used phosphorylation of EGFR at Y1197 to screen for PTPRH knockout. Increased Y1197 phosphorylation was seen in PTPRH KO clones harboring a disruption at the cut site, but not in clones without the disruption (Figure 3.3B). To determine whether re-expressing PTPRH could rescue the increased p-Y1197 EGFR phenotype, wild type PTPRH was transiently expressed in one of the PTPRH KO clones (Figure 3.3C). This decreased phosphorylation of EGFR at Y1197. Overexpressing a catalytically dead version of PTPRH did not rescue the increased phosphorylation of EGFR Y1197 (Figure 3.3D). Together, these analyses indicate that PTPRH dephosphorylates EGFR at tyrosine residue 1197 in a manner that depends on its catalytic activity.

To determine whether p-AKT, p-STAT3, and p-ERK were increased in PTPRH KO cells, western blots were performed on lysates from the PTPRH KO CRISPR clones that showed increased p-EGFR. Interestingly, the human cell line lysates showed the same pattern of phosphorylation seen in mouse tumor lysates (Figure 3.4A). PTPRH KO clones showed increased p-AKT, but no increase in p-STAT3 or p-ERK. However, one CRISPR clone did not show the same increase in p-AKT, suggesting the possibility of clonal effects. We next tested whether clonal effects were indeed occurring, that is, whether the increase in p-AKT depended on EGFR Y1197 rather than on clone-specific differences. A PTPRH KO CRISPR clone with high p-AKT was subjected to CRISPR-mediated homology-directed repair to introduce a Y1197F mutation into EGFR. Y1197F mutants were confirmed by the presence of a newly introduced EcoRI cut site, as well as by Sanger sequencing. CRISPR repair yielded two mutant clones, one heterozygous and one homozygous for the Y1197F mutation.
Western blots of lysates from the Y1197F mutant clones show decreased p-EGFR and p-AKT compared with the parental PTPRH KO clone (Figure 3.4B/C). This confirms that the increase in p-AKT was due to loss of PTPRH acting through EGFR Y1197, rather than to clonal effects.

TARGETING OF OTHER KINASES BY PTPRH

Certain phosphatases are known to have multiple targets. To determine whether loss of PTPRH may affect other kinases in H23 cells, a human receptor tyrosine kinase array was used. A membrane arrayed with RTK antibodies for specific phosphorylation sites was incubated with either H23 WT lysate or H23 PTPRH KO lysate. The membrane was then incubated with a biotinylated antibody followed by labeled streptavidin. Numerous RTKs had different phosphorylation profiles on the membrane incubated with WT lysate compared with PTPRH KO lysate (Figure 3.5A). After quantification of the signals, two RTKs in particular had increased phosphorylation on the PTPRH KO blot compared with the WT blot. These were fibroblast growth factor receptor 1 (FGFR1), with an approximately 3.5-fold increase, and insulin-like growth factor 1 receptor (IGF1R), with an approximately 2.4-fold increase. Western blots were performed to confirm increased phosphorylation of FGFR1 in H23 PTPRH KO cells (Figure 3.5B). Indeed, the 145 kDa band showed a substantial increase in phosphorylated FGFR1 in PTPRH KO cell lysate compared with WT lysate.

To predict whether FGFR1 and IGF1R may have increased signaling in PTPRH mutant human tumors, we repeated the analysis used above to predict EGFR activation in human tumors. ssGSEA was performed to predict the pathway activation status of FGFR1 and IGF1R in PTPRH mutant tumors. Each sample's predicted activation status was then plotted at the position of its PTPRH mutation in a lollipop plot (Figure 3.5C). Interestingly, the hotspots predicted for EGFR activation appear to be conserved for FGFR1 and IGF1R activation.
In other words, if one of the three kinases is predicted to be active, the other two are most likely predicted to be active as well.

NUCLEAR EGFR WITHIN PTPRH MUTANT TUMORS

As mentioned in the introduction of chapter two, EGFR has been observed in the nucleus of cells during cellular stress and deregulation. To determine the subcellular location of EGFR in Ptprh mutant mouse tumors, immunohistochemistry (IHC) was performed using an antibody specific for p-Y1197 EGFR (Figure 3.6A). IHC showed marked increases in EGFR staining in Ptprh mutant mouse tumors, with EGFR localized to the nucleus. Because the mouse analysis was correlative, we wanted to determine whether loss of PTPRH resulted in increased EGFR translocation to the nucleus. To accomplish this, H23 PTPRH KO cells were injected into the left flank of nude mice. After tumors reached 8-10 mm in size, mice were necropsied and both flash frozen and formalin fixed tumor tissue was harvested. No metastases were observed in these mice. IHC with a p-Y1197 specific EGFR antibody showed increased nuclear EGFR in tumors derived from PTPRH KO cells (Figure 3.6B). These data suggest that failure of PTPRH to dephosphorylate EGFR at tyrosine residue 1197 leads to increased localization of EGFR to the nucleus. Future analysis should determine whether full or partial length EGFR is located in the nucleus. Identifying putative targets of nuclear EGFR would also be highly beneficial.

DISCUSSION

A pan-cancer analysis of human PTPRH mutations found PTPRH mutations in at least 5% of patients in numerous cancer types. This suggests that mutated PTPRH may play a role in tumor development in various cancers. Because PTPRH affects cell signaling pathways in a context-dependent manner, PTPRH mutations could have an oncogenic or tumor suppressive effect depending on the cancer site and cell type.
PTPRH mutations were found in approximately 5% of NSCLC patients, spread across the PTPRH coding sequence. This is an interesting contrast to the conserved V483M mutation found in our PyMT tumors, and has implications for which mutations may affect tumor growth. A mechanism has yet to be explored for these various mutations, and they may act through different mechanisms. Mutations in the phosphatase domain may abrogate catalytic activity, while mutations in the fibronectin domains may alter dimerization or binding of target substrates. Some of the phosphatase domain mutations with high predicted EGFR activity lie outside the conserved HC(X5)R activity motif. These mutations may instead occur within other conserved PTP motifs and prevent recognition of substrate binding sites. More biochemical analysis will be needed to explore these hypotheses.

In the previous chapter, mouse tumors showed a correlation between mutant Ptprh and high phosphorylation of EGFR. This was confirmed in a human NSCLC cell line through CRISPR ablation of PTPRH. Overexpressing WT PTPRH in PTPRH KO cells rescued the increased p-EGFR phenotype, confirming that PTPRH regulates EGFR in this context. Bioinformatic analysis predicted activation of the PI3K/AKT pathway, and western blotting of PTPRH KO clones confirmed this. Interestingly, PTPRH ablation did not affect phosphorylation of ERK or of STAT3 (a transcription factor known to be regulated by EGFR). This suggests PTPRH regulates only certain tyrosine residues on the C-terminal tail of EGFR. A more thorough analysis of how other pathways downstream of EGFR are affected by PTPRH loss would be a prudent next step. We generated Y1197F EGFR mutants in the PTPRH KO clone with higher p-AKT levels and noted decreased phosphorylation of AKT.
However, p-AKT did not return completely to wild type levels. This incomplete reduction may be due to other activated pathways in the PTPRH KO cells, especially since these cells show increased phosphorylation of other kinases. A kinase array showed increased phosphorylation of numerous RTKs in PTPRH KO cells, including FGFR1 and IGF1R. Interestingly, the array did not detect increased phosphorylation of EGFR. However, the phospho-specific antibodies used on the array did not include the tyrosine 1197 site. This is consistent with PTPRH targeting tyrosine 1197 on EGFR rather than other tyrosine sites. Because the array was designed only for RTKs, it would have missed other intracellular signaling molecules affected by loss of PTPRH. A mass spectrometry approach may be beneficial in the future to determine which other signaling molecules are affected by loss of PTPRH. Western blotting confirmed increased phosphorylation of FGFR1. This has interesting implications both for the cellular pathways that may be affected and for potential treatment options for patients with non-functional PTPRH. For example, dual inhibition of FGFR1 and EGFR may be worth investigating.

Finally, Ptprh mutant mouse tumors show increased nuclear EGFR staining, as do tumors in mice implanted with PTPRH KO human cells. Nuclear EGFR has been noted during cellular stress, as well as in regenerating liver tissue. In the nucleus, EGFR can act as a cofactor, or as a direct transcriptional activator by binding to the promoters of certain genes, such as cyclin D1. Increased nuclear EGFR upon loss of PTPRH activity could have profound effects on cellular signaling pathways. The mechanism behind increased nuclear localization of EGFR has not been explored and warrants further study.
Loss of PTPRH activity may also increase EGFR internalization by increasing EGFR activation, although this hypothesis would need to be tested. One potential caveat to this work is the lack of other cell lines with PTPRH ablation. Another PTPRH KO cell line would have added robustness to these data. Nonetheless, two key experiments, in our view, sufficiently demonstrate that PTPRH regulates EGFR signaling. First, overexpression of WT PTPRH in PTPRH KO cells reduces phosphorylation of EGFR, while overexpression of a catalytically dead version of PTPRH does not. Second, heterozygous and homozygous Y1197F EGFR mutants show a stepwise reduction in p-Y1197 EGFR in PTPRH knockout cells. The heterozygous mutant had some reduction of p-Y1197, and the homozygous mutant had a larger reduction. Overall, these data suggest that PTPRH is indeed responsible for regulating EGFR signaling.

MATERIALS AND METHODS

DETERMINING PTPRH MUTATIONS IN HUMAN CANCERS

Pan-cancer datasets from numerous sources, including TCGA and ICGC, were analyzed through cBioPortal and the ICGC portal. Lung cancer mutation percentages were analyzed specifically using the TCGA 2016 dataset accessed through cBioPortal. The South Korean and U.S. datasets, which differed in the percentage of PTPRH mutations, were analyzed on the ICGC portal. Both datasets were filtered to include only patients with exonic mutations.

MUTUAL EXCLUSIVITY

All NSCLC datasets available on cBioPortal were used for this analysis and are listed below. PTPRH and EGFR SNV mutation data were downloaded and combined. Duplicate samples were removed, and any sample with a PTPRH or EGFR mutation was considered. A 2x2 contingency table was used to test for mutual exclusivity.
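The text does not state which test was applied to the 2x2 table. A common choice for mutual exclusivity is a one-sided Fisher's exact test, which asks whether fewer samples carry both mutations than expected by chance given how many carry each one. Below is a minimal standard-library Python sketch under that assumption. The function name and argument order are hypothetical, and no counts from this study are used.

```python
from math import comb


def mutual_exclusivity_p(both: int, ptprh_only: int, egfr_only: int, neither: int) -> float:
    """One-sided Fisher's exact test on a 2x2 table of sample counts.

    Returns P(X <= both), where X is the number of co-mutated samples under
    the hypergeometric null with the table margins held fixed. Small values
    mean fewer co-mutated samples than chance would predict.
    """
    n = both + ptprh_only + egfr_only + neither
    n_ptprh = both + ptprh_only  # samples with a PTPRH mutation
    n_egfr = both + egfr_only    # samples with an EGFR mutation
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(n_ptprh, x) * comb(n - n_ptprh, n_egfr - x) for x in range(both + 1)
    )
    return tail / comb(n, n_egfr)


# Toy table: 4 samples, 2 PTPRH-only, 2 EGFR-only, none co-mutated.
print(mutual_exclusivity_p(0, 2, 2, 0))
```

The same function could be used for the race comparison mentioned in the demographics methods, though a two-sided test would be more natural there.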
Datasets included: MSK - Cancer Cell 2018, MSKCC - J Clin Oncol 2018, TRACERx - NEJM 2017, University of Turin 2017, MSK - Science 2015, TCGA - Nat Genet 2016 (Pan), Broad - Cell 2012, MSKCC - Science 2015, TCGA - Firehose Legacy, TCGA - Nature 2014, TCGA - Pan-Cancer Atlas, TSP - Nature 2008, MSKCC - Cancer Discov 2017, TCGA - Nature 2012.

DEMOGRAPHICS OF PTPRH MUTATIONS

Age, overall survival, and race were analyzed using the Lung Adenocarcinoma TCGA Pan-Cancer Atlas dataset downloaded from cBioPortal, one of the few datasets with race data. Two-tailed Student's t-tests assuming unequal variance compared age at diagnosis and overall survival between groups. The comparisons were PTPRH mutant vs. EGFR mutant samples, and PTPRH mutant vs. WT (non-EGFR mutant) samples. Samples without age or OS data were excluded. Only samples with missense or truncating mutations were included, and overexpression samples were excluded. Race was analyzed using a 2x2 contingency table.

EGFR ACTIVITY AND PATHWAY ACTIVITY PREDICTION

The TCGA pan-cancer RNA-seq dataset (downloaded from UCSC Xena) was analyzed for PTPRH, EGFR, FGFR1, and IGF1R mutations. This mutation list was downloaded and filtered to keep samples that had a mutation in PTPRH or EGFR, or that were WT for PTPRH, EGFR, FGFR1, and IGF1R. Every sample with a mutation in PTPRH was kept, giving 53 samples. Ten samples were kept from each of two further categories. These were samples WT for PTPRH and the three RTKs above, and EGFR L858R mutant samples that were WT for PTPRH, FGFR1, and IGF1R. To decide which WT and EGFR mutant samples to keep, each sample in those groups was assigned a random number using the RAND() function in Excel. The numbers were sorted from highest to lowest, and the top 10 samples were kept.
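Sorting by RAND() and keeping the top rows is equivalent to drawing a simple random sample without replacement. The Python sketch below is a reproducible equivalent that mirrors the spreadsheet steps. It assumes an arbitrary recorded seed, and the sample identifiers and function name are hypothetical.

```python
import random


def pick_random_samples(sample_ids, k=10, seed=2021):
    """Reproducible stand-in for 'assign RAND(), sort high to low, keep top k'.

    The seed is an arbitrary, hypothetical choice; recording it makes the
    selection repeatable, unlike Excel's volatile RAND().
    """
    rng = random.Random(seed)
    keyed = sorted(((rng.random(), sid) for sid in sample_ids), reverse=True)
    return [sid for _, sid in keyed[:k]]


# Hypothetical sample identifiers for illustration only.
wt_samples = [f"WT-{i:03d}" for i in range(40)]
chosen = pick_random_samples(wt_samples)
print(len(chosen))  # 10
```

Calling random.Random(seed).sample(sample_ids, k) would be an equally valid and shorter choice. The explicit key-and-sort form is used here only to mirror the spreadsheet procedure.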
RSEM log2(x+1) normalization was applied to the filtered sample list, resulting in 47 PTPRH mutant samples (WT for the kinases), 9 samples that were WT for PTPRH and the three kinases, and 8 samples with EGFR mutations (WT for PTPRH, FGFR1, and IGF1R). ssGSEA was run on the samples to predict pathway activation status. Pathways for each kinase were filtered down to the most relevant and robust pathway. In Microsoft Excel, a ranking sum score was applied to the pathway prediction data for each sample using the following formula, which rescales each value to a 0-1 range relative to the minimum and maximum of its column:
=(B4-MIN(B$4:B$475))/(MAX(B$4:B$475)-MIN(B$4:B$475))
For GSEA analysis of PTPRH mutant tumors, the pan-cancer RNA-seq dataset was again downloaded from UCSC Xena. Twelve tumors were kept for each of three categories: PTPRH mutant tumors predicted to have high EGFR activity, EGFR L858R mutants, and tumors WT for both PTPRH and EGFR. GSEA was completed using the GenePattern server.

CRISPR KNOCKOUT
Benchling [280] was used to design the guide RNA (AGCACACACTAACATCACCG) targeting the fourth exon of PTPRH. The guide was cloned into px458 using AgeI and EcoRI, and transformed into DH5α. Transient transfection of px458 into H23 cells was completed using Promega's Viafect. GFP-positive cells were sorted by FACS as single cells into 96-well plates. Once clones had grown into colonies, they were moved to 24-well plates, then 6-well plates. DNA was harvested and sent to ACTG for Sanger sequencing.

CRISPR KNOCK-IN MUTATION
Guide RNA was designed in Benchling with the PAM (NGG) sequence 5 bp downstream of the desired EGFR amino acid 1197 mutation site. The single-stranded repair template was designed in Benchling by choosing the desired homology arm lengths and the desired mutation, then taking the reverse complement of that strand. The oligo was designed with 36 bp upstream of the desired mutation site and 90 bp downstream, and carried the mutation producing the Y1197F amino acid substitution.
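The repair-template design described above (a 36 bp upstream arm, the point mutation, a 90 bp downstream arm, then the reverse complement) can be sketched in Python. The input below is a made-up toy sequence, not the real EGFR locus, and all function names are hypothetical; the sketch only illustrates the coordinate bookkeeping plus a check that the edit creates an EcoRI site (GAATTC) absent from the WT sequence, which is how the knock-in clones here were screened.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
ECORI_SITE = "GAATTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_point_mutation(seq: str, pos: int, alt_base: str) -> str:
    """Return seq with the base at 0-based index pos replaced by alt_base."""
    return seq[:pos] + alt_base + seq[pos + 1:]


def design_ssodn(sense: str, pos: int, alt_base: str,
                 up_arm: int = 36, down_arm: int = 90) -> str:
    """Build a single-stranded repair template: up_arm bases 5' of the edit,
    the edited base, and down_arm bases 3' of it (sense-strand coordinates),
    returned as the reverse complement."""
    if pos - up_arm < 0 or pos + down_arm >= len(sense):
        raise ValueError("not enough flanking sequence for the requested arms")
    edited = apply_point_mutation(sense, pos, alt_base)
    template = edited[pos - up_arm: pos + down_arm + 1]
    return reverse_complement(template)


def edit_creates_site(wt: str, pos: int, alt_base: str, site: str = ECORI_SITE) -> bool:
    """True if the edit introduces a restriction site absent from the WT sequence."""
    return site not in wt and site in apply_point_mutation(wt, pos, alt_base)


if __name__ == "__main__":
    # Toy sense strand containing GAATAC. Treating TAC as the tyrosine codon,
    # changing its middle base (index 44) A -> T gives TTC (Phe) and creates GAATTC.
    toy = "ACGT" * 10 + "GAATAC" + "CTG" * 35
    oligo = design_ssodn(toy, pos=44, alt_base="T")
    print(len(oligo), edit_creates_site(toy, 44, "T"))
```

Because GAATTC is its own reverse complement, the site is present in the oligo regardless of which strand it is written on, so an EcoRI digest of PCR products can flag edited clones.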
Conveniently, this mutation also created an EcoRI cut site, which was used for downstream screening. The mutation also altered the guide RNA target sequence enough to prevent re-annealing of the guide once HR-mediated repair had occurred. Guide RNA was cloned into px458 as in the CRISPR knockout protocol. For transfection, H23 PTPRH KO cells were seeded at ~85% confluency, then transfected using Viafect at a 6:1 ratio. 1 ug of px458 with guide and 4 ug of single-stranded repair template were transfected. GFP-positive cells were sorted by FACS. Clones were screened by EcoRI digestion and confirmed by sequencing.

WESTERN BLOTTING
Blocking was completed at room temperature for one hour using the manufacturer's recommended buffer. Primary antibody was incubated overnight at 4 degrees C. Blots were imaged using a LI-COR system. Antibodies used were as follows: total EGFR (Cell Signaling D38B1), p-Y1197 EGFR (Invitrogen PA5-37553), total AKT (Cell Signaling 11E7), p-S473 AKT (Cell Signaling D9E), total STAT3 (Cell Signaling 79D7), p-Y705 STAT3 (Cell Signaling D3A7), total FGFR1 (Cell Signaling D8E4), p-Y653/654 FGFR1 (Cell Signaling 3471S), beta tubulin (Proteintech 10094-1), vinculin (Cell Signaling E1E9V).

OVEREXPRESSION EXPERIMENTS
PTPRH cDNA in the plasmid pRc/CMV was kindly provided by Dr. Takashi Matozaki at Kobe University. Site-directed mutagenesis (SDM) was used to generate a D986A mutant. Due to the high GC content of PTPRH, 5% DMSO and a 2 minute/kb extension time were used during SDM. Both WT and D986A mutant PTPRH constructs were transiently expressed in PTPRH KO cells using Viafect. G418 (Geneticin) was used as a selection marker. Once all control cells had died, protein lysate was harvested using TNE lysis buffer with protease and phosphatase inhibitors.

RECEPTOR TYROSINE KINASE ARRAY
The protocol for the RayBiotech Human RTK Phosphorylation Array C1 kit was followed. Membranes were incubated with lysate from H23 WT cells or H23 PTPRH KO cells.
Lysate concentration was measured by Bradford assay, then lysates were diluted and measured again to ensure accuracy.

IHC NUCLEAR EGFR
H23 PTPRH WT or H23 PTPRH KO human cells were injected into the left flank of nude mice. H23 tumors were grown to approximately 10 mm in the largest dimension prior to necropsy. At necropsy, portions of mouse PyMT tumors and H23-derived tumors were preserved in formalin, and other portions were flash frozen for downstream analysis. Formalin-fixed paraffin-embedded tumors were stained using an antibody specific for p-Y1197 EGFR (Thermo PA5-37553).

APPENDIX

Figure 3.1: PTPRH mutations within human cancers
A) Pan-cancer analysis using data from ICGC and TCGA shows PTPRH mutations present within numerous cancers. Lung cancer is highlighted. B) PTPRH mutation rates in NSCLC can vary depending on study site. C) Oncoplot of TCGA data showing EGFR and PTPRH mutation rates in NSCLC. Each rectangle represents a patient tumor. PTPRH mutations are mutually exclusive with EGFR mutations. D) Analysis of TCGA data shows no relationship between PTPRH mutations and age, overall survival, or race.

Figure 3.2: Pathway activation predictions in PTPRH mutant tumors
A) Lollipop plot correlating predicted EGFR activity with human PTPRH mutations. Each dot represents a human tumor, placed at the position of its mutation along PTPRH. B) ssGSEA was used to predict gene set enrichment in EGFR or PTPRH mutant NSCLC tumors. Enriched gene sets were subjected to hierarchical clustering and visualized with a heatmap. C) GSEA predicts activation of the PI3K/AKT pathway downstream of EGFR.

Figure 3.3: PTPRH knockout cells have increased p-EGFR
A) Electropherogram shows an indel (a single A insertion) a few base pairs upstream of the PAM sequence in H23 NSCLC cells. B) Western blotting for p-Y1197 EGFR shows increased p-EGFR in PTPRH KO cells carrying the indel.
Both KO clones had the same A insertion shown in the electropherogram in Figure 3.3A. C) Overexpression of a WT PTPRH plasmid in PTPRH KO cells reduces p-EGFR to WT H23 levels. D) Overexpression of a catalytically dead D986A version of PTPRH does not rescue the increased p-EGFR phenotype.

Figure 3.4: Downstream signaling of H23 PTPRH KO cells
Western analysis shows activation of the AKT pathway, but not STAT3, in PTPRH KO cells. A) Western blotting shows no increase in p-ERK or p-Y705 STAT3 in PTPRH KO cells, but increased p-S473 AKT in one clonal population. B) Electropherogram of H23 PTPRH KO clone 1 cells subjected to CRISPR knock-in mutation of EGFR tyrosine 1197. Clone 16 carried a heterozygous mutation and clone 10 a homozygous mutation, producing a tyrosine to phenylalanine amino acid substitution. C) Western blot of H23 PTPRH KO/EGFR Y1197F mutants shows decreased phosphorylation of EGFR at residue 1197, and decreased p-S473 AKT, when the tyrosine at 1197 is mutated to phenylalanine.

Figure 3.5: PTPRH regulates other kinases
A) Human phospho-RTK array shows variable phosphorylation of FGFR1, IGF1R, and other kinases between H23 PTPRH KO lysate and H23 WT lysate. B) Lollipop plot showing predicted activity of RTKs in PTPRH mutant NSCLC tumors. Hotspot regions are similar to those of the EGFR lollipop plot in Figure 3.2A. C) Western blot showing increased p-FGFR1 in H23 PTPRH KO clones compared to PTPRH WT cells.

Figure 3.6: Localization of EGFR to the nucleus in PTPRH ablated tumors
Immunohistochemistry using an antibody specific for p-Y1197 EGFR shows increased nuclear localization of EGFR in mouse and human tumors with loss of PTPRH activity. A) PyMT tumors with the Ptprh V483M mutation correlate with increased localization of p-Y1197 EGFR to the nucleus. B) H23 PTPRH WT or KO cells were injected into the left flank of nude mice. Tumors grown from H23 PTPRH KO cells have increased EGFR staining within the nucleus.
CHAPTER 4
TREATMENT OPPORTUNITIES FOR PTPRH MUTATIONS IN NON-SMALL CELL LUNG CANCER

ABSTRACT
Previous data have shown increased phosphorylation of EGFR upon loss of PTPRH in H23 NSCLC cells. Pooled knockout of PTPRH in H23 cells leads to increased proliferation and cellular growth, suggesting PTPRH loss contributes to tumor growth through EGFR pathway activation. We show PTPRH mutant non-small cell lung cancer lines respond to osimertinib treatment in vitro, and the H2228 PTPRH mutant cell line responds to osimertinib treatment in vivo. Furthermore, osimertinib treatment reduces cellular proliferation in H2228 tumors, as seen through KI67 staining of formalin-fixed tumors. Overall, these data suggest PTPRH mutant NSCLC patients may benefit from treatment with tyrosine kinase inhibitors of EGFR.

INTRODUCTION

PTPRH DEREGULATION IN HUMAN CANCERS
While some phosphatases, such as PTEN [281, 282], have well defined tumor suppressive capabilities, the roles of many phosphatases in cancer remain undefined. Overall, the importance of phosphatase regulation in shaping cell signaling is becoming more appreciated. Although the literature on PTPRH is sparse, its roles within some cancers have been investigated. Expression levels of PTPRH are thought to be low within normal colon epithelial tissue; however, increased expression has been seen in severe dysplasia of the colon and in colon cancer [283]. The inverse pattern is seen in liver cancer, where normal liver tissue has high PTPRH expression while poorly differentiated hepatocellular carcinomas (HCC) have lower expression. Furthermore, expressing PTPRH in two HCC cell lines with low endogenous PTPRH expression drastically reduced cellular motility and growth rate in vitro, suggesting PTPRH has a tumor suppressive role in hepatocellular carcinoma.
While the differing patterns of PTPRH expression in colon cancers and hepatocellular carcinomas seem contradictory, it is important to remember that PTPRH can affect signaling pathways in a context dependent manner. Loss of PTPRH expression in hepatocellular carcinomas aligns with the canonical view that phosphatases abrogate downstream signaling of RTKs by removing phosphate groups, while overexpression of PTPRH in colon cancers highlights the ability of PTPRH to act as an oncogene through activation of SRC. Overexpression of PTPRH has also been noted in NSCLC, with correlated hypomethylation of PTPRH suggested as the cause [279]. Furthermore, PTPRH overexpression has been noted as a prognostic indicator of poor survival. This seems to contrast with our data, which suggest loss of PTPRH leads to increased oncogenic signaling. On the surface, it may seem logical that if loss of PTPRH function increases oncogenic signaling through EGFR, then high expression of PTPRH should abrogate this signaling. However, two important pieces of information could explain this discrepancy. First, PTPRH function has been shown to decrease upon homodimerization. It is entirely possible that overexpression of PTPRH increases homodimerization by bringing PTPRH molecules into more frequent contact, although this would need to be further explored. Second, overexpression of PTPRH could lead to increased targeting of other signaling molecules such as SRC. As dephosphorylation of tyrosine residues on SRC activates downstream signaling, this mechanism could also explain the apparent discrepancy. Overall, PTPRH deregulation has been noted in numerous cancers.

NON-SMALL CELL LUNG CANCER
Lung cancer accounts for the greatest number of U.S. cancer deaths in both men and women, and 5 year survival rates remain poor [152]. Broadly, lung cancer is classified into two major histologies: small cell (SC) and non-small cell lung cancer (NSCLC).
SC lung cancer typically has a poorer prognosis than NSCLC and is strongly associated with smoking. The mutation profiles of the two histologies also differ, with SC lung cancer patients typically having mutations in the tumor suppressor genes RB1 and TP53, and NSCLC patients having mutations in the oncogenes EGFR and KRAS. Overall, NSCLC accounts for approximately 85% of all lung cancer cases, and is further divided into three histologies: adenocarcinoma, squamous cell carcinoma, and large cell carcinoma [284]. The prognosis for NSCLC patients is markedly better than that of patients with small cell lung cancer; however, prognosis varies widely depending on whether the tumor has metastasized. 5 year survival rates for localized NSCLC approach 63%, but with distant metastasis, 5 year survival rates drop to 7% (American Cancer Society). Prognosis is further complicated by a number of factors, including smoking status, EGFR mutation status, and initial response to treatment [285]. Approximately 15% of NSCLC patients have tumors presenting with EGFR activating mutations or EGFR amplification, and this percentage is substantially higher in Asian patients [286]. 80% of these EGFR mutations are putative oncogenic drivers, with the vast majority being missense L858R mutations or small exon 19 deletions around amino acid 750. Activating EGFR mutations are indicators of responsiveness to tyrosine kinase inhibitors, but EGFR amplification is not [287]. Overall, patients with activating mutations in EGFR have better 5-year survival outcomes, as TKIs can increase survival time.

TYROSINE KINASE INHIBITORS
Development of drugs targeting PTPs has proved difficult in many cases. This is potentially due to the context dependent nature of many PTPs, as well as their tumor suppressive roles.
Targeting a PTP with potential tumor suppressive qualities would produce the opposite of the intended effect. Despite these difficulties, drugs have been developed against certain PTPs, typically PTPs with oncogenic properties that increase cellular pathway signaling. SHP099 is an inhibitor of the protein tyrosine phosphatase SHP2, a PTP with SRC homology 2 (SH2) domains known to activate MAPK signaling [288, 289]. Given the difficulty of targeting PTPs directly, targeting PTP substrates may be another viable option. Since we have shown that non-functional PTPRH enhances EGFR signaling, treating PTPRH mutant tumors with tyrosine kinase inhibitors directed at EGFR may be a viable option. Tyrosine kinase inhibitors are often used to treat NSCLC patients whose tumors present with canonical EGFR activating mutations. First generation TKIs, such as erlotinib and gefitinib, were designed to target the ATP binding domain of EGFR. These TKIs successfully extend progression free survival; however, resistance eventually develops, usually in the form of a T790M EGFR mutation, which causes a structural shift that prevents TKI binding to the ATP binding domain [290]. Third generation TKIs, such as osimertinib, were developed to get around this by covalently binding a nearby cysteine residue (C797). While third generation TKIs can overcome T790M resistance, new resistance mechanisms eventually develop. Fourth generation TKIs, based on allosteric inhibition of EGFR, are currently in development.

RESULTS

POOLED PTPRH KNOCKOUTS HAVE INCREASED GROWTH
Previous data had shown Ptprh mutant PyMT tumors to have decreased tumor latency [193]. Since loss of PTPRH function in the human H23 cell line increased PI3K/AKT pathway activation downstream of EGFR, we hypothesized this may lead to increased cellular proliferation of PTPRH KO cells.
To address this, growth curves were completed using H23 PTPRH WT cells and two PTPRH KO clones (Figure 4.1). Growth curves of the clones were ambiguous, with one clone showing clearly increased growth and a second clone growing at a similar rate to wild type cells. To determine whether clonal effects were responsible for this discrepancy, pooled PTPRH knockouts were created in the H23 cell line. Sanger sequencing of pooled knockout cells and subsequent TIDE (Tracking of Indels by Decomposition) analysis showed a knockout efficiency of approximately 45%. Despite this modest knockout efficiency, MTT assays and growth curves showed increased proliferation and growth of pooled PTPRH KO cells over wild type cells (Figure 4.2). Overall, these data show loss of PTPRH leads to increased cellular growth.

PTPRH MUTANT CELL LINES ARE SENSITIVE TO TYROSINE KINASE INHIBITION THROUGH OSIMERTINIB TREATMENT
Non-small cell lung cancer patients whose tumors present with EGFR mutations often benefit from tyrosine kinase inhibitor therapy. With loss of PTPRH function leading to increased activation of EGFR and pathways downstream of EGFR, we hypothesized that PTPRH mutant tumors would benefit from treatment with tyrosine kinase inhibitors. Previous work showed Ptprh mutant PyMT mouse tumors to be sensitive to the TKI erlotinib [193]. To explore whether human PTPRH mutations sensitize tumors to TKI therapy, we obtained two NSCLC cell lines with PTPRH mutations. Cell line H1155 has an M188I PTPRH mutation within one of the fibronectin domains (similar to the mutation we found in our mouse tumors), and cell line H2228 has a Q887P mutation within the phosphatase domain. In dose response curves with the TKI erlotinib, these cell lines showed no response (Figure 4.3A). However, in dose response curves with the third generation TKI osimertinib, PTPRH mutant cell lines did respond (Figure 4.3B).
However, H23 PTPRH KO cells subjected to the same osimertinib dose regimen showed no greater response than H23 WT cells (data not shown). This may be due to the high mutational burden of the H23 cell line, which includes mutations in TP53 and KRAS, a well characterized tumor suppressor and oncogene, respectively. To explore whether H23 PTPRH KO cells would show an enhanced response to combined KRAS and EGFR inhibition, a dual drug dose response curve was completed. However, no enhanced response was seen (Figure 4.3C). Because PTPRH KO cells also show increased phosphorylation of FGFR1, we hypothesized these cells may respond to dual inhibition of FGFR1 and EGFR. A dose response curve was completed using osimertinib and the FGFR1 inhibitor PD166866, chosen for its high selectivity for FGFR1 over other members of the FGFR family. Despite the increased FGFR1 phosphorylation noted in PTPRH KO cells, no increase in sensitivity was seen upon FGFR1 inhibition (Figure 4.3D).

TREATING MICE WITH HUMAN PTPRH MUTANT TUMORS
With PTPRH mutant NSCLC lines responding to osimertinib in vitro, we wanted to determine whether tumors grown from these cell lines would respond in vivo. To explore this, PTPRH mutant H2228 cells, or EGFR mutant H1975 cells serving as a positive control, were injected into the left flank of nude mice. After tumors reached approximately 6 mm in the largest dimension, mice were randomized into vehicle control or drug treatment groups. H1975-injected mice received an osimertinib dose of 25 mg/kg, and H2228-injected mice received either 25 mg/kg or 50 mg/kg, doses used in the literature. As expected, H1975-injected mice serving as the positive control responded extremely well to osimertinib treatment (Figure 4.4A). While H2228-injected mice receiving 25 mg/kg of osimertinib failed to respond, mice treated with 50 mg/kg responded favorably (Figure 4.4B). However, 50 mg/kg treatment had to be stopped after 14 days due to weight loss.
With PTPRH mutant tumors showing a response to osimertinib in vivo, we wanted to determine whether these tumors had reduced proliferation and increased apoptosis. After completion of the drug course, H2228-injected mice were necropsied, with portions of each tumor preserved in formalin and flash frozen for future analysis. To assess proliferation and apoptosis within H2228 tumors, sections were stained for KI67 by immunohistochemistry and for apoptosis by TUNEL. KI67 staining showed that tumors from mice treated with 50 mg/kg of osimertinib had vastly reduced proliferation compared to tumors from mice given vehicle control (Figure 4.5A). Unexpectedly, tumors from mice given vehicle control had slightly more apoptosis than tumors from osimertinib treated mice (Figure 4.5B).

DISCUSSION
Initial findings show increased phosphorylation of EGFR upon loss of PTPRH in the NSCLC line H23. Furthermore, 5% of NSCLC patients have mutations in PTPRH, with certain mutations having predicted high EGFR and PI3K/AKT activity. With an estimated 235,000 cases of lung cancer occurring yearly in the United States (cancer.gov), approximately 10,000 patients per year (235,000 x 0.85, the NSCLC fraction, x 0.05, the PTPRH mutant fraction, or roughly 9,990) present with PTPRH mutations and could potentially benefit from EGFR targeted TKI therapy. Two NSCLC lines with PTPRH mutations were found to respond to the TKI osimertinib in vitro, with the H2228 cell line also responding in vivo. Interestingly, PTPRH mutant cell lines responded to osimertinib but not to the first generation TKI erlotinib, even though erlotinib has more affinity for wild type EGFR and osimertinib has more affinity for T790M mutant EGFR. A possible explanation lies in the conformational state of EGFR, which may remain activated when PTPRH fails to dephosphorylate tyrosine residues on the C-terminal tail of EGFR. However, this hypothesis would need to be further explored.
Osimertinib treatment of H2228 PTPRH mutant tumors in mice resulted in tumor shrinkage, providing proof of principle that PTPRH mutant tumors may benefit from treatment with TKIs. Other potential options for treating PTPRH mutant tumors include dual inhibition of kinases whose signaling pathways are altered by PTPRH loss, or targeting RTKs for degradation with proteolysis targeting chimera (PROTAC) molecules. Overall, targeting the downstream substrates of phosphatases, rather than the phosphatases themselves, may be a viable solution, although this would require considerable characterization of the pathways affected by deregulated phosphatases. This is especially important given the context dependent nature of PTP regulation, such as PTPRH deactivating EGFR but activating SRC.

MATERIALS AND METHODS

POOLED CRISPR KNOCKOUT
Guide RNA (AGCACACACTAACATCACCG) for PTPRH was designed using Benchling. The guide was cloned into the lentiviral Cas9 plasmid Addgene #52961. Virus was generated by transfecting 293T cells with the packaging plasmid psPAX2, the envelope plasmid pMD2.G, and the Cas9 plasmid in a ratio of 3.7:1.2:5, respectively, using Viafect. Viral supernatant was collected from 293T cells 3 days after transfection and filtered through a 0.22 um syringe filter. 1 mL of filtered viral supernatant was applied to H23 WT cells at ~30% confluency. Puromycin at 2.5 ug/mL was used as a selectable marker. Sanger sequencing was used to confirm knockout and for TIDE analysis [291].

MTT ASSAY
H23 WT and H23 pooled PTPRH KO cells were subjected to an MTT assay following the kit instructions (Roche 11465007001). The assay was completed in triplicate. GraphPad was used to plot and statistically analyze results. A two-tailed Welch's t-test yielded a p-value of 0.0137.

GROWTH CURVES
On day 0, 1.0 x 10^5 cells were plated in triplicate in 6-well plates.
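The Welch's t-test reported above for the MTT assay (also used for the age and overall survival comparisons in the Chapter 3 demographics analysis) does not assume equal variances between groups. A minimal standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the input values are placeholders, not the MTT readings. Converting t and df into a two-tailed p-value requires the Student t distribution, which GraphPad or `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values; variances use the n - 1 denominator.
    """
    se2_a = variance(a) / len(a)  # squared standard error of group a
    se2_b = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder triplicate readings, not data from this study.
    wt = [0.82, 0.79, 0.85]
    ko = [0.97, 1.02, 0.95]
    t, df = welch_t(ko, wt)
    print(f"t = {t:.3f}, df = {df:.2f}")
```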
On days 1-5, cells were trypsinized and counted using an automated cell counter. GraphPad was used to plot results.

DOSE RESPONSE CURVES
Cells were trypsinized and cell concentration was read using an automated cell counter. Cells were then diluted to 5.0 x 10^4 cells per mL, and 20 uL of cell suspension (1,000 cells) was added to each well of an opaque 384-well plate using an electronic multichannel pipette. After overnight recovery, cells were treated with increasing drug concentrations in half-log steps. For single drug curves, the osimertinib (Cayman AZD9291) range was 0.00003 to 30 uM. For dual drug curves, the osimertinib range was 0.03 to 10 uM, and the KRAS inhibitor (ARS853, Cayman) or FGFR1 inhibitor (PD166866, Cayman) range was 0.00003 to 30 uM. 10 mM drug stocks were made by dissolving drugs in DMSO, and half-log drug series were diluted fresh in complete media. Cell viability was read after 48 hours using Promega's CellTiter-Glo. Luminescence values were normalized to untreated controls and plotted using GraphPad.

IN VIVO MOUSE TREATMENT
H2228 or H1975 cells were injected into the left flank of 6-12 week old nude mice. Cells were trypsinized and suspended in PBS at a concentration of 10,000 cells/uL. Mice were briefly anesthetized using isoflurane and injected using a 25 gauge needle. After tumors reached 6 mm in the largest dimension, mice were randomized into one of three treatment groups: vehicle control, 25 mg/kg osimertinib, or 50 mg/kg osimertinib. The 50 mg/kg dose was only used for mice with H2228 tumors. Osimertinib (AZD9291, Cayman) was formulated in a final vehicle of 5% DMSO, 40% polyethylene glycol, 5% Tween-80, and 50% H2O. The maximum treatment volume was 10 uL per gram of body weight. Mice were weighed on the first day of treatment, and the volume of drug was adjusted to achieve the proper dose.
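The weight-based dosing described above reduces to unit arithmetic: with body weight in grams, dose in mg/kg, and drug concentration in mg/mL, the gavage volume in uL is weight x dose / concentration, and the 10 uL/g ceiling means the formulation must be at least dose/10 mg/mL. The sketch below assumes those units; the 25 g mouse is an illustrative value, and the working concentration is derived from the volume ceiling because the actual formulation concentration is not stated in the methods.

```python
from typing import Dict

MAX_UL_PER_G = 10  # maximum dosing volume stated in the methods


def gavage_volume_ul(weight_g: float, dose_mg_per_kg: float, conc_mg_per_ml: float) -> float:
    """Volume to deliver: (g x mg/kg) / (mg/mL) works out to microliters."""
    return weight_g * dose_mg_per_kg / conc_mg_per_ml


def min_conc_mg_per_ml(dose_mg_per_kg: float, max_ul_per_g: float = MAX_UL_PER_G) -> float:
    """Lowest concentration that keeps the volume at or below max_ul_per_g.

    volume / weight = dose / conc (in uL/g), so conc >= dose / max_ul_per_g.
    """
    return dose_mg_per_kg / max_ul_per_g


def vehicle_volumes_ul(total_ul: float) -> Dict[str, float]:
    """Split a batch into 5% DMSO, 40% PEG, 5% Tween-80, 50% water (v/v)."""
    percents = {"DMSO": 5, "polyethylene glycol": 40, "Tween-80": 5, "H2O": 50}
    return {name: total_ul * pct / 100 for name, pct in percents.items()}


if __name__ == "__main__":
    weight_g, dose = 25.0, 50.0        # hypothetical 25 g mouse at 50 mg/kg
    conc = min_conc_mg_per_ml(dose)    # lowest usable concentration for this dose
    vol = gavage_volume_ul(weight_g, dose, conc)
    print(conc, vol, vol <= MAX_UL_PER_G * weight_g)
    print(vehicle_volumes_ul(1000))
```

Deriving the minimum concentration from the volume ceiling, rather than fixing a concentration first, makes it explicit that the higher 50 mg/kg dose needs a formulation twice as concentrated as the 25 mg/kg dose.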
At endpoint (28 days, or a tumor reaching 20 mm in its largest dimension), mice were euthanized using CO2 and necropsied. Portions of tumors were preserved in formalin for histology and flash frozen for future experiments. Mice were also checked for metastasis.

APPENDIX

Figure 4.1: Variable growth of PTPRH KO clones
Growth curves of H23 WT cells and two H23 PTPRH KO clones show variable growth of knockout clones.

Figure 4.2: Increased cellular growth and proliferation upon pooled PTPRH knockout
H23 cells were subjected to pooled knockout of PTPRH using CRISPR. A) TIDE analysis estimates 45% knockout efficiency from sequencing data. Western blotting of pooled PTPRH KO cells showed increased phosphorylation of EGFR at Y1197. B) Growth curves and MTT assays using PTPRH pooled KO cells show increased growth and proliferation of PTPRH KO cells compared to PTPRH WT cells.

Figure 4.3: Tyrosine kinase inhibitor treatment of PTPRH mutant cell lines
Cell lines derived from human NSCLC tumors with PTPRH mutations were subjected to dose response curves. H1975 has the canonical L858R activating EGFR mutation and the T790M resistance mutation. H1975 serves as a positive control, but is not inhibited by erlotinib due to the T790M resistance mutation. A427 serves as a negative control and has no mutations in EGFR or PTPRH. The H1155 line has an M188I PTPRH mutation within a fibronectin domain. The H2228 line has a Q887P PTPRH mutation within the phosphatase domain. A) Cell lines treated with erlotinib, a 1st generation tyrosine kinase inhibitor. B) Cell lines treated with osimertinib, a 3rd generation tyrosine kinase inhibitor used to overcome the EGFR T790M resistance mutation. C) H23 PTPRH KO cells do not respond to TKI treatment. Since H23 has a KRAS G12C mutation, H23 PTPRH KO cells were subjected to dual inhibition of EGFR (osimertinib) and KRAS (ARS853). No response was seen upon the addition of the KRAS inhibitor.
D) PTPRH KO cells have increased activation of FGFR1. H23 PTPRH KO cells were subjected to a dual inhibition curve of the EGFR inhibitor osimertinib and the FGFR1 inhibitor PD166866; however, no increased sensitivity to FGFR1 inhibition was noted.

Figure 4.4: In vivo treatment of H2228 PTPRH mutant tumors
Graphs showing tumor size (measured in the largest dimension) of nude mice injected with human NSCLC lines and treated with osimertinib via oral gavage. Treatment began when tumors reached ~6.0 mm. The X-axis indicates time of measurement after initiation of treatment. A) H1975-injected mice served as the positive control arm for drug treatment. B) H2228-injected mice (experimental arm) were divided into two treatment arms, 25 mg/kg and 50 mg/kg. Treatment was stopped after 14 days in 50 mg/kg treated mice due to weight loss.

Figure 4.5: TUNEL and KI67 staining in PTPRH mutant tumors treated with osimertinib
Representative pictures of KI67 or TUNEL stained slides from FFPE preserved mouse tumors. Mouse tumor slides are from the osimertinib treatment (50 mg/kg) or vehicle control groups of H2228 (PTPRH mutant) injected mice. A) KI67 staining shows decreased proliferation in mouse tumors treated with 50 mg/kg osimertinib. B) TUNEL staining shows a mild increase in apoptosis in vehicle control treated tumors.

CHAPTER 5
FUTURE DIRECTIONS

METASTASIS IN E2F1 KNOCKOUT MOUSE MODELS
While this work characterized the genomes of E2F1 KO mice using extensive bioinformatics analysis, bench work validation is needed to further explore potentially mutated pathways. One of the most fruitful follow ups would be investigating whether cell adhesion pathways are indeed disrupted in E2F1 KO mice. Previous research from the lab found a decrease in circulating tumor cells in PyMT E2F1 KO mice, supporting the hypothesis that mutated cell adhesion genes allow potentially metastatic cells to leave the primary tumor in greater numbers.
One way to explore this would be immunohistochemistry staining for cadherin and other adhesion molecules in PyMT WT and PyMT E2F1 KO tumors. However, this experiment would not reveal whether collagen and cadherin fibers are functionally disrupted: if these proteins were still present in PyMT E2F1 KO tumors but were non-functional, staining would not detect it. Another potential experiment would be the use of cell adhesion assays such as Vybrant. These assays, however, only measure whether cells are able to bind, not the strength of that binding. If cell adhesion molecules were disrupted in a way that still allowed binding but weakened it, these assays would not make that distinction. Advanced microscopy techniques may be useful to investigate the binding forces of these cells, and could be used on cell lines derived from PyMT tumors. An extremely interesting experiment would be to determine whether E2F1 loss is indeed driving an increased mutational burden in cell adhesion genes, and if so, how. We also discovered variation in the mutation profiles of E2F1 KO tumors, with PyMT E2F1 KO tumors having increased association with defective mismatch repair. This may be tied to the increased mutational burden within cell adhesion genes mentioned above. All of this evidence points toward E2F1 loss leading to a shift in the mutation profile of these tumors. As mentioned in Chapter 2, E2F1 is involved in numerous DNA repair mechanisms, including recruitment of double-strand break and nucleotide excision repair processing factors. Loss of recruitment of these factors, together with disruption to the S phase of the cell cycle, could explain the shift in mutation profile in E2F1 KO tumors, although this needs to be confirmed.
Our investigation also uncovered potential disruptions to the WNT pathway, a pathway with known involvement in the epithelial to mesenchymal transition (EMT). Further investigation into WNT and beta-catenin may prove fruitful. A good place to start may be determining whether these pathways are actually disrupted, through a series of western blots for active beta-catenin or other downstream signaling molecules. It may also be beneficial to investigate whether PyMT E2F1 KO cells have a reduced ability to undergo EMT. If so, a deeper look into how E2F1 loss prevents the epithelial to mesenchymal transition could make for an interesting paper.

PTPRH MUTATIONS IN HUMAN CANCERS
We have uncovered a Ptprh mutation within PyMT mouse tumors. Ptprh mutant tumors correlated with increased phosphorylation of EGFR, a known oncogene. Throughout this work we have shown PTPRH mutations are present in 5% of human NSCLC patients, and many of these patient tumors have predicted high EGFR activity. CRISPR knockout of PTPRH in the H23 NSCLC cell line results in increased phosphorylation of EGFR and AKT, and PTPRH mutant cell lines respond to the TKI osimertinib in vitro and in vivo. This work suggests patients with PTPRH mutant tumors may respond to FDA approved TKI therapy; however, many avenues remain to be explored. Below we discuss three research areas that we believe will be the most fruitful going forward. First, it would be prudent to explore the impact of various human PTPRH mutations on the mechanism of interaction between PTPRH and EGFR. Our bioinformatics prediction data suggest certain mutations are more likely to increase EGFR phosphorylation, with some of these mutations occurring within the fibronectin domains of PTPRH and others within the phosphatase domain.
Mutations in the fibronectin domains could prevent PTPRH from binding target substrates, or they could increase PTPRH homodimerization. Increased dimerization of PTPRH has been shown to decrease its phosphatase activity. Co-immunoprecipitation experiments with a panel of overexpressed PTPRH mutants could help distinguish between these possibilities. One might expect mutations in the phosphatase domain of PTPRH simply to abolish catalytic activity. However, most phosphatase domain mutations in NSCLC tumors fall outside the conserved HC-(X5)-R activity motif. This suggests that other mechanisms may disrupt PTPRH-mediated dephosphorylation of EGFR. Other conserved motifs in the phosphatase domain are involved in recognizing phosphorylated tyrosines, so mutations in these motifs might prevent recognition of target substrates. Further characterization of PTPRH mutations in NSCLC, as well as in other cancers including melanoma, could inform future genetic screening to determine which patients may benefit from TKI therapy. The second area involves a deeper investigation into potential treatments for patients whose tumors harbor PTPRH mutations. We have shown that the PTPRH mutant cell line H2228 responds to the TKI osimertinib in vitro and in vivo, but the response was less robust than that of the EGFR mutant line H1975. Interestingly, H2228 did not respond to erlotinib, a first generation TKI with a higher affinity for WT EGFR than osimertinib. If PTPRH mutations simply increased signaling through EGFR, which is WT in this setting, one would expect an inhibitor with higher affinity for WT EGFR to have a greater effect. One potential explanation is that EGFR undergoes a conformational shift when PTPRH fails to remove phosphate groups from its C-terminal tail, although this requires further investigation.
Another possibility is that the response of H2228 to osimertinib is due to factors other than the PTPRH mutation. We would point out, however, that a second cell line with mutant PTPRH (H1155) also responded to osimertinib. To rule out this possibility, adding back WT PTPRH by overexpression would be prudent. This experiment would need careful control of expression levels, perhaps by driving expression from the endogenous promoter, because dimerization reduces PTPRH activity. Strong overexpression of WT PTPRH could crowd the protein in the membrane, bringing molecules into closer proximity and increasing homodimerization. With these data in mind, several other avenues are worth exploring. The first is combination therapy, to determine whether PTPRH mutant tumors respond better to multiple TKIs. We have shown that PTPRH KO cells have increased FGFR1, so PTPRH mutant tumors may also have increased signaling through other RTKs. Profiling the activation of these RTKs, followed by dual TKI inhibition, may prove fruitful for treating PTPRH mutant tumors. Although our data are not promising for dual EGFR and FGFR1 inhibition, they come from a test of only one FGFR1 inhibitor. With many other FGFR1 inhibitors available, it may be worth testing some of these as well. Further characterization of RTK activation upon PTPRH mutation may also reveal other RTKs as potential targets. Because PTPRH targets are often themselves wild type and unmutated, PROTACs targeting these RTKs or downstream signaling molecules may be another area to explore. Targeting EGFR or AKT with a PROTAC may be beneficial. However, the fate of EGFR molecules after failed interactions with PTPRH first needs to be characterized.
If PTPRH mutations simply increase EGFR turnover, so that EGFR is already being internalized and marked for ubiquitination at a high rate, PROTAC treatment may provide little benefit. In vivo CRISPR editing and viral overexpression of genes are beginning to be explored for treating mouse tumors. Another avenue may therefore be delivering WT PTPRH expression to the tumor, although this is most likely a long way off, if feasible at all. A third area of interest is the impact of increased nuclear EGFR in PTPRH mutant tumors. Many questions remain here, including the mechanism behind increased nuclear EGFR, what EGFR is doing in the nucleus, and whether this provides further treatment opportunities. To determine the mechanism behind increased EGFR localization to the nucleus, a sensible first step would be to assess whether the nuclear EGFR is full length or truncated. One could then determine whether canonical mechanisms are responsible: internalization at the membrane through clathrin-coated pits, followed by trafficking through the cell. Recycling of EGFR back to the membrane can be affected by cellular pH. Tumors are known to have increased acidity. It is therefore also possible that mutant PTPRH increases EGFR internalization by failing to remove phosphate groups, and that altered pH within tumor cells then increases EGFR trafficking to the nucleus. Determining the consequences of increased nuclear EGFR could shed light on tumor biology and suggest other potential treatments. EGFR is known to act as a transcriptional coactivator by binding AT-rich regions on the promoters of certain genes, such as cyclin D1. A mass spectrometry experiment could help identify the molecules and signaling pathways affected by nuclear EGFR.
Pathways found to be deregulated by increased nuclear EGFR may provide additional targets for treatment. Overall, this project offers many potential areas for future exploration.

WORKS CITED

1. Stehelin D, Varmus HE, Bishop JM, Vogt PK (1976) DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature 260:170–173
2. Tabin CJ, Bradley SM, Bargmann CI, Weinberg RA, Papageorge AG, Scolnick EM, Dhar R, Lowy DR, Chang EH (1982) Mechanism of activation of a human oncogene. Nature 300:143–149
3. Stratton MR, Campbell PJ, Futreal PA (2009) The cancer genome. Nature 458:719–724
4. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW (2013) Cancer genome landscapes. Science 340:1546–1558
5. Nagase H, Nakamura Y (1993) Mutations of the APC (adenomatous polyposis coli) gene. Hum Mutat 2:425–434
6. Powell SM, Zilz N, Beazer-Barclay Y, Bryan TM, Hamilton SR, Thibodeau SN, Vogelstein B, Kinzler KW (1992) APC mutations occur early during colorectal tumorigenesis. Nature 359:235–237
7. Forrester K, Almoguera C, Perucho M, Han K, Grizzle WE (1987) Detection of high incidence of K-ras oncogenes during human colon tumorigenesis. Nature 327:298–303
8. Hayakumo T, Nakajima M, Yasuda K, et al (1991) Prevalence of K-ras gene mutations in human colorectal cancers. Nippon Shokakibyo Gakkai Zasshi 88:1539–1544
9. Fearon ER, Vogelstein B (1990) A genetic model for colorectal tumorigenesis. Cell 61:759–767
10. Pao W, Miller V, Zakowski M, et al (2004) EGF receptor gene mutations are common in lung cancers from “never smokers” and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci 101:13306–13311
11. Prior IA, Lewis PD, Mattos C (2012) A comprehensive survey of ras mutations in cancer. Cancer Res 72:2457–2467
12. Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL (1987) Human Breast Cancer: Correlation of Relapse and Survival with Amplification of the HER-2/neu Oncogene. Science 235:177–182
13. Nowell P (1960) A minute chromosome in human chronic granulocytic leukemia.
14. Rowley JD (1973) A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243:290–293
15. Collins SJ, Groudine MT (1983) Rearrangement and amplification of c-abl sequences in the human chronic myelogenous leukemia cell line K-562. Proc Natl Acad Sci U S A 80:4813–4817
16. Taberlay PC, Statham AL, Kelly TK, Clark SJ, Jones PA (2014) Reconfiguration of nucleosome-depleted regions at distal regulatory elements accompanies DNA methylation of enhancers and insulators in cancer. Genome Res 24:1421–1432
17. Aran D, Sabato S, Hellman A (2013) DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol 14:R21
18. Yegnasubramanian S, Wu Z, Haffner MC, et al (2011) Chromosome-wide mapping of DNA methylation patterns in normal and malignant prostate cells reveals pervasive methylation of gene-associated and conserved intergenic sequences. BMC Genomics 12:313
19. Nielsen FC, Van Overeem Hansen T, Sørensen CS (2016) Hereditary breast and ovarian cancer: New genes in confined pathways. Nat Rev Cancer 16:599–612
20. Mendoza PR, Grossniklaus HE (2015) The Biology of Retinoblastoma. In: Prog Mol Biol Transl Sci. Elsevier B.V., pp 503–516
21. Epstein SS (1978) The politics of cancer. Sierra Club Books
22. Haenszel W (1966) Epidemiological Approaches to the Study of Cancer and Other Chronic Diseases. In: Natl Cancer Inst Monogr 19. https://books.google.com/books?hl=en&lr=&id=VoxrAAAAMAAJ&oi=fnd&pg=PR7&dq=haenszel+epidemiological+study+of+cancer+and+other+chronic+diseases&ots=ZEbNQrYnCc&sig=bRfwbC72wz-_hHfQtFtFwX6GrOU#v=onepage&q&f=false. Accessed 14 Apr 2020
23. Mason T, McKay F (1974) US cancer mortality by county, 1950-1969. DHEW Publ 74–615
24. Hoover R, Fraumeni JF (1975) Cancer mortality in U.S. counties with chemical industries. Environ Res 9:196–207
25. Blot WJ (1977) Geography of Cancer. Sciences (New York) 17:12–15
26. Jaehn P, Kaucher S, Pikalova LV, Mazeina S, Kajüter H, Becher H, Valkov M, Winkler V (2019) A cross-national perspective of migration and cancer: incidence of five major cancer types among resettlers from the former Soviet Union in Germany and ethnic Germans in Russia. BMC Cancer 19:869
27. Maskarinec G, Noh JJ (2004) The effect of migration on cancer incidence among Japanese in Hawaii.
28. (1957) Smoking and health; joint report of the Study Group on Smoking and Health. Science 125:1129–1133
29. Royal College of Physicians of London (1962) Smoking and Health.
30. Cutler SJ (1955) A Review of the Statistical Evidence on the Association Between Smoking and Lung Cancer. J Am Stat Assoc 50:267–282
31. Gou LY, Niu FY, Wu YL, Zhong WZ (2015) Differences in driver genes between smoking-related and non-smoking-related lung cancer in the Chinese population. Cancer 121:3069–3079
32. Alexandrov LB, Ju YS, Haase K, et al (2016) Mutational signatures associated with tobacco smoking in human cancer. Science 354:618–622
33. Tomasetti C, Vogelstein B (2015) Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347:78–81
34. Wu S, Powers S, Zhu W, Hannun YA (2016) Substantial contribution of extrinsic risk factors to cancer development. Nature. doi: 10.1038/nature16166
35. Wild C, Brennan P, Plummer M, Bray F, Straif K, Zavadil J (2015) Cancer risk: Role of chance overstated. Science 347:728
36. Song M, Giovannucci EL (2015) Cancer risk: many factors contribute. Science 347:728–729
37. Ashford NA, Bauman P, Brown HS, Clapp RW, Finkel AM, Gee D, Hattis DB, Martuzzi M, Sasco AJ, Sass JB (2015) Cancer risk: Role of environment. Science 347:727
38. Tomasetti C, Li L, Vogelstein B (2017) Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 355:1330–1334
39. Zhu L, Finkelstein D, Gao C, et al (2016) Multi-organ Mapping of Cancer Risk. Cell 166:1132–1146.e7
40. McFarland CD, Korolev KS, Kryukov GV, Sunyaev SR, Mirny LA (2013) Impact of deleterious passenger mutations on cancer progression. Proc Natl Acad Sci U S A 110:2910–2915
41. Castro-Giner F, Ratcliffe P, Tomlinson I (2015) The mini-driver model of polygenic cancer evolution. Nat Rev Cancer 15:680–685
42. Kumar S, Warrell J, Li S, et al (2020) Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences. Cell 180:915–927.e16
43. Melnikov A, Rogov P, Wang L, Gnirke A, Mikkelsen TS (2014) Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res 42:112
44. Giacomelli AO, Yang X, Lintner RE, et al (2018) Mutational processes shape the landscape of TP53 mutations in human cancer. Nat Genet 50:1381–1387
45. Yamagiwa K, Ichikawa K (1918) Experimental study of the pathogenesis of carcinoma. J Cancer Res 27:123–81
46. Pazos P, Lanari C, Meiss R, Charreau EH, Dosne Pasqualini C (1991) Mammary carcinogenesis induced by N-methyl-N-nitrosourea (MNU) and medroxyprogesterone acetate (MPA) in BALB/c mice. Breast Cancer Res Treat 20:133–138
47. Bonser M (1954) The Evolution of mammary cancer induced in female IF mice with minimal doses of locally acting methylcholanthrene.
48. Abba MC, Zhong Y, Lee J, Kil H, Lu Y, Takata Y, Simper MS, Gaddis S, Shen J, Marcelo Aldaz C (2016) DMBA induced mouse mammary tumors display high incidence of activating Pik3ca and loss of function Pten mutations. Oncotarget 7:64289–64299
49. Currier N, Solomon SE, Demicco EG, et al (2005) Oncogenic Signaling Pathways Activated in DMBA-Induced Mouse Mammary Tumors. Toxicol Pathol 33:726–737
50. Rehm S (1990) Chemically induced mammary gland adenomyoepitheliomas and myoepithelial carcinomas of mice. Immunohistochemical and ultrastructural features. Am J Pathol 136:575–84
51. Behbod F, Kittrell FS, LaMarca H, et al (2009) An intraductal human-in-mouse transplantation model mimics the subtypes of ductal carcinoma in situ. Breast Cancer Res 11:R66
52. Valdez KE, Fan F, Smith W, Allred DC, Medina D, Behbod F (2011) Human primary ductal carcinoma in situ (DCIS) subtype-specific pathology is preserved in a mouse intraductal (MIND) xenograft model. J Pathol 225:565–573
53. D’Cruz CM, Gunther EJ, Boxer RB, et al (2001) c-MYC induces mammary tumorigenesis by means of a preferred pathway involving spontaneous Kras2 mutations. Nat Med 7:235–239
54. Andrechek ER, Cardiff RD, Chang JT, Gatza ML, Acharya CR, Potti A, Nevins JR (2009) Genetic heterogeneity of Myc-induced mammary tumors reflecting diverse phenotypes including metastatic potential. Proc Natl Acad Sci U S A 106:16387–16392
55. Ivics Z, Hackett PB, Plasterk RH, Izsvák Z (1997) Molecular Reconstruction of Sleeping Beauty, a Tc1-like Transposon from Fish, and Its Transposition in Human Cells. Cell 91:501–510
56. Drabek D, Zagoraiou L, DeWit T, Langeveld A, Roumpaki C, Mamalaki C, Savakis C, Grosveld F (2003) Transposition of the Drosophila hydei Minos transposon in the mouse germ line. Genomics 81:108–111
57. Ding S, Wu X, Li G, Han M, Zhuang Y, Xu T (2005) Efficient transposition of the piggyBac (PB) transposon in mammalian cells and mice. Cell 122:473–483
58. Collier LS, Carlson CM, Ravimohan S, Dupuy AJ, Largaespada DA (2005) Cancer gene discovery in solid tumours using transposon-based somatic mutagenesis in the mouse. Nature 436:272–276
59. Kas SM, De Ruiter JR, Schipper K, et al (2017) Insertional mutagenesis identifies drivers of a novel oncogenic pathway in invasive lobular breast carcinoma. Nat Genet 49:1219–1230
60. Stewart TA, Pattengale PK, Leder P (1984) Spontaneous mammary adenocarcinomas in transgenic mice that carry and express MTV/myc fusion genes. Cell 38:627–637
61. Andres A-C, Schonenberger C-A, Groner B, Hennighausen L, Lemeur M, Gerlinger P (1987) Ha-ras oncogene expression directed by a milk protein gene promoter: Tissue specificity, hormonal regulation, and tumor induction in transgenic mice. Proc Natl Acad Sci U S A 84:1299–1303
62. Vassar R, Rosenberg M, Ross S, Tyner A, Fuchs E (1989) Tissue-specific and differentiation-specific expression of a human K14 keratin gene in transgenic mice.
63. Jackson EL, Willis N, Mercer K, Bronson RT, Crowley D, Montoya R, Jacks T, Tuveson DA (2001) Analysis of lung tumor initiation and progression using conditional expression of oncogenic K-ras. Genes Dev 15:3243–3248
64. Muller WJ, Sinn E, Pattengale PK, Wallace R, Leder P (1988) Single-step induction of mammary adenocarcinoma in transgenic mice bearing the activated c-neu oncogene. Cell 54:105–115
65. Gunther EJ, Belka GK, Wertheim GBW, Wang J, Hartman JL, Boxer RB, Chodosh LA (2002) A novel doxycycline-inducible system for the transgenic analysis of mammary gland biology. FASEB J 16:283–92
66. Debies MT, Gestl SA, Mathers JL, Mikse OR, Leonard TL, Moody SE, Chodosh LA, Cardiff RD, Gunther EJ (2008) Tumor escape in a Wnt1-dependent mouse breast cancer model is enabled by p19Arf/p53 pathway lesions but not p16Ink4a loss. J Clin Invest 118:51–63
67. Moody SE, Sarkisian CJ, Hahn KT, Gunther EJ, Pickup S, Dugan KD, Innocent N, Cardiff RD, Schnall MD, Chodosh LA (2002) Conditional activation of Neu in the mammary epithelium of transgenic mice results in reversible pulmonary metastasis. Cancer Cell 2:451–461
68. Podsypanina K, Politi K, Beverly LJ, Varmus HE (2008) Oncogene cooperation in tumor maintenance and tumor recurrence in mouse mammary tumors induced by Myc and mutant Kras. Proc Natl Acad Sci 105:5242–5247
69. Demarest RM, Dahmane N, Capobianco AJ (2011) Notch is oncogenic dominant in T-cell acute lymphoblastic leukemia. Blood 117:2901–2909
70. Wang X, Cunningham M, Zhang X, Tokarz S, Laraway B, Troxell M, Sears RC (2011) Phosphorylation regulates c-Myc’s oncogenic activity in the mammary gland. Cancer Res 71:925–936
71. Andrechek ER (2000) Amplification of the neu/erbB-2 oncogene in a mouse model of mammary tumorigenesis. Proc Natl Acad Sci 97:3444–3449
72. Andrechek ER, Hardy WR, Laing MA, Muller WJ (2004) Germ-line expression of an oncogenic erbB2 allele confers resistance to erbB2-induced mammary tumorigenesis.
73. Liu DP, Song H, Xu Y (2010) A common gain of function of p53 cancer mutants in inducing genetic instability. Oncogene 29:949–956
74. Yuan W, Stawiski E, Janakiraman V, et al (2013) Conditional activation of Pik3ca H1047R in a knock-in mouse model promotes mammary tumorigenesis and emergence of mutations. Oncogene 32:318–326
75. Cressman VL, Backlund DC, Hicks EM, Gowen LC, Godfrey V, Koller BH (1999) Mammary tumor formation in p53- and BRCA1-deficient mice. Cell Growth Differ 10:1–10
76. Rao T, Ranger JJ, Smith HW, Lam SH, Chodosh L, Muller WJ (2014) Inducible and coupled expression of the polyomavirus middle T antigen and Cre recombinase in transgenic mice: an in vivo model for synthetic viability in mammary tumour progression. doi: 10.1186/bcr3603
77. Ranger JJ, Levy DE, Shahalizadeh S, Hallett M, Muller WJ (2009) Identification of a Stat3-dependent transcription regulatory network involved in metastatic progression. Cancer Res 69:6823–6830
78. Masuda T, Xu X, Dimitriadis EK, Lahusen T, Deng CX (2016) “DNA binding region” of BRCA1 affects genetic stability through modulating the intra-S-phase checkpoint. Int J Biol Sci 12:133–143
79. Annunziato S, Kas SM, Nethe M, et al (2016) Modeling invasive lobular breast carcinoma by CRISPR/Cas9-mediated somatic genome editing of the mammary gland. Genes Dev 30:1470–1480
80. Xue W, Chen S, Yin H, et al (2014) CRISPR-mediated direct mutation of cancer genes in the mouse liver. Nature 514:380–384
81. Platt RJ, Chen S, Zhou Y, et al (2014) CRISPR-Cas9 Knockin Mice for Genome Editing and Cancer Modeling. Cell 159:440–455
82. Gaj T, Gersbach CA, Barbas CF (2013) ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol 31:397–405
83. Ahronian LG, Lewis BC (2014) Using the RCAS-TVA System to Model Human Cancer in Mice. Cold Spring Harb Protoc 2014:pdb.top069831
84. Guy CT, Cardiff RD, Muller WJ (1992) Induction of mammary tumors by expression of polyomavirus middle T oncogene: a transgenic mouse model for metastatic disease. Mol Cell Biol 12:954–961
85. Golovkina TV, Prakash O, Ross SR (1996) Endogenous Mouse Mammary Tumor Virus Mtv-17 Is Involved in Mtv-2-Induced Tumorigenesis in GR Mice. Virology 218:14–22
86. Andrechek ER (2015) HER2/Neu tumorigenesis and metastasis is regulated by E2F activator transcription factors. Oncogene. doi: 10.1038/onc.2013.540
87. Jhan J-R, Andrechek ER (2016) Stat3 accelerates Myc induced tumor formation while reducing growth rate in a mouse model of breast cancer. Oncotarget 7:
88. Hollern DP, Honeysett J, Cardiff RD, Andrechek ER (2014) The E2F Transcription Factors Regulate Tumor Development and Metastasis in a Mouse Model of Metastatic Breast Cancer. Mol Cell Biol 34:3229–3243
89. Cardiff RD, Anver MR, Gusterson BA, et al (2000) The mammary pathology of genetically engineered mice: the consensus report and recommendations from the Annapolis meeting. Oncogene 19:968–988
90. Cardiff RD, Wellings SR (1999) The Comparative Pathology of Human and Mouse Mammary Glands. J Mammary Gland Biol Neoplasia 4:
91. Ponzo MG, Lesurf R, Petkiewicz S, et al (2009) Met induces mammary tumors with diverse histologies and is associated with poor outcome and human basal breast cancer. Proc Natl Acad Sci U S A 106:12903–8
92. Hollern DP, Swiatnicki MR, Andrechek ER (2018) Histological subtypes of mouse mammary tumors reveal conserved relationships to human cancers. PLoS Genet. doi: 10.1371/journal.pgen.1007135
93. Lifsted T, Le Voyer T, Williams M, Muller W, Klein-Szanto A, Buetow KH, Hunter KW (1998) Identification of inbred mouse strains harboring genetic modifiers of mammary tumor age of onset and metastatic progression. Int J Cancer 77:640–644
94. Jeffers M, Fiscella M, Webb CP, Anver M, Koochekpour S, Vande Woude GF (1998) The mutationally activated Met receptor mediates motility and metastasis. Proc Natl Acad Sci U S A 95:14417–14422
95. Seth P, Porter D, Lahti-Domenici J, Geng Y, Richardson A, Polyak K, Kang K-W, Frank SA, Lee W-H, Lee EY-HP (2002) Cellular and molecular targets of estrogen in normal human breast tissue. Cancer Res 62:4540–4
96. Perou CM, Sørlie T, Eisen MB, et al (2000) Molecular portraits of human breast tumours. Nature 406:747–752
97. Lukes L, Crawford NPS, Walker R, Hunter KW (2009) The Origins of Breast Cancer Prognostic Gene Expression Profiles. Cancer Res 69:310–318
98. Flowers M, Schroeder JA, Borowsky AD, Besselsen DG, Thomson CA, Pandey R, Thompson PA (2010) Pilot study on the effects of dietary conjugated linoleic acid on tumorigenesis and gene expression in PyMT transgenic mice. Carcinogenesis 31:1642–1649
99. Eilon T, Barash I (2011) Forced activation of Stat5 subjects mammary epithelial cells to DNA damage and preferential induction of the cellular response mechanism during proliferation. J Cell Physiol 226:616–626
100. Lou Y, Preobrazhenska O, Auf Dem Keller U, Sutcliffe M, Barclay L, McDonald PC, Roskelley C, Overall CM, Dedhar S (2008) Epithelial-Mesenchymal Transition (EMT) is not sufficient for spontaneous murine breast cancer metastasis. Dev Dyn 237:2755–2768
101. Kretschmer C, Sterner-Kock A, Siedentopf F, Schoenegg W, Schlag PM, Kemmner W (2011) Identification of early molecular markers for breast cancer. Mol Cancer 10:15
102. Zhang M, Tsimelzon A, Chang C-H, Fan C, Wolff A, Perou CM, Hilsenbeck SG, Rosen JM (2015) Intratumoral Heterogeneity in a Trp53-Null Mouse Model of Human Breast Cancer. Cancer Discov 5:520–533
103. McBryan J, Howlin J, Kenny PA, Shioda T, Martin F (2007) ERalpha-CITED1 co-regulated genes expressed during pubertal mammary gland development: implications for breast cancer prognosis. Oncogene 26:6406–6419
104. Herschkowitz JI, Simin K, Weigman VJ, et al (2007) Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol 8:R76
105. Hollern DP, Andrechek ER (2014) A genomic analysis of mouse models of breast cancer reveals molecular features of mouse models and relationships to human breast cancer. Breast Cancer Res. doi: 10.1186/bcr3672
106. Kirouac DC, Du J, Lahdenranta J, Onsum MD, Nielsen UB, Schoeberl B, McDonagh CF (2016) HER2+ Cancer Cell Dependence on PI3K vs. MAPK Signaling Axes Is Determined by Expression of EGFR, ERBB3 and CDKN1B. PLoS Comput Biol 12:1004827
107. Rennhack J, To B, Wermuth H, Andrechek ER (2017) Mouse models of breast cancer share amplification and deletion events with human breast cancer. J Mammary Gland Biol Neoplasia. doi: 10.1007/s10911-017-9374-y
108. Santarpia L, Lippman SM, El-Naggar AK Targeting the Mitogen-Activated Protein Kinase RAS-RAF Signaling Pathway in Cancer Therapy. doi: 10.1517/14728222.2011.645805
109. Martini M, De Santis MC, Braccini L, Gulluni F, Hirsch E (2014) PI3K/AKT signaling pathway and cancer: an updated review. doi: 10.3109/07853890.2014.912836
110. Bild AH, Yao G, Chang JT, et al (2006) Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439:353–357
111. Pfefferle AD, Herschkowitz JI, Usary J, et al (2013) Transcriptomic classification of genetically engineered mouse models of breast cancer identifies human subtype counterparts. Genome Biol. doi: 10.1186/gb-2013-14-11-r125
112. Nik-Zainal S, Davies H, Staaf J, et al (2016) Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534:47–54
113. Bose R, Kavuri SM, Searleman AC, et al (2013) Activating HER2 mutations in HER2 gene amplification negative breast cancer. Cancer Discov 3:224–237
114. Samuels Y, Wang Z, Bardelli A, et al (2004) High Frequency of Mutations of the PIK3CA Gene in Human Cancers. Science 304:554
115. McFadden DG, Politi K, Bhutkar A, et al (2016) Mutational landscape of EGFR-, MYC-, and Kras-driven genetically engineered mouse models of lung adenocarcinoma. Proc Natl Acad Sci 113:E6409–E6417
116. Govindan R, Ding L, Griffith M, et al (2012) Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 150:1121–34
117. Rennhack J, Swiatnicki M, Zhang Y, Li C, Bylett E, Ross C, Szczepanek K, Hanrahan W, Jayatissa M, Hunter K (2018) Integrated sequence and gene expression analysis of mouse models of breast cancer reveals critical events with human parallels. bioRxiv 375154
118. Pfefferle AD, Agrawal YN, Koboldt DC, Kanchi KL, Herschkowitz JI, Mardis ER, Rosen JM, Perou CM (2016) Genomic profiling of murine mammary tumors identifies potential personalized drug targets for p53-deficient mammary cancers. Dis Model Mech 9:749–757
119. Gerlinger M, Rowan AJ, Horswell S, et al (2012) Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing. N Engl J Med 366:883–892
120. Navin N, Kendall J, Troge J, et al (2011) Tumor evolution inferred by single-cell sequencing. Nature 472:90–95
121. Pal B, Chen Y, Vaillant F, et al (2017) Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling. Nat Commun 8:1627
122. Dai C, Arceo J, Arnold J, Sreekumar A, Dovichi NJ, Li J, Littlepage LE Metabolomics of oncogene-specific metabolic reprogramming during breast cancer. doi: 10.1186/s40170-018-0175-6
123. Pitteri SJ, Faca VM, Kelly-Spratt KS, et al (2008) Plasma Proteome Profiling of a Mouse Model of Breast Cancer Identifies a Set of Up-Regulated Proteins in Common with Human Breast Cancer Cells. J Proteome Res 7:1481–1489
124. Schoenherr RM, Kelly-Spratt KS, Lin C, et al (2011) Proteome and Transcriptome Profiles of a Her2/Neu-driven Mouse Model of Breast Cancer. Proteomics Clin Appl 5:179–188
125. Andrechek ER, Cardiff RD, Chang JT, Gatza ML, Acharya CR, Potti A, Nevins JR (2009) Genetic heterogeneity of Myc-induced mammary tumors reflecting diverse phenotypes including metastatic potential.
126. Ponzo MG, Lesurf R, Petkiewicz S, et al (2009) Met induces mammary tumors with diverse histologies and is associated with poor outcome and human basal breast cancer. Proc Natl Acad Sci U S A 106:12903–8
127. Usary J, Zhao W, Darr D, et al (2013) Predicting drug responsiveness in human cancers using genetically engineered mice. Clin Cancer Res 19:4889–4899
128. Jhan J-R, Andrechek ER (2017) Effective personalized therapy for breast cancer based on predictions of cell signaling pathway activation from gene expression analysis. Oncogene 36:3553–3561
129. Portier WS (2020) Cancer Clinical Trials: Implications for Oncology Nurses. Semin Oncol Nurs 150998
130. Johnson JI, Decker S, Zaharevitz D, et al (2001) Relationships between drug activity in NCI preclinical in vitro and in vivo models and early clinical trials. Br J Cancer 84:1424–1431
131. Olive KP, Tuveson DA (2006) The use of targeted mouse models for preclinical testing of novel cancer therapeutics. Clin Cancer Res 12:5277–5287
132. Van Norman GA (2019) Phase II Trials in Drug Development and Adaptive Trial Design. JACC Basic to Transl Sci 4:428–437
133. Van Norman GA (2019) Limitations of Animal Studies for Predicting Toxicity in Clinical Trials: Is it Time to Rethink Our Current Approach? JACC Basic to Transl Sci 4:845–854
134. Bailey J, Thew M, Balls M (2014) An analysis of the use of animal models in predicting human toxicology and drug safety. ATLA Altern to Lab Anim 42:181–199
135. Weinstein BS, Ciszek D (2002) The reserve-capacity hypothesis: Evolutionary origins and modern implications of the trade-off between tumor-suppression and tissue-repair. Exp Gerontol 37:615–627
136. Hemann MT, Greider CW (2000) Wild-derived inbred mouse strains have short telomeres.
137. Uhl EW, Warner NJ (2015) Mouse Models as Predictors of Human Responses: Evolutionary Medicine. Curr Pathobiol Rep 3:219–223
138. Day CP, Merlino G, Van Dyke T (2015) Preclinical Mouse Cancer Models: A Maze of Opportunities and Challenges. Cell 163:39–53
139. National Human Genome Research Institute (2016) The Cost of Sequencing a Human Genome | NHGRI. https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost. Accessed 24 Apr 2020
140. Hwang B, Lee JH, Bang D (2018) Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med 50:1–14
141. Thomas A, Rajan A, Giaccone G (2012) Tyrosine Kinase Inhibitors in Lung Cancer. Hematol Oncol Clin North Am 26:589–605
142. Zhao S, Fung-Leung W-P, Bittner A, Ngo K, Liu X (2014) Comparison of RNA-Seq and Microarray in Transcriptome Profiling of Activated T Cells. PLoS One 9:e78644
143. Huang CT, Hsieh CH, Chung YH, Oyang YJ, Huang HC, Juan HF (2019) Perturbational Gene-Expression Signatures for Combinatorial Drug Discovery. iScience 15:291–306
144. Li L, Zhao GD, Shi Z, Qi LL, Zhou LY, Fu ZX (2016) The Ras/Raf/MEK/ERK signaling pathway and its role in the occurrence and development of HCC (Review). Oncol Lett 12:3045–3050
145. Gatza ML, Lucas JE, Barry WT, et al (2010) A pathway-based classification of human breast cancer. Proc Natl Acad Sci U S A 107:6994–9
146. Subramanian A, Tamayo P, Mootha VK, et al (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545–15550
147. Veronesi U, Banfi A, del Vecchio M, et al (1986) Comparison of Halsted mastectomy with quadrantectomy, axillary dissection, and radiotherapy in early breast cancer: Long-term results. Eur J Cancer Clin Oncol 22:1085–1089
148. Goodman LS, Wintrobe MM, Dameshek W, Goodman MJ, Gilman A, McLennan MT (1984) Nitrogen Mustard Therapy: Use of Methyl-Bis(Beta-Chloroethyl)amine Hydrochloride and Tris(Beta-Chloroethyl)amine Hydrochloride for Hodgkin’s Disease, Lymphosarcoma, Leukemia and Certain Allied and Miscellaneous Disorders. JAMA 251:2255–2261
149. Pegram MD, Lipton A, Hayes DF, et al (1998) Phase II study of receptor-enhanced chemosensitivity using recombinant humanized anti-p185(HER2/neu) monoclonal antibody plus cisplatin in patients with HER2/neu-overexpressing metastatic breast cancer refractory to chemotherapy treatment. J Clin Oncol 16:2659–2671
150. Slamon DJ, Leyland-Jones B, Shak S, et al (2001) Use of chemotherapy plus a monoclonal antibody against her2 for metastatic breast cancer that overexpresses HER2. N Engl J Med 344:783–792
151. Masoud V, Pagès G (2017) Targeted therapies in breast cancer: New challenges to fight against resistance. World J Clin Oncol 8:120–134
152. (2019) SEER Stat Database: Mortality.
153. Woodard GA, Jones KD, Jablons DM (2016) Lung cancer staging and prognosis. In: Cancer Treat Res. Kluwer Academic Publishers, pp 47–75
154. Ilic M, Ilic I (2016) Epidemiology of pancreatic cancer. World J Gastroenterol 22:9694–9705
155. Dickson PV, Gershenwald JE (2011) Staging and prognosis of cutaneous melanoma. Surg Oncol Clin N Am 20:1–17
156. Mitri ZI, Parmar S, Johnson B, et al (2018) Implementing a comprehensive translational oncology platform: From molecular testing to actionability. J Transl Med 16:358
157. Vagner T, Spinelli C, Minciacchi VR, et al (2018) Large extracellular vesicles carry most of the tumour DNA circulating in prostate cancer patient plasma. J Extracell Vesicles. doi: 10.1080/20013078.2018.1505403
158. Guy CT, Webster MA, Schaller M, Parsons TJ, Cardiff RD, Muller WJ (1992) Expression of the neu protooncogene in the mammary epithelium of transgenic mice induces metastatic disease. Proc Natl Acad Sci 89:10578–10582
159. Nevins JR (1992) E2F: A link between the Rb tumor suppressor protein and viral oncoproteins. Science 258:424–429
160. Gorgoulis VG, Zacharatos P, Mariatos G, Kotsinas A, Bouda M, Kletsas D, Asimacopoulos PJ, Agnantis N, Kittas C, Papavassiliou AG (2002) Transcription factor E2F-1 acts as a growth-promoting factor and is associated with adverse prognosis in non-small cell lung carcinomas. J Pathol 198:142–156
161. Qin G, Kishore R, Dolan CM, et al (2006) Cell cycle regulator E2F1 modulates angiogenesis via p53-dependent transcriptional control of VEGF.
162. Rouaud F, Hamouda-Tekaya N, Cerezo M, et al (2018) E2F1 inhibition mediates cell death of metastatic melanoma. Cell Death Dis. doi: 10.1038/s41419-018-0566-1
163. Field SJ, Tsai FY, Kuo F, Zubiaga AM, Kaelin WG, Livingston DM, Orkin SH, Greenberg ME (1996) E2F-1 Functions in mice to promote apoptosis and suppress proliferation. Cell 85:549–561
164. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45:1113–20
165. Chaffer CL, Weinberg RA (2011) A perspective on cancer cell metastasis. Science 331:1559–1564
166. Fidler IJ (2003) The pathogenesis of cancer metastasis: The “seed and soil” hypothesis revisited. Nat Rev Cancer 3:453–458
167. Welch DR, Hurst DR (2019) Defining the Hallmarks of Metastasis. Cancer Res 79:3011–3027
168. Burnier JV, Wang N, Michel RP, et al (2011) Type IV collagen-initiated signals provide survival and growth cues required for liver metastasis. Oncogene 30:3766–3783
169. Chang TT, Thakar D, Weaver VM (2017) Force-dependent breaching of the basement membrane. Matrix Biol 57–58:178–189
170. Walker C, Mojares E, del Río Hernández A (2018) Role of Extracellular Matrix in Development and Cancer Progression. Int J Mol Sci 19:3028
171. Cox TR, Rumney RMH, Schoof EM, et al (2015) The hypoxic cancer secretome induces pre-metastatic bone lesions through lysyl oxidase. Nature 522:106–110
172. Daves MH, Hilsenbeck SG, Lau CC, Man TK (2011) Meta-analysis of multiple microarray datasets reveals a common gene signature of metastasis in solid tumors. BMC Med Genomics. doi: 10.1186/1755-8794-4-56
173. Cosphiadi I, Atmakusumah TD, Siregar NC, Muthalib A, Harahap A, Mansyur M (2018) Bone Metastasis in Advanced Breast Cancer: Analysis of Gene Expression Microarray. Clin Breast Cancer 18:e1117–e1122
174.
Lawrence MS, Stojanov P, Polak P, et al (2013) Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499:214–218 175. Ding L, Ellis MJ, Li S, et al (2010) Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464:999–1005 176. Yachida S, Jones S, Bozic I, et al (2010) Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 467:1114–1117 177. Wu Y, Ginther C, Kim J, Mosher N, Chung S, Slamon D, Vadgama J V (2012) Expression of Wnt3 Activates Wnt/β-Catenin Pathway and Promotes EMT-like Phenotype in Trastuzumab-Resistant HER2-Overexpressing Breast Cancer Cells. Mol Cancer Res 10:1597–1606 178. Zhan T, Rindtorff N, Boutros M (2017) Wnt signaling in cancer. Oncogene 36:1461–1473 179. Rennhack JP, To B, Swiatnicki M, et al (2019) Integrated analyses of murine breast cancer models reveal critical parallels with human disease. Nat Commun 10:3261 180. Alexandrov LB, Nik-Zainal S, Wedge DC, et al (2013) Signatures of mutational processes in human cancer. Nature 500:415–421 181. Chen J, Zhu F, Weaks RL, Biswas AK, Guo R, Li Y, Johnson DG (2011) E2F1 promotes the recruitment of DNA repair factors to sites of DNA double-strand breaks. Cell Cycle 10:1287–1294 182. Geyer FC, Weigelt B, Natrajan R, Lambros MB, De Biase D, Vatcheva R, Savage K, Mackay A, Ashworth A, Reis-Filho JS (2010) Molecular analysis reveals a genetic basis for the phenotypic diversity of metaplastic breast carcinomas. J Pathol 220:562–573 183. Dietz S, Harms A, Endris V, Eichhorn F, Kriegsmann M, Longuespée R, Stenzinger A, Sültmann H, Warth A, Kazdal D (2017) Spatial distribution of EGFR and KRAS mutation frequencies correlates with histological growth patterns of lung adenocarcinomas. Int J Cancer 141:1841–1848 184. Greaves M, Maley CC (2012) Clonal evolution in cancer. Nature 481:306–313 185. 
Johnson BE, Mazor T, Hong C, et al (2014) Mutational analysis reveals the origin and therapy- driven evolution of recurrent glioma. Science (80- ) 343:189–193 186. Hao JJ, Lin DC, Dinh HQ, et al (2016) Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma. Nat Genet 48:1500–1507 187. Nattestad M, Chin C-S, Schatz MC (2016) Ribbon: Visualizing complex genome alignments and structural variation. bioRxiv 0344:82123 188. Chang JT, Nevins JR (2006) GATHER: a systems approach to interpreting genomic signatures. Bioinformatics 22:2926–2933 144 189. Choi EH, Kim KP (2019) E2F1 facilitates DNA break repair by localizing to break sites and enhancing the expression of homologous recombination factors. Exp Mol Med. doi: 10.1038/s12276-019-0307-2 190. Guo R, Chen J, Zhu F, Biswas AK, Berton TR, Mitchell DL, Johnson DG (2010) E2F1 localizes to sites of UV-induced DNA damage to enhance nucleotide excision repair. J Biol Chem 285:19308–19315 191. Francis JC, Melchor L, Campbell J, et al (2015) Whole-exome DNA sequence analysis of Brca2 - And Trp53 -deficient mouse mammary gland tumours. J Pathol 236:186–200 192. Campbell KM, O’Leary KA, Rugowski DE, Mulligan WA, Barnell EK, Skidmore ZL, Krysiak K, Griffith M, Schuler LA, Griffith OL (2019) A Spontaneous Aggressive ERα+ Mammary Tumor Model Is Driven by Kras Activation. Cell Rep 28:1526-1537.e4 193. Rennhack JP, To B, Swiatnicki M, et al (2019) Integrated analyses of murine breast cancer models reveal critical parallels with human disease. Nat Commun 10:3261 194. Shah SP, Roth A, Goya R, et al (2012) The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 486:395–399 195. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006) GenePattern 2.0. Nat Genet 38:500–501 196. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120 197. 
Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 198. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079 199. Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, Ding L (2012) Somaticsniper: Identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28:311–317 200. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G (2013) Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. doi: 10.1038/nbt.2514 201. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK (2012) VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22:568–576 202. Wang K, Li M, Hakonarson H (2010) ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. doi: 10.1093/nar/gkq603 203. Layer RM, Chiang C, Quinlan AR, Hall IM (2014) LUMPY: A probabilistic framework for structural 145 variant discovery. Genome Biol 15:R84 204. Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO (2012) DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28:333–339 205. Díaz-Gay M, Vila-Casadesús M, Franch-Expósito S, Hernández-Illán E, Lozano JJ, Castellví-Bel S (2018) Mutational Signatures in Cancer (MuSiCa): A web application to implement mutational signatures analysis in cancer samples. BMC Bioinformatics 19:224 206. Ahmadinejad N, Troftgruben S, Maley C, Wang J, Liu L (2019) MAGOS: Discovering Subclones in Tumors Sequenced at Standard Depths. bioRxiv 790386 207. 
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: An information aesthetic for comparative genomics. Genome Res 19:1639–1645 208. Levi-Montalcini R, Booker B (1960) EXCESSIVE GROWTH OF THE SYMPATHETIC GANGLIA EVOKED BY A PROTEIN ISOLATED FROM MOUSE SALIVARY GLANDS. Proc Natl Acad Sci 46:373–384 209. COHEN S (1962) Isolation of a mouse submaxillary gland protein accelerating incisor eruption and eyelid opening in the new-born animal. J Biol Chem 237:1555–1562 210. Purchio AF, Erikson E, Brugge JS, Erikson RL (1978) Identification of a polypeptide encoded by the avian sarcoma virus src gene. Proc Natl Acad Sci U S A 75:1567–1571 211. Collett MS, Erikson RL (1978) Protein kinase activity associated with the avian sarcoma virus src gene product. Proc Natl Acad Sci U S A 75:2021–2024 212. Carpenter G, King L, Cohen S (1978) Epidermal growth factor stimulates phosphorylation in membrane preparations in vitro. Nature 276:409–410 213. Sefton BM, Hunter T, Beemon K (1979) Product of in vitro translation of the Rous sarcoma virus src gene has protein kinase activity. J Virol 30:311–8 214. Eckhart W, Hutchinson MA, Hunter T (1979) An activity phosphorylating tyrosine in polyoma T antigen immunoprecipitates. Cell 18:925–933 215. Ushiro H, Cohen S (1980) Identification of phosphotyrosine as a product of epidermal growth factor-activated protein kinase in A-431 cell membranes. J Biol Chem 255:8363–8365 216. Weinberg RA (2014) The Biology of Cancer, 2nd Edition. 217. Lawrence MC, McKern NM, Ward CW (2007) Insulin receptor structure and its implications for the IGF-1 receptor. Curr Opin Struct Biol 17:699–705 218. Leppänen VM, Prota AE, Jeltsch M, Anisimov A, Kalkkinen N, Strandin T, Lankinen H, Goldman A, Ballmer-Hofer K, Alitalo K (2010) Structural determinants of growth factor binding and specificity by VEGF receptor 2. Proc Natl Acad Sci U S A 107:2425–2430 146 219. 
Wiesmann C, Fuh G, Christinger HW, Eigenbrot C, Wells JA, De Vos AM (1997) Crystal structure at 1.7 Å resolution f VEGF in complex with domain 2 of the Fit-1 receptor. Cell 91:695–704 220. Sasaki T, Knyazev PG, Clout NJ, Cheburkin Y, Göhring W, Ullrich A, Timpl R, Hohenester E (2006) Structural basis for Gas6-Axl signalling. EMBO J 25:80–87 221. Plotnikov AN, Schlessinger J, Hubbard SR, Mohammadi M (1999) Structural basis for FGF receptor dimerization and activation. Cell 98:641–650 222. Stauber DJ, DiGabriele AD, Hendrickson WA (2000) Structural interactions of fibroblast growth factor receptor with its ligands. Proc Natl Acad Sci U S A 97:49–54 223. Schlessinger J, Plotnikov AN, Ibrahimi OA, Eliseenkova A V., Yeh BK, Yayon A, Linhardt RJ, Mohammadi M (2000) Crystal structure of a ternary FGF-FGFR-heparin complex reveals a dual role for heparin in FGFR binding and dimerization. Mol Cell 6:743–750 224. Graus-Porta D, Beerli RR, Daly JM, Hynes NE (1997) ErbB-2, the preferred heterodimerization partner of all ErbB receptors, is a mediator of lateral signaling. EMBO J 16:1647–1655 225. Huse M, Kuriyan J (2002) The conformational plasticity of protein kinases. Cell 109:275–282 226. Lemmon MA, Schlessinger J (2010) Cell signaling by receptor tyrosine kinases. Cell 141:1117– 1134 227. Favelyukis S, Till JH, Hubbard SR, Miller WT (2001) Structure and autoregulation of the insulin-like growth factor 1 receptor kinase. Nat Struct Biol 8:1058–1063 228. Pawson T (2004) Specificity in Signal Transduction: From Phosphotyrosine-SH2 Domain Interactions to Complex Cellular Systems. Cell 116:191–203 229. Schlessinger J, Lemmon MA (2003) SH2 and PTB domains in tyrosine kinase signaling. Sci STKE. doi: 10.1126/stke.2003.191.re12 230. Kirkin V, Dikic I (2007) Role of ubiquitin- and Ubl-binding proteins in cell signaling. Curr Opin Cell Biol 19:199–205 231. Avraham R, Yarden Y (2011) Feedback regulation of EGFR signalling: Decision making by early and delayed loops. 
Nat Rev Mol Cell Biol 12:104–117 232. Clayton AHA, Walker F, Orchard SG, Henderson C, Fuchs D, Rothacker J, Nice EC, Burgess AW (2005) Ligand-induced dimer-tetramer transition during the activation of the cell surface epidermal growth factor receptor-A multidimensional microscopy analysis. J Biol Chem 280:30392–30399 233. Gadella TWJ, Jovin TM (1995) Oligomerization of epidermal growth factor receptors on A431 cells studied by time-resolved fluorescence imaging microscopy. A stereochemical model for tyrosine kinase receptor activation. J Cell Biol 129:1543–1558 147 234. Chung I, Akita R, Vandlen R, Toomre D, Schlessinger J, Mellman I (2010) Spatial control of EGF receptor activation by reversible dimerization on living cells. Nature 464:783–787 235. Ogiso H, Ishitani R, Nureki O, et al (2002) Crystal structure of the complex of human epidermal growth factor and receptor extracellular domains. Cell 110:775–787 236. Garrett TPJ, McKern NM, Lou M, et al (2002) Crystal structure of a truncated epidermal growth factor receptor extracellular domain bound to transforming growth factor α. Cell 110:763–773 237. Zhang X, Gureasko J, Shen K, Cole PA, Kuriyan J (2006) An Allosteric Mechanism for Activation of the Kinase Domain of Epidermal Growth Factor Receptor. Cell 125:1137–1149 238. Schulze WX, Deng L, Mann M (2005) Phosphotyrosine interactome of the ErbB‐receptor kinase family. Mol Syst Biol 1:2005.0008 239. Erba EB, Bergatto E, Cabodi S, Silengo L, Tarone G, Defilippi P, Jensen ON (2005) Systematic analysis of the epidermal growth factor receptor by mass spectrometry reveals stimulation- dependent multisite phosphorylation. Mol Cell Proteomics 4:1107–1121 240. Honegger A, Dull TJ, Szapary D, Komoriya A, Kris R, Ullrich A, Schlessinger J (1988) Kinetic parameters of the protein tyrosine kinase activity of EGF-receptor mutants with individually altered autophosphorylation sites. EMBO J 7:3053–60 241. 
Guo L, Kozlosky CJ, Ericsson LH, Daniel TO, Cerretti DP, Johnson RS (2003) Studies of ligand- induced site-specific phosphorylation of epidermal growth factor receptor. J Am Soc Mass Spectrom 14:1022–1031 242. Knudsen SLJ, Wai Mac AS, Henriksen L, Van Deurs B, Grøvdal LM (2014) EGFR signaling patterns are regulated by its different ligands. Growth Factors 32:155–163 243. Lynch TJ, Bell DW, Sordella R, et al (2004) Activating Mutations in the Epidermal Growth Factor Receptor Underlying Responsiveness of Non–Small-Cell Lung Cancer to Gefitinib. N Engl J Med 350:2129–2139 244. Paez JG, Jänne PA, Lee JC, et al (2004) EGFR mutations in lung, cancer: Correlation with clinical response to gefitinib therapy. Science (80- ) 304:1497–1500 245. Conte A, Sigismund S (2016) Chapter Six - The Ubiquitin Network in the Control of EGFR Endocytosis and Signaling. In: Prog. Mol. Biol. Transl. Sci. Elsevier B.V., pp 225–276 246. Stern KA, Place TL, Lill NL (2008) EGF and amphiregulin differentially regulate Cbl recruitment to endosomes and EGF receptor fate. Biochem J 410:585–594 247. Raymond JR, Baldys A, Göoz M, Morinelli TA, Lee MH, Luttrell LM, Raymand JR (2009) Essential role of c-Cbl in amphiregulin-induced recycling and signaling of the endogenous epidermal growth factor receptor. Biochemistry 48:1462–1473 248. French AR, Tadaki DK, Niyogi SK, Lauffenburger DA (1995) Intracellular trafficking of epidermal 148 growth factor family ligands is directly influenced by the pH sensitivity of the receptor/ligand interaction. J Biol Chem 270:4334–4340 249. Marti U, Burwen SJ, Wells A, Barker ME, Huling S, Feren AM, Jones AL (1991) Localization of epidermal growth factor receptor in hepatocyte nuclei. Hepatology 13:15–20 250. Psyrri A (2005) Effect of Epidermal Growth Factor Receptor Expression Level on Survival in Patients with Epithelial Ovarian Cancer. Clin Cancer Res 11:8637–8643 251. 
Lipponen P, Eskelinen M (1994) Expression of epidermal growth factor receptor in bladder cancer as related to established prognostic factors, oncoprotein (c-erbB-2, p53) expression and long- term prognosis. Br J Cancer 69:1120–1125 252. Lin SY, Makino K, Xia W, Matin A, Wen Y, Kwong KY, Bourguignon L, Hung MC (2001) Nuclear localization of EGF receptor and its potential new role as a transcription factor. Nat Cell Biol 3:802–808 253. Lo HW, Hsu SC, Ali-Seyed M, Gunduz M, Xia W, Wei Y, Bartholomeusz G, Shih JY, Hung MC (2005) Nuclear interaction of EGFR and STAT3 in the activation of the iNOS/NO pathway. Cancer Cell 7:575–589 254. Lo H-W, Xia W, Wei Y, Ali-Seyed M, Huang S-F, Hung M-C (2005) Novel prognostic value of nuclear epidermal growth factor receptor in breast cancer. Cancer Res 65:338–48 255. Traynor AM, Weigel TL, Oettel KR, et al (2013) Nuclear EGFR protein expression predicts poor survival in early stage non-small cell lung cancer. Lung Cancer 81:138–141 256. Sefton BM, Hunter T, Beemon K, Eckhart W (1980) Evidence that the phosphorylation of tyrosine is essential for cellular transformation by Rous sarcoma virus. Cell 20:807–16 257. Chernoff J, Li HC, Cheng YS, Chen LB (1983) Characterization of a phosphotyrosyl protein phosphatase activity associated with a phosphoseryl protein phosphatase of Mr = 95,000 from bovine heart. J Biol Chem 258:7852–7857 258. Pallen CJ, Valentine KA, Wang JH, Hollenberg MD (1985) Calcineurin-mediated dephosphorylation of the human placental membrane receptor for epidermal growth factor urogastrone. Biochemistry 24:4727–30 259. Chan CP, Gallis B, Blumenthal DK, Pallen CJ, Wang JH, Krebs EG (1986) Characterization of the phosphotyrosyl protein phosphatase activity of calmodulin-dependent protein phosphatase. J Biol Chem 261:9890–9895 260. Lin MF, Clinton GM (1988) The epidermal growth factor receptor from prostate cells is dephosphorylated by a prostate-specific phosphotyrosyl phosphatase. Mol Cell Biol 8:5477–5485 261. 
Tonks NK (2013) Protein tyrosine phosphatases - from housekeeping enzymes to master regulators of signal transduction. FEBS J 280:346–378 149 262. Tonks NK, Diltz CD, Fischer EH (1988) Characterization of the major protein-tyrosine- phosphatases of human placenta. J Biol Chem 263:6731–6737 263. Jia Z, Barford D, Flint AJ, Tonks NK (1995) Structural basis for phosphotyrosine peptide recognition by protein tyrosine phosphatase 1B. Science (80- ) 268:1754–1758 264. Pingel JT, Thomas ML (1989) Evidence that the leukocyte-common antigen is required for antigen-induced T lymphocyte proliferation. Cell 58:1055–1065 265. Koretzky GA, Picus J, Thomas ML, Weiss A (1990) Tyrosine phosphatase CD45 is essential for coupling T-cell antigen receptor to the phosphatidyl inositol pathway. Nature 346:66–68 266. Wälchli S, Espanel X, Van Huijsduijnen RH (2005) Sap-1/PTPRH activity is regulated by reversible dimerization. Biochem Biophys Res Commun 331:497–502 267. Matozaki T, Suzuki T, Uchida T, Inazawa J, Ariyama T, Matsuda K, Horita K, Noguchi H, Mizuno H, Sakamoto C (1994) Molecular cloning of a human transmembrane-type protein tyrosine phosphatase and its expression in gastrointestinal cancers. J Biol Chem 269:2075–2081 268. Yao Z, Darowski K, St-Denis N, et al (2017) A Global Analysis of the Receptor Tyrosine Kinase- Protein Phosphatase Interactome. Mol Cell 65:347–360 269. Yao Z, Darowski K, St-Denis N, Babu M, Gingras A-C, Correspondence IS (2017) A Global Analysis of the Receptor Tyrosine Kinase-Protein Phosphatase Interactome. Mol Cell 65:347–360 270. Olayioye MA, Beuvink I, Horsch K, Daly JM, Hynes NE (1999) ErbB receptor-induced activation of Stat transcription factors is mediated by Src tyrosine kinases. J Biol Chem 274:17209–17218 271. Leaman DW, Pisharody S, Flickinger TW, Commane MA, Schlessinger J, Kerr IM, Levy DE, Stark GR (1996) Roles of JAKs in activation of STATs and stimulation of c-fos gene expression by epidermal growth factor. Mol Cell Biol 16:369–375 272. 
Levy DE, Darnell JE (2002) STATs: Transcriptional control and biological impact. Nat Rev Mol Cell Biol 3:651–662 273. Bjorge JD, Chan TO, Antczak M, Kung HJ, Fujita DJ (1990) Activated type I phosphatidylinositol kinase is associated with the epidermal growth factor (EGF) receptor following EGF stimulation. Proc Natl Acad Sci U S A 87:3816–3820 274. Vivanco I, Sawyers CL (2002) The phosphatidylinositol 3-kinase-AKT pathway in humancancer. Nat Rev Cancer 2:489–501 275. Zhong W, Myers JS, Wang F, et al (2020) Comparison of the molecular and cellular phenotypes of common mouse syngeneic models with human tumors. BMC Genomics 21:2 276. De Ruiter JR, Wessels LFA, Jonkers J (2018) Mouse models in the era of large human tumour sequencing studies. Open Biol. doi: 10.1098/rsob.180080 150 277. Guy CT, Cardiff RD, Muller WJ (1992) Induction of mammary tumors by expression of polyomavirus middle T oncogene: a transgenic mouse model for metastatic disease. Mol Cell Biol 12:954–961 278. Kumar S, Warrell J, Li S, et al (2020) Passenger Mutations in More Than 2,500 Cancer Genomes: Overall Molecular Functional Impact and Consequences. Cell 180:915-927.e16 279. Sato T, Soejima K, Arai E, et al (2015) Prognostic implication of PTPRH hypomethylation in non- small cell lung cancer. Oncol Rep 34:1137–1145 280. (2021) Benchling [Biology Software]. https://help.benchling.com/en/articles/1413662-cite- benchling-in-your-research. Accessed 26 Mar 2021 281. Li J, Yen C, Liaw D, et al (1997) PTEN, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science (80- ) 275:1943–1947 282. Chen CY, Chen J, He L, Stiles BL (2018) PTEN: Tumor suppressor and metabolic regulator. Front Endocrinol (Lausanne) 9:338 283. Seo Y, Matozaki T, Tsuda M, Hayashi Y, Itoh H, Kasuga M (1997) Overexpression of SAP-1, a transmembrane-type protein tyrosine phosphatase, in human colorectal cancers. Biochem Biophys Res Commun 231:705–711 284. 
Travis WD, Brambilla E, Nicholson AG, et al (2015) The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances since the 2004 Classification. J Thorac Oncol 10:1243–1260 285. Crvenkova S (2015) Prognostic factors and survival in non-small cell lung cancer patients treated with chemoradiotherapy. Maced J Med Sci 3:75–79 286. Shi Y, Au JSK, Thongprasert S, Srinivasan S, Tsai CM, Khoa MT, Heeroma K, Itoh Y, Cornelio G, Yang PC (2014) A prospective, molecular epidemiology study of EGFR mutations in Asian patients with advanced non-small-cell lung cancer of adenocarcinoma histology (PIONEER). J Thorac Oncol 9:154–162 287. Fukuoka M, Wu YL, Thongprasert S, et al (2011) Biomarker analyses and final overall survival results from a phase III, randomized, open-label, first-line study of gefitinib versus carboplatin/paclitaxel in clinically selected patients with advanced non - small-cell lung cancer in Asia (IPASS). In: J. Clin. Oncol. J Clin Oncol, pp 2866–2874 288. Bennett AM, Tang TL, Sugimoto S, Walsh CT, Neel BG (1994) Protein-tyrosine-phosphatase SHPTP2 couples platelet-derived growth factor receptor β to Ras. Proc Natl Acad Sci U S A 91:7335–7339 289. Vogel W, Ullrich A (1996) Multiple in vivo phosphorylated tyrosine phosphatase SHP-2 engages binding to Grb2 via tyrosine 584. Cell Growth Differ 7:1589–1597 290. Nagano T, Tachihara M, Nishimura Y (2018) Mechanism of Resistance to Epidermal Growth 151 Factor Receptor-Tyrosine Kinase Inhibitors and a Potential Treatment Strategy. Cells 7:212 291. Brinkman EK, Kousholt AN, Harmsen T, Leemans C, Chen T, Jonkers J, van Steensel B (2018) Easy quantification of template-directed CRISPR/Cas9 editing. Nucleic Acids Res 46:e58 152