KNOWLEDGE SPILLOVERS AND SAFE DRINKING WATER ACT COMPLIANCE

By

Kyle James Redican

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Geography – Doctor of Philosophy

2022

ABSTRACT

In the wake of the 2014 Flint Water Crisis, researchers, regulators, and utility professionals have given increased attention to understanding the drivers of Safe Drinking Water Act (SDWA) compliance by community water systems (CWSs). Most of this research has explored only system traits while ignoring the vital role of human capital, especially the operator. The status of CWS operators can vary widely between systems. More critically, scholars have not investigated how effective external linkages between CWS operators have affected SDWA compliance. Drawing on inter-organizational learning from Organizational Learning theory, knowledge transfers from Innovation Systems theory, and knowledge spillovers from Agglomeration Economics, I hypothesized that increased interactions between CWS operators, facilitated in part by geographic proximity, would lead to more information sharing, improved CWS performance, and fewer SDWA violations.

Remarkably little is known about the drivers of inter-operator interactions or whether such interactions improve SDWA compliance. This research helped fill that data gap through a large-sample survey of CWS operators in Michigan, which captured the frequency of interactions along with a range of operator and system characteristics that may explain why some operators participate in more inter-operator interactions than others. With this novel dataset, along with publicly available system and community data, this research first investigated which endogenous operator characteristics were associated with more reported inter-operator interactions.
Across multiple methods applied to reported operator interactions, Utility and Contract operators and operators with memberships in professional organizations appear more likely to report more interactions than Non-Affiliated operators and operators who were not members of professional organizations. Second, based on Tobler’s first law of geography, there should be some spatial autocorrelation in the number of reported interactions, and this was tested using variogram modeling. Observed spatial autocorrelation indicated location-based differences in the number of reported interactions. Third, I used multiple methods to explore the primary research question to identify endogenous and spatial drivers of reported inter-operator interactions. Multiple models found that rural districts had a higher probability of fewer SDWA violations with increased interactions, while urban districts had the inverse relationship. Fourth, the research incorporated CWS-specific and operator-specific variables, as the operator-specific data were not independent of the CWS observations (since some operators run multiple CWSs). I used a Generalized Linear Mixed-Model to estimate these relationships while accounting for the multiple levels and found that more interactions increased the probability of SDWA compliance for certain types of operators.

The broader implications of this research encourage stakeholders to pursue more inter-operator interactions as a low-cost mechanism to increase SDWA compliance. Seven avenues to increase interactions are outlined, ranging from open operator contact lists, to operator focus groups that identify common problems and solutions, to a state-level operator mentorship program to support new operators.

ACKNOWLEDGEMENTS

This dissertation was not completed because of 10,000 hours of reading and research, but because of the 10,000 hands that held me up with support, love, compassion, and kindness. I would first like to thank my advisor, Dr.
Ashton Shortridge (the most Dumbledore-like character in my life), who didn’t just mentor me about the dissertation research and writing, but also about how to carry myself as a professional, educator, and person. The lesson that I will always hold closest to me was about stewardship, and how being a good steward means more than just caring about the environment, social justice, or the most visible/relevant justice cause at the time; it means taking an active role in trying to make your community better, and there is no good deed that you are too educated/smart to be doing. Ashton, thank you for everything; working with you has been a life-changing experience.

Dr. Janice Beecher, thank you for everything. You really helped turn this “squishy” dissertation idea into a great research project. You have provided me with every opportunity for success and showed me how to conduct research (especially about CWSs) the right way. When I was a second-year Ph.D. student, I wandered into the Institute of Public Utilities hoping for some direction on CWS research, which I received, but more importantly to me, I received an amazing mentor and friend. You helped give me a second home on campus at the IPU and involved me in your research, which provided opportunities to continue to expand my CV and gain valuable professional experience. Jan, thank you for everything; I look forward to our continuing work in the future.

Dr. Igor Vojnovic, thank you for helping shape my view on the world. From my initial interview visit, I knew that I wanted to work with you, as you were one of the kindest and most passionate people I had met. During our time in class together, you helped me better understand the global connectivity of everything, and even more importantly, that being a “quant jock” means nothing without the theory behind it. I am so grateful for everything over the years and look forward to continued discussions about the world and the best eateries in town.

Dr.
Nathan Moore, thank you for helping develop me as a professional and supporting me. You did more than help me through one of the toughest times in my professional life; you encouraged me to take the high road when I did not want to. Your endless support and care for students (like me) is inspiring. I hope that I can have a fraction of the positive impact on someone else’s life that you have had on mine. Nathan, thank you for everything, and I look forward to the days when we can just enjoy a drink and not have to develop schemes for my professional development.

My wife, Kayla Davis. Neither my thesis nor dissertation would have been completed without you. Without you, I would have left the program a long time ago and would probably be selling cellphones in Southwest Virginia. Your endless support and cool-headedness have helped carry me to heights that I could have never reached alone. All the edits and consulting have helped construct the final research project and the degree. I am so thankful that our journeys became one, and together we can get through anything. I love you with all my heart.

My father, Dr. Kerry Redican. A decade ago you would never have bet that you would be reading my dissertation acknowledgements section (you would have guessed you would be reading about me in ‘Crime Times’), but we did it! You have been a fantastic father, and your constant pushing me to be my best has gotten us to this point. You have edited anything I sent you, made the effort to find resolutions, and supported me in every way. I am really lucky that I have you as a father, and I am so proud to be your son. Thank you for everything. I owe you so much, and it would take me multiple lifetimes of good deeds to pay you back for everything you have done.

My sister, Kelly Redican, thank you for always being there for me, and for the numerous conversations where you let me vent out my frustrations without judgement.

My beautiful niece, Alexandra Ramey, thank you for being you.
Since the moment you were born, you have been a bright light in my very dark world, and my world has been better since you came into it.

My mother, Barbara Redican, thank you for the support. My in-laws, Tery and Marshall Davis, thank you for accepting me into the family and treating me like one of your own. Tery, you have been a strong support beam in turbulent times and always give insightful/thoughtful/heartfelt advice and care. My sister-in-law, Maegan Davis, thank you for all the fun, support, and laughs.

Doctor Timothy Spedoske, you have been so much more than a primary care physician to me. When I was scared, isolated, and just needed a friend, you were there for me. Your positivity and encouragement have been a blessing to me over the course of the last few years. You have opened my eyes and been my example of the personification of treating people with love and kindness. Thank you for everything, Dr. Spedoske, and while I will graduate and eventually move away, it will be a lot harder for you than that to get rid of me.

To my previous teachers: Thank you all for everything you did to help me get to this point. Sharon Ruggles, thank you for always putting me back together every time I fell apart, and not letting me join the circus after my written comps. Dr. Alan Arbogast, thank you for the support, opportunities, great conversations, and insight into higher education over the years. Beth Weisenborn, thank you for all the opportunities and the mentorship in learning how to teach online. Dr. Randall Schaetzl, thank you for all the fun times mixed with mentorship and support. Dr. Elizabeth Mack, thank you for the help in constructing the early forms of this dissertation work and for all the lessons about conducting research. I would like to thank other members of the MSU faculty who have made a positive impact on me and my understanding of the world: Dr. Joe Darden/ Dr. Bruce Pigozzi/ Dr. Gary Schnakenberg/ Dr. Raechel Portelli/ Dr. Lifeng Luo/ Dr. Julie Winkler.
Dr. Yang Zhang (VT), you taught me GIS, helped me through my master’s, and gave me the confidence to pursue the Ph.D., which is something I am truly grateful for. I would also like to thank Jon Holland of the MRWA for teaching me about CWSs and taking the time to show me multiple systems around Michigan to better my understanding (and for the great conversation). Dr. Kevin Credit, you were my friend and answered every plea for help. You did it without the ego and just cared about helping me through, which was the model that I tried to replicate for my peers and friends. You rock, Kev-Bot, even if you have gone full Euro. Dr. Marc Fialkoff, you were the first person to show me what a Ph.D. student does and how hard you need to work. Dr. Aaron Kamoske, thank you for helping me through things and being a friend who constantly supported me and sent me so many jobs to apply to.

To my many friends and colleagues at MSU: B.J./ Breaunte/ Amanda Rz/ Ponyaun/ Nafesha/ Rajiv/ Ken/ Kyesha/ Lisa/ Raven/ Laura/ Ana/ Kelsey/ Jonah/ Chris/ Jen/ Jonnell/ Pietro/ Joey/ Gabriela/ Donald/ Dan/ Teng/ Judith/ Brad/ April/ Ryan/ Chase/ Lonnie/ Meghan and all the others who have shared their experiences with me, been a part of my community, and helped support me. Shelly/ Julian/ David, thank you all so much for the love and great conversations in the mornings before anyone came in (I looked forward to our morning talks). I would also like to thank all the amazing students I had the privilege and pleasure to teach over the years; I probably learned more from you than you have from me.

To all my other friends and family, thank you so much for the instrumental role you have played in my life. Alex, Mikey, and little Sloane/ Aaron, Kelly, and Adelaide/ Peter and Kim/ Jiang, Donyaun, and An and the multitude of others that have taught me that your family is the community of people you surround yourself with, and that doesn’t have to be limited to people who share your DNA.
Thank you to all my previous students, as y’all were what got me out of bed in the mornings, and I believe I have learned more from you than you have from me. Thank you to all the operators who responded to both the survey and interviews and let me better understand your world. This work would not have been possible without your honesty and participation. Finally, Tim, and to a much lesser and more inconsequential extent Ben, thank you for the laughs.

Thank you to everyone, as this would not be possible without the 10,000 hands that carried me to this point.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
CHAPTER 1: Introduction
1.1 Background of the Problem
1.2 Research Questions
1.3 Significance of the Study
1.4 Road Map to the Dissertation
CHAPTER 2: Literature Review and Theoretical Framework
2.1 Introduction
2.2 Shape of Water (Background Context)
2.3 Drinking Water Systems
2.4 United States Regulation of Public Water Systems
2.5 Technical, Managerial, and Financial Capacity
2.6 Theoretical models for Knowledge Transfer
2.6.1 The Nature of Knowledge
2.6.2 Organizational Learning
2.6.3 Innovation Systems Theory
2.6.4 Agglomeration Economics
2.6.5 Knowledge Transfers and Water Systems Operators
2.7 Conceptual Framework
2.7.1 Broad Conceptual Model for Water System Compliance
2.7.2 Community Water Systems and Knowledge Transfers
2.7.3 Primary Hypotheses Model
2.7.4 Endogenous Hypotheses Model
2.7.5 Spatial Hypotheses Model
2.8 Conclusion
CHAPTER 3: Study Area, Survey, and Data
3.1 Introduction
3.2 Study Area
3.2.1 CWSs in Michigan
3.2.1.1 Michigan SDWA Regulation
3.2.1.2 Overview of Type and Ownership of Michigan CWSs
3.2.1.3 Regulatory Units
3.2.1.4 Michigan CWS Operators
3.2.2 Advantages and Disadvantages of Michigan as a Study Area
3.3 Operator Survey and Interview Data
3.3.1 Survey
3.3.1.1 Survey Design
3.3.1.2 Survey Construction
3.3.1.3 Selected Survey Results
3.3.1.4 Survey Representation
3.3.1.5 Survey Issues and Concerns
3.3.2 Interviews
3.4 External Data
3.4.1 Operator and System Location Data
3.4.2 Performance Metrics
3.4.3 Managerial Capacity
3.4.4 Technical Capacity
3.4.5 Financial Capacity
3.4.6 Natural Advantage
3.5 Conclusion
CHAPTER 4: Methods
4.1 Introduction
4.2 Kruskal-Wallis and χ² tests (Independence Testing)
4.3 Variogram Modeling
4.4 OLS Models
4.5 Geographically Weighted Regression Models
4.6 Ordered Logistic Regression Models
4.7 Generalized Linear Mixed-Model (GLMM)
4.7.1 Testing for Random Effects or GLM
4.7.2 Diagnostics: Model Strength MLM
4.7.3 GLMM R²
4.7.3.1 Hosmer-Lemeshow Test
4.8 Conclusion
CHAPTER 5: Results
5.1 Introduction
5.2 Endogenous Hypotheses Results
5.2.1 Descriptive Statistics and Tests of Independence [EN-1, EN-2, EN-3]
5.2.2 Endogenous Hypotheses Ordered Logistic Regression Models [EN-1, EN-2, EN-3]
5.2.3 Endogenous Hypotheses Meaning of Results
5.3 Spatial Hypotheses Results
5.3.1 Descriptive Statistics and χ² Tests [SP-1]
5.3.2 Variogram Interactions Model [SP-2]
5.3.3 Spatial Hypotheses Meaning of Results
5.4 Primary Hypotheses Results
5.4.1 Regional Advantage Explorations [Prim-1, Prim-2]
5.4.2 Districts Exploration OLS Model Results and Plots [Main-1, Main-2]
5.4.3 Districts Exploration GWR Model Results and Plots [Prim-1]
5.4.4 Operator Only Model (Ordered Violation Percentages) [Prim-1, Prim-2]
5.4.5: GLMM System and Operator levels for Binary SDWA Violations [Prim-1, Prim-2]
5.4.6: Primary Hypotheses Meaning of Results
CHAPTER 6: Discussion and Conclusion
6.1 Introduction
6.2 Theoretical Contributions
6.2.1 Organizational Learning
6.2.2 Innovation Systems Theory
6.2.3 Agglomeration Economics
6.3 Broader CWS Compliance Research and Regulation
6.4 Directions for future research
6.5 Conclusion
APPENDICES
APPENDIX A: Open Response to the last Survey Question
APPENDIX B: Ordered Logistic Regression Variables for Interactions and Violation Group Dependent Variable Models
APPENDIX C: OLR models for interactions (6 models)
APPENDIX D: OLR for Primary Hypotheses of Operator only level
APPENDIX E: GLMM models for primary hypotheses
APPENDIX F: GVIF Tables for All Models
APPENDIX G: Convergence Plots for OLR models
BIBLIOGRAPHY

LIST OF TABLES

Table 1: EPA Minimum State Operator Standards
Table 2: Outline of Michigan and Indiana States Operator Certification Standards (adjusted from Oxenford, 2018)
Table 3: Systems New to the SDWIS and EGLE database in 2020 Quarter 4 that were not in 2019 Quarter 4
Table 4: CWSs that were in SDWIS and EGLE databases in 2019 Q4 but not in 2020 Q4
Table 5: Overview of CWSs ownership and function using Beecher et al. (2020) classification scheme
Table 6: Overview of Systems, Population, and Counties for Michigan EGLE Community Water Regions and Districts (2020 systems/population numbers)
Table 7: Overview of Michigan CWSs DO by types and number of CWSs
Table 8: Overview of the Timeline and tasks for the Survey and Interview data collection
Table 9: Survey Questions on the Background of the Operator
Table 10: Survey Questions on the Operator’s Employment
Table 11: Survey Questions on the Operator’s Interactions and Knowledge Spillovers and Transfers
Table 12: Survey Respondents broken down by Operator Type
Table 13: Survey Respondents broken by Beecher et al. (2020) Classifications (2-level)
Table 14: Survey Respondents broken by Beecher et al. (2020) Classifications (4-level)
Table 15: Survey Respondents broken down by 5-Level Population Served Size Categories
Table 16: Interview Questions, their purpose, and probes
Table 17: Overview of the data and CWSs in Figure 13
Table 18: Overview of the sample of CWSs by the location
Table 19: Michigan 2020 Violations Overview
Table 20: Overview of the 48 OLS Models for Michigan EGLE Regions and Districts
Table 21: Overview of Variables for the System and Operator Level Models
Table 22: Overview of All GLM models
Table 23: Testing Results for GLM vs GLMM models
Table 24: Overview of all methods and which hypotheses they relate to
Table 25: Overview of research hypotheses and methods employed
Table 26: Descriptive Statistics for All Variables and Tests of Independence
Table 27: All Ordered Logistic Regression Models Results
Table 28: Best Model and Endogenous Hypotheses Model Summaries
Table 29: Overview of the Responses for County of inter-operator interactions and chi-square test of independence
Table 30: Michigan EGLE OLS Results on Aggregated Violations on Different Interactions Measures
Table 31: OLS model results for the Percentage of Systems with any 2020 Violation
Table 32: OLS Model Plots for the Percentage of Population Served by a System with a 2020 SDWA Violation
Table 33: OLS Models Results for the Percentage of Systems with a 2020 Major Violation
Table 34: OLS Models Results for the Percentage of Population Served by a System with a 2020 Major Violation
Table 35: Overview Table of the Results of the Tricube GWR Models
Table 36: Results of the Ordered Logistic Regression Models on the Operator Only Variables
Table 37: Overview of R² values, Hosmer-Lemeshow Test, and the AIC/BIC for the GLMMs
Table 38: Results of the GLMM models for Model 1 All and Model 2 All
Table 39: Results of the "Best" or Step GLMM Models
Table 40: Overview of the Research Hypotheses and Key Findings
Table 41: All open-ended survey question responses grouped by positive, neutral, and other
Table 42: Ordered Logistic Regression Variable Overview for Endogenous and Primary Hypotheses Model
Table 43: Overview of the Six Ordered Logistic Regression Models Investigating the Endogenous Hypotheses
Table 44: Ordered Logistic Regression Models for Primary Hypotheses Investigation at only the Operator Level
Table 45: Overview of All Six Primary Hypotheses GLMMs
Table 46: GVIF Measures of Multi-Collinearity for Endogenous Hypotheses Models
Table 47: GVIF Measures of Multi-Collinearity for Primary Hypotheses Models (Ordered Logistic Regression)
Table 48: GVIF Measures of Multi-Collinearity for Primary Hypotheses Models (GLMM)

LIST OF FIGURES

Figure 1: Changes in the Number of Community Water Systems in the U.S. between 2000 and 2019
Figure 2: Technical Managerial and Financial Capacity Venn Diagram from Shanaghan et al. (1998)
Figure 3: Broad Knowledge Sharing and Firm Performance Model (from Wang and Wang, 2012)
Figure 4: Knowledge Sharing Benefit Example
Figure 5: Community Water Systems Capacity and Compliance Model
Figure 6: Knowledge Sharing and Water Systems Conceptual Model Example
Figure 7: Primary Hypotheses on Performance and Knowledge Spillovers
Figure 8: Hypotheses Endogenous Factors and Interactions
Figure 9: Spatial Hypothesis 1 (SP-1)- County Specific Hypothesis
Figure 10: Spatial Hypothesis 2 (SP-2)- Spatial autocorrelation of inter-operator interactions
Figure 11: Michigan EGLE Community Water Regions and Districts (2021)
Figure 12: Sample Representation of Michigan EGLE Community Water Regions
Figure 13: Sample Representation of Michigan EGLE Community Water Districts
Figure 14: Map of Isabella County Three CWSs and the Census Tracts they intersect
Figure 15: Comparison of SDWA violation in research sample to all systems and systems not in the sample
Figure 16: Maps of Operator Interactions by System and Number of Systems
Figure 17: Kernel Types from Lu et al. (2014)
Figure 18: Multiple Levels of CWSs and Operations
Figure 19: Endogenous Hypotheses Model (M3.Enall) Odds Ratios
Figure 20: Semi-Variogram Plots of Surveyed Operators Interactions
Figure 21: OLS Plots for Percentage of Systems with any 2020 Violation on the Three Interactions Measures
Figure 22: OLS Model Plots for the Percentage of Population Served by a System with any 2020 Violation on the Three Interactions Measures
Figure 23: OLS Model Plots for the Percentage of Systems with a 2020 Major Violation on the Three Interactions Measures
Figure 24: OLS Models Plots for the Percentage of Population Served by a System with a 2020 Major Violation on the Three Interactions Measures
Figure 25: 2020 Major Violations on Surveyed Interactions (13 neighbors)
Figure 26: 2020 Major Violations on Survey Interactions (22 Neighbors)
Figure 27: 2020 Percentage of Systems with Major Violations on Imputed Interactions (13 neighbors)
Figure 28: 2020 Percentage of Systems with Major violation on Imputed Interactions (22 neighbors)
Figure 29: Operator Level Odds ratio for Operator Type, Interactions, and Violation Groupings
Figure 30: Endogenous Hypotheses Convergence Plots
Figure 31: Primary Hypotheses OLR Convergence Plots
Figure 32: GLMM Convergence Plots

KEY TO ABBREVIATIONS

CWS Community Water Systems
SDWA Safe Drinking Water Act of 1974
EPA United States Environmental Protection Agency
TMF Technical, Managerial, and Financial Capacity
UN United Nations
EGLE Michigan Department of Environment, Great Lakes, and Energy
MCL Maximum Contaminant Levels
2,4-D Dichlorophenoxyacetic Acid
PFNA Perfluorononanoic Acid
CCR Consumer Confidence Report
DWSRF Drinking Water State Revolving Fund
R&D Research and Development
CDP Capacity Development Partnerships
Prim-1 Primary Hypothesis 1
Prim-2 Primary Hypothesis 2
En Endogenous Hypotheses
SP Spatial Hypotheses
DEQ Michigan Department of Environmental Quality
MRWA Michigan Rural Water Association
OTCIS Michigan EGLE’s Operator Training and Certification Information System
FOIA Freedom of Information Act Request
DO Designated Operator In Charge
SDWIS EPA’s Safe Drinking Water Information System
IPPSR Michigan State University’s Institute for Public Policy and Social Research
CEC Michigan community water system operators continuing education credits
PWS ID Public Water System Identification Number
AWWA American Water Works Association
GAO Government Accountability Office (United States)
OIG Office of Inspector General (United States)
ECHO United States Environmental Protection Agency’s Enforcement and Compliance History Online
MHI Median Household Income
MHV Median Home Value
EQ Environmental Quality
CWA Clean Water Act
MDOT Michigan Department of Transportation
OLS
Ordinary Least Squares
GWR Geographically Weighted Regression
OLR Ordinal Logistic Regression
GLM Generalized Linear Model
GLMM Generalized Linear Mixed Model

CHAPTER 1: Introduction

1.1 Background of the Problem

Access to clean drinking water is considered a fundamental human right (U.N., 2018). Very few water sources are uncontaminated by either natural or anthropogenic factors, and most water resources require some sort of treatment before they are safe for human consumption (Zimmerman et al., 2008). Drinking water systems are holistic entities that include the extraction, treatment, and delivery of drinking water that is safe for human consumption (National Research Council & Safe Drinking Water Committee, 1984). When a water system fails to provide safe drinking water, it infringes on the basic human right of access to safe drinking water.

In recent years, the most notable, researched, and discussed failure of a U.S. water system was in Flint, Michigan. In the Flint Water Crisis, which started in 2014, multiple safeguards (both infrastructure and human capital) failed to provide safe drinking water to the residents of Flint, Michigan. These failures exposed an estimated 100,000 people (6,000 to 12,000 children) to unsafe levels of lead in the water and resulted in at least 12 deaths and more than 80 hospitalizations (Booker, 2021). Nine individuals involved with the Flint water system (at varying levels of involvement in either the direct operations or regulatory oversight) have been indicted on criminal charges for the system’s failure (Haddad, 2021). This notable event triggered increased research interest in drinking water systems: a Web of Science search of scholarly articles using the term “drinking water systems” returned 109.35% more articles between 2014 and 2021 (582) than between 2007 and 2014 (278).
However, many of these articles focus only on the impact of physical infrastructure on drinking water systems (Allaire et al., 2018; Statman-Weil et al., 2020) and ignore the human/labor capital components of those systems. The nine individuals under indictment for their role in the Flint water system’s failure were key to the failures in Flint, and a better understanding of failures in the human capital of system management and operations could help avoid future failures.

Organizational learning, innovation systems, and agglomeration economics theories stress the importance of knowledge transfers and spillovers among unrelated firms and organizations to facilitate cross-organizational learning and solve complex problems to advance innovation and increase the organization’s performance (Marshall and Marshall, 1920; Fischer and Fröhlich, 2001; Levitt and March, 1988; Asheim et al., 2011; Wehn and Montalvo, 2018). Organizational learning emphasizes that one possible avenue to increase an organization’s performance is through inter-organizational learning, which can be achieved through knowledge transfers and spillovers between two unrelated organizations (Levitt and March, 1988). More broadly, knowledge transfer theories focus on the actual movement of knowledge between organizations regardless of space, while knowledge spillovers include the geographic component of spatial proximity, which encourages or discourages the transfer of knowledge between unrelated firms (Asheim et al., 2011). The spatial extent of knowledge transfers and spillovers has interested researchers as a means of measuring regional advantages stemming from firm location based on the regional culture (Saxenian, 1996).
Knowledge transfers and spillovers that lead to regional advantages have been explored for their ability to help solve complex problems and increase performance in the banking sector (Gilbert and Cordey-Hayes, 1996), accounting sector (Rodgers et al., 2016), technology sector (Quah, 2001; Wang and Wang, 2012; Wang et al., 2016), tourism sector (Weidenfeld, Williams, and Butler, 2010), and manufacturing sector (Hamdoun, Jabbour, and Othman, 2018), as well as in the promotion of entrepreneurship (Agarwal et al., 2004; Audretsch and Keilbach, 2008).

There has yet to be an exploration of how knowledge transfers and spillovers influence regional advantages in resource-based sectors, and specifically public utility performance. Public utilities are not like private organizations because they are subject to different regulatory oversight and are relatively geographically immobile (Beecher, 2009; Beecher, 2013). Considering these limitations, U.S. community water systems’ (CWSs) ability to comply with the Safe Drinking Water Act of 1974 (SDWA) has been seen as a complex task (Shanaghan et al., 1998; Beecher, 2009; Tiemann, 2014), and if the knowledge transfer and spillover theories fit with public utilities¹, then knowledge transfers and spillovers between unrelated CWSs could facilitate SDWA compliance.

Compliance with SDWA requirements and standards is used by regulators and researchers as a measure of CWS performance, focusing on whether the system provides safe drinking water to its population (Rubin, 2013; Tiemann, 2014). In the 1996 Amendments to the SDWA, Congress required the Environmental Protection Agency (EPA) to implement a framework to ensure that new and existing water systems would have adequate technical, managerial, and financial (TMF) capacity (or “capability”) to further the goal of SDWA compliance (Shanaghan et al., 1998).
One dimension of managerial capacity is ‘effective external linkages’, encompassing connections between the system and its service population, governmental agencies, and other water systems (Shanaghan et al., 1998). Subsequent literature has mainly explored the interactions between CWSs and federal, state, and local regulators and has found that compliance increases with more interactions and linkages to regulatory agencies (Ottem et al., 2003; Mullin, 2009; Grooms, 2016). Other scholars (Switzer, 2017; Montgomery et al., 2018) explored the relationships between compliance and public participation and showed that increased linkages between CWSs and their service population were correlated with increased CWS SDWA compliance. The interrelationships and knowledge transfers between unrelated systems have only been explored in international studies (Leinert et al., 2006; Meene et al., 2011; Wehn and Montalvo, 2018), and the findings have pointed to an increase in the perceived ability of operators to complete their job and provide high-quality drinking water when they have more inter-operator interactions. However, due to the differences between U.S. and international regulatory and water system structures, far less is known about the potential role of various water system interactions in terms of knowledge spillovers and formation of regional advantages in the U.S. context, including dynamic effects associated with learning.

¹ It is imperative to note that not all CWSs are Utilities, but the CWS is the regulatory unit. The difference between CWSs and Utilities is discussed in more detail in Section 3.2.

1.2 Research Questions

By bringing organizational learning’s theory of inter-organizational learning, innovation systems’ theory of knowledge transfer, and agglomeration economics’ theory of knowledge spillovers into CWS compliance research, this research addresses the knowledge gap concerning the role of inter-operator interactions in CWS SDWA compliance by exploring a series of interconnected hypotheses. The primary hypotheses of the study aim to answer the primary research question: what is the nature of regional advantages for inter-organizational learning, knowledge transfers, and knowledge spillovers, which facilitate CWS SDWA compliance? This is an investigation of whether the theories about knowledge transfers and spillovers extend past the previous research, which showed benefits for private organizations. It focuses on the regional advantages created by the local culture of knowledge sharing and accounts for the location of the CWSs. However, since no research has explicitly explored knowledge transfers and spillovers between CWS operators within the U.S. context, two other research questions need to be addressed first.

CWS operators in the U.S. are not all the same, as there are key structural differences in their employment that reflect the CWS. CWSs are defined by the EPA (2020) as public water systems that supply the same population of 25 or more people year-round. This definition places the small mobile home park system that serves 25 people in the same category as the New York City CWS serving over 8 million people. Due to this heterogeneity of CWSs, there are differences in the CWS operators’ employment and backgrounds.
In Michigan there are three identified types of CWS operators: Utility operators (operators fully employed by a utility), Contract operators (operators employed by a consulting/engineering firm to run CWSs they do not own), and Non-Affiliated operators (operators not employed by a utility or contracting firm). These differences lead to the endogenous research question: in what manner does the operator type, professional engagement, or education background lead to greater interactions? The Utility and Contract operators would be theorized to have greater professional engagement, and thus more interactions with outside CWS operators, than the Non-Affiliated operators. Professional engagement through water organizations would also be a possible avenue to explain heterogeneity in the number of inter-operator interactions, as operator membership in a water-related professional group should increase the number of inter-operator interactions. Previous research (Teodoro, 2014; Shahr et al., 2019) has pointed to educational attainment being related to both performance and professional engagement, where college-educated individuals are more likely to be professionally engaged. This research investigates these questions and hypotheses to ensure that the relationship between performance and knowledge transfers and spillovers is not characterized by spurious correlations among some of the structural features of the operator’s background or employment.

The direct geographic influence of nearby systems and regional culture needs to be explored in order to understand whether inter-organizational knowledge spillovers are based around spatial proximity of CWSs. Tobler’s (1970) first law of geography states that nearby things are more similar than things that are distant. This is important because without a spatially explicit investigation, the results could be missing regional advantages and local culture.
Organizational learning (Levitt and March, 1988; Wang and Wang, 2012) and agglomeration economics (Saxenian, 1996; Rosenthal and Strange, 2004) theoretical investigations have found that the amount of knowledge exchanged between organizations is dependent on the local culture. Based on the fundamental law of geography that spatial proximity matters, and the previous findings of organizational learning and agglomeration economics, the spatial component representing local culture needs to be investigated to ensure the results are not skewed by heterogeneity in local knowledge transfer and spillover culture. The CWS literature has tended to use counties as the spatial unit, as the county is the highest level of spatial resolution for CWSs tracked by the EPA (Rubin, 2013). Based on the different types of operators, there should be differences in the spatial extents of their interactions. This leads to the first spatially explicit question: How does the type of operator (Utility/Contract/Non-Affiliated) impact the location (county) where their primary interactions with other operators occur? Utility and Non-Affiliated operators typically operate only one or a couple of CWSs embedded in a single county, which leads to the hypothesis that these types of operators will primarily interact with other operators within their own county. Contract operators, by contrast, typically run many systems that span multiple counties, leading to the hypothesis that these operators will have more inter-operator interactions with operators of CWSs ‘outside the county.’ The second research question for the spatial investigation assesses Tobler’s (1970) first law of geography for CWS inter-operator interactions: how are CWS reported interactions spatially autocorrelated? If the first law of geography holds for CWS inter-operator interactions, then the number of CWS operator interactions reported will be similar based on spatial proximity.
The endogenous hypotheses and the spatial hypotheses are crucial for appropriate investigation of the primary research question. Without understanding the operator-specific characteristics or the spatial autocorrelation of the interactions, this research would be at risk of missing the reality of the impact of inter-operator interactions on CWS performance. This research investigates all the components that lead to the number of interactions and then focuses on the impact of inter-operator interactions on CWS performance.

1.3 Significance of the Study

The intellectual merit of this work is to elaborate on the existing theoretical foundation for understanding the relevance of knowledge transfers and spillovers in the domain of CWS capacity development, as well as research on determinants of CWS performance, including SDWA compliance. The research will expand organizational learning’s theory of inter-organizational learning, innovation systems’ theory of knowledge transfers, and agglomeration economics’ theory of knowledge spillovers through the investigation of a new domain of CWS performance based around inter-operator interactions. It departs from much of the previous research on private industries that measures performance through revenue or profit, because these are not the main goal of CWSs. A CWS’s main goal is to provide safe and reliable drinking water to its service population, which implies meeting all federal regulatory standards. This departure from the previous literature offers the opportunity to expand the knowledge transfers and spillovers theories from a sole focus on private organizations to include the resource-based and highly regulated sectors. Measuring performance as regulatory compliance within the contextual framework of these theories is a departure from the norm and could expand the research to start including the public services sector.
It furthers agglomeration economics’ theory of knowledge spillovers through the exploration of geographically immobile organizations, which have yet to be explored. Although the research detailed here examines CWS operators, other organizations (both public and private) that deal in utilities, public service provision, or natural resource usage and that exhibit traits of spatial stationarity may be able to use results from this research to better understand how they can increase their performance.

The broader impacts of this research include informing drinking water policymakers and stakeholders in terms of identifying alternative means of developing the capacity of CWS operators. The alternative mechanism for CWS operator learning, whether in proximate or isolated systems, can help provide safe drinking water. Through presenting and analyzing operator-specific perspectives, this research should help increase the general understanding of knowledge transfers and spillovers as tools for increasing performance of systems.

1.4 Road Map to the Dissertation

The rest of the dissertation will follow the structure explained here. Chapter two first provides extensive background on water-related research, regulation, and TMF capacity. Chapter two then shifts into a literature review of organizational learning, innovation systems, agglomeration economics, and all the research (primarily international) that has looked at operator interactions. Following the literature review, the conceptual and theoretical framework is laid out, and the hypotheses of the research are explained. Chapter three focuses on explaining the study area, survey and interview data collected, and all other “outside” data collected for this research. In the data chapter, previous CWS research findings and variables are addressed further. Chapter four outlines the methods used to investigate the hypotheses.
Chapter five presents the results of the research and addresses how they support or contradict the research hypotheses. Chapter six is the discussion and conclusion, which describes what this research contributes to the overall body of literature, policy recommendations, and directions for future research.

CHAPTER 2: Literature Review and Theoretical Framework

2.1 Introduction

The main objective of this dissertation is to explore whether knowledge sharing and spillovers between CWS operators can yield better rates of SDWA compliance. This chapter aims to provide key background, theoretical, and literature context to the connections between SDWA compliance and CWSs. First, the chapter provides key background context about water, and then outlines what makes water and CWSs a unique and important research topic. It next explains the regulation of CWSs in the United States. Then, this chapter defines knowledge and its instrumental role in increasing the performance of organizations by explaining how knowledge research is a fundamental component of economic geography within three key theoretical domains: organizational learning, innovation systems, and agglomeration economics. This chapter highlights both the gaps in the literature and the benefits of extending these theories to better encompass the distinctive context of CWSs. The chapter concludes by outlining a novel framework for the role of knowledge transfers and spillovers between CWS operators and providing the foundation for this dissertation’s primary, endogenous, and spatial hypotheses.

2.2 Shape of Water (Background Context)

In 2010, the United Nations (U.N.) under Resolution 64/292 recognized access to clean water as a fundamental human right (U.N., 2018). The U.N. determined water access rights need to be sufficient in quantity and quality for safe consumption (Oliveira, 2017). The demand for and supply of water resources vary around the world in both quantity and quality (Zimmerman et al., 2008).
Zimmerman et al. (2008) found that North and South America contain only ~14% of the world’s population but ~41% of the world’s freshwater resources. Further, there are extreme regional variations in water quality (Rickwood and Carr, 2009). Some of these variations are based around natural environmental features, while others stem from human behavior.

The Southwest U.S.’s struggle with arsenic pollution is an example of how both adverse natural environmental and anthropogenic factors impact water quality. Arsenic is a natural chemical element that is found in a variety of mineralized granitic, volcanic, and sedimentary rocks and is a known carcinogen when dissolved in water (Sanchez, 2017). In the Rocky Mountains and Interior Plains of the U.S., it is a commonly found element that naturally infiltrates groundwater systems and impacts the source water quality (Sanchez, 2017). However, the disruption of arsenic-bearing rocks by human mining activity in the Southwest has exacerbated and expedited the natural processes, raising the level of arsenic in groundwater (Sanchez, 2017).

Management and research of water typically focus on quantity and quality issues for either the raw water supply or the treated drinking water. The raw water supply refers to the rivers, streams, oceans, lakes, aquifers, and any other type of water that is not intended for drinking use without treatment (CDC, 2009; Sensorex, 2021). In contrast, drinking water is the water that is extracted from the raw water supplies and treated for human consumption (CDC, 2009; Sensorex, 2021). The two are intricately tied together; for example, degradation of the raw water supply by pollution can make drinking water treatment more difficult, while overextraction for drinking water can deplete the raw water supply.
The diverse properties of water lead numerous academic fields to investigate the different types of water and require water research to be “transdisciplinary,” where answers require numerous different perspectives ranging from the physical to the social sciences (Krueger et al., 2016). Brelsford et al. (2020) point to the main delineation in academic research: many engineers and physical scientists focus on the hydrology and water quality of the raw water resources, while social science research has focused on the human management of raw water resources or drinking water systems. Similarly, water management agencies employ a slew of different types of physical scientists (e.g., biologists and ecologists), economists, planners, engineers, and hydrologists to manage water systems (Walker, Loucks, and Carr, 2015).

While all these different fields are engaged in water research and management, it is important to understand the multiple forms that water can take, including its physical properties and its role as an economic good. Beecher (2015) illustrated how water in its different forms can fit into each of the four categories of economic goods. Research on the raw water supplies is different from the research on bottled water, which is different from research on public water systems. Most importantly for this framework, public water systems represent a toll or club good: an economic good that is non-rivalrous (use of the good does not, up to a point, deplete it) and excludable (people can be denied access to it) (Beecher, 2015). As a toll good, water provided by CWSs is excludable, as access can be limited, but non-rivalrous, as the use of the water system by one consumer does not deplete the system or limit the use by other paying consumers (Ostrom, 1990; Beecher, 2015). The understanding of water as a toll good related to human health and consumption is an interesting research area.

2.3 Drinking Water Systems

Drinking water systems are entities that provide water and ensure both the quantity and quality of water for human consumption (National Research Council & Safe Drinking Water Committee, 1984). Drinking water systems are holistic, including the raw source water supply, the treatment (treatment plants), the distribution infrastructure, and the labor (human capital) that provides safe drinking water to the final users (National Research Council & Safe Drinking Water Committee, 1984). Water systems can be owned by governmental or non-governmental entities, and the distinction depends on the governing/oversight bodies regulating the water system sector in the defined region. Water systems in each of these areas can have natural space-based geographic and infrastructure features that provide advantages for the quantity, quality, or delivery efficiency of drinking water.

Drinking water systems typically obtain their water supply from either surface water or groundwater (CDC, 2009; Sensorex, 2021). Surface water is the water that collects on the ground (stream, river, lake, reservoir, or ocean) and can be obtained more easily than groundwater, which is obtained by drilling wells into the water table (CDC, 2009; McLachlan et al., 2017; Sensorex, 2021). Groundwater is typically of higher quality than surface water because it is not as exposed to the surface-level pollutants from human activity, and the rock and sediment help filter some of the pollution before it contaminates the groundwater supply (McLachlan et al., 2017; Sensorex, 2021). However, this might not always be true, as in some places (e.g., the arsenic example in Section 2.2) there are greater levels of naturally occurring contaminants that negatively impact the groundwater quality. Typically, surface water exposure and shallow groundwater wells put the supplies at greater risk than deep groundwater supplies, due in part to rising temperatures and droughts (McLachlan et al., 2017).
However, not every system or place has access to reliable groundwater sources, and many have relied on lower-quality surface water due to quantity issues (Sensorex, 2021). For example, between 2000 and 2015, New Jersey switched many of its water supplies to surface water to meet its growing population’s need for water because withdrawals from aquifers (groundwater) were exceeding the natural replenishment or safe yield (New Jersey Department of Environmental Protection, 2017). Regardless of the type of source water, there are geographically based variations in the quantity of either type. For example, Maricopa County, Arizona (home to Phoenix) has experienced water quantity issues: between 2000 and 2020 it had 280 weeks (26%) of drought conditions, which required restrictions on water usage. Meanwhile, Ingham County, Michigan (home to Lansing) had 0 weeks with drought conditions in that same period (Groundworks, 2020). Source water is key to the system, as the quantity available and quality drive decisions on the treatment and maximum usage.

Another key feature of water systems is the treatment plant, a facility that “treats or cleans” the raw water supply in order to provide drinking water that poses no known threats for human consumption (National Research Council & Safe Drinking Water Committee, 1984). Depending on the service population’s water usage and the raw water supply’s quality, the “treatment” can vary. Systems that perform “complete” treatment typically have low raw water source quality, which requires the system to treat the water with multiple chemicals and processes to provide safe drinking water. However, systems with “limited” treatment might have better quality source water that needs only a chemical feed to ensure safe drinking water (Pepper, Gerba, and Brusseau, 2011).
For large populations, the treatment facilities are typically very large in order to treat enough water to meet the user demand, while drinking water systems serving small populations might just be monitoring a chemical feed on a single well (Pepper, Gerba, and Brusseau, 2011). Depending on the local geographies for raw water quality and quantity, as well as the service population, the treatment component of water systems will be different.

After the raw water has been treated, the distribution system is the physical infrastructure that transports the treated water to the final user (National Research Council & Safe Drinking Water Committee, 1984). The distribution system includes the storage of the treated water (e.g., water towers), pipes, valves, fire hydrants, service connections to users, and in some cases pumping facilities (EPA, 2021). A key requirement for distribution systems is a certain amount of water pressure throughout the system to ensure the water flow. Pressure can be created either naturally through gravity or through mechanical pumping (National Research Council & Safe Drinking Water Committee, 1984). The layout and choice of distribution system depend on the characteristics of the system’s service area; for instance, topography can provide advantages for using a gravity-based pressure system in places with changes in elevation, or the spatial distribution of users can influence the need for pumping (National Research Council & Safe Drinking Water Committee, 1984). More than just the topography or natural features impact the pressure of the systems, as there are variations in the distribution infrastructure between places based on the density of the service population.
The impact of population density can be seen through a comparison of drinking water systems in Houston, Texas and Chicago, Illinois. The city of Chicago, with a population of ~2.7 million people, is only slightly larger than the city of Houston at ~2.4 million people (U.S. Census Bureau, 2019). The two cities are substantially different in their population density: Houston has a population density of ~4,000 people per square mile, and Chicago has a population density of ~12,000 people per square mile (U.S. Census Bureau, 2019). Their primary drinking water systems differ in the amount of distribution infrastructure, as Houston has an estimated 7,000-plus miles of water pipelines (Molly, 2021), while Chicago has only an estimated 4,200 miles of water pipelines (Corley, 2016). Fewer miles of pipeline infrastructure can benefit systems as the infrastructure begins to age, as there is less of a burden on replacement and monitoring of the distribution system.

The final component of drinking water systems is the human/labor capital, which comprises the management and operations of the system. Drinking water systems are multifaceted, requiring continued operations of treating and distributing the water, as well as financial health to ensure the system can cover the costs of its operations (National Research Council & Safe Drinking Water Committee, 1984). In many large systems there are multiple operators for treatment and delivery and employees entirely dedicated to the financial component of the systems, while for many systems serving small populations, the human capital component could be just a single person who handles the extraction, treatment, delivery, and financials (Blanchard and Eberle, 2013).
Operating water systems is a complex task that requires competence in the extraction and management of raw water supplies and in the treatment and distribution components of the drinking water system (Shanaghan et al., 1998; Beecher, 2009; Tiemann, 2014). The management and operations of systems need to be able to assess the raw water supply for quantity and quality, conduct the appropriate treatment techniques, and ensure that the distribution system is delivering safe and reliable drinking water to its final users at a cost that ensures both the financial health of the system and affordability (EGLE, 2020). The U.S. geographic context matters for the quality and quantity of human capital, as the EPA has found that many small rural communities lack a population with the necessary skillsets to be effective drinking water system managers, while larger metropolitan areas have greater potential for the necessary human capital (Blanchard and Eberle, 2013). The water sector can be described as pluralistic, where there are multiple stakeholders involved in both water delivery and consumption (Beecher, 2009). The pluralistic nature of the U.S. water sector can be seen through the amalgamation of governing and regulatory bodies, water utilities and systems, different users’ water uses (public supply, fire protection, industrial use, irrigation, energy, recreation), customers, and advocates (public health, environment, human rights), all with direct and sometimes competing interests in the sector (Beecher, 2009) regarding the efficient and equitable distribution of drinking water.

2.4 United States Regulation of Public Water Systems

Federal oversight of the U.S. water sector stems from the Safe Drinking Water Act of 1974 (SDWA) and its subsequent amendments defining the EPA’s regulatory authority over CWSs (Rubin, 2013; Tiemann, 2014).
Prior to 1974, numerous studies showed that there were widespread water quality problems that posed health risks to many Americans based on “poor operating procedures, inadequate facilities, and uneven management of water supplies for all sizes” (Tiemann, 2014, p. 2). Fundamentally, the goal of the SDWA was to regulate the drinking water quality of water systems serving 25 or more people and ensure that the delivered water has no known adverse impacts on human health (Tiemann, 2014). Failure to provide safe drinking water in both the short and long term can cause any number of negative health outcomes for a healthy population but is even more dangerous for vulnerable populations. It is important to note that the SDWA does not include any sort of economic regulation and is only focused on regulating drinking water quality. To limit SDWA failures, U.S. water quality regulation takes a “multiple-barrier” approach (Office of Water, 2013). The multiple-barrier approach attempts to limit water contamination at several different points in the extraction, treatment, and distribution processes through the regulatory oversight of multiple federal/state/local agencies and organizations, as well as the CWS itself, all of which have a vested interest in the delivery of safe drinking water. The multiple-barrier approach can be seen through the federalism of U.S. water sector oversight. The SDWA sets up a regulatory framework in which the United States Congress sets the legal requirements for water systems’ minimum standards for delivered water quality, maximum contaminant levels (MCLs), treatment techniques, operator certifications, and monitoring and reporting, and authorizes financial assistance (Tiemann, 2014). With the legal standards set, the Federal EPA has substantial discretionary authority to regulate water systems for quality issues and uses ten regional EPA offices for oversight (Water, 2003).
Under the Act, the EPA can delegate primary enforcement to State governments (typically through a specific agency) as long as the State can prove its standards are at least as stringent as the EPA’s (Tiemann, 2014). As of 2020, only Wyoming and the District of Columbia do not have enforcement primacy for their drinking water systems (EPA, 2020). The water system operator is the last of the multiple barriers and is legally responsible for keeping the system in compliance with the SDWA (Office of Water, 2003). From the governance and human side of CWSs, the different oversight levels are multiple barriers that help reduce the chances of catastrophic failure of systems, as there are several different stakeholders that would all need to fail for delivered drinking water to be a threat to human health. The nine individuals indicted in the Flint Water Crisis were each a barrier that failed to ensure safe drinking water for Flint residents. Regulatory standards can vary between different State primacy agencies and the EPA. New York provides an example of how a State may set a more stringent MCL than the EPA. 2,4-Dichlorophenoxyacetic acid (2,4-D) is an organic compound that is used as an herbicide and has been found to cause fertility problems and cancers in humans (NIOSH, 2014). The federal EPA MCL for 2,4-D is 70,000 ppt (parts per trillion), but New York set its MCL at a more stringent 50,000 ppt because it believed the EPA’s MCL still posed too great a risk (Napoli, 2017). As another example, State primacy agencies can regulate contaminants that are not federally regulated. In 2018, New Jersey found Perfluorononanoic Acid (PFNA), a chemical not regulated by the EPA, in 11 different CWSs (Fallon, 2018). PFNA has been linked to liver and immune system diseases as well as negative impacts on fetal and infant growth (Fallon, 2018).
Since the EPA had no MCL for PFNA, New Jersey created its own MCL of 0.013 micrograms per liter, equivalent to 13 ppt (Fallon, 2018). While there is uniformity in basic federal standards under the SDWA, there is still variation between states in the extent to which they make their standards more stringent than the federal minimums. It is completely up to the primacy agency to determine whether the EPA standards are stringent enough, and in some places (like New Jersey), agency policy sets more stringent standards to deliver safe drinking water.

Federalism in water regulation is also illustrated by variations in operator certification requirements. The 1996 SDWA amendments established broad guidelines suggesting qualification standards, enforcement of certification requirements, and certification renewal (Tiemann, 2014). Table 1 shows the EPA minimum requirements for State operator certification programs. The underlying minimum certification program requirements for potential operators are passing an exam showing they have the knowledge to run the system, minimum educational achievement or relevant work experience, and State-required continuing education and training for certification renewal at least once every three years (EPA, 2000; Tiemann, 2014; Oxenford, 2018). Again, the State primacy agencies can determine their own standards for operator certifications as long as they have at least as much rigor as the Federal EPA standards (Tiemann, 2014).

Minimum requirements for State Operator Certification Programs (EPA, 2000)
• Pass an exam that demonstrates the operator has the necessary skills, knowledge, ability, and judgment to run the water system
• Have a high school diploma or general equivalency diploma (GED), or relevant training and experience
• Defined minimum on-the-job experience for each appropriate level of certification. Amount of experience required increases for each classification level. Post-high school education can be substituted for experience. Credit can be given for tangential fields.
• States must establish training requirements for renewal based on the level of certification held by the operator. State renewal cycles must be within 3 years.
Table 1: EPA Minimum State Operator Standards

State primacy agencies have different requirements and rules for approving a water operator certification, as illustrated by the contrast between the Indiana and Michigan operator certification programs in Table 2. Both states follow the minimum operator requirements of education, renewals at least every three years, and an exam to pass. However, they differ in what it takes to pass the exam, as Michigan requires a score of 60% or higher, while Indiana requires 70% or higher. A further commonality is the breakdown of certifications primarily based on system size as measured by population served (EPA, 2016). A major difference comes from the amount of continuing education (training hours) required; Michigan requires only 24 hours every three years (for the largest systems) and mandates that a minimum of 18 of those hours must be focused on training in technical, managerial, and financial (TMF) capacity, while Indiana requires 30 hours but does not mandate hours in specific areas (EPA, 2016).
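The Michigan and Indiana contrast described above can be expressed as a small rule check. This is a minimal sketch that encodes only the values quoted in the text for the largest system classes; the names `RenewalRule`, `passes_exam`, and `meets_renewal` are hypothetical and do not correspond to any state agency's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenewalRule:
    pass_mark: float   # minimum exam score, in percent
    total_hours: int   # continuing-education hours per three-year cycle
    tmf_hours: int     # hours that must cover TMF capacity (0 = no mandate)


# Values quoted in the text for the largest system classes (EPA, 2016).
RULES = {
    "Michigan": RenewalRule(pass_mark=60, total_hours=24, tmf_hours=18),
    "Indiana": RenewalRule(pass_mark=70, total_hours=30, tmf_hours=0),
}


def passes_exam(state: str, score: float) -> bool:
    """Return True if the exam score meets the state's pass mark."""
    return score >= RULES[state].pass_mark


def meets_renewal(state: str, total_hours: int, tmf_hours: int = 0) -> bool:
    """Return True if logged hours satisfy both the total and the TMF minimums."""
    rule = RULES[state]
    return total_hours >= rule.total_hours and tmf_hours >= rule.tmf_hours


# The same operator record can qualify in one state and not in the other.
print(passes_exam("Michigan", 65), passes_exam("Indiana", 65))
print(meets_renewal("Michigan", 24, 18), meets_renewal("Indiana", 24, 18))
```

The sketch illustrates that the two programs are not simply ordered by strictness: Indiana asks for more total hours and a higher exam score, while Michigan constrains what the hours must cover.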
Table 2: Outline of Michigan and Indiana State Operator Certification Standards (adjusted from Oxenford, 2018)

Michigan
Minimum educational experience: • High School Diploma or Equivalent (*important to note: a higher educational background allows the applicant to earn more points on the test) • Pass Exam
Renewal: Must renew every three years with continuing education
Classification | Metric | Renewal requirements
S-1 | > 20,000 population served | 24 or more hours, with minimum 18 being TMF training
S-2 | 4,000 to 20,000 population served | 24 or more hours, with minimum 18 being TMF training
S-3 | 1,000 to 4,000 population served | 24 or more hours, with minimum 18 being TMF training
S-4 | < 1,000 population served | 12 or more hours, with minimum 6 being TMF training
S-5 | NTNCWSs or CWSs with no treatment and limited distribution | 9 or more hours

Indiana
Minimum educational experience: • High School Degree or Equivalent • Score at least 70% on Exam
Renewal: Must complete Continuing Education Hours every 3 years
Classification | Metric | Renewal requirements
WT 1 | < 500 population served | 10 hours
WT 2 | 501-3,300 population served | 15 hours
WT 3 | 3,301-10,000 population served (ground water or purchased water) | 25 hours
WT 4 | 3,301-10,000 population served (surface water) | 30 hours
WT 5 | 10,000-100,000 population served | 30 hours
WT 6 | > 100,000 population served | 30 hours

SDWA compliance for a CWS means that the CWS has delivered drinking water that, by SDWA standards, poses no risk to human health and has appropriately handled all of its administrative duties. Researchers (e.g., Rubin, 2013; Van der Slice, 2011; Allaire et al., 2018) regularly use SDWA violations as a key metric for measuring compliance and system performance. SDWA violations are multifaceted: some violations imply serious and imminent threats to human health, while others involve administrative issues, including monitoring and reporting (Rubin, 2013).
Health-based compliance means that the CWS’s distributed water meets SDWA standards and minimizes health risks, while administrative compliance refers to the tasks of testing and reporting water quality to regulatory agencies/consumers and any other water system management requirements of the SDWA (Rubin, 2013). For example, a CWS would receive a health-based violation if its tested water had lead levels greater than 15 μg/L, a threshold beyond which lead levels threaten the health of the consumer, especially if the consumer is a member of a vulnerable population such as children or the elderly (Tiemann, 2014). An administrative violation could be failure of the CWS to provide a consumer confidence report (CCR) to inform its users about the water quality or failure to submit monitoring or testing results to its primacy agency (Tiemann, 2014). SDWA compliance is crucial to understanding CWSs as it is often used as a measure of performance, and compliance indicates that a CWS is providing safe drinking water (Rubin, 2013; Tiemann, 2014).

2.5 Technical, Managerial, and Financial Capacity

As part of the 1996 SDWA amendments, capacity development was identified as a fundamental tool for ensuring that new CWSs were able to be SDWA compliant before delivering drinking water, as well as for assessing and developing the capacity of existing CWSs (Office of Water, 2013). Capacity here refers to the technical, managerial, and financial (TMF) capabilities of a CWS for achieving SDWA compliance (Shanaghan et al., 1998; Beecher, 2013; Office of Water, 2013). Congress attempted to improve financial capacity and regulatory compliance through the Drinking Water State Revolving Fund (DWSRF), which was established to help provide low-cost sources of financial support for systems to ensure delivery of safe drinking water (Tiemann, 2014).
Financial assistance can be provided to systems that lack TMF capacity as long as the systems can show that additional funding would help the system reach compliance (Shanaghan et al., 1998). The SDWA required the States to put into place new capacity development strategies to ensure any new system would be able to reach compliance before being allowed to operate (Shanaghan et al., 1998). There are close to 50,000 CWSs in the U.S. serving 95% of the population, which are run by a variety of different organizations (municipalities, states, private companies, non-profits) (EPA, 2020). This patchwork of systems reflects differences in organizational routines as well as compliance.

[Figure 1 plot: Number of Community Water Systems (y-axis, 49,000 to 54,000) by year (x-axis, 2000 to 2020)]
Figure 1: Changes in the Number of Community Water Systems in the U.S. between 2000 and 2019.

As seen in Figure 1, a downward trend is evident, with 3,751 fewer CWSs in 2019 (Q3) than in 2000 (Q4) (SDWIS, 2019). The trend has been attributed in part to the capacity requirements, which limited the creation of new systems that lacked TMF capacity (Tiemann, 2014). The assessment of compliance and funding has led to some consolidation of existing CWSs, which resulted from incentives under the SDWA.

Figure 2: Technical Managerial and Financial Capacity Venn Diagram from Shanaghan et al. (1998)

In supporting guidance, the EPA conceived of TMF capacity best practices as a Venn diagram in which the capacity areas overlap and all parts of the system need to work together to ensure compliance. Figure 2 (Shanaghan et al., 1998) shows this Venn diagram of the key components in each TMF area and how they cut across domains (Shanaghan et al., 1998; Office of Water, 2013). Technical capacity refers to the physical and operational ability of a CWS to comply with both Federal and State quality and quantity regulations (Shanaghan et al., 1998).
Technical capacity typically focuses on source water, infrastructure, and technical knowledge and is most associated with the water operator’s extraction, treatment, and delivery tasks (Shanaghan et al., 1998). Managerial capacity is the ability of a CWS to conduct its affairs in a manner enabling the system to achieve and maintain compliance with both Federal and State regulations (Shanaghan et al., 1998). Managerial capacity is commonly associated with the system organization and administrators. Financial capacity refers to the ability of the public water supply system to acquire and manage sufficient financial resources to allow the system to achieve and maintain compliance with state and federal drinking water regulations (Shanaghan et al., 1998). Financial capacity ensures that the system has good credit and ample revenue (Shanaghan et al., 1998). Each area is key to the successful provision of drinking water under the SDWA. As discussed in Section 2.3, in larger systems there may be one or more specific employees for each task, but many small systems have only a single administrator who is also the operator (Blanchard and Eberle, 2013). Even in larger systems there is still great overlap between the areas; for example, changes in treatment techniques can be an aspect of technical (changes in water chemistry), financial (new costs associated with treatment), and managerial (employees that can handle the tasks) capacity. The capacity development framework suggests that one way CWSs can build capacity is by establishing and maintaining ‘effective external linkages.’ Effective external linkages are multi-dimensional and can represent interactions between the CWS and the service population, between the CWS and the local or state governments (or primacy agencies), or between two separate CWSs (Shanaghan et al., 1998; Office of Water, 2013).
These effective external linkages are one aspect that helps a system achieve capacity, which in turn should increase the system’s ability to comply with SDWA regulations. Academic research has explored the effect on performance (as measured by SDWA compliance) of linkages between a CWS and its service population and governing/regulating bodies (Montgomery et al., 2018; Grooms, 2016; Ottem et al., 2003; Mullin, 2009). Montgomery et al. (2018) investigated the linkages between the CWS and the service population, finding that greater stakeholder attention and participation increase the compliance of CWSs regardless of ownership. Further, Montgomery et al. (2018) found that public participation and attention are even more impactful on large CWSs than on small CWSs. Grooms (2016) found that public discourse deters compliant systems from SDWA non-compliance but does little to help a non-compliant CWS get back into compliance. Other studies have connected SDWA compliance and increased linkages between the CWS and the governing/regulatory bodies (Ottem et al., 2003; Mullin, 2009; Grooms, 2016). Ottem et al. (2003) showed that the SDWA compliance rate of small systems (<3,300 population served) differed depending on whether the system was in closer spatial proximity to its regulatory office and whether it was more involved with its regulatory body. Mullin (2009) found correlations between the SDWA compliance of one type of CWS (special purpose water districts) and its connections to its governing bodies. This literature has empirically investigated the linkages between CWSs, their service populations, and their governing bodies, showing that the “effective external linkage” model described by Shanaghan et al. (1998) lines up with SDWA compliance for two of the possible linkages. However, the literature has yet to systematically explore in the U.S. context the relationship between SDWA compliance and effective external linkages between unrelated CWSs and their operators.
Scott and Greer (2018) researched inter-CWS interactions in the form of personnel sharing in special water districts in Houston, Texas and found that systems sharing a groundwater supply for source water, systems purchasing their water, and systems with high outstanding debt are the most likely to share personnel. While Scott and Greer’s (2018) findings are useful for understanding the structural features that lead to personnel sharing, they did not explore the relationships between these external linkages and SDWA compliance, such as how shared operators or administrators relate to SDWA compliance. Further, the lack of any connection to performance, and of any assessment of interactions between CWSs that do not share personnel, adds a layer of complexity and leaves a substantial research gap.

2.6 Theoretical models for Knowledge Transfer

2.6.1 The Nature of Knowledge

Knowledge is a strategic and critical resource that is used by organizations (governmental, private, non-governmental) to achieve their goals (Goswami and Agrawal, 2018). Knowledge differs from information because information is the facts provided or uncovered, while knowledge is the utilization of information that allows for the practical understanding of things (Goswami and Agrawal, 2018). Knowledge further divides into two categories: codified and tacit knowledge (Gertler, 2003). Codified knowledge is the more formal or systemic knowledge that is easy to transfer in written forms (MacKinnon and Cumbers, 2007); simple examples of this type of knowledge are car manuals or furniture assembly instructions. Tacit knowledge is the knowledge that comes from direct experience or expertise that is not easy to communicate through writing and is often thought of as ‘know-how’ knowledge (MacKinnon and Cumbers, 2007).
Examples of tacit knowledge are ‘how to throw a perfect curveball’ or ‘which method is most appropriate for my research question’, or any type of knowledge that requires activities and experience to perform the task. Gertler (2003) points out that economic geography’s exploration of knowledge sharing (tacit or codified) has utilized theories from organizational learning, innovation systems, and agglomeration economies.

Knowledge transfer can be defined as the process of organizations learning from within their own organization or from other unrelated organizations (Easterby-Smith et al., 2008). Organizational learning occurs when organizations improve their routines through improving their internal knowledge management practices or through obtaining knowledge from an unrelated organization (Levitt and March, 1988). Organizations improving their output based on inter-organizational learning make up the key mechanisms of knowledge transfers in innovation systems (Easterby-Smith et al., 2008) and knowledge spillovers in agglomeration economies theories (Marshall and Marshall, 1920; Rosenthal and Strange, 2004). The view of knowledge transfers from innovation systems theory is broad, and any knowledge exchanges between organizations are considered knowledge transfers, regardless of spatial proximity, governmental/non-governmental, or inter/intra-organizational distinctions (Easterby-Smith et al., 2008). Conversely, the Marshallian micro-foundations of agglomeration economics theory describe a specific type of knowledge transfer called ‘knowledge spillovers’, which are considered an explicit benefit of knowledge exchanged between organizations based on their geographic proximity (Marshall and Marshall, 1920). Organizational learning, innovation systems, and agglomeration economics theories all point to inter-organizational knowledge transfers as a way to increase an organization’s outputs/performance.
2.6.2 Organizational Learning

Many economic researchers (Levitt and March, 1988; Cherrington, 1994; Youndt and Snell, 2004; Richard et al., 2009) have investigated the fundamental question “how can organizations increase their performance?” because better organizational performance can ensure an organization’s long-term growth, stability, and security. Research (Levitt and March, 1988; Cherrington, 1994; Nieminen, 2005; Lawler III, 2005; Senge, 2006; Cao and Zhang, 2011) has found that organizations can increase productivity, innovation, and output through effective creation, capture, and distribution of their organizational knowledge (tacit or codified) through a process known as organizational learning. Organizational learning is “the process by which an organization improves itself over time through gaining experience and using that experience to create knowledge” (Valamis, 2019). The utilization of knowledge by an organization can be the difference between an organization’s life and death. Organizational knowledge is created through direct experience because, in a learning-by-doing approach, the solutions to problems/issues become part of the organization’s operational routines (Levitt and March, 1988). The output of an organization is driven by its routines. There are three main ways that organizations can increase their organizational learning: (1) Research and Development (R&D) activities, (2) increased training protocols, and (3) inter-organizational knowledge sharing (Levitt and March, 1988). R&D and increased training protocols are intra-organizational learning opportunities because these activities increase the spread of knowledge throughout an organization, while inter-organizational knowledge sharing takes place between unrelated organizations (Levitt and March, 1988). Greater R&D for the organization can lead to stronger practices or products and increase output (Levitt and March, 1988; Peterson and Jeong, 2010).
Dissemination of organizational knowledge through employee training protocols has been found to positively impact an organization’s learning potential (Levitt and March, 1988; Jerez-Gomez et al., 2005). Both R&D activities and training have the typical drawback of increasing upfront costs, as R&D activities and increased employee training require substantial investment (Levitt and March, 1988). Lee and Choi (2015) found that investment in R&D is not seen as a viable solution for small or high debt ratio firms because small and debt-ridden firms cannot afford the high capital costs of R&D activities. Brinkeroff (2006) found that while increases in training and human-resources protocols can increase organizational learning, the additional training and oversight costs are out of reach of small organizations and/or debt-heavy organizations. The third opportunity for growth, inter-organizational knowledge sharing, does not have the same financial drawbacks as R&D or increased training (Levitt and March, 1988; Cao and Zhang, 2011). Inter-organizational knowledge sharing is where one organization learns from the experiences of another unrelated organization (Levitt and March, 1988; Appleyard, 1996; Nieminen, 2005). Inter-organizational knowledge sharing has been of interest to geographers as it takes place in two dimensions: geographic space and organizational network space (Howells, 2002). The geographic space refers to opportunities for firms to learn from one another based on the physical proximity of the firms, while organizational network space is aspatial and driven by the network connections of the organization. Saxenian (1996) attributes the rise of Silicon Valley to the local culture of technology firms, in which both geographic proximity and strong network ties encouraged knowledge sharing between unrelated firms.
The propensity for knowledge sharing between organizations is largely dependent on the local culture and competition between the organizations (Levitt and March, 1988; Nieminen, 2005; Cao and Zhang, 2011). Highly competitive local cultures would result in less knowledge sharing between organizations, while less competitive, more trusting, and more community-focused cultures would encourage the open sharing of information (Levitt and March, 1988; Nieminen, 2005). A second issue with the idea that knowledge sharing opportunities increase organizational learning and knowledge is the specificity of the tasks performed by the organization (Levitt and March, 1988). Knowledge transfer will only work if the firms are facing the same or related problems, because firms are not going to be focused on problems that do not relate to them. A high technology company like Facebook is not going to face many of the same problems that a small locally owned boutique retail store will, which would make knowledge transfers between the two not very productive. However, Facebook and Twitter will face similar problems, and knowledge transfers between the two could increase the output for both companies.

Figure 3: Broad Knowledge Sharing and Firm Performance Model (from Wang and Wang, 2012)

Figure 3 presents the basic model for knowledge sharing between unrelated firms (Wang and Wang, 2012). According to Wang and Wang (2012), firm performance can be measured in two ways: (1) operational performance and (2) financial performance. Both types of performance are intricately tied together, as operational performance captures the increases in customer service, cost management, productivity, quality, and asset management performance, while financial performance captures the profit margins, profit growth, and return on investments (Wang and Wang, 2012). An increase in operational performance could lead to an increase in financial performance and vice versa.
One mechanism hypothesized to increase firm performance is knowledge sharing, which can impact firm performance in two ways: (1) direct impact and (2) facilitating innovation. An example of the direct impact of knowledge sharing between unrelated firms would be one firm educating the other on industry standards, which leads to increased firm output. While knowledge sharing of innovative processes (not industry standard) from one firm to another would also increase firm performance, firms working together can lead to innovation that mutually increases both firms’ performance (Wang and Wang, 2012).

Figure 4: Knowledge Sharing Benefit Example

Knowledge sharing between unrelated organizations can be such a powerful tool to increase performance because, under the proper conditions, it is a low-cost way to increase output (Levitt and March, 1988). Figure 4 shows one advantage of knowledge sharing between unrelated organizations “A” and “B.” Organizations “A” and “B” are in the same sector, and “A” has 300% greater performance than organization “B.” Following the hypotheses in organizational learning (Levitt and March, 1988), innovation systems (Easterby-Smith et al., 2008; Asheim et al., 2011), and agglomeration economies (Marshall and Marshall, 1920; Rosenthal and Strange, 2004) theories, organization “B” should be able to increase performance through a strategic partnership of knowledge transfers with organization “A”. In the knowledge sharing step in Figure 4, the knowledge that organization “A” shares with “B” helps increase “B’s” performance. “B” improves its performance through its linkage with organization “A” without “A” losing any of its performance. This is a key feature of knowledge transfers, as they can help increase the performance of one organization without hurting the other.
2.6.3 Innovation Systems Theory

According to innovation systems theory, one of the main ways that a firm can innovate and experience success is to take advantage of inter- and intra-firm knowledge transfers (Easterby-Smith et al., 2008; Asheim et al., 2011). Intra-organizational learning focuses on the ability of an organization to increase learning, innovations, and output through increases in organizational size and scale (Asheim et al., 2011). Inter-organizational learning stems from the ability of the organization to take advantage of relationships, networks, physical proximity, or other forms of connectivity with outside organizations to solve its tasks (Asheim et al., 2011). However, the effectiveness of both intra- and inter-organizational knowledge transfers varies considerably among organizations (Argote, 2011). The basic premise of knowledge transfer in innovation systems theory is that individual actors and organizations will increase their innovation and productivity through interactions, collaboration, and healthy competition, a premise supported by numerous studies (Wehn and Montalvo, 2018). Gilbert and Cordey-Hayes (1996) showed that inter- and intra-organizational knowledge transfers between UK bank employees increased bank performance (as measured in new accounts and profit) and the adoption of new innovative banking technologies. Weidenfeld, Williams, and Butler (2010) compared knowledge transfers between two tourist locations on the Lizard Peninsula (UK) and found that increased knowledge transfers led to more new attractions and innovations for the tourism industry. Rodgers et al. (2016) explored knowledge transfers between CPAs and found that their ability to perform successful audits increased with greater experience and interactions with other firms.
Wang and Wang (2012) examined knowledge transfers between high-technology firms in the Jiangsu Province of China and found that firms engaged in knowledge transfers had more firm innovation and greater financial performance than firms that did not engage in knowledge transfers. Wang et al. (2016) provide an explicit predictive model of firm performance based on knowledge transfers in Chinese high-technology firms. Their basic model showed that knowledge transfer practices enriched the intellectual capital of the firm, which then increased firm performance (measured in profit). Hamdoun, Jabbour, and Othman (2018), using questionnaires of companies in Tunisia, found that joint efforts and interactions between manufacturing companies led to better environmental outcomes and innovations. Sedighi et al. (2016), using a survey of 283 employees of different car companies, focused on the perceived benefits and costs of the quantity and quality of knowledge sharing, finding that certain individual and corporate traits (reputation, reciprocity, altruism) impact both the perceived quantity and quality of information exchanged.

The main limitation of innovation systems theory for understanding the role of knowledge transfers between CWS operators in the performance of the system is the lack of quantitative explorations of resource-based sectors’ performance (Soete et al., 2010; Wehn and Montalvo, 2018). Primarily, the studies exploring inter-organizational interactions focus less on the quantitative assessment of performance and more on the qualitative exploration of what factors impact or explain knowledge transfers (Soete et al., 2010).
Pavitt (1984) defined resource-based sectors as the industries whose entire output is based on natural resources, such as fisheries, logging, and water and wastewater, while non-resource-based sectors are made up of industries that do not require the explicit use of natural resources, such as finance, manufacturing, and information technology. In non-resource-based sectors, the fundamental prospect of knowledge transfers leading to greater output can be missed because of the need to account for intellectual property rights, patenting, competition, and other private activities (Wehn and Montalvo, 2018). In the resource-based sectors, these barriers are far less of a concern, as the primary concerns are the efficient and equitable use of natural resources. Some studies have tried to make these connections (see Section 2.6.5, Knowledge Transfers and Water System Operators), but the vast majority do not connect knowledge transfers to the performance of resource-based sectoral organizations.

Further, Soete et al. (2010) assert that one of the main reasons for the research limitations is the difficulty of measuring performance in the resource, public utility, and public service sectors. For most industrial sectors, quantifying performance is easier than it is for natural resource sectors, because the output of the organization can be captured through profits, increased employment, or other financial metrics (Soete et al., 2010). In the banking sector (Gilbert and Cordey-Hayes, 1996), accounting sector (Rodgers et al., 2016), technology sector (Wang and Wang, 2012; Wang et al., 2016), tourism sector (Weidenfeld, Williams, and Butler, 2010), or manufacturing sector (Hamdoun, Jabbour, and Othman, 2018), the dependent variable is an increase in profit, growth of the company, adoption of new technologies, and/or innovations.
In comparison, the natural resource and public utility sectors have monopolistic traits, often requiring quality regulation and either a governmental ownership structure or economic regulation to protect consumers (Beecher, 2013). CWSs are heterogeneous in ownership type, as there are government-owned (local, state, federal) and non-government-owned (publicly traded companies, private ownership, non-profit, ancillary) systems (Beecher et al., 2020). Public utilities are economically regulated to limit their prices, and subsequently their profits, because of their monopolistic characteristics and their status as essential to everyday life (Beecher, 2015). For standards development, EPA, WEF, and AWWA (2013) set a suggested affordability threshold for water and wastewater services of no more than 4.5% of median household income. Raising rates to cover costs may require review by state economic regulators (Beecher, 2015). Utilities want more than a modest profit, which is why they require economic regulation.

CWSs (utility vs. system is discussed in Section 3.2.1.2) in the U.S. are a great vehicle for investigating whether greater knowledge transfers lead to increases in performance, as the goal of CWSs is to provide safe drinking water to the populations they serve and the economic regulation of some CWSs limits their profit potential (Tiemann, 2014). This main goal puts providing safe drinking water ahead of profit accumulation, which allows research on the resource-based, public service, and public utility sectors to use different performance metrics than the primarily profit-driven performance models used for many private sectors. One common way the academic literature measures CWS performance is through SDWA compliance (more in Sections 2.4 and 3.4.2).
By modeling a CWS's SDWA violations as the performance metric against the number of interactions between that CWS and other CWSs, this research might illuminate a feasible and low-cost mechanism (knowledge transfers) to increase performance and thus compliance with drinking water regulations.

2.6.4 Agglomeration Economics

While knowledge transfer in innovation systems theory focuses on the benefits for individual enterprises, agglomeration economics' knowledge spillover theory focuses on the broader regional impacts of knowledge spillovers. In agglomeration economics theory, knowledge spillovers are knowledge transfers that are spatially based phenomena (Marshall and Marshall, 1920; Saxenian, 1996; Rosenthal and Strange, 2004). If a city or region has industries that are successful and have knowledge spillovers, then that area will have regional advantages, attracting and inspiring more successful firms in the same economic sector (Marshall and Marshall, 1920; Saxenian, 1996; Rosenthal and Strange, 2004). The key idea is that complex tasks are solved by different firms sharing information (both formally and informally), which leads to regional advantages in firms' performance (Rosenthal and Strange, 2004). The industry employee or worker is the primary vehicle of knowledge spillovers (Rosenthal and Strange, 2004). One of the main goals of the agglomeration economics work on knowledge spillovers is to explore spatial heterogeneity between regions.

The agglomeration economics literature has shown that knowledge spillovers increase firm and regional productivity because the sharing of information helps solve complex tasks (Rosenthal and Strange, 2004). Jaffee et al. (1993), the cornerstone study, showed the impact of knowledge spillovers by measuring the spatial concentration of patent citations.
They found that patent citations were 5–10 times more likely to come from the same metropolitan statistical area (MSA) as control patents. The key finding in Jaffee et al. (1993) was that patents were more likely to be cited within the country and state of the initial patent, leading to their interpretation of localized regional advantages in firm innovation. Audretsch and Feldman (1996) regressed the spatial concentration of new product introductions on local and industry-specific attributes (one measure is the number of firms in an industry). They found that knowledge-oriented industries have more spatially concentrated innovation activity, leading them to hypothesize a role for knowledge spillovers. Charlot and Duranton (2004) explored knowledge spillovers between workers in French cities to explain the urbanization effects on knowledge spillovers and hypothesized that more communication between workers should increase wages. Through their logit regression model, they found that larger and more educated cities had workers who communicated more, and that this raised their wages. Many studies (Audretsch and Keilbach, 2008; Agarwal et al., 2004; Acs and Sanders, 2012) link success in entrepreneurship to regional knowledge spillovers, under the basic premise that localized knowledge spillovers create hubs of entrepreneurial ecosystems that increase the success rates of new businesses and encourage spin-offs. Typically, these quantitative investigations model knowledge spillovers against performance measured in new businesses, profitability, new patents, or patent citations.

While much of this research is quantitative, there is a substantial qualitative literature that explores how a regional culture of openness and knowledge spillovers between firms is a product of how unrelated firms in a region communicate with and perceive one another. These perceptions can create a regional culture that may lead to regional advantages or disadvantages.
Saxenian (1996) wrote about this extensively in an investigation of the rise of the technology industries in Silicon Valley and on Route 128 (Boston area). Both Silicon Valley and Route 128 were surrounded by some of the best universities and had access to the most up-to-date technology and talent, but Silicon Valley became the premier technological cluster in the US. Saxenian (1996) attributes the rise of Silicon Valley to the regional culture of openness and the knowledge spillovers between firms in the region. Saxenian (1996) even goes so far as to say that the ability of programmers to solve problems on cocktail napkins gave Silicon Valley the regional advantage that allowed it to win the technological race.

The agglomeration economics theory of knowledge spillovers has yet to explain CWS performance because (1) CWSs are practically geographically immobile (Beecher, 2009; 2013), and (2) measuring CWS knowledge spillovers and performance is difficult (Rosenthal and Strange, 2004). In contrast to other sectors of the economy, public utilities and CWSs are geographically immobile (Beecher, 2009; 2013) and cannot move to areas where knowledge spillovers are frequent in order to increase their performance. This immobility may explain natural regional advantages based on the location of utilities and systems, where regional cultures facilitate a system's or utility's performance. Many researchers (Rubin, 2013; Teodoro and Switzer, 2016; Grigg, 2018) point to the complexity of the water operator's task. If operators can solve their tasks, then these systems can achieve higher performance in health and administrative arenas. No studies have attempted to explore the regional knowledge spillover landscape for utilities or CWSs, in part due to this immobility. Other geographically immobile industries (fisheries, utilities) could benefit from the exploration of this research gap.
The second major reason agglomeration economics theory has yet to answer this research's primary question is that it is difficult to measure knowledge spillovers (Rosenthal and Strange, 2004) or CWS performance (Beecher, 2013). As noted earlier, studies have measured knowledge spillovers by looking at patent filing locations (Jaffee et al., 1993), firms in an industry (Audretsch and Feldman, 1996), or new firms (Agarwal et al., 2004; Audretsch and Keilbach, 2008). However, these measures act as proxies because, outside of directly surveying organizations, there is no existing information source on how much unrelated firms talk with one another (Rosenthal and Strange, 2004). While proxies for interactions might work in private sectors, they are not applicable to water utilities, because patents and new water systems (spin-offs) in a region are both rare events and would not represent knowledge spillovers. However, by adopting the survey methods employed in some of the agglomeration literature (Rosenthal and Strange, 2004), surveys that ask water operators directly about their interactions with unrelated operators could explain regional variance in SDWA compliance. Surveying operators in this way would depart from typical surveys of water operators, which have asked questions focused on treatment techniques, employees at the water system, degree of operator, or finances (Blanchard and Eberle, 2013; Baum et al., 2015); this conventional approach leaves little knowledge about inter-operator interactions. By adapting these methods, research could better understand the role of knowledge spillovers in sectors where patents or spin-offs are not an applicable measure of knowledge spillovers.
2.6.5 Knowledge Transfers and Water Systems Operators

International studies have applied the innovation systems theory of knowledge transfers to water operators and have pointed to increases in water provider performance, but they did not explore the role of knowledge transfers (or spillovers) in geographic space. From a knowledge transfer perspective, Meene et al. (2011) and Leinert et al. (2006) point out that the tacit knowledge of water operators is important in understanding the efficiency and quality of water system performance. Leinert et al. (2006) found that tapping into the tacit knowledge of Swiss water operators was extremely valuable for scholarly insight into water systems' performance and best practices. Meene et al. (2011) conducted interviews with Australian water operators to capture the tacit knowledge of water provision and found that one of the keys to successful water provision is inter-agency collaboration and protection. Pascual-Sanz et al. (2013) researched Capacity Development Partnerships (CDP), which are international water system development partnerships aimed at strengthening water systems in the developing world. Wehn and Montalvo (2018) explored interactions and knowledge transfer practices within water operator partnerships, an international initiative that pairs operators from different water provision organizations to promote knowledge sharing and build water system capacity. These partnerships typically match a water system operator from a developed country with one from a developing country (Wehn and Montalvo, 2018). Wehn and Montalvo (2018) point to these partnerships as having at least anecdotal evidence of increased capacity for the developing systems.

Two main issues arise from using these international studies. First, Pascual-Sanz et al. (2013) point to the benefit of knowledge transfers between operators but warn that studies exploring these international partnerships run the risk of the 'results' not reflecting the country-specific regulatory authority over drinking water systems. There is a high degree of heterogeneity in how countries regulate water quantity and quality (Pascual-Sanz et al., 2013). This heterogeneity in compliance with national regulations makes measures of performance difficult to compare. Second, these studies do not investigate the facilitation of knowledge transfers by geographic proximity, which limits understanding of knowledge spillovers (Pascual-Sanz et al., 2013). While mentorship and learning from other operators have been researched in international and cross-boundary contexts, there has yet to be an investigation considering the spatial dimensions of local and regional operators' interactions. By taking these approaches and developing them for U.S. regulatory contexts, the question of whether these findings hold up can be explored.

2.7 Conceptual Framework

2.7.1 Broad Conceptual Model for Water System Compliance

Figure 5: Community Water Systems Capacity and Compliance Model

Figure 5 shows the main conceptual model for CWS SDWA compliance (one measure of performance), stemming from the EPA's TMF capacity theory. Researchers have utilized alternative measures of performance, such as resource efficiency (water loss), reliability, and cost (EPA, 2015). However, the primary metric used by both regulators and researchers is SDWA compliance because of the ease of data availability and the encompassing nature of the SDWA (EPA, 2015). In the 1996 amendments to the SDWA, the EPA released its capacity development guidelines, which were flexible enough that states and water systems could work together to build capacity and ensure that systems could meet SDWA public health protection objectives (Shanaghan et al., 1998).
The SDWA is dynamic, changing as science illuminates greater risks to health from contaminants; therefore, the capacity development guidelines focus on continual improvement for drinking water systems (Shanaghan et al., 1998).

Figure 5 summarizes the capacity development framework for achieving SDWA compliance. A system's capacity is largely formed by two main sets of factors: (1) endogenous factors and (2) exogenous factors. Endogenous factors are the features of a water system that are controlled by the system itself or are structural components of the water system, such as system size, ownership status, source water, or staffing. Conversely, exogenous factors are the features of the water system that are outside of its control, such as the socio-economic status (SES) of the population served by the CWS, the local environmental quality, and the proximity to other systems. Both endogenous and exogenous factors help define the capacity of the CWS, which in turn shapes the likelihood of SDWA compliance. High-capacity systems are expected to have fewer violations and be more SDWA compliant than low-capacity systems (Shanaghan et al., 1998).

Research investigating water system compliance typically employs non-parametric regression approaches that use a mix of exogenous and endogenous factors as proxies for capacity. Research has found that structural factors (Berg and Marques, 2011; Rubin, 2013; Allaire et al., 2018), environmental factors (Pennino et al., 2017; Montgomery et al., 2018), the socio-economic status of the service population (Switzer and Teodoro, 2017; McGavisk et al., 2013), and governance/regulation (Ottem, 2003; Mullin, 2009) impact CWS SDWA compliance. Any research exploring the relationships between capacity and compliance will have to control and/or account for these endogenous and exogenous factors to capture CWS capacity and to guard against omitted variable bias or spurious correlations in the models.
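The need to control for both sets of factors can be made concrete with a generic model form. The expression below is only an illustrative sketch, not the specification estimated in this research: the symbols V_i, K_i, x_i, z_i, and the link function g are notation assumed here purely for exposition.

```latex
% Illustrative sketch only: all symbols are assumed notation,
% not the dissertation's estimated specification.
% V_i : a measure of SDWA violations for CWS i
% K_i : the capacity-related factor of interest for CWS i
% x_i : vector of endogenous controls (e.g., system size, ownership, source water)
% z_i : vector of exogenous controls (e.g., SES of population served, environmental quality)
% g   : a link function appropriate to how V_i is measured
\begin{equation}
  g\bigl(\mathbb{E}[V_i]\bigr)
    = \beta_0 + \beta_1 K_i
    + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
    + \mathbf{z}_i^{\top}\boldsymbol{\delta}
\end{equation}
```

In a form like this, if a factor belonging in x_i or z_i is correlated with K_i but left out of the model, part of its effect is absorbed into the estimate of the coefficient on K_i, which is the omitted variable problem that the controls are meant to prevent.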
2.7.2 Community Water Systems and Knowledge Transfers

Because of their distinctive features, U.S. CWSs offer a unique opportunity to test the knowledge transfer and spillover models established in organizational learning, innovation systems, and agglomeration economics theories. One of the big caveats for the inter-organizational knowledge transfer theories is that there needs to be an established line of trust and a local open culture of sharing for knowledge transfer to occur and be effective (Levitt and March, 1988; Nieminem, 2005; Cao and Zhang, 2011). CWS operators are not in competition with one another, and many are members of professional groups that encourage collective problem solving and education for the delivery of safe drinking water. This non-competitive culture meets the main assumption behind the primary hypotheses of increased performance in the knowledge transfer models. A second property highlighted in the knowledge transfer literature is that the organizations need to face similar problems (Levitt and March, 1988). CWSs meet this criterion, as all CWSs are required by law to meet the standards set forth by the SDWA (Tiemann, 2014). While there may be heterogeneity in the exact issues systems face, the underlying motivation to achieve SDWA compliance and meet all federal and state regulations is shared by all CWSs. These two key features make water systems a very interesting industry to explore in the context of knowledge transfers and spillovers.

Figure 6: Knowledge Sharing and Water Systems Conceptual Model Example

The primary hypothesis of this research is that greater knowledge sharing between CWS operators will increase the performance of their CWSs. Figure 6 shows the basic hypothesis in action. Each container represents a CWS, and the fill of each container is the TMF capacity or the performance of the system.
Part one of the figure shows six CWSs of different sizes and varying levels of capacity, with no connections between any of the systems. In part one, if the EPA's theory that system capacity leads to greater SDWA compliance and system performance is true, then system "B" would be more likely to be SDWA compliant than system "A", because system "B" has greater capacity (as seen through the fill). Part two (capacity changes) of Figure 6 shows the same systems but adds connections between three of the systems. In part two, system "A" captures knowledge sharing from the other systems, thus increasing its capacity, and is more likely to be in compliance than "B", which remains isolated with unchanged capacity. The expectation is that the more inter-system linkages there are, the fewer the SDWA violations (i.e., the greater the SDWA compliance).

2.7.3 Primary Hypotheses Model

The primary research question for this dissertation is: what is the nature of the regional advantages for inter-organizational learning, knowledge transfers, and knowledge spillovers that facilitate CWS SDWA compliance? While Figure 6 shows the broad hypothesis of greater knowledge sharing leading to SDWA compliance, Figure 7 shows the two specific primary hypotheses (1 and 2) used to answer the main research question.

• (Prim-1): If spatial structure exists for operator interactions, there are regional advantages based on knowledge spillovers and transfers between community water system (CWS) operators that facilitate Safe Drinking Water Act compliance.

• (Prim-2): "Isolated" or Non-Affiliated operators with fewer interactions are more likely to have SDWA violations/non-compliance than "non-Isolated" operators with more interactions.
Figure 7: Primary Hypotheses on Performance and Knowledge Spillovers

While research investigating CWS compliance using non-parametric regression approaches has found that structural factors (Berg and Marques, 2011; Rubin, 2013; Allaire et al., 2018), environmental factors (Pennino et al., 2017; Montgomery et al., 2018), the socio-economic status of the service population (Switzer et al., 2016; Switzer and Teodoro, 2017; McGavisk et al., 2013), and governance/regulation (Ottem, 2003; Mullin, 2009) appear to impact CWS SDWA compliance, there has yet to be an exploration of how knowledge sharing/spillovers between CWSs impact compliance. Figure 7 shows the three types of operators found in Michigan CWSs: Utility operator, Contract operator, and Non-Affiliated operator. In both the top and bottom sections of the graphic the systems are the same; the only differences are the connections between systems and the greater capacity in the top section. Primary Hypothesis 1 (top) shows that, regardless of the type of operator, operators who are connected to other operators can learn from each other and increase their capacity, which decreases the number of violations. If the top and bottom of the graphic are two different regions, then it is clear that the top region's connectivity increases the overall performance of the CWSs as measured by SDWA violations. Further, the Non-Affiliated operators with inter-operator interactions have higher capacity than they would if they were not connected (as in the lower part). Primary Hypothesis 2 shows that operators who are not connected to other operators are isolated and more likely to have lower capacity and more SDWA violations. The bottom part of Figure 7 shows that the Contract operators and Utility operators have lower capacity when isolated compared to the top of the figure, where they are connected. This is the basis for the dissertation research.
2.7.4 Endogenous Hypotheses Model

To guard against spurious correlations, the study controls for key structural characteristics. One major question concerns the role of operator type in increased interactions. A Utility operator is defined here as a CWS operator who is employed full time by a utility, meaning the operator's employment is through an organization (governmental or non-governmental) whose primary focus is to provide water and which both owns and operates the system(s). A Contract operator is an operator employed by a consulting firm that has a contract to operate a CWS but does not own the CWS's physical assets. These consulting firms may have many operators, and the CWS contracts can be located anywhere in a state. Finally, there are the Non-Affiliated operators, who are "isolated" in the sense that they are not employed by a utility or contract operations firm but are individuals running the CWSs. Typically, Non-Affiliated operators run small systems, and their main employment is not CWS operation. These operators typically will not have access to size or scale in their own systems, so outside knowledge will be required to facilitate their learning. Further, there are differences in educational status, external organizational membership, and certification within and between these groups. The following three endogenous hypotheses need to be explored to understand how endogenous factors relate to the number of reported inter-operator interactions, leading to the endogenous research question: in what manner does operator type, professional engagement, or educational background lead to greater interactions?

• (En-1) Utility or Contract operators will have more inter- and intra-operator interactions than Non-Affiliated operators.
• (En-2) Operators (Utility, Contract, Non-Affiliated) who are professionally engaged through water organization membership and pursuit of continuing education will have more interactions than operators who are not professionally engaged.

• (En-3) Operators (Utility, Contract, Non-Affiliated) with higher certification levels and educational attainment will have more interactions than operators with low levels of certification or educational attainment.

Figure 8: Hypotheses Endogenous Factors and Interactions

Figure 8 shows a visual representation of the endogenous (En) factor hypotheses. In this figure the fill of the CWSs (cylinders) is not the capacity, as in Figure 7, but the number of inter-operator interactions. Hypothesis En-1 focuses entirely on the difference between the types of operators and the number of inter-operator interactions. Because Utility and Contract operators typically have full-time operator employment, while Non-Affiliated operators are less likely to be employed full time as operators, the hypothesis is that Non-Affiliated operators will not have as many interactions as Utility or Contract CWS operators.

Hypothesis En-2 focuses on professional organization membership. A member of a professional water organization (more on professional water organizations in Section 3.3.1), regardless of the type of operator, should have more interactions than operators who have no professional membership. Further exploration of this hypothesis will investigate the involvement of operators in these organizations and how their outside organizational membership and involvement play a role in determining their number of interactions.

Hypothesis En-3 focuses on the difference in educational attainment and certification between operators.
Operators with equal levels of educational attainment or operator certification should have similar numbers of interactions, while operators with less education and a lower certification status should have fewer interactions than operators with greater achievement in either arena. A higher certification level would mean the operator is most likely running a larger system and is therefore more professionally embedded in the operator culture, which would likely increase their interactions. Investigating these hypotheses ensures that the research does not make statements about the role of inter-operator interactions in SDWA compliance based on spurious correlations with endogenous characteristics of the CWS operator.

2.7.5 Spatial Hypotheses Model

The other key research questions that must be addressed before exploring the primary hypotheses are: (1) how does the type of operator (Utility/Contract/Non-Affiliated) affect the counties in which their primary inter-operator interactions take place? and (2) how are CWS reported interactions spatially autocorrelated? The key to answering both of these questions is to explore the roles that spatial proximity between operators and local cultures play in determining the number of inter-operator interactions. These hypotheses are grounded in key points about local culture in the organizational learning and agglomeration economics literature. One of the possible factors driving knowledge transfer between unrelated organizations in organizational learning (Levitt and March, 1988) and in agglomeration economics (Rosenthal and Strange, 2004) is the style of competition between organizations. As noted earlier, water systems have no reason to be in competition, so they should meet that assumption. However, while fierce rivalries and competition between systems are unlikely, there might be spatial heterogeneity in local networks or cultures, and these differences could influence the models based on performance.
Therefore, this research investigates the spatially explicit research questions and the spatial hypotheses. Hypothesis SP-1 states that Utility and Non-Affiliated operators will have interactions that occur locally within the county, while Contract operators will have interactions outside of their localities. Utility and Non-Affiliated operators are assumed to be more localized than Contract operators because, unlike many Contract operators, they are not running systems all over a state. The county is the highest level of spatial resolution for systems tracked by EPA, and previous research (Wallsten and Kosec, 2008; McGavisk et al., 2013; Grooms, 2016; Greiner, 2016; Pennino et al., 2017; Allaire et al., 2018; McDonald and Jones, 2018; Montgomery et al., 2018) has aggregated CWS data to the county level for analyses. Based on the use of counties in previous analyses and on local embeddedness, SP-1 needs to be explored and explained before larger questions can be answered.

• SP-1: Interactions between Utility and Non-Affiliated CWS operators occur primarily with operators in the same counties, while Contract CWS operators are more likely to have interactions with operators outside of the county their systems are serving.

Figure 9: Spatial Hypothesis 1 (SP-1)- County Specific Hypothesis

Figure 9 presents a visualization of the first spatial hypothesis. Unlike the primary or endogenous hypotheses, the internal characteristics of systems do not matter in this figure. Instead, the key is to look at the three-county area and the connections between systems within and across county lines. In all three counties the local overlap is shown by the main inter-operator interactions for Utility and Non-Affiliated operators being with CWS operators within their own county. Conversely, the three contract-operated systems all have connections to operators outside of their county.
While not every connection is expected to follow the exact pattern of the hypotheses, the majority of inter-operator connections should follow this pattern. In both counties B and C there are Contract operators who interact with operators in their own county, while the Contract operator in county A interacts only with operators outside of its county. Further, in county C one Utility operator is connected to another Utility operator outside of its county. This could occur because the operator in county A is physically closer to the operator in county C than any of the other county C operators are. Also, in county C there is a Non-Affiliated operator who is not connected to any other systems, which is another real possibility.

The second spatially explicit question raised by this research focuses on the spatial autocorrelation of inter-operator interactions. Stemming from Tobler's (1970) first law of geography (everything is related to everything else, but nearby things are more similar than distant things), the idea is that CWS operators' reported interactions should be similar based on where they are located. This leads to SP-2: operators near each other will have more similar amounts of inter-operator interactions than operators that are more distant.

• SP-2: Inter-operator interactions have spatial structure such that operators are more likely to interact with each other if their systems are close together in geographic space.

Figure 10: Spatial Hypothesis 2 (SP-2)- Spatial autocorrelation of inter-operator interactions

Figure 10 shows spatial hypothesis 2 (SP-2), with the volume of each CWS (cylinder) representing the number of inter-operator interactions, similarly to Figure 9. Here the most important features are the proximity of the systems to each other (represented by being in the same county) and the similarities in the number of interactions.
County B systems all have high interactions, while county C systems have low interactions, illustrating that nearby systems are more likely to have similar numbers of inter-operator interactions. SP-2 argues that geographic proximity is the driver of interactions more so than the internal characteristics (endogenous hypotheses) or the type of operator.

2.8 Conclusion

In conclusion, the literature and regulatory guidance make a strong case for the role of knowledge sharing and spillovers between unrelated CWS operators as a way of increasing their performance. Through the regulatory oversight lens, operators are encouraged to increase their TMF capacity through 'effective external linkages' (Shanaghan et al., 1998). Studies have investigated two types of connections, between CWSs and the government and between CWSs and the population served (Grooms, 2016; Montgomery et al., 2018), but the connections between CWSs have remained unexplored. Organizational learning theories show that knowledge sharing between unrelated organizations provides the lowest-investment way to increase the knowledge and routines of organizations (Levitt and March, 1988). Innovation systems theory gives a model for knowledge sharing leading to increased performance, but the lack of resource-based sector investigations leaves the applicability of the model to CWSs in question (Soete et al., 2013; Wehn and Montalvo, 2018). The agglomeration economics theory of spatially based knowledge spillovers has shown regional heterogeneity in the performance of firms (Rosenthal and Strange, 2004), but has yet to investigate firms that remain 'geographically immobile.' The direct research on water operators has provided evidence of perceived benefits from international inter-operator knowledge transfers, but it is difficult to use these findings in the context of regulated U.S. water systems (Pascual-Sanz et al., 2013).
A conceptual framework for knowledge transfer for water systems was created by adapting capacity theory from the EPA and adding the role of inter-organizational knowledge exchange between CWS operators as a factor leading to compliance. By testing and addressing the endogenous and spatial research questions, this research can begin to fill the gaps in the literature about the factors causing inter-operator interactions, and then answer the primary research question about the applicability of knowledge sharing and spillovers between CWS operators to increasing the performance of CWSs. Further, this model and the results of the analysis testing its applicability for CWSs can extend to other types of organizations that share the key features: geographic immobility, monopolistic traits, and non-competitive industries.

CHAPTER 3: Study Area, Survey, and Data

3.1 Introduction

This chapter outlines the study area, the survey and interviews, and the outside data used in the analyses of this research. First, the chapter explains the study area through an overview of CWS regulation in Michigan, covering the demographics of CWSs and CWS operators and assessing the benefits and limitations of using Michigan as the study region. The chapter then explains the construction, deployment, representativeness, and results of the novel survey and interview data. The chapter then shifts to explaining how the spatial locations for each CWS in the sample were obtained. Finally, it closes with a discussion of the outside data included in the study and what each piece of data represents.

3.2 Study Area

3.2.1 CWSs in Michigan

3.2.1.1 Michigan SDWA Regulation

In the state of Michigan, the Department of Environment, Great Lakes, and Energy (EGLE) has primacy authority for SDWA enforcement (EGLE, 2021).
Following the federal regulation of the SDWA, Michigan Public Act 399 was enacted in 1976 and gave the Michigan Department of Environmental Quality (DEQ) the power to maintain direct control over public drinking water systems. In 2019, Executive Order 2019-02 restructured the DEQ and renamed it EGLE, with the guiding purpose of the department to “administer the implementation of administrative rules and the conduct of administrative hearings- particularly those that protect Michigan’s air, land, and water, and the public health- by consolidating state functions and responsibilities relating to administrative hearings and rules.” The department provides regulatory oversight for nearly 1,400 CWSs and ~10,000 non-community water supplies (EGLE, 2021). EGLE enforces Public Act 399, which provides regulatory oversight in multiple areas (contaminant levels in the finished drinking water, treatment, sampling, monitoring and reporting, operator certification, and more) to ensure drinking water is safe for human consumption (EGLE, 2021).

3.2.1.2 Overview of Type and Ownership of Michigan CWSs

Data collection for Michigan CWSs primarily took place in 2019 and 2020, which required the research to assess any changes in the CWSs over that time. In 2020 Quarter 4 (Q4), Michigan had 1,380 systems reporting a served population of 7,374,774 people, while in 2019 Q4 there were 1,380 systems serving a reported population of 7,321,942 people. Although the number of CWSs stayed the same, there was a small change in the population served. The 1,380 CWSs in the two quarters were not identical: six CWSs differed, with three new CWSs entering in 2020 Q4 and three CWSs from 2019 Q4 absent in 2020 Q4. These can be seen in Tables 3 and 4 below. It should be noted that all three of the new systems are small ancillary non-governmental systems.
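The quarter-to-quarter comparison described above amounts to a set difference on PWS IDs. The following is a minimal sketch, not the procedure actually used in this research: the six PWS IDs and their populations are those listed in Tables 3 and 4, while the `SHARED-…` identifiers are hypothetical placeholders standing in for the 1,377 systems present in both quarters.

```python
# Sketch: compare two quarterly CWS rosters by PWS ID.
# Assumption: the "SHARED-####" IDs are hypothetical placeholders for the
# 1,377 systems present in both quarters; only the six IDs below are real.

# Systems present only in 2020 Q4 (Table 3), with reported population served.
entered_2020 = {"MI0062955": 150, "MI0066700": 25, "MI0000501": 25}
# Systems present only in 2019 Q4 (Table 4), with reported population served.
exited_2019 = {"MI0006477": 41, "MI0005993": 100, "MI0040479": 70}

shared_ids = {f"SHARED-{i:04d}" for i in range(1377)}
roster_2019 = shared_ids | set(exited_2019)
roster_2020 = shared_ids | set(entered_2020)

only_2020 = roster_2020 - roster_2019  # new in 2020 Q4
only_2019 = roster_2019 - roster_2020  # dropped after 2019 Q4
retained = roster_2019 & roster_2020   # systems kept for the analysis

# Population of the retained systems, starting from the 2020 Q4 reported total.
population_2020 = 7_374_774
retained_population = population_2020 - sum(entered_2020.values())

print(len(roster_2019), len(roster_2020))  # size of each quarterly roster
print(sorted(only_2020), sorted(only_2019))
print(len(retained), retained_population)
```

Working from IDs rather than system names avoids false mismatches when a system is renamed between quarters.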
PWS ID | System Name | IPU Ownership | Population
MI0062955 | Heartland Health Care and Center | Non-governmental Ancillary Healthcare | 150
MI0066700 | The Porches | Non-governmental Ancillary Healthcare | 25
MI0000501 | Beach House Apartments | Non-governmental Ancillary Development Rentals | 25

Table 3: Systems new to the SDWIS and EGLE database in 2020 Quarter 4 that were not in 2019 Quarter 4

The three systems that were included in 2019 Q4 but not in 2020 Q4 can be seen in Table 4. Similarly, these were all small (100 people or fewer) non-governmental ancillary systems. Due to this discrepancy between the systems, this research drops the six total systems and limits the total possible number of CWSs included to 1,377 systems serving a reported population of 7,374,574 people.

PWS ID | System Name | IPU Ownership | Population
MI0006477 | Sunny Crest Youth Ranch | Non-governmental Ancillary Recreation | 41
MI0005993 | Brookdale Apartments | Non-governmental Ancillary Development Rentals | 100
MI0040479 | Pebble Creek Mobile Home Park | Non-governmental Ancillary Development Mobile Home Parks | 70

Table 4: CWSs that were in SDWIS and EGLE database in 2019 Q4 that were not in 2020 Q4

Systems differ in ownership and function, which leads to differences in operations and in commitment to the system. Table 5 outlines all 1,377 Michigan CWSs that were in both 2019 Q4 and 2020 Q4, with the number of systems and population summary statistics.
Ownership | Function | System Type | Total Systems | Total Population | Minimum Population | Maximum Population
Governmental | Primary | Municipal | 438 | 5,190,355 | 40 | 713,777
Governmental | Primary | Township | 216 | 1,664,179 | 25 | 97,513
Governmental | Primary | County | 10 | 131,274 | 50 | 71,500
Governmental | Primary | Special Districts | 25 | 196,739 | 177 | 53,988
Governmental | Primary | Wholesalers | 14 | 0 | 0 | 0
Governmental | Ancillary | Federal | 1 | 150 | 150 | 150
Governmental | Ancillary | State | 10 | 23,181 | 40 | 12,793
Governmental | Ancillary | Local | 9 | 1,942 | 25 | 1,233
Non-Governmental | Primary | Publicly Traded Company | 5 | 9,204 | 50 | 5,535
Non-Governmental | Primary | Independent companies | 3 | 1,505 | 261 | 845
Non-Governmental | Primary | Associations | 52 | 15,541 | 22 | 3,584
Non-Governmental | Primary | Cooperatives | 2 | 1,444 | 621 | 823
Non-Governmental | Ancillary | Mobile Home Parks | 335 | 94,706 | 10 | 2,268
Non-Governmental | Ancillary | Developments, Condos, and Rent | 187 | 30,499 | 16 | 3,200
Non-Governmental | Ancillary | Health care | 48 | 3,658 | 25 | 325
Non-Governmental | Ancillary | Recreation, Religious, Education | 22 | 10,197 | 35 | 3,000

Table 5: Overview of CWSs ownership and function using Beecher et al. (2020) classification scheme

Among the primary CWSs, governmental primary CWSs are far more numerous and serve a much larger population than non-governmental primary CWSs. A “primary” CWS is a system whose primary function is to provide water as a utility service (Beecher et al., 2020). Primary CWSs are ‘utilities’ because their sole purpose is the delivery of safe drinking water. Governmental primary systems are typically run by governmental units to provide water to their local populations (Beecher et al., 2020), while non-governmental primary systems can be either for-profit or not-for-profit. For-profit systems can be CWSs owned by large publicly traded companies (for example, American Water Company) or by individuals who privately own the water distribution business. Not-for-profit systems are primary CWSs owned and operated by owner associations or cooperatives (Beecher et al., 2020). In Michigan, municipally owned systems (village, town, city) make up about 32% of the total systems and serve a little more than 70% of the population. The townships have the third most CWSs, making up about 15.7% of the total CWSs and serving a reported 22.5% of the population.
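Shares of this kind can be recomputed directly from the Table 5 counts. The following is a minimal sketch, assuming the Table 5 totals (1,377 systems and 7,374,574 people) as denominators; the function and variable names are illustrative and do not come from the research itself.

```python
# Sketch: recompute system and population shares from Table 5 counts.
# Assumption: denominators are the Table 5 totals; names are illustrative.

TOTAL_SYSTEMS = 1_377
TOTAL_POPULATION = 7_374_574

# (systems, population) for the four governmental primary types in Table 5.
# Wholesalers are left out because they report no population served.
governmental_primary = {
    "Municipal": (438, 5_190_355),
    "Township": (216, 1_664_179),
    "County": (10, 131_274),
    "Special Districts": (25, 196_739),
}

def shares(systems: int, population: int) -> tuple[float, float]:
    """Return (percent of systems, percent of population), rounded to 0.1."""
    return (
        round(100 * systems / TOTAL_SYSTEMS, 1),
        round(100 * population / TOTAL_POPULATION, 1),
    )

municipal = shares(*governmental_primary["Municipal"])
group = shares(
    sum(s for s, _ in governmental_primary.values()),
    sum(p for _, p in governmental_primary.values()),
)
print(municipal)  # municipal share of systems and of population
print(group)      # the four governmental primary types combined
```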
Across all four types of governmental primary systems, a little more than 97% of the population is served. In contrast, the non-governmental primary systems make up only about 4.5% of the total systems, serving less than 0.4% of the total population. Only the owner-association systems contribute meaningfully to these totals, as they make up close to 4% of the total CWSs and serve 0.2% of the total population. This reflects Michigan’s particular CWS environment, in which there is almost no presence of non-governmental for-profit primary CWSs: only three non-governmental primary independently owned CWSs exist with the sole purpose of water distribution, and only five primary CWSs are owned by publicly traded companies.

Ancillary systems can be broken down into governmental and non-governmental systems where the water service is secondary to the primary activity of the owner organization (Beecher et al., 2020). The ancillary CWSs are not ‘utilities’ because they are owned by organizations whose primary objectives are not solely based around delivering water. Governmental ancillary systems are facilities owned and operated by a governmental organization or unit whose primary purpose is not water service. A clear example is Michigan State University, which has its own CWS even though its primary purpose is higher education rather than water service. Other governmental ancillary systems include federal, state, or local facilities such as military bases, public healthcare facilities, educational facilities, or public housing. Non-governmental ancillary systems likewise provide water to a residential population, but their primary purpose for existing is unrelated to water distribution.
Examples of non-governmental ancillary systems are residential development systems (owned by the developer), condominium and apartment systems, and recreational, religious, educational, and healthcare facilities that also provide drinking water to their residential populations.

The ancillary CWSs show the inverse pattern of the primary CWSs: the non-governmental ancillary systems make up the majority of the ancillary systems and there are very few governmental ancillary systems. The largest type of non-governmental ancillary CWS is mobile home parks, which make up about 24% of the total systems but cover only about 1.3% of the total population. The next largest is the non-governmental ancillary developments (including condominiums, developers, and rental units), which account for about 13.6% of total CWSs and serve less than half a percent of the total population. On the governmental ancillary side there are very few systems, representing ~1.5% of the total CWSs and only 0.3% of the total population. While the ancillary systems are very small, they need to be accounted for in the investigation of the CWSs as they make up about 46% of Michigan’s CWSs.

While there is a big divide between ancillary and primary systems, there is an even larger divide between “wholesale” CWSs and any of the other categories. While many CWSs sell treated water on a retail/wholesale basis to other CWSs (particularly smaller CWSs), wholesale systems as defined by Beecher et al. (2020) are systems that do not directly serve a final population but only sell their treated water to other systems for distribution to the final consumer.
These are easy to identify because their reported population served is “1” or “0.” These systems differ from CWSs that serve final customers as well as other systems: they face less risk of non-compliance because, without end customers, there are fewer requirements in the administrative and monitoring and reporting arenas (EGLE, 2021). In Michigan, there were 14 CWSs that operate only on a wholesale basis. These systems were removed from the sample as they are not a fair comparison to the other systems. This brings the total number of systems explored by this research to 1,363 systems serving a reported population of almost 7.4 million people.

3.2.1.3 Regulatory Units

One of EGLE’s strategies for overseeing so many CWSs is to break the state down into regions and districts composed of counties. There are 8 regions and 25 districts (EGLE, 2021). Figure 11 shows maps of the EGLE community water regions and districts, while Table 6 provides the number of systems and population served in each of the regions/districts.

Figure 11: Michigan EGLE Community Water Regions and Districts (2021)

Region | Region Systems | Region Population | District | District Systems | District Population | Counties
Lansing | 139 | 692,218 | District 11 | 45 | 286,471 | Genesee
Lansing | 139 | 692,218 | District 12 | 49 | 335,438 | Clinton, Eaton, Ingham
Lansing | 139 | 692,218 | District 14 | 45 | 70,309 | Gratiot, Lapeer, Shiawassee
Bay | 172 | 468,191 | District 21 | 53 | 284,255 | Arenac, Bay, Saginaw
Bay | 172 | 468,191 | District 22 | 64 | 132,782 | Alcona, Clare, Gladwin, Iosco, Isabella, Midland, Ogemaw, Oscoda
Bay | 172 | 468,191 | District 23 | 55 | 51,154 | Huron, Sanilac, Tuscola
Jackson | 152 | 576,404 | District 31 | 53 | 128,633 | Jackson, Lenawee
Jackson | 152 | 576,404 | District 32 | 51 | 391,013 | Hillsdale, Monroe, Washtenaw
Jackson | 152 | 576,404 | District 33 | 48 | 56,758 | Livingston
Warren | 202 | 3,790,076 | District 41 | 43 | 1,823,745 | Wayne
Warren | 202 | 3,790,076 | District 42 | 48 | 946,317 | Macomb, St. Clair
Warren | 202 | 3,790,076 | District 43/44 | 111 | 1,020,014 | Oakland*
Kalamazoo | 221 | 537,486 | District 51 | 38 | 225,250 | Barry, Kalamazoo
Kalamazoo | 221 | 537,486 | District 52 | 65 | 62,420 | Allegan, Branch
Kalamazoo | 221 | 537,486 | District 53 | 64 | 127,039 | Berrien, Van Buren
Kalamazoo | 221 | 537,486 | District 54 | 54 | 122,777 | Calhoun, Cass, Saint Joseph
Grand Rapids | 152 | 880,559 | District 61 | 75 | 316,743 | Mecosta, Muskegon, Newaygo, Oceana, Ottawa
Grand Rapids | 152 | 880,559 | District 62 | 77 | 563,816 | Ionia, Kent, Montcalm
Cadillac | 206 | 173,426 | District 71 | 50 | 47,396 | Benzie, Lake, Manistee, Mason, Missaukee, Osceola, Wexford
Cadillac | 206 | 173,426 | District 72 | 57 | 46,391 | Alpena, Cheboygan, Emmet, Montmorency, Presque Isle
Cadillac | 206 | 173,426 | District 73 | 53 | 47,728 | Grand Traverse, Leelanau, Roscommon
Cadillac | 206 | 173,426 | District 74 | 46 | 31,911 | Antrim, Charlevoix, Crawford, Kalkaska, Otsego
Marquette | 117 | 199,348 | District 81 | 75 | 107,527 | Baraga, Gogebic, Houghton, Iron, Keweenaw, Marquette*, Ontonagon
Marquette | 117 | 199,348 | District 82 | 42 | 91,821 | Alger, Chippewa, Delta, Dickinson, Luce, Mackinac, Menominee, Schoolcraft

Table 6: Overview of Systems, Population, and Counties for Michigan EGLE Community Water Regions and Districts (2020 systems/population numbers). *Slight variation from EGLE’s districts: Marquette County is split between Districts 81 and 82, but this research assigned it to District 81. Oakland County has two full districts (43 and 44) to itself, which are condensed into one for this research.

The largest region by number of systems is the Kalamazoo region with 221 CWSs serving ~540,000 people; however, the Warren region has 202 CWSs serving over three million people. The Warren region is home to the three most populous counties in Michigan (Wayne, Oakland, Macomb), and no other region comes close to that large of a population served; thus, the Warren region skews the average population served to 921,320 when included. Removing the Warren region decreases the average population to 503,948 people, which is more in line with the rest of the state.
On the other hand, the Marquette region, which represents the 15 Upper Peninsula counties, has the lowest number of systems at 117 serving ~200,000 people, while the Cadillac region has almost double the number of systems (206) but only serves ~175,000 people. The Lansing, Bay, Jackson, and Grand Rapids regions all fall near the middle in both number of systems and population. While these regions serve similarly sized areas and populations, there is far more heterogeneity among the Michigan EGLE districts within these regions. The largest district in population served is District 41 (Wayne County), serving over 1.8 million people from 43 CWSs, while the smallest district in population is District 74, serving only ~32,000 people from 46 systems.

There is a clear difference between the urban and rural districts when it comes to number of systems and population served. This is important to account for, as Marcillo and Krometis (2019) found an urban and rural divide in SDWA compliance where rural CWSs had a higher prevalence of SDWA violations than urban systems. To control for this divide, the USDA Rural-Urban Continuum Codes were attached to each district’s counties to define the rural and urban districts. The rural-urban continuum codes are numbers 1 through 9 representing the urbanness or ruralness of counties, where codes 1 through 3 represent metropolitan counties by metro-area population (1 = 1 million or more, 2 = 250,000 to 1 million, 3 = fewer than 250,000) (USDA, 2020). Codes 4-9 range from counties with “Urban population more than 20,000 and adjacent to a metro area” to “completely rural with urban populations under 2,500 people”, respectively (USDA, 2020). Attaching these codes to each county and then assigning each district the average of its counties’ codes gives a proxy measure of the urbanness and rurality of the districts.
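The district-level rural-urban measure described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this research: District 41 is the single county of Wayne with a code of 1, the District 71 codes follow the counts reported in this section (four counties coded 9 and three coded 7, without naming which county has which code), and a district averaging exactly 4, a case the text does not address, is treated here as rural.

```python
# Sketch: average USDA Rural-Urban Continuum Codes over a district's counties
# and split districts into urban/rural at a mean code of 4.
# Assumptions: codes are listed without county names because only the counts
# are reported; a mean of exactly 4 is treated as rural.
from statistics import mean

district_county_codes = {
    "District 41": [1],                    # Wayne County
    "District 71": [9, 9, 9, 9, 7, 7, 7],  # four counties coded 9, three coded 7
}

def classify(codes: list[int], threshold: float = 4.0) -> tuple[float, str]:
    """Return the district's mean code and its urban/rural label."""
    avg = mean(codes)
    return avg, "urban" if avg < threshold else "rural"

for district, codes in district_county_codes.items():
    avg, label = classify(codes)
    print(f"{district}: mean code {avg:.2f} -> {label}")
```

Averaging treats the ordinal codes as if they were evenly spaced, which is a simplification; it is kept here because it is the proxy this section describes.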
District 41 is Wayne County, which is home to the biggest city in Michigan (Detroit) and has a population estimate of over 1.8 million people, so it receives a value of 1; there are four districts that have a rural-urban code of 1 and all of them are in the Detroit metropolitan region. Conversely, District 71 has the highest (most rural) district value of 8.14, where four out of the seven counties had rural-urban codes of “9” and the other three had a value of “7”. The average value for the rural-urban codes was 4.18 (median 3.66). Due to this range of values, the research controls for rural/urban by classifying districts as urban if they have a rural-urban code average of less than 4 and as rural if they have one above 4, which results in 11 rural districts and 13 urban districts.

3.2.1.4 Michigan CWS Operators

Michigan EGLE’s oversight of CWSs extends to the drinking water system operators, for whom it provides certification and training (EGLE, 2021). CWSs have higher requirements for their operators than non-community or transient systems because serving the exact same population year-round creates more risk for the water user/consumer (EGLE, 2021). CWS operators need to have multiple skills, as they are responsible for water quality sampling, inspecting the infrastructure, correcting deficiencies, conducting the treatment, reporting and record keeping, and handling any emergency situations (EGLE, 2021). Michigan CWSs have three categories of water operators: Designated Operator in Charge (DO), operator, and distribution system operator. The DO has the greatest responsibilities as the highest-ranking operator for any system and has the most legal liability for SDWA compliance failures (NEIWPCC, 2013). In Michigan, each system has at least one DO but can have multiple other operators and distribution system operators at the same system. Further, DOs can be the DO for multiple CWSs and may not be physically on-site daily for each of their CWSs.
This research focuses on the DOs because this population of operators provides a high-quality understanding of operator perspectives and ensures that the lead decision makers’ perspectives are explored.

Based on the system size and treatment techniques employed, any Michigan CWS operator needs to be certified through Michigan EGLE in one or more of three types of certification: (S) water distribution, (D) limited treatment, and (F) filtration (EGLE, 2021). Since each certification represents a different component of drinking water systems, many operators hold multiple certifications. Each category is broken down into levels 1-5, where “1” represents the highest level, “4” represents the minimum requirement for CWS operators, and “5” is for non-community water systems (EGLE, 2021). The breakdown for the “S” distribution certification in Michigan can be seen in Table 2 in section 2.4 and shows that population served is the driving force behind the required level of certification: the highest level, “S-1,” is for CWSs serving more than 20,000 people, while “S-4” is for CWSs serving under 1,000 people (EGLE, 2021). To obtain certification in the first place, the operator has to have a high school diploma or equivalent, have a defined minimum of on-the-job experience (advanced education can be substituted for experience), and pass an examination (EGLE, 2021). The examinations are directly related to the “S”, “F”, or “D” requirements to ensure the operator can do their job.
If the operator passes the exam and meets the requirements, they receive their certification and have to renew it every three years by completing continuing education credits (CEC). Operators of large systems with high certifications (1s-2s) need at least 24 hours of training (with 18 being TMF capacity focused); operators of systems that are “3s” (between 1,000 and 4,000 people served) also need 24 hours of training, but only half need to be in TMF capacity building; and the lowest levels for CWSs require at least 12 hours of CECs, with only six hours being TMF capacity focused (EGLE, 2021). Michigan EGLE either offers the CEC courses itself or approves outside organizations (such as MRWA) to run them, which allows operators to renew their certification. If an operator does not take the courses and fails to renew their license, then the CWS is not SDWA compliant and EGLE takes regulatory action. Information about individual operator certifications and renewals can be obtained from Michigan EGLE’s Operator Training and Certification Information System (OTCIS) database.

Just as with the ownership and function of systems, there is heterogeneity among the operators, with major differences in the types of systems operated and the organizational structure behind the operator. An overview of the types of Michigan CWS operators, based on the 2019 FOIA request to Michigan EGLE (with “wholesale” CWSs and two CWSs with incomplete operator information removed), is shown by number of operators and systems in Table 7. There are three broad types of operators in Michigan: Utility, Contract, and Non-Affiliated operators. Utility operators are defined here as operators who are employed full time by a utility, meaning the operator’s employment is through an organization (governmental or non-governmental) whose primary focus is to provide water, where the organization both owns and operates the CWS(s).
These are broken down into municipal/government-employed operators and non-governmental water utility operators. The governmental operators here could have any number of designations (for example, water system operator, supervisor, or director, or director of public works), but their direct employment is through the village, city, township, county, state, etc., and they operate at least one CWS that is a primary utility. Some of these operators also run smaller non-utility CWSs within their regions, based on need and the quantity of water their plants treat. This is the largest group of operators in Michigan: they make up about 74% of the total operators and operate about 54% of the total CWSs in Michigan. The non-governmental Utility operators are employed by one of the “private” utilities, often directly by one of the large for-profit water companies. Michigan does not have a large non-governmental utility CWS presence, with only one for-profit Utility operator running five systems that serve a total population of less than 9,500 people.

The Contract operator is typically an operator employed by a consulting firm that has a contract to operate a system but does not own the system’s physical assets. These consulting firms may employ many operators, and the CWS contracts can be located anywhere in a state; contracts are often used because of a lack of interested and qualified local labor or an inability to afford a full-time operator. For many of the non-governmental ancillary systems, this is the primary type of operator employed, as a small mobile home park is not going to be able to afford a full-time water operator for 25 people served. Typically, these Contract operators work for an engineering or consulting firm, and each operator is the DO for multiple CWSs.
However, some of the non-governmental primary utilities also run contract operations without owning the CWS assets, and in Michigan Suez Water has three of their employed operators running three systems as contract operators. Contract operators as a whole make up about 14% of the total operators in Michigan and run about 40% of the total CWSs in Michigan.

The final group is the Non-Affiliated operators, who are isolated in the sense that they operate systems but are not employed by a utility or contract operations firm. Typically, the Non-Affiliated operators run small systems, and their main employment is not water system operation. Many of these operators in Michigan are apartment complex or mobile home park owners who went through the certification process and run the CWS themselves. They make up the smallest percentage of total operators at about 12.5% and operate about 7% of the total systems. All these operators are very different from one another, and this research aims to uncover how these differences relate to inter-operator interactions and SDWA compliance. However, the lack of non-governmental utility operators and systems in Michigan limits the scope of inferences to just an exploration of the municipal (Utility), Contract, and Non-Affiliated operators.
Broader Category | Type of Operator | Unique Michigan Operators (DO) | CWSs
Utility Operator | Municipal/Governmental System Operator (Primary) | 577 | 729
Utility Operator | Non-governmental Water Utility Operator (Primary) | 1 | 5
Contract Operator | Contracted Operator (Engineering/consulting firms) | 77 | 474
Non-Affiliated Operator | Small Systems or Single System Operators with no organizational affiliation (Typically Owner of Ancillary System) | 124 | 153
Totals | All types | 779 | 1,363

Table 7: Overview of Michigan CWSs DO by types and number of CWSs

3.2.2 Advantages and Disadvantages of Michigan as a Study Area

This research focuses on CWS operators in Michigan because Michigan provides demographic, geographic, and data collection advantages over other states. One of the key geographic features of Michigan is the Great Lakes, which create a natural boundary limiting the impact of edge effects in Michigan to just 11 out of 83 counties. Edge effects are present when the observations near the borders of the study area have fewer neighbors or have unobserved neighbors outside the study region (Bailey and Gatrell, 1996). Spatial investigations of CWSs could result in incorrect inferences due to edge effects because characteristics of CWSs in bordering counties may impact the CWSs in focal counties. For example, the operator of a Michigan CWS near the Michigan/Ohio border might be interacting with operators in Ohio and not in Michigan. Such interactions would be unobservable to this research, but the impacts of these edge effects are limited in Michigan compared to other states due to the natural border created by the Great Lakes around nearly the entire state.
Using Michigan provides further benefits for exploring knowledge transfers and spillovers between CWS operators through the demographics of the state, which range from high metropolitan populations in Wayne, Oakland, and Macomb counties to low rural micropolitan populations, predominantly in counties in the Upper Peninsula and the upper Lower Peninsula (Vojnovic, 2009). The two regimes raise several interesting questions about both the urban and the rural areas. According to USDA (2020), about 22% of Michigan’s counties (18 counties) are part of metro or urban areas, while about 78% (65 counties) are non-metro rural areas. Further, the urban areas in the southeastern part of the state have low population densities, which creates particular challenges for CWSs (Vojnovic, 2009; Beecher and Kalmbach, 2013). As discussed in section 2.3, population density can increase or decrease the amount of physical distribution infrastructure needed to deliver safe drinking water. Previous research (Marcillo and Krometis, 2019) has found that the rurality of CWSs increases the probability of SDWA violations. By using a state with both urban and rural areas, this research is able to continue exploring the role of rurality in SDWA compliance.

Josset et al. (2019) identified the dearth of consistent, high-quality, and comprehensive “water” data across states as a major limitation of CWS research. This has major implications for this research, as the information on inter-operator interactions needed to be collected directly by this research, and the reliability of the Michigan CWS operator contact information ensured more and better-quality feedback from the operators (Jones, Baxter, and Khanduja, 2013). A Freedom of Information Act (FOIA) request in the summer of 2019 provided the contact information for each certified operator for Michigan CWSs. The FOIA response provided more than just the contact information; it also included the organization of employment for each operator.
The FOIA data were missing contact email information for only three CWS operators, who were removed from the sample (leaving 776 in total). Having access to this high-quality operator-specific data in Michigan provides a strong advantage for the exploration of CWS operator interactions.

Focusing on a single state helps limit the impacts of the heterogeneous regulatory landscape set up by the US regulatory oversight for SDWA compliance. Section 2.4 outlines some of the differences between states in their regulatory rules for water quality, monitoring and reporting, and operator certifications. One of the major problems with multi-state studies is that they use data from multiple regulatory regimes, and SDWA violations are a perfect example of the issue. Because each state primacy agency reports its own SDWA violations to the EPA, national or inter-state comparisons are difficult (GAO, 2011; OIG, 2017): it raises the question of whether a given state is outperforming another due to efforts to achieve higher SDWA compliance or simply due to better reporting. By using only a single state, this research avoids issues with different rules and reporting by different state primacy agencies.

Using Michigan alone as the study area does have some drawbacks and limits the inferential reach of the research. As discussed in section 3.2.1.2, Michigan does not have a large presence of for-profit non-governmental primary CWSs. Michigan only has five non-governmental primary for-profit CWSs (~0.4% of total CWSs), which based on Beecher et al. (2020) is a much smaller share than in Illinois (~6% of CWSs) or Pennsylvania (~12% of CWSs). Michigan’s CWS operators will not reflect the operator populations of states with a larger presence of private for-profit CWSs. Some states, such as Wisconsin (~0.3% of CWSs) and Minnesota (0% of CWSs), have a more similar CWS ownership structure, and this research’s inferences may be more applicable to these states.
Primary non-governmental for-profit CWSs in some cases also provide contract operations to CWSs they do not own, and in Michigan only three CWSs are operated this way, by a large primary non-governmental for-profit organization employing a single Suez operator. Because research has yet to provide comparisons of CWS operations between states, this research cannot assume that ~0.1% of Michigan CWS operators would be representative of all states. This limitation of the structural population of CWSs and CWS operators in Michigan limits the research in space to primarily explaining the role of knowledge transfers and spillovers between CWS operators in Michigan for the Utility (municipal/governmental), Contract (consulting and engineering firms), and Non-Affiliated operators.

Studying Michigan for 2019 to 2020 limits the research inferences to a single point in time. Neither the SDWA of 1974 nor the state-specific rules are static features; both continue to evolve (Tiemann, 2014). Section 2.4 explained drinking water regulation in the United States and discussed the multiple amendments to the SDWA of 1974 intended to better support and regulate CWSs and to provide the population served with drinking water that meets the most up-to-date quality standards with no known adverse health impacts. Due to the evolution of the federal and state guidelines, SDWA violations, and operator certification rules, this research only captures a moment in time. Further, the survey was deployed during the beginning of the Covid-19 pandemic, and it is reasonable to expect that the disruption of the pandemic could have affected the enforcement of SDWA violations in 2020 or the operators’ responses to the survey. It is important to frame the limitations of a single point in time and the context of that time for the research, as they may limit the broader impacts of the research’s findings.
3.3 Operator Survey and Interview Data

3.3.1 Survey

To analyze the factors impacting the knowledge transfers and spillovers between CWS operators, and the effects of those knowledge transfers and spillovers on the CWSs' ability to achieve SDWA compliance, this research collected information directly from CWS DO operators through both a standardized survey and semi-structured interviews. The survey and the interview data worked together: the survey provided the raw numbers and the interviews provided context for the survey results. This type of data is necessary because previous industrial and academic surveys (Dziegielewski and Bik, 2004; Blanchard and Eberle, 2013; NEIWPCC, 2013; Teodoro and Whisenant, 2013/2014/2015; Baum et al., 2015; OHADWS, 2018; AWWA, 2020) have not investigated the interactions between CWS operators, and there is no existing database containing this type of information.

Table 8: Overview of the Timeline and tasks for the Survey and Interview data collection

Table 8 shows the timeline of the survey and interview data collection processes and an overview of the tasks in each time period. From spring of 2019 to November of 2019, 20 survey questions were developed and then deployed in a pilot survey. Following the pilot, the survey questions were refined to better reflect the research goals and one question was added. From March 9, 2020 to April 9, 2020, the full 21-question survey was deployed to 776 operators. Following the completion of the surveys, the operators who answered "yes" to being open to participating in an interview were contacted, and 20 operators participated in semi-structured interviews with six direct questions. This brief overview of the survey and interview data collection processes is explained further in this section of the chapter. First, the section explains the survey design, development, and deployment; then it focuses on the interviews.
Section 3.3.1.3 then provides the results for the survey and interviews and assesses how well the respondents represent the population of CWS operators in Michigan.

3.3.1.1 Survey Design

This research's survey was constructed and refined using an iterative survey design process. First, it aimed to address some of the water data gaps identified in Josset et al. (2019) and to fill the missing data in the EPA's Safe Drinking Water Information System (SDWIS) and Michigan Primacy CWS databases. Two major data gaps were identified: 1) the lack of inter-operator interaction data, and 2) the lack of operator-specific information (for example, education and experience). These gaps informed the focus of the survey, which collected the operator-specific information that was not already available. Survey questions were then designed using Harrison's (2007) tips for survey research, in which the first and most important step in good survey design is pretesting the survey. During the summer of 2019, an informal pre-test of the survey was conducted with a few CWS operators who would be in the survey's target population. Pre-testing the survey with only 17 questions revealed several issues in the initial drafts; addressing them reduced ambiguity in the questions and added questions specifically relating to geographic locations, which brought the total number of questions to 20. Further, following Harrison's (2007) suggestions for good survey research, we kept the question count low and the estimated completion time under 10 minutes to encourage full responses from the intended response sample. Following advice from MSU's Institute for Public Policy and Social Research (IPPSR), the survey was administered electronically by email using Qualtrics survey software. Following the pre-testing, the 20-question survey was distributed to 80 operators in a formal IRB-approved pilot study in November of 2019 (IRB# 00003590).
The 80 pilot operators were drawn from six Michigan counties (three high SDWA compliance performers and three low SDWA compliance performers). The formal pilot returned 21 responses and allowed the research to explore the types of data, the responses, and the quality of the questions for the final survey. From December 2019 to February 2020, the survey was further refined to eliminate remaining ambiguity in the questions, and another question was added about the operator's perceived usefulness of interactions. IRB #00004136 was the official approval for sending the survey to 776 DO operators in Michigan representing 1,361 Michigan CWSs.

3.3.1.2 Survey Construction

This section shows and explains all the survey questions that were included in the final 21-question survey. Survey questions were broken up into three key parts: (1) Operator Background and Education, (2) Operator Employment, and (3) Operator Knowledge Spillovers and Transfers. Every question either directly provided information unattainable through anything other than a survey or was used to help validate the survey responses. The final question asked if the operator would be open to participating in a semi-structured interview later to help the research contextualize the results of the survey.

Survey: Operator Background and Education (Questions 1 to 5)

Question 1. What is the highest level of education you have completed? (Select only one of the following)
    a) Some High School
    b) High School Degree or Equivalent
    c) Associates Degree
    d) Bachelor's Degree
    e) Master's Degree
    f) Professional Degree
    g) Ph.D.

Question 2. What is your highest level of drinking water certification?
    a) No Certification
    b) S-1
    c) S-2
    d) S-3
    e) S-4
    f) S-5

Question 3. How long have you been the 'Operator of Record' for your current system? (If you operate more than 1 system, please indicate the length of time at the system you have been operating the longest.)
    a) Less than 1 year
    b) 1 to 2 years
    c) 2 to 5 years
    d) More than 5 years

Question 4. How many years have you been at your current level of drinking water certification? (Please record the number in the box)
    Box Entry

Question 5. How many hours of continuing education (re-education) for certification renewal did you spend in the last 12 months? (Please record the number in the box)
    Box Entry

Table 9: Survey Questions on the Background of the Operator

Table 9 outlines questions one through five on the survey, which investigated the CWS operator's educational and professional background. Research (Teodoro, 2014; Shahr et al., 2019) has found that the educational and professional experience of the operator could impact the number of effective external linkages between operators, which led this research to include operator background information. The background questions focus on educational attainment, operator certification level, length of experience, and previous experience. Question 1 explicitly asks for the highest level of education achieved by the CWS operator, breaking down the choices into one of eight categories listed in the table, which match the US census education categories. Previous surveys of water professionals (Teodoro and Whisenant, 2012, 2013, 2014; Meier and O'Toole, 2013; Blanchard and Ellerbe, 2013) asked about the educational attainment of water system personnel to explore the trends and the relationships between educational attainment and CWS performance. Question 2 asked what the water operator's highest Michigan EGLE level of drinking water distribution certification (S) was.
As previously discussed in section 3.2.1.4, many operators hold more than just the distribution certification and also have the limited treatment (D) and/or filtration (F) certifications because their system does more than just distribution (EGLE, 2021). Information on certification level can also be obtained from Michigan EGLE's OTCIS Database, and in April 2020 the reported certification was compared to the OTCIS certification to validate the respondents. Further, question 5 asked for the number of hours spent on continuing education credits (CECs) in the last year, entered as a number in the box. However, one hour of time does not equal one CEC (EGLE, 2021), and the conversions between hours and CECs made the fill-in-the-blank responses difficult to read or validate. Due to this complexity, this research used the number of hours recorded as completed in the OTCIS database. Question 3 asked how long they have been the operator at their current system, with choices of less than 1 year, 1 to 2 years, 2 to 5 years, and more than 5 years, while question 4 focused on the number of years they have been at their current level of drinking water certification.

Survey: Operator Employment (Questions 6 to 13)

Question 6. Do you own the water system you operate (that is, are you the owner of record for the system's assets)?
    a) Yes
    b) No

Question 7. In your capacity as an operator, are you employed by any of the following?
    a) Private or Investor-owned utility
    b) Publicly owned Utility (City/County/District)
    c) Engineering or Consulting Firm
    d) Other _______________

Question 8. Were you previously employed as an operator at another community or non-community water system?
    a) Yes - a community water system(s)
    b) Yes - a non-community water system(s)
    c) Yes - both community and non-community water systems
    d) No previous experience

Question 9. How many community water systems with unique PWS identification numbers (ID)s, do you operate? (Please record the number in the box)
    Box Entry

Question 10. How many certified operators are employed by your water system including yourself (at any certification level)? (Please record the number in the box below)
    Box Entry

Question 11. Are you a member of the American Water Works Association (AWWA)?
    a) Yes
    b) No

Question 12. Are you a member of a non-AWWA professional water organization?
    a) Yes
    b) No

Question 13. How many hours in the last 12 months have you spent at professional water meetings, conferences, summits, or forums? (Please record the number in the box)
    a) No Hours
    b) 1-2
    c) 2-4
    d) 4-8
    e) 8-16
    f) 16-32
    g) More than 32

Table 10: Survey Questions on the Operator's Employment

Table 10 outlines the second section of the survey, which investigates the operator's current employment. This set of questions helps ensure that the survey is not biased by differences in professionalism or by homogeneity in operator type. Questions 6 and 7 helped the research verify the operator and their classified type. Question 6 asked if the operator was the current owner of their water system. Contract and Utility operators should have answered no to this question, because either the public owns the CWS (Utility) or, in the case of Contract operators, they are hired by the owning entity to run the system. The only operators who should have answered yes to this question were the Non-Affiliated operators, who could possibly own their own system. Question 7 asked about their employer to further directly verify the operator type. Since Michigan only has five CWSs owned and operated by a private utility, there could only be one operator answering a) Private or Investor-owned utility.
The vast majority of the respondents should fall into the publicly owned utility (Utility), engineering or consulting firm (Contract), or other (Non-Affiliated) categories. These two questions gave the research another verification method for the survey responses by comparing them with the FOIA data on employment organization. Question 9 provided another opportunity for data verification, as the number of CWSs run, counted by public water system identification number (PWS ID), can be compared between the operator's response and the EGLE data. Further, serving as the distribution operator for more systems could relate to more professional engagement because of the increased opportunities. Question 10 directly asked about the number of other operators at their current system/s. Having more certified operators at their system/s would provide more possible between-operator interactions and possibly indicate a larger network of operators.

Questions 11, 12, and 13 address the details of the operators' professional engagement. Questions 11 and 12 asked simple yes/no questions about whether the operator was a member of the American Water Works Association (AWWA) and/or a member of non-AWWA water groups. The AWWA is a national non-profit organization whose membership provides water system professionals (including operators and CEOs) opportunities to continue their education through professional workshops and meetings, to advocate for their needs, and to build their professional networks (AWWA, 2021). Non-AWWA membership could be with smaller groups like the Michigan Rural Water Association (MRWA), which aim to serve similar purposes of supporting water professionals (MRWA, 2021). These groups exist to support water systems and help build networks, and membership in these groups provides an avenue for possible inter-operator interactions.
Further, Question 13 aimed to capture the level of activity within these professional organizations by identifying the estimated number of hours spent in meetings, conferences, summits, or forums in the last 12 months. Fewer estimated hours spent in meetings would likely mean fewer opportunities for inter-operator interactions or lower professional engagement of the operator.

Survey: Operator Knowledge Transfer and Spillovers (Questions 14 to 21)

Question 14. In the last 12 months, have you consulted or sought advice from another system operator?
    a) Yes
    b) No

Question 15. In the last 12 months, have you provided advice to an operator from a different community water system?
    a) Yes
    b) No

Question 16. How many times in the last year have you discussed water treatment or distribution techniques with an operator from a different public water system?
    a) Never
    b) 1-10
    c) 11-20
    d) 21-30
    e) 31-40
    f) 41-50
    g) More than 50 times

Question 17. My interactions with operators from a different public water system occur with operators of systems outside of the county/counties my system/s serve. (True or False)
    a) True
    b) False

Question 18. My interactions with operators from a different public water system are with operators of systems within the same county/counties where my system/s serve. (True or False)
    a) True
    b) False

Question 19. In the last 12 months, how useful have your interactions with other operators been in improving your ability to better provide safe drinking water? (1 useless and 5 being very useful)
    a) Useless
    b) Not very useful
    c) Neutral
    d) Somewhat useful
    e) Useful

Question 20. Please feel free to provide any additional comments regarding your experiences interacting with operators from a different public water system.
    Box Entry

Question 21. Would you be willing to participate in a follow up interview with the researchers?
    a) Yes
    b) No

Table 11: Survey Questions on the Operator's Interactions and Knowledge Spillovers and Transfers

Table 11 shows the final group of survey questions, which directly addressed knowledge spillovers and transfers between CWS operators. Questions 14 and 15 asked about both directions of advice: whether the operator has sought advice from, or provided advice to, an operator of a different CWS. It was important to ask about both directions because Wehn and Montalvo (2018) found that some operators both provide and seek advice, while others only give or only receive advice through their interactions. Question 16 directly asked them to estimate the number of interactions with outside operators about treatment and distribution techniques. This was the most important question of the survey, as these reported inter-operator interactions were the basis of the research's primary hypotheses. Questions 17 and 18 asked about the county in which their interactions take place, which related to the previously discussed CWS geographic data limitations and the previous CWS compliance literature. Question 19 focused on the operator's perceived usefulness of interactions. Question 20 was an open question that allowed the operator to share any other thoughts about their experiences with inter-operator interactions. The final question asked if the operator would be open to being contacted again for an interview (IRB#00004557) to help contextualize the results of the survey.

3.3.1.3 Selected Survey Results

The first section of questions, on the background of the operators, showed heterogeneity in the educational background of the operators and more homogeneity in the length of time they have been operators. Only three operators did not have a high school degree, while for ~52% of the operators the highest level of educational attainment was 'high school or equivalent (GED)'. ~22% of operators had an associate's degree and ~18.5% had a bachelor's degree.
There were 12 operators with a master's degree, and four operators had obtained a professional degree or Ph.D. The heterogeneity in the educational attainment of the operators was interesting because about 25% had a bachelor's degree or higher, while the vast majority had not obtained a bachelor's degree. 70.1% of the operators have been with their current CWS/s for more than five years, with only ~8% being at their current system for less than 2 years. For the number of years at their current level of certification, ~56% of operators reported more than 10 years, and only ~6% reported less than two years. ~22% of operators had fewer than five years at their current certification level, and ~25% had held their certifications for between five and ten years. The maximum length of time an operator had been certified at their system was over 50 years, and ~2% of the operators had more than 40 years' experience. The survey results on the length of time at their CWS and certification level show that the vast majority of operators have been operators for a long period of time.

The operator employment questions captured more information about the systems and the structure of the operators' employment. Only 22.8% (58 operators) were the owner of their system and all of these operators were Non-Affiliated operators, while there were a few Contract operators who owned at least one of the systems that they operated. The vast majority of the operators did not have ownership of their systems and were employed by the municipality or a Contract operators' organization. The majority of the operators (~60.2%) had no previous experience with CWSs or non-community water systems prior to their current system/s. Only three operators moved from being a non-community water system operator to CWSs, and about 40% of the operators had previously been employed by at least one CWS.
31.5% of the operators were the sole operator of their systems, while 47.2% responded that they had two to five other operators. The highest number of other operators was 30, reported by a Contract operator's organization. However, of the 15 operators who responded that they had more than 10 other operators at their system, only two were Contract operators and the rest were Utility operators of the larger systems (by reported population served). The number of systems run by each operator was used to validate the EGLE FOIA data on the number of CWSs operated. The majority (68.5%) of the operators ran only a single system, and ~85% of all the operators ran only one or two systems. The operator with the highest number of CWSs was a Contract operator running 88 small ancillary CWSs. About 55% of the Contract operators ran more than five CWSs, while about 78% of the Utility operators ran only a single system.

The questions about group membership showed that about 82% of the operators belong to at least one professional water organization. ~44.5% of operators were members of both AWWA and another water organization, while ~24.8% belonged only to AWWA and ~12.6% belonged to another organization but not AWWA. While organization membership offers some detail about the avenues for interactions, the actual time spent in water-related professional meetings captured the operator's commitment to an organization, as well as the time spent in meetings by operators who are not members of any organization. About 12.39% of the operators said they did not spend any time at professional water events, while about 21% of operators spent two hours or less at the meetings. Over 55% of the operators reported spending more than eight hours at the events over the previous 12 months.

The final set of questions focused directly on the interactions with other operators as an exploration of the knowledge spillovers between operators.
87% of operators responded that they had consulted or sought advice from another system operator over the course of the last 12 months, while 87.8% said they had been consulted by another CWS operator. The majority of the operators (63%) responded that they had between 1 and 10 interactions with operators of a different CWS over the course of the last 12 months. The second highest response category was 11 to 20 interactions (13.4% of operators), while only 12.2% of operators reported no interactions. 6.7% responded that they had more than 50 interactions, and ~5.2% of operators had between 21 and 40 interactions. Questions 17 and 18 were somewhat problematic because they did not explicitly apply the past 12 months' timeline to the inside/outside county designation of inter-operator interactions. Operators who reported no interactions in the last 12 months appear to have answered based on whether their past interactions had been inside or outside the county, rather than on the last year alone. 13 operators reported not interacting at all in the past within or outside the county, while about 10.7% reported interacting only with CWS operators outside of their county, and 36% reported talking only to operators in their own county. Almost half of the operators (48%) reported interacting with operators both inside and outside their county. About 73% of operators reported that they believed their interactions with other operators in the past 12 months were either useful (40.6%) or somewhat useful (32.2%) in improving their ability to better provide safe drinking water. About 23% of operators were neutral, while about 4.3% of operators found the interactions to be useless (1.6%) or not very useful (2.8%). The last question about interactions was an open response question where operators could add any additional information about their experience with interactions with other CWS operators.
74 operators (~29.3%) answered the question, and the responses were grouped into three categories (see Appendix 1: Table 41): positive, neutral or detailing experience, and other. 41 of these responses provided positive comments about the role of interacting with other operators. A few examples of these comments: "This is how we learn, classes only get you so far", "It is an integral part of my decision-making process", "They are priceless", and "great tools to make sure we are all using our limited resources in the best possible way". There were 23 operators who provided neutral responses about the role of interactions or provided details of where they learn or interact. Some of these operators pointed to the regional or state water organizations they are a part of, or to spaces of interaction like the CEC courses, but did not discuss any benefits of the interactions. Other operators pointed to how long they have been in the field and how their interactions mostly involve providing advice to inexperienced operators, while others described the content they discuss without mentioning the benefits. The final category of other included the responses that did not fit into the other two categories because they did not discuss interactions. The notable responses here came from three operators: "I have not had discussions with other water operators", "it happens only rarely", and "Our system is so small, <1000, that we don't often run into complex issues". Outside of these three responses, almost all of the responses were positive about the benefits of operators interacting with one another.

3.3.1.4 Survey Representation

Between March 9, 2020 and April 9, 2020, 254 operators responded (~32.6%), representing 538 CWSs (~39.5% of total CWSs) in Michigan, which served a reported population of 2,966,018 (~40.5% of total CWS population served).
One operator of a small mobile home park (<150 people served) left many of the questions blank and was removed from the sample, which minimally impacted the representation percentages. Previous surveys of drinking water system professionals have shown a variety of response rates. Teodoro and Whisenant (2012, 2013, 2014) reported a 40% response rate on a stratified random sample of 300 total systems in a phone survey of water executives, while Blanchard and Eberle (2013) reported a 90% rate due to their partnership with the state to survey electronically. A 40% response rate was unlikely for a web-based survey, where the typical response rate is between 20% and 40% (Nulty, 2008; FluidSurveyTeam, 2014). The AWWA State of the Water Industry Report (2020) surveyed all water systems in the US and had a 2.2% response rate. The ~32.4% response rate of operators, with ~39.5% of CWSs and ~40.5% of the population served included in the survey, fits within the typical rates and provides a good sample of CWSs in Michigan. The final survey had a higher response rate than the pilot survey (from November/December 2019), which received ~27.5% (of 80 possible operators), and only two of these operators responded to both the pilot and the final survey. The pilot survey responses could not be included because the questions had changed and the responses fell outside the temporal period of the widely distributed survey.

While the overall response rate was important because it informed the research on whether it had enough data to conduct statistical analyses, it did not explain the representation of sub-populations in the survey. For the rest of this section, the representation of the sub-populations is explored.
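The overall response and coverage rates reported above are simple ratios of sample counts to population counts. The short Python sketch below illustrates that arithmetic using the counts stated in this section; the function name `coverage_pct` is hypothetical and is not part of the original analysis. Note that 254 of 776 operators rounds to 32.7%, whereas the reported ~32.6% matches 253 of 776. Treating the difference as the removal of the one incomplete response is an assumption made here, not something the text confirms.

```python
def coverage_pct(sample_count: int, total_count: int) -> float:
    """Return sample_count as a percentage of total_count, rounded to one decimal."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return round(100 * sample_count / total_count, 1)


# Counts stated in this section.
operators_contacted = 776   # DO operators who were sent the survey
operators_responded = 254   # operators who responded
systems_total = 1361        # Michigan CWSs represented by the 776 operators
systems_in_sample = 538     # CWSs represented by the respondents

operator_rate = coverage_pct(operators_responded, operators_contacted)
system_rate = coverage_pct(systems_in_sample, systems_total)

# Assumption: one incomplete response removed before the rate was computed.
adjusted_operator_rate = coverage_pct(operators_responded - 1, operators_contacted)

print(f"Operator response rate: {operator_rate}%")
print(f"Operator response rate after removal: {adjusted_operator_rate}%")
print(f"System coverage: {system_rate}%")
```

The same function applies to any of the sub-population coverage figures in the tables that follow, provided the sample and total counts for the sub-population are known.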
Operator Type | Sample Total Operators (% of total operators in type) | Number of Systems (% of total systems) | Total Population (% of total population served by each operator type) | Minimum Population Served | Maximum Population Served
Utility | 194 (~33.6%) | 245 (~45.6%) | 2,810,845 (~40.9%) | 25 | 713,777
Contract | 31 (~40.3%) | 260 (~48.4%) | 146,676 (~36.8%) | 18 | 10,483
Non-Affiliated | 27 (~21.8%) | 32 (~5.9%) | 8,286 (~20.2%) | 25 | 2,428

Table 12: Survey Respondents broken down by Operator Type

Table 12 shows the sample of operators broken down by operator type. Contract operators were the best represented, clearing the 30% to 40% goal by percentage of operators (~40.3%) and systems (54.2%). Utility operators crossed the 30% threshold in the percentage of operators, systems, and population served. The biggest issue in the sample was the Non-Affiliated operators, whose percentages of operators, systems, and population all hovered around 21%, which was well under the 30% threshold and far less than for the Utility or Contract operator respondents. This was not completely unexpected because, as previously discussed, the full-time operators (Utility and Contract) were expected to be more professionally engaged and would likely have higher response rates. The deficiency of Non-Affiliated operators needs to be addressed as a limitation of any finding of this research.

System Type | Sample Systems as Percentage of Overall Systems (Sample Count / Total Count) | Sample Population as Percentage of Overall Population (Sample Count / Total Count) | Percentage of Systems of Total Systems minus Percentage of Systems of Total Sample | Percentage of Systems of Total Population minus Percentage of Systems of Total Sample Population
Governmental Primary | 34.9% | 40.5% | -2.6% | -0.8%
Governmental Ancillary | 25% | 7.9% | 0.4% | 0.3%
Non-Governmental Primary | 51.6% | 42.4% | -2.5% | -0.0%
Non-Governmental Ancillary | 29.6% | 30% | 4.7% | 0.5%

Table 13: Survey Respondents broken down by Beecher et al. (2020) Classifications (2-level)

Another key way to break down the sample was by the types of systems the operators were running, using the novel Beecher et al. (2020) dataset on the function and ownership of CWSs in the Great Lakes States. Table 13 shows the key overviews of representation at two levels of the Beecher et al. (2020) classification scheme. There were some skews in the sample based on representation. First, the primary systems (governmental and non-governmental) represented ~35% and over 50% of the possible Michigan CWSs, respectively, while both types of ancillary systems were underrepresented, capturing less than 30% of the total Michigan CWSs of each type. The population represented was adequate for governmental primary and non-governmental primary systems, with 40.5% and 42.4% of the total population. The non-governmental ancillary category had only about 30% of the total possible population represented. The governmental ancillary category had only 20 total CWSs, and the five systems in the sample are small, representing only 8% of the total population of governmental ancillary systems. The change in the percentages of representation from the total systems to the sample systems showed about -2.5% for each of the primary system types, which indicated that these are a greater share of our sample than they are of the overall population. Conversely, the total population had a greater share of the non-governmental ancillary systems by nearly 5%. The governmental ancillary category showed close to 0% change due to the small number of possible governmental ancillary systems. The difference in the share of the overall population between the sample and the total was small for both primary types of systems: non-governmental was almost perfectly represented, while governmental was 0.75% greater in the sample. Both ancillary types held a slightly larger proportion of the total population than of the sample.
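The two right-hand columns of Table 13 compare each category's share of all systems with its share of the sample, so that a negative value marks a category that makes up more of the sample than of the overall population. The Python sketch below shows one way such a difference could be computed. It is a minimal illustration under stated assumptions: the function name `share_difference` is hypothetical, and the counts are made-up placeholders, not the dissertation's data.

```python
from typing import Dict


def share_difference(total_counts: Dict[str, int],
                     sample_counts: Dict[str, int]) -> Dict[str, float]:
    """For each category, return its share of the total minus its share of the
    sample, in percentage points. Negative values mean the category is a larger
    share of the sample than of the total (overrepresented in the sample)."""
    total_n = sum(total_counts.values())
    sample_n = sum(sample_counts.values())
    if total_n == 0 or sample_n == 0:
        raise ValueError("counts must not sum to zero")
    return {
        category: round(
            100 * (total_counts[category] / total_n
                   - sample_counts.get(category, 0) / sample_n),
            1,
        )
        for category in total_counts
    }


# Hypothetical counts for illustration only (not the dissertation's data).
total_counts = {"primary": 600, "ancillary": 400}
sample_counts = {"primary": 260, "ancillary": 140}

diff = share_difference(total_counts, sample_counts)
print(diff)
```

With these placeholder counts, primary systems are 60% of the total but 65% of the sample, so their difference is negative, which mirrors how the negative values for the primary systems in Table 13 are interpreted in the text.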
System Group | System Type | Total Number of Systems (% of Sample) | Average Population Served (Standard Deviation) | Minimum Population Served | Maximum Population Served
All Systems | All Systems | 533 (100%) | 5,564 (34,857) | 18 | 713,777
Governmental Systems | Municipal System | 154 (29%) | 14,093 (62,567) | 40 | 713,777
Governmental Systems | Township System | 73 (13.7%) | 7,262 (17,501) | 25 | 97,513
Governmental Systems | County System | 7 (1.3%) | 17,308 (26,596) | 50 | 71,500
Governmental Systems | Special Purpose Districts | 8 (1.5%) | 7,813 (8,905) | 303 | 26,780
Governmental Systems | State Ancillary | 2 (0.38%) | 250 (297) | 40 | 460
Governmental Systems | Local Ancillary | 3 (0.56%) | 532 (624) | 36 | 1,233
Non-Governmental Systems | Owner Associations | 32 (6.0%) | 374 (671) | 25 | 3,584
Non-Governmental Systems | Mobile Home Parks (Ancillary) | 149 (28%) | 322 (389) | 18 | 2,268
Non-Governmental Systems | Other Developments (Ancillary) | 75 (14%) | 202 (413) | 20 | 3,200
Non-Governmental Systems | Other Ancillary | 30 (5.6%) | 144 (198) | 25 | 950

Table 14: Survey Respondents broken down by Beecher et al. (2020) Classifications (4-level)

Table 14 shows the number of systems and population served by CWS ownership and function, using the fourth level of the novel Beecher et al. (2020) database of ownership and function for the Great Lakes States. About 43% of the sample operators ran a municipal or township CWS, while another 28% of the systems were mobile home parks. There was a substantial difference between the municipal or township systems and the mobile home parks, as the governmental systems serve a larger population.
Size | Sample Systems as Percentage of Overall Systems (Sample Count / Total Count) | Sample Population as Percentage of Overall Population (Sample Count / Total Count) | Percentage of Systems of Total Systems minus Percentage of Systems of Total Sample | Percentage of Total Population minus Percentage of Total Sample Population
Under 500 | 32.5% | 32.2% | 1.2% | 0.30%
501 to 3300 | 36.3% | 37.8% | -2.5% | 0.44%
3301 to 10000 | 33.1% | 33.6% | 0.04% | 2.1%
10000 to 100000 | 28.1% | 31.6% | 1.4% | 11.8%
100000+ | 42.9% | 65.9% | -0.2% | -14.6%

Table 15: Survey Respondents broken down by 5-Level Population Served Size Categories

Table 15 shows the breakdown by system size using the conventional SDWIS five-category population served breakdown, both as a percentage of total systems and as sample population compared to the total population. The sample captured over 30% of the total systems in Michigan for the first three size categories. It slightly underrepresented the systems serving between 10,000 and 100,000 people, with only 28% of those systems included. The sample included three of the seven Michigan CWSs in the largest size category (~43% of the total possible). Further, the largest size category was well represented by population, as the majority of the population in that category (65.9%) was served by the three systems. All the other categories had greater than 30% of their population included, with the 501 to 3,300 category having 37.8% of the state's total population in that category included. The share of total systems in each category and the share of sample systems in each category were very close. The Under 500 category had a 1.16% larger share of total systems than of the sample, while in the 501 to 3,300 category the sample had a 2.49% larger share of systems (compared to the total sample) than all possible CWSs in that population range.
The 3,301 to 10,000 category was nearly the same, while the 10,000 to 100,000 size category made up a 1.43% larger share of all systems than of the sample, consistent with its slight underrepresentation. The largest systems made up a slightly greater share of the sample than of all systems.

Figure 12: Sample Representation of Michigan EGLE Community Water Regions

Figure 12 shows the percentage of the total systems, population, and operators covered by the survey respondent sample. The lowest percentage of total systems in the sample was in the Warren (Detroit) region, with only ~27.7% of the total systems included, while the Kalamazoo region had the highest percentage of systems in the sample at ~51.1%. All the other regions had greater than 30% of their systems covered in the sample, with six regions having greater than 35%. This indicated that at the system level there was good representation of systems in each region. Population had the inverse relationship: Kalamazoo had the lowest percentage of the population covered by the sample at ~23.6%, which indicated that while the sample captured the number of CWSs well, it captured mostly the small systems in the region. The highest percentage was in the Grand Rapids region, with more than 50% of the population covered by the sample. Four regions had under 30%, which indicated that the population served was not nearly as well represented as the small systems in those regions. The Warren region was the second highest, with almost 50% of the population in the sample, which indicated that the survey captured at least one of the large systems in the region. All but one of the regions had greater than 30% of their operators covered by the survey. The lowest operator coverage was in the Lansing region, with only 27.9% of operators included. The highest was the Marquette region, with 36.2% of the operators in the region responding to the survey.
Figure 13: Sample Representation of Michigan EGLE Community Water Districts

Figure 13 shows the percentage of systems, population, and operators in the sample from all the Michigan EGLE districts. The percentage of systems covered in the sample averaged 39.6%, with the lowest coverage in District 73 (Grand Traverse area) at only 20.8% and the highest in District 33 (Livingston County) at 62.5% of systems covered by the sample. Six districts had over 50% of systems included in the sample, and six had under 30% of systems included. Overall, 18 districts had responses above the 30% target for the percentage of systems. The percentage of population covered by district averaged 35.5% but had a much larger range, close to 67%, than the other representation metrics. District 51 (Kalamazoo) and District 73 had less than 4% of the population covered, despite District 51 having ~53% and District 73 having ~21% of the possible CWSs in the sample. District 73 had low response rates overall, as seen by the comparison of systems and population, while District 33 was more than adequately covered by systems but is in a major metropolitan area, and the sample missed a couple of the large population systems, which made the percentage so low. Five districts had under 20% and an additional five districts had less than 30% of the population included in the sample, while six districts had more than 50% population representation and five districts had more than 40%. The lack of population coverage in some of these areas could be problematic in the analyses of the population served. For the percentage of operators in the sample, the average was 33.5%, with the lowest coverage in District 11 (Genesee County), where only 24% of the possible operators responded to the survey, and District 73, with only 25%.
Only six (of the 24 districts) had operator responses below the 30% mark. The highest operator response was in District 33, with about 44% of operators responding, and District 51 and District 72 each had greater than 40% of operators represented in the sample. The vast majority (75%) of the districts had responses from more than 30% of possible operators.

Overall, the sample did a good job of representing CWSs and operators in Michigan, with only a couple of notable under-representations. There was a slight under-representation (<30%) of the Non-Affiliated operators compared to the Utility and Contract operators, which means that the Non-Affiliated operators in this survey sample may not accurately reflect the larger population of Michigan CWS Non-Affiliated operators. The 30% aim comes from previous research (Nulty, 2008; FluidSurveyTeam, 2014), which has pointed to needing about 30% of a population in a census survey to get heterogeneity of responses and accurate representation of the underlying population. Governmental ancillary systems were slightly underrepresented in percentage of systems and substantially underrepresented in population served. However, this may not affect the results of the research, as there were only 20 governmental ancillary systems, and one of those systems, which accounted for more than 50% of the population served in this category, did not respond to the survey. CWSs that reported serving between 10,000 and 100,000 people were slightly underrepresented in the sample, and the effect of size could be missed in this research. For the most part, the regions and districts did a good job of capturing the systems, operators, and population served. A couple of districts had low response rates that will require special consideration during the analyses.
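The coverage comparison described above can be illustrated with a minimal sketch. It assumes a simple list of system records with a district label, a response flag, and a population served; the district names and numbers below are hypothetical and are not drawn from the survey data. The sketch computes the percentage of systems and of population covered in each district and flags districts falling under the 30% target.

```python
from collections import defaultdict

# Hypothetical records: (district, responded_to_survey, population_served).
# Illustrative values only; these are not the dissertation's data.
SYSTEMS = [
    ("District A", True, 1200),
    ("District A", False, 300),
    ("District A", False, 45000),
    ("District B", True, 26000),
    ("District B", True, 220),
    ("District B", False, 1500),
]

TARGET = 30.0  # percent coverage target discussed in the text


def coverage_by_district(systems):
    """Return {district: (pct_systems_covered, pct_population_covered)}."""
    totals = defaultdict(lambda: {"n": 0, "n_in": 0, "pop": 0, "pop_in": 0})
    for district, responded, population in systems:
        t = totals[district]
        t["n"] += 1
        t["pop"] += population
        if responded:
            t["n_in"] += 1
            t["pop_in"] += population
    return {
        d: (100.0 * t["n_in"] / t["n"], 100.0 * t["pop_in"] / t["pop"])
        for d, t in totals.items()
    }


def below_target(coverage, target=TARGET):
    """Sorted districts where either coverage measure is under the target."""
    return sorted(
        d for d, (sys_pct, pop_pct) in coverage.items()
        if sys_pct < target or pop_pct < target
    )


if __name__ == "__main__":
    cov = coverage_by_district(SYSTEMS)
    for district, (sys_pct, pop_pct) in sorted(cov.items()):
        print(f"{district}: {sys_pct:.1f}% of systems, {pop_pct:.1f}% of population")
    print("Below target:", below_target(cov))
```

In this toy example, the first district clears the 30% mark on systems but not on population, which mirrors the pattern reported for districts where the sample missed the large systems.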
3.3.1.5 Survey Issues and Concerns

Three major concerns exist for survey research: 1) non-response bias, 2) misrepresentation of the population of interest, and 3) lack of a verification mechanism for the self-reported survey responses. Non-response bias arises when non-responses bias the results because the respondents share other traits (Hoddinott and Bass, 1986). This could potentially be an issue within the work, as non-response bias could cause the survey results to show correlations with some other factor, thus rendering the results of the survey suspect (Hoddinott and Bass, 1986). To ensure that the research did not encounter non-response bias or survey a poor sample of the population of interest, it explored the responses in a few different ways in Section 3.3.1.4. The first way was joining the operator data to the system(s) they operate. Understanding how survey responses relate to different structural factors, such as system ownership or function (municipal vs homeowners association vs authorities), ensured the survey and results were interpreted appropriately and did not extend beyond the structural features of the system. A second option for exploring these potential issues was to run t-tests on the results of the survey based on survey submission timing (Hoddinott and Bass, 1986). The basic theory was that there would be high correlations between operators who submitted the survey later and those who did not respond. By running a t-test based on the time of submission, the research can find correlations based on the submission time of survey responses. If operators of the same organizational type showed high correlations in early responses, and the late survey respondents (controlling for organizational type) showed high correlations in responses, then the research can be confident that there was no non-response bias.
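The timing-based check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the survey: it assumes respondents are split into an early and a late group (for example, the first two versus the last two submission weeks), it uses Welch's unequal-variance form of the t-test, and the response values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(early, late):
    """Welch's t statistic and approximate degrees of freedom.

    `early` and `late` are lists of one survey variable (e.g., reported
    interactions) for early and late respondents. Each needs >= 2 values.
    """
    n1, n2 = len(early), len(late)
    v1, v2 = variance(early), variance(late)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                   # squared standard error
    t = (mean(early) - mean(late)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical reported interactions; not the survey's data.
    early_respondents = [4, 6, 5, 7, 3, 5]
    late_respondents = [5, 4, 6, 6, 4, 8]
    t_stat, dof = welch_t(early_respondents, late_respondents)
    print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A t statistic close to zero relative to its degrees of freedom suggests that late respondents, taken as a stand-in for non-respondents, did not answer differently from early respondents. A p-value would come from the t distribution, for instance via `scipy.stats.ttest_ind` with `equal_var=False`, if that library is available.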
This issue was avoided, as a simple t-test based on which of the four possible weeks the survey was submitted in showed no difference in the responses on any of the variables of interest.

Further, surveying all operators in Michigan limited the scope of the questions that could be completely explored. In order to ensure high response rates and quality of responses, the survey did not ask operators to identify systems or other operators by PWS ID, Name, or System Name. Thus, the survey was unable to distinguish who the operators were interacting with based on their responses. This limited the survey to asking broad questions about whether operators predominantly interact with outside operators in or outside of their systems' county. This was a limitation of the survey, and more detailed information on the extent and context of direct communications was explored through strategic selective semi-structured interviews of DOs in Michigan.

Finally, the research lacked a direct verification mechanism for the self-reported survey, which limited the clarity and causality of inferences from the survey data (Fan et al., 2006). This dearth of verification mechanisms stemmed from the novelty of the research, as no studies had investigated the number or quality of inter-operator interactions within a US context. This meant that operators could have over- or underestimated the number of inter-operator interactions. Without a precedent from prior literature or surveys, this research is at risk of self-report bias affecting the inferences. Because the survey responses are self-reported, any parts of this research using the data are exploratory work, and more research will be required to confirm the findings.

3.3.2 Interviews

The final question of the survey asked if the operator would be interested in a short follow-up interview to help the research contextualize the results. 154 (60.7%) operators indicated they would be willing to be a part of the interviews.
The research conducted IRB-approved, semi-structured interviews of 20 DOs in Michigan (IRB#00004557). This allowed the research to further explore the extent of, and connections between, operators. The purpose of the semi-structured interviews was to help contextualize the results of the survey and provide more insight into the role of inter-operator interactions, illuminating the role of external linkages between CWS operators in SDWA compliance. Interview participants were selected by the researchers to capture as many different types of operators as possible to help contextualize the operator experience. There was a mix of Utility, Contract, and Non-Affiliated operators, with the largest representation being Utility operators. Further stratification placed some operators in each of the categories of reported interactions and at varying CWS sizes, ranging from CWSs with a reported population served of only 25 people to CWSs with over 250,000 reported population served. The selection was also broken down by the number of interactions reported by the operator, with at least one operator in each of the reported interaction categories. The semi-structured interview was focused around seven areas (seen in Table 16), where the operator fleshed out more of their experiences. The first question asked the operators to tell their history and how they became a CWS operator. The interview then moved towards asking about what they had found to be best practices for avoiding SDWA violations for their CWS(s). The next four questions focused specifically on interactions, asking who they were interacting with, where the interactions take place, any barriers, and any benefits to CWS inter-operator interactions. These questions were primarily used to better contextualize the operators' experience of interacting with other operators.
The final question asked operators what the biggest problems for CWS operators were, to ensure that this research better understood what operators were most concerned about moving forward. Table 16 shows all the questions, probes, and the purpose for each question. The results of the semi-structured interviews are included in the discussion and conclusion section of this research.

| Category | Question and Probe | Purpose |
|---|---|---|
| 1. History of the Operator | Question: Can you share with me the story of how you became a community water system operator in Michigan? Probe: What led you to this as a career or side job? | This part is just to get a little bit more information focused on the operator's path. It is meant as an easy question that gets the operator and interviewer more comfortable with one another. Can use simple quotes from this to better outline the differences in where operators are coming from. |
| 2. Experience with best practices | Question: What practices have you engaged in that have helped you achieve SDWA compliance? Probe: How do you achieve compliance? | The focus of this question is to see if there are other practices outside of the interactions and learning that operators think about right off the cuff that relate to their compliance. These answers could illuminate some of the missing variables that are involved in the process or not completely in the scope of the research. I foresee adding the confidential quotes from this into the discussion of the other activities that operators attribute compliance to. |
| 3. Who | Question: Who are the operators you interact with most frequently (both inside and outside of your system)? Probe: I am trying to match some names, so hoping you can provide some that match to the data. | This question tries to get a direct answer on who they talk to. This has a few different parts that really add to the research: 1) it helps validate the survey by linking inside/outside county, 2) it can tell about distances of interactions or avenues, 3) it can pick up on where the personal/professional relationships were forged. |
| 4. Where | Question: Where do these interactions take place? Probe: Are there any meetings or groups that you are a part of that facilitate the interactions and learning? | Similar to question 3, this is trying to pull out the locations of the interactions. Is it on the phone, or in person? Do large meetings facilitate these? This will help supplement discussions about the avenues of interactions. |
| 5. Barriers | Question: What are the biggest barriers to your interaction with other operators? Probe: What would stop or encourage you to interact with other operators? | This question provides the operator the chance to talk about the barriers. One operator responded that they do not have an easy spreadsheet to reach other operators, and understanding factors like this example could be interesting support for the research. |
| 6. Benefits | Question: In your opinion, what are the benefits to interactions with other operators? Probe: Does talking to other operators help improve your ability to complete your job? | In the reverse of question 5, this question focuses on the benefits. We want to better understand whether they feel as though there are benefits or if interactions don't mean much to helping increase their job performance. If they are learning then a nice quote about learning works, if they are sharing resources then a nice quote on that. This is all supplemental to help the discussion in the paper. |
| 7. Problems CWS operators face | Question: In your opinion, what are the biggest problems CWS operators are facing? Probe: Just a last question, do you have any other thoughts about what the biggest problems are that CWS operators are facing? | Final question to just give them a last chance to share about the things they believe are most problematic for operators. |

Table 16: Interview Questions, their purpose, and probes
3.4 External Data

3.4.1 Operator and System Location Data

In order to effectively explore the spatially explicit hypotheses and the primary hypotheses, this research first identified the absolute location of all the CWSs represented by the sample. Not only in Michigan but also throughout the US, CWSs are highly heterogeneous. Some of the governmental systems may match municipal boundaries (typically the primary governmental systems), but in many cases, for the small CWSs (typically ancillary), capturing the geographic characteristics was more difficult, as they were either outside the traditional municipal boundaries or were so small that using the entire municipal boundary would not make sense (Statman-Weil et al., 2020). First, this section outlines the census tract spatial scale and provides an example of the spatial scale issues for CWSs by outlining three differently sized CWSs in Isabella County. Then, the section discusses the process for finding absolute locations and how they were intersected with data at the census tract scale.

Connecting CWSs to their socio-demographics has been a problem for researchers, as the national SDWIS database has too broad a spatial resolution, and many states do not provide CWS boundary data publicly or through FOIA (Josset et al., 2019; Beecher et al., 2020). The national SDWIS database's only reliable spatial resolution of the service area is the 'county served', which gives the county where the CWS is located (Beecher et al., 2020). Many researchers have used available socio-economic and environmental quality data aggregated at the county scale to explore the relationships between socio-demographics, environmental quality, and CWS performance (Wallsten and Kosec, 2008; McGavisk et al., 2013; Grooms, 2016; Greiner, 2016; Pennino et al., 2017; Allaire et al., 2018; McDonald and Jones, 2018; Montgomery et al., 2018).
However, these results always come with the caveat of the “need for higher spatial resolution” because counties hold multiple systems (for example, Berrien County, Michigan has 40 different CWSs), and overall socio-economic characteristics might not reflect the entire population in the county. Statman-Weil et al. (2020) explored this issue by investigating the role of spatial scale in determining the relationships between SDWA compliance and socio-economic status, and whether the relationships changed based on the choice of counties or the smaller unit of census tracts. Census tracts are “relatively small semi-permanent statistical subdivisions of a county” with an average of 4,000 people per tract and a population range of 1,200 to 8,000 people, and are the smallest spatial scale that captured socio-economic demographic data (U.S. Census Bureau, 2021). Statman-Weil et al. (2020) could use census tracts because Pennsylvania was one of the few states with a publicly available GIS shapefile of CWS boundaries. They found that the relationships between socio-economic demographics and SDWA compliance changed substantially with the higher spatial resolution (census tracts) compared to having only the county socio-economic demographic information. Due to the issues with spatial scale, this research used census tracts as the spatial unit to capture the socio-economic and environmental quality proxy data.

Figure 14: Map of Isabella County Three CWSs and the Census Tracts they intersect

| PWS ID | System Name | IPU Ownership | Population | Number of Census Tracts |
|---|---|---|---|---|
| MI0004530 | Mount Pleasant | Governmental Primary Municipal | 26,084 | 9 |
| MI0006030 | Village of Shepard | Governmental Primary Municipal | 1,515 | 1 |
| MI0000501 | Maple View Estates (East) | Non-Governmental Ancillary Mobile Home Park | 220 | 1 |

Table 17: Overview of the data and CWSs in Figure 14.
Figure 14 and Table 17 show the intersection of the census tracts with three different CWSs in Isabella County, Michigan. Figure 14 shows the location of the service area for three different CWSs in a county where there were no survey responses. To capture the census and environmental quality variables, where the finest resolution was the census tract, different strategies were required. With a fair amount of confidence, the research can say that the City of Mount Pleasant's city boundaries captured the vast majority of the population served, as the CWS served a reported population of over 25,000 people and the 2020 Census reported Mount Pleasant had a population of about 95,000, where the boundaries should encompass all the 25,000 people served by the CWS. There were nine census tracts that intersected the civil divisions shapefile boundary for Mount Pleasant (red outlined tracts). To effectively capture the census and environmental quality variables, all of the data at the census tract level needed to be aggregated by the city boundaries. While this may be a slightly imperfect measure based on the map, where two of the census tracts cover only a small portion of the city, seven of the nine census tracts are majority within the city boundaries. Unlike the large system and population of Mount Pleasant, which intersected a number of tracts, the small village of Shepard is within only one census tract. Because it was inside only one census tract, there was no need to aggregate, and it could take on the values of that census tract. The final CWS was a small mobile home park serving a reported population of 220 people, and it intersected only a single census tract. While the tract was much larger than the 220 people in the mobile home park, the census tract was the finest level of reliable data available to represent the park. Therefore, it took on the values of the census tract, even with the possibility of some flaws in the estimation.
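The tract-to-system strategy described above can be sketched in a few lines. This is a minimal illustration that assumes the spatial intersection has already produced a list of (service area, tract) pairs; the tract identifiers, area names, and values are hypothetical, and the unweighted mean is one simple reading of the averaging step rather than a confirmed detail of the original workflow. A system that intersects several tracts takes the average of their values, while a system inside a single tract takes on that tract's value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract-level values for one census variable.
# Illustrative only; these are not the study's data.
TRACT_VALUES = {"T1": 41000.0, "T2": 38500.0, "T3": 52000.0, "T4": 47000.0}

# Assumed output of a spatial intersection: (service_area, tract_id) pairs.
INTERSECTIONS = [
    ("City X", "T1"),
    ("City X", "T2"),
    ("City X", "T3"),
    ("Village Y", "T4"),
]


def aggregate_to_service_area(intersections, tract_values):
    """Average tract values over every tract a service area intersects."""
    by_area = defaultdict(list)
    for area, tract_id in intersections:
        by_area[area].append(tract_values[tract_id])
    # A single-tract area simply inherits that tract's value.
    return {area: mean(values) for area, values in by_area.items()}


if __name__ == "__main__":
    for area, value in aggregate_to_service_area(INTERSECTIONS, TRACT_VALUES).items():
        print(f"{area}: {value:.2f}")
```

Because the mean here is unweighted, tracts that cover only a small portion of a city count as much as tracts lying mostly inside it. An area- or population-weighted mean would be one possible alternative where that distortion matters.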
Populations outside a CWS's service area may be supplied by water wells that are not regulated under the SDWA as CWSs. This could also cause some distortion in the population demographics for the CWSs.

| Type | Count | Average Pop (SD) | Min Pop | Max Pop | Intersection / Matching |
|---|---|---|---|---|---|
| Civil Division (City, Town, Township) | 202 | 13,749 (55,516) | 25 | 713,777 | Matched to MCGI State of Michigan Civil Divisions Shapefile |
| Village | 35 | 1,016 (682) | 148 | 2,515 | Intersected with census tract from Villages Shapefile |
| County | 2 | 41,250 (42,780) | 11,000 | 71,500 | Uses entire county-wide census data through summarizing by county and join |
| Individual | 298 | 263 (362) | 18 | 3,584 | Intersected census tract from CWS point |

Table 18: Overview of the sample CWSs by the location.

Table 18 outlines the spatial resolution of the CWSs in the sample and how the spatial resolution was obtained. The first step was to separate all 537 CWSs into those that represented municipalities, villages, and counties, and the individual systems. The research used the State of Michigan's GIS shapefiles “Minor Civil Divisions (Cities and Townships)”, “Counties (v17a)” and “Michigan Villages- Framework V17” from the Michigan Geographic Framework data provided by the State of Michigan. The “Minor Civil Divisions (Cities and Townships)” shapefile had the boundaries of all 1,520 cities and townships in Michigan, while “Michigan Villages- Framework V17” provided the boundaries for all 253 villages in Michigan. “Counties (v17a)” contained the boundaries of the 83 counties in Michigan. The fourth shapefile used for location was the “2019 Census Tracts” shapefile from the US Census Bureau's TIGER geodatabase. There were 202 CWSs that represented a civil division's (city, town, township) system, and they were manually matched by label to the GIS shapefile “Minor Civil Divisions (Cities and Townships)” to join the system information to the civil division.
The census tracts that related to the civil divisions were intersected with the “Minor Civil Divisions (Cities and Townships)” shapefile using the over() function from the rgeos R package (Bivand and Rundel, 2020). This intersection provided the census tracts with the names of the civil divisions they intersected. The tracts were then summarized by name, which provided the average of all census and EPA data values to the civil divisions and thus to the 202 CWSs that served those civil divisions. There were 35 villages, and these were matched to the “Michigan Villages- Framework V17” shapefile. Every one of the 35 villages had only one census tract representing it; nevertheless, each was intersected with the census tract using the over() function from the rgeos R package (Bivand and Rundel, 2020). Two systems were identified as county systems, serving populations of 11,000 and 71,500 people, and these took on the values of the county, as they served large populations across the county. This approach warrants a little less confidence than that for the villages or civil division systems because the county is a much larger unit; however, without greater spatial resolution on the exact boundaries of these county systems, the best that could be done was to use the countywide averages. The remaining 298 CWSs (mostly non-governmental ancillary), which were identified as individual, non-community-wide CWSs, each intersected only one census tract. Due to the lack of spatial resolution, the research had to take the additional steps of finding the location and then geocoding it to a point. In December of 2020, the service population areas of all 298 systems were identified through web searches for each system, and an address for the system location was obtained. Then, using the Geocodio online geocoding program, the coordinates for each of these systems were obtained.
Any results from geocoding programs need to be assessed for accuracy, as substantial error rates have been identified in these programs, and failure to assess location accuracy can negatively affect the results of research (Manoruang and Asavasuthirakul, 2019). The accuracy of the geocoding was checked in two ways. First, the Geocodio program provided an estimated confidence percentage for the location of each CWS. Second, regardless of the accuracy percentage returned, each location was also assessed by ensuring that it fell within the ‘county served’ provided by the SDWIS database. Through these checks, 27 systems (~9.2% of the CWSs) had their coordinates replaced with manually provided accurate coordinates. Once the 298 individual system locations were obtained, they were intersected with the census tracts, taking on the values of the tract.

3.4.2 Performance Metrics

According to Wang and Wang (2012), a firm's or organization's performance can be measured in two ways: (1) operational performance and (2) financial performance. Both types of performance are intricately tied together, as operational performance captures the increases in customer service, cost management, productivity, quality, and asset management performance, while financial performance captures the profit margins, profit growth, and return on investments (Wang and Wang, 2012). An increase in operational performance could lead to an increase in financial performance and vice versa. When focusing on CWS “performance”, there are substantial data gaps for many of the possible measures of performance (Josset et al., 2019). Financial performance data for CWSs are not often accessible, and most of the data portals do not include some of the key operational performance metrics, such as water losses (Josset et al., 2019).
Due to these data gaps, SDWA violations have been regularly used as a key metric for measuring CWS performance by researchers and regulators (Rubin, 2013; Van der Slice, 2011; Allaire et al., 2018; EGLE, 2020). Michigan EGLE (2020), in a CWS capacity development report, described violations as measuring SDWA compliance, and compliance as a “measure of success” for CWSs (pg. 4). Following the precedent set by previous research (Wallsten and Kosec, 2008; McGavisk et al., 2013; Pape and Seo, 2015; Grooms, 2016; Switzer and Teodoro, 2017; Allaire et al., 2018; McDonald and Jones, 2018; Teodoro et al., 2018; Montgomery et al., 2018; Marcillo and Krometis, 2019; Schaider et al., 2019; Fu et al., 2020; Statman-Weil et al., 2020), and due to performance data accessibility limitations in Michigan, this research used SDWA violations as a proxy measure for SDWA compliance. This limits the inferences on CWS “performance” to how systems perform under the SDWA only. However, it is a strong measure of CWS outcomes for the current research, as the compliance data source is independent of the survey data, which avoids common source bias (Meier and O'Toole, 2013).

SDWA violations are multifaceted, system-level regulatory violations, where systems could have health-based violations (drinking water poses a direct risk to human health) or administrative/monitoring and reporting violations (no direct threat to human health but a failure to abide by the SDWA) (Rubin, 2013). Section 2.4 outlined the differences between health, monitoring and reporting, and administrative violations. To further the understanding of the severity of violations, the monitoring and reporting and administrative violations can receive an extra designation of “Major (or serious) Violation” if the violation was a monitoring and reporting violation where the system either did not sample for the majority of regulated contaminants or failed to take and/or report the majority of required samples (EPA, 2021).
The major violations are labeled in such a way because the failure to adequately sample at a large scale could pose a threat to human health but is not an immediate health-based violation.

SDWA violations from SDWIS (and State Primacy agencies) have been criticized for data reliability issues, and researchers are cautioned against using these data (OIG, 2017). Reports from the Government Accountability Office (GAO) (2011) and the USEPA's Office of the Inspector General (OIG) (2017) pointed to a number of reliability issues with the SDWIS database, particularly around the underreporting of health-based violations. In a 14-state study, OIG (2017) found that about 26% of health-based violations were under- or mis-reported to the EPA, which was down from the GAO (2011) findings of about 38% between 2002 and 2004. While underreporting has been a major issue, and it is imperative that all results from the research be interpreted with caution because of the known violation data problems, the SDWIS (and ECHO) database is still the best database containing CWS SDWA violation data and has been used by numerous researchers and policymakers (OIG, 2017; Allaire et al., 2018; McDonald and Jones, 2018; Marcillo and Krometis, 2019). Another issue with SDWIS violation data is that comparing compliance rates between years is not recommended, as there are a “rapidly increasing number and complexity of rules and requirements each year” (EGLE, 2020; pg. 5). Allaire et al. (2018) addressed this by exploring only a specific measure, MCL violations for “total coliform”, as that rule stayed consistent over time. If the research is not exploring across multiple years or time periods, then this is not a concern.
Further, there are geographic issues in the quality of the data: heterogeneity in State primacy reporting to the EPA renders national or inter-state comparisons difficult (OIG, 2017) because it raises the question of whether a given state was outperforming another because of efforts to achieve higher SDWA compliance or simply due to better or more complete reporting. However, if the research investigates only a single state, then it can avoid issues with interstate differences in reporting.

The nature of the violations makes it difficult to use the “count of violations” as a CWS performance metric because multiple violations do not always reflect multiple CWS failures (Oxenford and Williams, 2009; Rubin, 2013). For instance, a failure to test a single sample can result in multiple violations, one for each regulated contaminant not tested for in the sample, regardless of whether the contaminant was above SDWA standards (Oxenford and Williams, 2009; Rubin, 2013). This has pushed the research into treating violations as a binary variable (a system either has one or not) rather than as a count of violations when modeling at the system level (Allaire et al., 2018; Marcillo and Krometis, 2019). Even when systems and violations are aggregated to a larger spatial unit, the number of violations in that spatial unit is converted either to a percentage of systems with a violation in the spatial unit (McDonald and Jones, 2018) or to a binary indicator of whether any system in the spatial unit had a violation (Switzer and Teodoro, 2017; Allaire et al., 2018). This research adopts the binary approach for the types of violations for systems and aggregates the violating systems to a rate calculation for the EGLE Regions and Districts.
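The binary treatment and the rate calculation described above can be illustrated with a short sketch. It assumes one record per system holding a spatial unit label, the population served, and the recorded violation count; the unit names and values are hypothetical. Each count is first collapsed to a binary indicator, and the indicator is then aggregated to two rates per spatial unit: the percentage of systems with a violation and the percentage of the population served by a violating system.

```python
from collections import defaultdict

# Hypothetical systems: (spatial_unit, population_served, violation_count).
# Illustrative values only; these are not SDWIS records.
SYSTEM_VIOLATIONS = [
    ("D1", 250, 3),    # one missed sample may be recorded as several violations
    ("D1", 120, 0),
    ("D1", 9000, 0),
    ("D2", 40000, 1),
    ("D2", 300, 0),
]


def violation_rates(systems):
    """Return {unit: (pct_systems_violating, pct_population_affected)}."""
    agg = defaultdict(lambda: {"n": 0, "n_viol": 0, "pop": 0, "pop_viol": 0})
    for unit, population, count in systems:
        has_violation = count > 0  # binary indicator; the count is discarded
        a = agg[unit]
        a["n"] += 1
        a["pop"] += population
        if has_violation:
            a["n_viol"] += 1
            a["pop_viol"] += population
    return {
        unit: (100.0 * a["n_viol"] / a["n"], 100.0 * a["pop_viol"] / a["pop"])
        for unit, a in agg.items()
    }


if __name__ == "__main__":
    for unit, (sys_pct, pop_pct) in sorted(violation_rates(SYSTEM_VIOLATIONS).items()):
        print(f"{unit}: {sys_pct:.1f}% of systems, {pop_pct:.1f}% of population")
```

In the toy data, the first unit has a third of its systems in violation but under 3% of its population affected, while the second has half of its systems in violation and nearly all of its population affected, which is why the two rates need to be read together.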
Violation Type                             Total Number     Systems with Violation    Population Served by Violating System
                                           of Violations    (% of all systems)        (% of total population)
2020 Any Violation                         757              274 (20.0%)               1,492,721 (20.2%)
2020 Health or Major Violation             553              151 (11.0%)               1,019,745 (13.8%)
2020 Health-Based                          21               13 (0.09%)                252,602 (3.4%)
2020 Major Violation                       532              145 (10.6%)               782,416 (10.6%)
2020 Administrative Violation (non-major)  204              157 (11.5%)               802,488 (10.9%)

Table 19: Michigan 2020 Violations Overview

Table 19 shows the overview of violations for all Michigan CWSs based on two aggregated violation metrics: (1) the percentage of systems in the spatial unit with a violation, and (2) the percentage of the population in the spatial unit served by a CWS with a violation. Both metrics are important to explore because they capture different things. The percentage of systems with violations could be above 50% in one spatial unit, meaning that violations occurred at a number of systems in that unit, but these could be small systems that together serve less than 10% of the total population. On the other hand, the percentage of the population in violation might be high, but a single large system in the area could be responsible for the violation while the rest of the CWSs complied. Both metrics need to be investigated, along with the type of violations, in order to better understand the phenomena of violations versus compliance.

Figure 15: Comparison of SDWA violation in research sample to all systems and systems not in the sample

The split of the sample (survey respondents) by operator type and violations in Michigan can be seen in Figure 15. Figure 15 shows the percentage of systems with a violation in 2020 in total (dark blue), in the research sample (orange), and not in the sample (light blue).
Where the research sample was lower than the total, the sample underrepresented the population of violating CWSs, and where it was higher, the sample overrepresented it. The sample of Contract operators best represented the overall population by operator type: there was less than a 1% difference between the research sample and ‘All systems’, with a slight overrepresentation of CWSs with violations in the research sample relative to the total population of Contract-operated CWSs. Utility operators had almost 5% fewer violations in the research sample than in the overall population, and the non-sampled Utility-operated CWSs had more violations, which indicated that the sample of Utility-operated CWSs underrepresented the overall population of Utility-operated CWSs. Non-Affiliated operator-run CWSs were overrepresented by the research sample. Overall, the sample represented the underlying populations reasonably well, as none of the operator types had more than 5% over- or underrepresentation of violating CWSs.

The CWS performance metrics were based on SDWA violations; data were collected from SDWIS 2020 at the system level and utilized in two different ways. First, violations were categorized into three categories: any 2020 violations, non-health 2020 violations, and major 2020 violations. As previously discussed, treating violations as count variables did not make sense, as a single missed sample could be recorded as multiple violations even though it was really just one (Rubin, 2013). This research treated each type of violation as a binary variable indicating whether there was a violation or not, which was the standard approach taken in previous CWS SDWA compliance research (Allaire et al., 2018; Switzer, Teodoro, and Karasik, 2016). Any violation referred to any violation (health, administrative, or M/R), and there were 78 CWSs in the sample with any 2020 violation and 377 without.
Non-health violation is a binary variable (0 or 1), where 75 CWSs had a non-health violation (1), and 380 CWSs were without a violation (0). Major violations were the combination of major violations and health-based violations, where a ‘1’ meant the CWS had either a major or health-based violation, while a ‘0’ meant it had neither but still could have had a different violation.

3.4.3 Managerial Capacity

Managerial capacity is the “personal expertise and institutional and administrative capabilities” (EGLE, 2020; pg. ii) or, more simply put, the ability of CWS operations to maintain compliance with State and Federal regulations (Shanaghan et al., 1998; Office of Water, 2013). An EPA (2012) report pointed to four possible indicators of managerial capacity: ownership, staffing and organization (training and professionalism), previous violations, and effective external linkages. Managerial capacity is the broader area in which the primary hypotheses about “effective external linkages” between operators and CWS performance are tested, using proxy variables (interactions, Operator Type*interactions) to represent the effective external linkages. Operator Type*interactions is a model interaction term that combines the operator type and the number of reported interactions. This term allows the relationship between interactions and SDWA violations to be assessed separately for Utility, Contract, and Non-Affiliated operators. The managerial capacity data and variables came from a variety of sources: this dissertation’s survey of Michigan CWS operators, a July 2019 FOIA of Michigan EGLE (conducted by the researcher), EPA’s Enforcement and Compliance History Online (ECHO), and the SDWIS 2020 Quarter 4 data. All of the continuous variables were mean-centered using the scale() function from base R (R Core Team, 2013). The only managerial capacity variable that came from the ECHO and SDWIS violation databases was any violation in 2019 (or violation in the previous year).
EPA (2020) and EGLE (2020) pointed to previous SDWA violations as an indicator of managerial capacity, as organizational success in SDWA compliance reflects organizational practices. Allaire et al. (2018) found that a violation in the previous year increased the probability of a violation in the following year. This research used the 2019 violation as a binary variable (0 or 1) indicating any 2019 SDWA violation at the system. About 17% (88 CWSs) of the sample had a 2019 SDWA violation, and based on the previous research these systems should be more likely to have a SDWA violation in 2020.

As discussed in previous sections, not all operators are the same, and managerial capacity can change based on the type of operator and the underlying experience and level of professionalism. Pons et al. (2014) found that the dearth of human capacity in the form of trained, full-time operators and managers leads to inefficient system management and increased risk of water system failures, while also preventing systems from improving treatment techniques, applying for funding, or exploring new supply sources to increase their performance. The problem is exacerbated for smaller systems because many operators are part-time workers and water system operation is just one piece of the job (Dziegielewski and Bik, 2004). Using data from the 2019 FOIA of Michigan EGLE, operators were separated into three possible groups: Utility, Contract, and Non-Affiliated operators. These groupings only skim the surface of the differences between operators, but they provide a basic control for professionalism and managerial capacity. This research uses ‘operator type’ as a proxy variable to represent a piece of managerial capacity, where the full-time operators are hypothesized to have greater capacity than the Non-Affiliated operators.
The most important variable for the main hypotheses was the number of inter-operator interactions, as it acted as the proxy for knowledge transfer and spillovers between CWS operators. A detailed explanation of the variable can be found in section 3.3.1.3. The survey question directly asked about the number of interactions between the operator and other operators over the course of the last year, and it was assumed that more reported interactions indicated that the operator was more involved with outside CWS operators and had effective external linkages. The ordinal variable, with breaks of 0 | 1 to 10 | 11 to 20 | 21 to 30 | 31 to 40 | 41 to 50 | 50+ interactions, was transformed by taking the median of each interaction bin as the number of interactions during analyses. With an assumed normal distribution this was treated as a continuous variable, which strengthens some of the model diagnostics (Johnson and Creech, 1983; Sullivan and Artino, 2013). Inter-operator interactions and ‘operator type’ need to be included in the models as an interaction term because heterogeneity in operator type might produce different relationships between interactions and violations depending on the ‘operator type.’ The interaction variable, ‘operator type * interactions’, allowed this research to pull out these exact relationships as they pertain to the SDWA violations and represent the managerial capacity.

Additional measures of the professionalism and managerial capacity of the operator came from two more survey variables: group membership and education. Both of these survey-retrieved variables were discussed in section 3.3.1.3. Education can act as a proxy for managerial and professional capacity, as research (Shahr et al., 2019) has found that greater levels of education have a socializing effect on members of a profession.
Previous research on CWSs has even gone so far as to say: “Utilities that are headed by professional engineers violate the SDWA significantly less frequently than do utilities led by nonengineers” (Teodoro, 2014, p. 983). This would indicate that higher levels of education would increase the likelihood of professional engagement and, for CWS operators, possibly stronger effective external linkages. Educational attainment was obtained by survey question one, which asked the respondent to select their highest level of education using the US census breakdowns of education. This was transformed into a binary variable of bachelor’s degree or higher: if a respondent had obtained a bachelor’s degree or higher then this variable received a 1, otherwise it received a 0. There were 62 operators with a bachelor’s degree or higher, and 189 with less than a bachelor’s degree. Group membership aimed to capture the professional networking opportunities of the operator, which help them increase their managerial capacity. Group membership was based on the true/false responses for survey questions eleven and twelve (discussed in section 3.3.1.3). These responses were transformed into a binary variable where a “1” represented membership in at least one water-related organization, while “0” represented no membership. There were 206 operators with membership in at least one group, and 45 operators with no group membership.

3.4.4 Technical Capacity

Technical capacity is the “physical infrastructure and operational ability” (EGLE, 2020, pg. ii) of the system to comply with Federal and State quality and quantity regulations (Shanaghan et al., 1998). Whereas managerial capacity picks up on the execution of operations through system organization and administration, technical capacity is the actual infrastructure (pipes, source water supply, treatment) and the ability to provide drinking water of sufficient quantity and quality (EGLE, 2020).
The technical capacity data and variables came from two sources: SDWIS 2020 Quarter 4 data and Beecher et al.’s (2020) novel database of CWS function and ownership for the Great Lakes States. All of these variables (both continuous and categorical) were transformed into binary and ordinal variables.

One of the key CWS technical capacity indicators is the source water, where the quantity and quality of the source water can impact the delivery of SDWA-compliant drinking water (Shanaghan et al., 1998; EGLE, 2020). Source water quantity matters because there needs to be enough water to meet current and future demands. However, as Josset et al. (2019) found, there were not many data available on the quantity of source water, and in Michigan there were no data available about the direct source waters for CWSs. The only information available on the source water comes from SDWIS and breaks down source water into extraction of groundwater or surface water, or purchase of water from a wholesaler (Tiemann, 2014). While there are no data available on source water quantity, the source water extraction data explain a bit about quality, as CWSs purchasing their water are at lower risk for SDWA non-compliance because the water (in most cases) will go through two rounds of treatment, one round with the wholesaler and one round with the CWS delivering the water (Rubin, 2013; Tiemann, 2014). Numerous studies have included source water through a variable identifying the CWSs that purchase their water, and have found that these systems had fewer SDWA violations (Noll, 2002; Wallsten and Kosec, 2008; Balaz et al., 2011; Teodoro, 2014; Switzer et al., 2016; Allaire et al., 2018; Statman-Weil et al., 2020). Using the SDWIS 2020 Q4 data on source water and transforming it into a binary variable, there were 90 systems in the reduced sample that purchased their water and 365 systems in the reduced sample that did not purchase their water.
This research uses system size, as measured by the population served by the CWS, and the function (primary system) as proxy variables to pick up on technical capacity. One of the consistent findings from both researchers (Ottem et al., 2003; Blanchard and Eberle, 2013; Allaire et al., 2018; McDonald and Jones, 2018; Statman-Weil et al., 2020) and regulators (EGLE, 2020) was that small systems (those serving 10,000 or fewer people) had higher rates of SDWA violations than the large systems. In Michigan in 2020, there were 1,072 small CWSs serving 663,936 people, while there were 289 large CWSs serving 6,653,772 people: substantially more small systems served a substantially smaller population than the large systems. The large and small designations had necessarily wide ranges, and to get more equal groupings, this research transformed the continuous variable (population served) into a 3-category ordered factor variable of small systems serving fewer than 3,300 people (228 systems in sample), medium systems serving between 3,300 and 10,000 people (187 systems in sample), and large systems serving more than 10,000 people (40 systems in sample).

Another technical capacity proxy variable was the function of the system: primary or ancillary. A primary system was an entity whose primary function is to provide water as a public utility service (example: Lansing Board of Water and Light, Great Lakes Water Authority), while an ancillary system was a system owned by an entity whose primary function is not water service (example: mobile home park systems, apartment systems) (Beecher et al., 2020). This is distinctly different from the operator type, as the primary/ancillary designation follows the system, not the operator; for instance, a Contract operator might run both a small town’s CWS (primary system) and a mobile home park CWS (ancillary system).
While the operator type serves as a proxy for managerial capacity because it focuses on human capital, the primary/ancillary variable acts as a technical capacity proxy, as primary systems are more likely to have the physical infrastructure and commitment to technical capacity building that ancillary systems often lack (Grigg, 2018; Beecher et al., 2020). Using the novel Beecher et al. (2020) database of CWS ownership and function in the Great Lakes States, this research includes the primary/ancillary variable as a binary indicator. This variable is differentiated from the system size variable, as there are more primary systems (750 systems serving 7,151,627 people) than ancillary systems (611 systems serving 166,081 people) in the Michigan sample. The research sample reflected this, as there were more primary systems (275) than ancillary systems (180). Including the system size and primary/ancillary variables allows the research to include proxy measures of technical capacity that control for both system size and system function.

3.4.5 Financial Capacity

Financial capacity of CWSs refers to the “monetary resources” or the ability of a CWS to accumulate and manage its money in a way that allows the system to operate over time (Shanaghan et al., 1998; EGLE, 2020). Financial capacity is the most difficult of the TMF capacity measures to obtain data on because there are no nationally collected indicators of financial capacity. Unlike technical and managerial capacity, Michigan EGLE does not require financial capacity assessments of existing systems unless they have shown problems that indicate a lack of technical or managerial capacity or are trying to increase their water rates (EGLE, 2020).
EGLE will ask systems for a list of possible financial indicators (budgets, last two years of audited records, water use and water rate ordinances, latest rate ordinance or resolution, recent rate or feasibility study, and contract or service agreements with outside customers); however, most of the CWSs do not provide any of the information requested (EGLE, 2020). Due to the lack of data, EGLE will use alternative indicators of the local economic circumstances, such as median household income, median home value, unemployment, and percentage of population below the poverty level, as proxies for financial capacity (EGLE, 2020). The survey was unable to collect any information about the financial capacity of systems because in many cases the financial aspects of the CWS were not part of the operators’ duties and they would not be able to provide reliable responses to financial questions. For this reason, this research used median household income (MHI), median home value (MHV), and unemployment percentage as three proxy variables to represent financial capacity. Previous research has linked compliance with US environmental regulation to the ethnic, racial, and socioeconomic composition of community populations and has included median household income, median home value, and unemployment percentage from ACS data (Konisky & Schario 2010; Switzer et al., 2016; Statman-Weil et al., 2020). Further, both researchers and regulators (EGLE, 2021) use these ACS data as a proxy for financial capacity when there is no direct information available, as ACS data represent the local socio-economic characteristics of the location.

All three variables come from the ACS 5-year estimates for 2016 to 2020. ACS data have numerous limitations, as they are an aggregated sample of the population, and results from the ACS require careful framing (Greiman, 2017; U.S. Census Bureau, 2018).
However, this research used the ACS 5-year estimates, which according to the U.S. Census Bureau (2018) are appropriate to use when precision matters more than currency, research is interested in small populations, and examinations occur at the tract level. The 5-year estimates gave a sense of the local economic situation, and many of the CWSs were in areas with small populations, requiring the use of tracts. Any results from the financial capacity indicators derived from the ACS 5-year estimates of 2016 to 2020 were treated as possibly incomplete due to error, but with the lack of other financial information, this was the best available proxy. These data were collected at the census tract and county level and matched to the system based on the processes described in section 3.4.1. While not a direct measure of financial capacity, this research assumed that these economic measures of the locality served by the CWS picked up on possibly more financially distressed CWSs. However, all results need to be interpreted with this data assumption in mind.

3.4.6 Natural Advantage

The final modeling area was natural advantage, where there may be benefits to SDWA compliance that do not relate to any of the TMF capacity indicators. One of these possible benefits was the local environmental quality, under the assumption that better environmental quality meant that waters were less polluted, making it easier to deliver safe drinking water (Montgomery et al., 2018). The environmental quality (EQ) variable was constructed from the EPA’s ECHO database for Clean Water Act (CWA) violating facilities in 2019 and 2020, then was transformed into the percentage of CWA discharge violating facilities versus total facilities, and then was matched to the systems using the detailed processes discussed in section 3.4.1.
Rurality was a factor variable with four levels ranging from rural to major city and came from a Michigan Department of Transportation (MDOT) rurality index shapefile. Based on the findings of Marcillo and Krometis (2019), this was an important variable to include, as they found rural systems were more likely to have a SDWA violation. While this indicates the urban/rural designation of the county in which each CWS is located, it is not informative about the age of systems, population density, or any of the other factors that might change between urban counties or across metropolitan regions within counties. Future research could address this limitation. The final variable, Peninsula, was a binary variable that gave the CWS a “1” if it was in the upper peninsula and a “0” if it was in the lower peninsula. The upper peninsula only houses about 8% of the total CWSs in Michigan and has 0.0071 CWSs per square mile, whereas the lower peninsula houses about 92% of the total systems and has 0.031 CWSs per square mile. Separating the two ensured that differences in the opportunity for interaction were not missed in the models.

3.5 Conclusion

This chapter documented essential aspects of the study region and the survey and interview data, explained the outside data sources, and identified which variables represented TMF capacity or natural advantages. The study area of Michigan was shown to offer a great opportunity, as its demographic and geographic features limit some of the possible problems. However, the study area choice does have some limitations for the broader applicability of the findings. The survey and interview construction, deployment, results, and representation showed that the data collected were a good representation of the underlying population. The step-by-step process of obtaining the spatial locations and intersecting those locations with the outside data was explained.
The outside data sources were addressed, and the way each variable relates to the broader conceptual models in this research was explained. The next chapter further explains how each of these variables was used in numerous modeling efforts to answer the research questions.

CHAPTER 4: Methods

4.1 Introduction

This chapter outlines all of the statistical methods used in the investigation of the research hypotheses. Some of the methods test only a single hypothesis while other methods are used to test multiple hypotheses. First, the chapter outlines the independence tests, which are used to investigate the endogenous hypotheses that there are differences based on operator type [EN-1], education [EN-2], and professional organization membership [EN-3] in the number of reported inter-operator interactions. These independence tests also address one of the spatial hypotheses [SP-1], that there will be differences based on operator type in which county or counties the interactions take place. The next section explains the variogram modeling used to test the other spatial hypothesis [SP-2], that reported interactions show spatial autocorrelation. The OLS models used to explore the primary hypotheses [Prim-1, Prim-2], that there are regional advantages in the role of inter-operator interactions and SDWA compliance, are then explained. Further exploration of the primary hypotheses through geographically weighted regression models, which are spatially explicit, is addressed next. The chapter then moves to the ordered logistic regression models that are used to assess all the endogenous hypotheses [EN-1, EN-2, EN-3] and the primary hypotheses at the operator-only level [Prim-1, Prim-2]. The generalized linear mixed model that explores the primary hypotheses [Prim-1, Prim-2] is explained, and the use of this more complex statistical technique is justified.
Finally, the chapter concludes with an overview of how each statistical technique will be used to address the research hypotheses.

4.2 Kruskal-Wallis and χ² tests (Independence Testing)

One of the key parts of the endogenous hypotheses [EN-1, EN-2, EN-3] and spatial hypothesis one [SP-1] is the exploration of the differences between the operator traits (location of interactions, organization characteristics, and operator-specific characteristics) and the employer type (i.e., Non-Affiliated, Contract, Utility). To explore these differences, descriptive statistics were first calculated to contrast means and standard deviations for the key variables (endogenous hypotheses results 5.2.1, and spatial hypothesis results 5.3.1), as grouped by the operator employer. Descriptive statistics were useful because they provide a summary of the variable values and broadly highlight the differences between groups (Burt, Barber, and Rigby, 2009). Second, inferential modeling was conducted to assess the differences between the type of employer and the different hypothesized covariates (education and group membership) using two different statistical independence tests based on the data.

The first of these tests was the Kruskal-Wallis test, a non-parametric alternative to one-way analysis of variance that relaxes the assumptions on normality or particular variable distributions (Burt, Barber, and Rigby, 2009). For the purposes of investigating the endogenous hypotheses, it was the most appropriate test to use because there were three groups (two groups would change this to a Mann-Whitney U test), the data were non-normal, and all the observations were independent of one another (Burt, Barber, and Rigby, 2009). Marcillo and Krometis (2019) used this test to explore the relationship between rurality and CWS SDWA violations in Virginia.
Further, the Kruskal-Wallis test has two key extensions that assist in clarifying the relationships: the Scheirer-Ray-Hare test controls for other variables, and the post hoc Dunn’s test identifies which groups are significantly different. The Kruskal-Wallis test assesses the median values between groups, with $H_0$ = “population medians are equal or near in value” and $H_a$ = “population medians are not equal and are far apart” (Burt, Barber, and Rigby, 2009). The H statistic in these tests is calculated through Equation A below:

Equation A:
$$H = \frac{12}{N(N+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(N+1)$$

Where:
$R_1$ = sum of the variable ranks of Non-Affiliated operators (sample 1)
$n_1$ = the number of Non-Affiliated operators (sample 1)
$R_2$ = sum of the variable ranks of Contract operators (sample 2)
$n_2$ = the number of Contract operators (sample 2)
$R_3$ = sum of the variable ranks of Utility operators (sample 3)
$n_3$ = the number of Utility operators (sample 3)
$N = n_1 + n_2 + n_3$ (total sample size)

First, all the data regardless of group (operator type) are ranked, and the ranks are then summed within each group; the average rank for each group is the rank sum divided by the number of observations in that group. H is the sum, across operator types, of the squared rank sums for the number of reported inter-operator interactions, each divided by its group size, multiplied by 12 over $N(N+1)$, minus 3 times the total sample size plus one. If the average ranks in each group are similar, then H will be closer to zero. The closer the H value is to zero, the smaller the difference between the groups, while a higher value represents greater differences. The results can be tested against a chi-square distribution with degrees of freedom equal to the number of groups minus one (Hoffman, 2015). The Kruskal-Wallis test is conducted using the kruskal.test() function in the stats R package (R Core Team, 2013), which provides the H value and the p-value used to assess statistical significance at the 95% confidence level.
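As a worked illustration of Equation A, the following Python sketch computes H by hand for three small groups. It is not the dissertation's code (the analysis used R's kruskal.test()); the function names and the interaction counts are hypothetical, and the sketch omits the tie correction that kruskal.test() applies, so results will differ slightly from R when the data contain ties.

```python
def rank_with_ties(values):
    """Assign 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = rank_with_ties(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical reported interaction counts per operator type (illustration only).
non_affiliated = [0, 2, 4]
contract = [6, 11, 15]
utility = [20, 26, 31]

h_separated = kruskal_wallis_h([non_affiliated, contract, utility])
h_mixed = kruskal_wallis_h([[0, 6, 20], [2, 11, 26], [4, 15, 31]])
print(h_separated, h_mixed)
```

With fully separated groups the rank sums are 6, 15, and 24, giving H of about 7.2; when the same values are spread evenly across groups the rank sums are 12, 15, and 18 and H falls to about 0.8, matching the statement that similar average ranks pull H toward zero.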
One of the major limitations of the Kruskal-Wallis test is that while it can tell that there are differences between the groups, it does not identify which groups are different from one another. Following the precedent set by Marcillo and Krometis (2019), this research employed the post hoc Dunn test to identify which groups are significantly different from one another. The Dunn tests were conducted using the dunnTest() function in the FSA R package. The dunnTest function (Dinno and Dinno, 2017) identifies the differences between groups and provides the p-values to assess statistical significance at the 95% confidence level.

For the endogenous hypotheses [EN-2, EN-3], two of the variables of interest, group membership and educational attainment, are binary variables, which rendered the Kruskal-Wallis test inappropriate. Instead, these variables were summarized by operator group and expressed as the percentage of each operator type that fell into each binary option. The chi-square test of independence looks for an association between two categorical variables (Burt, Barber, and Rigby, 2009). The chi-square test has $H_0$ = no relationship exists between the categorical variables and $H_a$ = a relationship exists between the categorical variables (Burt, Barber, and Rigby, 2009). For this research’s purposes, the null hypothesis was that the percentages of group membership and educational attainment will be no different across Utility, Contract, or Non-Affiliated operators, and $H_a$ was that there is a difference. Similarly, SP-1 has a factored choice (inside/outside the county, inside county only, outside county only, and neither inside nor outside), which led this research to use the chi-square test for independence to assess the hypothesis.
Like the Kruskal-Wallis test, the chi-square test is an important first step in exploring how the employer impacts the operator’s responses in both the endogenous and spatial features. The chi-square tests on the group membership and education variables were conducted using the chisq.test() function in the stats R package (R Core Team, 2013).

4.3 Variogram Modeling

This research employs a semi-variogram to explore spatial hypothesis two [SP-2], that there is spatial autocorrelation of the reported inter-operator interactions. A semi-variogram is a tool that explains the spatial dependence of a random process, where the amount of variance between the z values (in this case, reported interactions) is evaluated based on the relative location of systems (Haining, 2003). If there is noticeable spatial structure in reported interactions, the variance will be lower at short distances and greater at longer distances (Haining, 2003). The basic semi-variogram equation (Equation B) is:

Equation B:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[ z(s_\alpha) - z(s_\alpha + h) \right]^2$$

Where $\gamma(h)$ is half the average squared difference of all pairs of observations of the z value separated by the spatial lag $h$ (the distance between observations), and $N(h)$ is the number of such pairs. For each pair, the value of the variable of interest $z(s_\alpha)$ (interactions) at a spatial location is differenced from the value of the variable at that location plus the distance lag, and the difference is squared. At each lag these squared differences are summed and divided by $2N(h)$; the division by two is used because at large $h$ the semivariance is then equal to the true variance within the data. This method allows the research to explore the variance in reported interactions over space, which shows how much reported interactions change at different spatial lags.

The research calculated semi-variograms on three different subsets of data: (1) all systems surveyed, (2) Utility and ‘single system’ operators included, and (3) ‘single systems only’ (the operator only operates a single system).
After cleaning the survey and finding spatial locations (discussed in section 3.4.1), there were 253 operators of 537 systems, which were then geocoded based on the addresses from the water system’s official website or EGLE contact data (more information in section 3.4.1). Of the 253 operators, 63 ran more than one system, with the majority being Contract operators. Most of the systems (349) were run by an operator who supervised at least one other system; consequently, some of the closest pairs of z values (interactions) came from the same operator and therefore had identical reported interactions. Where CWSs are operated by the same operator, interactions will not vary much over different distances, so Variogram 1 (all systems) is expected to show a stronger spatial trend in variance. Variogram 2 (Utility and single systems) excludes those systems run by Contract operators and represents only the Utility and Non-Affiliated operators, with 277 systems. Based on SP-2, it is assumed that CWSs run by these operators will have more spatial variance structure in interactions than in Variogram 1; however, the issue of multiple systems from a single operator still existed for Variogram 2, as some of the Utility and Non-Affiliated operators ran more than one CWS. The final variogram (Variogram 3) included 189 single CWSs, where each CWS was the only CWS run by its operator. The raw CWS locations and reported interactions for the three subsets can be seen in Figure 16 below. Each semi-variogram was constructed using the variogram() function and modeled with the fit.variogram() function from the gstat R package (Pebesma, 2004; Pebesma & Heuvelink, 2016).
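The empirical semi-variogram calculation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the gstat implementation used in this research: the function name empirical_semivariogram, the equal-width lag bins, and the four example points are all hypothetical, and gstat's own binning and model fitting differ in detail.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Half the mean squared difference of z for point pairs grouped by distance."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for a in range(len(values)):
        for b in range(a + 1, len(values)):
            dist = math.dist(coords[a], coords[b])
            k = int(dist // lag_width)  # index of the lag bin for this pair
            if k < n_lags:
                sums[k] += (values[a] - values[b]) ** 2
                counts[k] += 1
    # gamma(h) = sum of squared differences / (2 * N(h)); None marks an empty bin
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical systems on a line with steadily increasing reported interactions
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
interactions = [1, 2, 3, 4]
gamma = empirical_semivariogram(coords, interactions, lag_width=1.5, n_lags=3)
print(gamma)  # [0.5, 2.0, 4.5]
```

In this toy case the semi-variance rises with distance, which is the pattern expected when nearby systems report more similar interaction counts than distant ones.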
Figure 16: Maps of Operator Interactions by System and Number of Systems

4.4 OLS Models

This research uses ordinary least squares (OLS) models to explore primary hypotheses one and two by modeling the relationships between violations and interactions aggregated to both of the EGLE spatial units: the region and the district. OLS models quantify the linear relationship between changes in the values of the dependent variable (𝑦) and the independent variable/s, with an assessment of the error (𝑒), which is the difference between the actual and predicted values of the dependent variable (Burt, Barber, & Rigby, 2009). Understanding how the EGLE regions’ or districts’ percentage of systems in violation relates to the number of interactions helps in the investigation of primary hypotheses one and two. The basic OLS equation is seen in Equation C:

Equation C: $$y = \beta_0 + \beta_1 x + e$$

Where 𝑦 is the dependent variable representing the percentage of the types of violations, 𝛽0 is the intercept, 𝛽1 is the slope coefficient on 𝑥, the number of interactions measure, and 𝑒 is the error term, or the part of y that cannot be explained by the model. OLS models provide several goodness of fit measures which indicate the quality of the relationship between the interactions variables and the violations variables (the basis of primary hypotheses one and two). The first measure is the R2 value, which is the statistical measure of the goodness of fit of the model and should be between “0” and “1”, where a value of “0” represents a poor fit and a value of “1” represents a perfect fit (Burt, Barber, & Rigby, 2009). It is calculated by dividing the sum of the squared residuals (errors) by the total sum of squares, and then subtracting that quotient from 1 (Burt, Barber, & Rigby, 2009) (more discussion on R2 in section 4.7).
Outside of the goodness of fit measures, these simple linear models show the directionality of the relationship between interactions and violations through the 𝛽1 values, which are assessed for statistical significance through their p-values. The beta value describes the slope of the line, or how much a change in the value of the independent variable (here an aggregated interactions measure) will change the value of the dependent variable (here violations) (Burt, Barber, & Rigby, 2009). The research’s hypotheses suggest that more interactions would lead to fewer violations, creating the expectation of a negative slope: as interactions go up, the percentage of violations should go down. If the slope is positive, then the opposite of the initial hypotheses would be true, namely that more interactions between operators increase violations. Whether an effect (or beta value) is statistically significant is determined by the p-value, which allows for null hypothesis testing where the two hypotheses are:

• 𝐻0 or the null hypothesis: There is no distinguishable linear relationship between the percentage of the spatial unit with violations and aggregated interactions
• 𝐻𝑎 or the alternative hypothesis: There is a distinguishable linear relationship between the percentage of the spatial unit with violations and aggregated interactions

Failing to reject the null hypothesis (𝐻0) would mean that the direction and intensity of the interactions variables cannot be distinguished from random, while rejecting the null hypothesis would imply that there is a relationship between the interactions variable and violations. Whether the null hypothesis is rejected or not depends on the coefficient p-values, which express the statistical significance.
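The quantities discussed for the simple OLS models (intercept, slope, R2, and the t-statistic of the slope) can be sketched for a single predictor as follows. This is a minimal Python illustration, not the lm() call used in this research; the function name simple_ols and the six data points are hypothetical and do not come from the EGLE data.

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares and return basic diagnostics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                      # slope
    beta0 = mean_y - beta1 * mean_x        # intercept
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Standard error of the slope; n - 2 residual degrees of freedom because
    # two coefficients (intercept and slope) are estimated
    se_beta1 = math.sqrt(ss_res / (n - 2) / sxx)
    return {"beta0": beta0, "beta1": beta1, "r2": r2, "t": beta1 / se_beta1}


# Hypothetical spatial units: average interactions vs. percent of systems in violation
avg_interactions = [4, 6, 8, 10, 12, 14]
pct_violation = [30, 28, 22, 20, 15, 11]
fit = simple_ols(avg_interactions, pct_violation)
print(fit)
```

In this made-up example the slope is negative, which is the direction the hypotheses anticipate; the t-statistic would then be compared against the t-distribution to obtain the p-value.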
P-values are calculated from the t-statistic, which takes the beta coefficient and divides it by the standard error of the coefficient; then, based on the degrees of freedom (n-2 for a model with an intercept and a single independent variable), confidence level, and directionality, the t-statistic is compared to the t-distribution table to determine significance (Burt, Barber, & Rigby, 2009). If a p-value is lower than 0.1, the relationship is considered statistically significant at the 90% confidence level. This research tested significance in the OLS models at the conventional 90%, 95%, and 99% confidence levels.

EGLE Regions (12 OLS Models)
Dependent Variable | Interactions Measure
2020 Percentage of Systems with a Violation | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Population with a Violation | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Systems with a Major Violation | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Population with a Major Violation | ~Survey Only; ~Imputed; ~Global Interactions

EGLE District OLS Models (36 OLS Models)
Dependent Variable | Global (All Districts) Interactions Measure | Urban Interactions Measure | Rural Interactions Measure
2020 Percentage of Systems with a Violation | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Population with a Violation | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Systems with a Major Violation | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions
2020 Percentage of Population with a Major Violation | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions | ~Survey Only; ~Imputed; ~Global Interactions

Table 20: Overview of the 48 OLS Models for Michigan EGLE Regions and Districts

All of the OLS models and parameters were estimated using the lm() function in the stats R package (R Core Team, 2020). Table 20 reviews the variables for the 48 different OLS models reported by the research. Four dependent variables were explored: 2020 percentage of systems with a violation, 2020 percentage of the population served by a system with a violation, 2020 percentage of systems with a major violation, and 2020 percentage of the population served by a system with a major violation. For the Michigan EGLE regions, each of the dependent variables was tested against the three interaction measures, making a total of 12 OLS models for Michigan EGLE regions. At the EGLE district level the same 12 OLS models were run, but an additional 24 OLS models were run to separate out the urban and rural districts and effects, with 12 models using only rural district data and 12 models using only urban district data. The urban and rural designation for EGLE districts in these models comes from the USDA RUCO designations, which could over- or under-characterize the urbanity or rurality of the region as they were made up of county-level designations. Many counties contain a mix of urban and rural areas, which the OLS models cannot detect because the county RUCOs make up the district designations and the measure is for the overall county, not the individual place served by particular CWSs. This limitation is discussed further in chapter 6 with regard to the implications of the models.

The first aggregated inter-operator interaction measure, the “survey only” measure, used the average of the CWS operator survey responses in the spatial unit to represent the average number of interactions in the spatial unit (region or district).
This is the most direct of the three measures as it uses data that directly represent the operator; however, as seen in section 3.3.1.4, there is heterogeneity in the responses from regions and districts, which puts some of these results at risk of over- or underestimating the average interactions in the spatial unit when based only on survey responses. The “imputed” interactions measure took the average survey interactions by operator type (Utility=10.28, Contract=12.17, Non-Affiliated=4.63), attributed these values to the non-respondent operators based on their type, and combined them with the respondent operators’ values to get the average for the spatial unit (region or district). The final measure, “global interactions,” gave each CWS (survey respondent or not) the average interactions value based solely on type, then averaged the values for the spatial unit (region or district).

While the OLS models for the EGLE regions and districts provided insight into the relationships between violations and interactions, it was necessary to address the limitations of the method. First, both the regions and districts were relatively small datasets (6 regions and 24 districts), which, due to their low degrees of freedom, produced less reliable results than those from large datasets (Burt, Barber, & Rigby, 2009). Another key issue was that these models are global models, meaning that they included every single observation and did not account for any geographic heterogeneity (outside urban and rural) in the relationships (Brunsdon, Fotheringham, and Charlton, 1996). Due to this limitation, they could not assess regional advantage, clustering, or the direct role of space in the models. Different, local methods were needed to assess the spatial component of knowledge spillovers and violations.
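The three aggregated interaction measures can be sketched as follows. This is a minimal Python illustration under assumptions: the type averages are the ones reported above, but the function name interaction_measures, the record layout (operator type plus a reported value, or None for a non-respondent), and the four example records are hypothetical, and the sketch treats every record in the spatial unit with equal weight.

```python
# Average survey interactions by operator type, as reported in the text
TYPE_MEANS = {"Utility": 10.28, "Contract": 12.17, "Non-Affiliated": 4.63}


def interaction_measures(records):
    """records: list of (operator_type, reported_interactions or None)."""
    reported = [value for _, value in records if value is not None]
    # "Survey only": mean of the actual survey responses in the unit
    survey_only = sum(reported) / len(reported) if reported else None
    # "Imputed": respondents keep their value, non-respondents get their type mean
    imputed = sum(
        value if value is not None else TYPE_MEANS[op_type]
        for op_type, value in records
    ) / len(records)
    # "Global": every record gets its type mean, respondent or not
    global_mean = sum(TYPE_MEANS[op_type] for op_type, _ in records) / len(records)
    return survey_only, imputed, global_mean


# Hypothetical district with two respondents and two non-respondents
district = [
    ("Utility", 12),
    ("Contract", None),
    ("Non-Affiliated", 3),
    ("Non-Affiliated", None),
]
survey_only, imputed, global_mean = interaction_measures(district)
print(survey_only, round(imputed, 4), round(global_mean, 4))
```

The sketch makes the trade-off visible: the survey-only measure depends entirely on who responded, while the global measure ignores individual responses and reflects only the mix of operator types in the unit.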
4.5 Geographically Weighted Regression Models

In order to address the regional advantages of interactions (Prim-1), it was essential that those trends were investigated using spatially explicit models. This research used exploratory geographically weighted regression (GWR) models on the district aggregated data, where the dependent variables (the percentages of the different types of SDWA violations) were regressed on the aggregated number of interactions for neighboring districts around each district (or observation). GWR was a useful exploratory spatial tool as it provided local observations with their own R2, betas, and standard errors (Brunsdon, Fotheringham, and Charlton, 1996). By having these local regression parameters at each point, GWR was extremely useful in leveraging the spatial structure of the data that would not be picked up in a global regression model, like those explained in section 4.4. However, there were some major caveats to using GWR, including issues of multi-collinearity (Wheeler and Tiefelsdorf, 2005) and multiple-testing problems (Paez et al., 2011), which violated the standard regression assumptions; thus, this method should not be used in any inferential manner.

GWR works by allowing the parameters to vary at each location instead of having a global regression model fit the data (Brunsdon, Fotheringham, and Charlton, 1996). The basic process begins with defining the spatial weights matrix (also known as the neighbors matrix, proximity matrix, or spatial lag matrix), which mathematically defines how close observations are in geographic space and can be used in cross-sectional analysis (Bailey and Gatrell, 1995; Brunsdon, Fotheringham, and Charlton, 1996). The spatial weights matrix is an n-by-n matrix that defines how close each observation is to every other observation. The nearness between observations is defined by a distance decay function applied to the geographic distance between those observations.
The distance decay function is essentially a weighting scheme that determines the influence of nearby observations on each other by using some sort of distance measure. The key to appropriate use of the distance decay function is selecting the appropriate bandwidth, which is a distance measure defining the radius around each point within which the distance decay function, or weighting scheme, is to be used. Scholars tend to use a kernel weighting scheme (Brunsdon, Fotheringham, and Charlton, 1996; Tu and Xia, 2008; Tu and Tu, 2016), in which a symmetrical kernel function travels to each point and uses the distance decay function to assign each observation a different weight based on its spatial position relative to the other observations. The choice of the bandwidth for the spatial weights matrix is key, because if the research selects too large a bandwidth, then all the observations will be included, essentially making the model no different from a global OLS model and disguising local/regional variation (Fotheringham, Brunsdon, and Charlton, 2003). For zonal data, defining the bandwidth and distance decay function (to establish the kernel weighting scheme) is tricky because zonal data are most often census data, which have highly irregular shapes of differing sizes. With irregularly shaped data, the most appropriate kernel choice is based on shared borders between neighboring observations, as the measure of nearness and the distance decay is based on the first order, second order, third order, etc. neighbors sharing borders (Fotheringham, Brunsdon, and Charlton, 2003). Equation D shows the basic GWR model:

Equation D: $$y_i = \beta_{0i}(u_i) + \beta_{1i}(u_i)\,x + e_i(u_i)$$

Where i is a Michigan EGLE district, and 𝑢𝑖 is the spatial weights matrix specifically for i, depending on the choice of the kernel, the bandwidth, and the distance to all other districts from the ith district.
𝑦 is the dependent variable representing the percentage of the district’s systems or population with at least one of the types of SDWA violation in 2020. The research used GWR to explore the same four dependent variables as the OLS models, but only at the district level because eight regions contained too few observations to explore spatial heterogeneity. Two different independent variables (each entering as the 𝛽1 term) were explored in different models: “survey only” aggregated average interactions and “imputed” interactions. This added up to a total of eight base models prior to calibrating key GWR parameters.

Two key model parameter choices that need to be clearly defined by research using GWR are the kernel type and the bandwidth. Since EGLE districts are clearly irregularly sized zones (Figure 11) and distance measures would limit the number of neighbors (especially in the Upper Peninsula), an adaptive bandwidth with a fixed number of neighbors was selected. Using the gwr.bandwidth() function in the GWmodel package in R (Gollini et al., 2013; Lu et al., 2014), the optimal bandwidth for the model was identified as 22 neighbors. This is a very high number, as it implies that 22 out of the 24 districts need to be included in each of the local regression models, and even with low weights given to the distant neighbors, the model would imply these distant neighbors have some impact on the local values. Due to this large number, the research reports on two different bandwidths, 13 and 22, in two models named GWR-13 and GWR-22. The 13-neighbor bandwidth represents the midpoint between the 22-neighbor “optimal bandwidth” and the average number of districts’ first order bordering neighbors (4.5 neighbors). There should be some consistency in the patterns of results between the two models, and consistent results indicate the robustness of the models.
The GWmodel R package includes six possible kernel types (Global, Gaussian, Exponential, Bisquare, Tricube, and Boxcar), as shown in Figure 17 from Lu et al. (2014). The research developed all the GWR models using all the kernel types but only reports the best kernel choice based on a combination of theory and number of significant betas. Knowledge spillovers happen at nearby geographic locations (Rosenthal and Strange, 2004), which for a bandwidth of 22 means that the optimal kernel needs a fairly short distance decay function to minimize the impact of faraway neighbors. The Global kernel would not make sense, as there is no distance decay and the only difference between the GWR-22 models and the OLS models would be the exclusion of two of the 24 observations, which would show little about regional trends in the relationships between interactions and SDWA compliance. The Gaussian and Exponential kernel options are also poor choices, as every district in the “neighborhood” still gets some weight in the model and the distance decay function has a more slowly falling slope. The Boxcar kernel runs into a similar problem as the Global kernel, where all the included neighborhood and observation locations have the same weight in the model, which would tell little about the regional variations in the relationships. The Bisquare and Tricube kernels are appropriate choices because they both have a steep distance decay function that would eliminate the impact of distant neighbors in the GWR-22 models, which provides a more localized model. The research tested both of these kernels and chose to report the results of the analysis with the Tricube kernel due to the greater number of significant betas.
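The contrast between the kernel types can be sketched numerically. The following is a minimal Python illustration using the commonly published forms of these kernels; it is an assumption that these match the GWmodel implementation exactly, and the function name kernel_weight is hypothetical. For an adaptive bandwidth, the bandwidth value at each location would be the distance to the k-th nearest neighbor (here k = 13 or 22).

```python
import math


def kernel_weight(dist, bandwidth, kernel):
    """Weight given to an observation at distance `dist` from the regression point."""
    ratio = dist / bandwidth
    if kernel == "gaussian":
        return math.exp(-0.5 * ratio ** 2)       # never reaches zero
    if kernel == "exponential":
        return math.exp(-ratio)                  # never reaches zero
    if kernel == "boxcar":
        return 1.0 if ratio < 1 else 0.0         # equal weight inside the bandwidth
    if kernel == "bisquare":
        return (1 - ratio ** 2) ** 2 if ratio < 1 else 0.0
    if kernel == "tricube":
        return (1 - ratio ** 3) ** 3 if ratio < 1 else 0.0
    raise ValueError(f"unknown kernel: {kernel}")


# Weights at 0%, 50%, 90% and 100% of a hypothetical bandwidth of 100 distance units
for name in ["gaussian", "exponential", "boxcar", "bisquare", "tricube"]:
    weights = [round(kernel_weight(d, 100, name), 3) for d in (0, 50, 90, 100)]
    print(name, weights)
```

The Bisquare and Tricube weights fall to exactly zero at the edge of the bandwidth, while the Gaussian and Exponential weights remain positive there and the Boxcar weight stays at one until the edge, which is the distinction the kernel choice above relies on.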
Figure 17: Kernel Types from Lu et al. (2014)

There are two main issues that need to be addressed in GWR, as the process has been shown to often violate two underlying assumptions of hypothesis testing in basic linear regression: (1) independence of tests (Paez et al., 2011), and (2) independence of independent variables (i.e., no multi-collinearity) (Wheeler and Tiefelsdorf, 2005). The latter issue has been discussed extensively in prior literature (Wheeler and Tiefelsdorf, 2005; Wheeler and Paez, 2010) and has been one of the biggest critiques of GWR. However, the models used in this research employed only a single independent variable in any given model, ensuring no possibility of multi-collinearity among multiple independent variables. Like the OLS models, there was hypothesis testing of the betas for the GWR models, where the null hypothesis and alternative hypothesis had the same meaning as in the OLS models and the GWR models provided the same basic output/diagnostics as the OLS models (Fotheringham, Brunsdon, and Charlton, 2003). However, GWR violates the assumption of independent tests, which is known as the multiple-comparison problem. The multiple-comparison problem occurs when many of the same tests and models are run on a single dataset, resulting in an increase in the likelihood of false significance due to type I (false positive) errors (Paez et al., 2011). For the simple OLS models, this was not a problem, as each test used either different dependent or independent variables, which did not violate the assumption (Burt, Barber, & Rigby, 2009). However, this was a big problem in GWR, where there were as many tests as there were observations, because local regressions were run for each observation (Paez et al., 2011). To avoid type I errors, the conventional methods change the alpha (α) value chosen for significance (Paez et al., 2011).
In OLS models that violated this assumption, one of the key p-value correction methods employed was the Bonferroni correction, which divided the chosen alpha value by the number of tests (Bryne et al., 2009; Paez et al., 2011). This reduced the error rate by making it more difficult to reach significance, since each test had to meet a smaller alpha value (Bryne et al., 2009). GWR is a little more complicated than the traditional OLS; however, Bryne et al. (2009) created the Fotheringham and Bryne adjustment to the p-value, which divides the alpha value by the number of parameters and the number of tests and is a more formal way to reduce the chances of type I errors. This research reported the significance based on the Fotheringham and Bryne adjustment for the GWR models, calculated by the gwr.t.adjust() function from the GWmodel R package (Gollini et al., 2013; Lu et al., 2014).

Section 5.4.3 presents the results of this regional investigation into the EGLE districts’ relationships between interactions and SDWA violations by showing the range of the local R2 and beta values for the Tricube kernels for the four different dependent variable models that had the highest number of significant betas. Further, it maps out the local betas and R2 to show the statistically significant regional trends in these relationships. While this is an exploratory, not confirmatory, method, it does provide some insight into possible regional advantages of knowledge spillovers based on the pattern and direction of the relationships between system violations and operator interactions, which is a key component of Prim-1.

4.6 Ordered Logistic Regression Models

To investigate the operator-specific level in the endogenous hypotheses (EN-1, EN-2, EN-3) and the primary hypotheses (Prim-1, Prim-2), this research developed Ordered (Ordinal) Logistic Regression (OLR) models.
The focus of the endogenous hypotheses was which operator-specific factors showed a relationship to the number of reported interactions with other operators, while the focus of the primary hypotheses was the relationship between the percentage of an operator’s CWSs with a violation and the number of reported interactions, while accounting for operator-specific factors. The ordered logistic regression models provided coefficients and odds ratios characterizing the magnitude and directionality of the relationships between the independent variables (operator-specific information) and the dependent variables of either the percentage of CWSs with a violation managed by the operator or the number of interactions. These models investigating the endogenous and primary hypotheses only included the operator-specific data.

The key dependent variables in the models were ordered factor variables that categorized the number of reported interactions for the endogenous hypotheses models and the percentage of systems with a violation per operator for the primary hypotheses models. Here, Equation E describes the probability that the data fall into the response “interactions” category, while Equation F describes the probability that the data fall into the response “violations” category.
P(Y ≤ j) is the cumulative probability of Y being less than or equal to a specific category j = 1, …, J-1, using the cumulative logit model:

Equation E (EN-1, EN-2, EN-3):
$$\begin{aligned}
\log\frac{P(Y \le j)}{P(Y > j)} = \operatorname{logit}(P(Y \le j)) ={}& \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \mathit{OperatorType}_i + \beta_2 \cdot \mathit{Groupmembership} \\
&+ \beta_3 \cdot \mathit{Education}_i + \beta_4 \cdot \mathit{Systems}_i + \beta_5 \cdot \mathit{Experience}_i + \beta_6 \cdot \mathit{Operators}_i \\
&+ \beta_7 \cdot \mathit{AveragePopulation} + \beta_8 \cdot \mathit{EarnedRecertificationHours}_i \\
&+ \beta_9 \cdot \mathit{SystemExperience}_i + \beta_{10} \cdot \mathit{Usefulness}_i + \beta_{11} \cdot \mathit{MeetingHours}_i
\end{aligned}$$

Equation F (Prim-1, Prim-2):
$$\begin{aligned}
\log\frac{P(Y \le j)}{P(Y > j)} = \operatorname{logit}(P(Y \le j)) ={}& \alpha_0 + \alpha_1 + \alpha_2 + \beta_1 \cdot \mathit{OperatorType}_i + \beta_2 \cdot \mathit{Interactions} + \beta_3 \cdot \mathit{Violationin2019} \\
&+ \beta_4 \cdot (\mathit{OperatorType} \times \mathit{Interactions}_i) + \beta_5 \cdot \mathit{Systems}_i + \beta_6 \cdot \mathit{Certificationlength}_i \\
&+ \beta_7 \cdot \mathit{Operators}_i + \beta_8 \cdot \mathit{Otheroperators} + \beta_9 \cdot \mathit{Usefulness}_i + \beta_{10} \cdot \mathit{Groupmembership} \\
&+ \beta_{11} \cdot \mathit{Education}_i + \beta_{12} \cdot \mathit{MeetingHours}_i + \beta_{13} \cdot \mathit{AveragePopulation} + \beta_{14} \cdot \mathit{CEC} + \beta_{15} \cdot \mathit{Experience}
\end{aligned}$$

Where in Equation E: α0 through α3 are the cumulative odds of a response for “1 to 10 interactions” (factor 1) and below, “11 to 20 interactions” (factor 2) and below, “21 to 30 interactions” (factor 3) and below, and “30+ interactions” (factor 4) and below, respectively. In contrast, in Equation F: α0 through α2 are the cumulative odds of a response for “operators with 0% of systems with a violation” (factor 1), “operators with at least one but not all systems in violation (0.01% to 99.99% of systems)” (factor 2) and below, and “100% of systems with a violation” (factor 3) and below. The β1-β11 (Equation E) and β1-β15 (Equation F) terms are described in detail in Appendix 2: Table 42. The interpretation is that β equals the cumulative log odds ratio for a one-unit increase in the predictor (an increase in dependent variable groups).
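How a cumulative logit model turns into a probability for each response category can be sketched as follows. This is a minimal Python illustration that assumes the standard parameterization with one intercept (threshold) per cumulative split and a linear predictor added to it; the function name category_probabilities and the numeric values are hypothetical and are not estimates from this research. Some software subtracts the linear predictor instead, which flips the sign of the coefficients.

```python
import math


def category_probabilities(thresholds, linear_predictor):
    """Category probabilities from increasing thresholds and a linear predictor."""
    # Cumulative probabilities P(Y <= j) for j = 1, ..., J-1 via the inverse logit
    cumulative = [
        1 / (1 + math.exp(-(alpha + linear_predictor))) for alpha in thresholds
    ]
    cumulative.append(1.0)  # P(Y <= J) is always 1
    # P(Y = j) is the difference between adjacent cumulative probabilities
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical thresholds for a four-category response and a zero linear predictor
probs = category_probabilities([-1.0, 0.5, 2.0], linear_predictor=0.0)
print([round(p, 3) for p in probs])
```

Raising the linear predictor under this sign convention shifts probability toward the lower categories, because every cumulative probability P(Y ≤ j) increases.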
The estimated probability of any specific response category j can be obtained through subtraction:

Equation G: $$\hat{P}(y_i = j) = \hat{P}(y_i \le j) - \hat{P}(y_i \le j - 1)$$

To investigate the endogenous hypotheses, this research estimated six different models (Appendix 3: Table 43). Endogenous Model 1 All (EN.M1.All) included all the possible covariates based on the interactions survey data. Endogenous Model 2 Best (EN.M2.Best) used a reduced number of independent variables based on the results of the stepAIC() function from the MASS R package (Ripley et al., 2002) applied to the EN.M1.All model. The stepAIC() function performed a stepwise search on EN.M1.All, adding and dropping variables to find the model with the lowest AIC value (Venables and Ripley, 2002). The model with the lowest AIC value (497.61) included only the following independent variables: the number of other operators at their CWS/s, operator type, hours in professional water meetings, and perceived usefulness of interactions. The Endogenous Variables Only Model 3 (EN.M3.ENall) only included the three key independent variables that directly related to the endogenous hypotheses (type of operator, group membership, and educational attainment). The Endogenous Models 4-6 (EN.M4.Type | EN.M5.Group | EN.M6.EDU) each tested one of the independent variables in the endogenous hypotheses for its sole impact on the number of reported interactions. All of these endogenous hypotheses’ models were estimated using the brm() function from the brms R package (Bürkner, 2017; 2018).

The primary hypotheses were explored using only two ordered logistic models. The first was the full model shown in Equation F, which included all 15 independent variables (OP.OL.All). The second model reduced the number of independent variables based on the results of the stepAIC() function from the MASS R package (Ripley et al., 2002) applied to the OP.OL.All model.
Here the reduced or best model (OP.OL.Reduced) dropped down to four independent variables (Operator Type, Interactions, Operator Type * Interactions, 2019 Violation), and the AIC dropped from 343.41 in the OP.OL.All model to 330.28 in the OP.OL.Reduced model. Both models were estimated using the brm() function from the brms R package (Bürkner, 2017; 2018).

There are four main assumptions of the ordered logistic regression model: (1) the dependent variable is ordinal, (2) one or more of the independent variables are continuous/categorical/ordinal, (3) no multi-collinearity between independent variables exists, and (4) the proportional odds assumption is met (Norris et al., 2006). The first two assumptions were met by all the models, as the dependent variables were ordinal variables and there were numerous independent variables that fell into the continuous/categorical/ordinal designations (outlined in Appendix 2: Table 42).

The third assumption of no multi-collinearity was tested using the VIF() function in the regclass R library (Petrie, 2020) on the independent variables. Collinearity between independent variables makes it nearly impossible to determine the relationship between one predictor and the dependent variable independently of the other variables, which skews the interpretation of the model (Frost, 2017). Multi-collinearity can take two forms: (1) data multi-collinearity and (2) structural multi-collinearity (Frost, 2017). Data multi-collinearity occurs when the underlying data are highly related to each other, and this is the type of multi-collinearity to avoid, while structural multi-collinearity stems from how a variable is used in the model and not from the underlying data (Frost, 2017). A common measure of multi-collinearity is the Variance-Inflation-Factor (VIF) of each continuous independent variable, where higher VIF values indicate high correlations and low values near 1 represent low collinearity (Fox and Monette, 1992).
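The logic of the VIF can be sketched for the simplest case. For predictor j, the VIF is 1 / (1 - R2_j), where R2_j comes from regressing that predictor on the other predictors; with only two predictors, R2_j is the squared correlation between them. The following minimal Python illustration covers only that two-predictor case, and the function name vif_two_predictors and the data are hypothetical; it is not the regclass VIF() function used in this research.

```python
def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two predictors."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    # R-squared from regressing one predictor on the other (squared correlation)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictors: the first pair is uncorrelated, the second nearly collinear
vif_low = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
vif_high = vif_two_predictors([1, 2, 3, 4], [2, 4, 6, 9])
print(vif_low, round(vif_high, 2))
```

A value of 1 means the predictor shares no linear information with the other one, while large values signal that the coefficient's variance is inflated by overlap between predictors.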
The VIF only works for continuous variables, and an extension, the Generalized-Variance-Inflation-Factor (GVIF), is used when the model includes categorical or ordinal independent variables (Fox and Monette, 1992). The GVIF equation takes the VIF value and controls for the number of degrees of freedom, providing details about the collinearity between the non-continuous independent variables (Fox and Monette, 1992). For the GVIF there are a variety of thresholds used to determine whether collinearity will be a problem in the model: scholars set the threshold anywhere from GVIF > 10 (Vittinghoff et al., 2012) to GVIF > 5 (Menard, 2001) to GVIF > 2.5 (Johnston et al., 2018). Tables of VIF and GVIF values can be found in Appendix A: Table 46 and Appendix B: Table 47. For the six endogenous hypotheses models, none of the GVIF values were above the 2.5 threshold, indicating that these models do not violate the multi-collinearity assumption. In the two primary hypothesis models (OP.OL.All, OP.OL.Reduced), only two of the variables had GVIF values above 1.5: interactions (6.94) and interactions*operator type (2.64). However, this was not unexpected, as interaction terms between independent variables are more likely to have high collinearity. This was not considered problematic, as the bigger model misinterpretation risk is unknown multi-collinearity (Fox and Monette, 1992).

The final assumption of proportional odds (also known as the parallel regression assumption) is that the slopes between each independent variable and each group of the dependent variable are the same (Brant, 1990; Norris et al., 2006; Liu, 2009; Harrell, 2015). If the slopes differ between the response variable’s groups, then there are different coefficients, which means that one would need multiple models to assess the relationship between the independent variables and the different levels of the ordinal dependent variable (Norris et al., 2006; Liu, 2009).
Brant (1990) developed the “Brant Test,” implemented in the brant package in R (Schlegel and Steenbergen, 2020), for ordinal logistic regression models to test the validity of the proportional odds assumption. The test examines separate fits for each of the dependent variable’s factor groupings and then performs a chi-square test to compare the slopes (Brant, 1990; Liu, 2009; Harrell, 2015). The Brant test was conducted for each model (endogenous and primary), and the ordinal logistic regression results include the results of the Brant test.

Model parameters were estimated in a Bayesian framework using the brms R package (Bürkner, 2017; 2018). The Bayesian framework offers several advantages for this research over traditional frequentist approaches (e.g., maximum likelihood estimation). The key difference is that Bayesian analysis captures and utilizes the full posterior distribution, while the frequentist approach only provides a single point estimate (Kery, 2010). Thus, an advantage of the Bayesian approach over the frequentist approach is the intuitiveness of the probability statement. The probability statement in the frequentist approach corresponds to the frequency of the observed outcome given an infinite number of hypothetical datasets, and this is counterintuitive because only one dataset was observed (Kery, 2010). Conversely, because Bayesian analysis uses the full range of the posterior distribution for each parameter, the model provides a credible interval that is simply the probable range of the parameter values given the observed data (Kery, 2010). In this way, the research can be 95% confident that the true parameter value lies within the given range of values (credible interval), as opposed to the frequentist statement, which would be interpreted as the true parameter value lying within the 95% confidence intervals in 95% of the trials (Kery, 2010).
The Bayesian probability statement is therefore easier to interpret than its frequentist counterpart. In the context of this research, Bayesian estimation is used to intuitively describe the probability of a shift in violation category as a function of changes in the independent variables.

The BRMS R package allows the researcher to specify models whose parameters are estimated by Markov Chain Monte Carlo (MCMC) using the program Stan for the cumulative logit models (Bürkner, 2017; 2018). To run the models, the research used noninformative priors for all parameters. Each model ran four chains for 5,000 iterations after a burn-in period of 1,000 iterations and retained every other sample (thinning = 2) for the posterior distributions; this gave a total of 10,000 sampled iterations across the four chains (2,500 per chain).

The model diagnostics for a Bayesian analysis can be explored by assessing model convergence. One key consideration for MCMC simulations is whether the chains have converged and are sampling from the target distribution (Clark, 2018; Smeets and Schoot, 2019). Convergence is important because it indicates that the MCMC iterations are effectively exploring the parameter space and converging towards one target (Clark, 2018). A common way to explore this is to look at “trace plots” and “density plots” of the distribution of the parameters (Clark, 2018; Smeets and Schoot, 2019). In a trace plot, chains that have converged mix with each other, which makes it difficult to identify any single chain on the plot (e.g., a “fuzzy” appearance) (Clark, 2018). The density plots should appear approximately normal with a single modal peak (Clark, 2018).
Appendix G shows the plots for the 4 chains / 5,000 iterations / 1,000 burn-in configuration used in the research and compares them to a 2 chains / 100 iterations / 10 burn-in configuration; this comparison shows that the models in this research meet the convergence requirements.

The research compared and assessed each of the models using the conventional diagnostic metrics of Bayesian R² and the Bayesian Information Criterion (BIC). In frequentist analyses, the R² diagnostic describes the amount of variance in a dependent variable that is captured by the independent variables in the regression model (Kery, 2010). Equation H shows the traditional R² diagnostic.

Equation H:
$$R^2 = \frac{V_{n=1}^{N}\,\hat{y}_n}{V_{n=1}^{N}\,y_n} = 1 - \frac{SS_{RES}}{SS_{TOT}}$$

Here V denotes the variance across the N observations, $\hat{y}_n$ the predicted values, and $y_n$ the observed values. SSRES is the residual sum of squared errors of the regression model, while SSTOT is the total sum of squared errors. By dividing the residual sum of squared errors by the total sum of squared errors and subtracting that value from 1, the R² value will be between 0 and 1; a value of 1 indicates the model perfectly fit the data, while a value of 0 indicates the model did not fit the data at all (Burt, Barber, and Rigby, 2009). The traditional R² value is problematic for ordinal models because the dependent variable is not continuous, and researchers have tried several different techniques to assess model fit, referring to the resulting measures as “pseudo-R²” (Gelman et al., 2019). The pseudo-R² diagnostics are an issue in a Bayesian framework for two reasons: (1) strong prior information and weak data can cause the fitted variance to be larger than the total variance, which pushes the value above 1 and ruins its interpretation, and (2) the Bayesian framework aims to capture the amount of uncertainty within the coefficients to avoid the overfitting problems of the least-squares framework (Gelman et al., 2019). To correct for this, Gelman et al. (2014) suggest using an alternative R² value that divides the variance of the predicted values by the variance of the predicted values plus the expected variance of the errors. Equation I for the Bayesian R² appears below:

Equation I:
$$\text{Bayesian } R_s^2 = \frac{V_{n=1}^{N}\,\hat{y}_n}{V_{n=1}^{N}\,\hat{y}_n + V_{n=1}^{N}\,e_n}$$

where $e_n$ denotes the model errors. Through the addition of the fitted variance to the denominator, the possibility of an R² above 1 is eliminated and the value will fall between 0 and 1. This research used the bayes_R2() function in the BRMS package (Bürkner, 2017; 2018) to analyze the results (Gelman et al., 2019). One limitation of the Bayesian R² in assessing a cumulative ordinal regression model is that its values are lower than in typical linear regressions (Gelman et al., 2019). This is a more important diagnostic for the endogenous hypotheses models, as there were six different models, so a comparison of fit could identify the better performing models. It was less important for the primary hypothesis models, as this research ran only two models (full and reduced).

The final model diagnostic metric employed by this research was the Bayesian Information Criterion (BIC). The related Akaike Information Criterion (AIC) is a commonly used metric for comparing the quality of different statistical models. The AIC uses the log likelihood evaluated at the maximum likelihood estimate of the parameters and applies a penalty based on the number of parameters included in each model (Burt, Barber, and Rigby, 2009). The basic AIC is given in Equation J below:

Equation J:
$$AIC = -2\ln(L_i) + 2k_i$$

Where:
L = likelihood
k = number of parameters

For models with a high log-likelihood (well-fitting models) the AIC value will be low, and for models with a low log-likelihood the value will be high (Burt, Barber, and Rigby, 2009).
AIC simply compares among a set of models, where a lower AIC value indicates a better fitting model. The BIC provides similar information about the best model and is interpreted in the same way (a lower value means a better model). However, BIC differs from AIC in how it penalizes the number of parameters (k): instead of multiplying k by two, which risks overfitting the model, BIC multiplies it by the natural log of n (the number of observations). The BIC calculation can be seen in Equation K below.

Equation K:
$$BIC = -2\ln(L_i) + \ln(n)\,k_i$$

Where:
L = likelihood
k = number of parameters
n = number of observations

The change in the penalty between AIC and BIC allows BIC to adjust simultaneously for both the number of observations and the number of parameters. With a large n, ln(n) will be higher than 2 and give a greater penalty based on the number of parameters (Clyde et al., 2020). This is a useful diagnostic tool because R² values will typically inflate with an increased number of parameters (Clyde et al., 2020). An inflated R² value can lead to false inferences, so a secondary diagnostic of fit is useful (Clyde et al., 2020). This research conducted its ordered logistic regression models in a Bayesian framework and used the BIC to guard against overfitting given the number of observations (n) and the number of parameters (k).

The end result of the ordinal logistic regression models should be an opportunity to understand more about how operator-specific characteristics relate to reported interactions (endogenous hypotheses), and how the number of interactions relates to the percentage of systems in violation at the operator level (primary hypotheses). The direction of the beta values and the credible intervals will provide a better understanding of the relationship, but only at the operator scale. This provides some answers, but there are system-level factors that need to be explored and accounted for in order to comprehensively address the main hypotheses.
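The AIC and BIC penalties described earlier in this section can be compared with a short sketch. It is a Python illustration of Equations J and K only; the log-likelihoods, parameter counts, and sample size are hypothetical and are not results from this research.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Equation J: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Equation K: -2 ln(L) + ln(n) k."""
    return -2.0 * log_likelihood + math.log(n) * k


# Hypothetical fits: name -> (log-likelihood, number of parameters).
candidates = {"full": (-210.0, 15), "reduced": (-214.0, 6)}
n_obs = 200  # hypothetical number of observations

for name, (ll, k) in candidates.items():
    print(f"{name}: AIC={aic(ll, k):.1f}, BIC={bic(ll, k, n_obs):.1f}")
```

Because ln(n) exceeds 2 once n is 8 or more, the BIC penalty per parameter is larger than the AIC penalty for any realistically sized dataset, so BIC tends to favor the more parsimonious model.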
4.7 Generalized Linear Mixed-Model (GLMM)

The final methods employed integrate both the system-level and operator-level information to better understand the relationships between SDWA compliance and knowledge spillovers and transfers (the primary hypotheses). Working with more than just the operator data, the methods explained in detail here account for variation in TMF capacity, along with some of the natural advantages. By accounting for all these key features, the previously outlined conceptual model was put to the test. This section first outlines all of the data used in the models, then explains why, based on the data and tests, a generalized linear model (GLM) was not appropriate for the exploration of the hypotheses and a generalized linear mixed model (GLMM) was necessary. It then walks through the GLMM, explaining how it works along with its assumptions, diagnostics, and interpretation.

Table 21 outlines the 15 different variables used in the investigation of the main hypotheses, which connect the system-level and operator-level variables of TMF capacity, natural advantage, and knowledge spillovers/transfers to SDWA compliance. These data were collected from multiple sources: the dissertation survey of Michigan CWS operators, a July 2019 FOIA request to EGLE (conducted by the researcher), MDOT, ACS 5-year estimates for 2014 to 2019, EPA’s ECHO, and SDWIS 2020 Quarter 4 data. All of the continuous variables were transformed to be mean-centered variables using the scale() function from the base R library (R Core Team, 2020).
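Mean-centering itself is a simple transformation, sketched below in Python for illustration. The values are hypothetical. The sketch centers only; R’s scale() also divides by the standard deviation under its default arguments, and the text does not state whether that rescaling was applied, so it is left out here.

```python
from statistics import fmean


def mean_center(values):
    """Subtract the sample mean so the transformed variable has mean zero."""
    mean = fmean(values)
    return [v - mean for v in values]


# Hypothetical median household incomes for four communities.
incomes = [42000.0, 55000.0, 61000.0, 48000.0]
centered = mean_center(incomes)
print(centered)
```

After centering, a coefficient’s intercept refers to a community at the average value of the variable rather than at zero, which is usually easier to interpret.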
Table 21: Overview of Variables for the System and Operator Level Models

| Conceptual Model ID | Variable | Type | Transformations | Year | Source |
|---|---|---|---|---|---|
| Performance | Any Violations | Binary | 0 = No 2020 Violation; 1 = Any 2020 Violation | 2020 | SDWIS / ECHO |
| | Non-Health Violations | Binary | 0 = No (non-health) 2020 Violation; 1 = Non-Health 2020 Violation | 2020 | SDWIS / ECHO |
| | Major Violations | Binary | 0 = No Major 2020 Violation; 1 = Major 2020 Violation | 2020 | SDWIS / ECHO |
| Technical Capacity | Source Water | Binary (Categorical) | 0 = Does Not Purchase Water; 1 = Purchased Water | 2020 | SDWIS |
| | Primary System | Binary (Categorical) | 0 = Not a Primary System; 1 = Primary System | 2020 | IPU Database |
| | System Size | Ordinal (Ordered Factor) | Small = System Serving Less than 3,300 people; Medium = System Serving between 3,301 and 10,000 people; Large = System Serving more than 10,000 people | 2020 | SDWIS |
| | Violation 2019 | Binary | 0 = No Violation in 2019; 1 = Violation in 2019 | 2019 | SDWIS / ECHO |
| Managerial Capacity | Operator Type | Categorical | (3 Levels) Non-Affiliated; Utility; Contract Operator | 2019 | Michigan EGLE FOIA |
| | Interactions | Ordered Factor | (4 levels) No interactions; 1 to 10 interactions; 10 to 20 interactions; 20+ interactions | 2019/2020 | Survey |
| | Group Membership | Binary | (2 Levels) 0 = No Group membership; 1 = Member of any professional Group | 2019/2020 | Survey |
| | Operator Type * Interactions | Interaction, Continuous | Mean Centered | 2019/2020 | Michigan EGLE and Survey |
| Financial Capacity | Median Home Value | Continuous | Mean Centered | 2014 to 2018 | American Community Survey 5-year estimates |
| | Median Household Income | Continuous | Mean Centered | 2014 to 2018 | American Community Survey 5-year estimates |
| | Unemployment | Continuous | 1) Calculating the unemployment rate; 2) Mean center the percentage | 2014 to 2018 | American Community Survey 5-year estimates |
| Natural Advantage | Environmental Quality | Continuous | Intersecting the CWA Violations for 19/20 with Michigan Census Tracts, then adding up for each system. 1) Percentage of CWA violations; 2) Mean center | 2019/2020 | EPA ECHO |
| | Rurality | Ordered Factor | Intersecting Point Shapefile of Systems to MDOT Urban Census Layer (4 levels): Rural Code 1 = Most Rural; Rural Code 2 = Second most rural; Rural Code 3 = Urban; Rural Code 4 = Major City | 2018 | MDOT Codes |
| | Upper or Lower Peninsula | Binary (Categorical) | 0 = Lower Peninsula; 1 = Upper Peninsula | 2020 | County Location |

*These data dropped two operators because of their uniqueness (one was a temporary circuit rider, and the other ran over 80 CWSs), giving a total of 455 CWSs run by 251 operators.

For the investigation of the main hypotheses, this research explored the factors most strongly associated with SDWA compliance or non-compliance using a GLM framework. Previous research (Switzer and Teodoro, 2017; Allaire et al., 2018; McDonald and Jones, 2018) on SDWA compliance has used logistic regression in GLM frameworks because of the relatively skewed distribution of SDWA violations. In Equation L below, let $X_j$ represent a 2020 SDWA compliance violation with a binary value of “1” (violation) or “0” (no violation).

Equation L:
$$p(X_j) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
$$\log\left(\frac{p(X_j)}{1 - p(X_j)}\right) = \beta_0 + \sum_{i=1}^{n}(\beta_i X_i) + \varepsilon$$

i = independent variables in the technical, managerial, financial, or natural advantage categories.
j = a binary indicator of whether the water system had any type of violation in 2020, a major violation in 2020, or a non-health violation in 2020.

Here $\frac{p(X_j)}{1 - p(X_j)}$ is the odds, which can only take values between 0 and ∞: odds closer to 0 indicate a very low probability $p(X_j)$, while the larger the odds, the higher the probability $p(X_j)$. Taking the logarithm places the odds on the logit scale, which allows interpretation of the relationships between the dependent variable (compliance/non-compliance) and the independent variables.
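The relationship between probability, odds, and the logit scale in Equation L can be illustrated with a short sketch. It is written in Python for illustration; the coefficients and covariate values are hypothetical and do not come from the models in this research.

```python
import math


def inv_logit(eta: float) -> float:
    """Convert a log-odds value into a probability (first line of Equation L)."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p: float) -> float:
    """Odds of the event, p / (1 - p), which range from 0 to infinity."""
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Log-odds of the event (left-hand side of Equation L)."""
    return math.log(odds(p))


# Hypothetical intercept and slope for a single mean-centered predictor.
beta0, beta1 = -1.5, 0.8
for x in (-1.0, 0.0, 1.0):
    p = inv_logit(beta0 + beta1 * x)
    print(f"x={x:+.1f}: probability={p:.3f}, odds={odds(p):.3f}, log-odds={logit(p):.3f}")
```

With a positive slope the log-odds rise linearly with the predictor, while the probability rises along an S-shaped curve bounded by 0 and 1.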
There were three models, each with a different binary dependent variable (performance), which modeled whether the system had (1) any violation, (2) a major violation, or (3) a non-health related violation. Each of these dependent variables was assessed in relation to the i groupings of variables that referred to the technical, managerial, financial, and natural advantages of the systems seen in Table 21. If the coefficient $\beta_i$ for an independent variable was positive, the probability of a violation increased with increases in the value of the independent variable, while a negative coefficient indicated a lower probability of a violation as the independent variable’s value increased. The binomial models were run using the glm() function from the stats v3.6.2 R package (R Core Team, 2020) by specifying the family argument as binomial and the link as “logit.” Similar to the ordered logistic regression models, this research obtained three further (reduced) binary logistic regression models by applying the stepAIC() function from the MASS R package (Ripley et al., 2002) to each of the full models. All of the models can be seen in Table 22.

Binomial logistic regression models have five main assumptions that need to be met in order to trust the inferences of the model (Kassambara, 2018). The first assumption is: “Independence of the response variable and the response is dichotomous.” This research used three different dependent variables of violations: any SDWA violation in 2020, a major SDWA violation in 2020, and a non-health related SDWA violation in 2020. Each of these dependent variables was binary (1 for violation and 0 for no violation), which fit the assumption of a dichotomous response variable. Further, the assumption of independence was met here because SDWA violations were given at the CWS level and were not based around any shared features.
The second assumption of the binomial logistic regression models is: “no influential (extreme outliers and value) in the continuous predictors” (Kassambara, 2018). The influence of outliers and extreme values was explored using Cook’s distance and standardized residual plots. Cook’s distance is commonly used to estimate the influence of a single data point in regression modeling (Kassambara, 2018). By using the plot() function on the glm() output, both the standardized residual plots and the Cook’s distance of each point were explored. If the standardized residuals showed no trend and the Cook’s distance values were below the traditional cutoff of 0.25 (Burt, Barber, and Rigby, 2009), then the models were determined to meet these assumptions.

The third assumption, “No high intercorrelations/ multi-collinearity between predictors”, was assessed for every model considered for this analysis (Kassambara, 2018). For the models in this research, the interaction term (specifically number of interactions * type of operator) will have structural multi-collinearity, as it relates to both the individual interactions variable and the operator type. To guard against data multi-collinearity, each model was assessed in the same way as the operator-only ordinal logistic regression models (endogenous and main hypotheses), using the VIF() function in the regclass package for R (Petrie, 2020). All the variables (with the exception of the interaction term) were evaluated against a GVIF threshold of 2.5 to ensure no multi-collinearity existed between terms (Johnston et al., 2018). The results of the VIF tests for the independent variables are reported in Appendix 6.3: Table 48. Elevated values appeared in only two groups of variables: (1) interactions [7.27], operator type [1.54], and interactions*operator type [2.76], and (2) region educational attainment [2.25], median home value [2.72], and median household income [2.32].
The first group reflected known structural multi-collinearity, as the operator type * interactions term was derived from two of the other independent variables. The second group was more concerning, as median household income, median home value, and educational attainment were collinear; therefore the research used the StepVIF() function from the pedometrics R package (Samuel-Rosa, 2020) to remove the highly collinear variables from the ‘best models’. Table 22 below outlines all the assumptions and how they fit with the six models.

| Model | Dependent Variable | Linear Relationship between IV and DV? | Influential Values (Largest Cook’s Distance) | Standardized Residuals Plot (Mean) | No Multi-collinearity among Independent Variables? |
|---|---|---|---|---|---|
| Prim.Any.All | Binary 2020 Any Violation | No | (0.11) | (0.026) | No |
| Prim.Any.Reduced | Binary 2020 Any Violation | N/A (No continuous Variables) | (0.20) | (-0.008) | Yes |
| Prim.NH.All | Binary 2020 Non-Health Violation | No | (0.12) | (0.016) | No |
| Prim.NH.Reduced | Binary 2020 Non-Health Violation | None | (0.21) | (-0.13) | Yes |
| Prim.MAJ.All | Binary 2020 Major Violation | No | (0.17) | (0.176) | No |
| Prim.MAJ.Reduced | Binary 2020 Major Violation | No | (0.24) | (-0.018) | Yes |

Table 22: Overview of All GLM Models

The final key assumption for the binomial logistic regression model is the “independence of the observations” (Kassambara, 2018). This assumption was not met by any of the binary logistic regression models, because the operator-specific data that stemmed from the survey applied to multiple observations at the system level. There are two levels to the data that were collected: the system level and the operator level. For instance, the variables of group membership and number of interactions in the last year were not independent for some systems, because there were 65 operators that ran more than one system, which meant these independent variables were repeated for systems run by the same operator. Figure 18 shows these two levels and the possibilities of operators running single or multiple systems.
Figure 18: Multiple Levels of CWSs and Operations

One way to address this is to use a generalized linear mixed model (GLMM), whereby the model partitions the variance into random effect terms in addition to the residual variance, instead of using a fixed-effects GLM (McCulloch and Neuhaus, 2014). The fundamental assumption in the GLM is that the sole source of the variance arises from the random sample used to measure the relationships (McCulloch and Neuhaus, 2014). If the observations were not independent of each other, then there is less variation in the sample than would be expected if all samples were independent, which understates the uncertainty of the estimates and increases the likelihood of committing a Type I error (McCulloch and Neuhaus, 2014). When data have a nested structure, as is the case for this research’s investigation of CWSs and operators, it may be expected that observations within a given hierarchical group are more similar to each other than observations among different groups.

In the context of the present analysis, the research needed to assess whether a random effect term was necessary by determining if the repeated system-level observations for each operator were more similar within a single operator than they were among different operators. If one operator was running five systems and had done a poor job with each, these five observations, if treated independently, could contribute more weight to the model than they should, because the observations are not truly independent from one another. The addition of a random effects term for operator would assign a distribution to each level of the operator variable (each individual operator), thereby centering the expected value for operator violations around an operator-specific mean, with an operator-specific variance. What is reported in the model output is the total variance contributed to the data due to variation in system performance (violations) among operators.
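The role of an operator-level random effect can be illustrated with a small sketch of how an operator-specific shift on the log-odds scale changes the predicted probability of a violation for every system that operator runs. This is a Python illustration with fixed, hypothetical numbers; it does not estimate anything and is not the model fitted in this research. The operator and system labels are invented for the example.

```python
import math


def violation_probability(beta0, beta1, x, operator_effect=0.0):
    """Probability of a violation given a linear predictor plus an operator shift."""
    eta = beta0 + beta1 * x + operator_effect
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical operator-level deviations on the log-odds scale.
operator_effects = {"operator_A": -0.6, "operator_B": 0.0, "operator_C": 0.9}

# Hypothetical systems: (system id, operator, mean-centered covariate value).
systems = [
    ("system_1", "operator_A", 0.2),
    ("system_2", "operator_C", 0.2),
    ("system_3", "operator_C", -1.0),
]

for system_id, operator, x in systems:
    p = violation_probability(-1.0, 0.5, x, operator_effects[operator])
    print(f"{system_id} ({operator}): predicted probability = {p:.3f}")
```

Two systems with identical covariates can therefore have different predicted probabilities when they are run by different operators, and systems that share an operator share the same shift.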
Incorporation of a random effects term brought a new methodological component to SDWA compliance research, which had primarily explored the relationship between compliance and variables collected only at the system level (Rubin, 2013; Allaire et al., 2018) or aggregated up to the county or municipal level (Switzer and Teodoro, 2017; McDonald and Jones, 2018). CWSs are more than just their system-level characteristics; they are also hierarchically nested within operators, and previous research has not explored the multiple levels of CWSs and their performance. This research modeled multiple dimensions, with some of the variables at the operator level (level 2) and others at the system level (level 1). Exploring the hypotheses by incorporating the multiple scales in the same model opened possibilities for future research and helped ensure the observed relationships were reflective of reality. This research’s GLMM with the random effects term is shown in Equation M below.

Equation M:
$$\log\left(\frac{p(X_j)}{1 - p(X_j)}\right) = \beta_0 + \sum_{i=1}^{n}(\beta_i X_i) + \alpha_j * Operator + \varepsilon$$

i = independent variables in the technical, managerial, financial, or natural advantage categories.
$\alpha_j * Operator$ = random effects term based around the operator.

The $\log\left(\frac{p(X_j)}{1 - p(X_j)}\right)$ is representative of the log odds of a violation given a set of independent variables $\sum_{i=1}^{n}(\beta_i X_i)$. The variables were the same as in the GLM and in Table 21. However, this model added a random effects term $\alpha_j * Operator$. This was modeled using the brm() function from the BRMS package in R (Bürkner, 2017; 2018). Like the ordered logistic regression models, a Bayesian framework was used for all the benefits previously expressed in section 4.6.

4.7.1 Testing for Random Effects or GLM

One of the commonly used tools to explore the goodness of fit between a fixed effects model and a random effects model is the likelihood ratio test (Chen et al., 2019).
A likelihood ratio test can compare two models by testing whether the additional complexity of the added random effect term improves that model’s accuracy significantly (Chen et al., 2019). If a model is more accurate, then its log likelihood should be higher, and if it is less accurate it should be lower (Chen et al., 2019). All six models were explored with and without the inclusion of the random effect term. Equation N shows the basic equation for the comparison of log likelihoods.

Equation N:
Test statistic = (-2) * [LogLikelihood(Less complex model) - LogLikelihood(Random Effects Model or more complex model)]

The first step is to subtract the log likelihood of the more complex model from that of the less complex model to show the difference in prediction. The difference is then multiplied by -2 to provide a test statistic with a chi-square distribution whose degrees of freedom equal the difference in the number of parameters between the models (Chen et al., 2019). This research calculated the test statistic using the logLik() function in the stats R package (R Core Team, 2020).

Once this test statistic was obtained and the difference in degrees of freedom was identified, a simple chi-square test provided a p-value for each comparison. A p-value below a defined threshold indicated the complex model was more accurate than the simpler model, while a p-value that exceeded the threshold indicated that the less complex model was preferred (Chen et al., 2019). In this research, the p-value for the test statistic was assessed at the 95% confidence level, with 0.05 as the threshold for determining a better or worse model, using the pchisq() function in the stats R package (R Core Team, 2020). Table 23 shows the results of the test statistics used to determine the fit of the GLM vs GLMM. Five out of six of the models were better with the greater level of complexity from the operator random effect term in the model.
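The likelihood ratio comparison described above can be sketched as follows. This is a Python illustration, not the R code used in this research. It assumes the two models differ by a single parameter, so the chi-square reference distribution has one degree of freedom, for which the upper-tail probability has a closed form. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_statistic(loglik_simple: float, loglik_complex: float) -> float:
    """Equation N: -2 * (log-likelihood of simpler model - log-likelihood of complex model)."""
    return -2.0 * (loglik_simple - loglik_complex)


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 degree of freedom > x), via the complementary error function."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log-likelihoods for a GLM and the corresponding GLMM.
loglik_glm, loglik_glmm = -250.0, -247.9

statistic = likelihood_ratio_statistic(loglik_glm, loglik_glmm)
p_value = chi2_upper_tail_df1(statistic)
preferred = "GLMM" if p_value < 0.05 else "GLM"
print(f"statistic={statistic:.2f}, p-value={p_value:.4f}, preferred model: {preferred}")
```

A comparison that adds more than one parameter would need the general chi-square distribution for the matching degrees of freedom rather than this one-degree-of-freedom shortcut.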
The only model that did not work better with the random effect term was Prim.MAJ.All, which provided a poor fit to the data even without the random effect term, because fewer of the surveyed systems had a “major” violation in 2020 and there were too many parameters for the amount of data in the model.

| Model | Dependent Variable | ANOVA Chi-Square | P-value | Best model |
|---|---|---|---|---|
| Prim.Any.All | Binary 2020 Any Violation | 4.11 | 0.04 | GLMM |
| Prim.Any.Reduced | Binary 2020 Any Violation | 3.81 | 0.05 | GLMM |
| Prim.NH.All | Binary 2020 Non-Health Violation | 4.08 | 0.04 | GLMM |
| Prim.NH.Reduced | Binary 2020 Non-Health Violation | 4.42 | 0.04 | GLMM |
| Prim.MAJ.All | Binary 2020 Major Violation | 0 | 1 | GLM |
| Prim.MAJ.Reduced | Binary 2020 Major Violation | 21.58 | 0.00003 | GLMM |

Table 23: Testing Results for GLM vs GLMM Models

4.7.2 Diagnostics: Model Strength MLM

The research compared and assessed each of the models using the same metrics as the operator-level ordered logistic regression models (BIC, AIC) but also explored GLMM-specific diagnostic metrics: the Hosmer-Lemeshow goodness-of-fit test and the GLMM R² (which was broken down into two metrics, the fixed effects R² and the conditional effects R²). The following two sections outline the new GLMM R², which captures the fixed and random effects, and the Hosmer-Lemeshow test.

4.7.3 GLMM R2

The first goodness-of-fit metric employed on the GLMM models was the R² value. The traditional R² and Bayesian R² values were outlined with the OLR models in section 4.6. However, neither of these metrics is appropriate for mixed models. First, take the traditional R² value seen in Equation O below:

Equation O:
$$\text{Traditional } R^2 = \frac{var(\hat{y}_i)}{var(y_i)}$$

where the variance of $\hat{y}_i$, the model’s predicted outcome, is divided by the variance of $y_i$, the individual outcome (Burt, Barber, and Rigby, 2009; Love, 2020). This means that there are only two possible sources of variability in the model: (1) the “fixed effects”, or the variables’ known values, and (2) the error that is not explained by the variables (Love, 2020).
However, once the random effects term is inserted into the model, a third possible source of variability is introduced around the operator-specific variables that relate to multiple observations (Nakagawa and Schielzeth, 2013; Love, 2020). To adjust for this, Nakagawa and Schielzeth (2013) introduced an R² calculation that provides two numbers: (1) the fixed effects (marginal) R², which follows the same calculation as the traditional R² with the addition of the random effects term’s variance to the denominator, and (2) the conditional or random effects R², which includes both the fixed effects variance and the variance explained by the random effects term in the numerator. Equation P shows the calculation of the fixed or marginal effects R² value:

Equation P:
$$\text{Marginal (Fixed Effects) } R^2 = \frac{var(f)}{var(f) + var(r) + var(\varepsilon)}$$

where $var(f)$, the variance of the fixed effects, is divided by the sum of $var(f)$, the variance of the random effects term $var(r)$, and the variance of the error $var(\varepsilon)$. As a result, this R² only includes the proportion of the overall variance explained by the fixed effects terms (Nakagawa and Schielzeth, 2013). However, the fixed effects variance was not the only measure that mattered, as the random effects term should explain some of the variance as well. To get the conditional R², the modeler needed to include the variance explained by the random effects term in the numerator (Nakagawa and Schielzeth, 2013). The equation for the conditional effects R² is seen in Equation Q:

Equation Q:
$$\text{Conditional (Random Effects) } R^2 = \frac{var(f) + var(r)}{var(f) + var(r) + var(\varepsilon)}$$

Adding the variance of the random effects to the numerator increases the variance explained. The conditional R² is thus always at least as large as the marginal R² value (Nakagawa and Schielzeth, 2013). For the GLMMs, both the conditional and marginal R² values were reported and were calculated using the bayes_R2() function in the BRMS R package (Bürkner, 2017; 2018).
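Equations P and Q can be illustrated with a short sketch that takes the three variance components as given. It is written in Python for illustration; the component values are hypothetical, and how the components are estimated from a fitted model is outside the scope of the sketch.

```python
def glmm_r2(var_fixed: float, var_random: float, var_residual: float):
    """Return (marginal R2, conditional R2) from the three variance components."""
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total                    # Equation P
    conditional = (var_fixed + var_random) / total  # Equation Q
    return marginal, conditional


# Hypothetical variance components on the model's latent scale.
marginal, conditional = glmm_r2(var_fixed=1.2, var_random=0.6, var_residual=3.0)
print(f"marginal R2 = {marginal:.3f}, conditional R2 = {conditional:.3f}")
```

The gap between the two values shows how much of the explained variance is attributable to differences among operators rather than to the measured covariates.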
4.7.3.1 Hosmer-Lemeshow Test

While the R² values were useful for comparing models to each other and for assessing model enrichment, they were less suited to explaining the overall calibration of the model (Fagerland and Hosmer, 2012; Bartlett, 2015). Calibration focuses on how well the model actually fits the data and, unlike the R² diagnostics, does not require comparisons to other models. The Hosmer-Lemeshow goodness-of-fit test divides up the sample according to the observations’ predicted probabilities or risks (Fagerland and Hosmer, 2012; Bartlett, 2015). The test uses the parameter estimates $\hat{\beta}_0, \ldots, \hat{\beta}_\rho$ to calculate, for each observation, the predicted probability that the outcome equals 1 (or in simpler terms, that the system has a violation of the j type). This probability, calculated from the covariate values, can be seen in Equation R:

Equation R:
$$\hat{\pi} = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 X_1 + \ldots + \hat{\beta}_\rho X_\rho)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 X_1 + \ldots + \hat{\beta}_\rho X_\rho)}$$

The sample is typically split into deciles based on the predicted probabilities of a violation. Observations within a decile have similar predicted probabilities, so in a decile with a predicted probability of about 0.1, about 10% of the observations should have a violation (Fagerland and Hosmer, 2012; Bartlett, 2015). If the model accurately groups those observations together and ~10% had a violation, then it is a good fitting model; however, if the percentage of systems with a violation in the 0.1 decile is high (>~50%), then the model would be a poor fit. The test takes the observed and expected numbers of violating and non-violating systems in each grouping and then computes Pearson’s goodness-of-fit statistic. The Pearson statistic provides the p-value and degrees of freedom (the number of groups minus 2), which allows for hypothesis testing (Fagerland and Hosmer, 2012; Bartlett, 2015).
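The grouping and Pearson calculation described above can be sketched as follows. This is a minimal Python illustration, not the implementation used in this research. It assumes at least as many observations as groups and predicted probabilities strictly between 0 and 1, and the upper-tail helper relies on a closed form that holds only for an even number of degrees of freedom (deciles give 10 - 2 = 8). The data are hypothetical and constructed to be perfectly calibrated.

```python
import math


def hosmer_lemeshow(outcomes, probabilities, groups=10):
    """Return (statistic, degrees of freedom) for the Hosmer-Lemeshow test."""
    ranked = sorted(zip(probabilities, outcomes))  # order by predicted risk
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)  # systems with a violation
        expected = sum(p for p, _ in chunk)  # violations the model predicts
        # Pearson terms for the violating and non-violating cells, combined.
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return statistic, groups - 2


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square > x) using the finite-sum form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Hypothetical data: ten risk groups of 20 systems, each with exactly the expected violations.
probs = [0.05 + 0.1 * (i // 20) for i in range(200)]
outcomes = [1 if (i % 20) < 1 + 2 * (i // 20) else 0 for i in range(200)]

stat, df = hosmer_lemeshow(outcomes, probs)
print(f"statistic={stat:.4f}, df={df}, p-value={chi2_upper_tail_even_df(stat, df):.4f}")
```

A large statistic, and therefore a small p-value, would indicate that observed violation counts depart from the counts the model predicts within the risk groups.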
With a significance threshold of 0.05, a p-value over 0.05 represents a failure to reject the null hypothesis that the model fits the data, while a p-value below 0.05 leads to rejection of the null hypothesis and points to the model being a bad fit. For each model explored, the results of the Hosmer-Lemeshow test provide another metric for assessing the goodness of fit.

4.8 Conclusion

Table 24 provides an overview of all the hypotheses and the methods associated with each hypothesis. This chapter explained each of the methods and how they were the most appropriate to answer the research questions, given the limitations of the data. The descriptive statistics and tests of independence were used to address all of the endogenous hypotheses and spatial hypothesis 1. The variogram models were used to explore spatial hypothesis 2. The ordered logistic regression models focused on both the endogenous hypotheses and the primary hypotheses. The primary hypotheses were investigated using OLS and two “new” approaches to CWS SDWA compliance research: GWR and GLMM. The following chapter presents the results of the methods and explains the findings in relation to the hypotheses.

| Hypothesis | Hypothesis (Explicit) | Methods |
|---|---|---|
| Prim-1 | If spatial structure exists for operator interactions, there are regional advantages based on knowledge spillovers and transfers between community water systems (CWS) operators that facilitate Safe Drinking Water Act Compliance. | Ordinary Least Squares; Geographically Weighted Regression; Ordered Logistic Regression; Generalized Linear Mixed Models |
| Prim-2 | “Isolated” operators with fewer interactions are more likely to have SDWA violations/non-compliance than “non-isolated” operators with more interactions. | Ordinary Least Squares; Geographically Weighted Regression; Ordered Logistic Regression; Generalized Linear Mixed Models |
| EN-1 | Utility or contract operators will have more inter- and intra-operator interactions than non-affiliated operators. | Descriptive Statistics; Kruskal-Wallis [EN-1]; Ordered Logistic Regression |
| EN-2 | Operators (utility, contract, non-affiliated) who are professionally engaged through water organizational membership, and pursuit of continuing education will have more interactions than operators who are not professionally engaged. | Descriptive Statistics; Chi-square test of Independence [EN-2, EN-3]; Ordered Logistic Regression |
| EN-3 | Operators (utility, contract, non-affiliated) with higher levels of certifications and educational attainment will have more interactions than operators with low levels of certification or educational attainment. | Descriptive Statistics; Chi-square test of Independence [EN-2, EN-3]; Ordered Logistic Regression |
| SP-1 | Interactions between Utility and Non-Affiliated CWS operators occur primarily with operators in the same counties, while Contract Operators are more likely to have interactions with operators outside their county. | Descriptive Statistics; Chi-square test of Independence |
| SP-2 | Operator interactions have spatial structure such that operators are more likely to interact with each other if their systems are close together in geographic space. | Variogram |

Table 24: Overview of all methods and which hypotheses they relate to.

CHAPTER 5: Results

5.1 Introduction

This chapter walks through the results of the methods employed by this research to investigate all of the hypotheses. Table 25 is a reminder of the relationship between the methods and the hypotheses.
This chapter first walks through the results for the endogenous hypotheses, which used descriptive statistics, tests of independence, and ordered logistic regression models to explore the relationships between operator type, education, and group membership and the number of reported interactions. The spatial hypotheses section first uses descriptive statistics and the chi-square test of independence to assess the relationship between operator type and the county of inter-operator interactions, then uses variogram models to explore spatial autocorrelation in the number of interactions based on the spatial proximity of operators. The final section addresses the primary hypotheses: that there are regional advantages in SDWA compliance based on increased inter-operator interactions, and that isolated operators with fewer interactions have more SDWA violations than non-isolated operators with more inter-operator interactions. OLS and GWR models on aggregated interactions and violation data investigate the rural/urban divide and the spatial structure of regional advantages, while the ordered logistic regression models and GLMMs assess the role of inter-operator interactions while controlling for TMF capacity and natural advantages.
| Hypothesis Category | Hypothesis | Methods |
|---|---|---|
| Primary | Prim-1 | Ordinary Least Squares; Geographically Weighted Regression; Ordered Logistic Regression; Generalized Linear Mixed Models |
| Primary | Prim-2 | Ordinary Least Squares; Geographically Weighted Regression; Ordered Logistic Regression; Generalized Linear Mixed Models |
| Endogenous | EN-1 | Descriptive Statistics; Kruskal-Wallis; Ordered Logistic Regression |
| Endogenous | EN-2 | Descriptive Statistics; Chi-square test of Independence; Ordered Logistic Regression |
| Endogenous | EN-3 | Descriptive Statistics; Chi-square test of Independence; Ordered Logistic Regression |
| Spatial | SP-1 | Descriptive Statistics; Chi-square test of Independence |
| Spatial | SP-2 | Variogram |

Table 25: Overview of research hypotheses and the methods employed

5.2 Endogenous Hypotheses Results

5.2.1 Descriptive Statistics and Tests of Independence [EN-1, EN-2, EN-3]

| Variable | Utility operators (n=194) Mean | Utility S.D. | Contract operators (n=30) Mean | Contract S.D. | Non-Affiliated operators (n=27) Mean | Non-Affiliated S.D. | Test |
|---|---|---|---|---|---|---|---|
| Number of Interactions | 10.28 | 13.27 | 12.17 | 15.41 | 4.63 | 4.37 | KW = 9.51*** |
| Membership Organizations (no outside org memberships) | 9.8% | | 40% | | 51.9% | | χ² = 14.23*** |
| Education Level (Bach or higher) | 22.2% | | 30% | | 37% | | χ² = 3.72 |
| Number of Systems in charge | 1.25 | 0.68 | 6 | 6.86 | 1.18 | 0.40 | KW = 75.73*** |
| Experience | 15.02 | 10.34 | 13.78 | 10.33 | 18.26 | 12.53 | KW = 2.01 |
| Other Operators | 4.14 | 4.04 | 3.83 | 5.40 | 0.94 | 0.64 | KW = 21.33*** |
| Average Population Served | 14,161 | 56,805 | 1,659 | 2,929 | 281 | 466 | KW = 66.95*** |
| Educational or Recertification Hours (Earned) | 1.65 | 1.82 | 1.12 | 1.10 | 0.64 | 0.66 | KW = 8.86* |
| Educational or Recertification Hours (Needed) | 0.99 | 0.95 | 1.25 | 0.85 | 0.66 | 0.61 | KW = 4.79 |
| Usefulness | 4.13 | 0.89 | 3.87 | 1.14 | 3.81 | 1.00 | KW = 3.32 |
| Meeting Hours | 14.54 | 11.02 | 8.85 | 11.45 | 6.94 | 7.42 | KW = 19.54*** |

Table 26: Descriptive Statistics of All Variables and Tests of Independence
KW is the Kruskal-Wallis test; χ² is the chi-square test. *p<0.05, **p<0.01, ***p<0.001 (Note: system experience is left off from this table)

Table 26 provides the means, standard deviations, and association test results for the key operator-specific survey variables. This section first outlines how the descriptive statistics and tests of association relate back to the three endogenous hypotheses, then discusses other notable findings for the remaining operator-specific variables.

EN-1 focused on Utility and Contract operators having more interactions than Non-Affiliated operators. Based on the group means, both Utility (10.28) and Contract (12.17) operators have substantially more interactions than Non-Affiliated operators (4.63). The results of the Kruskal-Wallis test are statistically significant at all levels, indicating a difference in interactions between groups. The Dunn test results show statistically significant differences at the 95% confidence level between “Contract and Non-Affiliated operators” and between “Non-Affiliated and Utility operators”, while “Contract and Utility operators” were not statistically significantly different from one another. The main difference in the number of inter-operator interactions therefore stems from the Non-Affiliated operators, as the Utility and Contract operators are similar to each other in the number of reported interactions.

EN-2 assesses whether an operator’s membership in professional organizations impacts the number of inter-operator interactions. The percentage values in Table 26 are the share of operators in each group who are not members of any professional water-related organization. According to the chi-square test of independence, there is a statistically significant difference between the operator types. Over half of the Non-Affiliated operators belong to no professional organization, while only 40% of the Contract operators and less than 10% of the Utility operators have no organization membership.
A Kruskal-Wallis test (not in Table 26) of the number of inter-operator interactions against group membership shows a statistically significant (99% confidence level) H value of 10.68, which indicates there are differences in the number of inter-operator interactions based on group membership.

EN-3 contends that operators (regardless of employer type) with higher levels of educational attainment would have more interactions. The descriptive statistics show that the group with the highest share of operators holding a bachelor’s degree or higher was the Non-Affiliated operators at 37%, followed by the Contract operators at 30% and the Utility operators at 22.2%. Further, the chi-square test of independence does not show statistically significant differences between the groups in educational attainment. This ordering is unexpected and runs in the opposite direction to the ordering of inter-operator interactions. A Kruskal-Wallis test comparing the number of interactions directly against the educational attainment groups (bachelor’s or higher, and less than bachelor’s) also gave an H value that was not statistically significant, which implies the relationship between educational attainment and inter-operator interactions is indistinguishable from random variation.

There are several other notable results in the descriptive statistics and tests of association from the operator-specific data. Average population served differs across groups, with Utility operators having an average population of 14,161 people, Contract operators 1,659, and Non-Affiliated operators only 281. This difference shows that many of the Utility operators are managing much larger systems than the Contract operators or the Non-Affiliated operators. While Utility operators run larger systems, the Contract operators on average operate more systems (six) than the Utility (1.25) or Non-Affiliated operators (1.18).
Utility operators had the highest average number of other operators at their system with 4.14, followed by Contract operators with 3.83, while Non-Affiliated operators averaged less than 1. Further, the Kruskal-Wallis test for the number of other operators employed at their system(s) was highly statistically significant, indicating differences between the employers, and a Dunn test shows statistically significant differences between “Contract operators and Non-Affiliated operators” and between “Non-Affiliated operators and Utility operators”. Neither the earned nor the needed recertification hours showed strong differences, with only the Kruskal-Wallis test for earned hours statistically significant at the 90% confidence level. The Kruskal-Wallis tests on the number of years of experience and on perceived usefulness were not statistically significant. The number of hours spent in water organization meetings differed significantly, with Utility operators (14.54) having more than double the mean number of hours of Non-Affiliated operators (6.94). This is not surprising given the differences in organization membership discussed above.

5.2.2 Endogenous Hypotheses Ordered Logistic Regression Models [EN-1, EN-2, EN-3]

| Model | Explanation | Bayes R² | BIC | Proportional Odds assumption met?* |
|---|---|---|---|---|
| EN.M1.All | All variables included in the model | 0.28 | 569.6 | No |
| EN.M2.Best | Stepwise Regression with 4 terms | 0.26 | 529.3 | Yes |
| EN.M3.ENall | All 3 endogenous hypotheses variables included | 0.07 | 593.1 | Yes |
| EN.M4.Type | Interactions and Type of Operator Only | 0.03 | 591.1 | Yes |
| EN.M5.Group | Interactions and the group membership of an operator | 0.03 | 584.4 | No |
| EN.M6.EDU | Interactions and the educational attainment of the operator | 0.004 | 595.5 | Yes |

Table 27: All Ordered Logistic Regression Model Results

Table 27 outlines all six of the endogenous hypotheses ordered logistic regression models.
Based on the Bayes R², the top two models were EN.M1.All with 0.28 and EN.M2.Best close behind with 0.26. This was not entirely surprising, as both models had more parameters than the other four models. The models focused only on the endogenous hypothesis variables all had low R² values and high BIC values. EN.M2.Best unsurprisingly had the lowest BIC value at 529.3. The final differentiator between these models was whether the proportional odds assumption held, and it held for all the models except EN.M1.All and EN.M5.Group. Regardless of the BIC or R² values, a model that violates the proportional odds assumption does not reflect reality, because an OLR model requires each predictor to have the same slope across all cut points of the ordinal dependent variable. For the EN.M5.Group model, this could mean that the relationship (slope) between reported interactions and professional water organization membership changes with the number of reported interactions, so multiple coefficients would be needed to accurately assess the relationship. This research cannot interpret the coefficients for this model because they were unreliable, and the lack of consistency would have required different modeling practices to tease out the proper coefficients.
| Variable | EN.M2.Best Estimate | Est. Error | L-95% | U-95% | EN.M3.ENall Estimate | Est. Error | L-95% | U-95% |
|---|---|---|---|---|---|---|---|---|
| Intercept [1] (1 to 10 Interactions) | 2.05* | 0.75 | 0.63 | 3.53 | -0.41 | 0.45 | -1.32 | 0.48 |
| Intercept [2] (10 to 20 Interactions) | 6.19* | 0.88 | 4.52 | 7.96 | 2.93* | 0.5 | 1.97 | 3.93 |
| Intercept [3] (20 to 30 Interactions) | 7.36* | 0.9 | 5.63 | 9.19 | 3.93* | 0.52 | 2.92 | 4.97 |
| Intercept [4] (30+ Interactions) | 7.95* | 0.92 | 6.19 | 9.79 | 4.47* | 0.55 | 3.42 | 5.56 |
| (Type) Utility operators | 0.56 | 0.46 | -0.33 | 1.48 | 0.84 | 0.45 | -0.06 | 1.72 |
| (Type) Contract operators | 1.58* | 0.57 | 0.48 | 2.68 | 1.52* | 0.54 | 0.46 | 2.58 |
| Educational Attainment | - | - | - | - | 0.04 | 0.30 | -0.56 | 0.62 |
| Group membership | - | - | - | - | 1.11* | 0.38 | 0.37 | 1.86 |
| Number of Other Operators | 0.31* | 0.12 | 0.07 | 0.55 | - | - | - | - |
| Hours in Meeting | 0.63* | 0.15 | 0.34 | 0.92 | - | - | - | - |
| Usefulness | 1* | 0.17 | 0.67 | 1.34 | - | - | - | - |

*Confidence intervals do not contain 0

Table 28: Best Model and Endogenous Hypotheses Model Summaries

Table 28 reports the coefficients, estimated standard errors, and lower/upper 95% confidence bounds of the two best-performing models. The best-performing models were chosen because they met the OLR assumptions and provided insight into the endogenous hypotheses. In OLR, a coefficient is interpreted as significant when its 95% interval does not contain zero (Sullivan, 2017). All of the intercepts’ 95% confidence intervals for model EN.M2.Best excluded zero, while in EN.M3.ENall all intercepts except intercept [1] (1 to 10 interactions) had 95% confidence intervals that excluded zero. This was not surprising given the high BIC and low R² values for EN.M3.ENall compared to EN.M2.Best. In both models, the Utility operators’ 95% confidence intervals contained zero, while the Contract operators’ intervals did not. The Contract operators’ coefficients were both positive, indicating a positive relationship between being a Contract operator and being in a higher inter-operator interaction category.
In EN.M2.Best, the number of other operators, hours spent in meetings, and perceived usefulness all had positive coefficients and confidence intervals that did not include zero. This indicated an unsurprising positive relationship between an operator’s interactions category and more hours spent in meetings, more operators at their system, and greater perceived usefulness of interactions, while accounting for all of the other independent variables. In EN.M3.ENall, educational attainment showed no observable relationship, as its confidence interval contained zero, whereas group membership had a positive coefficient and a confidence interval that did not contain zero. These results indicate that membership in a group should increase the probability of a greater number of interactions for an operator.

Figure 19: Endogenous Hypotheses Model (M3.Enall) Odds Ratios

Figure 19 shows the odds ratios from the EN.M3.ENall model converted to probabilities, specifically the changes in probability of the number of inter-operator interactions based on operator type, group membership, and education. Each box in the plot represents a different combination of the operator’s group membership and educational attainment, and shows, by operator type, the probability of belonging to each reported interactions group. Group membership “1” means that the operator belongs to one of the water-related professional organizations, while “0” means that they are not a member of any water-related professional organization. Similarly, education with a value of “1” means the operator has a bachelor’s degree or higher, while “0” means the operator does not have a bachelor’s degree.
The box in the top left corner of the plot depicts all operators who were not members of a professional water organization and did not have a bachelor’s degree or higher, while the box in the bottom right corner depicts all operators who were members of a professional water organization with a bachelor’s degree or higher. In every combination, there was a higher probability of Contract or Utility operators being in the ‘10 to 20’, ‘20 to 30’, or ‘30+’ inter-operator interactions groupings compared to the Non-Affiliated operators. In every combination, the Non-Affiliated operators had a higher probability of being in the ‘no interactions’ group than the Contract or Utility operators. For members of a water-related group (AWWA or other), the probability of having ‘1 to 10’ interactions was highest for the Non-Affiliated operators; without group membership, however, the probability of ‘1 to 10’ interactions for Non-Affiliated operators dropped below that of Contract or Utility operators. The relative proximity of the probabilities to each other made sense when looking back at the model diagnostics. Only the operator type variable was included in the EN.M2.Best model, while group membership and education were dropped, indicating they did not improve the model. Further, education did not show differences between the groups in the Kruskal-Wallis test while the group membership variable did, which might explain why the graphs change little when controlling for education but change substantially when controlling for group membership.

5.2.3 Endogenous Hypotheses Meaning of Results

Overall, the results of the endogenous hypotheses investigations were mixed. There was strong evidence from the descriptive statistics, tests of independence, and the OLR models that EN-1 was accurate, with Utility and Contract operators having more inter-operator interactions than the Non-Affiliated operators.
EN-1 focused on the greater professional engagement of these operators, who are most often full-time CWS operators. They were shown to have more interactions on average, were statistically significantly different from the Non-Affiliated operators, and had higher probabilities of being in the larger inter-operator interactions groups. The OLR models had positive betas for the Utility and Contract operators, which meant that these operator types had a higher probability of reporting more inter-operator interactions than the Non-Affiliated operators, even when controlling for professional organization membership and educational attainment (Figure 19).

EN-2 continued the professional engagement hypotheses, with professional engagement based not solely on operator type but on membership in a water-related professional organization. The results were mixed. The best model based on BIC did not include the group membership variable, as it did not add value to the model with all the possible variables included. In the explicit endogenous hypotheses model (EN.M3.ENall), however, the group membership variable had a positive beta (with an interval that did not include zero), which indicated that the probability of reporting more inter-operator interactions rose with membership in a professional organization.

EN-3 focused on the educational attainment of the operator, where operators with higher educational attainment were expected to have more inter-operator interactions. The results of the chi-square tests of independence and the OLR models provided no evidence for this hypothesis, as none of the models produced an education beta whose interval excluded zero.
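The conversion from ordered logistic coefficients to the category probabilities plotted in Figure 19 can be sketched as follows. This is an illustration only: it plugs in the point estimates from Table 28 for EN.M3.ENall and assumes the common cumulative-logit parameterization P(Y ≤ k) = logistic(threshold_k − η), which the text does not state explicitly. The names `category_probabilities`, `TYPE_COEFS`, and `CATEGORIES` are hypothetical, and the sketch ignores the posterior uncertainty around each estimate.

```python
import math

# Point estimates copied from Table 28 (EN.M3.ENall).
THRESHOLDS = [-0.41, 2.93, 3.93, 4.47]   # intercepts [1] to [4]
TYPE_COEFS = {"non_affiliated": 0.0,     # reference level (assumed)
              "utility": 0.84,
              "contract": 1.52}
MEMBERSHIP_COEF = 1.11
EDUCATION_COEF = 0.04
CATEGORIES = ["none", "1 to 10", "10 to 20", "20 to 30", "30+"]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(operator_type, membership, education):
    """Probability of each interactions category for one operator profile.

    Assumes P(Y <= k) = logistic(threshold_k - eta); membership and
    education are 0/1 indicators as in Figure 19.
    """
    eta = (TYPE_COEFS[operator_type]
           + MEMBERSHIP_COEF * membership
           + EDUCATION_COEF * education)
    cumulative = [logistic(t - eta) for t in THRESHOLDS] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # difference of adjacent cumulative probabilities
        previous = c
    return dict(zip(CATEGORIES, probs))

baseline = category_probabilities("non_affiliated", membership=0, education=0)
engaged = category_probabilities("contract", membership=1, education=0)
```

Under these assumptions, a Non-Affiliated operator with no membership and no degree has roughly a 0.40 probability of reporting no interactions, while a Contract operator with a membership has a much lower probability of reporting none and a higher probability of falling in the ‘30+’ group, which matches the direction of the patterns described for Figure 19.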
5.3 Spatial Hypotheses Results

5.3.1 Descriptive Statistics and χ² Tests [SP-1]

| Operator Type | Inside and Outside County | Inside County Only | Outside County Only | Neither inside nor outside county |
|---|---|---|---|---|
| Utility | 51.3% (100) | 37.4% (73) | 8.2% (16) | 3.1% (6) |
| Contract | 48.4% (15) | 35.5% (11) | 12.9% (4) | 3% (1) |
| Non-Affiliated | 25.9% (7) | 25.9% (7) | 25.9% (7) | 22.2% (6) |

χ² value = 28.351, df = 6, p-value = 0.0008069

Table 29: Overview of the Responses for County of inter-operator interactions and chi-square test of independence

SP-1 focused on the county as the unit of knowledge transfers between operators. The descriptive statistics in Table 29 show that Utility and Contract operators had very similar breakdowns of the reported location of inter-operator interactions. Utility operators had the highest percentage (~51%) of operators reporting interactions both inside and outside the county that is home to their CWS, followed closely by the Contract operators with ~48%. This similarity continued, as ~37.4% of Utility and ~35.5% of Contract operators reported only interacting with operators in their own county. For “outside the county only”, Contract operators had about 5 percentage points more respondents than Utility operators. Non-Affiliated operators had about 25% in each of the categories, which showed more variation among the Non-Affiliated operators than among the Utility or Contract operators. The chi-square test was statistically significant, indicating that there were observable differences between the groups.

5.3.2 Variogram Interactions Model [SP-2]

Figure 20: Semi-Variogram Plots of Surveyed Operators Interactions

Figure 20 shows the three variograms that explore SP-2, or the spatial structure of survey respondent interactions. All three variograms showed that at distances less than ~19,500 meters (< 12 miles), there was lower semi-variance than at longer distances.
This indicated that systems (operators) that were near each other reported more similar numbers of interactions than those that were further apart. The lowest semi-variance at the shortest distances was in the Utility and Non-Affiliated operator only systems (Variogram 2), where semi-variances increased steadily from ‘lag 1’ (75 point pairs) to ‘lag 15’ (671 pairs), showing a consistent spatial structure with lower variance at shorter distances. The single system operators only variogram (Variogram 3) had only 13 pairs of operators within 2 miles (~3,200 meters), or ‘lag 1’, rising to 47 pairs in ‘lag 2’, 68 pairs in ‘lag 3’, and 83 pairs in ‘lag 4’, all of which showed increasing variance at greater distances. After about 20,000 meters (~12 miles), the variance no longer showed a spatial trend and leveled off. These results indicated that reported interactions were more likely to be similar based on the spatial location of the systems. It did not matter whether all the systems, only Utility/Non-Affiliated operator systems, or only single system operators were included, as the spatial structure persisted across all three variograms, reflecting SP-2.

5.3.3 Spatial Hypotheses Meaning of Results

The results of the spatial hypotheses were mixed. The expected relationships of SP-1 were not observed, as the relationships between operator type and counties of interactions did not show Utility and Non-Affiliated operators interacting with operators inside their own county at a higher rate than Contract operators. These findings indicated that the county is probably not the right unit of analysis for CWSs or operator interactions. This was reiterated during the interviews, where most of the operators explained that their interactions came from the closest systems, which were not always within the same county. These findings did not support SP-1.
SP-2 focused on the spatial autocorrelation of inter-operator interactions and found that location matters to the variance of reported inter-operator interactions. Spatial autocorrelation extended to about 12 miles between CWSs, the range within which the lowest variance in reported inter-operator interactions was observed. The variogram models provided strong observable evidence in support of SP-2. While the county as the unit of analysis might not have shown major differences, the absolute locations of the systems clearly mattered, as lower variance (higher correlation) was observed for nearby operators.

5.4 Primary Hypotheses Results

5.4.1 Regional Advantage Explorations [Prim-1, Prim-2]

| Dependent Variable | Survey Only (Interactions) R² | Survey Only Beta (p-value) | Imputed (Interactions) R² | Imputed Beta (p-value) | Global (Interactions) R² | Global Beta (p-value) |
|---|---|---|---|---|---|---|
| 2020 Percentage of Systems with a Violation | 0.28 | 0.64 (0.11) | 0.08 | 1.58 (0.26) | 0.05 | -4.49 (0.29) |
| 2020 Percentage of Population with a Violation | -0.13 | 0.33 (0.68) | -0.16 | -0.37 (0.89) | 0.08 | -8.69 (0.25) |
| 2020 Percentage of Systems with a Major Violation | -0.15 | 0.09 (0.78) | -0.16 | -0.13 (0.91) | -0.07 | -2.35 (0.48) |
| 2020 Percentage of Population with a Major Violation | -0.17 | 0.01 (0.989) | -0.1 | -1.62 (0.58) | 0.15 | -10.9 (0.18) |

Table 30: Michigan EGLE OLS Results on Aggregated Violations on Different Interactions Measures

Table 30 shows the results for the 12 OLS models for the Michigan EGLE CWS regions that investigated Prim-1 and Prim-2 by regressing the four aggregated violation measures on the three different interactions measures. None of these models showed any significant betas at any of the explored confidence levels (90%/95%/99%). Further, 7 out of the 12 models came back with negative R² values, which indicated that the variance in these models was better explained by fitting a horizontal line than by the actual linear model.
At the Michigan EGLE regional level of aggregation, there were no statistically significant relationships indicating that the number of interactions had any impact on the percentage of the region (population or systems) with a violation.

5.4.2 Districts Exploration OLS Model Results and Plots [Prim-1, Prim-2]

Figure 21: OLS Plots for Percentage of Systems with any 2020 Violation on the Three Interactions Measures

| Dependent Variable | Independent Variable | Measure | All included | Urban Districts | Rural Districts |
|---|---|---|---|---|---|
| 2020 Percentage of Systems in Violation | Survey Only | R² | -0.045 | 0.27 | 0.23 |
| | | Beta (p-value) | -0.005 (0.98) | 0.56 (0.04)** | -0.35 (0.22) |
| | Imputed | R² | -0.04 | 0.16 | -0.042 |
| | | Beta (p-value) | -0.19 (0.75) | 1.53 (0.1)* | -1.12 (0.2) |
| | Global | R² | -0.04 | -0.07 | -0.11 |
| | | Beta (p-value) | -0.98 (0.72) | -1.6 (0.68) | -0.85 (0.87) |

Table 31: OLS model results for the Percentage of Systems with any 2020 Violation

Figure 21 and Table 31 show the results of the nine OLS models exploring Prim-1 and Prim-2 by regressing the percentage of systems in a Michigan EGLE district with any 2020 violation on the different interaction measures. When all the districts were included in the model, none of the predictors were significant, and the R² values of the models were negative. However, once urban and rural districts were separated, the urban models with ‘survey-only’ interactions (interactions that operators reported in the research survey) and ‘imputed’ interactions (the average of survey-reported interactions, with non-respondents imputed using the global average) had statistically significant positive slopes for both predictors. The ‘survey-only’ interactions measure was statistically significant at the 95% confidence level, while the ‘imputed’ interactions measure was significant at the 90% confidence level.
The positive sign of the betas (i.e., a positive relationship between the interaction predictors and violations) indicated that urban districts with greater aggregated interactions had a higher percentage of systems in violation than districts with fewer aggregated interactions. None of the rural models came back as statistically significant, and two of the R² values came back negative. The ‘global’ interactions measure (in which the global average by operator type replaces the known survey interactions) did not show significance in any of the models. These findings for the ‘imputed’ and ‘survey-only’ interactions for the percentage of systems with any 2020 SDWA violation did not match the Prim-1 or Prim-2 hypotheses, as the urban districts showed the inverse relationship: as interactions increased, so did the percentage of systems with any 2020 SDWA violation.

Figure 22: OLS Model Plots for the Percentage of Population Served by a System with any 2020 Violation on the Three Interactions Measures

| Dependent Variable | Independent Variable | Measure | All included | Urban Districts | Rural Districts |
|---|---|---|---|---|---|
| 2020 Percentage of Population in Violation | Survey Only | R² | -0.12 | -0.06 | -0.04 |
| | | Beta (p-value) | 0.38 (0.4) | 0.39 (0.6) | 0.5 (0.45) |
| | Imputed | R² | -0.03 | -0.07 | -0.07 |
| | | Beta (p-value) | 0.81 (0.57) | 0.95 (0.69) | 1.19 (0.56) |
| | Global | R² | 0.03 | -0.07 | -0.11 |
| | | Beta (p-value) | -8.1 (0.21) | 4.18 (0.66) | -18.1 (0.08)* |

Table 32: OLS Model Results for the Percentage of Population Served by a System with a 2020 SDWA Violation

Figure 22 and Table 32 show the results of the nine OLS models for the percentage of population in the Michigan EGLE districts served by a CWS with a 2020 SDWA violation of any type on the three aggregated interaction measures. Only the model for rural districts with the ‘global’ interactions measure came back significant.
The slope was negative, indicating that more interactions were associated with a smaller percentage of the population in violation, but the R² was negative, indicating that this was a poorly fitting model. These findings gave no support to Prim-1 or Prim-2.

Figure 23: OLS Model Plots for the Percentage of Systems with a 2020 Major Violation on the Three Interactions Measures

| Dependent Variable | Independent Variable | Measure | All included | Urban Districts | Rural Districts |
|---|---|---|---|---|---|
| 2020 Percentage of Systems with Major Violation | Survey Only | R² | -0.03 | 0.36 | 0.26 |
| | | Beta (p-value) | -0.09 (0.57) | 0.51 (0.02)** | -0.45 (0.06)* |
| | Imputed | R² | 0.01 | 0.22 | 0.35 |
| | | Beta (p-value) | -0.54 (0.3) | 1.41 (0.06)* | -1.54 (0.03)** |
| | Global | R² | -0.03 | -0.07 | -0.1 |
| | | Beta (p-value) | -1.55 (0.52) | -1.38 (0.66) | -1.61 (0.72) |

Table 33: OLS Models Results for the Percentage of Systems with a 2020 Major Violation

Figure 23 and Table 33 show the results of the nine OLS models for the percentage of systems in Michigan EGLE districts with a 2020 major violation on the different interaction measures. Four out of the nine models came back with significant betas and positive R² values, indicating observable relationships between major violations and both the ‘survey-only’ and ‘imputed’ interactions once the urban and rural districts were separated. Here the urban and rural district models had an almost completely inverse relationship for both the ‘survey-only’ and ‘imputed’ inter-operator interactions. The urban and rural OLS lines for the ‘survey-only’ and ‘imputed’ interactions formed a clear “X” (cross) in the plots. For the ‘survey-only’ measure, the urban districts had a higher R² (0.36) than the rural districts (0.26), which indicated that the urban ‘survey-only’ model was the better fitting model.
Further, the positive relationship between ‘survey-only’ interactions and the percentage of systems with a 2020 major violation for urban districts (0.51) was statistically significant at the 95% confidence level, while the same relationship for the rural ‘survey-only’ model (-0.45) was statistically significant at the 90% confidence level. In the ‘imputed’ interaction models, the rural district model had a higher R² (0.35) than the urban model (R² = 0.22). The rural model’s negative beta (-1.54) for the ‘imputed’ interactions was statistically significant at the 95% confidence level, while the urban model’s positive beta (1.41) for ‘imputed’ interactions was only significant at the 90% confidence level. The OLS models for the ‘survey-only’ and ‘imputed’ interactions measures showed that in the rural districts, as the number of interactions increased, the percentage of systems with a 2020 major violation decreased. In the urban districts, however, as the number of interactions increased, the percentage of systems with a major violation also increased. Prim-1 and Prim-2 were supported in the rural districts but did not reflect what was happening in the urban districts.
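The “X” pattern, and the way it disappears when urban and rural districts are pooled, can be illustrated with a small single-predictor OLS sketch. The data below are invented for illustration and are not the district data. The sketch also assumes that the negative R² values reported in Tables 30 to 34 are adjusted R² values, which the text does not state; an ordinary R² from a model with an intercept cannot fall below zero. The function name `simple_ols` is hypothetical.

```python
def simple_ols(x, y):
    """Slope, R-squared, and adjusted R-squared for a one-predictor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    # Adjusted R-squared with one predictor: penalizes the fit for small samples.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, r2, adj_r2

# Hypothetical district data: average interactions (x) and % of systems in violation (y).
urban_x, urban_y = [1, 2, 3, 4], [2, 5, 5, 8]   # violations rise with interactions
rural_x, rural_y = [1, 2, 3, 4], [8, 5, 5, 2]   # violations fall with interactions

urban = simple_ols(urban_x, urban_y)
rural = simple_ols(rural_x, rural_y)
pooled = simple_ols(urban_x + rural_x, urban_y + rural_y)
```

With these toy values the urban and rural slopes have equal size and opposite sign, the pooled slope is zero, and the pooled adjusted R² is negative. Opposing relationships in the two district types can therefore cancel out in an “all districts” model even when each subset shows a clear trend.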
Figure 24: OLS Models Plots for the Percentage of Population Served by a System with a 2020 Major Violation on the Three Interactions Measures

| Dependent Variable | Independent Variable | Measure | All included | Urban Districts | Rural Districts |
|---|---|---|---|---|---|
| 2020 Percentage of Population with Major Violation | Survey Only | R² | -0.01 | 0.03 | -0.08 |
| | | Beta (p-value) | 0.29 (0.4) | 0.63 (0.28) | 0.22 (0.64) |
| | Imputed | R² | -0.03 | 0.01 | -0.1 |
| | | Beta (p-value) | 0.59 (0.59) | 1.9 (0.32) | 0.36 (0.8) |
| | Global | R² | 0.03 | -0.03 | 0.36 |
| | | Beta (p-value) | -6.02 (0.22) | 5.76 (0.45) | -14.75 (0.03)** |

Table 34: OLS Models Results for the Percentage of Population Served by a System with a 2020 Major Violation

Figure 24 and Table 34 show the results of the nine OLS models for the percentage of population in the Michigan EGLE districts served by a CWS with a 2020 major violation on the three aggregated interaction measures. Only the model for rural districts with the ‘global’ interactions measure came back with a significant beta. The slope was negative and significant at the 95% confidence level, which indicated that in rural districts there was an inverse relationship between the ‘global’ measure of inter-operator interactions and the population served by a system with a 2020 major SDWA violation. Thus, the ‘global’ measure of inter-operator interactions was correlated with decreases in the percentage of the population served by a system with a violation. These findings gave no support to Prim-1 or Prim-2.
5.4.3 Districts Exploration GWR Model Results and Plots [Prim-1]

Model                                                       Neighbors  Statistic  Min    Max   Mean   Median  Significant Betas
2020 Percentage of Systems in Violation~Survey Only         13         R2         -0.68  0.67  0.21   0.29
                                                            13         Betas      -0.61  0.5   0.09   0.21    17
                                                            22         R2         -0.65  0.22  0.14   0.18
                                                            22         Betas      -0.33  0.36  0.1    0.2     16
2020 Percentage of Systems in Violation~Imputed             13         R2         -0.88  0.69  0.23   0.29
                                                            13         Betas      -2.46  1.57  -0.06  0.23    5
                                                            22         R2         -1     0.29  0.13   0.16
                                                            22         Betas      -1.23  0.96  0.06   0.31    0
2020 Percentage of Systems in Major Violation~Survey Only   13         R2         0.19   0.92  0.51   0.55
                                                            13         Betas      -0.55  0.52  0.12   0.25    18
                                                            22         R2         0.18   0.94  0.37   0.34
                                                            22         Betas      -0.4   0.42  0.12   0.26    21
2020 Percentage of Systems in Major Violation~Imputed       13         R2         0.1    0.85  0.51   0.56
                                                            13         Betas      -2.11  1.71  0.09   0.54    16
                                                            22         R2         0.16   0.88  0.37   0.37
                                                            22         Betas      -1.46  1.22  0.15   0.53    12

Significant Betas: number of significant local betas (90% CI, under FB correction).

Table 35: Overview Table of the Results of the Tricube GWR Models

To account for the spatial structure hypothesized in Prim-1, this research ran GWR models to incorporate the possible spatial structure within these interactions. As in the previous section, the percentage of population served by a system in violation (any or major) came back with no significant relationships for the GWR models between any of the aggregated inter-operator interaction measures and violation measures, and the non-significant results were not included. Table 35 shows the summary statistics for the eight different GWR models reported by this research. For EGLE districts shown as percentage of systems with a violation, all of the models came back with observed significant relationships with the exception of the 22 neighbors’ percentage of systems in violation on the ‘imputed’ interactions.
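To make the tricube GWR setup concrete, the following is a minimal sketch of one local regression with an adaptive (nearest-neighbor) bandwidth. It is not the software used in this research: the function names, the toy coordinates, and the choice to set the bandwidth to the distance of the k-th nearest neighbor (which gives that neighbor a weight of zero) are all assumptions made for illustration, and the data are synthetic.

```python
import math


def tricube_weights(distances, k):
    """Adaptive tricube kernel: w = (1 - (d/b)^3)^3 for d < b, else 0.

    The bandwidth b is the distance to the k-th nearest neighbor
    (index 0 of the sorted distances is the focal point itself).
    """
    bandwidth = sorted(distances)[k]
    return [(1 - (d / bandwidth) ** 3) ** 3 if d < bandwidth else 0.0
            for d in distances]


def local_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical district centroids and synthetic, exactly linear data.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 3)]
interactions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
violations = [10.0 - 1.5 * v for v in interactions]

focal = coords[0]
dists = [math.dist(focal, c) for c in coords]
weights = tricube_weights(dists, k=4)
intercept, slope = local_fit(interactions, violations, weights)
print(round(slope, 3), [round(w, 3) for w in weights])
```

Repeating the local fit with each district as the focal point yields one beta and one local R2 per district, which is what the maps of local betas summarize. A larger neighbor count (22 instead of 13) smooths the local estimates toward the global OLS fit.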
There were more significant local relationships for the ‘survey-only’ aggregated interactions than for the ‘imputed’ interactions in all models. The two models with the highest number of significant betas and no negative R2 values were those for the 2020 percentage of systems with a major violation on the ‘survey-only’ interactions (Major Systems~Survey Only Interactions). Further, it should be noted that none of the R2 values were negative for the major systems in violation on the ‘imputed’ interactions variable models, while in the any violation models each one had some negative local R2 values. For all the models, there were both positive and negative betas, which indicated some spatial variation in the relationships.

Figure 25: 2020 Major Violations on Surveyed Interactions (13 neighbors)

Figure 26: 2020 Major Violations on Survey Interactions (22 Neighbors)

Figures 25 and 26 map out the local betas, their significance measured by the Fotheringham and Byrne p-value correction, and the local R2 values for the two GWR models for percentage of systems with a major violation on the aggregated survey interactions. The GWR-22 model had only three districts that did not have significant local betas for ‘survey-only’ interactions, and all of those non-significant local betas were positive values, while all of the negative local betas were significant. The GWR-13 model had 16 significant slopes for ‘survey-only’ interactions and showed a very similar pattern to the GWR-22 ‘survey-only’ interactions model. Both models showed a similar trend: R2 values were higher towards the north, with district 81 (Upper Peninsula) having the highest R2. This noticeable change in the patterns of the R2 values indicated that greater average interactions in a district were correlated with smaller percentages of systems with a major violation (in both models) in Northern Michigan.
Southern Michigan showed the inverse pattern, where fewer aggregated interactions were associated with lower percentages of systems in violation. This reflected the findings from the OLS models for the districts, where the relationship between percentage of systems with major violations and interactions was inverted between urban and rural districts. Prim-1 accurately hypothesized the relationship in the northern and more rural districts, which had better predictive models (higher R2) and negative beta values, but the southern, more urban parts of the state had positive beta values with lower R2 values.

Figure 27: 2020 Percentage of Systems with Major Violations on Imputed Interactions (13 neighbors)

Figure 28: 2020 Percentage of Systems with Major Violation on Imputed Interactions (22 neighbors)

Figures 27 and 28 map out the local betas for the ‘imputed’ interactions, their significance as measured by the Fotheringham and Byrne p-value correction, and the local R2 values for the two GWR models for percentage of systems with a major violation on the aggregated ‘imputed’ interactions variable. The patterns in the two ‘imputed’ interactions models were very similar to the patterns shown in the ‘survey-only’ models on major violations. The local R2 values were still the highest in district 81, where there were statistically significant negative slopes, which indicated good fitting models. Greater average ‘imputed’ interactions were correlated with a decrease in the percentage of systems with a major violation. Further, the urban/rural or north/south divides in the relationships were apparent, where the northern and more rural districts showed negative local betas that were statistically significant and had the highest local R2 values. The southern/urban districts had positive betas with some statistical significance and lower R2 values than the northern districts.
The clear spatial pattern showed benefits in the rural areas for greater interactions between operators, but in the urban areas there was the opposite effect, which reflected the previous findings in relation to Prim-1.

5.4.4 Operator Only Model (Ordered Violation Percentages) [Prim-1, Prim-2]

                                                                    OP.OL.All                            OP.OL.Reduced
Variable                                                            Estimate  Est. Error  L-95%  U-95%   Estimate  Est. Error  L-95%  U-95%
Intercept [1] (0.01% to 99.99% of systems with a violation)         0.95      1.19        -1.44  3.28    -0.36     0.77        -2.01  1.02
Intercept [2] (100% of systems with a violation)                    1.85      1.2         -0.54  4.18    0.51      0.76        -1.13  1.88
(Type) Utility operators                                            -2.52     0.82        -4.22  -1.02   -2.33     0.78        -3.99  -0.93
(Type) Contract operators                                           -2.28     0.94        -4.22  -0.55   -1.95     0.85        -3.73  -0.40
Interactions (scaled)                                               2.76      1.25        0.51   5.46    2.57      1.19        0.49   5.25
Violation in 2019                                                   1.6       0.38        0.86   2.35    1.61      0.35        0.94   2.31
Utility operators * interactions                                    -2.8      1.26        -5.51  -0.56   -2.61     1.21        -5.28  -0.50
Contract operators * interactions                                   -2.81     1.3         -5.56  -0.44   -2.61     1.25        -5.35  -0.42
Total Systems                                                       0.19      0.29        -0.38  0.77
Certification Length                                                0.21      0.19        -0.16  0.57
Other Operators                                                     -0.24     0.25        -0.78  0.21
Usefulness                                                          0.24      0.2         -0.15  0.64
Group Membership                                                    0.4       0.52        -0.57  1.44
Educational Attainment                                              0.31      0.41        -0.52  1.11
Hours in Meetings                                                   -0.27     0.21        -0.69  0.12
Average Population Served                                           -0.43     0.54        -1.72  0.31
Continuing Education Credits Earned                                 0.25      0.19        -0.14  0.63
(Experience) 2 to 5 Years                                           -0.1      0.47        -0.98  0.84
(Experience) 5 years+                                               -0.09     0.41        -0.9   0.69
Bayesian R2 (BIC)                                                   0.23 (410.4)                         0.19 (358.5)

Table 36: Results of the Ordered Logistic Regression Models on the Operator Only Variables

Table 36 highlights the coefficients, standard estimated errors, and lower/upper 95% confidence intervals of the two OLR models exploring operator only features on the percentage of operator’s systems in violation.
The dependent variable in these models was an ordered factor which represented the percentage of the operator’s CWS(s) that had a SDWA violation in 2020 and was broken down into three groups: (1) no violations, (2) more than 0% but less than 100% of CWSs with a violation (the ‘0% to 100%’ group), and (3) 100% of CWSs with a violation (more about this in section 4.6). The ordered logistic regression model explored Prim-1 and Prim-2 by looking at operator only characteristics that related to SDWA compliance. In both of the models, the intercepts’ confidence intervals contained zero, which was unsurprising given the high BIC and low R2 values for both models. In both models, only four of the variables had betas with confidence intervals that did not contain zero: operator type, interactions, 2019 violation, and operator type * interactions. Operator type was an ordered factor variable, where the order was based on the average number of inter-operator interactions: the Non-Affiliated operators were the lowest factor, and the Contract operators were the highest factor. In both models, the Utility and Contract operators had negative beta values (with no zero in the confidence intervals), which indicated that these two types of operators were less likely to move from the ‘no violations’ group to either of the two higher groupings of the dependent variable. Interactions alone had a positive coefficient estimate, which indicated that operators with greater interactions had an increased probability of having either ‘0% to 100%’ or ‘100%’ of their CWS(s) with a 2020 SDWA violation. These findings were the opposite of what was proposed by Prim-1 and Prim-2.
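The way the ordered logistic coefficients translate into group probabilities can be sketched with the OP.OL.Reduced point estimates from Table 36. This is an illustration under stated assumptions rather than the fitted model itself: it assumes the common cumulative-logit form P(Y <= k) = logistic(threshold_k - eta), treats Non-Affiliated operators as the reference category, ignores all uncertainty in the estimates, and takes the scaled interactions value as given because the scaling is not restated here. The function and dictionary names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Point estimates copied from OP.OL.Reduced (Table 36).
THRESHOLDS = (-0.36, 0.51)  # Intercept [1], Intercept [2]
COEF = {
    "utility": -2.33,
    "contract": -1.95,
    "interactions": 2.57,
    "violation_2019": 1.61,
    "utility_x_interactions": -2.61,
    "contract_x_interactions": -2.61,
}


def group_probabilities(op_type, interactions, violation_2019):
    """Return (P(no violations), P('0% to 100%'), P('100%')).

    Assumes Non-Affiliated is the reference level, so only Utility and
    Contract operators receive a type shift and an interaction term.
    """
    eta = (COEF["interactions"] * interactions
           + COEF["violation_2019"] * violation_2019)
    if op_type in ("utility", "contract"):
        eta += COEF[op_type] + COEF[op_type + "_x_interactions"] * interactions
    cum1 = logistic(THRESHOLDS[0] - eta)  # P(Y <= group 1)
    cum2 = logistic(THRESHOLDS[1] - eta)  # P(Y <= group 2)
    return cum1, cum2 - cum1, 1.0 - cum2


for op_type in ("non_affiliated", "utility", "contract"):
    for level in (0.0, 1.0):
        probs = group_probabilities(op_type, level, violation_2019=0)
        print(op_type, level, [round(p, 3) for p in probs])
```

Under these assumptions, raising the scaled interactions value lowers the ‘no violations’ probability for Non-Affiliated operators, while for Utility and Contract operators the main effect and the interaction term nearly cancel, so the probabilities barely move.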
The presence of an operator’s system with a violation in 2019 had positive coefficients with confidence intervals that did not contain zero, which indicated that operators of a system with a violation in 2019 had an increased probability of belonging to either of the dependent variable groupings in which a percentage of their systems had a SDWA violation in 2020. The operator type * interactions term fit with Prim-1 and Prim-2, as the likelihood of an operator moving up in violations groupings decreased with increased inter-operator interactions for Utility or Contract operators.

Figure 29: Operator Level Odds ratio for Operator Type, Interactions, and Violation Groupings

Figure 29 depicts the relationships between operators and the number of interactions by converting odds ratios in Table 36 to probabilities. It shows the likelihood changes (y axis) between violation groups (x axis) based on the different combinations of reported inter-operator interactions and operator type (each box). It should be noted that in the Utility and Contract operator types there was little observed variation within the violations group membership and interactions. Almost all the confidence intervals overlapped with each other. However, for Utility operators there was an increasing slope in the probability of belonging to the ‘no violations’ group (0% of CWSs with a violation) with increased interactions. This indicated a (weak) relationship between increased interactions for Utility operators and a higher probability of having no systems with a violation. Alternatively, there was a downward slope in the probability of belonging to either the ‘0 to 100%’ violations group or the ‘100%’ violations group with increased interactions. Contract operators had a higher probability in general of belonging to either the ‘0 to 100%’ violations group or the ‘100%’ violations group.
Non-Affiliated operators showed an interesting trend here, where increased interactions actually lowered the probability of being a member of the ‘no violations’ group. Further, none of the Non-Affiliated operators reported having more than ‘10 to 20’ interactions, hence the missing pieces in Figure 29. Figure 29 shows that while the models indicated significance, the effect was modest, as there was little variation in the probability of Utility or Contract operators having more than 0% of their CWS(s) in violation based on the interactions alone.

5.4.5: GLMM System and Operator levels for Binary SDWA Violations [Prim-1, Prim-2]

Model             Dependent Variable                            Fixed Effects R2  Conditional Effects R2  Hosmer-Lemeshow Goodness of Fit  AIC / BIC
Prim.Any.All      Binary 2020 Any Violation                     0.19              0.41                    4.65 (0.79), Good fit            411.34 / 497.87
Prim.Any.Reduced  Binary 2020 Any Violation                     0.12              0.28                    2.72 (0.95), Good fit            395.22 / 432.30
Prim.NH.All       Binary 2020 Violation (Non-Health Violation)  0.17              0.46                    7.23 (0.51), Good fit            395.62 / 482.15
Prim.NH.Reduced   Binary 2020 Violation (Non-Health Violation)  0.13              0.33                    6.54 (0.59), Good fit            381.29 / 422.49
Prim.MAJ.All      Binary 2020 Major Violation                   0.15              0.72                    16.87 (0.03), Bad fit            251.99 / 338.52
Prim.MAJ.Reduced  Binary 2020 Major Violation                   0.14              0.31                    6.73 (0.57), Good fit            237.74 / 278.95

Table 37: Overview of R2 values, Hosmer-Lemeshow Test, and the AIC/BIC for the GLMMs

Table 37 outlines all six of the GLMM models explored by this research and the quality of the fit for these models. The GLMM models investigated Prim-1 and Prim-2 through representation of the different types of binary 2020 SDWA violations, while accounting for TMF capacity and natural advantage variables. Only one of the models showed a bad fit, as Prim.MAJ.All did not pass the Hosmer-Lemeshow test for goodness of fit, with a p-value lower than 0.05. Further, its conditional effects R2 was very high compared to the others, which pointed to an issue with this model.
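The Hosmer-Lemeshow p-values in Table 37 can be checked against the reported test statistics with a short calculation. The degrees of freedom are not stated in the text; the sketch below assumes the conventional 10 risk groups, which gives 8 degrees of freedom, and uses the closed-form chi-square upper tail that holds for even degrees of freedom. The function and variable names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (even df only)."""
    if df % 2 != 0:
        raise ValueError("closed form only holds for even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


# (test statistic, reported p-value) pairs from Table 37.
REPORTED = {
    "Prim.Any.All": (4.65, 0.79),
    "Prim.Any.Reduced": (2.72, 0.95),
    "Prim.NH.All": (7.23, 0.51),
    "Prim.NH.Reduced": (6.54, 0.59),
    "Prim.MAJ.All": (16.87, 0.03),
    "Prim.MAJ.Reduced": (6.73, 0.57),
}

ASSUMED_DF = 8  # assumption: 10 groups minus 2
recomputed = {name: chi2_sf_even_df(stat, ASSUMED_DF)
              for name, (stat, _) in REPORTED.items()}

for name, (stat, p_reported) in REPORTED.items():
    print(name, stat, p_reported, round(recomputed[name], 2))
```

Under the 8-degrees-of-freedom assumption, the recomputed values round to the reported p-values for all six models, which is consistent with the ‘Good fit’ and ‘Bad fit’ labels (a p-value below 0.05 rejects adequate fit).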
The issues with the model were most likely due to the small number of systems (observations) that actually had a major violation in 2020 and the large number of parameters included in the full model. Every other model passed the Hosmer-Lemeshow test and had positive and realistic R2 values, indicating they were a more appropriate fit. The best fixed effects R2 values were in Prim.Any.All and Prim.NH.All, where Prim.Any.All had the slightly higher fixed effects R2, while Prim.NH.All had the higher conditional effects R2. Prim.NH.Reduced had a slightly higher conditional effects R2 value than the other two ‘best’ (reduced) models, while Prim.MAJ.Reduced had the highest fixed effects R2 of the three. Prim.MAJ.Reduced had the lowest BIC value, followed by Prim.NH.Reduced and Prim.Any.Reduced. It is clear that five of the six models had appropriate fits and were very similar in their explanatory power. Only the five models with appropriate fits will be discussed further.

                                     Prim.Any.All                         Prim.NH.All
Variable                             Beta   E.E.  Lower CI  Upper CI      Beta   E.E.  Lower CI  Upper CI
Intercept                            -0.59  1.53  -3.57     2.53          -0.94  1.73  -4.33     2.59
Utility operators                    -4.7   1.67  -8.4      -1.89         -5.15  1.9   -9.51     -2.00
Contract operators                   -3.55  1.57  -6.95     -0.78         -3.95  1.79  -7.9      -0.82
Interactions                         4.12   2.14  0.47      8.87          4.37   2.41  0.18      9.73
MHI                                  -0.64  0.49  -1.64     0.27          -0.12  0.54  -1.17     0.95
MHV                                  0.63   0.55  -0.43     1.72          -0.15  0.65  -1.49     1.05
Unemployment                         -0.02  0.26  -0.54     0.48          -0.2   0.31  -0.83     0.38
Education                            0.21   0.90  -0.71     1.16          0.32   0.56  -0.74     1.4
Environmental Quality                0.21   0.21  -0.19     0.61          0.27   0.23  -0.19     0.72
Medium Population                    -0.83  0.9   -2.74     0.83          -1.09  1     -3.16     0.81
Large Population                     -0.46  0.53  -1.57     0.54          -0.54  0.6   -1.79     0.57
Rural Code 2                         0.69   0.58  -0.43     1.85          0.97   0.67  -0.29     2.34
Rural Code 3                         -0.27  0.53  -1.34     0.76          -0.31  0.59  -1.5      0.84
Rural Code 4                         0.14   0.6   -1.01     1.31          0.31   0.67  -0.97     1.66
Group Membership                     1.06   0.91  -0.65     2.94          1.32   1.03  -0.61     3.51
Violation 2019                       1.88   0.52  0.91      2.93          2.13   0.57  1.09      3.31
Purchased Water                      -0.85  0.78  -2.54     0.58          -1.11  0.85  -2.9      0.44
Primary System                       0.94   0.61  -0.26     2.14          1.21   0.65  -0.07     2.51
Upper Peninsula                      -0.48  1     -2.54     1.46          -0.59  1.13  -2.9      1.6
Utility operators and Interaction    -4.23  2.19  -9.06     -0.43         -4.52  2.45  -10.05    -0.25
Contract operators and Interaction   -4.68  2.25  -9.64     -0.79         -4.9   2.54  -10.73    -0.53

Table 38: Results of the GLMM models for Model 1 All and Model 2 All

Table 38 shows the coefficient estimates, estimated errors (E.E.), and lower and upper confidence interval bounds for the fully parameterized models for variables explaining the likelihood of any 2020 SDWA violation (Prim.Any.All) and a non-health related 2020 SDWA violation (Prim.NH.All). The confidence intervals for the “type of operator” variable did not include zero, and there were negative coefficients for the Utility and Contract operators. This meant that, compared to the systems run by Non-Affiliated operators, the systems run by Utility and Contract operators were less likely to have a violation. For both models, the interactions variable had a positive beta value with confidence intervals that did not contain 0.
This indicated the opposite of Prim-1 and Prim-2: increases in interactions were associated with a higher probability of having any 2020 SDWA violation or a non-health related SDWA violation. However, these relationships changed and reflected Prim-1 and Prim-2 when interactions were combined with operator type in the two interaction terms: Utility operators*Interactions and Contract operators*Interactions. The negative betas (with confidence intervals not containing 0) indicated that increased inter-operator interactions for Utility and Contract operators decreased the probability of a 2020 SDWA violation (any violation or non-health). The only other statistically significant variable was the variable representing whether the system had any 2019 SDWA violation, which was positive, indicating that if the system had a violation in the previous year, then the probability of having a violation in the following year increased.

                                     Prim.Any.Reduced                     Prim.NH.Reduced                      Prim.MAJ.Reduced
Variable                             Beta   E.E.  Lower CI  Upper CI      Beta   E.E.  Lower CI  Upper CI      Beta   E.E.  Lower CI  Upper CI
Intercept                            -0.84  1.02  -2.74     1.23          -1.11  1.1   -3.2      1.09          -2.48  1.28  -5.13     -0.03
Utility operators                    -3.53  1.10  -5.88     -1.55         -3.85  1.2   -6.45     -1.74         -4.02  1.43  -7.22     -1.63
Contract operators                   -2.39  1.05  -4.59     -0.5          -2.59  1.14  -5.01     -0.51         -2.92  1.39  -5.92     -0.46
Interactions                         2.61   1.45  0.12      5.81          2.6    1.55  -0.2      6.01          3.62   2.05  0.35      8.32
Environmental Quality                -      -     -         -             0.25   0.18  -0.1      0.61          0.38   0.25  -0.09     0.9
Group Membership                     0.92   0.61  -0.25     2.17          1.09   0.67  -0.2      2.43          0.87   0.84  -0.73     2.6
Violation 2019                       1.5    0.4   0.7       2.33          1.70   0.43  -0.88     2.60          1.78   0.55  0.77      2.96
Primary System                       0.89   0.48  -0.05     1.81          1.05   0.51  0.05      2.07          1.47   0.69  0.13      2.82
Utility operators and Interaction    -2.75  1.48  -6.01     -0.14         -2.73  1.57  -6.16     0.12          -3.94  2.11  -8.75     -0.54
Contract operators and Interaction   -2.98  1.5   -6.28     -0.33         -3.03  1.62  -6.52     -0.16         -6.32  2.67  -12.51    -2.2

Table 39: Results of the "Best" or Step GLMM Models

Table 39 presents the results of the three models selected by stepwise model selection methods: the Prim.Any.Reduced (any 2020 violation), Prim.NH.Reduced (non-health 2020 violation), and Prim.MAJ.Reduced (2020 major violation) models. Prim.Any.Reduced included the variables of operator type, interactions, group membership, violation in 2019, primary system, and operator type*interactions, while Prim.NH.Reduced and Prim.MAJ.Reduced also included the environmental quality variable. For all three models, the operator type betas came back negative with confidence intervals that did not include zero, which indicated that Utility and Contract operated systems had a higher probability of not having a 2020 violation (all three models) than the systems of Non-Affiliated operators. Interactions had a positive beta and confidence intervals that did not overlap 0 for Prim.Any.Reduced and Prim.MAJ.Reduced, which indicated that as the number of interactions rose, so did the probability of the system having either any type of 2020 SDWA violation or a 2020 major violation. In Prim.NH.Reduced, by contrast, the interactions variable had confidence intervals that included 0. When taken together, the interactions and operator type had negative betas with confidence intervals that did not include 0, which indicated that as interactions increased for Utility or Contract operators, the probability of having either any type of 2020 SDWA violation or a major 2020 SDWA violation decreased. In Prim.NH.Reduced, only the beta for the Contract operators*interactions term had confidence intervals that did not include 0. Prim.Any.Reduced and Prim.MAJ.Reduced both had positive beta values for the 2019 violation variable with confidence intervals that did not include 0, which indicated that any violation in 2019 increased the probability of having either any type of violation in 2020 or a major violation in 2020.
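Because the models contain both an interactions main effect and operator type*interactions terms, the slope of interactions for a given operator type is the sum of the two coefficients. The sketch below works through that arithmetic with the point estimates reported for four of the GLMMs. It assumes Non-Affiliated operators are the reference category (so the main effect is their slope), and it uses point estimates only: the uncertainty of each sum cannot be recovered without the coefficient covariances, so no significance claim is implied. The names are hypothetical.

```python
import math

# (interactions main effect, Utility x interactions, Contract x interactions)
# Point estimates as reported for the "All" and "Reduced" GLMMs.
POINT_ESTIMATES = {
    "Prim.Any.All": (4.12, -4.23, -4.68),
    "Prim.NH.All": (4.37, -4.52, -4.90),
    "Prim.Any.Reduced": (2.61, -2.75, -2.98),
    "Prim.NH.Reduced": (2.60, -2.73, -3.03),
}


def net_slopes(main, utility_x, contract_x):
    """Log-odds change in violation probability per unit of interactions."""
    return {
        "non_affiliated": main,          # reference category (assumed)
        "utility": main + utility_x,     # main effect + interaction term
        "contract": main + contract_x,
    }


results = {model: net_slopes(*coefs) for model, coefs in POINT_ESTIMATES.items()}

for model, slopes in results.items():
    summary = {op: (round(b, 2), round(math.exp(b), 2)) for op, b in slopes.items()}
    print(model, summary)  # (net log-odds slope, odds ratio) per operator type
```

In every model listed, the net point-estimate slope is positive for Non-Affiliated operators and slightly negative for Utility and Contract operators, which matches the direction described in the text: the interaction terms offset the positive main effect for the two better-connected operator types.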
Environmental quality and group membership beta estimates had confidence intervals that included zero, so nothing can be concluded about the impact of these variables on the probability of a violation in any of the different violation models. Prim.NH.Reduced and Prim.MAJ.Reduced both had positive betas for the primary system variable with confidence intervals that did not include zero, which indicated that being a primary system in the sample increased the probability of having a non-health 2020 SDWA violation or a major 2020 SDWA violation. The models showed a signal, consistent with Prim-1 and Prim-2, that greater interactions were related to a lower probability of any or a major 2020 SDWA violation when accounting for the type of operator; however, interactions alone showed the inverse of the main hypotheses.

5.4.6: Primary Hypotheses Meaning of Results

The methods used to explore the primary hypotheses provided mixed results on the quality of the hypotheses. Prim-1 focused on the role that spatial structure played in determining regional advantages for CWS operators’ SDWA compliance. The OLS and GWR models found a clear urban/rural divide in the aggregated interactions measures and SDWA compliance. In rural districts the percentage of 2020 major SDWA violations was inversely correlated with the aggregated interactions measures, meaning that in the rural districts greater interactions decreased the percentage of 2020 major SDWA violations. However, in urban districts the percentage of CWSs with any 2020 SDWA violation and 2020 major SDWA violations had positive relationships with the aggregated interactions measures, which indicated that as aggregated interactions measures increased, so did the percentage of CWSs with either type of violation. Further exploration of Prim-1 through the OLR models and the GLMMs found that interactions alone increased the probability of a 2020 SDWA violation (any 2020 SDWA violation, and 2020 major SDWA violations).
However, when accounting for operator type and interactions, there was evidence of Utility and Contract operators with greater inter-operator interactions reducing their probabilities of having any of the types of 2020 SDWA violations. While there was a strong signal based on operator type and rurality that increased interactions decreased the probability of SDWA violations, there was also a strong signal that these relationships were inverted for Non-Affiliated operators and urban districts. Due to these findings, Prim-1 was neither rejected nor accepted, as further investigation is required to determine the full validity of the idea.

Prim-2 expressed the idea that “isolated” operators with fewer inter-operator interactions would have an increased probability of SDWA violations compared to the non-isolated operators. Here, isolated was defined as the operators who are not professionally engaged or interacting with their peers. The primary isolated operators were the Non-Affiliated operators, who had fewer inter-operator interactions and a lower percentage of water-related professional organization membership. Based on the OLR models and GLMMs, there was a strong signal through the negative betas for Utility and Contract operators (with confidence intervals not containing zero) that the Non-Affiliated operators were more likely to have a 2020 SDWA violation (all three of the measures) than the Utility or Contract operators. Further, the operator type*interactions term in the models explicitly showed that more inter-operator interactions for Utility and Contract operators increased the probability of SDWA compliance. These signals from the methods indicated that Prim-2 was an accurate hypothesis on the relationship between isolated operators, interactions, and SDWA compliance.

CHAPTER 6: Discussion and Conclusion

6.1 Introduction

This chapter focuses on the contributions of this research and the broader impacts of the findings.
First, it briefly reviews the research hypotheses and the findings from Chapter 5. Then, it explains the theoretical contributions of the work in the context of organizational learning, innovation systems, and agglomeration economics theories. After addressing the theoretical contributions of the work, the chapter then addresses the broader impacts for SDWA compliance research and regulation and lays out seven suggestions for increasing inter-operator interactions as possible avenues to increase SDWA compliance. After establishing the main contributions, the chapter provides several possible directions for future research on external linkages between CWSs. The chapter concludes with a brief overview of the entire research.

Prim-1
Hypothesis (Explicit): If spatial structure exists for inter-operator interactions, there are regional advantages based on knowledge spillovers and transfers between community water systems (CWS) operators that facilitate Safe Drinking Water Act Compliance.
Key Findings:
• Regional Advantages (EGLE districts) in the aggregated number of inter-operator interactions and percentage of CWSs with a 2020 major SDWA violation for rural districts
• Urban EGLE districts showed greater aggregated interactions had higher percentage of CWSs with any or major 2020 SDWA violations
• Interactions alone showed a positive relationship between increased interactions and probability of the operator or CWS having a 2020 SDWA violation
• Interactions when combined with operator type, showed decreasing probability of 2020 SDWA violation with increased interactions for Utility or Contract Operator

Prim-2
Hypothesis (Explicit): “Isolated” operators with fewer inter-operator interactions are more likely to have SDWA violations/non-compliance than “non-isolated” operators with more interactions
Key Findings:
• The “isolated” Non-Affiliated operators had limited interactions and higher probabilities of 2020 SDWA violations than the better-connected Utility and Contract operators.

EN-1
Hypothesis (Explicit): Utility or Contract operators will have more inter- and intra-operator interactions than non-affiliated operators.
Key Findings:
• Utility and Contract operators had more reported inter-operator interactions through averages and probabilities than Non-Affiliated operators

EN-2
Hypothesis (Explicit): Operators (Utility, Contract, Non-Affiliated) who are professionally engaged through water organizational membership, and pursuit of continuing education will have more interactions than operators who are not professionally engaged.
Key Findings:
• Mixed findings as group membership alone did not increase the interactions for all operator types
• When controlling for Utility and Contract operators, there was an increase in the probabilities of more reported inter-operator interactions with group membership

EN-3
Hypothesis (Explicit): Operators (Utility, Contract, Non-Affiliated) with higher certifications levels and educational attainment will have more interactions than operators with low levels of certification or educational attainment.
Key Findings:
• The operator’s educational attainment showed no relationship to the number of reported inter-operator interactions

SP-1
Hypothesis (Explicit): Interactions between Utility and Non-Affiliated CWS operators occur primarily with operators in the same counties, while Contract operators are more likely to have interactions with operators outside their county
Key Findings:
• There were no observable differences between the operator type and the county where their interactions occurred.

SP-2
Hypothesis (Explicit): Operator interactions have spatial structure such that operators are more likely to interact with each other if their systems are close together in geographic space
Key Findings:
• There was observed spatial autocorrelation between reported inter-operator interactions for operators with CWSs within 12 miles of each other

Table 40: Overview of the Research Hypotheses and Key Findings

Table 40 shows the research hypotheses and the key research findings of the analyses discussed in Chapters 4 and 5.
There were mixed findings for the primary hypotheses (Prim-1 and Prim-2). The models assessing Prim-1 showed the hypothesized relationships in rural EGLE districts and when accounting for operator type. However, for both inter-operator interactions alone (not controlling for type of operator) and in urban EGLE districts, the relationship was the opposite of that proposed by Prim-1. Prim-2 results found that the less isolated and more professionally engaged operators had a reduced probability of SDWA violations when controlling for operator type. The endogenous hypotheses (EN-1, EN-2, EN-3) had mixed findings. EN-1 focused on the operator type, and the results supported the hypothesis, as the Utility and Contract operators had a higher probability of reporting more inter-operator interactions than the Non-Affiliated operators. EN-2 showed mixed results: in some models, when controlling for operator type, belonging to a professional water organization increased the reported number of inter-operator interactions for Utility and Contract operators, while in other models without controlling for operator type, it was not statistically significant. Investigations of EN-3 failed to show any relationship between the operator’s educational attainment and the number of reported inter-operator interactions. The results for SP-1 showed no indications of operator type being related to the county or counties of inter-operator interactions, while the results for SP-2 showed interactions were spatially autocorrelated.
6.2 Theoretical Contributions

6.2.1 Organizational Learning

In the context of organizational learning, this research’s exploration of the endogenous hypotheses found that the organizational and human capital (operator-specific) characteristics were drivers of organizational learning through “inter-organization learning.” Previous research (Levitt and March, 1988; Brinkeroff, 2006; Lee and Choi, 2015) on organizational learning found that the alternative paths to increased performance included (1) increased R&D and (2) increased training; however, these were unattainable for many small and debt-heavy organizations. Considering that in Michigan, ~52% of CWSs are regarded as very small systems serving populations of 500 people or less, and ~79% of CWSs serve 3,300 people or less, the first two organizational learning mechanisms are less likely to be attainable. Previous SDWA compliance research has widely established that small systems (Ottem et al., 2003; Blanchard and Eberle, 2013; Allaire et al., 2018; McDonald and Jones, 2018; Statman-Weil et al., 2020; EGLE, 2020) and rural systems (Krometis and Marcillo, 2019) are more likely to struggle with SDWA compliance. For these systems, organizational learning theory would point to the third way to increase their performance, ‘inter-organizational’ learning, which would be a low-cost alternative based more around the professional network of the small/rural CWSs. Based on this research’s sample and model results, the role of interactions in lowering the probability of SDWA violations was mixed, depending on the urban/rural area of the system and the type of operator. Interactions alone showed the opposite trend to the primary hypotheses: as the frequency of interactions increased, there were also increases in the probability of a SDWA violation.
However, when accounting for urban/rural area, the rural CWSs had a decreased probability of SDWA violation, while the CWSs in urban areas had an increased probability. This finding, paired with some previous findings (Krometis and Marcillo, 2019), suggests that one possible avenue to support these rural CWSs is to promote inter-organizational knowledge sharing. In the interviews and open-ended survey questions, many of the operators indicated that some of the best information they received or shared was beneficial to them in several different ways. One operator in the open-ended survey question went as far as to say, “This is how we learn; classes only get you so far,” which pointed to the role of inter-operator interactions in increasing their learning. A different operator highlighted the learning benefits of keeping up with the dynamic SDWA and EGLE regulations: “when talking to other operators, I like to discuss the problem we may be facing or new regulations that may have been implemented at the federal or state level.” These inter-operator interactions are “priceless,” as one operator said, and whether it is about distribution, source water, treatment techniques, or changes in the regulations/laws, these learning experiences help CWSs avoid SDWA violations. One possible reason for these interaction results being different in urban and rural regions could be that opposite causal directions apply in these different areas. Violations in urban districts may lead operators to reach out and interact with other CWS operators to achieve SDWA compliance, but more research is needed to flesh out the causal inferences. There are several other possible reasons why the nonintuitive results observed in urban districts did not match the primary hypotheses. Urban and rural areas have substantial environmental and demographic differences affecting water treatment requirements and infrastructure needs and challenges.
Further, there were more CWSs and operators in the urban districts, which means they likely had more (and more convenient) opportunities for professional engagement. The nuance of the differences between the urban and rural districts needs to continue to be explored in future research.

6.2.2 Innovation Systems Theory

This research contributed to the larger innovation systems theory of knowledge transfer through quantitative analyses and expansion of the research domain to the geographies of regulated resource-based sectors. One of the gaps identified in the innovation systems theory of knowledge transfer literature was the dearth of research exploring regulated resource-based sectors (Soete et al., 2010). Previous research had found that knowledge transfers between operators of unrelated systems were one mechanism for better water system performance (Lienert et al., 2006; Meene et al., 2013; Pascual-Sanz et al., 2013; Wehn and Montalvo, 2018). These studies reached their conclusions through qualitative research methods focused on interviewing operators in Switzerland (Lienert et al., 2006) and Australia (Meene et al., 2013), or through focus groups/interviews of water operators from numerous countries participating in Capacity Development Partnerships (Pascual-Sanz et al., 2013) or Water Operator Partnerships (Wehn and Montalvo, 2018). However, these explorations were all in international contexts, and in a regulated industry like the water sector, context is undoubtedly crucial for understanding the role of knowledge transfer. There are differences in regulation for quality and quantity between countries (Pascual-Sanz et al., 2013), which have limited the quantitative explorations and the specificity of the benefits by country. Further, U.S. CWS regulation is scalar and heterogeneous even within the country, as different state primacy agencies may have more stringent rules.
By focusing solely on Michigan CWSs and operators, this research filled those gaps in the innovation systems theory to further the quantitative exploration of knowledge transfers. The single-state focus allowed this research to capture more types of operators than the international studies, as not every operator or system was the same. One operator pointed to the heterogeneous landscape of systems and operators in a comment, “Each system is somewhat different. Each operator’s experience is a small sample size.” The international studies did not reflect the U.S. experience as they failed to capture the heterogeneity of U.S. CWS operators, where both part-time and full-time operators make up the CWS operator pool. Specifically, the international explorations only included full-time operators. The research findings for the impact of knowledge transfers were situated in the type of operator, where Utility and Contract operators showed that with higher reported interactions, the probability of SDWA compliance increased (decreased probability of violations), while the Non-Affiliated operators showed no relationship between increased interactions and SDWA violations. The findings for interactions and Utility or Contract operators in both the GLMM binary logistic regression and the ordered logistic regression results reflected the results of the previous studies but expanded them to gain insight into the types of interactions and which types of operators most benefited from them. Further, this research included CWSs that were owned by and served many different populations. The small mobile home park CWS operators were not considered in the international literature but are relevant to the U.S. CWS landscape: by including them, this research better reflected U.S. CWSs. Based on these results, future studies of innovation systems theory and knowledge transfers between water system operators should account for and explain the different operator types.
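The operator-type dependence described above can be read off the interaction term in the full GLMM specifications listed in Appendix E. As a minimal sketch, and not the fitted model itself, the block below uses the Appendix E coefficient numbering for the full models (β5 for Operator Type, β6 for Interactions, β7 for Operator Type × Interactions). The index k for operator type, the shorthand x'γ for all remaining covariates, and the symbol u_j for the operator-level term are notational assumptions introduced here for illustration only, and the sketch does not say which operator type serves as the reference category.

```latex
% Log-odds of a 2020 violation for system i run by operator j of type k
\[
\operatorname{logit} P(\text{Violation}_{ij} = 1)
  = \alpha + \beta_5^{(k)}
  + \bigl(\beta_6 + \beta_7^{(k)}\bigr)\,\text{Interactions}_j
  + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_j
\]

% Change in log-odds for one additional unit of Interactions, by type k
% (the reference type has \beta_5^{(ref)} = \beta_7^{(ref)} = 0)
\[
\frac{\partial\, \operatorname{logit} P(\text{Violation}_{ij} = 1)}
     {\partial\, \text{Interactions}_j}
  = \beta_6 + \beta_7^{(k)}
\]
```

Under this reading, the sign of β6 + β7^(k) determines whether more interactions raise or lower the violation probability for operator type k, and a combined slope indistinguishable from zero corresponds to the “no relationship” pattern reported for Non-Affiliated operators.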
6.2.3 Agglomeration Economics

This research contributed to the agglomeration economics literature by further expanding the perspective of knowledge spillovers to new sectors and through new measurement and modeling of knowledge spillovers. Agglomeration economics and knowledge spillovers provide one of the foundational theories underlying economic geography, exploring the role of spatial proximity between organizations, regional advantages, and organizational performance (Rosenthal and Strange, 2004). Previous research has found regional advantages based on the spatial proximity of organizations and their knowledge spillovers in organizational performance for non-governmental industries (Audretsch and Feldman, 1996; Charlot and Duranton, 2004; Rosenthal and Strange, 2004) and entrepreneurship (Agarwal et al., 2004; Audretsch and Keilbach, 2008; Acs and Sanders, 2012). However, the previous research has yet to explain the role of knowledge spillovers in the resource and geographically based sectors. CWSs were a great vehicle to explore the gap because (1) CWSs are practically geographically immobile (Beecher, 2009), and (2) measuring CWS knowledge spillovers and performance is difficult (Rosenthal and Strange, 2004). This immobile attribute of CWSs may explain natural regional advantages based on the location of the utilities and systems, where the regional cultures facilitate the CWS’s or utility’s performance. This research showed a spatial structure in CWS operators’ reported interactions, as the variance between operators’ reported interactions increased with the distance between the CWSs and operators up to about 12 miles (SP-2). This spatial structure reflected regional advantages in the number of inter-operator interactions as Michigan has spatial heterogeneity.
Thus, the cultural landscape is different across the state, and further investigation of the spatial differences in local operator culture could further the understanding of how operators are professionally engaged and elucidate the professional networking connections between CWSs. Further, the research fleshed out the performance benefits of interactions between operators of unrelated CWSs in the context of operator type and the urbanity/rurality of the CWS. Through the geographically weighted regression models and the OLS models for districts, this research found that rural districts with more aggregated operator interactions had a lower percentage of CWSs with a major SDWA violation in 2020, while the urban districts showed the opposite relationship. These findings suggest that rural areas have regional advantages in which knowledge spillovers positively impact performance. To further SDWA compliance, regulatory agencies and professional organizations can encourage networking and engagement in these rural regions to support CWS operators. Further, using the spatial autocorrelation tools, regulators can identify the areas with high levels of non-complying CWSs and investigate what is causing the issues in the region.

6.3 Broader CWS Compliance Research and Regulation

This research contributed to the broader CWS compliance research as it focused on understanding the role of the CWS operator while accounting for the TMF capacity and natural advantages of CWSs for SDWA compliance. Most of the previous research (McGavisk et al., 2013; Rubin, 2013; Pennino et al., 2017; Switzer and Teodoro, 2017; Allaire et al., 2018; Montgomery et al., 2018) has ignored the operator-specific characteristics and TMF capacity indicators, instead using the SDWIS database with its numerous system-level characteristics.
This previous research has focused on explaining the role of ownership structure or source water (McGavisk et al., 2013; Rubin, 2013; Pennino et al., 2017; Switzer and Teodoro, 2017; Allaire et al., 2018; Montgomery et al., 2018) but has neglected the human capital that manages and operates these systems, which is considered a vital component of the EPA’s TMF capacity framework. CWSs are a regulatory unit subject to drinking water standards set by the federal government and their state primacy agency; however, these units say nothing about how CWSs are connected (Beecher et al., 2020). CWSs can be connected through common ownership, operation, or purchasing/wholesale (Beecher et al., 2020). This research focused on the connection between CWSs through shared DO operators. Almost 25% of the operators in the sample were the DO for more than one CWS, and almost 59% of the CWSs in the sample were run by an operator of more than one CWS. This connection between systems has been missed by previous research, which has treated each CWS as an independent entity/observation. Using the GLMM to capture both the operator level and the CWS level, this research showed the need to measure the multiple layers of CWSs that impact their performance. This research found there were at least two layers required to measure the performance of CWSs: the system-level characteristics and the human/labor capital. The significance of the operator-level characteristics showed the need for investigations of SDWA compliance to account for both the conventional variables (TMF capacity measures) and the operator-specific information. Fundamentally, the most important finding of this research was that the operator-specific data (not CWS but operator) were the most significant factors in modeling SDWA compliance.
The conventional approach that uses CWS-level data alone was not appropriate, as the operator-specific characteristics were more important for estimating the probability of SDWA compliance. Further, the expectation set out by previous research (Teodoro and Whisenant, 2012, 2013, 2014; Meier and O’Toole, 2013; Blanchard and Ellerbe, 2013; Teodoro, 2014) about the educational attainment of the human capital component of CWS performance was misguided. Operators can be categorized as Utility, Contract, or Non-Affiliated operators. Within these operator groupings, there were differences in the educational attainment of the operators. In the sample, the highest level of educational attainment for nearly 52% of operators was a high school diploma, and only ~25% of operators had attained a bachelor’s degree or higher. Previous research asserted that “Utilities that are headed by professional engineers violate the SDWA significantly less frequently than do utilities led by nonengineers” (Teodoro, 2014, p. 983). These previous findings attribute operator success to education but fail to reflect the reality of the operator landscape and of operator experiences. There was no observable difference in the number of reported interactions by operators’ educational background. Further, this research did not observe any relationship between CWS operators’ educational background and the system's performance, so the efforts to push for more highly educated operators might not be as constructive as previously hypothesized.

The findings for rural CWSs and interactions provide key insight for policymakers and regulators. While the EPA’s TMF capacity framework points to the role of ‘external linkages’ between systems, there had yet to be a study that investigated the relationship between ‘external linkages’ and SDWA compliance.
Through modeling the relationships in Michigan, it was seen that rural districts with more aggregated interactions had a lower percentage of systems with a ‘major SDWA violation.’ It is widely known that rural or small CWSs have the most challenging time with SDWA compliance, and the results of this research suggest that more professional engagement decreases the probability of major violations. An operator of small CWSs in the Upper Peninsula pointed to the two geographically proximate operators as the people they bounce ideas off of and work with to better understand what changing regulations mean for their day-to-day operations. In contrast, another operator of small CWSs in the Upper Peninsula said that they were not near enough to any other operators to have consistent engagement, and they had to learn everything from continuing education credit courses or AWWA meetings. In the open-ended survey, one operator said, “very good 1-on-1 with Local system,” which reflected the location-based drivers of interactions. Other comments from both urban and rural CWS operators addressed the need to mentor and help newer operators with their CWSs, and one even went as far as to say that there needs to be a contact database that connects the operators. Finding ways to encourage professional engagement between CWSs (especially in rural systems) would be a possible avenue to address SDWA violations in these rural areas.

Fundamentally, the most important finding of this research was that the operator-specific data (not CWS but operator) were the most significant factors in modeling SDWA compliance. The conventional approach of relying on CWS-level data alone was not appropriate, as the operator-specific characteristics were the most important in constructing the probability of SDWA compliance. Research tends to view these systems as individual, unconnected observations, ignoring the fact that many CWSs are connected by the operator that runs them.
The general public also tends to ignore the operator until there is an issue with the system and then heap all the blame for a failure on the operator. During one of the interviews, an operator compared their job to that of a placekicker on an American football team. The operator said: “Everyone expects you to hit the extra point or field goal, and you never receive praise for doing the job right, but everyone notices and throws you under the bus the second you miss a kick (whether it was your fault or not).” This quote was particularly telling about the operator's role, as operators tend to be forgotten and ignored by the public and researchers until there is an issue with the system. They are the invisible infrastructure within the invisible infrastructure. However, the CWS operator is so vital to communities that the public and researchers cannot ignore their impact on system performance.

Based on the findings, this research presents the following recommendations to support more external linkages between CWS operators as possible low-cost capacity-building opportunities:

• Use spatial statistics to identify districts or regions with low compliance rates and target the areas with the greatest need for support
• In districts/regions with low compliance, hold operator focus groups to encourage open conversations between operators
• Create online platforms (forums) that allow operators to interact digitally with one another and get multiple perspectives on the issues they face
• Create a mentor program for new operators that connects each new operator with an experienced and successful CWS operator
• Provide increased support for operator participation in outside water organizations
• Create a database of CWS operators’ contact information that is available to operators

6.4 Directions for future research

This research only looked at a single U.S. state at a single point in time.
However, more research is needed to understand whether the interactions between operators matter in other states as they do in Michigan. The structure of water policy in the U.S. federal system has created a heterogeneous regulatory landscape, where some states have enacted more stringent water quality and operator certification guidelines. While this research showed that Utility and Contract operators had a lower probability of SDWA violations, that was only for the Michigan context. Table 2 in Section 2.4 showed how operator certification requirements differed between the bordering states of Michigan and Indiana. Future research and regulators could benefit from exploring how these requirements impact the frequency of interactions and the effectiveness of the interactions. Do more required continuing education hours impact the number of inter-operator interactions or their effectiveness?

It is also important to situate the findings in the context of only covering 2019 and 2020 SDWA violations and interactions for 2019. The reported interactions were probably not highly impacted by the Covid-19 pandemic, as the survey measured them for 2019, the year prior to the pandemic shutdowns of in-person activities; however, there could have been a greater impact on the 2020 SDWA violations, as these were collected for the year 2020. Future research assessing the hot and cold spots over extended temporal periods would illuminate the regions with persistent SDWA compliance problems and show some of the possible targeted solutions. The theories of knowledge transfers and spillovers impacting CWS performance could be more robust and possibly prescriptive for better oversight and support of CWS operators, with broader spatial and temporal scales.

Similar to how future research could benefit from exploring broader spatial scales, there is also an opportunity to reduce the scale from the large region to a smaller unit (metropolitan area or county).
The state-scaled research suffered from several limitations concerning data (discussed in sections 3.2.2 and 3.4.1); a finer spatial scale could eliminate some of these issues. An example of the potential benefits of finer-scale analysis comes from the issues of the urban/rural designations for the EGLE districts and individual CWSs. Some of the “urban” labels for systems were an artifact of using the county RUCOs to designate the urbanity or rurality of the system; however, some of these CWSs might be in an urban county but not in an urban area within that county. A smaller spatial unit with a finer-scale urban/rural measure could illuminate more about the CWSs. In addition to the urban/rural nature of the CWSs, it could also allow for direct data collection on CWSs. A case study approach could collect more information about the age of CWSs, population density served, and boundaries to provide a more direct picture of CWSs.

Future research would benefit from exploring alternative measures of CWS performance and the role of inter-CWS operator interactions. As discussed in section 3.4.2, there are numerous issues with the “SDWA violations” data, and using this as the measure of performance only provides one piece of the overall performance puzzle. Utilizing performance metrics such as water loss or financial health would further explain the role of operator interactions in improving alternative CWS performance measures. Michigan does not provide these types of data to the public or researchers, and expansion to other states (providing these types of data) would allow for an investigation into the alternative metrics of CWS performance.

The opportunity exists for further research directly on operators' social and professional networks. This exploratory research could not capture which operators were professionally engaging with one another. Knowing these details would allow the networks to be explored through network modeling.
This research only captured the frequency of interactions, and while the interviews provided some information on the interactions, there was not enough information to appropriately run a network model. If future research could capture the direct networks, it could illuminate more information on the quality and quantity of the networks throughout the state. This approach would capture higher spatial resolution and identify the CWS operators that are professionally isolated from other operators. Further, it could assess the quality of the networks over space and link the data to show the full regional advantages.

Finally, future research could connect SDWA compliance to Clean Water Act (CWA) compliance to explore the relationship between the two acts. This type of investigation would flesh out more about the CWS operators, some of whom are also wastewater operators, and the relationship between drinking water and wastewater treatment. Understanding the professional network connections between these operators could illuminate the direct role of professional engagement on both sides of water-related public services and fill in a missing link in the relationship between human capital and SDWA compliance.

6.5 Conclusion

This research investigated the role of inter-operator interactions on SDWA compliance and found that more reported interactions reduced the probabilities of SDWA violations for Utility and Contract operators. It expanded on the ideas of knowledge transfers and spillovers from the theories of organizational learning, innovation systems, and agglomeration economies by performing quantitative assessments and extending them to the public utility and natural resource-based sectors. Previous CWS compliance research has largely ignored the human capital factors, and this research found that operator type and interactions have a significant impact on the probability of compliance, and that research can model not only the CWS level but also the operator level.
Fundamentally, this research suggests that the relevance of CWS operators should not be overlooked by the public or policymakers, as they are one of the primary drivers of the delivery of safe drinking water.

APPENDICES

APPENDIX A: Open Response to the last Survey Question

Positive Experience

◼ The treatment plants on the west side of Michigan have always had an active relationship. We regularly seek advice from each other as well as meeting a few times per year.
◼ The exchange of ideas and technics benefit both.
◼ no one person has all the answer you must use the water community
◼ This is how we learn, classes only get you so far.
◼ Although we all have to follow the same guidelines everyone has their own way of preforming different duties. By talking about the different ways of doing things someone may try another way if they agree it will improve their job
◼ Very good 1 on 1 with Local water system
◼ A lot of the interaction is helping train up young operators.
◼ It is an integral part of my decision making process.
◼ Operators should meet at regular intervals, it can be very helpful to new operators.
◼ I have and continue to assist the water system I retired from which happens to be the community I live in. Their operators are not experienced enough to fully understand the quirks.
◼ I think they are a necessity we try and help each other
◼ when talking with other certified operators I like to discuss problem we may be facing or new regulations that may have been implemented from the federal or state level
◼ Most are very helpful
◼ They are priceless
◼ I always have good encounters with the operators I talk with
◼ A good way to learn
◼ Very helpful to interact with other operators
◼ I think we all seek each other's input and opinions. Each system is somewhat different. Each operator's experience is a small sample. We all discuss ideas, problems, and solutions.
◼ Getting ideas from other operators is a huge help. Why re-invent the wheel when other already have.
◼ Interacting with other operators has always been a positive experience
◼ It is critical.
◼ great tools to make sure we are all using our limited resources in the best possible way
◼ Always helpful to discuss issues with peer agencies.
◼ People in this field have similar diligence not only with their job but also with their need for knowledge acquisition/dispensation
◼ I personally know operators from several systems and we speak regularly.
◼ networking is a vital component
◼ Sharing is crucial. Asking questions is crucial. Sometimes asking a question sparks a bigger conversation
◼ It is always good to network with other operators.
◼ I think its vital to communicate with others in this field.
◼ you always learn something when talking with other operators about many different subjects.
◼ It is always good to find out how other communities handle situations.
◼ getting knowledge of new lead /copper laws
◼ during CEC classes and expos is very helpful
◼ Everyone I have met in the industry has been helpful and very open with information. It's as if we are one big team.

Neutral or Details

◼ No open forum to bounce questions off of
◼ some kind of database or spreadsheet with contact info would be handy
◼ At water related meetings a lot of interactin takes place but small groups in an area also takes place like lunch ETC.
◼ Many Operators from other systems do not have the experience that I have or do not take the time to read the rules or guidance documents from State and Federal Drinking Water Regulators.
◼ Usually at CEC classes, small talk
◼ Most of my questions recently involve types of equipment that we plan to invest in
◼ Interaction is usually related to Public Notifications, Sampling, EGLE, making distribution system changes.
◼ Mainly focused interactions in learning the rules of the EPA and state. Have been in the business for 50 years.
◼ Generally get into discussions with operators of similar systems at CE courses
◼ Most networking that takes place away from my immediate region is at the MI-AWWA ACE conference
◼ As a member of multiple associations and board director of Michigan Rural Water Association I speak with and discuss water issues with operators from multiple different states
◼ having worked in a number of other systems helps maintain contacts
◼ Most interactions revolve around sharing equipment, borrowing materials or vetting contractors.
◼ Over the past few years, EGLE has become an almost strictly enforcement agency, so in order to ask questions or run ideas past someone, most operators will interact with other operators over running the risk of receiving violations from EGLE.
◼ Most of the operators I have dealings with are also client communities that we represent as engineering consultants
◼ Instructing for MWEA generates a platform to be approachable with questions
◼ We are able to share information that we know will affect other water plant operations.
◼ Most interaction occurs at meetings and conferences.
◼ Some are unwilling to offer useful tips, others are very helpful
◼ Started the directors group at the DCC and now meet with the group from western wayne county group
◼ I would like to have more interactions with other operators, and really see how they handle their work, rather than a quick discussion about Lead/Copper etc.
◼ The community that I serve has created a partnership with 3 other neighboring Cities to form an authority, North Oakland County Water Authority (NOCWA), to share best practices, support operations and control water rates
◼ The city of Plymouth is an active participant in the Western Wayne Public Utilities Work Group which meets bi-monthly to discuss public works topics including water system operations.
◼ most interactions are questions for me due to my experience in this field

Other

◼ I have a F-1 and a S-3 not a S-1
◼ I live in a San Marino Villa community located in souhfield. The subdivision own a private well. We follow instructions from DEQ, Our work is limited to sampling. Subdivision has hired private contractor to themainnnce.
◼ I have ran multiple wastewater facilities also for over 14 years
◼ I have been in the water buisness from 1983 where I started as a shift operator from a parks job with the City of St. Joseph MI.
◼ Most operators I have interacted with are from Municipal systems where all of mine are private or even if it is owned by the State is a much smaller system than what they deal with. Also, as a private operator I count on Maintenance staff or outside contractors to do repairs where the municipal people do their own repairs for the most part.
◼ I have not had discussions with other water operators
◼ it happens only rarely
◼ My interests are more in wastewater
◼ Our system is so small, <1000, that we don't often run into complex issues

Table 41 (cont’d)

◼ They are always very willing to answer any questions I have.
◼ We all work together
◼ everyone is willing to share
◼ Been helpful at water classes
◼ Collaboration is essential to operate and maintain public water systems.
◼ It's always good to get another perspective on a situation
◼ Basically working through challenges we each may have encountered.
◼ Interactions with other municipalities is pivotal to the success of running a successful Water Distribution System. We all are facing the same challenges and addressing them collaboratively helps with communication with the public. This also allows municipalities to share new streamline processes and workflows to assure new Environmental Compliance regulations can be completed and implemented efficiently.

Table 41: All open-ended survey question responses grouped by positive, neutral, and other

APPENDIX B: Ordered Logistic Regression Variables for Interactions and Violation Group Dependent Variable Models

Variable (Model Name) | Type | Description
Violations Group (Violations Percentage) | Ordinal (ordered categorical) | Percentage of Violations for operators systems, as retrieved from the SDWIS/ECHO databases for 2020. There are three groups: 0% of CWSs with a violation, 0.1 to 99% of CWSs with a violation, and 100% of CWSs with a violation. [Primary Hypotheses - dependent variable]
Interactions (Interactions) | Ordinal (ordered categorical) | Survey data from Question 16, broken into 5 groups (0 interactions; 1 to 10 interactions; 11 to 20 interactions; 21 to 30 interactions; 31+ interactions). [Endogenous Hypotheses Model - dependent variable]
Interactions (Interactions) | Continuous (transformed) | [Primary Hypotheses Model] Transformed to continuous using the median number of the groups
Interactions*Operator Type (Interactions*Operator Type) | Factor | Interaction term between the reported number of interactions and Operator Type
Operator Type (Operator Type) | Categorical | EGLE data on employer type for the operator in 3 groups (Non-Affiliated, Contract, Utility Operator)
Group Membership (Group Membership) | Binary | Survey data from questions 11 and 12. Converted to binary variable (1 for any group membership, 0 for no group membership)
Education (Education) | Binary | Survey data from question 1 on level of education attained. Converted to binary (1 for bachelor’s degree or higher, 0 for less than a bachelor’s degree)
Total Systems (Systems) | Continuous (scaled) | Total number of community water systems operated, obtained from survey question 9.
Certification Length (Experience) | Continuous (scaled) | Survey question 4 about the length of time they have been at their current certification level.
Other Operators (Operators) | Continuous (scaled) | Survey question 10 about the number of other operators at their organization.
Average Population Served (Average Population) | Continuous (scaled) | SDWIS data based on the population PWS ID of each system and averaged across operator systems.
Continuing Education Credits Earned (Earned Recertification Hours) | Continuous (scaled) | EGLE data on the number of CEC hours earned since last renewal
Length of Time as an Operator (Systems) | Ordinal (ordered categorical) | Survey question 3 about the length of time as Operator of record at their current system/s
Perception of Usefulness (Use) | Ordinal (converted to continuous) | Survey Question 19 about the perception of usefulness of interaction (1 is useless to 5 which is useful)
Meeting Hours (Meeting Hours) | Ordinal (converted to continuous) | Survey Question 13 about the number of hours spent at professional meetings, conferences, summits, or forums in the last year.

Table 42: Ordered Logistic Regression Variable Overview for Endogenous and Primary Hypotheses Model

APPENDIX C: OLR models for interactions (6 models)

EN.M1.All
$$
\begin{aligned}
\operatorname{logit}(P(Y \le j)) ={}& \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{OperatorType}_i + \beta_2 \cdot \text{GroupMembership}_i + \beta_3 \cdot \text{Education}_i \\
&+ \beta_4 \cdot \text{Systems}_i + \beta_5 \cdot \text{Experience}_i + \beta_6 \cdot \text{Operators}_i + \beta_7 \cdot \text{AveragePopulation}_i \\
&+ \beta_8 \cdot \text{EarnedRecertificationHours}_i + \beta_9 \cdot \text{SystemExperience}_i + \beta_{10} \cdot \text{Usefulness}_i + \beta_{11} \cdot \text{MeetingHours}_i
\end{aligned}
$$

EN.M2.Best
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{OperatorType}_i + \beta_2 \cdot \text{Operators}_i + \beta_3 \cdot \text{Usefulness}_i + \beta_4 \cdot \text{MeetingHours}_i
$$

EN.M3.ENall
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{OperatorType}_i + \beta_2 \cdot \text{Education}_i + \beta_3 \cdot \text{GroupMembership}_i
$$

EN.M4.Type
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{OperatorType}_i
$$

EN.M5.Group
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{GroupMembership}_i
$$

EN.M6.EDU
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \alpha_3 + \beta_1 \cdot \text{Education}_i
$$

Table 43: Overview of the Six Ordered Logistic Regression Models Investigating the Endogenous Hypotheses

APPENDIX D: OLR for Primary Hypotheses of Operator only level

OP.OL.All
$$
\begin{aligned}
\operatorname{logit}(P(Y \le j)) ={}& \alpha_0 + \alpha_1 + \alpha_2 + \beta_1 \cdot \text{OperatorType}_i + \beta_2 \cdot \text{Interactions}_i + \beta_3 \cdot \text{ViolationIn2019}_i \\
&+ \beta_4 \cdot (\text{OperatorType}_i \times \text{Interactions}_i) + \beta_5 \cdot \text{Systems}_i + \beta_6 \cdot \text{CertificationLength}_i \\
&+ \beta_7 \cdot \text{Operators}_i + \beta_8 \cdot \text{OtherOperators}_i + \beta_9 \cdot \text{Usefulness}_i + \beta_{10} \cdot \text{GroupMembership}_i \\
&+ \beta_{11} \cdot \text{Education}_i + \beta_{12} \cdot \text{MeetingHours}_i + \beta_{13} \cdot \text{AveragePopulation}_i + \beta_{14} \cdot \text{CEC}_i + \beta_{15} \cdot \text{Experience}_i
\end{aligned}
$$

OP.OL.Reduced
$$
\operatorname{logit}(P(Y \le j)) = \alpha_0 + \alpha_1 + \alpha_2 + \beta_1 \cdot \text{OperatorType}_i + \beta_2 \cdot \text{Interactions}_i + \beta_3 \cdot \text{ViolationIn2019}_i + \beta_4 \cdot (\text{OperatorType}_i \times \text{Interactions}_i)
$$

Table 44: Ordered Logistic Regression Models for Primary Hypotheses Investigation at only the Operator Level

APPENDIX E: GLMM models for primary hypotheses

Each entry lists the model, its dependent variable, and the linear relationship between the IVs and the DV.

Prim.Any.All (dependent variable: Binary 2020 Any Violation)
$$
\begin{aligned}
&\alpha + \beta_1 \cdot \text{Source Water} + \beta_2 \cdot \text{Primary System} + \beta_3 \cdot \text{System Size} + \beta_4 \cdot \text{Violation Previous Year} + \beta_5 \cdot \text{Operator Type} \\
&+ \beta_6 \cdot \text{Interactions} + \beta_7 \cdot (\text{Operator Type} \times \text{Interactions}) + \beta_8 \cdot \text{Group Membership} + \beta_9 \cdot \text{Median Home Value} \\
&+ \beta_{10} \cdot \text{Median Household Income} + \beta_{11} \cdot \text{Unemployment} + \beta_{12} \cdot \text{Environmental Quality} + \beta_{13} \cdot \text{Rurality} \\
&+ \beta_{14} \cdot \text{Peninsula} + \alpha_j \cdot \text{Operator} + \varepsilon
\end{aligned}
$$

Prim.Any.Reduced (dependent variable: Binary 2020 Any Violation)
$$
\begin{aligned}
&\alpha + \beta_1 \cdot \text{Primary System} + \beta_2 \cdot \text{Violation Previous Year} + \beta_3 \cdot \text{Operator Type} + \beta_4 \cdot \text{Interactions} \\
&+ \beta_5 \cdot (\text{Operator Type} \times \text{Interactions}) + \beta_6 \cdot \text{Group Membership} + \alpha_j \cdot \text{Operator} + \varepsilon
\end{aligned}
$$

Prim.NH.All (dependent variable: Binary 2020 Non-Health Violation)
$$
\begin{aligned}
&\alpha + \beta_1 \cdot \text{Source Water} + \beta_2 \cdot \text{Primary System} + \beta_3 \cdot \text{System Size} + \beta_4 \cdot \text{Violation Previous Year} + \beta_5 \cdot \text{Operator Type} \\
&+ \beta_6 \cdot \text{Interactions} + \beta_7 \cdot (\text{Operator Type} \times \text{Interactions}) + \beta_8 \cdot \text{Group Membership} + \beta_9 \cdot \text{Median Home Value} \\
&+ \beta_{10} \cdot \text{Median Household Income} + \beta_{11} \cdot \text{Unemployment} + \beta_{12} \cdot \text{Environmental Quality} + \beta_{13} \cdot \text{Rurality} \\
&+ \beta_{14} \cdot \text{Peninsula} + \alpha_j \cdot \text{Operator} + \varepsilon
\end{aligned}
$$

Prim.NH.Reduced (dependent variable: Binary 2020 Non-Health Violation)
$$
\begin{aligned}
&\alpha + \beta_1 \cdot \text{Primary System} + \beta_2 \cdot \text{Violation Previous Year} + \beta_3 \cdot \text{Operator Type} + \beta_4 \cdot \text{Interactions} \\
&+ \beta_5 \cdot (\text{Operator Type} \times \text{Interactions}) + \beta_6 \cdot \text{Group Membership} + \beta_7 \cdot \text{Environmental Quality} + \alpha_j \cdot \text{Operator} + \varepsilon
\end{aligned}
$$

Prim.MAJ.All (dependent variable: Binary 2020 Major Violation)
$$
\begin{aligned}
&\alpha + \beta_1 \cdot \text{Source Water} + \beta_2 \cdot \text{Primary System} + \beta_3 \cdot \text{System Size} + \beta_4 \cdot \text{Violation Previous Year} + \beta_5 \cdot \text{Operator Type} \\
&+ \beta_6 \cdot \text{Interactions} + \beta_7 \cdot (\text{Operator Type} \times \text{Interactions}) + \beta_8 \cdot \text{Group Membership} + \beta_9 \cdot \text{Median Home Value} \\
&+ \beta_{10} \cdot \text{Median Household Income} + \beta_{11} \cdot \text{Unemployment} + \beta_{12} \cdot \text{Environmental Quality} + \beta_{13} \cdot \text{Rurality} \\
&+ \beta_{14} \cdot \text{Peninsula} + \alpha_j \cdot \text{Operator} + \varepsilon
\end{aligned}
$$

Prim.MAJ.Reduced (dependent variable: Binary 2020 Major Violation)
α + 𝛽1 ∗ 𝑃𝑟𝑖𝑚𝑎𝑟𝑦⁡𝑆𝑦𝑠𝑡𝑒𝑚 + 𝛽2 ∗ 𝑉𝑖𝑜𝑙𝑎𝑡𝑖𝑜𝑛⁡𝑃𝑟𝑒𝑣𝑖𝑜𝑢𝑠⁡𝑌𝑒𝑎𝑟 + 𝛽3 ∗ 𝑂𝑝𝑒𝑟𝑎𝑡𝑜𝑟⁡𝑇𝑦𝑝𝑒 + 𝛽4 ∗ 𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑠 + 𝛽5 ∗ (𝑂𝑝𝑒𝑟𝑎𝑡𝑜𝑟⁡𝑇𝑦𝑝𝑒 ∗ 𝐼𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛𝑠 + 𝛽6 ∗ Group⁡Membership + 𝛽7 ∗ 𝐸𝑛𝑣𝑖𝑟𝑜𝑛𝑚𝑒𝑛𝑡𝑎𝑙⁡𝑄𝑢𝑎𝑙𝑖𝑡𝑦 + ⁡ 𝛼𝑗 ∗ 𝑂𝑝𝑒𝑟𝑎𝑡𝑜𝑟 + 𝜀 Table 45: Overview of All Six Primary Hypotheses GLMMs 227 APPENDIX F: GVIF Tables for All Models Variable DF 𝑮𝑽𝑰𝑭𝟏/(𝟐∗𝒅𝒇) Operator Type (Operator Type) 2 1.169 Group Membership (Group Membership) 1 1.175 Education (Education) 1 1.048 Total Systems (Systems) 1 1.211 Certification Length (Experience) 1 1.102 Other Operators (Operators) 1 1.159 Average Population Served (Average Population) 1 1.135 Continuing Education Credits Earned (Earned Recertification Hours) 1 1.08 Length of Time as an Operator (Systems) 2 1.065 Perception of Usefulness (Use) 1 1.048 Meeting Hours (Meeting Hours) 1 1.124 Table 46: GVIF Measures of Multi-Collinearity for Endogenous Hypotheses Models 228 Variable DF GVIF (Value) Operator Type (Operator Type) 2 1.38 Interactions 1 6.91 Type* Interactions 2 2.64 Total Systems (Systems) 1 1.24 Certification Length (Experience) 1 1.12 Other Operators (Operators) 1 1.17 Perception of Usefulness (Use) 1 1.11 Group Membership (Group Membership) 1.18 Education (Education) 1 1.07 2019 Violation 1 1.13 Meeting Hours (Meeting Hours) 1 1.17 Average Population Served (Average Population) 1 1.14 Continuing Education Credits Earned (Earned Recertifi- 1 1.09 cation Hours) Length of Time as an Operator (Systems) 2 1.08 Table 47: GVIF Measures of Multi-Collinearity for Primary Hypotheses Models (Ordered Logistic Regression) 229 Variable DF GVIF (value) Operator Type 2 1.54 Interactions 1 7.27 MHI 1 2.32 MHV 1 2.72 Unemployment 1 1.27 Education 1 2.25 Environmental Quality 1 1.06 Population Size 2 1.25 Rurality 3 1.12 Group Membership 1 1.16 Violation 2019 1 1.06 Purchased Water 1 1.19 Primary System 1 1.49 Upper Peninsula 1 1.15 Operator Type * Interactions 2 2.76 Table 48: GVIF Measures of Multi-Collinearity for Primary Hypotheses Models (GLMM) 230 APPENDIX 
G: Convergence Plots for OLR models

Figure 30: Endogenous Hypotheses Convergence Plots

Figure 31: Primary Hypotheses OLR Convergence Plots

Figure 32: GLMM Convergence Plots
Wallsten, S., & Kosec, K. (2008). The effects of ownership and benchmark competition: An empirical analysis of US water systems. International Journal of Industrial Organization, 26(1), 186-205.
Water, R. D. (2003). Small Systems Guide to Safe Drinking Water Act Regulations.
Wang, Z., & Wang, N. (2012). Knowledge sharing, innovation and firm performance. Expert systems with applications, 39(10), 8899-8908.
Wang, Z., Sharma, P. N., & Cao, J. (2016). From knowledge sharing to firm performance: A predictive model comparison. Journal of Business Research, 69(10), 4650-4658.
Wehn, U., & Montalvo, C. (2018). Knowledge transfer dynamics and innovation: Behaviour, interactions and aggregated outcomes. Journal of Cleaner Production, 171, S56-S68.
Weidenfeld, A., Williams, A. M., & Butler, R. W. (2010). Knowledge transfer and innovation among attractions. Annals of tourism research, 37(3), 604-626.
Wheeler, D., & Tiefelsdorf, M. (2005). Multicollinearity and correlation among local regression coefficients in geographically weighted regression. Journal of Geographical Systems, 7(2), 161-187.
Wheeler, D. C., & Páez, A. (2010). Geographically weighted regression. In Handbook of applied spatial analysis (pp. 461-486). Springer, Berlin, Heidelberg.
Youndt, M. A., & Snell, S. A. (2004). Human resource configurations, intellectual capital, and organizational performance. Journal of managerial issues, 337-360.
Zimmerman, J. B., Mihelcic, J. R., & Smith, A. J. (2008). Global stressors on water quality and quantity.