APPLICABILITY OF DATA DRIVEN METHODS FOR ASSESSING COMPLIANCE OF WASTEWATER TREATMENT PLANTS SELF-REPORTED DATASETS By Pouyan Hatami Bahman Beiglou A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Biosystems Engineering- Master of Science 2016 ABSTRACT APPLICABILITY OF DATA DRIVEN METHODS FOR ASSESSING COMPLIANCE OF WASTEWATER TREATMENT PLANTS SELF-REPORTED DATASETS By Pouyan Hatami Bahman Beiglou The primary source of compliance information in water quality monitoring is self-reported data. Despite the heavy reliance on self-reported data in United States environmental regulation, the U.S. General Accounting Office has expressed concerns regarding the potential for fraud in environmental self-reports. Furthermore, recent research indicates that the methods used by state enforcement are unlikely to detect fraud. Therefore, the need for data-driven methods to support regulatory enforcement is an important area of research. In this thesis, we evaluated the applicability of data-driven methods for assessing compliance of wastewater treatment plants (WWTP) self-reported datasets based on a description of the variability in these data streams. For this purpose, first a literature review was conducted (1) to determine the goals of the Clean Water Act programs; (2) identify limitations of current monitoring efforts and data gaps in the understanding of the sources of variability in WWTPs data; and (3) to identify appropriate predictive analytical methods to address the problems. Second, the applicability of a method for uncovering irregularities in the distribution of first and second digits in a sample dataset was tested and its effectiveness was discussed. Finally, the use of other promising approaches, which may be capable of finding mishandling in wastewater treatment plants are presented with preliminary data. iii ACKNOWLEDGEMENTS I would like to thank my advisors Dr. Jade Mitchell and Dr. Amir Pouyan Nejadhashemi for all their support, guidance and advice. Without them, I would not have been able to write this thesis. I would also like to thank my committee, Dr. Timothy Harrigan and Dr. Carole Gibbs for their guidance in completing this degree. iv TABLE OF CONTENTS LIST OF TABLES ......................................................................................................................... vi KEY TO ABBREVIATIONS ....................................................................................................... vii 1 INTRODUCTION ................................................................................................................... 1 2 WATER QUALITY MONITORING FRAMEWORK .......................................................... 3 2.1 WATER RESOURCES MANAGEMENT ...................................................................... 3 2. Clean Water Act ........................................................................................................ 3 2.1.2 Total Maximum Daily Load (TMDL) ...................................................................... 4 2.1.2.1 TMDL Deficiencies............................................................................................... 4 2.1.3 National Pollutant Discharge Elimination System (NPDES) ................................... 5 2.1.3.1 NPDES Deficiencies ............................................................................................. 6 2.2 ENVIRONMENTAL MONITORING PRACTICE AND IMPLEMENTATION .......... 6 2.2.1 Deficiencies in Environmental Monitoring Practice and Implementation ............... 8 2.2.2 Discharge Monitoring Report Review ...................................................................... 9 2.2.2.1 Deficiencies in Self-Monitoring Report System ................................................... 9 2.2.2.2 Malfunction in Wastewater Treatment Plants, Reasons of Inaccurate Self-Reporting............................................................................................................................ 10 2.2. On-Site Compliance Evaluation ............................................................................. 11 2.2.3.1 Deficiencies in On-Site Compliance Assessment ............................................... 11 2.2.3.2 Deficiencies in Use of Sampling Studies in Defining Monitoring Strategies ..... 12 2.2.4 Use of Remote Sensing in Monitoring.................................................................... 13 2.2.4.1 Deficiencies in Use of Remote Sensing in Monitoring ....................................... 13 2.3 SUMMARY ................................................................................................................... 15 3 ANALYSIS AND RESULTS OF THE FIRST STUDY ...................................................... 17 3.1 INTRODUCTION .......................................................................................................... 17 3.1.1 ............................................................. 19 3.1.2 ................................................................ 20 3.2 OBJECTIVES OF THE STUDY ................................................................................... 23 3.3 METHODOLOGY ......................................................................................................... 24 3.3.1 Screening the Datasets for Applicability ................................................................ 24 3.3.2 Analysis................................................................................................................... 25 3.4 RESULTS....................................................................................................................... 27 3.4.1 ................................... 27 3.4.2 ........... 30 3.4.3 .............................................................................. 34 3.4.3.1 The First Two Digits Test ................................................................................... 40 3.5 DISCUSSION ................................................................................................................ 42 3.5.1 Previous .... 44 v 4 DEVELOPING A PREDICTIVE MODEL TO DETECT MISHANDLING IN THE SELF-REPORTED WATER DISCHARGE DATA ............................................................................... 47 4.1 CORRELATION ANALYSIS ....................................................................................... 48 4.2 CLUSTER ANALYSIS ................................................................................................. 49 4.3 DEVELOPMENT OF MULTIPLE LINEAR REGRESSION MODELS ..................... 50 4.4 VALIDATION OF MODELS ....................................................................................... 51 5 CONCLUSION ..................................................................................................................... 52 6 FUTURE WORKS ................................................................................................................ 54 APPENDIX ................................................................................................................................... 55 REFERENCES ............................................................................................................................. 70 vi LIST OF TABLES Table 1- The expected probability of first significant digit (FSD) predicted by Benford's Law .. 18 Table 2- ................... 19 Table 3- Parameters Excluded After Initial Screening the Dataset Law ............................................................................................................................................... 29 Table 4- Results of exclusion of the parameters with inherent minimum and maximum ............ 31 Table 5- Results of exclusion of the parameters without at least one order of magnitude ........... 32 Table 6- Results of exclusion of the parameters with number of reported values less than 21 .... 33 Table 7- Results of exclusion of the parameters without mean greater than median and positive skewness .......................................................................................................................... 33 Table 8- Results of overall screening process............................................................................... 34 Table 9- Analysis based on number of reported values with 95% CI .......................................... 36 Table 10- Analysis of Conforming Facilities based on Groupings of Parameters ....................... 37 Table 11- Analysis of Facilities based on Classification and Grouped Parameters ..................... 39 vii KEY TO ABBREVIATIONS CI - Confidence Interval CWA - Clean Water Act EPA - Environmental Protection Agency GAO - General Accounting Office GPD - Gallons Per Day MGD - Million Gallons per Day NPDES - National Pollutant Discharge Elimination System PCS - Permit Compliance System TMDL - Total Maximum Daily Load WWTP - Wastewater Treatment Plants WERF - Water Environmental Research Foundation 1 1 INTRODUCTION Amendments to the 1972 Clean Water Act, the foundation of surface water quality protection in the United States establish the supreme goal of this legislation , Section 402 of the Clean Water Act established the National Pollutant Discharge Elimination System (NPDES), which regulates the level of pollutant discharges from point sources into the waters of the U.S. through permitting programs. Permits are defined based on Total Maximum Daily Load (TMDL), which determines the maximum amount of a pollutant that can occur in a waterbody. All dischargers are required to receive a permit before discharging their effluents into the surface waters. Monitoring the compliance of dischargers is required in every state, but current strategies rely heavily on inspections, which occurs yearly, biannually or as infrequently as once every two years or once every five years for major dischargers and minor dischargers, respectively. Therefore, environmental monitoring and enforcement relies extensively on regulated entities to self-report pollution discharges. The only compliance evaluation of the self-reported datasets is through visual monitoring whereas studies have reported that compliance evaluation of self-reported data needs a more in-depth analysis (Shimshack & Ward, 2005). Due to the lack of a robust platform to assess the integrity of self-reported data, human and ecological health are potentially endangered by risks associated with underreported discharges. Therefore, the need for data-driven methods to support regulatory enforcement is an important compliance evaluation area of research. Data-driven1 methods can provide an effective approach 1 Data-driven methods are the methods that are based on data rather than intuition or personal experience 2 and to ensure achieving water quality safety by finding the underlying patterns and relationships within the data, cost lower than inspections for regulatory enforcement. Furthermore, data-driven methods are more simple to implement compared to the conventional monitoring methods, and may allow for more timely regulatory enforcement. Therefore, the specific objectives of this thesis are as follows: 1) To review the current monitoring processes and identify data gaps and limitations; 2) To assess and test the applicability of a simple data-driven approach using predictable wastewater treatment plants (WWTPs) data; 3) To discuss the effectiveness of the approach; 4) To discuss other promising data-driven approaches. 3 2 WATER QUALITY MONITORING FRAMEWORK 2.1 WATER RESOURCES MANAGEMENT One of the most important aspects of water resources management is to maintain water quality. Unfortunately, industrialization and urbanization have led to many water quality issues through point sources and non-point sources (Wang, 2001). Because surface water pollution is a major water quality problem in the United States, regulated and unregulated programs have been developed by the United States Environmental Protection Agency (USEPA) to control the amount of pollution into the surface water (Parry, 1998). One of the primary regulations put in place to control the amount of discharges into surface waters is the Clean Water Act. 2. Clean Water Act The Clean Water Act, initially executed in 1948 as the Federal Water Pollution Control Act, was the key element of surface water quality protection in the United States, but was significantly reorganized and developed in 1972 (EPA, 2002). Prior to 1987, it was exclusively directed at point source pollution, but amendments to the law in that year included nonpoint source pollution measurements. Point source pollution has caused more than 50% of the (Copeland, 2010). Non-point source pollution has been identified as the likely the most significant source of water pollution (Puckett, 1995). Due to the already extensive regulations, point source pollution has generally has received less attention as an ongoing threat, but it may be more harmful than anticipated (Andreen, 2004). Specifically, point source discharges hidden by fraudulent self-reported data represent another potential source of water quality impairment. Although the Clean Water Act was specifically established to protect water quality or to 4 stressed (Doremus & Dan Tarlock, 2012). Title VI of the Clean Water Act explains that all industrial and municipal dischargers are prohibited from violating water quality standards, and that states are responsible for ensuring all regulatory requirements are met (Copeland, 2014). Water quality standard are defined through a program called TMDL. 2.1.2 Total Maximum Daily Load (TMDL) The TMDL program rose as an establishment for the country's endeavors to meet state surface water quality standards. Total maximum daily load of a pollutant is the amount of discharge that complies with a water quality standard; the "TMDL process" refers to the arrangement to create and perform the TMDL (Tjeerdema, 2007). A TMDL is the measure of pollutant loads apportioned from point sources and nonpoint sources in addition to a margin of safety for probable unknown and seasonal changes in water quality (Miller-mcclellan, Shanholtz, & Miller-mcclellan, 2003). The goal of the TMDL is to guarantee that the waterbody will have the capacity to meet water quality standards for all seasonal variations. Generally, TMDL point and nonpoint loads are evaluated using computer modeling. Although monitoring is the most desirable method to calculate TMDL loads, its utilization is limited because of the high cost and the large variability in spatial and temporal components of ecosystems that would necessitate a prohibitive amount of samples to fully characterize the water quality of a waterbody (Muñoz-Carpena et al., 2006). The margin of safety is incorporated to represent uncertainties connected with the improvement of the TMDL and is added to increase water quality protection (Zhang & Yu, 2004). 2.1.2.1 TMDL Deficiencies Despite the importance of the TMDL program, its advancement is still a challenging task, particularly when there is no technical guidance to deliver help in executing uncertainty analysis 5 (Shirmohammadi, Chaubey, & Harmel, 2006). Theoretically, due to the regulatory and strategic decision-making processes, the uncertainty analysis always has some arbitrariness so as a long-term consequence, that can affect the success of the TMDL program (USGS, 2008). Furthermore, due to tight timetables and constrained financial resources for TMDL improvement, typically the margin of safety has been selected by subjective decisions without clear consideration of uncertainty sources and estimation their direct influences on total uncertainty in the TMDL calculations (USGS, 2008). A national study supported by the Water Environmental Research Foundation (WERF) revealed that among 172 TMDLs, 12 TMDLs had no margin of safety estimates at all; 119 of the remaining TMDLs used the subjective EPA simple explicit margin of safety technique; 40 of them had conservative assumptions, and only one TMDL unambiguously calculated the uncertainty over a analogous research study and redirected this uncertainty into Margin of safety (Dilks & Freedman, 2004). While TMDLs are being established, discharge permits may be issued to dischargers through the National Pollutant Discharge Elimination System (NPDES) program. 2.1.3 National Pollutant Discharge Elimination System (NPDES) Under section 402 of the Clean Water Act, the National Pollutant Discharge Elimination System requires the acquisition of a permit2 for all facilities discharging wastewater into the surface water of the nation (EPA, 2004). NPDES has played an important role in protecting and restoring water quality in the United States by regulating and limiting direct discharge into the 2 There are two types of individual and general permits which a facility can request where each permit type is used under different conditions and embrace different permit issuance procedures (Boyd, 2003). An individual permit is a permit that is issued for an individual facility based on the its information, such as previous permit requirements, discharge monitoring reports, technology, water quality standards and total maximum daily loads, whereas a general permit covers multiple facilities, which can be considered in a specific group of dischargers (Gaba, 2007). 6 surface water (Houck, 2002). While in 1972, only about 30% of the United States surface water were considered to be healthy, this was increased to approximately two thirds by 2001 (Birkeland, 2001). A facility owner or operator has to apply for an NPDES permit through EPA or a state permitting authority. Then the permit writer defines the proper permit terms and conditions through evaluating facility-specific information (Gaba, 2007). As of 2010, more than 65,000 industrial and municipal dischargers must attain NPDES permit from the EPA or qualified states (Copeland, 2010). 2.1.3.1 NPDES Deficiencies The main performance gap of NPDES is outdated permits (Rechtschaffen & Markell, 2003). Facilities can use their outdated permits as long as the request for permit renewal is under process. As of 2003, 15% of major facilities and one third of minor facilities were using outdated permits. 2.2 ENVIRONMENTAL MONITORING PRACTICE AND IMPLEMENTATION Monitoring is needed for policy makers to plan, develop and assess environmental rules. A monitoring program is designed to ensure quality and accessibility of data and cost-effectiveness evaluation of water quality protection programs (Lovett, Burns, & Driscoll, 2007). The NPDES monitoring and reporting conditions section defines detailed requirements for location and frequency of monitoring, sample collection techniques, analytical methods, reporting and recordkeeping (EPA, 2010a). Section 308 of the Clean Water Act, which authorizes monitoring of facilities to ensure that water quality standards are met, provides two types of monitoring (EPA, 2004): 1- Self-monitoring, where the facility must monitor wastewater components alone; 7 2- Monitoring by the EPA or the state, which consists of two processes: (i) Evaluation of facility self-monitoring; and (ii) Direct monitoring activities. Regardless of what type of monitoring (inspections, etc.), the frequency of monitoring the discharge is related to several factors. These factors include design capacity of the treatment facility, compliance history, treatment method used, cost of monitoricapability, discharge location, types of pollutants, frequency of discharge, and sum of monthly samples used in developing effluent limitations (EPA, 2010). For example, a highly variable discharge should be monitored more frequently than a discharged water quality parameter, which is more consistent over time. Data collected from tracking plant-level self-reported emissions and on-site inspections along with permitted effluent limitations and enforcement action are Monitoring programs have showed success since they were initiated, and empirical studies have found that they positively influence compliance and levels of pollutant discharge. Magat and Viscusi (1990) evaluated the impact of monitoring on water quality of 77 pulp and paper mills between 1982 to 1985. They found that while their overall compliance rate was about 75%, not inspecting the facilities in the previous quarter could double the possibility of noncompliance. Also, they assessed the impact of monitoring on the amount of discharge pollution by facilities, which was about 20% decrease for each inspection. Earnhart (2004) and Glicksman & Earnhart (2007) examined conventional water pollution discharge for forty Kansas wastewater treatment plants and hundred chemical facilities respectively. Both studies found that monitoring programs along with monetary fines steadily decreased relative discharges. Shimshack & Ward (2005) assessed the compliance of 217 pulp and papers facilities between 8 1988 to 1996 after penalizing and regulatory action. Application of additional fines caused two-thirds of water pollution violation percentage drop in the year following the actions. 2.2.1 Deficiencies in Environmental Monitoring Practice and Implementation Although there have been endeavors to monitor the compliance of water pollution dischargers, it is not a flawless system and noncompliance is threatening waterbodies. Most monitoring practices perform as expected and there are still controversial topics among practitioners, regulators and researchers (Harmancioglu, Fistikoglu, Ozkul, Singh, & Alpaslan, 1999). There are reports about permits violation (GAO, 1983), in which 82% of dischargers violated their permit at least once. Additionally, 24% were in a substantial noncompliance status with their discharge permit. The U.S. General Accounting Office (GAO) report of fiscal years 1992-1994 also declared that one in six major facilities was significantly in noncompliance with its allocated discharge permit and that the actual number could be twice as high. A nationwide compliance analysis of major facilities performed by the EPA disclosed that 25% were significantly in noncompliance with their discharge permits at any given time (Rechtschaffen & Markell, 2003). in many studies (Glicksman & Earnhart, 2007; Rechtschaffen & Markell, 2003; Shimshack & Ward, 2005; Magat and Viscusi, 1990) such as failure to perform inspections, failure to implement the proper actions; and failure to have effective penalties. Gray & Shimshack (2011) declared another weakness of the monitoring programs due to significant variability across time and state authorities such as different inspection frequencies and noncompliance fines. They discussed that cross-state variability in facilities composition leads to defining non-practical 9 federal enforcement guidelines and monitoring strategies. However, few studies have addressed this variability within or across states. 2.2.2 Discharge Monitoring Report Review Self-monitoring reports are considered the primary source of information for permittee compliance evaluation (Shimshack & Ward, 2005). Monitoring programs require the permit holders to self-report their water pollution discharge routinely and report the analytical results to the permitting authority with the essential information to assess discharge characteristics and compliance status (EPA, 2010b). Periodic self-reporting creates a continuous record of the , which can help to detect violations as well as providing a source of information to support any necessary enforcement action (NYSDEC, 2012). Facilities report their self-monitoring to the permitting authorities and the permitting authorities are responsible to transfer the facilities report to EPA headquarters either electronically or manually. Subsequently, the reports are entered into the EPA electronic database to be reviewed for any permit noncompliance (EPA, 2006). 2.2.2.1 Deficiencies in Self-Monitoring Report System Despite the heavy reliance on self-monitoring data, the GAO has expressed concerns regarding the potential for fraud in environmental self-reports (GAO, 1993). Strategic misreport can lead to inaccurate compliance evaluation and consequently inaccurate estimation of the discharge pollutions putting human and ecology in danger. Also, averaged reported values over periods of time (i.e. weekly, monthly and quarterly) may cause inaccurate estimation of real time discharges and be another reason for potential environmental risks. Because self-reported violations are treated with administrative penalties and the outcome of strategic falsification of reports can lead to criminal prosecution of employees and managers, 10 generally self-monitoring reports are considered to be truthful (Gray & Shimshack, 2011). Kaplow & Shavell (1991) discussed that dischargers can be encouraged to report their own violations without materially affecting their motivation to refrain from violation. Shimshack & Ward (2005) discussed that veracity assessment of -reporting records are carried out through visual monitoring whereas compliance assessment mandates a more in-depth analysis. The authors posit that though on-site state or federal EPA inspections are another method to ensure the accuracy of self-report records and verify the maintenance and operation of facilities, permittees are intended to report truthfully in the presence of regulatory inspection. The researcher proposed a secret and random inspection of the facilities as an ideal solution (Gray & Shimshack, 2011). 2.2.2.2 Malfunction in Wastewater Treatment Plants, Reasons of Inaccurate Self-Reporting The existence of problems in wastewater treatment plants may cause violations in the reporting of the actual effluent discharges. More than two third of the wastewater treatment plants in the United States had serious gaps in their measurements and more than one thirds of all sewage systems have been in noncompliance with environmental laws (Duhigg, 2009). Flajsig (1999) studied common problems in wastewater treatment plants. The malfunctions may be due to poor quality of planning and designing data, lack of experience of plants operators, poor maintenance of the plants due financial problems. Also, Freeman (1990) presented maintenance and equipment deficiencies, and treatment plants overloading as the reasons for violation in operation of wastewater treatment plants. Because of all those problems, there are strict regulations for wastewater treatment plants to employ new technologies and optimize the current ones but applicability and financial aspects of the change should be considered (Jury & Vaux, 2005). 11 2.2. On-Site Compliance Evaluation On-site monitoring ranges from quick inspections for few hours to a very comprehensive inspection, which can last to up a month or more (Gray & Shimshack, 2011). Current EPA compliance monitoring strategies require major and minor facilities to receive a comprehensive inspection at least once biannually and every five years, respectively. However, regulators are not limited to those timelines and the frequency of compliance assessment can be increased (Tsakiris & Alexakis, 2012). 2.2.3.1 Deficiencies in On-Site Compliance Assessment On-site monitoring has several deficiencies, which may lead to facility non-compliance. size of the facilities and how their sizes have changed over time, frequency and comprehension of the inspection strategies needs to be matched with their development rates (EPA, 2005). Regulatory agencies are also suffering from lack of budget and local economic circumstances (Deily & Gray, 1991), pressures from local interest groups (Peltzman, 1976), and political burdens (Kleit, Pierce, & Hill, 1998), which can result in circumstances where performing an appropriate and more frequent inspection is unachievable. Storey et al. (2011) studied inspection methods as another area of discussion for deficiencies in on-site compliance assessment. They argued that although, recently there have been promising technological developments in biological monitors and micro sensors to monitor water quality and contaminant detection, large-scale implementation will still take years to achieve. While employing advanced monitoring technologies comes at a high cost and is non-compatible with current treatment operations, they need to evolve to meet many operational limitations. 12 Earnhart (2010) analyzed the effects of permitted discharge limits on strategy of inspection in the municipal wastewater treatment plants both empirically and theoretically. He found that when relative discharge of a facility increases, there is a greater probability of noncompliance for the facility so in result the likelihood of the inspection by the agencies in the preceding month rises. In turn, Shimshack & Ward (2005) discussed that theoretically a plant may quickly reduce its effluents to the standard levels when there is the possibility that regulatory inspectors be present. 2.2.3.2 Deficiencies in Use of Sampling Studies in Defining Monitoring Strategies Before monitoring it should be considered that water quality monitoring is a highly complex process. This complexity arises from uncertainty in the nature of water quality and uncertainty in defining the specific purpose of monitoring (Harmancioglu et al., 1999). Uncertainties in the nature of water quality are due to natural hydrologic cycle and human-made influences (Sanders, 1983). Spatial and seasonal variations of water quality are extremely relevant to land use pattern and influences from watershed runoff discharge (Caccia & Boyer, 2005; Ming-kui, Li-ping, & Zhen-li, 2007). Horowitz (2013) discussed the result of those uncertainties as biased sampling, processing and analytical methods, which generate nonrepresentative data. He argued that the assumptions of the existing programs such as calendar-based sampling and stationarity are not defensible anymore, some monitoring programs may need to be redesigned, some sampling and analytical methods need to be updated (e.g. sampling locations and frequency) and statistical models which do not consider dynamic characteristics of hydrologic interrelationships which may require recalibration. 13 2.2.4 Use of Remote Sensing in Monitoring information about an object, area, or phenomena through the analysis of data acquired by device that is not in (Lillesand, Kiefer, & Chipman, 2014). There are various studies which discussed about the usefulness of remote sensing as a method of monitoring the water quality (Giardino, Brando, .& Dekker, 2007; Ritchie & Cooper, 1988; Schalles, Gitelson, Yacobi, & Kroenke, 1998). Use of remote sensing in water quality monitoring has been conducted since There are many studies in which each one describes a different satellite sensor (Alparslan, Aydöner, Tufekci, & Tüfekci, 2007; Giardino et al., 2007; He, Chen, Liu, & Chen, 2008; Maillard & Santos, 2008). In each, attributes of water and pollutants are crucial to water quality monitoring. Received signal has a spectral characteristic, which is a function of hydrological, biological and chemical features of water (Seyhan & Dekker, 1986). For example suspended solids cause an increase in radiance emergent from surface waters near infrared quantity of the electromagnetic spectrum (Ritchie & Cooper, 1988). 2.2.4.1 Deficiencies in Use of Remote Sensing in Monitoring It is very usual to have systematic errors in use of remote sensing as a tool to monitor water quality due to reflectance terminology does not convey physical standards entirely (Schaepman-Strub, Schaepman, Painter, Dangel, & Martonchik, 2006) so an analysis of the systematic errors along with random errors of the data is required. The revision of systematic errors is an essential for the correction of the depth and color data, and depends on the identification of the mathematical model and included parameters (Khoshelham, 2011). Another 14 source of error in remote sensing monitoring may originate from the sensor, measurement settings and characteristics of the object surface (Khoshelham, 2011). 15 2.3 SUMMARY Although there have been endeavors to monitor the compliance of water pollution dischargers, still it is not flawless and noncompliance is threatening waterbodies. Most monitoring practices not perform what is expected and there are still controversial topics among practitioners, regulators and researchers. Deficiencies in monitoring programs originate from inappropriate uncertainty analysis in TMDL, using outdated NPDES permits, possible falsification in self-monitoring reports, old treatment and measurement technologies in wastewater treatment plants, and improper inspection frequency and methods. Consequently, incomplete and inaccurate generated data affect the excellence of monitoring program. A significant amount of environmental regulation in the United States is conducted via self-reports but still the U.S. GAO has expressed concerns regarding the potential for fraud in environmental self-reports (US GAO, 1993), which represents a criminal act under many environmental laws. The most recent EPA Office of Inspector General (OIG) report indicates that efforts to improve procedures to detect and address environmental reporting fraud remain inadequate (1999; 2014). The US EPA delegated regulatory authority for most major federal environmental programs to the states (Situ & Emmons, 2000). The environmental regulatory offices tasked with assessing the level of compliance of permitted entities have limited resources to proactively assess the veracity of the data (Dumas & Devine, 2000). The potential for fraud in the self-report process is an important issue. Despite the laws in place to maintain water body quality standards, water resources are still threatened (Parry, 1998). Two-thirds of coastal systems, one-third of streams and two-fifths of lakes in the United States are impaired due to nutrient loading (Davidson et al., 2011). Non-point source pollution has been identified as the likely the most significant source of water pollution (Puckett, 1995). 16 Due to the already extensive regulations, point source pollution has generally has received less attention as an ongoing threat, but it may be more harmful than anticipated (Andreen, 2004). Specifically, point source discharges hidden by fraudulent self-reported data represent another potential source of water quality impairment. Therefore, it is important to devise strategies to discover and address fraudulent self-reports in the environmental arena, particularly at the state-level. Although understanding of state regulatory and enforcement processes is limited, recent research indicates that state methods (e.g., the on-site inspection process) are unlikely to detect fraud (Rivers, Dempsey, Mitchell, & Gibbs, 2015). New solutions that complement and can be easily integrated into existing state practices are needed (Rivers et al., 2015). This thesis represents a direct effort to build on these suggestions by examining data-driven methods to detect potential fraud. 17 3 ANALYSIS AND RESULTS OF THE FIRST STUDY 3.1 INTRODUCTION In this chapterdischarge self-(M. Nigrini, 2012) and has been used with some success to detect irregularities in environmental data, such as concentrations of chemical emissions to air (De Marchi & Hamilton, 2006) and air quality monitoring data (Fu, Fang, Villas-Boas, & Judge, 2014). Given the relative ease of using is an important starting point for evaluating data driven methods to detect fraud in environmental self-reports. This paper represents a direct effort to build on these suggestions by examining data-driven methods to detect potential fraud. In the 1930s, physicist Frank Benford found a first digit pattern of numbers in certain datasets(Mark J Nigrini & Miller, 2007), describes the expected frequency of the numbers between one and nine in which the appearance of lower numbers (i.e. 1, 2, and 3) as first digits are more likely to appear than higher numbers (i.e. 7, 8 or 9) as first digits within a given dataset. The expected frequency of the first digits reported in a dataset follow a logarithmic pattern (Benford, 1938).The formula to describe the discrete probability distribution of the frequency of occurrence of single digit numbers as first digits (Brown, 2005). The law describes the lower numbers (i.e. 1, 2, and 3) as first digits appear more than higher (i.e. 7, 8 or 9) as within a given dataset. The expected frequency of the first digits reported in a dataset follow a logarithmic pattern (Benford 1938) as equation 1: 18 ; (1) where, P is the expected probability of a number with a first digit equal to and for all possible first digits is tabulated in Table 1. Table 1- The expected probability of first significant digit (FSD) predicted by Benford's Law First Digit Probability (%) 1 30.1 2 17.6 3 12.5 4 9.7 5 7.9 6 6.7 7 5.8 8 5.1 9 4.6 as described by Equation 2 (Mark J Nigrini, 2005): (2) A continuous function to describe the expected probabilities of the appearance of the first two digits is presented in the Table 2. 19 Table 2- Expected Probability of the First Two Digits Predicted by 0 1 2 3 4 5 6 7 8 9 1 4.14 3.78 3.48 3.22 3.00 2.80 2.63 2.48 2.35 2.23 2 2.12 2.02 1.93 1.85 1.77 1.70 1.64 1.58 1.52 1.47 3 1.42 1.38 1.34 1.30 1.26 1.22 1.19 1.16 1.13 1.10 4 1.07 1.05 1.02 1.00 0.98 0.95 0.93 0.91 0.90 0.88 5 0.86 0.84 0.83 0.81 0.80 0.78 0.77 0.76 0.74 0.73 6 0.72 0.71 0.69 0.68 0.67 0.66 0.65 0.64 0.63 0.62 7 0.62 0.61 0.60 0.59 0.58 0.58 0.57 0.56 0.55 0.55 8 0.54 0.53 0.53 0.52 0.51 0.51 0.50 0.50 0.49 0.49 9 0.48 0.47 0.47 0.46 0.46 0.45 0.45 0.45 0.44 0.44 3.1.1 Criteria for A given dataset (Dumas & Devine, 2000). If the standards are ignored, correctly labeling a dataset as fraudulent or anomalous cannot be accurately inferred if the dataset Law (Durtschi, Hillison, & Pacini, 2004). One of the most important considerations is the nature of the dataset and the context of the area of focus from which it pertains. A dataset should (Dumas & Devine, 2000)(e.g. wastewater treatment plant discharge parameters). The dataset should not have an inherent minimum and maximum (Durtschi et al., 2004). For example, the hydrogen ion concentration of a solution (pH) varies between 0-14 so a dataset self-Law assessment. A dataset should also be spread across at least one order of magnitude (expressed as a power of 10)(Dumas & Devine, 2000). Some researchers refer to this criterion as (Brown, 2005). Wallace (2002) suggested if the mean of the dataset is greater than the median and the data skewness of the distribution is positive, it is more likely to obey erion or standard is that the number of reported values should not be (Brown, 2005). When a dataset has few reported values, the criterion of the frequency of first digitsbeing spread across at least one order of magnitude will not be satisfied. 20 ncredibly helpful to assess the veracity of datasets, it has limitations that must be considered before applying the Law. 3.1.2 Previous Applications o In accounting and financial data, Carslaw (1988) numbers of New Zealand firms and proved that the datasets reported were biased based on . In fraud detection in accounting numbers, Nigrini (1992, 1994, 1996, 1999, 2003, 2005, 2007, 2011, 2012) is the first to find data irregularities. o find anomalies in the results of elections. For example, in in 366 voting areas showed evidence of fraud (Roukema, 2009). Beyond the social sciences, numbers and digits are often lognormally distributed in nature and as a result, the enormous prevalence astronomy, geology, and biology (Kossovsky, 2015). Sambridge et al. (2010) proved time between earthquakes, rotation frequencies of pulsars, river lengths of Canada, global temperature anomalies, greenhouse gas emissions, global infectious diseases reported by the , seismic body P-wave speeds of Earth's mantle, the brightness of gamma rays reaching Earth. -reported datasets. Docampo et al. (2009) showed the gross datasets of daily pollen counts from from stations located in European cities with different vegetation and climatology. Vries and 21 Murk (2013) EPA ECOTOX database, LC50 (calculation of lethal concentration at which 50% of the subject succumb) and NOEC (no observed effect concentration) as a tool to quickly screen large amounts of data for irregularities in order to identify the reliability of the data for risk datasets of interpolated NOEC. Brown (2005) authenticity checking tool for several datasets related to the measured concentrations of pollutants in ambient air in the U.K. Analysis of the first digit of the dataset found some datasets ; however, some fit poorly based on their orders of magnitude3. In that study, a plot of the numerical range of datasets versus the sum of datasets containing numbers across a whereas fewer than four orders of magnitude had an exponential reduction in correlation with the law. De Marchi and Hamilton (2006) compared the first digit distribution of air emissions reported by plants in the Toxic Release Inventory with chemical concentration levels measured regulated chemicals, lead and nitric acid were not accurately self-reported. In 2014, Fu et al. onitors in Beijing. A substantial number of Beijing monitors reported daily air quality that departed -time data fit well. Through the use of principal components ,which had higher traffic volumes and housing prices, demonstrated higher correlation in the levels of data manipulation. Dumas and Devine (2000) 3 Increasing one order of magnitude is multiplying the number by 10, increasing two orders is multiplying by, and increasing N orders of magnitude means multiplication of number by. For example, 436 is one order of magnitude greater than 43.6 and is three orders of magnitude smaller than 436000. 22 self-pollution emissions data in an empirical example. They found that distortion in the data, which reduced the mean of the reported data by 9.5-10%, was produced because of a lack of certain first digits existed among the large firms and similarly a low frequency of another digit existed among small firms. Based on the analysis, it was concluded that smaller firms may be distorting values downward to avoid classification in the highest emission category (Title V or over 100tons/year); therefore, ly be more useful in identifying the likelihood of fraudulent reporting within categories of data instead than within specific datasets that conform or lack conformance to the Law. The study also demonstrated that if regulatory aw as an auditing tool, it would still be possible to self-report by other means would be more difficult. Subtraction of a given amount or reduction to meet a Zahran et al. (2014) -reported lead (Pb) emissions in order to evaluate the accuracy of these self-reported datasetgoverning oversight of lead emissions. The goal of their study was to identify systematic changes datasets. The expectation was to find more inaccuracies following the new rule, which lowered the threshold for Pb emissions, but their results showed improved accuracy of self-reported Pb emissions. The study provided further evidence of the utility of statistical analysis using ool to enhance EPA regulations. 23 3.2 OBJECTIVES OF THE STUDY irregularities in environmental datasetconsidered a completely versatile method due to limitations on the types of data it may be applied to (i.e. large, non-uniform, unbounded, etc.), the literature has yet to establish whether it may be employed at a screening level in a tiered approach to assess the veracity of self-reported Law based on the general standards put forth for its use (Dumas & Devine, 2000). In order for ies in datasets, which do not follow the law, must be distinguished from statistical anomalies caused by fraudulent or mishandled self-reported datasets. In this study, we evaluate self-reported discharge data from wastewater treatment plants from one statfor determining: 1) The suitability of wastewater treatment plant data, which contains a variety of physical, 2) The level of conformity that suitable parameter dataset 24 3.3 METHODOLOGY To achieve the objectives of the study, a dataset was obtained from a state environmental agency, containing three years of facility reported discharge parameters including data previously identified as being fraudulently reported. It consisted of 223 facilities, 354 permits, and 96 parameters. Parameters consisted of a number of permitted water quality indicators like dissolved Oxygen (DO), hardness, temperature, pH, fecal bacteria, nutrients (e.g. ammonia, sulfate, phosphorus), metals, minerals (e.g. antimony, magnesium, potassium), and herbicides (e.g. 2, 4, 6-Trichlorophenol). Choosing three years of self-reported data for the analysis could be a reasonable segment of data, as compliance monitoring frequency through inspections occurs yearly, biannually or as infrequently as every five years, based on the size of the facilities. However, the dataset has some unreported values, which may limit the analytical process. There were 4,095 total combinations of facilities and parameters. Each combination is equivalent to a single dataset for evaluation by Benf 3.3.1 Screening the Datasets for Applicability reliable results from the assessment. Therefore, some parameters and datasets should be eliminated from the analysis based on characteristics, which deem them inappropriate for The first screening involved exclusion of the parameters with a built-in minimum and maximum (like pH). The second screening consisted of assessing the uniformity of reported values for the remaining dataset. Evaluation was done based on the range between minimum and maximum of reported values for parameters of every facility and those, which were not spread across at least one order of magnitude, were eliminated from the dataset. The third screening involved elimination of very small datasets. In this study, datasets with less than 25 21 reported values were excluded because smaller datasetthe reason that follows. There should be at least 1 reported value with 9 as the first digit this is istribution (4.6%). The minimum number of reported values with first digits between 8 to 1, were then calculated based Accordingly, the minimum number of reported values for first digits 8, 7, 6, 5, 4, 3, 2 and 1 are 1, 1, 1, 2, 2, 3, 4, and 6, respectively. In the final screening step, datasets containing parameters with means less than their medians and lacking positive skewness were excluded from the evaluation set of data. After excluding the parameters that did not follow the criteria described above, the remaining self-reported datasets were tested to determine if 3.3.2 Analysis A computer code was developed in MATLAB (Statistics Toolbox Release 2014b) to fit each dataset The Pearsonian Chi-square test was calculated based on Equation 3 to evaluate the goodness of fit. (3) where, N is the sum of the observed frequencies, P(o) is the percentage of observed data and P(e) -square test provides an overall measure of the statistical deviation from the expected Benford's Law distribution of numbers in comparison with the observed distributions in the wastewater treatment discharge datasets. To establish goodness of fit, the P-value of the calculated test statistic with degrees of freedom equal to 8 must be greater datasetconfidence. A more conservative confidence level of 99%, which reduces the probability of a 26 Type I error to 0.01, was also used in the evaluation to relax the stringency of the Chi-square penalize certain types of datasets too harshly (Lesperance, Reed, Stephens, Tsao, & Wilton, 2016). 27 3.4 RESULTS 3.4.1 Parameters Eliminated Through BLaw Screening evaluated based using the screening procedures described above. After exclusion of the datasets that did not meet the screening criteria, 690 facility/parameter combinations of 4095 remained or 17% of the initial dataset. Of the 96 reported parameters across the facilities in this dataset, only 21 parameters were able to be considered for further analysis. To meet the criterion of excluding the parameters which have inherent minimums and maximums from the dataset, facility/parameter combinations for 6 parameters pH, pH maximum, pH minimum, Bypass Total hours per day, Dissolved Oxygen and Water Temperature were eliminated from dataset. As a result, 4095 combinations were reduced to 3473. In this step, about 90% of the excluded facility datasets parameter were associated with removing the 3 parameters - pH, Dissolved Oxygen, and Water Temperature. The second screening process consisted of eliminating combinations in which reported values of parameters were not spread across at least one order of magnitude. As a result, 1129 of the 3473 remaining after the first screening were selected for further investigation. The majority of excluded combinations in this step consisted of flow rate, overflow occurrence and Chlorine total residual which were uniform reported values. The third screening process was excluding combinations with a number of reported values less than 21. An additional 369 datasets were eliminated to leave 760 out of 1129. In this step, there were various parameters which were excluded from dataset. This is further described in the section 3.4.2 on classes of facilities ria screening. 28 The remaining combinations were screened to meet the criterion of positive skewness datasets without a mean greater than the median were excluded. The results of this step included the removal of an additional 70 combinations mostly CBOD5. Therefore, the total datasets remaining after elimination was 690, which were tested using Benford's Law to assess conformance.Table 3 contains a listing of the 75 parameters that did not meet the necessary 29 Table 3- Parameters Excluded After Initial Screening the Dataset 1,4-Dichlorobenzene Bypass Occurrence Cyanide, Free Mercury, Total Recoverable Phosphorus, Total In Sludge 2,3,7,8'-TCDD TTE, Total in Sludge Bypass Occurrence, Number per month Cyanide, Total Molybdenum In Sludge Potassium In Sludge 2,4,6-Trichlorophenol Bypass Total Hours Per Day DDE, Whole Sample Nickel, Total In Sludge Salmonella Sp. 48 Hour Acute Pimephales promelas Bypass Volume Dieldrin, Whole Sample Nickel, Total Recoverable Selenium, Total In Sludge 48-Hr. Acute Toxicity Ceriodaphnia dubia Cadmium, Total In Sludge Dissolved Oxygen Nitrogen Kjeldahl, Total Selenium, Total Recoverable 7-Day Chronic Toxicity Ceriodaphnia dubia Cadmium, Total Recoverable Flow Rate Nitrogen Kjeldahl, Total In Sludge Silver, Total Recoverable 7-Day Chronic Toxicity Pimephales promelas CBOD 5 day Fluoranthene Nitrogen, Inorganic, Total Sludge Solids, Percent Total 96-Hr. Acute Toxicity Pimephales promela Chemical Oxygen Demand (Low Level) Gamma-BHC, Total Oil and Grease, Freon Extr-Grav Meth Sludge Solids, Percent Volatile Acute Toxicity, Ceriodaphnia dubia Chlorine, Total Residual Heptachlor Epoxide Oil and Grease, Hexane Extr Method Sludge Volume, Gallons Acute Toxicity, Pimephales promelas Chromium, Dissolved Hexavalent Iron, Suspended (Fe) Oil and Grease, Total Solids, Dissolved-Sum of Antimony, Total Chromium, Hexavalent (Cr +6) Lead, Total Recoverable Overflow Occurrence Strontium, Total Recoverable Antimony, Total Recoverable Chromium, Total In Sludge Manganese, Suspended (Mn) Overflow Volume Thallium, Total (TL) Beryllium, Total In Sludge Chromium, Total Recoverable Mercury, Total (Hg) pH Thallium, Total Recoverable Bis(2-ethylhexyl) Phthalate Chronic Toxicity, Ceriodaphnia dubia Mercury, Total (Low Level, PQL=1000) pH, Maximum Water Temperature Bypass Duration, Hours per month Chronic Toxicity, Pimephales promelas Mercury, Total In Sludge pH, Minimum Zinc, Total In Sludge 30 3.4.2 Class of Facilities Eliminated Through BLaw Criteria Screening In the previous description of the screening, each criterion was addressed sequentially so that the remaining facility/parameter combinations after each screening step was the starting total for the next screening step. In this section, it is more important to know how each screening step independently impact the facilities represented in the remaining dataset, so each criterion is addressed individually based on initial number of combinationswhich was 4095. For example, it would be informative to know the percentage of small and large facilities eliminated due to reported parameters with inherent minimums and maximums. These results provide an understanding of what percentage of the reported values for the overall dataset by facility size consists of parameters with inherent minimums and maximums. The facility size that has the most remaining parameters compared to other categories of facilities would represent an ideal class of facilities for After the impacts of each screening criteria identified, the total impact all criteria were considered to identify the most suitable category of facilities for Benford Facilities were classified into 4 classifications of wastewater treatment plants by flow rate, where , , , and (Pennsylvania Department of Environmental Protection, 2016). Higher discharge flow rate corresponds to larger facility size and the potential for more significant pollutant discharges and associated fees. While the total facility/parameter combinations of datasets was 4095, because of missing flow rates in the reported values for three facilities, the total combinations evaluated in this study decreased to 4010. 31 Of the 4010 datasets, 42% of the combinations were from Class D facilities, 31% Class C, 20% Class B and 7% Class A. To meet the first criterion, facilities with 6 reported parameters pH, pH maximum, pH minimum, Bypass Total hours per day, Dissolved Oxygen and Water Temperature were eliminated from dataset. As a result, 3407 out of 4010 facility/parameter combinations remained. Additionally, it was interesting to know that each class had almost close percentage of elimination. Results of remaining and eliminated percentage of each class are presented in Table 4. Table 4- Results of exclusion of the parameters with inherent minimum and maximum Initial number of facility/parameter combinations Number of combinations - After exclusion of parameters with inherent minimum and maximum (only) Remaining (%) Eliminated (%) Total in classes 4010 3407 85 15 Class A 278 247 89 11 Class B 797 709 89 11 Class C 1254 1114 89 11 Class D 1681 1337 80 20 It was also important to know which classes would be removed more from dataset if only the criterion of having reported values spread across at least one order of magnitude is met. Results of this process are presented in Table 5. Only 32% of Class D remained after this screening process. For other classes, Class C and Class B had 27% remaining; the lowest percentage of among 4 classes while Class A with 36% remaining in this step had the highest percentage. The reason could be due to more variability in the reported values of larger facilities which is normally expected. 32 Table 5- Results of exclusion of the parameters without at least one order of magnitude Initial number of facility/parameter combinations Number of combinations - After exclusion of parameters without at least one order of magnitude (only) Remaining (%) Eliminated (%) Total in classes 4010 1192 30 70 Class A 278 101 36 64 Class B 797 215 27 73 Class C 1254 338 27 73 Class D 1681 538 32 68 Another informative screening could help to know which sizes of facilities have the most and the least exclusion based on the number of reported values. The results could help to target re degrees of confidence in finding self-report data mishandling. Results of exclusion of the combinations with number of reported values less than 21 are presented in Table 6. Among 4 classes, Class A had the least percentage of exclusion which was about 46%. Class B, Class C and Class D had 65%, 63% and 58% removal, respectively and elimination of 64% for Class A, 73% for Class B, 73% for Class C and 68% for Class D. As it was expected, because larger facilities report more frequently, number of reported values for these types of facilities is normally more than other facilities. Hypothesis is that Class D would have lowest number of reported values, but it does not have most elimination from datasets. The last criterion was exclusion of combinations which did not have parameters with mean greater than the median and positive skewness. Results are presented in Table 7. Class evaluation would have more confident in finding mishandling in wastewater treatment plants self-report data if it targets larger facilities. 33 Table 6- Results of exclusion of the parameters with number of reported values less than 21 Initial number of facility/parameter combinations Number of combinations - After exclusion of parameters with number of reported values less than 21 (only) Remaining (%) Eliminated (%) Total in classes 4010 1608 40 60 Class A 278 151 54 46 Class B 797 278 35 65 Class C 1254 468 37 63 Class D 1681 711 42 58 Table 7- Results of exclusion of the parameters without mean greater than median and positive skewness Initial number of facility/parameter combinations Number of combinations - After exclusion of parameters without mean greater than median and positive skewness (only) Remaining (%) Eliminated (%) Total in classes 4010 2004 50 50 Class A 278 158 57 43 Class B 797 357 45 55 Class C 1254 561 45 55 Class D 1681 928 55 45 The overall presented in Table 8. Class A combinations decreased from 278 to 65, which had the maximum remaining percentage among all 4 classes with 23%. Remaining percentages of Class B, Class C and Class D were 15%, 18% and 17%, respectively. Since in each screening process, Class A had less exclusion compared to other classes, this was expected to be observed in overall as well. 34 Table 8- Results of overall screening process Initial number of facility/parameter combinations Number of combinations - After overall screening process Remaining (%) Eliminated (%) Total in classes 4010 690 17 83 Class A 278 65 23 77 Class B 797 119 15 85 Class C 1254 226 18 82 Class D 1681 280 17 83 Lso it could be helpful for environmental regulators to focus more on larger facilities when veracity of self-reported datasets. 3.4.3 combinations conformed the with 95% confidence. Due to the value equal to 0.05, the datasetconservative 99% confidence level. The results of this evaluation were an overall increase in conformance from 31% to 42%. While the percentage of the conforming datasets is low, it is highly unlikely that 58% of the self-reported datasets are fraudulent or mishandled. According to the state agency providing the raw data, only one parameter/facility combination was known to contain false data during the time period evaluated in this study. Therefore, it is more probable that characteristics of the wastewater treatment plant discharge datasets exists, which make datasets of similar size and scope. Since it is well established that datasets with more reported values, that can spread across several orders of magnitudes, conformance based on the size of each facility/parameter data stream. Table 9 shows the level of conformance for categorized datasets at 95% and 99% confidence based on the number of 35 reported values for every facility/parameter combination. Facility/parameter datasets were grouped into 12 categories as described in the Table 9. The count column indicates the number of facility/parameter combination datasets in each range for the number of reported values. Although higher percentages of conformance were expected for larger datasets, this was not observed. In fact, the highest level of conformance was for the group with between 21 and 50 reported values and the lowest level of conformance was 0 for categories of reported values between 1000 to 1500, 1500 to 2000 and more than 2000. Because a consistent pattern was not observed across the categories of datasets by size, we concluded that the number of reported values or lack of reported values alone, cannot be responsible for such low conformance in this analysis. The parameters with the highest number of reported values were Total Suspended Solids, Nitrogen-Ammonia (NH3). The parameters with the lowest number of reported values were Nitrogen-Ammonia (NH3), Hardness-Total (CaCO3), Residue-Total Dissolved, Residue-Total Filterable, Total Suspended Solids, Mercury-Total (Low Level), Nitrite Plus Nitrate-Total, Sludge-Fee Weight, Phosphorus-Total (P), Fecal Coliform, E. coli., Lead-Total in Sludge, Arsenic-Total in Sludge, Barium- Total Recoverable, Copper-Total in Sludge, Copper-Total Recoverable, Zinc-Total in Sludge. Although Total Suspended Solids and Nitrogen-Ammonia (NH3) were among both the highest and lowest numbers of reported values, the datasets with the lowest numbers of reported values and highest level of conformance to Benforincluded parameters for metals and microbial indicators. 36 Table 9- Analysis based on number of reported values with 95% CI As previously reported, it may be useful for a regulator to idea the likelihood of fraud among a set of self-reported values across several facilities. For this purpose, parameters were grouped into four categories - nutrients, metals, microbial indicators and solids. Table 11 shows the groupings along with the associated level of conformance. Number of Reported Values (NRV) Count Percentage of Conforming Facilities (p=0.05) Percentage of Conforming Facilities (p=0.01) 21