ENHANCING CORPORATE CRIME ENFORCEMENT WITH MACHINE LEARNING—A MULTIDISCIPLINARY RISK FACTOR APPROACH By Fiona Chan A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Criminal Justice—Doctor of Philosophy 2022 ABSTRACT ENHANCING CORPORATE CRIME ENFORCEMENT WITH MACHINE LEARNING—A MULTIDISCIPLINARY RISK FACTOR APPROACH By Fiona Chan Despite its severe and lasting social and financial ramifications, corporate financial crime remains one of the most understudied crime types, as it is often hindered by two challenges. First, its multidisciplinary nature requires both financial and criminological expertise among others to conduct proper investigations. Second, corporate crime data is fraught with constraints such as high dimensionality, complex interactions, and nonlinear functional forms that are ill-suited for classical statistical modeling. The lack of research coupled with the limited resources in corporate crime enforcement represent a great impediment to the advancement of fraud interventions. This dissertation seeks to overcome these specific challenges by unifying cross-disciplinary financial fraud research under a risk factor framework, and by leveraging recent advancements in artificial intelligence. The goal is to examine whether two machine learning algorithms— random forest and neural network—can be used to enhance corporate fraud risk detection/prediction beyond more commonly employed analytical techniques. Findings from the analysis showed that the random forest algorithm outperformed logistic regression and a naïve classifier in a 1:1 matched sample. The neural network performed better than a naïve classifier but slightly worse than logistic regression. Feature selection improved the algorithms’ predictive accuracy and ability to distinguish between classes even further. Despite promising results from the 1:1 matched sample, both machine learning algorithms struggled with a heavily imbalanced 1: many dataset, which represents a more realistic setting. With the implementation of an oversampling strategy and feature selection, the algorithms improved substantially in identifying the rare fraud cases, and showed promise of improvement with further research on imbalanced classification. Feature importance from the random forest classifier identified risk factors that are consistent with findings from prior studies. Measures of financial distress ranked lower in importance than measures of financial health, suggesting future research can build on prior findings on corporate strain to examine specific mechanisms. The analysis also identified auditor independence as a key concept of guardianship and opportunity structure that warrants further study. Findings from this research also have important methodological implications for corporate crime studies—namely, the need to improve measurements of organizational-level fraud risk factors. In memory of my lost lemon. iv ACKNOWLEDGEMENTS My long journey of pursuing a Ph.D. comes with an equally long list of individuals to whom I owe my sincerest gratitude. I express my deepest thanks to my dissertation committee: Drs. Carole Gibbs, Michael Benson, Steven Chermak, Chris Melde, and Wenjuan Ma, for your invaluable insights on this dissertation project, and for your immense contribution to my professional growth. Carole, thank you for being the best mentor a PhD student can ask for. I can honestly say I would not have survived this program without your guidance and encouragement. 
Thank you for all the time and effort you have invested into being such a genuine mentor to me, and for always having my best interest at heart. Mike, thank you for letting this random public accountant show up at your office door years ago, and for inspiring her to pursue a career in academia. My life as an academic began with you, and I am forever grateful for your continuous support. Rae, thank you for being the most wonderful colleague, friend, and human being. You make the hardest parts of this journey so much more tolerable. Thank you, mom and dad, for your unconditional love and care, even when faced with life-altering crises of your own. Porro, for keeping my mental health in check. Holmes and Adie, for the sacrificed walkies and fetch time. Mr. and Mrs. K, for all the crisis management. Matthew, for grad school rants and reminders of life beyond career. Dear friends, for the much-needed mandatory fun. Most importantly, thank you, Kyle, for being the one constant that makes everything good possible. v TABLE OF CONTENTS LIST OF TABLES ....................................................................................................... viii LIST OF FIGURES ........................................................................................................ ix CHAPTER 1. INTRODUCTION .................................................................................... 1 1.1| Problem Statement ............................................................................................. 1 1.2| Goals and Merit of Current Research ................................................................ 3 CHAPTER 2. BACKGROUND AND RELEVANT LITERATURE .............................. 6 2.1| Challenges in the Study of White-Collar and Corporate Crime ....................... 6 2.1.1| Definition of Corporate Financial Fraud..................................................... 6 2.1.2| Methodological and Data Challenges .......................................................... 8 2.1.3| Enforcement Challenges .............................................................................. 8 2.2| The Risk Factor Approach ................................................................................ 10 2.2.1| Overview ..................................................................................................... 10 2.2.2| Corporate Crime Risk Factors ................................................................... 11 CHAPTER 3. CURRENT RESEARCH ........................................................................ 17 3.1| Research Questions ........................................................................................... 17 3.2| Identification of Corporate Financial Fraud Risk Factors ............................. 17 3.3| Machine Learning ............................................................................................. 21 3.3.1| Overview ..................................................................................................... 21 3.3.2| Differences between Machine Learning and Inferential Statistics ......... 23 3.3.3| Suitability for White-collar and Corporate Crime Data ........................... 25 CHAPTER 4. DATA AND ANALYTIC STRATEGY ................................................... 27 4.1| Sampling Methodology...................................................................................... 27 4.1.1| Fraud Sample ............................................................................................. 
27 4.1.2| Non-Fraud Samples.................................................................................... 31 4.2| Risk Factors Data Collection ............................................................................ 34 4.3| Machine Learning: Application ........................................................................ 37 4.3.1| Overview of Machine Learning Procedures .............................................. 37 4.3.2| Random Forest Classifier ........................................................................... 41 4.3.3| Neural Network Classifier ......................................................................... 46 CHAPTER 5. RESULTS ............................................................................................... 53 5.1| Performance Evaluation Metrics...................................................................... 54 5.2| Results from the 1:1 Sample............................................................................. 57 5.3| Feature Selection & Importance ...................................................................... 63 5.4| Results from the 1:many Sample ..................................................................... 70 vi 5.4.1| The Challenge of Classifying an Imbalanced Sample .............................. 70 5.4.2| Synthetic Minority Oversampling Technique (SMOTE) .......................... 73 5.5| Key Findings ..................................................................................................... 80 CHAPTER 6. DISCUSSION......................................................................................... 82 6.1| Limitations of the Present Study ..................................................................... 82 6.2| Discussion and Implications ............................................................................. 86 6.2.1| General Discussion and Methodological Implications .............................. 86 6.2.2| Theoretical Implications ............................................................................ 90 6.2.3| Practical Implications ................................................................................ 92 6.2.4| Directions for Future Research.................................................................. 93 6.2.5| Conclusion ................................................................................................... 94 APPENDICES ............................................................................................................... 96 APPENDIX A. SEARCH TERMS & DATABASES................................................. 97 APPENDIX B. CORPORATE FINANCIAL FRAUD RISK FACTORS ................. 98 APPENDIX C. DESCRIPTIVE STATISTICS AND CORRELATIONS ............... 104 APPENDIX D. KEY TO FEATURE IMPORTANCE (FIGURE 11) ..................... 106 APPENDIX E. SUPPLEMENTAL ANALYSIS ..................................................... 107 BIBLIOGRAPHY ........................................................................................................ 109 vii LIST OF TABLES Table 1. Fraud Sample Industries (n=450) ................................................................. 29 Table 2. Financial Risk Factors ................................................................................... 36 Table 3. Organizational Risk Factors .......................................................................... 36 Table 4. 
Classification Results from Random Forest and Neural Network (n=760) ............ 58
Table 5. Random Forest Models with Risk Factor Subsets ........................................ 68
Table 6. Classification Results (n=10,792) .................................................................. 71
Table 7. Results of 1:Many Sample After SMOTE ...................................................... 78
Table 8. Results with SMOTE and Feature Selection ................................................ 79
Table 9. Financial Fraud Risk Research and Synthesis ............................................. 98
Table 10. Risk Factor Descriptives and Point-Biserial Correlations (n=760) ......... 104
Table 11. Classification Results with Years Included .............................................. 107

LIST OF FIGURES
Figure 1. Statistical Modelling ..................................................................................... 24
Figure 2. Statistical/Machine Learning ....................................................................... 25
Figure 3. Corporate Financial Fraud Enforcement Trend ......................................... 31
Figure 4. A Single Classification Tree ......................................................................... 43
Figure 5. Gini Impurity Calculation ............................................................................ 44
Figure 6. Neural Network Structure ........................................................................... 48
Figure 7. Forward Propagation .................................................................................... 49
Figure 8. Confusion Matrix for Fraud Classification .................................................. 55
Figure 9. ROC Curve and AUC for Models Comparison ............................................ 61
Figure 10. Feature Importance from Random Forest (n=760) ................................... 63
Figure 11. Feature Importance from Subset 4 ............................................................ 69
Figure 12. 1:many Industry-Matched Sample (n=10,972) .......................................... 70
Figure 13. Training Sample Before and After SMOTE .............................................. 77

CHAPTER 1. INTRODUCTION

1.1| Problem Statement

Corporate financial fraud broadly refers to the misrepresentation of an enterprise's financial condition through accounting practices that deviate from the Generally Accepted Accounting Principles (GAAP) and/or the omission of pertinent information from mandated disclosures.1 Despite being one of the least prevalent forms of white-collar crime, corporate financial fraud has consistently been shown to be the costliest (see ACFE's Reports to the Nation 2010-2018). Dyke et al. (2013) estimated the annual cost of corporate financial fraud to be upwards of $181 billion in the U.S. The substantial monetary losses associated with corporate financial fraud not only devastate businesses but also threaten the livelihoods of public investors and employees, jeopardizing retirement savings and job security. Corporate financial fraud further impacts the economy as a whole, as investors lose trust and confidence in the capital markets (FBI, 2018). Thus, the consequences of corporate financial fraud can be enduring and far-reaching.

1 GAAP represents a standard set of accounting principles set forth by the Financial Accounting Standards Board (in the U.S.), to which all publicly listed corporations must adhere. See below for further discussion of definitions of corporate financial fraud.

Despite these social consequences, corporate crime research and enforcement each face their own difficulties.
Resources dedicated to the regulation and enforcement of corporate financial fraud are relatively limited in comparison to conventional street crime (Karpoff, Koester, Lee and Martin, 2017). Perceptions of lax enforcement (Unnever, Benson and Cullen, 2008; Holtfreter, Van Slyke, Bratton, and Gertz, 2008) create a challenge for achieving deterrence, as effective deterrence requires swift and certain detection and proportionate punishment (Bentham, 1962; Beccaria, 1764). There is also a lack of evidence-based research to inform corporate crime policies and prevention efforts. Scholars have attributed this paucity of corporate crime research to resource constraints, the lack of centralized, longitudinal official data, the conceptual ambiguity of the definitions of white-collar and corporate crime, and offenders' resourcefulness in avoiding prosecution (Rorie et al., 2018; Paternoster, 2016; Simpson & Yeager, 2015; Braithwaite, 2016; Simpson, 2013). In addition to these commonly cited drawbacks, two additional factors may have contributed to this problem.

First, corporate crime is an inherently multidisciplinary problem, requiring domain-specific knowledge from disparate technical fields of study. Research interest in different aspects of corporate financial fraud has generated scattered pockets of knowledge in disciplines such as criminal justice, accountancy/finance, information systems, organizational psychology, linguistics and communications (discussed in further detail below). While scant in quantity, lessons learned from this body of work, when combined, may help advance current prevention and intervention efforts targeting corporate financial fraud. Recognizing this need, scholars have called for more interdisciplinary collaboration (Simpson, 2013; Trompeter, Carpenter, Jones and Riley, 2012).

Second, corporate crime research is fraught with methodological challenges related to statistical modeling, due to the complex interactions of cross-level antecedents and insufficient degrees of freedom (Perols et al., 2016). Exacerbating the problem of weak data, many commonly employed statistical models suffer from an underfitting problem, as they fall short in capturing the complexity of the data structure and the relationships between corporate crime variables (Simpson, 2013; Paternoster, 2016). Operating under the frequentist philosophy of probability, these models also impose strict assumptions that are ill-suited for corporate crime data (Perols et al., 2016). As a result, researchers face tremendous difficulties in establishing baseline relationships between potential antecedents and corporate crime that can inform enforcement practices (Schell-Busey et al., 2016).

1.2| Goals and Merit of Current Research

By combining criminological theories, domain knowledge in accounting and linguistics/communication, and technological and analytical advancements in the information systems disciplines, this dissertation aims to take a first step toward overcoming some of the challenges described above. The goal of this project is to develop machine learning fraud prediction algorithms and assess whether recent developments in this subfield of artificial intelligence can aid in corporate financial fraud detection.
This goal can be further broken down into three objectives: the first is to synthesize cross-disciplinary knowledge on corporate financial fraud under a risk factor framework; the second is to use the identified risk factors as inputs to develop two machine learning risk assessment tools and compare them against standard benchmarks for financial fraud detection; the last is to further our understanding of one of the identified risk factor groups—deception cues—which has received very little attention in corporate crime research, both theoretically and empirically.

There is much merit to accomplishing these goals. By adopting a risk factor approach to fraud detection, the current project helps unify cross-disciplinary literature and brings cohesion to the understudied field of corporate crime. And by leveraging the most recent advancements in computational data analytics, this dissertation project seeks to overcome the data scarcity and methodological challenges that hamper corporate crime research. The risk data compiled for the project represents the first step toward a comprehensive multidisciplinary corporate financial fraud risk database with combined explanations that aid in interdisciplinary theory development. Data on the fraud and matched firms will also represent the most recent empirical data set for corporate crime gathered from original open-source data. Together, these datasets will constitute a step forward in overcoming the data deficiency problem in white-collar crime research. The final machine learning models will contribute to crime prevention by identifying the key risk factors that are most predictive of corporate financial fraud. This will not only pinpoint specific areas most in need of interventions, but will also form the quantitative basis for subsequent qualitative causal mechanism inquiries. Finally, the machine learning models will also contribute to risk assessment methods for other crime types or for research on reoffending.

In terms of practical significance, given the limited resources afforded to corporate financial fraud regulatory agencies, the machine learning models may provide an efficient tool for screening and detecting fraud cases. Since the algorithms are adaptive and scalable, they will be able to accommodate new data and efficiently produce the predictive results that swift enforcement requires. Improvement in the celerity and certainty of detection will in turn directly improve deterrence of corporate financial fraud. It may also advance our understanding of the true scope of corporate financial fraud. Since corporate crime tends to accumulate in severity the longer it remains undetected, more efficient detection tools can aid in reducing government spending on rectifying corporate misconduct in the long run. Furthermore, higher enforcement levels can raise investor awareness (Brazel et al., 2015), which can introduce a different form of crime control mechanism.

CHAPTER 2. BACKGROUND AND RELEVANT LITERATURE

2.1| Challenges in the Study of White-Collar and Corporate Crime

2.1.1| Definition of Corporate Financial Fraud

White-collar and corporate crime research is difficult to synthesize in part due to definitional inconsistencies. Over seven decades after Edwin Sutherland coined the term "white-collar crime" (1949), its definition continues to be a source of debate amongst white-collar and corporate crime scholars. Most research has now adopted the approach of defining the crime based on the goal of the research (Friedrichs, 1992).
For the purposes of this dissertation, corporate crime is defined as "conduct of a corporation, or of employees acting on behalf of a corporation, which is proscribed and punishable by law" (Braithwaite, 1984, p. 6). A corporation refers to a legal entity formed under the laws of its state of incorporation. It is considered a "legal person" distinct from its owners. But even with an offense-based definition, what has been referred to as corporate financial fraud so far could mean financial misreporting, financial misrepresentation, fraudulent interstate transactions, and/or securities fraud involving manipulative and deceptive devices, depending on the data used in the research.2 These subtypes of corporate financial fraud are enforced under various sections of the Securities Exchange Act of 1934. Financial misreporting refers to violation of Section 13(a), which requires timely filing of financial reports (including annual 10-K reports and quarterly 10-Q reports) from publicly listed firms. Financial misrepresentation refers to violation of Section 13(b), which requires listed firms to keep accurate books and records and maintain an effective internal control system that ensures accurate reporting. In particular, financial records, statements and disclosures must be in conformity with the Generally Accepted Accounting Principles (GAAP). GAAP represents a standard set of accounting rules and principles set forth by the Financial Accounting Standards Board (FASB) and is codified under the Accounting Standards Codification (ASC). Section 17(a) prohibits the use of interstate commerce for the purpose of fraud and deceit, and finally Section 10(b) prohibits the use of manipulative and deceptive devices in the purchase and sale of any security.3

2 Language and definitions presented here are adopted from Amiram, Bozanic, Cox, Dupont, Karpoff and Sloan (2018).
3 Rule 10b-5 is frequently levied in private enforcement against alleged fraud firms through class action lawsuits.

The present study focuses on financial misrepresentation, and more specifically Sections 13(b)(2)(A) and 13(b)(2)(B) of the Securities Exchange Act (1934). While cases involving these sections can also involve financial misreporting, fraudulent interstate transactions or securities fraud, it is important to distinguish between misstatements and omissions from financial reports and disclosures (financial misrepresentation) and, for instance, a late filing (financial misreporting), as one would expect the risk factors associated with these two subtypes of corporate financial fraud to be distinctive due to the different motivations and mechanisms of offending.

2.1.2| Methodological and Data Challenges

In addition to definitional debates that may hamper empirical corporate crime research, more technical difficulties with data quality and statistical modeling are often the culprits for null findings and uninterpretable results. Unlike the more readily available and systematically collected FBI official data for offense-based white-collar crimes, corporate crime regulatory agencies are far from consistent in their records of corporate crime instances. Antecedents of corporate crime are often high in quantity and multilevel in nature, and are characterized by complex interactions across industry, firm and individual levels. This high dimensionality coupled with small datasets often leads to insufficient degrees of freedom (Perols et al., 2016).
Exacerbating the problems of cross-sectional data, rigid functional forms and strict assumptions, many empirical analyses exhibit signs of underfitted models (Simpson, 2013; Paternoster, 2016; Perols et al., 2016). Replicable empirical results are scarce, likely due to these research difficulties pertaining to data and methods. The scarcity of replicable baseline associations between a predictor and corporate crime makes informing policies and practices with evidence-based research nearly impossible.

2.1.3| Enforcement Challenges

Enforcement of corporate crime has often been criticized as lax in public perception (Unnever, Benson and Cullen, 2008; Holtfreter, Van Slyke, Bratton, and Gertz, 2008). For example, Cohen et al. (2015) reported slippage in the SEC's mandatory disclosure requirements; Cox et al. (2016) found a substantial percentage of unenforced joinders of multiple unconnected items in proxy resolutions. Some corporate crime scholars have attributed this limited enforcement to the lack of resources dedicated to regulation and enforcement, especially when compared to conventional street crime (Karpoff, Koester, Lee and Martin, 2017; Feroz et al. 1991). Others have placed blame on regulatory capture (Bozanic et al., 2012; Vaughan, 2002). When accounting irregularities are uncovered and come to the attention of the SEC, it often takes 36 months for the SEC's Enforcement Division to open, investigate and file a case (Woodcock, Shipchandler, McKown and Day, 2019). As such, the enforcement agencies often only take on investigations and prosecutions when there is a high probability of conviction. The cases that are processed typically result in class-action civil suits against senior executives of the firm (COSO, 1999). The SEC seldom prosecutes audit firms associated with the corporations accused of financial misconduct (Brennan and McGrath, 2007).4 Corporations that attribute their financial misrepresentation to audit failure often have to hold the auditing firms accountable via private litigation.5

4 Publicly traded firms are required to have their financial statements opined on by external auditors. An unqualified opinion means that the external auditors have deemed the company's financial reports to be free of material misstatements.
5 Palmrose (1987) found that auditor litigation is often associated with the economic climate. That is, auditor litigation tends to increase during economic recessions.

In addition to budgeting issues, researchers have also pointed out other struggles that plague the SEC—such as its ineffectiveness in collecting disgorgement, constitutional challenges to its administrative procedures, and the growth of digital asset offerings. These challenges have been exacerbated by the recent political climate: with hiring freezes, staffing decreases (down 400 positions compared to 2016) and enforcement personnel reduced by 10%, fewer cases involving public companies are filed (Woodcock et al., 2019).

The phenomenon described above creates a tremendous challenge for achieving deterrence, as enforcement is neither swift, certain, nor severe (Bentham, 1962; Beccaria, 1764). The lack of evidence-based policies derived from research, coupled with the lack of consistent enforcement effort, limits the effectiveness of corporate crime control efforts, which is evident in the corporate financial scandals that recur every few years.
2.2| The Risk Factor Approach

2.2.1| Overview

A risk factor refers to any attribute or characteristic of an organization that increases its likelihood of corporate financial fraud. Identification of risk factors plays an important role in both the enforcement and the research of corporate financial fraud. With regard to enforcement, all publicly traded companies are required to disclose self-assessed risk factors in their annual financial report (Form 10-K) to the Securities and Exchange Commission (SEC). External auditors are also required to consider a client's fraud risk factors as part of the annual audit procedures, in accordance with Statement on Auditing Standards No. 99. With regard to research, risk-focused studies have contributed to numerous crime prevention efforts in other criminal justice domains (e.g., Farrington, 2000). Risk-based research lays the groundwork by establishing basic patterns and relationships between relevant antecedents and the crime of interest, on which future research can be built. Some scholars (e.g., Bernard and Snipes, 2016) have even argued for a risk factor approach to theory integration, as the traditional approach of theory falsification does not appear to have made much progress in theory reduction.

Given the importance of risk factors and the research interest evident in various disciplines, the proposed project aims to aggregate research on corporate financial fraud to compile a comprehensive set of corporate financial fraud risk factors. However, identification alone is insufficient, as unexplained risk factors give regulators, enforcement agents and researchers very little guidance in planning relevant provisions and interventions to mitigate those risks. Thus, I would also like to document the domain-specific explanations associated with each risk factor and how it may be linked to crime prevention and criminal justice theories broadly.

2.2.2| Corporate Crime Risk Factors

White-collar crime scholars have linked performance pressure, organization size, and organizational structure and complexity to various types of corporate crime, although empirical findings are not consistent. For example, some studies found a weak but significant negative relationship between multiple corporate offense types and firm profit (Clinard and Yeager, 1980; McKendall et al., 2002), profitability trends such as declining financial performance (Keane, 1993), low growth (Alexander and Cohen, 1996; Clinard and Yeager, 1980), substantial savings for the firm (Paternoster and Simpson, 1996), and financial distress such as bankruptcy risk (Schwartz et al., 2021). Schuchter and Levi's (2013) interviews with high-profile fraudsters provided further support for the role of performance pressure as a salient factor in their crime decisions and rationalizations. Yet, other research has indicated that firm profits are unrelated to financial (e.g. Simpson, 1986) and non-financial crimes (Baucus and Near, 1991; Hill et al., 1992); some even found profits (McKendall and Wagner, 1997) and growth (Simpson, 2002; Wang & Holtfreter, 2012) to be associated with increased violations. Though findings are inconsistent (McKendall and Wagner, 1997; Paternoster and Simpson, 1996; Simpson, 2002), overall the literature suggests a positive relationship between organization size and financial crimes (Schwartz, 2021; Baucus and Near, 1991; Simpson, 1986) as well as non-financial crimes such as discrimination (Baucus and Near, 1991) and environmental cases (Alexander and Cohen, 1996).
While complexity (McKendall and Wagner, 1997) is theoretically relevant, diversification is unrelated to multiple types of corporate misconduct (Clinard and Yeager, 1980; Hill et al., 1992). However, more criminogenic industries can exacerbate financial strains and, accordingly, violation rates (Wang & Holtfreter, 2012). In addition to these empirical findings, scholars have also theorized extensively on the criminogenic properties of organizational structures. Certain cultural mandates, political environments and departmentalization within an organization can engender corporate misconduct by facilitating acts of normalized deviance, concerted ignorance and/or structural secrecy (e.g., Vaughan, 2002; Prechel and Morris, 2010). For instance, knowledge of misconduct can be compartmentalized within subunits of a complex organization such that detection by another subunit is difficult.

Accounting and organizational research is more specific to corporate financial fraud. In addition to case studies of major "creative accounting" scandals (e.g., Cohan, 2002; Bhasin 2013; Jones, 2011), this body of research has also identified a variety of risk factors associated with corporate financial fraud. For example, various financial metrics such as those measuring rapid growth or financial instability (e.g., cash flow, debt, and sales-related indices) and operational efficiency (e.g., assets-related indices and turnover ratios) are linked to a higher likelihood of fraudulent financial reporting. Firm characteristics are also relevant; the lack of outside blockholders6 (Dechow et al., 1996) and high latitude of managerial discretion (e.g., discretionary accrual estimates) (Beneish, 1999; Bell and Carcello, 2000) are associated with fraud, and firms in certain industries are also more likely to engage in specific types of financial misstatements (Beasley et al., 2000).

6 Owners of large blocks of company shares and/or bonds with special voting rights.

Other studies focus on the individual incentives/motivations for committing corporate financial fraud. Executive equity incentives such as stock options have received mixed support in relation to financial fraud (e.g., Burns and Kedia, 2006; Erickson et al., 2006; Efendi et al., 2007). Meeting analyst expectations was associated with an increase in accounting scandals at the macro level (Koh et al., 2008) and the organizational level (Perols and Lougee, 2011).7

7 Financial analysts specializing in specific public corporations/industries make periodic forecasts and predictions of how the companies will perform prior to the actual filing of company financial statements with the SEC. These forecasts are generally viewed by the public as expert advice for investment purposes. As such, companies have an incentive to meet analyst expectations in order to preserve the trend of their stock prices.

Then there are studies that have examined the guardianship dimension of opportunities for corporate financial fraud. These studies link weak internal controls, corporate governance and audit quality to the likelihood of financial misstatements. Specifically, board composition and/or characteristics such as committee independence (Beasley, 1996; Abbott et al., 2004), the presence of financial experts (Farber, 2005), and the CEO's tenure and his/her dual role as chairman of the board (Dechow et al., 1996) significantly predict financial fraud. Loebbecke et al. (1989) also attributed corporate financial fraud to weak internal controls that are dominated by motivated management. Smith et al. (2000) raised concerns about the effectiveness of external auditors in their guardianship role when they found that risk assessments only alter the allocation of control versus substantive testing, but do not increase the likelihood of fraud detection.
Auditor characteristics, including the auditing firm's size, industry specialization, tenure with the client, and the individual experience of the audit team, are also associated with the likelihood of fraud (Carcello and Nagy, 2004; Myers et al., 2003).

Linguistics and communications scholars have identified deception-related risk factors that represent verbal, visual and audible cues of deceitful content (e.g. Dyer et al., 2016; Throckmorton et al., 2015; Humphreys, Moffit, Burns, Burgoon and Felix, 2011). Humphreys et al. (2011) examined linguistic cues and found significant differences between fraud and non-fraud firms. According to these authors, these linguistic-based risk factors may be management's deliberate attempt to deceive (explained by Bloomfield's management obfuscation hypothesis and McCornack's information manipulation theory) (Bloomfield, 2002; McCornack, 1992), or they may be attempts to hide unintentional "leakage" of deception cues that stem from being dishonest (explained by interpersonal deception theory and four factor theory) (Buller and Burgoon, 1996; Zuckerman, DePaulo and Rosenthal, 1981).

Finally, organizational psychologists have examined organizational justice-related risk and protective factors that may facilitate corporate misconduct or encourage whistleblowing (e.g. Young, 2013; Seifert et al., 2010; Lewicki et al., 2005; Cropanzano et al., 2001). Other factors such as corporate culture and the perceived likelihood of retaliation are linked to the intention to whistleblow (Commers, 2004; Mesmer-Magnus and Viswesvaran, 2005), which may serve as protective factors for corporate financial crime. The concept of relational governance (Poppo & Zenger, 2002; Kramer, 1999; Zajac & Olsen, 1993) similarly represents protective factors that are akin to the informal crime control mechanisms in the street crime literature.

As one may have observed, researchers from diverse disciplines have applied their respective interests to cases of corporate crime. Despite the diversity of findings and conclusions, they provide unique and valuable insights when combined. Systematic organization of these risk factors under a criminological framework will provide a multi-disciplinary understanding of the understudied area of corporate financial fraud.

CHAPTER 3. CURRENT RESEARCH

3.1| Research Questions

In light of the project's objectives to synthesize cross-disciplinary knowledge and to overcome current methodological challenges by way of machine learning, the research questions that will be addressed in this dissertation are as follows:

1. Can the multi-disciplinary risk factors identified by research be used to predict corporate financial fraud with the use of a random forest classifier (i.e., does the algorithm perform better than a naïve classifier8)?
2. How does the random forest classifier perform in comparison to commonly employed prediction tools (e.g., logistic regression)?
3. Which of the multi-disciplinary risk factors are most important in predicting corporate financial fraud?
4. Can the multi-disciplinary risk factors be used to predict corporate financial fraud with the use of a deep neural network classifier (i.e., does the algorithm perform better than a naïve classifier)?
5. How does a neural network classifier perform in comparison to logistic regression and the random forest classifier?

8 A naïve classifier refers to one that predicts the classes randomly (i.e., predicts no better than random chance) or predicts the same class invariably (e.g., predicts every case as fraud).

3.2| Identification of Corporate Financial Fraud Risk Factors

In the previous chapter, I reviewed the risk factors associated with corporate crime in general. To focus on empirically measured and tested risk factors associated with corporate financial fraud specifically, I systematically reviewed the literature by searching a list of academic databases (see Appendix A) for variants of the following search terms: "accounting fraud", "financial fraud", "financial reporting fraud", "financial statement fraud", "management fraud", "earnings management", "financial misstatement", "earnings quality", and "audit quality". As fraud detection research has notoriously been associated with high dimensionality (Perols, Bowen, Zimmerman and Samba, 2016), I limited my documentation of risk factors to organizational-level ones only. Appendix B presents the extensive list of risk factors that resulted from this identification process.

Although the current study focuses on fraud prediction, I believe it is necessary to understand the roles these risk factors play in influencing corporate financial fraud in order to shed light on potential future interventions and provisions. As such, I also documented the conceptual explanations of the hypothesized relationships between these risk factors and corporate financial fraud.

Many of the identified risk factors pertain to management's motivation and opportunity to commit corporate financial fraud. The motivation-based risk factors identified can be conceptually subdivided into pressure-related or incentive-related motivations, explained by different sets of theories. They are also consistent with the accounting regulations' (namely, Sarbanes-Oxley Act Section 404 and Statement on Auditing Standards No. 99) adaptation of Cressey's (1953) fraud triangle. Prior studies have explained the origins of pressure by applying Merton's (1983) strain theory to the corporate context (Gross, 1980; Vaughan, 1983; Simpson and Koper, 1997; McKendall and Wagner, 1997; Clinard and Yeager, 2006; Wang and Holtfreter, 2012). These studies suggested that financial strain is constantly present in the corporate environment, and sources of pressure can stem from multiple levels. Industry-level strain may exist when an industry is declining and common resources are scarce, leading to diminished legitimate means to achieve financial goals (Clinard and Yeager, 2006). Driven by profit maximization and the need to meet earnings expectations, organizational strain may be generated both internally by management and externally by investors and creditors. Finally, potential offenders may experience personal-level strain that supplies motivation for fraud. Another related white-collar crime construct is Wheeler's (1992) "fear of falling". Applying the same logic as Piquero (2012) to an organizational context, corporations may be susceptible to financial pressure for fear of losing competitiveness in addition to personal losses.
Incentives refer to motivation that is driven by rewards from perpetrating the fraud. Rewards can be financial gains or intrinsic to the offender (such as reputation). A classic incentive-driven risk factor for corporate financial fraud is whether management was granted stock options as part of the compensation structure. The conflict of interest that arises between agent (management) and principal (corporation) when their goals no longer align is described in agency theory (Benson & Simpson, 2018; Shapiro, 1990).

The opportunity-based risk factors identified from the literature primarily focus on guardianship. Unlike conventional street crime offenders, corporate crime offenders often have legitimate access to targets; they also need not converge physically in time and space (Cohen and Felson, 1979; Benson, Madensen and Eck, 2009). Due to these specific characteristics of corporate crime, target hardening has little applicability in corporate financial fraud scenarios (especially since management is afforded broad-based access). Thus, corporate financial fraud guardianship focuses primarily on the discovery of fraud. In other words, for management to successfully perpetrate corporate financial fraud, an available mechanism to conceal the fraud and avoid detection is a crucial part of the risk and reward calculus (Clarke and Cornish, 1985; Simpson and Paternoster, 2017). There are two possible ways that management may conceal corporate financial fraud: 1) they may limit guardians' access to information, and 2) they may disguise manipulations as legitimate transactions (Chan & Gibbs, 2021). These tactics are not mutually exclusive and are often employed in combination. Risk factors falling into these categories represent the mechanisms through which corporate financial fraud is perpetrated. Most risk factors identified represent the second type of concealment, where transactions are manipulated to appear legitimate. Concealment is a particularly important aspect of corporate financial fraud, as deception rather than physical threat (as in street crime) is used to perpetrate the crime (Benson and Simpson, 2014). By the time regulatory agencies are investigating the misconduct, the fraudulent financial statements have already successfully deceived external auditors and circumvented corporate oversight.

Management is subject to relatively limited oversight; the board of directors and external auditors are the only primary guardians. As previously mentioned, their role is not to bar management from access to certain operational processes, but rather to grant themselves access to the same processes in order to facilitate the detection of irregularities. However, consistent with agency theory, information asymmetry exists between management and these guardians. Withholding information or limiting access to information, therefore, is a strategy of concealment that can be used by motivated offenders. Risk factors regarding guardianship generally attempt to capture guardianship effectiveness.

3.3| Machine Learning

3.3.1| Overview

Driven by big data and advancements in computing power, machine learning has quickly become a popular choice for predictive analyses in scientific research (Jordan and Mitchell, 2015). As a subfield of artificial intelligence (AI), machine learning processes data to make decisions through training from examples rather than explicit programming (Chollet, 2018; Goodfellow, Bengio and Courville, 2016).
This is typically done by bifurcating a sample into a training set that is supplied to the computer as examples to learn from, and a test set that is reserved for assessing how well the algorithm performs post-learning. Broadly speaking, machine learning can be classified into three categories—supervised learning, unsupervised learning and reinforcement learning. Supervised learning involves predicting a target variable (analogous to a response/dependent variable) given a set of features (analogous to predictor/independent variables). It is particularly suited for classification and regression tasks where the target variable is labeled (Müller and Guido, 2016; Sullivan, 2017). That is, we know the classification of the target. Unsupervised learning is used to uncover hidden patterns from unlabeled data (Müller and Guido, 2016; Sullivan, 2017). In other words, it is suited for clustering analyses that help group uncategorized data into meaningful categories. Finally, reinforcement learning involves decision making through interacting with the environment in real time (Sullivan, 2017). The computer learns how to optimize its decisions given a reward and punishment system.

The current project employs supervised learning. In the present context, we know whether each annual financial filing is fraudulent or not. Deviating from traditional computer science programming, where the programmer supplies explicit rules to classify a fraud firm from a non-fraud firm (e.g., if the financial performance of the company declined more than 5% when compared to the previous year, classify it as a potential fraud firm), machine learning allows the computer to learn and modify its algorithm based on a training set of fraud and non-fraud firm examples we provide. The researcher is able to manipulate the parameters of the algorithm to maximize or minimize the performance metrics of choice, and to test the algorithm's generalizability with a holdout sample of fraud and non-fraud cases that is not used in the training process. More detailed descriptions of these procedures are provided under each algorithm below; a brief illustration of the general workflow follows.
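To make the training/testing workflow concrete, the sketch below illustrates it in Python with scikit-learn. It is a minimal illustration and not the dissertation's actual code: the `filings` data file, its `fraud` label, and the `roa_change` column used in the hand-coded rule are hypothetical names, and the hyperparameters are arbitrary.

```python
# Minimal sketch of the supervised-learning workflow described above.
# Assumes a CSV with one row per company-year, numeric risk-factor columns,
# and a binary `fraud` label (1 = fraudulent filing). All names are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

filings = pd.read_csv("risk_factors.csv")      # hypothetical input file
X = filings.drop(columns=["fraud"])            # features (risk factors)
y = filings["fraud"]                           # labeled target

# Bifurcate the sample: the algorithm learns only from the training set,
# and generalizability is assessed on the holdout (test) set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# A hand-coded rule (traditional programming): flag a filing if, say,
# return on assets fell more than 5% year over year (illustrative column).
rule_pred = (X_test["roa_change"] < -0.05).astype(int)

# A learned classifier: the algorithm induces its own decision rules
# from the labeled training examples.
model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# A naive benchmark that always predicts the majority class (cf. research question 1).
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

print("rule-based accuracy:   ", accuracy_score(y_test, rule_pred))
print("random forest accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("naive baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```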
3.3.2| Differences between Machine Learning and Inferential Statistics

In his seminal work on the two "cultures" of statistical modeling, Leo Breiman (2001) described the differences between the statistical modeling culture and the statistical learning culture. In criminal justice and most other social sciences, the modeling culture has dominated the analytical methods. Data modelling involves the assumption of a stochastic data model (e.g., linear regression, logistic regression, Cox model) that explains the data generation process between predictor x and response y (Figure 1). Selection of the model is based on model assumptions that seem to reflect the data generating process the most (i.e., what we believe to represent reality). Parameters are then estimated from the data and inferences are drawn with regard to the population parameters in the stated hypotheses. Model validation is typically done through residual analyses or goodness-of-fit tests, and generalization is based on inferential statistics such as frequentist/classical inference or Bayesian inference.

There are certain drawbacks to the data modelling approach. One is that model validity is uncertain despite the goodness-of-fit tests, as predictive accuracy is not considered. Thus, if the selected model emulates the data generation process/reality poorly, conclusions made based on the model's mechanisms may be faulty. Another drawback is directed particularly at the frequentist approach to hypothesis testing that has dominated empirical research in criminal justice—namely, that p-values and confidence intervals do not provide the probability that the tested hypotheses are true, only how incompatible the data are with a specified statistical model, and only if the data generation process were to be repeated an infinite number of times.

Figure 1. Statistical Modelling

On the other hand, the learning/algorithmic modeling culture considers the mechanism by which x predicts y as complex and unknown (Breiman, 2001) (Figure 2). Since the mechanism is treated as a black box, the focus of statistical learning is to find an algorithm that results in the best predictions based on observed data. Different sets of algorithms may be used for such an endeavor, including random forest, support vector machines (SVM) and neural networks. A model is validated by maximizing predictive accuracy, and generalization is made via the training and testing processes. The main drawback of the machine learning approach is that the algorithm is less transparent than a known stochastic data model, which is well studied and understood. In other words, there is a trade-off between prediction accuracy (emphasized by machine learning) and model interpretability (emphasized by statistical modelling) (Chollet, 2018; Müller and Guido, 2016). Both techniques are adopted in this project to address two sets of research questions: one a classification task that requires accurate prediction, and another a regression task that requires a certain degree of interpretability.

Figure 2. Statistical/Machine Learning

3.3.3| Suitability for White-collar and Corporate Crime Data

I chose to explore machine learning for several reasons. First, the two machine learning techniques I have chosen are suitable for supervised classification tasks with a discrete binary outcome (likelihood of financial misstatement). Second, these models can accommodate complex modeling of non-linear relationships and complex interactions without the need for a priori specification (Chollet, 2018; Gromping, 2009; Hartshorn, 2016; Hastie, Tibshirani and Friedman, 2017). Since most corporate crime theories have suggested a series of complex interactions across risk factors within and between levels of analyses (e.g., Rorie, 2016; Shover and Hochstetler, 2005), this avoids the underfitting problem encountered in corporate crime research, where the model falls short in capturing the complexity of the data structure and relationships (Simpson, 2013; Paternoster, 2016). Third, machine learning does not impose strict assumptions on the data. This is particularly important in modeling financial ratio risk factors, as they are likely to be highly correlated due to the nature of double-entry accounting and the theoretical underpinning of their inclusion. Lax assumptions also reduce the threat caused by the lack of degrees of freedom and statistical power that has historically prohibited the empirical analysis of corporate crime data, which often suffer from high dimensionality and small sample size (Perols et al., 2016; Bellman, 1961). Fourth, since motivation and opportunity are both ubiquitous in corporations, understanding the characteristics of firms that engaged in fraudulent acts requires comparison to similar non-offending firms through designs such as case-control studies (Benson et al., 2009).
Thus, the training and testing process is particularly well-suited to the task at hand. Furthermore, machine learning has shown some promise in its application in both the accounting and the criminal justice arenas. A comparison study by Duwe and Kim (2016) has shown that machine learning algorithms outperform the Burgess methodology in predicting offender recidivism. The National Institute of Justice (NIJ) has dubbed random forest models a new risk prediction tool that "shows great promise" in helping to prioritize probation and parole decisions when resources are scarce (Ritter, 2013). Various machine learning algorithms have been shown to successfully predict financial fraud in transaction-level data (e.g. Chan and Stolfo, 1998; Bolton and Hand, 2002).

There are many machine learning algorithms that are suitable for addressing the goal of fraud prediction. In addition to random forest and neural network, support vector machines, naïve Bayes, or various boosting algorithms are also appropriate choices for supervised classification tasks like fraud prediction. Since part of this project is to explore how machine learning methods can help overcome some existing research challenges, I opted to explore one of the most interpretable algorithms in machine learning (random forest) and one of the least interpretable ones (neural network) for comparative purposes.

CHAPTER 4. DATA AND ANALYTIC STRATEGY

4.1| Sampling Methodology

4.1.1| Fraud Sample

To identify fraudulent firms, data are hand collected from the SEC's published Accounting and Auditing Enforcement Releases (AAERs). AAERs represent enforcement records, including civil lawsuits brought by the SEC in federal courts, notices and orders, and any settlements of administrative proceedings against an individual or an organization. The target sample of fraud firms consists of those that violated Sections 13(b)(2)(A) and 13(b)(2)(B) of the Securities Exchange Act between the years of 2002 and 2018.

I began by gathering all enforcement records published between 2002 and 2018, which yielded 2,442 AAERs. As enforcement actions can be directed at internal employees and external parties independently from the fraudulent firms, I collapsed the AAERs involving individuals and external parties into the corresponding enforcement action. This process resulted in 643 AAERs spanning 17 years. Since there is often a lag between crime commission and detection by the SEC, and between detection and the enforcement publication, I examined each enforcement record and only included cases in which the crime is stated to have occurred after 2002. It is necessary to distinguish the period when the fraud is committed (a.k.a. the relevant period) from the date of the enforcement action in order to mitigate any inconsistencies resulting from changes in disclosure, accounting, and corporate governance rules after the enactment of the Sarbanes-Oxley Act (2002). This resulted in 421 enforcement cases. Cases pertaining to private firms (such as CPA firms) are excluded, as financial and organizational risk factors are not readily available. This resulted in 357 publicly traded companies that can be matched to a Central Index Key.9 Cases pertaining to fraudulent quarterly filings are also excluded to ensure fair comparison of financial information from the income statement or the cash flow statement, which are period-based statements unlike the balance sheet (a point-in-time statement).

9 Central Index Key (CIK) is a unique key to identify corporations that have filed disclosures with the SEC.
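The screening steps just described can be summarized as a simple filtering pipeline. The sketch below is only an illustration of those steps under assumed column names (enforcement_case_id, fraud_start_year, is_public_firm, cik, quarterly_only, fraud_fiscal_years); it is not the data-collection code actually used for the dissertation.

```python
# Illustrative sketch of the AAER case-screening steps described above.
# All file and column names are hypothetical.
import pandas as pd

aaers = pd.read_csv("aaer_records_2002_2018.csv")   # one row per hand-collected AAER

# Collapse releases aimed at individuals/external parties into the enforcement
# action against the corresponding firm (2,442 releases -> 643 actions in the text).
cases = aaers.groupby("enforcement_case_id", as_index=False).first()

# Keep only cases whose relevant period (when the fraud was committed) falls
# after 2002, i.e., post-Sarbanes-Oxley (643 -> 421 in the text).
cases = cases[cases["fraud_start_year"] > 2002]

# Drop private firms and cases that cannot be matched to a Central Index Key,
# then drop cases involving only quarterly filings.
cases = cases[cases["is_public_firm"] & cases["cik"].notna()]
cases = cases[~cases["quarterly_only"]]

# Expand to the company-year unit of analysis: one row per fraudulent annual filing.
fraud_filings = cases.explode("fraud_fiscal_years")
print(cases["cik"].nunique(), "unique companies;", len(fraud_filings), "fraudulent filings")
```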
The procedure identified a fraud sample of 191 unique companies, with 450 fraudulent annual financial filings. Note that the unit of analysis for the current study is the company-year (i.e., filing per firm per year). The discrepancy between the number of companies and the number of fraudulent annual financial filings is explained by serial or repeat offending. Each of the 191 firms has at least one fraudulent filing; most firms have two. The maximum number of fraudulent financial reports a single firm had filed with the SEC is 10. The 191 companies are scattered across a wide range of industries, with 118 unique Standard Industrial Classification (SIC) codes. Table 1 shows that the majority of the fraudulent filings fall in the transportation and public utilities and the manufacturing sectors.

Table 1. Fraud Sample Industries (n=450)
Industry                                Percentage
Wholesale & Retail Trade                1.30%
Services                                4.40%
Transportation & Public Utilities       43.30%
Finance, Insurance & Real Estate        5.80%
Agriculture                             8.70%
Public Administration                   5.30%
Manufacturing                           30.20%
Mining & Construction                   0.20%

Steps are taken to ensure the sample adheres to the definition of corporate financial misrepresentation set out in the previous section. Cases pertaining to the Foreign Corrupt Practices Act (FCPA) are prosecuted under Section 13(b) of the Securities Exchange Act (1934) and typically have a substantial impact on the financial statements. Thus, they are included in the current sample of fraud firms. As corporate crimes often result in financial impacts, the sample of fraud firms includes a diverse range of offenses—including earnings management, foreign corruption, material weaknesses and internal control deficiencies, misappropriation, and embezzlement. However, other forms of corporate crime such as environmental crime, securities fraud, tax fraud and racketeering that are prosecuted under different provisions or by different enforcement entities are not included in the current sample, even though these types of corporate misconduct may also impact the accuracy of financial statements (as most corporate crime involves financial transactions to carry out and conceal the crime).

To ensure that each firm facing enforcement action did in fact restate its financial statements, I cross-referenced relevant periods to restatement records from the SEC's EDGAR system. Restatement records document alterations to a corporation's financial records after their initial publication (alterations can be prompted by unintentional errors or intentional fraud). This cross-referencing process helps to identify and exclude cases in which the defending corporation had won its case against the SEC's allegations.

The examination of the relevant period also showed that the lag in enforcement is indeed substantial. The latest fraudulent filing prosecuted in the 2018 AAERs occurred in 2014. Amongst the 2008 AAERs, only six cases that were perpetrated after 2002 had received an enforcement action; other cases enforced in 2008 dated back to as far as 1997. Consistent with enforcement trends for other white-collar and corporate crime, enforcement of corporate financial fraud, when broken down to the firm-year level, has shown a steep decline over recent decades (Garrett, 2020). As shown in Figure 3, enforcement dropped from 77 cases in 2002 to 4 cases in 2014, with a brief increase in enforcement (31 filings in 2009) just after the 2008 financial crisis.

Figure 3. Corporate Financial Fraud Enforcement Trend
Figure 3. Corporate Financial Fraud Enforcement Trend

4.1.2| Non-Fraud Samples

Consistent with prior studies on fraud classification (e.g., Fanning and Cogger, 1998; Beneish, 1999; Kaminski et al., 2004), the initial non-fraud sample represents a 1:1 match, where the 450 fraudulent filings were matched to non-fraud counterparts based on the fiscal year of the fraud, industry, and company size. For each fraud filing, I first identified all the non-fraud annual filings in the same fiscal year and industry (identified by the Standard Industrial Classification (SIC) code).10 I then selected the company that matched the fraud filing as closely as possible in terms of company size, measured by total assets. I also matched fiscal year-ends to ensure a fair comparison of reporting periods. Various proxy measures have been used to account for firm size in corporate studies, including total assets, number of employees, and market capitalization. Yet, to date, there is no empirical analysis available to shed light on which proxy is more appropriate for the different types of corporate and organizational research questions. I opted to use total assets here because it has been shown to be more relevant to governance measures and capital structure, which are identified risk factors of corporate fraud (Dang, Li and Yang, 2017). Its correlation with sales data is also generally weaker (Al-Khazali and Zoubi, 2005), posing less risk of multicollinearity, given that financial crime is often committed to inflate income (the difference between sales and costs). The above matching procedure was performed without replacement—that is, once a non-fraud firm had been matched to a fraud firm, the next fraud firm in the same fiscal year and industry was matched with the closest non-fraud firm not already selected. This resulted in 447 non-fraud filings, and a total 1:1 sample of 894.

Matched non-fraud filings provide benchmarks against which the machine learning algorithms can compare the fraud filings during training. It is important to hold these factors constant across control (non-fraud firms) and treatment (fraud firms) groups, as variations in risk factors can be specific to industry, company size, and reporting cycle. Matched non-fraud filings therefore allow us to take into consideration macro-economic conditions, seasonal financial patterns, and other characteristics that are unique to specific industries, company sizes, and reporting cycles. A 1:1 match in this manner is also akin to random under-sampling of the majority class (non-fraud cases), a frequently employed method to account for the class imbalance characteristic of rare events such as fraud (Perols, 2011; Perols and Bowen, 2016). That is, it allows the computer to learn from as many fraud firms as non-fraud firms. In sum, this 1:1 matching provides the same benefits as case-control studies of rare events, an approach that has been advocated for the study of white-collar crime (Benson, Madensen and Eck, 2009; Shadish, Cook and Campbell, 2002).

10 The North American Industry Classification System (NAICS) is not used because regulatory bodies are slow in transitioning to the new standard and updating their data.
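To make this matching procedure concrete, the following is a minimal sketch of a 1:1 match without replacement on fiscal year, SIC industry, and closest total assets. It assumes two pandas DataFrames with hypothetical column names (cik, fiscal_year, sic, total_assets) and is an illustration rather than the exact code used in this dissertation.

```python
import pandas as pd

def match_one_to_one(fraud: pd.DataFrame, candidates: pd.DataFrame) -> pd.DataFrame:
    """Match each fraud filing to the closest non-fraud filing by total assets,
    within the same fiscal year and SIC industry, without replacement."""
    matches, used = [], set()
    for _, row in fraud.iterrows():
        # Candidate pool: same fiscal year and industry, not yet selected
        pool = candidates[
            (candidates["fiscal_year"] == row["fiscal_year"])
            & (candidates["sic"] == row["sic"])
            & (~candidates["cik"].isin(used))
        ]
        if pool.empty:
            continue  # no remaining counterpart for this fraud filing
        # Smallest absolute difference in total assets wins
        best = (pool["total_assets"] - row["total_assets"]).abs().idxmin()
        used.add(candidates.loc[best, "cik"])
        matches.append(candidates.loc[best])
    return pd.DataFrame(matches)
```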
While a 1:1 matching is commonplace in corporate crime studies and has its own merits, in reality the proportion of fraudulent to non-fraudulent annual financial filings is likely to be much smaller. Much like other forms of crime, the dark figure of corporate financial fraud is elusive. Yet, if enforcement is any indication of reality, fraudulent financial filings comprise a minuscule fraction of the total annual filings of all publicly traded companies, even in peak enforcement years. Therefore, if the goal of a machine learning algorithm is to detect fraud, it must be effective at distinguishing fraud from non-fraud even when the proportion is not 1:1. To simulate this more realistic scenario, I also created a 1:many sample, where each fraud filing was matched to all the annual filings in the same industry in the given fraud year. This resulted in 13,015 non-fraud company-year filings, and a total 1:many sample of 13,465.

4.2| Risk Factors Data Collection

Once the fraud and non-fraud firm-years were identified, annual reports (Form 10-K) filed with the SEC were obtained for both fraud and non-fraud firms for each year of the relevant period in question. For each company-year, I extracted the relevant financial risk factors from the 10-K reports using the Python scripting language.11 Table 2 provides the list of financial risk factors used in this dissertation. These risk factors represent line items from the three financial statements in the annual 10-K report—the Balance Sheet, Income Statement, and Cash Flow Statement. They are required components of the annual financial report, and therefore no missing data is associated with these risk factors. Zeros in these financial statement risk factors represent true values, indicating the absence of the corresponding financial item during or as of that reporting period.

Table 3 contains the list of organizational risk factors used in the analysis. Many of these motivation-related organizational risk factors are measured with financial ratio proxies. For example, return on assets is commonly used as a proxy measure for a company's financial health or profitability, and liquidity measures involving working capital are often used as proxies for financial distress when testing strain in the corporate setting (e.g., Wang & Holtfreter, 2012; Schwartz et al., 2021). Other opportunity-related organizational risk factors pertain to corporate governance. For instance, a CEO also serving as director of the board may indicate a higher risk of conflict of interest, should there be any misalignment between management self-interest and firm interest (e.g., Simpson & Koper, 1997). Companies that employ one of the Big 4 auditing firms are hypothesized to have a lower risk of fraud because Big 4 firms have more resources, tend to specialize in specific industries, and are more concerned with audit quality due to their need to maintain their reputation (e.g., Farber, 2005). These motivation and opportunity measures are obtained from the WRDS COMPUSTAT database or calculated from items in the 10-Ks. A more detailed description of each risk factor can be found in Appendix B. While they do not embody the comprehensive list of risk factors identified in the literature review, they represent those that are accessible to the public and to law enforcement, and will serve adequately in this exploratory project. Consistent with previous machine learning studies, observations with missing theoretical risk factors were listwise removed and the remaining data normalized. Sample size for the 1:1 matched sample was reduced to 760, and to 10,792 for the 1:many sample.

11 Financial risk factor data is not obtained from COMPUSTAT because the database backfills restated figures when they are issued.
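As a brief illustration of this preprocessing step, the sketch below listwise-deletes observations missing any risk factor and rescales the remaining values. The DataFrame, the column list, and the specific scaler (min-max) are assumptions made for illustration, since the exact normalization method is not specified here.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, risk_factor_cols: list) -> pd.DataFrame:
    """Listwise-delete rows missing any risk factor, then rescale those columns."""
    cleaned = df.dropna(subset=risk_factor_cols).copy()   # listwise deletion
    scaler = MinMaxScaler()                                # normalization (assumed min-max)
    cleaned[risk_factor_cols] = scaler.fit_transform(cleaned[risk_factor_cols])
    return cleaned
```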
Descriptive statistics for each of the 26 financial risk factors and 20 organizational risk factors can be found in Appendix B, along with point-biserial correlations with the binary dependent variable—fraud.

Table 2. Financial Risk Factors

FINANCIAL STATEMENT RISK FACTORS
Accounts Payable                  Investments and Equivalents
Capital Stocks                    Long Term Debt
Cash and Equivalents              Net Sales
Common Shares Outstanding         Property Plant Equipment
Cost of Goods Sold                Retained Earnings
Current Assets                    Short Term Investments
Current Liabilities               Taxes Payable
Debt in Current Liabilities       Total Assets
Debt Issuance                     Total Common Equity
Depreciation and Amortization     Total Current Liabilities
Income Before Extraordinaries     Total Inventories
Income Taxes                      Total Liabilities
Interest Expense                  Total Receivables

Table 3. Organizational Risk Factors

THEORETICAL RISK FACTORS                 FRAUD ELEMENT    RESEARCH EXAMPLE
Audit Fees                               Opportunity      Ferguson et al. (2003)
Auditor Change                           Opportunity      Myers et al. (2003)
Big Four Auditors                        Opportunity      Farber (2005)
Book to Market Ratio                     Motivation       Dechow et al. (2011)
Cash Margin                              Motivation       Green and Choi (1997)
CEO Duality                              Opportunity      Simpson & Koper (1997)
Change in Cash Sales                     Motivation       Beneish (1997)
Change in Free Cash Flows                Motivation       Dechow et al. (2011)
Change in Non Cash Operating Assets      Motivation       Dechow et al. (1996)
Change in Receivables                    Motivation       Green and Choi (1997)
Depreciation Index                       Motivation       Beneish (1999)
Nonaudit Fees                            Opportunity      Frankel et al. (2002)
Officer Change                           Opportunity      Simpson & Koper (1997)
Retained Earnings on Assets              Motivation       Dechow et al. (2011)
Return on Assets                         Motivation       Wang & Holtfreter (2012)
Sale of Stock                            Motivation       Beneish (1999)
Soft Assets Ratio                        Motivation       Dechow et al. (2011)
Stock Price at Year End                  Motivation       Dechow et al. (2011)
Total Fees to Public Accounting Firms    Opportunity      Frankel et al. (2002)
Working Capital                          Motivation       Perols & Lougee (2009)

4.3| Machine Learning: Application

4.3.1| Overview of Machine Learning Procedures

This dissertation concentrates on using supervised learning to solve a classification problem. In supervised learning, each observation of input measures is associated with a corresponding outcome measure. The values or labels of the outcome measure guide the learning of the algorithm. The goal is to fit a model that generates the best predictions of the outcome measure based on a number of input measures. For the current research question of fraud prediction, the outcome labels (or classes) are binary, and the goal is to fit a model that most accurately classifies each filing into the fraud class or the non-fraud class.

The following paragraphs describe the general workflow of the analysis performed in this dissertation. This workflow represents the conventional procedure in machine learning, and it is applied to both the random forest and neural network classifiers used in the current analysis. All machine learning applications in this dissertation are coded in Python (version 3.8.12), using the scikit-learn library for random forest models and the Keras library for neural network models.

Recall that in machine learning, we assess the generalizability of a model by applying it to unseen data that is not used to train the algorithms. To do so, the total sample is randomly split into a training set and a holdout test set. The training set was used in training the algorithm for pattern recognition, hyperparameter tuning, and cross-validation. Only when training was completed was the holdout test set used to evaluate how well the learners generalized in classifying fraud cases against non-fraud cases. This train-test split procedure ensures that the test set data does not influence model selection. I adopted a 75/25 split, consistent with the range of train/test split ratios in the prior fraud detection literature (e.g., Lin et al., 2003). The split is stratified on the outcome measure (fraud/non-fraud) to ensure the holdout test set has the same fraud to non-fraud ratio. A seed is assigned to record the exact split and ensure future replicability.
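A minimal sketch of this stratified 75/25 split with a recorded seed, assuming X holds the risk factor columns and y the binary fraud label (both placeholders):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,    # 25% holdout test set
    stratify=y,        # preserve the fraud to non-fraud ratio in both sets
    random_state=42,   # seed recorded for replicability (the value itself is arbitrary)
)
```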
Training the classifiers was an iterative process. As with any form of modeling, we strive to develop a well-generalized model that minimizes prediction error. The prediction error of a model can be decomposed as follows:

Error(x) = (E[f̂(x)] − f(x))² + E[(f̂(x) − E[f̂(x)])²] + σ²ₑ,

where the first term, (E[f̂(x)] − f(x))², represents the squared bias, the second term, E[(f̂(x) − E[f̂(x)])²], represents the variance, and σ²ₑ represents the irreducible error. Since the irreducible error cannot be avoided and remains constant, the goal is to minimize bias and variance. In layperson's terms, we strive to produce an algorithm that can model the true relationships between the predictors and the outcome accurately (low bias) and yield consistent or precise results across different randomly drawn samples (low variance). However, for any given prediction error, we can see that there will be a trade-off between bias and variance. A model with high bias is said to be underfitting the data, such that predictions may not be very accurate, whereas a model with high variance is said to be overfitting the data, such that predictions may vary greatly across samples. Thus, striking a balance in the bias-variance tradeoff is one of the components of the training process that is at the judgment and discretion of the researcher.

There are many factors that can impact the performance of a machine learning algorithm. One such factor is the number and quality of the input measures (or "features" in machine learning jargon). In the next chapter of this dissertation, I compare models using all the risk factors previously mentioned in Tables 2 and 3 against models that use only a subset of the risk factors that appeared to be the most important in the classification process. Another factor that can impact the performance of a machine learning algorithm is how the parameters of the algorithm are specified. Each machine learning algorithm has a different set of parameters that I can manipulate (or "tune") to determine the best combination of parameter specifications. For example, with a random forest classifier, one can specify the number of trees in the forest, how large (deep) the trees can be, how many features each tree can consider, and other characteristics discussed in further detail below. Hyperparameter tuning is the iterative process of finding the best combination of values for each of these parameters that control the learning process of the algorithms. To find these combinations, I performed randomized searches and grid searches for the optimal hyperparameters that maximize the area under the receiver operating characteristic (ROC) curve in the training data.
While the most common performance evaluation metric is accuracy, I favored the area under the ROC curve (AUC) as the metric in the tuning process for the purpose of model comparison, especially with imbalanced classes.12 Essentially, these searches allowed me to take advantage of the processing power of the computer to evaluate hundreds or thousands of models with different combinations of hyperparameters.

In order to ensure that the model performs consistently, and that any variation in performance metrics is not an artifact of the train/test split, I performed 10-fold cross-validation with all the models presented in this dissertation. This is consistent with prior studies on fraud detection (e.g., Liu, Chan, Kazmi and Fu, 2015; Throckmorton, Mayew, Venkatachalam and Collins, 2015). K-fold cross-validation is a technique that involves randomly dividing the data into k groups (or "folds"; 5 or 10 folds are common choices) of approximately equal size (James et al., 2013). Each fold is treated as a test set while the remaining k − 1 folds are used to fit the model. Essentially, we are fitting and testing the model 10 times with 10 subsets of the data. We thus obtain 10 performance metrics (in our case, the AUC), one for each subset of data, allowing us to examine the variation across fold scores and providing us with an overall average. The cross-validation technique is used throughout the iterative training phase, both in the randomized and grid hyperparameter searches and in validating the AUC scores obtained.

Note that the holdout test set that is set aside at the beginning of the analysis is not used in this cross-validation procedure and remains completely unseen by the algorithm during training. Only when the performance of the algorithm was optimized and validated did I finally test it on the test set. The test set AUC score for each model is compared to the corresponding average validation score. This provides added assurance that the performance metrics obtained from the test set are valid and not a result of how the data is split.

12 Evaluation metrics will be further discussed in the Results section of the dissertation.

4.3.2| Random Forest Classifier

This section of the dissertation addresses why the random forest algorithm was chosen and describes how the algorithm is trained to perform a classification task. I elected to implement random forests for the risk assessment of corporate financial fraud for several reasons. First, it is a simple-to-understand algorithm that has shown success in predicting offender recidivism in prior studies (e.g., Neuilly, Zgoba, Tita and Lee, 2011; Pflueger, Franke, Graf & Hachtel, 2015), but has yet to be applied to other criminal justice inquiries. It has also been implemented by the Pennsylvania Board of Probation and Parole to reduce re-arrests for both violent and non-violent crime (Berk, 2017; Barnes, Hyatt, Ahlman and Kent, 2012). Second, tree-based algorithms are flexible due to their ability to create decision regions rather than a linear decision boundary (Hartshorn, 2016). Fraud detection, being "cursed" with a data dimensionality problem (Bellman, 1961), can also benefit from their ability to accommodate a large number of risk factors relative to sample size (Breiman, 2001; 2002). Finally, while random forest has at times been referred to as a black box model in some criminal justice research, recent interpretation aids have been developed to address this criticism.
Not only does it provide rank-ordered feature importance, allowing us to assess whether certain risk factors are more predictive of corporate financial fraud risk than others, it also provides decision paths that allow us to break down how decisions are made. Therefore, for both practical and educational reasons, I believe random forest to be an appropriate choice for a fraud risk assessment tool.

To understand random forest, we must first understand how decision trees work. Figure 4 is a graphical representation of what a simplified classification tree may look like in the present context. It takes each observation in our data and follows the decision arrows beginning at the root node at the top, through the branches (or internal nodes), and ending with the leaf nodes. The figure uses a binary theoretical risk factor at the root as an example, but continuous variables such as the financial risk factors are split using a decision threshold (e.g., total assets < $300 million). Leaf nodes are said to be "impure" when they contain a mixture of fraud and non-fraud observations. The impurity of a leaf node can be quantified with several different methods, including Gini impurity, entropy, and information gain. Models in this dissertation used the Gini method, which is the default for the random forest classifier in the scikit-learn library. The Gini impurity for a single leaf node can be computed as:

Gini impurity = 1 − (probability of fraud)² − (probability of non-fraud)²

Figure 5 shows an example of this calculation. It also shows the total Gini impurity of the risk factor, which is the weighted average of the two leaves. The goal, in general, is to minimize impurity. Therefore, if one risk factor is insufficient in creating a pure classification of fraud and non-fraud filings, the node is split again with a second risk factor, then a third, and so on until we are satisfied with the level of impurity. However, what constitutes a satisfactory level of impurity is an example of the bias-variance tradeoff issue discussed previously. A large tree with many risk factors may produce clear-cut classifications with pure leaves at the end, but may overfit the data and generalize poorly on unseen data. Even with hyperparameter tuning such as limiting tree depth and the maximum number of splits, decision trees are said to be prone to overfitting. Random forest, which consists of many decision trees instead of just one, helps with this problem (Breiman, 2001).

Figure 4. A Single Classification Tree

Figure 5. Gini Impurity Calculation
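A minimal sketch of the calculation illustrated in Figure 5, with made-up leaf counts used purely for illustration:

```python
def gini_impurity(n_fraud: int, n_nonfraud: int) -> float:
    """Gini impurity of a single leaf: 1 - P(fraud)^2 - P(non-fraud)^2."""
    total = n_fraud + n_nonfraud
    return 1 - (n_fraud / total) ** 2 - (n_nonfraud / total) ** 2

def total_gini(left: tuple, right: tuple) -> float:
    """Total Gini impurity of a split: the weighted average of the two leaves."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(*left) + (n_right / n) * gini_impurity(*right)

# Hypothetical split: left leaf holds 30 fraud / 10 non-fraud filings,
# right leaf holds 5 fraud / 55 non-fraud filings.
print(gini_impurity(30, 10))           # impurity of the left leaf
print(total_gini((30, 10), (5, 55)))   # weighted impurity of the whole split
```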
Random forest is an extension of an ensemble learning method known as bootstrap aggregation (a.k.a. bagging). The main idea behind bagging is to combine multiple high-variance but relatively unbiased learners to reduce overall prediction error, with each learner using a bootstrap sample (randomly drawn with replacement) of the training set (Hastie et al., 2017). In the case of a random forest, a number of decision trees are created using bootstrap samples that are the same size as the training set. To avoid having overly similar trees within the ensemble, random forest further introduces randomness by only allowing a subset of the input features to be considered at each node. Each individual tree in the ensemble is grown independently of the others and operates as it would on its own, determining features and split points to minimize impurity. Random forest then produces the final prediction by aggregating the results from each tree through a majority voting mechanism for classification tasks (or by averaging for regression tasks).

There are many types of decision trees—e.g., ID3, C4.5, CART, MARS. The base learner I used in the random forest classifier is CART (classification and regression tree), with impurity computed using the Gini criterion. CART was chosen for its ability to accommodate the diverse distributional characteristics of financial data (including the accommodation of outliers), and for its ability to predict financial distress in prior studies (e.g., Salehi & Fard, 2013; Chen, 2011).

Random forest has quite a few hyperparameters to optimize—the number of CARTs in the forest, the maximum number of input features considered at each node, the minimum number of observations required in a leaf, and the minimum number of observations required to split a node further. To bring some structure to the hyperparameter tuning process, I first obtained a baseline level of each model's performance using the default settings for these hyperparameters in the scikit-learn random forest classifier. Then, to narrow down the search, I performed a randomized search using a parameter grid with a range of values for each hyperparameter. A randomized search attempts to optimize an evaluation metric by testing random combinations of the range of values for each parameter.13 Then, based on the best parameters from the randomized search, I further fine-tuned the hyperparameters with a manual grid search. A manual grid search requires the researcher to stipulate a parameter grid of specific values for the hyperparameters. Both of these searches were performed using 5-fold cross-validation, with AUC as the metric to be optimized. The out-of-bag error rate generated from the bootstrap sampling procedure was also consulted as a guide to determine the optimal number of CARTs to include in the ensemble. The best combination of hyperparameters from these searches was used for a final 10-fold cross-validation of the AUC score to assess its reliability. The final models were then tested on the holdout test set, and a detailed decomposition of the confusion matrix, along with various metrics, is reported in the results section of this dissertation. Despite the availability of the out-of-bag sample, I elected to separately retain a holdout test set for model evaluation, as other algorithms used for comparison (e.g., logistic regression) do not have an out-of-bag sample from a bootstrap aggregation process.

13 For example, I can specify the randomized search to generate 5 random values between 20 and 200 for the number of trees in the random forest, and another 5 random values between 1 and 25 for the number of features considered in each tree. The random search will then consider the 5 × 5 = 25 combinations of these two hyperparameters to find the best one—something that is difficult to achieve without leveraging the computational power of modern processors.
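The two-stage search described above might look like the following minimal sketch. The parameter ranges and values are illustrative placeholders rather than the grids actually used, and X_train/y_train stand in for the training data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV, cross_val_score

# Stage 1: randomized search over broad ranges, scored by AUC with 5-fold CV.
random_grid = {
    "n_estimators": [20, 65, 110, 155, 200],
    "max_features": [1, 5, 10, 15, 25],
    "max_depth": [10, 30, 60, 90, None],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 3, 4],
}
rand_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=random_grid,
    n_iter=100, scoring="roc_auc", cv=5, random_state=42,
)
rand_search.fit(X_train, y_train)

# Stage 2: manual grid search in a narrow band around the best randomized values.
fine_grid = {
    "n_estimators": [100, 120, 140],
    "max_features": [4, 5, 6],
    "max_depth": [50, 60, 70],
    "min_samples_leaf": [1, 2],
    "min_samples_split": [2, 3],
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=fine_grid, scoring="roc_auc", cv=5,
)
grid_search.fit(X_train, y_train)

# Final check: 10-fold cross-validated AUC of the best configuration.
final_auc = cross_val_score(grid_search.best_estimator_, X_train, y_train,
                            cv=10, scoring="roc_auc")
print(grid_search.best_params_, final_auc.mean())
```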
4.3.3| Neural Network Classifier

This section of the dissertation addresses why the neural network algorithm was chosen and describes how the algorithm is trained to perform a classification task. The neural network was chosen for the fraud detection tool for several reasons. First, most white-collar criminologists have theorized that corporate crime results from complex interactions among opportunity-based risk factors and between opportunities and motivations (Coleman, 1987; Shover & Hochstetler, 2006), many of which cannot be directly observed. The hidden layers of deep networks are apt to capture these unobservable interactions that are present in the real world. Second, previous studies have shown success with varying forms of neural networks in predicting fraud (e.g., Fanning & Cogger, 1998; Lin, Hwang & Becker, 2003), but the risk factors included were limited to selected financial ratios, the samples used predated substantial regulatory changes, and the networks were limited to one hidden layer. Third, deep networks have great scalability both in terms of sample size and model complexity (Chollet, 2018). While our analysis may be limited to the data currently available to us, it is easily adaptable to incorporate larger-scale data, a more detailed level of analysis, and any other risk factors identified in the future. Finally, the neural network provides an excellent contrast to the random forest, the former being one of the least interpretable and the latter one of the most interpretable machine learning algorithms. Comparing these algorithms serves as an exploratory analysis into how diverse forms of machine learning methods compare to our standard approach of logistic regression in the context of fraud detection.

Figure 6 represents an illustration of a neural network in the present context. An artificial neural network is made up of many neurons arranged in layers. The left-most column of neurons represents the input layer, where each neuron represents the value of a risk factor. The middle column of neurons represents a hidden layer, capturing the interactions between the input measures. The right-most column of neurons represents the output layer, which is made up of the two classes of our outcome measure. The arrows between neurons carry weights that are to be estimated. Neural networks make predictions by computing a weighted sum (dot product) of the inputs to each neuron and then applying an activation function to capture the nonlinearity in the data. Figure 7 demonstrates the calculation for a single neuron in the hidden layer. This process of taking values from the input layer and moving through the hidden layers to make a prediction at the output layer is known as forward propagation.

Figure 6. Neural Network Structure

Figure 7. Forward Propagation
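A minimal sketch of the forward-propagation step for a single hidden neuron, as illustrated in Figure 7. The input values, weights, bias, and the ReLU activation are made-up choices for illustration.

```python
import numpy as np

def relu(z: float) -> float:
    """A common activation function: returns z if positive, otherwise 0."""
    return max(0.0, z)

inputs = np.array([0.8, -1.2, 0.3])    # values from three input-layer neurons (risk factors)
weights = np.array([0.5, -0.4, 1.1])   # weights on the arrows feeding this hidden neuron
bias = 0.1

z = np.dot(inputs, weights) + bias     # weighted sum (dot product) plus bias
print(relu(z))                         # the neuron's output, passed on to the next layer
```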
The goal of training is to determine the combination of weights and biases associated with each arrow in the diagram that produces the best predictions for the fraud and non-fraud classes. Since we are capturing many interactions between many input features and hidden nodes, one can see that a neural network must estimate many weights and biases. Much like how weights are optimized in linear regression by minimizing the sum of squared residuals, a neural network estimates the weights by minimizing a loss function. Different loss functions are used with different data types, but the way a neural network optimizes any loss function is with an algorithm called gradient descent.14 In introductory calculus courses, we were taught to find the minima of functions by taking the derivative and setting it to zero. Gradient descent takes a similar but slightly different approach; it makes an initial guess of the location of the local minimum and takes steps towards it (larger steps when the guess is farther away and smaller steps as the guess approaches the minimum). This small difference makes gradient descent a powerful optimizer in a wide range of scenarios where setting the derivative to zero cannot be solved analytically. The slope for a single node can be computed by multiplying three components: 1) the slope of the loss function with respect to the value of the node of interest, 2) the value of the node that feeds into our weight, and 3) the slope of the activation function with respect to the value of the node of interest. To move towards the lowest point of the loss function, we compute a new slope by subtracting from the old slope a small fraction of itself. This small fraction is known as the learning rate. In other words,

New Slope = Old Slope − (Old Slope × Learning Rate).

A small learning rate prevents us from overshooting the minimum. This is done iteratively until we have minimized the loss function.

However, recall that a neural network must estimate many weights and biases simultaneously. A gradient represents the same concept as derivatives/slopes, but for a function of multidimensional inputs. A neural network uses an algorithm called backpropagation to compute said gradients with an efficient implementation of the chain rule from calculus. This is what makes deep learning (i.e., neural networks with multiple hidden layers) feasible. To compute the gradients, backpropagation takes the prediction error from the output layer obtained from the previous procedures and propagates it backwards through the hidden layers to the input layer. Note that backpropagation is often mistaken for the algorithm used to train neural networks, but it is the automatic differentiation algorithm used to compute the gradients, which are then used by an optimization algorithm like stochastic gradient descent in the learning process (Goodfellow, Bengio & Courville, 2016).

All neural networks in this dissertation are built with the Keras library, which is an extension of the popular open-source machine learning platform TensorFlow. The optimizer used is "Adam", a form of stochastic gradient descent that has an adaptive rather than a fixed learning rate. The hyperparameters in a neural network relate either to the network structure or to the training algorithm. Network structure hyperparameters include the number of layers, the number of neurons in the layers, and the activation function. Hyperparameters related to the training algorithm include the learning rate, the number of epochs, and the batch size.

14 Technically, the neural network models in this dissertation use stochastic gradient descent, which performs the same procedures as gradient descent, but in batches of randomly selected samples from all the observations. To simplify the explanation, I left out the detail of how batches maximize the efficiency of the algorithm in terms of convergence time, but the batch sizes used are reported for each model in the results section.
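A minimal sketch of how these structural and training hyperparameters appear in a Keras model. The layer sizes, hidden-layer activations, and training settings below are placeholders; the two-node softmax output, the sparse categorical cross-entropy loss, and the Adam optimizer follow the specification described in the next section.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 46  # number of input risk factors (assumed)

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(30, activation="relu"),    # hidden layer 1 (size and activation assumed)
    layers.Dense(10, activation="relu"),    # hidden layer 2 (size and activation assumed)
    layers.Dense(2, activation="softmax"),  # one output node per class
])
model.compile(
    optimizer="adam",                          # adaptive-learning-rate stochastic gradient descent
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# Training-algorithm hyperparameters (values are placeholders):
# model.fit(X_train, y_train, epochs=250, batch_size=50, validation_split=0.1)
```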
The size of the input layer of the neural networks in this dissertation corresponds to the number of input measures used in the model. The output layer contains two nodes, one for each class of the outcome measure. The two nodes in the output layer, together with the softmax activation function, correspond to the loss function I opted for in my neural networks—sparse categorical cross-entropy. Sparse categorical cross-entropy is appropriate for multiclass classification problems (Géron, 2019). It computes the same error as the cross-entropy loss function (which is similar to the log loss function) and yields the predicted probability for each class in the output layer. The number of hidden layers in the neural networks presented in this dissertation is limited to two for feasibility, and because two hidden layers are theoretically sufficient to represent any functional form (Heaton, 2008). Note that unlike the random forest, the structure of the hidden layers, along with the learning rate, number of epochs, and batch size, is tuned to minimize the loss function, not a specific performance metric.

Validation for neural networks is often done with a validation set that is separate from the training and test sets. That is because 1) neural networks (deep learning) are often applied to large datasets with hundreds of thousands of observations, and k-fold cross-validation for every model would be extremely computationally expensive and time-consuming; and 2) practically speaking, the number of epochs and the batch size used in k-fold cross-validation would differ, as those are a function of the total number of observations. Nevertheless, I performed 10-fold cross-validation for the neural network models presented in this dissertation for consistency.

CHAPTER 5. RESULTS

This chapter reports findings for the research questions of this dissertation project:

1. Can the multi-disciplinary risk factors identified by research be used to predict corporate financial fraud with the use of a random forest classifier (i.e., does the algorithm perform better than a naïve classifier15)?
2. How does the random forest classifier perform in comparison to commonly employed prediction tools (e.g., logistic regression)?
3. Which of the multi-disciplinary risk factors are most important in predicting corporate financial fraud?
4. Can the multi-disciplinary risk factors be used to predict corporate financial fraud with the use of a deep neural network classifier (i.e., does the algorithm perform better than a naïve classifier)?
5. How does a neural network classifier perform in comparison to logistic regression and the random forest classifier?

In other words, I sought to determine whether the risk factors identified in previous research can be used to predict corporate financial fraud with the use of machine learning. Specifically, I wished to examine how a random forest classifier and a neural network classifier performed compared to the more commonly employed logistic regression. I will first present my findings in this chapter, and discuss the meanings and implications of the findings in more detail in the following chapter (Chapter 6).

15 A naïve classifier refers to one that predicts the classes randomly (i.e., predicts no better than random chance) or predicts the same class invariably (e.g., predicts every case as fraud).
5.1| Performance Evaluation Metrics

Before we examine results from the different algorithms, it may be prudent to discuss the various performance evaluation metrics used in machine learning. If one were to examine the metrics documentation of the scikit-learn library, one would discover a list of almost 40 metrics16; almost half of them relate to classification tasks and the rest to regression and clustering tasks. Despite the overwhelming number of classification metrics, almost all of them can be traced back to the confusion matrix. A confusion matrix compares the predicted classes from the algorithm to the true classes, as shown in Figure 8. In the present context of fraud detection, a true positive represents a fraud case that has been correctly classified by the algorithm as such. A true negative represents a non-fraud case that has also been correctly classified as such. A false positive represents a non-fraud case that has been incorrectly classified as a fraud case; in other words, it is a false alarm that is synonymous with a type I error in hypothesis testing. A false negative represents a fraud case that is incorrectly classified as a non-fraud case; in other words, it is a missed detection that is synonymous with a type II error in hypothesis testing.

Figure 8. Confusion Matrix for Fraud Classification

From the four values in the confusion matrix, we can compute some of the most widely used classification metrics in machine learning:

Accuracy = (TP + TN) / (TP + FP + TN + FN)
Sensitivity (Recall) = TP / (TP + FN)
Precision = TP / (TP + FP)
Specificity = TN / (FP + TN)

Accuracy, perhaps the most commonly reported metric, allows us to assess how well an algorithm does overall at correctly identifying the truth (both positives and negatives). If we refer to the confusion matrix, the denominator of the sensitivity (recall) formula is made up of the two cells in the left column of the matrix (i.e., the actual fraud cases). In other words, it tells us how well an algorithm does in identifying the total positive cases (as false negatives are actual positives). The denominator of the precision formula corresponds to the top two cells of the confusion matrix (i.e., the predicted positives). In other words, it tells us how correct an algorithm is when it predicts a positive outcome. The denominator of the specificity formula refers to the two cells on the right of the confusion matrix (i.e., the actual non-fraud cases), thus telling us how well an algorithm does in identifying the total negative cases.

16 https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics
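A minimal sketch of computing these four metrics from a fitted classifier's predictions on the holdout test set; clf, X_test, and y_test are placeholders for a trained model and the test data.

```python
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall / true positive rate
precision   = tp / (tp + fp)
specificity = tn / (tn + fp)
print(accuracy, sensitivity, precision, specificity)
```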
The ROC curve summarizes much of this information in the form of a graph. It plots an algorithm's false positive rate (x-axis) against its true positive rate (y-axis). The false positive rate can be computed as 1 − Specificity. The true positive rate is synonymous with sensitivity/recall. The ROC curve shows the trade-off between correctly predicting the positive class (fraud) and incorrectly predicting the negative class (non-fraud). If a naïve classifier were to predict the classes randomly or to predict the same class invariably, it would be represented by the diagonal line stemming from the origin of the graph, and the area under that diagonal line would equal .5. We refer to it as a naïve classifier because it shows no ability to discriminate between positive and negative classes. A skilled classifier would perform better at distinguishing the two classes, with a true positive rate close to 1 and a false positive rate close to 0, thus generating a curve above the diagonal line and an AUC greater than .5.

These classification metrics are not only helpful in model comparison, but also in hyperparameter tuning and in the consideration of real-world applications. Short of a perfect system, error is unavoidable, and thus there must be a trade-off between type I and type II errors. Even within the domain of fraud detection, users of an algorithm may have different goals and different resource constraints. If a fraud detection algorithm is designed to flag credit card fraud so that bankers can follow up with the owners of the credit cards, a higher false positive rate may not be problematic and may even be desired. In contrast, given how long our data show it takes to investigate and prosecute a corporate financial fraud case, a law enforcement agency such as the SEC, which faces tight resource constraints, may not be able to afford many false alarms.

5.2| Results from the 1:1 Sample

Table 4 shows the random forest and the neural network models in comparison to logistic regression. These models are trained with all the input measures listed in Tables 2 and 3, and the ones with the best AUC after the iterative training process are reported.17 The hyperparameters used to optimize the AUC scores are also reported. Recall that these reported metrics are based on testing the tuned and cross-validated models on the completely unseen test data from the 25% holdout sample. This holdout sample contains 190 observations, 99 of which are fraud cases. 10-fold cross-validation results for the AUC are reported in parentheses. The other classification metrics discussed above are reported along with the deconstructed confusion matrices.

Table 4. Classification Results from Random Forest and Neural Network (n = 760)

                      Random Forest          Neural Network           Logistic Regression
Model Specification   max depth: 60          hidden layer 1:          n/a
                      max features: 5          neurons: 30
                      min samples leaf: 1      activation: softmax
                      min samples split: 3   batch size: 50
                      n: 120                 epochs: 250
TP                    76                     76                       79
TN                    66                     27                       28
FP                    25                     64                       63
FN                    23                     23                       20
Accuracy              0.747                  0.542                    0.563
Sensitivity           0.768                  0.768                    0.798
Precision             0.752                  0.543                    0.556
Specificity           0.725                  0.297                    0.308
AUC (CV)              0.815 (.812)           0.614 (.602)             .688 (.695)

With respect to research questions 1 and 4, both the random forest and the neural network algorithms performed more effectively than a naïve classifier in this 1:1 matched sample, with AUCs of .815 and .614, respectively. In comparison, the logistic regression model resulted in an AUC of .688, performing slightly more effectively than the neural network model but not as well as the random forest model, per research questions 2 and 5. The overall accuracy scores of these models are in alignment with the AUCs—the random forest algorithm performed best, correctly classifying 75% of the unseen test cases, while the neural network (with a single hidden layer and 512 trainable parameters) and logistic regression correctly classified 54% and 56% of the test cases, respectively.

17 Note that company central index key is not included in the models as an input measure despite the nested structure. Most firms only have two fraud filings or less, and the intraclass correlation at the firm level is less than .08 per the unconditional (null) model.
When the random forest algorithm predicts a test case to be fraudulent, it is correct 75% of the time, as compared to 54% of the time for the neural network and 56% for logistic regression. This precision metric is particularly important to consider when implementing a fraud detection algorithm in real life. Since there is often a finite amount of resources, users of the algorithm must decide how many false positives are tolerable depending on their goals. Given the nature of a binary outcome, users of a classifier must weigh the cost of lower precision against the benefit of higher sensitivity/recall, as there is always a trade-off. Despite scoring the best on overall accuracy, precision, and specificity, the random forest algorithm is outperformed by logistic regression in recalling fraud cases. Specifically, logistic regression was able to identify 80% of the fraud cases in the test set, whereas the random forest and the neural network algorithms were only able to identify 77% of them. Since we know that logistic regression performed well in identifying the fraud cases but was only marginally better than random chance in overall accuracy, we can expect that it performed poorly in identifying the non-fraud cases, as confirmed by its specificity score of .308. This was also the case with the neural network model, which identified less than 30% of the non-fraud cases. In contrast, the random forest algorithm had a more balanced tradeoff between type I (false positive) and type II (false negative) errors.

The overlaid ROC curves of all three algorithms shown in Figure 9 represent a graphical summary of the results reported above. The random forest classifier performed the best overall, yielding a ROC curve that is furthest away from random chance (represented by the black dotted line) and yielding the greatest area under the curve. Since it was able to distinguish fraud and non-fraud cases proportionately well, it also exhibited a more symmetrical ROC curve, whereas the ROC curves for logistic regression and the neural network were slightly skewed in comparison. The average 10-fold cross-validation scores for the AUCs are within .02 of the test scores, suggesting that the results presented here are relatively consistent and that the models did not overfit the data. Not only do the AUCs allow us to conveniently compare models, the ROC curves also allow us to explore the optimal thresholds for classification, depending on the application and the ultimate goal of implementing the algorithms. To ensure consistent comparison, the threshold for classification is set at .5; that is, cases with assigned probabilities greater than or equal to .5 were classified as fraud, and cases with probabilities below .5 were classified as compliant. The changing of the threshold level will be discussed further in Section 5.4 when dealing with imbalanced data.

Figure 9. ROC Curve and AUC for Model Comparison

As mentioned in the previous chapters, one of the reasons random forest was chosen for this project was its reputation as one of the more interpretable machine learning algorithms. Once the algorithm has been trained, "feature importance" can be extracted from the trained model. Figure 10 presents a bar chart of the top 25 features that are considered most "important" in the random forest classification process. Feature importance is operationalized as the reduction in the Gini impurity index when the feature is used to split a node, averaged across all trees in the forest.18

18 The source code for how the scikit-learn library computes feature importance can be found here: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_tree.pyx#L1056
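A minimal sketch of how such a ranking can be pulled from a fitted scikit-learn random forest; clf and feature_names are placeholders for the trained classifier and the list of risk factor names, and the plotting call assumes matplotlib is available.

```python
import pandas as pd

importances = pd.Series(clf.feature_importances_, index=feature_names)
top25 = importances.sort_values(ascending=False).head(25)
print(top25)          # mean decrease in Gini impurity, averaged across trees
top25.plot.barh()     # bar chart akin to Figure 10
```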
This average reduction in impurity identified common equity as the financial statement risk factor (a line item on the balance sheet) used most frequently in the classification process. Return on assets was identified as the most important motivational proxy measure; it is a financial ratio used in financial statement analysis to assess a company's performance in generating profits with its resources. The most important opportunity proxy measure was audit fees, a proxy measure for audit quality. Upon examining the point-biserial correlations of these features with the outcome variable (fraud), only return on assets had a statistically significant positive association, and even then, a very weak one (r = .098, p-value < .05). Common equity and audit fees have even weaker positive associations (.045 and .070, respectively) that are not statistically significant. This provides some insight as to why return on assets is frequently used in empirical studies of corporate crime as a proxy indicator for profitability (e.g., Wang & Holtfreter, 2012; Schwartz et al., 2021). Being a special case of Pearson's correlation for a binary variable, the point-biserial correlation assumes that the two variables have a linear relationship, whereas the other two features may have a more complex relationship with fraud. For example, audit fees, as a proxy measure for audit quality, are often associated with an auditor's capability of detecting fraud (i.e., a protective factor), yet higher fees can also signal a more complex audit requiring more staff-hours (i.e., a risk factor). The random forest algorithm may be more attuned to identifying non-linear relationships such as this. Yet it is important to note that the computation of feature importance does not take into consideration collinearity among input measures. As shown in Figure 10, the second and third most relied-upon measures are return on assets and retained earnings on assets—two ratios with highly related numerators divided by the same denominator. Since only a subset of features is considered in every tree in the random forest, two highly correlated features can both be identified as important. Put differently, even features that are considered important can be redundant in their contribution to the algorithm's ability to make a classification.

Figure 10. Feature Importance from Random Forest (n = 760)

5.3| Feature Selection & Importance

In addition to hyperparameter tuning, one of the most effective ways to improve an algorithm's classification performance is through feature selection—that is, reducing the number of input measures by removing those that either do not contribute much to the algorithm's ability to classify or are redundant in relation to another input measure. As briefly demonstrated earlier, redundancy is an especially prevalent issue with the use of financial ratios as proxy measures, as many of the financial ratios are computed from the same financial statement line items. Unlike in traditional statistical methods, multicollinearity does not violate any assumptions in machine learning, but it does impact an algorithm's performance. In this section, I answer research question 4 of this dissertation by exploring whether more parsimonious subsets of our input measures can help improve the performance of the algorithms.
Since random forest appeared to be the overall best-performing algorithm in the fraud classification task, I used it to compare five different subsets of risk factors. Predictive results using the same tuning procedures and evaluation metrics are outlined in Table 5.

The first subgroup of input measures (subset 1) contains the financial statement line items only. A comparison of the overall performance metrics showed that using the subset of financial features alone improved the performance of the random forest classifier; AUC increased by 5 points (from .815 to .865) while overall accuracy increased by 2 points (from 75% to 77%). Without the organizational proxy measures, the financial statement line items were able to improve identification of the non-fraud cases (i.e., increased specificity) without compromising the ability to identify fraud cases (i.e., little change in sensitivity), thereby also improving precision and overall accuracy.

I then investigated the remaining organizational proxy measures alone in subset 2. The results in Table 5 showed these proxy measures are not as powerful predictors as the financial ones and reduced the classification performance relative to the original model with all the features. AUC declined by 2 points (from .815 to .792) and overall accuracy also declined by 2 points (from 75% to 73%). The organizational proxy measures did not perform as well at identifying fraud cases, but were especially wanting in their identification of non-fraud cases (specificity dropped by 3 points compared to the full model and 9 points compared to the financial-measures-only model). I attributed this decline in performance to the financial ratios used as proxies for many of the motivational measures. As such, I tested a subset of selected financial statement line items and the opportunity measures in subset 3, excluding all financial ratio variables.19 Prediction results from subset 3 showed that the combination of financial line items and opportunity measures without ratio proxies yielded even greater improvements in accuracy, sensitivity, and precision over the original model than the financial measures alone. This mixed group of selected financial and opportunity measures improved upon identifying non-fraud cases, but not as much as the comprehensive set of financial measures alone. Hence, AUC improved by 4 points, from .815 to .859, as opposed to 5 points with subset 1. Taken as a whole, these three subsets lend support to the need to reexamine the use of financial ratios as motivational proxy measures, which will be discussed further in the following chapter (Chapter 6).

Whereas the first three subsets of features examined above were guided by theory and domain knowledge, the next two subsets investigated common feature selection methods used in machine learning. Variables used in subset 4 were produced by a feature selection tool from the same scikit-learn library used for training and testing the machine learning algorithms.

19 Subset 3 contains the following input measures: common equity, interest expense, long term debt, accounts payable, total liabilities, total assets, total inventories, current assets, current liabilities, sale of stock, total receivables, property plant and equipment, cash and equivalents, net sales, cost of goods sold, common stock outstanding, debt in current liabilities, audit fees, non-audit fees, big four external auditor, auditor change, officer duality and officer change.
First, chi-squared tests were used to assess the dependencies between the categorical input measures and the outcome measure (fraud), and F-tests were used to assess the continuous financial measures. Based on those tests, the feature selection tool assigns each feature a score ranging from 0 to 1. I selected all the input measures that received a score greater than .5, and further removed three input measures that may be redundant from an accounting perspective. Note that the variables identified by this feature selection tool consist primarily of financial statement line items and opportunity measures, with only a few financial ratios.20 Most financial ratio measures received scores of less than .5.

Aside from selecting features based on dependencies and covariances, another popular method of pruning the variables is to use the feature importance scores from the original model as a guide. For comparison purposes, I constructed subset 5 to contain the top 25 features shown in Figure 10.

20 Subset 4 variables: soft assets ratio, interest expense, long term debt, big four auditors, accounts payable, total liabilities, total assets, auditor change, total inventories, audit fees, current assets, sale of stocks, current liabilities, total receivables, non-audit fees, property plant and equipment, cash and equivalents, debt in current liabilities, net sales, cost of goods sold, retained earnings on assets, common shares outstanding.
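As a concrete illustration of the score-based selection used to construct subset 4, the sketch below runs chi-squared tests on the categorical inputs and F-tests on the continuous inputs, then keeps features scoring above .5. X_cat, X_num, y, and the feature-name lists are placeholders, and the min-max rescaling of the test statistics to a 0–1 range is an assumption about how the 0–1 scores were produced.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif
from sklearn.preprocessing import minmax_scale

chi2_scores, _ = chi2(X_cat, y)        # dependency tests for (non-negative) categorical inputs
f_scores, _ = f_classif(X_num, y)      # F-tests for continuous financial inputs

cat_scores = minmax_scale(chi2_scores) # rescale scores to 0-1 (assumed)
num_scores = minmax_scale(f_scores)

keep = list(np.array(cat_feature_names)[cat_scores > 0.5]) + \
       list(np.array(num_feature_names)[num_scores > 0.5])
print(keep)                            # candidate features before manual pruning
```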
AUCs for both subsets are greater than that of the full model, once again lending support to the premise that a more parsimonious set of features can help reduce noise and improve predictions. Subset 4 yielded the best overall accuracy of all the subsets and was able to identify over 80% of all fraud cases in the test set with a precision of 78%. It fell slightly short on specificity (.758) but still improved upon the full model and the model with all the organizational proxy measures (subset 2). Subset 5 did similarly better at identifying fraud cases than non-fraud cases, identifying 78% of the former and 73% of the latter. The overall accuracy of subset 5 (75%) and its AUC (.83) suggest that sole reliance on feature importance as a selection method may not yield the most effective improvement in the classification algorithm. Nonetheless, once the algorithm is trained, it offers some helpful insights. Figure 11 shows which risk factors are most favored by the algorithm when making predictions from subset 4 (the model with the highest accuracy). Many of the top predictors appear to lend support to existing empirical studies of financial fraud. For example, total assets was identified as the top predictor once common equity had been removed, which is consistent with prior empirical studies of corporate crime (e.g., Beasley, 1996), but also raises questions about its validity as a proxy measure for firm size. Net sales (the second most important predictor) and retained earnings over assets (the fifth) are also consistent with findings regarding firm profitability and the likelihood of committing financial fraud (e.g., Erickson, Hanlon & Maydew, 2006). Non-audit fees, identified as the most important opportunity proxy measure, lend support to continuing criticisms regarding the lack of auditor independence (Davis, Soo & Trompeter, 2003).

However, it is important to remember that feature importance merely tells us how much class-discriminatory information each feature contains; it does not establish directional relationships between these input features and fraud. In other words, while these features are predictive of fraud, the mechanisms and reasons (i.e., the how and why) remain unknown. It is also noteworthy that many of the financial-distress-related measures (such as debt or liquidity risk factors) ranked lower in importance than financial health measures (such as asset and profitability risk factors), given that prior studies have used financial ratios as proxy measures of financial distress and health when testing corporate strain (e.g., Schwartz, 2021).

Table 5. Random Forest Models with Risk Factor Subsets21

                      Subset 1               Subset 2               Subset 3
Model Specification   max depth: 70          max depth: 50          max depth: 95
                      max features: 6        max features: 5        max features: 3
                      min samples leaf: 1    min samples leaf: 2    min samples leaf: 1
                      min samples split: 2   min samples split: 4   min samples split: 3
                      n: 120                 n: 200                 n: 273
TP                    86                     75                     78
TN                    87                     63                     70
FP                    24                     28                     21
FN                    27                     24                     21
Accuracy              0.772                  0.726                  0.779
Sensitivity           0.761                  0.758                  0.788
Precision             0.782                  0.728                  0.788
Specificity           0.784                  0.692                  0.769
AUC (CV)              0.865 (.842)           0.792 (.729)           0.859 (.840)

Table 5 (cont'd)

                      Subset 4               Subset 5
Model Specification   max depth: 10          max depth: None
                      max features: 3        max features: 3
                      min samples leaf: 2    min samples leaf: 3
                      min samples split: 3   min samples split: 4
                      n: 277                 n: 282
TP                    80                     77
TN                    69                     66
FP                    22                     25
FN                    19                     22
Accuracy              0.784                  0.753
Sensitivity           0.808                  0.778
Precision             0.784                  0.755
Specificity           0.758                  0.725
AUC (CV)              0.856 (.820)           0.830 (.820)

21 Subset 1 is trained and tested on a sample of 894, while subsets 2–5 are trained and tested on a sample of 760. This is due to financial measures being more reliably available for publicly traded companies than organizational risk factors (please refer back to Chapter 4 for the handling of missing data).

Figure 11. Feature Importance from Subset 4

22 Refer to Appendix D for the key to input features.

5.4| Results from the 1:many Sample

5.4.1| The Challenge of Classifying an Imbalanced Sample

The above investigation provided valuable initial insights into how machine learning methods can aid in fraud detection. However, as discussed in the previous chapter, a 1:1 match is far from ideal. The proxy measure for company size alone is wanting, and a 1:1 match is likely not an accurate representation of the reality a classifier will face. Figure 12 shows a class decomposition that is more likely to reflect reality. The initial 1:many sample consists of 13,015 non-fraud filings matched by fiscal year and industry. Excluding cases with missing data, the final sample used in the analysis presented in this section consists of 10,397 non-fraud filings and 395 fraud filings. This represents a highly imbalanced but more realistic scenario, where the minority class accounts for less than 4% of the total dataset.

Figure 12. 1:many Industry-Matched Sample (n = 10,792)

Results in Table 6 show how this class imbalance impacts classification performance in comparison to the previous sample. Consistent with the prior analysis of the 1:1 matched sample, I used AUC as the model selection metric and compared a random forest and a neural network model to a logistic regression model.
The reported metrics are again based on testing done on a completely unseen holdout sample, where the train/test split ratio remained the same at 75/25. Due to the class imbalance, it is important to stratify the sample so that the holdout test set preserves the ratio of fraud to non-fraud cases. The algorithms are trained and cross-validated on 8,094 observations, with 7,798 non-fraud cases and 296 fraud cases; they are then tested on the holdout sample containing 2,698 observations, with 2,599 non-fraud cases and 99 fraud cases. 10-fold cross-validation results for the AUC scores are reported in parentheses.

Table 6. Classification Results (n=10,792)

Model specification — Random Forest: max_depth: 25, max_features: 6, min_samples_leaf: 2, min_samples_split: 3, n_estimators: 325
Model specification — Neural Network: hidden layer 1 (neurons: 10, activation function: softmax); hidden layer 2 (neurons: 5, activation function: relu)
Model specification — Logistic Regression: n/a

                 Random Forest   Neural Network   Logistic Regression
TP               3               2                8
TN               2599            2599             2579
FP               0               0                20
FN               96              97               91
Accuracy         0.964           0.964            0.959
Sensitivity      0.030           0.020            0.081
Precision        1.000           1.000            0.286
Specificity      1.000           1.000            0.992
AUC (CV)         0.515 (.509)    0.510 (.500)     0.537 (.526)

At first examination, all three models boasted very optimistic scores in overall accuracy, correctly classifying over 95% of cases. However, a closer examination of the confusion matrices revealed that the class imbalance posed a significant challenge to the machine learning algorithms. None of the algorithms performed much better than a naïve classifier at distinguishing between fraud and non-fraud cases, with logistic regression having a slight edge. Not only do the algorithms lack training instances or "density" from fraud cases (Fernandez et al., 2018), but when the minority class comprises less than 4% of the total training sample (i.e., when the imbalance between classes is extreme), the algorithms learned the rarity of the event. In other words, they predicted the majority class for almost all cases, as doing so automatically meant they were correct about 96% of the time (as seen in the accuracy metric). This highlights the limited utility of the accuracy metric and the usefulness of the sensitivity metric in data with imbalanced class distributions. Even when model complexity was increased, the random forest and the neural network algorithms only predicted 3 and 2 fraud cases, respectively, with no false positives at all. Thus, these models suffered from poor recall (low sensitivity). Logistic regression predicted 8 fraud cases but also generated 20 false positives, yielding poor sensitivity and precision, but overall a slightly better AUC than random forest and neural network. In sum, to answer our research questions, even though the machine learning algorithms yielded high overall accuracy, they did not perform much better than a naïve classifier in a real-world setting. In comparison, logistic regression outperformed both algorithms but was itself only marginally better than a naïve classifier.

The extreme class imbalance also generated another complication with the cross-validation process. When the sample is split into 10 folds, some folds may not contain any fraud cases at all. As a result, validation scores among folds varied widely, some with AUC scores of zero, yielding averages that are drastically different from the test scores. To remedy this, I stratified the data to ensure a close to equal ratio of fraud to non-fraud cases in each cross-validation fold.
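A minimal sketch of the stratified holdout split and stratified 10-fold cross-validation described above, assuming scikit-learn; X and y are hypothetical names for the risk factor matrix and fraud labels, and the random forest hyperparameters follow Table 6:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Stratified 75/25 holdout split keeps the fraud/non-fraud ratio in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# StratifiedKFold keeps a close to equal fraud ratio in every fold, so no fold
# is left without fraud cases during cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
rf = RandomForestClassifier(
    n_estimators=325, max_depth=25, max_features=6,
    min_samples_leaf=2, min_samples_split=3,
)
cv_auc = cross_val_score(rf, X_train, y_train, cv=cv, scoring="roc_auc")
print(cv_auc.mean())
```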
Once that was accomplished, the 10-fold cross-validation produced scores that were consistent with the test set and across folds.

5.4.2| Synthetic Minority Oversampling Technique (SMOTE)

The class imbalance problem is not unique to fraud detection, as rare events have long been subjects of interest in academic research and real-world applications alike. With the expansion of data sources and the increased popularity of big data analytics, research on methods to tackle imbalanced data has also increased (He & Garcia, 2009). Yet, since each dataset has its own unique characteristics that can exacerbate the classification challenge associated with imbalanced data (e.g., size, label noise, data distribution), there is no one-size-fits-all strategy (Fernandez et al., 2018). There are many different schools of thought when it comes to handling imbalanced data, but the most dominant approach is to target the sampling methodology. This section represents an exploratory analysis of the effectiveness of one such technique on corporate financial fraud detection. Specifically, I applied a widely implemented resampling technique called the Synthetic Minority Over-sampling Technique (SMOTE) (Chawla, Bowyer & Kegelmeyer, 2011) to the 1:many imbalanced data above, and evaluated its performance using the same random forest and neural network algorithms and logistic regression.

23. Other schools of handling imbalanced data include cost-sensitive methods, kernel-based learning methods, active learning methods and one-class learning methods. He and Garcia (2009) offer a more detailed survey of these methods.

Sampling methods used to handle imbalanced data have a straightforward goal—to balance the class frequencies through under-sampling, over-sampling, or both. The simplest methods randomly duplicate observations in the minority class (random over-sampling) or randomly remove observations from the majority class (random under-sampling). SMOTE differs from random over-sampling by generating new observations based on feature space similarities between existing observations of the minority class (Chawla et al., 2011), instead of simply duplicating existing data. Using k-nearest neighbors, SMOTE identifies observations that are close in the feature space, forms a line through those observations, and creates new observations along that line. This helps build larger decision regions surrounding those fraud cases (Chawla et al., 2011). Since the 1:1 matched sample already resembles an under-sampling strategy, I chose to investigate how an over-sampling strategy would compare. The SMOTE algorithm I used came from the Imbalanced Learn library. Recall that our preprocessed training data consists of 7,798 non-fraud observations and 296 fraud observations. After the implementation of SMOTE, the minority class has been over-sampled to match the 7,798 majority-class observations, yielding a total training sample of 15,596. Note that SMOTE is only applied to the training data and not the holdout test set. This way, performance evaluation of the algorithms is not impacted by resampling of the test data. To better visualize the effect of this over-sampling procedure, I created scatterplots of two financial variables before and after resampling (Figure 13). Table 7 shows the random forest and the neural network models in comparison to logistic regression after SMOTE is implemented.
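A minimal sketch of how this over-sampling step might be applied to the training partition only, assuming the imbalanced-learn library named above; the variable names are illustrative:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE

# Oversample only the training data; the holdout test set is left untouched so
# that performance evaluation is not affected by synthetic observations.
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

print(Counter(y_train))      # e.g., Counter({0: 7798, 1: 296})
print(Counter(y_train_res))  # minority class oversampled to match the majority
```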
Recall that these reported metrics are based on the same holdout sample from the previous section, which contains 2,698 observations (2,599 non-fraud and 99 fraud cases). All three models improved in identifying fraud cases. The random forest algorithm improved the least, only identifying 19% of all the fraud cases. In comparison, the neural network and the logistic regression models with resampled data were able to identify over 60% of the fraud cases. However, in the process of doing so, both models generated a large number of false positives, producing extremely low precision scores. Interestingly, despite the low recall, random forest scored higher in precision. It generated fewer false positives but made very few positive predictions to begin with, suggesting that SMOTE may have only improved the algorithm marginally by not predicting all cases as non-fraud. In contrast, while the neural network and logistic regression scored lower in specificity (.650 and .631, respectively) in comparison to random forest (.978), this may be an artifact of the high number of false positives. In other words, SMOTE appeared to have made the neural network and logistic regression models predict fraud more frequently.

24. To account for potential variations between relevant periods, I also performed supplemental analysis with the fiscal year variable included in the random forest and logistic regression models with SMOTE. Results are reported in Appendix E.

The 10-fold cross-validation process is slightly more complicated when a resampling method is introduced. If one were to over-sample the entire training set and then split it into 10 folds, information about data from one fold would likely appear in another fold, and the test set for each fold would also contain resampled data. This data leakage problem would ultimately cause the cross-validated metric to generalize poorly to unseen data, as the algorithm that was supposed to be learning from data within one fold would have also learned from data leaked from another fold and from the test set, nullifying the validation process. Therefore, to obtain a more robust cross-validation result, I first split the sample into 10 folds, stratifying to ensure equal proportions of fraud to non-fraud cases in each fold, then split each fold into training and test sets, and performed SMOTE on only the training data of each fold prior to training and testing. This resulted in a more generalizable average AUC score, as shown in the relatively low discrepancies between cross-validation and test results.

Figure 13. Training Sample Before and After SMOTE

Table 7. Results of 1:Many Sample After SMOTE

Model specification — Random Forest: n estimators: 183, min samples split: 4, min samples leaf: 3, max leaf nodes: 48, max features: 5, max depth: 110
Model specification — Neural Network: hidden layer 1 (neurons: 10, activation: softmax); hidden layer 2 (neurons: 5, activation: relu); batch size: 50; epochs: 150
Model specification — Logistic Regression: n/a

                 Random Forest   Neural Network   Logistic Regression
TP               19              48               64
TN               2543            2098             1641
FP               56              501              958
FN               80              51               35
Accuracy         0.950           0.795            0.632
Sensitivity      0.192           0.485            0.646
Precision        0.253           0.087            0.063
Specificity      0.978           0.807            0.631
AUC (CV)         0.846 (.867)    0.702 (.681)     0.659 (.661)

To explore whether feature selection in conjunction with oversampling can help counterbalance the challenges brought about by class imbalance, I used the combination of financial and organizational risk factors in subset 4 above to train the three different algorithms with the 1:many dataset.
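Before turning to those results, note that the fold-wise resampling procedure described above can be expressed compactly with an imbalanced-learn pipeline, which fits SMOTE only on the training portion of each cross-validation iteration. This is a sketch under the assumption that such a pipeline reproduces the manual procedure; the random forest hyperparameters follow Table 7 and X_train, y_train are illustrative names:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# The pipeline applies SMOTE only when fitting on the training folds, so no
# synthetic observations leak into the validation fold of any iteration.
pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("rf", RandomForestClassifier(
        n_estimators=183, min_samples_split=4, min_samples_leaf=3,
        max_leaf_nodes=48, max_features=5, max_depth=110,
    )),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_auc = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="roc_auc")
print(cv_auc.mean())
```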
The same performance evaluation metrics as in the previous sections are reported in Table 8. Feature selection improved the random forest algorithm's ability to identify fraud cases from 19% to 59%. However, this improvement came at the cost of precision (from being correct 25% of the time when predicting fraud to 12%). In other words, it generated many more false positives, as the other algorithms did in the previous section with over-sampling alone. Specificity also declined for the random forest algorithm, from being able to identify 98% of the non-fraud cases to 84%. Nonetheless, overall accuracy and the ability to distinguish between classes are still superior to the other two models. With a more parsimonious subset of input features, the neural network experienced slight improvements in overall accuracy, precision, specificity, and AUC scores. However, sensitivity to fraud cases decreased from 62% to 53%, and precision only improved marginally (from 6% to 8%). Finally, pruning some features appeared to slightly worsen the classification performance of logistic regression. Note that for equal comparisons, I have not changed the classification thresholds to optimize sensitivity or precision, as each algorithm will have a different optimal threshold given its ROC curve. Further optimization of these algorithms for applications will be discussed in the following chapter.

Table 8. Results with SMOTE and Feature Selection

Model specification — Random Forest: n_estimators: 277, min_samples_split: 3, min_samples_leaf: 2, max_features: 3, max_depth: 10
Model specification — Neural Network: hidden layer 1 (neurons: 18, activation function: softmax); hidden layer 2 (neurons: 10, activation function: relu)
Model specification — Logistic Regression: n/a

                 Random Forest   Neural Network   Logistic Regression
TP               58              52               67
TN               2174            1983             1527
FP               425             616              1072
FN               41              47               32
Accuracy         0.827           0.754            0.591
Sensitivity      0.586           0.525            0.677
Precision        0.120           0.078            0.059
Specificity      0.836           0.763            0.588
AUC (CV)         0.802 (.711)    0.708 (.716)     0.650 (.632)

5.5| Key Findings

• Both the random forest and the neural network algorithms performed well above a naïve classifier in a balanced sample (research questions 1 and 4)
• Random forest outperformed logistic regression by a clear margin in terms of overall predictive accuracy and ability to distinguish between the two classes; its performance surpassed logistic regression in all metrics except sensitivity (research question 3)
• The neural network's classification ability was subpar compared to logistic regression, performing slightly worse across all metrics (research question 5)
• A random forest model with only financial statement line items as input measures yielded the highest AUC score (.865), whereas the model using financial ratios as proxy measures performed the worst (research question 3)
• A random forest model with a mixture of financial, motivational and opportunity measures yielded the highest accuracy (78%) in the classification of unseen fraud and non-fraud cases (research question 3)
• Feature importance identified several financial measures that are consistent with prior empirical studies of financial fraud (research question 3)
• Auditor independence, measured by non-audit fees, was identified as a key concept for guardianship and opportunity structures that is predictive of fraud (research question 3)
• Measures of financial distress (debt and liquidity related risk factors) rank lower in importance than measures of financial health (performance and asset based risk factors) (research question 3)
• Both machine learning algorithms and
logistic regression performed poorly on the heavily imbalanced dataset, but all improved their identification of fraud cases to above 50% with the use of an oversampling strategy (SMOTE) and a more parsimonious set of features. Random forest remained the best performing algorithm.
• The imbalanced data cautioned against relying on a single metric to evaluate a classifier for rare events
• Since the 1:1 sample demonstrated promising performance and essentially represents an under-sampling strategy, further improvement appears achievable with the help of analytic strategies specifically designed to handle imbalanced data

CHAPTER 6. DISCUSSION

Corporate financial crime research is fraught with challenges; not only is financial crime a subject that requires interdisciplinary expertise, but complex financial data are also associated with a myriad of methodological difficulties. Yet, without more cross-disciplinary, empirical research to guide regulations, or some relief for the limited resources available to enforce these crimes, there is little hope of curtailing the continued recurrence of large-scale corporate scandals. The overarching goal of this dissertation is to investigate whether some of these research and enforcement related challenges can be overcome with the help of recent developments in artificial intelligence. I sought to develop and train two machine learning algorithms and assess their ability to detect corporate financial fraud with a set of publicly accessible financial and organizational risk factors. The analyses discussed in the previous chapter showed promising results in a controlled setting, but also demonstrated that further research is required before the machine learning algorithms can be implemented as a real-world decision aid in fraud detection.

6.1| Limitations of the Present Study

Before discussing the results further, I must first point out the limitations of the current study. To identify cases of corporate financial fraud, I have primarily relied on official enforcement releases from the SEC. As discussed in Benson, Kennedy, and Logan (2016), there are no standards for systematically reporting any specific type of corporate violation. The AAERs used to identify fraud cases comprised a variety of civil, administrative, and criminal proceedings that are deemed to be related to accounting and auditing. Some AAERs pertain to a single company with multiple years of violations and multiple individuals charged; other companies have multiple AAERs across different years. The unit of analysis used in this dissertation (firm-years or filings) partially mitigated this inconsistency, but enforcement releases are by no means systematic or comprehensive. This is also part of the reason the true ratio of fraud to non-fraud cases is unknown. The sample of fraudulent filings I collected suggests a downward enforcement trend for corporate financial fraud (Chapter 4). How much of this decrease can be attributed to the deterrence effect of regulations such as the Sarbanes-Oxley Act or the Dodd-Frank Act, and how much can be attributed to the reduction in enforcement effort, is an empirical question. While I have taken the precautionary measure of checking for restatements in the non-fraud group to ensure their filings are reasonably compliant, it is possible that some corporate financial fraud went undetected and uncorrected and therefore resulted in no restatements. An uncaught fraud filing labeled as non-fraud impacts our analysis by introducing noise to the data that may be irreducible.
Fraud cases also affect our interpretation of the analysis, as the fraud sample used in this dissertation represents the likelihood that a firm has committed fraud and been prosecuted. Therefore, it is important to improve our understanding of the decision processes used to prosecute a fraud case. Our limited understanding of the fraud to non-fraud distribution also impacts our sampling effort and our evaluation of the models. As demonstrated in the previous chapter, training algorithms with a 1:1 sample and a 1:many sample yielded very different results.

Related to the reliability of corporate crime data sources is the accessibility of corporate crime risk factors. Aside from financial risk factors and motivational proxy measures (most often measured with financial ratios) that can be extracted from the 10-K, other organizational variables are not consistently reported or easily accessible. While I was able to obtain some information about the corporations' officers and auditors, much of the missing data in the dataset pertain to these organizational risk factors. Board and committee information that was once readily available through various databases is no longer available or maintained. The omission of these variables may have a substantial impact on the prediction of fraud, as they shed light on compensation/incentive and guardianship structures that may be key factors in offenders' decision processes.

One common criticism of machine learning research pertains to the lack of interpretability of the trained models. This weakness of machine learning is also its strength. In traditional statistical modeling, we assume that reality fits a certain function that can be expressed in an easily understood mathematical formula, and we use goodness of fit tests to assess that assumption. In contrast, machine learning assumes that the model that predicts best is most representative of reality, and reality may not always fit neatly into a mathematical equation (Breiman, 2001). It is because of this tradeoff between interpretability and predictive ability that I chose to compare machine learning methods with methods more commonly employed in corporate crime research. The hope is to examine whether forgoing interpretability will help improve the generalizability of our findings. However, this tradeoff is also why it is especially important for machine learning researchers to communicate what the algorithms can and cannot achieve. Results presented in Chapter 5 of this dissertation speak to prediction and prediction only; we must be careful when interpreting parts of the analysis such as feature selection and feature importance. While some of the results showed support for existing findings, we should take care not to ascribe a causal link or a relationship direction between the risk factors and corporate financial fraud. There is also a spectrum of interpretability among machine learning algorithms. I chose to test one of the most interpretable algorithms against one of the least interpretable ones to examine whether they provide different advantages in different use cases. If two algorithms were to perform similarly in the metrics we wished to prioritize, the more interpretable one would likely be more desirable.

Finally, as only publicly traded corporations have annual financial reporting requirements, by our definition of fraud in violation of Section 13(b) of the Securities Exchange Act, no privately held firms were examined. Yet, Benson et al.
(2016) pointed out that publicly held companies represent only 1% of all corporations. Although financial fraud in privately held firms may have less impact on public investors, the harm it imposes on employees, consumers and other stakeholders can be equally devastating, as exemplified by the recent case of Theranos (Straker, Nusem & Wrigley, 2021).

6.2| Discussion and Implications

6.2.1| General Discussion and Methodological Implications

In a balanced 1:1 sample, both the random forest and the neural network algorithms performed more effectively than a naïve classifier that would either classify cases at random or consistently predict only one of the classes. Random forest performed especially well across all categories, suggesting that in a controlled setting, the full set of risk factors does contain sufficient class discriminatory information to distinguish fraud cases from non-fraud cases with 75% accuracy. While the current analysis cannot answer how or why these risk factors predict fraud with this level of accuracy, it gives grounds for future theory development and testing. This is especially true given that feature engineering appeared to effectively improve the algorithm's learning ability by reducing unwanted noise. With more parsimonious subsets of input measures, random forest was able to achieve accuracy as high as 78%, showing promise for further improvement should the quality of measures improve or should more features become accessible. In comparison to logistic regression's overall performance (accuracy of 56%), the random forest algorithm may be a case where the substantial increase in predictive accuracy outweighs the concerns about interpretability. In contrast, the neural network, while performing adequately, represents the least interpretable and least effective classifier. Therefore, there is little justification for using the neural network for fraud prediction, especially since the balanced sample is not large enough to play to its strengths.

Financial accounting rules are often not as black and white as commonly believed. Much of financial accounting requires judgment and estimations that are based on assumptions agreed upon by industry standards or between management and external auditors. As such, the task of predicting corporate financial fraud is an inherently challenging endeavor. However, this also means that using official enforcement data has its merits, as it implies that the identified cases possess characteristics that make a seemingly gray area less so. That is, there must be some characteristics embedded in these fraud cases that make them unambiguously fraudulent. The question at hand is therefore whether we can capture and measure these characteristics adequately.

One of the more important findings of this research pertains to such measurement issues in the study of corporate crime. As shown in the feature selection analysis (Table 4), the random forest algorithm with the highest AUC score was trained with financial statement line items only (subset 1), and the model with the lowest AUC score was trained with the organizational proxy measures only. This discrepancy casts doubt on the validity of financial ratios as proxy measures for motivational risk factors. It is noteworthy because financial ratios are very frequently used in corporate crime research as proxy measures for a corporation's financial health or financial distress.
Yet, the adoption of financial ratios often makes little theoretical sense and, as shown in the comparisons of models trained on subsets 1, 2 and 3, also makes very little sense analytically. Financial accounting uses a double-entry system: roughly speaking, when a transaction occurs, it is recorded as a debit to one account on the financial statements and as a credit to another. This is why there exists an inherent risk of multicollinearity when using figures from financial statement line items. Financial ratios are used by accountants, auditors and analysts in the financial statement analysis process, often to assess period-to-period changes or trends. However, since each ratio is computed from two or more financial statement line items, the multicollinearity issue becomes magnified when multiple ratios are used in the same statistical model. For example, tests of corporate strain often include a ratio for financial health and another for financial distress. A popular proxy for financial health is return on assets, which is computed by dividing net income (or some form of earnings before or after extraordinary items) by total assets. A frequently used financial ratio to measure financial distress is the Altman z-score, which consists of five components (all ratios in and of themselves), two of which contain earnings/income and total assets, while the remaining three components are often included in the analysis as proxies for some other theoretical concept. Doing so essentially captures the same few financial statement line items multiple times in an analysis, while also artificially imposing a restricted range due to the nature of a quotient. More importantly, even if the use of financial ratios is theoretically appropriate and does not violate assumptions of the statistical methods used, the analysis presented in the previous chapter suggests that they do not make very good predictors of fraud. Put simply, there is little justification for the use of financial ratios when raw financial statement line items can be used just as easily and with greater classification power.

Overall, machine learning represents a helpful tool to investigate how well a set of risk factors can predict fraud. The underlying premise of machine learning techniques is that if an algorithm can predict and generalize well on unseen data, it is more likely to be reflective of reality, regardless of whether the model can be specified in a neat mathematical formula with closed-form solutions. Therefore, it provides a basis for future research to further explore how said set of risk factors is related to fraud. While more commonly used statistical models such as logistic regression can specify a function that explains the relationship between the risk factors and the fraud outcome, the results presented in this dissertation have shown that such a model may possess less predictive power. In other words, the logistic regression model, despite being more transparent, may be flawed in reflecting the true relationships between the risk factors and the fraud outcome. Statistical modeling and machine learning thus provide different utilities in the scholarly pursuit of understanding corporate crime—machine learning allows us to examine how predictive a risk factor is, whereas statistical models allow us to examine how a risk factor impacts crime.
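To make the overlap described in this section concrete, consider the two measures mentioned above. Return on assets and the widely cited original formulation of the Altman z-score (shown here only for illustration; studies differ in the exact variant used) draw repeatedly on the same line items:

\[
\text{ROA} = \frac{\text{Net Income}}{\text{Total Assets}}
\]
\[
Z = 1.2\,\frac{\text{Working Capital}}{\text{Total Assets}}
  + 1.4\,\frac{\text{Retained Earnings}}{\text{Total Assets}}
  + 3.3\,\frac{\text{EBIT}}{\text{Total Assets}}
  + 0.6\,\frac{\text{Market Value of Equity}}{\text{Total Liabilities}}
  + 1.0\,\frac{\text{Sales}}{\text{Total Assets}}
\]

Total assets appears in the denominator of ROA and of four of the five z-score components, and earnings figures appear in the numerators of both measures, so entering the two ratios into one model effectively re-enters the same underlying line items several times.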
6.2.2| Theoretical Implications

Despite the necessary precaution against overinterpreting results from feature importance, the findings do shed some light on theory and directions for future work. Given the improvement that the feature selection analysis made to the learning of the random forest algorithm, the combinations of risk factors used in the subsets warrant further theoretical investigation, especially subsets 3 and 4, where particular subsets of financial and organizational risk factors were able to increase classification accuracy. Rank-ordered feature importance for the best-performing subset 4 model identified several financial features—particularly total assets and net sales—that are consistent with existing empirical studies (e.g., Beasley, 1996; Erickson, Hanlon & Maydew, 2006). However, this consistency is limited to these risk factors being predictive of fraud and non-fraud filings; as to their causal direction, relationship signs (positive or negative) or strength, the random forest algorithm is silent.

Prior studies on corporate strain have included financial health and distress indicators, primarily measured by financial ratios (e.g., Schwartz et al., 2021). Results from the random forest classifier suggest that financial health risk factors are more class discriminatory than financial distress risk factors, which supports prior findings that fraud firms are more profitable and exhibit lower bankruptcy risk (Schwartz et al., 2021). When applying the anomie-strain perspective to corporate crime, prior studies have proposed that crime does not always arise from negative circumstances (e.g., Benson, 2010). Rather, motivations to commit fraud can arise from pressure to maintain performance and growth relative to peer firms. Risk factors relating to debt and liabilities exhibited less class discriminatory value in the random forest models than risk factors relating to sales and assets, but they are still relevant in the prediction process. This suggests possible differential mechanisms through which strain can impact a corporation's likelihood to offend. Future research should explore this more thoroughly.

With regard to opportunity risk factors, some external auditor-related measures are consistent with extant research and theory on non-independence (Davis, Soo & Trompeter, 2003). Yet, other opportunity measures such as officer change, officer duality, and auditor change were not identified as important, in contrast to previous studies (e.g., Simpson & Koper, 1997). Further investigation is necessary to determine how much data cardinality played a role in this, but it may support further inquiries on guardianship capability in the corporate setting (e.g., Chan & Gibbs, 2022). Other less commonly tested risk factors identified by the feature importance bar charts also suggest that random forest may be able to capture nonlinear relationships that may not be captured by general or generalized linear models. A sequential explanatory mixed methods study may help shed light on the mechanisms by which these risk factors relate to fraud prediction. However, truly capturing each individual risk factor's impact on prediction may require excluding each risk factor from a model in turn and comparing the differences in classification results. This is an especially labor-intensive task with machine learning, as each model needs to be tuned and cross-validated to ensure generalizability.
6.2.3| Practical Implications

Since the project is partly motivated by evaluating possible decision aid tools to combat resource constraints in corporate crime enforcement, I trained and tested the machine learning algorithms on a 1:many sample that might better reflect real-life fraud detection scenarios. Both machine learning algorithms and logistic regression trained with the heavily imbalanced sample performed poorly in the test set, but showed improvements after over-sampling with SMOTE and in conjunction with feature selection. This investigation cautioned against generalizing from a 1:1 matched sample. It also cautioned against relying on evaluation metrics without thoroughly consulting the confusion matrix. When working with imbalanced data, a classifier that heavily favors the majority class in its predictions will naturally yield unrealistically high accuracy and specificity scores. In such cases, it is more helpful to examine the precision-sensitivity tradeoff, which brings us to the importance of defining the goal for a decision aid prior to its actual implementation. If the goal were to detect new fraud cases, one might choose to optimize sensitivity when training the algorithms, being willing to accept more false positives as long as doing so helps to identify more fraud cases. However, if the concern is to conserve resources, then optimizing precision may be of higher import. The algorithms presented in Table 7 are optimized based on AUC for consistency in comparison. All three models appeared to favor sensitivity at the expense of precision. In practice, however, one would examine the ROC curve or the precision-recall curve to determine a threshold that best suits the goal of the decision aid.

As this portion of the dissertation merely represents an initial exploration of the practical application of a fraud detection algorithm, many potential solutions to the class imbalance issue have yet to be explored. For example, I have discussed the different over- and under-sampling techniques in the previous chapter. Since the 1:1 matched sample has shown promising results, an under-sampling technique or a combined sampling strategy may be explored. Aside from resampling, there are also cost-sensitive learning algorithms, which have been shown to be superior to resampling in classifying imbalanced data (McCarthy, Zabar & Weiss, 2005). These algorithms do not assume all misclassification errors to be equal. For example, in medical research, a missed diagnosis of a lethal disease is a more serious error than a false positive diagnosis. To reflect this differential consequence in classification error, each class is assigned a misclassification cost, and optimizing the algorithm becomes a matter of minimizing classification cost rather than optimizing a certain evaluation metric (He & Ma, 2013). There is also ensemble learning, which combines multiple algorithms to create better predictions. For instance, while our results have shown random forest to be a better overall classifier, logistic regression performs better at identifying fraud cases (sensitivity) in certain instances. Ensemble learning can yield better predictions by leveraging each algorithm's strengths.

6.2.4| Directions for Future Research

As this project merely signifies a first step in exploring the use of machine learning to study corporate crime, the discussion above has revealed a long list of potential future inquiries.
With regard to improving fraud prediction with machine learning, future research should consider other machine learning algorithms such as support vector machines or incorporate other ensemble or boosting methods. Future research should also explore ways to combat class imbalance in the fraud detection setting, including the use of resampling and cost-sensitive learning algorithms. Future corporate crime research should seek to examine the class discriminatory risk factors identified in this research and assess how they may differentially impact different types of corporate crime or corporations in different industries. Findings on financial distress ranking lower than financial health in feature importance also suggest the need to explore the differential mechanisms through which corporate strain affects the likelihood to offend. The analysis also identified auditor independence as a key concept of guardianship and opportunity structure that warrants further study. Finally, to improve the robustness of corporate crime research findings, we should aim to explore better measurements to capture motivational risk factors for theoretical inquiries. We should also better define and measure firm size or employ more rigorous matching techniques (such as propensity scores).

6.2.5| Conclusion

In this dissertation project, I set out to investigate whether machine learning algorithms, trained with a set of multidisciplinary risk factors, can be used in lieu of more commonly employed statistical models to detect corporate financial fraud. Overall, the results have shown random forest to be a promising algorithm for fraud classification. Although more work needs to be done for a real-world application, the fraud detection ability of the random forest algorithm surpassed that of logistic regression. Applications of algorithms have been criticized in the criminal justice realm for perpetuating biases against vulnerable populations. It is my hope that with more rigorous, transparent, and reproducible research procedures, algorithms can be used to address a crime type that is understudied, underenforced and perpetrated predominantly by powerful corporations.

APPENDICES

APPENDIX A. SEARCH TERMS & DATABASES

Search Terms. Variants of the following keywords are used to search the databases to identify risk factor research related to corporate financial fraud:
• “accounting fraud”
• “financial fraud”
• “financial reporting fraud”
• “financial statement fraud”
• “management fraud”
• “earnings management”
• “financial misstatement”
• “earnings quality”
• “audit quality”

Databases. The following databases are used to search for empirically measured and tested, organizational-level risk factors related to corporate financial fraud:
• Criminal Justice Abstracts
• ProQuest Criminal Justice
• NCJRS Abstracts, Sociological Abstracts
• ABI/INFORM Collection
• Business Source Complete
• Business Economic and Theory Collection
• Business Insights
• JSTOR
• SAGE Journals Online
• CSA Linguistics and Language Behavior
• Interdisciplinary Science Reviews
• American Journal of Sociology
• Social Science Research
• SciFinder
• Web of Science
• IEEE Transactions
• Communication Abstracts

APPENDIX B. CORPORATE FINANCIAL FRAUD RISK FACTORS Table 9. Financial Fraud Risk Research and Synthesis Risk Factor Fraud Element Explanation Empirical Research Accounts Receivable Motivation Proxy measure for liquidity Kaminski et al. (2004) to Sales Accounts Receivable Motivation Proxy measure for efficiency in Kaminski et al.
(2004) Turnover debt collection Altman Z Score Motivation Low scores indicate financial Fanning & Cogger (1998); Brazel distress; bankruptcy risk et al. (2009); Perols et al. (2011); Perols et al. (2017); Wang & Holtfreter (2012); Schwartz et al. (2021) Asset Quality Index Motivation Proportion of total assets with Beneish (1999) less certain future benefits (> 1 indicates potential for cost deferral by capitalizing) Audit Committee Opportunity Proxy for board member Klein (2003) Independence independenece Audit Committee Opportunity More effective governance Abbott et al. (2000); Uzun et al. Meetings (2004); Farber (2005) Audit Fees Opportunity Proxy measure for audit quality Ferguson et al. (2003) Auditor Change Opportunity Proxy measure for audit quality; Myers et al. (2003) new auditors are less familiar with business Auditor Tenure Opportunity Can represent both risk and Myers et al. (2003); Carcello & protective factor--complacency Nagy (2004) decrease audit quality and turnover may indicate opinion shopping; auditor knowledge of client improve audit quality 98 Table 9 (cont’d) Risk Factor Fraud Element Explanation Empirical Research Auditor is Big4 Firm Opportunity Proxy for audit quality; Big 4 Fanning and Cogger (1998); firms have more resources and are Carcello and Nagy (2004); Farber more concerned with reputation (2005) should an audit fail Board Size Opportunity Less effective governance Uzun et al. (2004) Book to Market Ratio Motivation Proxy for under- or over-valuation Carcello and Nagy (2004); Efendi for a firm et al. (2007) Cash Margin Motivation Proxy measure for financial Green & Choi (1997) health/profitability CEO Duality Opportunity CEOs who are also chairpersons/ Carcello and Nagy (2004); Uzun presidents of the board have et al. (2004); Farber (2005) potential conflict of interest in corporate governance CEO Duality or Board Opportunity Officer (non)independence Simpson & Koper (1997); Klein Independence (2003) CEO is Founder Motivation CEOs who are also founders have Dechow et al. (1996) potential conflict of interest in corporate governance CEO Turnover Opportunity Can signal internal detection of Dechow et al. (1996); Fanning & misconduct Cogger (1998); Feroz et al. (2000); Arthaud-Day et al. (2006); Simpson &Koper (2007) Change in Free Cash Motivation Proxy measure for financial Dechow et al. (1996); Dechow et Flow distress/lack of liquidity al. (2011) Change in Cash Sales Motivation High growth firm are associated Jones (1991); Fanning and Cogger with higher external pressure on (1998); Beneish (1999); Bell and achieving earning targets Carcello (2000); Erickson et al. (2006); Efendi et al. (2007); Brazel et al. (2009) 99 Table 9 (cont’d) Risk Factor Fraud Element Explanation Empirical Research Change in Non-Cash Motivation Proxy measure for risk Dechow et al. (1996) Operating Assets diversification Change in Receivables Motivation More receivables signal less cash Green & Choi (1997) flow Compensation Motivation Protective Factor--Provides Klein (2003); Uzun et al. (2004) Committee oversights on executive compensation to reduce potential conflict of interest Compensation Opportunity More oversight Uzun et al. (2004) Committee Compensation Opportunity More oversight Klein (2003) Committee Independence Days in Receivables Opportunity Revenue inflation Beneish (1997); Chen and Sennetti (2005) Debt to Equity Ratio Motivation Measures firm's financial leverage Kaminski et al. 
(2004); Fanning & Cogger (1998) Decentralized Opportunity Decentralized organizational Simpson & Koper (1997) Organization structure are more crime conducive Depreciation Index Motivation Signals upward revision of asset Beneish (1999) useful lives or change in depreciation methods Disclosure Complexity Opportunity Firms may intentionally Humphreys et al. (2011) complicate financial disclosure to conceal financial misconduct 100 Table 9 (cont’d) Risk Factor Fraud Element Explanation Empirical Research Discretionary Opportunity The unobservable portion of total Jones (1991); DeFond & (Abnormal) Accruals accruals that is subjected to Jiambalvo (1994); Dechow et al., managerial discretion (1995); Marrakchi et al., (2001); Perols and Lougee (2009); Dechow et al. (2012) Equity Compensation Motivation Officer (non)independence Gillett & Uddin (2005); Erickson Committee et al. (2006) External Board Opportunity Higher proportion of external Beasley (1996); Beasley (2000); Members board members represent more Abbott et al. (2000); Carcello and oversight and less chance for Nagy (2004); Farber (2005); collusion Crutchley et al. (2007) Financial Expert in Opportunity Protective Factor--Financial Farber (2005) Audit Committee expert in audit committee represent stricter guardianship and higher risk of fraud discovery Fixed to Total Assets Motivation Proxy for efficient management of Kaminski et al. (2004) assets In-the-Money Options Motivation Higher conflict of interest for Efendi et al. (2007) officers Industry Competition Motivation Low ratio represents higher Rasheed & Prescott (1992); competition amongst firms in the Palmer & Wiseman (1999); industry Ndofor et al. (2015) Inventory Related Opportunity There are many ways to Fanning & Cogger (1998); Measures/Ratios manipulate inventories as Summers & Sweeney (1998); management have discretion over Kaminski et al. (2004) accounting and valuation methods 101 Table 9 (cont’d) Risk Factor Fraud Element Explanation Empirical Research Inventory to Sales Motivation Proxy for efficient management of Kaminski et al. (2004) inventories Meeting/ Beating Motivation Management may experience Coleman (1987); Finney and Analyst Consensus pressure or incentives to meet/ Lesieur (1987); Kagan et al. beat analyst forecast on Gunningham et al. (2004); Perols performance & Lougee (2009) Non-Audit Fees Opportunity Proxy for auditor Frankel et al. (2002) (non)independence Officer Change Opportunity Change in top management Simpson & Koper (1997) disrupts social control Officers are Opportunity More effective governance Albrecht et al. (2018) Accountants Operating Leases Opportunity Operating leases allow firm to Dechow et al. (2012) recognize earnings early and reduce reported debt Operating Leases Opportunity Proxy for off-balance shee Krische, Sanders & Smith (2012) financing (i.e. way to front load earnings) Outside Audit Opportunity Proxy for board member Abbott et al. (2000); Crutchley wt Committee Member independenece al. (2007) Retained Earnings on Motivation Measures relianace debt or equity Dechow et al. (2011); Kaminski et Asset financing al. (2004) Return on Asset Motivation Proxy measure for financial Wang & Holtfreter (2012); health/profitability Schwartz et al. (2021) Return on Equity Motivation Financial health/growth Feroz et al. (2000), Wang & Holtfreter (2012) Sale of Stock Motivation Measures incentive to maintain Dechow et al. 
(2011); Beneish high stock prices (1999) 102 Table 9 (cont’d) Risk Factor Fraud Element Explanation Empirical Research Sales Growth Motivation High growth firm tend to Jones (1991); Green & Choi experience more external pressure (1997); Fanning & Cogger (1998); on achieving earning targets Beneish (1999); Myers et al. (2003); Bell & Carcello (2000); Lin et al. (2003); Erickson et al. (2006); Efendi et al. (2007); Brazel et al. (2009) Soft Assets Ratio Motivation Financial health; ability to Dechow et al. (2011) manage short term earnings Stock Price at Year Motivation Proxy for financial health Dechow et al. (2011) End Total Accruals to Motivation High proportion of accrual as Beneish (1997); Dechow et al. Total Assets opposed to cash are associated (1996); Beneish (1999); Lee et al. with sales (1999); Crutchley et al. (2007); Bayley & Taylor (2007) Total Fees to Public Opportunity Higher fees associated with less Frankel et al. (2002) Accounting Firms independence Unexpecter Revenue Opportunity Firms that artificially manage Perols & Lougee (2009) per Employee earning will experience inflated revenue per employee Volatile Industries Motivation Volatile industries may be more Rasheed and Prescott (1992); prone to earnings smoothing in Palmer and Wiseman (1999); order to convey a signal of Ndofor et al. (2015) stability to investors Working Capital Motivation Measure for financial liquidity Perols & Lougee (2009); Kaminski et al. (2004) 103 APPENDIX C. DESCRIPTIVE STATISTICS AND CORRELATIONS Table 10. Risk Factor Descriptives and Point-Biserial Correlations (n=760) 25 Fraud Non-Fraud Point-Biserial Risk Factor Mean SD Mean SD Corrrelation Accounts Payable 607.37 607.37 220.00 1,549.17 0.10* Audit Fees 1,980.14 1,980.14 1,488.95 3,545.03 0.07 Auditor Change 0.08 0.08 0.08 0.28 -0.01 Big Four Auditors 0.79 0.79 0.71 0.46 0.10* Book to Market Ratio 0.58 0.58 0.51 0.92 0.05 Capital Stocks 3.51 3.51 1.43 16.41 0.05 Cash and Equivalents 644.45 644.45 570.86 2,864.84 0.01 Cash Margin 0.10 0.10 -0.09 2.11 0.05 CEO Duality 0.96 0.96 0.94 0.24 0.06 Change in Cash Sales 0.24 0.24 0.16 0.95 0.04 Change in Free Cash Flows -0.03 -0.03 -0.04 0.41 0.02 Change in Non Cash Operating Assets 0.06 0.06 0.06 0.31 0.00 Change in Reveivables 0.03 0.03 0.01 0.07 0.08* Common Shares Outstanding 195.44 195.44 196.37 844.79 0.00 Cost of Goods Sold 2,881.01 2,881.01 1,521.45 9,538.42 0.07 Current Assets 1,899.70 1,899.70 1,235.21 5,863.88 0.05 Current Liabilities 1,305.78 1,305.78 672.18 3,571.74 0.08* Debt in Current Liabilities 234.33 234.33 81.54 672.91 0.09* Debt Issuance 552.61 552.61 243.11 1,601.23 0.07 Depreciation and Amortization 198.47 198.47 176.46 844.77 0.01 Depreciation Index 1.09 1.09 1.06 0.45 0.03 25 SD = Standard Deviation; Point-Biserial Correlations with Fraud measure (1=Fraud, 0=Non-Fraud); * p < .05 104 Table 10 (cont’d) Fraud Non-Fraud Point-Biserial Risk Factor Mean SD Mean SD Corrrelation Income Before Extraodinaries 106.12 106.12 279.05 1,518.99 -0.04 Income Taxes 105.17 105.17 157.42 1,096.50 -0.03 Interest Expense 116.56 116.56 30.30 101.65 0.09* Investments and Equivalents 241.92 241.92 124.19 763.61 0.06 Long Term Debt 1,531.45 1,531.45 518.36 2,076.32 0.09* Net Income -36.18 -36.18 294.80 1,567.09 -0.04 Net Sales 4,062.69 4,062.69 2,593.35 12,862.22 0.05 Nonaudit Fees 826,349.19 826,349.19 325,463.92 642,561.38 0.15* Officer Change 0.39 0.39 0.38 0.49 0.01 Property Plant Equipment 3,769.06 3,769.06 2,021.05 11,627.64 0.04 Retained Earnings 579.25 579.25 1,072.26 6,141.83 -0.03 Retained 
Earnings on Assets -0.22 -0.22 -1.88 8.35 0.14* Return on Assets 0.00 0.00 -0.13 0.91 0.10* Sale of Stock 124.06 124.06 43.56 186.60 0.04 Short Term Investments 163.41 163.41 197.82 1,166.01 -0.02 Soft Assets Ratio 0.62 0.62 0.54 0.24 0.16* Stock Price at Year End 22.21 22.21 18.57 39.78 0.06 Taxes Payable 23.23 23.23 43.32 252.27 -0.05 Total Assets 6,346.42 6,346.42 3,032.66 13,974.58 0.07 Total Common Equity 2,708.94 2,708.94 1,548.41 7,152.81 0.05 Total Fees to Public Accounting Firms 2,806.49 2,806.49 1,814.41 3,894.78 0.11* Total Inventories 395.63 395.63 228.85 1,328.49 0.06 Total Liabilities 3,529.50 3,529.50 1,462.22 7,241.99 0.09* Total Receivables 683.84 683.84 358.64 2,182.63 0.07 Working Capital 0.01 0.01 -0.01 0.15 0.09* 105 APPENDIX D. KEY TO FEATURE IMPORTANCE (FIGURE 11) • total.assets = Total Assets • net.sales = Net Sales • nonaudit.fees = Nonaudit Fees • cash.invst = Cash and Equivalents • reoa = Retained Earnings on Assets • cogs = Cost of Goods Sold • ppe = Property, Plant and Equipment • csho = Common Shares Outstanding • audit.fees = Audit Fees • soft.assets = Percentage of Soft Assets • current.assets = Current Assets • total.rec = Total Receivables • ap = Accounts Payable • total.liab = Total Liabilities • interest.exp = Interest Expense • sale.stocks = Sale of Stocks • current.liab = Current Liabilities • invt = Total Inventories • debt.cl = Debt in Current Liabilities • lt.debt = Long Term Debt • big.four = Big Four Auditor • auditor.change = Change in Auditors 106 APPENDIX E. SUPPLEMENTAL ANALYSIS Table 11. Classification Results with Years Included Random Forest Random Forest (with Logistic Regression Logistic Regression (without years) years) (without years) (with years) n estimators: 183, n estimators: 183, min samples split: 4, min samples split: 4, min samples leaf: 3, min samples leaf: 3, Model Specification n/a n/a max leaf nodes: 48, max leaf nodes: 48, max features: 5, max features: 5, max depth: 110 max depth: 110 TP 19 18 64 42 TN 2543 2549 1641 2235 FP 56 50 958 364 FN 80 81 35 57 Accuracy 0.950 0.951 0.632 0.844 Sensitivity 0.192 0.182 0.646 0.424 Precision 0.253 0.265 0.063 0.103 Specificity 0.978 0.981 0.631 0.860 AUC (CV) 0.846(.867) 0.855(.844) .659(.661) .642(.631) The above analysis shows the changes in prediction results from the random forest algorithm compared to logistic regression when the years of the relevant period are included as an input measure. Random forest showed minimal changes in classification performance, while logistic regression saw a trade-off between specificity and precision. There was an increase in accuracy but decrease in overall AUC. Since these changes are relatively small, we can conclude that year to year variations do not impact predictive power in the classification of fraud cases. 107 BIBLIOGRAPHY 108 BIBLIOGRAPHY Abbott, L. J., Parker, S., & Peters, G. F. (2004). Audit committee characteristics and restatements. Auditing: A journal of practice & theory, 23(1), 69-87. ACFE. (2018). Report to the Nations on Occupational Fraud and Abuse: 2018 Global Study on Occupational Fraud and Abuse, Association of Certified Fraud Examiners. Alexander, C. R., & Cohen, M. A. (1996). New evidence on the origins of corporate crime. Managerial and Decision Economics, 17(4), 421-435. Al-Khazali, O. M., & Zoubi, T. A. (2005). Empirical testing of different alternative proxy measures for firm size. Journal of Applied Business Research (JABR), 21(3). Barak, G. (2012). 
Theft of a nation: Wall street looting and federal regulatory colluding. Rowman & Littlefield Publishers. Barnes, G., & Hyatt, J. (2012). Using random forest risk prediction in the Philadelphia probation department. Washington, DC: National Criminal Justice Reference Service, Document NCJ241346. Baucus, M. S., & Near, J. P. (1991). Can illegal corporate behavior be predicted? An event history analysis. Academy of management Journal, 34(1), 9-36. Beasley, M. S. (1996). An empirical analysis of the relation between the board of director composition and financial statement fraud. Accounting review, 443-465. Beasley, M. S., Carcello, J. V., Hermanson, D. R., & Lapides, P. D. (2000). Fraudulent financial reporting: Consideration of industry traits and corporate governance mechanisms. Accounting horizons, 14(4), 441-454. Beccaria, C. (1764). On crimes and punishment. Trans by Paolucci. H. IN: Bobbs- Merrill. Bell, T. B., & Carcello, J. V. (2000). A decision aid for assessing the likelihood of fraudulent financial reporting. Auditing: A Journal of Practice & Theory, 19(1), 169-184. Bellman, R. (1961). Curse of dimensionality. Adaptive control processes: a guided tour. Princeton, NJ. 109 Beneish, M. D. (1999). The detection of earnings manipulation. Financial Analysts Journal, 55(5), 24-36. Benson, M. (2010). Evolutionary ecology, fraud, and the global financial crisis. In R. Rosenfeld, K. Quinet, and C. Garcia (Eds.), Contemporary Issues in Criminological Theory and Research: The Role of Social Institutions (pp. 299– 306). Belmont, CA: Wadsworth. Benson, M., Madensen T.D., & Eck, J.E. (2009). White Collar Crime from an Opportunity Perspective. In S. S. Simpson & D. Weisburd (Eds.), The criminology of white-collar crime (Vol. 228). New York, NY: Springer. Benson, M. L., & Simpson, S. S. (2009). White collar crime: An opportunity perspective. New York, NY: Routledge. Bentham,J. (1962). Principles of penal law. In John Bowring (Ed.), The Works of Jeremy Bentham (p.396). NewYork: Russell and Russell. Berk, R. (2017). An impact assessment of machine learning risk forecasts on parole board decisions and recidivism. Journal of Experimental Criminology, 13(2), 193- 216. Bernard, T. J., & Snipes, J. B. (1996). Theoretical Integration in Criminology (From Crime and Justice: A Review of Research, Volume 20, P 301-348, 1996, Michael Tonry, ed.--See NCJ-161959). Bhasin, M. L. (2013). Corporate accounting fraud: A case study of Satyam Computers Limited. Open Journal of Accounting, 2, 26-38. Bloomfield, R. J. (2002). The 'incomplete revelation hypothesis' and financial reporting. Cornell University Working Paper Bloomfield, R. (2008). Discussion of “annual report readability, current earnings, and earnings persistence”. Journal of Accounting and Economics, 45(2-3), 248- 252. Bolton, R. J., & Hand, D. J. (2002). Statistical fraud detection: A review. Statistical science, 235-249. Bozanic, Z., Dirsmith, M. W., & Huddart, S. (2012). The social constitution of regulation: The endogenization of insider trading laws. Accounting, Organizations and Society, 37(7), 461-481. 110 Braithwaite, J. (2016). In search of Donald Campbell: Mix and multimethods. Criminology & Public Policy, 15, 417. Brazel, J. F., Jones, K. L., Thayer, J., & Warne, R. C. (2015). Understanding investor perceptions of financial statement fraud and their use of red flags: Evidence from the field. Review of Accounting Studies, 20(4), 1373-1406. Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). 
Statistical science, 16(3), 199-231. Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32. Brennan, N. M., & McGrath, M. (2007). Financial statement fraud: Some lessons from US and European case studies. Australian accounting review, 17(42), 49- 61. Buller, D. B., & Burgoon, J. K. (1996). Interpersonal deception theory. Communication theory, 6(3), 203-242. Burns, N., & Kedia, S. (2006). The impact of performance-based compensation on misreporting. Journal of financial economics, 79(1), 35-67. Carcello, J. V., & Nagy, A. L. (2004). Audit firm tenure and fraudulent financial reporting. Auditing: a journal of practice & theory, 23(2), 55-69. Chan, F., & Gibbs, C. (2022). When guardians become offenders: Understanding guardian capability through the lens of corporate crime. Criminology. Chan, P. K., & Stolfo, S. J. (1998). Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection. In KDD, 98, 164-168. Chen, M. Y. (2011). Bankruptcy prediction in firms with statistical and intelligent techniques and a comparison of evolutionary computation approaches. Computers & Mathematics with Applications, 62(12), 4514-4524. Chollet, F. (2018). Keras: The python deep learning library. Astrophysics Source Code Library. Clinard, M. B., & Yeager, P. C. (2006). Corporate crime. New Brunswick, NJ. Cohan, J. A. (2002). " I didn't know" and" I was only doing my job": has corporate governance careened out of control? A case study of enron's information myopia. Journal of Business Ethics, 40(3), 275-299. 111 Cohen, L. E., & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American sociological review, 588-608. Coleman, J. W. (1987). Toward an integrated theory of white-collar crime. American journal of Sociology, 93(2), 406-439. Commers, M. S., & Vandekerckhove, W. (2004). Whistle blowing and rational loyalty. Journal of Business Ethics, 53(1), 225-233. Cressey, D. R. (1953). Other people's money; a study of the social psychology of embezzlement. New York, NY, US: Free Press. Cropanzano, R., Byrne, Z. S., Bobocel, D. R., & Rupp, D. E. (2001). Moral virtues, fairness heuristics, social entities, and other denizens of organizational justice. Journal of vocational behavior, 58(2), 164-209. Dang, C., Li, Z. F., & Yang, C. (2018). Measuring firm size in empirical corporate finance. Journal of banking & finance, 86, 159-176. Davis, L. R., Soo, B. S., & Trompeter, G. M. (2007). Auditor tenure and the ability to meet or beat earnings forecasts. Contemporary Accounting Research, 26(2), 517- 548. Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1996). Causes and consequences of earnings manipulation: An analysis of firms subject to enforcement actions by the SEC. Contemporary accounting research, 13(1), 1-36. Dechow, P. M., Hutton, A. P., Kim, J. H., & Sloan, R. G. (2012). Detecting earnings management: A new approach. Journal of accounting research, 50(2), 275-334. Duwe, G., & Kim, K. (2016). Sacrificing accuracy for transparency in recidivism risk assessment: The impact of classification method on predictive performance. Corrections, 1(3), 155-176. Dyck, I. J., Morse, A., & Zingales, L. (2013). How pervasive is corporate fraud? Rotman School of Management Working Paper, (2222608). Efendi, J., Srivastava, A., & Swanson, E. P. (2007). Why do corporate managers misstate financial statements? The role of option compensation and other factors. Journal of financial economics, 85(3), 667-708. 
Erickson, M., Hanlon, M., & Maydew, E. L. (2006). Is there a link between executive equity incentives and accounting fraud? Journal of Accounting Research, 44(1), 113-143.
Fanning, K. M., & Cogger, K. O. (1998). Neural network detection of management fraud using published financial data. Intelligent Systems in Accounting, Finance & Management, 7(1), 21-41.
Farber, D. B. (2005). Restoring trust after fraud: Does corporate governance matter? The Accounting Review, 80(2), 539-561.
Farrington, D. P. (2000). Explaining and preventing crime: The globalization of knowledge—The American Society of Criminology 1999 presidential address. Criminology, 38(1), 1-24.
Farrington, D. P. (2005). Integrated developmental and life-course theories of offending. New Brunswick, NJ: Transaction Publishers.
FBI. (2018). White-collar crime. US Department of Justice. Retrieved from https://www.fbi.gov/investigate/white-collar-crime
Feroz, E. H., Park, K., & Pastena, V. S. (1991). The financial and market effects of the SEC's accounting and auditing enforcement releases. Journal of Accounting Research, 29, 107-142.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Grömping, U. (2009). Variable importance assessment in regression: Linear regression versus random forest. The American Statistician, 63(4), 308-319.
Gross, E. (1980). Organizational structure and organizational crime. In White collar crime: Theory and research (pp. 52-76).
Hartshorn, S. (2016). Machine learning with random forests and decision trees: A visual guide for beginners. Kindle Edition.
Hastie, T. J. (2017). Generalized additive models. In Statistical models in S (pp. 249-307). Routledge.
He, H., & Ma, Y. (Eds.). (2013). Imbalanced learning: Foundations, algorithms, and applications.
Hill, C. W., Kelley, P. C., Agle, B. R., Hitt, M. A., & Hoskisson, R. E. (1992). An empirical examination of the causes of corporate wrongdoing in the United States. Human Relations, 45(10), 1055-1076.
Holtfreter, K., Van Slyke, S., Bratton, J., & Gertz, M. (2008). Public perceptions of white-collar crime and punishment. Journal of Criminal Justice, 36(1), 50-60.
Humpherys, S. L., Moffitt, K. C., Burns, M. B., Burgoon, J. K., & Felix, W. F. (2011). Identification of fraudulent financial statements using linguistic credibility analysis. Decision Support Systems, 50(3), 585-594.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
Karpoff, J. M., Koester, A., Lee, D. S., & Martin, G. S. (2017). Proxies and databases in financial misconduct research. The Accounting Review, 92(6), 129-163.
Koh, K., Matsumoto, D. A., & Rajgopal, S. (2008). Meeting or beating analyst expectations in the post-scandals world: Changes in stock market rewards and managerial actions. Contemporary Accounting Research, 25(4), 1067-1098.
Kramer, R. M. (1999). Trust and distrust in organizations: Emerging perspectives, enduring questions. Annual Review of Psychology, 50(1), 569-598.
Kranacher, M. J., Riley, R. A., Jr., & Wells, J. T. (2011). Forensic accounting and fraud examination. New York: Wiley.
Lin, J. W., Hwang, M. I., & Becker, J. D. (2003). A fuzzy neural network for assessing the risk of fraudulent financial reporting. Managerial Auditing Journal, 18(8), 657-665.
Liu, C., Chan, Y., Alam Kazmi, S. H., & Fu, H. (2015). Financial fraud detection model based on random forest. International Journal of Economics and Finance, 7(7).
Lewicki, R. J., Wiethoff, C., & Tomlinson, E. C. (2005). What is the role of trust in organizational justice? In Handbook of organizational justice (pp. 247-270).
McCarthy, K., Zabar, B., & Weiss, G. (2005, August). Does cost-sensitive learning beat sampling for classifying rare classes? In Proceedings of the 1st international workshop on utility-based data mining (pp. 69-77).
McCornack, S. A. (1992). Information manipulation theory. Communication Monographs, 59(1), 1-16.
McKendall, M. A., & Wagner, J. A., III (1997). Motive, opportunity, choice, and corporate illegality. Organization Science, 8(6), 624-647.
Mesmer-Magnus, J. R., & Viswesvaran, C. (2005). Whistleblowing in organizations: An examination of correlates of whistleblowing intentions, actions, and retaliation. Journal of Business Ethics, 62(3), 277-297.
Müller, A. C., & Guido, S. (2016). Introduction to machine learning with Python: A guide for data scientists. O'Reilly Media.
Myers, J. N., Myers, L. A., & Omer, T. C. (2003). Exploring the term of the auditor-client relationship and the quality of earnings: A case for mandatory auditor rotation? The Accounting Review, 78(3), 779-799.
Neuilly, M. A., Zgoba, K. M., Tita, G. E., & Lee, S. S. (2011). Predicting recidivism in homicide offenders using classification tree analysis. Homicide Studies, 15(2), 154-176.
Paternoster, R. (2016). Deterring corporate crime: Evidence and outlook. Criminology & Public Policy, 15, 383.
Perols, J. (2011). Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory, 30(2), 19-50.
Perols, J. L., Bowen, R. M., Zimmermann, C., & Samba, B. (2016). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92(2), 221-245.
Perols, J. L., & Lougee, B. A. (2011). The relation between earnings management and financial statement fraud. Advances in Accounting, 27(1), 39-53.
Pflueger, M. O., Franke, I., Graf, M., & Hachtel, H. (2015). Predicting general criminal recidivism in mentally disordered offenders using a random forest approach. BMC Psychiatry, 15(1), 62.
Poppo, L., & Zenger, T. (2002). Do formal contracts and relational governance function as substitutes or complements? Strategic Management Journal, 23(8), 707-725.
Prechel, H., & Morris, T. (2010). The effects of organizational and political embeddedness on financial malfeasance in the largest US corporations: Dependence, incentives, and opportunities. American Sociological Review, 75(3), 331-354.
PwC. (2018). Global economic crime and fraud survey. London, U.K.: PwC.
Rigano, C. (2018). Using artificial intelligence to address criminal justice needs. National Institute of Justice Journal, 280, 37-46.
Ritter, N. (2013). Predicting recidivism risk: New tool in Philadelphia shows great promise. National Institute of Justice Journal, 271, 4-13.
Rorie, M., Alper, M., Schell-Busey, N., & Simpson, S. S. (2018). Using meta-analysis under conditions of definitional ambiguity: The case of corporate crime. Criminal Justice Studies, 31(1), 38-61.
Salehi, M., & Fard, F. Z. (2013). Data mining approach to prediction of going concern using classification and regression tree (CART). Global Journal of Management and Business Research Accounting and Auditing, 13.
Schell-Busey, N., Simpson, S. S., Rorie, M., & Alper, M. (2016). What works? A systematic review of corporate crime deterrence. Criminology & Public Policy, 15(2), 387-416.
Schuchter, A., & Levi, M. (2016). The fraud triangle revisited. Security Journal, 29(2), 107-121.
Seifert, D. L., Sweeney, J. T., Joireman, J., & Thornton, J. M. (2010). The influence of organizational justice on accountant whistleblowing. Accounting, Organizations and Society, 35(7), 707-717.
Shadish, W., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Shover, N., & Hochstetler, A. (2005). Choosing white-collar crime. Cambridge University Press.
Simpson, S. S. (2013). White-collar crime: A review of recent developments and promising directions for future research. Annual Review of Sociology, 39, 309-331.
Simpson, S. S., Garner, J., & Gibbs, C. (2007). Why do corporations obey environmental law? National Institute of Justice. Washington, DC: US Department of Justice.
Simpson, S. S., & Koper, C. S. (1997). The changing of the guard: Top management characteristics, organizational strain, and antitrust offending. Journal of Quantitative Criminology, 13(4), 373-404.
Simpson, S., Rorie, M., Alper, M. E., Schell-Busey, N., Laufer, W., & Smith, N. C. (2014). Corporate crime deterrence: A systematic review. Campbell Systematic Reviews, 10(4).
Simpson, S. S., & Yeager, P. C. (2015). Building a comprehensive white-collar violations data system. Washington, DC: Bureau of Justice Statistics.
SEC. (2018). Accounting and Auditing Enforcement Release No. 4012. Retrieved from https://www.sec.gov/litigation/admin/2018/33-10601.pdf
Schwartz, J., Steffensmeier, D., Moser, W. J., & Beltz, L. (2020). Financial prominence and financial conditions: Risk factors for 21st century corporate financial securities fraud in the United States. Justice Quarterly, 1-29.
Smith, J. R., Tiras, S. L., & Vichitlekarn, S. S. (2000). The interaction between internal control assessment and substantive testing in audits for fraud. Contemporary Accounting Research, 17(2), 327-356.
Sullivan, W. (2017). Machine learning for beginners guide algorithms: Supervised & unsupervised learning. Decision tree & random forest introduction. Healthy Pragmatic Solutions Inc.
Throckmorton, C. S., Mayew, W. J., Venkatachalam, M., & Collins, L. M. (2015). Financial fraud detection using vocal, linguistic and financial cues. Decision Support Systems, 74, 78-87.
Trompeter, G. M., Carpenter, T. D., Desai, N., Jones, K. L., & Riley, R. A., Jr. (2012). A synthesis of fraud-related research. Auditing: A Journal of Practice & Theory, 32(sp1), 287-321.
Trompeter, G. M., Carpenter, T. D., Jones, K. L., & Riley, R. A., Jr. (2014). Insights for research and practice: What we learn about fraud from other disciplines. Accounting Horizons, 28(4), 769-804.
Unnever, J. D., Benson, M. L., & Cullen, F. T. (2008). Public support for getting tough on corporate crime: Racial and political divides. Journal of Research in Crime and Delinquency, 45(2), 163-190.
Vaughan, D. (2005). Organizational rituals of risk and error. In Organizational encounters with risk (pp. 33-66).
Wang, X., & Holtfreter, K. (2012). The effects of corporation- and industry-level strain and opportunity on corporate crime. Journal of Research in Crime and Delinquency, 49(2), 151-185.
Wolfe, D. T., & Hermanson, D. R. (2004). The fraud diamond: Considering the four elements of fraud. The CPA Journal, 74(12), 38-42.
Woodcock, R. A. (2019). The antitrust case for consumer primacy in corporate governance. UC Irvine Law Review, 10, 1395.
Young, R. (2013). The role of organizational justice as a predictor of intent to comply with internal disclosure policies. Journal of Accounting and Finance, 13(6), 29-44.
Zajac, E. J., & Olsen, C. P. (1993). From transaction cost to transactional value analysis: Implications for the study of interorganizational strategies. Journal of Management Studies, 30(1), 131-145.
Zuckerman, M., DePaulo, B. M., & Rosenthal, R. (1981). Verbal and nonverbal communication of deception. In Advances in experimental social psychology (Vol. 14, pp. 1-59). Academic Press.