This is to certify that the dissertation entitled THE CONCEPTUAL MODELING AND AUTOMATED USE OF RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE presented by STEPHEN RAYMOND ROCKWELL has been accepted towards fulfillment of the requirements for the Ph.D. degree in Accounting. Major professor: William E. McCarthy. MSU is an Affirmative Action/Equal Opportunity Institution.

THE CONCEPTUAL MODELING AND AUTOMATED USE OF RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE

by

STEPHEN RAYMOND ROCKWELL

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Accounting

1992

ABSTRACT

THE CONCEPTUAL MODELING AND AUTOMATED USE OF RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE

by

STEPHEN RAYMOND ROCKWELL

This thesis evaluates the use of domain-specific knowledge in the automation of the database design task of conceptual modeling.
The primary research task was to examine how that knowledge could be used to resolve problems in the modeling task of view integration. Another task was to explore the acquisition and use of reconstructive domain knowledge. A knowledge-based view integration system, REAVIEWS, was created for those purposes. It is the first knowledge-based system to use domain-specific theory to structure view integration. Industry-specific accounting knowledge was reconstructed from the Encyclopedia of Accounting Systems. Knowledge of first-order accounting principles was represented by means of the REA accounting model. REAVIEWS demonstrates that such domain knowledge adds problem-solving ability. This thesis discusses the design and implementation of REAVIEWS, including the acquisition, modeling, and use of domain knowledge for problem resolution. It also offers insight into the process of modeling knowledge originally compiled into a form less suitable for use in knowledge-based systems.

Copyright by STEPHEN RAYMOND ROCKWELL 1992

To Shere and Kate, with love

ACKNOWLEDGEMENTS

I would like to thank Severin Grabski and Jon Sticklen for the help and insight they provided in this research and in my scholarly pursuits at Michigan State University. Words seem small recompense for the countless hours of discussion, debate, guidance, support, and friendship (and did I mention patience?) provided by William McCarthy during this whole process. I am deeply indebted to Bill for all his assistance and inspiration. The greatest debt of all is owed to my loving wife Shere and daughter Kate. Without their love, cheerfulness, and limitless patience, this work would not have been possible.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1. Introduction
Chapter 2. Accounting Database Systems and Conceptual Modeling
2.1 Accounting Database Systems
2.2 Semantic Data Models
2.2.1 Conceptual Modeling
2.2.2 REA modeling
2.2.3 Problems in View Integration
2.2.4 Expert Systems for Conceptual Modeling
2.3 Integration Example
2.4 Integration Strategies
2.5 Knowledge-Based Modeling Systems
2.6 Conflict Resolution Using Accounting Knowledge
2.7 Accounting Knowledge Structured as E-R Templates
2.8 Overview of the Integration Process
Chapter 3. Accounting Domain Knowledge in View Integration
3.1 Levels of Accounting Domain Knowledge
3.2 Sources of Accounting Domain Knowledge
3.2.1 Principles Level Knowledge
3.2.2 Industry Level Knowledge
3.2.3 Company Level Knowledge
3.3 Conceptual Modeling of Principles Level Knowledge
3.4 Conceptual Modeling of Industry Level Knowledge
3.5 Overall Knowledge Acquisition Strategy
3.5.1 Narrative
3.5.2 Chart of Accounts
3.5.3 Documents
3.5.4 Organization Chart
3.6 Selection of Entity Keys
3.7 Composite-Key Entities
3.8 Relationships and Structural Constraints
3.9 Relationships in the EAS
3.9.1 Assigning Structural Constraints
3.10 Modeling Compromise
3.11 Integration Conflict Resolution
3.11.1 Basic Problem Solving Concepts
3.12 View Integration Strategies
3.12.1 Initial Schema Processing
3.12.2 Entity Identification
3.12.3 Relationship Identification
3.13 View Conflict Recognition
3.13.1 Homonyms
3.13.2 Synonyms
3.13.3 Type Conflicts
3.13.4 Structural Constraint Conflicts
3.13.5 Key Conflicts
Chapter 4. The REAVIEWS System
4.1 Knowledge Structures within REAVIEWS
4.1.1 Frame-based Knowledge Representations
4.1.2 Declarative Structures for Accounting Domain Knowledge
4.1.3 Procedural Structures for Accounting Domain Knowledge
4.1.4 Structures for View Integration Knowledge
4.2 Test Cases for REAVIEWS
4.3 Inputs to REAVIEWS
4.4 View Integration Session
4.4.1 Produce-Sales-Analysis View
4.4.2 Update-Work-in-Process View
4.4.3 Record-Payment View
4.4.4 Record-Payroll View
4.5 Outputs from REAVIEWS
4.6 Software Environment
Chapter 5. Summary and Contributions
5.1 Limits of Scope for REAVIEWS
5.2 Research Context and Justification
5.3 Contributions
5.4 Future Research Directions
5.5 Final Conclusions
APPENDIX: Major View Integration Processes in REAVIEWS
LIST OF REFERENCES

LIST OF TABLES

Table 2-1 Integration Conflicts
Table 2-2 Integration Strategies

LIST OF FIGURES

Figure 2-1 User Schema
Figure 2-2 The General REA Template
Figure 2-3 Three Partial User Views
Figure 2-4 Interschema Properties
Figure 2-5 Kinds of Knowledge That Can Go into a KBS
Figure 2-6 Enterprise Schema
Figure 3-1 Partially Instantiated REA Template
Figure 3-2 Partial Chart of Accounts: Assets
Figure 3-3 Partial Chart of Accounts: Marketing Expense
Figure 3-4 "Composite-Key" Entities
Figure 3-5 "Flow-Budget" and "Stock-Flow" Relationships
Figure 3-6 EAS-derived Structural Constraints
Figure 3-7 Vendor Service Entity
Figure 3-8 Flowchart of REAVIEWS's Integration Strategy
Figure 3-9 Expansion of Foreign-Key Attribute
Figure 4-1 Partial Structure of Entity Frame
Figure 4-2 Partial Entity Hierarchy
Figure 4-3 User View: Produce-Sales-Analysis
Figure 4-4 User View: Update-Work-In-Process
Figure 4-5 User View: Record-Payment
Figure 4-6 User View: Record-Payroll
Figure 4-7 REAVIEWS: Main Screen
Figure 4-8 REAVIEWS: "Candidate Entity" Screen
Figure 4-9 REAVIEWS: Notification of Foreign-Key Expansion
Figure 4-10 REAVIEWS: Request for Company-Level Knowledge
Figure 4-11 Partial Output from REAVIEWS's Session
Figure 4-12 Partial Schema Produced from REAVIEWS Output
Figure 5-1 Scope of Pilot System
Figure 5-2 The REACH System

Chapter 1. Introduction

Research in the design of accounting information systems (AIS) has been heavily influenced by work in the areas of events accounting and conceptual database modeling.
Events accounting approaches (Sorter 1969; Colantoni, Manes, and Whinston 1971; Everest and Weber 1977; McCarthy 1979, 1982) support the construction of accounting systems that record information about economic events in a disaggregate, multidimensional format. Conceptual database modeling theories are concerned with the development of high-level, global descriptions of databases called conceptual schemas. To aid that development, researchers have developed various semantic models, such as the entity-relationship (E-R) model of Chen (1976). McCarthy (1979, 1982) combined the work of Sorter and Chen with other accounting theory, such as Ijiri (1975) and Mattessich (1964), to produce the REA framework for the design of accounting database systems. REA theory provides the advantages of events accounting and conceptual modeling while incorporating important principles from the traditional "value" approach to accounting.

Analysts usually construct the conceptual schemas from smaller models of individual user applications, referred to as user views, or user schemas. These schemas are combined in a process known as view integration. This process is extremely complex, as analysts must attempt to satisfy the competing data and processing requirements of the people and applications that will eventually use the database. Accurately representing and combining the varied needs and perspectives of the individual users makes conceptual modeling one of the most difficult and challenging activities in the system design process. As such, this activity has generated considerable research interest.

Recently, some of this research has been directed toward the development of knowledge-based systems (KBS) to aid in the conceptual modeling task (see, for example, Mattos and Michels 1989 and Reiner et al. 1987). These systems serve several purposes, among them: 1. they add to our understanding of the conceptual modeling process; 2.
they help validate the various theories about conceptual models; and 3. they aid in the transfer of expertise in the modeling task itself.

While a number of systems of varying complexity have been developed in both academic and commercial settings, in general they rely heavily upon the user during the view integration phase. To a greater or lesser extent, these systems "fail" when attempting to resolve some of the problems of view integration and must "ask" the user to do much of the work. This project suggests that much of this failure results from the general nature of the knowledge underlying most of these systems. The knowledge embedded in these systems comes almost exclusively from the domains of conceptual modeling and database theory, with little or no formal representation of knowledge from the domain for which the database is being designed.

This thesis further proposes that some of the view integration problems can be resolved by the use of specific knowledge from the application domain. This suggests that expert systems for conceptual modeling can be made more robust by incorporating in them the expertise of someone familiar with the application domain. For example, an expert system for modeling accounting database systems could contain accounting knowledge about the particular type of business being modeled. This would be similar to having an experienced accountant "looking over the shoulder" of the system user during the modeling process. Thus, when encountering view integration conflicts that cannot be resolved with generic modeling expertise, the system could access the more specific accounting knowledge for suggestions and insights. This contrasts with existing systems, which must stop and ask the user to supply this domain knowledge and, in effect, resolve integration problems with little guidance from the system itself. For this project, we designed a KBS (named REAVIEWS) for schema integration using such domain knowledge.
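The division of labor just described (generic modeling rules first, accounting knowledge next, the user last) can be pictured as a chain of resolvers. The Python sketch below is only an illustration of that idea, not the design of REAVIEWS: the entity representation (name plus primary key), the rule names `generic_rule` and `domain_rule`, and the `DOMAIN_SYNONYMS` table are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: an entity as it appears in one user view,
# reduced to (name, primary-key attribute).
Entity = Tuple[str, str]
Conflict = Tuple[Entity, Entity]
Resolver = Callable[[Conflict], Optional[str]]


def generic_rule(c: Conflict) -> Optional[str]:
    """Domain-independent check: same name and same key -> merge."""
    (name_a, key_a), (name_b, key_b) = c
    if name_a == name_b and key_a == key_b:
        return f"merge {name_a}"
    return None  # generic modeling knowledge cannot decide


# Hypothetical industry-level knowledge: pairs of names that denote the
# same concept for the kind of business being modeled.
DOMAIN_SYNONYMS = {frozenset({"Finished Goods", "Inventory"})}


def domain_rule(c: Conflict) -> Optional[str]:
    """Accounting-domain check: known synonyms -> merge under one concept."""
    (name_a, _), (name_b, _) = c
    if frozenset({name_a, name_b}) in DOMAIN_SYNONYMS:
        return f"merge {name_a} and {name_b} as synonyms"
    return None


def resolve(c: Conflict, resolvers: List[Resolver],
            ask_user: Callable[[Conflict], str]) -> str:
    """Try each resolver in order; fall back to the user only if all fail."""
    for rule in resolvers:
        answer = rule(c)
        if answer is not None:
            return answer
    return ask_user(c)


if __name__ == "__main__":
    rules = [generic_rule, domain_rule]

    def ask(c: Conflict) -> str:
        return f"ask user about {c[0][0]} / {c[1][0]}"

    print(resolve((("Sale", "inv#"), ("Sale", "inv#")), rules, ask))
    print(resolve((("Finished Goods", "item#"), ("Inventory", "item#")), rules, ask))
    print(resolve((("Employee", "emp#"), ("Employee", "ssn")), rules, ask))
```

Only the last conflict reaches the user, because neither the generic rule nor the (tiny) domain table says anything about it; a richer domain knowledge base would shrink that residue further.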
Before that knowledge could be embedded in the KBS, it had to be acquired from the expert. Reconstructive methods of reasoning (Johnson 1983) explicate domain knowledge of facts and procedures that can be codified and used by knowledge-based systems. These methods are useful for unearthing expertise that has been "compiled" into a form that is missing some of the procedural or declarative knowledge originally possessed by the expert. Much of the accounting knowledge used in REAVIEWS is this type of reconstructive expertise.

The remainder of this thesis is organized as follows. In Chapter 2, we explain the process and problems of view modeling and integration in more detail (including a brief discussion of the REA model). We also review associated research from accounting and computer science. Chapter 3 explains our use of accounting domain knowledge in solving integration problems. Chapter 4 contains an overview of the REAVIEWS prototype. The final chapter includes a brief summary; a discussion of the research limitations, justification, and contributions; and some recommendations for future research suggested by this work.

Chapter 2. Accounting Database Systems and Conceptual Modeling

2.1 Accounting Database Systems

From the 1960s on, we have witnessed a rapid expansion in the use of computerized data processing in companies of all types. These technological advances have provided accountants with the opportunity to overcome some of the weaknesses of traditional double-entry, chart-of-accounts-based accounting systems. McCarthy (1982) explained some of these weaknesses of a conventional accounting system (as identified by two American Accounting Association research committees):
1. Its dimensions are limited.
2. Its classification schemes are not always appropriate.
3. Its aggregation level for stored information is too high.
4. Its degree of integration with the other functional areas of an enterprise is too restricted.
In an effort to overcome these weaknesses, a number of accounting researchers advocated the design and use of accounting systems that record information about economic events in a disaggregate, multidimensional format. Sorter (1969) referred to this as the "events" approach to accounting, as contrasted with traditional accounting theory, which he labeled the "value" approach. One advantage of the events approach is that its less-aggregate data can always produce the more aggregate, value-oriented reports when desired. One problem with the events approach is that it can require significantly more information to be recorded and used in the production of reports. The advent of computerized accounting systems offered some solutions to that problem. In particular, many researchers saw the database approach to information processing and management as an appropriate vehicle for implementing the events theories in accounting information systems.

Accounting system designers, regardless of their acceptance of the events theories, have recognized the advantages of capturing data beyond that recorded using traditional value approaches. The accounting databases of today have moved away from purely chart-of-accounts-based systems and routinely capture the type of disaggregate information called for by events proponents. Events researchers provided a theoretical foundation for recording information in this fashion. Accounting database designers, whether or not they were using these theories, have moved steadily in the events direction.

Researchers have developed various accounting system models that take advantage of the database approach. Some of these systems were based upon the primary data models used in database implementations, namely, the hierarchical, network, and relational models. Those using the hierarchical database model included Colantoni, Manes, and Whinston (1971), Lieberman and Whinston (1975), and Haseman and Whinston (1976).
Haseman and Whinston (1977) used the network data model, and Everest and Weber (1977) used the relational data model. McCarthy (1979, 1982) used an events accounting approach as the basis for his model of an accounting database system. For his data model, however, he chose the E-R semantic model rather than one of the primary models. The following section discusses semantic data models in more detail and explains how they are used in the database design task of conceptual modeling.

2.2 Semantic Data Models

The E-R model is only one of a number of semantic data models that have been proposed by researchers (e.g., Smith and Smith 1977; Mylopoulos, Bernstein, and Wong 1980; Shipman 1981; Hammer and McLeod 1981; Hull and King 1987). Semantic data models offer more powerful tools than do the primary data models for representing the domain of interest in a database management system (DBMS). The semantic models allow one to construct an abstract, high-level specification of the data underlying a database implementation. This specification, called a conceptual schema, describes the data in terms of the real-world entities (and relationships among those entities) that are modeled by the database. Modeling the database in those terms facilitates communication between the user groups and the design team, making it easier to construct databases that capture the important concepts in the application domain.

The conceptual schema is the intermediate level of the three-schema framework for database design (Tsichritzis and Klug 1978). The other levels consist of the external (user) schemas, which describe the database from the user application perspective, and the internal (physical) schema, which describes the physical layout of the data as implemented in a particular DBMS. The three-schema framework allows the user views to refer only to the conceptual schema and thus remain independent of the physical storage structures of the database implementation.
The user and conceptual schemas view the database at the level of the real-world concepts (or objects) modeled by the database. Those objects are classified as entities, relationships among the entities, or attributes (which describe the entities or relationships). The tasks involved in creating those schemas are referred to as conceptual data modeling. Because of its power and simplicity, the E-R model is perhaps the best known and most widely used tool in the data modeling process. Teorey et al. (1986) refer to the E-R model as the "premier model for conceptual design." For those reasons, a version of the E-R model was the main conceptual modeling system used in this project.

In E-R modeling, entities are the real-world objects we wish to model and are drawn as rectangles in the view models. Diamonds represent relationships that exist between those entities. Entities can be further characterized with attributes that are used to describe and identify actual instances of the entities. Figure 2-1 shows one example of a user schema (also called a user view). In that view, the entity Sale has the attributes inv. #, sale amount, and customer. It is shown in a relationship with Inventory. Attributes with a solid circle are primary-key attributes and uniquely identify instances of their respective entities. In this example, if we have an invoice number, we can identify one particular sales event and the values for its other attributes, such as sale amount. We can also identify the various inventory items that were sold in that particular sales event. Non-key attributes are not necessarily unique identifiers. For example, several different sales may have the same sale date, so sale date cannot be used to identify a specific sales event. For simplicity, the figures show only selected attributes; there are a number of other attributes that normally would be present in the complete user schemas.
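To make the key/non-key distinction concrete, here is a minimal Python sketch of an entity type whose primary-key values must be unique, while non-key attributes such as sale date may repeat. The class `EntityType` and the attribute spellings (`inv_no` standing in for the figure's inv. #) are illustrative assumptions; an actual E-R design tool would record far more metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntityType:
    """An E-R entity type with a single primary-key attribute."""
    name: str
    attributes: List[str]          # all attributes, including the key
    key: str                       # primary-key attribute (solid circle)
    instances: List[Dict[str, object]] = field(default_factory=list)

    def add(self, **values: object) -> None:
        """Add an instance, enforcing that the key identifies it uniquely."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"{self.name}: unknown attributes {sorted(unknown)}")
        if self.key not in values:
            raise ValueError(f"{self.name}: missing key {self.key}")
        if self.lookup(values[self.key]) is not None:
            raise ValueError(f"{self.name}: duplicate {self.key} {values[self.key]}")
        self.instances.append(dict(values))

    def lookup(self, key_value: object) -> Optional[Dict[str, object]]:
        """A key value identifies at most one instance."""
        for inst in self.instances:
            if inst[self.key] == key_value:
                return inst
        return None


if __name__ == "__main__":
    sale = EntityType("Sale", ["inv_no", "sale_amount", "customer", "sale_date"],
                      key="inv_no")
    sale.add(inv_no=101, sale_amount=250.0, customer="Acme", sale_date="1992-03-02")
    # A second sale on the same date is fine: sale_date is a non-key attribute.
    sale.add(inv_no=102, sale_amount=90.0, customer="Baker", sale_date="1992-03-02")
    print(sale.lookup(102)["customer"])  # Baker
    try:
        sale.add(inv_no=101, sale_amount=10.0)  # reuses an existing key value
    except ValueError as err:
        print(err)
```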
[Figure 2-1 User Schema: entities Sales Person, Sale, and Inventory]

The numbers and letters to each side of a relationship indicate maximum cardinalities for the entities participating in that relationship. The relationship between Sales Person and Sale is a "one-to-many" relationship (frequently written as 1-N). The maximum cardinalities tell us that a single instance of the entity Sales Person may participate in many sales events, but a particular instance of the entity Sale will be associated with one, and only one, sales person. The terms one-to-one (1-1), one-to-many (1-N), and many-to-many (M-N) are referred to collectively as cardinality ratios.

2.2.1 Conceptual Modeling. Regardless of the choice of data model, analysts generally follow the three-phase modeling process recommended by Lum et al. (1979). In the first phase, called requirements analysis, analysts gather data from a variety of sources to identify the organization's information needs and determine how the proposed database system will meet those needs. At this phase, analysts must identify and specify the following items (McCarthy 1982, p. 557):
1. the processes (and decisions) that use data;
2. the various data elements themselves and their patterns of usage across processes; and
3. the various organizational constraints on data use.

For each process identified during this phase, the analyst will prepare a list of data elements. Each list can be thought of as one view of the database, taken from the perspective of the associated process. This view is referred to as a user view. During view modeling (the second phase in the modeling process), analysts convert the lists of data elements into one of the many data models that have been developed for this purpose. Semantic data models are currently the modeling vehicles of choice for most analysts.
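Maximum cardinalities can be read as constraints on the instances of a relationship. The sketch below assumes a relationship is stored as a list of (left instance, right instance) pairs, with Sales Person on the left and Sale on the right, and flags any instance that takes part in more links than a 1-1, 1-N, or M-N ratio allows. The `RATIOS` table and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Maximum cardinalities implied by each ratio, read as
# (most right-hand partners per left-hand instance,
#  most left-hand partners per right-hand instance); None means "many".
RATIOS: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "1-1": (1, 1),
    "1-N": (None, 1),
    "M-N": (None, None),
}


def cardinality_violations(
    links: Iterable[Tuple[Hashable, Hashable]], ratio: str
) -> List[str]:
    """Report instances that participate in more links than the ratio allows."""
    per_left, per_right = RATIOS[ratio]
    pairs = list(links)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    problems: List[str] = []
    if per_left is not None:
        problems += [f"{l} has {n} partners"
                     for l, n in left_counts.items() if n > per_left]
    if per_right is not None:
        problems += [f"{r} has {n} partners"
                     for r, n in right_counts.items() if n > per_right]
    return problems


if __name__ == "__main__":
    # Sales Person (left) makes Sale (right): one sales person, many sales.
    made_by = [("Ann", 101), ("Ann", 102), ("Bob", 103)]
    print(cardinality_violations(made_by, "1-N"))                   # []
    print(cardinality_violations(made_by + [("Bob", 101)], "1-N"))  # ['101 has 2 partners']
```

Under the 1-N reading, Ann may appear in many links, but sale 101 may not be credited to two sales persons.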
At the conclusion of the view modeling phase, the analyst has a number of individual user views that must be combined into one global data model during the final design phase, view integration. Batini et al. (1986, p. 326) explain why user views are produced independently and why differences between views may occur:
1. The structure of the database for large applications (organizations) is too complex to be modeled by a single designer in a single view.
2. User groups typically operate independently in organizations and have their own requirements and expectations of data, which may conflict with those of other user groups.

2.2.2 REA modeling. E-R modeling constructs are used for the conceptual data modeling underlying the construction of REAVIEWS, the prototype knowledge-based system developed for this thesis. These modeling techniques are applied in a very specific theoretical framework for accounting database design, as described in McCarthy (1982). The REA theory of data modeling uses E-R constructs to implement basic accounting principles in the process of accounting database design. As such, it provided an acceptable method of incorporating accounting domain knowledge into REAVIEWS. It was also an important part of the problem-resolution process. Knowledge of important accounting principles helps us recognize the theoretically "more-correct" choice of modeling construct in some integration conflict situations. This is discussed at greater length in subsequent sections.

Figure 2-2 shows a graphical representation of the major modeling constructs in the REA accounting model. That figure is drawn using the same E-R diagrammatic techniques as Figure 2-1, and it illustrates what we refer to as an REA template. A major premise of REA accounting theory is that a complete accounting data model of any economic event will include all of the elements shown in the REA template.
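A check of whether a view fully specifies the template for a given event can be sketched as follows. The sketch assumes the template elements as we read them in McCarthy (1982): an economic event linked to an economic resource (stock-flow), to economic agents (control), and to a dual economic event (duality). Those relationship labels, the dictionary representation, and the function name are assumptions made for illustration, and a flagged gap may reflect a deliberate modeling compromise rather than an error.

```python
from typing import Dict, Iterable, List, Tuple

RESOURCE, EVENT, AGENT = "resource", "event", "agent"

# Elements each economic event is checked for, labeled with the REA
# relationship that would connect it (our reading of McCarthy 1982).
REQUIRED = [
    (RESOURCE, "economic resource (stock-flow)"),
    (AGENT, "economic agent (control)"),
    (EVENT, "dual economic event (duality)"),
]


def missing_rea_elements(
    event: str, category: Dict[str, str], links: Iterable[Tuple[str, str]]
) -> List[str]:
    """List the template elements the given event is not yet connected to."""
    neighbours = set()
    for a, b in links:           # links are undirected entity pairs
        if a == event:
            neighbours.add(b)
        elif b == event:
            neighbours.add(a)
    kinds = {category.get(n) for n in neighbours}
    return [label for kind, label in REQUIRED if kind not in kinds]


if __name__ == "__main__":
    category = {"Sale": EVENT, "Inventory": RESOURCE,
                "Sales Person": AGENT, "Cash Receipt": EVENT}
    sales_view = [("Inventory", "Sale"), ("Sale", "Sales Person")]
    print(missing_rea_elements("Sale", category, sales_view))
    # ['dual economic event (duality)']
    print(missing_rea_elements("Sale", category,
                               sales_view + [("Sale", "Cash Receipt")]))  # []
```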
As more fully explained in McCarthy (1982), there are a number of instances in which we may construct data models that incompletely specify the REA templates for certain events. Such instances may arise when they provide us with important system efficiencies or when accounting convention allows a less than complete specification of certain events. We discuss some of these "compromises" to the basic REA template later. For now, it is sufficient to know we can identify and model the major components of the REA template for most of the economic events of a firm.

[Figure 2-2 The General REA Template: economic resource, economic event, and economic agent; adapted from Denna and McCarthy (1987)]

2.2.3 Problems in View Integration. Producing a global view of an organization's data resources provides benefits to system designers and users alike. Sowa (1984) states that the "common aspect that unifies all groups [of system design specialists] is a knowledge of the meaning of the data and the constraints necessary to keep it a faithful model of the real world" (p. 303). The conceptual schema provides that knowledge and can be just as valuable to the users of the system. A global data schema helps decision makers understand how the various data files are related, and it potentially affords them a better understanding of the total data resources available in the firm.

As mentioned previously, the integration of the independent user views becomes problematic due to the structural and semantic diversities that arise among designers when modeling concepts that are common to multiple user views. Researchers have proposed several causes for these diversities. The following causes are summarized from a comparative study of integration methodologies (Batini et al.
1986):
1. Different perspectives among user groups or designers: the same concept or relationship may be given different names in different schemas, or the same relationship may be modeled directly in one schema and indirectly in another.
2. Equivalence among constructs of the model: different combinations of constructs may still provide equivalent models of the application domain; e.g., a concept may be represented as an attribute of another concept in one schema, while another schema shows both as entities connected by a relationship.
3. Incompatible design specifications: choices made concerning names, types, integrity constraints, etc. in one schema may be incompatible with choices made in another schema.

In addition to identifying these causes of conflicts, Batini et al. found that the types of conflicts that occurred could be grouped into two main areas: naming conflicts and structural conflicts. Naming conflicts are of two varieties: homonyms, where two different entities are given the same name; and synonyms, where one entity is given different names in different schemas. Structural conflicts are conflicts between entity types, dependency constraints, keys, or insertion/deletion policies. Table 2-1 provides more complete descriptions of these types of conflicts.

There are frequently relationships, referred to as interschema properties, between two different sets of objects that reside in different schemas. These interschema properties may not be evident when viewing the database from any individual user schema, and therefore they must be added at the view integration stage. Specific examples of conflicts and interschema properties are provided later.

Table 2-1 Integration Conflicts [definitions from: Batini et al. (1986)]

Type         Conflict              Description
Naming       Homonym               The same name is used for two different concepts, giving rise to inconsistency unless detected. For example, merging two entities of this type in the integrated schema would result in producing a single entity for two conceptually distinct objects.
Naming       Synonym               The same concept is described by two or more names. Keeping each name modeled as a distinct entity in the integrated schema would result in modeling a single object by means of multiple entities.
Structural   Type conflict         The same concept is represented by different modeling constructs in different schemas. For example, a class of objects may be represented as an entity in one schema and as an attribute in another schema.
Structural   Dependency conflict   A group of concepts is related among themselves with different dependencies in different schemas. For example, a relationship between two entities may be shown as 1:1 in one schema, but m:n in another schema.
Structural   Key conflict          The same concept is assigned different keys in different schemas. For example, SS# and Emp_id may be the keys of Employee in two component schemas.
Structural   Behavioral conflict   The same class of objects is assigned different insertion/deletion policies in distinct schemas. For example, in one schema a department may be allowed to exist without employees, whereas in another, deleting the last employee associated with a department leads to the deletion of the department itself.

2.2.4 Expert Systems for Conceptual Modeling. Conceptual modeling may be viewed as an expert task: the performance of experts is significantly better than that of novices (e.g., Goldstein and Storey 1989). As such, researchers have attempted to model the task with expert systems (ES), with varying levels of success. Examples of such systems include DDEW (Reiner et al. 1987), EDDS (Choobineh et al. 1988), CHRIS (Furtado et al. 1988), Modeller (Tauzovich 1989), KRISYS (Mattos and Michels 1989), and Pasta-3 (Kuntz and Melchert 1989).
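The naming and key conflicts of Table 2-1 mark the point where purely generic rules run out. The sketch below is a hypothetical heuristic, not the method of any system cited above: it compares two views by entity name, primary key, and shared non-key attributes. For a same-named pair with different keys it can only report "homonym or key conflict", since deciding between the two requires knowledge of what the entities mean in the application domain.

```python
from typing import Dict, FrozenSet, List, Tuple

# A user view maps entity name -> (primary key, non-key attributes).
View = Dict[str, Tuple[str, FrozenSet[str]]]


def compare_views(a: View, b: View) -> List[Tuple[str, str, str]]:
    """Flag candidate conflicts using only names, keys, and attribute overlap."""
    findings: List[Tuple[str, str, str]] = []
    for name_a, (key_a, attrs_a) in a.items():
        for name_b, (key_b, attrs_b) in b.items():
            if name_a == name_b and key_a == key_b:
                findings.append(("merge candidate", name_a, name_b))
            elif name_a == name_b:
                # Same name, different key: a homonym (two concepts) or a key
                # conflict (one concept, two keys). Generic rules cannot tell.
                findings.append(("homonym or key conflict", name_a, name_b))
            elif key_a == key_b or attrs_a & attrs_b:
                # Different names but a shared key or attributes: possible synonym.
                findings.append(("synonym candidate", name_a, name_b))
    return findings


if __name__ == "__main__":
    view_a = {
        "Inventory": ("item_no", frozenset({"description", "qty_on_hand"})),
        "Employee": ("emp_no", frozenset({"name"})),
    }
    view_b = {
        "Finished Goods": ("item_no", frozenset({"description"})),
        "Employee": ("ssn", frozenset({"skill"})),
    }
    for finding in compare_views(view_a, view_b):
        print(finding)
```

Attribute overlap is a crude signal: two genuinely different entities can share an attribute name such as `description`, which is another place where domain knowledge would have to arbitrate.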
Nearly all of the systems developed to date have been general purpose in nature; they have been designed to operate in multiple modeling domains. For example, the same tool might be used to model databases for an automotive manufacturer and a large university. In the view integration stage, these systems follow a number of conflict resolution strategies that are also generic in nature.

One conclusion that can be drawn from this body of research is that conceptual modeling requires significant amounts of knowledge from both the application domain and the field of conceptual modeling itself. The knowledge-based systems developed to date, however, have been imbued primarily with the latter type of knowledge. This is perhaps the result of the search for generality that has been a consistent theme in artificial intelligence (AI) research. John McCarthy, a pioneer in the field of AI, notes that lack of generality has long plagued AI programs and believes that "the problem of generality in artificial intelligence (AI) is almost as unsolved as ever" (McCarthy 1987, p. 1030).

Regardless of the reasons for the preference for domain-independent conceptual modeling ESs, they all share similar problems at view integration time. These problems are generally resolved manually by the system user, drawing on his or her application domain knowledge. This being the case, it appears that conceptual modeling systems can be made more robust by embedding this application-specific knowledge within them. While this approach loses some of the oft-sought generality, many researchers view loss of generality not as a problem, but as a key to the development of some knowledge-based systems.

ES development in the 1960s and 1970s provided the insight that the power of an ES is derived from the specific knowledge it possesses, not from the particular formalisms and inference schemes it employs. In short, an expert's knowledge per se seems both necessary and nearly sufficient to develop an expert system.
    (Turban 1990, 425)

That is the view taken in this thesis. A knowledge-based system is being developed to explore and test the earlier proposal that additional types of information are needed in the view integration process. Chapter 3 examines the potential improvement to view integration systems provided by the use of more domain-specific knowledge.

2.3 Integration Example

The discussion of integration processes and problems is easier to follow with a few simple user views as illustration. Figure 2-3 shows three user views: 2-3a is the view model of a sales transaction originally shown in Figure 2-1; 2-3b is a model of the issuance of a paycheck to an employee; and 2-3c is a model of a labor operation in which a worker completes a work-in-process job that is subsequently transferred to finished goods. The user views were synthesized from transactions in Armitage (1985), in which he provided a detailed example of database design for an actual firm in the machine-shop industry.

[Figure 2-3: Three Partial User Views (a. Sales schema; b. "Payday" schema; c. "Build-it" schema)]

User schemas are typically developed from formal specifications of the information needs of the users. A variety of methodologies have been developed for analyzing an organization's information requirements and producing that formal specification. The approach used in this project was developed by McCarthy, Rockwell, and Armitage (1989). It is a synthesis of structured analysis, as described in Gane and Sarson (1979), DeMarco (1979), and Yourdon (1989), and database design, as described in Lum et al. (1979) and Teorey and Fry (1982).

As presented, the three schemas in Figure 2-3 have a number of conflicts. The two Employee entities are homonyms; the entities Finished Goods and Inventory are synonyms; and there are two interschema properties that are not identified in the individual user views. We next look at some of the methods that have been developed for handling such conflicts.
These examples are meant for illustration; they are not the actual cases developed for testing REAVIEWS.

2.4 Integration Strategies

After producing the user views, analysts face the problem of integrating them into a global schema. In a comparative review of integration methodologies, Batini et al. (1986) found that integration strategies could be grouped into two primary classifications: binary, in which two schemas are integrated at a time; and n-ary, in which n schemas (n > 2) are integrated at one time. Binary strategies could be further classified as ladder or balanced, while n-ary strategies could be divided into one-shot or iterative. Definitions for these terms are found in Table 2-2, as are graphical depictions of the strategies.

The ladder strategy is used in this project, as it offers the following advantages to the integration process:

1. the integration task is simplified compared to n-ary strategies, as integration complexity increases with the number of schemas integrated at one time (although the number of integration operations is greater than with n-ary strategies);
2. there is more control over the order of integration of individual user views; and
3. at each stage, the intermediate schema can be given a higher importance in settling integration conflicts.

Table 2-2 Integration Strategies [adapted from: Batini et al. (1986)]

Binary strategies
  Ladder: A new component schema is integrated with an existing intermediate result at each step.
  Balanced: Schemas are divided into pairs at the start and are integrated in a symmetric fashion.
n-ary strategies
  One-shot: The schemas are all integrated in a single step.
  Iterative: Any n-ary strategy other than one-shot.

The advantages of task simplification are obvious. The ability to group the schemas together (e.g., by accounting cycle) offers the analyst some of the advantages of n-ary strategies.
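The two binary strategies in Table 2-2 differ only in the order of their pairwise merges, which a short sketch can show. It assumes a deliberately simplified schema representation, a dict from entity name to key attribute. It also uses a hypothetical `integrate` step that resolves key conflicts in favor of its first argument, echoing the third advantage above. Real view integration does far more at each step. The function names and key attributes are illustrative only.

```python
from functools import reduce


def integrate(intermediate, component, log):
    """Hypothetical binary merge of two {entity: key attribute} schemas.

    On a key conflict the first argument wins, mirroring the idea of giving
    the intermediate schema higher importance at each ladder step.
    """
    log.append((tuple(intermediate), tuple(component)))
    return {**component, **intermediate}


def ladder(schemas, log):
    """Integrate each new component schema with the intermediate result."""
    return reduce(lambda acc, s: integrate(acc, s, log), schemas[1:], schemas[0])


def balanced(schemas, log):
    """Pair the schemas up and integrate them symmetrically, level by level."""
    level = list(schemas)
    while len(level) > 1:
        merged = [integrate(level[i], level[i + 1], log)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired schema moves up to the next level
            merged.append(level[-1])
        level = merged
    return level[0]


if __name__ == "__main__":
    # Key conflict in the style of Table 2-1: Employee keyed by Emp_id vs SS#.
    views = [{"Employee": "Emp_id"}, {"Employee": "SS#", "Cash": "account_no"},
             {"Sale": "invoice_no"}, {"Customer": "customer_no"}]
    log = []
    result = ladder(views, log)
    print(len(log), result["Employee"])  # 3 Emp_id
```

Both binary strategies perform n - 1 pairwise integrations for n schemas, where a one-shot strategy performs a single, larger one. The ladder version also fixes the order in which user views enter, which is the control the second advantage describes.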
The third advantage, giving higher "weight" to the intermediate schema, is in fact part of the problem-solving strategy followed by the KBS being designed for this thesis. That strategy will be explained after a brief discussion of integration within existing conceptual modeling systems.

2.5 Knowledge-Based Modeling Systems

Current E-R modeling systems range from passive drawing tools to knowledge-based assistants that embody some form of conceptual modeling expertise. The systems discussed here reside toward the knowledge-based end of the spectrum. The list is not all-inclusive, but it is representative of the state of research at this time.

DDEW (Reiner et al. 1987) finds objects that appear similar (e.g., have the same names) and lets the user decide whether to merge them. Modeller (Tauzovich 1989) can identify conflicts in user-supplied assertions, but the system itself does not resolve them; it merely refuses to accept new information until that information is conflict-free. This is a common form of conflict resolution, also used by EDDS (Choobineh et al. 1988), CHRIS (Furtado et al. 1988), and Pasta-3 (Kuntz and Melchert 1989). The KRISYS system (Mattos and Michels 1989) allows analysts to design very complex conceptual models, but it still refers conflicts back to the analyst for solution. That solution usually consists of adding new application domain knowledge to the system.

Some researchers have suggested conflict resolution strategies not tied to a specific conceptual modeling KBS. Teorey et al. (1986) provide a number of rules for distinguishing between entity types and attributes, but these are also generic, and in a specific context one might need to consult an expert in the application domain to resolve conflicts. Roussopoulos and Yeh (1984) suggest some rules of thumb for identifying entities, properties, and relationships, but they admit they have no algorithmic methods for this task.
Indeed, many of their rules of thumb require the analyst to use knowledge of the application domain.

The systems and methodologies just mentioned do help resolve, or at least identify, some common integration conflicts, but they still leave much of the conflict resolution to the analyst. Using their more general modeling methods, most of these systems could not identify the potential conflicts between the two Employee entities or between the Inventory and Finished Goods entities. Further, none would be in a position to provide the analyst with the theoretically "correct" solution. The best they could do is ask the analyst whether the entities are in fact the same. If the analyst confirmed that they were identical, the systems would then rely on the analyst to decide which attributes the entity should have.

2.6 Conflict Resolution Using Accounting Knowledge

During the integration process, conflict resolution could be facilitated if the analyst had the services of an experienced accountant who possessed knowledge of the application domain. In Figure 2-3, the entity Employee appears in two user views. The analyst might easily assume that the two views are modeling exactly the same entity. The analyst might further assume that the entity Salesperson in the first schema is also the same entity (making Salesperson and Employee synonyms). An experienced accountant would know that the entities are being viewed at different levels of abstraction. In the Sales and Build-it schemas, the employees modeled are subtypes, or specializations, of the more abstract entity in the Payday schema. This type of relationship is called a generalization relationship (Smith and Smith 1977). It is also one of the enterprise's interschema properties that are not apparent in the individual user views of Figure 2-3.

The abstract concept of duality (McCarthy 1982) is another interschema property that is not usually apparent in user views.
Simply stated, duality is the idea that changes in an enterprise's resource set generally occur in pairs; i.e., each increment to resources is accompanied by a related decrement to resources. A labor operation (an increment to the work-in-process resource) will usually have an associated disbursement (a decrement to the cash resource). Figure 2-4 depicts the Payday and Build-it schemas and the interschema relationships just described. The cloud surrounds the two relationships that are not apparent at the user view level.

[Figure 2-4: Interschema Properties. Entities shown include Employee, Disbursement, Cash, Labor Operation, Work-in-Process, and Finished Goods; the interschema relationships, including the "pays for" (duality) relationship, are enclosed in a cloud.]

In an example used previously, we saw that the individual user views do not identify the entities Finished Goods and Inventory as synonyms. Our experienced accountant knows that the finished goods from the conversion accounting cycle are the inventory items that are sold in the sales accounting cycle. From these few examples, we see how the "extra" accounting knowledge can give the analyst guidance beyond that supplied by current modeling systems.

2.7 Accounting Knowledge Structured as E-R Templates

Many types of domain knowledge can go into a KBS. Fikes and Kehler (1985) presented one classification scheme for domain knowledge, dividing such knowledge into the eleven types shown in Figure 2-5.

[Figure 2-5: Kinds of Knowledge That Can Go into a KBS [adapted from: Fikes and Kehler (1985)]. Legible categories include objects and relationships, vocabulary definitions, behavior descriptions, heuristics, decision rules, and disjunctive facts.]

Years of experience in a particular industry help an accountant build a mental model of accounting systems for businesses in that industry. We can think of this model as a sort of template, one that could conceivably contain all of the types of knowledge shown in Figure 2-5. If we model this knowledge in entity-relationship terms, we can visualize it as a semi-generic E-R template of a firm. The E-R schema would model the accountant's knowledge about the entities and relationships that exist in a typical business situation in that industry. Along with the schema, we would need to model the vocabulary definitions, decision rules, constraints, heuristics, and other facts that the accountant knows about this type of firm.

The following example demonstrates how we can use this type of knowledge in the integration of the three user views. Figure 2-6 shows a portion of an E-R schema for a firm in the machine-shop industry. The schema contains four interschema properties: two duality relationships and two generalization relationships. The actual construction of this schema is discussed later; the main focus here is the use of this schema in view integration.

Minsky (1975) theorized that people faced with new situations try to fit current perceptions to some pre-existing memory structure, which he called a frame. A similar process can aid us in view integration: new user views are "fitted" against existing mental templates to help understand the user views and reconcile conflicts. No claim is made here that this is the actual mental process that occurs when one performs the integration task, but personal experience and discussions with experienced data modelers suggest this may be so. The enterprise schema in Figure 2-6 may now be thought of as a pre-existing "template" to which the user views in Figure 2-3 will be compared.

[Figure 2-6: Enterprise Schema (legible labels include Sale, IS-A, and Inventory)]

2.8 Overview of the Integration Process

Step one in the integration consists of adding the Sales schema to our enterprise schema. Remember that we are using the ladder integration strategy.
In the initial integration, the enterprise schema can be thought of as the intermediate schema. We "know" that the entity set labeled Inventory in the Sales schema is the same as Finished Goods Inventory in the enterprise schema. We keep track of these "extra" names as "aliases," so that users of the finished schema can find the enterprise schema entities that correspond to the entities in their user views. We next add attributes from the user view to the intermediate schema. While adding the attributes, we would notice the attribute called "customer" associated with the Sale entity. We would know that a customer is most often viewed as an entity rather than an attribute, and we would therefore (tentatively) change customer to an entity.

Integration of the Payday schema proceeds in a similar fashion, with no conflicts to resolve. The two interschema relationships shown in Figure 2-4 are already present in the enterprise schema. They present no conflict with the user view, as these interschema relationships are not modeled in that view.

The Build-it schema presents a number of interesting examples. The renaming of entities occurs here, as it did in the Sales schema, but we face a new problem when attempting to add the attributes to our intermediate schema: some of the attributes are also synonyms, e.g., descr./description and qoh/quantity on hand. Our experienced accountant knows the abbreviations used here (part of the vocabulary definition type of knowledge). If an attribute appears that is not in the current vocabulary, the accountant can use knowledge of the descriptive roles played by attributes in the firm prototype to help identify the attribute. For example, if the qoh attribute for Inventory had been labeled "no." instead, the accountant would still be able to identify it if "no." was described as representing the number of items of the product currently on hand. The accountant would then know that "no." was a synonym for quantity on hand.
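The integration steps just described can be sketched as a few lookups against the enterprise template. These steps are recording aliases for renamed entities, resolving attribute abbreviations from a vocabulary, falling back on an attribute's descriptive role, and tentatively promoting an attribute such as customer to an entity. The tables below are hypothetical examples assembled from the names in the text, not REAVIEWS's knowledge base. The role match is a plain string comparison, a simplification of the accountant's judgment.

```python
# Hypothetical knowledge for the machine-shop example; not REAVIEWS's actual tables.
ENTITY_ALIASES = {
    "Inventory": "Finished Goods Inventory",
    "Finished Goods": "Finished Goods Inventory",
}
ATTRIBUTE_VOCABULARY = {"descr.": "description", "qoh": "quantity on hand"}
ATTRIBUTE_ROLES = {
    "quantity on hand": "number of items of the product currently on hand",
    "description": "text describing the product",
}
TEMPLATE_ENTITIES = {"Customer", "Salesperson", "Employee"}


def resolve_entity(name, aliases):
    """Map a user-view entity name to its template name, recording any alias."""
    canonical = ENTITY_ALIASES.get(name, name)
    if canonical != name:
        aliases.setdefault(canonical, set()).add(name)
    return canonical


def resolve_attribute(name, role=None):
    """Return ("entity", name) for tentative promotion, else ("attribute", name)."""
    as_entity = name.capitalize()
    if as_entity in TEMPLATE_ENTITIES:
        # e.g. "customer" is usually an entity, so tentatively promote it.
        return ("entity", as_entity)
    if name in ATTRIBUTE_VOCABULARY:
        # Vocabulary definitions: known abbreviations such as "qoh".
        return ("attribute", ATTRIBUTE_VOCABULARY[name])
    for canonical, described in ATTRIBUTE_ROLES.items():
        # Unknown name: fall back on the attribute's descriptive role.
        if role is not None and role == described:
            return ("attribute", canonical)
    return ("attribute", name)  # no match: keep it, or refer it to the analyst


if __name__ == "__main__":
    aliases = {}
    print(resolve_entity("Inventory", aliases))  # Finished Goods Inventory
    print(resolve_attribute("qoh"))              # ('attribute', 'quantity on hand')
    print(resolve_attribute("customer"))         # ('entity', 'Customer')
```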
Integration continues in this fashion until all of the user views have been added.

This chapter discussed some of the issues and research in conceptual modeling and view integration. It also provided a relatively high-level overview of the view integration process. In Chapter 3, we provide a deeper analysis of the role played by domain knowledge in view integration conflict resolution. We also discuss the acquisition, modeling, and use of reconstructive knowledge for this project.

Chapter 3. Accounting Domain Knowledge in View Integration

In earlier chapters, we suggested that adding domain knowledge to the view integration process increases our ability to resolve integration conflicts. Some simple examples were given there, but a more complete explanation of such knowledge is in order before we can discuss how to use it in view integration.

Much of the discussion in this chapter is intentionally at a relatively high level of abstraction. The implementation details in Chapter 4 provide a much lower level of analysis, and understanding that material should be aided by beginning with the higher-level exposition here.

3.1 Levels of Accounting Domain Knowledge

For the purposes of this project, it is helpful to think of accounting domain knowledge as classifiable into three levels, which we call the principles, industry, and company levels of knowledge. This allows us to consider separately the various sources and types of knowledge embodied in REAVIEWS and to gain some insight into how each might be used in our problem solving.

At the highest (i.e., most general) level lies principles knowledge. This level consists of the basic accounting and business concepts that must be considered in the creation of accounting systems. Included in this category are concepts such as duality, control, accountability, economic resources and events, and stock-flows.
At the principles level lies much of the knowledge of how businesses operate and of the "real-world" objects about which accountants capture information. Accounting systems must allow for the representation, either explicit or implicit, of those concepts.

At a less general plane lies industry knowledge: knowledge about the objects specific to groups of businesses within given industries. For example, at the principles level we understand the concept of economic resources as scarce objects of utility controlled by the business enterprise, but at the industry level we understand the specific types of resources common to business enterprises in a particular industry. Likewise, the principles level view of economic events includes the notion that such events reflect changes in a company's resource set resulting from certain activities, such as production or sales. The industry level view would include an understanding of the particular types of exchange activities in which companies in a particular industry participate (such as installment sales). Knowledge at this industry level can help us determine more of the specific objects (and relationships among those objects) we would expect to be present in a typical company in that industry.

The lowest level of accounting knowledge in our classification scheme is the company level, which consists of knowledge about the objects of interest to one specific company. This level would be concerned with the specific business policies and terms used by the company for which we are producing the conceptual schema. As some of these policies and terms vary among individual businesses, it follows that some of the conflicts encountered in view integration might best be resolved by using company-specific knowledge.
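One way to picture the three levels is as layered lookup tables consulted from most specific to most general. This is a minimal sketch under two assumptions that the text does not state: company-level facts take precedence over industry facts, which in turn take precedence over principles. It also anticipates the acquisition policy of section 3.2.3, where a company fact is asked of the user at first need and then kept only for that enterprise. The class `KnowledgeLevels`, the `ask_user` callback, and the sample facts are all hypothetical.

```python
from collections import ChainMap


class KnowledgeLevels:
    """Consult company, then industry, then principles level knowledge.

    Most-specific-first precedence is an assumption of this sketch.
    """

    def __init__(self, principles, industry, ask_user):
        self.company = {}          # company-level facts for this enterprise only
        self.ask_user = ask_user   # hypothetical callback that queries the analyst
        # ChainMap searches its mappings left to right.
        self.view = ChainMap(self.company, industry, principles)

    def lookup(self, question, company_specific=False):
        if question in self.view:
            return self.view[question]
        if company_specific:
            # Acquire the fact at the time of initial need and keep it for
            # later conflicts in this enterprise's integration.
            self.company[question] = self.ask_user(question)
            return self.company[question]
        raise KeyError(question)


if __name__ == "__main__":
    principles = {"a sale is paired with": "an increment to another resource"}
    industry = {"machine-shop sales are made on": "a bid or quote basis"}
    # Canned answer standing in for a real dialogue with the analyst.
    kb = KnowledgeLevels(principles, industry,
                         ask_user=lambda q: "one salesperson per customer")
    print(kb.lookup("machine-shop sales are made on"))  # a bid or quote basis
    print(kb.lookup("customer assignment policy", company_specific=True))
```

Because each `KnowledgeLevels` instance starts with an empty company table, facts acquired for one enterprise never leak into another enterprise's integration.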
3.2 Sources of Accounting Domain Knowledge

The three-level knowledge classification described above developed out of earlier work (McCarthy and Rockwell 1989), which examined the use of various types of knowledge in the process of designing accounting information systems. The company-specific level of knowledge was not addressed in that work, and some of the terminology used here is new, but sources for the acquisition of principles and industry level knowledge were discussed there. We next discuss the sources of the three levels of knowledge used by REAVIEWS. After that, we describe the transformation of this knowledge into a conceptual form that can be used for problem solving in view integration activities.

3.2.1 Principles Level Knowledge. The basic accounting concepts and principles that underlie modern accounting systems have been studied, explained, and formalized by a number of accounting theorists, particularly in the last several decades. Some of these principles relate directly to the economic phenomena that we attempt to model in conceptual schemas for database design. An example is the principle of duality, which recognizes that each increment in an enterprise's resource set can be linked to a corresponding decrement.1 Other principles relate to concepts that are artifacts of particular methods of storing and transmitting data about those economic phenomena. An example of this nature is the prescription that, in journal entries and trial balances, debits must equal credits.

In designing accounting systems following a database approach, we need an accounting model free of these artifactual concepts. REA accounting theory provides just such a model. It is a first-order theory of accounting expressed in terms and structures compatible with the semantic modeling of accounting systems for use in shared data environments (i.e., database systems).
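Because duality pairs increments with decrements, it lends itself to a simple mechanical check over an REA-style model. The sketch below records, for each economic event, the resource it changes and whether it is a transfer-in (increment) or transfer-out (decrement). It then reports events that lack a duality link to an event of the opposite direction. The class and function names are hypothetical, and so is the Cash Receipt event. The Labor Operation/Disbursement pair follows the "pays for" example of Figure 2-4. The matching-of-expenses relaxation mentioned in footnote 1 is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EconomicEvent:
    name: str
    resource: str
    increment: bool  # True: transfer-in (increment); False: transfer-out (decrement)


def unpaired_events(events, duality):
    """Return the names of events lacking a duality link to an opposite event.

    `duality` is a set of (event name, event name) pairs. A link counts only
    if it joins an increment event to a decrement event. Exceptions such as
    expenses handled by matching are not modeled here.
    """
    by_name = {event.name: event for event in events}
    paired = set()
    for first, second in duality:
        if by_name[first].increment != by_name[second].increment:
            paired.update((first, second))
    return sorted(event.name for event in events if event.name not in paired)


if __name__ == "__main__":
    events = [
        EconomicEvent("Labor Operation", "Work-in-Process", increment=True),
        EconomicEvent("Disbursement", "Cash", increment=False),
        EconomicEvent("Sale", "Finished Goods", increment=False),
        EconomicEvent("Cash Receipt", "Cash", increment=True),  # hypothetical
    ]
    pays_for = {("Disbursement", "Labor Operation")}
    print(unpaired_events(events, pays_for))  # ['Cash Receipt', 'Sale']
```

An unpaired event in such a check is a prompt for the missing interschema relationship, which is the kind of inference the principles level is meant to supply.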
Because the declarative and procedural aspects of the REA model are expressed in terms of both the accounting and the conceptual database design domains, the model is an inherently useful vehicle for providing high-level accounting knowledge to the task of view integration.

1 As presented in McCarthy (1982), the concept of duality is much richer than the brief explanation here. It includes the notions that an increment must be a member of an event set different from the event set of its matching decrement, that one event set will be that of transferring-in events and the other the set of transferring-out events, and that the accounting practice of matching expenses allows the relaxation of the duality constraint for certain events for which direct linking to increment/decrement events is either undesirable or not possible.

3.2.2 Industry Level Knowledge. At this level, we are dealing with more specific knowledge than at the principles level. As suggested in McCarthy and Rockwell (1989), this type of knowledge would be part of the accumulated expertise of an accountant who had experience designing accounting systems for businesses in a particular industry. To be used in a knowledge-based system such as REAVIEWS, this knowledge must first be elicited from that expert.

Hoffman (1989) surveyed methods used by researchers to elicit experts' knowledge and reasoning strategies. He classified the methods into three categories: task analysis, interview techniques, and special tasks. The latter two elicit expertise directly from the experts. The data for task analysis can be acquired from experts indirectly (e.g., from documentation such as training manuals) or directly (as in the protocol analysis of Ericsson and Simon (1984)). Expertise within documentation can be placed on a continuum from direct representation to indirect representation.
At the direct end of the continuum, expertise is provided in a form close to that used in task performance. One example would be a training manual that provides the detailed procedural and declarative knowledge used in some task, along with heuristics for using that knowledge. Indirect representations are those requiring significant interpretation and reconstruction because those more direct forms of knowledge have been compiled.

Expertise is most commonly acquired directly from experts, and numerous knowledge-based systems have been developed with most or all of their domain knowledge acquired in this fashion. As we have proposed, part of the motivation for this thesis was to explore the use of reconstructive expertise, which we described as expertise derived from sources in which the domain knowledge has been "compiled" into a form that is missing some of the procedural or declarative knowledge originally possessed by the expert. This corresponds to Hoffman's concept of indirect knowledge.

The Encyclopedia of Accounting Systems (Pescow 1976) contains, in documentary form, the expertise of a number of experienced accountants. The Encyclopedia is organized by industry, and it provides just the sort of industry-level accounting knowledge proposed for use in view integration. The knowledge derived from this source can be considered reconstructive expertise, as the original accountants' expertise has been compiled down into chart-of-accounts-based templates for accounting systems. A more complete discussion of the nature of the Encyclopedia's accounting knowledge is contained in section 3.4, which details the conversion of this industry knowledge into a conceptual form appropriate to the view integration task.

3.2.3 Company Level Knowledge. There are numerous facts about a particular business entity that cannot be derived from the types of higher level domain expertise described in the previous two sections.
Examples are the policies followed by businesses in granting credit to their customers. Some companies offer discounts for invoices paid within a short period of the invoice date, while others may offer no discounts. Likewise, some companies may assign customers to specific salespersons who act as account representatives, while others may allow multiple salespersons to participate in transactions with any single customer. A semantically "correct" enterprise schema must accurately model these facts.

In a general integration tool such as REAVIEWS, it is neither possible nor desirable to capture and embed this very specific knowledge in advance. Such knowledge would apply only to one specific enterprise, and its use in view integration for a different company might introduce errors. The solution is to acquire any such knowledge directly from the user at the time of initial need. Once acquired, this knowledge remains available throughout that enterprise's view integration, should it be needed in resolving later conflicts. It is not, however, made part of the "permanent" knowledge base available for later consultations with other enterprises' data.

3.3 Conceptual Modeling of Principles Level Knowledge

Batini, Ceri, and Navathe (1992, 73-76) suggest several strategies for database schema design. Their mixed strategy is presented as having significant advantages over other strategies, in part owing to the use of a skeleton schema in the view integration process. The skeleton schema is an overall schema of the application domain, developed separately from the individual user views. That schema "acts as a frame for the most important concepts in the application domain and embeds the links between partitions" (ibid., 74). The partitions mentioned are analogous to the concept of user views used in this thesis.
Here, we call such a skeleton schema the managerial schema.2 This managerial schema becomes the primary knowledge structure for both the principles and industry levels of domain knowledge used in solving view integration conflicts. The embedding of the principles level knowledge within the managerial schema is accomplished by rigorous adherence to REA accounting theory as the basis for its design. Some explanation of REA theory was given in Chapter 2, but it will be helpful to address briefly the choice of this manner of representing principles level knowledge.

2 The skeleton schema of Batini et al. (1992) corresponds closely with the concept of the managerial schema discussed in Lum et al. (1979). It is in that sense that we here use the term managerial schema. To eliminate possible confusion, the reader is advised that Batini et al. use the term managerial schema for a somewhat different concept.

The problem domain of REAVIEWS is that of conceptual database design. The problem-solving task undertaken by REAVIEWS is the integration of conceptually modeled user schemas into a unifying, global schema. The accounting principles relevant to system design would be most accessible and useful if characterized in terms related to the problem domain of conceptual database design. This is precisely the characterization provided by the REA model.

It is helpful, if not critical, to understand that the important aspects of the REA model are the accounting theory and principles at its core, not the modeling language chosen to explain the theory's use in the design process. The "rigorous adherence" to the theory is much more than merely using a certain set of diagrammatic techniques.3 By such application of the theory, we are able to make strong inferences about the objects we expect to find in the enterprise schema. This high-level knowledge alone, however, presents a rather incomplete view of such objects.
By adding more specific knowledge from the industry level, we gain the ability to make even more inferences about those objects. Transformation of accounting knowledge from the Encyclopedia of Accounting Systems into an REA-modeled representation provides a way of adding such industry level expertise to principles level expertise. That transformation is discussed next.

3 While much of the published work on the REA model has used the Entity-Relationship modeling conventions of Chen (1976), the REA accounting model may be expressed in any semantic language rich enough to capture all of its declarative and procedural requirements. For example, Geerts and McCarthy (1991) present the REA accounting model using a very different set of diagrammatic constructs, but the principles and theory of the REA model are uncompromised.

3.4 Conceptual Modeling of Industry Level Knowledge

Our eventual goal is to make use of industry-level knowledge in the process of view integration. That process uses such knowledge in the form of the managerial schema described earlier. The source of that knowledge, the Encyclopedia of Accounting Systems, presents its information from a traditional, chart-of-accounts-based viewpoint optimized for double-entry bookkeeping. The Encyclopedia includes charts of accounts, illustrations of source and output documents, examples of journals, and textual narrative. Some of the knowledge is less tied to this manual system orientation. For example, there is discussion of some principles useful in the design of appropriate organization structures, and sample organization charts are provided. In addition, when describing typical economic transactions in the industry, the Encyclopedia contains many facts that are "accounting system neutral."
Overall, though, the tone is very heavily influenced by the underlying requirement that accounting data be classified according to the categories in the chart of accounts to enable the proper double-entry recording of transactions. The managerial schema (and the knowledge it embodies) must therefore be reconstructed from the information provided in the Encyclopedia.

Knowledge acquisition (KA) from documentation has not been widely researched, with the notable exception of the CYC project (Lenat et al. 1986). There are few formal methodologies that deal with this approach to KA. Turban (1990, 465) notes that acquiring knowledge from documentation is used where "the concern is to handle a large or complex amount of information," a case that certainly applies to conceptual modeling. Modeling a real enterprise requires the handling of a large and complex amount of data. The analyst must produce a model that adequately reflects the real properties of the enterprise being modeled. This goal is complicated by the fact that there may be multiple models that satisfy the requirements. Hawryszkiewycz (1984, 115) echoes a widely held view when he states that "the design process is not deterministic: different designers can produce different enterprise models of the same enterprise."

At the macro level, constructing the managerial schema from documentation is a two-stage process. REA templates are first instantiated for each of the accounting cycles. These accounting cycle schemas are then combined into the managerial schema. At the micro level, however, the process quickly becomes complicated. This complexity arises, to a great extent, from the different perspectives on accounting taken by the authors of the Encyclopedia of Accounting Systems and by the REA accounting model.

The REA model is based heavily upon the basic stock-flow nature of accounting information and the concept of duality.
As described in McCarthy (1982, 561),

    Elements of the general ledger normally are classified as either balance sheet accounts, which represent monetary stocks of goods, services, and claims at a particular time, or income statement accounts, which represent monetary flows of these same items over a period of time.

When viewed from this perspective, the important concepts in the accounting system are the economic events of the enterprise, the economic resources and agents who participate in those events, and the particular relationships that link all these entities together.

As mentioned previously, many of the important concepts included in the Encyclopedia of Accounting Systems are artifacts of double-entry bookkeeping that arose primarily from the manual storage and transmission of accounting data. The difficulty, then, is "sorting out" the important accounting facts in the Encyclopedia. In essence, we are attempting to reason backward from the compiled journal and ledger view to the underlying knowledge about the important economic entities and relationships in a given industry. Once that is achieved, the knowledge can be transformed into a managerial schema using REA accounting theory, arriving at a structure that embodies important knowledge from both the principles and industry levels.

We next look at the various forms of knowledge in the Encyclopedia and explain how this transformation was accomplished. The industry selected for this project was the machine shop. This industry was selected for the rich contextual setting provided by its manufacturing focus and because of the availability of a number of modeling cases drawn from actual business enterprises in this industry. For ease of reading, we will hereafter refer to the Encyclopedia of Accounting Systems as the EAS.

3.5 Overall Knowledge Acquisition Strategy

The desired output of our knowledge acquisition activities is a managerial schema for a typical machine shop.
To fill the REA templates, it is necessary to answer the following questions:

- What entities can we identify from the EAS?
- What attributes can we identify for the entities?
- What identifiers (keys) can we identify for the entities?
- What relationships can we establish among the entities?

The first step, then, is to determine the resources, events, and agents present in the EAS view of that industry. To do this, we separately examined the four major types of information found there: textual narrative, chart of accounts, sample documents, and organization chart. Examples of each type of information are given in the sections that follow. As entities were identified, they were recorded, along with any attributes discovered. When the entity identification step was completed, keys for each entity were identified.

After the initial identification of entities, the event entities were separated into accounting cycles. Then, by cycle, REA templates were filled with the identified entities. This grouping of objects of interest into accounting cycles is not a requirement for REA modeling, but rather a common convention used by accountants when trying to reduce the complexity of the business enterprise into manageable chunks. Accounting information system consultants and auditors frequently analyze clients by cycles. This grouping by cycles allowed the analysis of smaller groups of entities that still had some close association to each other.4

3.5.1 Narrative. The first pass through the EAS was an examination of the narrative portions to gain an understanding of machine shops and to begin to identify the entities needed in the managerial schema. The following quote is the first paragraph of the section of machine shop narrative titled "General Features."

The machine shop industry is extremely competitive. Usually, sales are to industrial plants (professional buyers who are primarily concerned with service and price), on a bid or quote basis.
Price is an important factor in obtaining sales. A given machine shop will compete with different sizes and types of companies on each item quoted, depending on the item and location of the customer. These competitive features require the accounting system to reflect adequate costing to furnish a guide for estimating prices on future jobs. (Pescow 1976, 1171)

4 The precise accounting cycle scheme used is not critical to the task. There is no widespread agreement about terminology or even about the cycles to which some accounting transactions belong.

In this one short paragraph, we learn a number of facts. Four entities are described: customer, price quotation, sale, and finished good. The exact terms given for these entities are industrial plant (professional buyer), bid (quote), and item. This highlights one of the underlying premises of the EAS: its authors assume that their work will be used by those already familiar with basic business and accounting concepts. Any experienced accountant should be able to read the above passage and understand the business entities involved. This does impose the requirement, however, that those acting as knowledge engineers in reconstructing industry-level accounting knowledge from the EAS also have moderate knowledge of basic business and accounting concepts.5

5 The minimum level of such knowledge required for the knowledge acquisition task is an interesting research question in itself, but beyond the scope of this thesis. From the experience gained in performing such acquisition, it appears that an advanced accounting student probably possesses enough business and accounting knowledge to identify adequately the important accounting entities in the EAS industry descriptions.

The above passage also begins to identify some of the attributes of those entities. We see that price is an attribute for price quotation, sale, and finished good. We also see that location is an attribute for customer, although the concept of location is refined in later narrative to become the concept of sales territory, which has further possible components of trading area, county, and state.

A partially filled-in REA template derived from the above passage is shown in Figure 3-1. Those entities found in the EAS narrative are shown with solid lines, while other entities required by the full template are shown with dotted lines. Note that the REA template anticipates a dual event, cash receipt, to be triggered by the sale event. In this example, sale is the decrement event, and cash receipt the associated increment event. Note also that the REA template allows for other types of relationships (such as commitment) beyond those shown in the general REA template in Figure 2-2 in Chapter 2. Examples of some of these types of relationships can be found in McCarthy (1982), Denna and McCarthy (1987), and McCarthy and Rockwell (1989).

[Figure 3-1. Partially Instantiated REA Template (adapted from McCarthy 1982). Entities labeled: customer, sale, finished good, salesperson, cash, cashier.]

This type of analysis proceeds until the entire machine shop section of the EAS has been examined.

3.5.2 Chart of Accounts. Analysis of the chart of accounts in the EAS is very similar to that of the textual narrative. One important difference is that some of the generalization relationships are more apparent here, as shown in the coding of the account numbers. Figure 3-2 shows a portion of the machine shop chart of accounts. The center and right columns contain double-entry bookkeeping information about debits and credits. This type of information is an artifact of manual record keeping, and it provides virtually no problem-solving knowledge. Those artifacts were left in Figure 3-2 to demonstrate how the authors of the EAS "compiled" their accounting knowledge into a format very different from that needed in the task of view integration. The "Do" in those columns stands for "ditto."

BALANCE SHEET ACCOUNTS
Assets                                   Debits from                       Credits from
101    Cash                              Cash Receipts Journal             Cash Receipts Journal, Payroll Journal
102    Investments                       Cash Disbursement Journal         Cash Receipts Journal
103    Notes and Accounts Receivable     Sales Journal                     Do
104    Inventories:
104-1    Supplies                        Voucher Register                  Material Requisition Summary
104-2    Material                        Do                                Do
104-3    Work-in-Process                 Material Requisition Summary,     Job Order Summary
                                         Payroll Journal, General Journal
104-4    Finished Goods                  Job Order Summary                 General Journal
110    Land                              Voucher Register
111    Buildings                         Do
112-1  Machinery                         Do
112-2  Small Tools                       Do
112-3  Shop Equipment and Fixtures       Do
113    Office Equipment and Furniture    Do
114    Automobiles                       Do
115    Patents                           General Journal
116    Reserve for Depreciation, etc.    Do                                Do
120    Prepaid Expenses                  Voucher Register                  Do
Figure 3-2. Partial Chart of Accounts: Assets [source: Pescow (1976, 1174)]

We see that there are four types of inventory. Inventory has a code of "104," and the four sub-types of inventory all have codes starting with "104-." There is another type of resource, this one with three sub-types, whose code is "112." However, the EAS does not give us a name for this resource. The missing name for resource "112" points out a recurring problem with using reconstructive sources for knowledge.
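Because each sub-account code carries its parent's code as a prefix, the generalization structure visible in Figure 3-2 can be recovered mechanically, and a parent that the source never names, such as "112," surfaces as a gap. The following Python sketch is a minimal illustration that assumes codes of the form "NNN" or "NNN-N"; the function names are hypothetical and do not describe how REAVIEWS itself processed the chart of accounts.

```python
def parent_code(code):
    """Return the generalization parent of a sub-account code ('104-2' -> '104'),
    or None for a top-level code such as '110'."""
    head, sep, _ = code.partition("-")
    return head if sep else None

def build_hierarchy(accounts):
    """Group sub-accounts under their parent codes, keeping (code, name) pairs."""
    hierarchy = {}
    for code, name in accounts.items():
        parent = parent_code(code)
        if parent is not None:
            hierarchy.setdefault(parent, []).append((code, name))
    return hierarchy

# Excerpt of the asset accounts in Figure 3-2 (code -> account name).
accounts = {
    "101": "Cash",
    "104": "Inventories",
    "104-1": "Supplies",
    "104-2": "Material",
    "104-3": "Work-in-Process",
    "104-4": "Finished Goods",
    "110": "Land",
    "112-1": "Machinery",
    "112-2": "Small Tools",
    "112-3": "Shop Equipment and Fixtures",
}

hierarchy = build_hierarchy(accounts)
# Parent codes implied by sub-accounts but never named in the source.
unnamed = [parent for parent in hierarchy if parent not in accounts]
print(unnamed)  # ['112']
```

Detecting the gap is the easy part; as the text goes on to explain, filling it requires another knowledge source or an appeal to the user.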
When information is incomplete, inconclusive, or simply missing, we are not able to go to the expert and fill in the gaps in the desired expertise. Sometimes we can deduce the information we need from our other knowledge sources, i.e., the REA model and general accounting knowledge. Other times we must simply use an incomplete knowledge structure and appeal to the user for such information when needed. Section 3.5.2 provides more discussion on this subject.

Another area of confusion comes from the numerous expense accounts we find. Figure 3-3 shows another portion of the chart of accounts. From the coding scheme used, we recognize these items as period costs rather than product costs. At first, many of these expenses are difficult to place in the normal REA template of increment/decrement event pairs.

600 Marketing Expense:
  601 Salaries:
    601-1 Sales Salaries
    601-2 Salesmen's Commissions
    601-3 Stores Salaries
    601-4 Engineering Salaries
    601-5 Salaries-Research
    601-6 Salaries-General Office
  602-611 Service and Expense:
    602-1 Advertising-Catalog
    602-2 Advertising-General
    603-1 Agents' Commissions
    603-2 Dealers' Discounts
    604-1 Traveling and Entertaining Expense
    605-1 Parcel Post and Express Shipping Expense
    605-2 Postage-General Mail
    606-1 Sales Tax
    607-1 Sales Auto Expense
    607-2 Delivery Expense
    608-1 Telegraph Expense
    608-2 Telephone Expense-Toll Calls
    608-3 Telephone Expense-Other
    609-1 Depreciation-Automobiles
    609-2 Depreciation-Office Equipment
    609-3 Rental of Office Equipment
    609-4 Maintenance of Office Equipment
    610-1 Collection Fees
    610-2 Credit Reporting Fees
    610-3 Research and Development Supplies
    610-4 Forms and Office Supplies
    611-1 Shipping Room Supplies
    611-2 Sundry Marketing Expense
Figure 3-3. Partial Chart of Accounts: Marketing Expense [adapted from Pescow (1976)]

McCarthy (1982) discusses the concept of "event combinations" caused by accepted accounting convention. Many of the traditional "period" expenses in accounting systems are of this nature. Because of the difficulty in achieving exact matching of some outflows to their appropriate inflows (or because of the marginal information content of such matching), some inflow/outflow pairs are combined into one event entity. For example, rather than record the acquisition of utility service as an inflow separate from the use of such service, we routinely record it as a period expense. This, in effect, combines the acquisition and the "using up" of the resource into a single event. Most of the expense accounts can be modeled in this fashion, although this is a compromise of the general REA template.

3.5.3 Documents. There are a number of typical documents shown in the machine shop industry section of the EAS. Perhaps due to the completeness of the narrative and chart of accounts already examined, no new entities were discovered from the sample documents; however, the extra detail on the various forms did suggest some synonyms and additional attributes for entities already modeled. One example is the discovery that the EAS was using product as a synonym for the chart of accounts inventory item finished goods. Another example is the discovery on the shop order system forms that some raw material quantities are measured by weight, while others are measured by the more familiar number of items (often referred to as quantity). To accommodate this, raw material has the attributes quantity and unit of measure.

3.5.4 Organization Chart.
The information given in the sample organization chart theoretically provides a wealth of information about employee roles, generalization hierarchies among employees, and most of the knowledge needed to model the responsibility relationships necessary for the various inside agents in the REA templates. The wide variation in organization charts expected among actual business enterprises, however, limits the usefulness of the information given in the EAS. As the authors there point out, the actual organization charts for real businesses would vary by size of business and the nature of the products manufactured. The only information given that would seem to apply widely is the advice given in the following quote from the EAS:

However, the points enumerated below should be given serious consideration in determining the organization of a machine shop.

1. Research should be set up as a separate function reporting directly to top management and not subordinate to engineering or production departments. Whenever research activity is a part of either the production or engineering divisions, there is a natural tendency to push research work aside in favor of customers' orders during periods of heavy activity. There is also a tendency to over-staff the research department whenever activity slackens.
2. Engineering is usually an important part of machine shop operation and should rate a top-level division.
3. The personnel department should also rate a major place in the organization structure to help insure favorable labor relations and community acceptance.
4. Purchasing should be centralized in one department under the controller. The proper functioning of this department is dependent largely upon the development of proper records, routines, and purchasing techniques. However, the purchasing function in a machine shop does not usually have sufficient influence on major policy decisions to require a separate division reporting directly to the president.
(Pescow 1976, 1173)

This advice is still only a recommendation, and the possible variations in actual machine shop organization prevent us from making strong inferences with this knowledge, at least in the view integration task as defined in this research. The suggestions there do introduce an interesting topic: the proper role of view integration activities. As presented in most of the papers and books on the subject of conceptual design of database systems, the view integration stage is seen primarily as the merging of the individual user views created during the view modeling stage. As Batini et al. (1992, 119) state:

The main goal of view integration is to find all parts of the input conceptual schemas that refer to the same portion of reality and to unify their representation. This activity is called schema integration; it is very complex, since the same portion of reality is usually modeled in different ways in each schema.

This is precisely the focus of the view integration task in this thesis. As such, the ultimate use of accounting domain knowledge is to help identify errors and omissions in modeling the realities of the business enterprise as they exist (or as management wishes them to exist in the new information system, should that be the focus of the conceptual modeling). The earlier suggestions about organization structure would be less helpful in determining what organizational form actually exists in a given machine shop than in evaluating the appropriateness of the structure being used by the enterprise. In other words, the information helps us identify errors in the organizational structure of a particular machine shop rather than errors in the modeling of that structure. Much of the knowledge in the EAS is of this nature, and for purposes of this thesis, it is essentially ignored.
On the other hand, most information system design methodologies stress the iterative nature of the process; discoveries at one stage may lead to a rethinking of prior decisions and, in some instances, to redesign and repetition of earlier steps in the process. As the view integration stage presents designers and managers with a particularly insightful perspective on the enterprise and its operations, it would be possible to add considerably more input to the design activities if some of this other industry domain expertise could be used. For example, instead of merely identifying the responsibility relationships in the user views, it might be possible to make some sort of evaluation of the existing responsibility hierarchies and recommend improvements. Such investigation is well outside the scope of this thesis.

3.6 Selection of Entity Keys

After all the entities have been extracted from the various parts of the EAS description, identifiers (i.e., keys) must be determined for each entity. Where more than one key can be identified, one is selected as the primary key, and the others are listed as non-primary-key attributes. These "additional" keys are also designated as candidate keys, which might be used as entity identifiers. This information is used in the actual view integration stage.

Where possible, keys from the EAS were used. When such keys were missing, we were sometimes able to supply them by resorting to general business and accounting knowledge. For example, the EAS never used the term social security number as a key for employee, but we know that it frequently serves that purpose. As further support for the use of this key, we observed that in an illustration of a job time card, a block titled "EMPLOYEE NAME AND NO." contained the name "WILLIAM PETROFF" and the number "350-03-3094," which looks strikingly like a social security number. At the same time, that block also contained the designator "L78."
It is not uncommon to assign employee identification numbers according to some other scheme, so information from the time card illustration was inconclusive as to which number was being used as the identifier. We chose to model the two as separate attributes based upon information in other illustrations. On some forms shown, there are fields labelled "DEPT. AND OPER. NO." The values in these fields are short combinations of letters and numbers, such as "T-7" and "2-9." The resulting managerial schema therefore contains the key employee no. for employee, with social security number listed as a candidate key.

3.7 Composite-Key Entities

For simplicity in illustration and computation, primary keys were required, if at all possible, to be simple keys; that is, they were constrained to be single attributes. In reality, most primary keys consist of a single attribute rather than several. It is a common convention to choose simple keys, as Batini et al. (1992, 294) note:

If an entity has multiple identifiers, one of them must often be designated as the entity's primary key. A secondary decision criterion is to prefer simple identifiers to multiple identifiers, and internal identifiers to external identifiers; in this way, primary keys of entities can be kept minimal in size and simple in structure.

In the actual entity modeling, when identifiers could be found in the EAS, they were, in most cases, simple and internal. Those entities requiring composite keys were notable in that they could each be classified into one of three types of entities, each with interesting properties. Those entity types were employee service, depreciation, and budgeted events. Figure 3-4 shows examples of these entities. The one thing all of these entities have in common is that they are events (or budgeted events) that often do not generate source documents with unique numbering systems.
To put it in non-artifactual terms, they are events to which we do not normally assign internal identifiers. For most events (and recall that events are phenomena that reflect changes in resources), we have some method of assigning unique identifiers. Examples of such unique identifiers (which we refer to as "event codes") include the invoice number for the sales event, the requisition number for a material requisition, and the receiving slip number for a material receipt event.

For employee service events, there is usually a time card, time sheet, or perhaps a job sheet; unlike most other "event recording" documents, these often do not contain unique, sequential document or transaction numbers. These types of events are also unique in the REA model, in that they are resource-incrementing events that have cash disbursement as their dual event and also have an employee of the business enterprise as the outside agent.

For depreciation events, we normally issue no source documents. In a manual system, there would simply be an end-of-period journal entry. In one sense, depreciation can be thought of as somehow different from "traditional" exchange events. Although depreciation is supposed to represent the "using up" of a long-lived asset, the actual formulas used to calculate depreciation frequently bear little relationship to the actual reduction in the asset's life. Further, the depreciated resource is not really decremented in the same way a resource like finished goods is.

[Figure 3-4. "Composite-Key" Entities. (a) Employee Service: job operation, with attributes employee no., start time, stop time, operation no. (b) Depreciation, with attributes depr. amount, building no., date. (c) Budgeted Event: operation type, with attributes operation no., product no., standard time, standard setup, description.]
When we decrement finished goods, it is usually because we have sold something and that thing is actually removed from our control. When we record depreciation on a building, the building is (usually) still there. We do not witness the physical decrement to this resource until we sell or demolish the building. What we are doing is trying to partition, for accounting purposes, what is essentially a continuous process.6

6 McCarthy (1982) refers to this type of event as a partitioned event.

Budgeted events, such as job operation standards (which we refer to as job operation types), are also non-exchange events, but in a different way; the entity is simply a standard created by management to allow responsibility accounting by measuring variances between projected and actual changes in resources.

The use of composite keys on depreciation and employee service can be eliminated by assigning a unique time "stamp" to each occurrence of an event, as in Gal and McCarthy (1986). For budgeted events, the identifier would be something other than time. For example, for the job operation type event, the identifier would be operation number. To some, this may seem artificial. For example, it seems more natural to think of depreciation in terms of an asset/time period combination, and indeed, the asset number and date attributes are used as the primary key in our managerial schema. Likewise, we could have the accounting system assign a unique time stamp to the employee service events when payroll is calculated, but it seems more natural to think of this service in terms of the employee/pay period combination used as the primary key in our managerial schema. At the view integration stage, however, there is no conceptual difference between the use of simple and composite keys. For simplicity in explanation and computation, we use the simple key time as the key for those classes of entities just discussed.
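The key-selection conventions of Sections 3.6 and 3.7 can be sketched as a ranking: simple identifiers before composite ones, internal before external, with the remaining identifiers kept as candidate keys. In this minimal Python sketch, the `Identifier` type and `choose_primary_key` are hypothetical; ties are broken by the order in which the analyst lists the identifiers, which stands in for the judgment described above when employee no. was preferred over social security number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    attributes: tuple       # one attribute = simple key; several = composite key
    internal: bool = True   # False if it borrows attributes from another entity

def choose_primary_key(identifiers):
    """Return (primary_key, candidate_keys), preferring simple to composite and
    internal to external identifiers. sorted() is stable, so ties keep the
    analyst's listing order."""
    ranked = sorted(identifiers,
                    key=lambda k: (len(k.attributes) > 1, not k.internal))
    return ranked[0], ranked[1:]

# Employee: two simple, internal identifiers; listing order breaks the tie.
primary, candidates = choose_primary_key([
    Identifier(("employee no.",)),
    Identifier(("social security number",)),
])
print(primary.attributes, [c.attributes for c in candidates])

# Depreciation: the composite building/date key loses to a simple time stamp.
primary, _ = choose_primary_key([
    Identifier(("building no.", "date")),
    Identifier(("time",)),
])
print(primary.attributes)  # ('time',)
```

The second call mirrors the substitution of the simple key time for a composite asset/date key; the ranking itself says nothing about which simple key is semantically best, which is why the tie-break is left to the analyst.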
3.8 Relationships and Structural Constraints

Before we begin discussion of modeling relationships from knowledge in the EAS, we must refine the concept of structural constraints introduced by the Chapter 2 discussion of cardinality ratios. Those ratios are one notation for specifying acceptable limits on the number of times an instance of one entity may participate in a relationship with an instance of another entity. The "one salesperson to a sale" limit was given as an example of this concept. For a complete specification, we also need to know whether an entity must participate in a particular relationship (total participation) or whether such participation is optional (partial participation). Consider the relationship between department and employee. An example of total participation for employee is a policy stating that all employees must be assigned to a department. An example of partial participation for employee is a policy which does not require all employees to be so assigned.

We can now fully specify structural constraints by assigning a two-number "constraint set" to each entity in a relationship. The first number (the min-card) specifies the total/partial participation constraint: a zero specifies partial participation, and a number of one or more specifies total participation. The second number in each pair (the max-card) specifies the maximum number of instances of the relationship in which an entity instance may participate.

3.9 Relationships in the EAS

REA accounting theory provides us with a number of "basic" relationships that must be present in a properly constructed accounting system. These were used as the initial relationships modeled in the managerial schema. The EAS was then re-examined to discover other relationships that might be expected in typical machine shops. When re-examining the EAS, the primary focus was on trying to discover relationships outside of those in the generalized REA template.
Readers should recall from Chapter 2 that the general REA relationships were stock-flow, duality, control, and responsibility. While not part of the general template, other relationships are allowed. Also recall that responsibility relationships were, for the most part, inconclusively specified by the EAS. The primary relationship discovered in this re-examination pass was the specifies relationship, which connects a resource-affecting event with a budgeted event. The resource-affecting event participates in a stock-flow relationship with a resource of the business, and the budgeted event specifies budgets (standards) for that stock-flow. For that reason, we refer to the relationship between the two events as a "flow-budget" relationship. Figure 3-5 shows one example of this, the job operation to job operation type relationship. Instances of job operation type provide budgeted standards for the flow of employee labor into work-in-process during job operations.

After all relationships were identified, structural constraints were assigned to each relationship. For simplicity of illustration and reduction of implementation complexity, only binary relationships were used. This makes no difference in the view integration task, as the ternary control relationship is simply modeled as two binary relationships of the same type, one between an event and an outside agent, the other between that same event and an inside agent.

[Figure 3-5. "Flow-Budget" and "Stock-Flow" Relationships: job operation type, job operation, and job.]

3.9.1 Assigning Structural Constraints. Assigning structural constraints was another task that was only partially aided by EAS knowledge. There were few definitive examples in which the EAS indicated possible constraints, and these were more implicit than explicit. For example, the discussion of production employees led to the belief that they repeatedly participated in employee service events (such as job operations).
We also observed that a particular job operation was usually performed by one employee. This would make the manufacturing employee to job operation relationship one-to-many. Min-cards were not expressed, but it is normal for employees to be hired and their personnel records entered into the company's record-keeping system before they actually perform any work for the company, so the min-card on the employee side was set to zero. On the other hand, a job operation event must be performed by an employee. Thus, the min-card for job operation must be set to one. The resulting modeling construct is shown in Figure 3-6a.

[Figure 3-6. EAS-derived Structural Constraints. (a) mfg-employee-job-operation relationship: mfg employee (0,N), job operation (1,1). (b) sale-customer relationship: sale (1,1), customer (0,N).]

For the most part, however, structural constraints were generated using more general knowledge about business operations, when possible, and they were left blank if there were no compelling arguments for a particular set of structural constraints. The customer to sale relationship is an example of structural constraints derived, at least partially, from general business knowledge. The EAS does not explicitly indicate the full expected structural constraints between these two entities, but we reason as follows. It is a natural assumption in most businesses that customers may (and hopefully, will) participate in multiple sales events. It is also the usual case that any given sale will be to one customer. This results in max-cards of "N" for customer and "one" for sale. It is also common practice in manufacturing companies to enter customer information into the information system in advance of sales, perhaps after an initial sales visit by the firm's sales representative. This would give customer a min-card of zero. On the other hand, an individual sale must be made to a customer, which would argue for a min-card of "one" for sale.
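Once a constraint set has been chosen for each side of a binary relationship, it can be checked against sample instance data. The minimal Python sketch below assumes that each occurrence of the relationship is a (customer, sale) pair and that "N" is represented by `math.inf`; the `Constraint` type, the `violations` function, and the sample data are hypothetical illustrations, not REAVIEWS code. It applies the constraint sets just derived: (0,N) for customer and (1,1) for sale.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    min_card: int      # 0 = partial participation, 1 or more = total participation
    max_card: float    # math.inf stands for "N"

def violations(instances, participants, constraint):
    """Return the instances whose number of relationship occurrences lies outside
    [min_card, max_card]. `participants` lists this side of each occurrence."""
    counts = Counter(participants)   # missing instances count as 0
    return [i for i in instances
            if not constraint.min_card <= counts[i] <= constraint.max_card]

customer_side = Constraint(0, math.inf)   # (0,N)
sale_side = Constraint(1, 1)              # (1,1)

# Hypothetical sample data: sale s3 has no customer recorded.
sold_to = [("c1", "s1"), ("c1", "s2")]
print(violations(["c1", "c2"], [c for c, _ in sold_to], customer_side))   # []
print(violations(["s1", "s2", "s3"], [s for _, s in sold_to], sale_side))  # ['s3']
```

Customer c2, with no sales, passes because its min-card is zero; sale s3 fails because total participation requires exactly one customer.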
Further, the EAS narrative for machine shops contains statements like

Accounting for income begins with receipt of the customer's order. The routine of processing orders should be tied in with the accounting system to give adequate information about sales classification at lowest cost of handling. To provide data for accounting and sales analysis work, both customer and product classification must be considered. (Pescow 1976, 1177)

This also argues for a minimum cardinality of "one" for sale, for it appears that we always wish to know the customer to whom a sale is made. From the previous reasoning, we produce the binary relationship shown in Figure 3-6b.

Finally, the individual accounting cycle schemas were integrated into the managerial schema for the machine shop industry.

3.10 Modeling Compromise

In Chapter 2, we mentioned that there are a number of compromises to the full REA templates that commonly arise in actual enterprise modeling situations. Those compromises include event partitioning, materialization of claims as base objects, and event combination due to expensing of immediate services. The managerial schema constructed for REAVIEWS contains some examples of such compromise, and we should explain the reasoning behind their inclusion in the managerial schema.

In McCarthy (1982), compromise to the full REA template is discouraged, but allowable in three general situations. The first is when modeling transactions for which existing accounting convention allows less than full specification of schema elements. The second is when implementation concerns, such as system efficiency or storage requirements, indicate that the adjustments make economic sense. The third is when such compromise enhances the decision usefulness of the resulting system.7 When deciding which compromises to allow in the managerial schema, the following rules were followed:

- When existing convention allows less than full specification and the convention is common, allow compromise.
- When compromise is for implementation reasons, defer compromise to the implementation stage of the project and do not allow compromise at the view integration stage.
- When compromise might enhance the decision usefulness of the resulting schema, allow compromise when there is compelling evidence that such additional decision data is actually being used by the enterprise.

7 For additional analysis of these situations, see McCarthy (1982).

The various depreciation events are examples of event partitioning. They are common in most accounting systems, so they were modeled in the compromised fashion following the first rule.8 There were no compromises suggested by the EAS that would invoke the second rule. There were, however, several compromises that fell under the third rule, whose application deserves some explanation.

8 Early papers on REA accounting theory commonly included such partitioned events in REA-based schemas. Later work examining the usefulness of REA-based schemas in linking REA-modeled databases with knowledge-based systems has pushed us toward a much stronger position favoring maintenance of a "non-compromised" enterprise schema in virtually all cases. The "compromised" enterprise schema (which could include such things as partitioned events) would be derived as a view of the underlying non-compromised version and would serve as the middle-level schema in the three-level schema approach presented in Chapter 2.

Perhaps the most common compromise for enhancing decision usefulness is the materialization of claims as separate base objects, separate derived objects, or views. For example, accounts receivable can quite properly be thought of as the imbalance between two economic events, sale and cash receipt. As such, we do not need to model that claim as an object in the enterprise schema. It can be materialized via a procedure when needed, as when producing financial statements. We can also conceive of situations when we might wish to consider the claim as an object having attributes of its own, as when we produce an aged accounts receivable report.

There were three main claims present in the EAS that fell under the third rule: Accounts and Notes Receivable, Accounts and Notes Payable, and Capital and Surplus. The Machine Shops narrative did not discuss using these claims in any decision processes, and the chart of accounts presented no lower-level detail (i.e., sub-accounts) for these items, so they were not materialized in the managerial schema. Should they be present in views supplied by the users to the view integration process, they would then be added to the enterprise schema.

The managerial schema also contained other compromises due to insufficient information in the EAS. In the most commonly conceived use of the REA model, the actual business enterprise is the source of knowledge about the entities to be represented. If there is incomplete knowledge about an entity or relationship, it is assumed that such knowledge can be discovered. In other words, if one needs to know something important about an entity, one can ask someone in the company, examine documentation, and so on. When using the EAS as the source of knowledge about entities and relationships, we can only model what we find there. There is no other source for gaining additional information. For some entities, there was simply not enough information to produce adequate constructs. In the best of such cases, general business and accounting knowledge allowed us to "fill in the gaps."
In the worst cases, the objects were left out of the schema or modeled by an object representing a class of the entity. For example, there are many accounts in the chart of accounts that represent various services supplied by vendors, among them Advertising-General, Rental of Office Equipment, Maintenance of Office Equipment, and Credit Reporting Fees. No further information is ever given about these services. To present a managerial schema that at least contained some representation of these events, we added the entity Vendor Service. This entity represents the more general class to which the service events belong. By choosing to model the more general object, we are in essence saying that we do not have enough information to distinguish between the more specific events. Lacking any strong evidence that we should capture "differential" information among the events, we model one construct in the managerial schema that can accommodate all of them. The binary relationship between Vendor Service and its dual event, cash disbursement, is shown in Figure 3-7.

[Figure 3-7. Vendor Service Entity: the vendor service event linked to its dual event, cash disbursement]

Note that there is no functional difference in REAVIEWS's problem-solving abilities whether we model these events in this fashion or simply leave them out of the managerial schema. This type of incomplete information is not used by REAVIEWS in the view integration task, because the assumptions necessary for its use go well beyond the reconstructive knowledge focus of this thesis.9

Throughout the entire process of building the managerial schema, we also made note of typical synonyms for the various entities and attributes. The product/finished goods synonym mentioned earlier is one example. This information was also stored in the system for use in view integration.
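The stored synonym information can be pictured as a simple lookup table that maps every known name to one canonical managerial-schema name. The Python sketch below is only an illustration, not the REAVIEWS implementation; the class name `SynonymTable` and its normalization rule (ignoring case, spaces, hyphens, and underscores) are assumptions made for the example, which reuses the product/finished goods synonym from the text.

```python
class SynonymTable:
    """Hypothetical store mapping entity/attribute names to a canonical schema name."""

    def __init__(self):
        self._canonical = {}  # normalized name -> canonical (managerial-schema) name

    @staticmethod
    def _normalize(name):
        # Assumption: case, spaces, hyphens, and underscores carry no meaning.
        return name.strip().lower().replace("_", "-").replace(" ", "-")

    def add(self, canonical, *aliases):
        """Record a canonical name together with any known synonyms."""
        for name in (canonical, *aliases):
            self._canonical[self._normalize(name)] = canonical

    def resolve(self, name):
        """Return the canonical name for `name`, or None if it is unknown."""
        return self._canonical.get(self._normalize(name))

    def same(self, a, b):
        """True when both names are known and resolve to the same canonical name."""
        ca = self.resolve(a)
        return ca is not None and ca == self.resolve(b)


synonyms = SynonymTable()
synonyms.add("product", "finished goods")
print(synonyms.resolve("Finished Goods"))          # product
print(synonyms.same("finished_goods", "PRODUCT"))  # True
print(synonyms.resolve("customer"))                # None
```

Because names confirmed by the user can be passed to `add` as they are discovered, the same table can keep growing while view integration proceeds.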
3.11 Integration Conflict Resolution

The next few sections deal with problem-solving strategies for view integration. The discussion is presented from a "human problem solver" perspective. Chapter 4 presents the same integration process from the perspective of REAVIEWS, the knowledge-based system in which these strategies were embedded. That discussion is presented in terms more specific to the representation language chosen for the actual implementation, but the same overall strategy should be apparent in both chapters.

3.11.1 Basic Problem-Solving Concepts. When considering the knowledge needed, stored, and used in solving a particular problem, one commonly made assessment is the level, or depth, of that knowledge. Depth of knowledge is usually placed on a continuum between deep representations and shallow, or surface, representations. Deep models of knowledge are typically presented as capturing the underlying causal processes in the domain of interest, while shallow models typically make use of pre-stored associations between problem descriptions and solutions.

9 This situation did provide some clues that the integration process may be further aided by using accounting knowledge from sources other than those studied for this project.

A familiar illustration of deep reasoning in the medical world is the just-graduated medical student who performs diagnosis by using functional knowledge of the human body to reason backwards from symptoms to likely causes. Shallow reasoning would be represented by a physician with many years of diagnostic experience who has compiled a wide database of empirical associations between patterns of symptoms and diseases, based upon the many previous diagnoses the physician has made. This physician can in many instances make a very quick and accurate diagnosis by matching the current pattern of symptoms against those compiled patterns.
Researchers have proposed or built knowledge-based systems that make use of both sorts of knowledge. Typically, these systems first avail themselves of relatively efficient shallow reasoning, but resort to deeper models of knowledge when shallow methods fail to resolve the problem satisfactorily. In performing view integration, we follow a similar method, moving from shallow to deeper levels of knowledge as necessary.

3.12 View Integration Strategies

At the most general level, the view integration process looks simple. We use the ladder strategy explained in Chapter 2 to add the user views, one at a time, to the initial managerial schema, using the following three-stage strategy:

- try to identify the components in a user view as existing objects in the schema.
- if components are identified, check for and resolve conflicts with managerial schema specifications (e.g., in structural constraints); if they are not identified as already established components, treat them as new components.
- if user components are not already represented in the schema, add them.

This apparently simple strategy masks some of REAVIEWS's use of domain knowledge in the integration process. As we move to lower levels of detail, the process becomes much more complex. Figure 3-8 shows a flowchart of the problem-solving strategy in REAVIEWS, providing a concise overview of the process discussed in the narrative that follows. The rectangles in Figure 3-8 represent the major processes in our overall strategy; Appendix A contains descriptions of those processes. Working through the diagram and the appendix gives a better understanding of the problem-solving algorithm at work. The narrative that follows takes us through the process from a relatively "machine independent" perspective. One should also be able to identify this same process in the Chapter 4 discussion of REAVIEWS's handling of the test cases.

[Figure 3-8. Flowchart of REAVIEWS's Integration Strategy: process boxes beginning at START include get user schema, get individual view, find entity, find candidate entities, confirm candidate entity, add unique entity, find foreign key, add foreign-key entity, add relationship, and save schema]

3.12.1 Initial Schema Processing. As an initial aid in problem solving, user views are first grouped by accounting cycle, then solved by cycle. This is not required by the underlying KBS processes, but it aids the users of the system, who can then focus on a smaller subset of the enterprise's business activities at any given point in the consultation. Also, as part of the initial view modeling, users produce their views using constructs from REA theory and entity-relationship modeling, as in McCarthy (1982).

When presented with a user view, we attempt to recognize entities in the view in the following order: events first, followed by resources, then agents. The important part of this ordering is the primacy of the event entity. The general REA template of Figure 2-2 shows that the event is usually directly linked to most of the other entities in the template. Furthermore, a major function of a business company's accounting system is capturing data about the economic transactions entered into by that company. A typical definition of an accounting system is given by Horngren and Foster (1987, 910):

"An accounting system is a set of records, procedures, and equipment that routinely deals with the events affecting the financial performance and position of the organization."

In addition, Ijiri (1975, 61) relates:

"The notion of exchanges is significant in accounting measurement because an increase or a decrease in the resource set is treated not as an isolated event, but as an integral part of activities."

Hence, if we can first identify an event in a user view, we have potential insight into the identities of most of the other entities.
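The outer loop of the ladder strategy, together with the cycle grouping and the events-first recognition order of Section 3.12.1, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not REAVIEWS code: the `UserView` and `Schema` classes, the `identify` callback, and the log format are hypothetical, and the conflict checks of the middle stage are reduced to a placeholder comment.

```python
from dataclasses import dataclass, field

# Recognition order from Section 3.12.1: events first, then resources, then agents.
RECOGNITION_ORDER = {"event": 0, "resource": 1, "agent": 2}


@dataclass
class UserView:
    name: str
    cycle: str      # accounting cycle the view belongs to
    entities: list  # (entity name, REA type) pairs


@dataclass
class Schema:
    entities: dict = field(default_factory=dict)  # entity name -> REA type


def group_by_cycle(views):
    """Group user views by accounting cycle, keeping cycles in first-seen order."""
    groups = {}
    for view in views:
        groups.setdefault(view.cycle, []).append(view)
    return groups


def recognition_sequence(view):
    """Entities of a view sorted events-first; unknown types go last (stable sort)."""
    last = len(RECOGNITION_ORDER)
    return sorted(view.entities, key=lambda e: RECOGNITION_ORDER.get(e[1], last))


def integrate(schema, views, identify):
    """Ladder integration: add the user views to the schema one at a time.

    `identify(schema, name, rea_type)` is a hypothetical callback that returns
    the matching schema entity name or None.
    """
    log = []
    for cycle, cycle_views in group_by_cycle(views).items():
        for view in cycle_views:
            for name, rea_type in recognition_sequence(view):
                match = identify(schema, name, rea_type)
                if match is None:
                    schema.entities[name] = rea_type  # stage 3: add new component
                    log.append((cycle, view.name, name, "added"))
                else:
                    # Stage 2 (conflict checking against `match`) would go here.
                    log.append((cycle, view.name, name, "identified as " + match))
    return log
```

With an exact-name `identify` callback, two revenue-cycle views are processed together even if an acquisition-cycle view was supplied between them, and within each view the event is examined before its resources and agents.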
3.12.2 Entity Identification. Initial recognition is performed using a simple pattern match on the names and primary keys of the entity, making use of the synonym lists constructed during the knowledge-acquisition phase and added to as the view integration proceeds. If that shallow matching fails, we resort to deeper knowledge contained in the managerial schema about the various groupings of entities and relationships we expect to find in a typical machine shop. We inspect the other entities in the user view and attempt to identify an REA template in the managerial schema that might be the one the user is modeling. The potential templates are ordered from most likely to least likely. We take the most likely template, describe it to the user, and suggest which entity we think the user is attempting to model. The user is asked to confirm our "best guess." If the user confirms that the two entities are the same, we model them as such, adding the entity and attribute names to our list of synonyms. If the user indicates they are different, the next most likely template is presented, and so on, until we have exhausted all the potential templates. If none of the suggestions is confirmed, we add the entity to our schema as a new entity and construct the appropriate relationship(s).

3.12.3 Relationship Identification. After the entities have been added to the schema, the relationships are examined, and we look for differences in structural constraints between user view relationships and relationships in the growing schema. If any are found, we attempt to resolve them, using methods discussed later in this section. When all user views have been added, the view integration process ends.

Having explained at a high level how we handle object identification and the addition of new objects, we turn to the middle stage of our three-stage integration strategy, the identification and resolution of view conflicts.

3.13 View Conflict Recognition

Batini et al. (1992) suggest that homonyms and synonyms may be indicated by the presence of concept mismatches and concept similarities, respectively. Mismatches occur when identically named concepts possess different properties and constraints; similarities occur when concepts with different names share properties or constraints. Properties are defined as neighbor concepts: an entity's properties would be its attributes and the relationships it participates in. Constraints, on the other hand, are limiting conditions on the set of allowable instances of the schema, such as the cardinality constraints on relationships.

These definitions of concept similarity and concept mismatch are unfortunately a bit too broad to help much in positively identifying homonyms and synonyms during view integration. This is due to the sharing and integration of the data typically stored in database systems. As Date (1986, 6) states, these two aspects, integration and sharing, represent a major advantage of database systems in the "large" environment; and integration, at least, may be significant in the "small" environment, too. In this environment, the concept similarities and mismatches described above are frequently just normal consequences of data integration and do not signal a modeling error at all. As Date (1986, 7) further relates:

"Another consequence of the same fact (that the database is integrated) is that any given user will normally be concerned only with some subset of the total database; moreover, different users' subsets will overlap in many different ways. In other words, a given database will be perceived by different users in a variety of different ways. In fact, even when two users share the same subset of the database, their views of that subset may differ considerably at a detailed level."
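For comparison, the Batini et al. indicators can be written as a simple predicate. This Python sketch is our own illustration, not taken from Batini et al. or from REAVIEWS; concepts are represented as assumed dictionaries holding a name, a set of properties (attributes and relationships), and an optional set of constraints.

```python
def naming_conflict_hint(a, b):
    """Flag a possible homonym or synonym between two concepts.

    Each concept is a dict with a 'name', a set of 'properties' (neighbor
    concepts: attributes and relationships) and, optionally, a set of
    'constraints' (e.g. cardinalities). Returns a hint string or None.
    """
    props_a, props_b = a["properties"], b["properties"]
    cons_a, cons_b = a.get("constraints", set()), b.get("constraints", set())
    if a["name"] == b["name"]:
        # Concept mismatch: same name, different properties or constraints.
        if props_a != props_b or cons_a != cons_b:
            return "possible homonym"
    elif props_a & props_b or cons_a & cons_b:
        # Concept similarity: different names, shared properties or constraints.
        return "possible synonym"
    return None


payroll = {"name": "employee", "properties": {"employee-no", "name", "pay-rate"}}
personnel = {"name": "employee", "properties": {"employee-no", "name", "address"}}
print(naming_conflict_hint(payroll, personnel))  # possible homonym
```

Note how readily the predicate fires: a payroll view and a personnel view of the very same employee are flagged as a possible homonym merely because each carries different attributes, which is the over-breadth described in the text.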
In examining the schemas from which the integration test schemas were derived, it became apparent that differences among concept properties were common and in most cases did not signal modeling errors, but were just differences in the ways the various users viewed the data set. In cases where naming errors did exist, the two major indicators seemed to be mismatches in attributes (for homonyms) and matches in attributes, roles, and relationships (for synonyms).

These observations drive the reasoning followed when seeking to identify naming conflicts in REAVIEWS. Homonyms are typically identified by discrepancies in attribute sets among entities with identical names. Synonyms, on the other hand, are usually identified by discrepancies in names among entities with the same or similar attribute sets, especially when the primary keys match or the entities serve the same role in a particular REA template. Most of the modeling tools mentioned previously can do no more than identify such situations, leaving the user to decide the correct nature of the entities. With the aid of additional accounting domain knowledge, we can provide more support when such situations are encountered.

3.13.1 Homonyms. Our knowledge base contains synonym information and a managerial schema containing expected relationships between entities. Both types of knowledge can be applied to homonym and synonym conflicts. When an entity is encountered in a user view, we try to find a matching structure in the managerial schema. If we find a match on names or synonyms, we compare attribute sets, looking for possible homonyms. Of course, if the attribute sets produce no conflicts, the entities are treated as being the same. If an attribute discrepancy is found, we check to see whether the discrepancy is just a case of using synonyms for the attributes.
In the case of a difference in the primary keys, we check to see whether the new primary key is known to be a candidate key of the entity. If either of these cases is true, we resolve the conflict by treating the two entities as the same. If not, we check to see whether the two entities play the same role in a known REA template. When they do, we suspect they may be the same entity, with the individual view using a different set of attributes. We present this possibility to the system user, along with a description of the entity we believe it to be, and ask the user to confirm or reject the match.

We have now moved from a simple matching strategy to one that relies on knowledge of how the entities should be related in particular REA templates. The main objective here is to discover as much as we can about an entity before we have to stop and ask the user. We first try to find the correct answer from stored knowledge. If that is not possible, it is necessary to ask the user to resolve the problem using company-specific knowledge. Even in those cases, it is possible to use some of our knowledge to help guide the user in providing the correct answer. This process becomes clearer in Chapter 4, where we discuss the actual implementation of REAVIEWS and its handling of test case conflicts.

3.13.2 Synonyms. A similar process occurs when we find an entity playing a role in an REA template that is already modeled by an entity with a different name. This appears to be a synonym, so we first check to see whether the name is a known synonym. If it is not, we look at the attributes to see if they are the same. If so, as in the homonym case, we present the user with the already modeled entity and its description and ask the user to confirm or reject the match. This is another case in which we are able to provide guidance from domain knowledge, even if it is not the complete solution.
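The chains of checks in Sections 3.13.1 and 3.13.2 can be summarized in code. The Python sketch below is illustrative only; REAVIEWS itself was built on the frame system described in Chapter 4. The entity dictionaries, the `confirm` callback standing in for the dialogue with the user, and the "different" outcome when no shared REA role is found are all assumptions.

```python
def _canonical(attrs, attr_synonyms):
    """Rename attributes to their canonical names where a synonym is known."""
    return {attr_synonyms.get(a, a) for a in attrs}


def check_homonym(user, known, attr_synonyms, confirm):
    """Same (or synonymous) name: decide whether two entities are the same.

    user/known: dicts with 'attributes' (set), 'key' (str), and 'role'
    (the entity's role in an REA template); `known` may also carry
    'candidate_keys'. confirm(known) asks the user and returns a bool.
    """
    # 1. Attribute sets agree once attribute synonyms are applied.
    user_attrs = _canonical(user["attributes"], attr_synonyms)
    known_attrs = _canonical(known["attributes"], attr_synonyms)
    if user_attrs == known_attrs:
        return "same"
    # 2. The user's primary key is a known candidate key of the schema entity.
    user_key = attr_synonyms.get(user["key"], user["key"])
    if user_key != known["key"] and user_key in known.get("candidate_keys", set()):
        return "same"
    # 3. Same role in a known REA template: describe the entity and ask.
    if user["role"] == known["role"]:
        return "same" if confirm(known) else "different"
    # 4. Otherwise treat the identically named entities as homonyms.
    return "different"


def check_synonym(user, known, name_synonyms, confirm):
    """Different names, same REA template role: decide whether they are synonyms."""
    if name_synonyms.get(user["name"]) == known["name"]:
        return "same"
    if user["attributes"] == known["attributes"]:
        return "same" if confirm(known) else "different"
    return "different"
```

The ordering mirrors the text's aim of exhausting stored knowledge first: the user is consulted only in step 3 of the homonym check and in the attribute-match case of the synonym check.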
Once the user has responded, entities are added as indicated, either as unique new entities or as different views of entities already in the schema.

3.13.3 Type Conflicts. Type conflicts occur when different constructs are used for the same entity, as when an object is modeled as an attribute in one user view and as an entity in another. We can detect this in some cases by recognizing a non-key attribute of one entity that is also the key of a different entity (i.e., the attribute is a foreign key). If our domain knowledge indicates that this foreign key represents an entity that should be separately modeled (or if it has already been separately modeled in another user view), we notify the user of this fact and change the view to show the additional entity and a relationship linking it to the initial entity.

In the above case, even if the suspected entity has not already been modeled in a previous user view, we still wish to model it separately, as our domain knowledge suggests that we will eventually wish to keep track of other attributes for that entity. Because this domain knowledge comes from outside the business enterprise, it is possible that the finished schema may not contain any additional attributes. In that case, the user may collapse the two entities back into one, as in the original view.

Consider a user view in which we find an entity named Employee, with the key employee number and two non-key attributes, name and department number. We recognize that department number is the key of an entity about which we frequently model other attributes, such as department name. We therefore model the separate entity, Department. Figure 3-9 shows the before and after versions of this example. Assume that no further attributes are found in later views, so that we finish with a schema entity, Department, with only one attribute, its primary key.
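The Employee/Department expansion can be sketched as a small transformation over a user view. In this Python illustration, the dictionary layout of a view, the `key_owners` map (standing in for domain knowledge or earlier views), and the choice to drop the foreign-key attribute once the relationship is added are assumptions; structural constraints on the new relationship are not modeled.

```python
from copy import deepcopy


def expand_foreign_keys(view, key_owners):
    """Promote foreign-key attributes to separate entities (Section 3.13.3).

    view: {'entities': {name: {'key': str, 'attributes': [non-key attrs]}},
           'relationships': [(entity, entity)]}
    key_owners: hypothetical map from a key attribute to the entity it
    identifies, drawn from domain knowledge or from earlier user views.
    Returns (new_view, notices); the input view is left unchanged.
    """
    result = deepcopy(view)
    notices = []
    for name, entity in view["entities"].items():
        for attr in entity["attributes"]:
            owner = key_owners.get(attr)
            if owner is None or owner == name:
                continue
            # Model the owning entity separately, keyed by the attribute.
            result["entities"].setdefault(owner, {"key": attr, "attributes": []})
            # Assumption: the link is now carried by a relationship, not the attribute.
            result["entities"][name]["attributes"].remove(attr)
            result["relationships"].append((name, owner))
            notices.append(f"{name}.{attr} expanded into entity {owner}")
    return result, notices


view = {"entities": {"employee": {"key": "employee-no",
                                  "attributes": ["name", "department-no"]}},
        "relationships": []}
expanded, notices = expand_foreign_keys(view, {"department-no": "department"})
print(notices)  # ['employee.department-no expanded into entity department']
```

The resulting department entity holds only its key, matching the situation in the example; the returned notices correspond to informing the user of the change.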
A schema entity consisting only of its primary key presents no real problem, as it will be accommodated when the conceptual schema is mapped into the particular data model used by the DBMS chosen for the accounting system.

[Figure 3-9. Expansion of Foreign-Key Attribute: (a) before expansion, the employee entity carries employee no., name, address, and department no.; (b) after expansion, employee is linked by a relationship to a separate department entity, with structural constraints (0,1) and (0,N)]

3.13.4 Structural Constraint Conflicts. Dependency and behavioral conflicts come about when different views model the same relationship with different structural constraints. A dependency conflict is a discrepancy between the max-cards, while a behavioral conflict is a discrepancy between the min-cards. Before considering the resolution of these conflicts, we make a few observations on the use of structural constraints in modeling a business enterprise.

First, we note that different groups in an enterprise might rightly wish to enforce different constraints on a relationship, at least from the perspectives of the user applications that gave rise to the original user views. The goal of view integration should be to create an enterprise schema from which all of the individual user views may be constructed. If one view requires more restrictive constraints than other views, those more restrictive constraints can be implemented procedurally. Next, we note that there may be some structural constraints that should rarely, if ever, be allowed. Finally, there may be constraints that really depend upon particular aspects of the individual enterprise, e.g., management policies.

These observations lead directly to our three methods of resolving structural constraint conflicts, which we call force, resolve, and g-d-m. The force method is used when our domain knowledge gives us a compelling reason to use one particular set of constraints.
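The distinction between the two conflict kinds, and one way the three methods named above might be dispatched, can be sketched in Python. The (min-card, max-card) pair representation, the `MANY` stand-in for "N", and the function names are assumptions. For g-d-m the sketch keeps the least restrictive constraints for each entity (lowest min-card, highest max-card), the convention this section attributes to that method; `ask_policy` is a hypothetical stand-in for the company-specific procedure used by the resolve method.

```python
MANY = float("inf")  # stands in for the max-card "N"


def classify_conflicts(c1, c2):
    """Compare two views' structural constraints on the same relationship.

    c1, c2 map each participating entity to a (min_card, max_card) pair.
    Returns sorted (entity, kind) pairs: 'dependency' when the max-cards
    differ, 'behavioral' when the min-cards differ.
    """
    found = []
    for entity in c1.keys() & c2.keys():
        (min1, max1), (min2, max2) = c1[entity], c2[entity]
        if max1 != max2:
            found.append((entity, "dependency"))
        if min1 != min2:
            found.append((entity, "behavioral"))
    return sorted(found)


def least_restrictive(c1, c2):
    """g-d-m style merge: lowest min-card and highest max-card for each entity."""
    return {e: (min(c1[e][0], c2[e][0]), max(c1[e][1], c2[e][1]))
            for e in c1.keys() & c2.keys()}


def resolve_constraints(strategy, c1, c2, forced=None, ask_policy=None):
    """Dispatch on the method recorded for this relationship."""
    if strategy == "force":
        return forced          # constraints dictated by domain knowledge
    if strategy == "resolve":
        return ask_policy()    # hypothetical procedure asking about company policy
    if strategy == "g-d-m":
        return least_restrictive(c1, c2)
    raise ValueError(f"unknown strategy: {strategy}")


# Customer-sale: customer participation optional, sale mandatory, one-to-many.
view_a = {"customer": (0, MANY), "sale": (1, 1)}
view_b = {"customer": (1, MANY), "sale": (1, MANY)}
print(classify_conflicts(view_a, view_b))  # [('customer', 'behavioral'), ('sale', 'dependency')]
```

Under g-d-m the two views above would merge to customer (0, N) and sale (1, N); under force, the stored customer-sale constraints would be kept regardless of what the views say.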
In applying the force method, we are saying, in essence, "We feel that you should virtually always model this relationship with these constraints; if you ignore them, you should be absolutely certain that you wish to view this relationship in a potentially incorrect manner." For illustration, the narrative and document examples in the EAS chapter on machine shops indicate that an individual sale would virtually always be to a single customer. Conversely, the shop expects multiple sales to (at least) some of its customers, but there are cases in which the shop records data about a customer prior to an actual sale. In modeling terms, we would say that customer has optional participation in the customer-sale relationship, while sale has mandatory participation. Further, we would say that the customer-sale relationship is a one-to-many relationship. We would therefore try to force users to accept this view of the relationship's structural constraints.

The resolve method is used when we know that a relationship will normally have one "correct" set of constraints, but we don't know in advance what these constraints should be. This typically happens when those constraints are based on policies that may vary among companies. Consider the relationship between manufacturing employee and job operation. In some shops, segregation of duties may dictate that several employees will work on a given job. In a different shop, the policy may be for one employee always to complete a job individually. Naturally, business factors, such as the size of the company or the nature of the product manufactured, decide which policies are appropriate, but the fact is that such policies do differ, and they affect the way we should model the objects of interest. When a structural conflict arises, we could simply ask the user to give us the "correct" structural constraint.
Of course, this is not what we would expect of the experienced accountant whom we have been using as an analog for our knowledge-based system. Realizing that the answer depended on manufacturing policies, he or she would ask, "Would an individual job always be completed by a single employee, or do you have some jobs that require more than one employee to complete?" The resolve method thus asks for knowledge about the individual company, then uses that knowledge to determine the correct structural constraints.

The g-d-m method uses a general database modeling convention to resolve structural constraint conflicts. It is used in cases where we have no compelling reason to require all users to accept the same set of structural constraints (remember that we can enforce the stricter constraints procedurally), but the base objects that we model must accommodate all user views. Thus, what we really want to do here is find the most general set of constraints (i.e., the least restrictive set) and use that set in our enterprise schema. Of course, we would want to inform the user of this change, so that the user would be aware that a more general set of constraints was being modeled and would know that the stricter constraints would have to be enforced procedurally.

3.13.5 Key Conflicts. Key conflicts are our last class of view integration conflicts. They occur when the same concept has different keys in different views. There are really only two variations: the two keys are either synonyms for each other, or they are different keys. If we recognize them as synonyms, we note this fact and choose one as the primary key in the enterprise schema, notifying the user of this choice. If we recognize them as different attributes, we choose one as the primary key, then record the other as a candidate key. If our domain knowledge does not allow us to distinguish which situation exists, we must ask the user.
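The three outcomes for key conflicts can be sketched as one small function. In this Python illustration, `known_attributes` (the attributes the m-schema already records for the entity), the `ask_user` callback, the decision to keep the schema's existing key as primary, and the recording of a user-confirmed synonym are all assumptions rather than details taken from REAVIEWS.

```python
def resolve_key_conflict(schema_key, view_key, known_attributes, attr_synonyms, ask_user):
    """Resolve a key conflict for one entity (Section 3.13.5).

    known_attributes: attribute names the m-schema already records for the entity.
    attr_synonyms: attribute -> canonical name; updated when the user confirms
    a synonym. ask_user(schema_key, view_key): hypothetical callback returning
    True if both keys name the same attribute.
    Returns (primary_key, candidate_keys); the schema's key is kept as primary.
    """
    def canon(attr):
        return attr_synonyms.get(attr, attr)

    if canon(schema_key) == canon(view_key):         # known synonyms
        return schema_key, set()
    if canon(view_key) in known_attributes:          # recognized as a different attribute
        return schema_key, {view_key}
    if ask_user(schema_key, view_key):               # cannot tell: ask the user
        attr_synonyms[view_key] = canon(schema_key)  # remember the new synonym
        return schema_key, set()
    return schema_key, {view_key}
```

Only the last two branches involve the user, which is where a description of the already modeled primary key can help.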
When we must ask, it may be possible to aid the user by describing the primary key already modeled in our schema. This may make it easier for the user to determine whether the different key names describe the same attribute.

The discussion in this section has been presented, as much as possible, from the human perspective. In the next chapter, the focus shifts to the actual implementation of domain knowledge and problem-solving strategies in REAVIEWS, our knowledge-based system for view integration. Included is a discussion of the choice of representation language and of the test cases developed to demonstrate the use of domain knowledge in view integration. Chapter 4 concludes with details of a REAVIEWS session in which the user views are integrated, with the various conflict types recognized and resolved in the manner described in this chapter.

Chapter 4. The REAVIEWS System

4.1 Knowledge Structures within REAVIEWS

In this section we examine several aspects of the knowledge embedded in the REAVIEWS system. We first look at the choice of knowledge representation language. We then describe how accounting domain knowledge is implemented in REAVIEWS, discussing the declarative and the procedural structures separately. Finally, we explain the methods for applying that knowledge in the view integration task.

4.1.1 Frame-based Knowledge Representations. The Chapter 3 discussion of accounting domain knowledge and view integration proposed how one might use accounting knowledge to perform view integration and deal successfully with the integration conflicts that arise. To be used by knowledge-based systems, however, such knowledge must be encoded into a format usable by the computer. A number of formalisms have been used for such encoding; among the more familiar are production rules, first-order predicate calculus, and frames. For the REAVIEWS system, we chose to represent the accounting domain knowledge of Chapter 3 in a frame system.
Frames, as conceived by Minsky (1981), are data structures for representing "remembered" stereotypical knowledge that can be used to understand and make inferences about "new" situations or objects. As typically implemented in knowledge-based systems, a frame is used to represent one object or a class of objects, and multiple frames are connected together in semantic networks, commonly referred to as frame systems. These frame systems allow us to organize and use knowledge of very complex situations or systems efficiently and effectively.

There are a number of reasons for choosing this type of knowledge structure for REAVIEWS. REA accounting theory (in McCarthy 1979, 1982) is itself presented as a semantic network, which argues for a knowledge representation scheme capable of modeling semantic nets. Frame systems also have a number of other benefits. As Fikes and Kehler (1985, 904-5) remark:

"The advantages of frame languages are considerable: They capture the way experts typically think about much of their knowledge, provide a concise structural representation of useful relations, and support a concise definition by specialization technique that is easy for most domain experts to use. In addition, special-purpose algorithms have been developed that exploit the structural characteristics of frames to rapidly perform a set of inferences commonly needed in knowledge-based systems."
Chandrasekaran (1984) mentions three advantages of frame-based systems: (1) the use of default knowledge adds efficiency, as information about a particular object need only be stored when it differs from default information about that type of object; (2) frames can be structured into 95 generalization hierarchies, with default information inherited from objects at a higher level in the hierarchy; and (3) frames may contain procedural information allowing inference mechanisms to be invoked when contextually 0 These advantages make frame-based systems appropriate.1 "very useful for capturing one broad class of problem solving activity, viz. one where the basic task can be formulated as one of making inferences about objects by using one’s knowledge of related objects elsewhere in the structure" (ibid., 52-53). This is very similar to the problem solving task described in Chapter 3 for resolving view integration conflicts. 4.1.2 .Declarative Structures far.Accounting Domain Knowledge. In Chapter 3, we modeled accounting knowledge from the principles and industry levels in a managerial schema (which we refer to as the m-schema). That schema was in fact a declarative representation of the accounting knowledge. The translation of that knowledge into a frame- based system can be thought of as the translation from one semantic net to another. The entities and relationships 10 An even stronger case can be made about default information. Such information can be used to make inferences when new or incompletely specified objects are encountered. In addition, the frame structures themselves can enhance the performance of knowledge-based systems. Fikes and Kehler (1985, 907-20) discuss "various ways in which a frame- based representation facility participates in a knowledge system's reasoning functionality and can assist the system designer in determining strategies for controlling a system's reasoning." 
(ibid., 907) 96 will simply be represented in frame structures, rather than the E—R diagrammatic constructs of Chapter 3. In REAVIEWS then, frames are the structures used to represent objects of interest in the application domain. They are composed of two major elements, slots and facets.11 Slots describe the object and can provide taxonomic descriptions, such as the generalization hierarchy formed by Employee and Salesperson. Slots can also provide the more familiar attribute descriptions (such as quantity or price, which describe the object Inventory). Facets can be thought of as subslots. They are used to represent knowledge about the slots. Some common facets are those for the actual value of the slot (typically named value), default values, documentation strings for the slot, and various constraints for the slot. In REAVIEWS, there are two primary types of constraint facets: constraint defines the allowable domain for slot values, while multivalued defines whether slots are restricted to single values or not. For example, the entity frame has a slot named acct-cycle, which holds the names of the various accounting cycles in which the entity is found. That slot contains the following facets: - value, which lists the accounting cycle(s) for the individual entity; 11 Some of the terminology used here is derived from the large body of work on frame-based systems. Much of that work addressed structural representation issues of frames as structures in knowledge-based systems, rather than Minsky's orientation toward frames as memory structures for the control of reasoning. For example, the word ”slot" has evolved, in structural representation terms, into something with a slightly different meaning than its original use by Minsky. 
- constraints, which describes the set from which the value facet may be filled; in REAVIEWS, value is constrained to be one of revenue-cycle, conversion-cycle, or acquisition-cycle;
- multivalued, which is set to the boolean true, meaning that the slot may contain multiple values;
- doc-string, which holds a short description that can be used to explain what the slot's values represent; in this case, the string reads "the accounting cycle(s) in which the entity participates"; and
- print-name, which is set to "accounting cycle".[12]

Figure 4-1 shows some of the basic parts of the frame structure for entity. In REAVIEWS's hierarchical frame system, we define four sub-types of entity frames: resource, event, agent, and other-entity. These four have further sub-types that represent the various entities in the m-schema. Each of the schema entities inherits the attributes of those objects above it in the hierarchy, and each may contain additional information appropriate only for that particular object. For example, sale inherits attributes from both event (its immediate parent) and entity (the parent of event). Sale also has some attributes of its own, including default values for some of the inherited slots and some new slots (such as total-amount and sales-tax) not defined in any of its ancestor frames.

[12] Many of the frames and slots have abbreviated names for convenience, but this makes them difficult for the user to understand; the use of print names allows the user to be presented with easily identifiable terms, while still allowing internal system use of the shorter names.

[Figure 4-1: Partial Structure of Entity Frame (GoldWorks frame browser for ENTITY, whose parent is TOP-FRAME)]
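The slot-and-facet organization, and the inheritance path from entity through event to sale, can be sketched in Python. This is a minimal illustration under stated assumptions, not the GoldWorks II frame system REAVIEWS was built in: the Facets and Frame classes, the put/get/lookup methods, and the type slot with its inherited default are hypothetical names chosen here; only acct-cycle, its facets, and the three cycle names come from the text.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, FrozenSet, Optional


@dataclass
class Facets:
    """Facets of one slot: its value plus knowledge about that value."""
    value: Any = None
    default: Any = None
    constraints: Optional[FrozenSet[str]] = None  # allowable domain, if any
    multivalued: bool = False
    doc_string: str = ""
    print_name: str = ""


@dataclass
class Frame:
    name: str
    parent: Optional["Frame"] = None
    slots: Dict[str, Facets] = field(default_factory=dict)

    def lookup(self, slot: str) -> Facets:
        """Return the nearest definition of a slot, searching up the hierarchy."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")

    def get(self, slot: str) -> Any:
        """Local or inherited value, falling back to the (inherited) default."""
        facets = self.lookup(slot)
        return facets.default if facets.value is None else facets.value

    def put(self, slot: str, value: Any) -> None:
        """Store a local value after checking the constraints and multivalued facets."""
        facets = self.lookup(slot)
        values = list(value) if isinstance(value, (list, tuple, set, frozenset)) else [value]
        if not facets.multivalued and len(values) != 1:
            raise ValueError(f"{slot!r} holds a single value")
        if facets.constraints is not None:
            bad = [v for v in values if v not in facets.constraints]
            if bad:
                raise ValueError(f"{bad} outside the domain of {slot!r}")
        # Copy the inherited facets so the ancestor frame is left untouched.
        self.slots[slot] = replace(facets, value=values if facets.multivalued else values[0])


CYCLES = frozenset({"revenue-cycle", "conversion-cycle", "acquisition-cycle"})

entity = Frame("entity", slots={
    "acct-cycle": Facets(constraints=CYCLES, multivalued=True,
                         doc_string="the accounting cycle(s) in which the entity participates",
                         print_name="accounting cycle"),
    "type": Facets(),  # hypothetical slot used to show an inherited default
})
event = Frame("event", parent=entity, slots={"type": Facets(default="event")})
sale = Frame("sale", parent=event, slots={"total-amount": Facets(), "sales-tax": Facets()})

sale.put("acct-cycle", ["revenue-cycle"])
print(sale.get("acct-cycle"))                # ['revenue-cycle']
print(sale.get("type"))                      # event (default inherited from event)
print(sale.lookup("acct-cycle").print_name)  # accounting cycle
print(entity.get("acct-cycle"))              # None: the parent frame is unchanged
```

Copying the inherited facets before storing a local value keeps the default-knowledge property Chandrasekaran describes: a frame records information only where it differs from its ancestors.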
Figure 4-2 shows a small portion of the frame hierarchy for entities in REAVIEWS.

The other major structures in the m-schema, relationships, are represented in the same way as entities. The basic object is defined in the relationship frame, and various sub-types of relationship are also defined. These structures contain some slots common to the entity frames, but they also include new types of attributes. For example, the slot struct-const holds the structural constraints for the entities joined by the relationship. Struct-const has the following two facets for use in resolving structural constraint conflicts:

- conflict-strategy, which denotes which of the three conflict resolution strategies (force, resolve, or g-d-m) should be used when such a conflict arises; and
- s-conflict-proc, which names the procedure to be invoked when the resolve method is to be used (recall that this method acquires user knowledge during the integration session and then uses that knowledge to determine the appropriate constraints).

Relationship frames also contain a rel-type slot to identify the type of REA relationship for each individual relationship. Some pairs of entities may be linked by more than one relationship; this is allowable as long as the relationships are unique. Modeling the same enterprise object with multiple E-R constructs (entities or relationships), however, would constitute a modeling error in the m-schema and is therefore not allowed. In fact, one of the integration conflicts we try to resolve, the synonym, is this type of error.

[Figure 4-2: Partial Entity Hierarchy (frame browser rooted at TOP-FRAME, showing event frames including JOB-OPERATION, MATERIAL-REQUISITION, PRICE-QUOTATION, SALE, SALE-ORDER, CASH-RECEIPT, SALE-RETURN, MATERIAL-TRANSFER, JOB-OP-TYPE, and MAT-REQ-TYPE)]

4.1.3 Procedural Structures for Accounting Domain Knowledge.
Most of the domain knowledge in REAVIEWS is modeled declaratively, but there are areas in which knowledge is best modeled with procedures. Such procedural knowledge is stored in REAVIEWS in one of two ways: either attached to the frame of a particular m-schema object or embedded as part of the control structure of the system. An example of the former method is the use of the two structural constraint facets mentioned above. Each relationship frame contains facets that carry information about how structural constraint conflicts should be resolved. The s-conflict-proc facet contains the procedure to be invoked to obtain and employ user knowledge in the conflict resolution process.

An example of control-structure procedural knowledge is the confirm-entity process. At a high level, the process can be viewed as the attempt to identify a user-view entity, referred to as the current-entity, using m-schema information. If that attempt fails, then a procedure is invoked to identify likely candidates for the user entity from within the schema, and these "educated guesses" are presented to the user, with some explanatory information to help the user determine whether any of the proposed entities are the same as the user-view entity. Based upon the response from the user, the entity is then added to the schema either as an instantiation of an existing schema entity or as a unique entity.

Unlike the use of the s-conflict-proc facet, the "most likely candidate" method is not specific to the entity being investigated; it is instead a general method applied to all entities when specific identification cannot be accomplished internally. The basic intuition behind this method is that two entities are more likely to be the same if they share a greater number of properties.[13] An initial set of candidate entities is selected, based upon matches in the entity-role and accounting-cycle attributes.
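The confirm-entity process (look up the name, then the synonym list discussed with the Produce-Sales-Analysis view, then fall back to ranked candidates), together with the weighting scheme described next, might look roughly as follows. This is a sketch under assumptions: the SchemaEntity fields and function names are hypothetical, an accounting-cycle "match" is taken to mean any overlap, the weight is simply the number of user-view entities a candidate is related to in the m-schema, and the cashier and mfg-employee entries and the sales-rep synonym are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class SchemaEntity:
    name: str
    role: str                # e.g. "inside-agent", "outside-agent"
    cycles: FrozenSet[str]   # accounting cycle(s) in which the entity appears
    related: FrozenSet[str]  # entities it has m-schema relationships with
    synonyms: FrozenSet[str] = frozenset()


def find_by_name(name: str, schema: Dict[str, SchemaEntity]) -> Optional[SchemaEntity]:
    """Exact name first, then each entity's synonym list."""
    if name in schema:
        return schema[name]
    for entity in schema.values():
        if name in entity.synonyms:
            return entity
    return None


def rank_candidates(role: str, cycles: FrozenSet[str], view_entities: FrozenSet[str],
                    schema: Dict[str, SchemaEntity]) -> List[str]:
    """Filter on role and cycle, then order by matches with the user view."""
    candidates = [e for e in schema.values() if e.role == role and e.cycles & cycles]
    # sorted() is stable, so equally weighted candidates keep schema order.
    ranked = sorted(candidates, key=lambda e: len(e.related & view_entities), reverse=True)
    return [e.name for e in ranked]


def confirm_entity(name: str, role: str, cycles: FrozenSet[str],
                   view_entities: FrozenSet[str],
                   schema: Dict[str, SchemaEntity]) -> Tuple[str, object]:
    """('identified', name) if found internally, else ('ask-user', ranked names)."""
    found = find_by_name(name, schema)
    if found is not None:
        return ("identified", found.name)
    return ("ask-user", rank_candidates(role, cycles, view_entities, schema))


REV = frozenset({"revenue-cycle"})
schema = {
    "salesperson": SchemaEntity("salesperson", "inside-agent", REV,
                                frozenset({"sale", "sale-order"}), frozenset({"sales-rep"})),
    "cashier": SchemaEntity("cashier", "inside-agent", REV, frozenset({"cash-receipt"})),
    "customer": SchemaEntity("customer", "outside-agent", REV,
                             frozenset({"sale", "cash-receipt"})),
    "mfg-employee": SchemaEntity("mfg-employee", "inside-agent",
                                 frozenset({"conversion-cycle"}), frozenset({"job-operation"})),
}
view = frozenset({"sale", "customer", "finished-good", "industry", "sales-worker"})

print(confirm_entity("sales-rep", "inside-agent", REV, view, schema))
# ('identified', 'salesperson')
print(confirm_entity("sales-worker", "inside-agent", REV, view, schema))
# ('ask-user', ['salesperson', 'cashier'])
```

In the second call, salesperson outranks cashier because it is related to sale, which also appears in the user view.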
The set is then ordered using a weighting scheme that assigns to each candidate entity a weight based upon matches between the set of entities in the user view being integrated and the set of entities with which the candidate entity has relationships. Candidate entities with more matches are offered to the user before candidates with fewer matches.

[13] Properties are here defined as in Chapter 3; i.e., they are the attributes and relationships of the entities in question.

4.1.4 Structures for View Integration Knowledge. Chapter 3 discussed view integration and the use of domain knowledge in solving integration conflicts. The processes described there were derived from the techniques used by individuals with extensive modeling expertise in both REA and Entity-Relationship theory. As a result, while not intended as a rigorous cognitive model of their expertise, those processes contain a great deal of knowledge about the view integration task. This knowledge is, in a sense, compiled into the integration process itself. To avail ourselves of this knowledge, we patterned the basic control structure of REAVIEWS on the integration processes and strategies described in Chapter 3.

It is a common feature of many knowledge-based systems that some knowledge gets compiled into the control structures of the system itself, and REAVIEWS is no exception. By carefully patterning the control structure after those integration strategies, we believe that we have added some power to the conflict resolution ability of REAVIEWS. The modeling knowledge that adds this power is, however, represented much less explicitly than the knowledge represented by the m-schema.

4.2 Test Cases for REAVIEWS

We previously explained that the machine shop industry was selected in part because a number of modeling cases drawn from actual business enterprises in this industry were available. This was an important factor.
Part of the purpose of this thesis was to test the proposition that using domain knowledge adds to our ability to resolve integration conflicts.[14] To be generalizable, test cases need to be representative of the broad classes of integration conflicts. For high external validity, the cases need to be representative of actual mistakes made by human modelers. Conceptual models (user views and enterprise schemas) from more than a dozen "real-world" machine-shop-type businesses were used as the starting point. Test cases were derived from those materials and from discussions with experienced modelers. In addition, academic and educational literature was examined for examples of integration conflicts in other domains. The result is a set of cases that, we believe, are representative of those found in industries other than the machine shop and that may also be generalized to conceptual modeling of non-accounting systems.

[14] This is not an ad hoc proposition, but one developed from observation and performance of the view integration task itself. When resolving integration problems such as name conflicts, the analyst must determine whether two entities refer to the same real-world object. This determination is frequently made using domain knowledge.

The cases shown in Figures 4-3 through 4-6 contain examples of the conflict classes shown in Table 2-1.[15] The precise nature of the conflicts will be discussed in the integration example section, but we first present a brief

[15] REAVIEWS was also tested with a number of user views not discussed here. The views were primarily variations of the examples in Figures 4-3 through 4-6. The purpose was to test each solution with multiple combinations of variables; the results were all consistent with those discussed here.
Of course, this is not surprising; as Newell and Simon (1976, 114) point out, "we don't have to build 100 copies of, say, a theorem prover to demonstrate statistically that it has not overcome the combinatorial explosion of search in the way hoped for." The variations were intended more to identify errors in system programming than errors in problem-solving logic.

[Figure 4-3: User View, Produce-Sales-Analysis (entities include sales-worker, customer, and industry; attributes include item-no, industry-no, price, and name)]

[Figure 4-4: User View, Update-Work-In-Process (entities job-operation and job-operation-type; attributes include job-no, job-op-no, pieces, employee-no, machine-no, end-setup, std-setup, std-time, and description)]

[Figure 4-5: User View, Record-Payment (entities cash, cash-receipt, and sale)]

[Figure 4-6: User View, Record-Payroll (entities include mfg-employee and cash; attributes include hours, gross-pay, and fed-tax)]

explanation of the methods of providing user inputs to REAVIEWS.

4.3 Inputs to REAVIEWS

The REAVIEWS system is designed for REA view integration. Most of its input consists of REA-modeled user views. The diagrammatic format of the user views in Figure 2-1 must be translated into frame structures before the views can be used by the system. This is relatively straightforward, though not necessarily simple. Entities become frames, attributes become slots, and constraints on these attributes, such as range restrictions, become facets. Relationships in which an entity participates also become slots of that entity. Constraints on relationships, such as dependencies between the entities connected by the relationship, become facets.

Other input may be required during the integration session. This will generally be in the form of user responses to system queries.
The system may, for example, ask the user for additional information it requires for conflict resolution. When the system is unable to resolve a conflict and asks the user for help, it will still provide, when possible, suggestions or background information to aid in the task.

4.4 View Integration Session

Following the general integration strategy described in Chapter 3, the user views are each separately retrieved and examined by REAVIEWS. A typical session starts with a welcome screen and a dialog box asking the user to type in the name of the user views input file. As REAVIEWS performs its various operations, important messages, particularly those requiring action by the user, are displayed at the top of the main REAVIEWS window, as seen in Figure 4-7. As can also be seen in that figure, more detailed information on REAVIEWS's actions is provided in the output window at the bottom of the main screen. When a user view has been completely processed, REAVIEWS pauses and instructs the user to scroll through its actions before proceeding with the integration.

The next four sections discuss the integration of the four test cases. While going through the cases, the reader may wish to refer back to the problem-solving overview in Figure 3-8 and to the appendix.

4.4.1 Produce-Sales-Analysis View. The first user view to be integrated is the Produce-Sales-Analysis view, shown in Figure 4-3. This view contains examples of three of the conflict classes. The first conflict is a synonym conflict between the entity named "sales-worker" and the managerial-schema entity salesperson. This type of synonym is one of the easier ones for a human to recognize, as (1) the terms are recognized as synonyms, (2) the entities are serving the
same role in the accounting-cycle template, and (3) the entities have the same primary-key attribute. This highlights some of the benefits of using domain knowledge and a problem-solving structure similar to that of human modelers. In REAVIEWS, this recognition is implemented by providing the system with lists of commonly used synonyms for entities and attributes.

[Figure 4-7: REAVIEWS, Main Screen (message area at top; output window at bottom logging that REAVIEWS is opening the input file, reading the user view PRODUCE-SALES-ANALYSIS, and searching the enterprise schema for entities of the current user view)]

Following the integration script described in Chapter 3, REAVIEWS first tries to locate the entity name "sales-worker" in the m-schema. That failing, it checks the synonym list. The synonym sales-worker is not in the m-schema's synonym list, so REAVIEWS follows the strategy described for selecting likely candidate entities.[16] Using the weighted list of candidate templates constructed when the user view was retrieved, REAVIEWS presents the user with a screen asking for confirmation of the most likely candidate entity. This screen is shown in Figure 4-8. If the entity is confirmed, REAVIEWS then proceeds with identifying and adding the primary-key and non-primary-key attributes, modifying the m-schema as new information is added.

The second conflict arises because the max-cards of the Customer-Industry relationship differ from those modeled in the m-schema produced from domain knowledge. Such a

[16] To enhance REAVIEWS's performance, and to mimic the expert data modeler's behavior more closely, some form of natural language parser could be added to allow matching parts of entity names. For example, a human would recognize that a worker is also a person, so sales-worker would readily be identified as the same entity as salesperson. This is beyond the current scope of REAVIEWS, but may be the focus of future research efforts.
difference constitutes a dependency conflict. As modeled, the user view shows that a customer may be classified into many industries, but a single industry will have only one customer associated with it. This is the opposite of the normal case. Domain knowledge in our m-schema (which we refer to as d-knowledge) leads us to expect the more normal case, and the structural constraints in the m-schema consequently use the force resolution strategy.

[Figure 4-8: REAVIEWS, "Candidate Entity" Screen (screen text: "The REA template for SALE shows that event as the DECREMENT of the resource FINISHED-GOOD. The usual inside agent(s) for this event: SALESPERSON. The usual outside agent(s) for this event: CUSTOMER. In the user view named PRODUCE-SALES-ANALYSIS, if the IN-AGENT named SALES-WORKER is the same as the REA-template entity named SALESPERSON, please select the yes button. If not, please select the no button." The output window notes that REAVIEWS uses event templates from the enterprise schema to help identify user entities, ordering the templates by the number of user-view entities found in each, so the most likely template is examined first.)]

The third conflict in Figure 4-3 is a behavioral conflict, where the min-card of industry indicates that an industry cannot be identified or maintained in the database without at least one corresponding customer. This precludes, for example, marketing or product development groups from identifying and tracking information on industries with potential for sales but to which no sales have been made. The d-knowledge min-card of "zero" allows for such tracking, and REAVIEWS uses that constraint.

4.4.2 Update-Work-in-Process View. The user view in Figure 4-4 also contains three conflicts. Two are similar to the structural conflicts found in the previous view.
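The two structural conflict classes met so far can be told apart mechanically. Reading the Customer-Industry discussion above, a min-card disagreement is a behavioral conflict and a max-card disagreement is a dependency conflict; the sketch below encodes that reading, which is inferred from this chapter's examples rather than taken from the definitions in Table 2-1. The function name and the customer min-cards are assumptions; "N" stands for "many".

```python
from typing import Dict, List, Tuple, Union

Card = Union[int, str]          # an integer cardinality or "N" for "many"
Constraint = Tuple[int, Card]   # (min-card, max-card) for one entity


def classify_conflicts(user: Dict[str, Constraint],
                       mschema: Dict[str, Constraint]) -> List[Tuple[str, str]]:
    """Compare user-view and m-schema constraints entity by entity."""
    conflicts = []
    for entity, (m_min, m_max) in mschema.items():
        u_min, u_max = user[entity]
        if u_min != m_min:
            conflicts.append((entity, "behavioral"))  # min-card disagreement
        if u_max != m_max:
            conflicts.append((entity, "dependency"))  # max-card disagreement
    return conflicts


# Customer-Industry example (customer min-cards are assumed, not from the text).
user_view = {"customer": (1, "N"), "industry": (1, 1)}
m_schema = {"customer": (1, 1), "industry": (0, "N")}
print(classify_conflicts(user_view, m_schema))
# [('customer', 'dependency'), ('industry', 'behavioral'), ('industry', 'dependency')]
```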
There is a behavioral conflict in the min-cards of the relationship between job-operation and job-operation-type. There is also a dependency conflict in the max-cards of that same relationship. As modeled, the min-cards indicate that standards for job operations must be associated with an actual operation event to be allowed in the database. This precludes establishing standards for an operation before the operation is actually used on a customer's job. Likewise, the min-card for job-operation precludes an employee from performing an operation for which no standards have been set. This precludes "custom" or "cost-plus" jobs, which are common when an entirely new product is being built: without the benefit of experience in building a similar item, it can be difficult to determine what a standard amount of materials or labor might be. Hence, the m-schema contains a less restrictive min-card of "zero" for each of the entities. To allow for the less likely possibility of a shop that does require standards to be set in advance of all operations, REAVIEWS does not use the force method of conflict resolution, but instead uses the g-d-m method. This allows for more restrictive modeling in individual user views while constructing an enterprise schema that will also allow the more usual case.[17]

[17] For example, there may be a later user view for updating work-in-process for custom jobs. In that user view, a less restrictive min-card of "zero" might be required.

The dependency conflict arises from the max-card for job-operation-type. As shown in Figure 4-4, an operation type could be used only once. This is contrary to the purpose and use of standards for job operations. Such standards are usually set for operations that are repeatedly
performed, and so we should expect that a particular job-operation standard would be used for multiple actual operation events. Again, the g-d-m method allows the m-schema's less restrictive max-card of "N" to be used in the enterprise schema.

The third conflict in Figure 4-4 is a type conflict. The entity mfg-employee is an important component of a schema for any manufacturing company. It is difficult to conceive of an enterprise schema for any such firm that would not model that employee as a separate entity. In the user view in Figure 4-4, however, the employee is modeled as an attribute rather than an entity. When adding the non-primary-key attributes from the user view to the enterprise schema, an experienced human modeler would recognize that the job-operation attribute employee-no is an identifier for the manufacturing employee. That modeler would also be aware that we would virtually always want that employee modeled as a separate entity, and would modify the schema accordingly.

This process is implemented in REAVIEWS when it adds the user-view non-primary-key attributes to the schema. At that time, all of those attributes are checked to see whether any are recognized as primary or candidate keys for any entities modeled in the m-schema for that event template. When such a key is found, REAVIEWS notifies the user that it is modifying the user view to include the "new" entity. Figure 4-9 shows the screen presented to the user in this case. REAVIEWS
also adds a relationship between the new entity and the entity from whose attribute list it was discovered.

[Figure 4-9: REAVIEWS, Notification of Foreign-Key Expansion (screen text: "This attribute is also the primary key for the entity MFG-EMPLOYEE. Be aware that this entity usually has multiple attributes about which we record information. REAVIEWS will model this as a separate entity." The output window shows REAVIEWS examining the user entity JOB-OPERATION and adding it to the enterprise schema.)]

4.4.3 Record-Payment View. Figure 4-5 also contains dependency and behavioral conflicts, but these demonstrate the third method of structural constraint conflict resolution, resolve. As modeled, the cash min-card shows that all cash receipts by the company must be payments for sales. This precludes other common cash-receipt events, such as owner or stockholder investment, or purchase returns for which the company receives a check rather than an adjustment to its account by the vendor. The two max-cards indicate that each cash receipt pays for a single sale and that a sale will always be paid for by one and only one payment. This is contrary to the more usual case in which a customer may send in one check to pay for multiple invoices, or the case of a customer paying "on account," in which a payment may reimburse only part of the total amount owed for one sale.

The resolve method of constraint conflict resolution looks to the individual relationship frame to find the resolution process created specifically for that relationship. Such processes differ from the force and g-d-m methods in that they require information beyond that contained in the m-schema relationship itself. In the example in Figure 4-5, resolving the conflict between the user view and the m-schema is straightforward if we know a little about the actual policies of the company. A human could decide on the appropriate structural constraints by asking some simple questions about the sale payment process. REAVIEWS does the same thing when it invokes the process attached to the cash-receipt-sale relationship. Figure 4-10 shows the screen presented to the user. The choices available offer the combinations of payment policies commonly used.
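The three strategies that the conflict-strategy facet can name (force, g-d-m, and resolve) can be compared in one dispatcher. This is a hedged sketch, not REAVIEWS code: treating g-d-m as "keep the less restrictive of the user-view and m-schema constraints" is our reading of its use with the Update-Work-in-Process view; the max-cards assigned to the first three payment policies are inferred, since the text reports only the outcome of the last policy (max-cards of "N" for both entities and a min-card of zero for cash receipts); and the function names and dictionary layout of the relationship frame are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

Card = Union[int, str]                     # integer cardinality or "N" for "many"
Constraints = Dict[str, Tuple[int, Card]]  # entity -> (min-card, max-card)
AskUser = Callable[[List[str]], int]       # shows choices, returns chosen index


def _rank(card: Card) -> float:
    return float("inf") if card == "N" else float(card)


def force(user: Constraints, mschema: Constraints, ask: AskUser) -> Constraints:
    """Domain knowledge wins outright."""
    return dict(mschema)


def gdm(user: Constraints, mschema: Constraints, ask: AskUser) -> Constraints:
    """Keep the less restrictive constraint per entity (assumed reading of g-d-m)."""
    merged = {}
    for entity, (m_min, m_max) in mschema.items():
        u_min, u_max = user[entity]
        merged[entity] = (min(u_min, m_min), u_max if _rank(u_max) >= _rank(m_max) else m_max)
    return merged


# (choice text, cash-receipt max-card, sale max-card); first three mappings inferred.
PAYMENT_POLICIES = [
    ("We accept only full payments for single invoices.", 1, 1),
    ("We accept full or partial payments, but never for more than one invoice.", 1, "N"),
    ("We accept payment for multiple invoices, but each must be paid in full.", "N", 1),
    ("We accept partial or full payments, for one or multiple invoices.", "N", "N"),
]


def cash_receipt_sale_proc(user: Constraints, mschema: Constraints, ask: AskUser) -> Constraints:
    """Hypothetical s-conflict-proc for the cash-receipt-sale relationship."""
    choice = ask([text for text, _, _ in PAYMENT_POLICIES])
    _, receipt_max, sale_max = PAYMENT_POLICIES[choice]
    # Cash receipts need not relate to a sale (e.g. owner investment).
    return {"cash-receipt": (0, receipt_max), "sale": (user["sale"][0], sale_max)}


def resolve_struct_conflict(rel_frame: dict, user: Constraints,
                            mschema: Constraints, ask: AskUser) -> Constraints:
    facets = rel_frame["struct-const"]
    strategy = facets["conflict-strategy"]
    if strategy == "force":
        return force(user, mschema, ask)
    if strategy == "g-d-m":
        return gdm(user, mschema, ask)
    if strategy == "resolve":
        return facets["s-conflict-proc"](user, mschema, ask)
    raise ValueError(f"unknown strategy {strategy!r}")


cash_receipt_sale = {"struct-const": {"conflict-strategy": "resolve",
                                      "s-conflict-proc": cash_receipt_sale_proc}}
job_op_type_rel = {"struct-const": {"conflict-strategy": "g-d-m"}}

result_payment = resolve_struct_conflict(
    cash_receipt_sale,
    {"cash-receipt": (1, 1), "sale": (1, 1)},
    {"cash-receipt": (0, "N"), "sale": (0, "N")},
    ask=lambda choices: len(choices) - 1)  # the test session chose the last option
print(result_payment)  # {'cash-receipt': (0, 'N'), 'sale': (1, 'N')}

result_wip = resolve_struct_conflict(
    job_op_type_rel,
    {"job-operation": (1, 1), "job-operation-type": (1, 1)},
    {"job-operation": (0, 1), "job-operation-type": (0, "N")},
    ask=lambda choices: 0)
print(result_wip)  # {'job-operation': (0, 1), 'job-operation-type': (0, 'N')}
```

Storing the procedure itself in the relationship frame, rather than branching on the relationship's name, mirrors the role of the s-conflict-proc facet: only resolve needs knowledge that is specific to one relationship.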
REAVIEWS will assign structural constraints based upon the user response. In this test session, we selected the last option, and REAVIEWS appropriately assigned max-cards of "N" to both entities. It also assigned a min-card of "zero" to cash, to recognize the fact that companies routinely encounter cash receipts for events other than sales.

4.4.4 Record-Payroll View. Figure 4-6 provides an example of the last structural conflict, the key conflict. In this case, mfg-employee is shown with social-security-no as its primary key. The Encyclopedia of Accounting Systems provided evidence that some companies might assign other identifiers to employees. Many standard accounting texts similarly demonstrate employee codes of this nature, often using a code that incorporates other information (such as the employee's department) in the employee number. The m-schema thus includes both social-security-no and employee-no as candidate keys, with employee-no initially listed as the default primary key.

[Figure 4-10 screen text: "To help determine the correct structural constraints for the cash-receipt-sale relationship, please select the choice below which best describes your policy regarding acceptance of payments for sales. Thank you." Choices: "We accept only full payments for single invoices." / "We accept full or partial payments, but never for more than one invoice." / "We accept payment for multiple invoices, but each must be paid in full." / "We accept partial or full payments, for one or multiple invoices."]
[Figure 4-10: REAVIEWS, Request for Company-Level Knowledge]

When REAVIEWS encounters a user entity with a primary key different from that in the m-schema, it looks for the user's primary key in the m-schema entity's list of candidate keys. Following the experienced modeler's practice of deferring to the user's view of the company as much as possible, REAVIEWS first checks whether the user entity has already been instantiated in a different user view. If so, REAVIEWS defers to the already integrated view and uses that primary key; if that key differs from the current user view's primary key, the current primary key is placed in the candidate key list. If the entity has not yet been instantiated, REAVIEWS uses the current view's primary key in the enterprise schema and then switches the entity's default primary key to match.

The last user view having been integrated, REAVIEWS asks the user for the name of the output file in which to store the integrated schema. It then writes the schema information to that file and ends the REAVIEWS session. The instantiated schema information is still present in the system, and it may be reviewed by the user if desired. Such a review could provide additional insights into the completeness of the finished schema. In the normal course of accounting system design, a designer may expect certain entities and events to be present and notice their obvious absence from the users' schemas. Similarly, the Encyclopedia of Accounting Systems provides information about the typical entities and events that one could expect to see in a well-constructed schema, and these entities and events have been modeled in the m-schema. The user can therefore readily identify "missing" schema components by browsing the schema hierarchy at the end of a REAVIEWS session. At the end of each branch in the hierarchy, there will be either a schema frame, shown in normal black letters, or an instance of a frame, shown in red italic letters.
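Finding these "missing" components amounts to listing the leaf schema frames that have no instances, a list that could also feed the output-file report suggested as an alternative to browsing. A minimal sketch, assuming a hypothetical child-to-parent map and instance table; the particular frames and instances shown are illustrative, not results from the test session.

```python
from typing import Dict, List, Optional


def uninstantiated_leaves(parents: Dict[str, Optional[str]],
                          instances: Dict[str, List[str]]) -> List[str]:
    """Return leaf schema frames that no user view instantiated."""
    has_children = {p for p in parents.values() if p is not None}
    leaves = [frame for frame in parents if frame not in has_children]
    return sorted(frame for frame in leaves if not instances.get(frame))


# Hypothetical fragment of the m-schema hierarchy (child -> parent).
parents = {
    "entity": None, "event": "entity", "agent": "entity",
    "sale": "event", "sale-order": "event", "price-quotation": "event",
    "customer": "agent", "salesperson": "agent",
}
instances = {"sale": ["SALE"], "customer": ["CUSTOMER"], "salesperson": ["SALESPERSON"]}

for frame in uninstantiated_leaves(parents, instances):
    print(f"Review: {frame} is modeled in the m-schema but in no user view.")
```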
Those frames without instances are the schema components that were modeled in the m-schema but not in any user view. Browsing the finished schema in this way could help the user identify potentially missing schema components. An alternative to this approach would be to add some text to the output file listing the uninstantiated schema components and suggesting that the user review them.

As all the user views have been integrated, all that remains is for the user or analyst to draw the entity-relationship diagrams, using the information in the output file. The nature of that output is discussed next.

4.5 Outputs from REAVIEWS

When the user views have all been integrated, the internal frame-based representation must be translated into a form understandable by the users. REAVIEWS output is the complete schema as a list of entities, attributes, relationships, and structural constraints. The most desirable output would, of course, be an E-R diagram of the company. Producing such a diagrammatic representation is, however, beyond the scope of the REAVIEWS system. While the spatial manipulation of objects required to produce E-R diagrams seems relatively simple for humans, the task is non-trivial on a computer. There are tools available that produce such output from a list of entity-relationship specifications. It may be possible to link such tools to REAVIEWS so that output could be provided in diagram form, but that is outside the scope of our current research. Figure 4-11 shows a short section of the output file from this REAVIEWS session, while Figure 4-12 shows an E-R diagram produced from that output. Both figures show only part of the actual output from the test case session.

4.6 Software Environment

REAVIEWS was implemented using the GoldWorks II expert system development tool.
This environment was initially chosen for the REACH project (McCarthy and Rockwell 1989) because of its support for both rule and frame structures; the REAVIEWS project grew out of REACH. GoldWorks II also provides other development tools, such as object-oriented programming, that are useful in KBS design.

    The entity named SALE is of type EVENT. Its primary-key attribute is: INVOICE-NO.
    It's non-primary-key attributes are: (TOTAL-AMOUNT DATE).
    The primary-key attribute synonyms are: (SALES-NO).

    The entity named FINISHED-GOOD is of type RESOURCE. Its primary-key attribute is: ITEM-NO.
    It's non-primary-key attributes are: (PRICE).
    Synonyms for FINISHED-GOOD are: (PRODUCT)

    The entity named CUSTOMER is of type AGENT. Its primary-key attribute is: CUSTOMER-NO.
    It's non-primary-key attributes are: (NAME).

    The entity named SALESPERSON is of type AGENT. Its primary-key attribute is: EMPLOYEE-NO.
    It's candidate-key attributes are: ((SOCIAL-SECURITY-NO)).
    It's non-primary-key attributes are: (COMMISSION-RATE NAME).
    Synonyms for SALESPERSON are: (SALESWORKER)
    The primary-key attribute synonyms are: (EMP-NO).

    The relationship SALE-SALESPERSON-C is a CONTROL relationship.
    The entities joined by this relationship are: (SALE SALESPERSON).
    The structural constraints for these entities are: ((SALESPERSON 0 N) (SALE 1 1)).

    The relationship CUSTOMER-SALE-C is a CONTROL relationship.
    The entities joined by this relationship are: (CUSTOMER SALE).
    The structural constraints for these entities are: ((CUSTOMER 1 N) (SALE 1 1)).

    The relationship FINISHED-GOOD-SALE-S is a STOCK-FLOW relationship.
    It's non-primary-key attributes are: QTY.
    The entities joined by this relationship are: (FINISHED-GOOD SALE).
    The structural constraints for these entities are: ((FINISHED-GOOD 0 1) (SALE 1 N)).

[Figure 4-11: Partial Output from REAVIEWS's Session]
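Because the report in Figure 4-11 is generated text, its sentence patterns can be reproduced from the integrated schema with simple templates. The sketch below approximates that layout under assumptions and is not REAVIEWS's output routine: EntityOut, RelationshipOut, describe_entity, and describe_relationship are hypothetical; it writes "Its" where the listing prints "It's"; and it omits the candidate-key, key-synonym, and relationship-attribute sentences.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple, Union

Card = Union[int, str]  # integer cardinality or "N" for "many"


def lisp_list(items: Iterable[object]) -> str:
    """Render items as an upper-case, Lisp-style list: (A B C)."""
    return "(" + " ".join(str(item).upper() for item in items) + ")"


@dataclass
class EntityOut:
    name: str
    kind: str                      # resource, event, or agent
    primary_key: str
    non_keys: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)


@dataclass
class RelationshipOut:
    name: str
    kind: str                                  # e.g. control, stock-flow
    constraints: List[Tuple[str, int, Card]]   # (entity, min-card, max-card)


def describe_entity(e: EntityOut) -> str:
    parts = [f"The entity named {e.name.upper()} is of type {e.kind.upper()}.",
             f"Its primary-key attribute is: {e.primary_key.upper()}."]
    if e.non_keys:
        parts.append(f"Its non-primary-key attributes are: {lisp_list(e.non_keys)}.")
    if e.synonyms:
        parts.append(f"Synonyms for {e.name.upper()} are: {lisp_list(e.synonyms)}")
    return " ".join(parts)


def describe_relationship(r: RelationshipOut) -> str:
    entities = lisp_list(c[0] for c in r.constraints)
    constraints = "(" + " ".join(lisp_list(c) for c in r.constraints) + ")"
    return (f"The relationship {r.name.upper()} is a {r.kind.upper()} relationship. "
            f"The entities joined by this relationship are: {entities}. "
            f"The structural constraints for these entities are: {constraints}.")


print(describe_entity(EntityOut("sale", "event", "invoice-no", ["total-amount", "date"])))
print(describe_relationship(
    RelationshipOut("customer-sale-c", "control", [("customer", 1, "N"), ("sale", 1, 1)])))
```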
[Figure 4-12: Partial Schema Produced from REAVIEWS Output (E-R diagram of finished-good, sale, customer, and salesperson; salesperson attributes include employee-no, social-security-no, name, and commission-rate)]

Chapter 5. Summary and Contributions

In this thesis, we have discussed research into conceptual database modeling from the perspective of accounting information systems design. In Chapter 2, we focused on problems encountered at the view integration stage and on the inability of existing knowledge-based modeling systems to resolve them adequately. We proposed that existing computer expert systems fail, in part, because they lack important application-specific domain knowledge available to human experts. Chapter 3 contained our discussion of the modeling and use of domain knowledge as a potential solution to some integration problems. In Chapter 4, we described the prototype knowledge-based system (REAVIEWS) designed to explore and test that proposal. The discussion covered the modeling and use of reconstructive expertise in REAVIEWS, along with details of an integration session using test cases developed for this research.

In this chapter, we first discuss limits of scope for this thesis. Next, we examine the research setting and justifications for this work. Following that, we consider the research contributions of this work. We close with a short section on future research directions suggested by our work.

5.1 Limits of Scope for REAVIEWS

REAVIEWS was developed following the process described in McCarthy, Rockwell, and Wallingford (1989). Figure 5-1 illustrates part of that process, in which the AI system component with the highest complexity (Component 3) is developed first. Other components are initially developed less fully. Problems of tractability force researchers to make the inevitable trade-off between depth and breadth of scope.
To achieve the desired research objectives, REAVIEWS was designed to perform view integration for the revenue and conversion accounting cycles in a representative and rich industry setting. While future development is anticipated for all of the accounting cycles and additional industries, the initial scope of the system was limited to those two cycles and the machine shop industry.

[Figure 5-1: Scope of Pilot System. Diagram showing one component (Component n) developed to the full depth of complexity, from task description through sub-tasks to primitive task descriptions. Source: McCarthy, Rockwell, and Wallingford (1989).]

5.2 Research Context and Justification

The initial impetus for REAVIEWS came from an earlier research project in computer-aided software engineering (CASE). That work (McCarthy and Rockwell 1988, 1989) attempted to bridge existing research in systems for structured analysis and conceptual database design. The system proposed there (named REACH) is represented in Figure 5-2.

[Figure 5-2: The REACH System. Diagram of the design path from DFDs (DESIGNAID) through view modeling, view integration, and physical design to implementation of the target system, supported by accounting knowledge (top-down enterprise analysis from the Encyclopedia; reconstructive expertise of accounting theory, the REA event template; implementation heuristics, events accounting compromises) and methods knowledge (E-R and normalization). Adapted from: McCarthy and Rockwell (1989).]

REACH would use multiple types of knowledge (from several knowledge domains) in a CASE tool to connect two relatively disparate methodologies. The inputs of the view modeling stage are formal information requirements derived using structured analysis.
The outputs of the view modeler will be the user schema representations required by REAVIEWS. In the course of the research, it became apparent that the integration of user schemas was extremely problematic and would require a considerable research effort. This realization provided some of the initial impetus for this thesis, but REAVIEWS can also be placed within research contexts beyond the REACH project.

Amer et al. (1987, 14) declare that "from an accounting perspective, the higher database abstraction of the conceptual and external levels, and therefore the CIS [computer information systems] research area of data modeling, significantly impacts design considerations of accounting databases." They suggest that the application and adaptation of theories from other scientific disciplines might offer new insights to accounting researchers and should be encouraged. They also observe that accounting research has notably benefited from the application of expert systems technology to the accounting problem domains of auditing and taxation.

The benefits of such a combination do not accrue solely to accounting researchers. As O'Leary (1988, 26) states, "certain research topics, methodologies and database approaches used in software engineering research can benefit from the specificity of context offered by accounting information systems." In addition, "the existence of expertise by accounting information systems designers and developers has received little research attention" (ibid., 30).

Embedding more domain-specific accounting knowledge within the REA framework is also a logical development in the conceptual modeling research area. Reuber (1988) called for further development of generic enterprise models relevant to accounting, such as the REA model. Weber (1986) suggested future research "might attempt to refine the [REA] model to lower levels of abstraction, even if the model becomes domain-specific."
Work in knowledge acquisition from documentation is as yet less well developed than other areas of knowledge acquisition research. While much work has been done on natural language processing (NLP) of textual matter, much less formal work has been done on acquiring knowledge for KBS from such sources. The use of reconstructed knowledge in REAVIEWS was viewed as a means of exploring and refining some of the issues in this area of research. It is not uncommon for some of the knowledge in a KBS to originate from documentation. There exist, however, relatively few systems with knowledge bases constructed primarily from documentation.[18] The specificity of context provided by the accounting environment offered some distinct advantages in this effort.

[18] See Hoffman (1989) for examples of systems using expertise from documentation.

As a final matter in placing the contribution of REAVIEWS into an overall research context, we recall the words of Newell and Simon (1976, 114), who maintain that research such as that performed for this thesis can be considered a form of empirical inquiry:

Each new program that is built is an experiment. It poses questions to nature and its behavior offers clues to an answer.... We don't have to build 100 copies of, say, a theorem prover, to demonstrate statistically that it has not overcome the combinatorial explosion of search in the way hoped for.... But as basic scientists, we build machines and programs as a way of discovering new phenomena and analyzing phenomena we already know about.

We believe that the system-building process underlying our research efforts here falls within the scope of empirical inquiry delineated by these two noted computer scientists.

5.3 Contributions

This thesis contributes to the existing body of research along a number of dimensions. As far as can be determined, this is the first knowledge-based system that uses domain-specific theory to structure the task of view integration.
Existing KBS for conceptual modeling use knowledge primarily from the field of conceptual database design, not from the application domain being modeled. REAVIEWS used a general domain theory about accounting as developed in the REA accounting model. It also used a somewhat less general domain theory of accounting for companies in a particular industry, as presented in the Encyclopedia of Accounting Systems. As demonstrated in the test cases, this domain-specific knowledge did allow us to resolve certain conflicts which have stymied existing modeling systems. The basic findings support the initial proposition that some integration conflicts are solvable when such knowledge is used. This is significant, as accounting systems are the backbone of most commercial information systems.

The second major contribution of this research lies in its use of "chart-of-accounts-based" sources for much of the domain-specific knowledge. We in essence took a rich source of knowledge about accounting for a particular industry and transformed it from one highly specialized model to another, very different way of viewing accounting. In the process, the knowledge was transformed from a format best suited for manual record-keeping systems to a format which made that knowledge available to knowledge-based systems for conceptual database design. REA accounting theory provided us with a "knowledge structure" independent of the implementation choices inherent in the building of modeling systems. As such, it offers some insights for areas other than accounting that may have a similar domain theory for use in the knowledge structuring task.

This thesis also contributes to the body of research concerned with the acquisition of knowledge from reconstructive (or documented) sources. The findings support the proposition that the process of such knowledge acquisition can be aided by the presence and use of well-developed domain theories.
The theoretical accounting basis for REAVIEWS (primarily derived from the REA model) directed the knowledge acquisition task and helped us identify important facts in our domain of interest. That theory also made us painfully aware of those instances where the domain "expert" (the Encyclopedia of Accounting Systems) could not provide sufficient expertise to assist in the view integration task.

The application of artificial intelligence research paradigms to accounting issues has yielded a considerable amount of knowledge and insight, particularly in the area of audit judgement. Research in applying those paradigms to the area of accounting information system design is just beginning. This project can also be viewed as one step in extending our knowledge in that area.

5.4 Future Research Directions

A variety of future research topics are suggested by this thesis. Most of these topics can be grouped into two major areas: knowledge acquisition and conceptual modeling.

The automation of knowledge acquisition is currently the focus of considerable research attention. The methodology developed for the manual translation of reconstructive knowledge for REAVIEWS could form the basis for research into automated acquisition of such knowledge. The application domain of REAVIEWS (conceptual modeling of accounting systems) possesses properties that make it appropriate for this type of research. First, the vocabulary for accounting and accounting system design is relatively limited and well-defined when compared to language as a whole. Second, the REA model provides a syntactic and semantic structure into which the accounting narratives may be mapped.

Another interesting extension of this work would be the use of domain-specific accounting knowledge in earlier stages of conceptual database design. There may be advantages to introducing this extra knowledge into the modeling process before integration occurs.
By analyzing user information requirements within the context of industry-specific accounting cycle models, some of the integration conflicts may be resolved at the view modeling stage.

Within the conceptual modeling area, "industry-specific" enterprise models developed from other industries offer potential insight into the accounting system design process. Exploration of the commonalities found in these models may lead to more generalizable enterprise models, such as models for manufacturing or retail firms. These models would be at a higher level of abstraction than the models used in REAVIEWS but would still be more specific than the REA model.

The synthesis of new enterprise models from existing models offers another research avenue. Sources such as the Encyclopedia of Accounting Systems contain a limited number of firm types. This might be compared to a situation in which accountants have a great deal of experience designing accounting systems, but only in a narrow range of business types. When faced with the task of designing accounting systems for new types of business, the accountants might recognize similarities between the new businesses and those with which they have experience. For example, when first encountering a video rental business, the accountants might notice that the video store is similar to both a library and a record store. With experience and knowledge in those two business areas, the accountants could construct an accounting information system. At this time, computers are not as good at performing this kind of design task as humans are. Research aimed at increasing our understanding of the design process is a promising area for future endeavors.

5.5 Final Conclusions

As discussed in the previous two sections, the research reported in this thesis makes significant contributions and extensions to prior research work in the areas of conceptual database design and knowledge acquisition.
Continued work on the REAVIEWS and REACH systems is expected to provide further insights into what are clearly very complex and very important issues, particularly in the area of accounting system design. While the setting for this project was in the domain of accounting, we believe there is some generalizability in the methods used to structure the acquisition and use of knowledge via theory from the application domain. Finally, the use of application domain theory should also be useful in conceptual modeling systems designed for domains other than accounting.

APPENDIX: Major View Integration Processes in REAVIEWS (as depicted in Figure 3-8)

NOTE: Processes are listed alphabetically rather than by order of appearance during integration.

ADD ENTITY: The managerial schema (m-schema) is checked to see if the current-entity has already been instantiated. If not, the entity instance is created, and the accounting-cycle template is updated to reflect this. Any entity synonyms discovered in the entity identification process are added to the m-schema.

ADD FOREIGN-KEY ENTITY: During the addition of non-primary-key attributes (NPKAs) to the m-schema, we may find a foreign key of another entity which is modeled in the m-schema. Entities are modeled separately in the m-schema because they are important objects about which we frequently capture additional information. When a foreign key is found, the attribute is removed from the NPKA list of the current entity; the foreign-key entity is added to the schema as in ADD ENTITY; and the relationship between the two entities is added to the m-schema as in ADD RELATIONSHIP. The user is notified that this action is being taken.

ADD NPKAS: This is the procedure for adding non-primary-key attributes to the m-schema. In this version of REAVIEWS the process is simple: we instantiate the attribute unless it (or one of its synonyms) has already been added to the schema.
There is no domain knowledge used in this process, primarily because the EAS (our source of reconstructive knowledge) contained too little information about attributes of the important objects in the domain. Richer sources of domain knowledge may provide enough additional knowledge to allow the type of problem-solving strategies used for entities, but that is beyond the scope of the present research.

ADD RELATIONSHIP: Relationships are handled similarly to entities. We first check to see if the relationship has already been instantiated, and if not, we add it to the schema, updating the accounting-cycle template to reflect this. NPKAs are added as in ADD NPKAS.

ADD UNIQUE ENTITY: When adding a new entity to the schema, we must first create a new entity in the m-schema, then instantiate it, adding the attribute information (PKA and NPKA). The accounting-cycle template is also updated.

CONFIRM CANDIDATE ENTITY: In order of "likelihood" (see FIND CANDIDATE ENTITIES), the candidate entities are presented to the user, along with information about the REA template in which each was found (see Figure 4-8). If the user indicates that one of our candidate entities is the same as the current-entity, then we add the current-entity as an instance of the confirmed candidate entity, using the ADD ENTITY process. If none of the candidate entities is confirmed, the current-entity is added with ADD UNIQUE ENTITY.

FIND CANDIDATE ENTITIES: When an entity cannot be identified from its name and key attributes, we search the m-schema for potential matches. We look for other entities of the same type, playing a similar role in our accounting-cycle templates. If found, these "candidate entities" are ordered, so that we may present the more likely candidates first. The ordering process follows a simple heuristic: we start with the accounting-cycle templates that most closely resemble the current user view, then proceed through those templates that bear less and less resemblance.
In this case, "resemblance" is approximated by a simple count of the number of matches between objects in the user view and the m-schema template. If a template has three of the same entities as the user view, that template is deemed a "better" candidate than a template with only two of the same entities. The "found" candidate entities are then used in CONFIRM CANDIDATE ENTITY.

FIND ENTITY: The initial attempts at finding a match for the current-entity in the m-schema are basically a pattern match on entity names/synonyms and types. If no match is found, we invoke FIND CANDIDATE ENTITIES (above), which uses a somewhat more sophisticated matching based upon entity roles. If a match is found, we attempt to confirm our identification by examining the primary-key attributes (PKA) via the FIND PKA process.

FIND FOREIGN-KEY NPKAS: Before we add the non-primary-key attributes to our m-schema, we check for foreign keys in the NPKA list. Their presence indicates a potential type conflict in the schemas, and this is resolved via ADD FOREIGN-KEY ENTITY.

FIND PKA: After a current-entity has been matched with an entity in the m-schema (by FIND ENTITY), we do a pattern match on the primary-key attributes. If the match fails, we check primary-key attribute synonyms. If successful on either match, we add the current-entity to the m-schema via ADD ENTITY. If the PKA/synonym match fails, we match on candidate keys and their synonyms. If a match is found and the entity has not been instantiated, we add the entity to the schema. We then change the m-schema's primary key to match the current-entity's key. We next place the old default primary key in the candidate key list. This follows our heuristic of deferring to the users' view of the firm whenever possible. If the entity has already been instantiated, we place the current-entity's primary key in the candidate key list.
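The identification steps described above (the template ranking of FIND CANDIDATE ENTITIES and the key checks of FIND PKA) can be sketched as follows. This is an illustrative sketch, not REAVIEWS code, and it rests on several assumptions: all names are hypothetical, an accounting-cycle template is reduced to a set of m-schema entity names, "playing a similar role" is approximated by sharing an REA type, and every key is a single attribute.

```python
from dataclasses import dataclass, field


@dataclass
class MSchemaEntity:
    """Hypothetical m-schema entity; keys are single attribute names."""
    name: str
    rea_type: str
    pka: str
    candidate_keys: list = field(default_factory=list)
    key_synonyms: dict = field(default_factory=dict)   # key -> set of synonyms
    instantiated: bool = False

    def key_matches(self, key: str, attr: str) -> bool:
        """True if attr is the key itself or one of its recorded synonyms."""
        return attr == key or attr in self.key_synonyms.get(key, set())


def rank_candidates(current_type, user_view, templates, schema):
    """FIND CANDIDATE ENTITIES (sketch).

    templates: template name -> set of m-schema entity names.
    Resemblance is the count of user-view entity names found in a template;
    templates are visited from most to least resemblance (ties keep input order).
    """
    ordered = sorted(templates.values(),
                     key=lambda members: len(members & user_view),
                     reverse=True)
    candidates = []
    for members in ordered:
        # Same REA type stands in for "playing a similar role" (an assumption).
        for name in sorted(members - user_view):
            if schema[name].rea_type == current_type and name not in candidates:
                candidates.append(name)
    return candidates


def find_pka(m: MSchemaEntity, user_pka: str) -> str:
    """FIND PKA (sketch): returns the next step and may update m's keys."""
    if m.key_matches(m.pka, user_pka):
        return "ADD ENTITY"                      # PKA or PKA-synonym match
    for ck in m.candidate_keys:
        if m.key_matches(ck, user_pka):
            if not m.instantiated:
                # Defer to the user's view: the user's key becomes the primary
                # key and the old default primary key becomes a candidate key.
                m.candidate_keys.remove(ck)
                m.candidate_keys.append(m.pka)
                m.pka = user_pka
            elif user_pka not in m.candidate_keys:
                m.candidate_keys.append(user_pka)
            return "ADD ENTITY"                  # returns before the loop continues
    # The appendix does not say what follows a failed key match; left to the caller.
    return "NO KEY MATCH"
```

Because Python's sort is stable, templates with equal resemblance counts keep their input order; the appendix does not say how REAVIEWS breaks such ties.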
FIND RELATIONSHIP: Before a relationship can be added to the schema, we check the m-schema to see if we have renamed any of the entities connected by the relationship. This renaming frequently occurs during synonym or homonym conflict resolution. A relationship's name is the concatenation of the names of the two entities. Failure to check for the renaming of entities can result in the instantiation of a single relationship under multiple names. If entities have been renamed, we change the relationship name to reflect this.

GET USER SCHEMA: At the beginning of an integration session, we divide up all the user views into their respective accounting cycles. This allows us to integrate the views one accounting cycle at a time. This is done as an aid to the user rather than as a requirement of our problem-solving strategy. If the domain expertise suggests that one set of transaction templates should be given primacy over others, that set could be integrated first.

GET INDIVIDUAL VIEW: User views are selected for processing by cycle. When all of the views in a cycle have been integrated, the next cycle is selected for processing. At this time, we have no domain-based preference for the ordering, although the structure of REAVIEWS does allow such preferential ordering (see GET USER SCHEMA, above).

SAVE SCHEMA: In a manual view modeling setting, the integration described in this thesis is accomplished primarily in a graphical environment. Saving the schema in that setting would consist of gathering up all the diagrams and storing them together. In REAVIEWS, the schema information resides in the frame hierarchy of the m-schema (which includes the modifications to it made during the integration process). There are two ways in which this information is provided to the user. First, the entire instantiated schema is copied to a text file (Figure 4-11 shows a portion of an output file from the test cases).
This text-based representation contains the information needed to produce an E-R diagram of the schema. REAVIEWS does not attempt this graphical depiction for two reasons: (1) the diagramming task is outside the research scope of this thesis, and (2) the task is extremely complex and only imperfectly implemented in the commercial and research systems that have been developed for this process (some examples of which were given in Chapter 2). In addition to the text-based schema, REAVIEWS provides a graphical depiction of the schema objects. This can be used to look for entities that exist in the m-schema but that were not instantiated from any of the user views, which could help the user identify incomplete user specifications.

SOLVE CONFLICT: When conflicts exist between the structural constraints in a user-view relationship and an m-schema relationship, they are resolved using the three methods described in Chapter 3. If the force or g-d-m methods are used, the constraints are automatically changed, and the user is notified. The actions taken are shown in the REAVIEWS output window (see Figure 4-7). The g-d-m method also notifies the user via the main screen. When the resolve method is used, additional information is acquired by the system (e.g., see Figure 4-10). Resolve strategies are primarily procedural knowledge attached to their specific relationships and invoked when necessary. Although the process illustrated in the test case acquires the needed domain knowledge directly from the user, a particular resolve method might instead look in the m-schema for the information needed.

LIST OF REFERENCES

Amer, T., A. D. Bailey, Jr., and P. De. 1987. A review of the computer information systems research related to accounting and auditing. Journal of Information Systems (Fall): 3–28.

Armitage, H. M. 1985. Linking management accounting with computer technology. Research monograph.
Hamilton, Ontario: Society of Management Accountants of Canada.

Batini, C., S. Ceri, and S. B. Navathe. 1992. Conceptual database design: An entity-relationship approach. Redwood City, CA: Benjamin/Cummings.

Batini, C., M. Lenzerini, and S. B. Navathe. 1986. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys 18 (December): 323–64.

Chandrasekaran, B. 1984. Expert systems: Matching techniques to tasks. In Artificial intelligence applications for business, ed. W. Reitman, 41–63. Norwood, NJ: Ablex.

Chen, P. P. 1976. The entity-relationship model: Toward a unified view of data. ACM Transactions on Database Systems 1 (March): 9–36.

Choobineh, J., M. Mannino, J. Nunamaker, Jr., and B. R. Konsynski. 1988. An expert database design system based on analysis of forms. IEEE Transactions on Database Engineering 14 (February): 242–53.

Codd, E. F. 1970. A relational model of data for large shared data banks. Communications of the ACM 13:6.

Codd, E. F. 1991. The relational model for database management. Reading, MA: Addison-Wesley.

Colantoni, C. S., R. P. Manes, and A. Whinston. 1971. A unified approach to the theory of accounting and information systems. Accounting Review 46 (January): 90–102.

Date, C. J. 1986. An introduction to database systems. Reading, MA: Addison-Wesley.

DeMarco, T. 1979. Structured analysis and system specification. Englewood Cliffs, NJ: Prentice-Hall.

Denna, E. L., and W. E. McCarthy. 1987. An events accounting foundation for DSS implementation. In Decision support systems: Theory and applications, eds. C. W. Holsapple and A. B. Whinston, 239–63. Springer-Verlag.

Ericsson, K. A., and H. A. Simon. 1984. Protocol analysis: Verbal reports as data. Cambridge: MIT Press.

Everest, G. C., and R. Weber. 1977. A relational approach to accounting models. Accounting Review 52 (April): 340–59.

Fikes, R., and T. Kehler. 1985. The role of frame-based representation in reasoning.
Communications of the ACM 28 (September): 904–20.

Furtado, A. L., M. A. Casanova, and L. Tucherman. 1988. The CHRIS consultant. In Entity-relationship approach: Proceedings of the sixth international conference on entity-relationship approach, ed. S. T. March, 515–32. Amsterdam: North-Holland.

Gal, G., and W. E. McCarthy. 1986. Operation of a relational accounting system. Advances in Accounting 3: 83–112.

Gane, C., and T. Sarson. 1979. Structured systems analysis: Tools and techniques. Englewood Cliffs, NJ: Prentice-Hall.

Geerts, G., and W. E. McCarthy. 1991. Database accounting systems. In Information technology perspectives in accounting: An integrated approach, eds. B. Williams and B. J. Sproul.

Goldstein, R. C., and V. C. Storey. 1989. Some findings on the intuitiveness of entity-relationship constructs. In Proceedings of the eighth international conference on entity-relationship approach, ed. F. H. Lochovsky, 6–20. Toronto: ER Institute.

Hammer, M., and D. McLeod. 1981. Database description with SDM: A semantic model. ACM Transactions on Database Systems 6 (September): 351–86.

Haseman, W. D., and A. B. Whinston. 1976. Design of a multidimensional accounting system. Accounting Review 51 (January): 65–79.

Haseman, W. D., and A. B. Whinston. 1977. Introduction to data management. Homewood, IL: Irwin.

Hawryszkiewycz, I. T. 1984. Database analysis and design. Chicago: Science Research Associates.

Hoffman, R. R. 1989. A survey of methods for eliciting the knowledge of experts. SIGART Newsletter, no. 108 (April): 19–27.

Horngren, C. T., and G. Foster. 1987. Cost accounting: A managerial emphasis. Englewood Cliffs, NJ: Prentice-Hall.

Hull, R., and R. King. 1987. Semantic database modeling: Survey, applications, and research issues. ACM Computing Surveys 19 (September): 201–60.

Ijiri, Y. 1975. Theory of accounting measurement. Sarasota, FL: American Accounting Association.

Johnson, P. E. 1983. What kind of expert should a system be?
The Journal of Medicine and Philosophy (February): 77–97.

Kuntz, M., and R. Melchert. 1989. Ergonomic schema design and browsing with more semantics in the Pasta-3 interface for E-R DBMSs. In Proceedings of the eighth international conference on entity-relationship approach, ed. F. H. Lochovsky, 263–78. Toronto: ER Institute.

Lenat, D., M. Prakash, and M. Shepherd. 1986. CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Magazine 6(4).

Lieberman, A. Z., and A. B. Whinston. 1975. A structuring of an events-accounting information system. Accounting Review 50 (April): 246–58.

Lum, V., S. Ghosh, M. Schkolnick, D. Jefferson, S. Su, T. Fry, and B. Yao. 1979. 1978 New Orleans data base design workshop. In Proceedings of the fifth international conference on very large data bases, 328–39. New York: IEEE.

Mattessich, R. 1964. Accounting and analytical methods. Homewood, IL: Irwin.

Mattos, N. M., and M. Michels. 1989. Modeling with KRISYS: The design process of DB applications reviewed. In Proceedings of the eighth international conference on entity-relationship approach, ed. F. H. Lochovsky, 159–73. Toronto: ER Institute.

McCarthy, J. 1987. Generality in artificial intelligence. Communications of the ACM 30 (December): 1030–35.

McCarthy, W. E. 1979. An entity-relationship view of accounting models. The Accounting Review 54 (October): 667–86.

McCarthy, W. E. 1982. The REA accounting model: A generalized framework for accounting systems in a shared data environment. The Accounting Review 57 (July): 554–78.

McCarthy, W. E., and S. R. Rockwell. 1988. On the embedding of domain knowledge in automated software engineering tools: The case of accounting. In vol. 1 of Advance working papers of the second international workshop on computer-aided software engineering, ed. E. J. Chikofsky, 2-15–2-17. Cambridge, MA (July).

McCarthy, W. E., and S. R. Rockwell. 1989.
The integrated use of first-order theories, reconstructive expertise, and implementation heuristics in an accounting information system design tool. In Proceedings of the ninth international workshop on expert systems & their applications, 537–48. Avignon, France: EC2.

McCarthy, W. E., S. R. Rockwell, and H. M. Armitage. 1989. A structured methodology for the design of accounting transaction systems in a shared data environment. In Proceedings of the fifth annual structured techniques association conference, ed. J. S. Weber, 194–207. Chicago: STA.

McCarthy, W. E., S. R. Rockwell, and E. Wallingford. 1989. Design, development, and deployment of expert systems within an operational accounting framework. In Proceedings of the workshop on innovative applications of computers in accounting education. Lethbridge, Alberta: University of Lethbridge. (To be reprinted in book form)

Minsky, M. 1975. A framework for representing knowledge. In The psychology of computer vision, ed. P. H. Winston, 211–77. New York: McGraw-Hill.

Mylopoulos, J., P. A. Bernstein, and H. K. T. Wong. 1980. A language facility for designing database-intensive applications. ACM Transactions on Database Systems 5 (June): 185–207.

Newell, A., and H. A. Simon. 1976. Computer science as empirical inquiry: Symbols and search. Communications of the ACM 19 (March): 113–26.

O'Leary, D. E. 1988. Software engineering and research issues in accounting information systems. Journal of Information Systems (Spring): 24–38.

Pescow, J. R., ed. 1976. The encyclopedia of accounting systems. Englewood Cliffs, NJ: Prentice-Hall.

Reiner, D., G. Brown, M. Friedell, J. Lehman, A. McKee, P. Rheingans, and A. Rosenthal. 1987. A database designer's workbench. In Entity-relationship approach: Proceedings of the fifth international conference on entity-relationship approach, ed. S. Spaccapietra, 347–60. Amsterdam: North-Holland.

Reuber, A. R. 1988.
Opportunities for accounting information systems research from a database perspective. Journal of Information Systems (Fall): 87–103.

Roussopoulos, N., and R. T. Yeh. 1984. An adaptable methodology for database design. IEEE Computer (May): 64–80.

Shipman, D. W. 1981. The functional data model and the data language DAPLEX. ACM Transactions on Database Systems 6 (March): 140–73.

Smith, J. M., and D. L. P. Smith. 1977. Database abstractions: Aggregation and generalization. ACM Transactions on Database Systems 2 (June): 105–33.

Sorter, G. 1969. An "events" approach to basic accounting theory. Accounting Review 44 (January): 12–19.

Sowa, J. F. 1984. Conceptual structures: Information processing in mind and machine. Reading, MA: Addison-Wesley.

Tauzovich, B. 1989. An expert system for conceptual data modeling. In Proceedings of the eighth international conference on entity-relationship approach, ed. F. H. Lochovsky, 329–44. Toronto: ER Institute.

Teorey, T. J., and J. P. Fry. 1982. Design of database structures. Englewood Cliffs, NJ: Prentice-Hall.

Teorey, T. J., D. Yang, and J. P. Fry. 1986. A logical design methodology for relational databases using the extended entity-relationship model. ACM Computing Surveys 18 (June): 197–222.

Tsichritzis, D. C., and A. Klug, eds. 1978. The ANSI/X3/SPARC DBMS framework: Report of the study group on database management systems. Information Systems 3: 173–91.

Turban, E. 1990. Decision support and expert systems: Management support systems. New York: Macmillan.

Weber, R. 1986. Data models research in accounting: An evaluation of wholesale distribution software. Accounting Review 61 (July): 498–518.

Yourdon, E. 1989. Modern structured analysis. Englewood Cliffs, NJ: Prentice-Hall.