This is to certify that the

dissertation entitled

THE CONCEPTUAL MODELING AND AUTOMATED USE OF
RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE

presented by
STEPHEN RAYMOND ROCKWELL

 

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Accounting

Major professor

William E. McCarthy

 

MSU is an Affirmative Action/Equal Opportunity Institution

 

 

 

THE CONCEPTUAL MODELING AND AUTOMATED USE OF
RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE

by

STEPHEN RAYMOND ROCKWELL

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Accounting

1992

ABSTRACT

THE CONCEPTUAL MODELING AND AUTOMATED USE OF
RECONSTRUCTIVE ACCOUNTING DOMAIN KNOWLEDGE

by

STEPHEN RAYMOND ROCKWELL

This thesis evaluates the use of domain-specific knowledge in
the automation of the database design task of conceptual
modeling. The primary research task was examination of how
that knowledge could be used to resolve problems in the
modeling task of view integration. Another task was
exploration of the acquisition and use of reconstructive
domain knowledge. A knowledge-based view integration system,
REAVIEWS, was created for those purposes. It is the first
knowledge-based system to use domain-specific theory to
structure view integration. Industry-specific accounting
knowledge was reconstructed from the Encyclopedia of
Accounting Systems. Knowledge of first-order accounting
principles was represented by means of the REA accounting
model. REAVIEWS demonstrates additional problem solving
ability from the use of such domain knowledge. This thesis
discusses the design and implementation of REAVIEWS, including
the acquisition, modeling, and use of domain knowledge for
problem resolution. This thesis also offers insight into the
process of modeling knowledge originally compiled into a form
less suitable for use in knowledge-based systems.

Copyright by
STEPHEN RAYMOND ROCKWELL
1992

To Shere and Kate, with love


ACKNOWLEDGEMENTS

I would like to thank Severin Grabski and Jon Sticklen for the
help and insight they provided in this research and in my
scholarly pursuits at Michigan State University.

Words seem small recompense for the countless hours of
discussion, debate, guidance, support, and friendship (and did
I mention patience?) provided by William McCarthy during this
whole process. I am deeply indebted to Bill for all his
assistance and inspiration.

The greatest debt of all is owed to my loving wife Shere and
daughter Kate. Without their love, cheerfulness, and
limitless patience, this work would not have been possible.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1. Introduction

Chapter 2. Accounting Database Systems and Conceptual Modeling
  2.1 Accounting Database Systems
  2.2 Semantic Data Models
    2.2.1 Conceptual Modeling
    2.2.2 REA Modeling
    2.2.3 Problems in View Integration
    2.2.4 Expert Systems for Conceptual Modeling
  2.3 Integration Example
  2.4 Integration Strategies
  2.5 Knowledge-Based Modeling Systems
  2.6 Conflict Resolution Using Accounting Knowledge
  2.7 Accounting Knowledge Structured as E-R Templates
  2.8 Overview of the Integration Process

Chapter 3. Accounting Domain Knowledge in View Integration
  3.1 Levels of Accounting Domain Knowledge
  3.2 Sources of Accounting Domain Knowledge
    3.2.1 Principles Level Knowledge
    3.2.2 Industry Level Knowledge
    3.2.3 Company Level Knowledge
  3.3 Conceptual Modeling of Principles Level Knowledge
  3.4 Conceptual Modeling of Industry Level Knowledge
  3.5 Overall Knowledge Acquisition Strategy
    3.5.1 Narrative
    3.5.2 Chart of Accounts
    3.5.3 Documents
    3.5.4 Organization Chart
  3.6 Selection of Entity Keys
  3.7 Composite-Key Entities
  3.8 Relationships and Structural Constraints
  3.9 Relationships in the EAS
    3.9.1 Assigning Structural Constraints
  3.10 Modeling Compromise
  3.11 Integration Conflict Resolution
    3.11.1 Basic Problem Solving Concepts
  3.12 View Integration Strategies
    3.12.1 Initial Schema Processing
    3.12.2 Entity Identification
    3.12.3 Relationship Identification
  3.13 View Conflict Recognition
    3.13.1 Homonyms
    3.13.2 Synonyms
    3.13.3 Type Conflicts
    3.13.4 Structural Constraint Conflicts
    3.13.5 Key Conflicts

Chapter 4. The REAVIEWS System
  4.1 Knowledge Structures within REAVIEWS
    4.1.1 Frame-based Knowledge Representations
    4.1.2 Declarative Structures for Accounting Domain Knowledge
    4.1.3 Procedural Structures for Accounting Domain Knowledge
    4.1.4 Structures for View Integration Knowledge
  4.2 Test Cases for REAVIEWS
  4.3 Inputs to REAVIEWS
  4.4 View Integration Session
    4.4.1 Produce-Sales-Analysis View
    4.4.2 Update-Work-in-Process View
    4.4.3 Record-Payment View
    4.4.4 Record-Payroll View
  4.5 Outputs from REAVIEWS
  4.6 Software Environment

Chapter 5. Summary and Contributions
  5.1 Limits of Scope for REAVIEWS
  5.2 Research Context and Justification
  5.3 Contributions
  5.4 Future Research Directions
  5.5 Final Conclusions

APPENDIX: Major View Integration Processes in REAVIEWS

LIST OF REFERENCES

LIST OF TABLES

Table 2-1 Integration Conflicts
Table 2-2 Integration Strategies

LIST OF FIGURES

Figure 2-1 User Schema
Figure 2-2 The General REA Template
Figure 2-3 Three Partial User Views
Figure 2-4 Interschema Properties
Figure 2-5 Kinds of Knowledge That Can Go into a KBS
Figure 2-6 Enterprise Schema
Figure 3-1 Partially Instantiated REA Template
Figure 3-2 Partial Chart of Accounts---Assets
Figure 3-3 Partial Chart of Accounts---Marketing Expense
Figure 3-4 "Composite-Key" Entities
Figure 3-5 "Flow-Budget" and "Stock-Flow" Relationships
Figure 3-6 EAS-derived Structural Constraints
Figure 3-7 Vendor Service Entity
Figure 3-8 Flowchart of REAVIEWS's Integration Strategy
Figure 3-9 Expansion of Foreign-Key Attribute
Figure 4-1 Partial Structure of Entity Frame
Figure 4-2 Partial Entity Hierarchy
Figure 4-3 User View---Produce-Sales-Analysis
Figure 4-4 User View---Update-Work-In-Process
Figure 4-5 User View---Record-Payment
Figure 4-6 User View---Record-Payroll
Figure 4-7 REAVIEWS---Main Screen
Figure 4-8 REAVIEWS---"Candidate Entity" Screen
Figure 4-9 REAVIEWS---Notification of Foreign-Key Expansion
Figure 4-10 REAVIEWS---Request for Company-Level Knowledge
Figure 4-11 Partial Output from REAVIEWS's Session
Figure 4-12 Partial Schema Produced from REAVIEWS Output
Figure 5-1 Scope of Pilot System
Figure 5-2 The REACH System

Chapter 1. Introduction

Research in the design of accounting information
systems (AIS) has been heavily influenced by work in the
areas of events accounting and conceptual database modeling.
Events accounting approaches (Sorter 1969; Colantoni, Manes,
and Whinston 1971; Everest and Weber 1977; McCarthy
1979, 1982) support the construction of accounting systems
that record information about economic events in a
disaggregate, multidimensional format. Conceptual database
modeling theories are concerned with the development of
high-level, global descriptions of databases called
conceptual schemas. To aid that development, researchers
have developed various semantic models, such as the entity-
relationship (E-R) model of Chen (1976). McCarthy (1979,
1982) combined the work of Sorter and Chen with other
accounting theory, such as Ijiri (1975) and Mattessich
(1964), to produce the REA framework for the design of
accounting database systems. REA theory provides the
advantages of events accounting and conceptual modeling,
while incorporating important principles from the
traditional "value" approach to accounting.

Analysts usually construct the conceptual schemas from
smaller models of individual user applications, referred to
as user views, or user schemas. These schemas are combined
together in a process known as view integration. This
process is extremely complex, as analysts must attempt to
satisfy the competing data and processing requirements of
the people and applications that will eventually utilize the
database. Accurately representing and combining the varied
needs and perspectives of the individual users makes
conceptual modeling one of the most difficult and
challenging activities in the system design process. As
such, this activity has generated considerable research
interest. Recently, some of this research has been directed
toward the development of knowledge-based systems (KBS) to
aid in the conceptual modeling task (see, for example,
Mattos and Michels 1989 and Reiner et al. 1987). These
systems serve several purposes, among them:

1. They add to our understanding of the
conceptual modeling process;

2. They help validate the various theories about
conceptual models; and

3. They aid in the transfer of expertise in the
modeling task itself.

While a number of systems of varying complexity have
been developed in both academic and commercial settings, in
general, they rely heavily upon the user during the view
integration phase. To a greater or lesser extent, these
systems "fail" when attempting to resolve some of the
problems of view integration and must "ask" the user to do
much of the work. This project suggests that much of this
failure is a result of the general nature of the knowledge
underlying most of these systems. The knowledge that has
been embedded in these systems is almost exclusively from
the domains of conceptual modeling and database theory, with
little or no formal representation of knowledge from the
domain for which the database is being designed. This
thesis further proposes that some of the view integration
problems can be resolved by the use of specific knowledge
from the application domain. This suggests that expert
systems for conceptual modeling can be made more robust by
incorporating in them the expertise of someone familiar with
the application domain.

For example, an expert system for modeling accounting
database systems could contain accounting knowledge about
the particular type of business being modeled. This would
be similar to having an experienced accountant "looking over
the shoulder" of the system user during the modeling
process. Thus, when encountering view integration conflicts
that cannot be resolved with generic modeling expertise, the
system could access the more specific accounting knowledge
for suggestions and insights. This is in contrast to
existing systems that must stop and ask the user to supply
this domain knowledge and, in effect, resolve integration
problems with little guidance from the system itself.

For this project, we designed a KBS (named REAVIEWS)
for schema integration using such domain knowledge. Before
that knowledge could be embedded in the KBS, it had to be
acquired from the expert. Reconstructive methods of
reasoning (Johnson 1983) explicate domain knowledge of facts
and procedures that can be codified and used by knowledge-
based systems. These methods are useful for unearthing
expertise that has been "compiled" into a form that is
missing some of the procedural or declarative knowledge
originally possessed by the expert. Much of the accounting
knowledge used in REAVIEWS is this type of reconstructive
expertise.

The remainder of this thesis is organized as follows.
In Chapter 2, we explain the process and problems of view
modeling and integration in more detail (including a brief
discussion of the REA model). We also review associated
research from accounting and computer science. Chapter 3
explains our use of accounting domain knowledge in the
solving of integration problems. Chapter 4 contains an
overview of the REAVIEWS prototype. The final chapter
includes a brief summary; discussion of the research
limitations, justification, and contributions; and some
recommendations for future research suggested by this work.

Chapter 2. Accounting Database Systems and Conceptual
Modeling

2.1 Accounting Database Systems

From the 1960s on, we have witnessed a rapid expansion
in the use of computerized data processing in companies of
all types. These technological advances have provided
accountants with the opportunity to overcome some of the
weaknesses of traditional double-entry, chart-of-accounts-
based accounting systems. McCarthy (1982) explained some of
these weaknesses of a conventional accounting system (as
identified by two American Accounting Association research
committees):

1. Its dimensions are limited.

2. Its classification schemes are not always
appropriate.

3. Its aggregation level for stored information
is too high.

4. Its degree of integration with the other
functional areas of an enterprise is too
restricted.

In an effort to overcome these weaknesses, a number of
accounting researchers advocated the design and use of
accounting systems that record information about economic
events in a disaggregate, multidimensional format.
Sorter (1969) referred to this as the "events" approach to
accounting, as contrasted with traditional accounting
theory, which he labeled the "value" approach.

One advantage of the events approach is that its less-
aggregate data can always produce the more aggregate, value-
oriented reports when desired. One problem with the events
approach is that it can require significantly more
information to be recorded and used in the production of
reports. The advent of computerized accounting systems
offered some solutions to that problem. In particular, the
database approach to information processing and management
was seen by many researchers as an appropriate vehicle by
which the events theories could be implemented in accounting
information systems.
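
The claim that disaggregate event data can always reproduce the
aggregate, value-oriented figures can be illustrated with a
minimal Python sketch. It is not drawn from any of the systems
cited in this chapter; the event records, the field names, and
the helper `total_by` are hypothetical and chosen only for
illustration.

```python
from collections import defaultdict

# Hypothetical sale events, each stored once at the lowest level of detail.
SALE_EVENTS = [
    {"invoice": 1001, "customer": "Acme", "product": "widget", "amount": 250.0},
    {"invoice": 1002, "customer": "Baker", "product": "gadget", "amount": 75.5},
    {"invoice": 1003, "customer": "Acme", "product": "gadget", "amount": 120.0},
]


def total_by(events, dimension):
    """Sum event amounts along one dimension (e.g. customer or product)."""
    totals = defaultdict(float)
    for event in events:
        totals[event[dimension]] += event["amount"]
    return dict(totals)


# A single "value" style figure: total sales revenue for the period.
revenue = sum(event["amount"] for event in SALE_EVENTS)

# The same events also answer questions a single revenue balance cannot.
by_customer = total_by(SALE_EVENTS, "customer")
by_product = total_by(SALE_EVENTS, "product")
```

The derivation runs in one direction only: the customer and
product totals cannot be recovered from the single revenue
figure, which is the aggregation weakness listed earlier in
this section.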

Accounting system designers, regardless of their
acceptance of the events theories, have recognized the
advantages of capturing data beyond that recorded using
traditional value approaches. The accounting databases of
today have moved away from purely chart-of-accounts-based
systems and routinely capture the type of disaggregate
information called for by events proponents. Events
researchers provided a theoretical foundation for recording
information in this fashion. Accounting database designers,
whether they were using these theories or not, have moved
steadily in the events direction.

Researchers have developed various accounting system
models that take advantage of the database approach. Some
of the accounting systems were based upon the primary data
models used in database implementations, namely, the
hierarchical, network, and relational models. Those using
the hierarchical database model included Colantoni, Manes,
and Whinston (1971), Lieberman and Whinston (1975), and
Haseman and Whinston (1976). Haseman and Whinston (1977)
used the network data model, and Everest and Weber (1977)
used the relational data model. McCarthy (1979, 1982) used
an events accounting approach as the basis for his model of
an accounting database system. For his data model, however,
he chose the E-R semantic model rather than one of the
primary models. The following section discusses semantic
data models in more detail and explains how they are used in
the database design task of conceptual modeling.

2.2 Semantic Data Models

The E-R model is only one of a number of semantic data
models that have been proposed by researchers (e.g., Smith
and Smith 1977; Mylopoulos, Bernstein, and Wong 1980;
Shipman 1981; Hammer and McLeod 1981; Hull and King 1987).
Semantic data models offer more powerful tools than do the
primary data models for representing the domain of interest
in a database management system (DBMS). The semantic
models allow one to construct an abstract, high-level
specification of the data underlying a database
implementation. This specification, called a conceptual
schema, describes the data in terms of the real-world
entities (and relationships among those entities) that are
modeled by the database. Modeling the database in those
terms facilitates communication between the user groups and
the design team, making it easier to construct databases
that capture the important concepts in the application
domain.

The conceptual schema is the intermediate level of the
three-schema framework for database design (Tsichritzis and
Klug 1978). The other levels consist of the external (user)
schemas (which describe the database from the user
application perspective) and the internal (physical) schema
(which describes the physical layout of the data as
implemented in a particular DBMS). The three-schema
framework allows the user views to refer only to the
conceptual schema and thus remain independent of the
physical storage structures of the database implementation.

The user and conceptual schemas view the database from
the level of the real-world concepts (or objects) modeled by
the database. Those objects are classified as entities,
relationships among the entities, or attributes (which
describe the entities or relationships). The tasks involved
with creating those schemas are referred to as conceptual
data modeling. Because of its power and simplicity, the E-R
model is perhaps the best known and most widely used tool in
the data modeling process. Teorey et al. (1986) refer to
the E-R model as the "premier model for conceptual design."
For those reasons, a version of the E-R model was the main
conceptual modeling system used in this project.

In E-R modeling, entities are the real-world objects we
wish to model and are drawn as rectangles in the view
models. Diamonds are used to represent relationships that
exist between those entities. Entities can be further
characterized with attributes that are used to describe and
identify actual instances of the entities. Figure 2-1 shows
one example of a user schema (also called a user view). In
that view, the entity Sale has the attributes inv. #, sale
amount, and customer. It is shown in a relationship with
Inventory. Attributes with a solid circle are known as
primary-key attributes and are unique identifiers of their
respective entities. In this example, if we have an invoice
number, we can identify one particular sales event and the
values for its other attributes such as sale amount. We can
also identify the various inventory items which were sold in
that particular sales event. Non-key attributes are not
necessarily unique identifiers. For example, there may be
several different sales with the same sale date, so the date
cannot be used to identify a specific sales event. For
simplicity, the figures show only selected attributes; there
are a number of other attributes that normally would be
present in the complete user schemas.
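
The distinction between key and non-key attributes can be
sketched in a few lines of Python. This is only an
illustration and not part of REAVIEWS: the class layout, the
sample values, and the helper names `find_by_key` and
`find_by_date` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sale:
    invoice_number: int  # primary-key attribute: unique for every sale
    sale_amount: float   # non-key attribute
    customer: str        # non-key attribute
    sale_date: str       # non-key attribute: many sales may share a date


# Hypothetical instances of the Sale entity.
SALES = [
    Sale(1001, 250.0, "Acme", "1992-03-01"),
    Sale(1002, 75.5, "Baker", "1992-03-01"),
    Sale(1003, 120.0, "Acme", "1992-03-02"),
]


def find_by_key(sales: List[Sale], invoice_number: int) -> Optional[Sale]:
    """A primary key identifies at most one instance."""
    matches = [s for s in sales if s.invoice_number == invoice_number]
    if len(matches) > 1:
        raise ValueError("invoice_number is not unique, so it cannot be a key")
    return matches[0] if matches else None


def find_by_date(sales: List[Sale], sale_date: str) -> List[Sale]:
    """A non-key attribute may match any number of instances."""
    return [s for s in sales if s.sale_date == sale_date]
```

Looking up invoice 1002 returns exactly one sale, whereas
looking up the date 1992-03-01 returns two, so the date alone
cannot identify a sales event.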

Figure 2-1 User Schema

The numbers and letters to each side of a relationship
indicate maximum cardinalities for the entities
participating in that relationship. The relationship
between Sales Person and Sale is a "one-to-many"
relationship (frequently written as 1-N). The maximum
cardinalities tell us that a single instance of the entity
Sales Person may participate in many sales events, but a
particular instance of the entity Sale will be associated
with one, and only one, sales person. The terms one-to-one
(1-1), one-to-many (1-N), and many-to-many (M-N) are
referred to collectively as cardinality ratios.
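
As a rough illustration of how maximum cardinalities relate to
actual instances, the following Python sketch computes the
ratio implied by a set of relationship instances. The sample
pairs and the function names are hypothetical. Note also that
a cardinality ratio in a schema is a design constraint, while
this sketch only reports what a particular sample happens to
exhibit.

```python
from collections import Counter

# Hypothetical instances of the relationship between Sales Person and Sale:
# each pair is (sales person, invoice number).
SOLD_BY = [("Jones", 1001), ("Jones", 1002), ("Smith", 1003)]

# Hypothetical instances of a relationship between Sale and Inventory:
# each pair is (invoice number, inventory item).
SALE_LINES = [(1001, "widget"), (1001, "gadget"), (1002, "widget")]


def max_participation(pairs, position):
    """Largest number of pairs in which any one instance appears."""
    counts = Counter(pair[position] for pair in pairs)
    return max(counts.values(), default=0)


def observed_ratio(pairs):
    """Cardinality ratio exhibited by the sample pairs, written left-right."""
    # The left side is "1" when every right-hand instance has one partner.
    left_side = "1" if max_participation(pairs, 1) <= 1 else "M"
    # The right side is "1" when every left-hand instance has one partner.
    right_side = "1" if max_participation(pairs, 0) <= 1 else "N"
    return f"{left_side}-{right_side}"
```

With these samples, `observed_ratio(SOLD_BY)` yields "1-N"
because one sales person appears in several sales while each
sale has a single sales person, and
`observed_ratio(SALE_LINES)` yields "M-N".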

2.2.1 Conceptual Modeling. Regardless of the choice of
data model, analysts generally follow the three-phase
modeling process recommended by Lum et al. (1979). In the
first phase, called requirements analysis, analysts gather
data from a variety of sources to identify the
organization's information needs and determine how the
proposed database system will meet those needs. At this
phase, analysts must identify and specify the following
items (McCarthy 1982, p. 557):

1. the processes (and decisions) that use data;

2. the various data elements themselves and
their patterns of usage across processes; and

3. the various organizational constraints on
data use.

For each process identified during this phase, the analyst
will prepare a list of data elements. Each list can be
thought of as one view of the database, taken from the
perspective of the associated process. This view is
referred to as a user view.

During view modeling (the second phase in the modeling
process) analysts convert the lists of data elements into
one of the many data models that have been developed for
this purpose. Semantic data models are currently the
modeling vehicles of choice for most analysts. At the
conclusion of the view modeling phase, the analyst has a
number of individual user views that must be combined into
one global data model during the final design phase, view
integration.

Batini et al. (1986, 326) explain why user views are
produced independently and why differences between views may
occur:

1. The structure of the database for large
applications (organizations) is too complex
to be modeled by a single designer in a
single view.

2. User groups typically operate independently
in organizations and have their own
requirements and expectations of data, which
may conflict with other user groups.

2.2.2 REA modeling. E-R modeling constructs are used for
the conceptual data modeling underlying the construction of
REAVIEWS, the prototype knowledge-based system developed for
this thesis. These modeling techniques are applied in a
very specific theoretical framework for accounting database
design, as described in McCarthy (1982). The REA theory of
data modeling uses E-R constructs to implement basic
accounting principles in the process of accounting database
design. As such, it provided an acceptable method of
incorporating accounting domain knowledge into REAVIEWS. It
also was used as an important part of the problem-resolution
process. Knowledge of important accounting principles helps
us recognize the theoretically "more-correct" choice of
modeling construct in some integration conflict situations.
This is discussed at greater length in subsequent sections.

Figure 2-2 shows a graphical representation of the
major modeling constructs in the REA accounting model. That
figure is drawn using the same E-R diagrammatic techniques
as in Figure 2-1, and it illustrates what we refer to as an
REA template. A major premise of REA accounting theory is
that a complete accounting data model of any economic event
will include all of the elements shown in the REA template.
As more fully explained in McCarthy (1982), there are a
number of instances in which we may construct data models that
incompletely specify the REA templates for certain events.
Such instances may arise when they provide us with important
system efficiencies or when accounting convention allows a
less than complete specification of certain events. We
discuss some of these "compromises" to the basic REA
template later. For now, it is sufficient to know we can
identify and model the major components of the REA template
for most of the economic events of a firm.
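
The idea of checking an event's data model against the
template can be sketched as follows. Only the three element
types (economic resource, economic event, economic agent) are
taken from Figure 2-2. The class, the function name, and the
sample events are hypothetical, and the check is a deliberate
simplification rather than the procedure used in REAVIEWS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventModel:
    """What a view records about one economic event (hypothetical layout)."""
    event: str
    resources: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)


def missing_template_elements(model: EventModel) -> List[str]:
    """List the template element types the model does not specify."""
    missing = []
    if not model.resources:
        missing.append("economic resource")
    if not model.agents:
        missing.append("economic agent")
    return missing


# Hypothetical event models.
sale = EventModel("Sale", resources=["Inventory"], agents=["Sales Person"])
cash_receipt = EventModel("Cash Receipt", resources=["Cash"])
```

A non-empty result is a prompt for review rather than an error,
since an incomplete specification may be a deliberate
compromise of the kind described above.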

Figure 2-2 The General REA Template
[adapted from: Denna and McCarthy (1987)]

2.2.3 Problems in View Integration. Producing a global
view of an organization's data resources provides benefits
to system designers and users alike. Sowa (1984) states the
"common aspect that unifies all groups [of system design
specialists] is a knowledge of the meaning of the data and
the constraints necessary to keep it a faithful model of the
real world" (p. 303). The conceptual schema provides that
knowledge and can be just as valuable to the users of the
system. A global data schema helps decision makers
understand how the various data files are related, and it
potentially affords them a better understanding of the total
data resources available in the firm.

As mentioned previously, the integration of the
independent user views becomes problematic due to the
structural and semantic diversities that arise among
designers when modeling concepts that are common to multiple
user views. Researchers have proposed several causes for
these diversities. The following causes are summarized from
a comparative study of integration methodologies (Batini et
al. 1986):

1. Different perspectives among user groups or
designers---the same concept or relationship
may be given different names in different
schemas, or the same relationship may be
modeled directly in one schema and indirectly
in another schema.

2. Equivalence among constructs of the model---
there may be different combinations of
constructs that still provide equivalent
models of the application domain; e.g., a
concept may be represented as an attribute of
another concept in one schema while another
schema shows both as entities with a
relationship connecting them together.

3. Incompatible design specifications---choices
made concerning names, types, integrity
constraints, etc. in one schema may be
incompatible with choices made in another
schema.
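
The second cause, equivalence among constructs, can be made
concrete with a small Python sketch in which the same fact is
modeled once as an attribute and once as a separate entity
with a relationship. The dictionary layout of a view and the
function `promote_attribute` are assumptions made for this
illustration only.

```python
def promote_attribute(view, entity, attribute, new_entity):
    """Return a copy of `view` in which `attribute` of `entity` is modeled
    as its own entity, connected to `entity` by a relationship."""
    entities = {name: set(attrs) for name, attrs in view["entities"].items()}
    entities[entity].discard(attribute)
    entities[new_entity] = {attribute}
    relationships = set(view["relationships"]) | {(entity, new_entity)}
    return {"entities": entities, "relationships": relationships}


# View 1 (hypothetical): customer is an attribute of Sale.
view_1 = {
    "entities": {"Sale": {"invoice_number", "sale_amount", "customer"}},
    "relationships": set(),
}

# View 2 (hypothetical): Customer is an entity related to Sale.
view_2 = {
    "entities": {
        "Sale": {"invoice_number", "sale_amount"},
        "Customer": {"customer"},
    },
    "relationships": {("Sale", "Customer")},
}

converted = promote_attribute(view_1, "Sale", "customer", "Customer")
```

Here `converted` equals `view_2`, so the two views describe the
same application domain even though their constructs differ.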

In addition to identifying these causes of conflicts,
Batini et al. found the types of conflicts that occurred could be
grouped into two main areas: naming conflicts and structural
conflicts. Naming conflicts are of two varieties: homonyms,
where two different entities are given the same name; and
synonyms, where one entity is given different names in
different schemas. Structural conflicts are conflicts
between entity types, dependency constraints, keys, or
insertion/deletion policies. Table 2-1 provides more
complete descriptions of these types of conflicts.
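
A purely syntactic screen for some of these conflicts might
look like the following Python sketch. The comparison rules
are simple heuristics assumed for illustration; they are not
the methods of Batini et al. or of REAVIEWS. The schema
layout and the sample views are likewise hypothetical.

```python
def candidate_conflicts(schema_a, schema_b):
    """Flag possible conflicts between two views.

    Each schema maps an entity name to a dict with a "key" (str) and a
    set of "attributes" that includes the key.
    """
    findings = []
    for name_a, ent_a in schema_a.items():
        for name_b, ent_b in schema_b.items():
            shared = ent_a["attributes"] & ent_b["attributes"]
            same_key = ent_a["key"] == ent_b["key"]
            if name_a == name_b and not shared:
                # Same name, nothing in common: possibly two concepts.
                findings.append(("homonym?", name_a, name_b))
            elif name_a == name_b and not same_key:
                # Same name, overlapping attributes, different keys.
                findings.append(("key conflict?", name_a, name_b))
            elif name_a != name_b and same_key:
                # Different names identified by the same key.
                findings.append(("synonym?", name_a, name_b))
    return findings


# Hypothetical user views.
view_a = {
    "Employee": {"key": "ss_number", "attributes": {"ss_number", "name", "pay_rate"}},
    "Customer": {"key": "customer_id", "attributes": {"customer_id", "name"}},
    "Order": {"key": "order_no", "attributes": {"order_no", "order_date"}},
}
view_b = {
    "Employee": {"key": "emp_id", "attributes": {"emp_id", "name", "department"}},
    "Client": {"key": "customer_id", "attributes": {"customer_id", "credit_limit"}},
    "Order": {"key": "position", "attributes": {"position", "team"}},
}

findings = candidate_conflicts(view_a, view_b)
```

Such a screen can only raise candidates. Deciding whether the
two Order entities really denote different concepts, or
whether Customer and Client denote the same one, requires
knowledge of the application domain.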

There are frequently relationships, referred to as
interschema properties, between two different sets of
objects that reside in different schemas. These interschema
properties may not be evident when viewing the database from
any individual user schema, and therefore they must be added
in at the view integration stage. Specific examples of
conflicts and interschema properties are provided later.

Table 2-1 Integration Conflicts
[definitions from: Batini et al. (1986)]

Type         Conflict             Description
-----------  -------------------  --------------------------------------------
Naming       Homonym              The same name is used for two different
                                  concepts, giving rise to inconsistency
                                  unless detected. For example, merging two
                                  entities of this type in the integrated
                                  schema would result in producing a single
                                  entity for two conceptually distinct
                                  objects.

             Synonym              The same concept is described by two or more
                                  names. Keeping each name modeled as a
                                  distinct entity in the integrated schema
                                  would result in modeling a single object by
                                  means of multiple entities.

Structural   Type Conflict        The same concept is represented by different
                                  modeling constructs in different schemas.
                                  For example, a class of objects may be
                                  represented as an entity in one schema and
                                  as an attribute in another schema.

             Dependency Conflict  A group of concepts are related among
                                  themselves with different dependencies in
                                  different schemas. For example, a
                                  relationship between two entities may be
                                  shown as 1:1 in one schema, but m:n in
                                  another schema.

             Key Conflict         The same concept is assigned different keys
                                  in different schemas. For example, SS# and
                                  Emp_id may be the keys of Employee in two
                                  component schemas.

             Behavioral Conflict  The same class of objects is assigned
                                  different insertion/deletion policies in
                                  distinct schemas. For example, in one schema
                                  a department may be allowed to exist without
                                  employees, whereas in another, deleting the
                                  last employee associated with a department
                                  leads to the deletion of the department
                                  itself.

2.2.4 Expert Systems for Conceptual Modeling. Conceptual
modeling may be viewed as an expert task---the performance
by experts is significantly better than the performance of
novices (e.g., Goldstein and Storey 1989). As such,
researchers have attempted to model the task with expert
systems (ES), with varying levels of success. Examples of
such systems include DDEW (Reiner et al. 1987), EDDS
(Choobineh et al. 1988), CHRIS (Furtado et al. 1988),
Modeller (Tauzovich 1989), KRISYS (Mattos and Michels 1989),
and Pasta-3 (Kuntz and Melchert 1989). Nearly all of the
systems developed to date have been general purpose in
nature. They have been designed to operate in multiple
modeling domains. For example, the same tool might be used
to model databases for an automotive manufacturer and a
large university. In the view integration stage, these
systems follow a number of conflict resolution strategies
that are also generic in nature.

One conclusion that can be drawn from research is that
conceptual modeling requires significant amounts of
knowledge from both the application domain and the field of
conceptual modeling itself. The knowledge-based systems
developed to date, however, have been imbued primarily with
the latter type of knowledge. This is perhaps the result of
the search for generality that has been a consistent theme
in artificial intelligence (AI) research. John McCarthy, a
pioneer in the field of AI, notes that lack of generality
has long plagued AI programs and believes that "the problem
of generality in artificial intelligence (AI) is almost as
unsolved as ever" (McCarthy 1987, p. 1030).

Regardless of the reasons for the preference for
domain-independent conceptual modeling ESs, they all share
similar problems at view integration time. These problems
are generally resolved manually by the system user,
utilizing his or her application domain knowledge. This
being the case, it appears that conceptual modeling systems
can be made more robust by embedding within them this
application-specific knowledge. While this approach loses
some of the oft-sought generality, many researchers view
loss of generality not as a problem, but as a key to the
development of some knowledge-based systems. ES development
in the 1960s and 1970s provided the insight that

the power of an ES is derived from the specific
knowledge it possesses, not from the particular
formalisms and inference schemes it employs. In
short, an expert’s knowledge per se seems both
necessary and nearly sufficient to develop an
expert system. (Turban 1990, 425)

That is the view taken in this thesis. A knowledge-based
system is being developed to explore and test the earlier
proposal that additional types of information are needed in
the view integration process. Chapter 3 examines the
potential improvement to view integration systems provided
by the use of more domain-specific knowledge.

2.3 Integration Example

The discussion of integration processes and problems is
more easily understood with a few simple user views as
illustration. Figure 2-3 shows three user views: 2-3a is
the view model of a sales transaction originally shown in
Figure 2-1; 2-3b is a model of the issuance of a paycheck to
an employee; and 2-3c is a model of a labor operation in
which a worker completes a work-in-process job that is
subsequently transferred to finished goods. The user views
were synthesized from transactions in Armitage (1985), in
which he provided a detailed example of database design for
an actual firm in the machine-shop industry.

User schemas are typically developed from formal
specifications of the information needs of the users. A
variety of methodologies have been developed for analyzing
an organization's information requirements and producing
that formal specification. The approach used in this
project was developed by McCarthy, Rockwell, and Armitage
(1989). It is a synthesis of structured analysis, as
described in Gane and Sarson (1979), DeMarco (1979), and
Yourdon (1989), and database design, as described in Lum et al.
(1979) and Teorey and Fry (1982).

As presented, the three schemas in Figure 2-3 have a
number of conflicts. The two Employee entities are
homonyms; the entities Finished Goods and Inventory are
synonyms; and there are two interschema properties that are
not identified in the individual user views. We next look
at some of the methods that have been developed for handling
such conflicts. These examples are meant for illustration,
and they are not the actual cases developed for testing
REAVIEWS.

[Figure 2-3 Three Partial User Views: a. sales transaction view; b. "Payday" Schema; c. "Build-it" Schema]

2.4 Integration Strategies

After producing the user views, analysts face the
problems of integrating them into a global schema. In a
comparative review of integration methodologies, Batini et
al. (1986) found that integration strategies could be
grouped into one of two primary classifications: binary, in
which two schemas are integrated at a time; and n-ary, in
which n schemas (n>2) are integrated at one time. Binary
strategies could be further classified as ladder or
balanced, while n-ary strategies could be divided into one-
shot or iterative. Definitions for these terms are found in
Table 2-2, as are graphical depictions of the strategies.

The ladder strategy is used in this project, as it
offers the following advantages to the integration process:

1. the integration task is simplified compared
to n-ary strategies, as integration
complexity increases with the number of
schemas integrated at one time (although the
number of integration operations is greater
than with n-ary strategies);

2. there is more control over the order of
integration of individual user views; and

3. at each stage, the intermediate schema can be
given a higher importance in settling
integration conflicts.
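
The ladder strategy described above can be pictured as a left fold over the user views. The following is a minimal sketch, not the REAVIEWS implementation: it assumes a schema is simply a mapping from entity names to attribute sets, the function names `merge` and `ladder_integrate` are hypothetical, and the attributes other than descr and qoh are invented for illustration. A real integration step would also detect and resolve conflicts rather than just take unions.

```python
from functools import reduce

def merge(intermediate: dict, view: dict) -> dict:
    """Binary integration step: fold one user view into the intermediate schema.

    Schemas are modeled as {entity_name: set_of_attribute_names}. Entities with
    the same name are merged by taking the union of their attributes; no
    conflict detection is attempted in this sketch.
    """
    result = {name: set(attrs) for name, attrs in intermediate.items()}
    for name, attrs in view.items():
        result.setdefault(name, set()).update(attrs)
    return result

def ladder_integrate(views: list) -> dict:
    """Ladder strategy: integrate the views one at a time, in the given order."""
    return reduce(merge, views, {})

# Hypothetical, heavily simplified versions of the three user views.
sales = {"Sale": {"date", "amount"}, "Inventory": {"descr", "qoh"}}
payday = {"Employee": {"name"}, "Disbursement": {"amount"}}
build_it = {"Employee": {"skill"}, "Labor Operation": {"hours"}}

global_schema = ladder_integrate([sales, payday, build_it])
```

Because the views are consumed in list order, the analyst controls the order of integration simply by reordering the list, which corresponds to the second advantage listed above.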

Table 2-2 Integration Strategies
[adapted from: Batini et al. (1986)]

Strategy Type       Description
------------------  ------------------------------------------------
Ladder (binary)     A new component schema is integrated with an
                    existing intermediate result at each step.
Balanced (binary)   Schemas are divided into pairs at the start and
                    are integrated in a symmetric fashion.
One-shot (n-ary)    The schemas are all integrated in a single step.
Iterative (n-ary)   Any n-ary strategies other than one-shot.

[Graphic Representation column not reproduced]

The advantages of task simplification are obvious. The
ability to group the schemas together (e.g., by accounting
cycle) offers the analyst some of the advantage of n-ary
strategies. The third advantage, giving higher "weight" to
the intermediate schema, is in fact a part of the problem
solving strategy that is followed by the KBS being designed
for this thesis. That strategy will be explained after a
brief discussion of integration within existing conceptual
modeling systems.

2.5 Knowledge-Based Modeling Systems

Current E-R modeling systems range from passive drawing
tools to knowledge-based assistants that embody some form of
conceptual modeling expertise. The systems discussed herein
reside toward the knowledge-based end of the spectrum. The
list of systems is not all-inclusive, but it is
representative of the state of the research at this time.

DDEW (Reiner et al. 1987) finds objects that appear
similar (e.g., same names) and lets the user decide whether
to merge them. Modeller (Tauzovich 1989) can identify
conflicts in user-supplied assertions, but the system itself
does not resolve them. It merely refuses to accept new
information until it is conflict-free. This is a common
form of conflict resolution, and it is also used by EDDS
(Choobineh et al. 1988), CHRIS (Furtado et al. 1988), and
Pasta-3 (Kuntz and Melchert 1989). The KRISYS system
(Mattos and Michels 1989) allows analysts to design very
complex conceptual models, but still refers conflicts back
to the analyst for solution. That solution usually consists
of adding new application domain knowledge to the system.
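
The limits of purely name-based matching can be illustrated with a short sketch. This is not the algorithm of DDEW or of any other system cited here; it assumes a schema is a mapping from entity names to attribute sets, and the function name `candidate_merges` and most attribute names are hypothetical.

```python
def candidate_merges(schema_a: dict, schema_b: dict) -> list:
    """Return pairs of entity names that match across two schemas, ignoring case."""
    names_b = {name.lower(): name for name in schema_b}
    return sorted(
        (name, names_b[name.lower()])
        for name in schema_a
        if name.lower() in names_b
    )

# Hypothetical fragments of the user views discussed in the text.
payday = {"Employee": {"name"}, "Disbursement": {"amount"}, "Cash": {"balance"}}
build_it = {"employee": {"skill"}, "Labor Operation": {"hours"}}
sales = {"Sale": {"date"}, "Inventory": {"descr", "qoh"}}
finished = {"Finished Goods": {"description", "quantity on hand"}}

pairs = candidate_merges(payday, build_it)
no_pairs = candidate_merges(sales, finished)
```

The sketch proposes merging the two Employee entities, which share a name but may not denote the same thing, and proposes nothing for Inventory and Finished Goods, which differ in name but may denote the same thing. In both cases the decision is left to the analyst.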

Some researchers have suggested conflict resolution
strategies not tied to a specific conceptual modeling KBS.
Teorey et al. (1986) provide a number of rules for
distinguishing between entity types and attributes, but
these also are generic, and in a specific context, one might
need to refer to an expert in the application domain to
resolve conflicts. Roussopoulos and Yeh (1984) suggest some
rules of thumb for identifying entities, properties, and
relationships, but they admit they have no algorithmic
methods for this task. Indeed, many of their rules of thumb
require the analyst to use knowledge of the application
domain.

The systems and methodologies just mentioned do help
resolve, or at least identify, some common integration
conflicts, but they still leave much of the conflict
resolution to the analyst. Using their more general
modeling methods, most of these systems could not identify
the potential conflicts between the two Employee entities
and between the Inventory and Finished Goods entities.
Further, none would be in a position to provide the analyst
with the theoretically "correct" solution. The best they
could do is to query the analyst and "ask" whether the
entities are in fact the same. If the analyst confirmed
that they were identical, the systems would then rely on the
analyst to decide which attributes the entity should have.

2.6 Conflict Resolution Using Accounting Knowledge

During the integration process, conflict resolution
could be facilitated if the analyst had the services of an
experienced accountant who possessed knowledge of the
application domain. In Figure 2-3, the entity Employee
appears in two user views. The analyst might easily assume
that the two views are modeling exactly the same entity.

The analyst might further assume that the entity Salesperson
in the first schema is also the same entity (making
Salesperson and Employee synonyms). An experienced
accountant would know that the entities are being viewed
from a different level of abstraction. In the Sales and
Build-it schemas, the employees modeled are sub-types, or
specializations of the more abstract entity in the Payday
schema. This type of relationship is called a
generalization relationship (Smith and Smith, 1977). It is
also one of the enterprise’s interschema properties that are
not apparent in the individual user views of Figure 2-3.

The abstract concept of duality (McCarthy 1982) is
another interschema property that is not usually apparent in
user views. Simply stated, duality is the idea that changes
in an entity's resource set generally occur in pairs; i.e.,
each increment to resources is accompanied by a related
decrement to resources. A labor operation (increment to the
work-in-process resource) will usually have an associated
disbursement (decrement to the cash resource). Figure 2-4
depicts the Payday and Build-it schemas and the interschema
relationships just described. The cloud surrounds the two
relationships that are not apparent at the user view level.
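
As a rough illustration of how duality could be checked mechanically, the sketch below lists the events that take part in no duality relationship. It is an assumption-laden simplification: the `Event` class, the function `unpaired_events`, and the representation of duality links as name pairs are all hypothetical, and the Sale event is added only to show an unpaired case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str
    resource: str
    kind: str  # "increment" or "decrement"

def unpaired_events(events, duality_links):
    """Return the names of events that appear in no duality link."""
    linked = {name for pair in duality_links for name in pair}
    return [event.name for event in events if event.name not in linked]

events = [
    Event("Labor Operation", "Work-in-Process", "increment"),
    Event("Disbursement", "Cash", "decrement"),
    Event("Sale", "Inventory", "decrement"),  # hypothetical addition
]

# Each link pairs an increment event with its related decrement event.
duality_links = [("Labor Operation", "Disbursement")]

missing = unpaired_events(events, duality_links)
```

An event reported here is a prompt for the analyst to look for its dual in another user view, not necessarily an error, since duality is an interschema property.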

In an example used previously, we saw that the
individual user views do not identify the entities Finished
Goods and Inventory as synonyms. Our experienced accountant
knows that the finished goods from the conversion accounting
cycle are the inventory items that are sold in the sales
accounting cycle. From these few examples, we see how the
"extra" accounting knowledge can give the analyst guidance
beyond that supplied by the current modeling systems.

2.7 Accounting Knowledge Structured as E-R Templates

Many types of domain knowledge can go into a KBS.
Fikes and Kehler (1985) presented one classification scheme
for domain knowledge. They divided such knowledge into the
eleven types shown in Figure 2-5. Years of experience in a
particular industry help an accountant build a mental model
of accounting systems for businesses in that industry. We
can think of this model as a sort of template. This
template could conceivably contain all of the types of
knowledge shown in Figure 2-5. If we model this knowledge
in entity-relationship terms, we can visualize this as a
semi-generic E-R template of a firm. The E-R schema would
model the accountant's knowledge about the entities and
relationships that exist in a typical business situation in
that industry. Along with the schema, we would need to
model the vocabulary definitions, decision rules,
constraints, heuristics, and other facts that the accountant
knows about this type of firm. The following example
demonstrates how we can use this type of knowledge in the
integration of the three user views.

[Figure 2-4 Interschema Properties: the Payday and Build-it schemas, with a cloud enclosing the interschema relationships, including the "pays for" (duality) relationship]

[Figure 2-5 Kinds of Knowledge That Can Go into a KBS; labels include behavior descriptions, vocabulary definitions, objects and relationships, heuristics, decision rules, and disjunctive facts]
[adapted from: Fikes and Kehler (1985)]

Figure 2-6 shows a portion of an E-R schema for a firm
in the machine shop industry. The schema contains four
interschema properties: two duality relationships and two
generalization relationships. The actual construction of
this schema is discussed later; the main focus here is the
use of this schema in view integration.

Minsky (1975) theorized that people faced with new
situations try to fit current perceptions to some
pre-existing memory structure, which he called a frame. A
similar process can aid us in view integration. New user
views are "fitted" against existing mental templates to help
understand the user views and reconcile conflicts. No claim
is made here that this is the actual mental process that
occurs when one performs the integration task, but personal
experience and discussions with experienced data modelers
suggest this may be so. The enterprise schema in Figure 2-6
may now be thought of as a pre-existing "template" to which
the user views in Figure 2-3 will be compared.

[Figure 2-6 Enterprise Schema]

2.8 Overview of the Integration Process

Step one in the integration consists of adding the
Sales schema to our enterprise schema. Remember that we are
using the ladder integration strategy. In the initial
integration, the enterprise schema can be thought of as the
intermediate schema. We "know" that the entity set labeled
Inventory in the Sales schema is the same as Finished Goods
Inventory in the enterprise schema. We keep track of these
"extra" names as "aliases," so that users of the finished
schema can find the enterprise schema entities that
correspond to the entities in their user views.

We next add attributes from the user view to the
intermediate schema. While adding the attributes, we would
notice the attribute called "customer" associated with the
Sale entity. We would know that a customer is most often
viewed as an entity, rather than an attribute. We would
then (tentatively) change customer to an entity.
Integration of the Payday schema proceeds in a similar
fashion, with no conflicts to resolve. The two interschema
relationships shown in Figure 2-4 are already present in the
enterprise schema. They present no conflict with the user
view, as these interschema relationships are not modeled in
that view.

The Build-it schema presents a number of interesting
examples. The renaming of entities occurs here, as it did
in the Sales schema, but we face a new problem when
attempting to add the attributes to our intermediate schema.
We have some synonyms among the attributes also, e.g.,
descr./description and qoh/quantity on hand. Our
experienced accountant knows the abbreviations used here
(part of the vocabulary definition type of knowledge).

If an attribute appears that is not in the current
vocabulary, the accountant can use knowledge of the
descriptive roles played by attributes in the firm prototype
to help identify the attribute. For example, if the qoh
attribute for Inventory had been labeled no. instead, the
accountant would still be able to identify it if no. was
described as representing the number of items of the product
currently on hand. The accountant would then know that no.
was a synonym for quantity on hand. Integration continues
in this fashion until all of the user views have been added.
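
The alias bookkeeping described in this section can be sketched as a small vocabulary table. This is only an illustration under stated assumptions: the table contents are limited to the names mentioned in the text, and the helper names `canonical` and `aliases_of` are hypothetical rather than part of REAVIEWS.

```python
# Maps a lower-cased user-view name to the enterprise-schema name.
ALIASES = {
    "inventory": "Finished Goods Inventory",
    "finished goods": "Finished Goods Inventory",
    "descr.": "description",
    "qoh": "quantity on hand",
}

def canonical(name: str, aliases: dict = ALIASES) -> str:
    """Return the enterprise-schema name for a user-view name, if one is known."""
    return aliases.get(name.strip().lower(), name)

def aliases_of(enterprise_name: str, aliases: dict = ALIASES) -> list:
    """Return the user-view names recorded as aliases of an enterprise-schema name."""
    return sorted(alias for alias, target in aliases.items() if target == enterprise_name)
```

A name that is not in the vocabulary is returned unchanged, which corresponds to the case in which the accountant must fall back on the described role of the attribute to identify it.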

This chapter discussed some of the issues and research
in conceptual modeling and view integration. It also
provided a relatively high-level overview of the view
integration process. In Chapter 3, we provide deeper
analysis of the role played by domain knowledge in view
integration conflict resolution. We also discuss the
acquisition, modeling, and use of reconstructive knowledge
for this project.

Chapter 3. Accounting Domain Knowledge in View Integration

In earlier chapters, we suggested that adding domain
knowledge to the view integration process added extra power
to our ability to solve integration conflicts. Some simple
examples were given there, but a more complete explanation
of such knowledge is in order before we can discuss how to
use that knowledge in view integration. Much of the
discussion in this chapter is intentionally at a relatively
high level of abstraction. The implementation details in
Chapter 4 provide a much lower level of analysis, and
understanding that material should be aided by beginning
with the higher level exposition here.

3.1 Levels of Accounting Domain Knowledge

For the purposes of this project, it is helpful to
think of accounting domain knowledge as being classifiable
into three different levels, which we call the principles,
industry, and company levels of knowledge. This allows us
to consider separately the various sources and types of
knowledge embodied in REAVIEWS and gain some insight into
how each might be used in our problem solving.

At the highest (i.e., most general) level lies
principles knowledge. This level consists of the basic
accounting and business concepts that must be considered in
the creation of accounting systems. Included in this
category are concepts such as duality, control,
accountability, economic resources and events, and stock-
flows. At the principles level lies much of the knowledge
of how businesses operate and the "real-world" objects about
which accountants capture information. Accounting systems
must allow for the representation, either explicitly or
implicitly, of those concepts.

At a less general plane lies industry knowledge. At
this level lies knowledge about the objects specific to
groups of businesses within given industries. For example,
at the principles level we understand the concept of
economic resources as scarce objects of utility controlled
by the business enterprise, but at the industry level we
understand the specific types of resources common to
business enterprises in a particular industry. Likewise,
the principles level view of economic events includes the
notion that such events reflect changes in a company’s
resource set resulting from certain activities, such as
production or sales. The industry level view would include
an understanding of the particular types of exchange
activities participated in by companies in a particular
industry (such as installment sales). Knowledge at this
industry level can help us determine more of the specific
objects (and relationships among those objects) we would
expect to be present in a typical company in that industry.

The lowest level of accounting knowledge in our
classification scheme is the company level, which consists
of knowledge about the objects of interest to one specific
company. This level would be concerned with the specific
business policies and terms used by the company for which we
are producing the conceptual schema. As some of these
policies and terms vary among individual businesses, it
follows that some of the conflicts encountered in view
integration might best be resolved by using company-specific
knowledge.
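
One way to picture the three levels is as a layered lookup in which the most specific level that knows a fact answers first. The sketch below is an assumption, not a description of how REAVIEWS stores knowledge; the keys and values are invented for illustration.

```python
from collections import ChainMap

# Hypothetical facts at each level, from most general to most specific.
principles = {"economic resource": "scarce object of utility controlled by the enterprise"}
industry = {"salespersons per customer": "many"}
company = {"salespersons per customer": "one", "early payment discount": True}

# ChainMap searches its maps in order, so company-level facts take
# precedence over industry-level facts, which precede principles.
knowledge = ChainMap(company, industry, principles)
```

Looking up "salespersons per customer" returns the company-level answer, while "economic resource" falls through to the principles level because neither of the more specific levels defines it.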

3.2 Sources of Accounting Domain Knowledge

The use of the three-level knowledge classification
model described above developed out of earlier work
(McCarthy and Rockwell 1989) which examined the use of
various types of knowledge in the process of designing
accounting information systems. The company-specific level
of knowledge was not addressed in that work, and some of the
terminology used here is new, but sources for the
acquisition of knowledge for the principles and industry
levels were discussed there. We next discuss the sources of
those three levels of knowledge used by REAVIEWS. After
that, we describe the transformation of this knowledge into
a conceptual form that can be used for problem solving in
view integration activities.

3.2.1 Principles Level Knowledge. The basic accounting
concepts and principles that underlie modern accounting
systems have been studied, explained, and formalized by a
number of accounting theorists, particularly in the last
several decades. Some of these principles relate directly
to the economic phenomena that we attempt to model in
conceptual schemas for database design. An example of this
is the principle of duality, which recognizes that each
increment in an enterprise's resource set can be linked to
a corresponding decrement.¹ Other principles relate to
concepts that are artifacts of particular methods of storing
and transmitting data about those economic phenomena. An
example of this nature is the prescription that, in journal
entries and trial balances, debits must equal credits. In
designing accounting systems following a database approach,
we need an accounting model free of these artifactual
concepts. REA accounting theory provides just such a model.
It is a first-order theory of accounting expressed in terms
and structures compatible with the semantic modeling of
accounting systems for use in shared data environments
(i.e., database systems). Because the declarative and
procedural aspects of the REA model are expressed in terms
of both the accounting and the conceptual database design
domains, the model becomes an inherently useful vehicle for
providing high-level accounting knowledge to the task of
view integration.

¹ As presented in McCarthy (1982), the concept of duality is much
richer than the brief explanation here. It includes the notions that an
increment must be a member of an event set different from the event set
of its matching decrement, that one event set will be that of
transferring in events and the other will be the set of transferring out
events, and that the accounting practice of matching of expenses allows
the relaxation of the duality constraint for certain events for which
direct linking to increment/decrement events is either undesirable or
not possible.

3.2.2 Industry Level Knowledge. At this level, we are
dealing with a more specific level of knowledge than at the
principles level. As suggested in McCarthy and Rockwell
(1989), this type of knowledge would be part of the
accumulated expertise of an accountant who had experience
designing accounting systems for businesses in a particular
industry. To be used in a knowledge-based system such as
REAVIEWS, this knowledge must first be elicited from that
expert.

Hoffman (1989) surveyed methods used by researchers to
elicit experts’ knowledge and reasoning strategies. He
classified the strategies into three categories: task
analysis, interview techniques, and special tasks. The
latter two elicit expertise directly from the experts. The
data for task analysis can be acquired from experts
indirectly (e.g., from documentation such as training
manuals) or directly (as in the protocol analysis of
Ericsson and Simon (1984)). Expertise within documentation
can be placed on a continuum from direct representation to
indirect representation. At the direct end of the
continuum, expertise is provided in a form close to that
used in task performance. One example would be a training
manual that provides the detailed procedural and declarative
knowledge used in some task, along with heuristics for using
that knowledge. Indirect representations are those
requiring significant interpretation and reconstruction due
to the compilation of those more direct forms of knowledge.

Expertise is most commonly acquired directly from
experts, and numerous knowledge-based systems have been
developed with most or all of their domain knowledge
acquired in this fashion. As we have proposed, part of the
motivation for this thesis was the exploration of the use of
reconstructive expertise, which we described as expertise
derived from sources in which the domain knowledge has been
"compiled" into a form that is missing some of the
procedural or declarative knowledge originally possessed by
the expert. This corresponds to Hoffman's concept of
indirect knowledge.

The Encyclopedia of Accounting Systems (Pescow 1976)
contains, in a documentary form, the expertise of a number
of experienced accountants. The Encyclopedia is organized
by industry, and it provides just the sort of industry-level
accounting knowledge proposed for use in view integration.
The knowledge derived from this source can be considered
reconstructive expertise, as the original accountants'
expertise has been compiled down into chart-of-accounts-
based templates for accounting systems. A more complete
discussion of the nature of the Encyclopedia’s accounting
knowledge is contained in section 3.4, which details the
conversion of this industry knowledge into a conceptual form
appropriate to the view integration task.

3.2.3 Company Level Knowledge. There are numerous facts
about a particular business entity that cannot be derived
from the types of higher level domain expertise described in
the previous two sections. Examples of this are the
policies followed by businesses in granting credit to their
customers. Some companies offer discounts for invoices paid
within a short period of the invoice date, while others may
offer no discounts. Likewise, some companies may assign
customers to specific salespersons who act as account
representatives, while others may allow multiple
salespersons to participate in transactions with any single
customer. A semantically "correct" enterprise schema must
accurately model these facts.

In a general integration tool such as REAVIEWS, it is
not possible (nor is it desirable) to capture and embed this
very specific knowledge in advance. Such knowledge would
apply only to one specific enterprise, and its use in view
integration for a different company might introduce errors.
The solution is to acquire any such knowledge directly from
the user at the time of initial need. Once acquired, this
knowledge is available during that enterprise's view
integration, should it be needed in resolving later
conflicts. It is not, however, made part of the "permanent"
knowledge base available for later consultations with other
enterprises' data.
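
The policy of acquiring company-level facts at the time of first need, and keeping them only for the current enterprise, can be sketched as follows. The class `Consultation`, its method `fact`, and the example questions are hypothetical; the sketch shows only the scoping rule, not the REAVIEWS dialogue itself.

```python
class Consultation:
    """One enterprise's view integration session."""

    def __init__(self, permanent_kb: dict, ask_user):
        self._permanent = permanent_kb  # shared across consultations; never written here
        self._company = {}              # company-level facts, discarded with the session
        self._ask_user = ask_user

    def fact(self, question: str):
        """Answer from the permanent base if possible; otherwise ask once and remember."""
        if question in self._permanent:
            return self._permanent[question]
        if question not in self._company:
            self._company[question] = self._ask_user(question)
        return self._company[question]

# A stand-in for the interactive user, recording each question it is asked.
asked = []
def fake_user(question):
    asked.append(question)
    return "yes"

permanent = {"does a sale decrement inventory?": "yes"}
first = Consultation(permanent, fake_user)
first.fact("are discounts offered for early payment?")
first.fact("are discounts offered for early payment?")  # answered from session memory

second = Consultation(permanent, fake_user)
second.fact("are discounts offered for early payment?")  # a new enterprise is asked again
```

The second consultation asks the question again because the first company's answer was never written to the permanent knowledge base.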

3.3 Conceptual Modeling of Principles Level Knowledge

Batini, Ceri, and Navathe (1992, 73-76) suggest several
strategies for database schema design. Their mixed strategy
is presented as having significant advantages over other
strategies, in part owing to the use of a skeleton schema in
the view integration process. The skeleton schema is an
overall schema of the application domain, developed
separately from the individual user views. That schema "acts
as a frame for the most important concepts in the
application domain and embeds the links between partitions"
(ibid., 74). The partitions mentioned are analogous to the
concept of user views used in this thesis. Here, we call
such a skeleton schema the managerial schema.² This
managerial schema becomes the primary knowledge structure
for both the principles and industry levels of domain
knowledge used in solving view integration conflicts.

The embedding of the principles level knowledge within
the managerial schema is accomplished by a rigorous
adherence to REA accounting theory as the basis for its
design. Some explanation of REA theory was given in Chapter
2, but it will be helpful to address briefly the choice of
this manner of representing principles level knowledge.

² The skeleton schema of Batini et al. (1992) corresponds closely
with the concept of the managerial schema discussed in Lum et al.
(1979). It is in that sense that we here use the term managerial
schema. To eliminate possible confusion, the reader is advised that
Batini et al. use the term managerial schema for a somewhat different
concept.

The problem domain of REAVIEWS is that of conceptual
database design. The problem-solving task undertaken by
REAVIEWS is the integration of conceptually-modeled user
schemas into a unifying, global schema. The accounting
principles relevant to system design would be most
accessible and useful if characterized in terms related to
the problem domain of conceptual database design. This is
precisely the characterization provided by the REA model.

It is helpful, if not critical, to understand that the
important aspects of the REA model are the accounting theory
and principles at its core, not the modeling language chosen
to explain the theory's use in the design process. The
"rigorous adherence" to the theory is much more than merely
using a certain set of diagrammatic techniques.³ By such
application of the theory, we are able to make strong
inferences about the objects we expect to find in the
enterprise schema. This high level knowledge alone,
however, presents a rather incomplete view of such objects.
By adding more specific knowledge from the enterprise
industry level, we gain the ability to make even more
inferences about those objects. Transformation of
accounting knowledge from the Encyclopedia of Accounting
Systems into an REA-modeled representation provides a way of
adding such industry level expertise to principles level
expertise. It is discussed next.

³ While much of the published work on the REA model has used the
Entity-Relationship modeling conventions of Chen (1976), the REA
accounting model may be expressed in any semantic language rich enough
to capture all of its declarative and procedural requirements. For
example, Geerts and McCarthy (1991) present the REA accounting model
using a very different set of diagrammatic constructs, but the
principles and theory of the REA model are uncompromised.

3.4 Conceptual Modeling of Industry Level Knowledge

Our eventual goal is to make use of industry-level
knowledge in the process of view integration. That process
uses such knowledge in the form of the managerial schema
described earlier. The source of that knowledge, the
Encyclopedia of Accounting Systems, presents its information
from a traditional, chart-of-accounts-based viewpoint
optimized for double-entry bookkeeping. The Encyclopedia
includes charts of accounts, illustrations of source and
output documents, examples of journals, and textual
narrative. Some of the knowledge is less tied to this
manual system orientation. For example, there is discussion
of some principles useful in the design of appropriate
organization structures, and sample organization charts are
provided. In addition, when describing typical economic
transactions in the industry, the encyclopedia contains many
facts that are "accounting system neutral." Overall though,
the tone is very heavily influenced by the underlying
requirement that accounting data be classified according to
the categories in the chart of accounts to enable the proper
double-entry recording of transactions. The managerial
schema (and the knowledge it embodies) must therefore be
reconstructed from the information provided in the
encyclopedia.

Knowledge acquisition (KA) from documentation has not
been widely researched, with the notable exception of the
CYC project (Lenat et al. 1986). There are few formal
methodologies that deal with this approach to KA. Turban
(1990, 465) notes that acquiring knowledge from documentation
is used where "the concern is to handle a large or complex
amount of information," a case that certainly applies to
conceptual modeling. Modeling a real enterprise requires
the handling of a large and complex amount of data. The
analyst must produce a model that adequately reflects the
real properties of the enterprise being modeled. This goal
is complicated by the fact that there may be multiple models
that satisfy the requirements. Hawryszkiewycz (1984, 115)
echoes a widely-held view when he states that "the design
process is not deterministic: different designers can
produce different enterprise models of the same enterprise."

At the macro level, constructing the managerial schema
from documentation is a two-stage process. REA templates
are first instantiated for each of the accounting cycles.
These accounting cycle schemas are then combined together
into the managerial schema. At the micro level, however,
the process quickly becomes complicated. This complexity
arises, to a great extent, from the different perspectives
of accounting taken by the Encyclopedia of Accounting
Systems' authors and the REA accounting model.

The REA model is based heavily upon the basic stock-
flow nature of accounting information and the concept of
duality. As described in McCarthy (1982,561),

Elements of the general ledger normally are
classified as either balance sheet accounts, which
represent monetary stocks of goods, services, and
claims at a particular time, or income statement
accounts, which represent monetary flows of these
same items over a period of time.

When viewed from this perspective, the important concepts in
the accounting system are the economic events of the
enterprise, the economic resources and agents who
participate in those events, and the particular
relationships that link all these entities together.

As mentioned previously, many of the important concepts
included in the Encyclopedia of Accounting Systems are
artifacts of double-entry bookkeeping that arose primarily
from the manual storage and transmission of accounting data.
The difficulty then becomes the "sorting-out" of the
important accounting facts in the encyclopedia. In essence,
what is being attempted is the reasoning backward from the
compiled journal and ledger view to the underlying knowledge
about the important economic entities and relationships in a
given industry. Once that is achieved, the knowledge can be
transformed into a managerial schema using REA accounting
theory to arrive at a structure that embodies important
knowledge from both the principles and industry levels. We
next look at the various forms of knowledge in the
encyclopedia and explain how this transformation was
accomplished. The industry selected for this project was
the machine shop. This industry was selected for the rich
contextual setting provided by its manufacturing focus and
because of the availability of a number of modeling cases
drawn from actual business enterprises from this industry.
For ease of reading, we will hereafter refer to the
Encyclopedia of Accounting Systems as the EAS.

3.5 Overall Knowledge Acquisition Strategy

The desired output of our knowledge acquisition
activities is a managerial schema for a typical machine
shop. To fill the REA templates, it is necessary to answer
the following questions:

- what entities can we identify from the EAS?

- what attributes can we identify for the entities?

- what identifiers (keys) can we identify for the
entities?

- what relationships can we establish among the
entities?

The first step, then, is to determine the resources, events,
and agents present in the EAS view of that industry. To do
this, we separately examined the four major types of
information found there: textual narrative, chart of
accounts, sample documents, and organization chart.
Examples of each type of information will be given in the
sections that follow. As entities were identified, they
were recorded, along with any attributes discovered. When
the entity identification step was completed, keys for each
entity were identified.
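
The recording step described above can be pictured as a simple worksheet. The following is only a minimal sketch, not the procedure actually used in this research: the class name EntityRecord, the function record, and the field names are hypothetical, while the entity and attribute names are taken from the machine shop material discussed in this chapter.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class EntityRecord:
    """One entity identified in the EAS (hypothetical worksheet entry)."""
    name: str
    kind: str                                   # "resource", "event", or "agent"
    sources: Set[str] = field(default_factory=set)
    attributes: Set[str] = field(default_factory=set)
    primary_key: Optional[str] = None           # assigned in a later step


def record(worksheet: Dict[str, EntityRecord], name: str, kind: str,
           source: str, *attributes: str) -> EntityRecord:
    """Add an entity sighting, merging it with any earlier sighting."""
    entry = worksheet.setdefault(name, EntityRecord(name, kind))
    entry.sources.add(source)
    entry.attributes.update(attributes)
    return entry


worksheet: Dict[str, EntityRecord] = {}
record(worksheet, "customer", "agent", "narrative", "location")
record(worksheet, "sale", "event", "narrative", "price")
record(worksheet, "finished good", "resource", "narrative", "price")
record(worksheet, "finished good", "resource", "chart of accounts")
record(worksheet, "raw material", "resource", "sample documents",
       "quantity", "unit of measure")

# Keys are assigned only after entity identification is complete.
worksheet["sale"].primary_key = "invoice number"
```

Keeping the source of each sighting makes it easy to see which entities were confirmed by more than one type of EAS information.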


After the initial identification of entities, the event
entities were separated into accounting cycles. Then, by
cycle, REA templates were filled with the identified
entities. This grouping of objects of interest into
accounting cycles is not a requirement for REA modeling, but
rather a common convention used by accountants when trying
to reduce the complexity of the business enterprise into
manageable chunks. Accounting information system
consultants and auditors frequently analyze clients by
cycles. This grouping by cycles allowed the analysis of
smaller groups of entities that still had some close
association to each other.4

3.5.1 Narrative. The first pass through the EAS was an
examination of the narrative portions to gain an
understanding of machine shops and to begin to identify the
entities needed in the managerial schema. The following
quote is the first paragraph of the section of machine shop
narrative titled "General Features."

The machine shop industry is extremely
competitive. Usually, sales are to industrial
plants (professional buyers who are primarily
concerned with service and price), on a bid or
quote basis. Price is an important factor in
obtaining sales. A given machine shop will
compete with different sizes and types of
companies on each item quoted, depending on the
item and location of the customer. These
competitive features require the accounting system
to reflect adequate costing to furnish a guide for
estimating prices on future jobs. (Pescow 1976,
1171)

4 The precise accounting cycle scheme used is not critical to the
task. There is no widespread agreement about terminology or even about
the cycles to which some accounting transactions belong.

In this one short paragraph, we learn a number of facts.
Four entities are described: customer, price quotation,
sale, and finished good. The exact terms given for these
entities are industrial plant (professional buyer), bid
(quote), and item. This highlights one of the underlying
premises of the EAS. The EAS's authors assume that their
work will be used by those already familiar with basic
business and accounting concepts. Any experienced
accountant should be able to read the above passage and
understand the business entities involved. This does impose
the requirement, however, that those acting as knowledge
engineers in reconstructing industry level accounting
knowledge from the EAS also have moderate knowledge of basic
business and accounting concepts.5

5 The minimum level of such knowledge required for the knowledge
acquisition task is an interesting research question in itself, but
beyond the scope of this thesis. From the experience gained in
performing such acquisition, it appears that an advanced accounting
student probably possesses enough business and accounting knowledge to
identify adequately the important accounting entities in the EAS
industry descriptions.

The above passage also begins to identify some of the
attributes of those entities. We see that price is an
attribute for price quotation, sale, and finished good. We
also see that location is an attribute for customer,
although the concept of location is refined in later
narrative to become the concept of sales territory, which
has further possible components of trading area, county, and
state. A partially filled-in REA template derived from the
above passage is shown in Figure 3-1. Those entities found
in the EAS narrative are shown with solid lines, while other
entities required by the full template are shown with dotted
lines. Note that the REA template anticipates a dual event,
cash receipt, to be triggered by the sale event. In this
example, sale is the decrement event, and cash receipt the
associated increment event. Note also that the REA template
allows for other types of relationships (such as commitment)
beyond those shown in the general REA template in Figure 2-2
in Chapter 2. Examples of some of these types of
relationships can be found in McCarthy (1982), Denna and
McCarthy (1987), and McCarthy and Rockwell (1989). This
type of analysis proceeds until the entire machine shop
section of the EAS has been examined.
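
As a rough illustration of what "partially filled-in" means here, the sketch below treats a template as a fixed set of slots, some of which the narrative has filled. The slot names and the functions instantiate and unfilled are assumptions made for this example only; they are not terms from the REA literature, and commitment entities such as price quotation are deliberately left out.

```python
from typing import Dict, List, Optional

# Hypothetical slot labels for one exchange (decrement event plus its dual).
TEMPLATE_SLOTS = [
    "resource", "event", "outside agent", "inside agent",
    "dual event", "dual resource", "dual inside agent",
]


def instantiate(found: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill the template slots with entities found so far; others stay None."""
    unknown = set(found) - set(TEMPLATE_SLOTS)
    if unknown:
        raise KeyError(f"not a template slot: {sorted(unknown)}")
    return {slot: found.get(slot) for slot in TEMPLATE_SLOTS}


def unfilled(template: Dict[str, Optional[str]]) -> List[str]:
    """Slots the full template requires but the source has not yet supplied."""
    return [slot for slot, entity in template.items() if entity is None]


# Entities identified in the "General Features" paragraph.
sales_template = instantiate({
    "resource": "finished good",
    "event": "sale",
    "outside agent": "customer",
})
still_needed = unfilled(sales_template)
```

The unfilled slots correspond to the dotted-line entities in the figure: they tell the analyst what to look for in the remaining EAS material.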

3.5.2 Chart of Accounts. Analysis of the chart of accounts
in the EAS is very similar to that of the textual narrative.
One important difference is that some of the generalization
relationships are more apparent here, as shown in the coding
of the account numbers. Figure 3-2 shows a portion of the
machine shop chart of accounts. The center and right
columns contain double-entry bookkeeping information about
debits and credits. This type of information is an artifact
of manual record keeping, and it provides virtually no
problem solving knowledge. Those artifacts were left in
Figure 3-2 to demonstrate how the authors of the EAS
"compiled" their accounting knowledge into a format very
different from that needed in the task of view integration.
The "Do" in those columns stands for "ditto."

[Figure 3-1 diagram not recoverable from the extracted text. Entity
labels that survive: CUSTOMER, SALE, FINISHED-GOOD, SALESPERSON, CASH,
CASHIER.]

Figure 3-1 Partially Instantiated REA Template
[adapted from McCarthy (1982)]

BALANCE SHEET ACCOUNTS

Assets                                Debits from                    Credits from
101   Cash                            Cash Receipts Journal          Cash Receipts Journal,
                                                                     Payroll Journal
102   Investments                     Cash Disbursement Journal      Cash Receipts Journal
103   Notes and Accounts Receivable   Sales Journal                  Do
104   Inventories:
104-1 Supplies                        Voucher Register               Material Requisition Summary
104-2 Material                        Do                             Do
104-3 Work-in-Process                 Material Requisition Summary,  Job Office Summary
                                      Payroll Journal,
                                      General Journal
104-4 Finished Goods                  Job Order Summary              General Journal
110   Land                            Voucher Register
111   Buildings                       Do
112-1 Machinery                       Do
112-2 Small Tools                     Do
112-3 Shop Equipment and Fixtures     Do
113   Office Equipment and Furniture  Do
114   Automobiles                     Do
115   Patents                         General Journal
116   Reserve for Depreciation, etc.  Do                             Do
120   Prepaid Expenses                Voucher Register               Do

Figure 3-2 Partial Chart of Accounts - Assets
[source: Pescow (1976, 1174)]

We see there that there are four types of inventory.
Inventory has a code of "104," and the four sub-types of
inventory all have codes starting with "104-." There is an
additional type of resource, with three sub-types, whose
codes all start with "112-." However, the EAS does not give
us a name for this resource.
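
The way the account coding exposes generalization relationships can be sketched in a few lines. This is an illustration under stated assumptions, not part of the original analysis: the function names are hypothetical, and only the account codes and names come from the chart of accounts.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Codes and names as they appear in the asset portion of the chart.
ACCOUNTS = {
    "104": "Inventories",
    "104-1": "Supplies",
    "104-2": "Material",
    "104-3": "Work-in-Process",
    "104-4": "Finished Goods",
    "112-1": "Machinery",
    "112-2": "Small Tools",
    "112-3": "Shop Equipment and Fixtures",
}


def group_by_prefix(accounts: Dict[str, str]) -> Dict[str, List[str]]:
    """Collect sub-account names under the code prefix they share."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code, name in accounts.items():
        prefix, dash, _suffix = code.partition("-")
        if dash:                      # only coded sub-accounts have a parent
            groups[prefix].append(name)
    return dict(groups)


def generalizations(
    accounts: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], List[str]]]:
    """Map each prefix to (parent name or None if unnamed, sub-type names)."""
    return {prefix: (accounts.get(prefix), names)
            for prefix, names in group_by_prefix(accounts).items()}


hierarchy = generalizations(ACCOUNTS)
unnamed = [prefix for prefix, (parent, _) in hierarchy.items() if parent is None]
```

Here the prefix "112" surfaces as a parent with no name, which is exactly the gap in the source discussed next.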

The missing name for resource "112" points out a
recurring problem with using reconstructive sources for
knowledge. When information is incomplete, inconclusive, or
simply missing, we are not able to go to the expert and fill
in the gaps in the desired expertise. Sometimes we can
deduce the information we need from our other knowledge
sources, i.e., the REA model and general accounting
knowledge. Other times we must simply use an incomplete
knowledge structure and appeal to the user for such
information when needed. Section 3.5.2 provides more
discussion on this subject.

Another area of confusion comes from the numerous
expense accounts we find. Figure 3-3 shows another portion
of the chart of accounts. From the coding scheme used, we
recognize these items as period costs rather than product
costs. At first, many of these expenses are difficult to
place in the normal REA template of increment/decrement
event pairs. McCarthy (1982) discusses the concept of
"event combinations" caused by accepted accounting
convention. Many of the traditional "period" expenses in
accounting systems are of this nature. Because of the
difficulty in achieving exact matching of some outflows to
their appropriate inflows (or because of the marginal
information content of such matching), some inflow/outflow
pairs are combined into one event entity. For example,
rather than record the acquisition of utility service as an
inflow separate from the use of such service, we routinely
record it as a period expense. This, in effect, combines
the acquisition and the "using up" of the resource as a
single event. Most of the expense accounts can be modeled in
this fashion, although this is a compromise of the general
REA template.

600 Marketing Expense:
  601 Salaries:
    601-1 Sales Salaries
    601-2 Salesmen's Commissions
    601-3 Stores Salaries
    601-4 Engineering Salaries
    601-5 Salaries-Research
    601-6 Salaries-General Office
  602-611 Service and Expense:
    602-1 Advertising-Catalog
    602-2 Advertising-General
    603-1 Agents' Commissions
    603-2 Dealers' Discounts
    604-1 Traveling and Entertaining Expense
    605-1 Parcel Post and Express Shipping Expense
    605-2 Postage-General Mail
    606-1 Sales Tax
    607-1 Sales Auto Expense
    607-2 Delivery Expense
    608-1 Telegraph Expense
    608-2 Telephone Expense-Toll Calls
    608-3 Telephone Expense-Other
    609-1 Depreciation-Automobiles
    609-2 Depreciation-Office Equipment
    609-3 Rental of Office Equipment
    609-4 Maintenance of Office Equipment
    610-1 Collection Fees
    610-2 Credit Reporting Fees
    610-3 Research and Development Supplies
    610-4 Forms and Office Supplies
    611-1 Shipping Room Supplies
    611-2 Sundry Marketing Expense

Figure 3-3 Partial Chart of Accounts - Marketing Expense
[adapted from: Pescow (1976)]

3.5.3 Documents. There are a number of typical documents
shown in the machine shop industry section of the EAS.
Perhaps due to the completeness of the narrative and chart
of accounts already examined, there were no new entities
discovered from the sample documents; however, the extra
detail on the various forms did suggest some synonyms and
additional attributes for entities already modeled. One
example is the discovery that the EAS was using product as a
synonym for the chart of accounts inventory item finished
goods. Another example is the discovery on the shop order
system forms that some raw material quantities are measured
by weight, while others are measured by the more familiar
number of items (often referred to as quantity). To
accommodate this, raw material has the attributes quantity
and unit of measure.

3.5.4 Organization Chart. The information given in the
sample organization chart theoretically provides a wealth of
information about employee roles, generalization hierarchies
among employees, and most of the knowledge needed to model
the responsibility relationships necessary for the various
inside agents in the REA templates. The wide variation in
organization charts expected among actual business
enterprises, however, limits the usefulness of the
information given in the EAS. As the authors there point
out, the actual organization charts for real businesses
would vary by size of business and the nature of the
products manufactured. The only information given that
would seem to apply widely is the advice given in the
following quote from the EAS:

... However, the points enumerated below should be
given serious consideration in determining the
organization of a machine shop.

1. Research should be set up as a separate
function reporting directly to top management and
not subordinate to engineering or production
departments. Whenever research activity is a part
of either the production or engineering divisions,
there is a natural tendency to push research work
aside in favor of customers' orders during periods
of heavy activity. There is also a tendency to
over-staff the research department whenever
activity slackens.

2. Engineering is usually an important part of
machine shop operation and should rate a top-level
division.

3. The personnel department should also rate a
major place in the organization structure to help
insure favorable labor relations and community
acceptance.

4. Purchasing should be centralized in one
department under the controller. The proper
functioning of this department is dependent
largely upon the development of proper records,
routines, and purchasing techniques. However, the
purchasing function in a machine shop does not
usually have sufficient influence on major policy
decisions to require a separate division reporting
directly to the president. (Pescow 1976, 1173)

This advice is still but a recommendation, and the possible
variations in actual machine shop organization prevent us
from making strong inferences with this knowledge, at least
in the view integration task as defined in this research.
The suggestions there do introduce an interesting topic:
that of the proper role of view integration activities.

As presented in most of the papers and books on the
subject of conceptual design of database systems, the view
integration stage is seen primarily as the merging of the
individual user views created during the view modeling
stage. As Batini et al. (1992,119) state:

The main goal of view integration is to find all
parts of the input conceptual schema that refer to
the same portion of reality and to unify their
representation. This activity is called schema
integration; it is very complex, since the same
portion of reality is usually modeled in different
ways in each schema.

This is precisely the focus of the view integration task in
this thesis. As such, the ultimate use of accounting domain
knowledge is to help identify errors and omissions in
modeling the realities of the business enterprise as they
exist (or as management wishes them to exist in the new
information system, should that be the focus of the
conceptual modeling). The earlier suggestions about
organization structure would not be as helpful in
determining what organization form actually exists in a
given machine shop as they would be in evaluating the
appropriateness of the structure being used by the enterprise.
In other words, the information helps us identify errors in
the organizational structure of a particular machine shop
rather than errors in the modeling of that structure. Much
of the knowledge in the EAS is of this nature, and for
purposes of this thesis, it is essentially ignored.

On the other hand, most information system design
methodologies stress the iterative nature of the process;
discoveries at one stage may lead to a re-thinking of prior
decisions, and, in some instances, redesign and repetition
of earlier steps in the process. As the view integration
stage presents designers and managers with a particularly
insightful perspective on the enterprise and its operations,
it would be possible to add considerably more input to the
design activities if some of this other industry domain
expertise could be used. For example, instead of merely
identifying the responsibility relationships in the user
views, it might be possible to make some sort of evaluation
of the existing responsibility hierarchies and recommend
improvements. Such investigation is well outside the
scope of this thesis.

3.6 Selection of Entity Keys

After all the entities have been extracted from the
various parts of the EAS description, identifiers (i.e.,
keys) must be determined for each entity. Where more than
one key can be identified, one is selected as the primary
key, and the others are listed as non-primary-key
attributes. These "additional" keys are also designated as
candidate keys, which might be used as entity identifiers.
This information is used in the actual view integration
stage.

Where possible, keys from the EAS were used. When such
keys were missing, we were sometimes able to supply them by
resorting to general business and accounting knowledge. For
example, the EAS never used the term social security number
as a key for employee, but we know that it frequently serves
that purpose. As further support for the use of this key,
we observed that in an illustration of a job time card, a
block titled "EMPLOYEE NAME AND NO." contained the name
"WILLIAM PETROFF" and the number "350-03-3094," which looks
strikingly like a social security number. At the same time,
that block also contained the designator "L78." It is not
uncommon to assign employee identification numbers according
to some other scheme, so information from the time card
illustration was inconclusive as to which number was being
used as the identifier. We chose to model the two as
separate attributes based upon information in other
illustrations. On some forms shown, there are fields
labelled "DEPT. AND OPER. NO." The values in these fields
are short combinations of letters and numbers, such as "T-7"
and "2-9." The resulting managerial schema therefore
contains the key employee no. for employee, with social
security number listed as a candidate key.

3.7 Composite-Key Entities

For simplicity in illustration and computation, primary
keys were required, if at all possible, to be simple keys.
That is, they were constrained to be single attributes. In
reality, most primary keys are single-attribute rather than
multiple-attribute keys. It is a common convention to choose
simple keys, as Batini et al. (1992, 294) note:

If an entity has multiple identifiers, one of them
must often be designated as the entity's primary
key. A secondary decision criterion is to prefer
simple identifiers to multiple identifiers, and
internal identifiers to external identifiers; in
this way, primary keys of entities can
be kept minimal in size and simple in structure.

In the actual entity modeling, when identifiers could be
found in the EAS, they were, in most cases, simple and
internal. Those entities requiring composite keys were
notable in that they could each be classified into one of
three types of entities, each with interesting properties.
Those entity types were employee service, depreciation, and
budgeted events.

Figure 3-4 shows examples of these entities. The one
thing all of these entities have in common is that they are
events (or budgeted events) that often do not generate
source documents with unique numbering systems. To put it
in non-artifactual terms, they are events to which we do not
normally assign internal identifiers. For most events (and
recall that events are phenomena that reflect changes in
resources), we have some method of assigning unique
identifiers. Examples of such unique identifiers (which we
refer to as "event codes") include the invoice number for
the sales event, the requisition number for a material
requisition, and the receiving slip number for a material
receipt event.

For employee service events, there is usually a time
card, time sheet, or perhaps a job sheet, which often do not
contain unique, sequential document/transaction numbers, as
we see on most other "event recording" documents. These
types of events are also unique in the REA model, in that
they are resource incrementing events that have cash
disbursement as their dual relationship and also have an
employee of the business enterprise as the outside agent.

[Figure 3-4 diagram, as far as it can be recovered from the extracted
text:
a. Employee Service: entity job operation, with attributes employee
   no., start time, stop time, operation no.
b. Depreciation: attributes depr. amount, building no., date; related
   entity building
c. Budgeted Event: entity job operation type, with attributes operation
   no., product-no, standard time, standard-setup, description]

Figure 3-4 "Composite-Key" Entities

For depreciation events, we normally issue no source
documents. In a manual system, there would simply be an end
of period journal entry. In one sense, depreciation can be
thought of as somehow different from "traditional" exchange
events. Although depreciation is supposed to represent the
"using up" of a long-lived asset, the actual formulas used
to calculate depreciation frequently bear little
relationship to the actual reduction in the asset's life.
Further, the depreciated resource is not really decremented
in the same way a resource like finished goods is. When we
decrement finished goods, it is usually because we have sold
something and that thing is actually removed from our
control. When we record depreciation on a building, the
building is (usually) still there. We do not witness the
physical decrement to this resource until we sell or
demolish the building. What we are doing is trying to
partition, for accounting purposes, what is essentially a
continuous process.6

6 McCarthy (1982) refers to this type of event as a partitioned
event.

Budgeted events, such as job operation standards (which
we refer to as job operation types), are also non-exchange
events, but in a different way; the entity is simply a
standard created by management to allow responsibility
accounting by measuring variances between projected and
actual changes in resources.

The use of composite keys on depreciation and employee
service can be eliminated by assigning a unique time "stamp"
to each occurrence of an event, as in Gal and McCarthy
(1986). For budgeted events, the identifier would be
something other than time. For example, for the job
operation type event, the identifier would be operation
number. To some, this may seem artificial. For example, it
somehow seems more natural to think of depreciation in terms
of an asset/time period combination, and indeed, the asset
number and date attributes are used as the primary key in
our managerial schema. Likewise, we could have the
accounting system assign a unique time stamp to the employee
service events when payroll is calculated, but it seems more
natural to think of this service in terms of the
employee/pay period combination used as the primary key in
our managerial schema. At the view integration stage,
however, there is no conceptual difference between the use
of simple and composite keys. For simplicity in explanation
and computation, we use the simple key time as the key for
those classes of entities just discussed.
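
The equivalence of the two keying schemes can be made concrete with a small sketch. The asset numbers, dates, and amounts below are invented for illustration, and the class and variable names are hypothetical; the point is only that a time stamp and an asset/date pair both identify the same occurrences.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Depreciation:
    time_stamp: datetime   # simple key, assigned when the event is recorded
    asset_no: str          # composite key, part 1
    period_end: date       # composite key, part 2
    amount: float


# Hypothetical occurrences of the depreciation event.
events: List[Depreciation] = [
    Depreciation(datetime(1992, 1, 31, 17, 0, 0), "B-1", date(1992, 1, 31), 500.0),
    Depreciation(datetime(1992, 1, 31, 17, 0, 1), "B-2", date(1992, 1, 31), 350.0),
    Depreciation(datetime(1992, 2, 29, 17, 0, 0), "B-1", date(1992, 2, 29), 500.0),
]

by_time_stamp: Dict[datetime, Depreciation] = {e.time_stamp: e for e in events}
by_asset_and_date: Dict[Tuple[str, date], Depreciation] = {
    (e.asset_no, e.period_end): e for e in events
}

# Either index reaches every occurrence exactly once.
assert len(by_time_stamp) == len(by_asset_and_date) == len(events)
```

Because both indexes are one-to-one with the event occurrences, choosing the simple key costs nothing at the view integration stage.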

3.8 Relationships and Structural Constraints

Before we begin discussion of modeling relationships
from knowledge in the EAS, we must refine the concept of
structural constraints introduced by the Chapter 2
discussion of cardinality ratios. Those ratios are one
notation for specifying acceptable limits on the number of
times an instance of one entity may participate in a
relationship with an instance of another entity. The "one
salesperson to a sale" limit was given as an example of this
concept. For a complete specification, we also need to know
whether an entity must participate in a particular
relationship (total participation) or whether such
participation is optional (partial participation). Consider
the relationship between department and employee. An
example of total participation for employee is a policy
stating that all employees must be assigned to a
department. An example of partial participation for
employee is a policy which does not require all employees to
be so assigned.

We can now fully specify structural constraints by
assigning a two-number "constraint set" to each entity in a
relationship. The first number (the min-card) specifies the
total/partial participation constraint. A zero specifies
partial participation, and a number of one or greater
specifies total participation. The second number in each
pair (the max-card) specifies the maximum number of
instances of the relationship in which an entity
instance may participate.
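
A constraint set of this kind is easy to express directly. The sketch below is one possible encoding, not the representation used in REAVIEWS: the class name ConstraintSet and its methods are hypothetical, and None is used here to stand for the unbounded max-card "N".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConstraintSet:
    min_card: int
    max_card: Optional[int]   # None stands for "N" (no upper limit)

    def __post_init__(self) -> None:
        if self.min_card < 0:
            raise ValueError("min-card cannot be negative")
        if self.max_card is not None and self.max_card < max(1, self.min_card):
            raise ValueError("max-card must be at least one and at least min-card")

    @property
    def participation(self) -> str:
        """Zero means partial participation; one or greater means total."""
        return "partial" if self.min_card == 0 else "total"

    def allows(self, count: int) -> bool:
        """Is this number of relationship instances acceptable for one entity?"""
        if count < self.min_card:
            return False
        return self.max_card is None or count <= self.max_card

    def __str__(self) -> str:
        upper = "N" if self.max_card is None else str(self.max_card)
        return f"({self.min_card},{upper})"


# Employee side of the department-employee relationship, assuming (for
# illustration) that an employee belongs to at most one department.
must_be_assigned = ConstraintSet(1, 1)    # total participation
may_be_unassigned = ConstraintSet(0, 1)   # partial participation
```

For example, may_be_unassigned.allows(0) holds while must_be_assigned.allows(0) does not, which is the difference between the two policies described above.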

3.9 Relationships in the EAS

REA accounting theory provides us with a number of
"basic" relationships that must be present in a properly
constructed accounting system. These were used as the
initial relationships modeled in the managerial schema. The
EAS was then re-examined to discover other relationships
that might be expected in typical machine shops.

When re-examining the EAS, the primary focus was on
trying to discover relationships outside of those in the
generalized REA template. Readers should recall from
Chapter 2 that the general REA relationships were stock-
flow, duality, control, and responsibility. While not part
of the general template, other relationships are allowed.
Also recall that responsibility relationships were, for the
most part, inconclusively specified by the EAS. The primary
relationship discovered in this re-examination pass was the
specifies relationship, which connects a resource-affecting
event with a budgeted event. The resource-affecting event
participates in a stock-flow relationship with a resource of
the business, and the budgeted event specifies budgets
(standards) for that stock-flow. For that reason, we refer
to the relationship between the two events as a "flow-
budget" relationship. Figure 3-5 shows one example of this,
the job operation to job operation type relationship.
Instances of job operation type provide budgeted standards
for the flow of employee labor into work-in-process during
job operations.

After all relationships were identified, structural
constraints were assigned to each relationship. For
simplicity of illustration and reduction of implementation
complexity, binary relationships were used. This makes no
difference in the view integration task, as the ternary
control relationship is simply modeled as two binary
relationships of the same type, one between an event and an
outside agent, the other between that same event and an
inside agent.

[Figure 3-5 diagram, as far as it can be recovered from the extracted
text: job-op-type (0,N), flow-budget, (0,1) job operation (1,N), then
(0,N) on a final entity whose label is illegible.]

Figure 3-5 "Flow-Budget" and "Stock-Flow" Relationships

3.9.1 Assigning Structural Constraints. Assigning
structural constraints was another task that was only
partially aided by EAS knowledge. There were few definitive
examples in which the EAS indicated possible constraints,
and these were more implicit than explicit. For example,
the discussion of production employees led to the belief
that they repeatedly participated in employee service events
(such as job operations). We also observed that a
particular job operation was usually performed by one
employee. This would make the manufacturing employee to job
operation relationship one-to-many. Min-cards were not
expressed, but it is normal for employees to be hired and
their personnel records entered into the company's record-
keeping system before they actually perform any work for the
company, so the min-card on the employee side was set to
zero. On the other hand, a job operation event must be
performed by an employee. Thus, the min-card for job
operation must be set to one. The resulting modeling
construct is shown in Figure 3-6a.

   mfg employee (0,N) ---------- (1,1) job operation
a. mfg-employee-job-operation relationship

   sale (1,1) ---------- (0,N) customer
b. sale-customer relationship

Figure 3-6 EAS-derived Structural Constraints

For the most part, however, structural constraints were
generated using more general knowledge about business
operations, when possible, and they were left blank if there
were no compelling arguments for a particular set of
structural constraints. The customer to sale relationship
is an example of structural constraints derived, at least
partially, from general business knowledge. The EAS does
not explicitly indicate the full expected structural
constraints between these two entities, but we reason as
follows. It is a natural assumption in most businesses that
customers may (and hopefully, will) participate in multiple
sales events. It is also the usual case that any given sale
will be to one customer. This results in max-cards of "N"
for customer and "one" for sale. It is also common practice
in manufacturing companies to enter customer information
into the information system in advance of sales, perhaps
after an initial sales visit by the firm's sales
representative. This would give customer a min-card of
zero. On the other hand, an individual sale must be made to
a customer, which would argue for a min-card of "one" for
sale. Further, the EAS narrative for machine shops contains
statements like the following:

Accounting for income begins with receipt of the
customer's order. The routine of processing
orders should be tied in with the accounting
system to give adequate information about sales
classification at lowest cost of handling. To
provide data for accounting and sales analysis
work, both customer and product classification
must be considered. (Pescow 1976, 1177)

This also argues for a minimum cardinality of "one" for
sale, for it appears that we always wish to know the
customer to whom a sale is made. From the previous
reasoning, we produce the binary relationship shown in

Figure 3-6b.
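The reasoning above can be restated as a small data structure. The following Python sketch is illustrative only: the class and field names (Participation, BinaryRelationship, min_card, max_card) are assumptions of this sketch and are not constructs taken from REAVIEWS or the EAS.

```python
from dataclasses import dataclass
from typing import Union

MaxCard = Union[int, str]  # an integer, or "N" for "many"


@dataclass(frozen=True)
class Participation:
    entity: str
    min_card: int      # 0 = optional participation, 1 = mandatory
    max_card: MaxCard  # 1 or "N"


@dataclass(frozen=True)
class BinaryRelationship:
    name: str
    left: Participation
    right: Participation

    def describe(self) -> str:
        """Render the relationship in the (min,max) notation of Figure 3-6."""
        return (f"{self.left.entity} ({self.left.min_card},{self.left.max_card}) -- "
                f"({self.right.min_card},{self.right.max_card}) {self.right.entity}")


# A sale must be made to exactly one customer; a customer may be recorded
# before any sale and may take part in many sales.
sale_customer = BinaryRelationship(
    "sale-customer",
    Participation("sale", 1, 1),
    Participation("customer", 0, "N"),
)
print(sale_customer.describe())
```

Keeping the min-card and max-card as separate fields makes it easy to compare the two kinds of constraint independently when views disagree.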

 

Finally, the individual accounting cycle schemas were
integrated into the managerial schema for the machine shop

industry.

3.10 Modeling Compromise

In Chapter 2, we mentioned that there are a number of
compromises to the full REA templates that commonly arise in
actual enterprise modeling situations. Those compromises
include event partitioning, materialization of claims as
base objects, and event combination due to expensing of
immediate services. The managerial schema constructed for
REAVIEWS contains some examples of such compromise, and we
should explain the reasoning behind their inclusion in the
managerial schema.

In McCarthy (1982), compromise to the full REA template
is discouraged, but allowable in three general situations.
The first is when modeling transactions for which existing
accounting convention allows less than full specification of
schema elements. The second is when implementation
concerns, such as system efficiency or storage requirements,
indicate that the adjustments make economic sense. The
third is when such compromise enhances the decision
usefulness of the resulting system.7 When deciding which
compromises to allow in the managerial schema, the following
rules were followed:

 

7 For additional analysis of these situations, see McCarthy (1982,


- when existing convention allows less than full
specification and the convention is common, allow
compromise.

- when compromise is for implementation reasons,
defer compromise to implementation stage of
project and do not allow compromise at view
integration stage.

- when compromise might enhance decision usefulness
of resulting schema, allow compromise when there
is compelling evidence that such additional
decision data is actually being used by the
enterprise.
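The three rules can be read as a simple decision procedure. The sketch below is one possible encoding, given only for illustration; the names Reason and allow_compromise are hypothetical. The rules do not say what happens when a convention exists but is uncommon, so the sketch assumes the compromise is then not allowed.

```python
from enum import Enum


class Reason(Enum):
    CONVENTION = "existing accounting convention"
    IMPLEMENTATION = "implementation concerns"
    DECISION_USEFULNESS = "decision usefulness"


def allow_compromise(reason: Reason, *, convention_is_common: bool = False,
                     evidence_of_use: bool = False) -> bool:
    """Return True if a compromise is admitted at the view integration stage."""
    if reason is Reason.CONVENTION:
        # Rule 1: allowed when the convention is common (assumed disallowed otherwise).
        return convention_is_common
    if reason is Reason.IMPLEMENTATION:
        # Rule 2: deferred to the implementation stage, never allowed here.
        return False
    if reason is Reason.DECISION_USEFULNESS:
        # Rule 3: allowed only with compelling evidence the data is actually used.
        return evidence_of_use
    raise ValueError(f"unknown reason: {reason!r}")
```

Under this reading, a partitioned depreciation event would pass the first rule, while a claim that no decision process uses would fail the third.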

The various depreciation events are examples of event
partitioning. They are common in most accounting systems,
so they were modeled in the compromised fashion following
the first rule.8 There were no compromises suggested by the
EAS that would invoke the second rule. There were, however,
several compromises that fell under the third rule, whose
application deserves some explanation.

Perhaps the most common compromise for enhancing
decision usefulness is the materialization of claims as
separate base objects, separate derived objects, or as
views. For example, accounts receivable can quite properly
be thought of as the imbalance between two economic events,
sale and cash receipt. As such, we do not need to model

that claim as an object in the enterprise schema. It can be

 

8 Early papers on REA accounting theory commonly included such
partitioned events in REA-based schemas. Later work examining the
usefulness of REA-based schemas in linking REA-modeled databases with
knowledge-based systems has pushed us toward a much stronger position
favoring maintenance of a "non-compromised" enterprise schema in
virtually all cases. The "compromised" enterprise schema (which could
include such things as partitioned events) would be derived as a view of
the underlying non-compromised version and would serve as the middle
level schema in the three-level schema approach presented in Chapter 2.

materialized via a procedure when needed, as when producing
financial statements. We can also conceive of situations
when we might wish to consider the claim as an object having
attributes of its own, as when we produce an aged accounts
receivable report.
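To make the idea of materializing a claim "via a procedure" concrete, the following sketch derives accounts receivable per customer as the imbalance between sale and cash receipt events. The event records and the function name are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical event records: (customer, amount).
sales = [("C1", 500.0), ("C2", 200.0), ("C1", 150.0)]
cash_receipts = [("C1", 400.0), ("C2", 200.0)]


def accounts_receivable(sales, cash_receipts):
    """Derive the receivable claim per customer as sales minus cash receipts."""
    balance = defaultdict(float)
    for customer, amount in sales:
        balance[customer] += amount
    for customer, amount in cash_receipts:
        balance[customer] -= amount
    # Only customers with an outstanding imbalance carry a claim.
    return {customer: b for customer, b in balance.items() if b != 0}


print(accounts_receivable(sales, cash_receipts))
```

Nothing about the claim is stored; it is recomputed from the two event sets whenever it is needed. An aged receivables report, by contrast, would need attributes of the claim itself, which is the case for modeling it as an object.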

There were three main claims present in the EAS that
fell under the third rule, Accounts and Notes Receivable,
Accounts and Notes Payable, and Capital and Surplus. The
Machine Shops narrative did not discuss using these claims
in any decision processes, and the chart of accounts
presented no lower level detail (i.e., sub-accounts) for
these items, so they were not materialized in the managerial
schema. Should they be present in views supplied by the
users to the view integration process, they would then be
added to the enterprise schema.

The managerial schema also contained other compromises
due to insufficient information in the EAS. In the most
commonly conceived use of the REA model, the actual business
enterprise is the source of knowledge about the entities to
be represented. If there is incomplete knowledge about an
entity or relationship, it is assumed that such knowledge
can be discovered. In other words, if one needs to know
something important about an entity, one can go ask someone
in the company, examine documentation, etc. When using the
EAS as the source of knowledge about entities and
relationships, we can only model what we find there. There

is no other source for gaining additional information. For

some entities, there was simply not enough information to
produce adequate constructs. In the best of such cases,
general business and accounting knowledge allowed us to
"fill in the gaps." In the worst cases, the objects were
left out of the schema or modeled by an object representing
a class of the entity. For example, there are many accounts
in the chart of accounts that represent various services
supplied by vendors, among them Advertising-General, Rental
of Office Equipment, Maintenance of Office Equipment, and
Credit Reporting Fees. No more information is ever given
about the services. To present a managerial schema that at
least contained some representation of these events, we
added the entity Vendor Service. This entity is a
representation of the more general class to which the
service events belong.

By choosing to model the more general object, we are in
essence saying that we do not have enough information to
distinguish between the more-specific events. Lacking any
strong evidence that we should capture "differential"
information among the events, we model one construct that
can accommodate all of them in the managerial schema. The
binary relationship between Vendor Service and its dual
event, cash disbursement, is shown in Figure 3-7. Note that
there is no functional difference in REAVIEWS problem
solving abilities whether we model these events in this
fashion or simply leave them out of the managerial schema.

This type of incomplete information is not used by REAVIEWS

 

[Diagram: the vendor-service entity linked to its dual event, cash-disbursement. Attribute labels shown include voucher no, date, amount, service-type, description, and invoice-no.]

Figure 3-7 Vendor Service Entity

in the view integration task; the assumptions necessary for
its use go well beyond the reconstructive knowledge focus of
this thesis.9

Throughout the entire process of building the
managerial schema, we also made note of typical synonyms for
the various entities and attributes. The product-finished
goods synonym mentioned earlier is one example of this.
This information was also stored in the system for use in

view integration.

3.11 Integration Conflict Resolution

The next few sections deal with problem-solving
strategies for view integration. The discussion is
presented from a "human problem solver" perspective.
Chapter 4 presents the same integration process from the
perspective of REAVIEWS, the knowledge-based system in which
these strategies were embedded. That discussion is
presented in terms more specific to the representation
language chosen for the actual implementation, but the same

overall strategy should be apparent in both chapters.

3.11.1 Basic Problem Solving Concepts. When considering
the knowledge needed, stored, and used in solving a

particular problem, one commonly made assessment is the

 

9 This situation did provide some clues that the integration

process may be further aided by using accounting knowledge from other
sources than those studied for this project.


level, or depth, of that knowledge. Depth of knowledge is
usually placed on a continuum between deep representations
and shallow, or surface, representations. Deep models of
knowledge are typically presented as capturing the
underlying causal processes in the domain of interest, while
shallow models of knowledge typically make use of pre-stored
associations between problem descriptions and solutions. A
familiar illustration for deep reasoning in the medical
world is the just-graduated medical student who performs
diagnosis by using functional knowledge of the human body to
reason backwards from symptoms to likely causes. Shallow
reasoning would be represented by a physician with many
years of diagnostic experience who has compiled a wide
database of empirical associations between patterns of
symptoms and diseases based upon the many previous diagnoses
the physician has made. This physician can in many
instances make a very quick and accurate diagnosis by
matching the current pattern of symptoms against those
compiled patterns.

Researchers have proposed or built knowledge-based
systems that make use of both sorts of knowledge.
Typically, these systems initially avail themselves of the
relatively efficient shallow reasoning, but resort to using
deeper models of knowledge when shallow methods fail to
resolve the problem satisfactorily. In performing view
integration, we follow a similar method, moving from shallow

to deeper levels of knowledge as necessary.

3.12 View Integration Strategies
At the most general level, the view integration process
looks simple. We make use of the ladder strategy explained
in Chapter 2 to add the user views, one at a time, to the
initial managerial schema, using the following three-stage
strategy:

- try to identify the components in a user view as
existing objects in the schema.

- if components are identified, check for and
resolve conflicts with managerial schema
specifications (e.g., in structural constraints);
if not identified as already established
components, treat as new components.

- if user components are not already represented in
the schema, add them.
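The three stages listed above can be sketched as a loop over user views. This is a deliberately simplified illustration: the dictionary-based schema, the synonym table, and the function name integrate are assumptions of the sketch, and conflict resolution is reduced to logging the conflict rather than resolving it.

```python
def integrate(schema, views, synonyms):
    """Add user views one at a time to a growing schema (ladder strategy)."""
    log = []
    for view in views:
        for name, constraints in view.items():
            canonical = synonyms.get(name, name)
            if canonical in schema:
                # Stage 1 succeeded: the component is already in the schema.
                if schema[canonical] != constraints:
                    # Stage 2: a conflict with the schema specification.
                    log.append(("conflict", canonical))
                else:
                    log.append(("matched", canonical))
            else:
                # Stage 3: not yet represented, so add it.
                schema[canonical] = constraints
                log.append(("added", canonical))
    return log


schema = {"sale": {"min": 1, "max": 1}}
views = [
    {"sales": {"min": 1, "max": 1}},
    {"customer": {"min": 0, "max": "N"}},
    {"sale": {"min": 0, "max": 1}},
]
log = integrate(schema, views, synonyms={"sales": "sale"})
print(log)
```

Each view is folded into the same schema object, so later views are checked against everything added by earlier ones.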

This apparently simple strategy masks some of REAVIEWS's use
of domain knowledge in the integration process. As we move
to lower levels of detail, the process becomes much more
complex. Figure 3-8 shows a flowchart of the problem
solving strategy in REAVIEWS. This provides a concise
overview of the process discussed in the narrative that
follows next. The rectangles in Figure 3-8 represent the
major processes in our overall strategy. Appendix A
contains descriptions of those processes. If one works
through the diagram and appendix, one gains a better
understanding of the problem solving algorithm at work. The
narrative that follows takes us through the process from a

relatively "machine independent" perspective. One should

[Flowchart: process boxes shown include START, get individual view, get user schema, find entity, find candidate entities, confirm candidate entity, add unique entity, find foreign-key, add foreign-key entity, add relationship, and save schema.]

Figure 3-8 Flowchart of REAVIEWS’s Integration Strategy

also be able to identify this same process in the Chapter 4

discussion of REAVIEWS’s handling of the test cases.

3.12.1 Initial Schema Processing. As an initial aid in
problem solving, user views are first grouped by accounting
cycle, then solved by cycle. This is not required by the
underlying KBS processes, but it is viewed as an aid to the
users of the system, who are allowed to focus on a smaller
subset of the enterprise's business activities at any given
point in the consultation. Also, as part of the initial
view modeling, users produce their views using constructs
from REA theory and entity-relationship modeling, as in
McCarthy (1982).

When presented with a user view, we attempt to
recognize entities in the view in the following order:
events first, followed by resources, then followed by
agents. The important part of this ordering is the primacy
of the event entity. If you look at the general REA
template of Figure 2-2, you see that event is usually
directly linked to most of the other entities in the
template. Furthermore, a major function of a business
company’s accounting system is the capturing of data about
the economic transactions entered into by that company. A
typical definition of an accounting system is given by
Horngren and Foster (1987, 910):

An accounting system is a set of records,
procedures, and equipment that routinely deals


with the events affecting the financial
performance and position of the organization.

In addition, Ijiri (1975, 61) relates

The notion of exchanges is significant in
accounting measurement because an increase or a
decrease in the resource set is treated not as an
isolated event, but as an integral part of
activities.

Hence, if we can first identify an event in a user view, we
have potential insight into the identities of most of the

other entities.

3.12.2 Entity Identification. Initial recognition is
performed using a simple pattern-match on the names and
primary keys of the entity, making use of the synonym lists
constructed during the knowledge-acquisition phase and added
to as the view integration proceeds.

If that shallow matching fails, we resort to deeper
knowledge contained in the managerial schema about the
various groupings of entities and relationships we expect to
find in a typical machine shop. We inspect the other
entities in the user view and attempt to identify an REA
template in the managerial view that might be the same as
the user is modeling. The various potential templates are
ordered from most likely to least likely. We take the most
likely template, describe it to the user, and suggest which
entity we think the user is attempting to model. The user
is asked for confirmation of our "best guess." If the user

confirms that the two entities are the same, we model it as

such, adding the entity/attribute names to our list of
synonyms. If the user indicates they are different, then
the next most likely template is presented, and so on, until
we have exhausted all the potential templates. If none of
the suggestions are confirmed, we add the entity to our
schema as a new entity, and construct the appropriate

relationship(s).
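The identification procedure just described moves from a shallow match to template-based suggestions and finally to adding a new entity. A minimal sketch follows, assuming a dictionary-based schema and a caller-supplied confirm function that stands in for the user; these names, and the reading that the shallow match requires both name and primary key to agree, are assumptions of the sketch.

```python
def identify_entity(entity, schema, synonyms, ranked_candidates, confirm):
    """Return (schema_name, is_new) for an entity from a user view."""
    name = synonyms.get(entity["name"], entity["name"])
    # Shallow step: match on name (or known synonym) and primary key.
    if name in schema and schema[name]["key"] == entity["key"]:
        return name, False
    # Deeper step: offer candidates, most likely first, until one is confirmed.
    for candidate in ranked_candidates:
        if confirm(entity["name"], candidate):
            synonyms[entity["name"]] = candidate  # remember the new synonym
            return candidate, False
    # No suggestion confirmed: add the entity to the schema as new.
    schema[entity["name"]] = {"key": entity["key"]}
    return entity["name"], True


schema = {"product": {"key": "product-no"}}
synonyms = {"finished-goods": "product"}

r1 = identify_entity({"name": "finished-goods", "key": "product-no"},
                     schema, synonyms, [], lambda user, cand: False)
r2 = identify_entity({"name": "stock-item", "key": "item-no"},
                     schema, synonyms, ["product"], lambda user, cand: True)
r3 = identify_entity({"name": "tool", "key": "tool-no"},
                     schema, synonyms, ["product"], lambda user, cand: False)
```

The ranked candidate list would come from the REA templates in the managerial schema; here it is simply passed in.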

3.12.3 Relationship Identification. After the entities
have been added to the schema, the relationships are
examined, and we look for differences in structural
constraints between user view relationships and
relationships in the growing schema. If found, we attempt
to resolve them, using methods discussed later in this
section. When all user views have been added, the view
integration process is ended.

Having explained at a high level how we handle object
identification and the addition of new objects, we turn to
the middle stage of our three-stage integration strategy,

the identification and resolution of view conflicts.

3.13 View Conflict Recognition

Batini et al. (1992) suggest that homonyms and synonyms
may be indicated by the presence of concept mismatches and
concept similarities, respectively. Mismatches occur when
identically-named concepts possess different properties and

constraints; similarities occur when concepts with different


names share properties or constraints. Properties are
defined as neighbor concepts. An entity's properties would
be its attributes and the relationships it participates in.
Constraints, on the other hand, are limiting conditions on
the set of allowable instances of the schema. Examples of
this include the cardinality constraints on relationships.

These definitions of concept similarity and concept
mismatch are unfortunately a bit too broad to lend us much
help in positively identifying homonyms and synonyms during
view integration. This is due to the sharing and
integration of the data typically stored in database
systems. As Date (1986, 6) states,

these two aspects, integration and sharing,
represent a major advantage of database systems in
the "large" environment; and integration, at
least, may be significant in the "small"
environment, too.

In this environment, the concept similarities and mismatches
described above are frequently just normal consequences of
data integration and do not signal a modeling error at all.
As Date (1986, 7) further relates

Another consequence of the same fact (that the
database is integrated) is that any given user
will normally be concerned only with some subset
of the total database; moreover, different users'
subsets will overlap in many different ways. In
other words, a given database will be perceived by
different users in a variety of different ways.
In fact, even when two users share the same subset
of the database, their views of that subset may
differ considerably at a detailed level.

In examining the schemas from which the integration test

schemas were derived, it became apparent that differences

among concept properties were common and in most cases did
not signal modeling errors, but were just differences in the
ways the various users viewed the data set. In cases where
naming errors did exist, the two major indicators seemed to
be mismatches in attributes (for homonyms) and matches in
attributes, roles, and relationships (for synonyms). These
observations drive the reasoning followed when seeking to
identify naming conflicts in REAVIEWS. Homonyms are
typically identified by discrepancies in attribute sets
among entities with identical names. Synonyms, on the other
hand, are usually identified by discrepancies in names among
entities with the same or similar attribute sets, especially
when the primary keys match or the entities serve the same
role in a particular REA template. Most of the modeling
tools mentioned previously can do no more than identify such
situations. The user is left to make the decision as to the
correct nature of the entities. With the aid of additional
accounting domain knowledge, we can provide more support

when such situations are encountered.

3.13.1 Homonyms. Our knowledge base contains synonym
information and a managerial schema containing expected
relationships between entities. Both of these types of
knowledge can be applied in homonym and synonym conflicts.
When an entity is encountered in a user view, we try to find

a matching structure in the managerial schema.


If we find a match on names or synonyms, we compare
attribute sets, looking for possible homonyms. Of course,
if the attribute sets produce no conflicts, the entities are
treated as being the same. If an attribute discrepancy is
found, we check to see if the attribute discrepancy is just
a case of using synonyms for the attributes. In the case of
a difference in the primary keys, we check to see if the new
primary key is known to be a candidate key of the entity.

If either of these cases is true, we resolve the conflict
by treating the two entities as the same. If this is not
the case, then we check to see if the two entities are
playing the same role in a known REA template. When we find
that they do play the same role, we suspect they are
possibly the same entities, with the individual view using a
different set of attributes. We present this possibility to
the system user, along with a description of the entity we
believe it to be, and ask the user to confirm or reject the
entity.
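The sequence of checks in the paragraph above can be sketched as follows. The function and parameter names are hypothetical, and the final "different" outcome is an assumption of this sketch: the text does not state explicitly what happens when the entities do not play the same role.

```python
def check_homonym(user, known, attr_synonyms, candidate_keys, same_role):
    """Classify a name match as 'same', 'ask-user', or 'different'."""
    def normalize(attrs):
        # Map attribute synonyms onto the names used in the schema.
        return {attr_synonyms.get(a, a) for a in attrs}

    # No attribute conflict, or only a difference in attribute naming.
    if normalize(user["attrs"]) == normalize(known["attrs"]):
        return "same"
    # The primary keys differ, but the new key is a known candidate key.
    key = attr_synonyms.get(user["key"], user["key"])
    if key != known["key"] and key in candidate_keys:
        return "same"
    # Same role in a known REA template: possibly the same entity, so ask.
    if same_role:
        return "ask-user"
    return "different"


known = {"attrs": {"customer-no", "name"}, "key": "customer-no"}
renamed = {"attrs": {"cust-no", "name"}, "key": "cust-no"}
rekeyed = {"attrs": {"tax-id", "name"}, "key": "tax-id"}
extended = {"attrs": {"customer-no", "credit-limit"}, "key": "customer-no"}
```

Only the third outcome interrupts the user, which reflects the stated objective of discovering as much as possible from stored knowledge first.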

We have now moved from a simple matching strategy to
one that relies on knowledge of how the entities should be
related in the particular REA templates. The main objective
here is to see how much we can discover about an entity
before we have to stop and ask the user. We first try to
find the correct answer from stored knowledge. If that is
not possible, it is necessary to ask the user to resolve the
problem using company-specific knowledge. Even in those

cases, it is possible to use some of our knowledge to help

guide the user in providing the correct answer. This
process becomes clearer in Chapter 4, where we discuss the
actual implementation of REAVIEWS and its handling of test

case conflicts.

3.13.2 Synonyms. A similar process occurs when we find an
entity playing a role in the REA template that is already
modeled by an entity with a different name. This appears to
be a synonym, so we first check to see if the name is a
known synonym. If it is not, we look at the attributes to
see if they are the same. If so, as we did in the above
case, we present the user with the already modeled entity
and its description and ask the user to confirm or reject
the match. This is another case in which we are able to
provide guidance from domain knowledge, even if it is not
the complete solution. Entities are then added as indicated
by the user, either unique new entities or merely different

views of entities already in the schema.

3.13.3 Type Conflicts. Type conflicts occur when different
constructs are used for the same entity, as when an object
is modeled as an attribute in one user view and as an entity
in another view. We can detect this in some cases by
recognizing a non-key attribute for one entity that is also
a key for a different entity (i.e., the attribute is a
foreign key). If our domain knowledge indicates this

foreign key represents an entity that should be separately

modeled (or if it has already been separately modeled in
another user view), we notify the user of this fact and
change the view to show the additional entity and a
relationship linking it to the initial entity.

In the above case, if the suspected entity has not
already been modeled in a previous user view, we still wish
to model the entity separately, as our domain knowledge
suggests that we eventually wish to keep track of other
attributes for that entity. Because this domain knowledge
comes from outside the business enterprise, it is possible
that the actual finished schema may not contain any
additional attributes. In that case, the user may collapse
the two entities back into one, as in the original view.

Consider a user view in which we find an entity named
Employee, with the key of employee number and two non-key
attributes, name and department number. We recognize that
department number is the key of an entity about which we
frequently model other attributes, such as department name.
We therefore model the separate entity, department. Figure
3-9 shows the before and after versions of this example.
Assume that no further attributes are found in later views,
and we finish with a schema entity, department, with only
one attribute, that being its primary key. This situation
really presents no problems, as it will be accommodated when
the conceptual schema is mapped into the particular data

model used by the DBMS chosen for the accounting system.
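The employee and department example can be expressed as a small transformation. In this sketch, known_keys is a hypothetical lookup from an attribute name to the entity it identifies, standing in for the domain knowledge described above; the function name and record layout are likewise illustrative.

```python
def expand_foreign_keys(entity, known_keys):
    """Split non-key attributes that are keys of other entities into entities.

    Returns (trimmed_entity, new_entities, relationships).
    """
    kept, new_entities, relationships = [], {}, []
    for attr in entity["attrs"]:
        target = known_keys.get(attr)
        if target and attr != entity["key"]:
            # The attribute is a foreign key: model its entity separately.
            new_entities[target] = {"key": attr, "attrs": [attr]}
            relationships.append((entity["name"], target))
        else:
            kept.append(attr)
    trimmed = {**entity, "attrs": kept}
    return trimmed, new_entities, relationships


employee = {"name": "employee", "key": "employee no.",
            "attrs": ["employee no.", "name", "department no."]}
trimmed, new_entities, rels = expand_foreign_keys(
    employee, {"department no.": "department"})
```

If no later view supplies further attributes, the new department entity is left with only its primary key, which is the situation discussed in the text.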

[Diagram: a. before expansion: the employee entity with attributes employee no., name, address, and department no. b. after expansion: employee (0,1) related to department (0,N).]

Figure 3-9 Expansion of Foreign-Key Attribute


3.13.4 Structural Constraint Conflicts. Dependency and
behavioral conflicts come about when different views model
the same relationship with different structural constraints.
A dependency conflict is a discrepancy between the max-
cards, while a behavioral conflict is a discrepancy between
the min-cards. Before considering the resolution of these
conflicts, we make a few observations on use of structural
constraints in the modeling of a business enterprise.
First, we note that different groups in an enterprise might
rightly wish to enforce different constraints on a
relationship, at least from the perspectives of the user
applications that gave rise to the original user views. The
goal of view integration should be to create an enterprise
schema from which all of the individual user views may be
constructed. If one view requires more restrictive
constraints than other views, those more restrictive
constraints can be implemented procedurally. Next, we note
that there may be some structural constraints that should
rarely, if ever, be allowed. And finally, there may be
constraints that really depend upon particular aspects of
the individual enterprise, e.g., management policies. These
observations lead directly to our three methods of solving
structural constraints, which we call force, resolve, and
g-d-m.

The force method is used when our domain knowledge
gives us a compelling reason to use one particular set of

constraints. In this case, we are saying, in essence, "We

 

feel that you should virtually always model this
relationship with these constraints; if you ignore them,
you should be absolutely certain that you wish to view this
relationship in a potentially incorrect manner." For
illustration, the narrative and document examples in the EAS
chapter on machine shops indicate that an individual sale
would virtually always be to a single customer. Conversely,
the shop expects multiple sales to (at least) some of its
customers, but there are cases in which the shop records
data about a customer prior to an actual sale. In modeling
terms, we would say that customer has an optional
participation in the customer-sale relationship, while sale
has a mandatory participation. Further, we would say that
the customer-sale relationship is a one-to-many
relationship. We would therefore try to force users to
accept this view of the relationship's structural
constraints.

The resolve method is used when we know that a
relationship will normally have one "correct" set of
constraints, but we don't know in advance what these
constraints should be. This typically happens when those
constraints are based on policies that may vary among
companies. Consider the relationship between manufacturing
employee and job operation. In some shops, segregation of
duties may dictate that several employees will work on a
given job. In a different shop, the policy may be for one

employee to always complete a job individually. Naturally,

 

business factors, such as the size of the company or the
nature of the product manufactured, decide which policies
are appropriate, but the fact is, such policies do differ,
and they affect the way we should model the objects of
interest.

When a structural conflict arises, we could simply ask
the user to give us the "correct" structural constraint. Of
course, this is not what we would expect of the experienced
accountant whom we have been using as an analog for our
knowledge-based system. Realizing that the answer depended
on manufacturing policies, he or she would ask "Would an
individual job always be completed by a single employee, or
do you have some jobs that require more than one employee to
complete?" The resolve method thus asks for knowledge about
the individual company, then uses that knowledge to
determine the correct structural constraints.

The g-d-m method uses a general database modeling
convention to resolve structural constraint conflicts. It
is used in cases where we have no compelling reason to
require all users to accept the same set of structural
constraints (remember that we can enforce the stricter
constraints procedurally), but the base objects that we
model must accommodate all user views. Thus, what we really
want to do here is find the most general set of constraints
(i.e., the set that is least restrictive) and use that set
in our enterprise schema. Of course, we would want to

inform the user of this change, so that the user would be

aware that a more general set of constraints was being
modeled. In this way, the user would know that the

stricter constraints would have to be enforced procedurally.
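The g-d-m method can be illustrated with a short function. The sketch assumes that "least restrictive" means taking the smallest min-card and the largest max-card across the conflicting views, with "N" treated as larger than any integer; this interpretation and the function name are ours.

```python
def most_general(constraints):
    """Return the least restrictive (min_card, max_card) pair covering all views.

    Each constraint is a (min_card, max_card) tuple; max_card is an int or "N".
    """
    mins = [c[0] for c in constraints]
    maxes = [c[1] for c in constraints]
    # "N" (many) is more general than any fixed maximum.
    max_card = "N" if "N" in maxes else max(maxes)
    return (min(mins), max_card)


views = [(1, 1), (0, 1), (1, "N")]
general = most_general(views)
changed = [v for v in views if v != general]  # views needing procedural enforcement
```

Every view whose constraints differ from the result would be reported to its user, since the stricter constraints must then be enforced procedurally.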

3.13.5 Key Conflicts. Key conflicts are our last class of
view integration conflicts. They occur when the same
concept has different keys in different views. There are
really only two variations on this. The two keys are either
synonyms for each other, or they are different keys. If we
recognize them as synonyms, we note this fact and choose one
for the primary key in the enterprise schema, notifying the
user of this fact. If we recognize them as being different
attributes, we choose one as the primary key, then record
the other as a candidate key. If our domain knowledge does
not allow us to distinguish which situation exists, we must
ask the user. It may be possible to aid the user by
describing the primary key already modeled in our schema.
This may make it easier for the user to determine if the
different key names are describing the same attributes or
not.
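The handling of key conflicts reduces to three cases, sketched below. The knowledge table, the ask_user callback, and the choice to keep the key already in the schema as the primary key are assumptions of this sketch; the text says only that one of the two keys is chosen.

```python
def resolve_key_conflict(schema_key, view_key, knowledge, ask_user):
    """Decide how two differing keys for the same concept relate.

    knowledge maps a frozenset of two key names to "synonym" or "different".
    ask_user is called only when the knowledge base has no entry.
    """
    relation = knowledge.get(frozenset((schema_key, view_key)))
    if relation is None:
        # Domain knowledge cannot distinguish the cases: ask the user.
        relation = ask_user(schema_key, view_key)
    if relation == "synonym":
        return {"primary": schema_key, "synonyms": [view_key], "candidates": []}
    # Different attributes: keep one as primary, record the other as candidate.
    return {"primary": schema_key, "synonyms": [], "candidates": [view_key]}


knowledge = {frozenset(("customer-no", "cust-no")): "synonym"}
known_case = resolve_key_conflict("customer-no", "cust-no", knowledge,
                                  lambda a, b: "different")
asked_case = resolve_key_conflict("customer-no", "tax-id", knowledge,
                                  lambda a, b: "different")
```

In the second call the callback stands in for the user, who would be shown a description of the primary key already modeled before answering.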

The discussion in this section has been presented, as
much as possible, from the human perspective. In the next
chapter, the focus is shifted to the actual implementation
of domain knowledge and problem-solving strategies in
REAVIEWS, our knowledge-based system for view integration.
Included is a discussion on the choice of language

representation and the test cases developed to demonstrate

the use of domain knowledge in the view integration.
Chapter 4 concludes with details of a REAVIEWS session in
which the user views are integrated, with the various
conflict types recognized and resolved in the manner

described in this chapter.

Chapter 4. The REAVIEWS System

4.1 Knowledge Structures within REAVIEWS

In this section we examine several aspects of the
knowledge embedded in the REAVIEWS system. We first look at
the choice of knowledge representation language. We then
describe how accounting domain knowledge is implemented in
REAVIEWS, discussing separately the declarative and the
procedural structures used. Finally, we explain the methods

for applying that knowledge in the view integration task.

4.1.1 Frame-based Knowledge Representations. The Chapter 3
discussion of accounting domain knowledge and view
integration proposed how one might use accounting knowledge
to perform view integration and deal successfully with the
integration conflicts that arise. To be used by knowledge-
based systems, however, such knowledge must be encoded into
a format usable by the computer. A number of formalisms
have been used for such encoding. Among the more familiar
are production rules, first-order predicate calculus, and
frames. For the REAVIEWS system, we chose to represent the
accounting domain knowledge of Chapter 3 in a frame system.
Frames, as conceived by Minsky (1981), are data-

structures for representing "remembered" stereotypical

knowledge that can be used to understand and make inferences
about "new" situations or objects. As typically implemented
in knowledge-based systems, a frame is used to represent one
object or a class of objects, and multiple frames are
connected together in semantic networks, commonly referred
to as frame systems. These frame systems allow us to
organize and use knowledge of very complex situations or
systems in an efficient and effective manner.

There are a number of reasons for choosing this type of
knowledge structure for REAVIEWS. REA accounting theory (in
McCarthy 1979, 1982) is itself presented in a semantic
network. This would argue for a knowledge representation
scheme capable of modeling semantic nets. Frame systems
also have a number of other benefits. As Fikes and Kehler
(1985, 904-5) remark,

The advantages of frame languages are
considerable: They capture the way experts
typically think about much of their knowledge,
provide a concise structural representation of
useful relations, and support a concise definition
by specialization technique that is easy for most
domain experts to use. In addition, special-
purpose algorithms have been developed that
exploit the structural characteristics of frames
to rapidly perform a set of inferences commonly
needed in knowledge-based systems.

Chandrasekaran (1984) mentions three advantages of
frame-based systems: (1) the use of default knowledge adds
efficiency, as information about a particular object need
only be stored when it differs from default information
about that type of object; (2) frames can be structured into
generalization hierarchies, with default information
inherited from objects at a higher level in the hierarchy;
and (3) frames may contain procedural information allowing
inference mechanisms to be invoked when contextually
appropriate.10 These advantages make frame-based systems
"very useful for capturing one broad class of problem
solving activity, viz. one where the basic task can be
formulated as one of making inferences about objects by
using one's knowledge of related objects elsewhere in the
structure" (ibid., 52-53). This is very similar to the
problem solving task described in Chapter 3 for resolving
view integration conflicts.

4.1.2 Declarative Structures for Accounting Domain
Knowledge. In Chapter 3, we modeled accounting knowledge
from the principles and industry levels in a managerial
schema (which we refer to as the m-schema). That schema was
in fact a declarative representation of the accounting
knowledge. The translation of that knowledge into a frame-
based system can be thought of as the translation from one
semantic net to another. The entities and relationships
will simply be represented in frame structures, rather than
the E-R diagrammatic constructs of Chapter 3.

10 An even stronger case can be made about default information.
Such information can be used to make inferences when new or incompletely
specified objects are encountered. In addition, the frame structures
themselves can enhance the performance of knowledge-based systems.
Fikes and Kehler (1985, 907-20) discuss "various ways in which a frame-
based representation facility participates in a knowledge system's
reasoning functionality and can assist the system designer in
determining strategies for controlling a system's reasoning"
(ibid., 907).

In REAVIEWS then, frames are the structures used to
represent objects of interest in the application domain.
They are composed of two major elements, slots and facets.11
Slots describe the object and can provide taxonomic
descriptions, such as the generalization hierarchy formed by
Employee and Salesperson. Slots can also provide the more
familiar attribute descriptions (such as quantity or price,
which describe the object Inventory). Facets can be thought
of as subslots. They are used to represent knowledge about
the slots. Some common facets are those for the actual
value of the slot (typically named value), default values,
documentation strings for the slot, and various constraints
for the slot. In REAVIEWS, there are two primary types of
constraint facets: constraint defines the allowable domain
for slot values, while multivalued defines whether slots are
restricted to single values or not. For example, the entity
frame has a slot named acct-cycle, which holds the names of
the various accounting cycles in which the entity is found.
That slot contains the following facets:

- value, which lists the accounting cycle(s) for the
individual entity;

- constraints, which describes the set from which
the value facet may be filled; in REAVIEWS, value
is constrained to be one of: revenue-cycle,
conversion-cycle, acquisition-cycle;

- multivalued, which is set to the boolean true,
meaning that the slot may contain multiple values;

- doc-string, which holds a short description that
can be used to explain what the slot's values
represent; in this case, the string reads "the
accounting cycle(s) in which the entity
participates"; and

- print-name, which is set to "accounting cycle".12

Figure 4-1 shows some of the basic parts of the frame
structure for entity.

11 Some of the terminology used here is derived from the large body
of work on frame-based systems. Much of that work addressed structural
representation issues of frames as structures in knowledge-based
systems, rather than Minsky's orientation toward frames as memory
structures for the control of reasoning. For example, the word "slot"
has evolved, in structural representation terms, into something with a
slightly different meaning than its original use by Minsky.
In REAVIEWS's hierarchical frame system, we define four
sub-types of entity frames: resource, event, agent, and other-
entity. These four have further sub-types that represent
the various entities in the m-schema. Each of the schema
entities inherits the attributes of those objects above it
in the hierarchy, and each may contain additional
information appropriate only for that particular object.
For example, sale inherits attributes from both event (its
immediate parent), and entity (the parent of event). Sale
also has some attributes of its own, including default
values for some of the inherited slots and some new slots
(such as total-amount and sales-tax) not defined in any of
its ancestor frames. Figure 4-2 shows a small portion of the
frame hierarchy for entities in REAVIEWS.

12 Many of the frames and slots have abbreviated names for
convenience, but this makes them difficult for the user to understand;
the use of print names allows the user to be presented with easily
identifiable terms, while still allowing internal system use of the
shorter names.

Figure 4-1 Partial Structure of Entity Frame
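
As a rough illustration of the inheritance just described, the following Python sketch looks a slot up in the nearest frame that defines it, walking from sale to event to entity. The class Frame, its methods, and the particular default values are assumptions made for the example; only the frame names and the slot names total-amount and sales-tax come from the text.

```python
from typing import Any, Dict, List, Optional


class Frame:
    """A frame with locally defined slots and an optional parent (hypothetical)."""

    def __init__(self, name: str, parent: Optional["Frame"] = None,
                 slots: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.parent = parent
        self.local_slots: Dict[str, Any] = dict(slots or {})

    def get(self, slot: str) -> Any:
        """Return the nearest definition of a slot, searching up the hierarchy."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.local_slots:
                return frame.local_slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def ancestors(self) -> List[str]:
        """Names of the frames above this one, nearest first."""
        names: List[str] = []
        frame = self.parent
        while frame is not None:
            names.append(frame.name)
            frame = frame.parent
        return names


# Illustrative values only: the defaults below are assumptions, not REAVIEWS data.
entity = Frame("entity", slots={"acct-cycle": [], "entity-role": None})
event = Frame("event", parent=entity, slots={"entity-role": "event"})
sale = Frame("sale", parent=event, slots={
    "acct-cycle": ["revenue-cycle"],   # default overriding an inherited slot
    "total-amount": None,              # new slot defined only on sale
    "sales-tax": None,
})
```

Here sale.get("entity-role") is answered by event, while sale.get("acct-cycle") is answered by sale itself because its local default shadows the inherited one.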

The other major structures in the m-schema,
relationships, are represented in the same way as entities.
The basic object is defined in the relationship frame, and
there are various sub-types of relationship also defined.
These structures contain some slots common to the entity
frames, but also include new types of attributes. For
example, the slot struct-const holds the structural
constraints for the entities joined by the relationship.
Struct-const has the following two facets for use in
resolving structural constraint conflicts:

- conflict-strategy, which denotes which of the
three conflict resolution strategies (force,
resolve, or g-d-m) should be used when such a
conflict arises, and

- s-conflict-proc, which names the procedure to be
invoked when the resolve method is to be used
(recall that this method acquires user knowledge
during the integration session, and then uses that
knowledge to determine the appropriate
constraints).

Relationship frames also contain a rel-type slot to identify
the type of REA relationship for each individual
relationship. It is possible that some entity pairs will
have more than one relationship linking two particular
entities. This is allowable as long as the relationships
are unique. Modeling of the same enterprise object with
multiple E-R constructs (entities or relationships) would
constitute a modeling error in the m-schema and is therefore
not allowed. In fact, one of the integration conflicts we
try to resolve, the synonym, is this type of error.

Figure 4-2 Partial Entity Hierarchy

4.1.3 Procedural Structures for Accounting Domain
Knowledge. Most of the domain knowledge in REAVIEWS is
modeled declaratively, but there are areas in which knowledge is
best modeled with procedures. Such procedural knowledge is
stored in REAVIEWS in one of two ways: either attached to a
frame of a particular m-schema object or embedded as part of
the control structure of the system. An example of the
former method is the use of the two structural constraint
facets mentioned above. Each relationship frame contains
facets that carry information about how structural
constraint conflicts should be resolved. The s-conflict-
proc facet contains the procedure to be invoked to obtain
and employ user knowledge in the conflict resolution
process.

An example of control structure procedural knowledge is
the confirm-entity process. At a high level, the process
can be viewed as the attempt to identify a user-view entity,
referred to as the current-entity, using m-schema
information. If that attempt fails, then a procedure is
invoked to identify likely candidates for the user entity
from within the schema, and these "educated guesses" are
presented to the user, with some explanatory information to
help the user determine if any of the proposed entities are
the same as the user-view entity. Based upon the response
from the user, the entity is then added to the schema as
either an instantiation of an existing schema entity or as a
unique entity.

Unlike the use of the s-conflict-proc facet, the "most
likely candidate" method is not specific to the entity being
investigated, but is instead a general method applied to all
entities when specific identification cannot be
accomplished internally. The basic intuition behind this
method is that two entities are more likely to be the same
if they share a greater number of properties.13 An initial
set of candidate entities is selected, based upon matches
in the entity-role and accounting-cycle attributes. The set
is then ordered using a weighting scheme which assigns to
each candidate entity a weight based upon matches between
the set of entities in the user view being integrated and
the set of entities with which a candidate entity has
relationships. Those candidate entities with more matches
are offered to the user before candidates with fewer
matches.
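
The "most likely candidate" ordering can be sketched in Python as a filter followed by a sort. This is one reading of the description, not the REAVIEWS procedure itself: the sketch assumes that a candidate must share the entity role and at least one accounting cycle with the user entity, and that its weight is the count of user-view entities among its related entities. The names SchemaEntity and rank_candidates, the role label inside-agent, and the cashier entity are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class SchemaEntity:
    name: str
    role: str                  # entity-role attribute
    cycles: FrozenSet[str]     # accounting-cycle attribute
    related: FrozenSet[str]    # entities with which this entity has relationships


def rank_candidates(user_role: str, user_cycles: Set[str],
                    user_view_entities: Set[str],
                    schema: List[SchemaEntity]) -> List[SchemaEntity]:
    # Step 1: initial candidate set, matching on role and accounting cycle.
    candidates = [e for e in schema
                  if e.role == user_role and e.cycles & user_cycles]

    # Step 2: weight each candidate by how many of its related entities
    # also appear in the user view being integrated.
    def weight(e: SchemaEntity) -> int:
        return len(e.related & user_view_entities)

    # Heavier candidates are offered to the user first; ties keep schema order.
    return sorted(candidates, key=weight, reverse=True)


schema = [
    SchemaEntity("cashier", "inside-agent", frozenset({"revenue-cycle"}),
                 frozenset({"cash-receipt"})),
    SchemaEntity("salesperson", "inside-agent", frozenset({"revenue-cycle"}),
                 frozenset({"sale", "sale-order"})),
    SchemaEntity("mfg-employee", "inside-agent", frozenset({"conversion-cycle"}),
                 frozenset({"job-operation"})),
]
ranked = rank_candidates(
    user_role="inside-agent",
    user_cycles={"revenue-cycle"},
    user_view_entities={"sale", "customer", "finished-good", "industry"},
    schema=schema,
)
```

In this example salesperson is offered before cashier because it shares the sale entity with the user view, and mfg-employee is excluded because its accounting cycle does not match.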

13 Properties are here being defined as in Chapter 3; i.e., they
are the attributes and relationships of the entities in question.

4.1.4 Structures for View Integration Knowledge. Chapter 3
discussed view integration and the use of domain knowledge
in solving integration conflicts. The processes described
there were derived from the techniques used by individuals
with extensive modeling expertise using both REA and Entity-
Relationship theory. As a result, while not intended as a
rigorous cognitive model of their expertise, those processes
contain a great deal of knowledge about the view integration
task. This knowledge is, in a sense, compiled into the
integration process itself. To avail ourselves of this
knowledge, the basic control structure of REAVIEWS is
patterned around the integration processes and strategies
described in Chapter 3. It is a common feature of many
knowledge-based systems that some knowledge gets compiled
into the control structures of the system itself. REAVIEWS
is no exception. By carefully attempting to pattern the
control structure after those integration strategies, we
believe that we have added some power to the conflict
resolution ability of REAVIEWS. The modeling knowledge that
adds this power is, however, modeled much less explicitly
than the knowledge represented by the m-schema.

4.2 Test Cases for REAVIEWS

We previously explained that the machine shop industry was
selected, in part, because of the availability of a number of
modeling cases drawn from actual business enterprises in
this industry. This was an
important factor. Part of the purpose of this thesis was to
test the proposition that using domain knowledge adds to our
ability to resolve integration conflicts.14 To be
generalizable, test cases need to be representative of the
broad classes of integration conflicts. For high external
validity, the cases need to be representative of actual
mistakes made by human modelers. Conceptual models (user
views and enterprise schemas) from more than a dozen
"real-world" machine-shop-type businesses were used as the
starting point. Test cases were derived from those
materials and discussions with experienced modelers. In
addition, academic and educational literature was examined
for examples of integration conflicts in other domains. The
result is a set of cases that, we believe, are
representative of those found in industries other than the
machine shop and that may also be generalized to conceptual
modeling of non-accounting systems.

The cases shown in Figures 4-3 through 4-6 contain
examples of conflict classes shown in Table 2-1.15 The
precise nature of the conflicts will be discussed in the
integration example section, but we first present a brief
explanation of the methods of providing user inputs to
REAVIEWS.

14 This is not an ad hoc proposition, but developed from
observation and performance of the view integration task itself. When
resolving integration problems such as name conflicts, the analyst must
determine if two entities are referring to the same real world object or
not. This determination is frequently made using domain knowledge.

15 REAVIEWS was also tested with a number of user views not
discussed here. The views were primarily variations of the examples in
Figures 4-3 through 4-6. The purpose was to test each solution with
multiple combinations of variables; the results were all consistent with
those discussed here. Of course, this is not surprising, as Newell and
Simon (1976, 114) point out, "we don't have to build 100 copies of, say,
a theorem prover to demonstrate statistically that it has not overcome
the combinatorial explosion of search in the way hoped for." The
variations were more for identification of errors in system programming
than for errors in problem-solving logic.

Figure 4-3 User View: Produce-Sales-Analysis
[E-R diagram not reproduced; legible labels include the
structural constraints (1,N), (1,N), (0,N), and (1,1) and the
attributes item-no, price, industry-no, and name.]

Figure 4-4 User View: Update-Work-In-Process
[E-R diagram not reproduced; legible labels include the entity
job-operation and the attributes job-no, pieces, employee-no,
end-setup, machine-no, std-setup, description, job-op-no, and
std-time.]

Figure 4-5 User View: Record-Payment
[E-R diagram not reproduced; legible labels include the
entities cash receipt and sale.]

Figure 4-6 User View: Record-Payroll
[E-R diagram not reproduced; legible labels include the
entities mfg-employee, cash disb, and employee-service, the
structural constraints (0,N) and (1,1), and the attributes
time, hours, gross-pay, and fed-tax.]

4.3 Inputs to REAVIEWS

The REAVIEWS system is designed for REA view
integration. Most of its input consists of REA-modeled user
views. The diagrammatic format of the user views in
Figure 2-1 must be translated into frame structures before
the views can be used by the system. This is relatively
straightforward, though not necessarily simple. Entities
become frames, attributes become slots, and constraints on
these attributes, such as range restrictions, become facets.
Relationships in which an entity participates also become
slots of that entity. Constraints on relationships, such as
dependencies between the entities connected by the
relationship, become facets.
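
The translation rules in this paragraph can be sketched as a small Python function. The input layout (a dictionary of entities and relationships), the function name view_to_frames, and the facet names min-card and max-card are assumptions made for illustration; the actual REAVIEWS input file format is not shown here. The cardinalities follow the Customer-Industry relationship of the Produce-Sales-Analysis view as discussed later in this chapter.

```python
from typing import Any, Dict

Frames = Dict[str, Dict[str, Dict[str, Any]]]   # frame -> slot -> facet -> value


def view_to_frames(view: Dict[str, Any]) -> Frames:
    """Translate a user view into frames, slots, and facets (hypothetical layout)."""
    frames: Frames = {}
    for entity, attributes in view["entities"].items():
        # Each entity becomes a frame, each attribute a slot, and each
        # constraint on an attribute a facet of that slot.
        frames[entity] = {attr: dict(facets) for attr, facets in attributes.items()}
    for rel in view["relationships"]:
        for entity, (min_card, max_card) in rel["participants"].items():
            # A relationship becomes a slot of every participating entity;
            # its structural constraints become facets of that slot.
            frames.setdefault(entity, {})[rel["name"]] = {
                "min-card": min_card,
                "max-card": max_card,
            }
    return frames


user_view = {
    "entities": {
        "customer": {"customer-no": {}, "name": {}},
        "industry": {"industry-no": {}, "name": {"multivalued": False}},
    },
    "relationships": [
        {"name": "customer-industry",
         "participants": {"customer": (0, "N"), "industry": (1, 1)}},
    ],
}
frames = view_to_frames(user_view)
```

After the call, frames["industry"] holds the attribute slots industry-no and name together with a customer-industry slot whose facets record the view's (1,1) constraint.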

Other input may be required during the integration
session. This will generally be in the form of user
responses to system queries. The system may, for example,
ask the user for additional information it requires for
conflict resolution. When the system is unable to resolve a
conflict and asks the user for help, it will still provide,
when possible, suggestions or background information to aid
in the task.

4.4 View Integration Session

Following the general integration strategy described in
Chapter 3, the user views are each separately retrieved and
examined by REAVIEWS. A typical session starts with a
welcome screen and a dialog box asking the user to type in
the name of the user views input file. As REAVIEWS performs
its various operations, important messages, particularly
those requiring action by the user, are displayed at the top
of the main REAVIEWS window, as seen in Figure 4-7. As can
be seen in that figure, more detailed information on
REAVIEWS's actions is provided in the output window at the
bottom of the main screen. When a user view has been
completely processed, REAVIEWS pauses and instructs the user
to scroll through its actions before proceeding with the
integration. The next four sections discuss the integration
of the four test cases. While going through the cases, the
reader may wish to refer back to the problem solving
overview in Figure 3-8 and the appendix.

Figure 4-7 REAVIEWS: Main Screen
[Screen text: the message area names the user view
PRODUCE-SALES-ANALYSIS and asks the user to refer to the
printed view as integration proceeds; the output window
reports that REAVIEWS is opening the input file and searching
the enterprise schema for entities in the current user view.]

4.4.1 Produce-Sales-Analysis View. The first user view to
be integrated is shown in Figure 4-3, the Produce-Sales-Analysis
view. This view contains examples of three of the
conflict classes. The first conflict is a synonym conflict
between the entity named "sales-worker" and the managerial-
schema entity salesperson. This type of synonym is one of
the easier ones for a human to recognize, as (1) the terms are
recognized as synonyms, (2) the entities are serving the
same role in the accounting-cycle template, and (3) the
entities have the same primary-key attribute. This
highlights some of the benefits of using domain knowledge
and a problem-solving structure similar to that of human
modelers. In REAVIEWS, this recognition is implemented by
providing the system with lists of commonly used synonyms
for entities and attributes. Following the integration
script described in Chapter 3, REAVIEWS first tries to
locate the entity name "sales-worker" in the m-schema. That
failing, it checks the synonym list. The synonym sales-
worker is not in the m-schema's synonym list, so REAVIEWS
follows the strategy described for selecting likely
candidate entities.16 Using the weighted list of candidate
templates constructed when the user view was retrieved,
REAVIEWS presents the user with a screen asking the user for
confirmation of the most likely candidate entity. This
screen is shown in Figure 4-8. If the entity is confirmed,
REAVIEWS then proceeds with identifying and adding the
primary-key and non-primary-key attributes, modifying the m-
schema as new information is added.
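
The identification sequence for a user entity (exact name, then synonym list, then user confirmation of ranked candidates) can be sketched in Python as follows. The function identify_entity, its parameters, and the client synonym are hypothetical; the confirm callback stands in for the yes/no screen presented to the user.

```python
from typing import Callable, Dict, List, Optional, Set


def identify_entity(user_name: str,
                    schema_names: Set[str],
                    synonyms: Dict[str, str],
                    ranked_candidates: List[str],
                    confirm: Callable[[str, str], bool]) -> Optional[str]:
    """Return the m-schema entity matching user_name, or None if none is found."""
    # 1. Try to locate the entity name directly in the m-schema.
    if user_name in schema_names:
        return user_name
    # 2. That failing, check the synonym list.
    if user_name in synonyms:
        return synonyms[user_name]
    # 3. Otherwise offer likely candidates, most likely first, and let the
    #    user confirm or reject each one.
    for candidate in ranked_candidates:
        if confirm(user_name, candidate):
            return candidate
    # No match: the caller would treat the user entity as a unique entity.
    return None


schema_names = {"salesperson", "customer", "sale"}
synonyms = {"client": "customer"}   # illustrative entry, not from REAVIEWS

match = identify_entity(
    "sales-worker", schema_names, synonyms,
    ranked_candidates=["salesperson", "customer"],
    confirm=lambda user, candidate: candidate == "salesperson",
)
```

Ordering the three steps from cheapest to most intrusive means the user is consulted only when the stored names and synonyms cannot settle the question.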

16 To enhance REAVIEWS's performance, and more closely mimic the
expert data modeler's behavior, some form of natural language parser
could be added, to allow matching parts of entity names. For example, a
human would recognize that a worker is also a person, so sales-worker
would readily be identified as the same entity as sales-person. This is
beyond the current scope of REAVIEWS, but may be the focus of future
research efforts.

Figure 4-8 REAVIEWS: "Candidate Entity" Screen
[Screen text, partly illegible: the REA template shows the
event as the decrement of the resource FINISHED-GOOD, with
SALESPERSON as the usual inside agent and CUSTOMER as the
usual outside agent. The user is asked whether the user-view
entity SALES-WORKER is the same as the template entity
SALESPERSON and answers with a yes or no button.]

The second conflict arises because the max-cards of the
Customer-Industry relationship differ from those modeled in
the m-schema produced from domain knowledge. Such a
difference constitutes a dependency conflict. As modeled,
the user view shows that a customer may be classified into
many industries, but a single industry will have only one
customer associated with it. This is the opposite of the normal
case. Domain knowledge in our m-schema (which we refer to
as d-knowledge) leads us to expect the more normal
case, and the structural constraints in the m-schema
consequently use the force resolution strategy.

The third conflict in Figure 4-3 is a behavioral
conflict, where the min-card of industry indicates that an
industry cannot be identified or maintained in the database
without at least one corresponding customer. This
precludes, for example, marketing or product development
groups from identifying and tracking information on
industries with potential for sales but to which no sales
have been made. The d-knowledge min-card of "zero" allows
for such tracking, and REAVIEWS uses that constraint.

4.4.2 Update-Work-in-Process View. The user view in Figure
4-4 also contains three conflicts. Two are similar to the
structural conflicts found in the previous view. There is a
behavioral conflict in the min-cards of the relationship
between job-operation and job-operation-type. There is also
a dependency conflict in the max-cards of that same
relationship.

As modeled, the min-cards indicate that standards for
job operations must be associated with an actual operation
event to be allowed in the database. This precludes the
establishing of standards for an operation before the
operation is actually used on a customer’s job. Likewise,
the min-card for job-operation precludes an employee from
performing an operation for which no standards have been
set. This precludes "custom" or "cost-plus" type jobs.
This type of job is common when an entirely new product is
being built. Without the benefit of experience in building
a similar item, it can be difficult to determine what a
standard amount of materials or labor might be. Hence, the
m-schema contains a less restrictive min-card of "zero" for
each of the entities.

Allowing for the less-likely possibility of a shop that
does require the setting of standards in advance of all
operations, REAVIEWS does not use the force method of
conflict resolution, but instead uses the g-d-m method.
This allows for more restrictive modeling in individual user
views while constructing an enterprise schema that will also
allow the more usual case.17

A dependency conflict arises due to the max-card for
job-operation-type. As shown in Figure 4-4, an operation-
type could be used only once. This is contrary to the
purpose and use of standards for job-operations. Such
standards are usually set for operations that are repeatedly
performed, and so we should expect that a particular job-
operation standard would be used for multiple actual
operation events. Again, the g-d-m method allows the m-
schema's less restrictive max-card of "N" to be used in the
enterprise schema.

17 For example, it may be that there will be a later user view for
updating work-in-process for custom jobs. In that user view, a less-
restrictive min-card of "zero" might be required.
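
The difference between the force and g-d-m strategies, as they are applied in these two user views, can be sketched in Python. The g-d-m method itself is defined in Chapter 3; the function least_restrictive below is only one reading of its effect in this example (the enterprise schema receives the less restrictive of the two constraints), and both function names are hypothetical.

```python
from typing import Tuple, Union

# A structural constraint as (min-card, max-card); max-card may be "N" (many).
Card = Tuple[int, Union[int, str]]


def force(user: Card, m_schema: Card) -> Card:
    """force: the m-schema constraint always replaces the user-view constraint."""
    return m_schema


def least_restrictive(user: Card, m_schema: Card) -> Card:
    """Assumed reading of g-d-m: keep the looser bound from either source."""
    min_card = min(user[0], m_schema[0])
    if "N" in (user[1], m_schema[1]):
        max_card: Union[int, str] = "N"
    else:
        max_card = max(user[1], m_schema[1])
    return (min_card, max_card)


# job-operation-type in Figure 4-4 is modeled as (1,1); the m-schema has (0,N).
user_view_card: Card = (1, 1)
m_schema_card: Card = (0, "N")
enterprise_card = least_restrictive(user_view_card, m_schema_card)
```

In this particular case both strategies give (0,N); they differ when the user view is the looser of the two, where force would still impose the m-schema constraint.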

The third conflict in Figure 4-4 is a type conflict.
The entity mfg-employee is an important component of a
schema for any manufacturing company. It is difficult to
conceive of an enterprise schema for any such firm that
would not model that employee as a separate entity. In the
user view in Figure 4-4, however, the employee is modeled as
an attribute rather than an entity. When adding the non-
primary-key attributes from the user view to the enterprise
schema, an experienced human modeler would recognize that
the job-operation attribute employee-no is an identifier for
the manufacturing employee. That modeler would also be
aware that we would virtually always want that employee
modeled as a separate entity and modify the schema
accordingly.

This process is implemented in REAVIEWS when it adds
the user-view non-primary-key attributes to the schema. At
that time, a check is made of all of those attributes to see
if any are recognized as primary or candidate keys for any
entities modeled in the m-schema for that event template.
When found, REAVIEWS notifies the user that it is modifying
the user view to include the "new" entity. Figure 4-9 shows
the screen presented to the user in this case. REAVIEWS
also adds a relationship between the new entity and the
entity from whose attribute list it was discovered.

Figure 4-9 REAVIEWS: Notification of Foreign-Key Expansion
[Screen text, partly illegible: the user is told that the
attribute is also the primary key for another entity, that
this entity usually has multiple attributes about which
information is recorded, and that REAVIEWS will model it as
a separate entity.]
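
The check for attributes that are really keys of other entities can be sketched in Python. The function expand_foreign_keys and the template_keys mapping are hypothetical, and the sketch only reports the new entities and relationships; whether REAVIEWS also removes the attribute from the original entity is not stated in the text, so the sketch leaves the attribute list untouched.

```python
from typing import Dict, List, Tuple


def expand_foreign_keys(entity: str,
                        non_key_attributes: List[str],
                        template_keys: Dict[str, str]
                        ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Find attributes that are primary or candidate keys of template entities.

    template_keys maps a key attribute to the m-schema entity it identifies,
    for the entities modeled in the current event template.
    """
    new_entities: List[str] = []
    new_relationships: List[Tuple[str, str]] = []
    for attribute in non_key_attributes:
        owner = template_keys.get(attribute)
        if owner is None or owner == entity or owner in new_entities:
            continue
        # The attribute identifies another entity: model that entity
        # separately and relate it to the entity where it was found.
        new_entities.append(owner)
        new_relationships.append((entity, owner))
    return new_entities, new_relationships


template_keys = {
    "employee-no": "mfg-employee",
    "social-security-no": "mfg-employee",
    "job-op-no": "job-operation-type",
}
entities, relationships = expand_foreign_keys(
    "job-operation",
    ["pieces", "employee-no", "end-setup", "machine-no"],
    template_keys,
)
```

For the Update-Work-in-Process view this yields the new entity mfg-employee and a relationship between job-operation and mfg-employee.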

4.4.3 Record-Payment View. Figure 4-5 also contains
dependency and behavioral conflicts, but these demonstrate
the third method of structural constraint conflict
resolution, resolve. As modeled, the cash min-card shows
that all cash receipts by the company must be payments for
sales. This precludes other common cash-receipt events such
as owner/stockholder investment or purchase returns for
which the company receives a check rather than an adjustment
to its account by the vendor.

The two max-cards indicate that each sale will be paid
for by separate cash receipts and that a sale will always be
paid for by one and only one payment. This is contrary to
the more usual case in which a customer may send in one
check to pay for multiple invoices or the case of a customer
paying "on account," in which a payment may reimburse only
part of the total amount owed for one sale.

The resolve method of constraint conflict resolution
looks to the individual relationship frame to find the
resolution process created specifically for that
relationship. Such processes are different from the force
and g-d-m methods in that resolve processes require
information beyond that contained in the m-schema
relationship itself. In the example in Figure 4-5,
resolving the conflict between the user view and the m-
schema is straightforward if we know a little about the
actual policies of the company. A human could decide on the
appropriate structural constraints by asking some simple
questions about the sale payment process. REAVIEWS does the
same thing when it invokes the process attached to the cash-
receipt-sale relationship. Figure 4-10 shows the screen
presented to the user. The choices available offer the
combinations of payment policies commonly used. REAVIEWS
will assign structural constraints based upon the user
response. In this test session, we selected the last
option, and REAVIEWS appropriately assigned max-cards of "N"
to both entities. It also assigned a min-card of "zero" to
cash, to recognize the fact that companies routinely
encounter cash receipts for events other than sales.
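
A resolve procedure of this kind can be sketched in Python as a lookup from the user's stated payment policy to structural constraints. Only the outcome of the last option (max-cards of "N" for both entities and a min-card of zero for cash receipts) is reported in the text; the max-cards shown for the first three options are our own inference from the wording of the choices, and the names PAYMENT_POLICIES and resolve_cash_receipt_sale are hypothetical.

```python
from typing import Dict, Union

MaxCard = Union[int, str]

# Option number -> max-cards. "cash-receipt" is the most sales one receipt may
# pay for; "sale" is the most receipts one sale may be paid by.
# Options 1-3 are inferred, not taken from the REAVIEWS procedure.
PAYMENT_POLICIES: Dict[int, Dict[str, MaxCard]] = {
    1: {"cash-receipt": 1, "sale": 1},      # full payment, single invoice
    2: {"cash-receipt": 1, "sale": "N"},    # partial payments, single invoice
    3: {"cash-receipt": "N", "sale": 1},    # multiple invoices, each paid in full
    4: {"cash-receipt": "N", "sale": "N"},  # partial or full, one or many invoices
}


def resolve_cash_receipt_sale(choice: int) -> Dict[str, Dict[str, MaxCard]]:
    """Assign structural constraints from the user's answer."""
    max_cards = PAYMENT_POLICIES[choice]
    return {
        # Cash receipts also arise from events other than sales, so the
        # min-card for cash-receipt is zero regardless of the answer.
        "cash-receipt": {"min-card": 0, "max-card": max_cards["cash-receipt"]},
        "sale": {"max-card": max_cards["sale"]},
    }


constraints = resolve_cash_receipt_sale(4)
```

Keeping the policy table as data attached to one relationship mirrors the idea that resolve procedures are specific to the relationship frame that names them.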

4.4.4 Record-Payroll View. Figure 4-6 provides an example
of the last structural conflict, the key conflict. In this
case, mfg-employee is shown with social-security-no as a
primary key. The Encyclopedia of Accounting Systems
provided evidence that some companies might assign other
identifiers to employees. Many standard accounting texts
similarly demonstrate employee codes of this nature, often
using a code that incorporates other information (such as
employee department) in the employee-number code. The m-
schema thus includes both social-security-no and employee-no
as candidate keys, with employee-no initially listed as the
default primary key.

Figure 4-10 REAVIEWS: Request for Company-Level Knowledge
[Screen text: the user is asked to select the choice that
best describes the company's policy regarding acceptance of
payments for sales: (1) only full payments for single
invoices; (2) full or partial payments, but never for more
than one invoice; (3) payment for multiple invoices, but
each must be paid in full; (4) partial or full payments, for
one or multiple invoices.]

When REAVIEWS encounters a user entity with a primary
key different than in the m-schema, it looks for the user's
primary key in the m-schema entity's list of candidate keys.
Following the experienced modeler's practice of deferring to
the user's view of the company as much as possible, REAVIEWS
will first look to see if the user entity has already been
instantiated in a different user view. If so, REAVIEWS
defers to the already integrated view and uses that primary
key. If different than the current user view's primary key,
the current primary key is placed in the candidate key list.
If the entity has not yet been instantiated, REAVIEWS uses
the current view’s primary key in the enterprise schema. It
then switches the entity's default primary key to match.
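
The key-conflict handling described in this paragraph can be sketched in Python. The function resolve_key_conflict and the dictionary layout are hypothetical. Two points are assumptions rather than statements from the text: when the current view's key is adopted, the former default is moved into the candidate-key list, and a user key that is neither the primary key nor a known candidate key is rejected because that case is not described.

```python
from typing import Dict, List, Union

KeyInfo = Dict[str, Union[str, List[str]]]


def resolve_key_conflict(primary_key: str, candidate_keys: List[str],
                         user_key: str, already_instantiated: bool) -> KeyInfo:
    """Return the primary and candidate keys to use in the enterprise schema."""
    candidates = list(candidate_keys)
    if user_key == primary_key:
        return {"primary-key": primary_key, "candidate-keys": candidates}
    if user_key not in candidates:
        raise ValueError(f"{user_key!r} is not a known candidate key")
    if already_instantiated:
        # Defer to the view integrated earlier: its primary key stands and
        # the current view's key remains a candidate key.
        return {"primary-key": primary_key, "candidate-keys": candidates}
    # First appearance of the entity: adopt the current view's primary key
    # and (assumption) demote the former default to a candidate key.
    candidates.remove(user_key)
    candidates.append(primary_key)
    return {"primary-key": user_key, "candidate-keys": candidates}


first_view = resolve_key_conflict(
    "employee-no", ["social-security-no"],
    user_key="social-security-no", already_instantiated=False,
)
later_view = resolve_key_conflict(
    "employee-no", ["social-security-no"],
    user_key="social-security-no", already_instantiated=True,
)
```

The first call switches the primary key to social-security-no, while the second keeps employee-no because an earlier view has already instantiated the entity.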

The last user view having been integrated, REAVIEWS
asks the user for the name of the output file in which to
store the integrated schema. It then writes the schema
information to that file and stops the REAVIEWS session.
The instantiated schema information is still present in the
system, and it may be reviewed by the user if desired. Such
a review could be used to provide additional insights into
the completeness of the finished schema. In the normal
course of accounting system design, a designer may expect
certain entities and events to be present and notice their
obvious absence from the users' schemas. Similarly, the
Encyclopedia of Accounting Systems provides information
about the typical entities and events that one could expect
to see in a well-constructed schema. These entities and
events having been modeled in the m-schema, the user can
readily identify "missing" schema components by browsing the
schema hierarchy at the end of a REAVIEWS session. At the
end of each branch in the hierarchy, there will be either a
schema frame, shown in normal black letters, or an instance
of a frame, shown in red italic letters. Those frames
without instances are the schema components that were
modeled in the m-schema but not modeled in any user views.
This process of browsing the finished schema could be used
to help the user identify potentially-missing schema
components. An alternative to this approach would be to add
some text to the output file listing the uninstantiated
schema components and suggesting the user review them.
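
The alternative mentioned here, listing the uninstantiated schema components, amounts to a simple filter over the schema frames. In the Python sketch below, the function name uninstantiated, the mapping from frame names to instance names, and the instance names themselves are hypothetical.

```python
from typing import Dict, List


def uninstantiated(schema_frames: Dict[str, List[str]]) -> List[str]:
    """Return m-schema frames that no user view has instantiated."""
    return sorted(name for name, instances in schema_frames.items()
                  if not instances)


# Frame name -> instances created during integration (illustrative data).
schema_frames = {
    "sale": ["sale-1"],
    "cash-receipt": ["cash-receipt-1"],
    "sale-return": [],
    "price-quotation": [],
}
missing = uninstantiated(schema_frames)
report = "Please review these uninstantiated components: " + ", ".join(missing)
```

A line such as the report string could be appended to the output file so that the user is prompted to consider whether the listed components were overlooked.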

As all the user views have been integrated, all that
remains is for the user or analyst to draw the entity-
relationship diagrams, using the information in the output
file. The nature of that output is discussed next.

4.5 Outputs from REAVIEWS

When the user views have all been integrated, the
internal frame-based representation must be translated into
a form understandable by the users. REAVIEWS output will be
the complete schema as a list of entities, attributes,
relationships, and structural constraints. The most
desirable output would of course be an E-R diagram of the
company. Producing such a diagrammatic representation is,

1however, beyond the scope of the REAVIEWS system. While the

123

spacial manipulation of objects required to produce E-R
diagrams seems relatively simple for humans, the task is
non-trivial on a computer. There are tools available that
provide such output from a list of entity-relationship
specifications. It may be possible to link such tools to
REAVIEWS so that output could be provided in diagram
form, but at this time that is outside the scope of our
current research.

Figure 4-11 shows a short section of the output file
from this REAVIEWS session, while Figure 4-12 shows an E-R
diagram produced from that output. Both of those figures
show only part of the actual output from the test case

session.

4.6 Software Environment

REAVIEWS was implemented using the GoldWorks II expert
system development tool. This environment was initially
chosen for the REACH project (McCarthy and Rockwell 1989)
because of its support for both rule and frame
structures. The REAVIEWS project grew out of REACH.
GoldWorks II also provides other development tools, such as

object-oriented programming, that are useful in KBS design.


The entity named SALE is of type EVENT.

Its primary-key attribute is: INVOICE-NO.

It's non-primary-key attributes are:

(TOTAL-AMOUNT DATE).

The primary-key attribute synonyms are: (SALES-NO).

The entity named FINISHED-GOOD is of type RESOURCE.
Its primary-key attribute is: ITEM-NO.

It's non-primary-key attributes are:

(PRICE).

Synonyms for FINISHED-GOOD are: (PRODUCT)

The entity named CUSTOMER is of type AGENT.
Its primary-key attribute is: CUSTOMER-NO.

It's non-primary-key attributes are:

(NAME).

The entity named SALESPERSON is of type AGENT.
Its primary-key attribute is: EMPLOYEE-NO.

It's candidate-key attributes are:
((SOCIAL-SECURITY-NO)).

It's non-primary-key attributes are:
(COMMISSION-RATE NAME).

Synonyms for SALESPERSON are: (SALESWORKER)
The primary-key attribute synonyms are: (EMP-NO).

The relationship SALE-SALESPERSON-C is a CONTROL relationship.
The entities joined by this relationship are:

(SALE SALESPERSON).

The structural constraints for these entities are:

((SALESPERSON 0 N) (SALE 1 1)).

The relationship CUSTOMER-SALE-C is a CONTROL relationship.
The entities joined by this relationship are:

(CUSTOMER SALE).

The structural constraints for these entities are:

((CUSTOMER 1 N) (SALE 1 1)).

The relationship FINISHED-GOOD-SALE-S is a STOCK-FLOW relationship.
It's non-primary-key attributes are:

QTY.

The entities joined by this relationship are:
(FINISHED-GOOD SALE).

The structural constraints for these entities are:
((FINISHED-GOOD 0 1) (SALE 1 N)).

Figure 4-11 Partial Output from REAVIEWS’s Session
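
Because the output file uses a regular, list-like notation, a separate diagramming tool could read it with little effort. The following Python sketch assumes that constraint lines look like those shown in Figure 4-11; the function name and the "(min,max)" label format are our own illustrative choices, not part of REAVIEWS.

```python
import re

# One "(ENTITY min max)" group inside a structural-constraint line.
CONSTRAINT = re.compile(r"\(([A-Z0-9-]+) (\w+) (\w+)\)")


def parse_constraints(line):
    """Map each entity name to a '(min,max)' label suitable for an E-R diagram."""
    return {entity: f"({low},{high})" for entity, low, high in CONSTRAINT.findall(line)}


labels = parse_constraints("((SALESPERSON 0 N) (SALE 1 1)).")
print(labels)
```

Each label can then be attached to the corresponding end of the relationship when the diagram is drawn.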

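[Figure 4-12 (diagram not reproduced): E-R diagram of the entities finished-good, sale, customer, and salesperson, joined by relationships labeled with structural constraints such as (1,1) and (0,N). Salesperson carries the attributes employee-no, social-security-no, name, and commission-rate.]
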

Figure 4-12 Partial Schema Produced from REAVIEWS Output

Chapter 5. Summary and Contributions

In this thesis, we have discussed research into
conceptual database modeling from the perspective of
accounting information systems design. In Chapter 2, we
focused on problems encountered at the view integration
stage and on the inability of existing knowledge-based
modeling systems to adequately resolve them. We proposed
that existing computer expert systems fail, in part,
because they lack important application-specific domain
knowledge available to human experts. Chapter 3 contained
our discussion of the modeling and use of domain knowledge
as a potential solution to some integration problems. In
Chapter 4, we described the prototype knowledge-based
system (REAVIEWS) designed to explore and test that
proposal. The discussion covered the modeling and use of
reconstructive expertise in REAVIEWS, along with details of
an integration session, using test cases developed for this
research.

In this chapter, we first discuss limits of scope for
this thesis. Next, we examine the research setting and
justifications for this work. Following that, we consider

the research contributions of this work. We close with a

short section on future research directions suggested by our

work.

5.1 Limits of Scope for REAVIEWS

REAVIEWS was developed following the process described
in McCarthy, Rockwell, and Wallingford (1989). Figure 5-1
illustrates part of that process, in which the AI system
component with the highest complexity (Component 3) is
developed first. Other components are initially developed
less fully. Problems of tractability force researchers to
make the inevitable trade-off between depth and breadth of
scope. To achieve the desired research objectives, REAVIEWS
was designed to perform view integration for the revenue and
conversion accounting cycles in a representative and rich
industry setting. While future development is anticipated
for all of the accounting cycles and additional industries,
the initial scope of the system was limited to those two

cycles and the machine shop industry.

5.2 Research Context and Justification

The initial impetus for REAVIEWS came from an earlier
research project in computer-aided software engineering
(CASE). That work (McCarthy and Rockwell 1988, 1989)
attempted to bridge existing research in systems for
structured analysis and conceptual database design. The
system proposed there (named REACH) is represented in

Figure 5-2. REACH would use multiple types of knowledge

[Figure 5-1 (diagram not reproduced): system components, through Component n, are decomposed into sub-tasks and primitive task descriptions; one component is developed to the full depth of complexity.]

Figure 5-1 Scope of Pilot System
[source: McCarthy, Rockwell, and Wallingford (1989)]

 

 

[Figure 5-2 (diagram not reproduced): labels recoverable from the diagram are DFDs and DESIGNAID; accounting knowledge, comprising top-down enterprise analysis (Systems Encyclopedia), reconstructive expertise, accounting theory (REA Event Template), and implementation heuristics (Events Acctg. Compromises); methods knowledge (ER and normalization); view modeling; view integration; schema; target system knowledge; implementation design; and physical design.]

Figure 5-2 The REACH System
[adapted from: McCarthy and Rockwell (1989)]

(from several knowledge domains) in a CASE tool to connect
two relatively disparate methodologies. The inputs of the
view modeling stage are formal information requirements
derived using structured analysis. The outputs of the view
modeler will be the user schema representations required by
REAVIEWS. In the course of the research, it became apparent
that the integration of user schemas was extremely
problematic and would require a considerable research
effort. This realization provided some of the initial
impetus for this thesis, but REAVIEWS can also be placed
within research contexts beyond the REACH project.

Amer et al. (1987, 14) declare that "from an accounting
perspective, the higher database abstraction of the
conceptual and external levels, and therefore the CIS
[computer information systems] research area of data
modeling, significantly impacts design considerations of
accounting databases." They suggest that application and
adaption of theories from other scientific disciplines might
offer new insights to accounting researchers and should be
encouraged. They also observe that accounting research has
notably benefitted from the application of expert systems
technology to the accounting problem domains of auditing and
taxation. The benefits of such a combination do not accrue
solely to accounting researchers. As O’Leary (1988, 26)
states, "certain research topics, methodologies and database
approaches used in software engineering research can benefit
from the specificity of context offered by accounting
information systems." In addition, "the existence of
expertise by accounting information systems designers and
developers has received little research attention"
(ibid., 30).

Embedding more domain-specific accounting knowledge
within the REA framework is also a logical development in
the conceptual modeling research area. Reuber (1988) called
for further development of generic enterprise models
relevant to accounting, such as the REA model. Weber (1986)
suggested future research "might attempt to refine the [REA]
model to lower levels of abstraction, even if the model
becomes domain-specific."

Work in knowledge acquisition from documentation is as
yet less well developed than other areas of knowledge
acquisition research. While much work has been done on
natural language processing (NLP) of textual matter, much
less formal work has been done on acquiring knowledge for
KBS from such sources. The use of reconstructed knowledge
in REAVIEWS was viewed as a means of exploring and refining
some of the issues in this area of research. It is not
uncommon for some of the knowledge in a KBS to originate
from documentation. There exist, however, relatively few
systems with knowledge bases constructed primarily from
documentation.18 The specificity of context provided by the
accounting environment offered some distinct advantage in
this effort.

18 See Hoffman (1989) for examples of systems using expertise from
documentation.

As a final matter in placing the contribution of
REAVIEWS into an overall research context, we recall the
quote of Newell and Simon (1976, 114), who maintain that
research such as that performed for this thesis can be
considered a form of empirical inquiry:

Each new program that is built is an experiment.
It poses questions to nature and its behavior
offers clues to an answer.... We don't have to
build 100 copies of, say, a theorem prover, to
demonstrate statistically that it has not overcome
the combinatorial explosion of search in the way
hoped for.... But as basic scientists, we build
machines and programs as a way of discovering new
phenomena and analyzing phenomena we already know
about.

We believe that the system-building process underlying our
research efforts here falls within the scope of empirical
inquiry delineated by these two noted computer scientists.

5.3 Contributions

This thesis contributes to the existing body of
research on a number of dimensions. As far as can be
determined, this is the first knowledge-based system that
uses domain-specific theory to structure the task of view
integration. Existing KBS for conceptual modeling use
knowledge primarily from the field of conceptual database
design, not from the application domain being modeled.
REAVIEWS used a general domain theory about accounting as
developed in the REA accounting model. It also used a
somewhat less general domain theory of accounting for
companies in a particular industry, as presented in the
Encyclopedia of Accounting Systems. As demonstrated in the
test cases, this domain-specific knowledge did allow us to
resolve certain conflicts which have stymied existing
modeling systems. The basic findings support the initial
proposition that some integration conflicts are solvable
when such knowledge is used. This is significant, as
accounting systems are the backbone of most commercial
information systems.

The second major contribution of this research lies in
its use of "chart-of-accounts-based" sources for much of the
domain-specific knowledge. We in essence took a rich source
of knowledge about accounting for a particular industry and
transformed it from one highly specialized model to another
very different way of viewing accounting. In the process,
the knowledge was transformed from a format best suited for
manual record-keeping systems to a format which made that
knowledge available to knowledge-based systems for
conceptual database design. REA accounting theory provided
us with a "knowledge structure" independent of the
implementation choices inherent in the building of modeling
systems. As such, it provides some insights to areas other
than accounting that may have a similar domain theory for

use in the knowledge structuring task.


This thesis also contributes to the body of research
concerned with the acquisition of knowledge from
reconstructive (or documented) sources. The findings
support the proposition that the process of such knowledge
acquisition can be aided by the presence and use of well-
developed domain theories. The theoretical accounting basis
for REAVIEWS (primarily derived from the REA model) directed
the knowledge acquisition task and helped us identify
important facts in our domain of interest. That theory also
made us painfully aware of those instances where the domain
"expert" (the Encyclopedia of Accounting Systems) could not
provide sufficient expertise to assist in the view
integration task.

The application of artificial intelligence research
paradigms to accounting issues has yielded a considerable
amount of knowledge and insight, particularly in the area of
audit judgement. Research in applying those paradigms to
the area of accounting information system design is just
beginning. This project can also be viewed as one step in

extending our knowledge in that area.

5.4 Future Research Directions

A variety of future research topics are suggested by
this thesis. Most of these topics can be grouped into two
major areas: knowledge acquisition and conceptual modeling.
The automation of knowledge acquisition is currently the
focus of considerable research attention. The methodology
developed for the manual translation of reconstructive
knowledge for REAVIEWS could form the basis for research
into automated acquisition of such knowledge. The
application domain of REAVIEWS (conceptual modeling of
accounting systems) possesses properties that make it
appropriate for this type of research. First, the
vocabulary for accounting and accounting system design is
relatively limited and well-defined when compared to
language as a whole. Second, the REA model provides a
syntactic and semantic structure into which the accounting
narratives may be mapped.

Another interesting extension of this work would be the
use of domain-specific accounting knowledge in earlier
stages of conceptual database design. There may be
advantages to introducing this extra knowledge into the
modeling process before integration occurs. By analyzing
user information requirements within the context of
industry-specific accounting cycle models, some of the
integration conflicts may be resolved at the view modeling
stage.

Within the conceptual modeling area, "industry-
specific" enterprise models developed from other industries
offer potential insight into the accounting system design
process. Exploration of the commonalities found in these
models may lead to more generalizable enterprise models,
such as models for manufacturing or retail firms. These

models would be at a higher level of abstraction than the
models used in REAVIEWS but would still be more specific
than the REA model.

The synthesis of new enterprise models from existing
models offers another research avenue. Sources such as the
Encyclopedia of Accounting Systems contain a limited number
of firm types. This might be compared to a situation in
which accountants have a great deal of experience designing
accounting systems, but only in a narrow range of business
types. When faced with the task of designing accounting
systems for new types of business, the accountants might
recognize similarities between the new businesses and those
with which they have experience. For example, when first
encountering a video rental type of business, the
accountants might notice that the video store is similar to
both a library and a record store. With experience and
knowledge in those two business areas, the accountants could
construct an accounting information system. At this time,
computers are not as good at performing this kind of design
task as are humans. Research aimed at increasing our
understanding of the design process is a promising area for

future endeavors.

5.5 Final Conclusions

As discussed in the previous two sections, the research
reported in this thesis makes significant contributions and
extensions to prior research work in the areas of conceptual

database design and knowledge acquisition. Continued work

on the REAVIEWS and REACH systems is expected to provide
further insights into what are clearly very complex and very
important issues, particularly in the area of accounting
system design.

While the setting for this project was in the domain of
accounting, we believe there is some generalizability in the
methods used to structure the acquisition and use of
knowledge via theory from the application domain.

Finally, the use of application domain theory should
also be useful in conceptual modeling systems designed for

domains other than accounting.

APPENDIX: Major View Integration Processes in REAVIEWS
(as depicted in Figure 3-8)

NOTE: Processes are listed alphabetically rather
than by order of appearance during
integration.

ADD ENTITY:
The managerial schema (m-schema) is checked to see if
the current-entity has already been instantiated. If
not, the entity instance is created, and the accounting-
cycle template is updated to reflect this. Any entity
synonyms discovered in the entity identification
process are added to the m-schema.

ADD FOREIGN-KEY ENTITY:
During the addition of non-primary-key attributes
(NPKA) to the m-schema, we may find a foreign key of
another entity which is modeled in the m-schema.
Entities are modeled separately in the m-schema because
they are important objects about which we frequently
capture additional information. When a foreign key is
found, the attribute is removed from the npka list of
the current entity; the foreign-key entity is added to
the schema as in ADD ENTITY; and the relationship
between the two entities is added to the m-schema as in
ADD RELATIONSHIP. The user is notified that this
action is being taken.
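
The first step of the process above, separating foreign keys from ordinary attributes, can be sketched as follows. This is a minimal Python illustration, not the REAVIEWS rule base; the function name and the `primary_keys` lookup table are hypothetical, and the subsequent ADD ENTITY and ADD RELATIONSHIP steps are represented only by a printed notice.

```python
def split_foreign_keys(npkas, primary_keys):
    """Separate plain attributes from foreign keys of entities known to the m-schema.

    npkas: non-primary-key attribute names of the current entity.
    primary_keys: hypothetical lookup from a primary-key attribute name to the
        m-schema entity that it identifies.
    """
    kept, related_entities = [], []
    for attribute in npkas:
        if attribute in primary_keys:
            related_entities.append(primary_keys[attribute])
        else:
            kept.append(attribute)
    return kept, related_entities


primary_keys = {"CUSTOMER-NO": "CUSTOMER", "EMPLOYEE-NO": "SALESPERSON"}
kept, related = split_foreign_keys(["TOTAL-AMOUNT", "DATE", "CUSTOMER-NO"], primary_keys)
for entity in related:
    # Stands in for ADD ENTITY, ADD RELATIONSHIP, and the notice to the user.
    print(f"Foreign key found: adding entity {entity} and a relationship to it.")
```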

ADD NPKAS:
This is the procedure for adding non-primary-key
attributes to the m-schema. In this version of
REAVIEWS the process is simple: we instantiate the
attribute unless it (or one of its synonyms) has
already been added to the schema. There is no domain
knowledge used in this process, primarily because the
EAS (our source of reconstructive knowledge) contained
too little information about attributes of the
important objects in the domain. Richer sources of
domain knowledge may provide enough additional
knowledge to allow the type of problem solving
strategies used for entities, but that is beyond the
scope of present research.
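
The rule described above amounts to a membership test on names and synonyms. A minimal Python sketch follows; the function name, the `synonyms` mapping (keyed by the incoming attribute name), and the sample attributes AMOUNT and TERMS are assumptions made for illustration.

```python
def add_npkas(existing, synonyms, new_attributes):
    """Add each new attribute unless it, or a known synonym of it, is already present.

    existing: list of attribute names already on the m-schema entity (mutated).
    synonyms: hypothetical mapping from an attribute name to its set of synonyms.
    """
    for attribute in new_attributes:
        names = {attribute} | synonyms.get(attribute, set())
        if not names & set(existing):
            existing.append(attribute)
    return existing


sale_attributes = ["TOTAL-AMOUNT", "DATE"]
synonyms = {"AMOUNT": {"TOTAL-AMOUNT"}}
add_npkas(sale_attributes, synonyms, ["AMOUNT", "TERMS"])
print(sale_attributes)
```

Here AMOUNT is skipped because its synonym TOTAL-AMOUNT is already present, while TERMS is added.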


ADD RELATIONSHIP:
Relationships are handled similarly to entities. We
first check to see if the relationship has already been
instantiated, and if not, we add it to the schema,
updating the accounting cycle template to reflect this.
NPKAs are added as in ADD NPKAs.

ADD UNIQUE ENTITY:
When adding a new entity to the schema, we must first
create a new entity in the m-schema, then instantiate
it, adding the attribute information (pka and npka).
The accounting-cycle template is also updated.

CONFIRM CANDIDATE ENTITY:
By order of "likelihood" (see FIND CANDIDATE ENTITIES)
the candidate entities are presented to the user, along
with information about the REA template in which each was
found (see Figure 4-8). If the user indicates that one
of our candidate entities is the same as the current-
entity, then we add the current-entity as an instance
of the confirmed candidate entity, using the ADD ENTITY
process. If none of the candidate entities are
confirmed, the current-entity is added with ADD UNIQUE
ENTITY.

FIND CANDIDATE ENTITIES:
When an entity cannot be identified from its name and
key attributes, we search the m-schema for potential
matches. We look for other entities of the same type,
playing a similar role in our accounting-cycle
templates. If found, these "candidate entities" are
ordered, so that we may present the more-likely
candidates first. The ordering process follows a simple
heuristic: we start with the accounting-cycle templates
that most closely resemble the current user view, then
proceed through those templates that bear less and less
resemblance. In this case, "resemblance" is
approximated by a simple count of the number of matches
between objects in the user view and the m-schema
template. If a template has three of the same entities
as the user view, that template is deemed a "better"
candidate than a template with only two of the same
entities. The "found" candidate entities are then used
in CONFIRM CANDIDATE ENTITY.
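
The ordering heuristic described above can be sketched in a few lines. This Python illustration covers only the ranking of templates by match count, not the search for entities of the same type and role; the function name, the template contents, and the decision to drop templates with no overlap are assumptions.

```python
def rank_templates(user_view_objects, templates):
    """Order accounting-cycle templates by how many objects they share with the user view."""
    view = set(user_view_objects)
    scored = [(len(view & set(objects)), name) for name, objects in templates.items()]
    # Highest match count first; ties keep their original order (sorted is stable).
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Dropping templates with no overlap at all is our assumption.
    return [name for count, name in ranked if count > 0]


# Illustrative templates only; the entity lists are not taken from the m-schema.
templates = {
    "CASH-RECEIPT": ["CUSTOMER", "CASH", "CASHIER"],
    "SALE": ["CUSTOMER", "SALESPERSON", "FINISHED-GOOD"],
    "PURCHASE": ["VENDOR", "BUYER"],
}
ranked = rank_templates(["CUSTOMER", "SALESPERSON", "CLERK"], templates)
print(ranked)
```

Candidate entities found in the first-ranked template would be offered to the user before those found in later templates.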


FIND ENTITY:
The initial attempts at finding a match for the
current-entity in the m-schema are basically a pattern
match on entity names/synonyms and types. If no match
is found, we invoke FIND CANDIDATE ENTITIES (above),
which uses a somewhat more sophisticated matching based
upon entity roles. If a match is found, we attempt to
confirm our identification by examining the primary-key
attributes (PKA) via the FIND PKA process.

FIND FOREIGN-KEY NPKAS:
Before we add the non-primary-key attributes to our m-
schema, we check for foreign keys in the NPKA list.
Their presence indicates a potential type conflict in
the schemas, and this is resolved via ADD FOREIGN-KEY
ENTITY.

FIND PKA:
After a current-entity has been matched with an entity
in the m-schema (by FIND ENTITY), we do a pattern match
on the primary key attributes. If the match fails, we
check primary key attribute synonyms. If successful on
either match, we add the current-entity to the m-schema
via ADD ENTITY. If the PKA/synonym match fails, we
match on candidate keys and their synonyms. If a match
is found and the entity has not been instantiated, we
add the entity to the schema. We then change the m-
schema's primary key to match the current-entity's key.
We next place the old default primary key in the
candidate key list. This follows our heuristic of
deferring to the users' view of the firm when at all
possible. If the entity has already been instantiated,
we place the current-entity's primary key in the
candidate key list.
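
The key-matching sequence above can be sketched as follows. This is a simplified Python illustration, not the REAVIEWS implementation: the `SchemaEntity` class and `find_pka` function are hypothetical, synonyms of candidate keys are omitted, and ADD ENTITY is represented only by setting a flag.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaEntity:
    """Hypothetical m-schema entity; not the GoldWorks II frame structure."""
    primary_key: str
    pk_synonyms: set = field(default_factory=set)
    candidate_keys: list = field(default_factory=list)
    instantiated: bool = False


def find_pka(entity, user_key):
    """Try to confirm a match on key attributes; return True on success."""
    if user_key == entity.primary_key or user_key in entity.pk_synonyms:
        entity.instantiated = True  # stands in for ADD ENTITY
        return True
    if user_key not in entity.candidate_keys:
        return False  # no match on any key
    if not entity.instantiated:
        # Defer to the user's view: promote the user's key, demote the old default.
        # Dropping the promoted key from the candidate list is our assumption.
        entity.candidate_keys.remove(user_key)
        entity.candidate_keys.append(entity.primary_key)
        entity.primary_key = user_key
        entity.instantiated = True
    # If already instantiated, the user's key simply stays on the candidate list.
    return True


salesperson = SchemaEntity("EMPLOYEE-NO", candidate_keys=["SOCIAL-SECURITY-NO"])
matched = find_pka(salesperson, "SOCIAL-SECURITY-NO")
print(matched, salesperson.primary_key, salesperson.candidate_keys)
```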

 

FIND RELATIONSHIP:
Before a relationship can be added to the schema, we
check the m-schema to see if we have renamed any of the
entities connected by the relationship. This renaming
frequently occurs during synonym or homonym conflict
resolution. A relationship’s name is the concatenation
of the names of the two entities. Failure to check for
the renaming of entities can result in the
instantiation of a single relationship under multiple
names. If entities have been renamed, we change the
relationship name to reflect this.
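
The renaming check above can be sketched as a small naming function. In this Python illustration the function name and the `renamed` mapping are hypothetical; the alphabetical ordering of entity names and the one-letter type suffix are assumptions inferred from names such as SALE-SALESPERSON-C in the sample output.

```python
def relationship_name(entity_a, entity_b, renamed, suffix=""):
    """Build a relationship name from the current names of its two entities.

    renamed: hypothetical mapping from an entity's old name to its new name.
    """
    names = sorted(renamed.get(name, name) for name in (entity_a, entity_b))
    return "-".join(names + ([suffix] if suffix else []))


renamed = {"SALESWORKER": "SALESPERSON"}
from_user_view = relationship_name("SALESWORKER", "SALE", renamed, "C")
from_m_schema = relationship_name("SALE", "SALESPERSON", renamed, "C")
print(from_user_view, from_m_schema)
```

Because both calls yield the same name, the relationship is instantiated only once even though one view used a synonym.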


GET USER SCHEMA:
At the beginning of an integration session, we divide
up all the user views into their respective accounting
cycles. This allows us to integrate the views one
accounting-cycle at a time. This is done as an aid to
the user rather than as a requirement of our problem
solving strategy. If the domain expertise suggests
that one set of transaction templates should be given
primacy over others, that set could be integrated
first.

GET INDIVIDUAL VIEW:
User views are selected for processing by cycle. When
all of the views in a cycle have been integrated, the
next cycle is selected for processing. At this time,
we have no domain-based preference for the ordering,
although the structure of REAVIEWS does allow such
preferential ordering (see GET USER SCHEMA, above).
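
The cycle-by-cycle selection of views can be sketched as a generator. This Python illustration is not the REAVIEWS control structure; the function name, the optional `cycle_order` parameter (standing in for a domain-based preference), and the view names are hypothetical.

```python
from collections import defaultdict


def views_by_cycle(user_views, cycle_order=None):
    """Yield (cycle, view) pairs so that each cycle is finished before the next starts.

    user_views: iterable of (view_name, cycle) pairs.
    cycle_order: optional preferred ordering of cycles; if omitted, cycles are
        taken in the order in which they first appear.
    """
    grouped = defaultdict(list)
    for view_name, cycle in user_views:
        grouped[cycle].append(view_name)
    for cycle in (cycle_order or list(grouped)):
        for view_name in grouped.get(cycle, []):
            yield cycle, view_name


# Illustrative view names only.
user_views = [
    ("sales-clerk", "REVENUE"),
    ("shop-foreman", "CONVERSION"),
    ("credit-manager", "REVENUE"),
]
default_order = list(views_by_cycle(user_views))
preferred_order = list(views_by_cycle(user_views, ["CONVERSION", "REVENUE"]))
print(default_order)
```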

SAVE SCHEMA:
In a manual view modeling setting, the integration
described in this thesis is accomplished primarily in a
graphical environment. Saving the schema in that
setting would consist of gathering up all the diagrams
and storing them together. In REAVIEWS, the schema
information resides in the frame hierarchy of the m-
schema (which includes the modifications to it made
during the integration process). There are two ways in
which this information is provided to the user. First,
the entire instantiated schema is copied to a text file
(Figure 4-11 shows a portion of an output file from the
test cases). This text-based representation contains
the information needed to produce an E-R diagram of the
schema. REAVIEWS does not attempt this graphical
depiction for two reasons: (1) the diagramming task is
outside the research scope of this thesis, and (2) the
task is extremely complex and only imperfectly implemented
in the commercial and research systems that have been
developed for this process (some examples of which were
given in Chapter 2). In addition to the text-based
schema, REAVIEWS provides a graphical depiction of the
schema objects. This can be used to look for entities
that exist in the m-schema but that were not
instantiated from any of the user views. This could be
used to aid the user in the identification of
incomplete user specifications.


SOLVE CONFLICT:
When conflicts exist between the structural constraints
in a user-view relationship and an m-schema
relationship, they are resolved using the three methods
described in Chapter 3. If the force or g-d-m methods
are used, the constraints are automatically changed,
and the user is notified. The actions taken are shown
in the REAVIEWS output window (see Figure 4-7). The
g-d-m method also notifies the user via the main
screen. When the resolve method is used, additional
information is acquired by the system (e.g., see Figure
4-10). Resolve strategies are primarily procedural
knowledge attached to their specific relationships and
invoked when necessary. Although the process
illustrated in the test case acquires the needed domain
knowledge directly from the user, a particular resolve
method might instead look in the m-schema for the
information needed.

LIST OF REFERENCES

Amer, T., A. D. Bailey, Jr., and P. De. 1987. A review of
the computer information systems research related to
accounting and auditing. Journal of Information Systems
(Fall): 3—28.

Armitage, H. M. 1985. Linking management accounting with
computer technology. Research monograph. Hamilton,
Ontario: Society of Management Accountants of Canada.

Batini, C., S. Ceri, and S. B. Navathe. 1992. Conceptual
database design: An entity-relationship approach.
Redwood City, CA: Benjamin/Cummings.

Batini, C., M. Lenzerini, and S. B. Navathe. 1986. A
comparative analysis of methodologies for database
schema integration. ACM Computing Surveys 18
(December): 323—64.

Chandrasekaran, B. 1984. Expert systems: Matching techniques
to tasks. In Artificial intelligence applications for
business, ed. W. Reitman, 41-63. Norwood, NJ: Ablex.

Chen, P. P. 1976. The entity-relationship model—Toward a
unified view of data. ACM Transactions on Database
Systems 1 (March): 9—36.

Choobineh, J., M. Manning, J. Nunamaker, Jr., and B. R.
Konsynski. 1988. An expert database design system based
on analysis of forms. IEEE Transactions on Database
Engineering 14 (February): 242—53.

Codd, E. F. 1970. A relational model of data for large
shared data banks. Communications of the ACM 13:6.

Codd, E. F. 1991. The relational model for database
management. Reading, MA: Addison-Wesley.

Colantoni, C. S., R. P. Manes, and A. Whinston. 1971. A
unified approach to the theory of accounting and
information systems. Accounting Review 46 (January):
90—102.


Date, C. J. 1986. An introduction to database systems.
Reading, MA: Addison-Wesley.

DeMarco, T. 1979. Structured analysis and system
specification. Englewood Cliffs, NJ: Prentice-Hall.

Denna, E. L., and W. E. McCarthy. 1987. An Events Accounting
Foundation for DSS Implementation. In Decision Support
Systems: Theory and Applications, eds. C. W. Holsapple
and A. B. Whinston, 239-63. Springer-Verlag.

Ericsson, K. A., and H. A. Simon. 1984. Protocol analysis:
Verbal reports as data. Cambridge: MIT Press.

Everest, G. C., and R. Weber. 1977. A relational approach to
accounting models. Accounting Review 52 (April):
340—59.

Fikes, R., and T. Kehler. 1985. The role of frame-based
representation in reasoning. Communications of the ACM
28 (September): 904—20.

Furtado, A. L., M. A. Casanova, and L. Tucherman. 1988. The
CHRIS consultant. In Entity-relationship approach:
Proceedings of the sixth international conference on
entity-relationship approach, ed. S. T. March, 515—32.
Amsterdam: North-Holland.

Gal, G., and W. E. McCarthy. 1986. Operation of a relational
accounting system. Advances in Accounting 3: 83-112.

Gane, C., and T. Sarson. 1979. Structured systems analysis:
Tools and techniques. Englewood Cliffs, NJ: Prentice-
Hall.

Geerts, G., and W. E. McCarthy. 1991. Database accounting
systems. In Information technology perspectives in
accounting: An integrated approach, eds. B. Williams
and B. J. Sproul.

Goldstein, R. C., and V. C. Storey. 1989. Some findings on
the intuitiveness of entity-relationship constructs. In
Proceedings of the eighth international conference on
entity-relationship approach, ed. F. H. Lochovsky,
6—20. Toronto: ER Institute.

Hammer, M., and D. McLeod. 1981. Database description with
SDM: A semantic model. ACM Transactions on Database
Systems 6 (September): 351—86.

Haseman, W. D., and A. B. Whinston. 1976. Design of a
multidimensional accounting system. Accounting Review
51 (January): 65—79.


Haseman, W. D., and A. B. Whinston. 1977. Introduction to
data management. Homewood, IL: Irwin.

Hawryszkiewycz, I. T. 1984. Database analysis and design.
Chicago: Science Research Associates.

Hoffman, R. R. 1989. A survey of methods for eliciting the
knowledge of experts. SIGART Newsletter, no. 108
(April): 19—27.

Horngren, C. T., and G. Foster. 1987. Cost accounting: A
managerial emphasis. Englewood Cliffs, NJ: Prentice-
Hall.

Hull, R., and R. King. 1987. Semantic database modeling:
Survey, applications, and research issues. ACM
Computing Surveys 19 (September): 201—60.

Ijiri, Y. 1975. Theory of Accounting Measurement. Sarasota,
FL: American Accounting Association.

Johnson, P. E. 1983. What kind of expert should a system be?
The Journal of Medicine and Philosophy (February):
77—97.

Kuntz, M., and R. Melchert. 1989. Ergonomic schema design
and browsing with more semantics in the Pasta-3
interface for E-R DBMSs. In Proceedings of the eighth
international conference on entity-relationship
approach, ed. F. H. Lochovsky, 263—78. Toronto: ER
Institute.

Lenat, D., M. Prakash, and M. Shepherd. 1986. CYC: Using
common sense knowledge to overcome brittleness and
knowledge acquisition bottlenecks. AI Magazine 6(4).

Lieberman, A. Z., and A. B. Whinston. 1975. A structuring of
an events-accounting information system. Accounting
Review 50 (April): 246—58.

Lum, V., S. Ghosh, M. Schkolnick, D. Jefferson, S. Su, T.
Fry, and B. Yao. 1979. 1978 New Orleans data base
design workshop. In Proceedings of the fifth
international conference on very large data bases,
328—339. New York: IEEE.

Mattessich, R. 1964. Accounting and analytical methods.
Homewood, IL: Irwin.


Mattos, N. M., and M. Michels. 1989. Modeling with KRISYS:
The design process of DB applications reviewed. In
Proceedings of the eighth international conference on
entity-relationship approach, ed. F. H. Lochovsky,
159—73. Toronto: ER Institute.

McCarthy, J. 1987. Generality in artificial intelligence.
Communications of the ACM 30 (December): 1030—35.

McCarthy, W. E. 1979. An entity-relationship view of
accounting models. The Accounting Review 54 (October):
667—86.

McCarthy, W. E. 1982. The REA accounting model: A
generalized framework for accounting systems in a
shared data environment. The Accounting Review 57
(July): 554—78.

McCarthy, W. E., and S. R. Rockwell. 1988. On the embedding
of domain knowledge in automated software engineering
tools: The case Of accounting. In vol. 1 Of Advance
working papers of the second international workshop on
computer-aided software engineering, ed. E. J.
Chikofsky, 2-15 to 2-17. Cambridge, MA (July).

McCarthy, W. E., and S. R. Rockwell. 1989. The integrated
use of first-order theories, reconstructive expertise,
and implementation heuristics in an accounting
information system design tool. In Proceedings of the
ninth international workshop on expert systems & their
applications, 537—48. Avignon, France: EC2.

McCarthy, W. E., S. R. Rockwell, and H. M. Armitage. 1989. A
structured methodology for the design of accounting
transaction systems in a shared data environment. In
Proceedings of the fifth annual structured techniques
association conference, ed. J. S. Weber, 194—207.
Chicago: STA.

McCarthy, W. E., S. R. Rockwell, and E. Wallingford. 1989.
Design, development, and deployment of expert systems
within an operational accounting framework. In
Proceedings of the workshop on innovative applications
of computers in accounting education. Lethbridge,
Alberta: University of Lethbridge. (To be reprinted in
book form)

Minsky, M. 1975. A framework for representing knowledge. In
The Psychology of Computer Vision, ed. P. H. Winston,
211—77. New York: McGraw-Hill.

Mylopoulos, J., P. A. Bernstein, and H. K. T. Wong. 1980. A
language facility for designing database-intensive
applications. ACM Transactions on Database Systems 5
(June): 185—207.

Newell, A., and H. A. Simon. 1976. Computer science as
empirical inquiry: Symbols and search. Communications
of the ACM 19 (March): 113—26.

O'Leary, D. E. 1988. Software engineering and research
issues in accounting information systems. Journal of
Information Systems (Spring): 24—38.

Pescow, J. R., ed. 1976. The encyclopedia of accounting
systems. Englewood Cliffs, NJ: Prentice-Hall.

Reiner, D., G. Brown, M. Friedell, J. Lehman, A. McKee, P.
Rheingans, and A. Rosenthal. 1987. A database
designer's workbench. In Entity-relationship approach:
Proceedings of the fifth international conference on
entity-relationship approach, ed. S. Spaccapietra,
347—60. Amsterdam: North-Holland.

Reuber, A. R. 1988. Opportunities for accounting information
systems research from a database perspective. Journal
of Information Systems (Fall): 87—103.

Roussopoulos, N., and R. T. Yeh. 1984. An adaptable
methodology for database design. IEEE Computer (May):
64—80.

Shipman, D. W. 1981. The functional data model and data
language DAPLEX. ACM Transactions on Database Systems 6
(March): 140—73.

Smith, J. M., and D. C. P. Smith. 1977. Database
abstractions: Aggregation and generalization. ACM
Transactions on Database Systems 2 (June): 105—33.

Sorter, G. 1969. An "events" approach to basic accounting
theory. Accounting Review 44 (January): 12—19.

Sowa, J. F. 1984. Conceptual structures: Information
processing in mind and machine. Reading, MA: Addison-
Wesley.

Tauzovich, B. 1989. An expert system for conceptual data
modeling. In Proceedings of the eighth international
conference on entity-relationship approach, ed. F. H.
Lochovsky, 329—44. Toronto: ER Institute.

Teorey, T. J., and J. P. Fry. 1982. Design of database
structures. Englewood Cliffs, NJ: Prentice-Hall.

Teorey, T. J., D. Yang, and J. P. Fry. 1986. A logical
design methodology for relational databases using the
extended entity-relationship model. ACM Computing
Surveys 18 (June): 197—222.

Tsichritzis, D. C., and A. Klug, eds. 1978. The
ANSI/X3/SPARC DBMS framework report of the study group
on database management systems. Information Systems
3:173—91.

Turban, E. 1990. Decision support and expert systems:
Management support systems. New York: Macmillan.

Weber, R. 1986. Data models research in accounting: An
evaluation of wholesale distribution software.
Accounting Review 61 (July): 498—518.

Yourdon, E. 1989. Modern structured analysis. Englewood
Cliffs, NJ: Prentice-Hall.