MATHEMATICAL FOUNDATIONS FOR
RELATIDNAL DATA BASES

Dissertation for the Degree of Ph. D.

MICHIGAN STATE UNIVERSITY
RAYMOND YOUSSEF FADOUS
1 9 7.5

IIIllIllIlIIlIﬂILIIIILIIjIIIIIIIIIIIIIIIIIIIII f

 

ABSTRACT

MATHEMATICAL FOUNDATIONS FOR RELATIONAL DATA BASES

By

Raymond Youssef Fadous

A new approach to data management systems has been the
introduction of a relation or table as a model for a data base.
Large sets of data can be represented in a few large tables, but
such a representation often leads to certain anomalies whenever
data items in the data base are added, deleted, or changed. To
reduce the effect of these anomalies, previous research identified
functional relations or dependencies between attributes and defined
second and third normal forms. These normal forms are dependent
on minimal subsets of attributes, called candidate keys or simply
keys, which uniquely identify each row of a table whenever the
usual operations of retrieval, deletion, and update are performed.
The research reported in this thesis considers the problem of
constructing algorithms for finding keys for relational
data bases and for determining whether a relation is in second or
third normal form. The thesis presents a new algorithm which
starts with the functional relations and finds all keys of a
normalized relation.

The mathematical properties of a relation in second and

third normal forms are studied in detail along with the properties

Raymond Youssef Fadous

of prime and non-prime attributes and algorithms are given for
determining whether a relation is in either second or third normal
form.

Finally, this thesis points to a weakness in the definitions
of a relation in third normal form, as proposed by Codd and Kent,
and advances the concept of a canonical normal form to overcome

the disclosed weakness.

MATHEMATICAL FOUNDATIONS FOR RELATIONAL DATA BASES

By

Raymond Youssef Fadous

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Computer Science

1975

ACKNOWLEDGEMENTS

I am very grateful to Dr. John J. Forsyth, my thesis adviser
and the co-chairman of my committee, for his encouragement, sugges-
tions, and help in writing this thesis in a clear and concise mathe-
matical form. I would like to thank Dr. Car1.V. Page, the co-chair-
man of my committee, for introducing me to this area of research.

My thanks to Dr. J.S. Frame and Dr. Richard C. Dubes for serving

on my committee and for the helpful discussions I had with them
while writing this thesis. Thanks go to the Division of Engineering
Research for providing me with financial assistance to do research.
My thanks to Dr. E.F. Codd of IBM for generously providing the
necessary references in this area of research.

Special thanks are due to my parents, my brothers, my wife
Dola, and to my children Sandy and Joe. Their love and understanding

helped me sustain the hard work of the doctoral program.

ii

Chapter

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

SURVEY OF RELATIONAL DATA BASE THEORY

HHl-‘H
O. O
waI—I

Background

The Relational Model

Normal Forms

Contributions and Organization of the
Thesis

FINDING KEYS FOR RELATIONAL DATA BASES

NNNNNN
.0...
ost-‘th-d

Introduction

Basic Concepts and Notation
Mathematical Preliminaries
Finding the Keys

Examples

Chapter Summary and Remarks

FUNCTIONAL PARTITION AND REDUCTION

WWW
WNH

woo
U‘D

UN
\16\

Introduction

Functional Partition

Procedures for Obtaining Functional
Partition

3.3.1 Graphical Method

3.3.2 Algebraic Method

PrOperties of Functional Partition
Properties of Prime and Non-Prime
Attributes

Functional Reduction

Functional Deletion

3.7.1 Examples

Chapter Summary and Remarks

iii

Page

ii

U1

15
17

17
18
21
27
3O
32

34

34
35

37
37
39
40

49
51
54
55

Chapter

4.

NORMAL FORMS

1 Introduction
2 Properties of the Second Normal Form (SNF)
4.2.1 Mathematical Properties
4.2.2 Examples and Remarks
4.3 Procedure for SNF
4.3.1 Algorithm A2
4.3.2 Examples
4.4 PrOperties of the Third Normal Form (TNF)
4.4.1 Mathematical Properties
4.4.2 Examples
5 Procedure for TNF
6 Chapter Summary and Remarks

4.
4.

4.
4.
SUMMARY AND CONCLUSIONS

5.1 Conclusions

5.2 Alternative Definitions for Normal Forms

5.3 Suggestions for Future Research

BIBLIOGRAPHY

iv

Page
56

56
56
56
61
62
63
64
65
65
7O
71
71

73
73
74
80

82

GLOSSARIAL CROSS REFERENCE

1e_r_tn_
Adjacent
Algorithm A1
Algorithm A2
Algorithm A3
Attribute

Attribute Value

Candidate Key
Canonical Normal Form (CNF)

Connected

Dependent

First Normal Form (FNF)
Full Dependence
Functional Dependence

Functional Relation

Identity Partition
Implication Matrix

Implies
Key
KSNF

KTNF

Definition Number

 

3

.2

.3

.2

.1

.4
.2.1

.2

Page
35

27
63

71

78

35

37

19

75

76

Tg£m_

Natural Join

Non-Adjacent

Non-Prime Attribute
Non-Transitive Dependence

Normalized Relation

Optimal Second Normal Form

Optimal Third Normal Form

Primary Key
Prime Attribute

Projection

Reflexive Form

Relation

Second Normal Form (SNF)
Set of Functional Relations

Strict Transitive Dependence

Third Normal Form (TNF)
Transitive Closure
Transitive Dependence

Tuple

Universal Partition

Definition Number

 

1.3.11
3.2.1
1.3.5

1.3.9

1.3.13

1.3.14

1.3.6
1.3.5

1.3.1

3.7.1

1.3.8
2.2.1

1.3.12

1.3.10

1.3.9

3.2.4

vi

Page
13

35

11

14

14

51

10
18

13

11
20

11

37

CHAPTER 1

SURVEY OF RELATIONAL DATA BASE THEORY

The trend in computer application31n3to make the computer
accessible to a wide range of users, esPecially casual users, who
have little or no training in programming. It is estimated, by
IBM researchers Codd, et. a1 [28], that in the 1990's the growth
in on-line interaction by casual users will exceed that for all other
users by a large factor. Such users need a simple logical notion of
the data organization in order to form queries in a sensible way.
One of the key develOpments in data base design in recent years has
been the introduction of a relation or table as a model for a data
base. The tabular representation of data is the simplest and most
universally understood data structure. The initial structuring of
the data can often be described by a few large tables with many
columns. However, such a structure is usually inefficient and con-
tains redundant information which is not suitable for direct storage.
Codd [22] discussed the possibility of breaking up a table into
smaller ones, in what he termed second and third normal forms, so
as to remove these shortcomings. For this reason, Codd introduced
the concept of a functional relation between attributes and the
important notion of a key to help project a relation into subrela-
tions in second and third normal forms, and also to recover the
original table from the projections by a Special Operation called

the natural join.

Existing methods for finding keys are conceptually elegant
but computationally intimidating; these methods suffer from the
curse of dimensionality or combinatorial explosion associated with
finding prime implicants of Boolean functions having a large number
of variables. The algorithm in this thesis, in contrast, proceeds
by identifying subsets which can lead to keys and then carefully
selecting the appropriate intersections of these subsets which pro-
duce the keys. Further, Codd gives only the definition as a basis
for a relation to be in a normal form. But, how can the date base
administrator know whether the relations are in second or third
normal form? This thesis provides the answer to this question.

Kent [48] examined the underlying definitions of normal
forms, as preposed by Codd, and suggested some improvements. How-

ever, in this thesis, further examination of Kent's alternative

definitions reveals that the objectives of the- normal forms were
not totally met, and this led to the introduction of a new
canonical normal form that better meets the criteria for normal

forms in relational data bases.

1.1 Background

 

One of the main objectives in the design of data base systems
is the concept of data independence. Briefly stated, this means that
application programs are not affected by the way data is stored. The
data base is the data as physically stored, also referred to as the
storage structure. The user's view of the data base is called the
data model, also known as the logical structure. That is, to the

user the data model is the data base. Date, et. a1 [29, 30] give

many examples and explain these terms. Engles [40] and Meltzer [53]
provide a good background in the design of data independent access-
ing models.

Many different data models that claim data independence have
appeared in the literature since 1960. Two of these, the CODASYL
Data Base Task Group (DBTG) network model [17-20] and the relational
model, as proposed by E.F. Codd [21], have each attracted a large
number of followers. Bachman [4, 5], one of the main contributors
to the DBTG approach, clearly presents the goals desired in a data
independent model. The string model, as proposed by Senko, et. a1
[1, 3, 63], is also a data independent accessing model, based on the
work of Davies [35], Engles [40], and Meltzer [53], but is
overshadowed by the two previously mentioned models.

Date [32] gives a good introduction to the DBTG approach.
Engles [41] presents a thoughtful criticism of the DBTG model and
lists the IBM objections to it. Date, et. al [33] and Codd, et. al
[28] explain the main differences between the network and relational
approaches to data base design. The main differences and, in Codd's
view, the advantages of the relational model are simplicity, uni-
formity, completeness, and data independence. To a large extent,
simplicity may be considered the justification of the relational
approach. All the records in a file are stored as n-tuples in a
relation -- that is as a table. Also, the data sublanguages that
Operate on the relational data base are easier to learn than the
data manipulation language of the DBTG approach. The relational
model is uniform in the sense that any relationship between entities

or relations is also expressed as a relation. It is complete in the

sense that all data structures commonly used in data base systems
can be expressed in a relational form. As for data independence,
neither the data model nor the languages contain any reference to
storage structure or to access methods. Date [31] summarizes, in
a tutorial form, the main concepts of these two as well as other

models.

This thesis deals exclusively with the relational model as
originally proposed by Codd [21]. The relational approach is
motivated primarily, but not exclusively, by the desire to attract
the casual user. The data structure used is in the simplest form
possible, the tabular form. Codd [27] states the steps necessary
to help the user interact with the computer in a natural dialogue
with the objective of attaining an agreement between the user and
the system as to the user's needs. One of these steps is the de-
velopment of the language for manipulating the data model, which is
known as the data sublanguage. Codd [24] defined the data sublanguage
ALPHA (DSL-ALPHA), based on the first-order predicate calculus, for
eXpressing the user's queries. Queries can be eXpressed in terms of
a collection of Operations on the relations [21, 25], and this
collection is called the relational algebra. An earlier informa-
tion algebra, with a different set of operations, was defined by
Bosak [9]. Codd [25] proves that his DSL-ALPHA is complete in the
sense that any query expressable in the relational algebra is also
expressable in the relational calculus. Other languages have been
defined for accessing data in a relational data base. Boyce,
et. a1 [10] proposed a data sublanguage called SQUARE. It uses

a mathematical notation for expressing queries, but is less

sophisticated mathematically than DSL-ALPHA; hence, SQUARE is
easier to use by the casual user. They have also shown that SQUARE
is a complete sublanguage. In later papers, Boyce, et. a1 [11]
and Chamberlin, et. a1 [15] presented another sublanguage called
SEQUEL that has the functional capabilities of SQUARE but different
syntax. SQUARE is a concise mathematical APL-like notation, while
SEQUEL is a block structured English keyword language.

There has been much research done on the relational model
and Codd [26] reviews briefly the latest and the current areas of

investigations that have been undertaken.

1.2 The Relational Model

 

Given sets, D1,D Dn, called domains, not necessarily

2,...,

distinct, a relation m of degree n, defined over D1,D2,... Dn’

 

is a Subset of the cartesian product D X D

1 2 X...X D“. That is,

m is a set of elements each of the form (d1,d

2,...,dn) where
each di is an element of Di' The set D1 is called the i-th
domain of a. Each element (d1,d2,...,dn) is called an n-tuple

or,simply,tuple of R. Each (11 is the i-th component of a tuple.
An attribute is a name assigned to a domain of a relation. Any

value associated with an attribute is called an attribute value.

 

While the domains of a relation need not be distinct, the attribute
names assigned to them must all be distinct. The relations are time-
varying relations; tuples may be updated, deleted, and inserted in a
relation. Throughout this thesis, the term relation means a time-
varying relation. The logical view of a relation of degree n is

a rectangular array or table with n columns and m rows such that

m varies from one update to another. This work deals only with

normalized relations. These are relations whose attribute values

 

are simple and are not themselves relations. Codd [21, 23] lists
five prOperties that are satisfied by any normalized relation R.
These prOperties are:

l. R is column homogeneous, that is in any one column
all the attribute values are of the same kind, whereas
items in different columns need not be of the same
kind.

2. Each attribute value is a simple number or a char-
acter string and not a set of numbers or a repeating
group.

3. All rows of a relation must be distinct.

4. The ordering of rows within a relation is immaterial.

5. Since the attributes are distinct, the ordering of
columns within a relation is immaterial.

With prOperty 5 added, the exact mathematical term is
relationship and not a relation, but here this distinction will be
ignored. A relation which satisfies property 2 is said to be
normalized. Date, et. a1 [33] give a good exposition of the logical
structure of the data base. In the relational approach, the data
model definition (DMD) defines the relations and the underlying

attributes that together constitute the relational model.

1 .3 Normal Forms

 

The notion of functional dependence, as defined by Codd
[22], plays a fundamental role in the theory which governs the de-
composition Of relations into subrelations in normal forms. Codd

[23] listed six aims of normalization of relations. The two most

important are:

1. To reduce the need for restructuring the collection

of relations as new types of data are introduced, and
thus increase the life Span of application programs.

2. To reduce the incidence of undesirable insertion, up-

date, and delition anomalies.

An example will be given later, in this section, to eXplain
these different anomalies. Delobel, et. a1 [36] use the term
"functional relation" to mean functional dependence. A whole theory
of relations can be built around this simple concept. First, one

needs to define the term prgjection of a relation 3 on a subset of

 

attributes of m. The notation used in the following definition is

eXplained in Chapter 2.

Definition 1.3.1

 

Let R be a relation defined on the set of attributes
Q = A1A2A3 ... An' For any a = AlAZ ... Am’ a subset of Q,

the projection of m on a is defined as:

Rd = {(al,a2,...,am)I(a1,a2,...,an) E R] ,

also written as ‘R[A1A2 ... Am].

Definition 1.3.2

 

The set of attributes B in a relation 8 of a degree n
is functionally dependent or just dependent on the set of
attributes A in m, if, at any instant of time, there exists

a function, called functional relation, F : R[A] ~t%[B] or

 

simply A A’B. The term A implies B is also used when-

ever A ~ B.

The notation A f B indicates that B is not dependent
on A, while A «'B says that A and B are dependent on
each other.

Definition 1.3.3

 

The set of attributes B in a relation m is EBLLL
dependent on the set of attributes A in R, written as
A =‘B, if

1. B is dependent on A and

2. B is not dependent on any prOper subset of A.

If B is not fully dependent on A, one writes A # B,
while A ”'3 indicates that A and B are fully dependent

on each other.

The concept of a key, that will be defined next, is the
mathematical basis of the relational model. The normal forms are

defined in terms of the candidate keys or simply keys of the rela-

 

tion. Also, the search algorithms use the keys of a relation to
retrieve or insert information in the data base. The remainder of
this thesis uses the term key instead of candidate key.

Definition 1.3.4

 

Each key of a relation R is a non-empty subset, K, of Q,
the set of all attributes in m, Such that O is fully dependent
on K; or in symbols K:= Q.

Definition 1.3.5

 

An attribute in R is called a prime attribute if it is a
member of any key of m. Otherwise, it is non-prime.

Definition 1.3.6

 

Every relation 3 has a primary key which is chosen

 

arbitrarily from the set of all keys of 8. Also, no
attribute in a primary key is allowed to have an undefined
value.

Definition 1.3.7

 

Every normalized relation is said to be in first normal

 

form (FNF).

Example 1 .3 .1

 

Consider the teaching schedule relation, Tl, defined on the
attributes P = Professor name, C = Course number, R = Room
number, H = Hour of the course meeting, and O = Office number
of a professor. Assume that a course may be taught by many
professors and that a professor may teach many courses, but
that a professor can only be in one room at any one time.
Also, assume that each professor is assigned exactly one
office number, and that many professors can share the same
office. The above statement of the problem provides the
following functional relations:

PH-oRC;RH-+PC;P-+0

At some instant of time T1 might look like this:

Tl : P C R H O
Forsyth 817 1 9 400
Forsyth 818 1 10 400
Dubes 805 2 9 400
Dubes 817 2 10 400
Frame 851 3 9 243
Frame 831 4 11 243
Page 841 2 11 404
Page 841 2 12 404

10

The relation T1 is in FNF because all the entries in T1 are
simple. The combinations of attributes RH and PH are the only
keys of T1 based only on the functional relations given above.
Then, P, R, and H are prime attributes, while C and O are non-
prime. Choose PH, say, as the primary key.

Observe that if Professor Forsyth moved to office number 402,
then more than one tuple has to be updated. This is called an
update anomaly. If Professor Page is not teaching a course
this year but he will next year, then the information about his
office number should be retained; but, if the Page tuples are
deleted, that information is lost. This is an example of a
deletion anomaly. Finally, suppose one wishes to record the
office number of visiting Professor Jones who has not been yet
assigned a course number to teach. Since PH is the primary key,
one must fabricate a fictitious hour number in order to store
this information. This is called an insertion anomaly.

To reduce the effect of these anomalies, transitive de-
pendence of attributes upon one another were introduced and
the second and third normal forms were defined by Codd [22].

Definition 1.3.8

 

A relation R is in second normal form (SNF) if

 

l. R is in first normal form and
2. Every non-prime attribute in m is fully dependent on
every key of 8.

Example 1.3.2

 

If relation T1, in Example 1.3.1, were projected into two

subrelations, T2[PCRH] and T3[PO], then, T3, for example, looks

11

like this:
T3 : P O
Forsyth 400
Dubes 400
Frame 243
Page 404

The only key of T3 is P, hence the primary key, while T2
has the same keys as T1. Note that T2 and T3 are each in
second normal form. This is so, since T3 has a simple key,
that is a key with just one attribute, and C, the non-prime
attribute in T2, is not dependent on any proper subsets of
the keys PH and RH. The anomalies previously mentioned have
disappeared.

Definition 1.3.9

The set of attributes C in a relation R is transitively

 

dependent on the set of attributes A in a, written as
A- —'C if
1. A and C are disjoint and
2. There exists a set of attributes B in m, disjoint
from both A and C, Such that A.—.B, B f'A, and

B —oC. Otherwise, C is non-transitively dependent
on A.

Definition 1.3.10

A relation 3 in second normal form is in third normal

 

form (TNF) if every non-prime attribute in m is non-

transitively dependent on each key of m.

12

Example 1.3.3

 

Given the following data base with relation T4 defined on
the attributes P = Professor name, C = City of residence, and

Z = Zip code. At some instant of time T4 might look like this:

T4 : P C Z
Page E.L. 48823
Frame E.L. 48823
Forsyth E.L. 48823
Dubes E.L. 48823

The statement of the problem provides the following func-

tional relations:
P —oC; C a Z

The only key of T4 is P, hence the primary key, and so, by
definition, T4 is in second normal form; but T4 is not in third
normal form since Z is transitively dependent on P. Suppose
that the Zip Code of E.L. is changed to 48824, then many tuples
would have to be updated. This is a consequence of the trans-
itive dependence. To get rid of this anomaly, one can project

T4 into T5[PC] and T6[CZ].

T5 : P C and T6 : C Z
Page E.L. E.L. 48823
Frame E.L.
Dubes E.L.
Forsyth E.L.

Now, T5 and T6 are each in third normal form, as were T2

and T3.

13

Definition 1.3.11

Let R, and R2 be any relations over the set of attributes

l
01 = {a,b] and 05 = {b,c] reSpectively such that

 

= . *
a 01 n 02 # m, then the natural_jorn, denoted m1 m2, of
mi and $2 over a is defined by
m1 * R2 = {(a,b,c)\(a,b) 6 R1 and (b,c) 6 m2]. This defini-
tion can be extended to any sets 01 and 02 such that

a = 01 0 02 ¢ ¢.
Example 1.3.4

Consider the relations T4, T5, and T6 defined in Example
1.3.3. Then T4 = T5 * T6. The natural join is only one of
the relational operations introduced by Codd [21, 24]. In
projecting a relation 3 into subrelations of m, it is
important to be able to recover the original relation R
from the subrelations.l This idea is emphasized later in the
definitions of optimal normal forms. This recovery can be
done with the natural join operation. Suppose one projects
a relation R[ABC] into ml[AB] and m2[BC]; it is not always
true that m = ml * $2. Delobel, et. a1 [36] gave the suf-

ficient condition that, if B -oC, then

ER[ABC] = 2R1[AB] * m2[Bc].

Although this thesis does not deal with optimization, the
following definitions are given here to complete the presenta-
tion of the relational model as originally prOposed by Codd.

Definition 1.3.12
The transitive dependence A- #10, in a relation n, is

a strict transitive dependence if there exists a set of

14

attributes B in m disjoint from A and C such that:
1. A-+B,Bf~A and
2. B —~C, C foB.

Definition 1.3.13

A collection of relations C, which are projections in a
relation R, is in Optimal second normal form if the following
three conditions hold:

1. All the relations in C are in second normal form.

2. a is the natural join of the relations in C.

3. No smaller collection of relations has these prOperties.

Definition 1.3.14

Let C2 be a collection of relations in Optimal second
normal form, and C3, a collection of relations in third normal

form, which are projections from the relations in C2. Then,
the collection C3 is in Optimal third normal form if the

following 4 conditions hold:

1. All the relations in C3 are in third normal form.

2. C2 can be recovered with the natural join of the

relations in C .

3

3. No relation in C3 contains any pair of attributes
that are strictly transitively dependent on any relation
in C2.
4. No smaller collection of relations has these prOperties.
The relational model, as proposed by Codd, lacks simple and
efficient procedures to find all keys, and to determine whether a

relation is in second or third normal form. This thesis reports

efforts made to take constructive steps in this direction.

l4

attributes B in m disjoint from A and C such that:

l. A «’B, B f A and

2. B —.C, C foB.
Definition 1.3.13

A collection of relations C, which are projections in a
relation R, is in Optimal second normal form if the following
three conditions hold:

1. All the relations in C are in second normal form.

2. R is the natural join of the relations in C.

3. No smaller collection of relations has these prOperties.

Definition 1.3.14

Let C2 be a collection of relations in Optimal second
normal form, and C3, a collection of relations in third normal

form, which are projections from the relations in C Then,

2.
the collection C3 is in Optimal third normal form if the

following 4 conditions hold:

1. All the relations in C3 are in third normal form.

2. C2 can be recovered with the natural join of the

relations in C .

3

3. No relation in C3 contains any pair of attributes

that are strictly transitively dependent on any relation

in C2.
4. No smaller collection of relations has these prOperties.
The relational model, as proposed by Codd, lacks simple and

efficient procedures to find all keys, and to determine whether a

relation is in second or third normal form. This thesis reports

efforts made to take constructive steps in this direction.

15

1.4 Contributions and Organization of the Thesis

Functional relations are shown to enjoy a rich algebraic and
set theoretic structure, and that the normal forms can be studied
within this framework.

This mathematical structure leads to a new approach for
starting with the functional relations and finding all of the keys
in a normalized relation. The algorithm uses an implication matrix,
its transitive closure and a systematic method for introducing
attributes to form keys.

A detailed mathematical characterization of prime and non-

prime attributes leads to the concepts of functional partition, re-

duction, and deletion. Also, it is shown that, under certain con-
ditions, many computational steps can be saved in the algorithm for
finding the keys.

The mathematical prOperties of a relation in second and
third normal forms are studied in detail and algorithms are defined
for determining whether a relation is in any of these forms.

Then, this thesis proceeds to point to a weakness in the definitions
of third normal form, as proposed by Codd and Kent [48], and a new
canonical normal form (CNF), an improvement in terms of reducing the
update anomaly, is suggested.

The basic definitions of the relational model, as given by
Codd, are all included in Chapter 1. Chapter 2 defines the implica-
tion matrix and its transitive closure which form the basis of the
new algebraic approach, taken in this thesis, in the study of the
normal forms. Also, in Chapter 2, the algorithm for finding all
keys in a relation is presented. Chapter 3 deals with the mathe-

matical properties of prime and non-prime attributes, and with the

16

new concepts of functional partition, reduction, and deletion. In
Chapter 4, the mathematical properties, and algorithms for the normal
forms, are given. Conclusions and suggestions for future research
are offered in Chapter 5.

All definitions, theorems and lemmas in the remainder of
this thesis are the original work of the author unless otherwise in-

dicated.

CHAPTER 2

FINDING KEYS FOR RELATIONAL DATA BASES

2.1 Introduction

 

E.F. Codd [21] prOposed the relational data base model in
an effort to provide "data independence" for applications programs.
An applications programmer working with a relational data base
may view the data as being stored in tabular form; each unique
collection of data items which describes a single entity occupies
a unique row of some table. A number of different collections of
tables may be used to represent any given data base. Certain
collections might be preferable to others if they eliminate
exceptional conditions, or anomalies, which can occur when items
in the data base are added, deleted, or changed. ~Several re-
searchers have investigated normal forms for relational data bases
and characterized their role in eliminating addition, deletion,
and insertion anomalies. A good overview of these concepts can be
obtained by reading Codd [22-23] and Kent [48].

In order to establish an appropriate normal form for a re-
lational data base, one must identify minimal subsets of attributes,
called kpygj which uniquely determine the values of the remaining
attributes, so that the rows of the tables in the data base have
the required prOperty of uniqueness. Delobel and Casey [36] de-
ve10ped an algorithm for finding all keys in a relational data base,

17

18

given the set of fundamental functional relations defined on the
data base. Their method maps the functional relations to a Boolean
function and produces the keys from those prime implicants of the
Boolean function which cover the implicant which consists of all

of the uncomplemented variables of the function. This Chapter
describes an alternative algorithm, whose correctness is shown to
derive directly from the fundamental prOperties of the functional

relations as given by Armstrong [2].

2.2 Basic Concepts and Notation

 

In the discussion which follows, the symbol 0 will be
used to denote the set of all n attributes A1,A2,...,An on
which a relation 3 of degree n is defined. The empty set is
denoted by ¢. The union and intersection Operations between sets
will be denoted by U and n, reSpectively. If A is a set, then

[A] is the cardinality of A. The usual set notation for

a set 0 With n elements A1,A2,...,An is D = [A1,A2,---,An];

while here the notation Q = AlA2 ... An is used. Two sets X

and Y are disjoint if X FIY = O. A partition on a set 0 is

a collection of non-empty disjoint subsets of 0 whose union is 0.

Definition 2.2.1

 

Let Li and Ri’ i = 1,2,...,m, be non-empty subsets in

Q, then the {Li T‘Ri} is called the set of functional

 

relations in Q.

 

Armstrong [2] has shown that the following prOperties,

Pl - P5, are sufficient to find the set of all functional relations

19

that can be derived from a given set of functional relations. Let
A, B, C, and D be any non-empty subsets of Q then,

Pl. Reflexivity: A ~.A;

P2. Transitivity: If .A a B and B —~C, then A —.C;

P3. Augmentation: If A.—oB, then AI —~B for any A' D.A;

P4. Projectivity: If A a B, then A a B' for any B' C B

and B' # m;
P5. Additivity: If A ~ B and C —oD, then A U C —oB d D.
The purpose of this chapter is to exhibit an algorithm which

generates all keys which can be found using only the given functional
relations and prOperties Pl - P5. The starting point is a set of
functional relations Specified by the data base administrator or the
user. Let {Li ..Ri}’ i = 1,2,...,m, be the set of functional rela-
tions in 0. Without loss of generality, one can impose the restric-
tions that L1 # Lj for i # j and that the intersection Li FIRi,
i = 1,2,...,m, is empty. Note that if AB —oBC, then it is true
that AB ~oC, but it is not necessarily true that A 4’BC or that

A «IC.

The implication matrix P of a relation R of degree n,

 

defined on Q = A A

1 2 . An’ is an m X n matrix whose rows are

labeled L ,L ,...,L and whose columns are labeled A ,A ,...,A
l 2 m l 2 n

such that:

1 if Aj 6 (Li U Ri)

i o
J 0 otherwise

20

Example 2.2.1

Consider the following functional relations where

Q = ABCDEFG.
ABC «DEG; AB «CF; CD -oEF; EG -.AC
Then, the implication matrix P is

A B C D E F G

EC 1 0 1 0 1 0 1

The fact that row labels of ?P are not isomorphic to Sub-
sets of the column labels requires an extension of the usual method
of finding the transitive closure of an implication matrix. The

*
transitive clOSure, P , of P is defined as follows:
*
1. Put P =P

*
2. For every two distinct rows Li and Lj in P , if for

every attribute Ak in Lj’ the entry LiAk = 1, then
copy all 1 entries in row Lj into the corresponding
entries in row Li' The 0 entries in row Li change to

1 if the correSponding entries in row Lj are l and the
original 1 entries in row Li will remain. This changes

*
P

*
3. Repeat part 2 above until P cannot be changed any

further.

21

Example 2.2.2

 

Starting with P in Example 2.2.1, then

EC 1 0 1 0 l 0 1

where 1_means a 0 has been changed to 1 in the process of

taking the transitive closure.

Before proceeding with the proof of the basic theorems that
support the algorithm, the following notation is needed. From the
definition of P*, P* might effectively indicate a new set of
functional relations. These new functional relations are denoted
Li a Ti’ i = 1,2,...,m, where Ti D'Ri, Li H T1 = ¢. In P*, it
might be the case that some rows are not all 1. The set of
attributes that correspond to 0 entries in row Li Will be denoted
by T]. Note that T; = ¢ if row L1 is all 1 in P* and that
n = LiTiTi is a partition of 0, if T] # O. It is also assumed
that L1 and T1 are not empty.

2.3 Mathematical Preliminaries

The mathematical properties of the functional relations
are stated in Lenmas 2.3.1 - 2.3.7, and the theoretical foundations
for the steps taken in the algorithm for finding the keys are pre-
sented in Theorems 2.3.1 - 2.3.3.

Lemma 2.3.1

 

*
In P , 1f Li a Ti and Li U Ti # Q, then Li can be

22

extended by the subset T] such that L1 U T] 4 0-

Proof:

Since Li-oTi and Li-oLi,

u _. u I _. v =
Also Ti Ti (by P1), hence L1 u Ti Li u Ti u Ti 0

then L1 —.L1 U Ti (by P5).

(by P5).

Lemma 2.3.2

*
In P , if row Lj is all 1 and Lj CLi U Ti’ then row

 

13_ is also all 1.
Proof:

Since row Lj is all 1, this implies that Lj « 0. Also
Since Li a Ti’ then Li -'Li U Ti (by P5). But

Li U Ti ﬁiLj (by P4) and so L

—. —. b .
1 Lj n ( y P2)

Lemma 2 .3.3

 

*
In P , if L is a Subset of Li U T and if row Li

j i

is not all 1, hence row Lj is not all 1, then, for T]
and T; such that L1 U Ti «10 and Lj U T] a'ﬂ, the inter-
section Of T; and T5 is not empty.
met:

It follows from the transitive property P2, that any 1
entry in row Lj is a 1 entry in row Li and hence any 0

entry in row Li is a 0 entry in row L But T;

j'

correSponds to the 0 entries in row Li’ hence T; O T5 # ¢.

Example 2.3.1

 

n = ABCDE A B c D E
1. AB —.c * AB 1 1 1 l o
P =
2. AC —.D AC 1 o 1 1 0
t = I = I I
then T1 E, T2 BE and T1 0 T2 # ¢.

23

Lemma 2.3.4
9:
In P , if Lj is a subset of Li U 1'1 and if I'OW L1

is not all 1, then for any a C Tj' such that Lj U Q/j -* O,

i

the intersection of Ti and aj is not empty.

M:
The proof is by contradiction. If aj 0 Ti = O, then
C - I . . .

or]. L1 U Ti Since LiTiTi is a partition on 0 Also

Lj C Li U Ti’ hence L U aj C Li U Ti. But this would imply

J

that Li -+ Li U Ti aLj U or]. -+ D, which is a contradiction

since Li /- 0. Therefore, a H Ti 9‘ O.

3
Lemma 2 .3 .5

 

~1-
In P , if L is not a Subset of L_ U Ti and if row
i

J

Lj is all 1 and row L1 is not all 1, then for Ti Such

that L1 U Ti ..0, the intersection of Ti and Lj is not
empty.
2.1222:

If the intersection of Ti and L is empty, this implies

J
that the 1 entries in row L1 include Lj and hence Lj
would be a Subset of L1 U Ti which contradicts the hypothesis.
Therefore, T]!- n Lj 1‘ (D.

Example 2 .3 .2

O=ABCDE ABC DE
1. A-oBC * A 1 1 1 0 0
P =
2. AD—~E AD 1 _1_ l 1 1
l__ = I
then 131-1313,}.2 AD and TIHL2#¢.
Lemma2.3.6

 

a:
In P , if L is not a subset of Li U Ti and if row

J

24

Lj is not all 1 and row Li is not all 1, then for Ti

such that L1 U Ti - 0, the intersection of Ti and LJ.
is not empty.
232015.:

The proof is exactly the same as in Lemma 2.3.5.

Example 2 .3.3

 

0=ABCDE A B c D E
1. A-aBC * A 1 1 1 0 0
P =
2. BD—oA BD 1 1 1_ 1 o
u: = u
then '131 DE, 12 BD and T1 an #eb.
Lemma 2.3.7

 

'1:
In P , if L is a subset of L. U Ti’ then L and
i

j i
Li U L, imply the same subset in Q.
J
Proof:
*
In P , L1 -*L1 U Ti and so, by definition of Ti’ if

Zi C. 0 such that Z 23Li U T1, then Li [4 21. Similarly,

i

Lj -0 Lj U Tj . But, by hypothesis, Lj

L1 - Li U Ti _. Lj -. I.j U Tj by projectivity and tranSitiVity

reSpectively which implies that Lj U T

CLi U Ti and so

chi U Ti. Also

L «T andL -+T,andso

1 i J J
LiULj—oLiUTiULjUTj =Liu'r11 since

Lj U Tj §:Li U Ti. Therefore, L

same subset, Li U Ti’ in O.

and Li U L imply the

i J

Example 2 .3 .4

Refer back to Example 2.3.1. Note that L1 = AB .. ABCD

and L1 U L2 = ABC «ABCD.

25

Theorem 2.3.1

 

*
In P , if L is not a subset of Li U Ti’ then there

J

exists a subset a C‘L , possibly a = O if row L1 is all

J
1, Such that L1 U a and L1 U Lj imply the same subset
in 0.

Proof:

 

If Li ~10, then a = O and hence it is always true that
L1 U Lj ~10.

If L1 #10, then by Lemmas 2.3.5 and 2.3.6, for T] such
that L1 u T; ..0, it follows that a = Ti' nLj ’9 O. So
a C Lj. Since L TiT' is a partition on 0, then either

i i

| l
chri or LJ.ﬂ(LiUTi)#O. If chT1,

0 = L FIT; = Lj and Li U a = Li U Lj and hence they imply

then

J

the same subset in Q. If Lj ¢ T], then L FI(Li U Ti) # O.

J

j j j FITi) since LiTiTi

is a partition on 0; hence Lj = [Lj FI(Li U Ti)] U a. Also

But Lj=(L nLi)U(L nTi)U(L

L 0(L1UTi) CLiUTi, so

I."
II

[Lj ﬂ(LiUTi)]UaCLiUTan. But Li—vTi,so

L U a « Ti' Therefore, using Lemma 2.3.7, if

L. C (‘Li U a) U Ti where Li U a a T then the Subset in

1,
0 dependent on L1 U a is the same subset dependent on the

union (Li U a) U L = Li U Lj since a :‘L .

J J

Example 2 .3 .5

 

In Example 2.3.3, note that a = D C‘Lz = BD and that
A U D and A U BD imply the same subset, ABCD, in Q.
If K = Li U ai is a key, then K is said to be derived

from L..
1

26

The following theorem shows that unions of Li and Lj
are not useful in deriving keys.

Theorem 2.3.2

 

Any key that can be derived from Li U Lj can also be

derived from Li or Lj separately.
Proof:

By Lemma 2.3.7 and Theorem 2.3.1, for some a C'L , where

J

a could possibly be empty, Li U a and Li U L imply the

J
same subset in Q. If Li U 0 ~10, then Li U Lj ~>0, but

Li U Lj QiLi U 0, hence Li U Lj cannot be a key whenever
Li U a is unless Li U Lj = Li U a. If Li U a #>Q, then,
by Lemma 2.3.1, Li U a and Li U Lj can be extended by the
same minimum subset 3 so that L1 U (a U B) ~»Q and

(Li U Lj) U B a Q. But a C‘Lj and for any 8,

a U 8 C Lj U 8, hence Li U (a U B) c (Li U Lj) U B and

therefore (Li U Lj) U B can never be a key whenever

Liumue) is unless (LiUL)UB=LiU(OIUB)-

J

The previous theorems and lemmas Show that to find the keys
of a relation a, one needs only consider the functional relations

L a T., i = 1,2,...,m, and that taking the union L1 U L of any

J

two Li’ L will not result in any new key that cannot be found

J

from L1: or Lj separately. So, the algorithm uses the transitive
property repeatedly to find the minimal ai such that Li U 01 4g).
The following theorem asserts that for each Li U a1 in T2, to

be explained in the algorithm,'Li U 01 4:1.

 

 

2.4

27

Theorem 2.3.3

If L is any Subset of 0 such that L a Q, and if T; # m,
then Li U (Ti (TL) ..9,

By Lemmas 2.3.3 - 2.3.6, T] (IL # O whenever T] ¥ O.

Partition L into L and L such that L C‘L J T and

1 2 l i 1
L2 C T]; this is possible since LiTiTi is a partition on
._ I —O _. o o o
{2 and L2 —-Ti FIL # O. L1 L1 U Ti L1 by pr0ject1Vity

(P4). Ti [IL2 = L2 ~TL2 by reflexivity (Pl). Since

“Ti: ﬂL2 C T;- (IL, Ti (TL «L2 by augmentation (P3). Also by

additivity (P5), Li U (T; (IL) -»L1 U L2 = L ~10. 30,

Li U (T; n'L) #10 by transitivity (P2).

Finding the Keys

Algorithm A1

1. Form P* with row labels Li’ i = 1,2,...,m.
2. T1 = O and T2 = O-

3. For i = 1 to m enter into T1 each Li U T] where

ITII 2 O.
1.
I I ' I I
4..If LiUTiCLJUTj and 1f \Tj\51 and \Ti\_<_1,
then delete Lj U T] from T1.

5. If \Ti‘ g_l for all remaining entries in T1 then
terminate the algorithm. Tl contains all keys.
a. Else for all i in T1 such that \TiI 2 2 form
= I I
aij Ti H (Lj U Tj) for all j # i where
\T3\ 2 O.
J

b. For each i, delete any aij' such that aij C 0.

31‘3”-

28

c. For each remaining a enter Li U a into T2.

ij ij
Then delete from T2 any set which is a superset of
any other set in T2.
d. For all i in T1 such that ITiI 2 2 and all Lj
in T2 for which Li If: Lj form aik = T] n (Lj u ajk).
e. For each i, delete any aik' such that aik C aik"
ka‘k'. ; “I
f. Enter into T2 any Li U aik which is not a superset
of a set already in T2. Then delete from T2 any set
which is a superset of any other set in T2. If any
new entries are thus created in T2, repeat from Step
5d. Otherwise go to step 6.
6. Copy from T1 into T2 any sets Li U T] in T1 where
ITiI 5.1.
7. Delete from T2 any set which is a Superset of another set

in T2. The remaining sets in T2 are all of the keys and

the algorithm terminates.

By construction, the entries in T1 satisfy Li U T] «IO
and, by Theorem 2.3.3, all entries in T2 satisfy Li U aij ~»Q.

It remains only to show that the algorithm is complete,
in the sense that it finds all keys. If Li U ai’ where IaiI 2 O,
is a key, then, for I01] = 0, L1 is a key only if, in jP*’
Li —v0, and no subset of L1 implies 0. Therefore, to Show
completeness it remains only to show that if ‘01] 2 1, then
Li U ai is found by the algorithm. The following completeness

theorem will Show this. Assume that the set of functional relations

has more than one element in it, otherwise it is a trivial case.

29

Theorem 2.4.1
_ I
If Li u all, IdiI 2 1, is a key, then “i - Ti 0 (Lj u Bj),
I83] 2 0, for some j ¥ i.
Proof:
If Li U ai is a key, then Li U a1 «'0. But Since

Li #10, this implies that there exists a subset

Z CLi U Ti U ai such that Zi —IT£ where

N
II

Oi U [Zi n (Li U Ti)] and Z1 = 01 only if

Z. H (Li U Ti) = O. There is always one such 2 , namely

i

= Z '. ’ Z a T'
i 0T1 So, if i

= ... .... '.

i i

one must Show that there exists a j i i, such that

Zi = Lj U Bj a T]

only one functional relation which would give ai

, except in the trivial case when there is

T} and
i

Z = --I —-I '. -I ' = . '
1 Li U a1 0 Ti If Lj Ti’ then let 21 Lj This

is done for each 'L such that Lj -0T£, then choose the

J

minimal Oi -- no superset of a1 is chosen, where

= ' _.
a. T1 FILj and L1 U ai 0. If L

I
1 {'Ti, then for some

J

Bj’ Lj U Bj aiTi by augmentation P3 and additivity P5 and for

I I I I ___.
some Bj, Lj U Bj U 81 a T1 U Lj U 81 U 81 Q. where

B] ; L1 U T1, 3831“ by P3 and P5. Therefore 35 FIT] = O

I 8
since LiTiTi is a partition on 0. Let Zi Lj U Bj’ then

T] n (‘Lj U 8j U 33) = T] rI(Lj j) = T] n Z1 = a1. Again

this is done for each Lj U 31 such that Lj U Bj « Ti. But,

this is exactly how the algorithm iteratively finds the

U 8

different a so that L1 U a1 4.9 and then chooses the

1

minimal of these a1.

30

When Algorithm Al terminates, every set remaining in T2
is a key. Further, by Theorem 2.4.1, no keys are missed (not in T2).

Therefore, the algorithm is complete.

2.5 Examples

Example 2.5.1

 

The process of finding the keys of the relation in Example
2.2.1, will be presented in a somewhat Optimized manner, but
the steps still agree with the algorithm.

T1: AB; ABC; CD UABG; EG U BDF
Delete ABC 2.AB from T1, so
Tl: AB; CD U ABG; EG U BDF.
Apply 5a-5 of the algorithm as follows.

f
EC U [BDF n (CD u ABG)]

EG U BD; CD U [ABC FI(EG U BDF)] CD U BG

EG U (BDF 0 AB)

EGUB;CDU(ABGnAB) CDUAB

Therefore, T2 holds:
T2: EG U B; CD U AB; CD U BG.
Intersect every subset T], where \TiI 2 2, in T1 with every

subset in T2, then add this intersection to L1 of Li U T;

and put the resulting subset in T2 if it is not a superset of

a set already in T2.

BDF FICDAB BD gives EG U BD (superset, do not put in T2)

BDF FICDBG

BD gives EG U BD (superset, do not put in T2)
ABC FIEGB = BG Gives CD U BG (already in T2).

Every Ti, IT£I 2 2, in T1 has been intersected with every sub-
set in T2 and no new subsets resulted for T2. Proceed to step

6 of the algorithm to get:

31

T2: AB; EGB; CDAB; CDBG.
Applying step 7 leaves the keys:

T2: AB; EGB; CDBG.

Example 2.5.2

 

Consider the following functional relations:
ABC —oDF; BCD 4G; CE «AD; DG «EF; BF -»CG.

*
The transitive closure, P , is:

A B C D E F G

ABC 1 l l l

|I-|
...:
"-4

BCD l 1 1

lb-'
II-l
IH

,...I

DG 0 0 0 1 l 1 1

BF 0 1 l 0 0 1 1

So,
Tl: ABC; BCD; CE U BFG; DG U ABC; BF U ADE.

Apply 58-5c of Algorithm A1 as follows:

BFU [ADE n (DGUABC)] BFUAD ; BFU [ADE n(CEUBFG)]

BF u (ADE n BCD)

BFUD ;BFU(ADEnABC)

DGU [ABC n (BFUADE)] DGUAB ;DGU [ABC n (CEUBFG)]

DC u (ABC n BCD)

DGUBC;DGU(ABCnABC)

CE u [BFG n (BF u ADE)]

CEUBF;CEU[BFGn(DGUABC)]

CE u (BFG n BCD)

CE U B ; CE U (BFG FIABC)

Therefore T2 holds:
T2: BF U A; BF U D; BF U E; CE U B; DG U AB; DG U BC.

Now, applying Sd-Sf of the algorithm gives:

BF U E
BF U A
DC U BC
DG U ABC
CE U BG

CE U B

32

BF u [ADE rI(CE u B)] = BF 0 E (already in T2)

BF U [ADE FI(DG U AB)] = BF U AD (superset, do not put in T2)

BF U [ADE [\(DG U BC)] = BF U D (already in T2)

DG U [ABC [\(BF U A)] = DG U AB (already in T2)

DG U [ABC 0 (BF U D)] = DG U B (put in T2 and delete any
superset)

DG U [ABC [1(BF U E)] = DG U B (already in T2 from previous
step)

DG

[ABC FI(CE U B)] DG U BC (superset, do not put in T2)

CE

[BFG FI(BF u A)] CE u BF (superset, do not put in T2)

CE U BF (Superset, do not put in T2)

CE [BFG FI(BF u B)]

U

U

CE u [BFG rI(BF u D)]

U CE U BF (superset, do not put in T2)
U

CE

[BFG 0 (DG U B)] CE U BG (superset, do not put in T2)

Therefore, T2 holds;

T2: BF U A; BF U D; BF U E; CE U B; DG U B.

Again, applying Sd-Sf of the algorithm produces no new entries
in T2; so proceed to step 6 to get:

T2: BF u A; BF u D; BF u E; CE U B; DG u B; ABC; BCD.
Finally, applying step 7 gives all keys:

T2: BF u A; BF u D; BF U E; CE u B; DG u B; ABC; BCD.

2.6 Chapter Summaryiand Remarks
A new algorithm for finding all keys for relational data
bases has been presented. It is clear, from the examples given in
Section 2.5, that, at least by hand computation, Algorithm Al seems
easier to apply than the algorithm suggested by Delobel and Casey.
The keys can be studied within the framework of the implica-

tion matrix which is a familiar algebraic approach. It is important

33

to note that Algorithm Al is not written in a ready to program
notation, and so care must be taken in choosing the apprOpriate
notation for actual programming. Also, Optimization is possible
and the examples in Section 2.5 Should provide some help in this
direction.

Finally, Algorithm Al has not been programmed, and so no
eXperimental comparisons have been made between the approach taken

here and that of Delobel and Casey.

CHAPTER 3

FUNCTIONAL PARTITION AND REDIIITION

3.1 Introduction

 

In this chapter, computational savings are considered which
lead to new concepts of functional partition and reduction. Also,
detailed mathematical analysis of prime and non-prime attributes is
given, and conditions are given under which one can delete a functional
relation without changing the keys.

Functional partition provides a way of finding the keys of a
relation R, defined in Q, from the projections of R into subre-
lations defined on Subsets of O. This, in turn, implies that
transitive closures are taken in submatrices of the implication
matrix, P, and so, storage requirements are minimized.

Functional reduction is a direct consequence of the mathe-
matical properties of prime and non-prime attributes which are also
discussed in this chapter. It is shown that the keys of a relation
R defined in O depend on the given functional relations in R
and on a subset, not necessarily prOper, of 0, while in Chapter 2,
it was always the case that all the attributes in 0 were used, in
Algorithm Al, for finding the keys.

This chapter also Shows that functional relations of the form
L —’Li can be deleted without altering the keys.

1

34

35

3.2 Functional Partition

Algorithm Al in Chapter 2 showed that the keys of a relation
R are completely determined by the original set of functional re-
lations as Specified by the data base administrator or the user.
If this set is large, then the number of computational steps for
finding the keys is enormous, and so the question is: Can one find
the keys of R from subsets of the original set of functional re-
lations and, if so, how does one define these subsets? This section
presents such Subsets provided by the binary operation (e0, defined
on the set of functional relations, that partitions this set into
equivalence classes.

Let R be a relation defined in O, and let the set of func-

tional relations in R be denoted by {F1} = {FiILi «IR i = 1,2,...,m].

i,
This chapter assumes, unless otherwise Specified, that
m

0 = U (Li U R1) and this assumption will be substantiated later.
i=1
Definition 3.2.1

Two functional relations F and F in {F1} are called

 

i J
adjacent, written as Fi ~ Fj’ if (Li U R1) [1(Lj U R1) # O:
Otherwise, F1 and Fj are called non-adjacent and denoted as
F,1‘F..
1 J

Definition 3.2.2

F1 and Fj in {F1} are connected, written as F1 Rst,

if at least one of the following conditions holds:
1) Fi'vFj;
ii) There exists at least one Subcollection {FL ] C {Pi}
i

such that F ~'F , F "'F ,...,F “'F .
i L1 Li Li+l Li+k J

36

Theorem 3.2.1
The connected relation 0%) is an equivalence relation on
{Fi]°
gases
1. 4% is reflexive
It is always true that Fi ~ Fi since Li U Ri # O;
hence, (Li U Ri) [\(Li U Ri) # O. Therefore, by Defini-
tion 3.2.2, Fi R’ F .

i

2. 5% is symmetric

If F1N Fj’ and if Fi ~ Fj’ then Fj ~ Fi; hence,
If Fi‘w Fj’ and if there exists a Subcollection
{F j C {F ] such that F ~ F , F ~ F ,...,F ~ F,,
Li 1 1 Li Li Li+1 Li+k 3
then reversing the step gives
F~F ,...,F ~F ,F ~F;hence,Fe5F..
J {1+k Li+l Li Li i j i

3. Fe is transitive
If Fiﬁw Fj’ and if Fjie Fk’ then there are two sub-
collections {F j and {F ], subsets in {F,], each
Li Lj 1

possibly with one element in it, such that

F . ~ F , F ~ F , o o o , F ~ F and
1 Li 41 (1+1 Li+k 3
F, ~ F , F ~ F ,..., F ~ F . But, this
. k
implies, by Definition 3.2.2, that Piss Fk.

Therefore, the binary operation,‘¥, partitions {F1} into

equivalence classes such that F1 and Fj are in the same class if

and only if (iff) Fi‘a Fj. The equivalence classes will be denoted

by Fi where ‘F: is the set of all functional relations connected

37

to Fi' An equivalence class can be denoted by any of its members.

Definition 3.2.3

If for each j # i, Fi¢$ Fj’ then iv induces the identity
partition on {F1}. i.e. Each functional relation is in a
class by itself.

Definition 3.2.4

If for all i and all j, Fiew Fj’ then R! induces the
universal partition on {Fi}' i.e. A11 functional relations
are in only one class.

Lemma 3.2.1

 

A partition on {F1} induces a partition on 0.
Proof:
If Fi and Fj are in different classes, then it follows

that either Fi i Fj or that there exists no PR in F1

such that Fi ~ Fk' Therefore, let C1 be the set of attributes
in El; then Ci FICj = O for i i j, and U C1 = 0. Hence,

[Ci] is a partition on O.

3.3 Procedures for Obtaining Functional Partition

Section 3.2 showed that the binary operation GU) is an
equivalence relation on the original set of functional relations.
This section presents both a graphical method, suitable for hand
computation, and an algebraic method, suitable for hand computation
and computer implementation, for producing this partition.

3.3.1 Graphical Method

F.

Given {F1}. Consider m points labeled F1,F2,..., m,

reSpectively. For each i and each j, if Fi ~ F , then draw a

J

38

line or arc between F1 and F , otherwise do not join F1 and Fj.

J
The number of connected subgraphs that results, is the number of
equivalence classes in {F1}; Fi and Fj are in the same class iff
they are in the same connected subgraph.

Whenever each point F1 is an isolated point, then the
result is the identity partition. However, if all the points F1
are on the same connected graph, then one gets the universal parti-

tion.

Example 3.3.1 .1

 

Let Q = ABCDEFGHKLM; Consider the following functional
relations:

F 0 ....) ’ ° -D . F
1. AB C, F2 3. BD FG,

Note that Elie F3 and Elie F4 since

(AB u C) FI(BD u FG) # ¢> and (AB u C) rI(AB u C) i a

. .4 °F ' —-I'F° —-I.
. H KL, 4. AB C, 5. KM L

reSpectively. Similarly, F2 st5. Applying the graphical method

gives the following two connected subgraphs G1 and G2.

 

Therefore, there are two equivalence classes

F1 = [F1,F3,F4] and F2 = {F2,FS] that correSpond to G and G

1 2
reSpectively. Note that F3¢¥ F4 although there is no line joining
F3 and F4. However, the subgraph G1 is connected and hence
F1,F3 and F4 are in the same equivalence class. Also, the parti-
tion on {F1} induces a partition on 0, namely, if 01 is taken
to be the set of attributes mentioned in F1,F3 and F4, then
01 = ABCDEFG and 02 = HKLM which are mentioned in F2 and F5;

0102 is a partition on Q.

39

3.3.2 Alggbraic Method

 

Given {F1}, form the implication matrix, P, Such that each

row, P corresponds to a functional relation F . Form the direct

i’ i
incidence matrix, P', from P by letting each entry in P' be the
logical vector inner product of rows of P; P' = [p£,] where
#(C) J
I = I = = . .
pij pji (Pi’Pj) and (Pi’Pj) kil pik pjk Where is
logical product and E is logical sum. P' is a symmetric matrix

of 0's and 1'8 with all l's along the diagonal and whose rows and
columns are labeled F1,F

*
distinct row of the transitive closure (P') of P' define a

2,...,Fm respectively. The ones in each

correSponding equivalence class under Ra
Example 3.3.2.1
Applying the algebraic method to the functional relations
in Example 3.3.1.1 gives:
A. B C D E F G H K. L M

F 1 1 1 0 0 0 0 0 0 0 0

F200000001110
P=F301010110000
FalOlOIOOOOOO
F500000000111
F1F2F3F4FS
F11011O
F201001
(F')*=F3loiio

Therefore, F1 = {F1,F3aF4I and F2 = {F2:F5]-

40

With suitable rearrangements of columns and rows in P,
the implication matrix is a rectangular matrix having non-zero
rectangular submatrices along its main diagonal with the remaining
elements equal to zero.

The following section will Show that the keys of R can be
determined from the non-zero submatrices in the implication matrix,
and so storage requirements are reduced because the zero submatrices

would not have to be stored.

3.4 Properties of Functional Partition

A partition on {F1} induces a partition on 0. Let
0102 ... an be such a partition. The relation R, defined in O,

can be projected into subrelations Si’ defined in 01, i = 1,2,...,n,

L1 j —oRi j be the set of functional relations

Li a Ri in R that hold in Sj. The following lemma and theorem

reSpectively. Let

Show that the keys of R can be obtained from the keys of Si’

i = 1,2,...,n.

Lemma 3.4.1

 

Let Li j 4 R1 j be the set of functional relations in

Sj. If Li —IRi is a functional relation in R where

j’ then there exists Ki,j C'L1

such that K. . C Q and K a R .
1’3 j 1:3 1L]

Li c Q .and R1 j = Ri : Q

P_r_9_f.

The proof follows from the fundamental properties, Pl-PS,
of the functional relations. The applications of reflexivity,
transitivity, projectivity, and additivity on the functional
relations in S result only in functional relations whose

J
attributes belong to Qj. However, applying the augmentation

41

prOperty with attributes not belonging to DJ results in a
functional relation L1 «iRi in R, where Li C Q and

Ri j = Ri C Qj. Therefore, there must exist a subset

Ki,j Cﬂj Such that K1,] #Ri’j.
The following theorem is the main result of this section.
It shows that the keys of R can be determined from subsets of the
original set of functional relations. Consequently, the major
goal of functional partition, as stated at the beginning of Section
3.2, is achieved.
Theorem 3.4.1
If 0 = 0102 ... on, and if K C Q, then the necessary and
sufficient conditions for K to be a key of R are
K = U Ki where Ki # O and K1 C Di such that K1 is a
key of Si°

Proof:

Necessapy condition: Let K be a key of R. Since

0102 ... an is a partition on 0,
K = (K n 01) U (K n 02) U...U (K h on). Let Ki = K O 01.

One must show that, for each i, Ki # ¢, and that Ki is the

key of 51' Suppose that, for some 1, K1 = On then

" =K= - —. .

Ki K1UK2U UK1_1JK1+1U UKn O 01 But
K] has no subset that belongs to 01, and so, it contradicts

1
Lemma 3.4.1. Hence, for each 1, K1 1‘ O. Therefore,

K = K1 U K2 U...U K.n a Q a mi and so, by Lemma 3.4.1, there

must exist a subset Mi C K such that M1 C Oi and MR

But Ki C K is the only subset in K that belongs to 0i;

hence, Mi C K

#01.

i and Mi 4 Oi. Therefore,

42

M = M1 U M2 U...U M5 4 Q by additivity, and so M C K which
contradicts the hypothesis that K is a key of R. unless
M1 = Ki' Therefore, K1 i'

Sufficient condition: If, for each i, K

is a key of S

C 01 is a key

 

i
of Si’ then K = U K114 0 by additivity. It remains only to
Show that there is no prOper subset of K that implies 0.
Suppose M<: K and M14 Q, then by the same procedure as
above, M = U Mi where M1 C K1 and M1 i O. Again applying
the same reasoning as above gives Mi a 01. But, Mi A Di
contradicts the fact that K1 is a key of Si’ unless

M1 = Ki. Therefore K.= U Ki is a key of R.

Corollary 3.4.1.1

 

If the connected functional relations induce the identity
partition on {F1}, then K = U L1 is the only key of R,
.P_r___f

The proof follows from Theorem 3.4.1 where each
Fi: Li “’Ri’ i = 1,2,...,m, is in an equivalence class by
itself. Let Oi = (Li U R1), and project R. into Si[Qi].

L1 is the only key of 81’ Since there is only one functional
relation in each Si; hence, U Li is the only key of R
defined in Q = (1102 (1m.

Corollary 3.4.1.2

 

If no functional relations are given in R defined in Q,
then 0 is the only key of R.
ii_i

Let Q = AlAZ ... An' Since no functional relations are
given in R, then it follows from the reflexive prOperty that

.43

F1: A1 4 A1, i = 1,2,...,n, are the only given functional re-
lations in Rs Therefore there exists an identity partition
on {F1} and, by Corollary 3.4.1.1, the only key of R is

K = U A1 = Q.

If A and B are sets, then the set difference is defined
as A - B = {x‘x 6 A and x f B]. The following corollary shows
that every attribute in R, that is not mentioned in any functional
relation in R, belongs to every key of R.

Corollagy 3.4.1.3
Let R. be a relation in Q, and suppose that the functional

relations in R are defined in Q <: 0. Let Sl[01] be the

1
projection of R in 01. If K1 is a key of 81, then
K = K1 U (Q - 01) is a key of R.
iisci:

Let SZ[Q - 91] be the projection of R in (Q - 01).

It is clear that no functional relations are given in 82,
except those by reflexivity. Therefore, by Corollary 3.4.1.2,
(Q - 01) is the only key of S2. Also, since 01 and

(Q - 01) are disjoint such that Q = 01 U (Q - 01), then,

by Theorem 3.4.1, K = K1 U (Q ' 01) is a key of R.

The preceding demonstrates that to find the keys of a re-
lation R, defined in 0, one needs only be concerned with the
connected functional relations in Q. So, the first step for find-
ing the keys is the possible projection of R into subrelations
Si’ i = 1,2,...,n, where, for each i, the functional relations in
Si are connected. This projection is not always possible, namely,

whenever the connected functional relations induce the universal

partition on {F,].
i

44

Therefore, without loss of generality, one can consider a
relation R, defined in 0, such that the functional relations in
R are all connected, and that Q = U(Li U R1) which substantiates
the statement made at the beginning of Section 3.2.

Example 3.4.1

Consider the functional relations given in Example 3.3.1.1.
Applying Algorithm A1 gives only one key, ABDEHM, for the
relation R defined in Q = ABCDEFGHKLM. But, the same
example Showed that {F1} can be partitioned into two equi-
valence classes that induced the partition 0102 on Q, where

Q = ABCDEFG and 02 = HKLM. Project R into Sl[01] and

1

SZ[QZ]; the functional relations in S1 are:
AB—oC;BD-+FG;AE-IC;
and the functional relations in 82 are:
H « KL; KM «iL.
Again, K1 = ABDE is the only key of 81’ and K2 = HM is

the only key of S Note that K = K1 U K2 = ABDEHM is the

2.
only key of R defined in Q = 01 U 02.

3.5 Properties of Prime and NonéPrime Attributes

Recall the definitions of prime and non-prime attributes
given in Chapter 1. It is essential to characterize the properties
of these attributes for they play an important role in determining
whether a relation is in second or third normal form to be studied
in Chapter 4. Also, they help characterize the keys which form the
basis of the normal forms and they are essential in the functional
reduction which is to be explained later. Again, let R. be the
relation in O, and let {F1} be the connected functional relations

in R such that ()=U(LiURi).

45

Lemma 3.5 .1

If an attribute Ak is not a member of any L then there

1’
exists no functional relation L a r in R such that AR 6 L

and AR E r unless there exists an L' C L such that Ak G L'
and L' a r.
2.1-ace:

If, for every i, AR 6 Li’ then AR 6 Rj, for some j. It
follows from properties P1 - P5 of functional relations that
Ak E L only by applying the augmentation property or by
applying additivity with Ak a Ak' In either case, there must
exist L' C L such that Ak f L' and L' a r.

Theorem 3.5.1

If an attribute Ak is not a member of any Li’ then Ak

is non-prime.
Proof:
Algorithm Al showed that every key of R is of the form

I
Li U Oi, for some i, where ai C Ti and IaiI 2 0.

If, for every 1, IdiI = 0, then L is a key and AR 6 Li

1
by hypothesis; hence, Ak is non-prime.

If, for some i, I01] # 0, then L1 U a1 a Q .and L1 F 0-

Therefore, there must exist a subset Z1 CLi U Ti U Q, Such

= _. ' -
that Zi a1 U [Zi H (Li U Ti)] and Z1 T1 Oi. Also

ai = Z1 n T], and a1 is the minimal subset of T] such that

Z —O ' - o
1 Ti 01 But if Ak E ai’ then AR 6 Z1 and

Ak f (T; - Oi) since Zi and T' - a are disjoint, and so

i i

I .
Zi Ti 01 contradicts Lemma 3.5.1 whenever Ak E 21 and

46

Ak i T; - a1. Therefore, AR 6 ai’ and so A.k E Li U a1,
which implies that Ak is non-prime.
Example 3.5 .1

Example 3.4.1 showed that ABDE and HM are the only keys
of S1 and $2 reSpectively. The attribute L is not a
member of any L1 in S2, and L is non-prime. Also, the
attributes C, F and G are not members of any L1 in 81’
and they are also non-prime.

However, the attribute K is a member of L2 in 82 and
yet K is non-prime. Therefore, a non-prime attribute may
appear as a member of some Li'

Corollary_3.5.l.l
If an attribute Ak is prime, then A is a member of some

k
L

i'
This statement is true because it is the contrapositive of

the statement in Theorem 3.5.1 which is true.

Corollaryr3.5.l.2

Every key K of R is a subset of U Li.

Proof:

k 1
therefore, by Corollary 3.5.1.1, Ak is a member of some Li

By definition, every attribute A n a key K is prime,

and hence K C U'Li.

Lemma 3.5.2

 

If an attribute Ak is not a member of any Ri’ then there

exists no functional relation L a r in R, such that Ak G L

and AR 6 r.

47

Proof:

If, for every 1, AR 2 R1, then AR 6 L , for some j.

J
Only, by using the additivity property with Ak d’Ak’ that
one gets Ak E r. But then AR 6 L. Therefore there exists
no functional relation L a r in R such that Ak f L and
Ak E r.

Theorem 3.5.2

 

If an attribute Ak is not a member of any Ri’ then Ak
is a member of every key of R.
isssiz

Let K.= Li U ai be a key of R and suppose that AR 2 K,
then R a Q a Ak’ by definition of a key and by projectivity.
However, this contradicts Lemma 3.5.2 and therefore Ak E K
for every key K of R.

Example 3.5.2

 

Let Q = ABCDEFGHK, and let the functional relations in
R be as follows:
AB -+CD; ABE «FG; CE aHK; AH aBFK.
The keys are: ABE, AEH and ACE. The attributes, A and
E, belong to every key, and they are the only attributes in
Q that do not belong to any Ri'

Corollaryg3.5.2.1

 

If an attribute AR 18 not a member of any R1, then Ak
is prime.
Proof:
By Theorem 3.5.2, Ak is a member of every key of R, hence,

by definition, Ak is prime.

48

Corollary 3.5.2.2
If an attribute Ak is non-prime, then Ak is a member
of some R. .
1
Proof:
This statement is true because it is the contrapositive of

the statement in Corollary 3.5.2.1 which is true.

Corollary 3.5.2.3

 

If every attribute in U R1 is prime, then every attribute
in 0 is prime.
Proof:

Let 01 = U Ri’ and let 02 = Q - 0 Every attribute

1.
Ak in 02 is not a member in 01 and so, by Corollary
3.5.2.1, every Ak is prime. But, 0 = 01 U 02 and, by
hypothesis, every attribute in 01 is prime; hence, every
member in O is prime.

Corollary_3.5.2.4

If every attribute in U Ri is prime, then 0 = U Li'
13%:

By Corollary 3.5.2.3, every attribute Ak in O is prime
and so, by Corollary 3.5.1.1, every Ak is a member of some
Li’ hence Q C U Li' But, by definition of Q, U Li C Q.
Therefore 0 = U Li.

Corollary_3.5.2.5

 

Let Q = U L' and let 02 = U R . If 0102 is a parti-

l i’ 1
tion on Q, then 01 is the only key of R.

Proof:

Since 0102 is a partition on Q, then every attribute in

49

01 is not a member of any R1 and every attribute in 02
is not a member of any L1. But, by Theorem 3.5.1, every

attribute in Q is non-prime and, by Theorem 3.5.2, every

2
attribute in 01 is a member of every key of R. Therefore,

if K is a key of R, then 01 C K but, by Corollary 3.5.1.2,
K C 01 and so K = 01 is the only key of R.
Example 3.5.3

Refer back to relation S1 and its functional relations in
Example 3.4.1. Let 01 = U L. = ABDE, and let 02 = U R1 = CFC,

1

then 0102 is a partition on D = Q

1 U 02 and O = ABDE is

l
the only key of SI.
Theorems 3.5.1 - 3.5.2 and their corollaries Show that the

keys are subsets of the attributes, U L that appear only on the

i,
left side of the original functional relations Li -.Ri' This is
a fundamental result and it is used in the following section to Show
that the columns -- in the implication matrix -- correSponding to

the attributes that belong to (U Ri) - (U Li) can be deleted

leaving exactly the same set of keys.

3.6 Functional Reduction

Let R be a relation in Q, and let {Fi: Li a R1} be the
connected functional relations in R such that 0 = U(Li U R1).

Let “1 = U Li’ and let 02 be the set of attributes in
0 that are not members of any Li’ so that 0102 is a partition
on 0. Project R into the subrelation Sl[01]. The functional
relations in 81’ called reduced functional relations, are derived
from the given functional relations in R by deleting the attributes

in 02. Note that, for every i, the functional relations in S1

50

are of the form Li —»R£ such that R; C Ri whenever 02 ¢ O
and R] = Ri only when Q2 = O, and so, by projectivity, the re-
duced functional relations in S1 still hold true.

The following theorem is a fundamental result which shows
that the keys of R depend on the functional relations in R, and
on 01, while, in Algorithm Al in Chapter 2, all the attributes in

Q were used to find the keys.

Theorem 3.6.1

 

Let K C Q, then K is a key of R if and only if K is
a key of 31’ the projection of R on U Li.
129215.:
Necessary condition: If K is a key of R, then, by defini-

tion, K a Q and, by projectivity, K a Q Moreover, since K

1.
is a key of R, then K is a minimal subset in Q Such that

K a Q. But, by Corollary 3.5.1.2, K is a Subset of Q1, and

so K is a minimal subset in Q1 such that K a Q. Therefore,
it remains to Show that K is a minimal subset in G5 such
that K a Q1. Let K1<: K, and suppose K1 4 Q1. But, by

additivity, 01 = U Li —'U R1 and so, by transitivity, K a U R

l i
and, by additivity, K1 .. (U Li) U (U R1) = 0. However, K1 4 Q
contradicts the hypothesis that K is a key. Therefore K is
also a key of 31.

Sufficient condition: If K is a key of 81’ then K 4 Q1,
and K,» Q1 ~»U Ri by definition and transitivity reSpectively.
Therefore K a (U Li) U (U R1) = Q by additivity. If there
exists a prOper Subset ch: K Such that K1 aiQ, then

K1 4 Ql by projectivity, which contradicts the hypothesis that

 

. Ill. A II) I III]... 11‘

51

K is a key of S Therefore, K is a key of R.

1.
Example 3.6.1

Refer back to Example 3.5.2. The keys of R are:

ABE, AEH and ACE.

Apply the method as prOposed in Theorem 3.6.1. Project
the relation R in 0 into Sl[Q1] where
01 = U'Li = ABCEH and Q2 = Q - 01 = DFGK. The reduced
functional relations in S1 are:

AB —»C; ABE ~»ABE (by reflexivity); CE «IH; AH a B.

s
The transitive closure, P , of the implication matrix, P, in

S1 is:
A B C E H
AB 1 1 l 0 0
*ABE11I_1_1_
P:

CE 00111
AH 11101

Again, the keys of S are: ABE, AEH, and ACE which are

l
exactly the same keys of R.

Functional Deletion

Note that in Example 3.6.1, the functional relation ABE a ABE

holds by reflexivity. The following definition, prOperty, and

lemmas will Show that one can delete a functional relation of the

form L1 —’L1 from {F1} without changing the set of keys.

Definition 3.7.1

A functional relation Fi of the form Li —ILi is called

a reflexive form. Otherwise, it is called a non-reflexive form.

 

52

PrOperty 3.7.1
If L a r is a derived functional relation from the given
[Pi], then L D‘Li for some 1, since the application of pro-

perties Pl - P5 will either unalter or augment L Therefore

1.
L D L..
1

Lemma 3.7.1

 

If Fi is a reflexive form, and if there exists at least
one j 9‘ i such that Lj CL1 in the original set of Li’
then any key that can be derived from Li’ can also be derived
from L..

J

Proof:

Let Lij = Li - Lj. If Li U 8i e n, then

(Lj u Lij) u B1 = Lj u (Lij u Bi) Lj u B a Q. So, there

J
is always at least one Bj C'Li U 81’ and hence
Lj U Bj CLi U Bi’ such that Lj U Bj a Q. Therefore,

Li U Bi can never be a key whenever Lj U Bj is unless
Li U 81 = Lj U Bj.

Lemma 3.7.2

 

If F1 is a reflexive form, and if for every j ¢ i, L ¢1Li

J

in the original set of L then any key that can be derived

1,

from L1 can also be derived from Lk’ for some k # 1.

Proof:

If, for every j # i, L OILi in the original set of Li’

J
*
then, in P , L. —vL. only. 80, T, = O, and T3 = O - L.
i i i i 1

since, by definition, Li 0 T1 = O. Hence, if Li U Bi 4 O,

for Bi C T], then there must exist Z CLi U 81 Such that

i

Z a T' d b . . Z =
i i’ an so, y Property 3 7 l, i Lk U Yk’ for some

 

53

I
k # i, and Lk U Yk « Ti. But, since Lk

Li U 81 a Q, then there always exists Bk C L1 U Bi Such

CLi U 81 and

that Lk U Bk CLi U Bi and Lk.U Bk « 0. Therefore,
Li U Bi can never be a key whenever Lka Bk is, unless

Theorem 3.7.1

 

Any functional relation in a reflexive form can be deleted
from {Fi} without changing the keys of a relation defined
in Q.
assiz

If F1 is a reflexive form, then, by Lemmas 3.7.1 and
3.7.2, every key that can be derived from Li can also be

derived from L for some j # 1. Therefore, one needs only

j,
show that if, for j # i, Lj U dj is a key, then Fi need

not be used to find “j

If Lj U Bj a Q, B, 2 a then there must exist
J

j,
Z, C L. U T, U _ such that Z,H T'. Hence, if Z = L. U .,
J J J Bj J j j 1 Y1

I I = d _. I
then, for Y1 C Lj U Tj’ Li U Yi U Yi Li U 8i Q Tj such

= I = I
that Bj Tj FI(Li U Y1) Tj Q (Li U Bi)’ and so, by Lemmas

3.7.1 and 3.7.2, there always exists Lk U Bk CL1 U Bi’ for

-o —o '
some k # i, Such that Lk U 8k 0 Tj' Hence, if Lj

is a key, then, by Algorithm Al, Li U 81 need not be used

U aj

since it is a Superset. Therefore, F1 can be deleted from

the original set of functional relations without changing the

keys of a relation defined in Q.

3.7.1

54

Examples

Example 3.7.1.1

 

Consider the reduced functional relations in Example 3.6.1.
Deleting the functional relation ABE 4.ABE gives the new
set of reduced functional relations:
AB-+C;CE—oH;AH—oB .
Again, the keys are: ABE, AEH, and ACE.

Example 3.7.1.2

 

Refer back to relation S1 and its functional relations

in Example 3.4.1. Projecting S into Si[ABDE] gives the

1

following reduced functional relations in Si:

AB-oAB;BD-+BD;AE-+AE

All the reduced functional relations are reflexive forms;

so, they can be deleted. Note that ' = ABDE in Si and
there are no non-reflexive form functional relations in Si,
so, by Corollary 3.4.1.2, Qi is the only key of Si. This

justifies the answers in Examples 3.4.1 and 3.5.3.

Example 3.7.1.3

 

Refer back to relation 82 and its functional relations

in Example 3.4.1. It was shown that K2 = HM is the only key

of 82. Projecting S into SéEHKM] gives the following

2

reduced functional relations in Si:

F : H a K ; F2: KM 4 KM.

Note that F2 is a reflexive form and so it can be deleted.

Therefore, F is the only functional relation left in S'

l 2

and so HM is the only key of S', and hence of 82.

55

2: KM « KM in Si is deleted, the

keys must still be found in Qé = HKM, although not all

Observe that when F

attributes in Qé appear in the reduced functional relations

after the reflexive forms have been deleted.

3.8 Chapter Summary and Remarks

This chapter dealt mainly with computational savings. It
was shown that if {Fit Li 4 R1] is the set of given functional
relations in R, then the first step for finding the keys is the
partition of {F1} into classes of connected functional relations.

Moreover, it was shown that the keys depend only on the
functional relations and the attributes in U Li. Extensive com-
putational savings can be achieved whenever U Li CIQ, where Q
is the set of all attributes in R. Also, further savings are
possible whenever a functional relation is in a reflexive form.

Finally, since the keys of R are determined from Sub—
relations of R, defined on a partition of Q, then adding or
deleting attributes to a subset of this partition will only affect
the keys of the correSponding subrelation, while the keys of the
other subrelations remain unchanged. Therefore, the concepts of
functional partition, reduction, and deletion help provide an
answer to the question of the effect on the keys of adding or

deleting attributes in R.

CHAPTER 4

NORMAL. FORMS

4.1 Introduction

 

Chapter 4 deals mainly with relations in second and third
normal forms. The definition of the relational model does not
specify procedures for determining whether a normalized relation
is in second or third normal form. This chapter provides such
procedures, as a result of analyzing in detail the mathematical

properties of a relation in second and third normal forms.

4.2 Properties of the Second Normal Form (SNF)
This section characterizes the mathematical properties of
a relation in SNF for the purpose of gaining insight into ways of

determining whether a relation is in SNF.

4.2.1 Mathematical Properties
Let R be a relation in Q. The functional relations,
F,: L. a R., i = 1,2,...,m, in R are all connected and

Q = U(Li U Ri)' Let Q, = U Li’ and let Q2 = U R1. It is assumed

1

that {F1} has more than one element. Let P and Q0 be the

0

set of prime and non-prime attributes in R reSpectively. Also,
let p and q be arbitrary elements in ‘P0 and Q0 reSpectively.

If every attribute in 02 is prime, then R is in SNF,

because, by Corollary 3.5.2.3, P = Q; hence, QQ = O. However,

0
SNF'S exist for which QQ ¥ O.
56

57

Further, it is easily shown that, if K is a key, then no
non-empty prOper Subset of K is dependent on any other distinct
non-empty proper subset of K, for otherwise a proper subset of K
would imply Q and K, would not be a key. Consequently, if Q1
is the only key of R, then Q0 = 02 i O, and so R is not in SNF,
because Li._'Ri’ for i = 1,2,...,m, are given where Li<: Q1
and R1 C Q0.

The preceding statements suggest the following two important

theorems.

Theorem 4.2.1.1

 

 

If QlQ2 is a partition on Q, then R is not in SNF.
1319133

By Corollary 3.5.2.5, Q1 is the only key of R, and so
R is not in SNF.

Theorem 4.2.1.2

 

If, for some i, Li U ”i’ IaiI 2 l, is the only key of
R, and if QQ # ¢, then R is not in SNF.

Erect:

PQ =Li U a1, Li c:Li U a1, and L1 F’Ri is given. If
Ri C'Li U a1, then Li «1R1 would imply that a proper Subset
of a key is dependent on another distinct proper Subset of
the same key, which is impossible. Therefore R1 C Q“, and
SO R is not in SNF.
As a consequence to the definition of a relation in SNF,
the following property is true.

Pryerty 4 .2 .l .1

If every key of R consists of only one attribute, then

58

R is in SNF, for none of the keys has a non-empty prOper
Subset upon which a non-prime attribute could be dependent.
Algorithm A1 showed that every key is of the form
Li U ai’ I01] 2 0. So, if Li U 01 is a key, and if, for j # i,
there exists an Lj CZLi in the original set of Li’ then
L. U a. = L. U ai is also the same key. For the remainder of

J J 1
this chapter, the following notation is used. If L U Oj is a

J
key, then, for every i # j, Li diLj in the original set of Li.

Lemma 4.2.1.1

If Li a R and if there exists no L CZL in the

i’ j 1
original set of Li’ then, for any L1 CLi and L2 C Q,
‘1 C ‘2'
M:
If L1 n L2, then by Property 3.7.1, L1 DTLk, for some

k, and so L C L1 CIL , which contradicts the hypothesis

k i

that there exists no Lk CZLi in the original set of Li'
Therefore L1 f L2.
Consequently, the following theorems lead to Special cases
in the algorithm for determining whether a relation is in SNF.

Theorem 4.2.1.3

 

If every key of R is one of the original Li’ then R
is in SNF.
Proof:

If L1 is a key, then, by the construction of Li and

by Lemma 4.2.1.1, for any L1 CZL and L2 C Q, Ll flbz.

1
So, if q 6 Q0, then choose L2 = q, and so L1 f q. There-

fore, R is in SNF.

 

59

Theorem,4.2.l.4

 

If every key of R is of the form ‘Li U a1, IaiI g_l, and
if, for every j Such that Lj CZL1 U ai, Lj f q, then R is
in SNF.

P_.___f.

If LClLi U ai’ and if L a r is a derived functional
relation from the given set of functional relations, then,
by Property 3.7.1, L D'L

for some k, and L CLCLan/i.

k’ k

But Li U a1 is a key and so, by the construction of a key,

i or L = Lk’ for

some k # i, and so, if q 6 Q0 .and q é r, then R is in

 

Lk<tLi,hence LQfLi. Therefore,L=L

SNF.
The following theorem and corollary are fundamental in
determining whether a relation is in SNF.

Theorem 4.2.1.5

 

To determine whether a relation, R, is in SNF, one needs

only consider the functional relations, L # Ti’ after form-

1
ing the transitive closure of the implication matrix P, Such
that Li C Pa.
33192:

If R is not in SNF, then there exists at least one key,
say Li U ai’ such that LCZLi U a1 and L a q, where

q 6 Q0. 30, by PrOperty 3.7.1, L = L U 8 , for some j.

J J
But L C'PQ, since L is a subset of a key, and so
L. g C P O
J L 0
Let tj be the set of all attributes in Q Such that

L = Lj U Bj ” tj' To find tj’ the transitive closure is

60

formed on Lj U (a and the original set of functional rela-
tions in R. Denote the transitive closure by the successive

ste S, L. , a t , L U , a t ,,...,L U a t , = t
p JUBJ B] 2] B

1J J J J n] J.
If, for some k, Lk C PD, and if, for some i,

L C‘L U , U t,,, then L C'L. U 8,, Since L U B C P ,
kJBJlJ kJJ iii)

and so tij [ILk # O; hence, there must exist q 6 Q0 such

that q E (tij FILk). Therefore, if Lk C P then, for some

0 ,
i, L C‘L, U 8,IJ t,,, unless = L. U . a , but then R
R J J 13 I“ J 8J q

would not be in SNF, Since L a q, and so Lk is not needed.

 

The following corollary provides a Sufficient condition for a
relation to be in SNF. However, Example 4.3.2.1 will Show a relation

which is in SNF but which does not satisfy this sufficient condition.

Corollary 4.2.1.5.1

 

Let K be an arbitrary key of R. If, for every i Such

that Li C P and Li # K,‘Li f»q, then R is in SNF.

0

Proof:
To determine whether a relation, R, is in SNF, then by
Theorem 4.2.1.5, one considers only the functional relations,

*
Li H T1 in P , Such that Li C'PO. By the hypothesis of
the corollary, if Li C‘PQ, then Li f q unless Li is a

key. But, if L1 is a key, then, by the construction of a
key, for every j ¥ i, Lj C Li. Also, for i # k, if Li
and Lk.U ak’ Auk] 2 l, are keys of R, and if L = Lk U Bk,

for Bk C a then Li ¢ Lk U Tk U Bk’ for otherWise Lk U Bk

k,

would imply Q, since L is a key, and Lk U Gk would not

i
be a key. Therefore, if Lk U Bk a q, then, by taking the

transitive closure on L U Bk and L, a T, such that
k i i

61

Li C‘Pﬂ, there must exist L C P for j # k and L f K,

J 0’
such that Lj ~rq. However, Lj a q, for 'L

J

# K, contradicts

J
the hypothesis, and so Lk U Bk f q. Therefore R is in SNF.

4.2.2 Examples and Remarks

 

This section provides examples and states remarks that are
prOperties of relations in SNF. It is always assumed that
Q = U(Li U Ri) where Li —IRi, i = 1,2,...,m, are the given func-
tional relations in R.

Example 4.2.2.1

 

ﬂ-HhCEdD;MD«B.
The keys are: AB U E; AD U E; CE U A. Therefore, PQ = Q
and Q0 = O. So, by definition, R is in SNF.

Example 4.2.2.2

 

AB «CDE; AE «BFG.
The keys are: AB; AE. Therefore,‘PO = ABE and Q0 = CDFG.
So, by Theorem 4.2.1.3, R is in SNF.
The two previous examples Show that if R is in SNF, then
Q may or may not be equal to U Li. Also, if R is in SNF, then
it is not necessarily true that every attribute in U R1 is prime.

Example 4.2.2.3

 

AB 41D; D «.A; C 4 AD.

The key is C U B. Therefore, P = BC and Qﬂ = AD, So,

0
by Theorem 4.2.1.4, R is not in SNF.

In this example, Q U Li and R is not in SNF, while

in Example 4.2.2.1, Q U Li and R was shown to be in SNF.

Therefore, if Q = U Li’ then R may or may not be in SNF.

62

Example 4.2.2.4

AB-oCD;AC «BEF; DF-oAE;BE—IDC.

The keys are: AB; AC; BE U F; DF U B; DF U C. Therefore,
P0 = Q and Q0 = ¢. So, by definition, R is in SNF.
Example 4.2.2.5

 

AB-aCD;AE—oFG.
The key is: AB U E or AE U B. Therefore, PQ = ABE
and QQ = CDFG. So, by Theorem 4.2.1.4, R is not in SNF.
By Examples 4.2.2.4 and 4.2.2.5, if every attribute in
U L. is prime, then R may or may not be in SNF.

Example 4.2.2.6

 

A.-IB; B —+C.

The key is: A. Therefore, by Theorem 4.2.1.3, R is in
SNF.

By Examples 4.2.2.4 and 4.2.2.6, one sees that if R is

in SNF, then an attribute in U Li may or may not be prime.

4.3 Procedure for SNF

 

Section 4.3.1 presents an algorithm (Algorithm A2) which

determines whether a relation is in SNF and Section 4.3.2 gives

examples. Algorithm A2 uses the notation which is introduced next.

Let LP = [L1ILi C P0 and Li —oRi is a given functional

relation], and let {C = {LiILi U a1 is a key and IdiI 2 l]
which implies that L C L . Also, let IL I = n , and let

c p P P
ILCI = nc. ConSider {Fi: Li a Ri] such that Li 6 LP, and re-
number them, if necessary, F1,F2,...,F , np g;m where m is the

n

total number of elements in the original set, {F1}, of functional

 

63

relations in R. That is, if Lk E LP, then np+l S.k 5_m. Also,

if Li 6 LE, and if L 6 (L1) - Lt), then i < j.

J

In Algorithm A2, the transitive closures are taken over

*
{F1}, 1 = 1,2,...,np, except when forming P , they are taken over
[F1], i = 1,2,...,m, in R. Moreover, since more than one key,
Li U a1, might be derived from the same Li’ every statement in

Algorithm A2 that includes ai must be executed once for each
distinct key, Li U ai’ derived from the same Li'

4.3.1 Algorithm A2

 

SNF e-TRUE

Apply Algorithm Al to get the keys then find Q0, PO, LP,

and L .

c
_f_ QQ = ¢ v LC=¢ then terminate the algorithm.
ile i < n do

_ c ._

§

 

If. Li a q A q 6 Q0 then SNF w-FALSE, terminate the
algorithm.

nd

(D

If for every 1 such that Li U 01 is a key, IaiI g_l then
terminate the algorithm.

While 1 _<_nc do

While jgnp do

I_f_ (j#i)/\(L c: Liuriuoimmija. rhi#¢)/\(oij#oi)

J J
then form the transitive closure on L1 U aij'
I£_ L1 U aij a q then SNF e-FALSE, terminate
the algorithm.

end

64

4.3.2 Examples

Example 4.3.2.1
AB-oCD;CE—IAF;DF-»BE;BCF—ox.
The keys are: AB U E; AB U F; CE U B; CE U D; DF U A;
DF U C. Therefore, P0 = ABCDEF, QQ = X, Lp = {AB,CE,DF,BCF],
and LC = {AB,CE,DF}.
By Theorem 4.2.1.4, the relation is in SNF and this result
can also be easily checked by Algorithm A2.
Example 4.3.2.2
ABC -+DEG; AB-rCF;CD-oE;EG-+AC;BD—oF.

The keys are: AB; EG U B; CD U BG. Therefore,

'11
II

ABCDEG, Qn = F, LP = [ABC,AB,CD,BC,BD} and

C).
II

{CD,EG,BD].
Note that BD 6 LC and BD 4 F, F 6 Q9; hence, the rela-
tion is not in SNF.
Example 4.3.2.3
AB —ICD; CEY 4 AF; DFY -IBE; BCF « X.

The keys are: AB U EY; AB U FY; CEY U B; CEY U D; DFY U A;

DFY u C. Therefore P.) = ABCDEFY, QQ = x, LP = {AB,CBY,DFY,BCF},

and L = [AB,CEY,DFY].
c
Applying the last dp_statement in Algorithm A2 gives:
BCF €Lp,BCFCABUCDUFY,aij =BCF QFY=FII=O, and
FI‘FY.
Forming the transitive closure on AB U F gives:

AB U P —oX, X 6 Q0. But ABFY is a key and ABF CZABFY.

Therefore, the relation is not in SNF.

   
  
 

E’ .

f,

{3

I L. .; smash

65

4.4 Properties of the Third Normal Form (TNF)

Section 4.4.1 characterizes the mathematical prOperties of
a relation in TNF, and Section 4.4.2 gives examples to support the
theory; this leads to Algorithm A3 in Section 4.5 for determining

whether a relation in SNF is also in TNF.

4.4.1 Mathematical Properties

This section begins by identifying a fundamental prOperty
of keys of a relation, whose consequence is a necessary and suf-
ficient condition, in Theorem 4.4.1.2, for a relation which is in
second normal form to also be in third normal form.

Theorem 4.4.1.1

Any two keys, K1 and K2, of R, are fully dependent on
each other. i.e. K1 a K2.
iicioiz

K1 4 K2 since K1 is a key. If K1 $ K2, then there

must exist a proper Subset 81‘: K1

then S1 4 K2 4 Q, by transitivity and because K2 is a key.

such that S1 a K2, but

However, S1 a Q contradicts the hypothesis that K1 is a key.
Therefore, K1 a K2.
Example 4.4.1 .1

Refer back to Example 4.2.2.1. All the keys are fully
dependent on each other. Also, by definition, a relation in
TNF is also in SNF; so, every non-prime attribute is fully
dependent on each key. R is in TNF since QQ = O.

In this example, ABE and ADE are keys and AD 4 B; the prime

attribute B is not fully dependent on the key ADE. So, a prime

attribute of a relation in TNF might not be fully dependent on each

key. This proved to be a weakness in the definitions of SNF and TNF,

66

which led Kent [48] to give alternative definitions for SNF and

TNF. This point will be discussed in more detail in Chapter 5.

Example 4.4.1.2
AB—ICD;BE-+AF; DE-IBC;BD—+A.
The keys are: BE; DE. Therefore P0 = BDE and Q0 = ACF.
Note that BE e'DE. By Theorem 4.2.1.3, R is in SNF. Later,
it can be shown that R is also in TNF.
The thing to observe here is that ED is not disjoint
from the keys BE and DE, yet BD «»A and A is non-prime. This
will prove to be a weakness in the definitions of TNF, as proposed
by Codd and Kent; Chapter 5 analyzes this point more carefully.
The preceding theorem and examples, in this section, suggest
the following important lemma.
Lemma 4.4.1.1
If R is in TNF, then every non-prime attribute is dependent
on itself, on the keys and on subsets that always have a non-
empty intersection with the set of all keys of R.
area:
Let q1 and q2 be arbitrary non-prime attributes in Q0.
If K is a key, then, by definition, K a q1 and K a q2.
But, R is also in SNF; hence, K = q1 and K.= q2. Also,
q1 a ql’ by reflexivity. If q1 a q2, then q1,q2 and K
are disjoint and K a q1 a q2’ by transitivity. However, this
contradicts the hypothesis that R is in TNF. Therefore
q1 ﬁ»q2. Finally, Example 4.4.1.2 showed that there may exist

a Subset, S, that has a non-empty intersection with the set

67

of all keys of R, such that S a q1 or S « q2' Any other
dependence contradicts the definition of TNF.
Recall, in Chapter 1, the definition of a relation in TNF.
The definition of transitive dependence allows a relation, R, in
TNF to have multiple keys, because if K1 and K2 are any two dis-

joint keys of R, then, for any q 6 Q0, K « K2 a q. But,

1

K2 4 K1. So, q is non-transitively dependent on K1. Consequently,
the following important theorem gives necessary and sufficient con-

ditions for a relation in SNF to also be in TNF.

Theorem 4.4.1.2

 

A relation, R, in SNF is in TNF if and only if, for any
two disjoint non-empty proper subsets, S1 and 82 in Qq,

A
S1 . $2 and $2 f>81.
P_r_9_0_f_=
Necessary condition: If R is in TNF, then, by definition,

every non-prime attribute in R is non-transitively dependent

on each key K of R. If S a 32’ then K a S a 32’ by

l l
transitivity and the fact that K is a key. But K, S1 and
82 are mutually disjoint; so, K 4 S1 4'82 contradicts the

hypothesis that R is in TNF. Therefore, 81 A 82 and

similarly 82 f 81'

Sufficient condition: If 81 f 82, and if 82 f 81, then, as

in Lemma 4.4.1.1, S1 and $2

on keys, and on subsets that always have a non-empty inter-

are only dependent on themselves,

section with the set of all keys of R. However, in all these
cases, it is easily Shown, by the definition of transitive

dependence, that S1 and 82 are non-transitively dependent

68

on each key of R. Therefore, R is in TNF.
Corollary 4.4.1.2.1

If every attribute in U R1 is prime, then R is in TNF.
P_r_2i

If U Ri C P then, by Corollary 3.5.2.3, P9 = Q and

0’
Q0 = O. Therefore, since there are no non-prime attributes
in R, then, by definition, R is in TNF.

Corollary 4.4.1.2.2

If \QOA = 1, then a relation, R, in SNF is also in TNF.

Proof:

1r IQ

Q] 1, then there are no disjoint proper subsets

in Q0, and so, by Theorem 4.4.1.2, R is in TNF.

O

The following theorem is very important; it is basic to the
proof of the fundamental result in this section. The result is
another set of necessary and sufficient conditions for a relation
in SNF to also be in TNF.

Theorem 4.4.1.3
If S C Q0, then, for every p E P , S f p.

0

Proof:
Suppose S a p. But, p is prime; so, it is a member of

at least one key, say, Li U ai’ IaiI 2 0. Let L = (Li U oi) ' P:

then LCZLi U ai and L is not a key, since L U a. is.

i 1
Hence, S U L a p U L,= Li U 01 a Q, by additivity and trans-

itivity respectively. But S U L,4 Q, and S U L is not a

Superset of any key, since S FIFO = O. Therefore, S U L

is a key and so 8 C PO. But, this contradicts the hypothesis

that S C Q0, and so S 7» p.

69

The following result is fundamental and, as Section 4.5
will Show, it simplifies greatly the algorithm for determining
whether a relation in SNF is also in TNF.

Theorem 4.4.1.4

 

A relation, R, in SNF, is also in TNF if and only if, for
every 1, Li FIPQ # O.

Proof:

Necessary condition: Suppose the relation, R, is in TNF.
If, for some 1, Li FIPO = O, then Li C Q0; so, by Theorem
4.4.1.3, Li f p for every p 6 P0. Hence, for some i,

Li U Ri C.QO such that L1 —'Ri is one of the original
functional relations in R, for otherwise Ri FIFO # O;

so, for some p E R1 QPO,Li a R a p by projectivity and

i

transitivity reSpectively. However, Li a p contradicts

.4. o . o c . 3
Theorem 4 1 3, Since Li C QO Therefore, Ri Q0, so
Li U Ri C Q0. But, by Theorem 4.4.1.2, R would not be in
TNF, Since Li —IRi, Li QRi = O, and Li U Ri C Q0, which
contradicts the hypothesis that R is in TNF. Therefore,
for every i, Li FIP0 # O.

Sufficient condition: Suppose, for every i, L1 TIPQ # O.

If IQQI 2 2, then let S1 and 82 be any two disjoint non-

empty prOper subsets in Q0. If S1 4 82, then, by PrOperty

3.7.1, and for some i, S D'Li. Hence, Li C S1 C Q0; so

1

‘Li H PQ = O, Since PO 0 Q0 = O. But, Li (IF = O is a con-

0
tradiction since, by hypothesis, Li FIFO # O for every i.

Therefore, S1 f 82; so, by Theorem 4.4.1.2, R is also in TNF.

70

The previous theorem fully characterizes a relation in TNF.
However, it will be easier, in Algorithm A3 in Section 4.5, to use
the contrapositive form of Theorem 4.4.1.4 to determine whether a
relation in SNF is also in TNF. The following important corollary
gives this contrapositive form.

Corollaryi4.4.l.4.1

 

A relation, R, in SNF, is not in TNF if and only if, for
some i L. C .
,lQQ
Proof:
This statement is true because it is the contrapositive

of the statement in Theorem 4.4.1.4 which is true.

4.4.2 Examples

In the following examples, each relation is in SNF. The
problem is to determine those which are also in TNF.
Example 4.4.2 .1
Refer back to Example 4.3.2.1. Note that QQ = X, and so
IQQI = 1. Therefore, by Corollary 4.4.1.2.2, R is also in TNF.

Example 4.4.2.2

 

Refer back to Example 4.2.2.2. Note that P0 = ABE and
QQ = CDFG. For every i, Li OPQ # O; so, by Theorem 4.4.1.4,
R is also in TNF.

Example 4.4.2.3

 

Refer back to Example 4.2.2.4. Note that Q0 = O; so, by
definition, R is also in TNF.
Example 4.4.2.4

AB aCDE; AE .. BFG; CD _. FG.

The keys are: AB, AE. Therefore, P0 = ABE and QQ = CDFG.

71

By Theorem 4.2.1.3, R is in SNF. But, CD is one of the
original Li’ and CD C Q0; therefore, by Corollary 4.4.1.4.1,
R is not in TNF.
The preceding theory and examples lead to the following
simple algorithm which determines whether a relation, in SNF, is

also in TNF.

4.5 Procedure for TNF

 

The following algorithm is a direct consequence of Theorem
4.4.1.4 and Corollary 4.4.1.4.1.
Algorithm A3
1. Apply Algorithm A2 to determine whether the relation
'3 ' SNF. F' d P and .
1 in in 0 Q0

2. If R is not in SNF, then go to 5. Otherwise, go to 3.

3. If, for some i, L

i C QQ’ then go to 5. Otherwise, go

to 4.
4. The relation is in TNF. Stop.

5. The relation is not in TNF. StOp.

4.6 Chapper Summary and Remarks

 

Chapter 4 gives algorithms for determining whether a rela-
tion is in second or third normal form. The correctness of the
algorithms, A2 and A3, derives from an extensive mathematical analysis
preceding each algorithm. These algorithms are suitable for hand
computation as well as computer implementation, as was Algorithm A1
in Chapter 2. However, none of these algorithms have been pro-

grammed.

72

It is clear that Algorithm Al is the building block for the
algorithms, A2 and A3, given in this chapter. This is expected
since the keys, as found by Algorithm Al, form the basis of a rela-

tion in second or third normal form.

CHAPTER 5

SUMMARY AND CONCLUSIONS

5.1 Conclusions

 

This thesis provides a detailed mathematical analysis of
the basic concepts of the relational model, as originally proposed

by Codd. Three algorithms are presented. Their correctness derives

 

from the basic properties of the functional relations, as given by
Armstrong [2], and the extensive theoretical background provided
in this thesis. The concept of the implication matrix, as applied
to the functional relations in Algorithm Al to find the keys of a
relation, has also proved to be fruitful in providing a useful
mathematical theory and algorithms (A2 and A3 in Chapter 4) for
determining whether a relation is in second or third normal form.
Algorithms comparable to A2 and A3 do not exist anywhere in the
literature. The mathematical prOperties of a relation in second
or third normal form and the three Suggested algorithms are the
main contributions of this thesis. The data base administrator may
use these results to determine whether a relation is in either of
these normal forms.

Although this thesis provides three algorithms, no actual
programming was undertaken because performance criteria are not
central to the purpose of this thesis which is to develop and
Show the utility of the mathematical foundation.

73

74

Many problems remain to be solved in the relational model.
This thesis concentrated on the normal forms, as prOposed by Codd,
only because no substitutes could be Suggested before their full
mathematical properties were well studied. Now that those pro-
perties have been examined in this thesis, an examination of the
underlying definitions of normal forms will prove fruitful in
determining whether the motivations for the second and third normal
forms have been realized. This points to a new and rich area for

further research.

5.2 Alternative Definitions for Normal Forms

As was stated in Chapter 1, the objective of the normal
forms is to minimize the update, deletion, and insertion anomalies.
For this reason, Codd introduced the concept of functional dependence
and proposed the definitions of first, second, and third normal
forms. The examples in Chapter 1 showed that a relation, R, stored
in a first normal form, is highly redundant and inefficient to main-
tain. Moreover, projecting a relation, in first normal form, into
a collection of relations in second or third normal form reduces
the undesirable anomalies mentioned by Codd.

A relation is a collection of facts, and, roughly Speaking,
a dependence, A.-oB, correSponds to a fact. The basic motivation
behind second normal form is that a fact stored in a relation

Should be dependent on the whole key for the relation. That is,

 

75

if A a B, then A and B Should not be stored in the same re-
lation, whenever A is a proper subset of a key, otherwise this

fact is stored more than once and if B should change, then this
update should be done in more than one place. However, the second
normal form, as defined by Codd, does not totally meet this objective,
as can easily be seen from the following example.

Example 5 .2 .1

 

AB-+CD;CD—0AB; A-rC.
The keys are: AB, CD, and A U D. Therefore, PC}: ABCD

= O, the relation is in SNF. The

 

and Q0 = O. Since QQ
relation, R, might look like this:
R: A B C D
1 1 l l
l 2 l 2
l 3 l 3
l 4 l 4
Suppose now, one wishes to update the fact, A.-oC by chang-
ing the values of A or C. It is clear that, as the value of C
changes, then this change occurs in more than one tuple.
The definition of a relation, R, in SNF, as proposed by
Codd, is restricted to non-prime attributes to be fully dependent
on each key of R. But, as was shown in Example 5.2.1, C is prime
and C is not fully dependent on the key AB since A —.C. This
shortcoming led Kent [48] to Suggest the following alternative de-
finition of a relation in SNF; it will be referred to as KSNF.

Definition 5.2.1

 

A relation in first normal form is in KSNF if every

76

attribute in the complement of a key is fully dependent on
that key.

It is important to note that, by definition, a relation in
KSNF is also in SNF. Based on this alternative definition, it is
easy to see that the relation, R, in Example 5.2.1, is not in KSNF
since A.—oC. Project R into R1[AC] and R2[ABD]. It is clear
that R1 and R2 are in KSNF, and so in SNF. Note that the fact,
A 41C, is stored only once in the relation R1. Therefore, the
alternative definition, as suggested by Kent, is an improvement, in
terms of update minimization, over COdd's definition of a relation
in SNF.

The second normal form, as prOposed by Codd, is only the
first step in eliminating certain redundancies in information storage.
Other dependencies may exist in a relation, and Codd proposed the
third normal form to eliminate information about an entity which
can be derived from other attributes of the entity. Example 1.3.3
already elaborated on this concept. Moreover, the third normal
form is restricted to the second normal form and to non-prime
attributes to be non-transitively dependent on each key of the re-
lation. However, Example 5.2.1 gave the reasons for Kent's
alternative definition for the second normal form; so, for similar
reasons, Kent suggested the following alternative definition for
the third normal form. It will be referred to as KTNF.

Definition 5.2.2

 

A relation in KSNF is in KTNF, if every attribute in the

complement of a key is non-transitively dependent on that key.

 

77

Again, it is important to note that a relation in KTNF is
in TNF, but the converse may not be true.
It is clear, from the definition of KTNF, that every
attribute in the complement of a key is fully dependent on itself
and on that key. But, Lemma 4.4.1 showed that an attribute may also
be dependent on Subsets that always have a non-empty intersection
with the set of all keys of the relation. Therefore, although the
conditions for a relation to be in KTNF are more stringent than those
of a relation in TNF, it is still possible to have facts stored more
than once in a relation, as the following example will Show.
Examﬂe 5.2.2
AB aICD; AC a D
The key is: AB. Therefore, P0 = AB and Q0 = CD. By
Theorem 4.2.1.3, the relation is in SNF, and, by Theorem 4.4.1.4,
it is also in TNF. Moreover, by definitions, the relation
is also in KSNF and KTNF. The relation, R, might look like

this:

1 3 l 1
Suppose now, one wishes to update the fact AC a D. It is
clear that as the value of D changes, then this change occurs in
more than one tuple.
It is important to note that the fact, AB‘a D, is a con-

sequence, by transitivity, of the facts AB —IC and AC ~ID. But

78

the objective of the third normal forms, TNF and KTNF, is not

to store information like AB 4 D in the above example. There-
fore, Example 5.2.2 points to a weakness in the statements of the
third normal forms, as proposed by Codd and Kent.

The ultimate goal, in the relational model, is to have a
relation in TNF or KTNF; so, this thesis proposes only one alternative
definition that is more restrictive than the definitions of Codd and
Kent. The basic motivation for the following alternative definition
has been explained in Example 5.2.2.

Definition 5.2.3

A relation, R, in first normal form is in canonical normal

 

form (CNF) if, for any two distinct, but not necessarily dis-

joint, non-empty Subsets, $1 and 32’ in the set of all

attributes in R, S 82, then S is a key of R.

1" 1
It follows, from the definitions, that a relation in CNF
is also in KTNF, and so in TNF. Also, note that the relation, R,

in Example 5.2.2, is not in CNF since AC q.D and AC is not a key

of 9;. Project in into R1[ABC] and R2[ACD]. Note that 321

and R2 are in CNF. MOreover, R is the natural join of R1 and

n that is R[ABCD] = R1[ABC] * R2[ACD], because AC .. D is the

2,

Sufficient condition for R = R1 * R to be true; so, no essential

2
information is lost, because the original functional relations in
R are preserved. Since AB «>C holds in R1, and AC a’D holds
in R2, by transitivity, AB a D holds in R. Therefore, the de-

finition of CNF, as suggested previously, is an improvement, in

terms of storage requirements, over the definitions of Codd and Kent.

 

79

Further, the following example Should help explain the
objectives of the Suggested canonical normal form.
Example 5.2.3
AB «D; BC «A; CD «B; BD«A.
The keys are: BC, CD. Therefore P0 = BCD and QQ = A.
It is easy to Show, by definition, that the relation is in
KTNF, and so in TNF. The relation, R, might look like this:
R: A B C D
1 1 l l
l l 2 1
l l 3 l
l 2 4 2
The fact, BC —.Aq is a consequence, by transitivity, of the
facts BC « D, Since BC is a key of R, and BD «.A. But since
R is in KTNF, the objective of Kent's alternative definition has
not been completely met. However, R is not in CNF, because
AB « D and BD «.A but neither AB nor ED is a key of R.
Project a into R1[ABD] and R2[BCD]. The functional

relations in R1 are: AB «ID and BD —oAq while in R they are:

2,
BC « D and CD «18. Therefore, R1 and R2 are in CNF. Mbreover
R[ABCD] = R1[ABD] * R2[BCD] and the original functional relations
in R are preserved; so no essential information has been lost.

Examples 5.2.2 and 5.2.3 showed that the canonical normal
form , as suggested in this thesis, meets better the original
objectives of TNF and KTNF.

Codd's SNF required each non-prime attribute to be fully

dependent on each key, but allowed a prime attribute not to have

 

80

such full dependence; hence, update anomalies Still could exist.
Kent's KSNF attempted to remedy that problem by requiring every
attribute in the complement of a key to be fully dependent on that
key. However, Chapter 4 showed that a subset, S, of attributes
might have a non-empty intersection with all keys of a relation in
KSNF, and it still might be that S «>A, where A is some prime

or non-prime attribute, in which case update anomalies would Still
exist. The canonical normal form defined here removes the update
anomalies in all these cases, and leaves open the question of an
appropriate definition of "optimality" when applied to normal forms,

a question which might instigate Substantial further research.

5.3 Suggestions for Future Research

This thesis did not deal with the concept of optimization
in the normal forms. Codd [22] and Kent [48] motivated the subject,
but their definitions of an Optimal normal form lack general and
different criteria for possible comparison. It appears that the
extensive mathematical theory and the algorithms presented in this
thesis can be extended, with some variations, to deal with the
Optimal normal forms.

Although this thesis provides three algorithms, no actual
programming was undertaken so performance criteria were not set.
The algorithms should be tested on real data bases and their per-
formance and complexity evaluated.

Codd [25] and Strnad [68] define a collection of operations
on relations, and this collection is called the relational algebra.
These operations are not necessarily binary. They include the

traditional set Operations (cartesian product, union, intersection,

81

etc.) and new operations on relations, suitable only in the rela-
tional model, such as projection, join, division, and restriction.
The user's queries can be expressed in the relational algebra by
forming a new normalized relation from the existing collection of
relations. The investigation of the mathematical properties of
these Operations is strongly suggested.

Finally, this thesis dealt only with the logical structure
of a relational data base, and no mention was made of the Storage
structure. Rissanen, et. a1 [57] analyze this problem in the form
of examples so a much more rigorous and general mathematical

approach is needed in this area.

BIBLIOGRAPHY

10.

BIBLIOGRAPHY

ALTMAN, E.B., et. al (1973)
"Specifications in a Data Independent Accessing Model",
IBM Research Report, RJ 1141, IBM Research Laboratory,
San Jose, California.

ARMSTRONG, w.w. (1974)
"Dependency Structures of Data Base Relationships",
Information Processing 74, North Holland Publishing
Company, pp. 580-583.

 

ASTRAHAN, M.M., et. a1 (1972)
"Concepts of a Data Independent Accessing Model", IBM
Research Report, RJ 1105, IBM Research Laboratory, San
Jose, California.

BACHMAN, C.w. (1972)
"The Evolution'of Storage Structures", CACM, Vol. 15,
No. 7, pp. 6284634.

BACHMAN, C.W. (1973)
"The Programmer as Navigator", CACM, Vol. 16, No. 11,
pp. 653-658.

BELL SYSTEM TECHNICAL JOURNAL (1973)
"Information Management System Issue", Vol. 52, No. 10,

BIRKHOFF, G. (1967)
"Lattice Theory", 3rd Edition, American Mathematical
Society Colloquium, Publ. XXV, Providence, R.I.

 

 

BJORNER, D., et. a1 (1973)
"The Gamma Zero N-ary Relational Data Base Interface:
Specifications of Objects and Operations", IBM Research
Report, RJ 1200, IBM Research Laboratory, San Jose,
California.

BOSAK, R. (1962)
"An Information Algebra", CACM Vol. 5, No. 4, pp. 190—204.

BOYCE, R.F., et. a1 (1973)
"Specifying Queries as Relational Expressions: SQUARE",
IBM Research Report, RJ 1291, IBM Research Laboratory,
San Jose, California.

82

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

83

BOYCE, R.F., et. a1 (1973)
"Using a Structured English Query Language as a Data
Definition Facility", IBM Research Report, RJ 1318, IBM
Research Laboratory, San Jose, California.

BRACCHI, G., et. a1 (1972)
"A Language~for a Relational Data Base", Sixth Annual
Princeton Conference on Infromation Sciences and Systems,
March 23—24, Princeton, New Jersey.

 

 

CANNING, R.G. (1974)
"Problem Areas in Data Management", EDP Analyzer, Vol. 12,
No. 3.

 

CASEY, R.G., et. a1 (1974)
"Generalized Page Replacement Algorithms in a Relational
Data Base?, Proc. ACMSSIGFIDET Workshpp on Data Description,
Access and Control, May 1-3, Ann Arbor, Michigan.

 

 

CHAMBERLIN, D.D., et. a1 (1974)
"SEQUEL: A Structured English Query Language", Proc.
ACM-SIGFIDET Workshop on Data Description, Access and
Control, May 1-3, Ann Arbor, Michigan.

 

CHILDS, D.C. (1968)
"Feasibility of a Set Theoretic Data Structure", Proc.
IFIP Congress, Vol. 1, North Holland Pub. Co., Amsterdam,
pp. 420-430.

 

CODASYL (1969)
"Data Base Task Group Report", Available from ACM HQ.,
New York.

CODASYL (1971)
"Data Base Task Group Report”, Available from ACM HQ.,
New York.

CODASYL (1971)
"Introduction to Feature Analysis of Generalized Data
Base Management Systems", CACM, Vol. 14, No. 5, pp. 308-
318.

CODASYL (1971)
"Feature Analysis of Generalized Data Base Management
Systems", Available from ACM HQ., New York.

CODD, E.F. (1970)
"A Relational Model of Data for Large Shared Data Banks",
CACM, Vol. 13, No. 6, pp. 377-387.

84

22. CODD, E.F. (1971)
"Further Normalization of the Data Base Relational Model",
IBM Research Report, RJ 909, IBM Research Laboratory,
San Jose, California.

23. CODD, E.F. (1971)
"Normalized Data Base Structure: A Brief Tutorial",
IBM Research Report, RJ 935, IBM Research Laboratory, San
Jose, California.

24. CODD, E.F. (1971)
"A Data Base Sublanguage Founded on the Relational
Calculus", IBM Research Report, RJ 893, IBM Research
Laboratory, San Jose, Calfironia.

25. CODD, E.F. (1971)
"Relational Completeness of Data Base Sublanguage", IBM
Research Report, RJ 987, IBM Research Laboratory, San
Jose, California.

26. CODD, E.F. (1974)
"Recent Investigations in Relational Data Base Systems",
IBM Research Report, RJ 1385, IBM Research Laboratory,
San Jose, California.

27. CODD, E.F. (1974)
"Seven Steps to Rendezvous with the Casual User", IBM
Research Report, RJ 1333, IBM Research Laboratory, San
Jose, California.

28. CODD, E.F., et. a1 (1974)
"Interactive Support for Non-programmers: The Relational
and Network Approaches", IBM Research Report, RJ 1400,
IBM Research Laboratory, San Jose, California.

29. DATE, C.J., et. a1 (1971)
"File Definition and Logical Data Independence", Proc.
ACM-SIGFIDET WorkshOp on Data Description, Access and
Control, Nov. 11-12, San Diego, California, pp. 117-138.

 

30. DATE, C.J., et. a1 (1971)
"Storage Structure and Physical Data Independence", Proc.
ACM—SIGFIDET Workshop on Data Description, Access and
Control, Nov. 11-12, San Diego, California, pp. 139—168.

 

31. DATE, C.J. (1972)
"Relational Data Base Systems: A Tutorial", Fourth
International Symposium on Comppter and Information
Sciences, Miami Beach, Florida, Plenum Press.

 

32.

33.

34.

35.

36.

37.

38.

39.

40.

41.

85

DATE, C.J. (1972)
"An Introduction to the April 1971 Report of the CODASYL
Data Base Task Group", IBM Technical Report, TR. 12.104,
IBM United Kingdom Laboratories LTD, Hursley Park,
Winchester Hampshire.

DATE, C.J., et. al (1974)
"The Relational and Network Approaches: Comparison of
the Application Programming Interfaces", IBM Research
Report, RJ 1401, IBM Research Laboratory, San Jose,
California.

DATE, C.J. (1975)
"An Introduction to Data Base Systems", Addison Wesley,
New York.

DAVIES, C.T. (1967)
"A Logical Concept for the Control and Management of
Data", IBM Research Report, AR—0803-00, Poughkeepsie,
New York.

DELOBEL, C., et. a1 (1973)
"Decomposition of a Data Base and the Theory of Boolean

Switching Functions", IBM Journal of Research and Develop—
ment, V01. 17’ NO. 5, pp. 374—3860

 

D'IMPERIO, M.E. (1969) .
"Data Structures and Their Representations in Storage",
Annual Review in Automatic Programming, Vol. 5, Pergamon
Press, New York, pp. 1-75.

 

DODD, 0.6. (1969)
"Elements of Data Management Systems", Computing Survey,
V01. 1, NO. 2, pp. 117-1330

 

EARNEST, C.P. (1974)
"A Comparison of the Network and Relational Data Structure
Models", Computer Sciences Corporation Report, El Segundo,
California.

 

ENGLES, R.W. (1970)
"A Tutorial on Data Base Organization", IBM Technical
Report, TR 00.2004, Poughkeepsie, New York.

ENGLES, R.W. (1971)
"An Analysis of the April 1971 Data Base Task Group
Report", Proc. ACM-SIGFIDET Workshop on Data Description,
Access and Control, Nov. ll—12, San Diego, California,
pp. 69-91.

 

 

42.

43.

44.

45.

46.

47.

48.

49.

50.

51.

52.

53.

86

FEHDER, P.L. (1972)
"The Representation Independent Language Part 1: Introduc-
tion and the Subsetting Operation", IBM Research Report,
RJ 1121, IBM Research Laboratory, San Jose, California.

FEHDER, P.L. (1973)
"The Representation Independent Language Part 2: Deriva-
tion and Insert, Update, and Delete Operations", IBM
Research Report, RJ 1251, IBM Research Laboratory, San
Jose, California.

FLORENTIN, J.J. (1974)
"Consistency Auditing of Data Bases", The Computer Journal,
Vol. 17, No. 1, pp. 52-58.

 

HEATH, I.J. (1971)
"Unacceptable File Operations in a Relational Data Base",
Proc. ACM-SIGFIDET Worksh0p on Data Description; Access
and Control, Nov. 11-12, San Diego, California, pp. 19—33.

 

 

HENRY, W.R. (1969)
"Hierarchical Structure of Data Management", IBM Systems
Journal, Vol. 8, No. 1, pp. 2-16.

 

HSIAO, D., et. al (1970)
"A Formal System for Information Retrieval from Files",
CACM, Vol. 13, No. 2, pp. 67-73.

KENT, w. (1973)

"A Primer of Normal Forms", IBM Technical Report,
TR 02.600, IBM System Development Division, San Jose,

California.

LEVIEN, R.E., et. a1 (1967)
"A Computer System for Inference Execution and Data
Retrieval", CACM, Vol. 10, No. 11, pp. 715-721.

LORIE, R.A. (1974)
"XRMr An Extended (n-ary) Relational Memory", IBM
Scientific Center Report, G320—2096, Cambridge, Mass.

MCGEE, W.C. (1969)
"Generalized File Processing", Annual Review in Automatic
Programming, Vol. 5, Pergamon Press, New York, pp. 77-149.

 

 

MEALY, G.H. (1967)
"Another Look at Data", Proc. AFIPS, FJCC, Vol. 31,
AFIPS Press, Montvale, New Jersey, pp. 525-534.

 

MELTZER, H.S. (1969)
"System Design Considerations for Data Base Support",
IBM Technical Report, TR 02.468, IBM System DevelOpment
Division, San Jose, California.

54.

55.

56.

57.

58.

59.

60.

61.

62.

63.

64.

87

NAVATHE, s. (1974)
"Logical Normal Forms for Data Translation", Proc. ACM-
SIGFIDET Worksh0p on Data Description, Access and Control,
May 1—3, Ann Arbor, Michigan.

 

PALERMO, F.P. (1972)
"A Data Base Search Problem", IBM Research Report,
RJ 1072, IBM Research Laboratory, San Jose, California.

PALERMO, F.P. (1973)
"An APL Implementation of Relational Operators and a
Search Algorithm", IBM Research Report, RJ 1273, IBM
Research Laboratory, San Jose, California.

RISSANEN, J., et. al (1973)
"Decomposition of Files, a Basis for Data Storage and
Retrieval", IBM Research Report, RJ 1220, IBM Research
Laboratory, San Jose, California.

ROBERTS, D.C. (1972)
"File Organization Techniques", Advances in Computers,
Vol. 12, pp. 115-174.

 

ROSENBERG, A.L. (1971)
"Data Graphs and Addressing Schemes", Journal of Computer
and System Sciences, Vol. 5, No. 3, pp. 193-238.

 

 

ROSENBERG, A.L. (1973)
"Suffixes of Addressable Data Graphs", Information and
Control, Vol. 23, pp. 107-127.

 

ROTHNIE, J.B. (1972)
"The Design of Generalized Data Management Systems",
PhD Dissertation, Department of Civil Engineering,
Massachusetts Institute of Technology.

 

 

ROTHNIE, J.B. (1974)
"An Approach to Implementing a Relational Data Management
System", Proc. ACM-SIGFIDET Workshop on Data Description,
Access and Control, May 1-3, Ann Arbor, Michigan.

 

SENKO, M.E., et. a1 (1972)
"A Data Independent Architecture Model I: Four Levels
of Description from Logical Structures to Physical Search
Structures", IBM Research Report, RJ 982, IBM Research
Laboratory, San Jose, California.

SEVERANCE, D.C. (1972)
"Some Generalized Modeling Structures for Use in the
Design of File Organizations", PhD Dissertation, Depart—
ment of Computer and Communication Sciences, University of
Michigan, Ann Arbor, Michigan.

 

65.

66.

67.

68.

69.

70.

71.

72.

73.

74.

88

SIBLEY, E.H., et. al (1973)
"A Data Definition and Mapping Language", CACM, Vol. 16,
NO. 12’ pp. 750-7590

SIBLEY, E.H. (1974)
"On the Equivalence of Data Based Systems", Proc. ACME
SIGFIDET Workshop on Data Description, Access and
Control, May 1—3, Ann Arbor, Michigan.

 

SMITH, D.P. (1971)
"An Approach to Data Description and Conversion", PhD

Dissertation, The Moore School of Electrical Engineering,
University of Pennsylvania, Philadelphia, Pennsylvania.

 

STRNAD, A.L. (1971)
"The Relational Appraoch to the Management of Data Bases",
Proc. IFIP Congress, Ljubljana, Yougoslavia.

 

TAYLOR, R.W. (1971)
"Generalized Data Base Management System Data Structures
and Their Mapping to Physical Storage", PhD Dissertation,
Department of Computer and Communication Sciences, Univer-
sity of Michigan, Ann Arbor, Michigan.

 

 

WANG, C.P., et. a1 (1974)
"An Approach for Segment Synthesis in Logical Data Base
Design", IBM Research Report, RJ 1397, IBM Research
Laboratory, San Jose, California.

WHITNEY, V.K.M. (1972)
"RDMS: A Relational Data Management System", Proc. Fourth
International Symposium on Computer and Information
Sciences, Miami Beach, Florida, Dec. 14-16, Plenum Press.

 

 

WHITNEY, V.K.M. (1974)
"Relational Data Management Implementation Techniques",
Proc. ACM-SIGFIDET Workshoppon Data Description, Access
and Control, May 1-3, Ann Arbor, Michigan.

 

WILLIAMS, R. (1971)

"A Survey of Data Structures for Computer Graphics Systems",

Computing Survey, Vol. 3, No. 1, pp. 1-22.

 

WONG, E., et. a1 (1971)
"Canonical Structure in Attribute Based File Organization",
CACM, Vol. 14, No. 9, pp. 593-597.

 

"IIIIIIIIIIIIIIIIIII