MATHEMATICAL FOUNDATIONS FOR RELATIDNAL DATA BASES Dissertation for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY RAYMOND YOUSSEF FADOUS 1 9 7.5 IIIllIllIlIIlIflILIIIILIIjIIIIIIIIIIIIIIIIIIIII f ABSTRACT MATHEMATICAL FOUNDATIONS FOR RELATIONAL DATA BASES By Raymond Youssef Fadous A new approach to data management systems has been the introduction of a relation or table as a model for a data base. Large sets of data can be represented in a few large tables, but such a representation often leads to certain anomalies whenever data items in the data base are added, deleted, or changed. To reduce the effect of these anomalies, previous research identified functional relations or dependencies between attributes and defined second and third normal forms. These normal forms are dependent on minimal subsets of attributes, called candidate keys or simply keys, which uniquely identify each row of a table whenever the usual operations of retrieval, deletion, and update are performed. The research reported in this thesis considers the problem of constructing algorithms for finding keys for relational data bases and for determining whether a relation is in second or third normal form. The thesis presents a new algorithm which starts with the functional relations and finds all keys of a normalized relation. The mathematical properties of a relation in second and third normal forms are studied in detail along with the properties Raymond Youssef Fadous of prime and non-prime attributes and algorithms are given for determining whether a relation is in either second or third normal form. Finally, this thesis points to a weakness in the definitions of a relation in third normal form, as proposed by Codd and Kent, and advances the concept of a canonical normal form to overcome the disclosed weakness. MATHEMATICAL FOUNDATIONS FOR RELATIONAL DATA BASES By Raymond Youssef Fadous A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Computer Science 1975 ACKNOWLEDGEMENTS I am very grateful to Dr. John J. Forsyth, my thesis adviser and the co-chairman of my committee, for his encouragement, sugges- tions, and help in writing this thesis in a clear and concise mathe- matical form. I would like to thank Dr. Car1.V. Page, the co-chair- man of my committee, for introducing me to this area of research. My thanks to Dr. J.S. Frame and Dr. Richard C. Dubes for serving on my committee and for the helpful discussions I had with them while writing this thesis. Thanks go to the Division of Engineering Research for providing me with financial assistance to do research. My thanks to Dr. E.F. Codd of IBM for generously providing the necessary references in this area of research. Special thanks are due to my parents, my brothers, my wife Dola, and to my children Sandy and Joe. Their love and understanding helped me sustain the hard work of the doctoral program. ii Chapter TABLE OF CONTENTS ACKNOWLEDGEMENTS SURVEY OF RELATIONAL DATA BASE THEORY HHl-‘H O. O waI—I Background The Relational Model Normal Forms Contributions and Organization of the Thesis FINDING KEYS FOR RELATIONAL DATA BASES NNNNNN .0... ost-‘th-d Introduction Basic Concepts and Notation Mathematical Preliminaries Finding the Keys Examples Chapter Summary and Remarks FUNCTIONAL PARTITION AND REDUCTION WWW WNH woo U‘D UN \16\ Introduction Functional Partition Procedures for Obtaining Functional Partition 3.3.1 Graphical Method 3.3.2 Algebraic Method PrOperties of Functional Partition Properties of Prime and Non-Prime Attributes Functional Reduction Functional Deletion 3.7.1 Examples Chapter Summary and Remarks iii Page ii U1 15 17 17 18 21 27 3O 32 34 34 35 37 37 39 40 49 51 54 55 Chapter 4. NORMAL FORMS 1 Introduction 2 Properties of the Second Normal Form (SNF) 4.2.1 Mathematical Properties 4.2.2 Examples and Remarks 4.3 Procedure for SNF 4.3.1 Algorithm A2 4.3.2 Examples 4.4 PrOperties of the Third Normal Form (TNF) 4.4.1 Mathematical Properties 4.4.2 Examples 5 Procedure for TNF 6 Chapter Summary and Remarks 4. 4. 4. 4. SUMMARY AND CONCLUSIONS 5.1 Conclusions 5.2 Alternative Definitions for Normal Forms 5.3 Suggestions for Future Research BIBLIOGRAPHY iv Page 56 56 56 56 61 62 63 64 65 65 7O 71 71 73 73 74 80 82 GLOSSARIAL CROSS REFERENCE 1e_r_tn_ Adjacent Algorithm A1 Algorithm A2 Algorithm A3 Attribute Attribute Value Candidate Key Canonical Normal Form (CNF) Connected Dependent First Normal Form (FNF) Full Dependence Functional Dependence Functional Relation Identity Partition Implication Matrix Implies Key KSNF KTNF Definition Number 3 .2 .3 .2 .1 .4 .2.1 .2 Page 35 27 63 71 78 35 37 19 75 76 Tg£m_ Natural Join Non-Adjacent Non-Prime Attribute Non-Transitive Dependence Normalized Relation Optimal Second Normal Form Optimal Third Normal Form Primary Key Prime Attribute Projection Reflexive Form Relation Second Normal Form (SNF) Set of Functional Relations Strict Transitive Dependence Third Normal Form (TNF) Transitive Closure Transitive Dependence Tuple Universal Partition Definition Number 1.3.11 3.2.1 1.3.5 1.3.9 1.3.13 1.3.14 1.3.6 1.3.5 1.3.1 3.7.1 1.3.8 2.2.1 1.3.12 1.3.10 1.3.9 3.2.4 vi Page 13 35 11 14 14 51 10 18 13 11 20 11 37 CHAPTER 1 SURVEY OF RELATIONAL DATA BASE THEORY The trend in computer application31n3to make the computer accessible to a wide range of users, esPecially casual users, who have little or no training in programming. It is estimated, by IBM researchers Codd, et. a1 [28], that in the 1990's the growth in on-line interaction by casual users will exceed that for all other users by a large factor. Such users need a simple logical notion of the data organization in order to form queries in a sensible way. One of the key develOpments in data base design in recent years has been the introduction of a relation or table as a model for a data base. The tabular representation of data is the simplest and most universally understood data structure. The initial structuring of the data can often be described by a few large tables with many columns. However, such a structure is usually inefficient and con- tains redundant information which is not suitable for direct storage. Codd [22] discussed the possibility of breaking up a table into smaller ones, in what he termed second and third normal forms, so as to remove these shortcomings. For this reason, Codd introduced the concept of a functional relation between attributes and the important notion of a key to help project a relation into subrela- tions in second and third normal forms, and also to recover the original table from the projections by a Special Operation called the natural join. Existing methods for finding keys are conceptually elegant but computationally intimidating; these methods suffer from the curse of dimensionality or combinatorial explosion associated with finding prime implicants of Boolean functions having a large number of variables. The algorithm in this thesis, in contrast, proceeds by identifying subsets which can lead to keys and then carefully selecting the appropriate intersections of these subsets which pro- duce the keys. Further, Codd gives only the definition as a basis for a relation to be in a normal form. But, how can the date base administrator know whether the relations are in second or third normal form? This thesis provides the answer to this question. Kent [48] examined the underlying definitions of normal forms, as preposed by Codd, and suggested some improvements. How- ever, in this thesis, further examination of Kent's alternative definitions reveals that the objectives of the- normal forms were not totally met, and this led to the introduction of a new canonical normal form that better meets the criteria for normal forms in relational data bases. 1.1 Background One of the main objectives in the design of data base systems is the concept of data independence. Briefly stated, this means that application programs are not affected by the way data is stored. The data base is the data as physically stored, also referred to as the storage structure. The user's view of the data base is called the data model, also known as the logical structure. That is, to the user the data model is the data base. Date, et. a1 [29, 30] give many examples and explain these terms. Engles [40] and Meltzer [53] provide a good background in the design of data independent access- ing models. Many different data models that claim data independence have appeared in the literature since 1960. Two of these, the CODASYL Data Base Task Group (DBTG) network model [17-20] and the relational model, as proposed by E.F. Codd [21], have each attracted a large number of followers. Bachman [4, 5], one of the main contributors to the DBTG approach, clearly presents the goals desired in a data independent model. The string model, as proposed by Senko, et. a1 [1, 3, 63], is also a data independent accessing model, based on the work of Davies [35], Engles [40], and Meltzer [53], but is overshadowed by the two previously mentioned models. Date [32] gives a good introduction to the DBTG approach. Engles [41] presents a thoughtful criticism of the DBTG model and lists the IBM objections to it. Date, et. al [33] and Codd, et. al [28] explain the main differences between the network and relational approaches to data base design. The main differences and, in Codd's view, the advantages of the relational model are simplicity, uni- formity, completeness, and data independence. To a large extent, simplicity may be considered the justification of the relational approach. All the records in a file are stored as n-tuples in a relation -- that is as a table. Also, the data sublanguages that Operate on the relational data base are easier to learn than the data manipulation language of the DBTG approach. The relational model is uniform in the sense that any relationship between entities or relations is also expressed as a relation. It is complete in the sense that all data structures commonly used in data base systems can be expressed in a relational form. As for data independence, neither the data model nor the languages contain any reference to storage structure or to access methods. Date [31] summarizes, in a tutorial form, the main concepts of these two as well as other models. This thesis deals exclusively with the relational model as originally proposed by Codd [21]. The relational approach is motivated primarily, but not exclusively, by the desire to attract the casual user. The data structure used is in the simplest form possible, the tabular form. Codd [27] states the steps necessary to help the user interact with the computer in a natural dialogue with the objective of attaining an agreement between the user and the system as to the user's needs. One of these steps is the de- velopment of the language for manipulating the data model, which is known as the data sublanguage. Codd [24] defined the data sublanguage ALPHA (DSL-ALPHA), based on the first-order predicate calculus, for eXpressing the user's queries. Queries can be eXpressed in terms of a collection of Operations on the relations [21, 25], and this collection is called the relational algebra. An earlier informa- tion algebra, with a different set of operations, was defined by Bosak [9]. Codd [25] proves that his DSL-ALPHA is complete in the sense that any query expressable in the relational algebra is also expressable in the relational calculus. Other languages have been defined for accessing data in a relational data base. Boyce, et. a1 [10] proposed a data sublanguage called SQUARE. It uses a mathematical notation for expressing queries, but is less sophisticated mathematically than DSL-ALPHA; hence, SQUARE is easier to use by the casual user. They have also shown that SQUARE is a complete sublanguage. In later papers, Boyce, et. a1 [11] and Chamberlin, et. a1 [15] presented another sublanguage called SEQUEL that has the functional capabilities of SQUARE but different syntax. SQUARE is a concise mathematical APL-like notation, while SEQUEL is a block structured English keyword language. There has been much research done on the relational model and Codd [26] reviews briefly the latest and the current areas of investigations that have been undertaken. 1.2 The Relational Model Given sets, D1,D Dn, called domains, not necessarily 2,..., distinct, a relation m of degree n, defined over D1,D2,... Dn’ is a Subset of the cartesian product D X D 1 2 X...X D“. That is, m is a set of elements each of the form (d1,d 2,...,dn) where each di is an element of Di' The set D1 is called the i-th domain of a. Each element (d1,d2,...,dn) is called an n-tuple or,simply,tuple of R. Each (11 is the i-th component of a tuple. An attribute is a name assigned to a domain of a relation. Any value associated with an attribute is called an attribute value. While the domains of a relation need not be distinct, the attribute names assigned to them must all be distinct. The relations are time- varying relations; tuples may be updated, deleted, and inserted in a relation. Throughout this thesis, the term relation means a time- varying relation. The logical view of a relation of degree n is a rectangular array or table with n columns and m rows such that m varies from one update to another. This work deals only with normalized relations. These are relations whose attribute values are simple and are not themselves relations. Codd [21, 23] lists five prOperties that are satisfied by any normalized relation R. These prOperties are: l. R is column homogeneous, that is in any one column all the attribute values are of the same kind, whereas items in different columns need not be of the same kind. 2. Each attribute value is a simple number or a char- acter string and not a set of numbers or a repeating group. 3. All rows of a relation must be distinct. 4. The ordering of rows within a relation is immaterial. 5. Since the attributes are distinct, the ordering of columns within a relation is immaterial. With prOperty 5 added, the exact mathematical term is relationship and not a relation, but here this distinction will be ignored. A relation which satisfies property 2 is said to be normalized. Date, et. a1 [33] give a good exposition of the logical structure of the data base. In the relational approach, the data model definition (DMD) defines the relations and the underlying attributes that together constitute the relational model. 1 .3 Normal Forms The notion of functional dependence, as defined by Codd [22], plays a fundamental role in the theory which governs the de- composition Of relations into subrelations in normal forms. Codd [23] listed six aims of normalization of relations. The two most important are: 1. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life Span of application programs. 2. To reduce the incidence of undesirable insertion, up- date, and delition anomalies. An example will be given later, in this section, to eXplain these different anomalies. Delobel, et. a1 [36] use the term "functional relation" to mean functional dependence. A whole theory of relations can be built around this simple concept. First, one needs to define the term prgjection of a relation 3 on a subset of attributes of m. The notation used in the following definition is eXplained in Chapter 2. Definition 1.3.1 Let R be a relation defined on the set of attributes Q = A1A2A3 ... An' For any a = AlAZ ... Am’ a subset of Q, the projection of m on a is defined as: Rd = {(al,a2,...,am)I(a1,a2,...,an) E R] , also written as ‘R[A1A2 ... Am]. Definition 1.3.2 The set of attributes B in a relation 8 of a degree n is functionally dependent or just dependent on the set of attributes A in m, if, at any instant of time, there exists a function, called functional relation, F : R[A] ~t%[B] or simply A A’B. The term A implies B is also used when- ever A ~ B. The notation A f B indicates that B is not dependent on A, while A «'B says that A and B are dependent on each other. Definition 1.3.3 The set of attributes B in a relation m is EBLLL dependent on the set of attributes A in R, written as A =‘B, if 1. B is dependent on A and 2. B is not dependent on any prOper subset of A. If B is not fully dependent on A, one writes A # B, while A ”'3 indicates that A and B are fully dependent on each other. The concept of a key, that will be defined next, is the mathematical basis of the relational model. The normal forms are defined in terms of the candidate keys or simply keys of the rela- tion. Also, the search algorithms use the keys of a relation to retrieve or insert information in the data base. The remainder of this thesis uses the term key instead of candidate key. Definition 1.3.4 Each key of a relation R is a non-empty subset, K, of Q, the set of all attributes in m, Such that O is fully dependent on K; or in symbols K:= Q. Definition 1.3.5 An attribute in R is called a prime attribute if it is a member of any key of m. Otherwise, it is non-prime. Definition 1.3.6 Every relation 3 has a primary key which is chosen arbitrarily from the set of all keys of 8. Also, no attribute in a primary key is allowed to have an undefined value. Definition 1.3.7 Every normalized relation is said to be in first normal form (FNF). Example 1 .3 .1 Consider the teaching schedule relation, Tl, defined on the attributes P = Professor name, C = Course number, R = Room number, H = Hour of the course meeting, and O = Office number of a professor. Assume that a course may be taught by many professors and that a professor may teach many courses, but that a professor can only be in one room at any one time. Also, assume that each professor is assigned exactly one office number, and that many professors can share the same office. The above statement of the problem provides the following functional relations: PH-oRC;RH-+PC;P-+0 At some instant of time T1 might look like this: Tl : P C R H O Forsyth 817 1 9 400 Forsyth 818 1 10 400 Dubes 805 2 9 400 Dubes 817 2 10 400 Frame 851 3 9 243 Frame 831 4 11 243 Page 841 2 11 404 Page 841 2 12 404 10 The relation T1 is in FNF because all the entries in T1 are simple. The combinations of attributes RH and PH are the only keys of T1 based only on the functional relations given above. Then, P, R, and H are prime attributes, while C and O are non- prime. Choose PH, say, as the primary key. Observe that if Professor Forsyth moved to office number 402, then more than one tuple has to be updated. This is called an update anomaly. If Professor Page is not teaching a course this year but he will next year, then the information about his office number should be retained; but, if the Page tuples are deleted, that information is lost. This is an example of a deletion anomaly. Finally, suppose one wishes to record the office number of visiting Professor Jones who has not been yet assigned a course number to teach. Since PH is the primary key, one must fabricate a fictitious hour number in order to store this information. This is called an insertion anomaly. To reduce the effect of these anomalies, transitive de- pendence of attributes upon one another were introduced and the second and third normal forms were defined by Codd [22]. Definition 1.3.8 A relation R is in second normal form (SNF) if l. R is in first normal form and 2. Every non-prime attribute in m is fully dependent on every key of 8. Example 1.3.2 If relation T1, in Example 1.3.1, were projected into two subrelations, T2[PCRH] and T3[PO], then, T3, for example, looks 11 like this: T3 : P O Forsyth 400 Dubes 400 Frame 243 Page 404 The only key of T3 is P, hence the primary key, while T2 has the same keys as T1. Note that T2 and T3 are each in second normal form. This is so, since T3 has a simple key, that is a key with just one attribute, and C, the non-prime attribute in T2, is not dependent on any proper subsets of the keys PH and RH. The anomalies previously mentioned have disappeared. Definition 1.3.9 The set of attributes C in a relation R is transitively dependent on the set of attributes A in a, written as A- —'C if 1. A and C are disjoint and 2. There exists a set of attributes B in m, disjoint from both A and C, Such that A.—.B, B f'A, and B —oC. Otherwise, C is non-transitively dependent on A. Definition 1.3.10 A relation 3 in second normal form is in third normal form (TNF) if every non-prime attribute in m is non- transitively dependent on each key of m. 12 Example 1.3.3 Given the following data base with relation T4 defined on the attributes P = Professor name, C = City of residence, and Z = Zip code. At some instant of time T4 might look like this: T4 : P C Z Page E.L. 48823 Frame E.L. 48823 Forsyth E.L. 48823 Dubes E.L. 48823 The statement of the problem provides the following func- tional relations: P —oC; C a Z The only key of T4 is P, hence the primary key, and so, by definition, T4 is in second normal form; but T4 is not in third normal form since Z is transitively dependent on P. Suppose that the Zip Code of E.L. is changed to 48824, then many tuples would have to be updated. This is a consequence of the trans- itive dependence. To get rid of this anomaly, one can project T4 into T5[PC] and T6[CZ]. T5 : P C and T6 : C Z Page E.L. E.L. 48823 Frame E.L. Dubes E.L. Forsyth E.L. Now, T5 and T6 are each in third normal form, as were T2 and T3. 13 Definition 1.3.11 Let R, and R2 be any relations over the set of attributes l 01 = {a,b] and 05 = {b,c] reSpectively such that = . * a 01 n 02 # m, then the natural_jorn, denoted m1 m2, of mi and $2 over a is defined by m1 * R2 = {(a,b,c)\(a,b) 6 R1 and (b,c) 6 m2]. This defini- tion can be extended to any sets 01 and 02 such that a = 01 0 02 ¢ ¢. Example 1.3.4 Consider the relations T4, T5, and T6 defined in Example 1.3.3. Then T4 = T5 * T6. The natural join is only one of the relational operations introduced by Codd [21, 24]. In projecting a relation 3 into subrelations of m, it is important to be able to recover the original relation R from the subrelations.l This idea is emphasized later in the definitions of optimal normal forms. This recovery can be done with the natural join operation. Suppose one projects a relation R[ABC] into ml[AB] and m2[BC]; it is not always true that m = ml * $2. Delobel, et. a1 [36] gave the suf- ficient condition that, if B -oC, then ER[ABC] = 2R1[AB] * m2[Bc]. Although this thesis does not deal with optimization, the following definitions are given here to complete the presenta- tion of the relational model as originally prOposed by Codd. Definition 1.3.12 The transitive dependence A- #10, in a relation n, is a strict transitive dependence if there exists a set of 14 attributes B in m disjoint from A and C such that: 1. A-+B,Bf~A and 2. B —~C, C foB. Definition 1.3.13 A collection of relations C, which are projections in a relation R, is in Optimal second normal form if the following three conditions hold: 1. All the relations in C are in second normal form. 2. a is the natural join of the relations in C. 3. No smaller collection of relations has these prOperties. Definition 1.3.14 Let C2 be a collection of relations in Optimal second normal form, and C3, a collection of relations in third normal form, which are projections from the relations in C2. Then, the collection C3 is in Optimal third normal form if the following 4 conditions hold: 1. All the relations in C3 are in third normal form. 2. C2 can be recovered with the natural join of the relations in C . 3 3. No relation in C3 contains any pair of attributes that are strictly transitively dependent on any relation in C2. 4. No smaller collection of relations has these prOperties. The relational model, as proposed by Codd, lacks simple and efficient procedures to find all keys, and to determine whether a relation is in second or third normal form. This thesis reports efforts made to take constructive steps in this direction. l4 attributes B in m disjoint from A and C such that: l. A «’B, B f A and 2. B —.C, C foB. Definition 1.3.13 A collection of relations C, which are projections in a relation R, is in Optimal second normal form if the following three conditions hold: 1. All the relations in C are in second normal form. 2. R is the natural join of the relations in C. 3. No smaller collection of relations has these prOperties. Definition 1.3.14 Let C2 be a collection of relations in Optimal second normal form, and C3, a collection of relations in third normal form, which are projections from the relations in C Then, 2. the collection C3 is in Optimal third normal form if the following 4 conditions hold: 1. All the relations in C3 are in third normal form. 2. C2 can be recovered with the natural join of the relations in C . 3 3. No relation in C3 contains any pair of attributes that are strictly transitively dependent on any relation in C2. 4. No smaller collection of relations has these prOperties. The relational model, as proposed by Codd, lacks simple and efficient procedures to find all keys, and to determine whether a relation is in second or third normal form. This thesis reports efforts made to take constructive steps in this direction. 15 1.4 Contributions and Organization of the Thesis Functional relations are shown to enjoy a rich algebraic and set theoretic structure, and that the normal forms can be studied within this framework. This mathematical structure leads to a new approach for starting with the functional relations and finding all of the keys in a normalized relation. The algorithm uses an implication matrix, its transitive closure and a systematic method for introducing attributes to form keys. A detailed mathematical characterization of prime and non- prime attributes leads to the concepts of functional partition, re- duction, and deletion. Also, it is shown that, under certain con- ditions, many computational steps can be saved in the algorithm for finding the keys. The mathematical prOperties of a relation in second and third normal forms are studied in detail and algorithms are defined for determining whether a relation is in any of these forms. Then, this thesis proceeds to point to a weakness in the definitions of third normal form, as proposed by Codd and Kent [48], and a new canonical normal form (CNF), an improvement in terms of reducing the update anomaly, is suggested. The basic definitions of the relational model, as given by Codd, are all included in Chapter 1. Chapter 2 defines the implica- tion matrix and its transitive closure which form the basis of the new algebraic approach, taken in this thesis, in the study of the normal forms. Also, in Chapter 2, the algorithm for finding all keys in a relation is presented. Chapter 3 deals with the mathe- matical properties of prime and non-prime attributes, and with the 16 new concepts of functional partition, reduction, and deletion. In Chapter 4, the mathematical properties, and algorithms for the normal forms, are given. Conclusions and suggestions for future research are offered in Chapter 5. All definitions, theorems and lemmas in the remainder of this thesis are the original work of the author unless otherwise in- dicated. CHAPTER 2 FINDING KEYS FOR RELATIONAL DATA BASES 2.1 Introduction E.F. Codd [21] prOposed the relational data base model in an effort to provide "data independence" for applications programs. An applications programmer working with a relational data base may view the data as being stored in tabular form; each unique collection of data items which describes a single entity occupies a unique row of some table. A number of different collections of tables may be used to represent any given data base. Certain collections might be preferable to others if they eliminate exceptional conditions, or anomalies, which can occur when items in the data base are added, deleted, or changed. ~Several re- searchers have investigated normal forms for relational data bases and characterized their role in eliminating addition, deletion, and insertion anomalies. A good overview of these concepts can be obtained by reading Codd [22-23] and Kent [48]. In order to establish an appropriate normal form for a re- lational data base, one must identify minimal subsets of attributes, called kpygj which uniquely determine the values of the remaining attributes, so that the rows of the tables in the data base have the required prOperty of uniqueness. Delobel and Casey [36] de- ve10ped an algorithm for finding all keys in a relational data base, 17 18 given the set of fundamental functional relations defined on the data base. Their method maps the functional relations to a Boolean function and produces the keys from those prime implicants of the Boolean function which cover the implicant which consists of all of the uncomplemented variables of the function. This Chapter describes an alternative algorithm, whose correctness is shown to derive directly from the fundamental prOperties of the functional relations as given by Armstrong [2]. 2.2 Basic Concepts and Notation In the discussion which follows, the symbol 0 will be used to denote the set of all n attributes A1,A2,...,An on which a relation 3 of degree n is defined. The empty set is denoted by ¢. The union and intersection Operations between sets will be denoted by U and n, reSpectively. If A is a set, then [A] is the cardinality of A. The usual set notation for a set 0 With n elements A1,A2,...,An is D = [A1,A2,---,An]; while here the notation Q = AlA2 ... An is used. Two sets X and Y are disjoint if X FIY = O. A partition on a set 0 is a collection of non-empty disjoint subsets of 0 whose union is 0. Definition 2.2.1 Let Li and Ri’ i = 1,2,...,m, be non-empty subsets in Q, then the {Li T‘Ri} is called the set of functional relations in Q. Armstrong [2] has shown that the following prOperties, Pl - P5, are sufficient to find the set of all functional relations 19 that can be derived from a given set of functional relations. Let A, B, C, and D be any non-empty subsets of Q then, Pl. Reflexivity: A ~.A; P2. Transitivity: If .A a B and B —~C, then A —.C; P3. Augmentation: If A.—oB, then AI —~B for any A' D.A; P4. Projectivity: If A a B, then A a B' for any B' C B and B' # m; P5. Additivity: If A ~ B and C —oD, then A U C —oB d D. The purpose of this chapter is to exhibit an algorithm which generates all keys which can be found using only the given functional relations and prOperties Pl - P5. The starting point is a set of functional relations Specified by the data base administrator or the user. Let {Li ..Ri}’ i = 1,2,...,m, be the set of functional rela- tions in 0. Without loss of generality, one can impose the restric- tions that L1 # Lj for i # j and that the intersection Li FIRi, i = 1,2,...,m, is empty. Note that if AB —oBC, then it is true that AB ~oC, but it is not necessarily true that A 4’BC or that A «IC. The implication matrix P of a relation R of degree n, defined on Q = A A 1 2 . An’ is an m X n matrix whose rows are labeled L ,L ,...,L and whose columns are labeled A ,A ,...,A l 2 m l 2 n such that: 1 if Aj 6 (Li U Ri) i o J 0 otherwise 20 Example 2.2.1 Consider the following functional relations where Q = ABCDEFG. ABC «DEG; AB «CF; CD -oEF; EG -.AC Then, the implication matrix P is A B C D E F G EC 1 0 1 0 1 0 1 The fact that row labels of ?P are not isomorphic to Sub- sets of the column labels requires an extension of the usual method of finding the transitive closure of an implication matrix. The * transitive clOSure, P , of P is defined as follows: * 1. Put P =P * 2. For every two distinct rows Li and Lj in P , if for every attribute Ak in Lj’ the entry LiAk = 1, then copy all 1 entries in row Lj into the corresponding entries in row Li' The 0 entries in row Li change to 1 if the correSponding entries in row Lj are l and the original 1 entries in row Li will remain. This changes * P * 3. Repeat part 2 above until P cannot be changed any further. 21 Example 2.2.2 Starting with P in Example 2.2.1, then EC 1 0 1 0 l 0 1 where 1_means a 0 has been changed to 1 in the process of taking the transitive closure. Before proceeding with the proof of the basic theorems that support the algorithm, the following notation is needed. From the definition of P*, P* might effectively indicate a new set of functional relations. These new functional relations are denoted Li a Ti’ i = 1,2,...,m, where Ti D'Ri, Li H T1 = ¢. In P*, it might be the case that some rows are not all 1. The set of attributes that correspond to 0 entries in row Li Will be denoted by T]. Note that T; = ¢ if row L1 is all 1 in P* and that n = LiTiTi is a partition of 0, if T] # O. It is also assumed that L1 and T1 are not empty. 2.3 Mathematical Preliminaries The mathematical properties of the functional relations are stated in Lenmas 2.3.1 - 2.3.7, and the theoretical foundations for the steps taken in the algorithm for finding the keys are pre- sented in Theorems 2.3.1 - 2.3.3. Lemma 2.3.1 * In P , 1f Li a Ti and Li U Ti # Q, then Li can be 22 extended by the subset T] such that L1 U T] 4 0- Proof: Since Li-oTi and Li-oLi, u _. u I _. v = Also Ti Ti (by P1), hence L1 u Ti Li u Ti u Ti 0 then L1 —.L1 U Ti (by P5). (by P5). Lemma 2.3.2 * In P , if row Lj is all 1 and Lj CLi U Ti’ then row 13_ is also all 1. Proof: Since row Lj is all 1, this implies that Lj « 0. Also Since Li a Ti’ then Li -'Li U Ti (by P5). But Li U Ti fiiLj (by P4) and so L —. —. b . 1 Lj n ( y P2) Lemma 2 .3.3 * In P , if L is a Subset of Li U T and if row Li j i is not all 1, hence row Lj is not all 1, then, for T] and T; such that L1 U Ti «10 and Lj U T] a'fl, the inter- section Of T; and T5 is not empty. met: It follows from the transitive property P2, that any 1 entry in row Lj is a 1 entry in row Li and hence any 0 entry in row Li is a 0 entry in row L But T; j' correSponds to the 0 entries in row Li’ hence T; O T5 # ¢. Example 2.3.1 n = ABCDE A B c D E 1. AB —.c * AB 1 1 1 l o P = 2. AC —.D AC 1 o 1 1 0 t = I = I I then T1 E, T2 BE and T1 0 T2 # ¢. 23 Lemma 2.3.4 9: In P , if Lj is a subset of Li U 1'1 and if I'OW L1 is not all 1, then for any a C Tj' such that Lj U Q/j -* O, i the intersection of Ti and aj is not empty. M: The proof is by contradiction. If aj 0 Ti = O, then C - I . . . or]. L1 U Ti Since LiTiTi is a partition on 0 Also Lj C Li U Ti’ hence L U aj C Li U Ti. But this would imply J that Li -+ Li U Ti aLj U or]. -+ D, which is a contradiction since Li /- 0. Therefore, a H Ti 9‘ O. 3 Lemma 2 .3 .5 ~1- In P , if L is not a Subset of L_ U Ti and if row i J Lj is all 1 and row L1 is not all 1, then for Ti Such that L1 U Ti ..0, the intersection of Ti and Lj is not empty. 2.1222: If the intersection of Ti and L is empty, this implies J that the 1 entries in row L1 include Lj and hence Lj would be a Subset of L1 U Ti which contradicts the hypothesis. Therefore, T]!- n Lj 1‘ (D. Example 2 .3 .2 O=ABCDE ABC DE 1. A-oBC * A 1 1 1 0 0 P = 2. AD—~E AD 1 _1_ l 1 1 l__ = I then 131-1313,}.2 AD and TIHL2#¢. Lemma2.3.6 a: In P , if L is not a subset of Li U Ti and if row J 24 Lj is not all 1 and row Li is not all 1, then for Ti such that L1 U Ti - 0, the intersection of Ti and LJ. is not empty. 232015.: The proof is exactly the same as in Lemma 2.3.5. Example 2 .3.3 0=ABCDE A B c D E 1. A-aBC * A 1 1 1 0 0 P = 2. BD—oA BD 1 1 1_ 1 o u: = u then '131 DE, 12 BD and T1 an #eb. Lemma 2.3.7 '1: In P , if L is a subset of L. U Ti’ then L and i j i Li U L, imply the same subset in Q. J Proof: * In P , L1 -*L1 U Ti and so, by definition of Ti’ if Zi C. 0 such that Z 23Li U T1, then Li [4 21. Similarly, i Lj -0 Lj U Tj . But, by hypothesis, Lj L1 - Li U Ti _. Lj -. I.j U Tj by projectivity and tranSitiVity reSpectively which implies that Lj U T CLi U Ti and so chi U Ti. Also L «T andL -+T,andso 1 i J J LiULj—oLiUTiULjUTj =Liu'r11 since Lj U Tj §:Li U Ti. Therefore, L same subset, Li U Ti’ in O. and Li U L imply the i J Example 2 .3 .4 Refer back to Example 2.3.1. Note that L1 = AB .. ABCD and L1 U L2 = ABC «ABCD. 25 Theorem 2.3.1 * In P , if L is not a subset of Li U Ti’ then there J exists a subset a C‘L , possibly a = O if row L1 is all J 1, Such that L1 U a and L1 U Lj imply the same subset in 0. Proof: If Li ~10, then a = O and hence it is always true that L1 U Lj ~10. If L1 #10, then by Lemmas 2.3.5 and 2.3.6, for T] such that L1 u T; ..0, it follows that a = Ti' nLj ’9 O. So a C Lj. Since L TiT' is a partition on 0, then either i i | l chri or LJ.fl(LiUTi)#O. If chT1, 0 = L FIT; = Lj and Li U a = Li U Lj and hence they imply then J the same subset in Q. If Lj ¢ T], then L FI(Li U Ti) # O. J j j j FITi) since LiTiTi is a partition on 0; hence Lj = [Lj FI(Li U Ti)] U a. Also But Lj=(L nLi)U(L nTi)U(L L 0(L1UTi) CLiUTi, so I." II [Lj fl(LiUTi)]UaCLiUTan. But Li—vTi,so L U a « Ti' Therefore, using Lemma 2.3.7, if L. C (‘Li U a) U Ti where Li U a a T then the Subset in 1, 0 dependent on L1 U a is the same subset dependent on the union (Li U a) U L = Li U Lj since a :‘L . J J Example 2 .3 .5 In Example 2.3.3, note that a = D C‘Lz = BD and that A U D and A U BD imply the same subset, ABCD, in Q. If K = Li U ai is a key, then K is said to be derived from L.. 1 26 The following theorem shows that unions of Li and Lj are not useful in deriving keys. Theorem 2.3.2 Any key that can be derived from Li U Lj can also be derived from Li or Lj separately. Proof: By Lemma 2.3.7 and Theorem 2.3.1, for some a C'L , where J a could possibly be empty, Li U a and Li U L imply the J same subset in Q. If Li U 0 ~10, then Li U Lj ~>0, but Li U Lj QiLi U 0, hence Li U Lj cannot be a key whenever Li U a is unless Li U Lj = Li U a. If Li U a #>Q, then, by Lemma 2.3.1, Li U a and Li U Lj can be extended by the same minimum subset 3 so that L1 U (a U B) ~»Q and (Li U Lj) U B a Q. But a C‘Lj and for any 8, a U 8 C Lj U 8, hence Li U (a U B) c (Li U Lj) U B and therefore (Li U Lj) U B can never be a key whenever Liumue) is unless (LiUL)UB=LiU(OIUB)- J The previous theorems and lemmas Show that to find the keys of a relation a, one needs only consider the functional relations L a T., i = 1,2,...,m, and that taking the union L1 U L of any J two Li’ L will not result in any new key that cannot be found J from L1: or Lj separately. So, the algorithm uses the transitive property repeatedly to find the minimal ai such that Li U 01 4g). The following theorem asserts that for each Li U a1 in T2, to be explained in the algorithm,'Li U 01 4:1. 2.4 27 Theorem 2.3.3 If L is any Subset of 0 such that L a Q, and if T; # m, then Li U (Ti (TL) ..9, By Lemmas 2.3.3 - 2.3.6, T] (IL # O whenever T] ¥ O. Partition L into L and L such that L C‘L J T and 1 2 l i 1 L2 C T]; this is possible since LiTiTi is a partition on ._ I —O _. o o o {2 and L2 —-Ti FIL # O. L1 L1 U Ti L1 by pr0ject1Vity (P4). Ti [IL2 = L2 ~TL2 by reflexivity (Pl). Since “Ti: flL2 C T;- (IL, Ti (TL «L2 by augmentation (P3). Also by additivity (P5), Li U (T; (IL) -»L1 U L2 = L ~10. 30, Li U (T; n'L) #10 by transitivity (P2). Finding the Keys Algorithm A1 1. Form P* with row labels Li’ i = 1,2,...,m. 2. T1 = O and T2 = O- 3. For i = 1 to m enter into T1 each Li U T] where ITII 2 O. 1. I I ' I I 4..If LiUTiCLJUTj and 1f \Tj\51 and \Ti\_<_1, then delete Lj U T] from T1. 5. If \Ti‘ g_l for all remaining entries in T1 then terminate the algorithm. Tl contains all keys. a. Else for all i in T1 such that \TiI 2 2 form = I I aij Ti H (Lj U Tj) for all j # i where \T3\ 2 O. J b. For each i, delete any aij' such that aij C 0. 31‘3”- 28 c. For each remaining a enter Li U a into T2. ij ij Then delete from T2 any set which is a superset of any other set in T2. d. For all i in T1 such that ITiI 2 2 and all Lj in T2 for which Li If: Lj form aik = T] n (Lj u ajk). e. For each i, delete any aik' such that aik C aik" ka‘k'. ; “I f. Enter into T2 any Li U aik which is not a superset of a set already in T2. Then delete from T2 any set which is a superset of any other set in T2. If any new entries are thus created in T2, repeat from Step 5d. Otherwise go to step 6. 6. Copy from T1 into T2 any sets Li U T] in T1 where ITiI 5.1. 7. Delete from T2 any set which is a Superset of another set in T2. The remaining sets in T2 are all of the keys and the algorithm terminates. By construction, the entries in T1 satisfy Li U T] «IO and, by Theorem 2.3.3, all entries in T2 satisfy Li U aij ~»Q. It remains only to show that the algorithm is complete, in the sense that it finds all keys. If Li U ai’ where IaiI 2 O, is a key, then, for I01] = 0, L1 is a key only if, in jP*’ Li —v0, and no subset of L1 implies 0. Therefore, to Show completeness it remains only to show that if ‘01] 2 1, then Li U ai is found by the algorithm. The following completeness theorem will Show this. Assume that the set of functional relations has more than one element in it, otherwise it is a trivial case. 29 Theorem 2.4.1 _ I If Li u all, IdiI 2 1, is a key, then “i - Ti 0 (Lj u Bj), I83] 2 0, for some j ¥ i. Proof: If Li U ai is a key, then Li U a1 «'0. But Since Li #10, this implies that there exists a subset Z CLi U Ti U ai such that Zi —IT£ where N II Oi U [Zi n (Li U Ti)] and Z1 = 01 only if Z. H (Li U Ti) = O. There is always one such 2 , namely i = Z '. ’ Z a T' i 0T1 So, if i = ... .... '. i i one must Show that there exists a j i i, such that Zi = Lj U Bj a T] only one functional relation which would give ai , except in the trivial case when there is T} and i Z = --I —-I '. -I ' = . ' 1 Li U a1 0 Ti If Lj Ti’ then let 21 Lj This is done for each 'L such that Lj -0T£, then choose the J minimal Oi -- no superset of a1 is chosen, where = ' _. a. T1 FILj and L1 U ai 0. If L I 1 {'Ti, then for some J Bj’ Lj U Bj aiTi by augmentation P3 and additivity P5 and for I I I I ___. some Bj, Lj U Bj U 81 a T1 U Lj U 81 U 81 Q. where B] ; L1 U T1, 3831“ by P3 and P5. Therefore 35 FIT] = O I 8 since LiTiTi is a partition on 0. Let Zi Lj U Bj’ then T] n (‘Lj U 8j U 33) = T] rI(Lj j) = T] n Z1 = a1. Again this is done for each Lj U 31 such that Lj U Bj « Ti. But, this is exactly how the algorithm iteratively finds the U 8 different a so that L1 U a1 4.9 and then chooses the 1 minimal of these a1. 30 When Algorithm Al terminates, every set remaining in T2 is a key. Further, by Theorem 2.4.1, no keys are missed (not in T2). Therefore, the algorithm is complete. 2.5 Examples Example 2.5.1 The process of finding the keys of the relation in Example 2.2.1, will be presented in a somewhat Optimized manner, but the steps still agree with the algorithm. T1: AB; ABC; CD UABG; EG U BDF Delete ABC 2.AB from T1, so Tl: AB; CD U ABG; EG U BDF. Apply 5a-5 of the algorithm as follows. f EC U [BDF n (CD u ABG)] EG U BD; CD U [ABC FI(EG U BDF)] CD U BG EG U (BDF 0 AB) EGUB;CDU(ABGnAB) CDUAB Therefore, T2 holds: T2: EG U B; CD U AB; CD U BG. Intersect every subset T], where \TiI 2 2, in T1 with every subset in T2, then add this intersection to L1 of Li U T; and put the resulting subset in T2 if it is not a superset of a set already in T2. BDF FICDAB BD gives EG U BD (superset, do not put in T2) BDF FICDBG BD gives EG U BD (superset, do not put in T2) ABC FIEGB = BG Gives CD U BG (already in T2). Every Ti, IT£I 2 2, in T1 has been intersected with every sub- set in T2 and no new subsets resulted for T2. Proceed to step 6 of the algorithm to get: 31 T2: AB; EGB; CDAB; CDBG. Applying step 7 leaves the keys: T2: AB; EGB; CDBG. Example 2.5.2 Consider the following functional relations: ABC —oDF; BCD 4G; CE «AD; DG «EF; BF -»CG. * The transitive closure, P , is: A B C D E F G ABC 1 l l l |I-| ...: "-4 BCD l 1 1 lb-' II-l IH ,...I DG 0 0 0 1 l 1 1 BF 0 1 l 0 0 1 1 So, Tl: ABC; BCD; CE U BFG; DG U ABC; BF U ADE. Apply 58-5c of Algorithm A1 as follows: BFU [ADE n (DGUABC)] BFUAD ; BFU [ADE n(CEUBFG)] BF u (ADE n BCD) BFUD ;BFU(ADEnABC) DGU [ABC n (BFUADE)] DGUAB ;DGU [ABC n (CEUBFG)] DC u (ABC n BCD) DGUBC;DGU(ABCnABC) CE u [BFG n (BF u ADE)] CEUBF;CEU[BFGn(DGUABC)] CE u (BFG n BCD) CE U B ; CE U (BFG FIABC) Therefore T2 holds: T2: BF U A; BF U D; BF U E; CE U B; DG U AB; DG U BC. Now, applying Sd-Sf of the algorithm gives: BF U E BF U A DC U BC DG U ABC CE U BG CE U B 32 BF u [ADE rI(CE u B)] = BF 0 E (already in T2) BF U [ADE FI(DG U AB)] = BF U AD (superset, do not put in T2) BF U [ADE [\(DG U BC)] = BF U D (already in T2) DG U [ABC [\(BF U A)] = DG U AB (already in T2) DG U [ABC 0 (BF U D)] = DG U B (put in T2 and delete any superset) DG U [ABC [1(BF U E)] = DG U B (already in T2 from previous step) DG [ABC FI(CE U B)] DG U BC (superset, do not put in T2) CE [BFG FI(BF u A)] CE u BF (superset, do not put in T2) CE U BF (Superset, do not put in T2) CE [BFG FI(BF u B)] U U CE u [BFG rI(BF u D)] U CE U BF (superset, do not put in T2) U CE [BFG 0 (DG U B)] CE U BG (superset, do not put in T2) Therefore, T2 holds; T2: BF U A; BF U D; BF U E; CE U B; DG U B. Again, applying Sd-Sf of the algorithm produces no new entries in T2; so proceed to step 6 to get: T2: BF u A; BF u D; BF u E; CE U B; DG u B; ABC; BCD. Finally, applying step 7 gives all keys: T2: BF u A; BF u D; BF U E; CE u B; DG u B; ABC; BCD. 2.6 Chapter Summaryiand Remarks A new algorithm for finding all keys for relational data bases has been presented. It is clear, from the examples given in Section 2.5, that, at least by hand computation, Algorithm Al seems easier to apply than the algorithm suggested by Delobel and Casey. The keys can be studied within the framework of the implica- tion matrix which is a familiar algebraic approach. It is important 33 to note that Algorithm Al is not written in a ready to program notation, and so care must be taken in choosing the apprOpriate notation for actual programming. Also, Optimization is possible and the examples in Section 2.5 Should provide some help in this direction. Finally, Algorithm Al has not been programmed, and so no eXperimental comparisons have been made between the approach taken here and that of Delobel and Casey. CHAPTER 3 FUNCTIONAL PARTITION AND REDIIITION 3.1 Introduction In this chapter, computational savings are considered which lead to new concepts of functional partition and reduction. Also, detailed mathematical analysis of prime and non-prime attributes is given, and conditions are given under which one can delete a functional relation without changing the keys. Functional partition provides a way of finding the keys of a relation R, defined in Q, from the projections of R into subre- lations defined on Subsets of O. This, in turn, implies that transitive closures are taken in submatrices of the implication matrix, P, and so, storage requirements are minimized. Functional reduction is a direct consequence of the mathe- matical properties of prime and non-prime attributes which are also discussed in this chapter. It is shown that the keys of a relation R defined in O depend on the given functional relations in R and on a subset, not necessarily prOper, of 0, while in Chapter 2, it was always the case that all the attributes in 0 were used, in Algorithm Al, for finding the keys. This chapter also Shows that functional relations of the form L —’Li can be deleted without altering the keys. 1 34 35 3.2 Functional Partition Algorithm Al in Chapter 2 showed that the keys of a relation R are completely determined by the original set of functional re- lations as Specified by the data base administrator or the user. If this set is large, then the number of computational steps for finding the keys is enormous, and so the question is: Can one find the keys of R from subsets of the original set of functional re- lations and, if so, how does one define these subsets? This section presents such Subsets provided by the binary operation (e0, defined on the set of functional relations, that partitions this set into equivalence classes. Let R be a relation defined in O, and let the set of func- tional relations in R be denoted by {F1} = {FiILi «IR i = 1,2,...,m]. i, This chapter assumes, unless otherwise Specified, that m 0 = U (Li U R1) and this assumption will be substantiated later. i=1 Definition 3.2.1 Two functional relations F and F in {F1} are called i J adjacent, written as Fi ~ Fj’ if (Li U R1) [1(Lj U R1) # O: Otherwise, F1 and Fj are called non-adjacent and denoted as F,1‘F.. 1 J Definition 3.2.2 F1 and Fj in {F1} are connected, written as F1 Rst, if at least one of the following conditions holds: 1) Fi'vFj; ii) There exists at least one Subcollection {FL ] C {Pi} i such that F ~'F , F "'F ,...,F “'F . i L1 Li Li+l Li+k J 36 Theorem 3.2.1 The connected relation 0%) is an equivalence relation on {Fi]° gases 1. 4% is reflexive It is always true that Fi ~ Fi since Li U Ri # O; hence, (Li U Ri) [\(Li U Ri) # O. Therefore, by Defini- tion 3.2.2, Fi R’ F . i 2. 5% is symmetric If F1N Fj’ and if Fi ~ Fj’ then Fj ~ Fi; hence, If Fi‘w Fj’ and if there exists a Subcollection {F j C {F ] such that F ~ F , F ~ F ,...,F ~ F,, Li 1 1 Li Li Li+1 Li+k 3 then reversing the step gives F~F ,...,F ~F ,F ~F;hence,Fe5F.. J {1+k Li+l Li Li i j i 3. Fe is transitive If Fifiw Fj’ and if Fjie Fk’ then there are two sub- collections {F j and {F ], subsets in {F,], each Li Lj 1 possibly with one element in it, such that F . ~ F , F ~ F , o o o , F ~ F and 1 Li 41 (1+1 Li+k 3 F, ~ F , F ~ F ,..., F ~ F . But, this . k implies, by Definition 3.2.2, that Piss Fk. Therefore, the binary operation,‘¥, partitions {F1} into equivalence classes such that F1 and Fj are in the same class if and only if (iff) Fi‘a Fj. The equivalence classes will be denoted by Fi where ‘F: is the set of all functional relations connected 37 to Fi' An equivalence class can be denoted by any of its members. Definition 3.2.3 If for each j # i, Fi¢$ Fj’ then iv induces the identity partition on {F1}. i.e. Each functional relation is in a class by itself. Definition 3.2.4 If for all i and all j, Fiew Fj’ then R! induces the universal partition on {Fi}' i.e. A11 functional relations are in only one class. Lemma 3.2.1 A partition on {F1} induces a partition on 0. Proof: If Fi and Fj are in different classes, then it follows that either Fi i Fj or that there exists no PR in F1 such that Fi ~ Fk' Therefore, let C1 be the set of attributes in El; then Ci FICj = O for i i j, and U C1 = 0. Hence, [Ci] is a partition on O. 3.3 Procedures for Obtaining Functional Partition Section 3.2 showed that the binary operation GU) is an equivalence relation on the original set of functional relations. This section presents both a graphical method, suitable for hand computation, and an algebraic method, suitable for hand computation and computer implementation, for producing this partition. 3.3.1 Graphical Method F. Given {F1}. Consider m points labeled F1,F2,..., m, reSpectively. For each i and each j, if Fi ~ F , then draw a J 38 line or arc between F1 and F , otherwise do not join F1 and Fj. J The number of connected subgraphs that results, is the number of equivalence classes in {F1}; Fi and Fj are in the same class iff they are in the same connected subgraph. Whenever each point F1 is an isolated point, then the result is the identity partition. However, if all the points F1 are on the same connected graph, then one gets the universal parti- tion. Example 3.3.1 .1 Let Q = ABCDEFGHKLM; Consider the following functional relations: F 0 ....) ’ ° -D . F 1. AB C, F2 3. BD FG, Note that Elie F3 and Elie F4 since (AB u C) FI(BD u FG) # ¢> and (AB u C) rI(AB u C) i a . .4 °F ' —-I'F° —-I. . H KL, 4. AB C, 5. KM L reSpectively. Similarly, F2 st5. Applying the graphical method gives the following two connected subgraphs G1 and G2. Therefore, there are two equivalence classes F1 = [F1,F3,F4] and F2 = {F2,FS] that correSpond to G and G 1 2 reSpectively. Note that F3¢¥ F4 although there is no line joining F3 and F4. However, the subgraph G1 is connected and hence F1,F3 and F4 are in the same equivalence class. Also, the parti- tion on {F1} induces a partition on 0, namely, if 01 is taken to be the set of attributes mentioned in F1,F3 and F4, then 01 = ABCDEFG and 02 = HKLM which are mentioned in F2 and F5; 0102 is a partition on Q. 39 3.3.2 Alggbraic Method Given {F1}, form the implication matrix, P, Such that each row, P corresponds to a functional relation F . Form the direct i’ i incidence matrix, P', from P by letting each entry in P' be the logical vector inner product of rows of P; P' = [p£,] where #(C) J I = I = = . . pij pji (Pi’Pj) and (Pi’Pj) kil pik pjk Where is logical product and E is logical sum. P' is a symmetric matrix of 0's and 1'8 with all l's along the diagonal and whose rows and columns are labeled F1,F * distinct row of the transitive closure (P') of P' define a 2,...,Fm respectively. The ones in each correSponding equivalence class under Ra Example 3.3.2.1 Applying the algebraic method to the functional relations in Example 3.3.1.1 gives: A. B C D E F G H K. L M F 1 1 1 0 0 0 0 0 0 0 0 F200000001110 P=F301010110000 FalOlOIOOOOOO F500000000111 F1F2F3F4FS F11011O F201001 (F')*=F3loiio Therefore, F1 = {F1,F3aF4I and F2 = {F2:F5]- 40 With suitable rearrangements of columns and rows in P, the implication matrix is a rectangular matrix having non-zero rectangular submatrices along its main diagonal with the remaining elements equal to zero. The following section will Show that the keys of R can be determined from the non-zero submatrices in the implication matrix, and so storage requirements are reduced because the zero submatrices would not have to be stored. 3.4 Properties of Functional Partition A partition on {F1} induces a partition on 0. Let 0102 ... an be such a partition. The relation R, defined in O, can be projected into subrelations Si’ defined in 01, i = 1,2,...,n, L1 j —oRi j be the set of functional relations Li a Ri in R that hold in Sj. The following lemma and theorem reSpectively. Let Show that the keys of R can be obtained from the keys of Si’ i = 1,2,...,n. Lemma 3.4.1 Let Li j 4 R1 j be the set of functional relations in Sj. If Li —IRi is a functional relation in R where j’ then there exists Ki,j C'L1 such that K. . C Q and K a R . 1’3 j 1:3 1L] Li c Q .and R1 j = Ri : Q P_r_9_f. The proof follows from the fundamental properties, Pl-PS, of the functional relations. The applications of reflexivity, transitivity, projectivity, and additivity on the functional relations in S result only in functional relations whose J attributes belong to Qj. However, applying the augmentation 41 prOperty with attributes not belonging to DJ results in a functional relation L1 «iRi in R, where Li C Q and Ri j = Ri C Qj. Therefore, there must exist a subset Ki,j Cflj Such that K1,] #Ri’j. The following theorem is the main result of this section. It shows that the keys of R can be determined from subsets of the original set of functional relations. Consequently, the major goal of functional partition, as stated at the beginning of Section 3.2, is achieved. Theorem 3.4.1 If 0 = 0102 ... on, and if K C Q, then the necessary and sufficient conditions for K to be a key of R are K = U Ki where Ki # O and K1 C Di such that K1 is a key of Si° Proof: Necessapy condition: Let K be a key of R. Since 0102 ... an is a partition on 0, K = (K n 01) U (K n 02) U...U (K h on). Let Ki = K O 01. One must show that, for each i, Ki # ¢, and that Ki is the key of 51' Suppose that, for some 1, K1 = On then " =K= - —. . Ki K1UK2U UK1_1JK1+1U UKn O 01 But K] has no subset that belongs to 01, and so, it contradicts 1 Lemma 3.4.1. Hence, for each 1, K1 1‘ O. Therefore, K = K1 U K2 U...U K.n a Q a mi and so, by Lemma 3.4.1, there must exist a subset Mi C K such that M1 C Oi and MR But Ki C K is the only subset in K that belongs to 0i; hence, Mi C K #01. i and Mi 4 Oi. Therefore, 42 M = M1 U M2 U...U M5 4 Q by additivity, and so M C K which contradicts the hypothesis that K is a key of R. unless M1 = Ki' Therefore, K1 i' Sufficient condition: If, for each i, K is a key of S C 01 is a key i of Si’ then K = U K114 0 by additivity. It remains only to Show that there is no prOper subset of K that implies 0. Suppose M<: K and M14 Q, then by the same procedure as above, M = U Mi where M1 C K1 and M1 i O. Again applying the same reasoning as above gives Mi a 01. But, Mi A Di contradicts the fact that K1 is a key of Si’ unless M1 = Ki. Therefore K.= U Ki is a key of R. Corollary 3.4.1.1 If the connected functional relations induce the identity partition on {F1}, then K = U L1 is the only key of R, .P_r___f The proof follows from Theorem 3.4.1 where each Fi: Li “’Ri’ i = 1,2,...,m, is in an equivalence class by itself. Let Oi = (Li U R1), and project R. into Si[Qi]. L1 is the only key of 81’ Since there is only one functional relation in each Si; hence, U Li is the only key of R defined in Q = (1102 (1m. Corollary 3.4.1.2 If no functional relations are given in R defined in Q, then 0 is the only key of R. ii_i Let Q = AlAZ ... An' Since no functional relations are given in R, then it follows from the reflexive prOperty that .43 F1: A1 4 A1, i = 1,2,...,n, are the only given functional re- lations in Rs Therefore there exists an identity partition on {F1} and, by Corollary 3.4.1.1, the only key of R is K = U A1 = Q. If A and B are sets, then the set difference is defined as A - B = {x‘x 6 A and x f B]. The following corollary shows that every attribute in R, that is not mentioned in any functional relation in R, belongs to every key of R. Corollagy 3.4.1.3 Let R. be a relation in Q, and suppose that the functional relations in R are defined in Q <: 0. Let Sl[01] be the 1 projection of R in 01. If K1 is a key of 81, then K = K1 U (Q - 01) is a key of R. iisci: Let SZ[Q - 91] be the projection of R in (Q - 01). It is clear that no functional relations are given in 82, except those by reflexivity. Therefore, by Corollary 3.4.1.2, (Q - 01) is the only key of S2. Also, since 01 and (Q - 01) are disjoint such that Q = 01 U (Q - 01), then, by Theorem 3.4.1, K = K1 U (Q ' 01) is a key of R. The preceding demonstrates that to find the keys of a re- lation R, defined in 0, one needs only be concerned with the connected functional relations in Q. So, the first step for find- ing the keys is the possible projection of R into subrelations Si’ i = 1,2,...,n, where, for each i, the functional relations in Si are connected. This projection is not always possible, namely, whenever the connected functional relations induce the universal partition on {F,]. i 44 Therefore, without loss of generality, one can consider a relation R, defined in 0, such that the functional relations in R are all connected, and that Q = U(Li U R1) which substantiates the statement made at the beginning of Section 3.2. Example 3.4.1 Consider the functional relations given in Example 3.3.1.1. Applying Algorithm A1 gives only one key, ABDEHM, for the relation R defined in Q = ABCDEFGHKLM. But, the same example Showed that {F1} can be partitioned into two equi- valence classes that induced the partition 0102 on Q, where Q = ABCDEFG and 02 = HKLM. Project R into Sl[01] and 1 SZ[QZ]; the functional relations in S1 are: AB—oC;BD-+FG;AE-IC; and the functional relations in 82 are: H « KL; KM «iL. Again, K1 = ABDE is the only key of 81’ and K2 = HM is the only key of S Note that K = K1 U K2 = ABDEHM is the 2. only key of R defined in Q = 01 U 02. 3.5 Properties of Prime and NonéPrime Attributes Recall the definitions of prime and non-prime attributes given in Chapter 1. It is essential to characterize the properties of these attributes for they play an important role in determining whether a relation is in second or third normal form to be studied in Chapter 4. Also, they help characterize the keys which form the basis of the normal forms and they are essential in the functional reduction which is to be explained later. Again, let R. be the relation in O, and let {F1} be the connected functional relations in R such that ()=U(LiURi). 45 Lemma 3.5 .1 If an attribute Ak is not a member of any L then there 1’ exists no functional relation L a r in R such that AR 6 L and AR E r unless there exists an L' C L such that Ak G L' and L' a r. 2.1-ace: If, for every i, AR 6 Li’ then AR 6 Rj, for some j. It follows from properties P1 - P5 of functional relations that Ak E L only by applying the augmentation property or by applying additivity with Ak a Ak' In either case, there must exist L' C L such that Ak f L' and L' a r. Theorem 3.5.1 If an attribute Ak is not a member of any Li’ then Ak is non-prime. Proof: Algorithm Al showed that every key of R is of the form I Li U Oi, for some i, where ai C Ti and IaiI 2 0. If, for every 1, IdiI = 0, then L is a key and AR 6 Li 1 by hypothesis; hence, Ak is non-prime. If, for some i, I01] # 0, then L1 U a1 a Q .and L1 F 0- Therefore, there must exist a subset Z1 CLi U Ti U Q, Such = _. ' - that Zi a1 U [Zi H (Li U Ti)] and Z1 T1 Oi. Also ai = Z1 n T], and a1 is the minimal subset of T] such that Z —O ' - o 1 Ti 01 But if Ak E ai’ then AR 6 Z1 and Ak f (T; - Oi) since Zi and T' - a are disjoint, and so i i I . Zi Ti 01 contradicts Lemma 3.5.1 whenever Ak E 21 and 46 Ak i T; - a1. Therefore, AR 6 ai’ and so A.k E Li U a1, which implies that Ak is non-prime. Example 3.5 .1 Example 3.4.1 showed that ABDE and HM are the only keys of S1 and $2 reSpectively. The attribute L is not a member of any L1 in S2, and L is non-prime. Also, the attributes C, F and G are not members of any L1 in 81’ and they are also non-prime. However, the attribute K is a member of L2 in 82 and yet K is non-prime. Therefore, a non-prime attribute may appear as a member of some Li' Corollary_3.5.l.l If an attribute Ak is prime, then A is a member of some k L i' This statement is true because it is the contrapositive of the statement in Theorem 3.5.1 which is true. Corollaryr3.5.l.2 Every key K of R is a subset of U Li. Proof: k 1 therefore, by Corollary 3.5.1.1, Ak is a member of some Li By definition, every attribute A n a key K is prime, and hence K C U'Li. Lemma 3.5.2 If an attribute Ak is not a member of any Ri’ then there exists no functional relation L a r in R, such that Ak G L and AR 6 r. 47 Proof: If, for every 1, AR 2 R1, then AR 6 L , for some j. J Only, by using the additivity property with Ak d’Ak’ that one gets Ak E r. But then AR 6 L. Therefore there exists no functional relation L a r in R such that Ak f L and Ak E r. Theorem 3.5.2 If an attribute Ak is not a member of any Ri’ then Ak is a member of every key of R. isssiz Let K.= Li U ai be a key of R and suppose that AR 2 K, then R a Q a Ak’ by definition of a key and by projectivity. However, this contradicts Lemma 3.5.2 and therefore Ak E K for every key K of R. Example 3.5.2 Let Q = ABCDEFGHK, and let the functional relations in R be as follows: AB -+CD; ABE «FG; CE aHK; AH aBFK. The keys are: ABE, AEH and ACE. The attributes, A and E, belong to every key, and they are the only attributes in Q that do not belong to any Ri' Corollaryg3.5.2.1 If an attribute AR 18 not a member of any R1, then Ak is prime. Proof: By Theorem 3.5.2, Ak is a member of every key of R, hence, by definition, Ak is prime. 48 Corollary 3.5.2.2 If an attribute Ak is non-prime, then Ak is a member of some R. . 1 Proof: This statement is true because it is the contrapositive of the statement in Corollary 3.5.2.1 which is true. Corollary 3.5.2.3 If every attribute in U R1 is prime, then every attribute in 0 is prime. Proof: Let 01 = U Ri’ and let 02 = Q - 0 Every attribute 1. Ak in 02 is not a member in 01 and so, by Corollary 3.5.2.1, every Ak is prime. But, 0 = 01 U 02 and, by hypothesis, every attribute in 01 is prime; hence, every member in O is prime. Corollary_3.5.2.4 If every attribute in U Ri is prime, then 0 = U Li' 13%: By Corollary 3.5.2.3, every attribute Ak in O is prime and so, by Corollary 3.5.1.1, every Ak is a member of some Li’ hence Q C U Li' But, by definition of Q, U Li C Q. Therefore 0 = U Li. Corollary_3.5.2.5 Let Q = U L' and let 02 = U R . If 0102 is a parti- l i’ 1 tion on Q, then 01 is the only key of R. Proof: Since 0102 is a partition on Q, then every attribute in 49 01 is not a member of any R1 and every attribute in 02 is not a member of any L1. But, by Theorem 3.5.1, every attribute in Q is non-prime and, by Theorem 3.5.2, every 2 attribute in 01 is a member of every key of R. Therefore, if K is a key of R, then 01 C K but, by Corollary 3.5.1.2, K C 01 and so K = 01 is the only key of R. Example 3.5.3 Refer back to relation S1 and its functional relations in Example 3.4.1. Let 01 = U L. = ABDE, and let 02 = U R1 = CFC, 1 then 0102 is a partition on D = Q 1 U 02 and O = ABDE is l the only key of SI. Theorems 3.5.1 - 3.5.2 and their corollaries Show that the keys are subsets of the attributes, U L that appear only on the i, left side of the original functional relations Li -.Ri' This is a fundamental result and it is used in the following section to Show that the columns -- in the implication matrix -- correSponding to the attributes that belong to (U Ri) - (U Li) can be deleted leaving exactly the same set of keys. 3.6 Functional Reduction Let R be a relation in Q, and let {Fi: Li a R1} be the connected functional relations in R such that 0 = U(Li U R1). Let “1 = U Li’ and let 02 be the set of attributes in 0 that are not members of any Li’ so that 0102 is a partition on 0. Project R into the subrelation Sl[01]. The functional relations in 81’ called reduced functional relations, are derived from the given functional relations in R by deleting the attributes in 02. Note that, for every i, the functional relations in S1 50 are of the form Li —»R£ such that R; C Ri whenever 02 ¢ O and R] = Ri only when Q2 = O, and so, by projectivity, the re- duced functional relations in S1 still hold true. The following theorem is a fundamental result which shows that the keys of R depend on the functional relations in R, and on 01, while, in Algorithm Al in Chapter 2, all the attributes in Q were used to find the keys. Theorem 3.6.1 Let K C Q, then K is a key of R if and only if K is a key of 31’ the projection of R on U Li. 129215.: Necessary condition: If K is a key of R, then, by defini- tion, K a Q and, by projectivity, K a Q Moreover, since K 1. is a key of R, then K is a minimal subset in Q Such that K a Q. But, by Corollary 3.5.1.2, K is a Subset of Q1, and so K is a minimal subset in Q1 such that K a Q. Therefore, it remains to Show that K is a minimal subset in G5 such that K a Q1. Let K1<: K, and suppose K1 4 Q1. But, by additivity, 01 = U Li —'U R1 and so, by transitivity, K a U R l i and, by additivity, K1 .. (U Li) U (U R1) = 0. However, K1 4 Q contradicts the hypothesis that K is a key. Therefore K is also a key of 31. Sufficient condition: If K is a key of 81’ then K 4 Q1, and K,» Q1 ~»U Ri by definition and transitivity reSpectively. Therefore K a (U Li) U (U R1) = Q by additivity. If there exists a prOper Subset ch: K Such that K1 aiQ, then K1 4 Ql by projectivity, which contradicts the hypothesis that . Ill. A II) I III]... 11‘ 51 K is a key of S Therefore, K is a key of R. 1. Example 3.6.1 Refer back to Example 3.5.2. The keys of R are: ABE, AEH and ACE. Apply the method as prOposed in Theorem 3.6.1. Project the relation R in 0 into Sl[Q1] where 01 = U'Li = ABCEH and Q2 = Q - 01 = DFGK. The reduced functional relations in S1 are: AB —»C; ABE ~»ABE (by reflexivity); CE «IH; AH a B. s The transitive closure, P , of the implication matrix, P, in S1 is: A B C E H AB 1 1 l 0 0 *ABE11I_1_1_ P: CE 00111 AH 11101 Again, the keys of S are: ABE, AEH, and ACE which are l exactly the same keys of R. Functional Deletion Note that in Example 3.6.1, the functional relation ABE a ABE holds by reflexivity. The following definition, prOperty, and lemmas will Show that one can delete a functional relation of the form L1 —’L1 from {F1} without changing the set of keys. Definition 3.7.1 A functional relation Fi of the form Li —ILi is called a reflexive form. Otherwise, it is called a non-reflexive form. 52 PrOperty 3.7.1 If L a r is a derived functional relation from the given [Pi], then L D‘Li for some 1, since the application of pro- perties Pl - P5 will either unalter or augment L Therefore 1. L D L.. 1 Lemma 3.7.1 If Fi is a reflexive form, and if there exists at least one j 9‘ i such that Lj CL1 in the original set of Li’ then any key that can be derived from Li’ can also be derived from L.. J Proof: Let Lij = Li - Lj. If Li U 8i e n, then (Lj u Lij) u B1 = Lj u (Lij u Bi) Lj u B a Q. So, there J is always at least one Bj C'Li U 81’ and hence Lj U Bj CLi U Bi’ such that Lj U Bj a Q. Therefore, Li U Bi can never be a key whenever Lj U Bj is unless Li U 81 = Lj U Bj. Lemma 3.7.2 If F1 is a reflexive form, and if for every j ¢ i, L ¢1Li J in the original set of L then any key that can be derived 1, from L1 can also be derived from Lk’ for some k # 1. Proof: If, for every j # i, L OILi in the original set of Li’ J * then, in P , L. —vL. only. 80, T, = O, and T3 = O - L. i i i i 1 since, by definition, Li 0 T1 = O. Hence, if Li U Bi 4 O, for Bi C T], then there must exist Z CLi U 81 Such that i Z a T' d b . . Z = i i’ an so, y Property 3 7 l, i Lk U Yk’ for some 53 I k # i, and Lk U Yk « Ti. But, since Lk Li U 81 a Q, then there always exists Bk C L1 U Bi Such CLi U 81 and that Lk U Bk CLi U Bi and Lk.U Bk « 0. Therefore, Li U Bi can never be a key whenever Lka Bk is, unless Theorem 3.7.1 Any functional relation in a reflexive form can be deleted from {Fi} without changing the keys of a relation defined in Q. assiz If F1 is a reflexive form, then, by Lemmas 3.7.1 and 3.7.2, every key that can be derived from Li can also be derived from L for some j # 1. Therefore, one needs only j, show that if, for j # i, Lj U dj is a key, then Fi need not be used to find “j If Lj U Bj a Q, B, 2 a then there must exist J j, Z, C L. U T, U _ such that Z,H T'. Hence, if Z = L. U ., J J J Bj J j j 1 Y1 I I = d _. I then, for Y1 C Lj U Tj’ Li U Yi U Yi Li U 8i Q Tj such = I = I that Bj Tj FI(Li U Y1) Tj Q (Li U Bi)’ and so, by Lemmas 3.7.1 and 3.7.2, there always exists Lk U Bk CL1 U Bi’ for -o —o ' some k # i, Such that Lk U 8k 0 Tj' Hence, if Lj is a key, then, by Algorithm Al, Li U 81 need not be used U aj since it is a Superset. Therefore, F1 can be deleted from the original set of functional relations without changing the keys of a relation defined in Q. 3.7.1 54 Examples Example 3.7.1.1 Consider the reduced functional relations in Example 3.6.1. Deleting the functional relation ABE 4.ABE gives the new set of reduced functional relations: AB-+C;CE—oH;AH—oB . Again, the keys are: ABE, AEH, and ACE. Example 3.7.1.2 Refer back to relation S1 and its functional relations in Example 3.4.1. Projecting S into Si[ABDE] gives the 1 following reduced functional relations in Si: AB-oAB;BD-+BD;AE-+AE All the reduced functional relations are reflexive forms; so, they can be deleted. Note that ' = ABDE in Si and there are no non-reflexive form functional relations in Si, so, by Corollary 3.4.1.2, Qi is the only key of Si. This justifies the answers in Examples 3.4.1 and 3.5.3. Example 3.7.1.3 Refer back to relation 82 and its functional relations in Example 3.4.1. It was shown that K2 = HM is the only key of 82. Projecting S into SéEHKM] gives the following 2 reduced functional relations in Si: F : H a K ; F2: KM 4 KM. Note that F2 is a reflexive form and so it can be deleted. Therefore, F is the only functional relation left in S' l 2 and so HM is the only key of S', and hence of 82. 55 2: KM « KM in Si is deleted, the keys must still be found in Qé = HKM, although not all Observe that when F attributes in Qé appear in the reduced functional relations after the reflexive forms have been deleted. 3.8 Chapter Summary and Remarks This chapter dealt mainly with computational savings. It was shown that if {Fit Li 4 R1] is the set of given functional relations in R, then the first step for finding the keys is the partition of {F1} into classes of connected functional relations. Moreover, it was shown that the keys depend only on the functional relations and the attributes in U Li. Extensive com- putational savings can be achieved whenever U Li CIQ, where Q is the set of all attributes in R. Also, further savings are possible whenever a functional relation is in a reflexive form. Finally, since the keys of R are determined from Sub— relations of R, defined on a partition of Q, then adding or deleting attributes to a subset of this partition will only affect the keys of the correSponding subrelation, while the keys of the other subrelations remain unchanged. Therefore, the concepts of functional partition, reduction, and deletion help provide an answer to the question of the effect on the keys of adding or deleting attributes in R. CHAPTER 4 NORMAL. FORMS 4.1 Introduction Chapter 4 deals mainly with relations in second and third normal forms. The definition of the relational model does not specify procedures for determining whether a normalized relation is in second or third normal form. This chapter provides such procedures, as a result of analyzing in detail the mathematical properties of a relation in second and third normal forms. 4.2 Properties of the Second Normal Form (SNF) This section characterizes the mathematical properties of a relation in SNF for the purpose of gaining insight into ways of determining whether a relation is in SNF. 4.2.1 Mathematical Properties Let R be a relation in Q. The functional relations, F,: L. a R., i = 1,2,...,m, in R are all connected and Q = U(Li U Ri)' Let Q, = U Li’ and let Q2 = U R1. It is assumed 1 that {F1} has more than one element. Let P and Q0 be the 0 set of prime and non-prime attributes in R reSpectively. Also, let p and q be arbitrary elements in ‘P0 and Q0 reSpectively. If every attribute in 02 is prime, then R is in SNF, because, by Corollary 3.5.2.3, P = Q; hence, QQ = O. However, 0 SNF'S exist for which QQ ¥ O. 56 57 Further, it is easily shown that, if K is a key, then no non-empty prOper Subset of K is dependent on any other distinct non-empty proper subset of K, for otherwise a proper subset of K would imply Q and K, would not be a key. Consequently, if Q1 is the only key of R, then Q0 = 02 i O, and so R is not in SNF, because Li._'Ri’ for i = 1,2,...,m, are given where Li<: Q1 and R1 C Q0. The preceding statements suggest the following two important theorems. Theorem 4.2.1.1 If QlQ2 is a partition on Q, then R is not in SNF. 1319133 By Corollary 3.5.2.5, Q1 is the only key of R, and so R is not in SNF. Theorem 4.2.1.2 If, for some i, Li U ”i’ IaiI 2 l, is the only key of R, and if QQ # ¢, then R is not in SNF. Erect: PQ =Li U a1, Li c:Li U a1, and L1 F’Ri is given. If Ri C'Li U a1, then Li «1R1 would imply that a proper Subset of a key is dependent on another distinct proper Subset of the same key, which is impossible. Therefore R1 C Q“, and SO R is not in SNF. As a consequence to the definition of a relation in SNF, the following property is true. Pryerty 4 .2 .l .1 If every key of R consists of only one attribute, then 58 R is in SNF, for none of the keys has a non-empty prOper Subset upon which a non-prime attribute could be dependent. Algorithm A1 showed that every key is of the form Li U ai’ I01] 2 0. So, if Li U 01 is a key, and if, for j # i, there exists an Lj CZLi in the original set of Li’ then L. U a. = L. U ai is also the same key. For the remainder of J J 1 this chapter, the following notation is used. If L U Oj is a J key, then, for every i # j, Li diLj in the original set of Li. Lemma 4.2.1.1 If Li a R and if there exists no L CZL in the i’ j 1 original set of Li’ then, for any L1 CLi and L2 C Q, ‘1 C ‘2' M: If L1 n L2, then by Property 3.7.1, L1 DTLk, for some k, and so L C L1 CIL , which contradicts the hypothesis k i that there exists no Lk CZLi in the original set of Li' Therefore L1 f L2. Consequently, the following theorems lead to Special cases in the algorithm for determining whether a relation is in SNF. Theorem 4.2.1.3 If every key of R is one of the original Li’ then R is in SNF. Proof: If L1 is a key, then, by the construction of Li and by Lemma 4.2.1.1, for any L1 CZL and L2 C Q, Ll flbz. 1 So, if q 6 Q0, then choose L2 = q, and so L1 f q. There- fore, R is in SNF. 59 Theorem,4.2.l.4 If every key of R is of the form ‘Li U a1, IaiI g_l, and if, for every j Such that Lj CZL1 U ai, Lj f q, then R is in SNF. P_.___f. If LClLi U ai’ and if L a r is a derived functional relation from the given set of functional relations, then, by Property 3.7.1, L D'L for some k, and L CLCLan/i. k’ k But Li U a1 is a key and so, by the construction of a key, i or L = Lk’ for some k # i, and so, if q 6 Q0 .and q é r, then R is in Lk81. P_r_9_0_f_= Necessary condition: If R is in TNF, then, by definition, every non-prime attribute in R is non-transitively dependent on each key K of R. If S a 32’ then K a S a 32’ by l l transitivity and the fact that K is a key. But K, S1 and 82 are mutually disjoint; so, K 4 S1 4'82 contradicts the hypothesis that R is in TNF. Therefore, 81 A 82 and similarly 82 f 81' Sufficient condition: If 81 f 82, and if 82 f 81, then, as in Lemma 4.4.1.1, S1 and $2 on keys, and on subsets that always have a non-empty inter- are only dependent on themselves, section with the set of all keys of R. However, in all these cases, it is easily Shown, by the definition of transitive dependence, that S1 and 82 are non-transitively dependent 68 on each key of R. Therefore, R is in TNF. Corollary 4.4.1.2.1 If every attribute in U R1 is prime, then R is in TNF. P_r_2i If U Ri C P then, by Corollary 3.5.2.3, P9 = Q and 0’ Q0 = O. Therefore, since there are no non-prime attributes in R, then, by definition, R is in TNF. Corollary 4.4.1.2.2 If \QOA = 1, then a relation, R, in SNF is also in TNF. Proof: 1r IQ Q] 1, then there are no disjoint proper subsets in Q0, and so, by Theorem 4.4.1.2, R is in TNF. O The following theorem is very important; it is basic to the proof of the fundamental result in this section. The result is another set of necessary and sufficient conditions for a relation in SNF to also be in TNF. Theorem 4.4.1.3 If S C Q0, then, for every p E P , S f p. 0 Proof: Suppose S a p. But, p is prime; so, it is a member of at least one key, say, Li U ai’ IaiI 2 0. Let L = (Li U oi) ' P: then LCZLi U ai and L is not a key, since L U a. is. i 1 Hence, S U L a p U L,= Li U 01 a Q, by additivity and trans- itivity respectively. But S U L,4 Q, and S U L is not a Superset of any key, since S FIFO = O. Therefore, S U L is a key and so 8 C PO. But, this contradicts the hypothesis that S C Q0, and so S 7» p. 69 The following result is fundamental and, as Section 4.5 will Show, it simplifies greatly the algorithm for determining whether a relation in SNF is also in TNF. Theorem 4.4.1.4 A relation, R, in SNF, is also in TNF if and only if, for every 1, Li FIPQ # O. Proof: Necessary condition: Suppose the relation, R, is in TNF. If, for some 1, Li FIPO = O, then Li C Q0; so, by Theorem 4.4.1.3, Li f p for every p 6 P0. Hence, for some i, Li U Ri C.QO such that L1 —'Ri is one of the original functional relations in R, for otherwise Ri FIFO # O; so, for some p E R1 QPO,Li a R a p by projectivity and i transitivity reSpectively. However, Li a p contradicts .4. o . o c . 3 Theorem 4 1 3, Since Li C QO Therefore, Ri Q0, so Li U Ri C Q0. But, by Theorem 4.4.1.2, R would not be in TNF, Since Li —IRi, Li QRi = O, and Li U Ri C Q0, which contradicts the hypothesis that R is in TNF. Therefore, for every i, Li FIP0 # O. Sufficient condition: Suppose, for every i, L1 TIPQ # O. If IQQI 2 2, then let S1 and 82 be any two disjoint non- empty prOper subsets in Q0. If S1 4 82, then, by PrOperty 3.7.1, and for some i, S D'Li. Hence, Li C S1 C Q0; so 1 ‘Li H PQ = O, Since PO 0 Q0 = O. But, Li (IF = O is a con- 0 tradiction since, by hypothesis, Li FIFO # O for every i. Therefore, S1 f 82; so, by Theorem 4.4.1.2, R is also in TNF. 70 The previous theorem fully characterizes a relation in TNF. However, it will be easier, in Algorithm A3 in Section 4.5, to use the contrapositive form of Theorem 4.4.1.4 to determine whether a relation in SNF is also in TNF. The following important corollary gives this contrapositive form. Corollaryi4.4.l.4.1 A relation, R, in SNF, is not in TNF if and only if, for some i L. C . ,lQQ Proof: This statement is true because it is the contrapositive of the statement in Theorem 4.4.1.4 which is true. 4.4.2 Examples In the following examples, each relation is in SNF. The problem is to determine those which are also in TNF. Example 4.4.2 .1 Refer back to Example 4.3.2.1. Note that QQ = X, and so IQQI = 1. Therefore, by Corollary 4.4.1.2.2, R is also in TNF. Example 4.4.2.2 Refer back to Example 4.2.2.2. Note that P0 = ABE and QQ = CDFG. For every i, Li OPQ # O; so, by Theorem 4.4.1.4, R is also in TNF. Example 4.4.2.3 Refer back to Example 4.2.2.4. Note that Q0 = O; so, by definition, R is also in TNF. Example 4.4.2.4 AB aCDE; AE .. BFG; CD _. FG. The keys are: AB, AE. Therefore, P0 = ABE and QQ = CDFG. 71 By Theorem 4.2.1.3, R is in SNF. But, CD is one of the original Li’ and CD C Q0; therefore, by Corollary 4.4.1.4.1, R is not in TNF. The preceding theory and examples lead to the following simple algorithm which determines whether a relation, in SNF, is also in TNF. 4.5 Procedure for TNF The following algorithm is a direct consequence of Theorem 4.4.1.4 and Corollary 4.4.1.4.1. Algorithm A3 1. Apply Algorithm A2 to determine whether the relation '3 ' SNF. F' d P and . 1 in in 0 Q0 2. If R is not in SNF, then go to 5. Otherwise, go to 3. 3. If, for some i, L i C QQ’ then go to 5. Otherwise, go to 4. 4. The relation is in TNF. Stop. 5. The relation is not in TNF. StOp. 4.6 Chapper Summary and Remarks Chapter 4 gives algorithms for determining whether a rela- tion is in second or third normal form. The correctness of the algorithms, A2 and A3, derives from an extensive mathematical analysis preceding each algorithm. These algorithms are suitable for hand computation as well as computer implementation, as was Algorithm A1 in Chapter 2. However, none of these algorithms have been pro- grammed. 72 It is clear that Algorithm Al is the building block for the algorithms, A2 and A3, given in this chapter. This is expected since the keys, as found by Algorithm Al, form the basis of a rela- tion in second or third normal form. CHAPTER 5 SUMMARY AND CONCLUSIONS 5.1 Conclusions This thesis provides a detailed mathematical analysis of the basic concepts of the relational model, as originally proposed by Codd. Three algorithms are presented. Their correctness derives from the basic properties of the functional relations, as given by Armstrong [2], and the extensive theoretical background provided in this thesis. The concept of the implication matrix, as applied to the functional relations in Algorithm Al to find the keys of a relation, has also proved to be fruitful in providing a useful mathematical theory and algorithms (A2 and A3 in Chapter 4) for determining whether a relation is in second or third normal form. Algorithms comparable to A2 and A3 do not exist anywhere in the literature. The mathematical prOperties of a relation in second or third normal form and the three Suggested algorithms are the main contributions of this thesis. The data base administrator may use these results to determine whether a relation is in either of these normal forms. Although this thesis provides three algorithms, no actual programming was undertaken because performance criteria are not central to the purpose of this thesis which is to develop and Show the utility of the mathematical foundation. 73 74 Many problems remain to be solved in the relational model. This thesis concentrated on the normal forms, as prOposed by Codd, only because no substitutes could be Suggested before their full mathematical properties were well studied. Now that those pro- perties have been examined in this thesis, an examination of the underlying definitions of normal forms will prove fruitful in determining whether the motivations for the second and third normal forms have been realized. This points to a new and rich area for further research. 5.2 Alternative Definitions for Normal Forms As was stated in Chapter 1, the objective of the normal forms is to minimize the update, deletion, and insertion anomalies. For this reason, Codd introduced the concept of functional dependence and proposed the definitions of first, second, and third normal forms. The examples in Chapter 1 showed that a relation, R, stored in a first normal form, is highly redundant and inefficient to main- tain. Moreover, projecting a relation, in first normal form, into a collection of relations in second or third normal form reduces the undesirable anomalies mentioned by Codd. A relation is a collection of facts, and, roughly Speaking, a dependence, A.-oB, correSponds to a fact. The basic motivation behind second normal form is that a fact stored in a relation Should be dependent on the whole key for the relation. That is, 75 if A a B, then A and B Should not be stored in the same re- lation, whenever A is a proper subset of a key, otherwise this fact is stored more than once and if B should change, then this update should be done in more than one place. However, the second normal form, as defined by Codd, does not totally meet this objective, as can easily be seen from the following example. Example 5 .2 .1 AB-+CD;CD—0AB; A-rC. The keys are: AB, CD, and A U D. Therefore, PC}: ABCD = O, the relation is in SNF. The and Q0 = O. Since QQ relation, R, might look like this: R: A B C D 1 1 l l l 2 l 2 l 3 l 3 l 4 l 4 Suppose now, one wishes to update the fact, A.-oC by chang- ing the values of A or C. It is clear that, as the value of C changes, then this change occurs in more than one tuple. The definition of a relation, R, in SNF, as proposed by Codd, is restricted to non-prime attributes to be fully dependent on each key of R. But, as was shown in Example 5.2.1, C is prime and C is not fully dependent on the key AB since A —.C. This shortcoming led Kent [48] to Suggest the following alternative de- finition of a relation in SNF; it will be referred to as KSNF. Definition 5.2.1 A relation in first normal form is in KSNF if every 76 attribute in the complement of a key is fully dependent on that key. It is important to note that, by definition, a relation in KSNF is also in SNF. Based on this alternative definition, it is easy to see that the relation, R, in Example 5.2.1, is not in KSNF since A.—oC. Project R into R1[AC] and R2[ABD]. It is clear that R1 and R2 are in KSNF, and so in SNF. Note that the fact, A 41C, is stored only once in the relation R1. Therefore, the alternative definition, as suggested by Kent, is an improvement, in terms of update minimization, over COdd's definition of a relation in SNF. The second normal form, as prOposed by Codd, is only the first step in eliminating certain redundancies in information storage. Other dependencies may exist in a relation, and Codd proposed the third normal form to eliminate information about an entity which can be derived from other attributes of the entity. Example 1.3.3 already elaborated on this concept. Moreover, the third normal form is restricted to the second normal form and to non-prime attributes to be non-transitively dependent on each key of the re- lation. However, Example 5.2.1 gave the reasons for Kent's alternative definition for the second normal form; so, for similar reasons, Kent suggested the following alternative definition for the third normal form. It will be referred to as KTNF. Definition 5.2.2 A relation in KSNF is in KTNF, if every attribute in the complement of a key is non-transitively dependent on that key. 77 Again, it is important to note that a relation in KTNF is in TNF, but the converse may not be true. It is clear, from the definition of KTNF, that every attribute in the complement of a key is fully dependent on itself and on that key. But, Lemma 4.4.1 showed that an attribute may also be dependent on Subsets that always have a non-empty intersection with the set of all keys of the relation. Therefore, although the conditions for a relation to be in KTNF are more stringent than those of a relation in TNF, it is still possible to have facts stored more than once in a relation, as the following example will Show. Examfle 5.2.2 AB aICD; AC a D The key is: AB. Therefore, P0 = AB and Q0 = CD. By Theorem 4.2.1.3, the relation is in SNF, and, by Theorem 4.4.1.4, it is also in TNF. Moreover, by definitions, the relation is also in KSNF and KTNF. The relation, R, might look like this: 1 3 l 1 Suppose now, one wishes to update the fact AC a D. It is clear that as the value of D changes, then this change occurs in more than one tuple. It is important to note that the fact, AB‘a D, is a con- sequence, by transitivity, of the facts AB —IC and AC ~ID. But 78 the objective of the third normal forms, TNF and KTNF, is not to store information like AB 4 D in the above example. There- fore, Example 5.2.2 points to a weakness in the statements of the third normal forms, as proposed by Codd and Kent. The ultimate goal, in the relational model, is to have a relation in TNF or KTNF; so, this thesis proposes only one alternative definition that is more restrictive than the definitions of Codd and Kent. The basic motivation for the following alternative definition has been explained in Example 5.2.2. Definition 5.2.3 A relation, R, in first normal form is in canonical normal form (CNF) if, for any two distinct, but not necessarily dis- joint, non-empty Subsets, $1 and 32’ in the set of all attributes in R, S 82, then S is a key of R. 1" 1 It follows, from the definitions, that a relation in CNF is also in KTNF, and so in TNF. Also, note that the relation, R, in Example 5.2.2, is not in CNF since AC q.D and AC is not a key of 9;. Project in into R1[ABC] and R2[ACD]. Note that 321 and R2 are in CNF. MOreover, R is the natural join of R1 and n that is R[ABCD] = R1[ABC] * R2[ACD], because AC .. D is the 2, Sufficient condition for R = R1 * R to be true; so, no essential 2 information is lost, because the original functional relations in R are preserved. Since AB «>C holds in R1, and AC a’D holds in R2, by transitivity, AB a D holds in R. Therefore, the de- finition of CNF, as suggested previously, is an improvement, in terms of storage requirements, over the definitions of Codd and Kent. 79 Further, the following example Should help explain the objectives of the Suggested canonical normal form. Example 5.2.3 AB «D; BC «A; CD «B; BD«A. The keys are: BC, CD. Therefore P0 = BCD and QQ = A. It is easy to Show, by definition, that the relation is in KTNF, and so in TNF. The relation, R, might look like this: R: A B C D 1 1 l l l l 2 1 l l 3 l l 2 4 2 The fact, BC —.Aq is a consequence, by transitivity, of the facts BC « D, Since BC is a key of R, and BD «.A. But since R is in KTNF, the objective of Kent's alternative definition has not been completely met. However, R is not in CNF, because AB « D and BD «.A but neither AB nor ED is a key of R. Project a into R1[ABD] and R2[BCD]. The functional relations in R1 are: AB «ID and BD —oAq while in R they are: 2, BC « D and CD «18. Therefore, R1 and R2 are in CNF. Mbreover R[ABCD] = R1[ABD] * R2[BCD] and the original functional relations in R are preserved; so no essential information has been lost. Examples 5.2.2 and 5.2.3 showed that the canonical normal form , as suggested in this thesis, meets better the original objectives of TNF and KTNF. Codd's SNF required each non-prime attribute to be fully dependent on each key, but allowed a prime attribute not to have 80 such full dependence; hence, update anomalies Still could exist. Kent's KSNF attempted to remedy that problem by requiring every attribute in the complement of a key to be fully dependent on that key. However, Chapter 4 showed that a subset, S, of attributes might have a non-empty intersection with all keys of a relation in KSNF, and it still might be that S «>A, where A is some prime or non-prime attribute, in which case update anomalies would Still exist. The canonical normal form defined here removes the update anomalies in all these cases, and leaves open the question of an appropriate definition of "optimality" when applied to normal forms, a question which might instigate Substantial further research. 5.3 Suggestions for Future Research This thesis did not deal with the concept of optimization in the normal forms. Codd [22] and Kent [48] motivated the subject, but their definitions of an Optimal normal form lack general and different criteria for possible comparison. It appears that the extensive mathematical theory and the algorithms presented in this thesis can be extended, with some variations, to deal with the Optimal normal forms. Although this thesis provides three algorithms, no actual programming was undertaken so performance criteria were not set. The algorithms should be tested on real data bases and their per- formance and complexity evaluated. Codd [25] and Strnad [68] define a collection of operations on relations, and this collection is called the relational algebra. These operations are not necessarily binary. They include the traditional set Operations (cartesian product, union, intersection, 81 etc.) and new operations on relations, suitable only in the rela- tional model, such as projection, join, division, and restriction. The user's queries can be expressed in the relational algebra by forming a new normalized relation from the existing collection of relations. The investigation of the mathematical properties of these Operations is strongly suggested. Finally, this thesis dealt only with the logical structure of a relational data base, and no mention was made of the Storage structure. Rissanen, et. a1 [57] analyze this problem in the form of examples so a much more rigorous and general mathematical approach is needed in this area. BIBLIOGRAPHY 10. BIBLIOGRAPHY ALTMAN, E.B., et. al (1973) "Specifications in a Data Independent Accessing Model", IBM Research Report, RJ 1141, IBM Research Laboratory, San Jose, California. ARMSTRONG, w.w. (1974) "Dependency Structures of Data Base Relationships", Information Processing 74, North Holland Publishing Company, pp. 580-583. ASTRAHAN, M.M., et. a1 (1972) "Concepts of a Data Independent Accessing Model", IBM Research Report, RJ 1105, IBM Research Laboratory, San Jose, California. BACHMAN, C.w. (1972) "The Evolution'of Storage Structures", CACM, Vol. 15, No. 7, pp. 6284634. BACHMAN, C.W. (1973) "The Programmer as Navigator", CACM, Vol. 16, No. 11, pp. 653-658. BELL SYSTEM TECHNICAL JOURNAL (1973) "Information Management System Issue", Vol. 52, No. 10, BIRKHOFF, G. (1967) "Lattice Theory", 3rd Edition, American Mathematical Society Colloquium, Publ. XXV, Providence, R.I. BJORNER, D., et. a1 (1973) "The Gamma Zero N-ary Relational Data Base Interface: Specifications of Objects and Operations", IBM Research Report, RJ 1200, IBM Research Laboratory, San Jose, California. BOSAK, R. (1962) "An Information Algebra", CACM Vol. 5, No. 4, pp. 190—204. BOYCE, R.F., et. a1 (1973) "Specifying Queries as Relational Expressions: SQUARE", IBM Research Report, RJ 1291, IBM Research Laboratory, San Jose, California. 82 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 83 BOYCE, R.F., et. a1 (1973) "Using a Structured English Query Language as a Data Definition Facility", IBM Research Report, RJ 1318, IBM Research Laboratory, San Jose, California. BRACCHI, G., et. a1 (1972) "A Language~for a Relational Data Base", Sixth Annual Princeton Conference on Infromation Sciences and Systems, March 23—24, Princeton, New Jersey. CANNING, R.G. (1974) "Problem Areas in Data Management", EDP Analyzer, Vol. 12, No. 3. CASEY, R.G., et. a1 (1974) "Generalized Page Replacement Algorithms in a Relational Data Base?, Proc. ACMSSIGFIDET Workshpp on Data Description, Access and Control, May 1-3, Ann Arbor, Michigan. CHAMBERLIN, D.D., et. a1 (1974) "SEQUEL: A Structured English Query Language", Proc. ACM-SIGFIDET Workshop on Data Description, Access and Control, May 1-3, Ann Arbor, Michigan. CHILDS, D.C. (1968) "Feasibility of a Set Theoretic Data Structure", Proc. IFIP Congress, Vol. 1, North Holland Pub. Co., Amsterdam, pp. 420-430. CODASYL (1969) "Data Base Task Group Report", Available from ACM HQ., New York. CODASYL (1971) "Data Base Task Group Report”, Available from ACM HQ., New York. CODASYL (1971) "Introduction to Feature Analysis of Generalized Data Base Management Systems", CACM, Vol. 14, No. 5, pp. 308- 318. CODASYL (1971) "Feature Analysis of Generalized Data Base Management Systems", Available from ACM HQ., New York. CODD, E.F. (1970) "A Relational Model of Data for Large Shared Data Banks", CACM, Vol. 13, No. 6, pp. 377-387. 84 22. CODD, E.F. (1971) "Further Normalization of the Data Base Relational Model", IBM Research Report, RJ 909, IBM Research Laboratory, San Jose, California. 23. CODD, E.F. (1971) "Normalized Data Base Structure: A Brief Tutorial", IBM Research Report, RJ 935, IBM Research Laboratory, San Jose, California. 24. CODD, E.F. (1971) "A Data Base Sublanguage Founded on the Relational Calculus", IBM Research Report, RJ 893, IBM Research Laboratory, San Jose, Calfironia. 25. CODD, E.F. (1971) "Relational Completeness of Data Base Sublanguage", IBM Research Report, RJ 987, IBM Research Laboratory, San Jose, California. 26. CODD, E.F. (1974) "Recent Investigations in Relational Data Base Systems", IBM Research Report, RJ 1385, IBM Research Laboratory, San Jose, California. 27. CODD, E.F. (1974) "Seven Steps to Rendezvous with the Casual User", IBM Research Report, RJ 1333, IBM Research Laboratory, San Jose, California. 28. CODD, E.F., et. a1 (1974) "Interactive Support for Non-programmers: The Relational and Network Approaches", IBM Research Report, RJ 1400, IBM Research Laboratory, San Jose, California. 29. DATE, C.J., et. a1 (1971) "File Definition and Logical Data Independence", Proc. ACM-SIGFIDET WorkshOp on Data Description, Access and Control, Nov. 11-12, San Diego, California, pp. 117-138. 30. DATE, C.J., et. a1 (1971) "Storage Structure and Physical Data Independence", Proc. ACM—SIGFIDET Workshop on Data Description, Access and Control, Nov. 11-12, San Diego, California, pp. 139—168. 31. DATE, C.J. (1972) "Relational Data Base Systems: A Tutorial", Fourth International Symposium on Comppter and Information Sciences, Miami Beach, Florida, Plenum Press. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 85 DATE, C.J. (1972) "An Introduction to the April 1971 Report of the CODASYL Data Base Task Group", IBM Technical Report, TR. 12.104, IBM United Kingdom Laboratories LTD, Hursley Park, Winchester Hampshire. DATE, C.J., et. al (1974) "The Relational and Network Approaches: Comparison of the Application Programming Interfaces", IBM Research Report, RJ 1401, IBM Research Laboratory, San Jose, California. DATE, C.J. (1975) "An Introduction to Data Base Systems", Addison Wesley, New York. DAVIES, C.T. (1967) "A Logical Concept for the Control and Management of Data", IBM Research Report, AR—0803-00, Poughkeepsie, New York. DELOBEL, C., et. a1 (1973) "Decomposition of a Data Base and the Theory of Boolean Switching Functions", IBM Journal of Research and Develop— ment, V01. 17’ NO. 5, pp. 374—3860 D'IMPERIO, M.E. (1969) . "Data Structures and Their Representations in Storage", Annual Review in Automatic Programming, Vol. 5, Pergamon Press, New York, pp. 1-75. DODD, 0.6. (1969) "Elements of Data Management Systems", Computing Survey, V01. 1, NO. 2, pp. 117-1330 EARNEST, C.P. (1974) "A Comparison of the Network and Relational Data Structure Models", Computer Sciences Corporation Report, El Segundo, California. ENGLES, R.W. (1970) "A Tutorial on Data Base Organization", IBM Technical Report, TR 00.2004, Poughkeepsie, New York. ENGLES, R.W. (1971) "An Analysis of the April 1971 Data Base Task Group Report", Proc. ACM-SIGFIDET Workshop on Data Description, Access and Control, Nov. ll—12, San Diego, California, pp. 69-91. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 86 FEHDER, P.L. (1972) "The Representation Independent Language Part 1: Introduc- tion and the Subsetting Operation", IBM Research Report, RJ 1121, IBM Research Laboratory, San Jose, California. FEHDER, P.L. (1973) "The Representation Independent Language Part 2: Deriva- tion and Insert, Update, and Delete Operations", IBM Research Report, RJ 1251, IBM Research Laboratory, San Jose, California. FLORENTIN, J.J. (1974) "Consistency Auditing of Data Bases", The Computer Journal, Vol. 17, No. 1, pp. 52-58. HEATH, I.J. (1971) "Unacceptable File Operations in a Relational Data Base", Proc. ACM-SIGFIDET Worksh0p on Data Description; Access and Control, Nov. 11-12, San Diego, California, pp. 19—33. HENRY, W.R. (1969) "Hierarchical Structure of Data Management", IBM Systems Journal, Vol. 8, No. 1, pp. 2-16. HSIAO, D., et. al (1970) "A Formal System for Information Retrieval from Files", CACM, Vol. 13, No. 2, pp. 67-73. KENT, w. (1973) "A Primer of Normal Forms", IBM Technical Report, TR 02.600, IBM System Development Division, San Jose, California. LEVIEN, R.E., et. a1 (1967) "A Computer System for Inference Execution and Data Retrieval", CACM, Vol. 10, No. 11, pp. 715-721. LORIE, R.A. (1974) "XRMr An Extended (n-ary) Relational Memory", IBM Scientific Center Report, G320—2096, Cambridge, Mass. MCGEE, W.C. (1969) "Generalized File Processing", Annual Review in Automatic Programming, Vol. 5, Pergamon Press, New York, pp. 77-149. MEALY, G.H. (1967) "Another Look at Data", Proc. AFIPS, FJCC, Vol. 31, AFIPS Press, Montvale, New Jersey, pp. 525-534. MELTZER, H.S. (1969) "System Design Considerations for Data Base Support", IBM Technical Report, TR 02.468, IBM System DevelOpment Division, San Jose, California. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 87 NAVATHE, s. (1974) "Logical Normal Forms for Data Translation", Proc. ACM- SIGFIDET Worksh0p on Data Description, Access and Control, May 1—3, Ann Arbor, Michigan. PALERMO, F.P. (1972) "A Data Base Search Problem", IBM Research Report, RJ 1072, IBM Research Laboratory, San Jose, California. PALERMO, F.P. (1973) "An APL Implementation of Relational Operators and a Search Algorithm", IBM Research Report, RJ 1273, IBM Research Laboratory, San Jose, California. RISSANEN, J., et. al (1973) "Decomposition of Files, a Basis for Data Storage and Retrieval", IBM Research Report, RJ 1220, IBM Research Laboratory, San Jose, California. ROBERTS, D.C. (1972) "File Organization Techniques", Advances in Computers, Vol. 12, pp. 115-174. ROSENBERG, A.L. (1971) "Data Graphs and Addressing Schemes", Journal of Computer and System Sciences, Vol. 5, No. 3, pp. 193-238. ROSENBERG, A.L. (1973) "Suffixes of Addressable Data Graphs", Information and Control, Vol. 23, pp. 107-127. ROTHNIE, J.B. (1972) "The Design of Generalized Data Management Systems", PhD Dissertation, Department of Civil Engineering, Massachusetts Institute of Technology. ROTHNIE, J.B. (1974) "An Approach to Implementing a Relational Data Management System", Proc. ACM-SIGFIDET Workshop on Data Description, Access and Control, May 1-3, Ann Arbor, Michigan. SENKO, M.E., et. a1 (1972) "A Data Independent Architecture Model I: Four Levels of Description from Logical Structures to Physical Search Structures", IBM Research Report, RJ 982, IBM Research Laboratory, San Jose, California. SEVERANCE, D.C. (1972) "Some Generalized Modeling Structures for Use in the Design of File Organizations", PhD Dissertation, Depart— ment of Computer and Communication Sciences, University of Michigan, Ann Arbor, Michigan. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 88 SIBLEY, E.H., et. al (1973) "A Data Definition and Mapping Language", CACM, Vol. 16, NO. 12’ pp. 750-7590 SIBLEY, E.H. (1974) "On the Equivalence of Data Based Systems", Proc. ACME SIGFIDET Workshop on Data Description, Access and Control, May 1—3, Ann Arbor, Michigan. SMITH, D.P. (1971) "An Approach to Data Description and Conversion", PhD Dissertation, The Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pennsylvania. STRNAD, A.L. (1971) "The Relational Appraoch to the Management of Data Bases", Proc. IFIP Congress, Ljubljana, Yougoslavia. TAYLOR, R.W. (1971) "Generalized Data Base Management System Data Structures and Their Mapping to Physical Storage", PhD Dissertation, Department of Computer and Communication Sciences, Univer- sity of Michigan, Ann Arbor, Michigan. WANG, C.P., et. a1 (1974) "An Approach for Segment Synthesis in Logical Data Base Design", IBM Research Report, RJ 1397, IBM Research Laboratory, San Jose, California. WHITNEY, V.K.M. (1972) "RDMS: A Relational Data Management System", Proc. Fourth International Symposium on Computer and Information Sciences, Miami Beach, Florida, Dec. 14-16, Plenum Press. WHITNEY, V.K.M. (1974) "Relational Data Management Implementation Techniques", Proc. ACM-SIGFIDET Workshoppon Data Description, Access and Control, May 1-3, Ann Arbor, Michigan. WILLIAMS, R. (1971) "A Survey of Data Structures for Computer Graphics Systems", Computing Survey, Vol. 3, No. 1, pp. 1-22. WONG, E., et. a1 (1971) "Canonical Structure in Attribute Based File Organization", CACM, Vol. 14, No. 9, pp. 593-597. "IIIIIIIIIIIIIIIIIII