' EXTENDED SIMPLE PRECEDENCE Thesis for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY LEWIS HARVEY GREENBERG 1970 . ‘3‘ ‘. G . 3; ,V' I} ‘7‘ ' .4 1.1:) is i 2 I'HeSIB 3," PA1(-}ii,y‘-dlk {3"}; ’ Unrest.» Cy This is to certify that the thesis entitled EXTENDED S IMPLE PRECEDENCE presented by Lewis Harvey Greenberg has been accepted towards fulfillment of the requirements for __Eh‘D_._ degree in Electrical Engineering and System Science OLILLKLLL . Major professor Date—Decemheijqm 0-169 3? amount: av 3‘ "DAB & SONS' 590K HINDU" INC. LIIMHY .lNDERS 'm SPIIII'QI! Inclus- ABSTRACT EXTENDED SIMPLE PRECEDENCE BY Lewis Harvey Greenberg Grammars are tools for defining structures in languages. Their study has enabled the mechanization of several techniques for processing computer languages. These techniques depend to a large extent on being able to determine the grammatical structure of arbitrary strings of the language. One method, first introduced by Wirth and Weber, called precedence analysis, is extremely easy to implement if the grammar under consideration has certain basic properties. This dissertation extends the basic Wirth and Weber pre- cedence concepts to a larger class of grammars. This is achieved by changing the definitions of the precedence relations so that these relations are mutually disjoint and adding a set of context sensitive productions to the original grammar so as to effectively leave the precedence parsing algorithm unchanged. The use of context sensitive productions also provides a technique for attacking the problem of equal right Sides. This enables the new method, called extended simple precedence (ESP), to handle a much larger class of grammars than simple precedence does. EXTENDED SIMPLE PRECEDENCE By Lewis Harvey Greenberg A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Electrical Engineering and System Science 1970 6:02.757 7~/~7o ACKNOWLEDGMENTS I would like to express my gratitude and appreciation to Professor Julian Kateley for his guidance throughout the pre- paration of this dissertation. His generous assistance, encourage- ment, and inspiration, played an important part in the completion of this work. I would also like to thank the other members of my committee, Carl Page, Richard Dubes, William Kilmgr and J. Sutherland Frame for their constructive criticism and guidance. I would also like to extend my gratitude to my wife Maxine, for her assistance in preparing the rough draft and her encourage- ment and inSpiration during the preparation of this dissertation. ii TABLE OF CONTENTS ABSTRACT ACKNOWLEDGEMENTS LIST OF FIGURES Chapter 1. INTRODUCTION 1.1 Basic Notation 1.2 Dissertation Objectives PRECEDENCE GRAMMARS 1 Simple Precedence 2 The Precedence Parsing Algorithm .3 Conflicts in Simple Precedence 4 Extended Precedence EXTENDED SIMPLE PRECEDENCE 3.1 The New Precedence Relations 3.2 The Formal Mechanism 3.3 Generating Strings INFINITE STRINGS AND EQUAL RIGHT SIDES 4.1 Infinite Contest Sensitive Productions 4.2 A Canonical Form 4.3 Equal Right Sides and Bounded Context Grammars LR(k) GRAMMARS AND THEIR RELATION TO ESP GRAMMARS 5 . 1 LR(k) Graumars 5.2 LR(1) Grammars 5.3 Synonomous Grammars for LR(k) CON CLUS IONS 6.1 Summary of Results 6.2 Some Possible Extensions of ESP BIBLIOGRAPHY APPENDIX Page ii iv 12 15 18 21 21 23 28 38 38 43 48 50 SO 56 62 63 63 66 68 Figure 1.1 2.1 2.2 3.1 5.1 LIST OF FIGURES Inclusion Tree of Parsing Algorithms Data on Analysis of Precedence Using Matrix Techniques Initial Configuration and Next Possible Steps of the Precedence Analyzer Graph of the Relation R Automata for Example 5.3 iv Page 12 14 31 59 CHAPTER I Introduction The modern digital computer, has made it increasingly impor- tant to improve man-machine communications. To this end, many compilers have been designed and built to translate man-oriented languages, i.e. FORTRAN, ALGOL, COBOL, to machine-oriented lan- guages. The man-oriented language which, in general, is machine independent allows the programmer to Specify his problem in much the same way he would if he were to solve it by hand. The machine- oriented language, on the other hand, is, in general, completely machine dependent and often impossible to readily comprehend. In the last few years, much progress has been made in the field of formalized languages. Yet the hand coded ad-hoc variety of languages still plays a dominant role in todays computer systems, mainly because of greater speed and error detecting capability. In order to close the gap between formal and non-formal languages, many papers have appeared in the literature which propose formal methods for accomplishing the task of translation. These methods deal with the structure or grammar of the language to be translated and generate output by assigning meaning to syntactical structures. In order for these methods to be efficient, the algorithm which is to accomplish the syntactic analysis should be made as simple as possible. The simplicity of the algorithm in most instances places unwarranted restrictions on the type and the complexity of the grammar it may effectively handle. In some cases the grammar l can be changed into a form which will make it acceptable to the algorithm, but these changes are usually ad-hoc and in some instances force unacceptable changes in the original structure of the grammar. This dissertation will deal with an extension of a method first proposed by Wirth and Weber [Ni 66] which provides an efficient algorithm for syntactic analysis together with good error detecting capability. 1.1 Basic Notation The basic entities to be used in the classical mOdel of a grammar are given as follows: (1) A finite set V. V is called the vocabulary and the elements of V are called symbols. (2) The operation of concatenation as denoted by juxta- position will be used to form strings from the elements of V. Strings may also result from the juxtaposition of strings. (3) The null symbol is denoted by A. Using these primitives several definitions can now be given. (1) The set of all strings of finite length over a vocab- ulary V is denoted by V* V* - {xlx = A v (3y)(3Y)y E V* Y E V = x = yY} Lower case Latin letters will be used to denote elements of V*. Upper case Latin letters will denote elements of V. In either case they may be subscripted. (2) A pgoduction denoted by 1 d r is an ordered pair of strings. The left and right parts of a production are designated 1 and r reSpectively. The set of 3 productions is denoted by P. (3) The set of symbols on the left hand side of the produc- tions is denoted by VL L V = {A‘(3x)(3y)(32) yAz ~ yxz E P} (4) The set of symbols on the right hand side of the pro- ductions in P is denoted by VR. vR - {AI <3x><32> x ~ yAz 6 PI (5) Z designates a subset of V called terminal symbols. 2: = {VR - vL} (6) VN designates the set of non-terminal symbols. vN={v-2} When it becomes necessary to distinguish between terminal and non- ’ terminal symbols and strings the following notation will be used: N in implies X G V _, N * y implies y 6 (V ) § implies X E 2 * y implies y E 2 The length of a string a is denoted by ‘a‘ and if Ia‘ K then the individual symbols of the string a will be denoted by A1,A2,...,AK. When a binary relation, say A, is defined for a symbol pair, the following notation will be used to extend the notation to strings: Let M =K a A b implies AK A B1 A a implies A1 A A i = 1,2,...,K-l i+l To denote that a relation does not hold between a pair of symbols or strings the —1 sign will be used preceeding the relation i.e. a—IAb. Where it would be ambiguous to use the = relation (mean- ing equality) the symbol E will be used. A context free grammar (CFG) is a 4 tuple, where: V = {V1,V2,...,VM,...,VN} is the vocabulary. 2={v P G Mfil’vM+2""’VN} 18 the set of terminal symbols. is a restricted set of productions of the form X.~ y. is the goal symbol or the starting type and is an element of the non-terminal set. A derivation is denoted by x =ty and is defined as follows: The The with which Similarly, with which x =Iy iff x = wAz and y = wsz then either A a s E P or A a t E P and t a 3 language L(Z) defined by the grammar Z is the set L(Z) = {ylG = 3'} set of head symbols, HSCX), is the set of all symbols some derivation from X may begin. mm)=WW=Yd H86!) = ¢ the set of tail symbols, TS(X), is the set of all symbols some derivation from X may end. TS(X) = {le = zY} T860 8 ¢ A sentential form (SF) is any string that can be derived from the goal symbol. The set of sentential forms of which the language of a grammar is a proper subset is defined as The entail the {ch = z} derivation of an element of a language will, in general, generation of a goodly number of SF's as intermediate steps. Since at each step there will in general exist several non- terminals, any one of which may be replaced using a production, the set of SF's used in the derivation is not unique. For this reason a canonical form for derivations will next be defined. A derivation is said to be rightmost (leftmost) if at each step of the derivation the rightmost (leftmost) non-terminal symbol is the symbol upon which the next SF is derived. If it is the case that there exists more than one rightmost (leftmost) derivation for the same SF then the grammar is said to be ambiguous‘. Parsing, the reverse process of generation, is an algorithmic method for determining whether a string is an element of a language and if so, what productions were used in its generation. There are two main approaches to parsing. The first of these methods, called tog-down, tries to match the string in question by generating a new string which is identical to the original" Cheathauiand Sattley [Ch 64] give a clear treatment of this method together with some of its pit-falls and draw-backs. The second method, called bottom- gp, tries to reduce the original string by replacing substrings in the SF which correSpond to the right hand side of some pro- duction with the left hand side. The algorithm is successful if the original string can be reduced to the goal symbol. These methods differ mainly in the amount of foresight and hindsight used in determining what constitutes a proper substitution. A parsing algorithm is said to be deterministic if at every point in the analysis the next step is uniquely determinable. This is a very important consideration when evaluating the efficiency of an algorithm since if there exists a choice at some point in the analysis, the fact that the wrong choice was made may not be- come apparent until after several additional steps. Then if the analysis is to succeed the algorithm must make some provision for backing up to the point of the incorrect choice so that an alternative choice can be made. Clearly this type of analysis complicates the algorithm and puts great restrictions on generation of output. A parsing algorithm is said to yield a left to right canonical parse (LRCP) if the substring to be reduced is always the leftmost reducible substring in the sentential form. For bottom- up parsing, a parse is an LRCP if the order in which the productions are applied to reduce the string is just the reverse of the right- most derivation (RMD) for that string. 1.2 Dissertation Objectives Given a grammar , it is a trivial task, starting with the goal G and repeatedly using the various productions, to derive a string of terminal symbols such that this string is in the language specified by the grammar. However, parsing, the reverse process of generation, is in general much more complex. Since it is the objective of a compiler to determine the structure of an arbitrary string and to generate code dependent on this structural information, a solution to the parsing problem has practical significance. Given a grammar, several algorithms exist for the parsing of strings and this dissertation will deal with an extension to one of the simplest of these methods. In Chapter II the concepts of precedence are introduced. This method attempts to reconstruct the SF's of the grammar by the bottom-up method. This is accomplished by insertion of brackets at appropriate points in the string so that the sub- string that is to be replaced is clearly delineated. This sub- string always corresponds to the right hand side of some pro- duction in the grammar. The process for determining where the brackets will be placed in the string is based on a set of rela- tions between symbol pairs which may legally appear adjacent to one another in a SF. If however, the set of relations are not disjoint then the brackets are not unique and the substrings are not properly delineated. Even when the substrings are pr0perly delineated there may still be a problem in determining which pro- duction is to be applied since there may exist two productions which have the same right sides. In Chapter III a new formulation of precedence is defined in such a way that the set of relations is always disjoint. How- ever the set of substrings delineated by the brackets no longer correspomh directly to the right hand sides of productions and therefore a new set of replacement rules must be defined. In general these rules take the form of context sensitive productions which we will loosely define to be any production which has more than one symbol on the left hand side. A method for determining the set of context sensitive productions is also given in this chapter. Since the set of context sensitive productions is not in general finite, the concept of bounds is introduced in Chapter IV. If the set of bounds exists then there exists a finite subset of productions defined by the set of bounds which is sufficient to parse any string in the language. One method for dealing with the problem of equal right sides is to examine the context in which the reduction is to take place. Since the mechanism already exists for examining the con- text under the new formulation, this method is also discussed in Chapter IV. Figure 1.1, taken from Feldman [Fl 68], shows the relation- ship among various types of grammars. Thus LR(k) is seen to be the most general of the various grammars shown. Under certain conditions, as presented in Chapter V, ESP grammars are at least as general as LR(k) grammars. LR(k) Grammars and Production Languages Knuth Kn 65] (m,k) bounded context (1) Eichel [Ei 64] Knuth [Kn 65] (m,k) precedence (1,1) bounded context RCF Wirth [W1 66] Eichel [Ei 64] Tixier [Ti 66] exten ed precedence McKeeman [MC 66] transition matrices Gries [Gr 67] precedence operator precedence Wirth [W1 66] Floyd [F1 63] Figure 1.1. Inclusion Tree of Parsing Algorithms CHAPTER II Precedence Grammars There are two distinct types of precedence analysis in use today. The first,called operator precedence,was first introduced by Floyd [F1 63] and is concerned with the interrelation between terminal symbols of the grammar. This method of syntactic analysis is restricted to a specific form of context free languages, known as operator grammars. A grammar is said to be an Operator grammar if none of its productions contains two adjacent non-terminal symbols. This type of analysis was extended to a larger class of grammars by Colmerauer [Co 68] and Gries [Gr 68]. The second type of precedence analysis, sometimes referred to as total precedence, is concerned with the interrelation of all the symbols in the grammar. This technique was first described by Wirth and Weber [Wi 66] and has been extended by McKeeman [Me 66] and Cheatham [Ch 68]. Although there do exist grammars which are both operator and total precedence, the class of grammars which exhibit the simplest form of total precedence (referred to as simple precedence) does not contain the entire class of operator pre- cedence grammars. 2.1 Simple precedence The precedence relations for simplegprecedence (SP) are defined between pairs of symbols as follows: 9 10 xéY if L-eaXYbEP X<~Y if L-oaXZbEP and YEHS(Z) X>Y if L-vaZYbEP and XETS(Z) or L a aZWb E P and X E TS(Z) and Y E HS(W) If there exists at most one relation between each pair of symbols in V and no two productions have the same right sides or are of the form X a A then the grammar is said to be simple precedence. The relationship between Boolean matrices and relations can be utilized to easily formulate the precedence relations. Three Boolean matrices are constructed from the produttion set as follows: E(I,J) = 1 if L‘» aV V b E P I J = 0 otherwise h(I,J) = 1 if VI ~ VJb E P = 0 otherwise t(I,J) = 1 if VI a a‘VJ E P 0 otherwise From the definition of the set of head symbols HS it is easily seen that the transitive closure of the relation h is the set of ordered pairs HS. Similarly the set of tail symbols TS is given by the transitive closure of the relation t. Letting H and T denote Boolean matrices which represent the transitive closure of the relations h and t respectively, the following can easily be shown: H(I,J) = 1 if VJ E HS(VI) = 0 otherwise T(I,J) = 1 if VJ E TS(VI) = 0 otherwise 11 The above transitive closure operations on the Boolean matrices can be obtained by using Warshall's algorithm [Wa 62]. The complexity of this algorithm is on a par with matrix multiplication but its efficiency can be increased by arranging the rows and columns so that all the non-terminal symbols precede the terminal symbols, thus taking advantage of the zero entries that occur in the rows for terminal symbols. Using the Boolean equivalent of matrix multiplication and addition, the following four matrices are defined: L = EH G = T'L Gl=T'E G=G]_-i-G2 The precedence relations can then be defined as follows: v év if E(I,J) = 1 I J VI <1 vJ 1f L(I,J) = 1 vI DVJ 1f G(I,J) = 1 The following matrix notation will be used throughout this dissertation: (1) AB will denote matrix multiplicator using and and or. (2) T' will denote the tranSpose of the matrix T (3) A-B will denote the matrix the elements of which equal the logical product of the corresponding elements of A and B. C = A'B implies C(I,J) = A(I,J) and B(I,J) (4) Ec will denote the logical complement of the matrix E. EC(I,J) = 1 if E(I,J) = o 0 otherwise (5) A +-B will denote matrix addition using the logical operator or. 12 When it is important to distinguish between the two types of > the following notation will be used: v > v if Gl(I,J) = 1 v > ‘v if G2(I,J) 1 Since the above algorithm is well suited for computer implementation, it might be interesting to examine some of the data for a few of the languages that have been analyzed over the past two years using this method. Figure 2.1 summarizes the results of these analyses. Similar Boolean matrix formulations for precedence grammars have been presented by Colmerauer [Co 68] and Martin [Ma 68]. Number of Total time for Percent of non-blank to Language productions analysis (SEC) total entries in the pre- cedence matrix MAD/I [M1 68] 106 5.767 24.10 MKL [Me 66] 95 6.484 52.08 AlGOl.[Co 68] 353 31.494 8.11 PI/I [A1 68] 689 96.124 9.90 Figure 2.1 2.2 The precedence parsing algprithm The precedence parsing algorithm as presented by Wirth and Weber [W1 66] is extremely simple to implement. Two stacks are used to hold the SF being analyzed. In the initial configuration (Fig. 2.2a), the string to be analyzed minus its first symbol resides on the right stack and the relation be followed by the first symbol of the string resides on the left stack. Designating 13 the top elements of the left and right stacks L1 and R1 reSpectively, the algorithm proceeds as follows. If L1'4 R1 (Fig. 2.2b) then the symbol «6 followed by the t0p symbol in the 1 = R1 (Fig. 2.2c) then the t0p symbol on the right stack is pushed down on the right stack is pushed down on the left stack. If L left stack. If L D'R (Fig. 2.2d) then the symbols on the left 1 l stack between the most recent ‘6 symbol and the top of the stack constitute the right hand side of some production. These symbols together with the Em symbol are popped off the left stack and the corresponding left hand side of the production is pushed onto the right stack. The process continues until only the goal symbol remains on the left stack and the right stack is empty. Whenever the right stack is empty the algorithm assumes that I. éiRl. 1 Whenever the left stack is empty the ‘Q symbol followed by R1 is pushed onto the left stack. ‘6 A B C D E F ... Left stack Right stack (3) <6 A <3 B C D E F 1 R1 If Ll‘¢ R1 then the result is: <0 A (I B C < D E F (b) If L1 5 R1 then the result is: ‘0 A <8 B C D E F (C) 14 If LID'R1 and there exists a production of the form S a BC 6 P then the result is: <3 A S D E F ... ((0 Figure 2.2 Initial configuration and next possible steps of the precedence analyzer. Error detection comes at two points in the algorithm. The first case arises when there is no relation between L1 and R1. This implies that the string is not an element of the language since if a symbol can appear adjacent to some other symbol in a SF then they are related by at least one of the precedence rela- tions. The second case arises when a substring bounded by the relations <5 and 9 does not correSpond to the right hand side of any production. This condition also indicates the occurrence of an invalid string. Before going on it might be advantageous to give an example of a simple precedence grammar together with the parse of a valid string. Example 2.1 The formation of the precedence relations and the parsing technique. The grammar The h matrix The t matrix A a BCD A B C D Y A B C D Y B a BA A O 1 O O O A O O O l O B d D B O l O l O B l O O l O C O O O O 1 C O O O 0 l C d Y D O O 0 O O D O O O O O Y O O O O O Y O O O 0 O ta 5' m :1: B m n H E Ndow> c>c>c>c>c>>- c>c>c>rarlut The r OOOOO> ooowow The parse rt 8 n H 52 c>c>c>rac>~< 15 The T t<15<3tfi1> GOOD-“0:1> The r OHOOH> Ot-‘OOt-‘w 0000053 0 of the string DDDYDYDYD Sentential forms DDYDYDYD <- B 4 D .> DYDYDYD <'B‘< B < D‘D’YDYDYD <'B'< B16 B 4 YtD'DYDYD <-BYDYD be BE< B 5 A.9»YDYD DYD YD YD <'B < Y'D'D <-B-*-C=D~> <~A~> 2.3 Conflicts in simple precedence Production used t<|51313 ¢>63053>r>u=u>c3ususui 1 1 1 1 1 1 1 1 1 1 1 UlHigiuiK3glu! 8 8 8 Conflicts arise in simple precedence when more than one relation exists between a pair of symbols. These conflicts consist of five basic forms; (4,5), (é,~>1), (5,92), (<-,s>1), and (4:92)- Squires [Sq 66] has shown that there exist special cases of the (1,5) and (é,>1) conflicts which may be resolved by the addition of a new non-terminal symbol and a trivial production. The first Special case arises when a symbol W head symbol. is its own If such a symbol appears to the right of some other l6 symbol V in some production then by definition V 5 W and V 4W. Removing the conflict involves the addition of a new symbol, say W', which is substituted for each occurrence of W appearing to the right of some symbol in a production. Then the trivial pro- duction, W' -' W is added to the production set. Now since there exists no symbol V such that V "= W the conflict has been removed. The second Special case involves a symbol which is its own tail symbol appearing to the left of some symbol in a production. A similar method will remove this (5,91) conflict. The third conflict which we would like to examine involves the <0 and -> relations. Colmerauer [Co 68] points out that this type of conflict arises from the fact that the precedence algorithm always yields a LRCP. If one is willing to forego the requirement of a LRCP then any .>2 relation may be arbitrarily changed to <. without effecting the parsing algorithm. By examining the general form of the (4,02) conflict we observe that the following must be present in the gramar: (1) V a CXYd (2) X =9 eA . = A -> B (3) Y ”Bf J (4) W as gAUh :9 A 4 B (5) U =oBi In applying the parsing algorithm we can't decide whether to complete the definition of X, i.e., A > B, or start building the definition of U, i.e., A <2 B. But in either case, we must eventually start working on the definitions of either U or Y, i.e., X <- B, and therefore we might just as well form U or Y before considering 17 whether A constitutes the end of some definition. Clearly we no longer have a LRCP since this would require the completion of X before Y. However once either U or Y is found the problem of what to do with A is resolved since if Y is found then A >'Y and if U is found then A 5 U. Cheathem [Ch 68] has shown that any grammar may be changed so that there do not exist any precedence conflicts between symbol pairs. This is accomplished by requiring that no symbol may appear more than once in the right hand side of all the non-trivial pro- ductions of the grammar. To convert any grammar to this form, one need only replace each repeated symbol B with a new symbol Bi and add the trivial production Bi ~ B. The following example will demonstrate the procedure. Example 2.2 The convertion of a non-precedence grammar The grammar The precedence matrix AI» CBC A B C X A O O O O B-ax 00é04 XO-‘~>4,~> The new Grammar The precedence matrix A'a 0181C2 A B B1 B2 C1 C2 X1 X2 31» X B A 0 0 O O 0 O O O 3.] X1 2X2 B 0 0 0 O 0 e» >- 0 B1 0 0 0 O O é O 0 31-03 32 000 0 0 o é 0 B «B C1 0 4 é O 0 0 4 0 2 C2 0 O O O O O O O Clbfi C X1 0 «é O é 0 0 <9 0 2 C 0 9’:>' O O O -> O Xl'e X x O D»'> 6> 0 ~> -> e> X2 e X OOOOOOO O V V O A C>A c>c>c>c> x Vcs 18 There doesn't exist a deterministic parsing algorithm which will successfully handle the grammar in example 2.2. This is because the algorithm has no way of determining the middle of the string of X's which is where the first production should be applied. Therefore although we have eliminated all of the precedence conflicts the new grammar is not simple precedence. All that we have done is to exchange the precedence conflicts for a set of productions with equal right sides. Also we note that the number of non-terminal symbols has greatly increased as has the set of productions. 2.4 Extended precedence Wirth and Weber [Wi 66] also defined an extended version of precedence by extending the definition to strings. This higher order precedence is defined as follows: Let ‘x‘ = N , * * x = y if A bXNch E P and be =Ib'x , Y1c1= yc' * x < y if A bXNVc E P and bXN = b'x , Vc = yc' * x > y if A bUY E P and bU = b'x , Y c a yc' 1c 1 or A bUVc 6 P and bU = b'x , Vc = yc' where a :3, b implies a E b or a =9 b A grammar is called a ppecedenceggpammar of order,(M,N) if for the strings x, ‘x‘ =‘M, and y, lyl =‘N, there exists at most one relation between x and y. Clearly if a and b are strings such that ‘a|‘< M and |b| < N and there exists at most one relation between a and b then for any strings ca and db at most one relation holds between them. This new definition of precedence will tend to increase the size of the precedence matrix 19 to unwieldy proportion unless some procedure is introduced so that the analyzer will look at strings of length greater than one only when it is necessary. McKeeman [Me 66] took this into considera- tion in his extension of the precedence algorithm. McKeeman decomposed the precedence algorithm into two distinct parts. The first part started scanning the input string from left to right, looking for a >1 relation as defined in a precedence grammar of order (2,1). Taking advantage of the fact that a reducible substring is always bounded on the right by a terminal symbol in a LRCP, McKeeman was able to simplfy the definition of the > relation to: x>i if A-vbWUYc and WU=w'x or A-isUVc and WU==w'x,V=Yd The second part involves scanning from the point at which the 9' relation has been found, back to the left, looking for a < relation as defined by a precedence grammar of order (1,2). Again the definition of «< can be simplified to: X14 y if A a bXVWc and VW = yw' The two sets of triples are used to find the precedence relations only when there is a conflict in the simple precedence relations. These conflicts are flagged by a special symbol in the simple precedence matrix which causes the analyzer to reference the appropriate set of triples for the correct relation. Both of these extensions contain Floyd's operator precedence as a subset since both have the ability to look at the Operators which in an operator grammar are never separated by more than one symbol. However, neither of these methods can handle the problem 20 of equal right sides in the production set. In Chapter III we will examine another method for extending simple precedence which will use an augmented set of productions to overcome precedence conflicts. CHAPTER.III Extended Simple Precedence As was shown in Chapter II, Wirth and Weber extended the notion of simple precedence by looking at more than two symbols in order to determine the proper precedence relation. It was pointed out that the size of the precedence matrix grew as a power of the number of symbols used in the determination. In this chapter we will introduce a new method for extending simple pre- cedence without increasing the size of the precedence matrix. Instead, the set of production rules will be augmented with a set of context sensitive productions (08?) which will be used to re- move conflicts in the precedence relations. 3.1 The New Precedence Relations Conflicts in simple precedence, as mentioned before, take five basic forms: (4,5), (é,~>1), (<,>1) and (<-,c>2). As was shown before, the (65>2) conflict is of no consequence if one is willing to forego the restriction of a LRCP. In the new method a LRCP will not be a criterion. Therefore, the new precedence relations are defined as follows: Definition 3.1 X = Y iff X Y or (X14 Y and X >1 Y) X < Y iff Xb< Y and X —1= Y X>Y iff X>Y and X-1=Y and X-1vJ if MLJ) = 1 where: 6 = E+-L-G1 =8. = Lsdc = L-Ec-Gi 8 = Ge6c9£C = G-EceLc Clearly, 6.4. = 64. = aft-.9 = O by inapection. The main idea behind this new definition is the fact that by making all conflicts between any tWo symbols = we have enlarged the number of symbols which can be examined in determining the proper reduction to be applied. That is, we have changed the problem by delaying the decision of whether to include a symbol in the reducible string by always including it in the string whenever there is a chance that it might be allowed. Before going on, consider an example. This is a grammar taken fromNWirth and Weber's paper which is not simple precedence. Example 4.1 G: A—oB;B ABE];L A O 1 l O O 1 . H at 13-.[A] 3001001 mm" A O 1 O l O 1 . B-°[L] 3000101 Tmatm‘ A 0 0 O 5 9 0 Precedence B a L B O 0 O > = 0 matrix [5440051 ] 0 0 0 9., 0 :0é4004 L O O 0 2 > O 23 There are two conflicts in the grammar which under the new defini- tions will be changed to the = relation. Examining a typical string, L;[[L;[L]];L] we note that the reducible string [L] comes up whenever it appears since [=L=].1 [L will come up iff [L; is in the string since ; is the only symbol S such that L > S. By the same reasoning, L] appears iff ;L] was in the string. The new set of productions is: A a B;B B«~ L B-o[L] [B-v[L B-o[A] B]-oL] and the parse of the sentential form [LgL];L is as follows: <[=L>;L];L [B ~ [L <[B=;;L B] .. L] <[B=;=B>],L A a B;B <[=A=]>;L B .. [A] B 4 L A » B;B 3.2 The Formal Mechanism Theorem 3.1 Let aNb be a SF Let N a n Let b"= b Let anb' be a SF then, if the grammar is unambiguous, aNb' is a SF. The only relation between ] and any symbol is the > relation and the only relation between any symbol and [ is the < relation. 24 2502:: First, since aNb is a SF, then anb is also and any derivation sequence which is used to derive anb must contain N a n by the definition of unambiguity. 0'; aNb fl anb 11 2 anb' = anb The derivation sequence of b' =Ib must be included in 1 since it was used in 3 to derive the same string. Then, if in l, we remove the proper subsequence which corresponds to 3, the result must be the derivation of aNb'. Theorem 3.2 Let nb be a SF s Let n Let b E cde such that Eb< 3 > e Then there exists a d' such that d' E d and a SF nb' such that b' E cd'e. 2222:: Since each element of the string d is equal to the next, there are, in reality, 3 possible relations between the elements of the string. Therefore, there are, at most, 3‘dl possible ways that d can be factored and at least one of these is correct for this SF form. Let R1 6 {<, E, >}. There exists one and only one sequence of R's for the given SF such that R0 D1 R1 D2 ..... RN-l D is correct. We locate the smallest i such that Ri E >. Clearly, there always exists an i for which this is true. Then we find the largest j less than i such that R E <. Then for all k, J j less than k less than i, Rk E 5' which implies that Dj ... Di N RN 25 is an alternative of some non-terminal symbol Q. Then d'=D ...D_1QDi...D and the SF nb' exists. Q. E. D. l j N Given the SF aNb such that E, a s N and that N e n, we would next like to consider under what conditions a context sensitive production is needed to reduce the SF anb. Clearly, if the grammar is SP, then n > b and there is no need to use a CSP. However, under the new definitions for simple precedence, hereafter referred to as ESP, four additional cases arise. Case I a<< n, n = b If b is of the form cd with the pr0perties Z and c > d then by adding the CSP, Ncba nc, the proper reduction can be made. If b cannot be factored into the above form, it must be of the form E > d for some c and d. Then, by Theorem 3.1 there exists a SF anb' and by Theorem 3.2 a SF aNb', which will delay the finding of the proper CSP until the sentential forms anb' and aNb' are examined. Case II a = n, n > b From the hypothesis a can be factored into ef such that E < f and the CSP fN a fn can be used to find the proper reduc- tion. Case III a = n, n = b This is a combination of Cases I and II. If as in Case I b E : > d then the CSP ch # fnc can be used to find the proper reduction. If, on the other hand, b cannot be factored, then as in Case I, there exists a SF anb', and the CSP will be found when the SF aNb' is examined. 26 Case IV a s n, n < b In this case no reduction can be made. But, by Theorem 3.1 there exists a SF anb' and by Theorem 3.2 a SP aNb', which will delay the finding of the prOper CSP until the sentential forms anb' and aNb' are.examined. The new definitions would be of little use if the entire sentential form had to be examined in order to decide what CSP was to be applied. Therefore, we will define an entity called a phrase which will contain sufficient information for the decision. Define a phrase as any sub-string of a SF for which the following prOperties are true. Definition 3.2 If abc is a SF and l. a < b 2. b > c 3. S then b is a phrase. Now, if it is the case that all phrases are known together with the CSP's which reduce them, then the algorithm for parsing is identical with the algorithm for parsing simple precedence. To this end, we would next like to consider a method for determining the phrases of a grammar. Lemma 3.1 If for some symbol pair (A,B) A = B then X, X E HS(B), A -,> x 27 Proof: It is sufficient to prove that (6H) -.& = 0 Expanding the above (E + L . G1) H . (G.E°.LC) 0 (EH) . c - EC . LC) + (L . G1)HtG-EC-LC) = 0 and since EH = L (EH-G°EC-Lc) = 0 Assume (L-Gl)H)-G°EC.LC) 1‘ 0 then there exist I,J,K, and L such that L(1,K)Gl(1,K)H(K,J)G(I,J)E°(1,J)L°(i,J) = 1 But L(J,K)H(K,J) implies L(I,J) which contradicts LC(I,J). Therefore (610%?! = 0 Lemma 3.2 If for some symbol pair (A,B), A = B then X, X E TS(A), X —1< B 2.22.: It is sufficient to show that (T'6) .1”, = 0 Expanding the above we get T'(E+L'G1)- (DEC-GE) = 0 ((T'E)-IrEc-G(1:) + ((T'(L-Gl))-IrEc-G:) = 0 but T'E = C1 therefore ((T'E)-L-Ec-Gi) = 0 Assume (T'(L- Gl))-L-EcoG: ’5 0 then there exists an I,J,K, and L such that T(K,I)L(K,J)GI(K,J)L(I,J)EC(I,J)Gi(l,J)GI(K,J) = 1. This implies there exists an M such that T(M,K)E(M,J) 28 T(M,K)T(K,I) implies T(M,I) and T (M,I)E (M,J) implies G1(I ,J) But G1(I,J) and G:(I,J) is a contradiction, therefore (T'6) sf, = O Lamina 3.3 If for some symbol pair (A,B) A.< B then X, X E HS(B), A -1> X see It is sufficient to prove that (ea-0'19 = 0 Expanding the above we get ((L°EC°G:)H)'G-EC-LC = 0 Assume ((LoEciG:)H)°G'Ec9LC f 0 then there exists an I,J, and K such that L(I,K)EC(I,K)G(I,K)H(K,J)G(I,J)EC(I,J)LC(I,J) but L(I,K)H(K,J) implies L(I,J) which contradicts LC(IaJ). Q. E. D. 3.3 Generating Strings Definition 3.3 Define the 3 Link matrices as follows: Let x e TS(VI), Y 6 HSO/J) Q1(I,J) 1 if X=Y Q2(I,J) 1 if VI = Y Q3(I,J) = 1 if X = VJ or equivalently in matrix notation Q1 = T6H' Qz=w' Q3 =Te' 29 lemma 3.4 and H If a grammar G is SP then (Q1 +Q2 +Q3)°(=E+& + 6) = 0 3.9.9.2: Let TR and HR represent the reflexive closure of T respectively. G and (SEE Since G is SP, is L, .8 and (Q1 +Q2 +03) = TREH' + TE (L + c + E) = '11?an assume (TREH' + TE)°(TI'{EH) 7‘ 0 then there exists an I,J,K,L,M and N such that one of the following is true. 1) TR(K,I)E(K,L)HR(L,J)TR(I,M)E(M,N)H(J,N) 2) TR(K,J)E(K,L)HR(L,J)T(I,M)E(M,J) If number 1 is true then TR(K,I) and T(I,M) imply TR(K,M). HR(L,J) and H(J,N) imply HR(L,N). TR(K,M)E(K,L)HR{L,N) implies there is a relation between M and N. But E(M,N) implies V E V which forces M N K E M and L E N. Then rewriting l we get TRm.J)ECM,N)HR(N ,J)TR(I .M)H(J .11} but HR(N,J)H(J,N) implies H(N,N) and E(M,N)HCN,N) implies V 'VJ which contradicts with VM = VJ. Therefore 2 is false. Q. E. D. Intuitively the Q matrices predict when there exists a derivation on a symbol which will link to its context though the equals relation. Given a string c of length K, we define two sequences and each consisting of the K elements D1,D2,...,DK and M1,M2,...,MK’ respectively. We define the relation R on these two sequences as follows: 1. DiRDi-l if Ci-l = C1 2' Mi 1-1 if Q2(Ci-1’Ci) 3. MiRMi-l if Q1(Ci-1’Ci) 4' DiRMi-l if Q3(Ci-l’ci) Letting RT be the transitive closure of R, we define linked strings as follows: Definition 3.4 c is a (M,M) linked string if 1~H(RTM1 c is a (M,D) linked strig if DKRT M1 c is a (D,M) linked string, if MK.RT D1 c is a (D,D) linked string, if DK RT D1 31 Example 3.2 Finding Linked Strings For the string a = A1A2A3A4 GRAPH OF THE RELATION R the following are linked strings STRING TYPE 1121314 (M,D). (D,D). (M,M) A1A2A3 (D,M): (D,D) Definition 3.5 A string ch is said to be a generating string, GS(X), with distinguished non-terminal X if the string has all of the following properties. Let: |c| E I, |d| EJ 1. E 2. At least one of the following is true: a. c is a (D,M) or (M,M) linked string and Q1(CI,X) b. c is a (D,D) or (M,D) linked string and Q2(CI,X) 32 3. At least one of the following is true: a. Q3(X,D1) and d b. J E 2 and Dl,< D2 c. J E O 4. S X C1 Definition 3.6 We define a generator as an ordered pair (x,N) where x is a generating string and N is the distance to the distinguished non-terminal when measured from the right hand end of the string. If (ch,K) is a generator we can produce a new set of generators by applying to the distinguished non-terminal X all of its alternatives. For each alternative, say X A b, we examine the string cbd for new generators. First, since ch was a CS, by Lemma 3.1 cb has the property E S E. Now for each non- terminal 2 in cb, factor cb into rZs and examine the string rZ ad for GS(Z). Since, in the definition of a G8, the strings to the left and the right of the distinguished non-terminal have to satisfy disjoint requirements, we will examine each side separately. For the left side, we will factor r in two ways II! III r tu ‘u‘ I III III r 1n ‘n‘ J where u is the longest of (M,D) or (D,D) linked strings and n is the longest of (M,M) or (D,M) linked strings. Then there are four possible cases for the left side. case I: Q2(UI,Z)—IQ1(NJ.Z) Then the left hand side is uz since this satisfies re- quirement 2.b in definition 3.5. 33 Case II: —1Q2(UI,Z) QICNJ,Z) Then the left side is nZ since this satisfies requirement 2.a in definition 3.5. Case III: Q2(UI,Z) Q1(NJ,Z) Then, if I is greater than J, the left hand side is uZ as in Case I. If I is less than J then the left side is nZ as in Case II. If I E J then u E n and the left side is nZ which is equi- valent to uZ. Case IV: —1Q2(UI,Z) —1Q1(NJ,Z) Then the left side is just Z which satisfies 2.c in the definition 3.5. For the right side we factor sd as follows: sd E ef E p ‘e‘ E I where g > f There are two possible cases for the right side. Case I: I is greater than 0 and Q3(Z,e1) Then the right side is e which satisfies 3.a of definition 3.5. Case II: —1Q3(Z,e1) or I E 0 If 2 = P1, then the right side is A. If 2 # P1, then the right side is 911. Then the constructed string satisfies the requirements of a CS. The set of generators for a grammar is defined to be the set of all generators that can be derived from (0,1) where o is the goal of the grammar. 34 In each step of the analysis the precedence analyzer will return a phrase for reduction. In each case we know that the string to the left of the phrase has the property s and that the next reduction must be based solely on the phrase presented. A GS is used to predict in which context a production may become context sensitive by supplying a string which may be the result of a reduction of a phrase. For a given GS, say ch and a production B ~»b, we try to determine whether a phrase will exist in the string abc. The same cases will arise that arose when we considered the sentential forms. Case I: a < b, b = c From the definition of a CS, c may have one of two forms. Form 1: 3 Then the CSP is Bc 4 be since by the construction of the GS we know that c> the context in which the CS was derived. Form 2: c E ClCZ’ C1 < C2 The only way in which this case could arise is when, in the formation of the GS, the right side could not be factored into some string ef such that g > f. It was Specifically for this purpose that C1 and C2 were placed in the GS as such. Intu- itively, what we are saying is that it is not possible for a reduction to take place on this string because it is not a phrase. However, this GS is still to be included in this set because of the 68's which may be derived from it. Also, it is not necessarily the case that all the tail symbols of the distinguished non-terminal will be equal to C1. Therefore, C also keeps track of the 1 context in which case a tail symbol of the distinguished non-terminal 35 can appear so that if the tail symbol is > C the production 1 will be found. Case II: a = b, n > b In this case a production is always found, since by defini- tion a is always factorable into some ef such that E < f and the CSP is just fB d fb. Case III: a = b, b = c This is just a combination of Cases I and II. If a reduc- tion exists for the right side, the CSP would be ch1~ fbc. Otherwise, there is no reduction for the same reason given in Case I. Case IV: a s b, b < c Clearly, in this case there are no reductions since there exists no phrase in this context which contains b. One can now see that the method for finding CSP'S is already embedded in the process of determining the 68's and, therefore, one can combine them into a simple algorithm. Starting with the starting type of a grammar, we obtain a set of generators and a corresponding set of CSP'S. The car- dinality of these sets is not necessarily finite. However, at this point, we will only consider the cases of finite cardinality and investigate the infinite case in the next chapter. Now we will examine, in more detail, a single step in the algorithm. Given a GS(B) of the form ch with :, we will examine all the possibilities that will arise and show that the algorithm is complete. 36 First, we note that the only way in which c can be c is if the GS from which this CS was derived had on the right side of its non-terminal B, a string which was of the form cd and that there exists a tail symbol of B, say X, such that X = c. But, clearly, not all the tail symbols of B are necessarily related to c by the = relation, so that two other cases are possible. Case I: X > c In this case, the context of c is not needed for the reduction but is necessary for establishing the existence of the reduction. Case II: X < c This case can only arise if B E c by Lemma 3.2. In this case, c is to be reduced since it is a phrase. But since c is a phrase, it must contain as a sub-string, at least one right side of some reduction. And since c is a sub-string in a CS, it must have been derived from some mapping on some distinguished non-terminal of some GS. Therefore, there exists a CS which will produce a string c when its distinguished non-terminal is mapped and the prOper reduction will be found. Now we will examine the other side. From Lemma's 3.1 and 3.3 we know that a.< X or a = X for each X a head symbol of B since by the definition of a CS, a S B. Therefore, either there is or is not a left context, but it is never the case that a > X. We also note that if there is a left context, then the production thus found cannot be applied in this context without its left context since, by Lemma 3.3, all the tail symbols of the right-most 37 symbol of the left context cannot be > the head symbol, X, of B for which a = X. In the 68's, we have examined all the contexts in which a non-terminal can find itself in a sentential form and eliminated only those contexts where the precedence relations were such that they could not become part of a phrase. Therefore, if there are a finite number of GS's, then there are a finite number of produc- tions and they will be enumerated by the algorithm. Before discussing the case of an infinite set of 68's, the same example which was done earlier in this chapter will be used to demonstrate the algorithm. Example 3.3 ESP precedence matrix Ar'B,B ABEJSL B—v[A] AOOO=OO 3-o[L] 3000>=0 B-oL [=<<00= ]000>>0 ,O=<00< L000=>O Q1 Q2 Q3 AB[],L ABE].L ABE].L AOOOOOO AOOOOOO AOOOllO 3000000 3000000 3000100 [000000 [110000 [000000 ]000000 ]000000 ]000000 , O 0 0 O O O , 1 O 0 O O 0 , O O 0 O O O LOOOOOO L000000 LOOOOOO GS MAPPINGS CSP A 1 3,3 A—+B,B 3 1 [A].IL].L B~IAI [A12 [3.3] BELL] B] 2 £A11.[L11.L1 B~L [3 1 [TA].I[L].IL B]~L] EB~EL CHAPTER IV Infinite Strings and Equal Right Sides In chapter III the concepts of ESP were introduced together with a method for generating the CSP's necessary for the proper operation of the precedence analyzer. It was stated that it is not necessarily the case that the set of CSP's, thus derived, will be finite. Therefore, we would next like to investigate the case of the infinite CSP's leaving the problem of equal right sides to the latter sections of this chapter. 4.1 Infinite Context Sensitive Productions Infinite CSP arise in one of three ways. If there exists a derivation of the form 'A E bAc, Then, Case I: E = b ahd : or c E c Case II: E = c and E or b E b Case III: : = c and E = b In case I, the set of allowable phrases will contain db*bAe where d and e are part of the context in which A may appear. A similar set exists for the other cases. Definition 4.1 A symbol, P, is said to have an LL bound, K, if for every phrase which starts with P, there exists a reduction within K symbols to the right of P. 38 Qfiilfl for e withi 93:13 for 6 with Piél for with aPPE dist 19mg whe in and of do f0 39 Definition 4.2 A symbol P is said to have an LR bound, K, if for every phrase which starts with P, there exists a reduction within K symbols to the left of the right hand end of the phrase. Definition 4.3 A symbol P is said to have an RR bound, K, if for every phrase which ends with P, there exists a reduction within K symbols to the left of P. Definition 4.4 A symbol P is said to have an RL bound, K, if for every phrase which ends with P, there exists a reduction within K symbols to the right of the left hand end of the phrase. Given phrases generated by either Case I or Case II, it is apparent that the reduction to be made always lies a finite fixed distance from one boundary of the phrase. For each phrase p of length K, we define a six-tuple, where LL,LR, RR, and RL are the values of the bounds defined in definitions 4.1 through 4.4 and P1 and PK are the first and last symbols of the phrase p. For each given finite set of CSP's, there exists a maximum LL,LR,RR and RL for each symbol in the grammar. Definition 4.5 The dominant bound of a production is the smallest of the four bounds (LL,LR,RR,RL) of a production. The bounds associated with each symbol are the maximum dominant bounds for that symbol. They can be easily found as follows: 1. For each six—tuple defined above, we construct four 3-tuples . 'Where B is the type of the bound (LL,LR,RR,RL), V is its value and S is the associated symbol. does cess than WhEn far Sub. Stri - a” _:'::“a -_["..-l " 4O Construct one 3-tuple for the dominant bound, if more than one is dominant, construct one for each such dominant bound. b. Construct one 3-tuple for each of the remaining bounds with a value one greater than the dominant I value. 2. For each symbol and bound type, there exists a maximum value V, which is the maximum bound for that symbol and bound type. 3. Replace the bounds in each six-tuple by the correSpond- ing maximum bound if the current bound is less than the maximum. The above process is repeated until a set of maximum bounds does not change for two consecutive iterations. Clearly the pro- cess must halt since the size of the bounds cannot become greater than the largest original bound in the set. The bounds, thus found, are used by the parsing algorithm Vohenever a phrase is found. The analyzer finds the minimum bound fer the given end symbols and then tries to match the indicated Sub-string of the phrase to the CSP set. In thiS‘way, infinite strings of Case I and Case II are easily handled. Example 4.1 Consider the simple grammar: S a AS S a AA In this grammar, A is both < A and =A. Under the formation rules of ESP, this conflict is resolved by making the relation between the A's equal. But there exists the following two sentential forms, A*S infinite set of productions. 41 and A*AA, which imply an If we generate the ESP CSP's for all strings of length less than or equal to 4, we get the follow- ing set of productions and bounds: S AA <2,2,2,2,A,A> AS AAS <3,2,2,3,A,S> AAS AAAS <4,2,2,3,A,S> <3 <3,2,2,3,A,S> <4,2,2,3,A,S> Theorem 4.1 Let K be the length of the longest CSP in the set which contains all the finite CSP's and at least one representative of each potentially infinite CSP and let S denote the set of all K CSP's of length less than or equal to K. Then, if the set of bounds for the set S is the same as the set of bounds for the K set S , where M is the maximum bound in the set SK KfiM , then the set of bounds for the set S is sufficient for the set of K infinite CSP's. 4.2 Proof: Assume Case I. Then by the hypothesis there exists an N such that 1 +J N+J+1 IabNd|§K conflicts Property 2. The right end of every phrase is known 44 Properuy 3. The parse of any string is an LRCP. Property 4. At most only Case I phrases exist. Property 5. There always exists a set of bounds. The above form always exists for any grammar since this is just the reverse of Griebach Normal Form [Gr 65]. Several authors have pointed out that a sufficient condition for parsing a grammar in Griebach Normal Form by a top-down method is that each alternative of a given definition starts with a unique terminal symbol. This method is based upon the fact that if a non- terminal is to be replaced by one of its alternatives, one need only look one symbol ahead on the input string to uniquely determine which alternative to apply. In these grammars, there may be many productions which have equal right sides, but, in a sense, only a small subset of the productions can be applied at a given time. Therefore, the restriction for the given subset, that the pro- ductions be unique, reduces for top-down parsing to the case of unique starting terminal symbols within the alternatives of the definition. It would be convenient to have a similar set of sub- sets of productions for bottom-up parsing. Taking advantage of the LRCP of the canonical form, we will next investigate such a procedure. Definition 4.6 The production grammar GP of a grammar G is defined as follows: C =1 GP =I. Let 1 denote theindfiiproduction in P and let 2' = {i‘i E 1,2,....#P} and let v' E [v - 2} U {2'}. .-th "" so. For each production in P, say thel. R W121W222 WKZK where Wj E 2* and Zj E (V - 2)* define a corresponding pro- d ti f P' .. . uc on or , R 2122 ZKi 45 Now for each production in P, there is a correSponding pro- duction in P' which has the same non-terminal symbols in the same order and a single terminal symbol at the right end which represents the production number in P. Theorem 4.2 The language generated by a production grammar is the set of allowable LRCP's for the grammar. 3:22: Clearly, the language generated bythe production grammar contains only production numbers since these are the only terminal symbols in the grammar. If one starts to derive an SF in GP using an RMD, then at any point in the derivation, the SF will have the form zAb where z E (V' - Z')*, A E (V' - 2') and b 6 (2')*. Now, whenever a production is applied to an SF, using an RMD on a production grammar, its production number is recorded immediately to the right of its application and immediately to the left of the most recent previous application. Therefore, the string generated by the RMD is just the reverse of the application of the productions in the RMD. But an LRCP is juSt the reverse of an RMD and, since there exists an RMD for every string in the language, then the language generated by the production grammar is the set of LRCP'S. Theorem 4.3 The set of LRCP's of a grammar G and its production grammar are the same. Given any RMD, at any point in the derivation, the number and order of the non-terminals in the SF's of the two 46 grammars are the same. This is evident from the construction of the production grammar. Now, since the right-most non-terminal in each of the two SF's is the same, if correSponding productions are applied to this non-terminal in each of the reSpective SF's, the resulting SF's are still identical with reSpect to non-terminal symbols. Therefore, for every derivation in G, there exists a correSponding derivation in GP and visa versa. Q. E. D. It was noted in Chapter II that the only way a symbol B could appear immediately to the right of a symbol A was if A‘<, = or >'B . Therefore, if the precedence relations are known for the terminal symbols of a production grammar, then given any production, the set of production which may legally follow it in an LRCP are given by these relations. Since the form of the production grammar is identical with the canonical form described above, the relation matrix N for the production set is just N = T12 (E11 H12 + E12). The restriction on equal right sides can now be relaxed by only requiring productions within any given subset to be unique. Clearly, this technique can be used for any bottom-up method which is governed by an LRCP. As an example, a precedence grammar, with equal right sides, will use the above technique to resolve this problem. Example 4.2 l S a LED 4 M.~ D 7 T * ATB Production Grammar l S d LRl 4 Min 4 7 T * T7 Precedence Matrix for G I:cotfl:>teifliit~cn c>c>c>c>c>c>c>c>c>aa c>c>c>c>c>c>c>c>c>r* c>c>c>c>c>c>c>c>c>z CIVIC u<=<3zi V00 "00 "COT-J v v c>A c>c>A A c>>h OOV II II nooow 47 2 S E'MTC 5 R a ARB 8 T a AB 2 S EIMTZ 5 R.* R5 8 T a 8 The N C D l O 0 l O 0 O 2 0 O O 3 0 O = 4 O = O S l O O 6 l > > 7 O O O 8 O 0 0 Matrix for G HHOOOOOON OOOOOOOOU OOOOOOOOL‘ OOHt—‘OOOOUI COOOOI—‘OOO‘ Ht—‘OOOOOON OOOOHOOOQ Clearly the first applicable production must be a head symbol of the starting type in the production grammar. grammar it must be either production 3 or 4. For this particular The parse of a string in the language would proceed as follows: DAAABBBC AAABBBC BBC BC C S Production to be applied 4 8 7 7 2 Allowable successors 8 2,7 2,7 2,7 48 4.3 Equal Right Sides and Bounded Context Grammars Floyd [Fl 64] introduced the concept of bounded context analysis. This is another convenient method for handling equal right sides in ESP. If there exists a context of finite length, for which the decision can be made as to which production is to be applied, then if this context can be forced to appear with the phrase, the correct reduction can easily be found. But clearly, the mechanism for examining this extended context is already available in ESP. One need only change the apprOpriate relations to = in order to extend the contexts. In so doing, additional CSP's will be created which will perform the same function as Floyd's Bounded Context Pro- ductions. To illustrate this technique, we will examine another precedence grammar which contains equal right sides. Example 4.3 S * CR T 4 MTNB S a DT T a MNB R,* LRB M m A R'* 13 N m B L m A The bounded context reductionsneeded to decide whether to apply I.~ A or M ~ A when an A is found are: LA use L d A CA use L‘e A DA use M ~ A MA use M d L The precedence matrix with the forced equalities circled is: S R T M_ L N A B C D S R = T - < M = < -' G9 <: L = < (E) = N = A «E be > > > > > B > > C = < I: D = < ’) The Th1 ESP, are free gra grammar 49 The four CPS's added to the production set are: MM d’MA CLH CA LL'~ LA DMIE DA The techniques discussed in this chapter, when combined with ESP, create a powerful method for deterministic parsing of context- free grammars. Some of timzmore interesting ramifications of ESP grammars will be discussed in the next chapter. CHAPTER V LR(k) Grammars and their Relation to ESP Grammars In this chapter we will show how ESP compares with other techniques for deterministic parsing of context free languages. 5.1 LR(k) Grammars Knuth [Kn 63] defines a grammar to be LR(k) if a phrase x is uniquely determined by the entire string y to the left of the phrase x and k terminal symbols to the right of the phrase x. Both Floyd [Fl 68] and Tixier [Ti 67] consider LR(k) grammars to be the most general form for which there exists a constructable algorithm for an LRCP which is derived directly from the grammar. This method always results in a LRCP. Lemma 5.1 There exist grammars which are not LR(k) for any k but which are ESP. Consider the following grammar: EBC ABD A (B) L 0305111030: 1 1 1 1 1 This grammar is not LR(k) for any k since one cannot tell whether the phrase A should be reduced to E or not by simply looking at a finite number of terminal symbols to the right of A. 50 The key it 3151: be I) this gran The Cir which 1 with aPPrOp of the if bc The key to the problem is that C or D, upon which the solution must be based, can be an unbounded number of Symbols away. But this grammar is ESP: csp SBE()LACD S-vABD S 34330 3 = == EBC a ABC E - < < B-'(B) ( < < B a L ) > > > L > >> A @- Q <.- The circled relations represent conflicts in the original grammar which were removed using ESP. Now the decision of what to do with A is delayed until a C or D is found and then the appropriate action is taken. Here we have also taken advantage of the fact that the parse is not a LRCP. Definition 5.1 A grammar G is said to be synonomous to a grammar G 1 if both of the following conditions hold: 2 1. Both G1 and G2 generate the same language. 2. If in each grammar the trivial productions of the form X d Y are removed by substitution, then the two grammars are isomorphic. Example 5.1 GI S a AAB 62 S q RHQ S d AS S * TS S d AC S ~ RV S a TC R a A T E A Q a B V m S These two grammars are synonomous. KnuI andu 1 immee with the referred this rig Si 52 Knuth has shown that if a grammar is LR(k), then there exists a right linear grammar which will generate a set of strings represent- ing the entire string to the left of the right-most phrase together with the phrase and k symbols to the left of the phrase, hereafter referred to as a sentential head. The technique for constructing this right linear grammar is as follows: Given G = , construct: F = where m = [PI v' = {[z,a]lz e v, a 6 2K} u {[i]|i = 1...M} 2' = {V} U {$1 0' = [o,$] P' is defined as follows: For each production in P, say A.X d ZX Zx ... Z , let 1 2 Xn 1. for each 2x 6 V - 2 add to P' a production of the form 1 [Ax,w] d ZX ...Zx [Zx,b] for each [A,w] and for each 1 i-l i [Zx ,b] where b E {Z‘ZX. ...ZX w = Eq, ‘2‘ = K] 1 1+1 N 2. [A ,w] a Z Z ...Z w[X] x x1 X2 "N 3. [x'j-oli The w is defined to be any b that appears on the right side with the corresponding AX. 53 Example 5.2 mm 1. S -* CT [s,$] 'I C[T9$] [s,$] .. CT$[1] 2. s .. DR [s,$] - D[R,$] [s,$] .. DR$[2] 3. T .. ATB [T,$] .. A[T,B] [T.$] ~ ATB$E31 4. T .. AB [13.3] -» A[T,B] [LB] - ATBBEBI 5. R .. ARBB [T.$] - AB$E4J [LB] ~ ABB[4] 6. R .. A33 [R,$] - A[R,Be [s,$] - ARBB$ 5] [R,B] _. Amuse [11,3] -» ARBBB 5 [R.$] -: ABB$[6] [3,3] —. A333[6] [I] -o A [4] -e A [2] .. A [5] -. A Definition 5.2 A finite state automata (FSA) is a 5-tuple A = , where S is a set of states 2 is a set of input symbols S m is a mapping of S X 2 into 2 So is the start state F is a subset of S designated final states The domain of m is extended to include strings of input symbols as follows: m(s,Ab) = m(m(s,A),b) Definition 5.3 If the range of m is S instead of ZS then the FSA is said to be deterministic (DFSA). Definition 5.4 The set of strings excepted by a FSA is the set T(A) T(A) = {2|n(so,z) c F} That is which I that t state genera contri fitter acce; whet IESU fOry ant [1]," 54 Definition 5.5 A FSA is said to be input pure iff a VX VI VY’VJ(m(X,I) E m(Y,J) implies I E J) That is, for each state there exists one and only one input for which the state is in the range of the m mapping. Given any right linear grammar, Ginsberg [Gi 65] has shown that there exists an effective procedure for finding a finite state automata which will accept those and only those strings generated by the right linear grammar. However, the automata thus contrived is not necessarily deterministic. Rabin and Scott [Ra 60] have proved that for each non- deterministic automata there exists a deterministic automata which accepts the same set of input strings. Lemma 5.2 For each DFSA there exists an equivalent input pure DFSA. Ems: For each state Z we define K new states Z ,Z A A ,...,ZA 1 2 K where K is the number of unique input symbols A1,...,AK which result in transitions to Z. We replace each move function of the form.‘M(L,A1) = z with the move function Ml(L,Ai) = ZA and i for each move function of the form M(Z,B) = P include I _.._._ u . M (ZAi,B) P. If Z 6 F add ZA.°°'ZAK to F . Now if i = v ' = d ' M(so,X) D then M (SO,X) DX an if D E F then Dx 6 F by construction. Assuming that M(so,aX) ==Q, M'(so,aX) ==Q§ and M(Q,Y) = R, we precede by induction. By the induction hypothesis M(so,aXY) = R and by the construction, if M(Q,Y) = R 55 then there exists M3(Qi,Y) = R; for all states Qi. In I = 3 ' = ' particular M (QX,Y) 1%. Therefore, M (so,aXY) 1% and REF then RéEF'. Lemma 5.3 If a grammar is LR(K) there exists an input pure DFSA which accepts those and only those strings which represent the sentential heads of the grammar. We would now like to construct a granniar which is synonomous to the original LR(K) grammar. Let GS = be the new grammar and G = be the LR(K) grammar for which A = is the input pure DFSA. Define V' = {V} U {S} and 2' = 2. The production set P' is constructed as follows: For each production in P, say Z ~ A1A2...AM, there exists N at least one production in the LR(k) right linear grammar of the form [zN ,w] .. AIAZ. . .AMBM+1. . .BM+K[N] Then for each sequence d of length M+K which will, from some state E, allow the automata to reach a final state NB , add M+K the following production to P'. ZN 4'M(E,A1)M(E,A1A2)...M(E,A1A2...AN) Since the automata is input pure each state represents one and only one symbol from. V. By adding the following productions the correSpondence between the states and the symbols in the automata is put in the production form. For each move function of the form M(Z,A) = X add the production A ~ X. Now each state represents a single symbol and each production which contains states is clearly 56 reducible to the original production in G by substitution. And since there exists at least one encoded (symbols replaced by states) production for each production G the constructed grammar is synonomous to the original grammar. The precedence relations can be taken directly from the automaton and the new grammar since the transitive closure type operations are implicit in the automaton. 5.2 LR(l) Grammars The synonomous grammar constructed from the automata is in general ambiguous and contains several productions which have the same right hand sides. However, the automata is deterministic and therefore, for each input, there is only one transition possible. Since the constructed grammar contains states of the automaton as non-terminal symbols, we can use this information to guide the precedence analyzer in choosing the proper production to apply. That is, the last state entered contains all the information necessary to determine the next State given any input. Since there are no productions with equal right sides which contain states, only the trivial productions need this consideration. The con- struction of the precedence matrix directly from the automaton and the non-trivial productions proceeds as follows: The E matrix isobtained from the non-trivial productions of the synonomous grammar together with the forced equalities which uniquely determine which trivial production is to be applied. Thus a -O b 1 if V aVI VJ or V--0VJ and M(VI,VJ) =V E(I,J) 57 We note that the two conditions are disjoint, one dealing only with relations between states and the other only between states and inputs. The L matrix reflects only relations between states and is obtainedeirectly from the m function. Thus L(I,J) E 1 if m(VI,V) 5 VJ The G matrix is just a relation between inputs since in a LRCP only terminal strings can appear to the right of a pro- duction and each production will have a terminal as its right most symbol by the definition of LR(l). Thus G(I,J) E 1 if m(m(V,VI),VJ) The matrices have the following form: STATES INPUTS E.=. rE 312‘ STATES LE 0 1 ll 1" .3 C) III '— D O O [0 0 INPUTS I0 0] - L G22.1 and therefore 6=E,=£=L-E°,.9=G The Q matrix may be found in the usual manner. In order to prove that the grammar just constructed is ESP it is only necessary to Show that there exists bounds for each pro- duction and that there are no equal right sides. If there were two encoded productions which had the same right side, then the original grammar could not have been LR(l) since the automaton ‘WOUld HOE have been able to distinguish between these two productions either. As for the trivial productions, since the'automatonis deterministic, 58 the last state entered must uniquely determine the next state for a given input. Therefore, there are no equal right sides. Since the length of each production is known and the right end of every production is also known, i.e., there are no > conflicts, the RR bound is just the longest production ending in each symbol and this is always finite. In fact, the longest bound must be less than or equal to the longest production in the grammar plus 1. Theorem 5.4 If a grammar is LR(l) then there exists a synonomous grammar G' which is an ESP grammar. Knuth [Kn 64] has shown that if a language is deterministic then there exists an LR(l) grammar which generates it. Theorem 5.5 If L is a deterministic language then there exists an ESP grammar which generates this language. Example 5.3 l. S a ABC 2. B a ASS 3. BI~ DB 4. B E D 5. S-oDS 6. S-oD This grammar is not a simple precedence grammar since there are equal right sides and conflicts between B and C, and also be- tween S and C. It is not bounded context since one must know how many A's have been stacked in order to determine whether to use B ~ D or S 41D. But it is LR(l), as illustrated below: LR(l) right linear grammar [s,$] - A[3,0] [3,0] - D[B,C] [s,$] .. A30$[1] [3,0] .. 030D] [s,c] .. A[3,0] [3,0] .. Dc[4] [s,c] .. ABCC[1] [3,0] -. A[s,0] [S.$] -° DIS.$] [3,0] -. Ascc[2] [s,$] ~ DS$[5] [I] -+ A ~ [A] —~ A [3,0] .. D[s,c] [2] .. A .. [5] _. A [3,0] - Dsc[5] [3] .. A -. [6] -» A 59 C B S S o MSG B B SYNONOMOUS GRAMMAR F-oA N-o3 S—oJQR G—oD o-s S-oFLM I-I-D P-oc 3—o103 I-oA Q—+C B—oHN J-+A R-oc N-+B K—oD T—oS s-oc'r L—oB U-S s-.1.U a nu: nvnv= n. .A =.U.U =.U Munvnvnvnvnv ==>000 300:0: =00=0= 000000 UOOOOOOOOOOO 1“: nvnvnunv m.nvnvnv= nu KAUAUnununv pnnvnvnvnvnv runvnvnvnvnu nxnvnvnvnvnu T.(‘n.nvnunu n.= nvnvnvnv p.nvnvnvnv= H<0000 nvnvnvnununv .Gnu.u.u <.U convnvnv. nu nunvnvnvnvnu n.nvnuAu <.U nvnvnunvnvnv nvnvnvnunvnu .U.UAU <.UAU nvnvnvnv= no .U.UAU <.UAU nvnvnvnvnv= .. :0: =0 0—. 0000 0.. 0000 000: 00 0.. 0000 000000 000000 0(0000 000000 0<0000 000000 <0000<000000 a 0000: 000000 <0000< 000.. 00 000000 000: 00 a 00000 000000 000000 000000 000000 000000 000000 0.. 0000 000000 000000 w.nucununvuup.R.T.an.pnxnm.tunuMu nvnv> > >AU > >.u nununv nonunu nvnvnv nunYO nonunv nunvnv nonunu nvnvnv nvnvnu nunvnv nvnvnv nonunu nvnvnv nvnvnv > >.U nvnvnv nvnvnv ADC Production used An example parse ADADCCC F d A FI d FA DADCCC c cm c on mm mmmummmmmmmmmpp a q a a a a a 4 a a a a a q a mwmmmmwwmmmmmnms .J I F C CM C C cmcmcmp WCX=X ..CM Amsq.qmwammm mimmmwkqxs.axaw aq«««q«a«aqqaqa aqqaqaqaaaaadaq Lemma 5.4 61 Tnere exists a context sensitive language which has an ESP grammar. The grammar: 1 G -'$R$ 2 $R$ a $AF$ 3 ARE —°AAFE 4 F$ a REC$ 5 F$ * DC$ 6 FE —.DE 7 FC d RECC 8 FC‘d DCC 9 DE -oDF 10 DRE -.DZE 11 DZ —~DR 12 R2 a RE 13 DC -oBC 14 DB * BB ihis grammar generates a context sensitive language. N N N generates ESP productions: R —~AF D a B F ..REC F —»DC F —~D G a $R$ E —.F R —.2 R2 —~RE D a R The precedence matrix: R F D E z A B C s R O O O = > 0 0 O = F 0 0 0 > 0 0 0 > > D < <‘< >>< 0 < = o E 0 0 0 > O 0 O = 0 Z O 0 O > O 0 0 0 O A < = < 0 O < < 0 0 B 0 0 O O 0 0 > > 0 C 0 0 O 0 O 0 0 > > $ = 0 0 0 0 < 0 0 0 SANBNCNS which has been shown to be To show that the above grammar actually $A B C $ for N 2 l we will examine a]-l sentential forms in a general derivation together with the precedence relations. Sentential Form C $R$ $AF$ $AREC$ $AAFEC$ $A.NFEN 1CN 1$ SA NDEN 1CN$1 $A WDK N -K- 1$CN-1 Prod. No. €> 0‘ GHQ-fitdfd C\ Prod. No. Resulting Sentential Form Resulting Sentential Form <$=R=$> <$$ <$$ 5 <$EC$ <$EC$ <$<§NEN—1CN-l$ <$<§NEN'ZCN'1$ EN-K-1Ch-l$ <$A$ 62 Sentential Form Prod. Resulting Prod. Resulting No. Sentential Form No. Sentential Form SAEDKEN-KCN-l$ 9 <$EN-K-1CN-l$ SANDN-lEcN—1$ 9 < $CN_1$ $ANDN-1FCN_1$ 7 < $CN_1$ 8 < $CN— 1 $ $ANDN-1RECN$ 10 < $E CNS SANDKZEN'Kch 11 <$ZEN‘KCN$ $ANDK’1RZEN'KCN3 12 <$EN‘KCN$ $ANRENCN$ 3 <$ENcN$ $ANCNCN$ 13 CN$ $ANDKBN-KCN3 14 <$BN-KCN$ note that the only place in any derivation where there exists a choice of productions is when either 4, 5 or 7, 8 can be applied. If 5 or 8 is applied the size of N becomes fixed and the only applicable productions are 13, 14 which convert D's to 3'5. Clearly for each sentential form, the proper reduction is made by the pre— cedence algorithm as demonstrated in the table. The only remaining question is that the analyzer may also accept strings of the form AIBJCK of ANBN-KCNBK. Examination of the precedence relations between A,B and C shows that C's can only appear to the right of B's and C's. Similar rules apply to both A and B so that the latter problem does not exist. By examining the sentential forms we can see that a precedence violation will occur with $ or the phrase R$ will come up with no reduction applicable in the former case. Q. E. D. 5.3 Synonomous Grammar for LR(K) Grammars To extend the concept of synonomous grammars to the general case of LR(K) grammars requires only a change in the definition of the set of productions used. That is, the automaton can be made 63 input pure and deterministic as shown in Lemma 5.3. However, since tfimzautomaton,from which the productions are found, must look ahead K symbol in order to determine if a reduction is to take place, the first K-l symbols of the K symbol string are encoded as states. But after the reduction is performed the states which represent the encoding of the K-1 symbols have lost their value as states. That is they are not necessarily related to the positions in the automaton for which they were encoded. This situation is easily rectified by replacing the right context on the left hand side of each production with the original terminal string. This can always be accomplished since the automaton is input pure and therefore there exists a unique deviation from the right context back to the original terminal string by the application of K trivial productions. For example if aSb a asb is a production and ‘bl = K then the string B is a string of states and there exists 182...BK_1 = B'B'... B' and trivial productions such that B182...BK_1 l 2 K—l Bi, aSBlBé'°'BR-IBK.” aSb and the following theorem has been proved. Theorem 5.6 Bé"'°’Bé—1 are terminal. Then the production to be used is just For each LR(K) grammar there exist a synonomous ESP grammar. CHAPTER VI Conclusions Section 6.1 gives a summary of the results of this disserta- tion and section 6.2 describes some possible extensions. 6.1 Summary of Results In chapter II we introduced the concept of precedence to— gether with several extensions of the concept. Each of these exten— sions increased the class of grammars which could be effectively handled by either extending the definition of precedence or changing the grammar. However none of these methods contain any mechanism for handling the problem of equal right sides in the production set. In this dissertation we have changed the definition of pre— cedence to include such a mechanism through the use of bounded context techniques [Fl 64]. Since the grammars which may be successfully handled by both McKeeman's [Me 66] and Wirth and Weber's [Wi 66] extensions are subsets of LR(K) [Ku 64] grammar, any prOperties of ESP which are related to LR(K) are also related to these extensions. To summarize, the following have been shown: (1) Every deterministic language has an ESP grammar. (2) Every LR(K) grammar has a synonomous grammar which is ESP. 634 64 (3) There exist grammars which are not LR(K) for any K but which are ESP. (4) There exist context sensitive languages which are ESP. Although the above results are of theoretical significance their practical value is marred by the large increase in non- terminahsrequired for the synonomous grammar. However there do exist many grammars which are not precedence under any of the aforementioned formulations, but which can be effectively handled by ESP without changing the grammar. The use of context sensitive productions allows ESP to attack the equal right hand sides prob- lem as part of its basic mechanism. This point is significant in that no other precedence formulation can accomplish this directly and in many such techniques it is completely ignored. Therefore we feel that ESP is an effective technique for syntactic analysis. 6.2 Some Possible Extensions of ESP The formulation of the precedence relations under ESP can be applied to any method of precedence analysis. In particular, the methods of Floyd [Fl 63] and Colmerauer [Co 68] can be easily extended in this way. Another interesting question involves the set of languages accepted by deterministic linear bounded automata [My 60]. Several languages from the above set have ESP grammars and we feel that the mechanism of the analysis is sufficient to handle the entire set although we have not been able to prove it. 65 Given a set of productions and a precedence matrix not derived from the productions a third interesting problem can be posed. Does this set define a language and if it does can a grammar be constructed from the set so that it generates this language? Again from the mechanism of the analyzer the class of languages so defined would be larger than the class of con- text free languages. BIBLIOGRAPHY [A1 [Ba [Cd [Ch [Ch [Co [Ei [Ei [Fe [F1 [F1 [Gi [61 68] 64] 67] 64] 68] 67] 63] 64] 68] 63] 64] 62] 66] BIBLIOGRAPHY Alber, K., Oliva, P., Urachler, 6., Concrete Syntax of PL/l, Technical Report No. TR25.084, IBM Laboratory Veinna, Veinna, Austria, June 1968. Bar-Hillel, Y., Language and Information, Addison- Wesley Publishing Company, Inc., Reading, Mass., 1964. A1G01.Generic Reference Manual, Publication No. 60214900, Documentation Department, Control Data Corporation, Palo Alto, Calif., 1967. Cheatham, T.E., and Sattley, K., "Syntax Directed Com- piling", Proc. AFIPS 1964 SJCC, Vol. 25, pp. 31-57. , On the Generation and Implementation of Re- duction Analysis Programs, CA-6804-0212, Computer Associates, Inc., Wakefield, Mass., 1968. Colmerauer, A., Precedence, Analyse Syntaxgque et Languages de Programmation, Thesis, University of Grenoble, Grenoble, France, 1967. Eickel, J., Paul, M., Bauer, F.L., and Samelson, K., "A Syntax Controlled Generator of Formal Language Processes". Comm. ACM, Vol. 6, pp. 451-455, August 1963. , Generation of Parsing Algorithms for ChOmsky Z-Type Languages. Mathematisches Institut der Techneschen Hochschule, Munich, Germany, 1964. Feldman, J. and Gries, D., "Translator Writing Systems," Comm. ACM, Vol. 11, pp. 77-113, February 1968. Floyd, R.W., "Syntactic Analysis and Operator Pre- cedence", J. ACM, Vol. 10, pp. 316-333, July 1963. ‘________’ "Bounded Context Syntactic Analysis", EQEEL_A£!J V01- 10: PP. 62-67, February 1964. Ginsberg, S., An Introduction to Mathematical Machine Theory, Addison4Wesley Publishing Company, Inc., Reading, Mass., 1962. , The Mathematical Theory of Context Free Languages, McGraw—Hill, Inc., New York, 1966. 66 [Gr [Gr [Kn [Ma [MC [Mi [My [Ra [Sq [Ti [Wa 65] 68] 65] 68] 66] 68] 60] 59] 66] 67] 62] 66] 67 Greiback, S., "A New Normal Form.Theorem for Context Free Phrase Structure Grammars", J. ACM, Vol. 12, pp. 42-52, January 1965. Gries, D., "The Use of Transition Matrices in Compiling", Comm. ACM, V01. 11, pp. 26-34, January 1968. Knuth, D.E., "0n Translation of Languages from Left to Right," Information and Control, Vol. 8, pp. 607- 639, October 1965. Martin, D.F., "Boolean Matrix Method for the Detection of Simple Precedence Grammars," Comm. ACM, Vol. 10, pp. 685-688, October 1968. McKeeman, W.M., An Approach to Computer Langgage Design, Technical Report 0848, Computer Science Department, Stanford University, Stanford, Ca1if., August 1966. Mills, D.L., "The Syntactic Structure of MAD/1," Technical Report 7, Concomp, University of Michigan, Ann Arbor, Mich., June 1968. Myhill, J., Linear Bounded Automata, Technical Report 60-165, Wright Air Deve10pment Division, Cincinnati, Ohio, June 1960. Rabin, M.O., and Scott, D., "Finite Automata and Their Decision Problems," IBM J. Research and Development, Vol. 3, pp. 114-125, 1959. Squires, E.B., Precedence Grammars and the Euler System, Parts I, II, III, and IV, Technical Report 701, Computer Science Department, University of Illinois, Urbana, 111., August 1966. Tixier, V., Recursive Functions of Regular Expressions in Language Analysis, Technical Report 0858, Computer Science Department, Stanford University, Stanford, Calif., March 1967. Warshall, S., "A Theorem on Boolean Matrices," J. ACM, V01. 9, pp. 11-12, January 1962. Wirth, N. and Weber, H., "Euler-A Generalization of AlGOL and its Formal Definition," Part I, Comm. ACM, Vol. 9, pp. 13-25, January 1966. APPENDIX APPENDIX Analysis of MAD/I This appendix will deal with the analysis of MAD/I under ESP. While it is not a simple precedence grammar, it is an operator precedence grammar and therefore the problem of equal right sides is assumed to be solvable by other considerations. The problem is not solvable under ESP because the grammar is ambiguous. That is, one may derive (IDR) in 3 ways from PLS and (DES) in 2 ways from PLS. The analysis is presented in the following order. (1) The original productions start on page 70. (2) The precedence matrix with the simple precedence conflicts appears next, starting on page 72. (3) The Q matrix, starting on page 76, is encoded as follows: a) Q1(I,J) implies Q(I,J) = E b) Q2(I,J) implies Q(I,J) = G c) Q3(I,J) implies Q(I,J) = L (4) The set of generating strings appears next, starting on page 80. (5) Next the set of CSP's appears, starting on page 83. (6) The bounds for the set of CSP's are listed next, starting on page 89. 68 69 (7) Finally the set of CSP's with the bounds applied is listed starting on page 91. It is interesting to note that only two productions were added to the original set under ESP. 70 SYNTAX FOR MAD/I OPERATOR PRECEDENCE KERNEL GRAMMAR FROM MILLS. DAVID Lu. THE SYNTATIC STRUCTURE OF MAD/I. THE UNIVERSITY OF MICHIGAN. CONCOMP PROJECT. ANN ARBOR. MICHIGAN. TECHNICAL REPORT 7. JUNE 1968. 91 PP.. ALSO AO-67l 683 000000 QQ~JOU1¢UN~ PROGRAM PRIMITIVES XL SIDN XL SLP PLS IDR XL 10R 10R .ATT. XL XM IDR XM XM STAG TOR XM XM sxev 055 XM DES DES . XM ASSIGNMENT STATEMENT X1 DES X1 .ABS. X1 X2 X1 X2 ,X2 .LS. X1 X2 X2 .RS. X1 X3 X2 X3 X3U X3U .ABS. X3U X3 X2 .LS. X3U X3 X2 .RS. X3U X3U .N. X3 X4 X3 X4 X4 .A. X3 X5 X4 X5 X5 .V. X4 X5 X5 .EV. X4 X6 X5 X6 X6 9' X5 X7 X6 X7 X7U X7U .ABS. X7U X7 X2 .LS. X7U X7 X2 .RS. X7U X7U .N. X7U X7 X4 .A. X7U X7 X5 .V. X7U X7 X5 .FV. X7U X7 X6 *9 X7U X7U $NEG X7 X8 X7 X8 X8 9 X7 X8 X3 / X7 X9 X8 X9 X9 . X8 X9 X9 - X8 XA X9 XA XA = X9 XA XA .NE. X9 XA XA .GT. X9 XA XA .GE. X9 XA XA .LT. X9 XA XA oLE. X9 X9 XA XH XBU XHU .AHS. XRU 71 '58—'X8flx2“ ".LS .7 ”XE—0 ' ' 56 X8 X2 .RS. X8U S7 X8U .N. XBU 58 X8 X4 .A. XBU 59 X8 X5 .Vo XBU 60 X8 X5 .EV. X8U 61 X8 X6 .9 X80 62 XBU SNEG X8U 63 X8 X8 ' X8U 66 X8 X8 / X8U 65 X8 X9 9 X8U 66 X8 X9 - X8U 67 X8 XA I X8U 68 X8 XA .NE. X8U 69 X8 XA .GT. X8U 70 X8 XA .LE. X8U 71 X8 XA oLT. X8U 72 X8 XA .GE. X8U 73 X8U .NOT. X8 74 XC X8 75 XC XC .AND. X8 76 X0 XC 77 XD XD .OR. XC 78 XD XD .FXOR. XC 79 XE X0 80 XE XE .THEN. X0 81 XF XE 82 XE XF .EOV. XE 83 ASN XF 84 ASN DES 8' ASN 85 STM ASN C C LIST STATEMENT 86 XH DES 87 XH XH ... DES 88 XJ XH 89 XJ ASN 90 LST XJ 91 LST LST . XJ 92 STM SLIST LST C C DECLARATION STATEMENT 93 XX IOR 94 XK XK SATRB IOR 95 L50 XX 96 LSD LSD . XX 97 STM SDECL LSD C C PROGRAM STRUCTURE 98 PLS ( LSD ) 99 PLS ( LST 1 100 PLS ( STM 1 101 STM DES .. STM 102 STM SSIMP STM 103 STL STM 104 STL STL 09 STM 105 STM SCOMP STL SEND 106 PGM SLC STL SRC E END OF MAD/1 SYNTAX 106 PRODUCTIONS READ NOT 8 32 NTT 8 78 NPD = 106 7 6 GT LToEO GTOEO LTOGT LT.GT.E° 72 5 3 O O O I O L G E0 LT ENTRY MEANS PRECEDENCE ARRAY . O 0 5 0 6.666666036606I 0000000000000060050006.00000 I 9. punvfipUfl-Gpoflvoaunvanul 0 .0 0 0. 0. . .0 0 .. .G 0 .6 . 00.0 .. . 00 0 O a 5666660655665I 00000000o000.0060050006000000 Sure 7 00000000 00.000000000000000 .000 0 0000000LLLLL .0. 6 fiuapunvenunu6.05vI 0 00 0 .0 0 .0 0. 0 0. 0 .0 .6 . 05 . 00.0 .. 0 00 0 0Suv . 5 fivsnunv6.0nuG.UI 0 00 0. .0 .0 00 .0 . 00 0. .G . .6 0. 0G 0 0. 00 0 0N0 .9 GOGGGGGGGI 00.00.000.0000000.060.6.0.6000000 0‘ 0 3 nvaunvenunvG I 00 0 0. . .. 0 00 00 0 00 0 0. .G 0 0a.. 0 .0 0 00 0 00 0”. 2 0 000 0000 0000000000 0000000000000. 000000LLLl-L 0‘50 1 GGOGGI 00.000.00.000000000000060060006 00000. 0LSO 8. 36666. 0000000000000000.0000006 0060000000000 .08050 9 9'00000.0I00000.00000000O000'000...0OD0-LLLI\ . 8 fivG.DI 0 00 00 0. 0 00 0. 00 .0 00 0 00 0 .. 0G 0 06 00 0° 0 0. .0 0 ‘KE' 7 GO- 0 0000 000000 000000000000 0006 0080006 000000 81.86 6 G‘I 0 0000000000 000000000000 0006 0060005 000000 08001.. .3 G I 00 0 0.00 0 00 0 .. 0 .. 0 0. 0. 0 00 0 0. .G.. .5 . .0 0 .0 0 .0 0 'LP 9 0000 0000 000000 0000 000000000000 0 000LL 0LLLLLL ’10” 3 00000000000000.000 .00000.00000000.LL 0LLLl—LL '0' z 0 0 00 0 0 0 0 00 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 . 00 0 0 0 0 0 0 0 0 0 SIIL I . .0 00 00 00 00 00 .0 .0 00 0. 00 00 00 00 00 00 00 00 .0 00 00 PLS 3° 0 000 0000000000000000000000 000 0 00.. 000000000 LSD 9 0 0 00 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 an“ a 0 00 00 00 00 00 00 0. 00 00 .0 00 00 00 00 00 00 00 0. 00 00 00 LeaT 7 0 00 00 00 00 00 00 .0 .0 0. .0 0. 00 00 0. 00 00 00 00 00 00 00 xsu 6 0 00 00 0. 0. 00 00 .0 00 00 0. 00 00 00 .0 00 .0 00 00 .0 00 .0 an“ 5 0 00 00 .0 0. 00 00 00 .0 0. 00 00 00 0. 00 0. 00 0. 00 00 .0 00 ST“ 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8:0" 3 0 .0 00 00 00 00 00 0. 0. 00 00 00 00 00 00 00 00 00 00 00 00 .0 In' a 0 00 00 00 00 00 00 0. 00 00 .0 00 00 00 0. 00 00 00 00 00 00 00 05E 1 . 0. 00 00 .0 00 0. .0 00 00 .0 00 00 00 00 00 00 0. 00 00 .0 00 050 «to 0 00 00 00 00 00 0. 0. .. .0 00 00 .0 .0 00 00 0. 00 00 0. 00 .0 an 9 . 0. 0. 00 00 00 0. .. 00 00 00 00 00 .0 .0 00 00 00 .0 00 00 00 XnUU 8 . 00 .0 0. .0 00 .0 .0 00 0. 00 00 .0 00 .0 00 00 00 00 0I II II use 1. 0 . 0. 0 00 . 00 0 00 0 0. 0 .. . 00 0 0. . 0. .0 0 00 0 .0 0 0. . 00 . “ 6 O O O O O I O C O O 8 O O O O O C C C O O l O C C O I C O C O I O O O O O O I O O 8 O 0A9 5 0 0. 0. .0 00 0. .0 0. 00 0. 00 00 00 00 00.00 00 00 00 .0 00 00 0A. 6 . .. 00 00 .0 00 00 .0 .0 00 0. 00 00 0. .0 00 00 00 00 00 00 .0 X1IU 3 . 0. 0. .. 00 00 00 0. .0 .0 .0 0. 0. 0. 0. 00 00 00 00 .II II I X1. 2 0 0. 00 00 00 00 00 0. .0 00 0. 00 00 00 0. 00 .0 00 00 0. 00 00 .I6 1 0 00 .0 00 00 .0 0. 0. .0 0. 0. .0 00 00 .0 00 00 .0 00 0. 00 0. XE. .10 0 0. 0. .0 00 0. .0 .0 0. .0 00 .0 .0 00 0. 00 0. 00 00 .0 00 00 X.9 9 . 0. 0. 0. .0 00 0. .0 .0 .0 0. 0. 0. .0 .0 00 00 00 00 00 00 00 I‘JU s 0 00 0. 00 .. 00 0. .0 .. 00 00 .0 00 0. 00 00 00 00 00 0I II.LL XqJ 7 .0 0. 00 00 00 .0 .. 0. 00 0. .0 00 00 0. .0 00 .0 00 00 00 0I I I? 6 0 0 00 00 00 00000 0 00 00 0000 0000 0000 0 0000000000LL X.l 5 0 .. .0 .0 .0 00 .. .0 0. .0 0. 00 .0 00 0. 00 0. .0 00 .I IIILL OILS 9 .. .0 . .0 0. 0. .0 .. 00 0. 00 00 00 0. 00 00 00 00 00 0L.LL.LL XI” 3 . 00 .0 .0 .0 00 0. 0. 00 .0 0. .0 00 00 00 00 00 00 001JL.LL.LL in“ 2 0 0 00 000 0 000 0 0 0 00 00 0 00 0 0 0 O0 000 0 0 00003 OLLLLLL. XL 001 000. 00.000000000000000000000000000IL0LLLLLIV X2 7 X3 8 X311 9 X4 X9 11 X6 .ATT. .889. .LS. IRS. 92 0". _¢3 .80 12 X7 XA xs 1 XL 2 109 3 XM 6 DES 5 X1 6 13 X7” 18 X8 15 X9 18 XBU 19 XC 20 X0 16 17 21 XE 22 X7 23 ASN 26 STM 25 XH 33 SIDN 3b SLP 35 36 STAG 37 SKEY 27 LST 38 0 39 10 26 XJ 28 XX 29 LSD 30 PLS 31 STL 32 PGM 60 01 73 . _ . LLLLLLLLLLLLLLLLLLLLL ,IIOOOIOOIIOOOOOOIOOOO ......OOOIOOOOOOOOOOOO ......OOOOCOOOOOOO'OO LLLLLLLLLLLLLLLLLLLLL “00000000000000000000 . _.....OOOOOOCOOOOOOOO LLLLLLLLLLLLLLLLLLLLL LL 00 00 00 00 LL 0. 0LL 0. 00 00 0L 00 00 0L 00 60000 60000 00000 0LLLL 60000 60000 00000 60000 0LLLL 60000 0.I? 0LLLL O O O O O C C 0.. I C O P. .0. .0. CO. LLLLLLLLLLLLLLLLLLLLLLLLLLL LLLLLLLLLLLLLLLLLLLLLLLLLLL 000 000 000 000 000 .00 000 000 000 000 000 000 .00 .00 III 000 000 _ 000 .00 II 00 .00 0 0 0 0 O 0 0 0 0 0 0 0 0 0 L L 03L 03LL 033LLL LLLLLL I LLLLL 0 . 0 0 0 0LLLLLLL 0333333LLLLLLL 033LLLLLLLLLLLLL ILLLLLLLLLLLLLLLLLL 0II ILLLLLLLLLLLLLLL 0LLLLLLLLLLLLLLLLLL 0 03LLLLLLLLLLLLLLLLLL 0v. .EV. 66 .5 06 .. SNEG 67 68. 69/ 50 0 51 - 52 I 53 .ur; 56 .GT. 55 .GE. 56 .LT. .LF. 57 .NOT. .ANn. .OR. 58 59 60 .FXOR. .THFN. .EOV. 61 62 63 JJLLLLLLLLLLLLLLLLLLL LLLLLLLLLLLLLLLLLLLLL LLLLLLLLLLLLLLLLLLLLL LLLLLLLLLLLLLLLLLLLLL 0LL LLLLLLLLLLLLLLLLLLLLL 0LL LLLLLLLLLLLLLLLLLLLLLaLL LLLLLLLLLLLLLLLLLLLLLLLL 0 0L LLLLLLLLLLLLLLLLLLLLLLLLaLL LLLLLLLLLLLLLLLLLLLLLLLLLLL 66 00 6s .0. 00 0 . 00 C. 1.0 03 IL LL .0 LL LL LL LL LL LL LL LL LL LL LL LL LL LL LL LL LL SLI