[Hal l)“- LIBRARY Michigan State University This is to certify that the thesis entitled AN AUTOMATIC VERIFICATION CONDITION GENERATOR FOR THE LANGUAGE C presented by Jennifer Marie Arbanas has been accepted towards fulfillment of the requirements for M.S. degreein Computer Science $43? M Anthony S. Wojcik mew 0-7639 MS U is an Atfmnalin Action/Equal Opportunity Institution MSU LlBRARlES .—:—. RETURNING MATERIALS: Place in book drop to remove this checkout from your record. FINES will be charged if book is returned after the date stamped beiow. [it .63 W AN AUTOMATIC VERIFICATION CONDITION GENERATOR FOR THE LANGUAGE C by Jennifer Marie Arbanas A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Computer Science August 28th 1987 ABSTRACT AN AUTOMATIC VERIFICATION CONDITION GENERATOR FOR THE LANGUAGE C By Jennifer Marie Arbanas For many software applications, it is necessary to verify that the software performs its designated function. In an attempt to ease the work involved in verifying programs written in C, a method for automat- ically generating the verification conditions for paths corresponding to program segments was developed. A transformational language called TAMPR [1][2][3][4][14] is used to take code written in C along with input, output and loop assertions and produce verification conditions which are necessary for each path of the code to execute properly. A theorem prover, II? [10], can then be used to produce a proof of correct- ness for the code segment. A detailed discussion of how TAMPR is used to create the verification condi- tions via an example program traced from the C program throughout the transformation set is presented. The H? proof for the program paths is also given. ACKNOWLEDGEMENTS I acknowledge Argonne National Laboratory Mathematics and Computer Science Division, where most of the research for this thesis was preformed. Within the Division, special thanks to Brian T. Smith whose constant guidance and help were sought after and obtained, Barbara Bonar whose work contributed to this project, and Jim Boyle and Ken Dritz who were my guides into the world of TAMPR. In addition, the entire Argonne staff has been a helpful and enjoyable group which has helped me in my guidance into the field of computer science. I am grateful to Anthony Wojcik, whose guidance and help at Michigan State University has been indispensable, as well as the entire MSU Computer Science staff where my education in the field of Com- puter Science has been greatly uplifted. Much of the funding for this project was from the Natural Science Foundation to whom I also owe thanks. iii TABLE OF CONTENTS LIST OF FIGURES SECTION I INTRODUCTION Motivation Software Verification Technique SECTION 11 Description of Tools TAMPR ITP OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO SECTION III Interpretation of Scanner, Parser and Tree Builder Output ........................................... Scanner Tokens with associated values (kept tokens ) Tokens without assocaiated values (deleted tokens ) Parser Tree Builder OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO SECTION IV C Example firsttrans initstrnts stndrdsp markident markident2 vargentr asserttr arraytr insvaltr reltrans setup thenelse problems forcingsplit OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO notsdwn quantsup remsqnew ....... quantout Iasttrans UIUIUJNH Results of the Transformations Through a Sedfile ITP Run on Edie.c FustrunofITP Second pass of ITP SECTION V Limitations with this method SECTION VI Summary and Concluding Remarks APPENDIX A Alterations in the C Grammar APPENDIX B Changes to Tools APPENDIX C Example Scanner Tokenizer Using FSCAN REFERENCES ....... 73 74 74 76 77 78 88 9O LIST OF FIGURES Figure 1 Verification Process Figure 2 TAMPR Components Figure 3 Empty C Program Figure 4 C Program After Scanning Figure 5 C Program After Parsing Figure 6 Tree Structure of C Program Figure 7 Tree Builder Output for Sample Program Figure 8 Tree Representation of Tree Builder Output Figure 9 Transformation Categories Figure 10 State Representation of Edie.c 13 14 16 18 19 20 23 SECTION I INTRODUCTION The process of using TAMPR [1] [2] [3] [4] [l4] (Transformation-Assisted Multiple Program Reali- zation system ) and ITP [10] ( Interactive Theorem Prover ) to verify code written in C is depicted in Fig- ure 1. The transformations which produce the verification conditions ( necessary upon execution of the C code ) are written using the TAMPR programming language. One applies the transformation set to extended C code as easily as compilation of code therefore, the transformation of C to clauses is a corn- pletely automatic step in the verification of C. The transformation set is described in detail in this thesis. A short description of ITP will be given, and an example of its use in verifing an example program, called edie.c, is presented. Use of ITP is an automatic process, but it requires some specification on the problem domain and some knowledge of theorem proving techniques. CProgram with bus TAMPR Transformations Clauses ITP 1 Verification of C Code Figure 1 - Verification Process The verification process starts with the user specifying assertions about his/her code. The assertions are of the following types: input which specify conditions for input variables, output which specify conditions at termination of the program segment, and loop assertions which specify loop condi- tions. The extended C code is sent through a set of transformations which are written in TAMPR. The transformation set produces the verification conditions for the code and places the code in a clause form suitable for ITP. For web path of the program segment, the transformation set produces the corresponding conditions of execution. An if statement produces two paths of execution, one of which corresponds to the if condition while the other corresponds to the else condition. ITP is an automated theorem prover which can be used to produce a proof that each program seg- ment path does not violate the program segment assertions. What follows is the motivation behind the C automatic verification process, the software verification technique used, and a description of TAMPR and ITP. From there, the thesis will contain a section describing the particulars of the program scanner, parser and tree building output, and a trace of the exam- ple C program, edie.c, through the transformation set. Also contained will be a description of how ITP was used for the example C program and a conclusion stating the limitations with this method of verifing C together with closing remarks. Motivation The EBRII ( Expermental Breeder Reactor II ) Division at Argonne National Laboratory West has taken on a project called Full Authority, Fault-Tolerant, Reactor Control System (FAFI'RCS) whose objec- tive is the development of a computer-based control system for a nuclear reactor [5]. FAFTRCS is intended to replace the present system which is not computerized. The reactor once contained ten direct sensing flow meters to monitor the flow of coolant around the reactor core. Seven of those meters are no longer functioning; and because the meters are located beneath approximately 50,000 gallons of molten sodium, they are not accessflrle for repair. Hence there is a need to use a different collection of available sensors to infer the coolant flow by some analytic means. FAFI‘RCS requires that the hardware and software meet designated reliability levels. Part of the software for this system is written in C, so there arises a need for a method of verification for the code. With this in mind, the Automatic C Verification Condition Generator was developed. The software verification is done in conjunction with hardware verification on a four processor com- puter system developed by the Charles Stark Draper laboratory (CSDL). CSDL has designed a fault- tolerant processor (FTP) which forms the basis of a portion of the reactor safety system. The verification techniques used in this project are also applicable to situations such as satellite con- trol, aircraft control, nuclear plant control and any other application for which high reliability in software is required. Software Verification Technique Because reasoning is involved in all aspects of software generation as well as testing and verification, it is of interest to use an automated reasoning system, such as ITP ( Interactive Theorem Prover ), to aid in the reasoning process. The project described in this thesis, aims to show how a reasoning system can be used to aid a programmer in showing that a given computer program does what it is supposed to do. Two approaches are possible for reasoning about programs. In the first, a reasoning system is used to carry out symbolic execution of portions of, or entire, programs. The symbolic execution can be of assistance in verifying that loop bounds are proper and array subscripts are correct. The second approach uses a software system to verify that a program has certain properties. If enough properties are verified for a pro- gram, it is assumed that a proof of correctness has been obtained. One must always keep in mind, when speaking about proof of correctness and verification, that the proof is only as strong as the assertions a user makes about his/her code. A central problem in using a reasoning system is one of representation, that is, how to place a pro- gram into a form manageable by the reasoning system. Obtaining a suitable representation for a program is diffith for several reasons: (1) Finding a way to capture the meaning of the program in a general representation which is independent of the program is a diffith task. (2) Convincing oneself that the representation really captures the meaning of the program and does not lead to ambiguous interpretations is not trivial. A means must be found to generate a representation of the program which overcomes these difficulties. Then a proof of correctness can provide a guarantee that the program meets the specifications for all the input data that satisfy its input assumptions. The reasoning program described in this project, ITP, requires statements in clause form. This clause form will be described in Section II. Representation of even small segments of code requires many clauses; therefore, it is of interest to produce an automated method of clause generation from a given pro- gram segment. TAMPR is the tool used for the automatic generation of clauses necessary in the verification of code written in C. The TAMPR routines making up the generation of verification conditions are referred to as the C Automatic Verification Condition Generator. Chapter 12 of the book Automated Reasoning Introduction and Applications [15] specifies a method for creating the clauses for a small program segment which can then be used with a theorem prover to verify the program. This is a long and tedious process for which the C Automatic Verification Condition Generator is designed to automate. The automation of the clause generating process reduces the amount of work needed to be done manually that is necessary to ver- ifyaprosram segment» SECTION II Description of Tools TAMPR The Transformation-Assisted Multiple Program Realization system (TAMPR) [l][2][3][4][14] was developed at Argonne National Laboratory. TAMPR, written in LISP, is a system for specifying source- to-source transformations. TAMPR was originally designed as a means to perform source to source transformations on Fortran 66 programs and has been used to construct multiple realizations of mathemati- cal subroutines written in Fortran 66 and 77. In constructing multiple realizations, one would like to minimize the number of changes necessary for a program to run on a particular machine. TAMPR was designed to derive variations of a program automatically. Another application of TAMPR is its use in translating LISP code to Fortran. TAMPR uses a tree matching process to modify programs. Therefore, the tree form of a program is required in order to apply transformations. TAMPR can be used with many different programming languages. There are implementations of TAMPR for transformations in Fortran and C. To use TAMPR with a particular language ( in this implementation C), it is necessary to create the scanner, parser and tree builder for the language ( see Figure 2). TAMPR uses this tree structure to perform matching and replace- ment operations in transforrnating the code. In using TAMPR’s tree matching process, it is not possible for a transformation to produce a tree structure representing a segment of code which is not grammatically correct within the language being transformed. The other major component of TAMPR is the transforma- tion tree builder which contains the transformation language scanner, parser and tree builder. The transfor- mation parser is used in parsing a transformation written in TAMPR, and the tree builder produces a tree structure representing the transformation and its replacement rules. TAMPR CScanner CParser o TScannner TParser CTokens C Grammar ® Ttokens T Grammar Ker- CTB - C Tree Builder TTB - Transformation Tree Builder C'I‘okeus - C Tokens TI‘okens - Transformation Tokens CGrammar-CGrammar TGrammar-Transformationgrammar CScanner - C Scanner TScanner - Transformation Scanner CParser - C Parser TParser - Transformation Parser LTB - Language Tree Builder T Tree Builder - Transformation Tree Builder Figure 2 - TAMPR Components In using TAMPR, one would first use the language scanner, parser and tree builder to produce a tree representation of a program. One would then write a number of transformations which specify match and replacement nrles for the language grammar. Each of the transformations would be scanned, parsed and sent through the transformation tree builder creating a tree version of each transformation. The next step is to apply one transformation at a time to the program tree creating a new program tree with any alterations applied from the transformation. Hence, the original program tree is slowly replaced with each transforma- tion. This process can be repeated with many transformations. Note that in Figure 2, one could create a version of TAMPR to run in a number of different program- ming languages by supplying TAMPR with the language scanner, parser and tree builder. In addition one needs to alter the transformation scanner to include the language tokens. In the application for C, all of these items needed to be created with the additional problem of handling situations where the TAMPR syn- tax and C syntax overlap. To handle this overlap problem with C, all TAMPR tokens are surrounded with back quotes (see appendix A). Now that the TAMPR components have been described, a discussion of how transformations in TAMPR are created is presented. Transformations in TAMPR have two parts : a structural description (matched pattern) and a structural change (replacement pattern). The transformer applies each transforma- tion by searching the program tree for sub-trees that match the matched pattern portion of the transforma- tion. When a match is found, the subtree is replaced by the replacement pattern of the transformation. Both the matched pattern and the replacement pattern must be within the constraints of the language grammar. An example of a C transformation in TAMPR follows: Example 1 - Simple transformation ‘transformit‘ ¢{S C {6 ‘sc s=> ‘w‘ ‘1‘ ‘16 ‘transform#‘ : A key word specifying that a transformation follows. The # specifies how the transformation is to be applied to the tree structure. Remember that in the C implementation, items in TAMPR are surrounded by back quotes. ‘{‘ ‘l‘ : The outside set ofbrackets delimita transformation or set of transformations. ‘[‘ ‘l‘ : The inside brackets delimit each dominating symbol. ‘sc‘ ‘sd‘ : The inside brackets delimit each match and replacement to be made under a given dominating symbol. : Is the dominating symbol which can be thought of as the head node of the tree where and must both be instances of node . :isthematchedpatternofthetree to be located and replaced by . : This is the replacement pattern for . A TAMPR transformation can be generalized so that many different replacements can be contained in one transformation under many different dominating nodes. The generalized TAMPR transformation is shown in Example 2. Example 2 - Generalized transformation ‘transfonn#‘ 6 I G d [ C th match pattern for l C=>6 Replacement pattern for 1 Cxi Cwfi match pattern for n I=>C Replacement pattern for n txd Old I I C In creating TAMPR transformations, one could actually put all the transformations that are to apply into one transformation. But in creating transformations, it is suggested that the size of each transformation be restricted to facilitate ease of debugging and to demonstrate ( prove ) that the transformation works. Transformations should also be split according to what function is intended by the match and replacements. An additional consideration is the order that one would like the transformations to apply. In separating the transformations, it is easier to think about the transformations as a series of changes. 10 ITP ITP is a reasoning program which was designed as an environment for the study of automated theorem proving. ITP is a clause-based reasoning system which can be used interactively or in batch. It is written in a portable dialect of PASCAL and can run on VAX/UNIX, VAX/VMS, IBM/CMS, Perq, Apollo and SUN systems. The information making up this section is an incomplete summary of ITP[10], so the reader is referred to [10], [l l] and [12] for a more complete explanation. Those items, which are necessary for a understanding of how ITP was used in the C verification generator, will be explained in detail here. The basic item managed by ITP is a clause. A clause corresponds to a statement (an assertion or a denial of a purported theorem ) and is a disjunction of a set of literals. ITP derives new clauses from exist- ing ones by using one or more inference rules. In ITP, a constant is represented as a string of characters beginning with a letter between a and r or A to Z and containing no symbol from the set { (, ). &.l 1. Variables start with a letter between s and z or with a 96 or _. Functions and predicate symbols follow the rules of constants. The negative symbol for a predicate is -. For example, the following are valid literals: P 000 -R(f(X.a).b) Finally note that a l represents an or in ITP. Clauses can also be written in an if then format with a conjunction of the if portion and a disjunction of the then portion. For example: if Q(x) & P(y) then -R(f(x,a),b) ; Variable names are not retained internally in ITP, instead they are converted to variable numbers on input. As an example, the above statement if Q(x) & P(y) then -R(f(x,a).b) ,' becomes if Q(x1) & P(x2) then -R(f(x1 .a),b),. In ITP a clause can be placed in one of the following four lists: ll 1) axiom 2) set of support 3) have-been-given 4) demodulator The mechanism for automatically applying an equality of the form Equal(A.B).' is called demodulation. Equal(A .B); is called a demodulator. ITP will (1) Pick a clause from the set of support and call it the "given clause"; (2) Infer a set of clauses which have the "given clause" as one parent and other parent clauses selected from the "axiom" list, "have-been-given" list and the "demodulator" list; (3) Process each generated clause by simplification and subsumption checks; (4) Move the "given clause" from the set of support to the "have-been-given” list. This process is continued until the “set of support" list has been exhausted or a contradiction has been found. A contradiction indicates that a clause is generated which is the negation of an existing clause in the clause list ( both the negated clause and the existing clause cannot hold true at the same time ). lTP’s behavior is controlled by approximately fifty parameters which are called options. These parameters include inference rules and weighting. Defaults exist for all the options. Inference rules are mechanisms to deduce new facts from given ones. Effective use of ITP depends on the knowledge of which inference rules to apply to a given set of clauses. The ITP inference nrles include: Hyperresolution UR-resolution Binary Resolution Unit Resolution Factoring Paramodulation In the C verification application, binary resolution is used and will therefore be the only inference rule dis- cussed in this section. In using binary resolution, two clauses, each containing one or more literals, are compared. The idea is to clash two of the literals, one from each clause. To clash is to find a literal in each clause which is opposite in sign and to establish values for the variables so that the two literals (except for sign ) become the same. The theorem prover would then propagate variable substitution in the two clauses. The 12 propagation will produce the same variable names between the two clauses. Next the unified literals are removed from both of the clauses to produce a third clause consisting of the merge of the remaining literals. The process of removing the two literals is referred to as a clash. Binary resolution is the smallest possible deductive step where exactly two clauses can participate in a clash. Hyperresolution and UR- resolution take several reasoning steps at once by using many clauses in a clash. Binary resolution is com- plete in the sense that if the clause set is inconsistent, a contradiction is to be obtained eventually. An example of binary resolution is the following: -P(x1) I Q(x2) I -R(x3) -Q(xl) I S(x2) -P(xl) I Q(x2) I -R(x3) Variable substitution takes place first. -Q(x2) | S(x4) -P(xl) l S(x4) I -R(x3) Unified literals are removed and the remaining literals are merged into one clause. Weighting is the assignment of a number to a clause, literal or term which measures its complexity. This number is used in determining: (a) Whether to keep a clause which is newly derived; (b) How to pick the next given clause from the set of support; (c) Whether a newly derived equality should become a demodulator. In the C verifier, weighting is done with patterns. Weighting patterns override the default weighting algo- rithm which is used when the weighing option is on. Using patterns, one can apply weights to specific terms, literals and clauses which match a particular pattern. For example: NOT: +6 negative literals have 6 added to their weight a: +10 the term a has a weight of 10 Weighting can be a useful mechanism for guiding ITP in the proper direction of proof. It can be used to direct ITP to select the most important clauses first. 13 SECTION III Interpretation of Scanner, Parser and Tree Builder Output In using TAMPR it is necessary to create the tree form of a program. The mechanism for creating the tree requires output from a LR-automatic parser generator[l3]. The scanning of the C tokens was done using Fscan [6]. Because the scanner and parser were not created together, it is necessary to have some knowledge about the output that each creates. Routines can be created to alter the scanner output to a form that the parser can accept. A description of the generated output is necessary if any alterations are to be made within the scanner, parser and communication routines. The following is a detailed discussion of how to interpret the output of the program scanner, parser and tree builder. Henderson’s paper[8] contains a detailed discussion of a similar verification project for the language Fortran. This section along with Henderson’s paper should be sufficient documentation for someone interested in using the scanner, parser and tree building tools to extended TAMPR with other programming languages. Scanner The C Scanner was built using Fscan[6], a language for specifying the lexical analysis of programs. It is assumed that the reader is familiar with programming language grammars, lexical analysis and the Fscan scanner specifications. Appendix C contains a portion of the C Fscan specifications corresponding to the example C program used in this report. For a more detailed discussion, see Fscan[6]. Figure 3 shows a empty program written in the language C. We will follow this example throughout the paper as it is sufficient to illustrate the basic concepts. mainO Figure 3 - Empty C Program To run the Fscan scanner created for C, the following command is used : scanner file] <file2.c > filc2.out 14 Where scanner: Executable file made from a number of scanner skeleton routines, (most of which are written in Fortran, some are written in C). Fscan scanner specifications (called specs) and the Fscan compiler. file2.c: A program written in C which serves as input to the scanner. file2.out: The scanner output consisting of a tokenized version of the input program. file]: An output file containing string literals and character literals used in the input file followed by a page feed. FileI later gets concatenated with the program tree structure for TAMPR’s use. The tokens generated by the scanner for the C program of Figure 3 are shown in Figure 4. The tokens for C are numbers, which refer to the terrnanal symbols in the program. 9999 4 main 80 81 78 76 Figure 4 - C Program After Scanning The tokens in Figure 4 fall into two categories. One category contains tokens which have associated values. These tokens are called kept tokens because along with the token a value is associated. Tokens in this category are identifiers, suings and characters. The other category contains tokens without associated values. Tokens of this type are key words such as if, while, and for. As one can we, there is no associated value with this type of token. Tokens with associated values ( kept tokens) When a token has an associated value ( identifiers, strings and characters are a few examples of this type), the scanner generates the number 9999, followed by the number corresponding to the position of the 15 token in the list of tokens at the top of the scanner specifications( called specs ). In Figure 4, a 2 follows 9999, referring to the first token (ie. 2-1=1) in the list of tokens in the language scanner specs. In the C specs the first token is an identifier. The next number designates the length of the identifier in characters. The actual identifier follows this length in the same line. So far, the scanner has recognized main to be an identifier, marks that it is 4 characters in length, and writes the identifier main. Tokens without associated values ( deleted tokens ) If a token does not have an associated value, the scanner simply generates its token number. The next number produced by the scanner is 80. 80 corresponds to the token at position 79 in the scanner specs, which is a left parenthesis "(". The next number is a 81 - which is a ")" and appears in location 80 in the scanner specs token position. 78 and 76 correspond to "I" and "l", respectively, and appear in loca- tions 77 and 75 in the scanner specs token positions. The last line of the output file is token number 1. This is the token for end-of—file (EOFTOK). The number 1 will always be the last line of the scanner output. Parser Shannon and Weathetall’s automatic parser generator[13] was used in creating the language and transformation parsers. The parser consists of a table generator and parser skeleton routines, which are dis- cussed in depth in Henderson’s paper[8]. To run the parser, the following command is used: parser < file2.sout >! file3.pout. parser: executable file made from a number of parser skeleton routines(most are written in Fortran, some in C), parser specification (the C grammar), and the parser compiler[13]. file2.sout: scanner output used as input to the program parser. file3.pout: parser output, to be used as input to the tree builder. The parser takes the output from the scanner and produces the output shown in Figure 5 for the example program ( Figure 3 ). Thus, Figure 5 represents the parse tree for this program. 16 0 #181 0 main 1 #1 l #159 l #177 l #184 l #186 0 #79 1 #379 0 #155 0 #80 1 #380 4 #186 l #188 1 #133 0 #190 0 #77 l #377 0 #136 0 #75 l #375 3 #128 4 #223 1 #221 l #222 l #199 Figure 5 - C Program After Parsing 17 These numbers describe the necessary information to build a tree representing the example C pro- gram. The pairs of numbers designate the following: First number : the number of children associated with the following grammar terminal or nonterminal. Second number : the number of the terminal or nonterrninal symbol. The terminal and nonterrninal symbols are assigned numbers at the bottom of the parser specifications ( specs ) by the use of &v followed by the terminal or nonterrninal symbol followed by a number. Example: &v 1 The number corresponds to the location of the terminal or nonterminal symbol in the scanner tables. The number serves as communication between the scanner and the parser. The parser output numbers represent an inorder traversal: Transverse the left subtree Visit node Transverse the right subtree 18 The tree structure for the example program is shown in Figure 6. 1 #199 I 1 #222 I 1 #221 I 4 #223 I I l I I 0#181 1#133 0#190 3#128 I I - l #188 | I I I I I I <(> <}> 4 #186 1 #377 0 #136 1 #375 l I I - 0 #77 0 #7 5 I I I I <(> <)> 1 #186 1#379 0 #155 l #380 I I I ( ) 1 #184 0 #79 0 #80 I 1 #177 I l #159 I 1 #1 I main 0 #main Figure 6 - Tree Structure of C Program 19 As an example of how numbers are generated from the parser let us look at a portion of the C gram- &p which specifies that a program consists of a top level declaration list. The number 199 in Figure 4 corresponds to the nonterrninal . At the bottom of the parser specifications one can find the statement &v 199, which indicates that the token 199 corresponds to the nontenninal . has one child because there is only one item on the right hand side of its rule in the grammar, therefore the 1 in Figure 6 corresponds to the number of children. Tree Builder The tree builder is written in Lisp. It takes the concatenation of parser output and scanner output file] and converts it into a tree suucture which is used as input to TAMPR. The tree builder output for the example program is shown in Figure 7. ((%O ‘ "'”)) (#199 #222 #221 #223 (#181) (#133 #188 #186 (#186 #184 #177 #159 #1 main) (#379 #79) (#155) (#380 #80)) (#190) (#128 (#377 #77) (#136) (#375 #75») Figure 7 - Tree Builder Output for Sample Program The correspondence between the tree builder numbers and the scanner output is easily seen from the tree structure shown in Figure 6. The scanner output was in an inorder tree traversal order, and the tree builder output is in preorder tree traversal form. preorder: visit node, transverse left and visit node when you can no longer transverse left move right and begin again or move back until you can move right and resume. Parentheses are used to specify children in the tree builder output where numbers were used in the parser output. To assist in seeing the correlation, Figure 8 shows the tree represented by the tree builder. 20 (199 222 221 223 (181) (133 (190) (128 188 (377 (136) (375 186 77) 75)) (186 (379 (155) (380 184 79) 80)) 177 159 1 main) Figure 8 - Tree Representation of Tree Builder Output The process of scanning, parsing and tree building a program is preformed by using a script called vrccogcpgm. The script was developed to allow a person to scan, parse and build a tree of a program with one statement rather than doing the three tasks separately and directing the inputs and outputs properly. vrecogcpgm assigns unique names for the files created from the scanning, parsing and tree building pro- cess to avoid conflict with files in the immediate directory. The created files are destroyed when recogniz- ing is complete, and all that will remain is the tree form of the program called file.aform. The following statement is used in recognizing a C program: vrecogc.pgmfile.c fileaform Scanning, parsing and tree building of transformations is done similarly using the statement: vrecogc.igt transformation The tree form of the transformation is automatically placed in a directory called vaform and has the same name as the transformation. Directory assignment is done so that the readable file of a transformation can have the same name as its tree form to avoid name confusion. To send a program through a transformation, the user would set up a file, typically called runtampr, which contains the transformation names locations. The user would then enter: 21 rwuamprfileaformfilenewaform > file.out where filengorm The tree form of the initial program or an intermediary tree form formed by sending a program through one or more previous transformations. file.new.cy"orm The new tree form created from the transformations listed in the nmtampr file. fileout A file set up so that the user can see which transformations effected the program - contains a listing of the program in a readable form given before and after each transformation is applied. It would be appropriate here to mention a form of the C grammar called the Superstart grammar of the language. This grammar is required so that terminals can be used as the dominating symbols of transformations. It is sometimes necessary to replace a terminal in a transformation; for example, to change ‘not‘ to - . In order to make the replacement we need to add the nonterminal <‘not‘> and <-> to the grammar where <‘not‘> :- ‘not‘ and <-> := - so that TAMPR can use these as dominating symbols for a transformation. The Superstart grammar allows terminals to be used as dominating symbols by creat- ing a version of the grammar where all the terminals become nonterminals. Vrecogc.igt, the transformation recognizer, uses the Superstart version of the grammar. 22 SECTION IV C Example There are a total of 20 transformations which are used to create the verification conditions for code written in C. One can look at the transformation process as mapping the C program, one transformation at a time, to a form suitable for ITP. These transformations are categorized into five basic types of transforma- tions (see Figure 9). Type A are responsible for getting the C code in a standard form and generating marks for statements and identifiers. Type B is responsible for creating the verification conditions and pro- ducing the clause form representation of the C code. Type C handles array index bound checking. Type D transformations define each program path. Type E places the statements into conjunctive normal form, reduces logical expressions and generates the required form for ITP. 23 Type A: Initialization and simplification for transformation set firsttrans - replaces C’s use of x op= y with x = x op y initstmts - replace do while statements with while statements, for statements with while statements and expression statements to if then statements. stndrdsp - marks all statements mar/cident - marks all identifiers with there corresponding type. marhidentZ - marks all undeclared identifiers with there corresponding type. Type B: Representation into states vargentr - generates a list of variables to which assignments are made. asserttr - produces a state representation of the program Type C: Array index handling arraytr - array index and format handling insvaltr - array index and format handling Type D: Set up program paths setup - sets up program paths thenelse - secondary path splitting for if then else sections of code problems - sets up program paths forcingsplit - establishes the necessary conditions to execute each program path Type E: Simplification reltrans - places operators into a single less then form this transformation applies after insvaltr and is the only transformation listed out of applied order. notsdwn - brings all logical nots in as far as possible quantsup - takes the quantifiers out dmgntaut - simplifies logical expressions remsqnew - removes unnecessary universal and existential quantifiers quantout - removes existential quantifiers lasttrans - converts items to ITP form Figure 9 - Transformation Categories 24 To give a full view of the transformation set, a trace of an example C program from the start of the transformations to the end and ITP’s proof of the program will be shown. Transformations are shown in the order that they are applied to the program, which is called edie.c. Each transformation will be explained in detail and the readable result of the transformation will be shown. Where a transformation does not affect edie the readable fortn of edie.c's present state will not be shown, but a description of what the transformation does will follow. It is expected that through this means the reader will gain a full under- standing of the C Automated Verification Condition Generator. Edie.c was created as a test for the transformation set. This program locates the maximum element of an array 0. This will provide a test for the transformation set in handling of arrays, while statements, assignments and if statements. Ediec covers most aspects of the transformation set with very few excep— tions, these being some of the initial transformations. Note that Edie.c never accesses the first element of array c. The transformation array bound checking presently checks for bounds between one and n. This needs to be corrected to check bounds between zero and n-1. The initial reason for bounds between one and n was to create a correspondence between the C verifier and the Fortran verifier as a method of assur- ing that the C verifier was preforrning correctly. The C verification condition generator requires that the user make input, output and loop assertions for the portion of code to be verified. The result of the transformations and subsequent ITP proofs verifies that the code complies with the stated assertions which are made about the code and that no array sub- scripts are out of bounds. Input assertion: The user specifies the input conditions by defining the function _phi_( ) equivalent to the conditions that must be met upon entering this routine. The function _phi_( ) is used as a key word in the transforma- tions so therefore must always be the notation for the input assertion. _phi_ ( ) is a function of the pro- gram variables which the user has specified within the input assertion. Output assertion: The output assertion is a specification of what variable conditions must be met at the termination of the program segment. _psi_( ) is defined as a function equivalent to the output assertions. _psi__ is a 25 function of the program variables which the user specifies as the output assertion. Loop assertion: This assertion requires no function notation and is set up as one would make an assertion about the loop conditions of a program. In the case of edie.c u must be >= 1, j is between 1 and n+1 , max must always be >= any element of the array c and there exists a k that is the max = c[k]. Edie.c with the added input, output and loop invariant assertions follows. Code will be in a larger font with the insertions and changes to the code darkened. Comments that are made within any of the code will be in italics. edie(n,c,max) int max,n,c[n]; /* E. MCCHAREN TEST PROGRAM NUMBER 1. */ int i, j, k ; /" FWDTHEMAXIMUMELEMENTINTHEARRAY C. INPUT ASSERTION. */ ‘assert‘ ( _phi_ (n) ‘equiv‘ n >= 1 ) ; /"' OUTPUT ASSERTIONH ‘assert‘ ( _psi (max,c,n) ‘equiv‘ n >= 1 ‘and‘ (‘every’in <= i ‘and‘ i <= 11 ‘impl‘ max >= c[i]) ‘and‘ (‘exists‘ k)(l <= k ‘and‘ k <= n ‘and‘ max == c[k]) ) ; max = c[l]; j = 2: /" INVARIANT ASSERTION FOR THE NEXT LOOP."'/ ‘assert‘ (n >= 1 ‘and‘ 1 <=j ‘and‘ j <= n+1 ‘and‘ (‘every‘ i)(l <= i ‘and‘ i < j ‘impl‘ max >= c[i]) ‘and‘ (‘exists‘ k)(l <= k ‘and‘ k < j ‘and‘ max == c[k]) ) ; while (i <= n) if( CD] > max) max = c[j] ; i =j+1; I return ; firsttrans This transformation deals with C’s ability to use x op: y and places them into an assignment in the form x a x op y ,'. F irsttrans has no effect on edie.c since edie.c contains no statements of the form x op: y. initstmts Initstmts contains a set of three transformations which initialize the form of the original C program by writing statements in a consistent and simplified form required for future transformations such as asserttr. The transformation expresses statements in equivalent but simplified form. convert:- (1) ‘s into their equivalent while statements. doi++,-while(i<10); to i++ ,- while(i< 10)i++ ,- (2) and ‘s to their equivalent in while statements. for ( i=1; i<=n,' i++ ) printf("xx") ,- to i = 1 ,- while (i <3)! ) {printf("xx") ; i++ ;} (3) expression statements to if-then statements. (x) ,° to do while () #; Stndrdsp also cleans up alternative terminals that can be used for a nonterminal symbol. In the extensions to C, one is able to use ll interchangeable with ‘or‘ to symbolize logical or. In later transforma- tions it is necessary to match a pattern with logical or. In initially forcing all the logical or’s to be ‘or‘, later transformations will not be repeated using ll operator. Stndrdsp applies to all nonterminals which have alternative or synonymous terminal symbols. The changes will not be shown here. The use of marks throughout the transformation set is not shown because the marks tend to make a transformation hard to read. The only thing to keep in mind here is that the use of marks about the statements is a method to prevent infinite looping in the transformations. Statements can be marked with #,‘#1‘,..., or ‘#S‘. A transformation will look for statements which are marked with one mark, say #, and in the pattern replacement mark it with another mark, perhaps “#1‘, therefore preventing a transformation from reapplying. markident This transformation marks all identifiers with their corresponding type throughout the program. Identifier marking is done as a method of retaining information about an identifier throughout the transfor- mation process. This information is used by a sedfile which determines whether an identifier is a variable, constant, or skolemized variable. The sedfile applies after the last transformation and its results are used by 28 the theorem prover. It is necessary to use the sedfile because the VAX version of TAMPR does not have formatting capabilities. The sedfile will be explained at the end of the transformation set. As an example: i becomes # i int # throughout the program. The results of this transformation will not be shown nor will its ramifications throughout the transformation set for the same reason as specified above in stndrdsp. markidentz This transfmnation marks all previously undeclared variables with the C default type int. Because _phi__ and _psi__ were added variables in the input and output assertions, they become of type in! and a declaration for them is inserted in the section of the code. At this point in the transformation set the program is in a simplified form with the generated marks about identifiers and statements. The next step is to produce a state representation of the program. vargentr This transfa'mation inserts an assert statement ‘assert‘(_pvar_(lisr)); and generates a list of vari- ables to which assignments are made and places them in the list portion of this assert statement. This rou- tine requires all the assignment statements to be of the form x = exp;. The assertion will later be used as a method of representing the state of the program in terms of the values of its variables. The added declara- tions for psi and phi can also be seen here: Edie after vargentr: edie ( n, c, max) int max, n, c[n] ; int _psi ; int _phi; ; inti,j,k; 29 ‘assert‘ ( _pvar_( max, j) ) ; ‘asscr't‘ (_phi_( n) ‘equiv‘ n>= 1 ) ; ‘assert‘ (_psi_( max, c, n) ‘equiv‘ n>= 1 ‘and‘ ( ‘every‘ i) ( 1 <= i‘and‘ i<= n‘impl‘ max>= c[ i] ) ‘and‘ ( ‘exists‘ k) ( 1 <= k‘and‘ k<= n‘and‘ ‘2‘ max int ‘2‘ = c[ k])); max= c[ l ] ; j= ld!2!0 ; ‘assert‘ (n>= l ‘and‘ 1 <=j‘and‘ ‘2‘ j int ‘2‘ <= n+ 1 ‘and‘ ( ‘every‘ i) ( 1 <= i ‘and‘ i< j‘impl‘ max>= c[ i] ) ‘and‘ ( ‘exists‘ k) ( 1 <= k‘and‘ k< j ‘and‘ max= c[ k] )) ; while ( j<= n) { if ( of j] > mart) { max= CU] ; l . J=J+1; } return; asserttr In asserttr the entire program is expressed by states and transitions from one state to another. Each state is a function of the program variables to which assignments are made. In the case of ediec there are two variables max and j and each state will be a function of these two variables. Assertrr also forces the input assertion into the _phi_ function, hence _phi_ becomes _phi_(n>=1). Asserttr converts all of the statements: 1 if then 2 while 3 assignment 4 return into a set of ‘assertion‘ statements . Tcheckrub is a dummy function created for the in bounds array conditions which must be met for some of the state changes to occur. The variable is later changed in arrayrr to _ib_(c,i) so arrays become specified in predicate calculus notation. 30 The program after asserttr will be shown with comments inserted to explain the effect of the transformation. It will be helpful to refer to the state representation of Figure 10 in evaluating what is hap- pening in this transformation. Edie after asserttr”. ‘assert‘ ( _z$$pvar_ ( max .j ) ) ; ‘asscrt‘ ( _z$$phi_ ( n >= 1 ) ) ; The start state is 10 ( max, j) where the 10 was a number generated by TAMPR. ‘assert‘ (_z$$start_( q00010(max J) ) ) 3 Transition: symbol _tran_. The state specification for assignment statements is done using the key word _tran_ within an assert statement. The program statement max = cl 1 I ,' becomes a transition from the start state 10(maxj) to state 9(c[I]J) where max now has a changed value and j remains the same. ‘assert‘ ( z$$tran (q00010(max,j) ‘and‘ _tchecksub_(c['1]) ‘impl‘ q00009(c[ 1 ] , j )) ) ; The statement j = 2; becomes a transition from state 9(maxj) to state 8(max.2). ‘assert‘ (_z$$tran_( q00009(max,j) ‘impl“ q00008(max, 2))); Induction: symbol _ind_ This assertion reads: If in State 8 then the loop invariant assertion. The loop invariant assertion now becomes the _ind_ assertion. ‘assert‘ (_z$$ind_ ( q00008(max , j ) ‘impl‘ (n >= 1 ‘and‘ 1 <= j ‘and‘ j <= n + l ‘and‘ ( ‘every‘ i) ( 1 <= i ‘and‘ i < j ‘impl‘ max >= c[i]) ‘and‘ ( ‘exists‘ k )(1 <= k ‘and‘ k max then goto state 5. Else portion of a if statement: symbol _else_ Do the following if the if statement conditions are not meet: c[i] is not > max and if c[ j] is in bounds then goto state 6. The ‘{#‘ and matching ‘}#‘ specify the portion of code which is inside of the while loop. s{#s ‘assert‘ ( ( Z$$then_ (q00007(max,j) ‘and‘ _tchecksub_ (c[j] ) ‘and‘ RU] > max) ) ‘impl‘ q00005(max,j)) ‘and‘ (_Z$$else ( q00007(maxJ) ‘and‘ _tchecksub_ (c[i] ) ‘and‘ ‘not" ( c [i] > max) ) ‘impl‘ q00006(max,j))); The ‘{#‘ and its matching ‘}#‘ specify the transitions in the case that the program has entered the if portion of a if statement. This specifies that if in state 5 and if c[j] is in bounds then enter state 3 with max = CU] and j unchanged. State 3 is the state after the then portion of if statement is complete. Transfer directly from state 3 to state 6 where state 6 is the state immediately after the else portion of the if statement. s{#s ‘assert‘ (_Z$$tran (q00005(max,j) ‘and‘ _tchecksub_ (c[jl) ‘impl‘ q00003 (c[i]J))) ; ‘assert‘ (_z$$tran_ ( q00003(max,j) ‘impl‘ q00006(max,j))); ‘}#‘ From here we have the transitions from state 6 directly into state 2 with max remaining the same and j incremented by I. ‘assert‘ (_Z$$tran_ ( q00006(max ,j) ‘impl‘ q00002(max, j + l))) ; Do end: symbol _doend_. From state 2 go directly to state 8 to make new iteration ifnecessary. ‘assert‘ ( _z$$doend_ ( q00002( maxJ) ‘impl‘ q00008(max ti») ; ‘}#‘ Not psi is the negation of the output assertion. This will serve as the denial for a proof by contradiction. 32 ‘assert‘( Tz$$notpsi (q00004 (max ,j) ‘impl‘ ‘not‘( It >= 1 ‘and‘ (‘ every? i )(1<=t ‘and‘i <= n ‘impl‘ max >= c[i]) ‘and‘ (‘exists‘ k )(1<= k ‘and‘ k<= n ‘and‘ max == c[k] )))); 33 9(C[1]J) 8(max.2) not(j<=n) J <= n tchecksub c[j tchecksub c[j] not(c[i] > max c[j] > max m «w» ““9" ‘mpty tchecksub c[j] r ' "W“ 3(CU].D empty State State after the while loop is completed State corresponding to max = c[j] in the then portion of the if statement State corresponding to j = j + l, increment of j in while loop The then portion of the if statement The else portion of the if statement. in this case it is immediately after the if statement. The state before enu'y into the while loop State after j = 2; State after max = c[l]; 10 The beginning state GUINw-b \ONQ Figure 10 - State representation of edie.c 34 Using the function tchecksub the C verification condition generator automatically takes care of array bound checking. The following two transformations preform this array handling. arraytr In the C verification condition generator, array bound checking is done automatically. In order to implement the bound checking the transformation arraytr was produced. Arraytr preforrns the following three tasks: (1) Changes all tchecksub(c[ .71) to _ib_(c,?) to get the arrays into form suitable for ITP. (2) Arraytr adds _axioms_ assertion where _axioms__ is equivalent to (1 <= _x_ <= n) which is in turn equivalent to _ib__(c,_x_)). This specifies the conditions for the array bounds. _x_ is a dummy variable and does not relate to any variables within the program. (3) In the array declaration, the left bracket of the first array dimension gets marked with a ‘[I‘ so that TAMPR knows that all the arrays in the program have been dealt with properly and again as a way of preventing an infinite loop in the transformation set. The use of marking a left bracket is an extension to the language C (see Appendix A) but a grammar change that the user need not be aware of. The transformed program after arraytr is: edie ( n, c, max) int max, n, c ‘[l‘ n] ; ‘assert‘ (_axiorns_ equiv ( ( 1 <= _x_‘and‘ _x_ <= ( n) ) equiv _Ib_ ( c._X_) ) ); int _psi_ ; int _phi_ ; int i, j, k ; ‘assert‘ (_z$$pvar_ ( max, D) ; ‘assert‘ (_z$$phi_ ( n>= 1 )) ; ‘assert‘ (_z$$start_ ( q00010( max, j) ) ) ; ‘assert‘ ( _z$$tran_ ( q00010( max, j ) ‘and‘ _ib_(¢,1)‘impl‘ q00009(c[1],j))) ; ‘assert‘ (_z$$tran_ ( qOOOO9( max, j ) ‘impl‘ q00008( max, 2) ) ) ; ‘assert‘ (_z$$ind_ ( q00008( max,j ) ‘impl‘ (n>= l ‘and‘ 1 <=j‘and‘ j <= n-t- 1 ‘and‘ (‘every‘ i) ( 1 <= i ‘and‘ i < j‘impl‘ max>= c[ i] ) ‘and‘ ( ‘exists‘ k) ( 1 <= k‘and‘ k max)) ‘impl‘ q00005(max,j)) ‘and‘ (_Z$$else_ ( q00007( max , j) ‘and‘ ib_ ( c, j) ‘and‘ ‘not‘ (c[ j] > max) ) ‘impl‘ q00005(max,j))); S ‘ ‘assert‘ (_z$$tran_ ( q00005(max,j) ‘and‘ _ib_(c,j) ‘impl‘ 900003 (c[j] .j) H; ‘assert‘ (_z$$tran_ ( q00003(max,j) ‘impl‘ qOOOO6(max,j))); C S ‘assert‘ (_z$$tran_ ( q00006(max,j) ‘impl‘ q00002(max,j + 1))); ‘assert‘ (_z$$doend_ ( q00002(max,j) ‘impl‘ q00008(max,j))); ‘assert‘ (_Z$$notpsi_ ( q00004( max, j ) ‘impl‘ ‘not‘ (n>= l ‘and‘ ( ‘every‘ i) ( 1 <= i‘and‘ i<= n‘impl‘ max>= C [i int ] ) ‘and‘ ( ‘cxists‘ k) ( 1 <= k‘and‘ k< = n‘and‘ max== c[ k] ) ) ) ) ; insvaltr In insvaltr, all arrays which still have the bracket notation get changed to a functional notation with key word _val#_. In edie.c the it is always 1 indicating one dimensional array. c[i] will become _valI_(cJ) where the first parameter of val is the array name and the second is the index of the array. Arrays changed in this transformation would include the arrays whose values are not being altered. As an example: max =- c[k] becomes max == _valI_(c,k) After this transformation there will be no my brackets in the pro- gram. Edie after insvaltr'. edie(n,c,max)intmax,n, c ‘[1‘n] ; ‘assert‘ (_axioms__ ‘equiv‘ ( ( 1 <= _x_‘and‘ _x_ <= ( n) ) ‘equiv‘ _ib_ ( C. _x_) ) ) ; int _psi_ ; 36 int _phi_; int i, j, k ; ‘asser‘t‘ (_z$$pvar_(maX.j)) ; ‘assert‘ (_z$$phi_(n>= 1 )) ; ‘assert‘ (_z$$start_(q00010(maX.j))) ; ‘assert‘ ( _z$$tran_ ( q00010( max, j ) ‘and‘ _ib_ ( C. 1 ) ‘impl‘ q00009( _vall_ ( c, 1 ) .j) )) ; ‘assert‘ (_zSStran_ ( q00009( max, j ) ‘impl‘ q00008( max, 2) ) ) ; ‘assert‘ (_z$$ind_ ( q00008( max, j ) ‘impl‘ (n>= l ‘and‘ 1 <=j‘and‘j <= n+ l ‘and‘ ( ‘every‘ i)( 1 <= i ‘and‘ i < j‘impl‘ max>= _vall_(c,i) ) ‘and‘ ( ‘exists‘ k) ( 1 <= k‘and‘ k < j‘and‘ max= _vall_(c,k))))) ; ‘assert‘ ( (_z$$doiter_ ( q00008( max, j ) ‘and‘ ( j<= n) ) ‘impl‘ q00007 (max, j) ) ‘and‘ (_z$$doexit_ ( q00008 ( max , j) ‘and‘ ‘not‘ ( j<= n) ) ‘impl‘ q00004( max,j) ) ) ; 6 ‘ ‘assert‘ ( (_z$$then_ ( q00007(max,j) ‘and‘ _ib_ ( c, j) ‘and‘ ( vall_(c,j) > max)) ‘impl‘ qOOOOS (max,j)) ‘and‘ ( _23$else_ ( q00007( max, j) ‘and‘ _ib_ ( C.j ) ‘and‘ ‘not‘ (_vall_ ( c, j) > max) ) ‘impl‘ q00006( max, j) ) ) ; 6 6 ‘assert‘ (_z$$tran_ ( q00005( max, j ) ‘and‘ _ib_(cj) ‘impl‘ q00003(_vall_(CJ).j))); ‘assert‘ (_z$$tran_ ( q00003( max, j ) ‘impl‘ q00006( max, j ) ) ) ; ‘assert‘ (_z$$tran_ ( q00006( max, j ) ‘impl‘ q00002( max, j + 1 ) ) ) ; ‘assert‘ (_z$$doend_ ( q00002( max, j ) ‘impl‘ qOOOO8( max, j) ) ) ; 6 C ‘assert‘ (_z$$notpsi_ ( q00004( max, j ) ‘impl‘ ‘not‘ (n>= l ‘and‘ ( ‘every‘ i) ( 1 <= i‘and‘ i<= n‘impl‘ max>= _vall_ ( c , i) ) ‘and‘ ( ‘exists‘ k) ( 1 <= k‘and‘ k int <= n‘and‘ max== _vall_ ( c, k ) )))); reltrans Reltrans is a simplification transformation which preforms the following three functions: (1) This transformation places all the operator expressions into the equivalent ’less than’ form. 37 Example: x >= y becomes (not (x < y)) In using ITP, one must specify all rules which correspond to a domain which new information is to be derived. By using reltrans, it is only necessary to include axioms corresponding to ’less than’ rather than all possible relational operators. Therefor using only ’less than’ will ease the work necessary in creating rules for ITP. (2) Reltrans also expresses the addition or subtraction of l in an arithmetic expression in terms of the suc- cessor function S or the predecessor function Pd. The successor function has a value of one greater then its argument. The predecessor function has a value of one less then its argument. (3) In reltrans, the val function notation is changed to eval. Edie after reltrans: edie ( n, c, max) int max, n,c ‘[1‘ n] ; ‘assert‘ (_axioms_ ‘equiv‘ ((( ‘not‘ ( x_< l )) ‘and‘ ( ‘not‘ ( ( n) < _X__) ) ) ‘equiV‘ _ib_(c ._x_))); int _psi_ ; int _phi_ ; int i,j, k; ‘assert‘ (_z$$pvar_ ( max,j) ) ; ‘assert‘ (_z$$phi_ ( ( ‘not‘ (n< l ) ) )) ; ‘assert‘ (_z$$start_ ( q00010( max, j) ) ) ; ‘assert‘ (_zSStran_ ( q00010( max,j ) ‘and‘ _ib_(c,1) ‘impl‘ q00009(eva|(0.1).j))) ; sasscrt‘ ( _Z$$tran_ ( q00009( max,j) ‘impl‘ q00008( max: 2 ) ) ) ; ‘assert‘ (_Z$$ind_ ( q00008( max, j ) ‘impl‘ ( ( ‘not‘ (n< 1)) ‘and‘ (‘not‘ (j < 1)) ‘and‘ ( ‘not‘ (s(n) = 1 is the hypothesis for path] . Between the hypothesis and the denial of each program path is a transition specification. In the case of path) , the specification is the path from state 10 to state 8. This is the description of the program up to the while loop. Denial: From state 8 (before while loop) implies the negation of the loop assertion. start ; ‘aesert‘ ( q00010( max, j) ) ; 6} 16 hypothesis ; ‘assert‘ ( ( q00010( max,j ) ) ‘impl‘ ( ( ‘not‘ (n< 1)) )) ; ‘}l‘ ‘assert‘ ( Z$$tran_ ( q00010( max,j ) ‘and‘ _ib_ ( c, 1 ) ‘impl‘ 00009(eval(c,l),j))); ‘assert‘ (_z$$tran_ ( q00009( max, j ) ‘impl‘ q00008( max, 2 ) ) ) ; forcing ; ‘assert‘ ( l) ; 40 ‘}16 denial ; { ‘assert‘ ( q00008(max,j) ‘impl‘ ‘not‘(((‘not‘ (n < 1)) ‘and‘ ( ‘not‘ (i < 1)) ‘and‘ (‘not‘ (s(n) < j)) ‘and‘ ( ‘every‘ i ) ( ( ‘not‘ ( k l ) ) ‘and‘ i= 1 and j): I and s(n) >=jand (foralli l<= i <=j implies max >= c[i]) and (exists k l<= It <=jandmax == c[k]). This Hypothesis is identical to the loop invariant hypothesis which the user initially entered. Following the hypothesis is the specification of a program path necessary to get the branch from state 8 to state 6. This is the 'do-while-body' portion of the code. Denial: State 4 implies the negation of the loop invariant assertion. This is setting up for a proof by contradiction by IT P. i“ ‘ ‘assert‘ (q00008( max,j» ; ()16 hypothesis ; { ‘assert‘ ( q00008(max,j) ‘impl‘ (( ‘not‘ (n< 1)) ‘and‘ (‘not‘ (i< 1)) ‘and‘ ( ‘not‘ (s(n) < j)) ‘and‘ ( ‘every‘ i) ( ( ‘not‘ (i< 1)) ‘and‘ i< j ‘impl‘ ( ‘not‘ ( max< eval (c, i) ) ) ) ‘and‘ ( ‘exists‘ k) ( ( ‘not‘ ( k< l ) ) ‘and‘ k< j ‘and‘ max ==eval(c,k)))); ‘}l‘ ‘assert‘( z$$tran_ ( q00008( max, j )‘and‘ ((‘ilot‘(n<.i)))‘impl‘ (100007(max,.i))) 3 ‘assert‘ ( ( _z$$then_ ( q00007( max, j ) ‘and‘ _ib_ ( e, j) ‘and‘ 41 (( max < eval (c, j) ) ) ) ‘impl‘ q00005 (max , j) ) ‘and‘ ( Z$$else__ ( q00007( max , j) ‘and‘ _ib (C, j ) ‘and‘ "not‘ ( ( max< eval (c, j )))) ‘impl‘ q00'006( max, j) ) ) ; ‘{‘ ‘assert‘ ( Z$$tran_ ( q00005( max, j ) ‘and‘ _ib_ ( c, j) ‘impl‘ (100003 ( eval (c..i) .i) ) ); ‘assert‘ (_zSStran_ ( q00003( max, j ) ‘impl‘ q00006( max, j) ) ) ; forcing ; ‘assert‘ ( l) ; ‘}l‘ ‘}l‘ ‘assert‘ (_z$$tran_ ( q00006( max, j ) ‘impl‘ q00002( max, 8 (j) ) ) ) ; hmmg; ‘assert‘ ( q00008( max, j) ‘impl‘ ( ( ‘not‘ (n< j) ) ) ) ; 6}l‘ denial ; { ‘assert‘ (q00004(max,j) ‘impl‘ ‘not‘((( ‘not‘ (n< 1)) ‘and‘ ( ‘not‘ (j < 1)) ‘and‘ (‘not‘ (s(n) < j) ) ‘and‘ ( ‘every‘ i ) ( ( ‘not‘ ( i< 1)) ‘and‘ i< j ‘impl‘ ( ‘not‘ ( max< eval (c, i) ) )) ‘and‘ ( ‘exists‘ k) ( ( ‘not‘ ( k< l ) ) ‘and‘ k< j ‘and‘ max==eval(c,k))))); ‘}l6 For problem3: Start condition: Path3’s start condition is identical to path2's. Hypothesis: Path3's start condition is identical to pach’s. The specification of the program path necessary to get from state 8 to 4 while not entering the while loop. Denial: In state 4 implies not input assertion or not every array member is less than max or not a I: where max 8: c[k]. start ; ‘assert‘ (q00008( max, j) ) ; 6}16 42 hypothesis ; { ‘assert‘ ( q00008(max,j) ‘impl‘ ((‘not‘ (n< l )) ‘and‘ (‘not‘ (i< 1)) ‘and‘ ( ‘not‘ (s(n) < j) ) ‘and‘ ( ‘every‘ i ) ( ( ‘not‘ ( i< l )) ‘and‘ k j ‘impl‘ ( ‘not‘ (max< eval (c, i) ) ) ) ‘and‘ ( ‘exists‘ k) ( ( ‘not‘ ( k< 1)) ‘and‘ k< j ‘and‘ max ==eval(c,k)))); ‘}l‘ ‘assert‘( z$$tran_ ( q00008( max,j ) ‘and‘ ‘not‘ (( ‘not‘ (n< j))) ‘impl‘ (100004 (max,j))) ; forcing ; ‘assert‘ ( q00008( max, j) ‘impl‘ ‘not‘ ( ( ‘not‘ (n< j) ) ) ) ; 6}]6 denial ; { ‘assert‘ (q00004( max, j) ‘impl‘ ‘not‘ ( ( ‘not‘ (n< l )) ‘and‘ ( ‘every‘ i) (( ‘not‘ ( i< 1)) ‘and‘ ( ‘not‘ ( n< i) ) ‘impl‘ ( ‘not‘ ( max < eval (c, i) ) ) ) ‘and‘ ( ‘exists‘ k) ( ( ‘not‘ (k< 1)) ‘and‘ ( ‘not‘ (n< k) ) ‘and‘ max==eval(c,k)))); s}1s ‘}l‘ Thenelse generates secondary program paths. It accomplishes this by splitting the if then statements into two paths which are specified in the forcing section of the main path. After having made the split, thenelse removes the key words _trans_, _else_ and _then_. Thenelse also moves the forcing assertion up to create any additional program paths. Edie after thenelse: edie ( n, c, max) int max, 11, c ‘[1‘ n] ; ‘assert‘ (_axioms_ ‘equiv‘ ( (( ‘not‘ (_x_< 1)) ‘and‘ ( ‘not‘ ( ( n) <_x__) ) ) ‘equiv‘ _ib_ ( c , _x_) ) ); 43 int _psi_ ; int _phi_ ; int i, j, k ; start; ‘assert‘ (q00010( max, j) ) ; hypothesis ; ‘asscrt‘ ((q00010( max.j ) ) ‘impl‘ ((‘not‘ (n<1)))) ; t}‘ For probleml .' forces assert(1) - I is C's notation for TRUE through the transitions with no changes in assert(l) meaning no additional paths to be specified here. There are no additional paths within this problem because there are no 'tf then else' segments corresponding to this code. forcing ; ‘assert‘(l) ; ‘assert‘ ( q00010( max, j) ‘and‘ _ib_ ( c, 1 ) ‘impl‘ q00009(eva1(c ,1),j)) ; ‘assert‘ (q00009( max, j) ‘impl‘ q00008( max, ld!2!0 ) ) ; (Ilenial ; ‘assert‘ (q00008(max,j) ‘irnpl‘ ‘not‘(((‘not‘(n< 1)) ‘and‘ (‘not‘ (i < 1)) ‘and‘ ( ‘not‘ (s(n) < j) ) ‘and‘ ( ‘every‘ i) ( ( ‘not‘ (i< 1 )) ‘and‘ i< j ‘impl‘ ( ‘not‘ (max< eval (c, i) ) ) ) ‘and‘ (‘exists‘ k) ( ( ‘not‘ (k< 1 )) ‘and‘ k< j ‘and‘ max=eval(c.k))))); ‘1‘ start; ‘assert‘ (q00008( max, j) ) ; hypothesis ; ‘assert‘ (q00008(max, j) ‘irnpl‘ (( ‘not‘ (n< 1)) ‘and‘ (‘not‘ (j< 1)) ‘and‘ ( ‘not‘ (s(n) < j) ) ‘and‘ ( ‘every‘ i )(( ‘not‘ ( i< 1 )) ‘and‘ i< j ‘impl‘ ( ‘not‘ (max< eval ( c, i) ) ) ) ‘and‘ ( ‘exists‘ k) ( ( ‘not‘ (k< 1 )) ‘and‘ k< j ‘and‘ max =cval(0.k)))); ‘}6 For problemZ : Forces assert(1) through the transitions placing the two if conditions into the assertion. The two conditions will later divide to produce two paths corresponding to this problem segment. forcing ; ‘assert‘ ( ( q00007( max, j ) ‘impl‘ _ib (c, j) ‘and‘ ‘not‘ ( ( max < eval (c, j) ) ) ) ‘and" ((100008 ( max 9 .i) ‘impl‘ ( ( ‘not‘ (n< .i) ) ) ) ) 3 ‘assert‘ ( ( ( q00007( max, j ) ‘impl‘ ib (c, j) ‘and‘ ( ( max < eval (c, j) ) ) ) ‘and‘ ( I") )7and‘ (q00008(max.j) ‘impl‘ ( ( ‘not‘ (n ‘impt‘ (q09005(max.j)»; ‘assert‘ ( ( q00007( max, j) ‘and‘ _1b_(C,J) ‘and‘ ‘not‘ ( (max < eva1( c,j)))) ‘irnpl‘ (q00006(max,j))); ‘assert‘ ( q00005( max, j) ‘and‘ _ib_ ( c, j) ‘irnpl‘ q00003(eva1(c.j).j) ) ; ‘assert‘ (q00003( max, j) ‘impl‘ q00006( max, j) ) ; glassearlt‘ (q00006( max, j) ‘impl‘ q00002( max, 5 (j) ) ) ; em ; ‘assert‘ (q00002( max, j) ‘impl‘ ‘not‘(((‘not‘(n< 1)) ‘and‘ ( ‘not‘ (j < 1)) ‘and‘ ( ‘not‘(s(n) < j) ) ‘and‘ ( ‘every‘ i) ( ( ‘not‘ (i< l )) ‘and‘ i< j ‘impl‘ ( ‘not‘ (max< eval (c, i) ) ) ) ‘and‘ (‘exists‘ k) ( ( ‘not‘ (k< l )) ‘and‘ k< j ‘and‘ max=eva1(c.k))))); 6}‘ start; ‘assert‘ (q00008( max. j) ) ; lltypothesis ; ‘assert‘ (q00008( max, j) ‘irnpl‘ ( ( ‘not‘ (n< 1 ) ) ‘and‘ ( ‘not‘ (j< 1 ) ) ‘and‘ ( ‘not‘ (8 ( n) y) ‘and‘ (‘every‘ 2) (z > y)) is transformed to: (‘evcry‘ X) (‘every‘ y) ((x > y) ‘and‘ (z > y)) iOOOXXand 1:000)“ are TAMPR generated variables that propagate through the assertion containing the every and/or exists. Edie after quantsup: edie(n,c, max) intmax, n, c ‘[1‘n] ; ‘assert‘ (_axioms_ ‘equiv‘ ( ( ( ‘not‘ (_x_< 1 ) ) ‘and‘ (‘not‘ ((n) < _x_) ) ) ‘equiv‘ _ib_ ( c , _x_) ) ); int _psi_; int _phi_ ; int i, j, k ; Iproblem_ ; _for_ ; { 8}8 _star'tup_ ; ‘assert‘ ( q00010( max. j) ) ; 8 } 8 _ind_ ; { ‘assert‘ ( ‘not‘ q00010( max, j ) ‘or‘ ‘not‘ (n< 1 ) ) ; 8 } 8 _{_trans_ ; ‘assert‘ ( ‘not‘ q00010( max, j ) ‘or‘ ‘not‘ _ib_ ( c, 1 ) ‘or‘ q00009(cva1(0.1).J)) ; ‘assert‘ ( ‘not‘ q00009( max, ) ‘or‘ q00008( max, 2) ) ; 8 } 8 _denial_ ; ‘assert‘ ( ( ‘exists‘ i00014) ( ‘every‘ k00015) ( ‘not‘ q00008( max, j) ‘or‘ ( n < l ‘or‘ j< l ‘or‘ s ( n) < j ‘or‘ ‘not‘ ( i00014 < l ) ‘and‘ i00014 < j ‘and‘ max < eval (c, i00014) ‘or‘ (k00015< l ‘or‘ ‘not‘ (k00015< j) ‘or‘ ‘not‘ ( max == eval ( c , k00015)))))) ; 56 _problem_ ; l _for_ ; { ‘assert‘ ( ( ‘not‘ q00007( max, j ) ‘or‘ _ib_ ( c, j) ‘and‘ max < eval ( c, j) ) ‘and‘ ( ‘not‘ q00008 (max .j) ‘or‘ ‘not‘ (n))‘eqmv‘_ib_(c._x_)); int _psr__; int _phi_ ; int i, j, k ; _{_problem_ ; _for_ ; l s l s _startup_ ; q00010( max, j) ; 8}8 Tind_; -q00010(max,j) ‘or‘-(Lt(n,l)); 818 _trans_ , l - q00010( max, j) ‘or‘ - _ib_ ( c, l ) ‘or‘ q00009( eval (c,1),j); - q00009( max, j) ‘or‘ q00008( max, 2) ; 8 l 8 denial_ ; ((-q00008(max,j)||Lt(n,l)l|Lt(j,l)|l t(s( n),j) ‘or‘-(Lt(c00036,l))ll t ( k00015 , l ) ‘or‘ - (Lt ( k00015, j) ) ‘or‘ - (Equal ( max, eval (c, k00015 ) ) ) ) ‘and‘ (-q00008(max,j) "Lt( n,l)llLt(j,l)ll Lt(s(n),j)llLt(c00037,j)ll Lt ( 00015 , l ) ‘or‘ - (Lt ( k00015,j)) ‘or‘ - (Equal ( max, eval(c,k00015)))) ‘and‘ -q00008(max,j)llLt( n,l)l|Lt(j,l)ll Lt ( s ( n) , j) ll Lt (max, eval (c,c00038)) ll Lt ( k00015, 1) ll - Lt( k00015, j) II T L L 66 - Equal ( max, eval (c, k00015))»; 8 l 8 8 8 _problem_ ; l _for_ ; ( - q00007( max, j) ‘or‘ _ib_ ( c, j) ) ‘and‘ (- q00007( max, j) ll Lt ( max, eval (c,j) ) ) ‘and‘ (-q00008(maX.j ) ‘Or‘ -(Lt(n,.i))); 8}8 _startup_ ; q00008( max, j) ; s }s Tind_ ; ((-q00008(max.j) ‘or‘ - (Lt(n. 1)))‘and‘ (-q00008( max,j) ‘or‘-(Lt(J',l)))‘and‘ (- q00008( max,j) ‘or‘ - ( Lt ( s ( n) .J') ) ) ‘and‘ - q00008( max,j ) u Lt ( k00020, 1 ) ‘or‘ - (Lt ( k00020 ,j ) ) ‘or‘ - (Lt ( max, eval(c,k00020)))) ‘and‘ (- q00008( max,j) ‘or‘ - (Lt ( c00039, 1 ) ) ) ‘and‘ (- q00008( max, j) ll Lt ( c00040,j )) ‘and‘ (- q00008( max, j) ll Equal (max, eval(c,c00041 ) ) ) ) ; 8 } 8 Ttrans_ ; - q00008( max,j) ll Lt ( n,j) ‘or‘ q00007( max , j) ; - q00007( max, j) ‘or‘ - _ib_ ( c, j) ‘or‘ - ( Lt( max, eval (c, j) ) ) ‘or‘ q00005( max , j) ; - q00007( max,j)‘ or‘ - _ib_ (c, j) ll Lt ( max, eval(c,j)) ‘or‘ q00006( max ,j ); - q00005( max, j)‘ or‘ - _ib_ (c, j) ‘or‘ q00003(eva1(c,j), j); - q00003( max, j) ‘or‘ q00006( max, j); - q00006( max, 0 ‘Or‘ q00002( max, 8 (J') ) ; ‘] _denial_ ; -q00004(max, j)l|Lt(n,l)|lLt(j,l)ll s( n), j‘) or‘ Lt(c00042, l))|lLt(k00025,l)‘or‘ Lt ( k00025, j) ) ‘or‘ Equal ( max, eval (c, k00025 ) ) ) ) ‘and‘ q00004(max,j)|lLt( n,l)ll J, l)llLt(s(n),j)ll c00043, j) ll Lt ( k00025, l ) ‘or‘ Lt ( k00025, j)) ‘or‘ f ( Lt (- ( ( ( ( Lt( Lt( -( 67 - ( Equal ( max,eval(c, k00025))» ‘and‘ (-q00004(max,j) llLt(n,l)llLt(j,l)ll Lt ( s ( n) , j ) ll Lt (max, eval(c, c00044)) ll Lt (k00025, j) ‘or‘ - Equal (max, eval (c, k00025))»; _problem_ ; { _for_ ; l ( — q00007( max, j) ‘or‘ _ib_ ( c, j) ) ‘and‘ ( - q00007( max, j) ‘or‘ -(Lt(max,eval(c,j))))‘and‘ ‘lg - q00008(maX.J') ‘or‘ - (Lt (n,j) ) ) ; 818 _problem_ ; { _for_ ; { - q00008( max, j) ll Lt ( n,j); 8}8 _startup_ ; q00008( max, j) ; 8 } 8 _ind_ ; { ((-q00008(max,j) ‘or‘ -(Lt(n,l)))‘and‘ (-q00008(max ,oj) ‘or‘ -(Lt (j, 1))) ‘and‘ {-q00008(max,j) ‘or‘-(Lt(s(n),j)))‘and‘ (- q00008( max,j) ll Lt ( k00030, l ) ‘or‘ - (Lt ( k00030 ,j ) ) ‘or‘ - ( Lt ( max, eval( c,k00030))))‘and‘ (:000008(max.j ) ‘or‘ -(Lt (c00045,l)))‘and‘ (- q00008( max, j) H Lt ( c00046,j ) ) ‘and‘ ( - q00008( max, j) II Equal ( max, eval (c,c00047 ) ) ) ) ; 8,8 trans ' l - q00008( max, j) ‘or‘ - ( Lt ( n, j) ) ‘or‘ q00004( max, j) ; TdeniaL ; ((-q00004(max,j)llLt(n,l)‘or‘ -(Lt( c00048, 1 ) ) ll Lt ( k00035, l ) ll Lt ( n, k00035) ‘or‘ - ( Equal ( max, eval (c, k00035) ) ) ) ‘and‘ (-q00004(max,j) llLt( n,l)‘or‘ - (Lt ( n, c00049) ) ll Lt ( k00035, l ) ll Lt ( n, k00035) ‘or‘ - ( Equal (max,eval( c, k00035) ) ) ) ‘and‘ (-q00004(max,j) llLt(n,l)ll 68 Lt( max, eval ( e, c00050) ) ll Lt ( k00035, l ) ll Lt ( n, k00035) ‘or‘ - (Equal(max,eval(c,k00035 ) ) ) ) ) ; 69 Results of Transformations Through a Sedfile After the final transformation, lasttrans , a sedfile , called sedlast, performs the final changes in edie.c. The results of the sedfile changes are used as ITP input. The sedfile is used to handle the few remaining items which TAMPR can not deal with. The items which TAMPR can not handle follow: (1) replacement of the c on the generated constants from quantout with a Cslt. (2) replacement of // and ‘or‘ with ITP’s symbol for or /. (3) replacement of ‘and‘ with a ,' and a carriage return. The result from sedlast : edie(Cn,Cc,Cmax)intCmax,Cn,c‘[1‘Cn]; _axioms_ ‘equiv‘ ( ( ( - (Lt ( C_x_ , numl ))) , (-(Lt((Cn).x_x_))))‘equiV‘ _ib_(C0.x_x_)); int C_psi_ ; int C_phi_ ; intCi,Cj ,Ck; ‘PROBLEM; * LIST FOR; * LIST STARTUP; q00010(Cmax , Cj ); 'LISTIND; -q00010(xmax,xj) l-(Lt(Cn,num1)); * LIST TRANS; -q00010(xmax,xj) l-__ib_(Cc,numl)l q00009(eva1(Cc ,numl),xj); -q00009(xmax,xj) |q00008(xmax,2); "' LIST DENIAL; 70 -q00008(xmax,xj)|Lt(Cn,numl)l Lt(xj,num1) I Lt(s(#nint#),xj)l -(Lt(000036,num1)) lLt(xk00015,num1)l -(Lt(xk00015,j))l-(Equal(xmax,eval(Cc,k00015))); -q00008(xmax,xj) l Lt(Cn,num1) l Lt(xj,num1)l Lt(s(Cn),xj) l Lt(c00037,j) I Lt(xk00015,num1) I -(Lt(xk00015,xj)) I -(Equal(xmax,eval(c,xk00015))); - q00008 ( xmax , xj ) I Lt(Cn,num1) I Lt(xj,num1) l Lt(s(Cn),xj) l Lt(max,eval(Cc,c00038))| Lt(xk00015,num1) I -(Lt(xk00015,xj))l ~(Equal(xmax,eval(Cc,k00015))); ’PROBLEMif; 'LISTFOR; -q00007(xmax,xj)l_ib_(CC.Xj); -q00007(xmax,xj)|Lt(xmaX.6V81(C.Xj)); -q00008(xmax,xj) | -(Lt(Cn,xj)); *LISTSTARTUP; q00008(Cmax,Cj); I'LISTIND; -q00008(xmax,xj)l - (Lt(Cn,num1)); -q00008(xmax,xj) l - (Lt(xj,numl )); -q00008(xmax,xj) I - (Lt(s(Cn).XJ' )); -q00008(xmax,xj) I Lt(xk00020,num1)l - Lt(xk00020,xj)l - (Lt(xmax,eval(Cc,xk00020))); -q00008(xmax,xj)|-(Lt(600039.num1))3 -qoooo3(xmax,j) I Lt(c00040,xj); -q00008 (xmax ,xj) l Equal (max ,eval(Cc ,c00041))P; a"LISTTRANS; -q00008(xmax,xj) | Lt(Cn,xj)lq00007(xmax.Xj); 71 -q00007(xmax,xj) l-_ib_(Cc,xj)l -(Lt(xmax,eval(Cc,xj))) l q00005(max ,xj); -q00007(xmax,xj) |-_ib_(Cc,xj) I Lt(xmax,eval(Cc,xj)) | q00006(max,xj); -q00005(xmax,xj) l-_ib_(Cc,xj)l q00003(eval(Cc,xj),xj); -q00003(xmax,xj) |q00006(xmax,xj); -q00006(xmax,xj) |q00002(xmax,s(Xj)); *LISTDENIAL; -q00004(xmax,xj)| Lt(Cn,num1)l Lt(xj,num1) I Lt(s(Cn),xj) I -(Lt(c00042,num1)) |Lt(xk00025,num1)l -(Lt(xk00025,xj)) l-(Equal(xmax,eval(Cc,xk00025) )); -q00004(xmax,xj) |Lt(Cn,num1)lLt(xj,num1)l Lt(s(Cn),xj) l Lt(c00043,xj)l Lt(xk00025,num1) I -(Lt(xk00025,xj)) I -(Equa1(xmax,eval(Cc ,xk00025))); -q00004(xmax ,xj) I Lt(Cn,numl) I Lt(xj ,numl) I Lt(s(Cn),xj) I Lt(xmax,eval(Cc,c00044)) l Lt(xk00025,num1)l -(Lt(xk00025,xj))l-(Equal(xmax,eval(Cc,xk00025int ))); I"PROBLEM'II; ’LISTFOR; -qoooo7(xmax,xj)l_ib_(Cc.xj); -q00007(xmax,xj) I -(Lt(xmax,eval(CC,Xj))); -q00008(xmax ,xj) I -(Li(Cfl.Xj)); ‘PROBLEM'fi; ’LISTFOR; -q00008(xmax,xj) | Lt(Cn,xj); * LIST STARTUP; q00008(Cmax,Cj); " LIST IND; 72 -q00008(xmax,xj)l -(Lt(Cn,num1)) ; -q00008(xmax,xj) I -(Lt(xj,num1) ); -q00008(xmax,xj) I -(Lt(s(Cn),xj)); -q00008(xmax,xj) I Lt(xk00030,numl) I -(Lt(xk00030,xj))l- (Lt(xmax,eval( Cc ,xk00030) )); - q00008(xmax,xj) I -(Lt(c00045,num1 )); -q00008(xmax,xj) | Lt(c00046.xj); - q00008( xmax,xj) | Equal(xmax ,eval(Cc,c00047)) ; " LIST TRANS; - q00008 ( xmax , xj) I - ( Lt(Cn,xj)) I q00004 ( xmax , xj ); "' LIST DENIAL; -q00004(xmax,xj)| Lt(Cn,numl)I -(Lt(c00048,num1))IlLt(xk00035,num1)l Lt(Cn,xk00035) I -(Equal(xmax,eval(Cc,xk00035))); -q00004(xmax, xj) I Lt(Cn,numl) I -(Lt(Cn,c00049‘)) l Lt(xk00035,num1)l Lt(Cn ,xk00035 )I-(Equal(xmax,eval(Cc,xk00035 ))); -q00004(xmax,j) l Lt(Cn,numl) I Lt(xmax,eval(Cc,c00050)) I Lt(xk00035,num1)l Lt(Cn,xk00035) I - (Equal(xmax ,eval(Cc,xk00035))); 73 ITP Run on Edie.c I'I'Pisaclausebasedsystemthatcandeducenew facts fiomasetofclauses. By itselfITPhasno knowledge about any subject. Therefor before using ITP to deduce anything about a program path, one must first specify rules corresponding to the problem domain of the program. This specification is in the form of given clauses and these basic clauses make up the axiom list. In the case of edie.c it is necessary to give the prover knowledge about what it means for values to be equal, less then, successors and predeces- sors. It is also necessary to give the prover information about arrays and array indices. Once the axiom list is developed it is used with a program path so that IT? can derive new clauses forrn the existing ones using one or more inference rules. The following are some of the clauses for edie.c: Clauses dealing with equality, successor and predecessor functions: - Equal(xltxz) I Equal(Sk2(xl).Sk2(x2)); Lt(xl,S(xl)); Lt(Pd(x1),x1); Equal(xl,xl); The equal clause prevents the prover from deducing useless facts had it not known that something is equal to itself. Clauses dealing with arrays and array indexes: -Ib(Cc,xl) | -Lt(x1,Numl); This clausereads thatifxl isaindex ofarray c then itcannotbe less than one. Equalarr(xl,x1); if Ib(xl,x2) 8t Ib(xl,x3) 8t Equal(x2,x3) then Equal(Eval(xl,x2),Eval(x1,x3)); If two indexes of an array xl are equal then the value of the array x] at the index is the same. Once the basic axioms are defined they are used along with each program path to create a proof by contradiction. It is necessary to make two runs for each path in ITP. The first run generates a extended denial for the program path incorporating the transitions and embedding them into the denial. The denial now represents the conditions for the path it corresponded to. The new denial can next be used along with 74 the basic axiom list to come up with a contradiction by making another nm of ITP. First run of ITP ITP must be forced into the direction of a proof. For the problem of edie.c binary resolution is used , the literals are weighed and the denial is moved the set of support list. Moving the denial forces it to be used in a clash against other clauses to produce new facts. With this IT? has been guided in the direction of finding a contradiction and it has been prevented from creating additional information which is not of interest. Shown below is the denial of path] before and after the first run of ITP. Pathl denial before first run of ITP; -Q07(xmax.Xj) ILt(xj,Numl) I Lt(S(Cn),xj) l -Lt(xk,xj) I Lt(xk,Num1) I -Ib(Cc,Skl(xk)) ILt(xmax,Eva1(Cc,Skl(xk))) I -Ib(Cc,xk) I - Equal(xmax,Eval(Cc,xk)); -Q07(xmax.xj) I Lt(xj,Numl) I Lt(S(Cn),xj) I -Lt(Sk1(xk),Num1) l Lt(xk,Num1) I -Lt(xk.xj) I -Ib(Cc,xk) I -Equal(xmax,Eval(Cc,xk)); -Q07(xmax.xj) I Lt(xjNuml) I Lt(s(Cn).xj) I -Lt(Sk1(xk).xD I Lt(xk,Numl) I -Lt(xk.X.I) I -Ib(Cc,xk) I -Equal(xmax,Eval(Cc,xk)); Pathl denial after first ITP run: - Lt(xl,S(Numl)) I - Ib(Cc,x1) I - Equal(Eval(Cc,Numl),Eval(Cc,x1)) I - Ib(Cc,Num1)) I Lt(SCNuml),Num1) ILt(S(Cn),S(Numl)) ILt(Sk1(x1),S(Numl)) I Lt(x1,Numl); -Lt (Skl(xl),Num1) I - Lt(xl,S(Numl)) I - Ib(Cc,xl) I - Equal(Eval(Cc,Numl),Eval(Cc,xl)) I - Ib(Cc,Num1)) I Lt(S(Numl),Num1) I Lt(S(Cn),S(Numl)) I Lt(x1,Num1); -Ib(Cc,Skl(xl)) I - Lt(xl,S(Numl)) I - Ib(Cc,xl) l - Equal(Eval(Cc,Numl),Eval(Cc,xl)) I - Ib(Cc,Num1)) I Lt(S(Numl),Num1) I Lt(S(Cn),S(Numl)) l Lt(xl,Numl) I Lt(Eval(Cc,Num l),Eval(cc,Sk 1(x1))); The darkened items in each of the three clauses after the ITP run designate the places where the three clauses differ. This information will be valuable when it comes to the second run of ITP. Second pass of ITP In using ITP to create a proof it is necessary to understand the way the clauses one is trying to disprove are setup. In knowing this, one can develop an optimal strategy in using ITP. In the case of edie.c, after the first pass in ITP the denial statement of each path is of the same form: 75 xllx2lx3Ix4lx5Ix6Ix7lx83nd xllx2lx3lx4lx5lx6lx7lx9and x1Ix2Ix3Ix4Ix5lx6lx7Ix10; where most of the literals are the same for each clause. This would suggest using a case analysis on each literal to evaluate if it is true. For each literal it is necessary to arrive at a contradiction. For the common literals, this need only be done once leaving three unit clauses consisting of different literals. Eventually we can state that none of the literals making up one of the statements in the denial is true therefore creating a contradiction in the entire denial. 76 SECTION V Limitations with this method In using the described method of software verification there exist a number of obstacles and perceiv- ablc problems which follow: (1) The user is required to have a fairly complete knowledge of theorem proving techniques and ITP, which is not a trivial thing to obtain. (2) The software and corresponding hardware requirement in using TAMPR and ITP must be considered. (3) The time it takes to transform a segment of C code (anywhere from 5 to 60 minutes, depending on the machinelmd)aswellasthetime forITPtocreateaproofisnotsmall. (4) The user is expected to be able to specify input, output and loops assertions properly. This is some- times a hard thing to do and can require much time. (5) There is a restriction on the size of the code that is reasonable to prove at a time. (6) C is not yet a standardized language. This certainly limits the versatility of the condition generator at present. Altering the version of C would at minimum require an alteration in the language scanner and parser. At maximum it would also require altering some of the transformations. 77 SECTION VI Summary and Concluding Remarks A method for automatic verification of code written in C has been described. The process involves transforming C code (with assertions) into separate problems which correspond to different possible paths of flow a program can take. This transformation process also produces the verification conditions which correspond to each program problem path. A proof using ITP can then be created for each problem path, verifying that the code satisfies the users assertions. The method described for software verification can be an efficient, reliable and time saving tool when the obstacles such as obtaining a working knowledge of ITP have been overcome. The process of generating verification conditions using TAMPR can be extended for other programming languages. As an example, there is a similar condition generator for Fortran 66 [3] [8]. Preparing TAMPR to work with additional languages is a time consuming process, requiring development of a scanner and parser for the desired language and requiring a thorough knowledge of TAMPR. The existing transformation set for C can be altered to create desirable input to other reasoning systems such as Prolog as possible alternatives in the proof of correcmess, thereby making the system flexible to changes in theorem proving techniques and tools. Use of this system for verification is already under way, in the EBRII reactor control system men- tioned earler. The applications extend to many other fields as well, where high reliability in software is necessary. APPENDICES APPENDIX A 78 APPENDIX A Alterations in the C Grammar The following is a detailed explanation of the changes made to the language C for use in a software verification project using the TAMPR system. Also included are the changes made to TAMPR system so that it could be used with C. Program Grammar Specification The C grammar mustbe extended so that itcan be used in the TAMPR system to create transforma« tions for a C verification condition generator. The base grammar used is the LALR(1) grammar for C from A C Reference Manual by Harbison & Steele [7]. Additions to the C Grammar: Change 1 All terminal symbols were made into non-terminals (by adding < > about them) , and a production was added to connect them with their corresponding terminals. As an example: auto &a void &p becomes d’ra &p auto dip void 81p &a is a symbol designating alternative rules for a production. &p specifies the end of the rule. 79 In the production it is not possible to add < > around char and float since the production for already has the terminal symbols and <float>. To distinguish between these terminal and nonterminal uses of and <float> the char and float of the is represented as and <float_t> respectively. This is an inconsistency in the extended grammar but the user need not know about it as it is only used within the grammar specification. Change 2: Constants and identifiers are extended to allow them to carry mark and type information. This infor- mation will be used by ITP to distinguish variables fi’om constants in the theorem proving input language. - defined as the production: &a &p - defined as the production: &a &p Because of intersections with other parts of the C grammar the is required at the start as well as end of the production. Change 3 Addition of an id to the C grammar. - A terminal symbol (as in Fortran). Change 4 Addition of the use of a . is an extension to the C language and is required (as in Fortran) to mark statements in transformations. Reasons for marking include: (A) a partiCular transformation can change the mark once it has applied. (B) a transformation can be applied or not applied depending on the particular mark used. 80 NOTE: a is an optional . In the Fortran verifier is placed at the beginning of all productions which end with a senti- colon. In C , placing the (prefix) at the start of productions ending with a semicolon causes a conflict in the grammar. To overcome this problem it is necessary to place the immediately before the semi- colons. Change 5 In C there are several productions in which one of the alternates can be empty. In order to avoid confusion in the recognizing stage it is necessary to make the grammar specification contain only one empty production &p. and use the symbol anywhere an empty production previously occurred. A new production would look like : &a &p &p to replace: &a &p Change 6 The following extension is required so that assertions can be created within the C code using the key- word ‘assert‘ as well as with ‘assume‘ and ‘invariant‘. <‘assert‘> <;> &p <(> <)> &p <‘assert‘> ‘assert‘ &a 81 ‘assume‘ &a ‘invar'iant‘ &p Change 7 The following extension is required so the quantifiers ‘every‘ and ‘exists‘ can be used within ‘assert‘ statements. <(> <)> &p ‘every‘ &a ‘exists‘ &p Change 8 A similar extension as in 7 is used so that a user can use && ‘and‘, // ‘or‘ , ' ‘not', ‘impl‘ -->, and === ‘equiv‘ interchangeably. <&&&&> &&&& &a ‘and‘ &p II &a ‘or‘ &p <“> " &a ‘not‘ &p <«&>> --&> &a ‘iran‘ &p <==> == &3 ‘equiv‘ &p Omissions to the C Grammar 82 The C use of typedef has been omitted in the present extended C grammar used in this project. Since the symbols and are lexically identical the scanner cannot distinguish between them unless it can obtain some semantic information about s . To include typedef some sort of construct would have to be set up (between the scanner and the parser) to store information on certain variables which are actually of type typedef rather titan a regular variable. This will eventually need to be implemented. Harbison & Steele’s "A C reference manual" mentions this problem in implementation (see page 121). Changes to the Transformation Grammar The following is a description of the changes made to the transformation grammar so that C could be used with TAMPR. The original transformation grammar was designed for transformations written in For- tran. Change 1 The changes made to the TRANSFORMATION GRAMMAR were : All occurrences of : ? changed to ‘7‘ I changed to ‘I‘ } changed to ‘] ‘ . changed to ‘ A-Z changed to a-z The above modifications are necessary as {. }. . and ? occur in both the C-grammar and the transfor- mation grammar. In order to distinguish between .7 in the transformation grammar and .7 in the C grammar some unique symbol must be used, such as putting back-quotes around all transformation symbols. Change 2 The dotted transformation symbols are changed to back-quoted transformation symbols and con- verted to lower case characters. 83 .T‘RANSFORM". becomes Transform“ This is not necessary but it makes the C-transformations consistent in that all transformation symbols will now have back-quotes around them. APPENDIX B 84 APPENDIX B Changes to Tools The program and transformation scanners are built using the executable file fscancompiler which is made up from a number of routines. Originally these routines were used in making the fscancompiler for the extended Fortran66 scanners. Because C has several very different properties from Fortran, some changes were required in the C fscancompiler. The transformation scanner creates tables containing numbers which represent both the transforma- tion language grammar symbols and the C language grammar symbols. When testing the transformation scanner, not all of the symbols were accepted by the scanner. There was no particular pattern in which symbols were not accepted; it appeared to depend on the number of symbols rather than the particular ones. This led to the belief that one or more arrays were overflowing. Tests were put in the fscan.f routine (where the arrays sizes were defined). Several of the arrays were found to be overflowing. The appropriate array sizes had to be increased. The two main words are AKSHNI and AKSHNZ and contain the arrays AKTYPE, AKSET, CALLAK, VALLOC and NEXTAK. The arrays CALLAK, VALLOC and NEXT/1K overflowed. It was sufficient to increase the size of these arrays by one bit. Instead of having 12 bits these arrays now have 13 bits, and allow space for up to 2e+13 = 8192 locations instead of the previous 2e+12 = 4096 locations. These tests are left in the fscan.f routine. In the future if this routine is being used to make an fscan- compiler for another language overflow can be easily detected and corrected. Changes to NTLR The program parser and superstart grammar tables are built using the executable file ntlr which is made flour a number of Fortran routines. Originally these routines were used in making the ntlr for the extended Fortran parser and tables. Some changes were required to these routines in order to make the parsersfor C. 85 The ntlr creates the grammar table output (lr.dat) and a listing file depending on which toggles were set. For the purposes of this project the following toggles were set: In the program parser specs. &j - to switch off the lrltran tables &k - to switch on the ansi 66 Fortran tables &z - to produce a grammar symbol table for the TAMPR transformer. In the superstart specs. &i &k &u - to produce tables that will allow non-terminals to appear in the parser input. The following changes were required: Change 1 In creating the tables for the extended C grammar, the error LIST SPACE OVERFLOW occurred. This error is listed in DIAGNOSTICS[9] and the action suggested is to increase the item array using the script item.ex . For the purposes of C the item array was increased from 55000 to 75000. Change 2 When creating the tables after the production had been added the error Memory Fault - core dumped error code 139 occurred. The core was very small (around 1200 characters ), the lr.dat was created but empty, and the listing was created but aborted during the last configuration set. This was traced back to the routine rel.f which was being called from genthdf with the parameter thedptO') . relf did not test for this parameter being zero (which appeared to happen when there were productions in the grammar) and so an attempt was being made to access the zero location of an array which started at location 1. This problem is fixed by including the test in rel.f if (iptr .eq. 0) go to 10 10 continue 86 If any symbols are added to or removed from the grammar specifications then check the table of non- terminal symbols generated by setting the &u flag in the superstart, to see that the table lists the as 671 286 . If the number is NOT 671 it will be necessary to change this value in the routine additl.f which tests for the production and prevents it from being written out by a later routine. if(iarg.eq.67l)goto... So, if the number of symbols is changed in the grammar specs the number will probably be dif- ferent. TOFIXTHIS: (1) Create the listing using the old ntlr ( it will probably have unexpected errors in it but ignore these because they may disappear after fixing the number), Note the new _ 286 number. (2) Change the routine additl.f to test for the new number. (3) Make the new ntlr. (4) Remake the listing . *** This should be done automatically by one of the routines by looking for the symbol and excluding it from the nonterminal symbol table. "* We were not able to locate the array holding this information. "* This should be fixed! Change 3 The lr.dat did not have the first occurrence of integer aligned properly. This was corrected by initial- izing the first five positions of the array line in the routine tableaf to blanks. 87 Change 4 The first configuration set in the listing was written out properly but did not have any spaces between the words. This was corrected by initializing the array line to blanks in the routine pntbasf Change 5 The last error needing correction in the listing was the fact that the productions with nothing on the RHS were not being printed out. After several attempts at writing these productions we concluded that was left out intentionally by the original programmer. Once the ntlr was made with the changes above an attempt was made to use it in making the LR(1) parser for C. The following errors occurred: Error on line of blockde ..... etc. This error appears to have been caused when splitting the grammar table output (lr.dat) into the dintfile, data] and data2 files by the script split.csh . The split.csh routine calls a routine called split.sed which puts the integer declarations into the dimfile. It appears to treat the first integer differently so it was concluded that when the routine tableaf was written, the programmer had left out the initialization required to position the first integer correctly so , rather that change the routine and hence the ntlr , he/she decided to account for this error in the sedfile split.sed . Since we had already fixed the routine tableaf it was necessary to then fix split.sed . APPENDIX C 88 APPENDIX C Example Scanner Tokenizer Using FSCAN TOKENS "ident" ’dummy’ "decint" ”hexint" "octint" "flcnst" "string" "chcnst" "tpdfnm" ’auto’ ’break’ ’case’ ’char’ ’contin’ ’deflt’ ’do’ ’double’ ’else’ ’enum’ ’extern’ ’float’ ’for’ goto’ ’if’ ’int’ ’long’ ’regstr’ ’retum’ ’short’ sizeof’ ’static’ ’struct’ ’switch’ ’typdef' ’union’ unsgnd’ ’void’ 'while’ eq’ ’pluseq’ ’minseq’ ’stareq’ ’slsheq’ ’pcnteq’ ggeq’ ’lleq’ ’arnpeq’ ’cflxeq’ ’vbareq’ eqto’ ’noteq’ ’lt’ ’le’ ’ge’ ’gt’ ’lshift’ ’rshift’ ’plus’ ’minus’ ’multop’ ’divop’ ’prcent’ ’not’ ’tilde’ ’postin’ ’postdc’ 'arrow’ ’bitand’ ’exclor’ ’inclor’ ’logand’ ’logor’ ’condop’ ’smcoln’ ’rbrace’ ’comma’ ’lbrace’ ’colon’ ’lbrac’ ’rbrac’ ’rsqbrc’ ’lsqbrc’ ’period’ ’equivl’ ’zexstz’ ’zeqivz’ ’zasrtz’ ’zasmez’ ’zimplz’ ’zinvtz’ ’znotz’ ’zandz’ ’zorz’ ’zevryz’ 9 9 9 9 9 ’ SCANNER c : c -> (spaces cl)‘ spaces ; spaces -> (’ ’/EOS/tab)"' ; tab -> ’# ’; EOS -> ’EOL’; SCANNER cl: c1 -> identifier / delimiter ; END cl ; SCANNER identifier: identifier -> firstchar (nextchar)* => "ident"; END identifier; firstchar -> letter/underscore ; nextchar -> letter/underscore/digit; letter -> ”A”/"B"/"C"/"D"/"E"/“F”/”G”/"H”/"l"/"J"/" K"/"L"/"M" /"N"/"O"/"P"/"Q"/"R"/"S"I'T"/"U"/"V"/"W"/"X”/"Y"/"Z" /"a"/"b"/"C"/"d"/"c"/"t"/"g”/"h"l"i"/"j"/"k"/"l"/"m" I"n”/"o"/"p"/"q"/"r”/"8"/"t"/"u"/"V"/"w”/"x"/"y"/"z"; tmderscore -> ”_” ; digit -> "0'7"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8"/"9" ; 89 SCANNER delimiter : delimiter -> ';' ws => ’smcoln’ -3. T ws => ’rbrace’ -> 1* ws => ’lbrace’ -> '(' ws => ’lbrac’ -> ’)’ ws => ’rbrac'; END delimiter ; ws ->( a r / "EOL")* ; ENDc. REFERENCES l. Boyle, James and M. Matz, "Automating Multiple Program Realizations", Proc. MRI Symp.. XXIV: Compt. Software Eng., Brooklyn, NY, pp. 421-456, 1976. 2. Boyle, James and Monaqur N. Muralidharan, "Program Reusability Through Program Transformation", IEEE Transactions on Software Engeneering, Vol. SE-IO, No.5, September 1984. 3. Boyle, James, unpublished report, "An Introduction to the Transformation-Assisted Multiple Program Realization (TAMPR) System", Argonne National laboratory, Mathematics and Computer Science Division. 4. Boyle, James and Ken Dritz, unpublished book, "The Art and Science of Writing TAMPR Transformations", Argonne National laboratory, Mathematics and Computer Science Division. 5. Chisholm, G.H., J.Kljaich, B.T. Smith, A.S.Wojcik, "Preliminary Report-Analysis of the Draper FTP Hardware and Software Using ITP", Argonne National Laboratory, Mathematics and Computer Science Division, ANL/MCS-TM-59, September 1985. 6. Clemm, G.M., "FSCAN-81 Report and Users Manual”, Interim Technical report, US. Army Research Office, contract ntunber DAAGZ9-78-G-0046, 1981 . 7. Harbison, Samuel P. and Guy L. Steele Jr: A C Reference Manual, Prentice-Hall Software Series, 1984. 8. Henderson, David, unpublished report, "Use of Software Tools in Building Program Recognizers for the TAMPR System", Argonne National Laboratory, Mathematics and Computer Science Division. 9. Henderson, David, unpublished report, "Diagnosis Documentation Used in Creating a Tree Builder with TAMPR”, Argonne National Laboratory, Mathematics and Computer Science Division. 10. Lusk, Ewing L, and Ross A. Overbeek, "The Automated Reasoning System ITP", Argonne National Laboratory, Mathematics and Computer Science Division, 1984. ll. Lusk, Ewing L, and Ross A. Overbeek, "Logic Machine Architecture Inference Mechanisms- Layer 2 User Reference Manual", Argonne National laboratory, Mathematics and Computer Science Division, ANL—82-84, December 1982. 12. Lusk, Ewing L, William Mch and Ross A. Overbeek, "Logic Machine Architecture: Inference Mechanisms", Proceedings of the Sixth Conference in Automated Deduction, Springer-Verlag Lecture Notes in Computer Science, Vol 138, (ed. Robert Kowalski and Wolfgang Bibel, L), pp. 85-108, 1982. 13. Shannon, A.W., and C. Wetherall, "LR-automatic Parser Generator and LR(1) Parser", IEEE Transactions on Software Engineering, May 1981. 91 14. Smith, Brian T., unpublished memoranda, "My Users Guide to TAMPR", Argonne National Laboratay, Mathematics and Computer Science Division, 1980. 15. Wos, Lusk, Boyle, Overbeek: Automated Reasoning - Introduction and Applications, Prentince-Hall, Inc., Englewood Cliffs, NJ, 1984.