3%?
*h. '

3°“

\1‘. ‘
v) A '
1L.

.4, > _ :- a
.2?” lei-$3 ‘l
C?....“‘
r’ﬁl‘; t

Wt???"

E

|
‘ Y1;
;

'L_> 4
r.
. v
'C‘

5-9

‘ 1

'9
'3‘ u

$33???

 

~:-._-.2

'5

(6‘7"

 

 

 

‘-‘ «:5? -. .
{ha-E

'<

 

‘15

M.

 

”é

AF"

 

. .J V
-..:v. r.
- ,wifprr‘ﬁﬁ'
4 mm; !.' "'r x _ .
I" , 3,.) “.3. .g
" ' - ‘rw‘v 9:2.
U“. ”25,55“... u

 

This is to certify that the

thesis entitled

The Application of Formal Methods to the
Reverse Engineering of Imperative Program
Code

presented by
Gerald C. Gannod

has been accepted towards fulﬁllment
of the requirements for

M.S. degree in Computer Science

 

 

 

M or professor

 

0-7639 MS U is an Afﬁrmative Action/Equal Opportunity Institution

 

|\\\\\\\\\\ \\ W W W \\\\\\\2\\‘2 ‘\\\\0\\\\2 ‘\\\\1\\\\\i

3 1293 0102

 

LIBRARY
Michigan State
Unlverslty

 

 

 

PLACE EIN RETUR RN BOXto reomov othls hockoutfrom you r.record
TO AVOID FINES returno nor before date duo.

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Luminous-M

 

The Application of Formal Methods to the

Reverse Engineering of Imperative Program Code

By

Gerald C. Gannod

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

MASTER OF SCIENCE

Computer Science Department

May 1994

ABSTRACT

The Application of Formal Methods to the

Reverse Engineering of Imperative Program Code
By

Gerald C. Gannod

Formal methods in software development provide many beneﬁts in the forward
engineering aspect of software development. One of the advantages of using formal
methods in software development is that the formalnotations are precise, veriﬁable,
and facilitate automated processing. Reverse engineering is the process of construct-
ing high level representations from lower level instantiations of an existing system.
There are two main objectives that the research described in this thesis covers. First, a
new approach to the construction of formal speciﬁcations from program code has been
developed. To this end, a propagational model of translation has been developed that
provides the framework for performing reverse engineering using the direct transla-
tion of programming constructs into corresponding formal speciﬁcations. Using this
framework, a number of translation rules for constructing formal speciﬁcations for
programming primitives, including assignment, alternation, iteration, sequence, and
procedure call, have been developed. The second objective of this research involves
the movement of procedural software towards ob ject-orientation. We have developed
an approach that uses clustering and restructuring of formal speciﬁcations at the

procedural level to identify objects embedded in programs.

Copyright © by
Gerald C. Gannod
May 1994

To G.T. and RC, I always thought I’d see you again.

iv

ACKNOWLEDGMENTS

I’d like to thank my parents, siblings, and friends for their understanding and
support. Well deserved recognition goes to the Committee for Institutional Cooper-
ation Summer Research Opportunity Program and the McNair Post Baccalaureate
Research Program at Michigan State University (specifically, Mary Lee and Shawn).
Without that program, none of this would have been possible. I would also like to
thank Bob Bourdeau and the rest of the students in the software bullpen for their
insights into this work. My utmost respect and gratitude goes to Dr. Cheng, my
advisor and mentor, for helping me ﬁnd the way. Finally, to the dead (and dread)
poets and jazz musicians of the world, may you always say the right verse and play
the right note.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction

1.1 Motivation .................................
1.2 Research Contributions ..........................
1.3 Organization ...............................

2 Background

2.1 Software Engineering ...........................
2.2 Software Maintenance ..........................
2.2.1 A Model of Re-engineering ..... ' ...............
2.3 Applications of Reverse Engineering ...................
2.4 Formal Methods ..............................
2.4.1 Propositional Logic ........................
2.4.2 Predicate Logic ..........................
2.4.3 Program Semantics ........................

3 A Translational Approach to Reverse Engineering

3.1 Approach .................................
3.2 Model of Programs with Annotations ..................
3 .2. 1 Variables ..............................
3.2.2 Statements ............................
3.2.3 Annotations ............................
3.2.4 Relating Statements and Annotations ..............
3.2.5 Annotation Interpretations ....................
3.3 An Annotation Interpretation based on sp ...............
3.3.1 Translational Propagation ....................

4 Primitive Constructs

vi

viii

ix

[emu—Iii

CDCDWOSMIBQ

10
11

l4
14
15
16
18
19
19
20
20
21

23

4.1 Assignment ................................ 23

4.2 Alternation ................................ 24
4.3 Sequence .................................. 25

5 Iterative and Procedural Constructs 27
5.1 Iteration .................................. 27
5.2 Procedural Abstractions without Recursion ............... 30

6 Moving towards Object-Orientation 34
6.1 Identiﬁcation of Classes Using Formal Speciﬁcations .......... 34
6.1.1 Guidelines ............................. 35

6.1.2 Speciﬁcation Language ...................... 36

6.1.3 Example Identiﬁcation of Candidate Objects .......... 37

7 Modeling the Representation and Abstraction of Imperative Lan-

guages 41
7.1 Properties of Block Structured Languages ............... 42
7.1.1 Static Scope Rules ........................ 42

7.1.2 Statements ............................ 43

7.1.3 Procedural Abstractions ..................... 43

7.2 Modeling ................... 7 ............... 44
7.2.1 Variables, Symbols, and Types .................. 44

7.2.2 Statements, Rules, and Formal Speciﬁcations .......... 47

7.3 Example .................................. 54

8 Related Work 59
8.1 Formal Approaches ............................ 59
8.1.1 Function Abstraction ....................... 59

8.1.2 A Knowledge Based Transformational System ......... 62

8.2 Object-Orientation ............................ 65
8.2.1 Object Identiﬁcation ....................... 65

8.2.2 REDO - Objects ......................... 68

8.3 Comparison of Approaches ........................ 70

9 Conclusions and I'hture Investigations 72
BIBLIOGRAPHY 75

vii

LIST OF TABLES

2.1 BNF Grammar for Propositional Logic [1] ............... 10
2.2 Properties of the top and wlp predicate transformers .......... 12
6.1 Grammar for object speciﬁcations .................... 36

viii

 

 

 

 

LIST OF FIGURES

2.1 Relationship between re—engineering terms ............... 7
2.2 Graphical Depiction of a Re-engineering Model [2] .......... 8
5.1 Steps for abstracting the effect of iteration statements ......... 29
5.2 Removal of procedure call p(h',5,E) abstraction ............. 33
6.1 Example Pascal Code ........................... 38
6.2 A Formal Speciﬁcation of Example Code in Figure 6.1 ........ 39
6.3 Speciﬁcation of Object St ........................ 40
7.1 Formal Speciﬁcation of SymbolTableElement using LSL ...... 45
7.2 OMT Object Model of SymbolTable ................. 46
7.3 Example Sequence of Pascal Code .................... 47
7.4 Object Model of Statements ....................... 48
7.5 Object Instance Model for code sequence of Figure 7.3 ........ 49
7.6 Code sequences partially annotated with speciﬁcations ........ 51
7.7 Example with speciﬁcations annotated by the AUTOSPEC [3] system . 54
7.8 Example Pascal program ......................... 56
7.9 Output created by applying AUTOSPEC to example .......... 57
7.10 Output created by applying AUTOSPEC to example (cont.) ...... 58
8.1 Stages of Development .......................... 63
8.2 Languages for Development ....................... 63
8.3 COBOL File Operations [4] ....................... 69

ix

CHAPTER 1

Introduction

1 . 1 Motivation

The demand for software correctness becomes more evident when accidents, some-
times fatal, are due to software errors. For example, recently it was reported that
the software of a medical diagnostic system was the major source of a number of
potentially fatal doses of radiation [5]. Other problems caused by software failure
have been well documented and with the change in laws concerning liability [6], the
need to reduce the number of problems due to software increases.

Formal methods in software development provide many beneﬁts in the forward
engineering aspect of software development [7, 8, 9, 10, 11]. One of the advantages
of using formal methods in software development is that the formal notations are
precise, veriﬁable, and facilitate automated processing [12]. Reverse Engineering is
the process of constructing high level representations from lower level instantiations
of an existing system. One method for introducing formal methods, and therefore
taking advantage of the beneﬁts of formal methods, is through the reverse engineering

of existing program code into formal speciﬁcations.

1.2 Research Contributions

There are two main objectives that the research described in this thesis covers. First,
a new approach to the construction of formal speciﬁcations from program code has
been developed. To this end, a new propagational model of translation has been devel-
oped that provides the framework for performing reverse engineering using the direct
translation of programming constructs into corresponding formal speciﬁcations. Using
this framework, a number of translation rules for constructing formal speciﬁcations
for programming primitives, including assignment, alternation, iteration, sequence,
and procedure call, have been developed. Finally, this translational framework is
applied to the imperative programming paradigm, using Pascal as a model [13].
The second objective of this research involves the movement of procedural software
towards ob ject-orientation. Other projects have focused on source code clustering [l4]
and functional abstraction [4]. We have developed an approach that uses clustering
and restructuring of formal speciﬁcations at the procedural level to identify objects

embedded in programs [15].

1 .3 Organization

The remainder of this thesis is organized as follows. Chapter 2 provides background
information pertinent to the area of software maintenance and the support of reverse
engineering using formal methods. The research contributions are described in detail
in Chapters 3 through 7. Chapter 3 describes the translational propagation model
for the construction of formal speciﬁcations from program code. Chapters 4 and 5
discuss the translation rules for the assignment, alternation, iteration, sequence, and
procedure call programming primitives. Chapter 6 discusses the application of the
translational approach to the imperative programming paradigm. The method for

identifying objects in programs using formal speciﬁcations is described in Chapter 7.

Chapter 8 presents related work in the area of reverse engineering, and Chapter 9

draws conclusions and discusses future investigations.

CHAPTER 2

Background

2.1 Software Engineering

The following deﬁnition of software engineering has been offered by Fritz Bauer [16]:

The establishment and use of sound engineering principles in order to
obtain economically software that is reliable and works efﬁciently on real

machines.

In 1968 and 1969, workshops were held in Garmisch, West Germany and Rome, Italy
with the intent of addressing the growing problems associated with the construction of
computer software. The phrase “software crisis” was coined to describe the increas-
ing difﬁculty associated with the development of software. It was recognized that
processes for developing and managing software were desperately needed in order to
respond to the software crisis. Since then, a number of software life-cycles have been
developed with the intent of producing “software that is reliable and works efﬁciently
on real machines”.

The classical life-cycle, better known as the “waterfall” life-cycle, provides a sys-
tematic process for developing software. It involves ﬁve phases including requirements

analysis, design, implementation, testing, and maintenance [17].

The requirements phase is used to deﬁne the needs of a customer or user. The main
task is to determine the required function of the software system based on information
gathered about the problem domain. The design phase is used to reﬁne the informa-
tion gathered in the requirements phase by focusing on issues of software architecture,
behavior, and presentation. The implementation phase involves the translation of the
design into a machine executable form. Here the issues include efficiency considera-
tions, low level algorithmic details, and language. The purpose of the testing phase is
to verify that the software performs as expected. Testing strategies include black-box
testing, white-box testing, and branch testing. The maintenance phase encompasses
all operations that occur after the release of the software. Redesign, modiﬁcation,
and enhancement are all operations resulting from the maintenance of a system.

Other types of life-cycles have been created including the reuse, prototype, and
spiral models [17]. Each has advantages over the classic life-cycle, but in general

incorporate, to some degree, the phases deﬁned by the waterfall model.

2.2 Software Maintenance

Software maintenance has long been a problem faced by software professionals, where
the average age of software is between 10 to 15 years old [18]. With the development
of new architectures and improvements in programming methods and languages, in-
cluding formal methods in software development and object-oriented programming,
there is a strong motivation to reverse engineer and re-engineer existing program code
in order to preserve functionality, while exploiting the latest technology.

Reverse engineering of program code is the process of constructing a higher level
abstraction of an implementation in order to facilitate the understanding of a system
that may be in a “legacy” or “geriatric” state. Re-engineering is the process of

examining, understanding, and altering a system with the intent of implementing the

system in a new form [19]. The beneﬁts offered by re—engineering versus developing
software from the original requirements is considered to be a solution for handling
existing (legacy) code because much of the functionality of the existing software has
been achieved over a period of time and must be preserved for many reasons, including
providing continuity to current users of the software [20].

One of the most difﬁcult aspects of re—engineering is the recognition of the func-
tionality of existing programs. This step in re—engineering is known as reverse engi-
neering. Identifying design decisions, intended use, and domain speciﬁc details are
often the main obstacles to successfully re-engineering a system.

Several terms are frequently used in the discussion of re-engineering. [19]. Forward
engineering is the process of developing a system by moving from high level abstract
speciﬁcations to detailed, implementation-speciﬁc manifestations [19]. The explicit
use of the word “forward” is used to contrast the process with Reverse engineering,
the process of analyzing a system in order to identify system components, component
relationships, and intended behavior [19]. Restructuring is the process of creating a
logically equivalent system at the same level of abstraction [19]. This process does
not require a semantic understanding of the system and is best characterized by the
act of transforming unstructured code into structured code. A diagram depicting the
relationships between forward engineering, reverse engineering, restructuring, and re-

engineering is shown in Figure 2.1.

2.2.1 A Model of Re-engineering

Re-engineering can be described in terms of a model based on three concepts that are

deﬁned as follows [2]:

Reﬁnement is the gradual decrease in the abstraction level of a system

representation and is caused by the successive replacement of existing

 

 

 

 

 

 

 

 

 

 

 

Rm
Famed
Win
[anthem Design Inﬂammation

Figure 2.1. Relationship between re-engineering terms

 

system information with more detailed information.

Abstraction is the gradual increase in the abstraction level of a system
representation and is created by the successive replacement of existing
detailed information with information that is more abstract. Abstraction
produces a representation that emphasizes certain system characteristics

by suppressing information about others.

Alteration is the incorporation of one or more changes to a system repre-
sentation without changing the degree of abstraction. Alteration includes

the addition, deletion, and modiﬁcation of information.

Figure 2.2 depicts the three concepts of reﬁnement, abstraction, and alteration
within a model of re-engineering, where the left triangle represents the original system,
and the right triangle represents the new system. Abstraction is shown as the arrow
angled upward and corresponds to reverse engineering. The arrow angled downward
depicts reﬁnement and corresponds to forward engineering. The horizontal arrow

shows alteration and corresponds to restructuring.

 

Alteration
ﬂ

Requirements
Abstraction Reﬁnement

Design
/ Implementation \

Figure 2.2. Graphical Depiction of a Re-engineering Model [2]

 

 

 

 

 

 

2.3 Applications of Reverse Engineering

Reverse engineering has many areas of application, including redocumentation and
design recovery. Redocumentation is the creation or revision of a semantically equiv-
alent representation within the same relative abstraction level [19]. The purpose of
redocumentation is to recover or revise documentation that is missing or obsolete.
Often redocumentation is in the form of data flow and control flow diagrams.
Design recovery is the process in which domain knowledge, external information,
and deduction or fuzzy reasoning are added to the observations of a software system
to identify meaningful higher level abstractions beyond those obtained directly by
examining the system itself [19]. In addition to providing the abstractions for the
re-engineering processes of rework and replacement, design recovery can facilitate the

reuse of software components by providing the details of functionality.

2.4 Formal Methods

Although the waterfall development life-cycle provides a structured process for devel-
oping software, the design methodologies that support the life—cycle (i.e., Structured
Analysis and Design) have shortcomings in that they have the potential for intro-
ducing ambiguity, inconsistency, and incompleteness in designs and implementations.
Formal methods used in software development are systematic techniques for specify-
ing, developing, and verifying computer software. Formal methods aid in eliminating
the properties of ambiguity, inconsistency, and incompleteness of a system through
the use of a well-deﬁned speciﬁcation language with a set of well-deﬁned inference
rules that can be used to reason about the speciﬁcation. The remainder of this sec-
tion describes the notation used throughout the thesis and deﬁnes some key aspects

of program semantics.

2.4.1 Propositional Logic

A proposition is a logical expression that represents a truth value of true (T) or
false (F). Operators consist of conjunction (A), disjunction (V), implication (=>), and

equality (=) Propositional logic can be described using the following rules [1]:

1. Tand F are propositions

2. An identiﬁer is a proposition. (An identiﬁer is a sequence of one or more digits
and letters, the ﬁrst of which is a letter.)

3. If b is a proposition, then so is (ab).

4. If h and c are propositions, then so are (I) /\ c), (b V c), (b => c), and (b = c).

Table 2.1 gives the BNF grammar for propositions using precedence rules to elim-

inate parenthesized expressions [1].

10

 

(proposition) ::= (imp-expr)
(proposition) = (imp-expr)
(imp-expr) ::= (expr)
(imp-expr) => (expr)
(erpr) — (term)
I (expr) V (term)
(term) ::= (factor)
I (term) A (factor )
(factor) = -1(factor)
| ((proposition))
I T
I F
I (identiﬁer)

Table 2.1. BNF Grammar for Propositional Logic [1]

 

2.4.2 Predicate Logic

Propositional logic can be extended by replacing identiﬁers in a proposition with any
expression (termed an atomic expression) that can be evaluated to true or false and
by allowing the use of existential (3) and general (V) quantiﬁcation. This extension
is known as predicate logic. Atomic expressions are made up of four different types
of symbols: individual symbols or constants, variables, functions, and predicates. A
predicate is a mapping from a list of constants to the value true or false.

Existential quantiﬁcation can be deﬁned in the following manner. Given the ex-

pression

EjVEj+1V...VEk,

where j and k are integers, and E.- is a predicate, the existential quantiﬁcation oper-

ator (3) can be used to abbreviate the expression in the following way

11

General quantiﬁcation (V) can be expressed in a similar fashion. Given the ex-
pression

EjAEj+1/\.../\Ek,

general quantiﬁcation has the following form

(VisziSlczEg),

where j and k are integers and E,- is a predicate. General quantiﬁcation can be

expressed in terms of existential quantiﬁcation, and vice versa, as is shown below

(Vi: R,- : E.) E '1(3i 3R1“ I“Ei)a

where R.- is a predicate that describes the range, E,- is an arbitrary predicate, and i

is the quantiﬁed variable

2.4.3 Program Semantics

The notation Q { S } R was originally introduced by Hoare [21] to indicate a partial
correctness model of execution, where given that logical condition Q holds, if the exe-
cution of statement S terminates, then logical condition R will hold. A rearrangement
of the braces in the form of { Q } S { R }, in contrast, represents a total correctness
model of execution. That is, if logical condition Q holds, then statement S is guar-
anteed to terminate with logical condition R true. If we consider a state space for
computation then, in terms of some predicate R, we can partition computations into

the following three mutually exclusive classes [22]:

eternal: All computations that fail to terminate

ﬁnally R: All computations terminating in a ﬁnal state with R true

12

finally pR: All computations terminating in a ﬁnal state with pR true.

An analogous characterization of the computation state space in terms of the initial

state of a program can be given as follows [22]:

initially Q: All computations starting in an initial state with Q true.

initially -Q: All computations starting in an initial state with -Q true.

A precondition describes the initial state of a program, and a postcondition de-
scribes the ﬁnal state. Given a statement S and a postcondition R, the following
predicates can be deﬁned in terms of the characterization of the computation state
space [22]:

wp(S, R) The set of all states in which the computation under control of S belongs
to the class “ﬁnally R”.

wlp(S, R) The set of all states in which the computation under control of S belongs
to the class “eternal” or “ﬁnally R”.

That is, the weakest precondition wp(S, R) describes the set of all states in which the
statement S can begin execution and terminate with R true, and the weakest liberal
precondition wlp(S, R) is the set of all states in which the statement S can begin
execution and establish R as true if S terminates. In this respect, wp(S, R) establishes
the total correctness of S, and wlp(S, R) establishes the partial correctness of S. The
wp and wlp are called predicate transformers because they take predicate R and,

using the properties listed in Table 2.2, produce a new predicate. An interesting

 

 

 

wp(S, A) E wp(S, true) A wlp(S, A)
wp(S,/1) => “101145, “Al
wp(S, false) 5 false

 

wp(S, A A B) E wp(S, A) A wp(S, B)
wp(S, A V B) => wp(S, A) V wp(S, B)
wp(S,/1 -+ B) => wp(S,/1) -+ wp(S,B)

 

 

 

 

 

 

Table 2.2. Properties of the wp and wlp predicate transformers

l3

characterization of wlp(S, R) describes the set of computations for which “ﬁnally R”
is merely possible [22].

pwlp(S, pH) The set of all states in which there exists a computation under
control of S that belongs to the class “ﬁnally R”.

This use of wlp is contrasted to wlp(S, R) which, recall, is the set states in which the
computation under control of S belongs to the class “eternal” or “ﬁnally R”.

An analogous characterization can be made in terms of the computation state
space that describes initial conditions using the strongest postcondition sp(S, Q) pred-
icate transformer [22].

sp(S, Q) The set of all states in which there exists a computation under
control of S that belongs to the class “initially Q”.

That is, given that Q holds, execution of S results in sp(S, Q) true. It can be proven
that sp(S, Q) and wlp(S, R) are converses of one another [22]. Therefore, given the
Hoare triple Q { S } R we observe that sp(S,Q) and wlp(S, R) have the following

relationship:

QawMSM,
sp(S, Q) => R.

The importance of this property is two-fold. First, it provides a basis for translating
programming statements into formal speciﬁcations. Second, the symmetry of sp and
wlp provides a method for verifying the correctness of a reverse engineering process

that utilizes the properties of sp.

CHAPTER 3

A Translational Approach to

Reverse Engineering

Reverse engineering of program code into formal speciﬁcations facilitates the utiliza-
tion of the beneﬁts of formal methods in projects where formal methods may not
have previously been used. Program annotations in the form of Hoare triples [21]
provide a natural way for presenting the formal speciﬁcation of sequential programs.
In this chapter we present an approach to reverse engineering that uses the properties

of some well deﬁned predicate transformers to specify programs.

3.1 Approach

Chapter 2 introduced the notation Q { S } R and { Q } S { R } to denote partial
and total correctness, respectively. We will use { Q } S { R } to indicate a general
triple of precondition, statement, postcondition.

When an interpretation is needed in a formal context (i.e., partial correctness, to-
tal correctness), one will be explicitly stated. The general approach described herein
for reverse engineering is a top-down, sequential translation (derivation) of imperative

programming language primitives using the properties deﬁned by the strongest post-

14

15

condition predicate transformer sp. The approach is: Given that some precondition
Q is known, a postcondition R is constructed using the properties of sp as applied to

a statement S.

3.2 Model of Programs with Annotations

Program annotations in the form of preconditions and postconditions can be used to
document, in-line, the speciﬁcation of a program. Given a program P with a sequence

of n statements
SliS2i . - - ;S.--1;Sa; . . - ;Sn_1;Sn

a sequence of speciﬁcations can be used to annotate the program such that the spec-
iﬁcation indicates the “state” of the program at a given point of execution. For

instance, the above sequence can be annotated to appear as

{SP0}51;{SP1}52;~-i{SP2}Si—1i{SP3}Si;---i{SPn—2}S -1;{SPn-1}Sn{SPn}

where speciﬁcations are delimited by the “{ )” notation, and S pj is the speciﬁcation
of program P after execution of statement Sj. Alternatively, we can annotate the

program with comments in the following manner:
{CO}SI;{CI}S2i---i{Ci-2}Si-li{Ci-1}Sii-~-;{Cn-2}S -1;{C' -1}Sn{C'n}

where C.- is a comment annotated to the program after statement S.- and before

statement 5,-4.1. We can model a program P with annotations as a 4-tuple
P(V, S, A1,Value1) (3.1)

where V is a set of variables, S is a sequence of statements, A I is a sequence of anno-

tations, and Value, is an interpretation function, where I denotes the interpretation.

16

The model can be extended to accommodate multiple types of annotations, each with

different interpretations. Consider the sequence

{SPuo}{SPuo}Si; {5Pu1}{SPu1}52; - - - ; 55—1; {Spui-1}{Spvi-1}Si; {SPuiH-S'Pm'} - - - t
(3.2)
where S puj and Spa, each denote annotations for a statement 5,- with different inter-

pretations. The appropriate model would appear as
P(V, 5, (Au, Av}, {Valuem Valuev}) (3.3)

where V and S are as before, A, and Au are annotation sequences, and Value, and
Value” deﬁne the interpretations of Au and A”, respectively.
The remainder of this section describes each of the components of the general

program model and provides an interpretation of Value, for reverse engineering.

3.2.1 Variables

Variable decorations are used to indicate a temporal ordering of logical instances of
a variable. In Z, decorations v? and v! are used with a variable v to indicate logical
input and logical output instances, respectively [23]. Other speciﬁcation languages
use prime (i.e. v’, v”, etc.) and vector decorations (‘5) to indicate temporal order-
ings [24]. In our model, a variable is represented using subscripted integer indices (i.e.
v0, 121,. . .), where an ordering is imposed by the index and each v,- represents a logical
instance. There are three types of values that a logical instance can take. The ﬁrst
is as an unconditional expression. In this case, the statement generating the instance
is an assignment statement. The second type of value is a conditional value, where
the value of the logical instance depends on one or more Boolean conditions. Using

disjunction, a single logical instance can be used to describe a value that is dependent

17
on a Boolean condition. Consider the simple statement
if A then x :8 1 else 2 :8 2 ii

We can describe the i“ instance of the variable x after execution of the statement by
stating x.- = l V 2:.- = 2. The ﬁnal type of instance is a range instance. Again a single

logical instance can be used to describe the variable. Consider the sequence

x :8 0;
j :8 O;
a[] :8 A;
n :8 N;

do j <> n ->
if a[j] 8 T -> x,j :8 x + 1,j + 1
a[j] <> T -> j :8 j + 1
fi
od

where we are interested in the speciﬁcation of the variable :1: after execution of the
do-od statement. In this case, a: can take a value anywhere between 0 and N. We can
describe the i‘“ instance of x by stating (31' : 0 Sj S N : x.- = j).

Notationally, a variable v is a sequence with the following form

‘00, U], . . . , 0);, (3.4)

where for 0 S i S lc, v,- represents a logical instance. The size k of a sequence v

is bounded from above by the number of annotations in a program P. For a given

program P( V, S, A1,Value1), V(P) denotes the set of all deﬁned variables,

V(P) = {ulu is a variable with form given by (3.4)} (3.5)

This treatment of variables makes the following assumptions:

1. All variables in V(P) are local to P.

18

This condition ensures that a variable in V(P) is speciﬁed within the correct

scope. The main motivation is to use the program model to specify subprograms

of P.

2. All variables are untyped.

Symbolically this assumption poses no problems. However, in practice, pro-
gramming languages allow mixed type operations with appropriate transforma-

tions.

3. All variables are simple.

Structured types can be represented as simple variables by decomposing the
structures into the appropriate components. Pointers are also excluded from
the class of variables handled in this formalization. Pointers merit a more

rigorous treatment and are beyond of the scope of this work.

3.2.2 Statements

There are ﬁve main types of statements: assignment, alternation, iteration, sequence,
and procedure call. 5 (P) denotes the sequence of statements of a program P. The
ordering of the statement sequence given by S (P) corresponds directly to the order

found in the source text of a program. The sequence is of the form
S(P) = 81,82,...,Sn (3.6)

where n is the number of statements in the text of the program.

19

3.2.3 Annotations

A(P) denotes the sequence of annotations of a program P. The sequence has form

A(P) = a1,a2, . . . ,an+1 (3.7)

where n is the number of statements in the text of the program and n+1 is the number
of annotations. As stated earlier, there can be any number of annotation sequences,
each with a different interpretation. The interpretation function for predicate logic

annotations has the form

Value] : A —+ 2" (3.8)

where the subscript I identiﬁes a particular type of interpretation of the annotation,
and 2" denotes the set of valid predicate logic expressions. The range for A with
respect to Value, changes depending on the interpretation of the annotation. The

notion of annotation interpretation is described in more depth in Section 3.2.5.

3.2.4 Relating Statements and Annotations

A convenience operation in the model is the association of annotations with a given
statement using prec and succ. Symbolically, prec and succ represent precedes and
succeeds and are deﬁned as follows, where (3.9) and (3.10) are function signatures
and (3.11) and (3.12) deﬁne the semantics for prec and succ, respectively. The prec
function returns the annotation preceding statement S and the succ function returns

the annotation succeeding a statement S.

prec : S —+ A (3.9)

succ : S —> A (3.10)

20

where

prec(s,-) E (lg, 13 i S n,n = [SI (3.11)

and

succ(s,) E cg“, 1 S i S n,n = [S] (3.12)

3.2.5 Annotation Interpretations

Annotations can have many interpretations. For instance, suppose signature 3.8 is in-
terpreted as a weakest precondition assertion using predicate logic, denoted Value”.
Then for some statement 3 we have prec(s) = awcc and succ(s) = am“ such that
Valuewp(a,ucc) = R and Valuewp(ap,cc) = wp(s,R). Additionally, recall that many
different annotation sequences can exist for a given program (See Expressions (3.2)
and (3.3)). So, in addition to the wp interpretation for predicate logic, a second
interpretation to a program can be given using a second sequence of annotations.
Allowing different interpretations for annotations faCilitates the use of different spec—
iﬁcation techniques, each of which can provide an alternative perspective of a given

program.

3.3 An Annotation Interpretation based on sp

The previous section described an annotation interpretation based on the semantics
of the weakest precondition predicate transformer wp. Another interpretation for
annotations provides the framework for the reverse engineering techniques described
in the following chapters using the strongest postcondition predicate transformer sp.
When given a statement and a predicate logic expression, sp derives a predicate logic
speciﬁcation of the statement. The formal presentation of the semantics of sp with

respect to primitive programming structures is given in Chapters 4 and 5. Using the

21

deﬁnition of sp we deﬁne an annotation interpretation, denoted Value”, with the

signature given above by Expression (3.8) and semantics as follows:

Value,p(succ(s,-)) E sp(sg, Value,p(pre(s,-))) A Value,p(pre(s,-)),, (3.13)

 

where s.- 6 S (P),1 S i S |S(P)| for some program P. Note that the second conjunct
(underlined) is a propagation of the precondition for a given statement 3;. How-
ever, also note that we subscript the second conjunct with a p to indicate that the
propagated conjunct is subject to a propagation interpretation, named so in order to
distinguish it from annotation interpretations given by Value 1.

The interpretation of Value, as a strongest postcondition annotation provides
the basis of a model for the creation of formal speciﬁcations from program code.
Thus, given some initial annotation a1, each step in the reverse engineering process
involves the derivation of a formal speciﬁcation of some statement using precondition

assertions and a propagated expression that provides context for the derivation.

3.3.1 Translational Propagation

There are two types of propagation that can be used to interpret the second conjunct
of Expression (3.13). The ﬁrst type of propagation is dependent propagation. In
this interpretation, each propagated item is dependent on the value given by the
translation of a statement by sp. The second type of propagation is independent
propagation, where, as one might expect, the propagated item is independent of
translation.

The difference in the interpretations lies in the deﬁnition of the initial precondi—

tion annotation a1 6 A(P), whose value is given by Value(a1). In the dependent

22

interpretation, the initial annotation has the following value

Valtre,p(al) E (Vv : v E V(P) A k = |v| : (v0 = v") A (v; = e) . . . A (v,c = 6)) (3.14)

where vi is the value of instance 12,- and 6 denotes an undeﬁned value. In the depen-
dence interpretation the initial condition of a program is a speciﬁcation of all the
instances of all the variables of a program. When the initial precondition of a pro-
gram is deﬁned as such, then as a formal speciﬁcation is derived for each statement
3, propagation of Value,p(prec(s)) must be modiﬁed to reﬂect the changes to each
instance of a variable.

In the independent interpretation for propagated items the initial annotation a1

has the following value

Valtte,p(al) E (Vv : v E V(P) : v0 = v“). (3.15)

That is, the initial condition for a program is a speciﬁcation of the values of the initial
instances of all variables. This convention allows for the propagation of statement

preconditions without requiring modiﬁcations to the propagated expression Qp.

CHAPTER 4

Primitive Constructs

In this chapter we discuss the derivation of formal speciﬁcations from the primitive
programming constructs of assignment, alternation, and sequences. The Dijkstra
language [25] is used to represent each primitive construct, but the techniques are
applicable to the general class of imperative languages. For each primitive we ﬁrst
describe the semantics of the predicate transformers wlp and sp as they apply to each

primitive and then describe the speciﬁcation derivation using Hoare triples.

4.1 Assignment

An assignment statement has the form 1:8 e; where x is a variable, and e is an
expression. The wlp of an assignment statement is expressed as wlp(xz8e, R) = Rﬁ',
which represents textual substitution, where every occurrence of a: is replaced by the
expression e using the postcondition R[1]. If :1: corresponds to a vector '37 of variables
and e represents a vector B of expressions, then the wlp of the assignment is of the
form Ry , where each y,- is replaced by E;, respectively, in expression R. The sp of an

assignment statement is expressed as
sp(xz8e,Q) = (3v :: Q: A :r = 6:), (4.1)

23

24

where v is the previous value of x, and ‘::’ indicates that the range of the quantiﬁed
variable x0 is not used in the current context.

Recall from Expression (3.13) that we assume a propagation of preconditions in the
construction of formal speciﬁcations and that the initial precondition of a program is
given by Expression (3.15). In terms of specifying programs using variable instances,
sp is expressed as sp(xz8e,Q) = (335-1 :: Q A x,- = eim), where x,- and 13-1 are
the current and previous values of 1, respectively. We make the conjecture that
the removal of the quantiﬁcation for the initial values of a variable is valid if the
precondition Q has a conjunct that speciﬁes the textual substitution. That is, Q: in
Expression (4.1) is a redundant operation if, initially, Q has a conjunct of the form
a: = v. As such, if given the Hoare triple for a statement 3,, with Statement(s,-) =

x:8e and U = prec(s,~-1), the following annotated sequence is created. q

{(x, = X) A U} /* precondition */
x := e;
{(xJ-H = e;1)A (at,- = X) A U} /* postcondition */

which satisﬁes the sp for assignment.

4.2 Alternation

An alternation statement using the Dijkstra language [25] is expressed as

if
B1 —’ S] i

ll Bn -* Sn;
fi;
where B,- —. S; is a guarded command such that S.- is only executed if logical expression

(guard) B.- is true. The wlp for an alternation statement is given by

wlp(IF, R) :3 (Vi: B.- : wlp(S,, R)),

25

where IF represents the alternation statement. The equation states that the necessary
condition to satisfy R, if the alternation statement terminates, is that given 8; is true,
the wlp for each guarded statement S,- with respect to R holds. The sp for alternation

has form

sp(IF,Q) E (3i :: sp(S,-,B.- A Q)), (4.2)

The existential expression can be expanded to be of the form

sp(IF, Q) E (sash 8. A Q) v - - . v sp(S., a. A 0)). (43)

Expression (4.3) illustrates the disjunctive nature of alternation statements where
each disjunct describes the postcondition in terms of both the precondition Q and the
guard-guarded command pairs, given by B.- and S,-. This characterization is intuitive
in that a statement 8.- is only executed if B,- is true, and that only one of 81-, 1 S j S
n, is executed. The translation of alternation statements follows accordingly from
Expression (4.3). Using the Hoare triple notation, a speciﬁcation is constructed as

follows

{Q}
if
Bl —* 31;

iii. —+ 3..-
fi,’
{(3p(31,Bl A Q) V . . . V sp(SnaBn A Q» A QP}

where Qp is a propagation of the precondition of the statement being speciﬁed.

4.3 Sequence

For a given sequence of statements 31;. . . ; Sn, the notion that the postcondition for

some statement S.- is the precondition for some other statement 5;“ is natural. The

26

wlp and sp for sequences follow accordingly. The wlp for sequences is deﬁned in the

following manner
wlp(Sl 332, R) ’3 wlp(SlaU)lp(S2i R))

Conversely, the sp is

3P(S]§SQ,Q) E 31’(32,SP(SI,Q))- (4'4)

In the case of wlp, the set of states for which the sequence 81 ;S; can execute with
R true (if the sequence terminates) is equivalent to the wlp of S; with respect to the
set of states deﬁned by wlp(Sg, R). For sp, the derived state for the sequence 81:83
with respect to the precondition Q is equivalent to the derived postcondition for S;
with respect to a precondition given by sp(S,, Q). The Hoare triple formulation and

construction process is as follows

{Q}
31;

{ 81451, Q) A Q10}
323

{ sp(32,3p(31.Q)) A Q.» }-

CHAPTER 5

Iterative and Procedural

Constructs

The programming constructs of assignment, alternation, and sequence can be com-
bined to produce straight-line programs (programs without iteration or recursion).
The introduction of iteration and recursion into programs provides for more power-
ful computation ability. However, constructing formal speciﬁcations of iterative and
recursive programs can be problematic, even for the human speciﬁer. This section
discusses the formal speciﬁcation of iteration and procedural abstractions without
recursion. We deviate from our previous convention of providing the formalisms for
wlp and sp for each construct and use an operational deﬁnition of how speciﬁcations
are constructed. This approach is necessitated by the fact that the formalisms for the
wlp and sp for iteration are deﬁned in terms of recursive functions [1, 22] that are in

general difﬁcult to practically apply.

5.1 Iteration

Iteration allows for the repetitive application of a statement. Iteration, using the

Dijkstra language, has the form

27

28

do
81 —* 31;

II B. -> 8.:
od;

In general, the iteration statement may contain any number of guarded commands of
the form 8,- -—> S,-, such that the loop is executed as long as any guard B.- is true.

In the context of iteration, a bound function determines the upper bound on the
number of iterations still to be performed on the loop. An invariant is a predicate that
is true before and after each iteration of a loop. The problem of constructing formal
speciﬁcations of iteration statements is difﬁcult because the bound functions and the
invariants must be identiﬁed. However, for a partial correctness model of execution,
concerns of boundedness and termination fall outside of the interpretation, and thus
can be relaxed.

Gries deﬁnes guidelines for developing loops through the identiﬁcation of loop
invariants [1]. The methods of deleting a conjunct, replacing a constant by a variable,
enlarging the range of a variable, and adding a disjunct can provide insight into
the automated construction of a speciﬁcation from program code. For instance, a
loop written using the method of replacing a constant by a variable must identify
the upper (lower) bound of an incremented (decremented) variable. Furthermore,
determining the statements that ensure progress towards termination is facilitated
by the properties associated with this class of loops. Figure 5.1 gives the steps for
constructing a speciﬁcation for a loop that was developed using the replace a constant
by a variable strategy for the loop invariant. Although these characteristics are more
total correctness in nature, the insight provided by identifying these properties in
loops aids in specifying a loop using partial correctness interpretations.

When no automated strategy can be applied to a loop, the domain expert‘is

 

‘A domain expert is a maintenance engineer who is familiar with the subject system.

29

prompted for the proper speciﬁcation of the statement. The following items are then

identiﬁed in order to conﬁrm that the speciﬁcation of the loop is complete:

a invariant (P): an expression describing the conditions prior to entry and upon
exit of the iterative structure.

0 guards (B): Boolean expressions that restrict the entry into the loop. Execution
of each guarded command, B,- -> S.- terminates with P true, so that P is an
invariant of the loop.

{P /\ B;}S;{P}, for I S i S n

When none of the guards is true and the invariant is true, then the postcondition
of the loop should be satisﬁed (P A “BB -—. R, where BB = B1 V . . . V B" and
R is the postcondition).

 

1. The abstraction algorithm begins with the template for a quantiﬁed expression
of the form . . . .
(Qt : range(z) : expression(z)),

where Q represents one of the quantiﬁer symbols V, 3, E.

2. The quantiﬁed variable(s) are determined by examining the identiﬁers occurring
in guards Bj.

3. The ranges of the quantiﬁed variables are determined by ﬁnding statements
occurring prior to entry into the loop that assign values to incremented (decre-
mented) variables and their occurrences in the guards.

4. For each guarded command, the corresponding statement list includes state-
ments that ensure progress towards termination; the postcondition for the re-
maining statements constitutes expression(i).

5. The bound function becomes the difference between the upper (lower) bound
for a variable that is being incremented (decremented) and its value during loop
iterations.

Figure 5.1. Steps for abstracting the effect of iteration statements

 

30
5.2 Procedural Abstractions without Recursion

This section describes the construction of formal speciﬁcations from code containing
the use of non-recursive procedural abstractions. A procedure declaration can be

represented using the following notation

proc p ( value if; value-result 1;; result 2 );

{PM body ){Q}

where f, y, and '2' represent the value, value—result, and result parameters for the
procedure, respectively. The notation ( body ) represents one or more statements
making up the “procedure”, while {P} and {Q} are the precondition and postcondi-

tion, respectively. The syntactic signature of a procedure appears as

proc p : (inputJype)' —> (outputJype)" (5.1)

where the Kleene star (*) indicates zero or more repetitions of the preceding
unit, input-type denotes the name of an input parameter to the procedure p, and
outputjype denotes the name of an output parameter of procedure p. A speciﬁcation
of a procedure can be constructed to be of the form

{ P: U }

proc p : E0 —> E1

(body)

{ Qi sp(body,U) AU? }
where ED is one or more input parameter types with attribute value or value-result,
and E1 is one or more output parameter types with attribute value-result or result.
The postcondition for the body of the procedure, sp (body, U), is constructed using the
previously deﬁned guidelines for assignment, alternation, and iteration as applied to
the statements of the procedure body.

Gries deﬁnes a theorem for specifying the effects of a procedure call [1] using a

total correctness model of execution. Given a procedure declaration of the above

31

form, the following condition holds [1]
{PRT : Pg? A (vine :: Q5" => 12%)} p(ﬁ,b,E) {a} (5.2)

for a procedure call p(a, b, E), where a, b, and E represent the actual parameters of type
value, value-result, and result, respectively. Local variables of procedure p used
to compute value-result and result parameters are represented using i and 5', re-
spectively. Informally, the condition states that PRT must hold before the execution
of procedure p in order to satisfy R. In addition, PRT states that the precondition
to procedure p must hold for the parameters passed to the procedure and that the
postcondition for procedure p implies R for each value-result and result parameter.
The formulation in terms of a partial correctness model of execution is identical, as-
suming that the procedure is straight-line, non-recursive, and terminates. Using this
theorem for the procedure call, an abstraction of the effects of a procedure call can be
derived using a speciﬁcation of the procedure declaration. That is, the construction
of a formal speciﬁcation from a procedure call can be performed by inlining a pro-
cedure call and using the strongest postcondition for assignment. A procedure call
p(a,b,E) can be represented by the Pascal—like block [1] found in Figure 5.2, where
(body) comprises the statements of the procedure declaration for p. By representing
a procedure call in this manner, parameter binding can be achieved through multi-
ple assignment statements and a postcondition R can be established by using the sp
for assignment. Removal of a procedural abstraction allows for the extension of the
notion of a straight-line program to include non-recursive straight-line procedures.
Making the appropriate sp substitutions, we can annotate the code sequence from

Figure 5.2 to appear as follows:

32

{PR }
if :8E,b; __ __ _
{ 19:35,? - PR§§A§=5§§A§=S;§}
(body)
{Q }
7,: was;
.—- "i ___—z ___‘y,z

$933734 9;; AY-“vz Az—sz}
,6 ='y.2;

.-_ 8.3 —_.SE __.SE
{ R. 319,¢ QRE? Ab— YE? Ac— 23¢}

where 6', B, 7, E, if, and ¢ are the initial values of i, 7 (before execution of the
procedure body), y (after execution of the procedure body), 2, b, and E, respectively.
Using the conjecture of Section 4.1 regarding assignments as well as the facts that
formal and actual result parameters have no initial values, and local variables are
used to compute the values of the value-result parameters, the above sequence
can be simpliﬁed using the semantics of sp for assignments to obtain the following

annotated code sequence:

{PR}

$595.3;
{P.-PRAi=a/\y=3}
(body)

{Q}

Yiiz'ﬁﬁ.
{QR.-QAy=ﬁ¥/\z=v¥}
bizrii;

{12: QRAE=7AE=2}

where Q is derived using sp((body), P).

33

 

begin

w:
NW"

FA-JU a-hn
' A
' any
Hap-4m

end

begin
declare it, y, 2, ﬁ, 7;

{ PR }
5,1);

"x37 :8

{P}

A
u 8‘
a.
Q
I wav
Isl

wOIQNIQ
some.
«swat

Q
‘0 I
8..

2 Ph‘ a‘|~h'~<|.-h
--v I

and

Figure 5.2. Removal of procedure call p(a,li,e) abstraction

 

CHAPTER 6

Moving towards

Ob ject- Orientation

The main characteristics of object-oriented analysis, design, and programming are
encapsulation, modularity, and inheritance. Well-deﬁned object interfaces and a clear
behavioral knowledge about an object can facilitate software reuse. In recent years,
there has been an increase in the development and use of ob ject-oriented programming
languages, including C++, Smalltalk, and Modula-3. Re-engineering efforts have
been focused, in part, to the conversion of software from the imperative paradigm
to the object-oriented paradigm [4, 14, 26]. This chapter discusses an approach for
identifying objects embedded in programs and restructuring formal speciﬁcations to

reﬂect that identiﬁcation.

6.1 Identiﬁcation of Classes Using Formal Speci-

ﬁcations

The methods described in Chapters 4 and 5 produce speciﬁcations that have proper-
ties that are amenable to the identiﬁcation of candidate objects in program code. An

object is a self-contained module that includes both the data and procedures (meth-

34

 

 

 

35

ods) that operate on that data. An object can be considered to be an abstract data
type (ADT), that is, a user-deﬁned data type with a speciﬁc set of allowable oper-
ations. A class is a collection of objects that have common use [27]. This section
describes an aggregation heuristic for identifying objects based on the examination

of procedure signatures.

6.1 .1 Guidelines

Using the above deﬁnition of an object, a set of guidelines for identifying objects is

as follows:

1. Construct a list of all data structures contained in a program. These data
structures should not include primitive types.

2. For each data structure contained in the list of program data structures, group
together the operations that refer to the data structure as input in the syntactic
signature speciﬁcation of the procedure (See Equation 5.1 for the format of
signatures). This grouping along with the associated data structure is an object
candidate.

3. In case of conflicts, (i.e., a procedure contains two non-primitive data structures
as input in the signature) one of three actions can be taken

(a) Determine if one data structure is composed of one or more occurrences of
the other data structure. This step is performed by checking the deﬁnitions
of data structures.

(b) Determine whether the output of the procedure excludes either data struc-
ture. If it can be shown that the procedure does not modify a data struc-
ture then associate the procedure with the data structure that is modiﬁed.

(c) In the cases where no determination can be made as to how to associate
a procedure with a respective data structure, query the domain expert on
the appropriate association.

The process of identifying candidate objects is facilitated by the format of the
formal speciﬁcations for procedures. It is emphasized that the datatypes identified
by this technique are only candidate objects. However, once the object deﬁnition has
been constructed, the formal nature of the speciﬁcation facilitates formal reasoning

about the objects and can aid in the validation of candidate objects as true objects.

36

6.1.2 Speciﬁcation Language

An object speciﬁcation is constructed by collecting the procedure speciﬁcations asso-
ciated to a data structure and declaring the data structure to be a candidate object.
The BNF grammar for the language used to specify potential object classes is given
in Figure 6.1. The deﬁnition uses the roman font to describe non-terminals; bold type
deﬁnes keywords; the Kleene star (*) is used to denote one or more repetitions of the
preceding unit; square brackets (I ]) indicate optional items; parentheses (‘()’) indi-

cate groupings. The non-terminal, expression, represents a predicate logic expression.

 

component = type typemame: has ( method )*
has = has ( datajescription )*
data.description =
variable : type.name
method = method methodmame: (type_name)* —. (typemame)*
in((variable: type-name)*)
local((variable: type.name)*)
out((variable: type-name)*)
{ pre: expression }
{ post: expression }
expression = true
I false
I ( expression )
I - expression
I expression A expression
I expression V expression
I expression => expression
I expression 4:) expression
I ( V variable : type :: expression )
I ( 3 variable : type 2: expression )
I predicatemame [( term (, term)*)]
I

def .
term = expression

term 2 variable
I functionmame [( term (, term)* )]

Table 6.1. Grammar for object speciﬁcations

 

37

6.1.3 Example Identiﬁcation of Candidate Objects

The ﬁrst step of the analysis is to identify the non-primitive data structures of the pro-
gram. This process is performed by examining the signatures of the procedure spec-
iﬁcations and extracting the unique non-primitive data structure names. Analysis of
the program QeS (shown in Figure 6.1) produces the speciﬁcations given in Figure 6.2.
Analysis of the formal speciﬁcations identiﬁes non-primitive data structures named
St, On, and elementtype. In continuing the analysis of the data structure St, the
procedures inns, nts, tp, and po are grouped with St through examination of the spec-
iﬁcation signatures. Procedure pu is initially grouped with St and elementtype, but
further analysis leads to a strict association of pa to St since the input elementtype

is not modiﬁed by pu. The subsequent object deﬁnition for St appears in Figure 6.3.

38

 

progren Qe8(output);

conet
neulength 8 60;

procedure nnq ( ver 0 : Qu );
begin
Q.f :8 nenlength;

Q.r :8 nenlength;

tVPO end:
8t 8 record
t = integer; function utq ( var Q : Qu ) : booleen;
e : urruy[1..nuxlengthJ of elementtype 5.51.
end; if Q.r 8 Q.f then
Qu 8 record ntq :8 true
e : urruyIi..nnnlengthJ of elenenttype; .1..
f. r : integer ntq :8 false
884: end;
elementtype : Qu;
function fr ( var Q : Qu ) : ole-enttype;

procedure nun ( vur 8 : 8t ):
begin
8.t :8 lenlength + 1;
end;
function nte ( 8 : 8t ) : booleun;
begin
if 8.t > nunlength then
ntn :8 true
elee
nte :8 fulee
end;
function tp ( vur 8 : 8t ) : eleuenttype;

begin
if nte(8) then
urite1n(’error’);
elee
tp :8 8.e[8.top]
end;
procedure po ( ver 8 : 8t );
begin
if ntn(3) then
urite1n('error’);
elee
8.t :8 8.t # 1;
end;
procedure pu ( x : St
begin
if 8.t 8 1 then
urite1n(’error’);
elee
begin
8.t :8 8.t - 1;
8.e[8.t] :8 3
end;
end;

elementtype; vur S :

Figure 6.1.

begin
if ntq(Q) then
urite1n(’error’)
elee
fr :- q..[q.r];
end;

procedure en ( n : elementtype; vur Q : Qu );

begin
if Q.r 8 nunlength then
0.x :8 1
else

Q.r :8 Q.r + 1;
if 0.x 8 Q.f then

urite1n(’error’)
elee

Q.e[Q.r] :8 n;

end;

procedure de ( vur Q : Qu );
begin
if ntq(Q) then
urite1n(’error’)

else
if Q.f 8 nexlength then
Q.f :8 1
else
Q.f :8 Q.f + 1;
end;
var
ex-e : St;
en_q : Qu;
begin

(0 QeS Body o)

end

Example Pascal Code

 

39

 

proc rune : St _. St

in( S : St )

out( S : St )

{ pro: true }

( poet: S.t = maxlength + l A true }

proc tp : St —o St x clementtype

in( S : St )

out( 8 : St, tp : elementtype )

{ pre: domain(S) )

{ poet: (((S.t ) maxlength) A (mix = true))A
sp(writeln( ’error ’), true))v
((«(SJ > maxlength) A (mts = false))A
(tp = S.e[S.t])) A domain(S) }

proc po : St _. St.

in( S : St )

out( S : St )

{ pre: domain(S) }

{ poet: (((S.t > maxlength) A (mts = true))A
ep(writeln( ’error ’), true))V
((-(S.t > maxlength) A (mix = false))A
(SJ, = 3.80 + l))/\
domain(S) }

proc pu : St x elementtype —+ St

in( S : St, x : elementtype)

out( 8 : St )

{ pre: domain(S) A domain(x) }

{ post: (34 = l A sp(writeln( 'error’), true)) v
(8(S.t = 1)A(S.t1 = S.to A S.e[S.11 = x))
Adomain(S) A domain(x) }

proc mnqz Qu _. Qu
in( Q = Q“ )
out( a = on )
{ pre: true }
{ poet: QJ = maxlengthA
Q.r = maxlength A true }

proc mtq : Qu -» Qu x boolean

in( Q : Qu )

out( Q : Qu, mtq : boolean)

{ pre: domain(Q) }

{ poet: (Q.r = Q.f A mtq = true)v
(-(Q.r = Q.f) A mtq = false)
A domain(Q) }

proc fr : Qu _. Qu x elementtype

in( Q : Qu )

out( Q : Qu , fr : elementtypc)

{ pro: domain(Q) }

{ poet: ((Q.r = Q.f A mtq = true)A
(sp(writeln( ’error ’), true)))v
((Q.r = Q.f A mtq = true)A
(fr = Q.e[Q.f])) A domain(Q) }

proc en : clementtype x Qu -¢ Qu
in( x : elementtype , Q : Qu )
out( Q : Qu )
{ pre: domain(Q) A domain(x) }
{ poet: ((Q.ro = maxlength A Q.r1 = 1) V
(“(Qmo = maxlength)A
(QJ’; = QJ'o + I)))/\
(((Q.r = QJ) A ep(writeln( ’error’), true)) V
("(QJ‘ = QJ) A (Q..e[Qr] = 1:)”
A domain(Q) }

proc dc : Qu —' Qu

in( Q : Qu )

out( Q : Qu )

{ pre: domain(Q) }

{ post: ((Q.r = Q.’ A mtq = truc)A
(sp(writeln( ’error ’), true)))v
((H(Q.r = QJ) A mtq = false)A
(((Q.f = maxlength) A QJ = 1)V
(ﬁ(Q.fo = maxlength)A
(or. = are + 1))» A domino) }

 

Figure 6.2. A Formal Speciﬁcation of Example Code in Figure 6.1

40

 

typo St:
hue
t : integer;
o : may of olemonttype;

method mm 2 St -. St
in( S : St )
out( S : St )
{ pro: true }
{ pout: S.t = maxlength + 1 A true }
method rnts : St -. boolean
in( S : St )
out( mtu : boolcan )
{ pro: domain(S) }
{ poet:(((S.t > maxlength) A (mts = true))V
(H(S.t > maxlength) A (min = false)” A domain(S) }
method tp : St -v St x elementtype
in( S : St )
out( 8 : St, tp : olementtype)
{ pro: domain(S) }
{ poet: (((S.t > maxlength) A (mix = true)) A sp(writeln( ’error’), true))v
((-v(S.t > maxlength) A (mts = false)) A (tp = S.e[S.t])) A domain($) }
method po : St —0 St
in( S : St )
out( 8 : St )
{ pro: domain(S) }
{ poet: (((S.t > maxlength) A (mts = true)) A sp(writeln( ’error’), true))v
((-(S.t ) maxlength) A (mts = false)) A (8.1; = S.to + 1)) A domain(S) }
method pu : St x olementtypo -o St
in( S : St, x : olernonttype )
out( S : St )
{ pro: domain(S) A domain(x) }
{ poet: (SJ = 1 A sp(writeln( ’error’),true))v
(H(S.t = 1) A (St; = S.to A S.e[S.t1 = x)) A domain(S) }

Figure 6.3. Speciﬁcation of Object St

 

CHAPTER 7

Modeling the Representation and
Abstraction of Imperative

Languages

This chapter presents the rules and representations that embody the application of
the translational approach to reverse engineering for the Pascal language. For ease
in understanding the representations, two complementary description mechanisms
are used. First, a diagramming technique, known as the Object-Modeling Tech-
nique (OMT) [28], is used to represent the object-oriented relationships of program-
ming constructs? Second, the formal speciﬁcation language Larch Shared Language
(LSL) [29] is used to formally specify the abstract data types that support the auto-

mated construction of formal speciﬁcations from program code.

 

‘Note that the OMT approach includes the use of object models, data ﬂow diagrams, and state-
charts. In this discussion, OMT refers exclusively to the use of object models.

41

42

7 .1 Properties of Block Structured Languages

The fundamental concepts of block structured languages originated with the devel-
opment of ALGOL 60 [30] and have since been incorporated into the design of many
programming languages, such as Ada [31] and Pascal [32]. Programs written us-
ing block structured languages are organized into nested blocks, where each block
introduces a new local referencing environment. Because of their widespread and
longstanding use, block structured languages have become a common object of study
by maintenance engineers [33]. In order to analyze and maintain programs written
using block structured languages, an understanding of the rules that govern block
structured languages is necessary. The remainder of this section describes the funda-
mental concepts underlying block structured languages, where Pascal is used as an

example of imperative (procedural) languages.

7.1.1 Static Scope Rules

Static scope rules provide the deﬁnition of the context of a variable declaration within
a program. In order to determine the intended behavior of a subject system, it
is important that the rules that deﬁne the scope of variables within a system are
understood. When abstracting a formal speciﬁcation from program code, the scope
of a variable and its potential values play a major role in expressing the effects of

statements. The static scope rules for block structured languages are as follows [30]:

1. The declarations at the beginning of any block deﬁne the local identiﬁers for a
block.

2. Any identiﬁers referenced within a block for which no local declaration exists re
for to the immediate parental block for a declaration. If no declaration is located
in the parent block then the next ancestor is referenced. This identiﬁcation pro-
cess continues until the declaration is found (success) or no declaration is found
(i.e., the top-most environment is reached and no declaration is present).

43

3. Declarations in nested child blocks are completely hidden from parent blocks
and cannot be referenced by parents or ancestors.

4. Named subblocks in the form of subprograms are members of the parent’s local
referencing environment.

7.1.2 Statements

Imperative programming language constructs generally consist of four different types
of basic statements regardless of whether the language is block structured or not.
The programming constructs are assignment, alternation, iteration, and sequence. It
is important to note that alternation, iteration, and sequence statements can contain
one or more nested statements within the body of the statement. For instance, an

alternation statement in Pascal can appear as

if (a 8 b) then

begin
1 :8 q;
y :8 r;
end;

where the if statement contains a begin-end statement and, additionally, the
begin-end statement contains two assignment statements. This property will be

referred to as the nesting property.

7 .1.3 Procedural Abstractions

A concern in the use of subprograms is parameter association. Knowing the meth-
ods for binding formal and actual parameters as well as determining the parameter
transmission schemes (i.e., value, value-result) of a language are necessary prerequi-
site tasks for the correct abstraction of formal speciﬁcations from program code. The

rules for parameter association in Pascal are as follows [34]:

l. The number of formal and actual parameters for a given function or subprogram
must be identical.

44
2. The types of the parameters must be the same. Actual parameters of type
integer can be coerced to type real.

3. The actual parameters associated with var formal parameters must be variables.
They cannot be constants or expressions.

7 .2 Modeling

In order to facilitate the automated construction of formal speciﬁcations from Pascal
programs, a model of the analysis of properties contained in Section 7.1 has been de-
veloped. This section describes the formal and graphical models used to the represent

Pascal language and to abstract formal speciﬁcations from Pascal programs.

7.2.1 Variables, Symbols, and Types

Section 7.1 discussed properties (including static scope rules) relevant to the reverse
engineering of program code written using imperative block structured languages.
Static‘scope rules deﬁne the methods for determining the context of an identiﬁer
within a program. As is common in compiler construction [35], a hierarchical approach
has been developed for both determining scope and recording histories of identiﬁers
within a program. This approach is implemented in the form of an abstract data type
(ADT) called SymbolTablo.

A SymbolTable is an ADT containing a set of SymbolTabloEloments (re-
ferred to as the symbols set) and a link to a parent referencing environment. Each
SymbolTableElement object in the symbols set represents an identiﬁer, a table of
the values of the instances for that identiﬁer (history), and an index used to reference
the last instance of the identiﬁer in the history table. The SymbolTableElement
objects contained in the symbols set are uniquely identiﬁed within the set by the
name given to each identiﬁer. The notation convention for referring to the name,

history table, and index of the last instance added to the history table for a given

45

identiﬁer i will be i.name, i.history, and i.index, respectively. The formal speciﬁca-
tion of the SymbolTableEloment ADT, written in the Larch Shared Language, is
given in Figure 7.2.1 and is used to deﬁne a method for determining the effects of

certain programming constructs. In the speciﬁcation, SymTabElement is the name

 

SymTabElement : trait

includes Integer

introduces
% % new deﬁnes a new element
% %
new : Sir —r Ste
% % name gives the name of the identiﬁer
% %
name : Ste -+ Str
% % addHist adds history information to an identiﬁer’s table
% %
addllist : Ste, Int, Val -o Ste
% % retrieve gets a speciﬁc instance of an identiﬁer
% %
retrieve : Ste, Int —. Val
% % lastlndex gives the index to the last item for the
% % identiﬁer in the table
% %
lastlndex : Ste -+ Int
% % 6 determines membership in a list
% %
.. E .: Int, Ste —* Baal

asserts V ste,t : Ste, v : Val, str : Str, i, it : Int
“(i E new(str))
t6 addHist(t, if, v) z: (i = if) V (i E t)
retrieve(addHist(t, i, v), il) ==
ifi = it then v else retrieve(t, il)
lastlndex(new(str)) == 0
lastlndex(addHist(t, i, v)) z:
if (i 6 t) then lastlndex(t) else lastlndex“) + l
name(new(str)) == str
name(addHist(ste, i, v)) =2 name(ste)

Figure 7.1. Formal Speciﬁcation of SymbolTableElemont using LSL

 

of the SymbolTableElement ADT, the includes keyword indicates that the def-

46

inition for the Integer type is used, the introduces keyword delimits the signature
of the operators of the ADT, and the asserts keyword introduces the semantics of
the operators, including the types of the arguments for the operators. The formal
deﬁnitions for the operations of SymbolTableElement are used in the next section
in the discussion of the abstraction process.

Figure 7.2 contains a graphical depiction of the SymbolTable ADT using the
OMT notation that shows SymbolTable as an aggregate of zero or more Symbol-
TabloEloments and zero or one SymbolTables, where aggregation is symbolically
represented by a diamond, the “zero or more” relation is represented by the ﬁlled

circle, and the “zero or one” relation is represented by the hollow circle.

 

 

[SymbolTable

T

 

SymbolTableElement

Legend

0-— Aggregate Relation

—0 One to Zero or One Relation
—. One to Zero or More Relation

 

 

 

 

Figure 7.2. OMT Object Model of SymbolTable

 

47

7 .2.2 Statements, Rules, and Formal Speciﬁcations

Section 7.1.2 deﬁned the nesting property of programming statements. For abstraction
purposes, it becomes useful to recognize that nested statements are contained, to some
extent, within a single statement. For instance, consider the code sequence given in
Figure 7.3. This sequence has eleven separate statements including the begin-end
sequences. At the highest level of abstraction this sequence has one statement, the
outer begin-end statement. The next level of granularity contains three statements,
the assignment statement at line 2, the if statement beginning at line 3, and the
assignment statement at line 14. The if statement beginning at line 4 contains two

statements in the form of an assignment statement and a while statement.

 

1 begin

2 n :8 y;

3 if D then

4 if L then

5 n :8 u;

6 else

7 uhile i <> n

8 begin

9 x:Ix+t[1];
10 i :8 i + 1
11 end
12 else
13 n :8 e;

14 q :8 n;

15 end

Figure 7.3. Example Sequence of Pascal Code

 

An object model is a representation in OMT that deﬁnes the hierarchy and static
relationships of object classes. An instance object model depicts a snapshot of the dy-
namic relationships between objects in a system. Figure 7.4 contains the object model
of the Statement ADT, where the triangle symbol denotes inheritance. As such, the

diagram explicitly states that Sequence, Assignment, Alternation, Iteration,

48

 

 

Stucment

.1... m... “.1... .1... [i
_l in :3

‘ c‘l‘t‘ﬁs‘n

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Boolean l
Exmsion ,

 

 

Legend

BiennibclueedA

1* OuetoOnea'MoroRellia

 

 

Figure 7.4. Object Model of Statements

 

and ProcedureCall are Statements. A GuardodCommand, also depicted in the
diagram, is a construct that is only executed when a given boolean expression (called
the guard) is true. Furthermore, the diagram depicts the nesting property through
the use of the aggregation symbol. Notice that Sequences can contain zero or more
Statements, and Alternation and Iteration statements can contain one or more
GuardedCommands. Figure 7.5 shows the object instance model for the code se-
quence of Figure 7.3. Each oval represents a particular instance of an object class
and the parenthesized words identify the type of the instance. The remaining label
gives information about the instance.

An implementation of the rules given by Expressions (4.1), (4.2) and (4.4) rely

heavily on the assumption that nested programming statements can be abstracted

49

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ns-xetlil in 1 e 1

 

 

Figure 7.5. Object Instance Model for code sequence of Figure 7.3

 

into single statements with autonomous contexts that depend only on sequence and
procedure calls. In order to support the abstraction of speciﬁcations from program-
ming constructs, the notion of a referencing environment local to a programming
statement is introduced. It is important to note that in order to abstract away the
details underlying an implementation, it becomes necessary to treat programming
statements as “mini-programs” with the assumption that programming statements

are single entry, single exit. Currently, this assumption effectively excludes goto state-

50

ments but facilitates the hierarchical management of the abstraction process using
SymbolTable objects, where one SymbolTable is used per nested sequence.

One of the main purposes of the SymbolTable ADT is to support the con-
struction of speciﬁcations. Recall that the sp of the assignment statement is
sp(x:8oxpr,Q) = (3x0 :: :0 A x = ego). In order for the conjunctive expression

:0 A x = ego to be satisﬁed, expressions must be partially evaluated. The history
capabilities of a SymbolTable directly supports the evaluation of expressions by pro-
viding information about the values of various instances of a given identiﬁer and by
providing a means for storing the effects of an assignment statement. Expressions are
evaluated using textual substitution of the value of the last instance of each identiﬁer
contained in the expression. For example, an expression typically found in a program
might be as follows

q+r—s+t (TU
An expression can be processed such that the identiﬁers in the expression are replaced
with the representation of the last instance of each identiﬁer. Expression (7.1), for
example, can be translated into an internal representation such that each identiﬁer
in (7.1) is replaced with the corresponding internal representation of the last instance
for the identiﬁer. If it is assumed that the last instance for q, r, s, and t are q;, r3,

so, and t1, respectively, then Expression (7.1) would be translated into the following

‘11 + 7‘3 — so + ii, (72)

‘h instance of an identiﬁer. For example, in

where the subscripts represent the i
Expression (7.2), r3 represents the third instance of identiﬁer r. Once the initial
substitution of the last instance for each identiﬁer is completed, the representation
of an instance of an identiﬁer is replaced with the actual value of the instance. For

illustration purposes, assume that the values of instances ql, r3, so, and t; are 5, 9,

so, and x1, respectively. Then Expression (7.2) would be translated into the following

51

5+9—30-I-231. (7.3)

This process repeats until all terms of the expression are either initial instances (e.g.
so), conditional instances, (that is, the instance depends on the evaluation of a number
of logical conditions, a concept explained in the following section), or constant values.

Consider the code sequence in Figure 7.6(a). Using a bottom-up approach to spec-

 

o. (oxo-xsuo)
1. if (e 8 b) then
2. if (c 8 d) then
0. (0 n0 8 X h 0 t) 3. begin
1 if I then 4. x :8 p;
2 ithheu 4.1 (0 :18p200)
3 begin 5. n :8 q
4. n :8 p; 5.1 (8 x2 8 q t 0 8)
5. n :8 q 6. end
6 end 7. else
7 else 8. begin
8 begin 9. n :8 r;
9 n :8 r; 9.1 (O xi 8 r I U 0)
10. n :8 x + q; 10. n :8 n + q;
11. n :8 n - p; 10.1 (8 (s2 8 xi + q) I 0 e)
12. end 11. n :8 n - p;
13. else 11.1 (8 (x3 8 n2 - p) I U 8)
14. x :8 s 12. end
18. (0 ??? 0) 13. else
14. n :8 s

14.1 (8 81 8 s 2 U 8)
15. (0 ??? e)

(a) (b)

Figure 7.6. Code sequences partially annotated with speciﬁcations

 

ify the sequence requires that the deepest level of nesting be speciﬁed ﬁrst, followed
by the next to deepest level, and so on. This process yields the annotated code shown
in Figure 7.6(b), where speciﬁcations are delimited with Pascal comment notation,
‘ (8 8) ’. In our notation we use ‘h’ and ‘ l ’ to denote logical ‘and’ and ‘or’, respectively.
Combining the speciﬁcations of lines 4.1, 5.1, 9.1, 10.1, 11.1, and 14.1 can often lead

to an incoherent speciﬁcation for line 15 due to the multiple instance subscripts for

52

identiﬁer x. As mentioned earlier, each level of nesting in our approach is managed by
using separate referencing environments through the use of SymbolTable objects.
Using this approach we deﬁne the notion of dirty sets and last instances, which will
aid in the deﬁnition of methods for correctly combining the speciﬁcations of nested
statements. These methods can then be used to determine the speciﬁcation such as

that required for line 15.

Deﬁnition 1 (Last Instance)
Assuming that a begin-end sequence is single entry, single exit, then a
last instance is the most recent value of any identiﬁer accessible during

the context of the sequence.

In some cases, the last instance may be the same as the initial instance, while in
other cases the last instance is different. Given that a SymbolTable object, called
symtab, is used to manage an arbitrary level of nesting, we deﬁne the last_instance of

an identiﬁer id to be

retrieve(id,lastIndex(id)) id 6 symtab.symbols

lastinstance(id, symtab) E { undeﬁned otherwise

where retrieve(id,lastIndex(id)) represents the value obtained by referencing the last
item in the history table for identiﬁer id, and the identiﬁer id must be accessible in

the current environment.

Deﬁnition 2 (Dirty Set)
A dirty set is a construct that can be used to determine whether an assign-
ment statement has been performed on an identiﬁer in a nested sequence

of statements.

In addition to being useful for combining speciﬁcations of nested statements, a dirty
set also aids in simplifying speciﬁcations. Formally, a dirty set is deﬁned in the

following manner

53

lastinstance(y, orig) #
lastinstance(y,target) ’

dirty_set(orig,target) E {y e orig I
where orig and target are SymbolTablos of the containing and contained nested
statements, respectively.

The method for abstracting speciﬁcations from Pascal alternation statements uses
three SymbolTablos. The main SymbolTable is used to manage the entire alterna-
tion construct while two auxiliary SymbolTablos are used to manage the identiﬁers
of the guarded command constructs, assuming that the alternation statement is com-
posed of two guarded commands, that is, one guarded command corresponds to the
if case, and the other corresponds to the else case. (For case statements, an auxil-
iary SymbolTable object would be used to manage each separate case.) By using
dirty sets, identiﬁers that are modiﬁed within the scope of the guarded commands
contained within an alternation statement can be determined. That is, a set of all
identiﬁers that are conditional are speciﬁed formally as follows

I = {dirty_set(main, auxl) U dirty.set(main, aux2)},
where main represents the main SymbolTable, and aux] and aux; refer to the Sym-
bolTablos used to manage the guarded commands of the alternation statement.
Upon determining the identiﬁers that ﬁt this criterion, the main SymbolTable is
updated in order to satisfy the following condition:

(Vi: i E I : (31'sz aux; :i.name = j.nameA

lastinstance(j, auxl) = lastinstance(i, orig)) V

(3j :j E aux; : i.name = j.name A

lastinstance(j, auxg) = lastinstance(i, orig)))
which states that all identiﬁers in the dirty set are in one of the auxiliary Symbol-
Tables.
Finally, with knowledge about the values of last instances of identiﬁers that would

result from the execution of nested subblocks, a formal speciﬁcation can be completed.

54

Consider again the sequence of code given in Figure 7.6. The ﬁnal version of the
code with annotated speciﬁcations of the alternation statements using the notation
id{nost}instanco for identiﬁers, where id is the identiﬁer in question, nest is the
level of nesting, and instance is the instance number within the current context, is

given in Figure 7.7.

 

1. if (e 8 b) then

2. if (c 8 d) then

3. begin

4. n :8 p;

4.1 (0 (1(311 8 p0) I 0 t)

6. n :8 q;

5.1 (0 (si3l2 8 qO) I 0 t)

6. end

6.1 (0 ((xtan - p0) s (11(3): - (10)) s u .)

7. else

8. begin

9. n :8 r;

9.1 (8 (si3)1 8 r0) I 0 0)
10. n :8 (n + q);
10.1 (0 (11(3): - to + «10) s u t)
11. n :8 (u - p);
11.1 (8 (si3l3 8 ((r0 + qO) - p0)) I 0 0)
12. end
12.1 (0 ((s{3}1 8 r0) I (1(312 8 r0 r qO) I (31313 8 r0 e qO - p0)) I U 0)
12.2 (0 (((co 8 d0) I ((x(3}1 8 p0) I (still 8 q0))) I
12.3 ( not(c0 - «10) e (txtsn - r0) s (11(3): - r0 + qO)
12.4 I (1(111 8 r0 4 qo - p0)))) I U 8)
13. else
14. n :8 s;
14.1 (0 (x(1)1 8 s0) I 0 t)
15. (0 (((eo - 1.0) e (((co . «10) 1 «1(3): - p0) s (xtoh - q0))) I
15.1 (not(c0 8 d0) I ((x{3}1 8 r0) I (3(3l2 8 r0 + qO) I
15.2 (1(011 8 (r0 + qO - p0)))))) I
15.3 ((not(u0 8 b0)) I (x{0)1 8 sO))) I 0 8)

Figure 7.7. Example with speciﬁcations annotated by the AUTOSPEC [3] system

 

7 .3 Example

The following example demonstrates the use of four major programming constructs

described in this paper (assignment, alternation, sequence, and procedure call) along

55

with the application of the translation rules for abstracting formal speciﬁcations from
code. The program, shown in Figures 7.8(a) and 7.8(b), has four procedures, including
three different implementations of “swap”. AUTOSPBC [3, 15, 13] is a tool that we
have developed to support the derivational approach to the reverse engineering of
formal speciﬁcations from program code. Figures 7.9 and 7.10 depict the output of
AUTOSPEC when applied to the program code given in Figures 7.8(a) and 7.8(b),
respectively, where the notation id{scopo}instanco is used to indicate a variable id
with scope deﬁned by the referencing environment for scope. The instance identiﬁer
is used to provide an ordering of the assignments to a variable. The scope identiﬁer
has two purposes. When scope is an integer, it indicates the level of nesting within the
current program or procedure. When scope is an identiﬁer, it provides information
about variables speciﬁed in a different context.

Of particular interest are the speciﬁcations for the swap procedures given in Fig-
ure 7.9 named suapa and svapb. The implementation of suapa uses an arithmetic
based algorithm for swapping the values of two Variables, while swapb uses a tem-
porary variable algorithm. Although each implementation of the swap operation is
different, the code in each procedure effectively produces the same results, a prop-
erty appropriately captured by the respective speciﬁcations for suapa and svapb.
In addition, Figure 7.10 shows the formal speciﬁcation of the funnyswap procedure.
The parameter passing scheme used in this procedure is pass by value, a property
reﬂected by the speciﬁcation of the effects of the call to funnyswap in Figure 7.10.
In the speciﬁcation, no variables local to the scope of the call to funnysvap are af-
fected, and thus the speciﬁcation shows no change in variable values. The procedure
Findl‘laxliin provides another example of the speciﬁcation of alternation statements,
with the speciﬁcation of the procedure shown in Figure 7.9, and the effect of the call
to the procedure given in Figure 7.10. This example illustrates the use of textual

substitution in the speciﬁcation of the eﬂects of a call to a procedure that computes

56

the maximum and minimum of two variables.

 

progran Iainn ( input. output );

var
a, b. c. Largest. Seallest : real;
procedure Findlanlin( lunOne, lunfnozreal;
var Ian. linzreal );

begin
if lunone > luntuo then
begin
Ian :8 luIDne;
Iia :8 lunfuo;
end
else procedure funnysnap( lzinteger; Yzinteger );
begin
Nan :8 lunTno; var
lin :8 lunnne; tenp : integer;
end begin
end; tenp :8 X;
l :8 Y;
procedure euapa( var Xzinteger; var Yzinteger ); Y :8 tenp
end;
begin
7 :8 Y 0 I; begin
1 :8 Y - X; a :8 6;
Y :8 Y - l; b :8 10;
end; suapa(a.b);
euapb(a.b);
procedure seapb( var Xzinteger; var Yzinteger ); funnysuap(a,b);

VDI'

tenp : integer;

(a)

Findlanﬂin(a,b,Largest.Snallest);
c :8 Largest;
end.

(*3)

Figure 7.8. Example Pascal program

 

57

 

program lanlin( input, output );

var
a. b, c. Largest, Smallest : real;

procedure Findlanlin( lumDne, lumfuozreal; var Ian, linzreal );

begin
if (lun0ne > lumTIo) then
begin
lax :8 lumone;
(8 lan{2)1 8 lumOneO I U 8)
lin :8 lumTuo;
(0 Nin{2}1 8 lumTuoO I 0 8)
end
(8 (lan(2}1 8 lunOneO I lin{2}1 8 lumTuoO) I U 0)
else
begin
Ian :8 lamina;
(0 lan(2}1 8 lumTuoO I U 8)
Nin :8 lumDne;
(0 Nini2ll 8 lumOneO I U 8)
end
(8 (lan{2}1 8 lumTuoO I Nin{2}1 8 lumOneO) I U 8)
(8 (((lumOneO > lumTuoO) I
(IIIIOI1 8 IanneO I lini011 8 IIITIOO)) l
(not(lum0ne0 > lent-00) I
(laxf0)1 8 lumfuoo I Iin{0}1 8 lum0ne0))) I U 8)
end
(8 (((lumOneo > lumTuoO) I
(II8{O)1 8 Inloneo I Iini0}1 8 IuITI00)) I
(not(lum0ne0 > lumTuoO) I
(laxiol1 8 IumTuoO I lin{0}1 8 lumOne0))) I U 0)

procedure suapa( var X:integer; var Yzinteger );

begin

7 :8 (Y + X);

(0 (7(011 8 ('0 + 10)) I U a)

X :8 (Y - X);

(0 (1(0): - ((10 + 10) - 10)) a u a)

Y :8 (Y - X);

(8 (YIOI2 8 ((70 + X0) - ((70 + 10) - 10))) I 0 t)
and
(8 (Y{O}2 8 to I 1(0): 8 10 I 1(0}1 8 to + 10) I 0 8)

procedure suapb( var lzinteger; var Yzinteger );

var
temp : integer;

begin
temp :8 X;
(0 (temp{0}1 - 10) a u o)
X :8 Y;
(0 (H0)! 8 to) a 0 0)
Y :8 temp;
(— (1(0): - 10) a u 0)
end
(0 (7(0)1 8 10 I 2(0)1 8 Y0 I tenp{0}1 8 10) I U 0)

Figure 7.9. Output created by applying AUTOSPEC to example

 

 

58

 

procedure funnysuap( l:integer; Yzinteger );

var
temp : integer;

begin
temp :8 X;
(8 (temp{0}1 8 10) I U 8)
l :8 Y;
(8 (XiOI1 8 Y0) I 0 8)
Y :8 temp;
(8 (Yi0)1 8 20) I 0 e)
and
(8 (rte): . so a 1(0): - to a temp{0}1 - 10) a u .)

begin
a :8 5;
(8 I(0}1 8 5 I U 8)
b :8 10;
(8 b{0}1 8 10 I U 8)
suapa(a,b)
(8 (DIOIZ 8 5 I
(a(0)2 8 10 I
(Yienapa)2 8 5 I (stuapa}1 8 10 I Y(suapa}1 8 15)))) I 0 8)
suapb(a,b)
(8 (5(0): 8 10 a
(a(0)3 8 5 I
(Y{suapb}1 8 10 I (steapbli 8 5 I tempfsuapbli 8 10)))) I U 8)
funnysuap(a,b)
(8 (Y{O}1 8 5 I lifunnysuapli 8 10 I temp{funnysuap}1 8 5) I U 8)
Findlanﬂin(a,b,Largest,Snallest)
(8 (3mallest{0}1 8 Nianindlaxlinli I
Largest{0}1 8 Iaaniudhanlinli I
(((s > 10) a
(NasfFindlanﬂinli 8 5 I liniPindNanliali 8 10)) I
(not(5 > 10) I
(lan(Findlanlin)1 8 10 I lianindNasﬂinli 8 5)))) I U 8)
c :8 Largest;
(8 chI1 8 Ianffindlanlinll I U 8)
end
(8 ((CIOI1 8 IIIIFindlanlinII) I
( Smallest{0}1 8 lin{Findlanlin}1 I Largest{0}1 8 Ian{FindIanlin}1 I
(((5 > 10) I
(laxiFindlanlinl1 8 5 I IintFindlanﬂiII1 8 10)) l
(not(5 > 10) I
(laxifindlaxﬂinli 8 10 I liniFindﬂanlinli 8 5)))) I
( Y{funnysuap)1 8 5 I X{funnysuap)1 8 10 I temp{funnysuap}1 8 5 ) I
( 5(0}3 8 10 1
IN” 8 5 a
(Yieuapbli 8 10 I stuapbll 8 5 I tempfsuapbli 8 10)) I
( bi0}2 8 s a
a{0}2 8 10 I
(stuapal2 8 5 I stuapal1 8 10 I Y{suapa)1 8 15)) I
(b{0}1 8 10 I a(0)1 8 5))))) I U 8)

Figure 7.10. Output created by applying AUTOSPEC to example (cont.)

 

CHAPTER 8

Related Work

Many approaches have been suggested for reverse engineering program code. This
chapter discusses reverse engineering efforts that use formal and semi-formal ap-
proaches for the reverse engineering of program code into procedural and object-

oriented formal speciﬁcations.

8.1 Formal Approaches

This section describes reverse engineering methods that take advantage of logical and
mathematical properties of programs for the abstraction of program function as well

as in the representation of program speciﬁcations.

8.1.1 Function Abstraction

It has been shown that program ﬂowcharts can be decomposed into basic constructs
that represent sequence, alternation, and iteration [36]. Using these concepts, a
strategy outlining an approach for abstracting function behaviour from programs has
been suggested [37]. The strategy takes advantage of the mathematical properties
of structured programs in order to to abstract program function. The end result

is the determination of a precise representation of a program’s function. At the

59

 

60

heart of the approach is the concept of proper programs, prime programs, and small
primes. A proper program is a ﬂowchart program with a single entry and single
exit. An irreducible proper program is known as a prime program, and a small prime
consists of the most atomic of program statements including sequence, alternation,
and iteration.

Using the approach deﬁned by Linger et al. a speciﬁcation is constructed by de-
termining the function of each small prime contained in a program.

Overall, the approach to abstracting program function proceeds as follows:

1. Restructure the object program into a structured format.
2. Analyze the program data in order to localize the scope of variables.

3. Compute the function of each prime.

The restructuring step is essential since the rest of the approach relies on structured
programming constructs as the basis for function abstraction. Special emphasis is
put on the second step in order to reduce the difﬁculty of abstracting the speciﬁca-
tions from the code. The ﬁnal step involves a number of algorithms centered around
identifying the function of each program prime.

Sequence statements, at the most basic level, consist of assignment statements of

the form:

where x is a variable and e is some expression. A trace-table technique is used for
determining the function of sequence statements. The purpose of the trace-table is

to record the effect of an assignment on the state space of all variables in the same

1 1'. .._-.—

 

ﬁud- "

 

61

scope of the statement. For example, consider the following sequence of code:

X := X+Y;
Y = X—Y,
X := X—Y;

A trace-table containing a column for each variable within the current scope of exe-
cution can be constructed using a subscripted notation (i.e. X ,-) to denote the value
of a variable X in each separate state i. The trace table for the sequence shown above

would appear as [37]:

 

Statement Value of X Value of Y
Initially X0 Y0
X2=X+Y X1=X0+Y0 Yi=Yo

Y2=X-Y X2=X1=X0+Yo Y2=X1-Y1=Xo
X:=X-—Y X3=X2-Y2=Yo Y3=Y2=Xo

 

 

Analysis of the trace-table allows for the determination that the sequence of code
performs a swap of the values of X and Y.
Alternation abstraction is based on analyzing the guarded command conditions

of the statements. An alternative statement is one that takes the form:

IF ETHEN CI ELSE C2 ENDIF

where B is a guard and Cu is a guarded command. The speciﬁcation created from
abstracting the function from alternative statements involves the use of the conditions
in the statement and the speciﬁcation of the subsequent guarded statements. For

example, consider the following statement

IF b THEN g ELSE h END-IF,

 

62

where b is a guard, and g and h are statements. A speciﬁcation is constructed with

the following form

([b] = true -> [gIIIb] = false —+ [h]),

which states that whenever [b] is true, perform statement g, and whenever [b] is false,
perform statement h.

Linger et al. describe a number of strategies for understanding the function of
iterative constructs. One approach is through the use of program slicing. Program
slicing is a technique whereby the effect of a loop on an isolated variable is speciﬁed
for each of the variables in the loop. Upon completion, the speciﬁcations are com-
bined to in order to determine a speciﬁcation for the entire loop. Another proposed
approach for understanding loops uses pattern matching with the help of superpat-
terns. Superpatterns are templates that outline common formats of programming
constructs, particularly loops. This proceeds by performing program slicing on a

loop and comparing the results with a number of standard superpatterns.

8.1.2 A Knowledge Based Transformational System

Transformational systems have been widely used for forward engineering. The appli-
cation of transformational systems to reverse engineering has been proposed by Ward
et al. [38, 39, 40]. The approach relies on proving that two versions of a program are
equivalent. A kernel language that includes both imperative programming constructs
and general speciﬁcations as part of a single language is deﬁned. By using a kernel
or Wide Spectrum Language (WSL) that is mathematically based, the problem of ab—
stracting a speciﬁcation of a program can be reduced to ﬁrst transforming a program
into another form (a speciﬁcation) and then proving the equivalence of the original
and the new form.

The Maintainer’s Assistant is a system that incorporates the notions of Wide

 

 

63

Spectrum Languages and a Knowledge Base for performing transformations of pro—
gram code into speciﬁcations. The rest of this section outlines the major aspects of

the Maintainer’s Assistant.

Wide Spectrum Languages

Software development classically proceeds through multiple stages ranging from re-
quirements speciﬁcation to program development. Figure 8.1 illustrates this con-
cept [39]. Typically each stage uses a different language to express the problem.
For instance, natural language may be used for requirements stage So while a formal
speciﬁcation language may be used for speciﬁcation stage Sn as can be seen in F ig-
ure 8.2 [39]. A WSL provides a uniﬁed language for use in all stages starting from So

and ending in P. The WSL encorporates both executable imperative constructs as

SOHSIH52H...HSnHP

Figure 8.1. Stages of Development

SLOHSLlHSLzH...HSLnHPL

Figure 8.2. Languages for Development

well as non-executable speciﬁcations. By providing a strong mathematical basis for a

 

 

64

WSL, proving a transformation can be performed by proving the equivalence of two

formulae [39].

Knowledge Base

To assist in performing transformations of programs into speciﬁcations the Main-
tainer’s Assistant encorporates the use of a knowledge base [40]. This knowledge
base is centered around identifying programs plans. Program plans can be divided
into two classes, General Program Plans and Program Class Knowledge. Use of a
general plan knowledge base aids in identifying commonly occuring activities in pro-
grams. Program Class knowledge is used for speciﬁc types of activities. For instance,
program class knowledge could exist for maintaining a C compiler. This knowledge
would include information on design decisions typically found in developing a com-

piler.

Tbansformations

Transformation has been used in many instances for forward engineering tasks [41].
Transformation is the action of substituting part of a program with a new part that
is meaning preserving (i.e., the new part computes the same thing as the old part).
There are typically three steps to transformations. The ﬁrst is a determination of
where to apply a transformation, next is a determination of what transformations to
apply, and ﬁnally creation of a new program part to replace the matched pattern.
Transformations can take two forms, vertical and lateral. Vertical transformations
with respect to program development are used to move from an abstract representa-
tion to a concrete one while lateral transformations are used to specify equivalence
between two expressions at the same level of abstraction.

There are several types of transformations supported by the Maintainer’s Assis-

tant. Two approaches are used to perform transformations; the catalog approach,

 

 

65

and the generative set approach. The catalog approach uses the knowledge base de-
scribed in the previous section. The generative set approach uses a set of elementary
transformations to build more complex transformations.

The Maintainer’s Assistant has the ability to perform the following types of trans-

formations:

Syntactic Tl'ansformations: This type of transformation is synonymous with re-
structuring. ,

Action Systems: These transformations are used to simplify languages that allow
goto statements.

Non-Terminal Recursion: Used to remove non-terminal recursion by using a stack
of “postponed obligations”.

Local Semantic 'D'ansformations: These are transformations for changing the
ﬂow of computation.

8.2 Ob ject-Orientation

Object-Oriented Analysis, Design, and Programming have become increasingly pop-
ular in recent years. The encapsulation, inheritance, and reuse properties of Object-
Oriented Programming has made the approach attractive to many software devel-
opers. A desire to migrate existing systems from an imperative paradigm to an
ob ject-oriented paradigm has prompted the development of a number of reverse engi-
neering methods for facilitating the migration. This section describes two approaches

for abstracting objects from imperative programs.

8.2.1 Object Identiﬁcation

One approach to identifying objects in procedural code has been proposed by Liu
and Wilde [14]. Identiﬁcation is centered around the characterization of candidate

objects based on common routines, data types, and data items. These properties are

 

.__._..--“A A ——
I
-.
l

 

66

arranged as a tuple of the following format:

CandidateObject = (F, T, D)

where F is a set of routines, T is a set of types, and D is a set of data objects.
Candidate objects may not be completely disjoint which means that there may be an
overlap between data objects and routines. Additionally, there is no clear distinction
between an object and an object class. This deﬁnition of a candidate object differs
from the classical notion of objects but is necessary.

Candidate objects can overlap since rejection would limit the domain and exclude
the manifestations of objects whose only failing is that they were produced using
poor programming practices. The lack of a clear distinction between classes and class
objects is due to the fact that it may be easier in some cases to identify object classes
and then identify instances of the classes. In other cases it may be easier to reverse
this process by identifying the instances of the classes, and then the classes.

There are two methods of object identiﬁcation. The ﬁrst uses global and static
data and relates operations that operate on that data. The second uses data types

and relates routines that use the data types as parameters or return values.

Global Based Object Identiﬁcation

The method for identifying candidate objects based on global and static data uses

the following steps [14]:

1. For each global variable x, let P(x) be the set of routines that use x.

2. Using each P(x) as a vertex, construct a graph G = (V,E) such that:

V = {P(x) I x is shared by at least two routines}
E = {P($1)P($2) I (”(3001712) # Q}

where P($1)P($2) is an edge joining two vertices P(xl) and P(l‘g).

 

 

67

3. Construct a candidate object tuple (F, T, D) from each strongly connected com-
ponent C = (VC,E'C) in G such that:

F = UP(x)ch P(J’)
T 9
D = UP(x)€V¢{x}

The major drawback to using the global based object identiﬁcation method is that

the candidate objects that are found can often be very large since all functions that
reference a global variable will be included in the F component of the candidate object
tuple. Other procedures will be needed to constrain the search so that acceptable

objects are identiﬁed.

Typos Based Object Identiﬁcation

The second type of object ﬁnding method is a types-based algorithm. The steps to

this method are as follows [14]:

1. (Ordering) A topological order of all types must be deﬁned such that:

(a) If type x is used to deﬁne type y, then x is a sub-type of y and y is a
supertype of x, denoted by x < y

(b) If x < y and y < 2 then x < z (Transitivity).

2. (Initial Classiﬁcation) A relationship matrix R(F, T) is constructed where rows
are routines and columns are types of formal parameters and return values.
Values of R(F, T) are either 0 or 1 with 1 indicating that for an entry R(f,t), t
is a subtype of the type of a formal parameter or of a return value of routine f.

3. (Classiﬁcation Adjustment) For each row f in the matrix R, set R(f,t) to 0 if
there are any other supertypes of t in the same row that has been marked 1.
This handles the fact that a supertype dominates the role of classiﬁcation.

4. (Grouping) Collect routines into groups based on the sharing of types. Routines
f1 and f; are related if there exists a type t such that R( f1, t) = R(f2,t) = l.

5. (Construction) Construct a candidate object (F, T,D) from each group where:

F = {fl the routine [is a member of the group}
T = {t I R(f,t) = I for somefin F}
D = a

68

As was the case with the previous method, the types based object ﬁnding method

can identify classes that are too large.

8.2.2 REDO - Objects

REDO (Restructuring, Maintenance, Validation and Documentation of Software Sys-
tems) is an Espirit II project whose objective is to improve applications by making
them more maintainable through the use of reverse engineering techniques. The ap-
proach in reverse engineering of COBOL involve the development of general guidelines
for the process of deriving objects and speciﬁcations from program code as well as pro-
viding a framework for formally reasoning about objects [4, 42, 43]. This approach
also involves the identiﬁcation of objects as well as the systematic speciﬁcation of
classes using an extension to the Z notation, known as Z++.

The approach involves three stages deﬁned as follows

1. Translation of COBOL to UNIFORM, an intermediate language

2. Creation of outline objects through the use of data ﬂow diagrams. Code is
partitioned into single-entry, single-exit functions called phases. These phases

are abstracted into functions.

3. Simplifying transforms are applied to the abstracted functions and the objects
identiﬁed in the previous stage. The functions are incorporated into the objects

and a formal speciﬁcation using Z and Z++ is created.

'D'anslation from COBOL to UNIFORM

The translation to UNIFORM is performed in order to add extra information about

the subject system and to eliminate redundant or obsolete programming structures.

 

69

Higher Level Abstractions

A technique known as phasing is used to for identifying ﬂows of information from data
structure to data structure. For example, consider the code contained in Figure 8.3 [4].

The phasing process identiﬁes the following phases:

 

OPEN INPUT filo-1;
PERFORM process file-1;

OPEN OUTPUT file-2;

OPEN OUTPUT filo-3;

OPEN INPUT filo-4;
PERFORM process filo-4;

CLOSE filo-1;

CLOSE file-2;

CLOSE filo-3;

CLOSE file-4;

Figure 8.3. COBOL File Operations [4]

 

e ﬁle-1, ﬁle-4 —> ﬁle-2, ﬁle-3
e ﬁle-4 —-> ﬁle-2, ﬁle-3

e ﬁle-4 —-) ﬁle-3

These phases identify the possible ﬂows of information from the data structures on
left hand side to data structures on the right hand side. Identifying phases in a
program provides insight to the existence of objects. A technique for abstracting the
functionality of program phases is used to reduce programming structures into two

essential constructs [44]: functional composition and conditional expressions.

70

Simpliﬁcation and speciﬁcation

Once data types, the attributes of the data types, and the operations of the data types
have been abstracted from code they are formalized by generating a speciﬁcation in
a formal notation. The Z++ is an extension of the Z notation that adds object-
oriented properties to the speciﬁcation language. A Z++ schema consists of four
distinct sections for describing attributes (OWNS), predicates of the properties of the
internal state of a class object (IN VARIA N T), types of operations available to a class
(OPERATIONS), and deﬁnitions of the operations available to a class (ACTIONS).

The use of a formal notation provides advantages over the approach described
in Section 8.2.1 by allowing for the formal reasoning about the objects once the

speciﬁcation has been created.

8.3 Comparison of Approaches

The approach to reverse engineering described in this thesis uses a translational tech-
nique based on the semantics of the strongest postcondition predicate transformer.
This differs from the Functional Abstraction approach of Linger et al. in that their
approach is informal and assumes that the combination of speciﬁcations constructed
for small primes will result in a formal speciﬁcation for a whole program.

The Maintainer’s Assistant is a tool that uses a combination of transformation
and knowledge based approaches. This differs greatly from the approach presented in
this thesis in that much of the burden of speciﬁcation construction relies on choosing
transformations to apply to a program written using a WSL. The potential number of
transformations needed to construct a speciﬁcation for a program can be large, even
for small programs.

The approach suggested by Liu and Wilde for identifying objects in program code

provides the basis for the approach described in this thesis. Our approach, however,

 

 

71

differs in that we use formal speciﬁcations as a means for identifying objects, and the
level of granularity differs greatly due to the fact that we consider structured types
only.

The approach taken by the REDO project for identifying objects is tightly coupled
to COBOL and UNIFORM. Their use of formal speciﬁcations in the form of Z and

Z ++ is, however, attractive because the formal notations allow for formal reasoning.

 

CHAPTER 9

Conclusions and Future

Investigations

The level of abstraction of speciﬁcation constructed using the techniques described
in this thesis are at the “as-built” level, that is, the speciﬁcations contain implemen-
tation speciﬁc information. For straight-line programs (programs without iteration
or recursion) the techniques described herein are complete in that a fully automated
construction of a formal speciﬁcation from program code is achievable. As such,
automated techniques for verifying the correctness of straight-line programs can be
facilitated.

The Object Modeling Technique (OMT) is a method for modeling system require-
ments through the use of three complementary diagramming notations. The OMT
notation is attractive because it provides an easy to use, visually-oriented notation
for capturing system requirements and other types of high level information about
a system. In practice, the lack of formalism in the notations can be problematic.
However, a formalization of OMT in terms of algebraic speciﬁcations has recently
been developed [45] for describing the states and state transitions of a system.

Future investigations will focus on three areas. First, a method of reverse en-

gineering that encompasses all major facets of imperative programming constructs,

72

73

including iteration and recursion will be developed. Second, methods for constructing
higher level abstractions from lower level abstractions will be investigated. Finally, a
rigorous technique for re—engineering speciﬁcations from the imperative programming
paradigm to the object-oriented programming paradigm will be created.

Iteration and recursive procedure calls pose potential difﬁculties since we need
to construct a postcondition with respect to bound functions, 100p invariants, and
termination conditions. The investigations into this component will involve extending
the existing rules for assignment, alternation, iteration, and procedure calls to include
a generalized procedure for handling iteration and recursion.

“As-built” speciﬁcations have the property of being too tightly coupled to the pro-
grams that they describe. That is, they suffer from implementation bias. The second
component of this research is an investigation into creating more abstract representa-
tions of the “as-built” speciﬁcations. Diagrams, in the form of statecharts, data ﬂow
diagrams, and object diagrams, provide an informal speciﬁcation of systems and can
be constructed both manually and automatically. OMT is a notation that uses each
of these diagramming techniques. A formal treatment of OMT in terms of algebraic
speciﬁcations coupled with the constraints provided by “as-built” speciﬁcations will
be used to facilitate the task of abstracting higher level speciﬁcations from the lower
level “as-built” speciﬁcations. Additionally, this integration of techniques will provide
the basis for constructing high level requirements in terms of object-oriented models
and algebraic speciﬁcations.

A prototype to support the construction of “as-built” speciﬁcations will be devel-
oped. As each subsequent component of the proposed investigations is performed (i.e.
construction of higher levels of abstraction and creation of object-oriented models),
the prototype will evolve to reﬂect the new capabilities. The results of the investiga-
tions will be encapsulated in the form of a prototype that will support the methods for

the partial automation of the construction of object-oriented speciﬁcations by way of

 

74

a shift in paradigms. That is, the main target is to develop a tool that will aid in the
construction of algebraic speciﬁcations from programs. This will involve using “as-
built” speciﬁcations as constraints for identifying objects embedded in speciﬁcations
of imperative program code and specifying objects using an algebraic speciﬁcation
language. That is, we will be performing the re—engineering of formal speciﬁcations
for an imperative system into a formal speciﬁcation for an object-oriented system.
Using the integration described above for OMT and the formal strategy for reverse
engineering, a set of transformations will be developed for reconstituting the high
level speciﬁcations for the original imperative system into a high level speciﬁcation

for an object-oriented system.

BIBLIOGRAPHY

BIBLIOGRAPHY

[l] D. Gries, The Science of Programming. Springer-Verlag, 1981.

[2] E. Byrne, “A Conceptual Foundation for Software Re-engineering,” in Proceed-
ings for the Conference on Software Maintenance, pp. 226—235, IEEE, 1992.

[3] B. H. Cheng and G. C. Gannod, “Abstraction of Formal Speciﬁcations from
Program Code,” in Proceedings for the IEEE 3rd International Conference on
Tools for Artiﬁcial Intelligence, pp. 125—128, IEEE, 1991.

[4] H. Haughton and K. Lano, “Objects Revisited,” in Proceedings for the Confer-
ence on Software Maintenance, pp. 152—161, IEEE, 1991.

[5] N. G. Leveson and C. S. Turner, “An Investigation of the Therac-25 Accidents,”
IEEE Computer, pp. 18-41, July 1993.

[6] V. S. Flor, “Ruling’s Dicta Causes Uproar,” The National Law Journal, July
1991.

[7] J. M. Wing, “A Speciﬁer’s Introduction to Formal Methods,” IEEE Computer,
vol. 23, pp. 8—24, September 1990.

[8] S. L. Gerhart, “Applications of formal methods: Developing virtuoso software,”
IEEE Software, vol. 7, pp. 7—10, September 1990.

[9] N. G. Leveson, “Formal Methods in Software Engineering,” IEEE Transactions
on Software Engineering, vol. 16, pp. 929—930, September 1990.

[10] R. A. Kemmerer, “Integrating Formal Methods into the Development Process,”
IEEE Software, vol. 7, pp. 37—50, September 1990.

[11] A. Hall, “Seven myths of formal methods,” IEEE Software, vol. 7, pp. 11—19,
September 1990.

[12] B. H. Cheng, “Synthesis of Procedural Abstractions from Formal Speciﬁcations,”
in Proc. of COMPSAC’9I, pp. 149—154, September 1991.

75

 

 

76

[13] G. C. Gannod and B. H. Cheng, “Facilitating the Maintenance of Safety-Critical
Systems Using Formal Methods,” The International Journal of Software Engi-
neering and Knowledge Engineering, vol. 4, no. 2, 1994.

[14] S. Liu and N. Wilde, “Identifying Objects in a Conventional Procedural Lan-
guage: An Example of Data Design Recovery,” in Proceedings for the Conference
on Software Maintenance, IEEE, 1990.

[15] G. C. Gannod and B. H. Cheng, “A Two Phase Approach to Reverse Engineering
Using Formal Methods,” Lecture Notes in Computer Science: Formal Methods
in Programming and Their Applications, vol. 735, pp. 335—348, July 1993.

[16] F. L. Bauer, “Software Engineering,” in Information Processing 71, p. 530, North
Holland, 1972.

[17] R. S. Pressman, Software Engineering A Practitioner’s Approach. McGraw-Hill,
1992.

[18] W. M. Osborne and E. J. Chikofsky, “Fitting pieces to the maintenance puzzle,”
IEEE Software, vol. 7, pp. 11-12, January 1990.

[19] E. J. Chikofsky and J. H. Cross II, “Reverse Engineering and Design Recovery:
A Taxonomy,” IEEE Software, vol. 7, pp. 13-17, January 1990.

[20] E. J. Byrne and D. A. Gustafson, “A Software Re-engineering Process Model,”
in COMPSAC, ACM, 1992.

[21] C. A. R. Hoare, “An axiomatic basis for computer programming,” Communica-
tions of the ACM, vol. 12, pp. 576-580, October 1969.

[22] E. W. Dijkstra and C. S. Scholten, Predicate Calculus and Program Semantics.
Springer-Verlag, 1990.

[23] J. Spivey, The Z Notation: A Reference Manual. Prentice Hall, 1987.

[24] C. B. Jones, Systematic Software Development Using VDM. Prentice Hall In-
ternational Series in Computer Science, Prentice Hall International (UK) Ltd.,
second ed., 1990.

[25] E. W. Dijkstra, A Discipline of Programming. Prentice Hall, 1976.

[26] F. Lindstrom, “Re-engineering of old systems to an ob ject-oriented architecture,”

in OOPSLA, ACM, 1991.

77

[27] A. Winblad, S. Edwards, and D. King, Object-Oriented Software. Addison-
Wesley, 1990.

[28] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-
Oriented Modeling and Design. Englewood Cliffs, New Jersey: Prentice Hall,
1991.

[29] J. Guttag and J. Horning, Larch: Languages and Tools for Formal Specification.
Springer-Verlag, 1993.

[30] T. Pratt, Programming Languages: Design and Implementation. Prentice-Hall,
1984.

[31] J. Ichbiah, “Rationale for the Design of the Ada Programming Language,” SIG-
PLAN Notices, vol. 14, 1979.

[32] N. Wirth, “The programming language Pascal,” Acta Informatica, vol. 1, pp. 35-
63, 1971.

[33] P. Benedusi, A. Cimitile, and U. DeCarlini, “A Reverse Engineering Methodology
to Reconstruct Hierarchical Data Flow Diagrams for Software Maintenance,” in
Proceedings for the Conference on Software Maintenance, pp. 180—189, IEEE,
1989.

[34] S. Leestma and L. N yhoff, Pascal Programming and Problem Solving. Macmillan,
second ed., 1987.

[35] A. Aho, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools.
Addison Wesley, 1986.

[36] R. Linger, H. Mills, and B. Witt, Structured Programming: Theory and Practice.
Addison-Wesley, 1979.

[37] P. Hausler, M. Pleszkock, R. Linger, and A. Hevner, “Using Function Abstraction
to Understand Program Behavior,” IEEE Software, vol. 7, pp. 55—63, January
1990.

[38] M. Ward, F. Calliss, and M. Munro, “The Maintainer’s Assistant,” in Proceedings
for the Conference on Software Maintenance, IEEE, 1989.

[39] T. Bull, “An Introduction to the WSL Program Transformer,” in Proceedings
for the Conference on Software Maintenance, pp. 242-250, IEEE, 1990.

78

[40] F. Calliss, M. Khalil, M. Munro, and M. Ward, “A Knowledge-Based System for
Software Maintenance,” in Proceedings for the Conference on Software Mainte-
nance, pp. 319—324, IEEE, 1988.

[41] D. R. Smith, “KIDS: A Semi-automatic Program Development System,” IEEE
Transactions on Software Engineering, vol. 16, pp. 1024—1043, September 1990.

[42] K. Lano, “Formal Approaches to Reverse-Engineering,” Tech. Rep. 2487—TN-
PRC—1067, Oxford University, March 1991.

[43] K. Lano, “Test Results for the Reverse-Engineering Tool Set,” Tech. Rep. 2487-
TN-PRG-1074, Oxford University, September 1991.

[44] K. Lano and H. Haughton, “Extracting Design and Functionality from Code,”
in Proceedings for the 5th International Workshop on Computer Aided Software
Engineering, pp. 74—82, IEEE, 1992.

[45] R. H. Bourdeau and B. H. Cheng, “A Formal Semantics for the Integration of
Object and Dynamic Models,” Tech. Rep. CPS-93-32, Michigan State University,
December 1993.

 

MICHIGAN STATE UNIV. LIBRARIES
IIIIIIIIIIIII]IIIIIIIIIIIIIIIII
31293010220261