This is to certify that the

dissertation entitled

INTEGRATING INFORMAL AND FORMAL TECHNIQUES TO
REVERSE ENGINEER IMPERATIVE PROGRAMS

presented by
GERALD CATOLICO GANNOD

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Computer Science

 

Major professor

Date 8/ 31/24?

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution 0-12771

 


INTEGRATING INFORMAL AND FORMAL TECHNIQUES TO
REVERSE ENGINEER IMPERATIVE PROGRAMS

By

Gerald Catolico Gannod

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1998


ABSTRACT
INTEGRATING INFORMAL AND FORMAL TECHNIQUES TO
REVERSE ENGINEER IMPERATIVE PROGRAMS
By

Gerald Catolico Gannod

Many well-documented computer failures have been attributed to software. Some of
the most notable incidents include the catastrophic failures of the Therac-25 [1] and the
Ariane 5 spacecraft [2]. A commonly overlooked aspect of these failures is that
both resulted from an improper reengineering of software from one version to
another.

The failure to correctly analyze software in both the Therac-25 and Ariane 5 resulted
in catastrophic events that led to loss of life and property. These examples vividly illustrate
the need for sophisticated and systematic methods for maintaining software and
understanding its functionality.

Reverse engineering of program code is the process of examining components
and component interrelationships in order to construct a high-level abstraction of an
implementation [3]. Reengineering is the process of examination, understanding, and
alteration of a system with the intent of implementing the system in a new form [3].
Software reengineering is considered to be a better solution for handling legacy code
as opposed to developing software from the original requirements. Since much of the
functionality of the existing software has been achieved over a period of time, it must be
preserved for many reasons, including providing continuity to current users of the software.

This research focuses on three primary contributions to the areas of software
engineering and software maintenance. First, we have developed a technique for the
construction of as-built formal speciﬁcations from program code using the strongest
postcondition predicate transformer. Second, we have developed a formal technique for
introducing abstraction into as-built speciﬁcations with the intent of obtaining design level
specifications. Third, we have used the formal technique to support program understanding
and software reuse activities.

Copyright by
Gerald Catolico Gannod
1998

To my parents

ACKNOWLEDGMENTS

This work was supported in part by the NASA Graduate Student Research Fellowship
NGT-70376 and the National Science Foundation grants CCR-9633391, CCR-9407318,
CCR-9209873, and CDA-9312389.

First, I would like to thank my family for always providing support and for reminding
me that I will never be as “smart” as I think I am. Several people at the Jet Propulsion
Laboratory provided assistance and valuable insight during my visits to the lab. Among
these people are John Kelly, Ron Slusser, Martin Feather, Lan Tran, Richard Santiago,
Rick Covington, Al Nikora, and Dorothy Huffman. In addition, many of the participants of
the NASA Formal Methods Working Group provided advice and comments during group
teleconferences including Robyn Lutz, Mike Lowry, Judy Crow, Jack Callahan, and Ben
Di Vito. I’m sure I have missed some people, and for that I apologize. However, be aware
that I greatly appreciate the input provided by this group of people.

I would also like to thank the past and present members of the Software Engineering
Network Systems (SENS) Research Group (formerly known as the Software Engineering
Research Group). The long discussions and interaction will surely be missed. I am indebted
to John Kelly, Bryan Pijanowski, Diane Rover, and Anthony Wojcik for their willingness
to serve as members of my dissertation committee, and to Linda Moore for always bailing
me out of trouble when the university came calling.

I would be committing a crime if I did not mention my advisor, Betty Cheng. Thank
you for all your support in the past several years. Words can barely express the gratitude
and thanks I owe you. I can only hope that I can be as positive a force in the lives of
students as you have been in mine.

Several friends need to be recognized for understanding when I couldn’t break away
from my schedule to see them. Speciﬁc names would be silly, so I’ll stick to naming them
by the groups that I have always known them. “SIGAWOTS”, my thoughts and prayers
will always be with you wherever it is you may be. To the “Koinonia” class, thanks for
always feeding us when we needed to be fed. You’ve all helped me to set a new standard
for my life here at the crossroads. Thanks to the “Llamas” for always providing a venue for
chillin’. Finally, thanks to the “Dread Poets” for helping me to realize that while the cause
may be just, it's the experience that makes the difference.

Last, but not least, I would like to thank my wife, Barbie. Your love and support for
everything that I do is unequaled by anyone I’ve ever known. Every day I thank God that
we found each other and it is clear to me that you, truly, “are the best”.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Problem Description and Motivation
1.2 Contributions
1.3 Organization of Dissertation

2 Background
2.1 Software Maintenance
2.2 Formal Methods
2.3 Informal Methods

3 Using Strongest Postcondition to Reverse Engineer Programs
3.1 Basic Constructs
3.2 Iterative and Procedural Constructs
3.3 Example

4 Strongest Postcondition Semantics of Pointers
4.1 Pointers
4.2 Pointer Semantics
4.3 Examples

5 Application of Strongest Postcondition to C Programs
5.1 Assignment
5.2 Alternation
5.3 Circuit Expressions
5.4 Sequence
5.5 Iteration
5.6 Functions
5.7 Procedural Abstractions

6 Design Abstractions
6.1 Specification Matching and Software Reuse
6.2 Abstraction Matching
6.3 Specification Generalization
6.4 Application to a JPL Ground-based Flight System

7 Reverse Engineering Framework
7.1 Combining Informal and Formal Approaches
7.2 An Example

8 Tool Support
8.1 Overview
8.2 AUTOSPEC
8.3 SPECGEN
8.4 SPECEDIT
8.5 Theorem Prover
8.6 Formula Class Library

9 Application of Reverse Engineering to Support Software Reuse
9.1 Overview
9.2 A Software Reverse Engineering and Reuse Framework
9.3 Example

10 Related Work
10.1 Introduction
10.2 Background
10.3 Taxonomy
10.4 Semantic Dimensions
10.5 A Representative Tools Survey
10.6 Comparison

11 Case Study
11.1 Overview
11.2 Project-Specific Process
11.3 High-Level Analysis
11.4 Low-Level Analysis
11.5 Formal Analysis
11.6 Discussion
11.7 Lessons Learned

12 Conclusions and Future Investigations
12.1 Summary of Contributions
12.2 Future Investigations

A Semantics of C Expressions
A.1 Assignment Operators
A.2 Logical Operators
A.3 Bitwise Operators
A.4 Equality and Relational Operators
A.5 Shift Operators
A.6 Additive and Multiplicative Operators

B Partial Order Lemmas
B.1 Lemma 1
B.2 Lemma 2
B.3 Lemma 3

C Application Program

D Software Reuse Specifications
D.1 As-built specification for the Queue source code
D.2 Circular Queue Library Specification

E processmemonicinput Source Code

BIBLIOGRAPHY

 

 

LIST OF TABLES

2.1 Properties of the wp predicate transformer
2.2 Properties of the sp predicate transformer
2.3 Pre/Post Match Criterion
4.1 Pointer Assignments
5.1 Evaluation of A on sample C assignment operators
5.2 A Taxonomy of Programming Language Functions
6.1 Weakening the postcondition
10.1 Tool Index
10.2 Comparison of Commercial Tools by informational criterion
10.3 Comparison of Research Tools by informational criterion
10.4 Comparison of Commercial Tools by By-products
10.5 Comparison of Research Tools by By-products
10.6 Comparison of Commercial Tools by evaluational criterion
10.7 Comparison of Research Tools by evaluational criterion
A.1 Bitwise Operative Assignment Operators
A.2 Logical Operators
A.3 Bitwise Operators
A.4 Equality and Relational Operators
A.5 Shift Operators
A.6 Additive and Multiplicative Operators

 

LIST OF FIGURES

2.1 G1 as an abstraction of G2
2.2 Reverse Engineering Process Model
2.3 Black box representation and differences between wp and sp: (a) wp (b) sp
2.4 OMT Summary
3.1 Annotated Source Code for Unrolled Loop
3.2 Strategy for constructing a specification for an iteration statement
3.3 Removal of procedure call p(ﬁ, 5, E) abstraction
3.4 Code annotation for procedure call
3.5 User Consultation
4.1 A simple pointer example
4.2 Cell Memory Model
4.3 Pointer Extensions to the Memory Model
4.4 The points-to relation
4.5 The coset function
4.6 Three Sample Programs: (a) alias (b) manyvars (c) maxThresh
4.7 Output of AUTOSPEC applied to Figure 4.1
4.8 AUTOSPEC applied to the manyvars program
4.9 AUTOSPEC applied to the maxThresh program
5.1 An Assignment statement as a guard
5.2 Removal of procedure call abstraction: (a) before (b) after
5.3 Code annotation for procedure calls
6.1 Syntax of Library Specifications
6.2 Square Root Specification Library “Sqr”
6.3 Square Root Library as a partial order
6.4 Specification Generalization
6.5 Bubble Sort Program Annotated by AUTOSPEC
6.6 Bubble Sort Specification Brute Force Abstraction
6.7 Bubble Sort Specification Abstraction (postcondition)
6.8 Bubble Sort Specification after deletion of “a” and “b”
6.9 Code Sequence: Lines 108-135
6.10 Annotation Abstractions
6.11 Code annotation: Lines 404-420
7.1 Translate Source Code
7.2 Translate
7.3 Translate Source Code
7.4 Process Binary Output Source Code
8.1 Tool Suite
8.2 Level 0 AUTOSPEC Model
8.3 Level 1 Data Flow Diagram of AUTOSPEC
8.4 AUTOSPEC Main Window
8.5 AUTOSPEC A Selection in the Main Window
8.6 Launching SPECEDIT from AUTOSPEC
8.7 Level 0 SPECGEN Model
8.8 Level 1 SPECGEN Model
8.9 SPECGEN Interface and Output
8.10 Level 0 SPECEDIT Model
8.11 Level 1 SPECEDIT Model
8.12 SPECEDIT
8.13 Level 0 TPROVER Model
8.14 Level 1 TPROVER Model
8.15 Example TPROVER Session
8.16 OMT model of the Formula class library
9.1 The Reverse Engineering and Reuse Framework
9.2 Reverse Engineering Component
9.3 Software Reuse Framework
9.4 Queue Source Code
9.5 Circular Queue Diagram
9.6 Output generated by AUTOSPEC for the enQueue procedure
9.7 The enQueue ensures clause in a conjunctive form
9.8 SPECGEN Interface and Output
9.9 The enQueue abstraction
9.10 Architecture of a solution to Josephus problem
9.11 Architecture specification
9.12 Component Matching
9.13 Wrappers generated by ABRIE for resolving naming conflicts
10.1 A Taxonomy of Reverse Engineering Techniques
10.2 Precision Hierarchy
10.3 Example Plan
11.1 Steps for Preparing, Transferring, and Radiating a Command file
11.2 Communicator and Related Data Structures
11.3 Command Translation Context Diagram
11.4 Command Translation Data Flow Diagram
11.5 Major Data Structures for Command Translation
11.6 Project Files Model for Command Translation
11.7 Command Translation: Main source model
11.8 Translate source code
11.9 Translate source model
11.10 Process Mnemonic subgraph
11.11 Alternative view of Process Mnemonic subgraph
11.12 process_msg source code
11.13 Source code sequence for processmemonicinput
11.14 Annotated source code for endmessagesubroutine
11.15 The POPM Macro
11.16 Annotated source code for end_cmdxlt
D.1 AUTOSPEC output of as-built specifications for queue source code
D.2 Circular Queue Library Specification

Chapter 1

Introduction

As the demands placed on software continue to grow, there is an increasing recognition
that software can be error prone. Moreover, the rising cost of software development
has resulted in software systems that are used for longer periods of time, for multiple
purposes, and for increasingly larger customer bases. As a result, there is a need for more
sophisticated and systematic approaches for maintaining software. Our research develops
a new technique for reverse engineering that is mathematically rigorous and applicable to
practical imperative programming languages such as C. As such, this research facilitates
the systematic evolution of software by supporting software maintenance via program
understanding and software reuse. This chapter discusses the motivation for the reverse
engineering technique and gives the contributions of the work.

1.1 Problem Description and Motivation

Many well-documented computer failures have been attributed to software. Some of the
most notable incidents include the catastrophic failures of the Therac-25 [1] and the Ariane
5 spacecraft [2]. A commonly overlooked aspect of these failures has been the fact that
both were the result of an improper reengineering of software from one version to another.

Therac-25. The Therac-25 is a radiation therapy system that was constructed as a follow-up
to the Therac-20 [1]. In the Therac-20, many hardware safety interlocks were used to
ensure that the radiation dosage was well within the prescribed limits for human exposure.
In the development of the Therac-25, it was determined that many of the safety interlocking
routines that were supported in hardware by the Therac-20 would instead be supported
in software by the Therac-25. Therefore, the combination of the Therac-20 software and
hardware was to be reengineered to produce the Therac-25 software. In the course of
developing the new software, many of the safety-critical properties were not preserved, and
as a result several fatalities occurred during the use of the Therac-25 [1].

Ariane 5. The Ariane 5 is a spacecraft developed by the European Space Agency as a
follow-up to the highly successful Ariane 4 [2]. During the development of the Ariane 5
software it was determined that “it was not wise to make changes in software which worked
well on Ariane 4” [2]. The result of this stance was that a requirement was retained from
the Ariane 4 software that did not apply to the Ariane 5 software. Consequently, during the
maiden voyage of the Ariane 5, a series of unfortunate events led to the eventual destruction
of the spacecraft.

The failure to correctly analyze software in both the Therac-25 and Ariane 5 resulted
in catastrophic events that led to loss of life and property. These examples vividly illustrate
the need for sophisticated and systematic methods for maintaining software and
understanding its functionality.

 


Reverse engineering of program code is the process of examining components
and component interrelationships in order to construct a high-level abstraction of an
implementation [3]. Reengineering is the process of examination, understanding, and
alteration of a system with the intent of implementing the system in a new form [3].
Software reengineering is considered to be a better solution for handling legacy code
as opposed to developing software from the original requirements. Since much of the
functionality of the existing software has been achieved over a period of time, it must be
preserved for many reasons, including providing continuity to current users of the software.

Current reverse engineering techniques focus on the recovery of high-level design
representations from program code. One class of approaches constructs structural (i.e.,
diagrammatic) information about software while another class of approaches has been used
to recover functional information. These techniques have been largely informal since they
rely on syntactic analysis and pattern matching. While the approaches are invaluable for
aiding program understanding, they fail to provide a level of conﬁdence in the reliability of
software that is typically required for critical systems [4, 5].

Formal methods are techniques that incorporate the use of formal speciﬁcation
languages, where a formal speciﬁcation language has a well-deﬁned syntax and semantics.
In addition, formal methods have associated calculation rules that can be used to analyze
speciﬁcations in order to verify correctness and consistency. Since the notations have a
formal mathematical basis, formal methods facilitate the use of automated processing.

The primary focus of our research is to apply formal methods to the reverse
engineering of program code in order to rigorously support maintenance and evolutionary
activities. By using formal methods, our approach addresses the need for rigor in the
reverse engineering and reengineering of program code in order to minimize and perhaps
avoid catastrophic events such as the Therac-25 and Ariane 5 cases [1, 2].

Thesis Statement:

Given imperative program code, it is possible to construct an as-built formal
speciﬁcation using a semi-automated translation process [6, 7]. When the as-
built speciﬁcations are used in tandem with informal speciﬁcation techniques,
abstract, high-level design speciﬁcations can be derived. These designs can
then be used for rigorous analysis and restructuring in order to facilitate
program understanding and software reuse.

1.2 Contributions

This research makes three major contributions to the area of software engineering and,
speciﬁcally, software maintenance. First, the strongest postcondition predicate transformer
is used for deriving as-built formal speciﬁcations from imperative program code [6]. A
speciﬁcation is said to be at the as-built level if the speciﬁcation is at a level of abstraction
just above the original implementation. As a result, an as-built speciﬁcation may contain an
implementation bias. While these detailed speciﬁcations provide a degree of traceability,
a property that is important for facilitating conﬁdence in the consistency and correctness
of the representation, they may be difficult to read. Given a program S and a precondition
Q, the strongest postcondition, denoted sp(S, Q), is deﬁned as the strongest condition that
holds after the execution of S, given that S terminates. By deﬁning the formal semantics of
each of the constructs of a programming language, a formal speciﬁcation of the behavior
of a program written in terms of the given programming language can be constructed [6].
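
As a minimal sketch of this definition, the C fragment below annotates a two-statement sequence with the strongest postcondition of each statement. It assumes the standard textbook rule for assignment, sp(x := e, Q) = (exists v :: Q[x := v] and x = e[x := v]), which may differ in notation from the rules developed in later chapters. The function name annotated_sequence and the symbols X0 and Y0 (the initial values of x and y) are hypothetical and are introduced only for illustration. Each derived postcondition is also checked at run time with assert.

```c
#include <assert.h>

/*
 * Illustrative only: annotated_sequence, X0, and Y0 are hypothetical names.
 * Assumed assignment rule:
 *   sp(x = e, Q)  ==  exists v. Q[x := v] && x == e[x := v]
 * Inputs are assumed small enough that the arithmetic does not overflow.
 */
int annotated_sequence(int X0, int Y0)
{
    int x = X0;
    int y = Y0;
    /* Precondition Q: x == X0 && y == Y0 */
    assert(x == X0 && y == Y0);

    x = x + y;
    /* sp(x = x + y, Q):
     *   exists v. (v == X0 && y == Y0 && x == v + y)
     *   which simplifies to x == X0 + Y0 && y == Y0 */
    assert(x == X0 + Y0 && y == Y0);

    y = x - y;
    /* sp(y = x - y, previous postcondition):
     *   exists w. (x == X0 + Y0 && w == Y0 && y == x - w)
     *   which simplifies to x == X0 + Y0 && y == X0 */
    assert(x == X0 + Y0 && y == X0);

    return x;
}
```

Calling annotated_sequence(3, 4) passes every assertion and returns 7. The comments play the role of an as-built specification: each stays close to the statement it describes, so it is traceable to the code but carries the implementation bias noted above.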
A formal specification constructed in this manner can then be used for a number of
activities such as rigorous software analysis using theorem proving techniques. In order
to demonstrate the applicability of the strongest postcondition approach to a broad context,
we have defined the formal semantics of a subset of the C programming language [7] and
applied these semantics in the analysis of a mission control system that is used to translate user
commands for controlling unmanned spacecraft [7].

Second, we show how a formal technique based on specification matching can be
used to introduce abstractions into as-built specifications in order to produce high-level
specifications and to improve their readability.
To this end, we have developed a technique based on generalizing as-built specifications
by constructing partially-ordered sets of specifications that are ordered using specification
matching operations. Consequently, by generalizing as-built specifications, the results
of this research can facilitate several reverse engineering and reengineering activities,
including high-level program understanding.

Third, we have developed a technique for facilitating software reuse that is based on
constructing speciﬁcation libraries via reverse engineering. Speciﬁcally, the process of
constructing an as-built speciﬁcation and the subsequent generalizations can be used to
populate libraries with software component speciﬁcations. When used along with software
reuse technology, this technique can facilitate the construction of new applications using
existing code that may or may not have been intended for reuse.

An important property of formal specification languages is that their syntax and
semantics are well-defined. As such, formal specification languages are amenable to
automated processing [8]. To support the formal reverse engineering technique described in
this dissertation, we have developed several tools that provide assistance to a user during the
reverse engineering process. Specifically, we have developed tools that support the use of
strongest postcondition to derive specifications from program code and derive abstractions
from as-built specifications. In addition, we have developed a theorem prover and first-order
logic syntactic editor that can be used throughout the specification process.

1.3 Organization of Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 provides background
material for software maintenance and formal methods. Our investigations into the use of
strongest postcondition as a formal basis for reverse engineering are described in Chapter 3.
Chapter 4 extends the use of strongest postcondition to include a formal treatment of pointer
variables. The application of strongest postcondition to the C programming language is
presented in Chapter 5. The approach for introducing design abstractions into as-built
speciﬁcations is deﬁned in Chapter 6. Chapter 7 presents a reverse engineering framework
that integrates the strongest postcondition technique with the design abstraction technique.
Chapter 8 provides a description of several tools that we have developed to support reverse
engineering. In Chapter 9, we describe the use of our reverse engineering technique to
facilitate software reuse. A survey of related work is presented in Chapter 10, including
the introduction of a new taxonomy for comparing different techniques and their tools.
Chapter 11 presents the details of a case study that applies the reverse engineering technique
to a NASA JPL application. Finally, Chapter 12 draws conclusions and discusses remaining
investigations.

Chapter 2

Background

This chapter provides background information for software maintenance and formal
methods for software development. Included in this discussion is the formal model of
program semantics used throughout this dissertation.

2.1 Software Maintenance

One of the most difﬁcult aspects of software maintenance is the analysis of existing
programs in order to determine functionality. This step in re-engineering is known as
reverse engineering. Identifying design decisions, intended use, and domain-specific details
are often signiﬁcant obstacles to successfully re-engineering a system.

Several terms are frequently used in the discussion of re-engineering [3]. Forward
Engineering is the process of developing a system by moving from high-level abstract
specifications to detailed, implementation-specific manifestations. The word “forward” is
used explicitly to contrast the process with Reverse Engineering, the process of
analyzing a system in order to identify system components, component relationships, and
intended behavior. Restructuring is the process of creating a logically equivalent system

 


at the same level of abstraction. This process does not require semantic understanding
of the system and is best characterized by the task of transforming unstructured code
into structured code. Re-Engineering is the examination and alteration of a system to
reconstitute it in a new form, which potentially involves changes at the requirements,
design, and implementation levels.

There are four types of software maintenance: adaptive, corrective, perfective, and
preventive [9]. Adaptive maintenance is the activity associated with changing code in
order to properly interface with a changing environment. Changing code in order to ﬁx
errors is corrective maintenance. Adding new features in response to user needs is an
example of perfective maintenance, and changing software in order to improve future
maintainability or reliability is known as preventive maintenance. Pressman states that
preventive maintenance is characterized by reverse and re-engineering [9], although, in
fact, any form of software maintenance may involve both activities. For instance, in order
to ﬁx software when performing corrective maintenance, it is important to understand
the current functionality of the program. Reverse engineering techniques can be used
to allow a programmer to recover the design and functionality. Perfective maintenance,
especially in legacy systems, may require a complete re-engineering in order to satisfy new
requirements.

Reﬁnement is the process of making a higher-level speciﬁcation more concrete and
showing that the new reﬁned speciﬁcation satisﬁes the higher-level speciﬁcation. Given a
high-level specification, called $s_1$, and a refinement of the specification, called $s_2$, we say
that the two specifications exist at different levels of abstraction since each provides
a different amount of detail. In the context of a formal specification, a specification $s_2$


satisfies a higher-level specification $s_1$ if it can be formally proven that $s_2 \rightarrow s_1$. That is, in
all cases, $s_2$ is stronger than $s_1$. In the context of structural specifications, a specification
$s_2$ satisfies a higher-level specification $s_1$ if the interfaces for $s_2$ are consistent with the
interfaces to $s_1$, and $s_1$ contains the elements of $s_2$.

Abstraction is the process of making a low-level specification $s_2$ less concrete
and showing that the abstracted specification $s_1$ is a generalization of the low-level
specification. In the context of a formal specification, a specification $s_1$ is a generalization
of a lower-level specification $s_2$ if it can be formally proven that $s_2 \rightarrow s_1$ and that $s_1 \not\rightarrow s_2$.
That is, in all cases, $s_1$ is weaker than $s_2$. In the context of a structural specification,
an abstracted specification $s_1$ is a generalization of a lower-level specification $s_2$ if the
interfaces for $s_2$ are consistent with the interfaces to $s_1$ and if the elements of $s_2$ are
contained in $s_1$.

From these descriptions it follows that refinement and abstraction are dual concepts.
That is, given a high-level specification $s_1$ and a low-level specification $s_2$, if $s_2$ is a
refinement of $s_1$ then $s_1$ is an abstraction of $s_2$. For example, consider the specification
“$x > y$”. Let a refinement of this specification appear as “$(x = y + c) \wedge (c > 0)$”. Since
$c > 0$, the term $x = y + c$ always ensures that $x > y$. Therefore
$((x = y + c) \wedge (c > 0)) \rightarrow (x > y)$. In this example, $(x > y)$ is an abstraction of
$((x = y + c) \wedge (c > 0))$ and $((x = y + c) \wedge (c > 0))$ is a refinement of $(x > y)$.
As a structural example, consider Figure 2.1, where
the two data flow diagrams depict G1 as an abstraction of G2: the top diagram
contains the specification G1 and the bottom diagram is the refined specification G2.
The dashed lines indicate that the bottom diagram can be replaced by the top diagram. As
such, the implication is that the behavior of G1 is refined by G2.
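The implication in the example above can be checked mechanically on a small sample of values. The following is a minimal sketch, assuming integer variables and an arbitrarily chosen range of -5 to 5; the function names are illustrative and not part of the dissertation. A bounded check of this kind is evidence, not a proof.

```python
from itertools import product


def refinement(x, y, c):
    """Refined specification s2: (x = y + c) and (c > 0)."""
    return x == y + c and c > 0


def abstraction(x, y):
    """Abstract specification s1: x > y."""
    return x > y


def implies(p, q):
    return (not p) or q


SAMPLE = range(-5, 6)  # assumed bounded range for x, y, and c

# s2 -> s1 holds for every sampled assignment.
forward_holds = all(
    implies(refinement(x, y, c), abstraction(x, y))
    for x, y, c in product(SAMPLE, repeat=3)
)

# s1 -> s2 fails for some assignments, so s1 is strictly weaker.
counterexamples = [
    (x, y, c)
    for x, y, c in product(SAMPLE, repeat=3)
    if not implies(abstraction(x, y), refinement(x, y, c))
]

print(forward_holds, len(counterexamples) > 0)
```

The list of counterexamples shows why the abstraction loses information: knowing only that x > y does not determine the particular offset c.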


 

 

 

 

 

 

 

Figure 2.1: G1 as an abstraction of G2

Byrne described the re-engineering process using a graphical model similar to the one
shown in Figure 2.2 [10, 11]. The process model appears in the form of two sectioned
triangles, where each section in the triangles represents a different level of abstraction. The
higher levels in the model are concepts and requirements. The lower levels include designs
and implementations. The relative size of each of the sections is intended to represent the
amount of information known about a system at a given level of abstraction. Entry into
this re-engineering process model begins with system A, where Abstraction (or reverse
engineering) is performed to an appropriate level of detail. The next step is Alteration,
where the system is constituted into a new form at a different level of abstraction. Finally,
Refinement of the new form into an implementation can be performed to create system B.

This dissertation describes an approach to reverse engineering that is applicable to the
implementation and design levels. In Figure 2.2, the context for our approach is represented

[Figure 2.2 labels: System A and System B; levels of abstraction Concept, Requirements, Design, and Implementation; Abstraction (“Reverse Engineering”), Alteration, and Refinement (“Forward Engineering”).]

Figure 2.2: Reverse Engineering Process Model

by the dashed arrow. The motivation for deriving speciﬁcations at an implementation-
bound level of abstraction is that it provides a means of traceability between the program
source code and the formal speciﬁcations constructed using the techniques described in the
chapters that follow. This traceability is necessary in order to facilitate technology transfer
of formal methods [4, 5]. That is, currently existing development teams must be able to
understand the relationship between the source code and the specifications.

2.2 Formal Methods

Although the waterfall development life-cycle provides a structured process for developing
software, the design methodologies that support the life-cycle (e.g., Structured Analysis
and Design [12]) make use of informal techniques, thus increasing the potential for
introducing ambiguity, inconsistency, and incompleteness in designs and implementations.

In contrast, formal methods used in software development are rigorous techniques for


 

 

 

specifying, developing, and verifying computer software [13]. A formal method consists of
a well-deﬁned speciﬁcation language with a set of well-deﬁned inference rules that can be
used to reason about a specification [13]. A benefit of formal methods is that their notations
are well-defined and thus are amenable to automated processing [8].

2.2.1 Levels of Rigor

Rushby defined levels of rigor for describing the degree to which formal methods can be
used [14]. These levels are summarized as follows.

Level 0: No use of formal methods. Examples include all projects that use
diagrammatic notations and no mathematical notation.

Level 1: Use of concepts and notations from discrete mathematics. Examples
include projects that include the use of formal languages to describe data
in data dictionaries.

Level 2: Use of formalized speciﬁcation languages with some mechanized
support tools such as syntax checkers and pretty printers. Examples
include projects that use specification languages to describe behavior,
but do not use proof obligations to verify correctness.

Level 3: Use of fully formal speciﬁcation languages with comprehensive
support environments, including mechanized theorem proving or proof
checking. Examples include any project that involves full speciﬁcation
and proofs of the speciﬁcations using automated tools.

The approach described in this dissertation can be considered to be at level 3 in the
hierarchy of rigor since we are advocating the use of formal specification languages and
support tools for theorem proving.

2.2.2 Program Semantics

The notation Q { S } R [15] is used to represent a partial correctness model of execution,
where, given that a logical condition Q holds, if the execution of program S terminates,
then logical condition R will hold. A rearrangement of the braces to produce { Q } S { R },


in contrast, represents a total correctness model of execution. That is, if condition Q holds,
then S is guaranteed to terminate with condition R true.

A precondition describes the initial state of a program, and a postcondition describes the
ﬁnal state. Given a statement S and a postcondition R, the weakest precondition predicate
transformer wp(S, R) describes the set of all states in which the statement S can begin
execution and terminate with postcondition R true, and the weakest liberal precondition
predicate transformer wlp(S, R) is the set of all states in which the statement S can begin
execution and establish R as true if S terminates. In this respect, wp(S, R) establishes the
total correctness of S, and wlp(S, R) establishes the partial correctness of S. The wp and wlp
are called predicate transformers because they take predicate R and, using the properties

listed in Table 2.1, produce a new predicate.

 

$wp(S, \mathit{false}) \equiv \mathit{false}$
$wp(S, A \wedge B) \equiv wp(S, A) \wedge wp(S, B)$
$wp(S, A \vee B) \Rightarrow wp(S, A) \vee wp(S, B)$
$wp(S, A \rightarrow B) \Rightarrow wp(S, A) \rightarrow wp(S, B)$

Table 2.1: Properties of the wp predicate transformer

 

The relationship between wp and wlp is the following.

$wp(S, R) \equiv wp(S, \mathit{true}) \wedge wlp(S, R)$    (2.1)

This states that the weakest precondition for establishing R with the program S is
equivalent to requiring both that wlp(S, R) holds (R is true if S terminates) and that wp(S, true) holds. The
conjunct wlp(S, R) is used to establish correctness and the conjunct wp(S, true) is used to
establish termination. The context for our investigations is that we are reverse engineering


systems that have desirable properties or functionality that should be preserved or extended.
Termination behavior is typically determined by years of program observation. Therefore,
the partial correctness model is sufficient.
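The difference between wp and wlp, and the relationship in Expression (2.1), can be illustrated on a small finite model. The sketch below is an assumption-laden toy, not the dissertation's formalism: a program is modeled as a function from an integer state to the set of its possible outcomes, the marker `DIVERGES` stands for non-termination, and the hypothetical program loops forever on negative inputs.

```python
STATES = range(-3, 4)
DIVERGES = None  # marker for an execution that never terminates


def program(x):
    """Hypothetical program S: loops forever when x < 0, otherwise x := x + 1."""
    if x < 0:
        return {DIVERGES}
    return {x + 1}


def wp(prog, post):
    """States from which every execution terminates and ends with post true."""
    return {s for s in STATES
            if all(o is not DIVERGES and post(o) for o in prog(s))}


def wlp(prog, post):
    """States from which every terminating execution ends with post true."""
    return {s for s in STATES
            if all(o is DIVERGES or post(o) for o in prog(s))}


def R(x):
    return x >= 2


total = wp(program, R)                    # total correctness
partial = wlp(program, R)                 # partial correctness
terminates = wp(program, lambda _: True)  # termination only

# Expression (2.1): wp(S, R) = wp(S, true) and wlp(S, R)
print(total == terminates & partial)
```

In this model the negative states belong to wlp(S, R) but not to wp(S, R), which is exactly the gap that the termination conjunct wp(S, true) closes.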

2.2.3 Strongest Postcondition

Consider the predicate $\neg wlp(S, \neg R)$, which is the set of all states in which there exists an
execution of S that terminates with R true. That is, we wish to describe the set of states
in which satisfaction of R is possible [16]. The predicate $\neg wlp(S, \neg R)$ is contrasted to
wlp(S, R), which is the set of states in which the computation of S either fails to terminate
or terminates with R true.

An analogous characterization can be made in terms of the computation state space
that describes initial conditions using the strongest postcondition sp(S, Q) predicate
transformer [16], which is the set of all states in which there exists a computation of S that
begins with Q true. That is, given that Q holds, execution of S results in sp(S, Q) true, if S
terminates. As such, sp(S, Q) assumes partial correctness. Table 2.2 lists some properties
of sp. Finally, we make the following observation about sp(S, Q) and wlp(S, R) and the
relationship between the two predicate transformers, given the Hoare triple Q { S } R [16]:

Q => wlp(S, R)
sp(S, Q) => R
The importance of this relationship is two-fold. First, it provides a formal basis for
translating programming statements into formal speciﬁcations. Second, the symmetry of
sp and wlp provides a method for verifying the correctness of a reverse engineering process
that utilizes the properties of wlp and sp in tandem.
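The symmetry between sp and wlp can be made concrete with an exhaustive check over a tiny state space. The following sketch assumes a model in which a program maps each state to the set of final states it can reach (an empty set meaning no terminating execution); the program itself is hypothetical and chosen only to be nondeterministic and partly non-terminating.

```python
from itertools import combinations

STATES = frozenset(range(4))


def program(s):
    """Hypothetical nondeterministic program; state 3 has no terminating run."""
    if s == 3:
        return frozenset()
    return frozenset({(s + 1) % 4, (2 * s) % 4})


def sp(prog, pre):
    """Final states reachable from some initial state satisfying pre."""
    return frozenset(t for s in pre for t in prog(s))


def wlp(prog, post):
    """Initial states whose every terminating run ends in post."""
    return frozenset(s for s in STATES if prog(s) <= post)


def subsets(states):
    items = sorted(states)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


# For every pair of predicates (Q, R): Q => wlp(S, R) exactly when sp(S, Q) => R.
symmetry_holds = all(
    (q <= wlp(program, r)) == (sp(program, q) <= r)
    for q in subsets(STATES)
    for r in subsets(STATES)
)
print(symmetry_holds)
```

Predicates are represented here as sets of states, so implication becomes the subset test `<=`.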


 

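$sp(S, A \wedge B) \equiv sp(S, A) \wedge sp(S, B)$
$sp(S, A \vee B) \Rightarrow sp(S, A) \vee sp(S, B)$
$A \rightarrow B \equiv sp(S, A) \rightarrow sp(S, B)$
$sp(S, A \rightarrow B) \equiv sp(S, A) \rightarrow sp(S, B)$
$sp(S, \mathit{false}) \equiv \mathit{false}$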

Table 2.2: Properties of the sp predicate transformer

 

2.2.4 Strongest Postcondition vs. Weakest Precondition

Given a Hoare triple Q { S } R, we note that wp is a backward rule, in that a derivation of a
speciﬁcation begins with R, and produces a predicate wp(S, R). The predicate transformer
wp assumes a total correctness model of computation, meaning that given S and R, if the
computation of S begins in state wp(S, R), the program S will halt with condition R true.
We contrast this model with the sp model, a forward derivation rule. That is, given
a precondition Q and a program S, sp derives a predicate sp(S, Q). The predicate
transformer sp assumes a partial correctness model of computation meaning that if a
program starts in state Q, then the execution of S will place the program in state sp(S, Q)
if S terminates. Figure 2.3 gives a graphical depiction of the differences between sp and
wp, where the input to the predicate transformer produces the corresponding predicate.
Figure 2.3(a) gives the case where the input to the predicate transformer is “S” and “R”,
and the output of the predicate transformer (given by the box and appropriately named
“wp”) is “wp(S,R)”. The sp case (Figure 2.3(b)) is similar, where the input to the predicate
transformer is “S” and “Q”, and the output of the transformer is “sp(S,Q)”.


 

 

 

 

 

 

 

 

[Figure 2.3 panels: (a) S and R are input to the wp box, which outputs wp(S, R); (b) S and Q are input to the sp box, which outputs sp(S, Q).]

Figure 2.3: Black box representation and differences between wp and sp: (a) wp (b) sp

 

The use of these predicate transformers for reverse engineering has different
implications. Using wp implies that a postcondition R is known. However, with respect
to reverse engineering, determining R is the objective; therefore wp can only be used
as a guideline for performing reverse engineering [17]. The use of sp assumes that
a precondition Q is known and that a postcondition will be derived through the direct
application of sp. As such, sp is better suited for reverse engineering.

2.2.5 Formal Methods Applied to Software Reuse

Software reuse is the process of constructing a software system using existing software
components. Jeng and Cheng [18] describe the use of a generality operator as the
formal basis for identifying reusable components via speciﬁcation matching. Zaremski
and Wing [19] describe several operators for matching queries to components for software
reuse. In addition, Penix and Alexander [20] deﬁne the satisﬁes criterion for component
matching. Table 2.3 lists several of the matching operators.¹ When these operators are
used for software reuse, A is a query specification and R is a library specification.

 

¹In this chapter, we assume that the signatures of a query specification and a library specification match.
For details on signature matching, see [21].


 

 

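Match              Definition (library specification R, query specification A)

Exact Pre/Post     $(A_{pre} \Leftrightarrow R_{pre}) \wedge (R_{post} \Leftrightarrow A_{post})$
Plug-in            $(A_{pre} \Rightarrow R_{pre}) \wedge (R_{post} \Rightarrow A_{post})$
Plug-in Post       $(R_{post} \Rightarrow A_{post})$
Weak Post          $R_{pre} \Rightarrow (R_{post} \Rightarrow A_{post})$
Guarded Plug-in    $(A_{pre} \Rightarrow R_{pre}) \wedge ((R_{pre} \wedge R_{post}) \Rightarrow A_{post})$
Guarded Post       $((R_{pre} \wedge R_{post}) \Rightarrow A_{post})$
Satisfies          $(A_{pre} \Rightarrow R_{pre}) \wedge ((A_{pre} \wedge R_{post}) \Rightarrow A_{post})$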

 

 

 

 

 

 

Table 2.3: Most Match Criterion
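A few of the matching operators in Table 2.3 can be evaluated by brute force when specifications range over a small finite domain. The sketch below is illustrative only: the `Spec` tuple, the integer domain, and the two example specifications are assumptions introduced here, with preconditions taken over an input x and postconditions over an input and result pair (x, r).

```python
from collections import namedtuple
from itertools import product

DOMAIN = range(-4, 5)  # assumed finite domain for inputs and results

# pre: predicate on the input x; post: predicate on (x, result)
Spec = namedtuple("Spec", ["pre", "post"])


def forall_inputs(pred):
    return all(pred(x) for x in DOMAIN)


def forall_pairs(pred):
    return all(pred(x, r) for x, r in product(DOMAIN, repeat=2))


def exact_pre_post(lib, query):
    """(A_pre <=> R_pre) and (R_post <=> A_post)."""
    return (forall_inputs(lambda x: query.pre(x) == lib.pre(x))
            and forall_pairs(lambda x, r: lib.post(x, r) == query.post(x, r)))


def plug_in(lib, query):
    """(A_pre => R_pre) and (R_post => A_post)."""
    return (forall_inputs(lambda x: (not query.pre(x)) or lib.pre(x))
            and forall_pairs(lambda x, r: (not lib.post(x, r)) or query.post(x, r)))


def satisfies(lib, query):
    """(A_pre => R_pre) and ((A_pre and R_post) => A_post)."""
    return (forall_inputs(lambda x: (not query.pre(x)) or lib.pre(x))
            and forall_pairs(
                lambda x, r: (not (query.pre(x) and lib.post(x, r))) or query.post(x, r)))


# Hypothetical library component R: for x >= 0, returns x + 1.
library = Spec(pre=lambda x: x >= 0, post=lambda x, r: r == x + 1)
# Hypothetical query A: for x >= 1, wants some result greater than x.
query = Spec(pre=lambda x: x >= 1, post=lambda x, r: r > x)

print(exact_pre_post(library, query), plug_in(library, query), satisfies(library, query))
```

In this example the library component accepts more inputs and promises more than the query asks for, so the weaker matches succeed while the exact match does not.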

 

2.3 Informal Methods

Informal (or semi-formal) methods are software development methods based on
techniques that lack rigorous notation. One advantage of informal techniques is that
they are typically based on graphical notations. As
such, the techniques are amenable to high-level discussion, and are, in general, scalable
to large systems. One disadvantage of informal methods is that, since the
notations lack mathematical rigor, they are prone to ambiguity. An example of an informal
method is the Object Modeling Technique (OMT) [22].

OMT is a modeling language that is commonly used in industry and academia. OMT
comprises three complementary models, each of which is simple to use and understand.
The object model describes the static, structural aspects of the system. The object model
captures the objects of the system and the relationships between the objects. The dynamic
model depicts the temporal and behavioral aspects of the system. Finally, the functional
model describes the services provided by the system. Respectively, entity-relationship
diagrams, state transition diagrams, and data ﬂow diagrams are used to represent the

object, dynamic, and functional models, and each model is only used to capture a speciﬁc


perspective of the system. With recent work [23, 24], rigorous analysis of each of the
models is possible, thus enabling consistency and completeness checks at the model level
prior to the implementation phase.

The speciﬁc notations for the object model, functional model, and dynamic model are
summarized in Figure 2.4. In an object model, a rectangle is used to represent a class. A
line connecting two classes indicates that some relationship exists between the two classes.

A closed circle at the end-point of an association denotes a one-to-many relationship.

 

[Figure 2.4 panels: Object Model Notation (classes Class1 and Class2; B is a subclass of A; B is related to A; there is a one-to-many relationship between B and A); Functional Model Notation (flow from A to B; data store); Dynamic Model Notation (State 1 and State 2 joined by a transition labeled Guard/Action).]

Figure 2.4: OMT Summary

 

 


In the functional model, a circle represents a process entity, a pair of parallel lines
represents a data store, and a rectangle represents an entity external to the current model.
Arcs are used to join the various entities of a functional model, with the arrow indicating a
ﬂow from one entity to another.

In the dynamic model, a rounded rectangle is used to denote a state and arcs indicate
transitions between states. The transitions can be labeled with strings separated by a single
slash ‘/’. The text before a slash deﬁnes a guarding condition that must be true in order for
a transition out of a state to occur, and the text after the slash deﬁnes the event or action

that is generated by taking the transition.


Chapter 3

Using Strongest Postcondition to
Reverse Engineer Programs

Chapter 2 introduced the strongest postcondition predicate transformer sp(S, Q). This
chapter describes our investigations into the use of the strongest postcondition as the formal
basis for reverse engineering [6] through the construction of formal speciﬁcations from
programs written in terms of the Dijkstra guarded command language [25]. The primary
result of this chapter is a demonstration of the use of the strongest postcondition to facilitate

the construction of as-built formal speciﬁcations from program code for reverse engineering

purposes.
3.1 Basic Constructs

This section describes the derivation of formal speciﬁcations from the primitive
programming constructs of assignment, alternation, and sequences. The Dijkstra guarded
command language [25] is used to represent each primitive construct but the techniques are
applicable to the general class of imperative languages. For each primitive, we ﬁrst describe
the semantics of the predicate transformers wlp and sp as they apply to each primitive and

then, for reverse engineering purposes, describe speciﬁcation derivation in terms of Hoare


triples. Throughout the remainder of this dissertation, the notation { Q } S { R } is
used to indicate a partial correctness interpretation.

3.1.1 Assignment

An assignment statement has the form x := e; where x is a variable, and e is an
expression. The wlp of an assignment statement is expressed as $wlp(x := e, R) = R^{x}_{e}$,
which represents the postcondition R with every free occurrence of x replaced by the
expression e. If x corresponds to a vector $\bar{y}$ of variables and e represents a vector $\bar{E}$ of
expressions, then the wlp of the assignment is of the form $R^{\bar{y}}_{\bar{E}}$, where each $y_i$ is replaced
by $E_i$, respectively, in expression R. The sp of an assignment statement is expressed as
follows [16]:

$sp(x := e, Q) = (\exists v :: Q^{x}_{v} \wedge x = e^{x}_{v})$,    (3.1)

where Q is the precondition, v is the quantiﬁed variable, and ‘::’ indicates that the range of
the quantiﬁed variable v is not relevant in the current context.

Section 3.1.2 describes two lemmas for eliminating the existential quantification in
Expression (3.1). In the first lemma, if the precondition Q is of the form $C \wedge (x = u)$,
where C is a logical expression, then after the textual substitution of variable x with v in
Q, Expression (3.1) reads as $(\exists v :: C^{x}_{v} \wedge (v = u) \wedge x = e^{x}_{v})$. Since $(v = u)$, the expression
$(\exists v :: C^{x}_{v} \wedge (v = u) \wedge x = e^{x}_{v})$ is logically equivalent to $C^{x}_{u} \wedge (u = u) \wedge x = e^{x}_{u}$. In the
second lemma, if x does not appear as a free variable in either the logical expression Q or
the expression e, then $(\exists v :: Q^{x}_{v} \wedge x = e^{x}_{v})$ is logically equivalent to $Q \wedge x = e$. Assuming


we can establish the conditions for satisfying these lemmas, the Hoare triple formulation

for assignment statements is as follows:

{ Q }                                         /* precondition */
x := e;
{ $Q^{x}_{v} \wedge (x = e^{x}_{v})$ }        /* postcondition */

where v represents the initial value of the variable x before execution of the statement and
Q is the precondition.
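To see Expression (3.1) at work, the postcondition of an assignment can be computed three ways on a finite sample and compared. This is a minimal sketch under stated assumptions: states are pairs (x, y) of small integers, the statement is x := x + y, and the precondition x = 2 and y >= 0 is an arbitrary choice made for illustration.

```python
from itertools import product

INITIAL = range(-5, 6)   # values sampled for x and y before the assignment
FINAL = range(-20, 21)   # values sampled for x after the assignment


def Q(x, y):
    """Assumed precondition: x = 2 and y >= 0."""
    return x == 2 and y >= 0


def e(x, y):
    """Right-hand side of the assumed statement x := x + y."""
    return x + y


# Operational view: execute the assignment from every state satisfying Q.
reachable = {(e(x, y), y) for x, y in product(INITIAL, repeat=2) if Q(x, y)}


def sp_by_3_1(x, y):
    """Expression (3.1): there is a v with Q[x := v] and x = e[x := v]."""
    return any(Q(v, y) and x == e(v, y) for v in INITIAL)


def sp_closed_form(x, y):
    """Quantifier-free form after the one-point rule with v = 2."""
    return y >= 0 and x == 2 + y


from_3_1 = {(x, y) for x in FINAL for y in INITIAL if sp_by_3_1(x, y)}
from_closed = {(x, y) for x in FINAL for y in INITIAL if sp_closed_form(x, y)}

print(reachable == from_3_1 == from_closed)
```

The quantified variable v plays the role of the old value of x, which is why fixing it to the known initial value removes the quantifier.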

3.1.2 Removing quantiﬁcation from the speciﬁcation of assignment

This section describes two lemmas that justify why the existential quantiﬁcation in
Expression 3.1 can be removed. The ﬁrst lemma (Lemma 3.1.1) describes the case when
the precondition is in a particular canonical form. The second lemma (Lemma 3.1.2) is a

derivation based on the semantics for assignment described in [16].
One-point simpliﬁcation

In this section we prove that the quantiﬁcation in Expression (3.1) can be eliminated when

the precondition has a particular canonical form.

Lemma 3.1.1 (One-Point) Let the precondition Q in Expression (3.1) have the form
$U \wedge (x = n)$, where U is a logical expression, n is a constant, and x is the variable from the
statement “x := e”. Then

$sp(x := e, Q) \equiv U^{x}_{n} \wedge x = e^{x}_{n}$

Proof. The sp derivation is as follows.

$sp(x := e, Q) \equiv (\exists v :: Q^{x}_{v} \wedge x = e^{x}_{v})$
    (Substitution of Q with $U \wedge (x = n)$)
$\equiv (\exists v :: (U \wedge (x = n))^{x}_{v} \wedge x = e^{x}_{v})$
    (Textual Substitution)
$\equiv (\exists v :: U^{x}_{v} \wedge (v = n) \wedge x = e^{x}_{v})$
    (Trading [16])
$\equiv (\exists v : v = n : U^{x}_{v} \wedge x = e^{x}_{v})$
    (One-point rule [16] with v = n)
$\equiv U^{x}_{n} \wedge x = e^{x}_{n}$

The fact that the existential quantification can be removed when the precondition Q has
the form $U \wedge (x = n)$ is convenient since the canonical form can be derived from parameter
and variable declarations as described in Section 3.1.2. The logical formula U represents
the part of the precondition Q that does not specify the value of x. The extension of the
one-point rule to two points, three points, or n points provides the more general expression
given by Expression (3.1).
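As a worked instance of the one-point simplification, consider the statement x := 2 * x with the precondition $(y = x + 1) \wedge (x = 5)$. Both the statement and the precondition are illustrative choices made here, not examples taken from the dissertation; U is $(y = x + 1)$ and n is 5.

```latex
\begin{align*}
sp(x := 2x,\; (y = x + 1) \wedge (x = 5))
  &\equiv (\exists v :: (y = v + 1) \wedge (v = 5) \wedge x = 2v)
     && \text{Expression (3.1)} \\
  &\equiv (\exists v : v = 5 : (y = v + 1) \wedge x = 2v)
     && \text{Trading} \\
  &\equiv (y = 5 + 1) \wedge (x = 2 \cdot 5)
     && \text{One-point rule with } v = 5 \\
  &\equiv (y = 6) \wedge (x = 10)
\end{align*}
```

The substitution into U matters here: because U mentions x, the fact recorded about y is expressed in terms of the old value of x, not the new one.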
Substitution

In this section we prove that the quantification in Expression (3.1) can be eliminated when
there is no free occurrence of x in expression e and precondition Q.

Lemma 3.1.2 (Substitution) Assume that there are no free occurrences of x in
precondition Q and expression e in the statement x := e. Then

$sp(x := e, Q) \equiv Q \wedge x = e$.

Proof. The sp derivation is as follows.

$sp(x := e, Q) \equiv (\exists v :: Q^{x}_{v} \wedge x = e^{x}_{v})$
    (x is not free in Q or e, so the substitutions leave Q and e unchanged)
$\equiv (\exists v :: Q \wedge x = e)$
    (Predicate calculus)
$\equiv Q \wedge x = e$
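The substitution lemma can be checked on a small sample in the same brute-force style. In this sketch the precondition y > 1 and the statement x := 2 * y are assumptions chosen so that x occurs in neither Q nor e; the value ranges are arbitrary.

```python
VALUES = range(-3, 4)  # assumed sample of initial values for x and y


def Q(y):
    """Assumed precondition; x does not occur in it."""
    return y > 1


def e(y):
    """Right-hand side of the assumed statement x := 2 * y; x does not occur in it."""
    return 2 * y


# Operational view: run x := e from every sampled state satisfying Q.
reachable = {(e(y), y) for x in VALUES for y in VALUES if Q(y)}

# Lemma 3.1.2: the postcondition is simply Q and x = e.
predicted = {(x, y) for x in range(-10, 11) for y in VALUES if Q(y) and x == e(y)}

print(reachable == predicted)
```

Because the initial value of x is never consulted, every initial x collapses to the same final state for a given y.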


While the derivation of $Q \wedge x = e$ is straightforward, the actual application of this case
is less frequent and only occurs in the cases when variables have no initial value.
Establishing the conditions for removing quantiﬁcation

Lemmas 3.1.1 and 3.1.2 identify the conditions for removing quantiﬁcation from the
speciﬁcation of assignment statements. In order to take advantage of these lemmas,
the conditions for removing quantiﬁcation must be established. Fortunately, there are
two properties of programs that allow for these conditions to be established: parameter
speciﬁcations and variable declarations.

An example parameter speciﬁcation might appear as follows:

proc p (value x; value-result y; result z);

Using the fact that x is defined as a value parameter and y is defined as a value-result
parameter, it can be easily deduced that, upon entry into the program p, the parameters
x and y have some initial value. As such, we can assert that (x = X) and (y = Y),
thus establishing the conditions of Lemma 3.1.1 for removing the quantification in the specifications of any
assignment to x or y. To establish the conditions for Lemma 3.1.2, note that during program
execution, a declared variable has no initial value.

3.1.3 Alternation

An alternation statement (also known as a conditional statement) using the Dijkstra guarded

command language is expressed as [25]

if
   B1 -> S1;
|| ...
|| Bn -> Sn;
fi;

where $B_i \rightarrow S_i$ is a guarded command such that $S_i$ is only executed if logical expression
(guard) $B_i$ is true. The wlp for alternation statements is given by [16]:

$wlp(IF, R) \equiv (\forall i : B_i : wlp(S_i, R))$,

where IF represents the alternation statement. The equation states that the necessary
condition to satisfy R, if the alternation statement terminates, is that given $B_i$ is true, the
wlp for each guarded statement $S_i$ with respect to R holds. The sp for alternation has the
form [16]

$sp(IF, Q) \equiv (\exists i :: sp(S_i, B_i \wedge Q))$.    (3.2)

The existential expression can be expanded into the following form comprising a sequence
of disjuncts:

$sp(IF, Q) \equiv sp(S_1, B_1 \wedge Q) \vee \cdots \vee sp(S_n, B_n \wedge Q)$.    (3.3)

Expression (3.3) states that after execution of the if-fi statement, one of the disjuncts
$sp(S_i, B_i \wedge Q)$ is true. The form of Expression (3.3) as a sequence of disjuncts illustrates the
disjunctive nature of alternation statements where each disjunct describes the postcondition
in terms of both the precondition Q and the guard and guarded command pairs, given by
$B_i$ and $S_i$, respectively. This characterization follows the intuition that a statement $S_i$ is


only executed if $B_i$ is true. The translation of alternation statements to specifications is
based on the similarity of the semantics of Expression (3.3) and the execution behavior for
alternation statements. Using the Hoare triple notation, a specification is constructed as
follows:

{ Q }
if
   B1 -> S1;
|| ...
|| Bn -> Sn;
fi;
{ $sp(S_1, B_1 \wedge Q) \vee \cdots \vee sp(S_n, B_n \wedge Q)$ }
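The disjunctive form of Expression (3.3) can be sketched over a finite state space by computing one disjunct per guarded command and taking the union. The example below assumes states are pairs (x, y), predicates are sets of states, and the alternation computes an absolute value; all of these are illustrative choices.

```python
STATES = {(x, y) for x in range(-3, 4) for y in range(-3, 4)}


def sp(stmt, pre):
    """sp of a deterministic statement: the image of the precondition states."""
    return {stmt(s) for s in pre}


# Hypothetical alternation:  if x >= 0 -> y := x  ||  x < 0 -> y := -x  fi
guarded_commands = [
    (lambda s: s[0] >= 0, lambda s: (s[0], s[0])),
    (lambda s: s[0] < 0, lambda s: (s[0], -s[0])),
]

# Assumed precondition Q: y = 0 (x is unconstrained within the sample).
Q = {s for s in STATES if s[1] == 0}

# One disjunct sp(S_i, B_i and Q) per guarded command.
disjuncts = [
    sp(stmt, {s for s in Q if guard(s)})
    for guard, stmt in guarded_commands
]

# Expression (3.3): the postcondition is the disjunction (union) of the disjuncts.
sp_if = set().union(*disjuncts)

print(sorted(sp_if))
```

Each disjunct records which branch produced the state, so the derived specification keeps the case structure of the code.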

3.1.4 Sequence

For a given sequence of statements $S_1; \ldots; S_n$, the postcondition for some statement
$S_i$ is the precondition for some subsequent statement $S_{i+1}$. The wlp and sp for sequences
follow accordingly. The wlp for sequences is defined as follows [16]:

$wlp(S_1; S_2, R) \equiv wlp(S_1, wlp(S_2, R))$.

Likewise, the sp [16] is

$sp(S_1; S_2, Q) \equiv sp(S_2, sp(S_1, Q))$.    (3.4)

In the case of wlp, the set of states for which the sequence $S_1; S_2$ can execute with R
true (if the sequence terminates) is equivalent to the wlp of $S_1$ with respect to the set of
states defined by $wlp(S_2, R)$. For sp, the derived postcondition for the sequence $S_1; S_2$
with respect to the precondition Q is equivalent to the derived postcondition for $S_2$ with
respect to a precondition given by $sp(S_1, Q)$. The Hoare triple formulation and construction
process is as follows:


{ $Q$ }
$S_1$;
{ $sp(S_1, Q)$ }
$S_2$;
{ $sp(S_2, sp(S_1, Q))$ }
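
The two composition rules for sequences can be exercised in the same finite-state style. The sketch below is an assumption-laden illustration rather than the author's implementation: predicates are sets of states, statements are total deterministic functions, and the statements `s1` and `s2` are hypothetical examples chosen so that every result stays inside the finite domain.

```python
from itertools import product

SIZE = 6
STATES = set(product(range(SIZE), range(SIZE)))  # states are (x, y) pairs


def pred(f):
    """Return the set of states satisfying the Boolean function f(x, y)."""
    return {s for s in STATES if f(*s)}


def sp(stmt, q):
    """Strongest postcondition: the states reachable from q through stmt."""
    return {stmt(s) for s in q}


def wlp(stmt, r):
    """Weakest liberal precondition: the states that stmt maps into r."""
    return {s for s in STATES if stmt(s) in r}


def seq(first, second):
    """Sequential composition first; second."""
    return lambda s: second(first(s))


# Hypothetical statements: S1 is "x := y" and S2 is "y := (x + 1) mod 6".
def s1(s):
    return (s[1], s[1])


def s2(s):
    return (s[0], (s[0] + 1) % SIZE)


q = pred(lambda x, y: x == 0)   # precondition Q
r = pred(lambda x, y: y == 3)   # postcondition R

sp_direct = sp(seq(s1, s2), q)          # sp(S1; S2, Q)
sp_composed = sp(s2, sp(s1, q))         # sp(S2, sp(S1, Q))
wlp_direct = wlp(seq(s1, s2), r)        # wlp(S1; S2, R)
wlp_composed = wlp(s1, wlp(s2, r))      # wlp(S1, wlp(S2, R))
```

In this model the directly computed and the composed transformers coincide, which is the content of Expression (3.4) and of the corresponding wlp rule.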
3.2 Iterative and Procedural Constructs

The programming constructs of assignment, alternation, and sequence can be combined to
produce straight-line programs (programs without iteration or recursion). The introduction
of iteration and recursion into programs enables more compactness and abstraction in
program development. However, constructing formal speciﬁcations of iterative and
recursive programs can be problematic, even for the human speciﬁer. This section discusses
the formal speciﬁcation of iteration and procedural abstractions without recursion. We
deviate from our previous convention of providing the formalisms for wlp and sp for each
construct and use an operational definition of how specifications are constructed. This
approach is necessary because the formalisms for the wlp and sp for iteration are defined
in terms of recursive functions [16, 26] that are, in general, difficult to practically apply.

3.2.1 Iteration

Iteration enables the repetitive application of a statement. Iteration, using the Dijkstra
language, has the form

do
   $B_1 \to S_1$;
   ...
|| $B_n \to S_n$;
od;


In more general terms, the iteration statement may contain any number of guarded
commands of the form $B_i \to S_i$, such that the loop is executed as long as any guard $B_i$
is true. A simplified form of repetition is given by “do $B \to S$ od”.

In the context of iteration, a bound function determines the upper bound on the number
of iterations still to be performed on the loop. An invariant is a predicate that is true
before and after each iteration of a loop. The problem of constructing formal specifications
of iteration statements is difficult because the bound functions and the invariants must
be determined. However, for a partial correctness model of execution, concerns of
boundedness and termination fall outside of the interpretation, and thus can be relaxed.

Using the abbreviated form of repetition “do $B \to S$ od”, the semantics for iteration
in terms of the weakest liberal precondition predicate transformer wlp is given by the
following [16]:

\[ wlp(\mathrm{DO}, R) \equiv (\forall i : 0 \le i : wlp(\mathrm{IF}^i, B \lor R)), \tag{3.5} \]

where the notation “$\mathrm{IF}^i$” is used to indicate the execution of “if $B \to S$ fi” $i$ times.
Operationally, Expression (3.5) states that the weakest condition that must hold in order
for the execution of an iteration statement to result with $R$ true, provided that the iteration
statement terminates, is equivalent to a conjunctive expression where each conjunct is an
expression describing the semantics of executing the loop $i$ times, where $i \ge 0$.

The strongest postcondition semantics for repetition has a similar but notably distinct
formulation [16]:

\[ sp(\mathrm{DO}, Q) \equiv \lnot B \land (\exists i : 0 \le i : sp(\mathrm{IF}^i, Q)). \tag{3.6} \]


Expression (3.6) states that the strongest condition that holds after executing an iterative
statement, given that condition $Q$ holds, is equivalent to the condition where the loop guard
is false ($\lnot B$), and a disjunctive expression describing the effects of iterating the loop some
number of times $k$, where $k \ge 0$.

Although the semantics for repetition in terms of strongest postcondition and weakest
liberal precondition are less complex than that of the weakest precondition [16], the
recurrent nature of the closed forms makes the application of such semantics difficult.
For instance, consider the counter program “do i < n -> i := i + 1 od”. The
application of the sp semantics for repetition leads to the following specification:

\[ sp(\texttt{do i < n -> i := i + 1 od}, Q) \equiv (i \ge n) \land (\exists j : 0 \le j : sp(\mathrm{IF}^j, Q)). \]

The closed form for iteration suggests that the loop be unrolled $k$ times, such that
$sp(\mathrm{IF}^k, Q)$ is true. If $k$ is set to $n - start$, where $start$ is the initial value of variable $i$,
then the unrolled version of the loop would have the following form:

i := start;
if
   i < n -> i := i + 1;
fi
if
   i < n -> i := i + 1;
fi
...
if
   i < n -> i := i + 1;
fi

Application of the rule for alternation (Expression (3.2)) yields the sequence of
annotated code shown in Figure 3.1, where the goal is to derive the specification given
by the expression:

\[ sp(\texttt{do i < n -> i := i + 1 od}, (start < n) \land (i = start)). \]

 

{ $(i = I) \land (start < n)$ }
i := start;
{ $(i = start) \land (start < n)$ }
if i < n -> i := i + 1 fi
{ $sp(i := i + 1, (i < n) \land (i = start) \land (start < n))$
  $\lor$
  $((i \ge n) \land (i = start) \land (start < n))$
  $\equiv$
  $((i = start + 1) \land (start < n))$ }
if i < n -> i := i + 1 fi
{ $sp(i := i + 1, (i < n) \land (i = start + 1) \land (start < n))$
  $\lor$
  $((i \ge n) \land (i = start + 1) \land (start < n))$
  $\equiv$
  $((i = start + 2) \land (start + 1 < n))$
  $\lor$
  $((i \ge n) \land (i = start + 1) \land (start < n))$ }
...
{ $((i = start + (n - start - 1)) \land (start + (n - start - 1) - 1 < n))$
  $\lor$
  $((i \ge n) \land (i = start + (n - start - 2)) \land (start + (n - start - 2) - 1 < n))$
  $\equiv$
  $((i = n - 1) \land (n - 2 < n))$ }
if i < n -> i := i + 1 fi
{ $sp(i := i + 1, (i < n) \land (i = n - 1) \land (n - 2 < n))$
  $\lor$
  $((i \ge n) \land (i = n - 1) \land (n - 2 < n))$
  $\equiv$
  $(i = n)$ }

Figure 3.1: Annotated Source Code for Unrolled Loop
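
For a finite state space, the closed form of Expression (3.6) can be evaluated mechanically by accumulating $sp(\mathrm{IF}^k, Q)$ for $k = 0, 1, 2, \ldots$ until no new states appear. The Python sketch below does this for the counter program. It is an illustration under explicit assumptions, not the procedure used in this chapter: predicates are sets of `(i, n)` states, `START` is a hypothetical fixed initial value, and “if B -> S fi” is treated as a skip when the guard is false, as in the annotations of Figure 3.1.

```python
from itertools import product

N_MAX = 6
START = 1  # hypothetical initial value of i
STATES = set(product(range(N_MAX), range(N_MAX)))  # states are (i, n) pairs


def pred(f):
    """Return the set of states satisfying the Boolean function f(i, n)."""
    return {s for s in STATES if f(*s)}


GUARD = pred(lambda i, n: i < n)  # loop guard B


def body(state):
    """Loop body S: i := i + 1."""
    i, n = state
    return (i + 1, n)


def sp_if(q):
    """sp of "if B -> S fi", read as a skip when B is false (assumption)."""
    return {body(s) for s in q & GUARD} | (q - GUARD)


def sp_do(q):
    """Expression (3.6): not B and (exists k : 0 <= k : sp(IF^k, Q))."""
    union = set(q)       # sp(IF^0, Q) is Q itself
    current = set(q)
    while True:
        current = sp_if(current)   # sp(IF^(k+1), Q)
        if current <= union:       # no new states: the union is closed
            break
        union |= current
    return union - GUARD           # conjoin the negated guard


q = pred(lambda i, n: i == START and START < n)
post = sp_do(q)
expected = pred(lambda i, n: i == n and START < n)
```

With these assumptions `post` equals `expected`, that is, the loop exits with $i = n$, which agrees with the specification derived by unrolling and induction. The fixpoint computation is possible only because the state space here is finite; in general a specifier must still supply the inductive step.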

 

In the construction of specifications of iteration statements, knowledge must be
introduced by a human specifier. For instance, in Figure 3.1 the inductive
assertion that “$i = start + (n - start - 1)$” is made following the elided iterations. This
assertion is based on a specifier providing the information that $(n - start - 1)$ additions
have been performed if the loop were unrolled at least $(n - start - 1)$ times. As such, by
using loop unrolling and induction, the derived specification for the code sequence is
$((n - 1 < n) \land (i = n))$.

For this simple example, we find that the solution is non-trivial when applying the
formal definition of $sp(\mathrm{DO}, Q)$. As such, the specification process must rely on a user-guided
strategy for constructing a specification. A strategy for obtaining a specification of
a repetition statement is given in Figure 3.2.

1. The following criteria are the main characteristics to be identified during the
   specification of the repetition statement:

   • invariant ($P$): an expression describing the conditions prior to entry and upon
     exit of the iterative structure.

   • guards ($B$): Boolean expressions that restrict the entry into the loop. Execution
     of each guarded command, $B_i \to S_i$, terminates with $P$ true, so that $P$ is an
     invariant of the loop.

        $\{P \land B_i\}\ S_i\ \{P\}$, for $1 \le i \le n$

     When none of the guards is true and the invariant is true, then the postcondition
     of the loop should be satisfied ($P \land \lnot BB \to R$, where $BB = B_1 \lor \ldots \lor B_n$
     and $R$ is the postcondition).

2. Begin by introducing the assertion “$Q \land BB$” as the precondition to the body of the
   loop.

3. Query the user for modifications to the assertion made in Step 2. This guided
   interaction allows the user to provide generalizations about arbitrary iterations of
   the loop.

4. Apply the strongest postcondition to the loop body $S_i$ using the precondition given
   by Step 3.

5. Using the specification obtained from Step 4 as a guideline, query the user for a
   loop invariant. Although this step is non-trivial, techniques exist that aid in the
   construction of loop invariants [27, 26].

6. Using the relationship stated above ($P \land \lnot BB \to R$), construct the specification of
   the loop by taking the negation of the loop guard, and the loop invariant.

Figure 3.2: Strategy for constructing a specification for an iteration statement

3.2.2 Procedural Abstractions

This section describes the construction of formal specifications from code containing the
use of non-recursive procedural abstractions. A procedure declaration can be represented
using the following notation:

proc p (value $\bar{x}$; value-result $\bar{y}$; result $\bar{z}$);
{ $P$ } ⟨body⟩ { $Q$ }

where $\bar{x}$, $\bar{y}$, and $\bar{z}$ represent the value, value-result, and result parameters for the
procedure, respectively. A parameter of type value means that the parameter is used only
for input to the procedure. Likewise, a parameter of type result indicates that the parameter
is used only for output from the procedure. Parameters that are known as value-result
indicate that the parameters can be used for both input and output to the procedure. The
notation ⟨body⟩ represents one or more statements making up the “procedure”, while { $P$ }
and { $Q$ } are the precondition and postcondition, respectively. The signature of a procedure
appears as

\[ \text{proc } p : (input\_type)^{*} \to (output\_type)^{*} \tag{3.7} \]

where the Kleene star (*) indicates zero or more repetitions of the preceding unit,
input_type denotes the one or more names of input parameters to the procedure p, and
output_type denotes the one or more names of output parameters of procedure p. A
specification of a procedure can be constructed to be of the form

{ $P : U$ }
proc p : $E_0 \to E_1$
⟨body⟩
{ $Q : sp(\text{body}, U) \land U$ }

where $E_0$ is one or more input parameter types with attribute value or value-result, and
$E_1$ is one or more output parameter types with attribute value-result or result. The
postcondition for the body of the procedure, $sp(\text{body}, U)$, is constructed using the previously
defined guidelines for assignment, alternation, sequence, and iteration as applied to the
statements of the procedure body.

Gries [26] defines a theorem for specifying the effects of a procedure call using a total
correctness model of execution. Given a procedure declaration of the above form, the
following condition holds [26]:

\[ \{PRT : P^{\bar{x},\bar{y}}_{\bar{a},\bar{b}} \land (\forall \bar{u},\bar{v} :: Q^{\bar{y},\bar{z}}_{\bar{u},\bar{v}} \Rightarrow R^{\bar{b},\bar{c}}_{\bar{u},\bar{v}})\}\ \ p(\bar{a}, \bar{b}, \bar{c})\ \ \{R\} \tag{3.8} \]

for a procedure call $p(\bar{a}, \bar{b}, \bar{c})$, where $\bar{a}$, $\bar{b}$, and $\bar{c}$ represent the actual parameters of type
value, value-result, and result, respectively. Local variables of procedure p used to
compute value-result and result parameters are represented using $\bar{u}$ and $\bar{v}$, respectively.
Informally, the condition states that PRT must hold before the execution of procedure p
in order to satisfy R. In addition, PRT states that the precondition for procedure p must
hold for the parameters passed to the procedure and that the postcondition for procedure p
implies R for each value-result and result parameter. The formulation of Equation (3.8) in
terms of a partial correctness model of execution is identical, assuming that the procedure
is straight-line, non-recursive, and terminates. Using this theorem for the procedure call,
an abstraction of the effects of a procedure call can be derived using a specification of the
procedure declaration. That is, the construction of a formal specification from a procedure
call can be performed by inlining a procedure call and using the strongest postcondition for
assignment.

A procedure call $p(\bar{a}, \bar{b}, \bar{c})$ can be represented by the program block [26] found in
Figure 3.3, where ⟨body⟩ comprises the statements of the procedure declaration for p,
{ PR } is the precondition for the call to procedure p, { P } is the specification of the
program after the formal parameters have been replaced by actual parameters, { Q }
is the specification of the program after the procedure has been executed, { QR } is
the specification of the program after formal parameters have been assigned with the
values of local variables, and { R } is the specification of the program after the actual
parameters to the procedure call have been “returned”. By representing a procedure call
in this manner, parameter binding can be achieved through multiple assignment statements
and a postcondition R can be established by using the sp for assignment. Removal of
a procedural abstraction enables the extension of the notion of straight-line programs to
include non-recursive straight-line procedures. Making the appropriate sp substitutions,
we can annotate the code sequence from Figure 3.3 to appear as shown in Figure 3.4 where
$\alpha$, $\beta$, $\gamma$, $\zeta$, $\xi$, and $\psi$ are the initial values of $\bar{x}$, $\bar{y}$ (before execution of the procedure
body), $\bar{y}$ (after execution of the procedure body), $\bar{z}$, $\bar{b}$, and $\bar{c}$, respectively. Recall that
in Section 3.1.2, we described how the existential operators and the textual substitution
could be removed from the calculation of the sp. Applying that technique to assignments
and recognizing that formal and actual result parameters have no initial values, and that
local variables are used to compute the values of the value-result parameters, the above
sequence can be simplified using the semantics of sp for assignments to obtain the following
annotated code sequence:

{ $PR$ }
$\bar{x}, \bar{y} := \bar{a}, \bar{b}$;
{ $P : PR \land \bar{x} = \bar{a} \land \bar{y} = \bar{b}$ }
⟨body⟩
{ $Q$ }
$\bar{y}, \bar{z} := \bar{u}, \bar{v}$;
{ $QR : Q \land \bar{y} = \bar{u} \land \bar{z} = \bar{v}$ }
$\bar{b}, \bar{c} := \bar{y}, \bar{z}$;

where $Q$ is derived using $sp(\langle\text{body}\rangle, P)$.
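
The parameter-binding view of a procedure call can be mimicked directly with multiple assignments. The following Python sketch is a hedged illustration of that idea only: the procedure body, the dictionary-based environments, and the function name `call_inlined` are all hypothetical, and actual parameters are assumed to be simple variable names in the caller.

```python
def body(env):
    """Hypothetical procedure body: computes locals u and v from formals x, y."""
    env["u"] = env["y"] + env["x"]
    env["v"] = env["y"] * env["x"]


def call_inlined(caller, a, b, c):
    """Replace the call p(a, b, c) by the assignment sequence described above.

    a, b, and c name the value, value-result, and result actuals in `caller`.
    """
    env = {}
    env["x"], env["y"] = caller[a], caller[b]   # x, y := a, b
    body(env)                                   # <body>
    env["y"], env["z"] = env["u"], env["v"]     # y, z := u, v
    caller[b], caller[c] = env["y"], env["z"]   # b, c := y, z
    return caller


caller_state = {"a": 3, "b": 4, "c": None}
result = call_inlined(caller_state, "a", "b", "c")
```

After the call, the value actual `a` is unchanged while `b` and `c` carry the values computed by the body, so each of the three assignments can be annotated with the sp for assignment in turn.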

begin
   { $PR$ }
   $p(\bar{a}, \bar{b}, \bar{c})$
   { $R$ }
end

$\equiv$

begin
   declare $\bar{x}, \bar{y}, \bar{z}, \bar{u}, \bar{v}$;
   { $PR$ }
   $\bar{x}, \bar{y} := \bar{a}, \bar{b}$;
   { $P$ }
   ⟨body⟩
   { $Q$ }
   $\bar{y}, \bar{z} := \bar{u}, \bar{v}$;
   { $QR$ }
   $\bar{b}, \bar{c} := \bar{y}, \bar{z}$;
   { $R$ }
end

Figure 3.3: Removal of procedure call $p(\bar{a}, \bar{b}, \bar{c})$ abstraction

$\bar{x}, \bar{y} := \bar{a}, \bar{b}$;
{ $P : (\exists \alpha, \beta :: PR^{\bar{x},\bar{y}}_{\alpha,\beta} \land \bar{x} = \bar{a}^{\bar{x},\bar{y}}_{\alpha,\beta} \land \bar{y} = \bar{b}^{\bar{x},\bar{y}}_{\alpha,\beta})$ }
⟨body⟩
{ $Q$ }
$\bar{y}, \bar{z} := \bar{u}, \bar{v}$;
{ $QR : (\exists \gamma, \zeta :: Q^{\bar{y},\bar{z}}_{\gamma,\zeta} \land \bar{y} = \bar{u}^{\bar{y},\bar{z}}_{\gamma,\zeta} \land \bar{z} = \bar{v}^{\bar{y},\bar{z}}_{\gamma,\zeta})$ }
$\bar{b}, \bar{c} := \bar{y}, \bar{z}$;
{ $R : (\exists \xi, \psi :: QR^{\bar{b},\bar{c}}_{\xi,\psi} \land \bar{b} = \bar{y}^{\bar{b},\bar{c}}_{\xi,\psi} \land \bar{c} = \bar{z}^{\bar{b},\bar{c}}_{\xi,\psi})$ }

Figure 3.4: Code annotation for procedure call

3.3 Example

AUTOSPEC is a tool that has been developed to support the use of strongest postcondition
in the construction of formal specifications from existing program code [6]. In this section
we describe the use of AUTOSPEC to facilitate the analysis of programs.

AUTOSPEC accepts programs as input and, using rules such as the ones described in
this chapter, derives a formal specification of the input program. Our current investigations
include extending the AUTOSPEC tool to support the formal strongest postcondition
semantics of the C programming language as described in Chapter 5 [7]. For statements
such as assignments and conditionals, AUTOSPEC is fully automated. When processing
loops, AUTOSPEC allows a user to provide appropriate preconditions and postconditions.
For instance, consider Figure 3.5, which contains output from a session of the AUTOSPEC
system without pointers. The input program

x := 0;
do
   (I > power(2,x)) -> x := x + 1;
od;

computes the value of the smallest integer $x$ such that for an input value $I$, $I \le 2^x$.
The initial precondition to the loop is computed as ((I = I_0) & (x = 0)). On
encountering the loop statement the user is prompted by the string “Enter
Precondition:” to enter a precondition for an arbitrary iteration of the loop.
Figure 3.2 [6] discusses guidelines for specifying the effects of loops. Using these
guidelines, the precondition ((I = I_0) & (x = i)) is input by the user and
the term (I > power(2,x)) is automatically generated and conjoined to the
precondition. The resulting preliminary specification of the postcondition for the loop that
is generated by AUTOSPEC is as follows:

(((I > power(2,as_const2)) & ((I = I_0) & (as_const2 = i))) &
 (x = (as_const2 + 1)))

where “&” is the logical connective $\land$. The user is then prompted by the string “Enter
Postcondition:” to enter a postcondition. From the preliminary specification we can
deduce that while the guard is true, $(\forall i : 0 \le i < x : (I > 2^i))$. Furthermore, after
execution completes, $I \le 2^x$. Therefore, the final specification can be entered as

((I <= power(2,x)) & (forall i : ((0 <= i) & (i < x)) : (I > power(2,i))))

which states that after execution of the loop, $I \le 2^x$ and that for every integer $0 \le i < x$,
$I > 2^i$.


 

shell> as_jr approxI

{ ((I = I_0) & (x = 0)) }
do
   (I > power(2, x)) ->
      x := (x + 1);
od;

Enter Precondition:
((I = I_0) & (x = i))

{ ((I = I_0) & (x = 0)) }
do
   (I > power(2, x)) ->
      x := (x + 1);
      { (((I > power(2,as_const2)) & ((I = I_0) & (as_const2 = i))) &
         (x = (as_const2 + 1))) }
od;

{ (((I > power(2,as_const2)) & ((I = I_0) & (as_const2 = i))) &
   (x = (as_const2 + 1))) }

Enter Postcondition:
((I <= power(2,x)) & (forall i : ((0 <= i) & (i < x)) : (I > power(2,i))))

Figure 3.5: User Consultation
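
A postcondition entered during such a session can be sanity-checked by executing the loop on sample inputs. The Python sketch below is an informal test, not a proof and not a feature of AUTOSPEC: the function names are hypothetical, and the loop is a direct transcription of the example program with `I` as the input value.

```python
def smallest_exponent(I):
    """Transcription of: x := 0; do (I > power(2,x)) -> x := x + 1; od"""
    x = 0
    while I > 2 ** x:
        x += 1
    return x


def postcondition(I, x):
    """(I <= 2^x) and (forall i : 0 <= i < x : I > 2^i)."""
    return I <= 2 ** x and all(I > 2 ** i for i in range(x))


# Check the entered postcondition on a range of sample inputs.
holds_on_samples = all(
    postcondition(I, smallest_exponent(I)) for I in range(0, 1025)
)
```

Such a check can expose a mistaken postcondition quickly, although only the theorem prover mentioned in this section can establish that the specification holds for all inputs.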

 

In addition to supporting the ability to have a user provide guidance during the
specification process, AUTOSPEC supports syntactic and semantic verification of the input
entered by users by using an integrated syntax checker and theorem prover.


Chapter 4

Strongest Postcondition Semantics of
Pointers

Many modern programming languages support the use of pointer variables, including
C and C++. This chapter describes how we extended the strongest postcondition predicate
transformer to include the formal semantics of programs with pointers [28]. The semantics
are defined for a modified Dijkstra language that has been extended to include pointer
variables. The extension of the strongest postcondition semantics to include pointers
facilitates the use of strongest postcondition for reverse engineering a more general set
of programs.

4.1 Pointers

Using terminology of the C programming language [29], a pointer is a variable that
contains the address of a variable. A common use of pointers is the creation of aliases,
which refers to the fact that several names can be used to refer to a single data object.
For instance, the statement “x := @a”, where x is a pointer and a is some data variable,
creates an alias, thus operations involving x and a are synonymous. The notation “*p”
indicates a dereference of the pointer p in order to access the value of the referenced
object. There are four different classes of alias detection: intraprocedural may-alias,
interprocedural may-alias, intraprocedural must-alias, and interprocedural must-alias. The
term may-alias refers to the fact that given two variables, during some execution of
a program, the variables are aliases for one another. The term must-alias means that
during all executions of the program, the variables will be aliases for one another. The
terms interprocedural and intraprocedural indicate the context of the aliasing, where
interprocedural is global and intraprocedural is local. Compile-time analysis of programs
to detect aliasing has long been recognized as difﬁcult. In fact, it has been proven
that static analysis to detect aliases is undecidable [30]. This research does not address
may/must-aliasing problems directly although the intention in the development of the
formal semantics for pointers is to provide a theoretically rich formalism that can be used
to aid may/must-alias analysis.

In addition to having may/must-alias detection, alias detection techniques can be ﬂow-
sensitive or ﬂow-insensitive. A technique is ﬂow-sensitive if control structures are factored
into the detection algorithm. The techniques that we suggest are flow-sensitive although,
again, we do not directly address alias detection.

4.2 Pointer Semantics

A pointer is a variable that contains the address of some data object. Pointers can be
assigned in a number of different ways including heap allocation and direct addressing of a
variable. For instance, the C-like command “p := @k” assigns the address of the variable
k to pointer variable p. As such, the pointer variable p points-to variable k. This section
describes the strongest postcondition semantics of pointers.


4.2.1 Failure of Conventional Assignment Semantics

The strongest postcondition semantics of the assignment statement is as follows. Given a
statement “x := e” and a precondition $Q$:

\[ sp(x := e, Q) \equiv (\exists v :: Q^{x}_{v} \land x = e^{x}_{v}), \tag{4.1} \]

which states that after the execution of “x := e” there exists some variable $v$ such that
every free occurrence of $x$ in $Q$ is replaced with $v$ and $x = e^{x}_{v}$. Section 3.1.2 describes
two lemmas for eliminating the existential quantification in Expression (4.1). In the first
lemma, if the precondition $Q$ is of the form $C \land (x = u)$, where $C$ is a logical expression,
then after the textual substitution of variable $x$ with $v$ in $Q$, Expression (4.1) reads as
$(\exists v :: C^{x}_{v} \land (v = u) \land x = e^{x}_{v})$. Since $(v = u)$, the expression
$(\exists v :: C^{x}_{v} \land (v = u) \land x = e^{x}_{v})$ is logically equivalent to
$C^{x}_{u} \land (u = u) \land x = e^{x}_{u}$. In the second lemma, if $x$ does not appear as a
free variable in either the logical expression $Q$ or the expression $e$, then
$(\exists v :: Q^{x}_{v} \land x = e^{x}_{v})$ is logically equivalent to $Q \land x = e$.
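
As a small worked instance of the first lemma, consider the hypothetical statement “x := x + 1” with precondition $(y = 2) \land (x = 5)$, so that $C$ is $(y = 2)$ and $u$ is $5$. The concrete statement and values are chosen only for illustration and do not appear elsewhere in this chapter.

```latex
\begin{align*}
sp(x := x + 1,\ (y = 2) \land (x = 5))
  &\equiv (\exists v :: (y = 2) \land (v = 5) \land x = v + 1) \\
  &\equiv (y = 2) \land (5 = 5) \land x = 5 + 1 \\
  &\equiv (y = 2) \land (x = 6)
\end{align*}
```

The quantifier disappears because the conjunct $(v = 5)$ pins down the only possible witness for $v$.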

In a naive treatment of pointers, we can attempt to apply the semantics of the assignment
statement to pointer variables. However, doing so causes various problems. Consider the
example in Figure 4.1, where p and q are pointer variables, d is a typed variable, and e
is a constant. Given the precondition {$*q = Y$}, where *q is a dereference of an object
and Y is a constant, the strongest postcondition of the statement sequence “p := q; d
:= *p; *p := e” is {$d = v \land p = q \land *q = Y \land *p = e$} when the conventional
semantics for assignment is used for pointer assignments. This specification, derived using
the lemmas in Section 3.1.2, states that after execution of the sequence, d has value v, p
and q point to the same object, and that the value of *q = Y and *p = e. The problem with
this specification is that while p = q (i.e., pointers p and q refer to the same object), there
is a contradiction in the conjuncts *q = Y and *p = e whenever e differs from Y.

 

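{ $*q = Y$ }
p := q;
{ $(\exists v :: (*q = Y)^{p}_{v} \land p = q) \equiv (p = q \land *q = Y)$ (Lemma 3.1.2) }
d := *p;
{ $(\exists v :: (p = q \land *q = Y)^{d}_{v} \land d = *p) \equiv (d = *p \land p = q \land *q = Y)$ (Lemma 3.1.2) }
*p := e;
{ $(\exists v :: (d = *p \land p = q \land *q = Y)^{*p}_{v} \land *p = e) \equiv$
  $(d = v \land p = q \land *q = Y \land *p = e)$ (Lemma 3.1.2) }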

Figure 4.1: A simple pointer example
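
The inconsistency can be observed by executing the three statements on a concrete store and then evaluating the naively derived postcondition. The Python sketch below is illustrative only: the cell name `k`, the dictionaries `cells` and `pointers`, and the constants `Y` and `E` (with `E` different from `Y`) are hypothetical choices.

```python
Y, E = 10, 20  # hypothetical constants with E != Y

cells = {"k": Y, "d": None}          # named storage cells and their values
pointers = {"q": "k", "p": None}     # each pointer holds the name of a cell


def deref(name):
    """Value of the cell that pointer `name` currently references."""
    return cells[pointers[name]]


# Precondition: *q = Y
pointers["p"] = pointers["q"]        # p := q
cells["d"] = deref("p")              # d := *p
cells[pointers["p"]] = E             # *p := e

# Postcondition derived with conventional assignment semantics:
# p = q and *q = Y and *p = e
naive_postcondition_holds = (
    pointers["p"] == pointers["q"] and deref("q") == Y and deref("p") == E
)
```

In the final state `*q` equals `E` rather than `Y`, so the naively derived postcondition is false in the very state the program produces. The assignment through `p` changed the object that `q` references, which the conventional rule does not account for.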

 

The remainder of this section presents a model for describing the formal semantics
of pointer operations that overcomes the problems that occur when using conventional
assignment semantics.

4.2.2 Memory Model

In the C programming language, variables can be allocated from heap storage, registers,
or stack. The model used in this paper for representing memory is cell-based, where the
memory consists of a large number of storage cells. Each cell is named and contains a
value. A diagram of this model is shown in Figure 4.2, where the entries in the column
labeled N indicate the names of the cells, and the entries in the column labeled V indicate
the values. In the diagram, data objects x, y, and z have values a, b, and c, respectively. As
a convention we use “n.V”, where n is a cell name, to denote the value of the data object
n. In our example, x.V = a, y.V = b, and z.V = c.

   N  |  V
  ----+----
   x  |  a
   y  |  b
   z  |  c

Figure 4.2: Cell Memory Model

 

4.2.3 Extending the Model for Pointers

A pointer can be assigned by heap allocation, pointer assignment, or alias assignment.
Examples can be found in Table 4.1. Different alternatives for representing the use of
pointers within the context of the cell memory model are available, including the use of
indirection where, if pointer p points to some variable v, then the value of p is v.

 

 

 

 

 

 

Type                 Example
heap allocation      p := new T
pointer assignment   p := q
alias assignment     p := @x

 

 

 

Table 4.1: Pointer Assignments

 

Consider the set of data objects N and the set of pointers M that are currently allocated
at some step during the execution of a particular program. Assuming that all the pointers
in M point to data objects (not necessarily distinct) in N, using the equivalence relation
where pointer p = q if and only if p and q point to the same object, we can partition
M such that each partition is an equivalence class. As such, any operation on a member
of a particular equivalence class is behaviorally equivalent to performing an operation on
any other member of the same equivalence class. For instance, suppose we have variables
x and y and pointers p, q, r, s, and t. Let pointers p, q, and r point to variable x, and
pointers s and t point to variable y. Since p, q, and r point to the same variable x, they
form one equivalence class, and since s and t point to the other variable, they form another
equivalence class.

The equivalence classes within the set of pointers M can be considered to be dynamic
since the execution of a programming statement can possibly rearrange the members of
each set, as is the case when pointer variables are reused in either a heap allocation or a
pointer assignment. Figure 4.3 depicts the extension of the memory model where there
is an associated equivalence class in M for each memory cell. For consistency's sake we
assume that a data object can reference itself and, as such, is a member of the associated
equivalence class. For example, the data object y with pointers s and t has an associated
equivalence class from M with members { y, s, t }. We refer to this equivalence class as
M[y].

[Figure: each cell of the cell memory model (x with value a, y with value b, z with value c)
is linked to its associated equivalence class in the set of pointers; for example, the class
{ y, s, t } is linked to cell y.]

Figure 4.3: Pointer Extensions to the Memory Model

4.2.4 Points-to and Coset

In this section we define the semantics of the points-to relation and the coset function, both
of which are used to formally describe the behavior of pointer operations.

Let M be the set of pointers, N be the set of allocated data objects, and B be the Boolean
type. Figure 4.4 defines the > (pronounced “points-to”) relation. The primary use of the
points-to relation is for making assertions about pointers and their relation to specific data
objects. That is, it asserts that a pointer is in the equivalence class associated to a particular
data object. Informally, the points-to relation is a heterogeneous relation on $\{M \cup N\} \times N$.
The first axiom states that when data objects $o_1$ and $o_2$ are both in the set N, $o_1 > o_2$ is
true if and only if $o_1 = o_2$, where “$o_1 = o_2$” when $o_1$ and $o_2$ are the same data object. As
such, a data object can only reference itself and never references another data object. The
second axiom states that for a pointer $p \in M$ and a data object $o \in N$, $p > o$ if and only
if $p \in M[o]$. That is, p points-to o if and only if pointer p is an element of the equivalence
class of o. Equivalently, a pointer p points to a data object o if it is in the equivalence class
M[o].

\[ \_ > \_ : \{M \cup N\} \times N \to B \]

Axioms:
\[ (\forall o_1, o_2 : o_1, o_2 \in N : o_1 > o_2 \Leftrightarrow o_1 = o_2) \]
\[ (\forall p, o : p \in M \land o \in N : p > o \Leftrightarrow p \in M[o]) \]

Figure 4.4: The points-to relation

The coset function is defined in Figure 4.5. The primary use of the coset function is to
identify a dereferenced object. Informally, the coset function maps pointers to data objects,
where a pointer p maps to a data object o if $p \in M[o]$. If a pointer does not belong to any
equivalence class then the coset function is undefined.

 

\[ coset : M \to \{N \cup \{\text{undefined}\}\} \]

\[ coset(p) = \begin{cases} o & \text{if and only if } p > o, \\ \text{undefined} & \text{otherwise.} \end{cases} \]

Figure 4.5: The coset function
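
The cell memory model, the equivalence classes, and the two definitions above can be prototyped in a few lines. The Python class below is a minimal sketch under stated assumptions, not the representation used by AUTOSPEC: the class name `Memory` and its attributes are hypothetical, pointer and data object names are assumed to be disjoint strings, and `None` stands in for the undefined result of coset.

```python
class Memory:
    """Cell memory model extended with one equivalence class M[o] per object."""

    def __init__(self):
        self.value = {}    # N -> V: data object name to value
        self.classes = {}  # o -> M[o]: the object itself plus its pointers

    def add_object(self, name, value=None):
        """Allocate a data object; by convention it references itself."""
        self.value[name] = value
        self.classes[name] = {name}

    def points_to(self, ref, obj):
        """The points-to relation on {M u N} x N."""
        if ref in self.value:
            # First axiom: a data object references only itself.
            return ref == obj
        # Second axiom: p > o if and only if p is in M[o].
        return ref in self.classes.get(obj, set())

    def coset(self, pointer):
        """The object referenced by `pointer`, or None if undefined."""
        for obj, members in self.classes.items():
            if pointer in members and pointer not in self.value:
                return obj
        return None


# The running example: p, q, r point to x; s, t point to y.
mem = Memory()
for name, value in (("x", "a"), ("y", "b"), ("z", "c")):
    mem.add_object(name, value)
for pointer, target in (("p", "x"), ("q", "x"), ("r", "x"),
                        ("s", "y"), ("t", "y")):
    mem.classes[target].add(pointer)
```

With this setup `mem.classes["y"]` is the class M[y] with members y, s, and t, and `mem.value[mem.coset("t")]` plays the role of coset(t).V.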

 

4.2.5 Assignment Revisited

Given the definition of the points-to operator and the semantics of the equivalence class
model, we must redefine the sp semantics of the assignment statement for simple (non-pointer)
variables to be consistent with the model. Given a statement “x := e” and a
precondition Q, where x is a non-pointer variable and e is an expression, the strongest
postcondition for assignment statements is:

\[ sp(x := e, Q) \equiv (\exists v :: Q^{x.V}_{v} \land x.V = \hat{e}^{\,x.V}_{v}), \]

which states that after the execution of “x := e” there exists some variable v such that
every free occurrence of x.V in Q is replaced with v and $x.V = \hat{e}^{\,x.V}_{v}$. Formally, $\hat{e}$ means:

\[ \hat{e} \triangleq (\forall u : (variable(u) \land term(u, e)) \to e^{u}_{u.V}) \land (\forall p : (pointer(p) \land term(*p, e)) \to e^{*p}_{coset(p).V}) \]

Informally, the notation $\hat{e}$ indicates that the expression e is transformed so that every simple
variable u that is a term in e is replaced by u.V, and every pointer dereference *p that is
a term in e is replaced by coset(p).V, where coset(p).V refers to the value of the object
identified by coset(p). This formalization ensures that there is a consistent notation for
referring to the values of data objects.
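
The rewriting of an expression into its value-based form can be pictured as a purely textual transformation. The Python sketch below is a rough illustration only: the function name `hat` is hypothetical, expressions are plain strings, a dereference is assumed to be written as `*` immediately followed by the pointer name, and binary multiplication is assumed to be written with surrounding spaces so that it is not mistaken for a dereference.

```python
import re


def hat(expr, variables):
    """Rewrite *p as coset(p).V and each simple variable u as u.V.

    `variables` is the set of simple (non-pointer) variable names.
    """
    def replace(match):
        star, name = match.group(1), match.group(2)
        if star:
            return f"coset({name}).V"   # pointer dereference *p
        if name in variables:
            return f"{name}.V"          # simple variable u
        return name                     # anything else is left untouched

    return re.sub(r"(\*?)([A-Za-z_]\w*)", replace, expr)


rewritten = hat("*p + d * 2", {"d"})
```

Here the dereference `*p` and the variable `d` are rewritten while the constant and the spaced multiplication operator are preserved, so every operand refers to the value of a data object in the memory model.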

4.2.6 Heap Allocation

When a pointer is assigned a “value” then that pointer is placed into an equivalence class
such that all the members of the equivalence class point to the same data object. One
method for assigning a value to a pointer is through heap memory allocation. Heap
memory allocation has the form “p := new T” where p is a pointer and T is a data
type. Informally, upon allocation of heap memory, a new data object of type T is created
and the pointer p is used to reference the object. In our model this action is represented
by introducing a new entry o in N with an undefined value in V, and adding p to the
equivalence class M[o]. In addition, if p was previously in some equivalence class M[h], it
is removed from that set. Formally we can state this condition as follows:

\[ sp(p := \text{new } T, Q) \equiv (\exists c : c \in N : Q^{p}_{c}) \land p > o \land o.V = \text{undefined}, \tag{4.2} \]

where o is a new data object. The textual substitution of every free occurrence of p in Q
with the term $c \in N$ ensures that p is removed from any equivalence class with which it may
have previously been associated, and the assertion $p > o$ places p into the equivalence class
M[o]. Finally, the term o.V = undefined asserts that the value of the new object o is undefined.

As an example, let precondition Q be {$q > o_1$} and statement S be “q := new T”. Then

   $sp(S, Q) \equiv sp(q := \text{new } T,\ q > o_1)$
      (Expression (4.2))
   $\equiv (\exists c : c \in N : (q > o_1)^{q}_{c}) \land q > o_2 \land o_2.V = \text{undefined}$
      (Textual substitution of q with c)
   $\equiv (\exists c : c \in N : c > o_1) \land q > o_2 \land o_2.V = \text{undefined}$
      (Points-to axiom (Figure 4.4) applied to $c > o_1$)
   $\equiv (\exists c : c \in N : c = o_1) \land q > o_2 \land o_2.V = \text{undefined}$
      (Trading [16])
   $\equiv (\exists c : c = o_1 : c \in N) \land q > o_2 \land o_2.V = \text{undefined}$
      (One-point rule [16] with $c = o_1$, $o_1 \in N \equiv true$)
   $\equiv q > o_2 \land o_2.V = \text{undefined}$

As such, after the execution of the statement “q := new T”, the pointer q points to
some new object $o_2$.

4.2.7 Pointer Assignment

Another way of assigning a value to a pointer is via direct aliasing as in the C-like command
“p := @x”. In terms of the equivalence class model, the pointer alias assignment adds the
pointer p to the equivalence class M[x]. The formal semantics of this command is similar
to the heap allocation case. Formally the semantics is as follows:

\[ sp(p := @x, Q) \equiv (\exists c : c \in N : Q^{p}_{c}) \land p > x. \tag{4.3} \]

Expression (4.3) states that after executing the statement p := @x, $p > x$ and every
free occurrence of p in Q is replaced with c. This relationship ensures that the pointer p is
placed into the equivalence class associated to the variable x. As an example, let statement
S be “p := @x” and let precondition Q be “{$p > o_1$}”. The sp derivation is as follows.

sp(S, Q) E sp(p := @x,p > 01)
(Expression (4.3))
E(3c:c€N:(p>01)’c’)Ap>:1:
(Textual substitution of p with c)
E(3c:cEN:(c>ol))Ap>$
(Points-to axiom applied to c > 01)
E(30:c€N:c=ol)Ap>x
(Trading)
E(3c:c=ol:c€N)Ap>:1:
(One-point rule with c = 01, 01 E N E true)
Ep>$

Hence, the pointer p points to the data object as.
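
As a further illustration, constructed here and not taken from the original text, suppose a second pointer q shares the object with p before the assignment, so that the precondition is $p \triangleright o_1 \wedge q \triangleright o_1$. Applying Expression (4.3) with the same proof steps shows that the fact about q survives while the fact about the old target of p is eliminated:

```latex
\begin{aligned}
sp(p := @x,\ p \triangleright o_1 \wedge q \triangleright o_1)
  &\equiv (\exists c : c \in N : (p \triangleright o_1 \wedge q \triangleright o_1)^{p}_{c}) \wedge p \triangleright x \\
  &\equiv (\exists c : c \in N : c \triangleright o_1 \wedge q \triangleright o_1) \wedge p \triangleright x \\
  &\equiv (\exists c : c \in N : c = o_1 \wedge q \triangleright o_1) \wedge p \triangleright x
     && \text{(points-to axiom)} \\
  &\equiv (\exists c : c = o_1 : c \in N \wedge q \triangleright o_1) \wedge p \triangleright x
     && \text{(trading)} \\
  &\equiv q \triangleright o_1 \wedge p \triangleright x
     && \text{(one-point rule, } o_1 \in N \equiv \text{true)}
\end{aligned}
```

In equivalence class terms, p leaves the class of $o_1$ and joins $M[x]$, while q remains in the class of $o_1$.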
The final way that a pointer can be assigned a value occurs when a statement of the
form “p := q” is executed, where p and q are pointers. In terms of the equivalence
class model, the pointer assignment adds the pointer p to the class that contains pointer q.
In this case the formal semantics is expressed as

$$sp(p := q,\ Q) \equiv (\exists c : c \in N : Q^{p}_{c}) \wedge p \triangleright coset(q). \tag{4.4}$$

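Expression (4.4) states that after execution of the pointer assignment, p points to the object
coset(q). For example, let statement S be “p := q” and precondition Q be
“$\{q \triangleright o_1 \wedge p \triangleright o_2\}$”. Then

$$
\begin{aligned}
sp(S, Q) &\equiv sp(p := q,\ q \triangleright o_1 \wedge p \triangleright o_2) \\
&\qquad \text{(Expression (4.4))} \\
&\equiv (\exists c : c \in N : (q \triangleright o_1 \wedge p \triangleright o_2)^{p}_{c}) \wedge p \triangleright coset(q) \\
&\qquad \text{(Textual substitution of } p \text{ with } c\text{)} \\
&\equiv (\exists c : c \in N : (q \triangleright o_1 \wedge c \triangleright o_2)) \wedge p \triangleright coset(q) \\
&\qquad \text{(Points-to axiom applied to } c \triangleright o_2\text{)} \\
&\equiv (\exists c : c \in N : (q \triangleright o_1 \wedge c = o_2)) \wedge p \triangleright coset(q) \\
&\qquad \text{(Trading)} \\
&\equiv (\exists c : c = o_2 : c \in N \wedge q \triangleright o_1) \wedge p \triangleright coset(q) \\
&\qquad \text{(One-point rule with } c = o_2,\ o_2 \in N \equiv \text{true)} \\
&\equiv q \triangleright o_1 \wedge p \triangleright coset(q) \\
&\qquad \text{(Definition of coset)} \\
&\equiv q \triangleright o_1 \wedge p \triangleright o_1
\end{aligned}
$$

As such, after the execution of “p := q”, the pointers p and q reference the same data
object.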
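
The equivalence class reading of this result can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than part of the formal model: the dictionary `points_to` (pointer to object) and the helper names `coset`, `eq_class`, and `assign_pointer` are chosen for the example, with `eq_class(obj)` playing the role of M[obj].

```python
def coset(points_to, ptr):
    """Return the object whose equivalence class contains ptr."""
    return points_to[ptr]


def eq_class(points_to, obj):
    """Return M[obj], the set of pointers currently referencing obj."""
    return {p for p, o in points_to.items() if o == obj}


def assign_pointer(points_to, p, q):
    """Model p := q: p leaves its old class and joins the class containing q."""
    updated = dict(points_to)          # leave the caller's state untouched
    updated[p] = coset(points_to, q)
    return updated


state = {"q": "o1", "p": "o2"}         # q |> o1 and p |> o2
after = assign_pointer(state, "p", "q")
print(eq_class(after, "o1"))           # class of o1 after the assignment
print(eq_class(after, "o2"))           # class of o2 after the assignment
```

After the call, both pointers belong to the class of `o1` and the class of `o2` is empty, which matches the final line of the derivation.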

4.2.8 Value Assignment

In the C programming language, the value of the data object that a pointer references is
accessed using the notation “*p”, where p is a pointer variable. Using the same notation
convention, an assignment to the data object is achieved using a command of the form “*p
:= e”, where e is an expression. In terms of the equivalence class model, the assignment
of *p sets the value of the data object coset(p) to e. Formally, the semantics of assignment
to a dereferenced data object is as follows:

$$sp(*p := e,\ Q) \equiv (\exists v : v \in T : Q^{coset(p).V}_{v} \wedge coset(p).V = e^{coset(p).V}_{v}) \tag{4.5}$$

where T is the type of the data object, and v is a value of that type. For instance, if T is the
type integer, then v is some integer. The variable v represents the value of the data object
dereferenced by *p prior to the execution of the statement “*p := e”. Informally, the
semantics state that after execution of the statement “*p := e”, *p will have the value
$e^{coset(p).V}_{v}$. Additionally, $Q^{coset(p).V}_{v}$ will be true. For example, let statement S be “*p :=
5” and precondition Q be $\{p \triangleright o_1 \wedge o_1.V = n \wedge o_1.V \le k\}$. Informally, the precondition
states that p points to object $o_1$, the value of $o_1$ (denoted $o_1.V$) is n, and $o_1.V \le k$. The sp
derivation is as follows.

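$$
\begin{aligned}
sp(S, Q) &\equiv sp(*p := 5,\ p \triangleright o_1 \wedge o_1.V = n \wedge o_1.V \le k) \\
&\qquad \text{(Expression (4.5))} \\
&\equiv (\exists v : v \in T : (p \triangleright o_1 \wedge o_1.V = n \wedge o_1.V \le k)^{coset(p).V}_{v} \wedge coset(p).V = 5) \\
&\qquad \text{(Coset definition and textual substitution of } coset(p).V \text{ with } v\text{)} \\
&\equiv (\exists v : v \in T : p \triangleright o_1 \wedge v = n \wedge v \le k \wedge coset(p).V = 5) \\
&\qquad \text{(Trading)} \\
&\equiv (\exists v : v \in T \wedge v = n : p \triangleright o_1 \wedge v \le k \wedge coset(p).V = 5) \\
&\qquad \text{(One-point rule with } v = n\text{)} \\
&\equiv p \triangleright o_1 \wedge n \le k \wedge coset(p).V = 5 \\
&\qquad \text{(Definition of coset)} \\
&\equiv p \triangleright o_1 \wedge n \le k \wedge o_1.V = 5
\end{aligned}
$$

Hence, after the execution of “*p := 5”, the data object pointed to by p has value 5.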
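
The substitution $e^{coset(p).V}_{v}$ matters only when the assigned expression itself reads the dereferenced object. As an illustration constructed for this purpose (not taken from the original text), consider the statement “*p := *p + 1” with precondition $p \triangleright o_1 \wedge o_1.V = n$, reading the occurrence of *p inside e as $coset(p).V$ and assuming $n \in T$:

```latex
\begin{aligned}
sp(*p := *p + 1,\ p \triangleright o_1 \wedge o_1.V = n)
  &\equiv (\exists v : v \in T : (p \triangleright o_1 \wedge o_1.V = n)^{coset(p).V}_{v}
          \wedge coset(p).V = (coset(p).V + 1)^{coset(p).V}_{v}) \\
  &\equiv (\exists v : v \in T : p \triangleright o_1 \wedge v = n \wedge coset(p).V = v + 1) \\
  &\equiv (\exists v : v \in T \wedge v = n : p \triangleright o_1 \wedge coset(p).V = v + 1)
     && \text{(trading)} \\
  &\equiv p \triangleright o_1 \wedge coset(p).V = n + 1
     && \text{(one-point rule with } v = n\text{)} \\
  &\equiv p \triangleright o_1 \wedge o_1.V = n + 1
     && \text{(definition of coset)}
\end{aligned}
```

The old value n is captured by v before the update, so the postcondition relates the new value of $o_1$ to the value it held beforehand.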

4.2.9 Value Dereference

The command for observing the value of a pointer dereference has the form “x := *p”,
where x is a variable and p is a pointer. In terms of the equivalence class model, the
value dereference *p refers to the value of the data object associated with the equivalence
class containing p. That is, *p refers to coset(p).V. The formal semantics of a value
dereference is as follows:

$$sp(x := *p,\ Q) \equiv (\exists v : v \in T : Q^{x.V}_{v} \wedge x.V = coset(p).V) \tag{4.6}$$

where T is the type of the data object, and v is a value of that type. Informally,
Expression (4.6) states that after the execution of a statement with *p on the right hand
side of an assignment, the left hand side of the assignment takes on the value of the object
that has been dereferenced. The term $Q^{x.V}_{v}$ states that every free occurrence of x.V in
Q is replaced with the value of x previous to executing the statement “x := *p”. As
an example, let statement S be “x := *p” and let precondition Q be
$\{p \triangleright o_1 \wedge o_1.V = n \wedge x.V = y\}$. The sp derivation proceeds as follows.

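$$
\begin{aligned}
sp(S, Q) &\equiv sp(x := *p,\ p \triangleright o_1 \wedge o_1.V = n \wedge x.V = y) \\
&\qquad \text{(Expression (4.6))} \\
&\equiv (\exists v : v \in T : (p \triangleright o_1 \wedge o_1.V = n \wedge x.V = y)^{x.V}_{v} \wedge x.V = coset(p).V) \\
&\qquad \text{(Textual substitution of } x.V \text{ with } v\text{)} \\
&\equiv (\exists v : v \in T : p \triangleright o_1 \wedge o_1.V = n \wedge v = y \wedge x.V = coset(p).V) \\
&\qquad \text{(Trading)} \\
&\equiv (\exists v : v \in T \wedge v = y : p \triangleright o_1 \wedge o_1.V = n \wedge v = y \wedge x.V = coset(p).V) \\
&\qquad \text{(One-point rule with } v = y\text{)} \\
&\equiv p \triangleright o_1 \wedge o_1.V = n \wedge x.V = coset(p).V \\
&\qquad \text{(Definition of coset)} \\
&\equiv p \triangleright o_1 \wedge o_1.V = n \wedge x.V = o_1.V
\end{aligned}
$$

This states that the new value of x.V is equivalent to the value of the data object $o_1$.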
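
The five pointer commands of Section 4.2 can also be read operationally. The Python sketch below is a concrete-state illustration under stated assumptions, not a symbolic sp calculator and not the AUTOSPEC tool: the command tags (`"new"`, `"addr"`, `"copy"`, `"store"`, `"load"`), the function name `run`, and the fresh-object naming scheme are all hypothetical choices made for the example.

```python
UNDEFINED = object()   # stands in for the "undefined" value of a fresh object


def run(program, values=None):
    """Execute a list of (op, lhs, rhs) pointer commands on a concrete state.

    values maps data objects to their .V value; points_to maps each pointer
    to the object whose equivalence class it belongs to.
    """
    values = dict(values or {})
    points_to = {}
    fresh = 0
    for op, lhs, rhs in program:
        if op == "new":            # lhs := new T
            fresh += 1
            obj = f"o{fresh}"
            values[obj] = UNDEFINED
            points_to[lhs] = obj
        elif op == "addr":         # lhs := @rhs
            points_to[lhs] = rhs
        elif op == "copy":         # lhs := rhs, both pointers
            points_to[lhs] = points_to[rhs]
        elif op == "store":        # *lhs := rhs, rhs a constant in this sketch
            values[points_to[lhs]] = rhs
        elif op == "load":         # lhs := *rhs
            values[lhs] = values[points_to[rhs]]
        else:
            raise ValueError(f"unknown command: {op}")
    return values, points_to


values, points_to = run([
    ("new", "p", None),    # p := new T
    ("copy", "q", "p"),    # q := p
    ("store", "q", 5),     # *q := 5
    ("load", "x", "p"),    # x := *p
])
print(values["x"], points_to)
```

Because p and q share one equivalence class, the value written through q is the value read back through p.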

4.3 Examples

Figure 4.6 contains three programs for illustrating the pointer semantics described in this
chapter as well as for showing the use of an automated tool for analyzing programs with
pointers. Figure 4.6(a) is the program from Figure 4.1. Figure 4.6(b) is a program for
demonstrating how aliases are resolved using the pointer semantics, and Figure 4.6(c)
shows a program with a conditional statement and how the conditional statement impacts
pointer resolution. The specifications in this section were all automatically generated by
the AUTOSPEC tool.

4.3.1 alias

The alias program is shown in Figure 4.6(a). Figure 4.1 demonstrated the application
of conventional strongest postcondition semantics for assignment to the alias program
and the failure of those semantics to correctly specify the behavior in the context of pointer
use. In this section we describe the semantics of the specification constructed using the
AUTOSPEC tool with support for the pointer semantics presented in Section 4.2.

 

(a) alias

    program alias(
    inputs: int e;
            int *q; )
    decl
        int d;
        int *p;
    lced
    begin
        p := q;
        d := *p;
        *p := e;
    end

(b) manyvars

    program manyvars()
    decl
        int z; int u;
        int *r;
        int *q;
    lced
    begin
        r := @u;
        z := 0;
        q := @z;
        *r := 1;
        *q := *r;
    end

(c) maxThresh

    program maxThresh(
    inputs: int e; int x; int y;
    outputs: int *z; )
    begin
        if ( x > y ) -> z := @x;
        || ( x <= y ) -> z := @y;
        fi;
        *z := *z + e;
    end

Figure 4.6: Three Sample Programs: (a) alias (b) manyvars (c) maxThresh

 

Figure 4.7 contains the output of AUTOSPEC when executed using the alias program
as input. The precondition appears as the logical formula enclosed within the curly braces
“{” and “}” following the keyword begin at line 11. It is derived from the parameter
and variable declarations, int e; int *q;, and int d; int *p;, respectively.
Informally, the precondition states that the declared variable d has initial value (d.V)
equivalent to some constant d_0, parameter e has initial value (e.V) equivalent to some
constant e_0, and parameter *q points to some object obj_q. Additionally, the initial
value of obj_q (denoted obj_q.V) is equivalent to some constant q_0. After execution of
the first statement of the program, the pointer p points to the object identified by coset(q),
which is specified by the conjunct (p .> coset( q )) in the specification at lines 13-14,
where “.>” is the points-to relation.

 1 program alias (
 2 inputs :
 3     int e;
 4     int *q;
 5 )
 6 decl
 7     int d;
 8     int *p;
 9 lced
10 begin
11 { ((d.V = d_0) & (((obj_q.V = q_0) & (q .> obj_q)) & (e.V = e_0))) }
12 p := q;
13 { (((d.V = d_0) & (((obj_q.V = q_0) & (q .> obj_q)) & (e.V = e_0))) &
14 (p .> coset( q ))) }
15 d := *p;
16 { ((((_cnst2 = d_0) & (((obj_q.V = q_0) & (q .> obj_q)) & (e.V = e_0))) &
17 (p .> coset( q ))) & (d.V = coset( p ).V)) }
18 *p := e;
19 { (((((_cnst2 = d_0) & (((_cnst4 = q_0) & (q .> obj_q)) & (e.V = e_0))) &
20 (p .> coset( q ))) & (d.V = _cnst4)) & (obj_q.V = e.V)) }
21 end

Figure 4.7: Output of AUTOSPEC applied to Figure 4.1

The final specification of the alias program (lines 19-20) is the following:

    (((((_cnst2 = d_0) & (((_cnst4 = q_0) & (q .> obj_q)) &
    (e.V = e_0))) & (p .> coset( q ))) & (d.V = _cnst4)) &
    (obj_q.V = e.V))

where “&” is the logical connective “∧”. The specification states that after executing the
alias program, obj_q.V has value e.V such that e.V = e_0 and d.V has value
_cnst4 such that _cnst4 = q_0. In addition, pointers p and q are aliases for the same
object obj_q.
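
One way to gain confidence in a generated specification is to run the program on concrete inputs and evaluate the specification on the resulting state. The Python sketch below does this for the alias program. It is an illustration only: the dictionaries `val` and `pts`, the function names, and the sample inputs are assumptions made for the example, and the check covers only the conjuncts of the final specification that mention program state.

```python
def run_alias(e_0, q_0, d_0):
    """Run the alias program on concrete inputs and return the final state."""
    val = {"e": e_0, "d": d_0, "obj_q": q_0}   # the .V component of each object
    pts = {"q": "obj_q"}                       # q .> obj_q
    pts["p"] = pts["q"]                        # p := q
    val["d"] = val[pts["p"]]                   # d := *p
    val[pts["p"]] = val["e"]                   # *p := e
    return val, pts


def final_spec_holds(val, pts, e_0, q_0):
    """Evaluate the state-dependent conjuncts of the final specification."""
    return (pts["q"] == "obj_q"                # q .> obj_q
            and pts["p"] == pts["q"]           # p .> coset(q)
            and val["e"] == e_0                # e.V = e_0
            and val["d"] == q_0                # d.V = _cnst4 with _cnst4 = q_0
            and val["obj_q"] == val["e"])      # obj_q.V = e.V


val, pts = run_alias(e_0=7, q_0=3, d_0=0)
print(final_spec_holds(val, pts, e_0=7, q_0=3))
```

A single run does not establish the specification, but a failing run would expose a mismatch between the program and the formula.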

4.3.2 manyvars

The manyvars program is shown in Figure 4.6(b). This program demonstrates the
difficulty in understanding programs that use a high degree of aliasing. The program
uses two integer variables and two pointer variables, where the pointers r and q are
used to create aliases of variables u and z, respectively. In addition, a number of
value assignments are made to the primary variables (e.g., z := 0;) and aliases (e.g.,
*r := 1;). Figure 4.8 contains the specification of the manyvars program as generated
by the AUTOSPEC system. The first statement of the program at line 11 (r := @u) creates
an alias of variable u and is specified by the conjunct (r .> u) in the expression:

    (((u.V = u_0) & (z.V = z_0)) & (r .> u))

at line 12. The fourth statement in the program at line 18 (*r := 1) assigns the value
“1” to the data object identified by coset(r) which, on account of the conjunct (r .> u),
is the data variable u. Hence, the conjunct (u.V = 1) appears in the specification at
lines 19-20.

 1 program manyvars (
 2 )
 3 decl
 4     int z;
 5     int u;
 6     int *r;
 7     int *q;
 8 lced
 9 begin
10 { ((u.V = u_0) & (z.V = z_0)) }
11 r := @u;
12 { (((u.V = u_0) & (z.V = z_0)) & (r .> u)) }
13 z := 0;
14 { ((((u.V = u_0) & (_cnst2 = z_0)) & (r .> u)) & (z.V = 0)) }
15 q := @z;
16 { (((((u.V = u_0) & (_cnst2 = z_0)) & (r .> u)) & (z.V = 0)) &
17 (q .> z)) }
18 *r := 1;
19 { ((((((_cnst5 = u_0) & (_cnst2 = z_0)) & (r .> u)) & (z.V = 0)) &
20 (q .> z)) & (u.V = 1)) }
21 *q := *r;
22 { (((((((_cnst5 = u_0) & (_cnst2 = z_0)) & (r .> u)) & (_cnst7 = 0)) &
23 (q .> z)) & (u.V = 1)) & (z.V = coset( r ).V)) }
24 end

Figure 4.8: AUTOSPEC applied to the manyvars program

 

The final specification of the manyvars program (lines 22-23) is the following:

    (((((((_cnst5 = u_0) & (_cnst2 = z_0)) & (r .> u)) &
    (_cnst7 = 0)) & (q .> z)) & (u.V = 1)) &
    (z.V = coset( r ).V))

which states that after the execution of the program, z.V has a value equivalent to that of
coset(r).V. Since coset(r) = u and (u.V = 1), the value of variable z, denoted
z.V, is 1.

4.3.3 maxThresh

The maxThresh program is shown in Figure 4.6(c). The purpose of this program is to
demonstrate the use of AUTOSPEC in specifying the cases where pointers may reference
many objects rather than just a single object. The maxThresh program sets pointer z to
alias the maximum of two input variables x and y. After determining the maximum, the
program adds a threshold e to the maximum.

Figure 4.9 contains the specification of the maxThresh program as generated by
the AUTOSPEC system. After the execution of the if-fi statement (lines 11-18), the
following (as shown in lines 19-20) is true:

    ((((x.V > y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0)))) &
    (z .> x)) | (((x.V <= y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V =
    e_0)))) & (z .> y)))

which states that the value of variable x is greater than the value of variable y and the
pointer z points to the variable x, or the value of variable y is greater than or equal to the
value of variable x and the pointer z points to the variable y.

The final specification of the maxThresh program (lines 22-25) is as follows:

    (((((((_cnst5 > _cnst4) & ((_cnst4 = y_0) & ((_cnst5 = x_0) & (e.V
    = e_0)))) & (z .> x)) | (((_cnst5 <= _cnst4) & ((_cnst4 = y_0) &
    ((_cnst5 = x_0) & (e.V = e_0)))) & (z .> y))) & ((_cnst4 = CV3) |
    (_cnst5 = CV3))) & (CV3 = _06)) & (coset( z ).V = (_06 + e.V)))

This specification states that the value of coset(z).V is equivalent to the expression
(_06 + e.V) where _06 = CV3 and ((_cnst4 = CV3) | (_cnst5 = CV3)).


 1 program maxThresh (
 2 inputs :
 3     int e;
 4     int x;
 5     int y;
 6 outputs :
 7     int *z;
 8 )
 9 begin
10 { ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0))) }
11 if
12 (x > y) ->
13 z := @x;
14 { (((x.V > y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0)))) & (z .> x)) }
15 || (x <= y) ->
16 z := @y;
17 { (((x.V <= y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0)))) & (z .> y)) }
18 fi;
19 { ((((x.V > y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0)))) & (z .> x)) |
20 (((x.V <= y.V) & ((y.V = y_0) & ((x.V = x_0) & (e.V = e_0)))) & (z .> y))) }
21 *z := (*z + e);
22 { (((((((_cnst5 > _cnst4) & ((_cnst4 = y_0) & ((_cnst5 = x_0) & (e.V = e_0)))) &
23 (z .> x)) | (((_cnst5 <= _cnst4) & ((_cnst4 = y_0) & ((_cnst5 = x_0) &
24 (e.V = e_0)))) & (z .> y))) & ((_cnst4 = CV3) | (_cnst5 = CV3))) &
25 (CV3 = _06)) & (coset( z ).V = (_06 + e.V))) }
26 end

Figure 4.9: AUTOSPEC applied to the maxThresh program

 

Since the pointer z can point to either variable x or variable y, the value (*z + e) in
the statement at line 21 is dependent on the result of the if-fi statement. As such,
the specification of the conditional value for CV3 is appended to the derivation by the
AUTOSPEC system in order to preserve the one-point property of the specification.
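
The role of the conditional value can be illustrated by running maxThresh on concrete inputs that exercise both guards. The Python sketch below is a hedged illustration: the function `run_max_thresh`, the local name `cv3` (standing in for CV3), and the sample inputs are assumptions made for the example rather than anything produced by AUTOSPEC.

```python
def run_max_thresh(x, y, e):
    """Run maxThresh on concrete inputs.

    Returns (values, target, cv3): the final values, the variable that z
    points to, and the value of *z before the update.
    """
    val = {"x": x, "y": y, "e": e}
    target = "x" if x > y else "y"       # if-fi: z := @x or z := @y
    cv3 = val[target]                    # value of *z before the update
    val[target] = val[target] + e        # *z := *z + e
    return val, target, cv3


for x, y in [(4, 2), (2, 4), (3, 3)]:
    val, target, cv3 = run_max_thresh(x, y, e=10)
    # The conditional value is one of the two initial values, chosen by the guard.
    assert cv3 in (x, y) and cv3 == max(x, y)
    # coset(z).V = CV3 + e.V
    assert val[target] == cv3 + 10
    print(x, y, "->", target, val[target])
```

Whichever branch is taken, the updated object holds the selected initial value plus the threshold, which is what the disjunction over CV3 records.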


Chapter 5

Application of Strongest Postcondition
to C Programs

The C programming language is one of the most popular programming languages [29].
Based on the imperative (procedural) programming style, C contains many syntactic
and semantic elements that differentiate it from the Dijkstra guarded command language,
including the use of pointers and side-effect expressions. This chapter describes our
investigations into the use of strongest postcondition to define the semantics of the
C programming language [29] based on the semantics we developed for imperative
programs in Chapter 3. The definition of the strongest postcondition semantics for the
C programming language facilitates the reverse engineering of real industrial systems.

5.1 Assignment

Let v be a variable or an assignable expression and e be an expression. An assignment in
the C programming language has the form $v \circeq e$, where $\circeq$ is an assignment operator (e.g.,
=, +=, *=). There are two roles that an assignment statement can have. The first is the
traditional assignment of a variable with the value of an expression. The second role is as
a side-effect Boolean expression.

In order to handle the dual role of an assignment statement, two functions are defined.
First, in order to describe the semantics of the traditional use of assignment, an evaluation
function $A : S \rightarrow T$ is defined, where S is the set of syntactically valid expressions, and T
is the range of the result given by evaluating the expression e. If $s \in S$ is a non-assignment
expression, then in general $A(s) = s$. If, however, s is an assignment statement, such
as “x *= n”, the function A would be evaluated as $A(\texttt{x *= n}) = x \times n$, where n is
a variable of the same type as x. Table 5.1 defines the semantics of the function A on a
few sample assignment operators. The left column of the table indicates which assignment
operator is being performed and the right column indicates the value of the assignment as
applied to v and e. For instance, for the operator =, the table states that $A(v = e) = A(e)$,
and that $A(v \mathrel{+}= e) = v + A(e)$.

A more general form of the function A can be defined as $A(b) = b$, where b is a non-
assignment expression. The interpretation is that the evaluation A on any expression has
the value of the expression. For example, consider “A(x + y + z)”. The expression “x
+ y + z” is a non-assignment expression, therefore $A(x + y + z) = x + y + z$. For
a discussion of the remaining expression constructs, see Appendix A. Using the definition
of A, we can define the strongest postcondition of an assignment in the following manner:

Definition 5.1 (Assignment Semantics)
Let Q be the precondition, v be the quantified variable, and “::” indicate that
the range of the quantified variable v is true. Then the strongest postcondition
of an assignment is

$$sp(x \circeq e,\ Q) \equiv (\exists v :: Q^{x}_{v} \wedge x = A(x \circeq e^{x}_{v})).$$

    Operation (v ≗ e)    Evaluation A

    =                    A(e)
    *=                   v × A(e)
    /=                   v / A(e)
    +=                   v + A(e)
    -=                   v - A(e)
    %=                   v mod A(e)

Table 5.1: Evaluation of A on sample C assignment operators

Definition 5.1 states that after the execution of an assignment statement, there exists some
value v such that the textual substitution of every free occurrence of x with v in Q keeps
Q true, and x takes the value of the evaluation A on $x \circeq e^{x}_{v}$. This means that after the
execution of an assignment statement, the precondition Q must still be true with respect to
the value that the variable x had before the assignment, and the assignment must be valid.
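
The evaluation column of Table 5.1 can be read as a lookup from operator to a function of the old value v and the already evaluated right-hand side A(e). The Python sketch below illustrates this on concrete integers. It is an assumption-laden illustration: the names `ASSIGN_OPS`, `A`, and `assign` are chosen for the example, and the helpers `c_div` and `c_mod` encode the C99 rule that integer division truncates toward zero, which the table itself does not spell out.

```python
def c_div(v, e):
    """Integer division truncating toward zero (assumed C99 behaviour)."""
    q = abs(v) // abs(e)
    return q if (v >= 0) == (e >= 0) else -q


def c_mod(v, e):
    """Remainder consistent with c_div, so that (v / e) * e + v % e == v."""
    return v - e * c_div(v, e)


# One entry per row of Table 5.1; "=" ignores the old value v.
ASSIGN_OPS = {
    "=":  lambda v, e: e,
    "*=": lambda v, e: v * e,
    "/=": c_div,
    "+=": lambda v, e: v + e,
    "-=": lambda v, e: v - e,
    "%=": c_mod,
}


def A(op, v, e):
    """Value of the assignment expression `x op e` when x currently holds v."""
    return ASSIGN_OPS[op](v, e)


def assign(state, x, op, e):
    """Concrete counterpart of Definition 5.1: x takes the value A(x op e)."""
    new_state = dict(state)            # the old state plays the role of v
    new_state[x] = A(op, state[x], e)
    return new_state


state = {"x": 6, "n": 7}
print(assign(state, "x", "*=", state["n"]))
```

Keeping the old state intact mirrors the existential variable v in Definition 5.1, which remembers the value of x before the assignment.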
The second function that is used to define the effects of an assignment statement is the
logical valuation function $V : S \rightarrow B$, where S is the set of valid expressions, and B is
the Boolean type. Note that S includes general expressions and assignment expressions.
The purpose of V is best motivated by an example. Consider the sequence of code given
in Figure 5.1. Informally, the semantics of this code sequence is that if the guard is true,
execute S1, otherwise execute S2. However, the guard is worth noting since the expression
is not a logical one, but rather an assignment expression. The semantics in this case are
dependent on the side-effect of executing the statement v = e. Using the evaluation
function A, function V is defined as follows:

$$V(v = e) = \begin{cases} T & \text{if } A(v = e) \neq 0 \\ F & \text{if } A(v = e) = 0 \end{cases}$$

where T and F are Boolean constants true and false, respectively. In general, for some
arbitrary expression b, V is defined as:

$$V(b) = \begin{cases} T & \text{if } A(b) \neq 0 \\ F & \text{if } A(b) = 0. \end{cases}$$

Although the side-effects of an assignment statement have no effect on the assignment
itself, the side-effects do impact other operations as was shown in the example in Figure 5.1.
The use of V will be important for defining the semantics of alternation statements with
side-effects in Sections 5.2 and 5.3.

 

    if (v = e) {
        S1
    } else {
        S2
    }

Figure 5.1: An Assignment statement as a guard
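
The interplay of A and V in a guard such as the one in Figure 5.1 can be sketched in Python as follows. This is an illustration under stated assumptions, not a C interpreter: the names `A_assign`, `V`, and `guarded` are hypothetical, the state is a plain dictionary, and S1 and S2 are reduced to recording which branch ran.

```python
def A_assign(state, v, e):
    """Execute `v = e` on state and return the value of the assignment expression."""
    state[v] = e
    return state[v]


def V(value):
    """Logical valuation: true when the evaluated expression is non-zero."""
    return value != 0


def guarded(state, v, e):
    """Model `if (v = e) { S1 } else { S2 }` with S1 and S2 as markers."""
    if V(A_assign(state, v, e)):       # the side effect happens before the branch
        state["taken"] = "S1"
    else:
        state["taken"] = "S2"
    return state


print(guarded({"v": 1}, "v", 0))       # v is overwritten with 0 before the test
print(guarded({"v": 0}, "v", 5))       # v is overwritten with 5 before the test
```

The branch taken depends on the assigned value and not on the value v held beforehand, which is why the semantics of the guard must account for the side effect.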

 

5.2 Alternation

The alternation statement for C programs can take two forms:

    if B {                  if B {
        S                       S1
    }             and       } else {
                                S2
                            }

We refer to these statements as C-IF1 and C-IF2, respectively. If the guard of an
alternation statement has no side-effects, then the semantics of the alternation statement
is as follows:

Definition 5.2 (Conditional Semantics without Side-effects)
Let Q be the precondition for the conditional statement, and B be the guard.
Then C-IF1 and C-IF2 have the following semantics, respectively.

$$
\begin{aligned}
sp(\text{C-IF1}, Q) &\equiv sp(S, B \wedge Q) \vee sp(\texttt{skip}, \neg B \wedge Q) \\
&\equiv sp(S, B \wedge Q) \vee (\neg B \wedge Q)
\end{aligned} \tag{5.1}
$$

$$sp(\text{C-IF2}, Q) \equiv sp(S_1, B \wedge Q) \vee sp(S_2, \neg B \wedge Q). \tag{5.2}$$

The specification of sp(C-IF1, Q) states that after execution of C-IF1 either $sp(S, B \wedge Q)$
is true (i.e., S was executed) or $(\neg B \wedge Q)$ is true (guard B was false). Similarly, the
specification of sp(C-IF2, Q) states that after execution of C-IF2 either $sp(S_1, B \wedge Q)$ is
true (i.e., S1 was executed) or $sp(S_2, \neg B \wedge Q)$ is true (guard B was false and statement S2
was executed).

If the restriction of having alternation statements without side-effects in the guards
is removed, then the semantics of the alternation statement has a different meaning.
Informally, if there is a side-effect in the guard B, then the execution of an alternation
is analogous to “executing” B, followed by the execution of the alternation using the
evaluation of B. More formally, let B be a guard of an alternation statement (C-IF1 for
instance) such that the evaluation of B causes a side-effect, and let V(B) represent the
truth value of B. Execution of the alternation statement is equivalent to the execution of the
following, respectively:

    B;                      B;
    if V(B) {               if V(B) {
        S                       S1
    }                       } else {
                                S2
                            }

We refer to the alternation statements (the if statement with the replacement of B by V(B))
as $\text{C-IF1}_s$ and $\text{C-IF2}_s$, respectively. The semantics of $\text{C-IF}_s$ are as follows:

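Definition 5.3 (Conditional Semantics with Side-effects)

Let Q be the precondition for the conditional statement, and B be a guard
with side-effects. Then $\text{C-IF1}_s$ and $\text{C-IF2}_s$ have the following semantics,
respectively.

$$
\begin{aligned}
sp(\text{C-IF1}_s, Q) &\equiv sp(\text{C-IF1}, sp(B, Q)) \\
&\equiv sp(S, V(B) \wedge sp(B, Q)) \vee (\neg V(B) \wedge sp(B, Q))
\end{aligned} \tag{5.3}
$$

$$
\begin{aligned}
sp(\text{C-IF2}_s, Q) &\equiv sp(\text{C-IF2}, sp(B, Q)) \\
&\equiv sp(S_1, V(B) \wedge sp(B, Q)) \vee sp(S_2, \neg V(B) \wedge sp(B, Q)).
\end{aligned} \tag{5.4}
$$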

Expression (5.3) states that after execution of $\text{C-IF1}_s$ either $sp(S, V(B) \wedge sp(B, Q))$ is true
(i.e., S was executed) or $\neg V(B) \wedge sp(B, Q)$ is true (the valuation of the guard, V(B), was
false). Similarly, Expression (5.4) states that after execution of $\text{C-IF2}_s$ either
$sp(S_1, V(B) \wedge sp(B, Q))$ is true (i.e., S1 was executed) or $sp(S_2, \neg V(B) \wedge sp(B, Q))$ is true
(the valuation of B was false and statement S2 was executed).
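
As a small worked instance of Expression (5.4), constructed here for illustration and not taken from the original text, let the guard B be the C assignment x = y, let $Q \equiv y = 5$, and assume that x does not occur free in Q. Definition 5.1 and Table 5.1 give $sp(B, Q)$ and $A(B)$, and the last step uses the standard fact that $sp(S, \text{false}) \equiv \text{false}$:

```latex
\begin{aligned}
sp(B, Q) &\equiv (\exists v :: (y = 5)^{x}_{v} \wedge x = A(x = y)) \\
         &\equiv y = 5 \wedge x = y \\
V(B)     &\equiv (A(x = y) \neq 0) \equiv (y \neq 0) \\
sp(\text{C-IF2}_s, Q)
  &\equiv sp(S_1, V(B) \wedge sp(B, Q)) \vee sp(S_2, \neg V(B) \wedge sp(B, Q)) \\
  &\equiv sp(S_1, y \neq 0 \wedge y = 5 \wedge x = y) \vee sp(S_2, y = 0 \wedge y = 5 \wedge x = y) \\
  &\equiv sp(S_1, y = 5 \wedge x = 5) \vee sp(S_2, \text{false}) \\
  &\equiv sp(S_1, y = 5 \wedge x = 5)
\end{aligned}
```

Only the first branch contributes, and its precondition already records the effect of the assignment in the guard.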

5.3 Circuit Expressions

Expressions in the C programming language have a circuit property that causes a logical
expression to be determined true or false before the entire expression has been completely
evaluated. For instance, suppose the expression (v == 5) && (n == 10) is to be evaluated,
and at the time of execution v has the value 3. According to the definition of C, the
logical value of the expression (v == 5) && (n == 10) is determined to be false
immediately after the evaluation of the subexpression (v == 5).

In the instances when the evaluation of an expression has no side-effects, the circuit
property has no impact on the semantics of a program. That is, for instance, the semantics
of the alternation structure C-IF1 has the form given by Expression (5.1) (as opposed
to the form given by Expression (5.3)). The existence of a side-effect requires that we
define the semantics of expression evaluation in the presence of the circuit property. The
following definitions define the syntax and semantics of logical expressions in the presence
of side-effects.

Definition 5.4 (Atomic Expression)
Any variable, constant, or function is an atomic expression. Let α and β be
atomic expressions. Then the following are atomic expressions.

    Expression    Meaning

    (α == β)      α equals β
    (α != β)      α does not equal β
    (α < β)       α is less than β
    (α <= β)      α is less than or equal to β
    (α > β)       α is greater than β
    (α >= β)      α is greater than or equal to β

Definition 5.5 (Logical Expression)
An atomic expression is a logical expression. Let α and β be logical
expressions. Then the following are logical expressions.

    Expression    Meaning

    (α && β)      α and β
    (α || β)      α or β
    (!α)          not α


Definition 5.6 (Circuit Expression Semantics)
Let α and β be logical expressions such that one or both of α and β have
side-effects. The evaluation of (!α) has the usual semantics. The
evaluation of (α && β) and (α || β) has the following semantics.

$$sp(\alpha \mathbin{\&\&} \beta,\ Q) \equiv sp(\alpha; \beta,\ V(\alpha) \wedge Q) \vee sp(\alpha,\ \neg V(\alpha) \wedge Q) \tag{5.5}$$

$$sp(\alpha \mathbin{\|} \beta,\ Q) \equiv sp(\alpha; \beta,\ \neg V(\alpha) \wedge Q) \vee sp(\alpha,\ V(\alpha) \wedge Q) \tag{5.6}$$

Definition 5.4 describes the syntax for atomic logical expressions in the C programming
language and Definition 5.5 describes the general form for logical expressions.
Definition 5.6 describes the semantics of two kinds of logical expressions: those formed by
conjunction and those formed by disjunction. Expression (5.5) states that the semantics of
evaluating an expression of the form (α && β) is equivalent to either executing the sequence
α; β, given that $V(\alpha) \wedge Q$ is true, or executing α given that $\neg V(\alpha) \wedge Q$ holds. Informally,
this means that either both subexpressions α and β are evaluated if V(α) is true, or only α
is evaluated since the falsity of V(α) forces the entire expression (α && β) to be false.

Expression (5.6) states that the semantics of evaluating an expression of the form
(α || β) is equivalent to either executing the sequence α; β, given that $\neg V(\alpha) \wedge Q$ is
true, or executing α given that $V(\alpha) \wedge Q$ holds. Informally, this means that either both
subexpressions α and β are evaluated if ¬V(α) is true, or only α is evaluated since V(α)
true forces the entire expression α || β to be true.
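
The two cases of each circuit rule can be observed by making the second operand carry a visible side effect. The Python sketch below is an illustration under stated assumptions: operands are modelled as functions over a state dictionary, the names `circuit_and`, `circuit_or`, `v_is_5`, and `bump_n_is_10` are hypothetical, and the second operand corresponds to a C expression such as (++n == 10), which is an example chosen here and not one from the text.

```python
def circuit_and(alpha, beta, state):
    """Evaluate (alpha && beta): beta runs only when V(alpha) is true."""
    if not alpha(state):
        return False
    return beta(state)


def circuit_or(alpha, beta, state):
    """Evaluate (alpha || beta): beta runs only when V(alpha) is false."""
    if alpha(state):
        return True
    return beta(state)


def v_is_5(state):
    return state["v"] == 5


def bump_n_is_10(state):
    state["n"] += 1                    # side effect of the second operand
    return state["n"] == 10


s1 = {"v": 3, "n": 9}
print(circuit_and(v_is_5, bump_n_is_10, s1), s1)   # second operand skipped
s2 = {"v": 5, "n": 9}
print(circuit_and(v_is_5, bump_n_is_10, s2), s2)   # second operand evaluated
```

In the first call the state is unchanged, corresponding to the disjunct in which only α is executed. In the second call n is incremented, corresponding to the disjunct in which the sequence α; β is executed.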

5.4 Sequence

Sequences of statements in the C programming language have the form $S_1; \ldots; S_n$. The
appropriate semantics using sp is as follows:

$$sp(S_1; S_2,\ Q) \equiv sp(S_2,\ sp(S_1, Q)). \tag{5.7}$$

This formulation is identical to the semantics for sequences in the Dijkstra language [6].
Additionally, since the impact of side-effects is specified by the corresponding sp
formalisms for assignment, alternation, and iteration, this characterization of the semantics
of sequence is sufficient.
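
Expression (5.7) can be checked on a small finite example by representing a predicate as the set of states that satisfy it. In the Python sketch below, which is an illustration under stated assumptions and not a general sp calculator, statements are deterministic functions from state to state, so sp is simply the image of the state set; the names `sp`, `seq`, `inc_x`, and `set_y` are chosen for the example.

```python
def sp(stmt, states):
    """Strongest postcondition over an explicit set of states.

    A predicate is the frozenset of states satisfying it, and a deterministic
    statement is a function from state to state, so sp is the image of the set.
    """
    return frozenset(stmt(s) for s in states)


def seq(s1, s2):
    """The sequence S1; S2."""
    return lambda s: s2(s1(s))


def inc_x(s):
    """x := x + 1 on a state (x, y)."""
    return (s[0] + 1, s[1])


def set_y(s):
    """y := x * 2 on a state (x, y)."""
    return (s[0], s[0] * 2)


Q = frozenset((x, 0) for x in range(3))          # 0 <= x < 3 and y = 0
lhs = sp(seq(inc_x, set_y), Q)                   # sp(S1; S2, Q)
rhs = sp(set_y, sp(inc_x, Q))                    # sp(S2, sp(S1, Q))
print(lhs == rhs, sorted(lhs))
```

Both sides describe the same set of final states, as Expression (5.7) requires.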

5.5 Iteration

In the C programming language, the iteration construct can take one of the following forms:

    while (B) {         do {                for (expr1; expr2; expr3) {
        S;                  S;                  S;
    }                   } while (B)         }

where B is the guard expression and the $expr_i$ represent the for iteration expressions. This
section describes the strongest postcondition semantics for the while, do-while, and
for iteration constructs of the C programming language. For the do-while and for
constructs, transformations using the while semantics are provided.

5.5.1 while

When no side-effects are present, the while iteration construct has the following
semantics:

Definition 5.7 (While Semantics without Side-effects)
Let Q be the precondition for the while statement and B be the guard. Then
the semantics of the while statement is as follows:

$$sp(\texttt{while}, Q) \equiv \neg B \wedge (\exists i : 0 \le i : sp(\text{C-IF1}^{i}, Q)).$$

Definition (5.7) states that if the execution of the while statement terminates then
the guard B is false and the result of applying the rule sp(C-IF1, Q) i times is true.
This construction is used given that an iteration statement can be considered a series
of alternation statements, where the guard for the alternation is given by the guard of
the iteration and the number of alternation statements that are included in the series is
determined by the guard. In general, it is undecidable how many alternation
statements to include in the series. Notationally, $sp(\text{C-IF1}^{i}, Q)$, where i is the number of
iterations, means that sp is recursively applied to the result of sp(C-IF1, Q). For instance,
$sp(\text{C-IF1}^{j}, Q)$ has the following derivation:

$$
\begin{aligned}
sp(\text{C-IF1}^{j}, Q) &\equiv sp(\text{C-IF1}, sp(\text{C-IF1}^{j-1}, Q)) \\
&\equiv sp(\text{C-IF1}, sp(\text{C-IF1}, sp(\text{C-IF1}^{j-2}, Q)))
\end{aligned}
$$
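
On a finite state space the existential over i can be computed by unrolling until no new states appear. The Python sketch below illustrates Definition 5.7 in that setting. It rests on assumptions that do not hold in general: predicates are explicit state sets, the body is a deterministic function, and the cut-off `max_unroll` is a hypothetical safeguard because the unrolling need not stabilise for arbitrary programs.

```python
def sp(stmt, states):
    """Image of a state set under a deterministic statement."""
    return frozenset(stmt(s) for s in states)


def sp_if1(guard, body, states):
    """sp(C-IF1, Q) = sp(S, B and Q) or (not B and Q), as in Expression (5.1)."""
    taken = frozenset(s for s in states if guard(s))
    skipped = frozenset(s for s in states if not guard(s))
    return sp(body, taken) | skipped


def sp_while(guard, body, states, max_unroll=1000):
    """not B and (exists i : 0 <= i : sp(C-IF1^i, Q)), by bounded unrolling."""
    reached = frozenset(states)        # sp(C-IF1^0, Q) = Q
    current = reached
    for _ in range(max_unroll):
        current = sp_if1(guard, body, current)
        if current <= reached:         # no new states: further unrolling adds nothing
            break
        reached |= current
    return frozenset(s for s in reached if not guard(s))


# while (x < 3) { x = x + 1; } started from x in {0, 1, 5}
result = sp_while(lambda x: x < 3, lambda x: x + 1, {0, 1, 5})
print(sorted(result))
```

States that never enter the loop, such as x = 5, pass through unchanged, while the others are driven to the first value that falsifies the guard.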

In the case when the guard of the while statement has a side-effect, the semantics are
similar to executing the following construct:

    B;
    while (V(B)) {
        S;
        B;
    }

where V is the valuation function described previously. The corresponding sp semantics of
the while statement with side-effects (denoted $\texttt{while}_s$) is

Definition 5.8 (While Semantics with Side-effects)

Let the body of the statement C-IF1 consist of “S; B;” as given by the
transformation of the while statement to account for the side-effect, and let
Q be the precondition. Then the semantics of the while statement with side-
effects is

$$sp(\texttt{while}_s, Q) \equiv \neg V(B) \wedge (\exists i : 0 \le i : sp(\text{C-IF1}^{i}, sp(B, Q))).$$

Definition (5.8) states that if the execution of the while statement terminates then the
valuation of the guard B is false and the result of applying the rule sp(C-IF1, Q) i times
is true.

 

5.5.2 do-while

The semantics of the do-while statement are similar to the while statement, where the
guarding condition appears after the loop body. Using the while construct, do-while
can be written as the following:

    S;
    B;
    while (V(B)) {
        S;
        B;
    }

The corresponding formal specification of the semantics of the do-while statement is
given by Definition (5.9).

Definition 5.9 (Do-While Semantics with Side-effects)

Let the body of the statement C-IF1 consist of “S; B”, and the effects
of executing “S; B” before entering the loop be given by the precondition
argument of $sp(\text{C-IF1}^{i}, sp(B, sp(S, Q)))$. Then

$$
\begin{aligned}
sp(\texttt{do-while}_s, Q) &\equiv sp(\texttt{while}_s, sp(S, Q)) \\
&\equiv \neg V(B) \wedge (\exists i : 0 \le i : sp(\text{C-IF1}^{i}, sp(B, sp(S, Q)))).
\end{aligned}
$$

This specification states that after the execution of a do-while statement, the valuation of B
is false, and the body of the transformed loop is executed i times following the initial
execution of “S; B”.

5.5.3 for

Recall that the for construct in C has the form

    for (expr1; expr2; expr3) {
        S;
    }

The semantics of the for iteration statement is that the first expression (expr1) is executed
(evaluated) once, the second expression (expr2) is evaluated before each iteration, and the
third expression (expr3) is evaluated after each iteration. These semantics, defined in terms
of the while construct, are represented by the following:

    expr1;
    expr2;
    while (V(expr2)) {
        S;
        expr3;
        expr2;
    }

The resulting formal specification of the semantics of the for command using the sp
semantics for while is the following:

Definition 5.10 (For Semantics with Side-effects)
Let the body of the statement C-IF1 consist of “S; expr3; expr2”. Then the
semantics of the for statement is as follows

$$
\begin{aligned}
sp(\texttt{for}_s, Q) &\equiv sp(\texttt{while}_s, sp(expr1, Q)) \\
&\equiv \neg V(expr2) \wedge (\exists i : 0 \le i : sp(\text{C-IF1}^{i}, sp(expr2, sp(expr1, Q)))).
\end{aligned}
$$

This definition states that after the execution of the for loop, the logical valuation of expr2
is false, and the loop body is executed i times, where the initial precondition to the loop is
given by sp(expr2, sp(expr1, Q)), that is, by Definition 5.8 applied with the precondition
sp(expr1, Q).
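
The order in which the three expressions and the body are evaluated under the while transformation can be traced directly. The Python sketch below is an illustration under stated assumptions: each part of the loop is modelled as a function over a state dictionary that returns the value of its expression, and the names `run_for`, `init_i`, `i_below_2`, `incr_i`, and `add_i` are hypothetical. The sample loop corresponds to for (i = 0; i < 2; i++) { total += i; }.

```python
def run_for(expr1, expr2, expr3, body, state):
    """Execute for (expr1; expr2; expr3) { S; } via the while transformation."""
    trace = []

    def step(name, part):
        trace.append(name)             # record the order of evaluation
        return part(state)

    step("expr1", expr1)
    guard = step("expr2", expr2)
    while guard != 0:                  # V(expr2)
        step("S", body)
        step("expr3", expr3)
        guard = step("expr2", expr2)
    return trace


def init_i(s):
    s["i"] = 0
    return s["i"]


def i_below_2(s):
    return int(s["i"] < 2)


def incr_i(s):
    s["i"] += 1
    return s["i"]


def add_i(s):
    s["total"] += s["i"]


state = {"total": 0}
print(run_for(init_i, i_below_2, incr_i, add_i, state), state)
```

The trace shows expr1 evaluated once, expr2 evaluated before every iteration and once more on exit, and expr3 evaluated after each execution of the body.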

5.6 Functions

Functions in the C programming language can serve two basic purposes. A function can
be a pure value function, where the purpose is to compute and return a simple value based
on the parameters. Alternatively, a function can be a procedure, where the purpose is to
perform a number of encapsulated tasks. Table 5.2 contains a taxonomy of functions based
on the properties of variables, side-effects, values returned, and parameters.

 

 

 

 

 

 

                           Function Class
    Property           Procedural                     Pure valued

    variables          global, local                  local
    side-effects       yes                            no
    parameters         value, value-result, result    value
    values returned    multiple                       single

Table 5.2: A Taxonomy of Programming Language Functions

 

The variables property describes the kinds of variables that are used by a function. The
side-effects property indicates whether the class of functions produces side-effects. The
types of parameters and the number of values that are returned by a function are described
by the parameters and values returned properties, respectively. Pure valued functions are
characterized by the use of local variables only: the functions produce no side-effects,
the parameters are value parameters, and the functions return a single value. Note that
a procedural function can effectively serve the role of a pure valued function if it can be
ensured that the function produces no side-effects. This property implies that only a
single value is returned.

A function in the C programming language has a signature (or prototype) of the form
R f(D), where R is the return type, and D is the input type of function f. For example,
a function max could have a signature “int max(int, int);”. Given a variable “x”
of type R, a parameter “a” of type D, and an assignment operator $\circeq$, a call to the function
f has the form “$x \circeq f(a)$”.

 

Let f be a pure valued function. The effect of calling the function is that a value is
returned and assigned to the variable x. The corresponding sp semantics for the function
call is given by the following definition.

Definition 5.11 (Function Call Semantics)
Let Q be the precondition. The semantics of the function call is the following:

$sp(x \stackrel{c}{=} f(a), Q) = (\exists v :: Q^{x}_{v} \wedge x = A(x \stackrel{c}{=} f(a^{x}_{v})))$.

This definition states that after the execution of an assignment statement using a function
call, there exists some value v such that the textual substitution of x with v in Q is true, and
x takes the value of the evaluation $A$ on $x \stackrel{c}{=} f(a^{x}_{v})$. Note that in the case where a
pure valued function is called but not assigned, $sp(f(a), Q) = Q$.
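As an illustration of Definition 5.11, the following C sketch (not part of the original analysis) applies the definition to the call x = max(x, b), using the max function mentioned earlier and assuming the precondition Q: x = x0. The helper names post_holds and run_and_check, and the choice of x0 as the witness for v, are assumptions introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Pure valued function: local variables only, value parameters, one result. */
int max(int a, int b) { return a > b ? a : b; }

/* Postcondition obtained from sp(x = max(x, b), Q) with Q: x == x0, i.e.
 * (exists v :: v == x0 && x == max(v, b)). The old value x0 is the witness. */
bool post_holds(int x0, int b, int x_after) {
    int v = x0;                      /* witness for the previous value of x */
    return v == x0 && x_after == max(v, b);
}

/* Establishes Q, performs the call, and evaluates the derived postcondition. */
bool run_and_check(int x0, int b) {
    int x = x0;
    assert(x == x0);                 /* Q holds before the call */
    x = max(x, b);                   /* the function call with assignment */
    return post_holds(x0, b, x);
}
```

Calling run_and_check with any pair of integers should return true, since the postcondition is derived from the statement itself; post_holds can also be called directly to see that other final values of x are rejected.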

5.7 Procedural Abstractions

This section describes the construction of formal specifications from code containing the
use of procedural abstractions. In C, a signature for a procedure has the form $R\ p(D)$,
where $R$ is the type returned by the return statement in the procedure $p$, and $D$ is the
list of input types to $p$. The code for a procedure has the following form:

R p(D_p) {
    D_L
    S_p
}

where D_p is a list of comma delimited type-parameter pairs, D_L is a list of declarations,
and S_p is a sequence of programming statements.


 

5.7.1 Parameters

Parameters for a procedure call, as stated in Table 5.2, can be value or value-result
parameters. A parameter of type value is used only for input to the procedure. Parameters
that are known as value-result can be used for both input and output to the procedure.
A value parameter declaration has the general form “datatype q”, where datatype is the
type of the parameter, and q is the name of the parameter. A parameter declared in this
manner is visible only in the scope local to the procedure being called. A value-result
parameter declaration has the general form “datatype *q”, where *q is a pointer variable.
Any operation performed on a parameter declared in this manner has an effect that extends
beyond the scope of the local procedure.

5.7.2 Procedure Call Semantics

In Section 3.2.2 we described the sp semantics for procedure call. In C, a procedure call
$p(\bar{a}, \bar{b})$ can be represented by the sequence of statements found in the righthand column of
Figure 5.2, where $S_p$ is the body of the procedure p, Q is the precondition, and R is the
postcondition to the procedure call to p. In addition, annotations $Q_p$, $R_p$, and $R_{pR}$ represent
the precondition to the procedure call after the binding of actual parameters to formal
parameters, the postcondition to the procedure, and the postcondition after the procedure
“returns”, respectively. By representing a procedure call in this manner, parameter binding
can be achieved through assignment statements and a postcondition R can be established
by using the sp for assignment.


 

 

 

(a)

main () {
    /*Q*/
    p(ā, b̄)
    /*R*/
}

(b)

main () {
    /*Q*/
    x̄ = ā;
    ȳ = b̄;
    /*Q_p*/
    S_p;
    /*R_p*/
    ȳ = ē;
    /*R_pR*/
    b̄ = ȳ;
    /*R*/
}

Figure 5.2: Removal of procedure call abstraction: (a) before (b) after

Figure 5.3 depicts the code annotations for a procedure call given the use of inlining to
achieve parameter binding. Since $\bar{x}$ and $\bar{y}$ are formal parameters there are no occurrences of
$\bar{x}$ and $\bar{y}$ in Q, and as such we can apply Lemma 3.1.2. The final annotation is summarized
by the following definition.

Definition 5.12 (Procedure Call Semantics)

Let Q be the precondition, $\bar{x}$ and $\bar{y}$ be formal parameters, $\bar{a}$ and $\bar{b}$ be actual
parameters, $\bar{e}$ be the variables local to p used to compute the results of the
value-result parameters, and $S_p$ be the body of the procedure. Then the
semantics of the procedure call are:

$sp(p(\bar{a}, \bar{b}), Q) \equiv sp(S_p, Q \wedge \bar{x} = \bar{a} \wedge \bar{y} = \bar{b}) \wedge \bar{b} = \bar{e}$

This definition states that the semantics of a procedure call is a conjunction of the
application of sp on the precondition $Q \wedge \bar{x} = \bar{a} \wedge \bar{y} = \bar{b}$ and the result of the binding of
$\bar{b}$ to a value, $\bar{e}$, computed within the procedure.
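The removal of the procedure call abstraction can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the case study: the procedure body (*y = *y + x), the local variable e, and the function names inlined_call, direct_call, and check_call_equivalence are all hypothetical. The inlined version binds the actual parameters to the formal parameters with assignments, executes the body, and copies the computed value back to the actual value-result parameter.

```c
#include <assert.h>

/* Procedure with a value parameter x and a value-result parameter y. */
void p(int x, int *y) { *y = *y + x; }

/* The call p(a, &b) performed directly. Returns the final value of b. */
int direct_call(int a, int b) {
    p(a, &b);
    return b;
}

/* The same call with the procedure abstraction removed by inlining. */
int inlined_call(int a, int b) {
    int x = a;          /* bind the value parameter                   */
    int y = b;          /* bind the value-result parameter            */
    int e = y + x;      /* body of p computes its result in a local   */
    y = e;              /* result assigned to the formal parameter    */
    b = y;              /* result copied back to the actual parameter */
    return b;
}

/* Aborts if the two forms disagree for the given inputs. */
void check_call_equivalence(int a, int b) {
    assert(inlined_call(a, b) == direct_call(a, b));
}
```

Because parameter binding is expressed purely through assignments in inlined_call, the sp rule for assignment is sufficient to annotate every statement in it.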


 


Figure 5.3: Code annotation for procedure calls

 


Chapter 6

Design Abstractions

The derivation of abstract speciﬁcations from as-built speciﬁcations facilitates the
construction of a description of a system at a level of abstraction that is higher than both
source code and as-built speciﬁcations. As such, high-level reasoning and understanding
of speciﬁcations can be enabled by the existence of abstract speciﬁcations. In this chapter,
we describe a technique for identifying abstract behavior in speciﬁcations. Speciﬁcally,
we deﬁne an approach for deriving abstract speciﬁcations from as-built speciﬁcations
by requiring that the derived abstraction and the as-built speciﬁcation satisfy a matching

relation.

6.1 Speciﬁcation Matching and Software Reuse

Many approaches have been suggested for the retrieval of components from reusable
component libraries, ranging from classiﬁcation of search criteria [19, 20] to retrieval [31,
32, 33] and library structuring [34]. Jeng and Cheng describe the use of analogy and
generality [18, 35] as the basis for matching functions. Zaremski and Wing have proposed

a technique for signature [21] and speciﬁcation matching [19]. Fischer et al. have described


an approach for retrieval of reusable components using ﬁlters to narrow the search
spaces [32, 33]. The approach described in this chapter differs from these approaches
in that we are using speciﬁcation matching for reverse engineering as opposed to retrieval
of reusable components.

Mili et al. describe an approach for structuring component libraries using reﬁnement
orderings [34]. Their approach uses relational speciﬁcations as the formalism for
describing software components, and structures libraries using relational deﬁnitions of
reﬁnement. Our approach incorporates their ideas on the structure of libraries using partial
order relations although our focus is on axiomatic speciﬁcations, rather than relational
speciﬁcations. In addition, our primary goal is to use partial order matching operators
to generalize speciﬁcations for purposes of reverse engineering.

Other approaches to reverse engineering focus on the construction of speciﬁcations,
both informal and formal, and are based on the identiﬁcation of plans [36], the construction
of high-level structural speciﬁcations such as data ﬂow and call diagrams [37], or
transformation of programs into specifications [38, 39]. Of these techniques, the approach
proposed by Baxter and Mehlich [39] is the most closely related. They suggest an
approach to reverse engineering using “backward transformation” where a series of
transformations (semantics-preserving rewrite rules), similar to those used in forward
transformation, are applied in an inverse manner. This approach makes extensive use of a
library whose contents are semantics-preserving transformations. In
the remainder of this chapter we describe an approach that derives abstract behavior by
preserving a match relation between generalized and as-built specifications. As such, we
do not rely on the existence of a domain library to provide specification matches.


6.2 Abstraction Matching

In this section we describe an approach for software reverse engineering that is based on

the use of speciﬁcation matching.

6.2.1 Approach

Given a library of axiomatic (pre- and postcondition) specifications describing software
components, reuse-oriented approaches use a plug-in or generality criterion [18, 19] to
identify components in the library that match a query specification. The plug-in match is
defined as follows:

Definition 6.1 (Generality (Plug-in) Match [31]) Let q be a query
specification with precondition $q_{pre}$ and postcondition $q_{post}$, and l be a library
specification with precondition $l_{pre}$ and postcondition $l_{post}$. Specifications q
and l match (denoted by $l \preceq q$) if

$(q_{pre} \rightarrow l_{pre}) \wedge (l_{post} \rightarrow q_{post})$.

Informally, this definition means that the library component l is a refinement of (i.e., more
specific than) q, or conversely, that q is an abstraction of l. In both interpretations,
any behavior described by q is satisfied by l and as such, l
can be used as an implementation for the query given by q. Many different criteria for
matching query specifications with software components have been identified [20, 19] and
all vary in the degree of component modification required to use a library component as an
implementation for a given query.

In Chapters 3, 4, and 5 we described the use of strongest postcondition to construct
formal as-built speciﬁcations from program code. Although as-built speciﬁcations facilitate
traceability between code and speciﬁcations, they may be difﬁcult to use for high-level


reasoning since they contain an implementation bias. Therefore, a rigorous technique for
deriving a more abstract functional speciﬁcation is desired.

Let I be a program with specification i such that the precondition is $i_{pre}$. The
corresponding postcondition (denoted $i_{post}$) can be derived using the strongest postcondition
(e.g., by using $sp(I, i_{pre})$) [6]. Let l be a specification in a specification library with
precondition $l_{pre}$ and postcondition $l_{post}$. Suppose $i \preceq l$; then l is a generalization or
abstraction of i. Conversely, i is a refinement of l. This means that any behavior described
by l is satisfied by i and as such, program I can be used as an implementation for the
specification given by l. The following definition summarizes this idea.

Definition 6.2 (Abstraction Match) Let I be a program with specification i
such that the corresponding precondition and postcondition are $i_{pre}$ and $i_{post}$,
respectively, and let l be an axiomatic specification with precondition $l_{pre}$ and
postcondition $l_{post}$. A match is an abstraction match if $i \preceq l$, so that

$(l_{pre} \rightarrow i_{pre}) \wedge (i_{post} \rightarrow l_{post})$.

The importance of the abstraction match is that if a specification l exists that is well-
understood in terms of its abstract high-level behavior, then any specification i that can
be shown to satisfy an abstraction match relationship, where $i \preceq l$, has the same abstract
behavior as the specification l. In terms of reverse engineering, this fact provides a means
for introducing abstraction into a specification i via the identification of specification l.

6.2.2 Speciﬁcation Libraries

Speciﬁcation libraries have been used in the area of automated program construction to
describe theories about speciﬁc problem domains [40]. In addition, speciﬁcation libraries
have been used as a means for collecting components into reuse libraries [20, 19]. This


 

section describes the use of partial order relations to organize speciﬁcation libraries and
describes several properties that facilitate analysis of speciﬁcations based on semantic

commonality and difference.
Partial Order Relations

In some cases, the matching criteria given in Table 2.3 deﬁne a partial order relationship
between speciﬁcations A and R. The following lemmas and deﬁnitions reinforce this idea.

For the proofs of these lemmas, please refer to Section B.

Lemma 6.2.1 (Equivalence) The exact pre/post match is reflexive, symmetric,
and transitive (i. e., the exact pre/post match is an equivalence relation).

Using exact pre/post match as the equivalence relation for anti-symmetry, the following
lemma holds.

Lemma 6.2.2 (Plug-In) The plug-in match is reﬂexive, anti-symmetric, and
transitive (i.e., the plug-in match is a partial order relation).

Definition 6.3 (Weak Equivalence) Let A and R be axiomatic specifications.
Then define the relation:

$A_{post} \Leftrightarrow R_{post}$

to be weak equivalence, where $\Leftrightarrow$ is logical equivalence.

The intuition behind weak equivalence is that specifications A and R are equivalent if
their postconditions are logically equivalent. As such, their output behaviors are the same
while the relationship of their input behaviors is unknown. Using weak equivalence as the
equivalence relation for anti-symmetry, the following lemma holds.

Lemma 6.2.3 (Plug-In Post) The plug-in post match is a weak partial order
relation.


Library Structure

Mili et al. [34] have suggested that libraries be structured based on reﬁnement orderings.
Furthermore, they describe a number of properties and measures for managing and
retrieving components from libraries that are structured using reﬁnement orderings [34,
41]. Since the plug-in and the plug-in post matches are (weak) partial order relations,
speciﬁcation libraries can be structured as partially ordered sets with the matching operators
serving as the partial order (reﬁnement) relation.

The convention used in this chapter for library speciﬁcations, given in Figure 6.1, is
based on the Larch interface language [42] syntax. In this convention, domainsort and
rangesort are the input and output types of a given function, respectively. The locals
keyword lists the variables deﬁned within the scope of the speciﬁcation, if applicable. The
requires keyword is used to indicate the precondition of the given function. The ensures
keyword describes the postcondition of a given function. Finally, the modiﬁes keyword

lists the variables that are modiﬁed by the function.

 

spec name ( (var: domainsort)* ) → var: rangesort
    locals (var: domainsort)*
    requires precondition
    modifies variables
    ensures postcondition

Figure 6.1: Syntax of Library Speciﬁcations

 

Figure 6.2 shows the set of “Sqr” specifications that describe the square root function.
The specification Sqr0 allows negative roots as output whereas Sqr1 ensures that the
positive roots are returned. The specifications Sqr2 and Sqr3 return undefined values when
the input value is less than zero. These two specifications differ in that they allow (Sqr2) or
disallow (Sqr3) negative roots. The specification Sqr4 returns root = 0 when the input is a
negative number, and a positive root for positive inputs.

 

spec Sqr0 (x: real) → r: real
    requires x ≥ 0
    ensures r² = x

spec Sqr1 (x: real) → r: real
    requires x ≥ 0
    ensures r ≥ 0 ∧ r² = x

spec Sqr2 (x: real) → r: real
    requires true
    ensures (x ≥ 0 ∧ r² = x) ∨ (x < 0 ∧ r = undefined)

spec Sqr3 (x: real) → r: real
    requires true
    ensures (x ≥ 0 ∧ (r ≥ 0 ∧ r² = x)) ∨ (x < 0 ∧ r = undefined)

spec Sqr4 (x: real) → r: real
    requires true
    ensures (x ≥ 0 ∧ r² = x) ∨ (x < 0 ∧ r = 0)

spec SqrPos (x: real) → r: real
    requires x ≥ 0
    ensures r ≥ 0

Figure 6.2: Square Root Speciﬁcation Library “Sqr”

 

As a partially ordered set on the plug-in relation, the library in Figure 6.2 has the
structure given by the Hasse diagram of Figure 6.3, where the specification at the head
of the arc is more general than the specification at the tail of the arc. As such, Sqr0 is
more general than Sqr1, and Sqr2 is more general than Sqr3. The structure of this library
suggests that there are three different ways to construct a square root function. The first
way requires that the input to the function be a non-negative real number. The second way
to construct a square root function is to produce an undefined value when the input is a
negative real number. The final way to construct a square root function is to return the
value zero when a negative real is used as input.
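The plug-in relation between two of the square root specifications can be exercised with a small C sketch. This is only an approximation under stated assumptions: reals are replaced by a finite range of integers, so the check is a bounded test rather than a proof, and the type and function names (spec, plug_in_match, SQR0, SQR1) are introduced here for illustration. The specifications encoded are the two in Figure 6.2 that require a non-negative input and ensure that the square of the result equals the input, without and with the additional constraint that the result is non-negative.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*pred_in)(int x);          /* precondition over the input     */
typedef bool (*pred_io)(int x, int r);   /* postcondition over input/output */

typedef struct { pred_in pre; pred_io post; } spec;

static bool sqr0_pre(int x)         { return x >= 0; }
static bool sqr0_post(int x, int r) { return r * r == x; }
static bool sqr1_pre(int x)         { return x >= 0; }
static bool sqr1_post(int x, int r) { return r >= 0 && r * r == x; }

const spec SQR0 = { sqr0_pre, sqr0_post };
const spec SQR1 = { sqr1_pre, sqr1_post };

/* Bounded check of the plug-in match: l matches q when
 * (q.pre -> l.pre) and (l.post -> q.post) on every sampled point. */
bool plug_in_match(spec l, spec q, int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (q.pre(x) && !l.pre(x)) return false;
        for (int r = lo; r <= hi; r++)
            if (l.post(x, r) && !q.post(x, r)) return false;
    }
    return true;
}
```

On a range such as -20 to 20, plug_in_match(SQR1, SQR0, -20, 20) holds while the converse fails (the pair x = 4, r = -2 satisfies the first postcondition but not the second), which agrees with the more restrictive specification sitting below the more general one in the partial order.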

Structuring a library as a partially ordered set has many applications including the

fact that it provides a means for partitioning libraries based on behavioral differences as


 

Figure 6.3: Square Root Library as a partial order

 

in the example above. In addition, the partial order structure facilitates inserting new
specifications into a library and helps increase the efficiency of the retrieval process [31].
An interesting activity for analyzing libraries that are structured using partial order relations
is to determine if the library has certain lattice-like properties [34, 41]. In particular, it is of
interest to determine the least upper bound (lub) and greatest lower bound (glb) for given
specifications, if they exist, since the lub can be used to identify common behavior and the
glb can be used to identify compositional behavior. Given a partially ordered set, where an
ordered pair $(L, \preceq)$ indicates the set and the ordering operator, respectively, the following
interpretations for the lub and glb of two specifications $S_1 \in L$ and $S_2 \in L$ can be made,
where we denote the lub for $S_1$ and $S_2$ as $S_1 \sqcap S_2$, and the glb as $S_1 \sqcup S_2$:

Definition 6.4 (Behavioral Commonality [41]) Let $T = S_1 \sqcap S_2$, if it exists.
Then T captures the behavior common to $S_1$ and $S_2$.

Definition 6.5 (Behavioral Composition [41]) Let $B = S_1 \sqcup S_2$, if it exists.
Then B captures the composition of behavior for $S_1$ and $S_2$.

If specifications $S_1$ and $S_2$ are related such that $S_1 \preceq S_2$ or $S_2 \preceq S_1$, then the following
definition is of particular interest:


 

Definition 6.6 (Semantic Difference [41]) Let $S_1$ and $S_2$ be specifications
such that $S_1 \preceq S_2$. Then the semantic difference between $S_1$ and $S_2$ (denoted
$S_1 \ominus S_2$) is the most general specification E such that

$S_2 \sqcup E \preceq S_1$.

Definition 6.6 states that some specification E is the semantic difference between $S_1$
and $S_2$ in the case that the meet of $S_2$ and E is more specific than $S_1$. As an example,
consider the specifications for Sqr1 and Sqr0, where $\mathit{Sqr1} \preceq \mathit{Sqr0}$ when using the plug-in
post relation. Figure 6.3 shows the Hasse diagram for $(\mathit{Sqr}, \preceq_{pip})$, where $\preceq_{pip}$ is the plug-in
post relation. The semantic difference is the specification E such that

$(\mathit{Sqr0} \sqcup E) \preceq \mathit{Sqr1}$.

When we substitute the specification names with the corresponding postconditions, we get
the following expression:

$(root^2 = r) \sqcup E \preceq (root \geq 0 \wedge root^2 = r)$.

Using the Hasse diagram in Figure 6.3 we find that SqrPos satisfies the conditions for E
such that

$(root^2 = r) \sqcup (root \geq 0) \preceq (root \geq 0 \wedge root^2 = r)$.

In fact,

$(root^2 = r) \sqcup (root \geq 0) \equiv (root \geq 0 \wedge root^2 = r)$.

That is, the meet of $root^2 = r$ and $root \geq 0$ is equivalent to $(root \geq 0 \wedge root^2 = r)$.
Therefore, SqrPos is the semantic difference between Sqr1 and Sqr0.
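The equivalence used in this example can be checked on sample values with a short C sketch. It rests on two assumptions made only for illustration: the composition of two postconditions is modeled as their conjunction, and reals are replaced by a bounded range of integers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Postconditions from the example, over the input r and the output root. */
static bool post_general(int r, int root)    { return root * root == r; }
static bool post_specific(int r, int root)   { return root >= 0 && root * root == r; }
static bool post_difference(int r, int root) { (void)r; return root >= 0; }

/* True if composing the general postcondition with the difference gives
 * exactly the specific postcondition on every sampled (r, root) pair. */
bool difference_recombines(int lo, int hi) {
    assert(lo <= hi);
    for (int r = lo; r <= hi; r++)
        for (int root = lo; root <= hi; root++)
            if ((post_general(r, root) && post_difference(r, root))
                    != post_specific(r, root))
                return false;
    return true;
}
```

A call such as difference_recombines(-30, 30) compares the two sides point by point; it does not establish that the difference is the most general such specification, which is the part of Definition 6.6 that the Hasse diagram is used for.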


6.3 Speciﬁcation Generalization

Many of the techniques that utilize formal specifications to specify and retrieve reusable
components from component libraries attempt to identify candidate components by
searching the library for those components that satisfy a specific match criterion. Similarly,
as stated by Deﬁnition 6.2, if we have a library speciﬁcation that is an abstraction match
for an as-built speciﬁcation, then the library speciﬁcation is a generalization of the as-built
speciﬁcation. However, it is possible that an abstraction match does not exist for a given
as-built speciﬁcation. In this case, some other technique must be used to derive abstractions
of the as-built speciﬁcation. In this section we describe an approach to reverse engineering
based on preserving the partial order relationship between an as-built speciﬁcation and a

derived abstraction of that as-built speciﬁcation.

6.3.1 Basic Approach

Consider an axiomatic specification I that consists of precondition $I_{pre}$ and postcondition
$I_{post}$. Assuming that the relation $\preceq$ is a partial order matching operator, we would like
to identify an axiomatic specification A such that $I \preceq A$. That is, we would like to identify
a specification A that is an abstraction of I in a manner that does not involve matching
specifications in a library. In fact, we can identify such a specification by modifying I so
that we have a specification $I'$ that satisfies the relationship $I \preceq I'$. If, for instance, $\preceq$
is a plug-in match operator, then by either strengthening the precondition $I_{pre}$, weakening
the postcondition $I_{post}$, or both, we produce a specification $I'$ that satisfies the property that
$I \preceq I'$. A modification of $I'$ to produce a specification $I''$ that satisfies the property $I' \preceq I''$
provides another level of abstraction such that $I \preceq I' \preceq I''$.


 

A likely situation is shown in Figure 6.4, where a speciﬁcation I has been decomposed
into several different speciﬁcations, each describing a different behavior such that the
composition of their behaviors is the original speciﬁcation I. In addition, several of these
speciﬁcations can be decomposed into other speciﬁcations, each at a different level of

abstraction.

 

 

Figure 6.4: Speciﬁcation Generalization

 

Using a brute force approach for specification generalization can result in the
construction of an exponential number of specifications, all of which satisfy the partial
order constraints of the original specification. For instance, the program in Figure 6.5 shows
a typical bubble sort program with logical annotations contained within the curly braces
‘{’ and ‘}’. The annotations, constructed using the strongest postcondition semantics by
a prototype system called AUTOSPEC [43], use the notation ‘&’ to indicate a logical and
(‘∧’), ‘exists’ to indicate an existential quantification, and ‘forall’ to indicate a universal
quantification. In addition, the notation v.V represents the value of a variable v. The
axiomatic specification for the program is as follows:


spec BubbleSort (a[] : int, n : int) → root : real                    (6.1)
    locals i, j, t : int
    requires n = |a|
    modifies a
    ensures (i ≥ n) ∧ (j ≤ n) ∧ perm(a_1, a) ∧
            (∃u : 1 ≤ u ≤ n : (t = a_1[u])) ∧
            (∀k : 1 ≤ k < n :
                (∀r : k + 1 < r ≤ n : a_1[r] ≥ a_1[r − 1])),

where the ensures clause (postcondition) states that after the execution of the program, the
variable i is greater than or equal to the size of the array a, the variable j is less than or
equal to the size of the array a, the variable t has some value equivalent to some element
of the array, all the elements of the array are ordered in ascending fashion, and the ﬁnal
array is a permutation of the original array. Given the ﬁve conjuncts in Speciﬁcation (6.1),
it is possible to construct at least thirty-one different speciﬁcations that satisfy the partial
order property of the abstraction match operator.
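The count of thirty-one follows from keeping any non-empty subset of the five conjuncts, which gives 2^5 − 1 candidates (the full set being the original postcondition itself). The C sketch below enumerates such subsets with a bit mask; the function names and the single-letter labels “a” through “e” for the conjuncts are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the label of the conjunct subset encoded by mask (bit 0 = "a",
 * bit 4 = "e") into out, e.g. mask 0x14 yields "ce". */
void subset_label(unsigned mask, char out[6]) {
    static const char names[] = "abcde";
    size_t k = 0;
    for (unsigned i = 0; i < 5; i++)
        if (mask & (1u << i))
            out[k++] = names[i];
    out[k] = '\0';
}

/* Number of postconditions obtainable by keeping a non-empty subset of
 * n conjuncts: 2^n - 1. */
unsigned count_abstractions(unsigned n) {
    assert(n < 32);
    return (1u << n) - 1u;
}
```

Iterating mask from 1 to count_abstractions(5) and calling subset_label on each value lists every candidate postcondition, which makes the growth with the number of conjuncts easy to see.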

In order to handle the complexity of this situation we make the assumption that the
reverse engineering programmer is guiding the abstraction process. In order to support
this process we are developing a support tool called SPECGEN that visually displays
the partially ordered sets of speciﬁcations that are constructed using the speciﬁcation
generalization technique. In the following sections, we describe several guidelines that
can be used to construct abstractions from a speciﬁcation. The remainder of this section
discusses the guidelines from the point of view of weakening the postcondition and

strengthening a precondition.


 

 

program BubbleSort (inputs : int a[]; int n;
                    outputs : int a[]; )
decl
    int i; int j; int y; int x; int t;
lced
begin
    { (((t.V = t_0) ∧ ((x.V = x_0) ∧ ((y.V = y_0) ∧
      ((j.V = j_0) ∧ (i.V = i_0))))) ∧ ((n.V = n_0) ∧ (a.V = a_0))) }
    i := 1;
    { ((((t.V = t_0) ∧ ((x.V = x_0) ∧ ((y.V = y_0) ∧ ((j.V = j_0) ∧
      (.cnst1 = i_0))))) ∧ ((n.V = n_0) ∧ (a.V = a_0))) ∧ (i.V = 1)) }
    do
      (i < n) ->
        j := n;
        { (((i.V < n.V) ∧ ((i.V = k) ∧ (n.V = n_0))) ∧ (j.V = n.V)) }
        y := (i + 1);
        { ((((i.V < n.V) ∧ ((i.V = k) ∧ (n.V = n_0))) ∧ (j.V = n.V)) ∧ (y.V = (i.V + 1))) }
        do
          (j > y) ->
            x := (j - 1);
            { (((((j.V > y.V) ∧ (((i.V = k) ∧ (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1))) ∧
              (a[j.V] = a_j0)) ∧ (a[x.V] = a_x0)) }
            if
              (a[j.V] < a[x.V]) ->
                t := a[j];
                { (((((a[j.V] < a[x.V]) ∧ (((j.V > y.V) ∧ (((i.V = k) ∧
                  (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1)))) ∧
                  (t.V = a[j.V])) ∧ (a[j.V] = a_j0)) ∧ (a[x.V] = a_x0)) }
                a[j] := a[x];
                { ((((((.cnst2 < a[x.V]) ∧ (((j.V > y.V) ∧ (((i.V = k) ∧
                  (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1)))) ∧
                  (t.V = .cnst2)) ∧ (a[j.V] = a[x.V])) ∧ (.cnst2 = a_j0)) ∧
                  (a[x.V] = a_x0)) }
                a[x] := t;
                { (((((((.cnst2 < .cnst3) ∧ (((j.V > y.V) ∧ (((i.V = k) ∧
                  (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1)))) ∧
                  (t.V = .cnst2)) ∧ (a[j.V] = .cnst3)) ∧ (a[x.V] = .cnst2)) ∧
                  (.cnst2 = a_j0)) ∧ (.cnst3 = a_x0)) }
            fi;
            { (((((((.cnst2 < .cnst3) ∧ (((j.V > y.V) ∧ (((i.V = k) ∧
              (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1)))) ∧
              (t.V = .cnst2)) ∧ (a[j.V] = .cnst3)) ∧ (a[x.V] = .cnst2)) ∧
              (.cnst2 = a_j0)) ∧ (.cnst3 = a_x0)) }
            j := (j - 1);
            { ((((((((.cnst2 < .cnst3) ∧ (((j.V > y.V) ∧ (((i.V = k) ∧
              (n.V = n_0)) ∧ (j.V = M))) ∧ (x.V = (j.V - 1)))) ∧ (t.V = .cnst2))
              ∧ (a[j.V] = .cnst3)) ∧ (a[x.V] = .cnst2)) ∧
              (.cnst2 = a_j0)) ∧ (.cnst3 = a_x0)) ∧ (j.V = (M - 1))) }
        od;
        { ((((y.V = (k + 1)) ∧ (x.V = (j.V - 1))) ∧
          (∀ r: (((k + 1) < r) ∧ (r ≤ n_0)) : (a_1[r] ≥ a_1[r-1]))) ∧
          (∃ u: (((k + 1) < u) ∧ (u ≤ n_0)) : (t = a_1[u]))) }
        i := (i + 1);
        { (((((y.V = (k + 1)) ∧ (x.V = (j.V - 1))) ∧
          (∀ r: (((k + 1) < r) ∧ (r ≤ n_0)) : (a_1[r] ≥ a_1[r-1]))) ∧
          (∃ u: (((k + 1) < u) ∧ (u ≤ n_0)) : (t = a_1[u]))) ∧
          (i.V = (.cnst1 + 1))) }
    od;
    { ((¬(i.V < n.V)) ∧ ((((¬(j.V > n_0)) ∧ (∀ v: ((1 ≤ v) ∧ (v < n_0)) :
      (∀ r: (((v + 1) < r) ∧ (r ≤ n_0)) : (a_1[r] ≥ a_1[r-1])))) ∧
      (∃ u: (((k + 1) < u) ∧ (u ≤ n_0)) : (t = a_1[u]))) ∧ perm(a_1, a))) }
end

Figure 6.5: Bubble Sort Program Annotated by AUTOSPEC

 


Weakening the postcondition

Let I be a specification with precondition $I_{pre}$ and postcondition $I_{post}$ and let $I'$ be a
specification such that $I'_{pre} \leftrightarrow I_{pre}$ and $I_{post} \rightarrow I'_{post}$. As such, $I \preceq I'$, since

$((I'_{pre} \leftrightarrow I_{pre}) \wedge (I_{post} \rightarrow I'_{post})) \Rightarrow ((I'_{pre} \rightarrow I_{pre}) \wedge (I_{post} \rightarrow I'_{post}))$.    (6.2)

Expression (6.2) provides a basis for deriving abstractions from a specification by
weakening a postcondition $I_{post}$ to produce a postcondition $I'_{post}$. Several options are
available for weakening the postcondition including those listed in Table 6.1, which
includes delete a conjunct, add a disjunct, ∧ to ∨ transformation, and ∧ to →
transformation.

 

 

 

 

 

 

 

Operation            I_post         I'_post
Delete a conjunct    A ∧ B ∧ C      A ∧ C
Add a disjunct       A ∧ B          (A ∧ B) ∨ C
∧ to →               A ∧ B          A → B
∧ to ∨               A ∧ B          A ∨ B

 

 

 

 

 

 

 

Table 6.1: Weakening the postcondition
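Each operation in Table 6.1 is a weakening in the sense that the original postcondition implies the transformed one. Treating A, B, and C as propositional variables (an assumption that ignores any internal structure of the conjuncts), this can be confirmed by an exhaustive truth-table check, sketched below in C. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool implies(bool p, bool q) { return !p || q; }

/* Checks, for all eight truth assignments to A, B, and C, that the original
 * postcondition implies the weakened postcondition for every row of the table. */
bool weakenings_are_sound(void) {
    for (int m = 0; m < 8; m++) {
        bool A = (m & 1) != 0, B = (m & 2) != 0, C = (m & 4) != 0;
        if (!implies(A && B && C, A && C))   return false;  /* delete a conjunct */
        if (!implies(A && B, (A && B) || C)) return false;  /* add a disjunct    */
        if (!implies(A && B, implies(A, B))) return false;  /* AND to IMPLIES    */
        if (!implies(A && B, A || B))        return false;  /* AND to OR         */
    }
    return true;
}

/* Convenience wrapper that aborts if any row fails to be a weakening. */
void check_weakening_table(void) { assert(weakenings_are_sound()); }
```

The converse implications do not hold in general, which is why each operation moves strictly upward in the partial order rather than producing an equivalent specification.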

 

Delete a conjunct. Given a specification in conjunctive form (not necessarily a
normal form), deletion of a conjunct weakens a specification by removing additional or
constraining conditions. For example, consider Figure 6.6, where the specification abcde
represents the ensures clause of the specification in Expression (6.1). In the Hasse diagram,
the vertex label ‘xy’ represents the logical conjunction $x \wedge y$. Each successive level of
abstraction is derived by deleting a conjunct from the lower levels of abstraction. Below
are guidelines that can be used to identify the appropriate conjunct for deletion.


Local Scope: If a conjunct speciﬁes behavior that is local to a procedure and
has no impact on the output variables of the system, then that conjunct is
a candidate for deletion. Examples include speciﬁcations of the value of
a loop index or temporary variables.

Independence: If a conjunct specifies some behavior that is logically
independent of the remaining conjuncts, then that conjunct is a candidate
for deletion. As an example, consider the expression $(x = c) \wedge (c = y) \wedge (z = n)$.
The conjunct $(z = n)$ is independent of the conjuncts
$(x = c)$ and $(c = y)$.

Preservation: If a conjunct captures some behavior that must be expressed in
the higher level specification, the remaining conjuncts are candidates for
deletion. Refer to Figure 6.7 where the conditions c and e have
been selected as behaviors to be preserved. The remaining specifications
in the partial order indicate refinements between the as-built specification
and the specification ce.

These guidelines are by no means comprehensive. Ultimately, a maintenance engineer
using this approach must decide whether to delete a specific conjunct in a specification.

Add a disjunct. Given a speciﬁcation in any form, adding a disjunct weakens a
speciﬁcation by generalizing or increasing the scope of the speciﬁcation. The addition of a
disjunct should be used in very few instances since the new disjunct potentially introduces

superﬂuous behavior that may not be reﬂected in the original system.

Conjunction to implication or disjunction transformations. Given a speciﬁcation in
conjunctive form (not necessarily a normal form), transformation of the conjunction to an
implication or disjunction provides a logical weakening of the speciﬁcation and facilitates
manipulation of the speciﬁcation using several standard equivalence transformations.
Our ongoing investigations include determining the usefulness of these transformation

techniques to derive speciﬁcation abstractions.

Figure 6.6: Bubble Sort Speciﬁcation Brute Force Abstraction

 

Strengthening the precondition

Let I be a specification with precondition $I_{pre}$ and postcondition $I_{post}$ and let $I'$ be a
specification such that $I'_{pre} \rightarrow I_{pre}$ and $I_{post} \leftrightarrow I'_{post}$. As such, $I \preceq I'$, since

$((I'_{pre} \rightarrow I_{pre}) \wedge (I_{post} \leftrightarrow I'_{post})) \Rightarrow ((I'_{pre} \rightarrow I_{pre}) \wedge (I_{post} \rightarrow I'_{post}))$.    (6.3)

Expression (6.3) provides a basis for deriving abstractions from a specification by
strengthening a precondition $I_{pre}$ to produce a precondition $I'_{pre}$. Weakening a
postcondition has many advantages over strengthening a precondition in the context of

Figure 6.7: Bubble Sort Speciﬁcation Abstraction (postcondition)

 

deriving abstractions from speciﬁcations. The primary advantage is a consequence of
the reverse engineering activity in that we are interested in deriving a speciﬁcation of the
behavior of a program. This behavior is captured in the speciﬁcation of the postcondition
rather than the precondition. The utility of strengthening the precondition is that it provides
a mechanism for identifying a narrower set of conditions that can be used to constrain
the domain of input. The available techniques for strengthening the precondition include

adding a conjunct and deleting a disjunct.


Add a conjunct. Given a speciﬁcation in a conjunctive (not necessarily normal) form,
adding a conjunct to the precondition provides further conditions that are required in order

for the speciﬁcation to achieve the desired behavior.

Delete a disjunct. Given a specification in any form, deleting a disjunct will make the
precondition more specialized (i.e., less general) in the initial conditions required to satisfy
a behavior.

6.3.2 Example

Consider once again the example in Figure 6.5 and the corresponding postcondition
specification in Expression (6.1). In this section we focus on constructing an abstraction
of the specification by weakening the postcondition. Since the specification is in a
conjunctive normal form, it is appropriate to use the delete a conjunct strategy to construct
an abstraction. In a completely brute force approach we would derive four abstractions for
each of the five produced in the first step. However, we advocate a user-driven process that
relies on a user to decide the direction of the abstraction steps. Figure 6.6 depicts the brute
force application of the delete a conjunct strategy, where the expression “abcde” at the
bottom of the graph represents the specification of Expression (6.1), where “a” represents
the conjunct $(i \geq n)$, “b” represents the conjunct $(j \leq n)$, “c” represents the conjunct
$perm(a_1, a)$, “d” represents the conjunct $(\exists u : 1 \leq u \leq n : (t = a_1[u]))$, and “e”
represents the conjunct $(\forall k : 1 \leq k < n : (\forall r : k + 1 < r \leq n : a_1[r] \geq a_1[r-1]))$.

In the first step, the “a” conjunct can be deleted since the “a” conjunct involves
a specification of the value of an iteration variable. Deleting one conjunct from the
specification “bcde” results in four different specifications. Using the same reasoning as


in the previous step, we consider only the specification that excludes the conjunct “b”.
Figure 6.8 shows the partially ordered set of specifications that result from deleting “a”
and “b” where the resulting specification is “cde” which states that the output array is a
permutation of the input array, that the variable t takes the value of some element of the
array, and the array is ordered in increasing value. At this point, three abstractions are
possible. However, the conjunct $(\exists u : 1 \leq u \leq n : (t = a_1[u]))$ specifies information
about the temporary variable t, and as such we consider only the specification ce, which is
equivalent to:

$perm(a_1, a) \wedge (\forall k : 1 \leq k < n : (\forall r : k + 1 < r \leq n : a_1[r] \geq a_1[r-1]))$.

This speciﬁcation states that after execution of the program, the output array is a
permutation of the input array, and that the array is ordered in increasing value.
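The abstract specification can be read as an executable check on the input and output arrays. The following C sketch is a hedged illustration, not the AUTOSPEC output: it follows the informal reading given above (the output is a permutation of the input and is in non-decreasing order over all adjacent elements), uses 0-based indexing, and bounds the array length by a hypothetical constant MAX_N. The bubble_sort routine is a conventional C rendering written for this example rather than a transcription of Figure 6.5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_N 64   /* illustrative bound on the array length */

/* Second conjunct: every element is at least as large as its predecessor. */
bool is_sorted(const int *a, size_t n) {
    for (size_t r = 1; r < n; r++)
        if (a[r] < a[r - 1]) return false;
    return true;
}

/* First conjunct: a is a permutation of a0 (multiset equality). */
bool is_perm(const int *a0, const int *a, size_t n) {
    bool used[MAX_N] = { false };
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; i++) {
        bool found = false;
        for (size_t j = 0; j < n && !found; j++)
            if (!used[j] && a[j] == a0[i]) { used[j] = true; found = true; }
        if (!found) return false;
    }
    return true;
}

/* The abstracted postcondition: both conjuncts together. */
bool satisfies_ce(const int *a0, const int *a, size_t n) {
    return is_perm(a0, a, n) && is_sorted(a, n);
}

/* A conventional bubble sort used to exercise the check. */
void bubble_sort(int *a, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        for (size_t j = n - 1; j > i; j--)
            if (a[j] < a[j - 1]) { int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t; }
}
```

Note that the check mentions neither the loop indices nor the temporary variable t, which reflects the conjuncts that were deleted on the way to this specification.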

By focusing attention on a few conjuncts, the complexity of the task of constructing
specification abstractions can be reduced since many of the other possible abstractions
for the original as-built specification can be removed from consideration. In addition, by
using a few simple support tools, the difficulty of deriving the abstractions can be greatly
reduced.
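
One such support tool could enumerate the candidates produced by a single brute force step. The sketch below is an illustration under stated assumptions: a specification is encoded as a bitmask with one bit per conjunct (bit 0 for “a” through bit 4 for “e”), and the names `spec_t` and `delete_a_conjunct` are hypothetical.

```c
#include <assert.h>

/* Assumed encoding: bit i set means conjunct i is present in the specification. */
typedef unsigned int spec_t;

/* Stores in out[] every specification obtained by deleting exactly one
 * conjunct from spec, and returns the number of specifications produced. */
int delete_a_conjunct(spec_t spec, int n_conjuncts, spec_t out[])
{
    int count = 0;
    assert(n_conjuncts >= 0 && n_conjuncts <= 32);
    for (int i = 0; i < n_conjuncts; i++) {
        spec_t bit = (spec_t)1 << i;
        if (spec & bit)                 /* conjunct i is present */
            out[count++] = spec & ~bit; /* the same spec without it */
    }
    return count;
}
```

Starting from “abcde” (0x1F) the function yields five specifications, and from “bcde” (0x1E) it yields four, matching the counts given above. In the user-driven process only one of these candidates is followed at each step.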

Figure 6.8: Bubble Sort Specification after deletion of “a” and “b”

6.4 Application to a JPL Ground-based Flight System

In our previous investigations we described a technique for analyzing C programs using
the strongest postcondition predicate transformer [7]. In addition, we have defined the
semantics of pointers and pointer operations in terms of sp [43]. In this section we present
a case study that applies the sp technique for C programs to a module from a ground-based
mission control system used by the NASA Jet Propulsion Laboratory. The system is
responsible for the translation of user commands into appropriate spacecraft mnemonics,
enabling users to modify spacecraft mission operations. This particular module takes
a sequence of elements from a file and returns an index to a subsequence of elements
specified by begin and end indices. The sp semantics for C [7] and pointers [43] were
used to construct the /*AS ... AS*/ annotations for the code contained in this section.

6.4.1 Code Analysis

First Code Sequence. Appendix C contains a program listing of a module that takes
a sequence of elements from a file and returns an index to a subsequence of elements
specified by begin and end indices. These elements correspond to message fragments used
for spacecraft control. The annotations for the code in Appendix C were constructed using
the sp semantic rules for the C programming language.

One code sequence of interest is the code for lines 48-82, which appears as follows:

if (!skip_gcmd_sfdu(fd, L2))
{
    inform_user(
        "line %d: copy failed: bad SFDU header (%s)",
        body_lineno, file);
    dontoutput = 1;
    close(fd);
    if (params->cmdcntl) master_unlock();
    return(NULL);
}

The purpose of this code sequence is to abort processing if the file header is corrupted.
The precondition for this block is

(fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0),

which makes assertions about the initial values of several variables and pointers, where
& is the logical connective ‘∧’. The specification states that fd has the initial value FH0,
and that the value is greater than or equal to 0. The specification also states that the variables
begin and end have the values B0 and E0, respectively. Finally, the specification states
that the pointer file points to some object F0.

The following annotation describes the behavior of the code when the conditional path
is taken in the case that skip_gcmd_sfdu evaluates to zero:

(params->cmdcntl != 0 & master_unlocked() &
 closed(fd) & dontoutput = 1 &
 skip_gcmd_sfdu(fd, L2) = 0 &
 fd >= 0 & fd = FH0 & begin = B0 &
 end = E0 & file .> F0) |
(params->cmdcntl = 0 & closed(fd) &
 dontoutput = 1 & fd >= 0 &
 skip_gcmd_sfdu(fd, L2) = 0 & fd = FH0 &
 begin = B0 & end = E0 & file .> F0),

which is equivalent to

((params->cmdcntl != 0 & master_unlocked()) |
 params->cmdcntl = 0) &
closed(fd) & dontoutput = 1 &
skip_gcmd_sfdu(fd, L2) = 0
& fd >= 0 & fd = FH0 & begin = B0
& end = E0 & file .> F0 .

This specification states that in addition to the precondition being true, the file FH0
is closed, the variable dontoutput is set to 1, and, when params->cmdcntl has a
non-zero value, the master key is unlocked. In this system,
processing is regarded as having failed whenever the variable dontoutput is set to a
non-zero value. This specification recurs throughout this code when certain failure conditions
are met.

The postcondition annotation at lines 84-85 asserts the following:

skip_gcmd_sfdu(fd, L2) != 0 & fd >= 0 &
fd = FH0 & begin = B0 & end = E0 & file .> F0 ,

which states that, in addition to the precondition being true, the function
skip_gcmd_sfdu evaluates to a non-zero value. This specification is reasonable since
the body of the statement in question ends with a return statement. As such, the program
only proceeds past the conditional statement if skip_gcmd_sfdu evaluates to a non-zero
value.

Due to space constraints, annotations for some blocks of code were omitted because those
blocks closely resemble others. For instance, annotations for the code sequence from lines
48-82 are very similar to the annotations that would appear for the code blocks at lines
87-93, 98-106, and 115-123.

Second Code Sequence. Another interesting sequence of code appears in Figure 6.9 and
occurs at lines 113-123. One of the activities that can be performed is to analyze the
postcondition at lines 125-134 using the specification generalization technique described
in Section 6.3. First, we can rewrite the specification into an equivalent form by factoring
terms so that the specification appears as follows:

((E0 = -1 & end = gcmd_hdr.elem_count) |
 (end <= gcmd_hdr.elem_count & end != -1 &                    (6.4)
  end = E0)) & params->sc = gcmd_hdr.SC &
get_gcmd_hdr(fd, gcmd_hdr) != 0 &
skip_gcmd_sfdu(fd, L2) != 0 & fd >= 0 &
fd = FH0 & begin = B0 & file .> F0 .

This specification states that the constant E0 is equal to -1 and end =
gcmd_hdr.elem_count, or that end = E0, end <= gcmd_hdr.elem_count, and
end != -1. In addition, several conditions regarding the input header are true as well as
conditions that describe the input file. At this point the specification is in a form suitable
to apply the delete a conjunct strategy. Figure 6.10 shows the possible abstractions for
the specification when we delete the file related conjuncts. Successive application of
the strategy leads to the abstraction of the behavior described by Expression (6.4). This
specification corresponds to the specification ‘a’ in the Hasse diagram in Figure 6.10 and
appears as follows:

((E0 = -1 & end = gcmd_hdr.elem_count) |
 (end <= gcmd_hdr.elem_count & end != -1 & end = E0)),

which states that E0 = -1 and the variable end has the value gcmd_hdr.elem_count,
or, end = E0, end != -1 and end <= gcmd_hdr.elem_count. Essentially, this
states that if this point in the program has been reached, the variable end has a value that is
less than or equal to gcmd_hdr.elem_count and not equal to -1. This behavior creates
an issue that must be addressed since behavior for the case when end < -1 may not be
what is expected.
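
The issue can be reproduced by isolating the control flow of lines 113-123 of Figure 6.9. The following is a minimal sketch: the function `check_end` and its parameters are hypothetical stand-ins for the surrounding module, and the failure path is reduced to a return code.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors lines 113-123 of Figure 6.9 in isolation (hypothetical wrapper).
 * Returns 1 when execution would continue past the conditional, storing the
 * resulting index in *end_out; returns 0 on the failure path. */
int check_end(int end, int elem_count, int *end_out)
{
    assert(end_out != NULL);
    if (end == -1)
        end = elem_count;   /* -1 is replaced by the element count */
    else if (end > elem_count)
        return 0;           /* not enough elements: processing fails */
    *end_out = end;
    return 1;
}
```

Calling `check_end(-2, 10, &e)` returns 1 and leaves `e` equal to -2, so a negative end index other than -1 is accepted. This is the behavior that the abstraction of Expression (6.4) exposes.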

 

108. /*AS (params->sc = gcmd_hdr.SC &
109.       get_gcmd_hdr(fd, gcmd_hdr) != 0 & file .> F0 &
110.       fd >= 0 & fd = FH0 & begin = B0 & end = E0 &
111.       skip_gcmd_sfdu(fd, L2) != 0 ) AS*/
112. /* make sure the file has enough elements */
113. if (end == -1)
114.     end = gcmd_hdr.elem_count;
115. else if (end > gcmd_hdr.elem_count)
116. {
117.     inform_user("line %d: copy: not enough elements \
118. in GCMD file (%s)", body_lineno, file);
119.     dontoutput = 1;
120.     close(fd);
121.     if (params->cmdcntl) master_unlock();
122.     return(NULL);
123. }
124.
125. /*AS
126. (E0 = -1 & end = gcmd_hdr.elem_count &
127.  params->sc = gcmd_hdr.SC & E0 = E0 & file .> F0 &
128.  get_gcmd_hdr(fd, gcmd_hdr) != 0 & fd = FH0 &
129.  skip_gcmd_sfdu(fd, L2) != 0 & fd >= 0 & begin = B0 )
130. |
131. (end <= gcmd_hdr.elem_count & end != -1 & begin = B0 &
132.  params->sc = gcmd_hdr.SC & end = E0 & file .> F0 &
133.  get_gcmd_hdr(fd, gcmd_hdr) != 0 & fd = FH0 &
134.  skip_gcmd_sfdu(fd, L2) != 0 & fd >= 0 ) AS*/
135.

Figure 6.9: Code Sequence: Lines 108-135

Figure 6.10: Annotation Abstractions

 

Third Code Sequence. The final annotation that is of interest is found on lines 404-420
of the program in Appendix C. The annotation, reproduced in Figure 6.11, is shown
after a simplification step that factors conjuncts from a disjunction.

Informally, this specification makes assertions about a chain of elements and the
relationship between the requested subsequence of elements and the elements read from
a file. The following specification abstraction can be derived by applying the delete a
conjunct strategy to the annotation, where we focus specifically on the conjuncts that
contain a reference to the variables begin and end:

(forall k : 1 <= k < begin : freed(elem_k)) &
(forall k : end < k < gcmd_hdr.elem_count :
    freed(elem_k)) &
elem_end->next .> NULL &                                      (6.5)
orig_elem .> coset(elem_begin) &
(forall k : begin <= k < end :
    elem_k->next .> coset(elem_k+1) & zeroed(elem_k))

The specification states that all the elements outside the bounds specified by the begin and
end indices have been freed, that the pointers orig_elem and elem_begin refer to the
same object, and that all the elements within the begin and end bounds form a chain.

 

closed(fd) &
(forall k : end < k < gcmd_hdr.elem_count : freed(elem_k)) &
ep .> coset(elem_gcmd_hdr.elem_count) &
elem .> coset(ep) &
elem_end->next .> NULL &
orig_elem .> coset(elem_begin) &
checksum_gcmd_chain(gcmd_hdr, ObjOcnstl) = 0 &
elem_gcmd_hdr.elem_count .> NULL &
(forall k : 1 <= k < begin : freed(elem_k)) &
(forall k : begin <= k < end :
    elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
((E0 = -1 & end = gcmd_hdr.elem_count) |
 (end <= gcmd_hdr.elem_count & end != -1 & end = E0)) &
params->sc = gcmd_hdr.SC &
get_gcmd_hdr(fd, gcmd_hdr) != 0 &
skip_gcmd_sfdu(fd, L2) != 0 &
fd >= 0 &
fd = FH0 &
begin = B0 &
file .> F0 .

Figure 6.11: Code annotation: Lines 404-420

 

6.4.2 Discussion

The analysis of the code in Appendix C has led to several observations that
empirically validate the appropriateness of the delete a conjunct strategy for specification
generalization of sp specifications. First, the specifications that are constructed using sp
are conjunctive in nature due to the semantics of the assignment statement. As such,
application of the delete a conjunct strategy facilitates the analysis of specifications of
program code by decomposing those specifications into smaller, more manageable pieces.
Second, analysis of annotations that occur within the code (as opposed to only analyzing
the final postcondition) is an important activity for understanding the behavior of programs.
As an example, our analysis of the code from lines 108-135 and the corresponding
specification in Expression (6.4) facilitated the identification of behavior in the program
that must be analyzed further in order to determine the impact of inconsistent inputs.
Finally, although the reverse engineered specifications describe logical abstractions, some
mechanism must be provided in order to describe the abstractions using natural language.
For instance, instead of providing the specification in Expression (6.5) to a user, it would
be desirable to state that, since the procedure returns a subsequence of a list of elements,
the abstract behavior corresponds to a list subsequence cliche or plan [44], where a
plan describes common or canonical program behavior.


Chapter 7

Reverse Engineering Framework

The previous chapters have discussed several techniques that we have developed in order
to support reverse engineering. In this chapter we integrate the strongest postcondition
technique with the specification generalization technique to form a single process. In
addition, we describe how informal methods can be used for high-level concept discovery
and how our formal reverse engineering technique can be used to supplement program
understanding by facilitating formal reasoning.

7.1 Combining Informal and Formal Approaches

Due to the mathematical nature of formal specification languages, formal methods have
been perceived as time consuming and tedious. However, since the languages are
well-defined, formal methods have been found to be amenable to automated processing.
Semi-formal methods are techniques for specifying system requirements and design using
hierarchical decomposition. Many semi-formal methods use notations that are graphical,
thus facilitating ease of use in their application. The drawback to semi-formal methods
is that the notations are typically imprecise and ambiguous. This section describes an
approach to reverse engineering that combines the use of semi-formal and formal methods
in order to benefit from the complementary advantages of both approaches.

7.1.1 Structured Analysis

Although the recent trend in software development has been to build systems using object-
oriented technology, a majority of existing systems has been developed using imperative
programming languages, such as C, FORTRAN, and COBOL. The procedural structure
of these languages makes them amenable to the techniques offered by the Structured
Analysis and Design Technique (SADT) [12]. In SADT, the focal point is the procedure or
function. The analysis stage centers around high-level descriptions of the functionality of
the system. During the design phase, the reﬁnement and decomposition of the high-level
descriptions of functions yields more detailed descriptions of functions and procedures that
incorporate implementation details. Finally, during the implementation phase, functions
and procedures identiﬁed during design are decomposed into more speciﬁc functions.

When using SADT for reverse engineering activities, the structure of an implementation
is abstracted into low-level graphical descriptions of functions known as call graphs or
structure charts. These graphs depict the calling hierarchy of functions within a system.
Further analysis of source code involves analyzing the data that flows to and from various
functions by constructing data flow diagrams. Our approach is to construct various
graphical descriptions of a program, in most cases automatically, and then use those
descriptions as a guide for constructing formal specifications.

In general, the construction of the graphical descriptions proceeds in two phases. In
the first phase, a high-level model is constructed that is based on information gathered
from user manuals or high-level design descriptions. In the case that these documents do
not exist, it is appropriate to incorporate user interviews, and if possible, empirical testing
to determine high-level behavior. In the second phase, a low-level model in the form of
call graphs and/or control-flow graphs is constructed. Several tools exist that support the
construction of such models as is described in Chapter 10.

7.1.2 A note about formal techniques and large systems

One of the limitations of our technique is that the specifications that are constructed can
grow to be exponential in size with respect to the input program. Given this limitation,
the appropriateness of using formal methods in the context of large systems must be
well-understood. That is, the use of formal methods for reverse engineering, as is the case
with all applications of formal methods, must be targeted to those contexts where it has
the highest payoff; namely critical systems [4, 5]. However, in the reverse engineering of
software, we can extend the context a bit further to include the parts of a software system
that are deemed “critical”. In order to determine those critical portions of the software,
several factors must be taken into account, including call graph and flow graph complexity.
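
One source of this growth can be illustrated with conditional statements. The derivation below is a sketch that assumes the usual sp rule for alternation with side-effect-free guards, and the fact that sp distributes over disjunction in its precondition; the statement names $T_j$, $S_j$, and $S'_j$ are introduced only for this illustration.

```latex
\begin{align*}
sp(\texttt{if } B_1 \texttt{ then } S_1 \texttt{ else } S'_1,\; Q)
  &= sp(S_1,\; B_1 \land Q) \;\lor\; sp(S'_1,\; \neg B_1 \land Q) \\[4pt]
sp(T_1 ; T_2,\; Q)
  &= sp(T_2,\; sp(T_1, Q)) \\
  &= sp(T_2,\; sp(S_1, B_1 \land Q)) \;\lor\; sp(T_2,\; sp(S'_1, \neg B_1 \land Q))
\end{align*}
```

Each of the two remaining terms splits again when the rule is applied to $T_2$, giving four disjuncts. For $k$ conditionals in sequence the fully expanded specification can therefore contain up to $2^k$ disjuncts, one per path through the code.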

7.1.3 Applying formal techniques

The motivation for using both semi-formal and formal methods is two-fold. First, it is
desirable to take advantage of the benefits of the complementary techniques. Second, by
using a semi-formal technique to guide the formal technique, organization of the formal
specifications will be based on the structure of an implementation. As such, in the case
where formal specifications are warranted, the specifications can be directly associated with
a graphical entity, while those parts of a module that do not require rigorous descriptions
can be left unspecified (formally), with the descriptions of these modules being left to the
semi-formalisms.

We propose that three phases be followed when formally specifying a module:

1. Local Analysis
2. Use Analysis
3. Global Analysis

During the local analysis phase, the calling hierarchy of a module is constructed and a
skeletal formal specification is built using the rules presented in Chapters 3, 4, and 5, with
the sp predicates left as parameterized transforms, that is, the transformations for sp are
unevaluated. The objective at this stage is to gain a high-level understanding of the logical
complexity of the given code. The second step, use analysis, is a recursive step where the
three phases are applied to the functions and procedures used by the original module. This
phase is characterized by the fact that the semantics of the used functions and procedures
are determined before they are used by the original module. However, in many cases, where
the semantics are either well-defined or the semantics are not critical, an unevaluated sp
predicate can be used. For example, given a statement S and a precondition Q, where the
semantics of S are well-defined, instead of evaluating the transformation, we use sp(S, Q)
to represent the logical expression describing the semantics. In the global analysis phase,
the final step, the use analysis information is combined with the local analysis information
to obtain a global description of the original module. The global description, an expanded
form of the skeleton formal specification constructed during the first phase, elaborates
upon the semantics of a module by integrating the specifications constructed during the
use analysis into the skeleton. This activity corresponds to removing the encapsulation
provided by a procedure or function call. The following definition summarizes the method
described above.

Definition 7.1 (Structure Based Analysis Method)

Let $M$ be a program with statements $m_1, \ldots, m_k$, and $P = \{P_1, \ldots, P_n\}$ be
the set of procedures called by $M$. In addition, let $Q$ be the precondition for
$M$. A statement $m_i$ is in $P$ if there is a procedure $p_i$ in $P$ such that program
$M$ calls $p_i$ at line $i$ of $M$.

Method $R(M, Q)$

1. (a) Apply $sp(m_1; \ldots; m_k, Q)$.
   (b) For each $i$ such that $m_i \in \{P_1, \ldots, P_n\}$,
       set $sp(m_i, sp(m_1; \ldots; m_{i-1}, Q))$ := “$sp(m_i, sp(m_1; \ldots; m_{i-1}, Q))$”.
       (We refer to the right hand side as the skeleton.)

2. For each $p \in \{P_1, \ldots, P_n\}$, apply $R(p, Q_p)$, where $Q_p$ is the
   precondition to the procedure $p$.

3. Replace skeletons from Step 1b with results of Step 2.
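
The control structure of the method can be sketched as a recursive traversal of the call hierarchy. The code below is an illustration only and does not compute any sp expressions: it records the order in which module specifications would be completed. The type `struct module`, the function `analyze`, and the guard against revisiting a module (which also stops recursion on cyclic call graphs) are assumptions that are not part of Definition 7.1.

```c
#include <assert.h>

#define MAX_CALLEES 8

enum state { UNSEEN, IN_PROGRESS, DONE };

/* Hypothetical record for one module in the call hierarchy. */
struct module {
    const char *name;
    int callees[MAX_CALLEES]; /* table indices of the procedures it calls */
    int n_callees;
    enum state state;
};

/* Applies the three steps to table[m]. Names of completed modules are
 * appended to order[] starting at pos; the next free position is returned. */
int analyze(struct module table[], int m, const char *order[], int pos)
{
    assert(pos >= 0);
    if (table[m].state != UNSEEN)
        return pos;                   /* already analyzed, or a recursive call */
    table[m].state = IN_PROGRESS;     /* Step 1: skeleton, calls unevaluated */
    for (int i = 0; i < table[m].n_callees; i++)
        pos = analyze(table, table[m].callees[i], order, pos); /* Step 2 */
    table[m].state = DONE;            /* Step 3: skeletons replaced */
    order[pos++] = table[m].name;
    return pos;
}
```

With a table in which a top-level module calls two procedures, one of which calls a third, the callees are completed before their callers, which matches the requirement that the semantics of used procedures be determined before they are integrated into the original module.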

7.1.4 Abstraction

After constructing an as-built formal speciﬁcation using the process described in
Section 7.1.3, an abstraction of the speciﬁcation can be constructed using the approach
described in Chapter 6. In addition, we have found it appropriate to apply the abstraction
steps during intermediate steps of the method in Deﬁnition 7.1 in order to aid in reducing
the logical complexity of the speciﬁcations. As such, we can modify the structure based

method to be the following.

Definition 7.2 (Structure Based Analysis Method (Abstraction))

Let $M$ be a program with statements $m_1, \ldots, m_k$, and $P = \{P_1, \ldots, P_n\}$ be
the set of procedures called by $M$. In addition, let $Q$ be the precondition for
$M$. A statement $m_i$ is in $P$ if there is a procedure $p_i$ in $P$ such that program
$M$ calls $p_i$ at line $i$ of $M$.

Method $R'(M, Q)$

1. (a) Apply $sp(m_1; \ldots; m_k, Q)$. Abstraction method may be used after
       each application of sp.
   (b) For each $i$ such that $m_i \in \{P_1, \ldots, P_n\}$,
       set $sp(m_i, sp(m_1; \ldots; m_{i-1}, Q))$ := “$sp(m_i, sp(m_1; \ldots; m_{i-1}, Q))$”.
       (We refer to the right hand side as the skeleton.)

2. For each $p \in \{P_1, \ldots, P_n\}$, apply $R(p, Q_p)$, where $Q_p$ is the
   precondition to the procedure $p$.

3. Replace skeletons from Step 1b with results of Step 2.
4. Apply the abstraction method to the final specification.

7.1.5 Process Summary

The entire combined process for the reverse engineering of programs can be summarized
as follows:

Definition 7.3 (Informal and Formal Reverse Engineering Method)

1. Construct an informal high-level model of the software
2. Construct an informal low-level model of the software
3. Apply $R'$ to a module $M$, where $M$ is chosen using some selection
   criteria.

One of the primary difficulties in the process is the determination of the criteria that
can be used for Step 3. The criteria that we have used include the identification of critical
procedures by examining the call graph constructed in Step 2. Our selection of critical
procedures is typically based on choosing those vertices with a large difference between
the in-degree and out-degree. Other criteria that can be used include keyword search and
data structure usage.
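
The degree-based criterion can be sketched as follows. This is an illustration under assumptions: the call graph is stored as an adjacency matrix, "large difference" is read as the largest absolute difference between in-degree and out-degree, ties go to the lowest index, and the name `most_critical` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FUNCS 16

/* calls[i][j] != 0 means that function i calls function j.
 * Returns the index of the vertex with the largest absolute difference
 * between in-degree and out-degree, or -1 when n is 0. */
int most_critical(int calls[MAX_FUNCS][MAX_FUNCS], int n)
{
    int best = -1, best_diff = -1;
    assert(n >= 0 && n <= MAX_FUNCS);
    for (int v = 0; v < n; v++) {
        int in_deg = 0, out_deg = 0;
        for (int u = 0; u < n; u++) {
            if (calls[u][v]) in_deg++;   /* u calls v */
            if (calls[v][u]) out_deg++;  /* v calls u */
        }
        int diff = abs(in_deg - out_deg);
        if (diff > best_diff) {
            best_diff = diff;
            best = v;
        }
    }
    return best;
}
```

A function that is called from many places but calls nothing itself scores highly under this measure, as does a driver that calls many functions but is called by none; which of the two is of more interest is left to the analyst.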


7.2 An Example

In this section, we demonstrate the use of the combined formal and informal approach
to reverse engineer modules from a mission control ground-based system at the NASA
Jet Propulsion Laboratory. The purpose of the code is to translate user commands into
spacecraft commands. The entire system consists of several thousand lines of code.
However, in many instances, it is more appropriate to analyze only the more critical sections.
This example focuses on a sequence of code in order to illustrate the derivation of the
specifications of modules that contain representative logic and programming constructs.

7.2.1 Local Analysis

Figure 7.1 gives the code for the translate procedure. Using AUTOSPEC, an initial
semi-formal analysis of the translate code yields a call graph as depicted in Figure 7.2,
where the rectangles indicate functions, and the labels correspond to the function names
given by the index to the right of the graph. From this initial analysis, we find
that the translate function uses six functions: initialize_interpreter,
process_binary_output, inform_user, process_mnemonic_input, end_cmdxlt,
and process_carg. The translate function has four different modes: initialize,
translate, control argument assignment, and error. For this analysis, we focus on the
translate function in the translate mode (XLT). Thus, we are ignoring the initialization,
control argument, and default modes in this analysis. These modes correspond to the
INIT, CARG, and default cases of the switch statement, respectively. Therefore, we
are left with specifying the while statement depicted in Figure 7.3, where labels have
been attached to the programming constructs for convenience in the following discussion.

 

struct msg *translate (int op, char *args)
{
    extern int dontoutput;
    static struct project_parameters *pp;
    struct msg *mp = NULL;

    switch (op)
    {
    case INIT: /* initialize the interpreter */
        pp = initialize_interpreter();
        break;

    case XLT: /* interpret a message */
        while (args[0] != '\0')
        {
            if (process_mnemonic_input(&args, pp))
            {
                if (mp == NULL)
                    mp = process_binary_output(pp);
                else
                {
                    mp->next = process_binary_output(pp);
                    mp = mp->next;
                }
            }
            else
                dontoutput = 1;
        }
        break;

    case CARG: /* set a value for a control argument */
        process_carg(&args, pp);
        break;

    default:
        inform_user("internal error: bad op in translate");
        end_cmdxlt(CMD_ERROR);
    }

    return(mp);
}

Figure 7.1: Translate Source Code

Figure 7.2: Translate

 

Informally, the translate function in the translate mode is responsible for building a
list of spacecraft instructions corresponding to interpreted commands by calling a function
called process_binary_output.

 

        while (args[0] != '\0')
        {
S0:         if (process_mnemonic_input(&args, pp))
            {
S1:             if (mp == NULL)
S1a:                mp = process_binary_output(pp);
                else
S1b:            {
                    mp->next = process_binary_output(pp);
                    mp = mp->next;
                }
            }
            else
S2:             dontoutput = 1;
        }

Figure 7.3: Translate Source Code

A local analysis (first step) of the code in Figure 7.3 using the sp rule for the while
statement yields the following specification:

\[
\neg(\texttt{args[0] != '\textbackslash 0'}) \land (\exists i : 0 \leq i : sp(\mathrm{S0}^{i}, Q)), \tag{7.1}
\]

where the expression (args[0] != '\0') has no side-effects, and Q is the precondition
to the statement S0. This specification states that after the while statement has been
executed, the args array has a '\0' as the first entry, and the statement S0 has been executed
some number of iterations. Unfortunately, the specification in (7.1) is not very informative
outside of identifying that the program uses an iterative construct. As such, an expansion
of sp(S0, Q) is warranted.

Using the labels shown in Figure 7.3, a specification of sp(S0, Q) is given by

\[
\begin{aligned}
sp(\mathrm{S0}, Q) = {} & sp(\mathrm{S1}, V(B) \land sp(B, Q)) \;\lor \\
                        & sp(\mathrm{S2}, \neg V(B) \land sp(B, Q))
\end{aligned} \tag{7.2}
\]

where B corresponds to process_mnemonic_input(&args, pp). This specification
states that after executing the statement S0, it will be true that either S1 was executed or S2
was executed, where the semantics are determined by the preconditions $V(B) \land sp(B, Q)$
and $\neg V(B) \land sp(B, Q)$, respectively. So, in this case, either the if statement (S1) was
executed or the assignment statement (S2) was executed. This specification states that the
precondition sp(process_mnemonic_input(&args, pp), Q) to the statement S0 may
contain a side-effect. This fact is made explicit by the use of the valuation function V.
Note that if the function process_mnemonic_input has no side-effect then

sp(process_mnemonic_input(&args, pp), Q) = Q.

Further expansion of $sp(\mathrm{S1}, V(B) \land sp(B, Q))$ and $sp(\mathrm{S2}, \neg V(B) \land sp(B, Q))$ yields

\[
\begin{aligned}
sp(\mathrm{S1}, V(B) \land sp(B, Q)) = {} & sp(\mathrm{S1a}, (mp = \mathrm{NULL}) \land V(B) \land sp(B, Q)) \;\lor \\
                                          & sp(\mathrm{S1b}, (mp \neq \mathrm{NULL}) \land V(B) \land sp(B, Q)),
\end{aligned} \tag{7.3}
\]

and

\[
\begin{aligned}
sp(\mathrm{S2}, \neg V(B) \land sp(B, Q)) &= sp(\texttt{dontoutput = 1}, \neg V(B) \land sp(B, Q)) \\
  &= (\texttt{dontoutput} = 1) \land (\neg V(B) \land sp(B, Q))^{\texttt{dontoutput}}_{v},
\end{aligned} \tag{7.4}
\]

respectively, where v is the value of dontoutput before executing S2. Expression (7.3)
states that given that the expression ‘$V(B) \land sp(B, Q)$’ is true, either S1a has been
executed or S1b has been executed, each depending on the added condition that either
(mp = NULL), or (mp ≠ NULL), respectively. On the other hand, Expression (7.4) states
that given that the expression ‘$\neg V(B) \land sp(B, Q)$’ is true, execution of S2 results in the
assignment of ‘1’ to the variable ‘dontoutput’.
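
The second line of Expression (7.4) is an instance of the sp rule for assignment. As a worked sketch, assuming the usual textual-substitution form of that rule, where $R^{x}_{v}$ denotes $R$ with every occurrence of $x$ replaced by $v$ and $v$ stands for the value of $x$ before the assignment:

```latex
\begin{align*}
sp(x := e,\; R) &= (x = e^{x}_{v}) \land R^{x}_{v} \\[4pt]
sp(\texttt{dontoutput = 1},\; R)
  &= (\texttt{dontoutput} = 1^{\texttt{dontoutput}}_{v}) \land R^{\texttt{dontoutput}}_{v} \\
  &= (\texttt{dontoutput} = 1) \land R^{\texttt{dontoutput}}_{v},
  \qquad \text{with } R \equiv \neg V(B) \land sp(B, Q)
\end{align*}
```

The substitution leaves the constant 1 unchanged, so the only effect on the precondition is that any mention of dontoutput in $R$ now refers to its earlier value $v$.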

The preliminary skeleton of the logical specification of the translation module can
be constructed by substituting the Expressions (7.3) and (7.4) back into the original
Expression (7.2) such that

\[
\begin{aligned}
sp(\mathrm{S0}, Q) = {} & sp(\mathrm{S1a}, (mp = \mathrm{NULL}) \land V(B) \land sp(B, Q)) \;\lor \\
                        & sp(\mathrm{S1b}, \neg(mp = \mathrm{NULL}) \land V(B) \land sp(B, Q)) \;\lor \\
                        & (\texttt{dontoutput} = 1) \land (\neg V(B) \land sp(B, Q))^{\texttt{dontoutput}}_{v}
\end{aligned} \tag{7.5}
\]

which states that in every iteration, one of three actions is executed, namely one of S1a,
S1b, or S2 (dontoutput = 1).

At this point in the analysis, since S1a and S1b are statements that depend on the
specification of functions and procedures that are used by translate, it is appropriate to
begin a use analysis (second stage) for the translate function, where in this case, the
function process_binary_output is analyzed.

In summary, during the local analysis phase for translate, a graphical representation
of the function was created with the intention of determining the calling hierarchy for the
function. Next, a logical analysis was performed using a top-down approach that uses
encapsulation with the intent of determining the logical complexity.

7.2.2 Use Analysis

Use analysis involves the specification of functions that are used by a given object of study.
In our example, given that the object of study is the translate function, use analysis
involves specifying the functions used by translate. In this section we describe the
function process_binary_output.

Figure 7.4 contains the source code for process_binary_output. The use
analysis for this function involves three steps, each corresponding to the steps
followed for translate. That is, we perform local, use, and global analyses on
process_binary_output. The remaining analysis of process_binary_output is
similar to the process used to analyze translate. However, in the interest of
simplifying the analysis we shall ignore many of the details involved with analyzing
process_binary_output and focus primarily on the output characteristics. Note that the
strict application of the rules for sp requires a line by line construction of a specification.
Here, we informally construct the specification with the understanding that all of the
information can and should be constructed rigorously. Our main objective in this example
analysis is to provide enough information about process_binary_output to be able
to describe translate in a sufficient manner without having to perform a full analysis
of the entire command translation system. Again, we note that the code used in this
chapter is taken out of context, so it is unreasonable to specify this code without
these constraints. As such, the specification given in this section is used primarily to
show how a true specification of process_binary_output might be used to describe the
translate function.

Consider the code given in Figure 7.4 for process_binary_output. Three statements
determine whether or not the output of the function is defined, labeled by I, J, and K,
respectively. Line I, for instance, has the interpretation that if space could not be allocated
for the return object, then the routine aborts, while line J forces the routine to return a NULL
object due to some other error. Finally, the line K indicates a successful return of an object.
Therefore, we can construct the following specification for process_binary_output:


 

struct msg *process_binary_output (struct project_parameters *pp)
{
    extern U16 *stem_entry;
    U16 code;
    U16 *ep;
    struct msg *mp;

    control_list;
    (U16 *)stack_base;
    (U32 *)min_S;

    mp = (struct msg *)malloc(sizeof(struct msg) + MAX_MSG_BYTES);
    if (mp == NULL)
    {
        warn("process_binary_output: \
out of memory (malloc failed)\n");
        end_cmdxlt(-1);                              /* I */
    }
    /* -1 for length field, written over later */
    PUSHL(mp->msg_bits - 1);

    ep = get_entry(get_U32_Q());
    P = ep + 1;
    do
    {
        code = *P++;
        if ((code < 1) || (code > 32))
        {
            warn("bad code");
            end_cmdxlt(-1);
        }
        (*output_rtn[code])();
    } while (code != RFMS);
    mp->next = NULL;
    mp->msg_len = *(mp->msg_bits - 1);
    if (mp->msg_len > pp->max_msg_bits)
    {
        fail(TOO_MANY_BITS, NULL, NULL);
        free(mp);
        return(NULL);                                /* J */
    }

    mp->msg_num = 0;
    copy_space_filled("", mp->start, sizeof(mp->start));
    copy_space_filled("", mp->open, sizeof(mp->open));
    copy_space_filled("", mp->close, sizeof(mp->close));
    copy_space_filled(get_stem_and_title(stem_entry), mp->comment,
                      sizeof(mp->comment));
    mp->chksum = chksum(mp->msg_bits, FLD_LEN_OF(mp->msg_len)*2);
    return(mp);                                      /* K */
}

Figure 7.4: Process Binary Output Source Code

 


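\[
\begin{array}{l}
sp(\texttt{warn; end\_cmdxlt},\ (\texttt{mp} = \texttt{NULL}) \land Q)\ \lor \\
sp(\texttt{fail; free; return(NULL)},\ (\texttt{mp->msg\_len} > \texttt{pp->max\_msg\_bits}) \land (\texttt{mp} \neq \texttt{NULL}) \land Q)\ \lor \\
sp(\texttt{return(mp)},\ (\texttt{mp->msg\_len} \leq \texttt{pp->max\_msg\_bits}) \land (\texttt{mp} \neq \texttt{NULL}) \land Q)
\end{array}
\tag{7.6}
\]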
which states that after executing process_binary_output either warn and end_cmdxlt
were executed, the routine returned a NULL object, or the routine returned a valid object.
Again, we stress that this specification is incomplete and only specifies a small slice of the
functionality of the routine.
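
The three disjuncts of Expression (7.6) amount to a case split on how the routine exits. The following C++ sketch makes that split explicit. The names `Exit` and `classify_exit` are hypothetical and introduced only for illustration; they are not part of the analyzed system, and the sketch ignores the "bad code" abort inside the loop, just as the incomplete specification does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of the case split in Expression (7.6).
// None of these names come from the command translation system.
enum class Exit { Aborted, ReturnedNull, ReturnedMessage };

// alloc_ok  models (mp != NULL) after the call to malloc
// msg_len   models mp->msg_len
// max_bits  models pp->max_msg_bits
Exit classify_exit(bool alloc_ok, std::size_t msg_len, std::size_t max_bits) {
    if (!alloc_ok) {
        return Exit::Aborted;          // statement I: warn; end_cmdxlt
    }
    if (msg_len > max_bits) {
        return Exit::ReturnedNull;     // statement J: fail; free; return(NULL)
    }
    return Exit::ReturnedMessage;      // statement K: return(mp)
}
```

Each branch corresponds to exactly one disjunct, and the branch conditions are mutually exclusive in the same way the preconditions in the expression are.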

7.2.3 Global Analysis

The final step in the analysis is to take the specification in Expression (7.6) and integrate it
back into the skeleton specification of Expression (7.5). This specification is as follows:

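\[
\begin{array}{l}
sp(S_0, Q) = ((\texttt{mp} = \texttt{NULL}) \lor (\texttt{mp} = u))\ \lor \\
\quad (((\texttt{mp->next} = \texttt{NULL}) \lor (\texttt{mp->next} = u)) \land (\texttt{mp} = \texttt{mp->next}))\ \lor \\
\quad (\texttt{dontoutput} = 1) \land (\neg{\lor}(B) \land sp(B, Q))
\end{array}
\tag{7.7}
\]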
where u is some new object. This specification states that after executing S_0, the variable mp
has either the value NULL or points to some new object, or mp->next has the value NULL
or points to some new object with mp pointing to mp->next. Finally, if neither of those
cases holds, it must be that dontoutput = 1. Recall that Expression (7.1) describes
the behavior of the XLT mode for translate and that the XLT mode uses an iterative
construct. As such, Expression (7.7) states that after each iteration, a chain of messages
is constructed or the dontoutput flag is set to 1. Note that in this specification we
make the assumption that the pointer assignment behaves like a variable assignment. In


this case, this assumption has no impact on the specification since no reference is made to
the allocated data objects outside of simple assignments.
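
To make the reading of Expression (7.7) concrete, the following C++ sketch models one loop iteration that either extends a chain of messages or sets the dontoutput flag. It is an illustration under stated assumptions, not the code of translate, which is not reproduced in this chapter: `Msg`, `LoopState`, `iterate`, and `chain_length` are hypothetical names, and the `error` parameter stands in for whatever condition causes the flag to be set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for struct msg; only the link field matters here.
struct Msg {
    Msg* next = nullptr;
};

// Hypothetical loop state for the XLT mode.
struct LoopState {
    Msg* head = nullptr;   // first message of the chain built so far
    Msg* mp = nullptr;     // most recently linked message
    int dontoutput = 0;    // error flag named in Expression (7.7)
};

// One loop iteration. `produced` models the object returned by
// process_binary_output; `error` models a condition that suppresses output.
void iterate(LoopState& s, Msg* produced, bool error) {
    if (error) {                   // third disjunct: dontoutput = 1
        s.dontoutput = 1;
        return;
    }
    if (produced == nullptr) {     // nothing to link in this iteration
        return;
    }
    if (s.head == nullptr) {       // first disjunct: mp = u
        s.head = produced;
        s.mp = produced;
    } else {                       // second disjunct: mp->next = u, mp = mp->next
        assert(s.mp != nullptr);   // a non-empty chain always has a last element
        s.mp->next = produced;
        s.mp = s.mp->next;
    }
}

// Number of messages reachable from the head of the chain.
std::size_t chain_length(const LoopState& s) {
    std::size_t n = 0;
    for (const Msg* m = s.head; m != nullptr; m = m->next) {
        ++n;
    }
    return n;
}
```

Treating the pointer updates as plain assignments, as the sketch does, mirrors the simplifying assumption stated above.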

7.2.4 Discussion

At a high level, the specification in Expression (7.7) states that while in the XLT
mode, each iteration of the loop adds to a chain of messages or results in an error
condition. The refinement of the specification of the XLT mode from Expression (7.1)
to Expression (7.7) represents just a small portion of what would be required to obtain
a specification of an entire system. As described in Section 7.1.2, it is not feasible
to derive specifications for the entire system. Formal specifications of critical sections
of software, however, are merited, and having such formal, concise specifications can
facilitate formal program understanding since the specifications are behavioral rather than
imperative. The specification in Expression (7.7) does provide a somewhat higher level of
understanding of the corresponding program code. Nonetheless, the resulting specifications
still reflect significant implementation bias. However, the “as-built” specifications provide
a mechanism for traceability of the reverse engineering process, particularly as the higher-level
representations become more abstract. That is, for technology transfer purposes,
it is important for system maintainers to understand the starting point for the formal
specification process for reverse engineering.


Chapter 8

Tool Support

One of the attractive properties of formal methods is that formal languages with well-
deﬁned syntax and semantics facilitate the use of automated support tools. In this chapter,
we describe the development of several tools that have been designed to support the formal

reverse engineering techniques presented in this dissertation.

8.1 Overview

Chapter 10 describes several tools that have been developed to support reverse engineering
activities. In this chapter, we describe the development of a suite of reverse engineering
tools that have been designed to support the formal reverse engineering techniques

presented in this dissertation. The suite consists of four tools:

AUTOSPEC: AUTOSPEC is a tool that is used to support the construction of
speciﬁcations using the semantics of the strongest postcondition predicate
transformer.

SPECGEN: SPECGEN is a tool that is used to support the derivation of abstract
speciﬁcations from as-built speciﬁcations.

SPECEDIT: SPECEDIT is a speciﬁcation editor that is used to support the
construction of syntactically correct speciﬁcations.


TPROVER: TPROVER is a tableau theorem prover that is used to verify the
consistency of speciﬁcations that are modiﬁed by a user.

Figure 8.1 shows the inter-relationships that exist between the various tools in the suite
in the form of a data flow diagram. In addition, the diagram shows the relationship between
the tools in the suite and external tools that we have used to aid in the analysis of software
during the reverse engineering process. In the diagram, external tools are shown as dashed
circles.

[Data flow diagram; recoverable labels: Source Code, Source Tokens, SUIF Compiler, SUIF Tokens, SUIF Code, Specification, Annotations, Statements, Temp File, Decisions, User]

Figure 8.1: Tool Suite

 

The overall process for using the tools begins with a pre-processing step whereby
the SUIF Compiler [45] is used to generate an intermediate format based on the C
programming language. The AUTOSPEC system takes the SUIF generated code as input

and based on user input, generates source code annotations. During the analysis, the user
can provide assistance to the AUTOSPEC system via the use of the SPECEDIT specification
editor. In addition, the TPROVER theorem prover can be used to verify user modifications
to the generated specifications. Finally, after as-built specifications have been constructed
using the AUTOSPEC tool, the SPECGEN tool can be used to generate abstractions that can
be visualized using the Visualization of Compiler Graphs (VCG) tool [46].

8.2 AUTOSPEC

The (Semi-)Automated Speciﬁcation system, or AUTOSPEC, was originally developed
in order to demonstrate the feasibility of our initial investigations into reverse
engineering [17]. Written in an object-oriented variant of Prolog, the original
prototype facilitated the application of user-directed heuristics to construct predicate logic
speciﬁcations from Dijkstra guarded command language programs [17]. Since then, several
different variations and reﬁnements of the AUTOSPEC system have been developed, each
with the intention of investigating some aspect of the research described in this dissertation,
including the analysis of programs using strongest postcondition semantics [6], and the
analysis of pointer semantics [43]. In this section, we describe the most recent version of
AUTOSPEC that has been reﬁned from previous versions in order to handle more complex

languages and a wider variety of programs.

8.2.1 Design

The high-level design of the AUTOSPEC system is shown in Figure 8.2. The AUTOSPEC

system interacts with three different environmental entities: the User, a speciﬁcation editor

called SPECEDIT, and a theorem prover called TPROVER. The specification editor and
theorem prover are described in Sections 8.4 and 8.5, respectively. The AUTOSPEC system
reads a file, and based on various interactions with the user and external tools, generates
formal specifications based on the use of strongest postcondition, and annotates the original
source code with those specifications. Direct user interaction with the AUTOSPEC system
comes in the form of decisions about how a source file analysis should proceed. The
user also interacts with the AUTOSPEC system indirectly via the use of the SPECEDIT
and TPROVER systems.

[Level 0 data flow diagram; recoverable labels: File, SpecEdit, Tprover, User, Specification, Annotated Source, Tokens, Decisions]

Figure 8.2: Level 0 AUTOSPEC Model

 

The design of the AUTOSPEC system follows the same general architecture of many
compiler and static analysis systems [47]. That is, the design of the AUTOSPEC system
consists of a parsing component that reads a source ﬁle and creates an abstract syntax
tree, an analysis component that is used to construct speciﬁcations from the program,
and an output component that writes the results to an appropriate output ﬁle. Figure 8.3
contains the level 1 data ﬂow diagram of the AUTOSPEC system. The Parse, SP, and

Output processes correspond to the parsing, analysis and output components, respectively.
In addition to the standard compiler-oriented components, the AUTOSPEC system has a
component for interacting with the user (i.e., a user interface). Data in the AUTOSPEC
system is centered primarily around the Abstract Syntax Tree and program statements. In
addition to statements, flow of data in the AUTOSPEC system consists of specifications
and annotations, where annotations are specifications that are tied to specific program
statements.

[Level 1 data flow diagram; recoverable labels: File, Filename, Tokens, Abstract Syntax Tree, Statements, Annotated Statements, Annotated Source, Annotations, Specification, User Interface, Tprover, SpecEdit, User]

Figure 8.3: Level 1 Data Flow Diagram of AUTOSPEC

 

The primary component of the AUTOSPEC system is the analysis, or SP component.
The SP component consists of several procedures that are responsible for constructing

speciﬁcations from programming constructs. The formal speciﬁcations that are generated

by the SP component correspond directly to the semantic deﬁnitions given throughout this
dissertation. In addition to constructing speciﬁcations, the SP component is responsible for

launching the TPROVER and SPECEDIT applications when user input is required.
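
As a rough illustration of what the SP component computes, the following C++ sketch applies the strongest postcondition rule for assignment, assuming the standard form sp(x := e, Q) = (exists v :: Q[x/v] and x = e[x/v]), to a straight-line sequence of assignments and records one annotation per statement. It works on plain strings rather than on an abstract syntax tree, and every name in it (`rename_var`, `sp_assign`, `Assign`, `annotate`, the fresh names `v0`, `v1`, ...) is hypothetical; it is a sketch of the idea, not the AUTOSPEC implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Replace whole-identifier occurrences of `var` in `text` with `fresh`.
std::string rename_var(const std::string& text, const std::string& var,
                       const std::string& fresh) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c) || text[i] == '_') {
            std::size_t j = i;
            while (j < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[j])) ||
                    text[j] == '_')) {
                ++j;
            }
            std::string id = text.substr(i, j - i);
            out += (id == var) ? fresh : id;
            i = j;
        } else {
            out += text[i++];
        }
    }
    return out;
}

// sp(var := expr, pre) = (exists fresh :: pre[var/fresh] && var = expr[var/fresh])
std::string sp_assign(const std::string& var, const std::string& expr,
                      const std::string& pre, const std::string& fresh) {
    return "(exists " + fresh + " :: (" + rename_var(pre, var, fresh) +
           ") && " + var + " = " + rename_var(expr, var, fresh) + ")";
}

// Hypothetical statement record: var = expr;
struct Assign {
    std::string var;
    std::string expr;
};

// Annotate each statement with its postcondition; the postcondition of one
// statement becomes the precondition of the next. The fresh names v0, v1, ...
// are assumed not to occur in the program.
std::vector<std::string> annotate(const std::vector<Assign>& program,
                                  std::string pre) {
    std::vector<std::string> annotations;
    for (std::size_t i = 0; i < program.size(); ++i) {
        pre = sp_assign(program[i].var, program[i].expr, pre,
                        "v" + std::to_string(i));
        annotations.push_back(pre);
    }
    return annotations;
}
```

For the precondition `x > 0` and the statement `x = x + 1`, `sp_assign` yields `(exists v0 :: (v0 > 0) && x = v0 + 1)`, which is the kind of annotation a user might then simplify by hand.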

8.2.2 Implementation

The AUTOSPEC system was developed primarily as a means for supporting the analysis
of source code using strongest postcondition. The AUTOSPEC system was originally
developed to support the Dijkstra guarded command language. Since that language is used
primarily for theoretical development and analysis, it was decided that in order to show
the applicability of our approaches to real systems, support for a commonly-used language
must be included in the subsequent implementations of AUTOSPEC. The remainder of this

section describes the implementation of a C variant of the AUTOSPEC system.

SUIF Compiler

The Stanford University Intermediate Format (SUIF) library is a suite of routines that
were developed to support research for optimizing and parallelizing compilers [45].
Developed by the Stanford University Compiler Group, the objective of the SUIF compiler
is to provide an extensible support system for a wide variety of compiler-oriented
investigations [45].

The motivation for using the SUIF compiler suite of tools is as follows. First, the SUIF
compiler provides a library of routines for parsing and accessing source code information
via the use of an abstract syntax tree representation of SUIF code. Second, by using the
SUIF compiler, we are able to take advantage of several built-in features including the

use of annotations to document programs, and source code iterators for traversing the

abstract syntax trees. In addition, the SUIF library has extensive support for symbol tables.
Finally, by using the SUIF compiler we are able to leverage the experience of an established
community of users.

The SUIF library focuses on the organization of input ﬁles into abstract syntax trees
based on the structure of the C programming language. SUIF also supports the analysis
of Fortran programs and contains support for programming constructs, such as loops,
conditionals, and assignments. These constructs are translated into an intermediate format
that is decomposed into semantically equivalent SUIF constructs. In addition, the SUIF
libraries support the use of symbol tables as well as convenience functions that can be used
to traverse source code.

SUIF also supports the use of source code annotations. In the context of the
AUTOSPEC system, these SUIF code annotations are used to attach strongest postcondition
specifications to particular programming statements. After analyzing the source code, these
annotations can be translated into comments and attached to the original source code.

In the AUTOSPEC system we use the SUIF tools as follows. First, the SUIF compiler
scc is used to generate a SUIF intermediate file from the original source code. Second, the
SUIF library of tools is used by the AUTOSPEC system in order to manipulate the SUIF
intermediate file. Third, the symbol table and source code traversal functions are used
to access the abstract syntax trees for the input source code in order to generate formal
specifications based on the semantics of the strongest postcondition. Fourth, the annotation
facility is used to associate formal specifications with specific source statements. Finally,
a tool called s2c is used to translate the SUIF source code into equivalent C source
code with annotations.
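
The last step, turning statement-level annotations into comments in the emitted C source, can be pictured with the following C++ sketch. It deliberately does not use the SUIF library or reproduce the output format of s2c, neither of which is shown in this chapter; `emit_annotated` and the `/* sp: ... */` comment form are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Emit each statement followed by its annotation (if any) as a C comment.
// `annotations` maps a statement index to the specification tied to it.
// Assumes annotation text does not itself contain "*/".
std::string emit_annotated(const std::vector<std::string>& statements,
                           const std::map<std::size_t, std::string>& annotations) {
    std::string out;
    for (std::size_t i = 0; i < statements.size(); ++i) {
        out += statements[i] + "\n";
        std::map<std::size_t, std::string>::const_iterator it = annotations.find(i);
        if (it != annotations.end()) {
            out += "/* sp: " + it->second + " */\n";
        }
    }
    return out;
}
```

Keying annotations by statement rather than by text position reflects the description above of annotations as specifications tied to specific program statements.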


Interface

The interface for the AUTOSPEC system was constructed using the Tcl/Tk language [48].
In addition, we used the C++ language to provide the interconnection between Tcl/Tk and
the SUIF libraries. The interface is organized primarily around the input source code. The
main window, as shown in Figure 8.4, is used to display the source code for a user-selected
procedure.

Figure 8.4: AUTOSPEC Main Window

The analysis of the source code includes three distinct phases. The first phase focuses
on allowing a user to select analysis breakpoints, thus providing the user with a means
for indicating where they prefer to provide input to the analysis. Selecting (i.e., “double-clicking”)
a particular programming statement indicates that the analysis of the current
procedure should be interrupted just before processing of the selected statement. Figure 8.5
shows the interface with a selected statement shown in italics. During an execution of the
system, the selected statement appears italicized in blue.

Figure 8.5: AUTOSPEC: A Selection in the Main Window

The second phase of the analysis is the speciﬁcation phase. In this phase, AUTOSPEC
constructs a formal speciﬁcation of the procedure by using the strongest postcondition
semantics described in this dissertation. As each analysis breakpoint is encountered,
AUTOSPEC pauses, as depicted in Figure 8.6, and allows the user to modify the current
annotation (i.e., the precondition for the next statement). The modiﬁcation of the
speciﬁcation is performed by using the SPECEDIT speciﬁcation editor, which is described

in Section 8.4. Using the SPECEDIT system, the user modiﬁes the precondition and can

optionally launch a theorem prover that can be used to check the consistency between the
user-defined specification and the system-defined specification. We developed the theorem
prover to support the tableau proof method [49]. In Section 8.5 we describe the design and
implementation of this prover.

Figure 8.6: Launching SPECEDIT from AUTOSPEC

The final phase of the analysis is the post-specification phase. During this phase, the
user is free to select and modify any annotation that is displayed in the main window.
In addition, we plan to provide a mechanism whereby the user can modify a
specification, replay the analysis, and incorporate the changes into the analysis.

8.3 SPECGEN

The AUTOSPEC system focuses on the construction of as-built specifications. Once
the specifications have been constructed using the AUTOSPEC system, they
need to be generalized into higher levels of abstraction. The Specification Generalization
system, or SPECGEN, was developed in order to aid in the construction of abstract
specifications from as-built specifications.

8.3.1 Design

The high-level design for the SPECGEN system is shown in Figure 8.7. The SPECGEN
system interacts with two environmental entities: the User, and the visualization tool
VCG [46]. Upon launching, the SPECGEN system reads a specification from a file (SpecFile
in Figure 8.7), and based on user decisions, constructs abstractions that can be visualized
in the form of a partial-order diagram. The partial-order diagram, depicted in Figure 8.7 as
the GraphFile, is then displayed using the VCG tool.

[Level 0 data flow diagram; recoverable labels: SpecFile, GraphFile, Specification, Graph, Display, Decisions, User, VCG]

Figure 8.7: Level 0 SPECGEN Model

 

Figure 8.8 shows the level 1 data flow diagram for the SPECGEN system. The
internal structure of the SPECGEN system consists of three major components: a parser,
a user interface, and an abstraction engine. The parser is a standard Lex and Yacc [50]
parser that has been constructed for checking the syntax of first-order logic specifications.
Section 8.6 describes the parser in more detail. The user interface is the primary mechanism
for mediating interaction between the user and the abstraction engine. The abstraction
engine is responsible for constructing high-level specification generalizations based on the
techniques described in Chapter 6.

[Level 1 data flow diagram; recoverable labels: SpecFile, Specification, Abstraction, Command, Graph, Symbolic Specification, GraphFile, Display Graph, Decisions, User, VCG]

Figure 8.8: Level 1 SPECGEN Model

 

8.3.2 Implementation

The implementation of the SPECGEN system has two major components. The ﬁrst

component is the abstraction engine, which is responsible for deriving symbolic

abstractions from an input specification. The second component is the user interface,
which provides a mechanism for facilitating user-driven specification generalization. In
this section we describe each of these components in detail.

Prolog

The abstraction engine was written using SWI-Prolog [51] interconnected with a number of
C routines. The primary motivation for using Prolog was to take advantage of its inferencing
and backtracking capabilities. The small number of routines (approximately 20) and their
size (10-15 lines each) easily justified our choice of language in this case.

The Prolog routines are primarily responsible for deriving partial-orderings of
specifications as well as pruning and expanding the orderings based on user input. By
using a library of C routines that support integration of C and Prolog, it became a
straightforward process to build the system along with a graphical user interface that
facilitated visualization and manipulation of the specification generalizations by a user.

Interface

The user interface for SPECGEN includes two different components. The ﬁrst component is
the direct user manipulation component that allows a user to load, manipulate, and analyze
speciﬁcations using the abstraction engine. Written using Tcl/Tk [48], this component
facilitates speciﬁcation generalization by allowing a user to select speciﬁcation components
of interest, to exclude or mask out certain speciﬁcation components, and to generate
all possible abstractions. For instance, Figure 8.9(a) depicts the SPECGEN system with
the top portion containing the conjuncts of a conjunctive normal form expression. The

buttons labeled “Delete Conjuncts”, “Preserve Conjuncts”, “Focus”, and “Generate All”

allow a user to derive different levels of abstraction with differing levels of detail from a
specification. Figure 8.9(b) shows the results of an analysis, where a specification has
been generalized using SPECGEN and the resulting partial-ordering visualized using a tool
called VCG.

[Two panels: (a) SpecGen; (b) Focus Graph]

Figure 8.9: SPECGEN Interface and Output
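
One way to picture the "Delete Conjuncts", "Preserve Conjuncts", and "Generate All" operations is as a search over subsets of the conjuncts of a specification, ordered by inclusion. The following C++ sketch takes that view. It is an assumption-laden illustration, not the Prolog abstraction engine or the Chapter 6 techniques: it relies only on the fact that dropping a conjunct yields a logically weaker formula, and the names `Mask`, `generate_all`, and `at_least_as_abstract` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate abstraction is a bitmask over the conjuncts of a specification:
// bit i set means conjunct i is kept.
using Mask = unsigned;

// Enumerate every subset of n conjuncts that keeps all conjuncts in
// `preserve` and none of the conjuncts in `remove`.
// The enumeration is exponential in n, so n is assumed to be small.
std::vector<Mask> generate_all(std::size_t n, Mask preserve, Mask remove) {
    assert(n < 8 * sizeof(Mask));
    std::vector<Mask> result;
    const Mask limit = Mask(1) << n;
    for (Mask m = 0; m < limit; ++m) {
        bool keeps_preserved = (m & preserve) == preserve;
        bool drops_removed = (m & remove) == 0;
        if (keeps_preserved && drops_removed) {
            result.push_back(m);
        }
    }
    return result;
}

// a is at least as abstract as b when a keeps a subset of b's conjuncts,
// since a conjunction of fewer conjuncts is implied by one with more.
bool at_least_as_abstract(Mask a, Mask b) {
    return (a & b) == a;
}
```

The relation `at_least_as_abstract` is reflexive, antisymmetric, and transitive, so the generated subsets form a partial order of the kind that could be handed to a graph viewer.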

 

The second component of the user interface is the visualization component. In this
component we take advantage of an existing system called VCG [46], a system that supports
the visualization of graphs. VCG provides many functions for graph layout and placement,
issues that are well beyond the scope of our investigations. One of the shortcomings
of using VCG, however, is that it lacks a mechanism for providing feedback to external

systems such as the main SPECGEN system. As such, as part of our future investigations,
we plan on extending the functionality of SPECGEN to incorporate an internal visualization
facility.

8.4 SPECEDIT

One of the advantages of using formal methods is that their notations are mathematically
based. As such, these notations are amenable to automated processing and reasoning.
One of the tools that can be constructed to support any particular formal specification
language is a syntactic checker or parser. An associated tool along the same lines as a
checker is a syntactic editor. In this section we describe the design and implementation of
a specification editor that we have developed in order to facilitate user interaction with the
AUTOSPEC and SPECGEN systems.

8.4.1 Design

Figure 8.10 shows the high-level data ﬂow diagram model for the SPECEDIT system. The
SPECEDIT system interacts primarily with the user to construct or modify speciﬁcations,
and allows the user to save the speciﬁcations to ﬁles for later modiﬁcation or for
incorporation into other tools that use ﬁrst-order logic as an input language.

Figure 8.11 contains the data flow diagram for the SPECEDIT system. The internal
design of the SPECEDIT system has two primary components: a parser and a user interface.
The user interface facilitates the construction of a syntactically correct speciﬁcation in two
ways. First, the user interface has a graphical interface that allows users to construct
specifications using a point and click method. Second, the user interface has a text-based
interface that allows a user to type in a specification. The parsing component is responsible
for two different activities. First, the parser is responsible for checking the syntax of pre-existing
specifications that are contained in input files. Second, the parser is responsible for
checking the syntax of user modifications that are made using the text-based interface for
the system.

[Level 0 data flow diagram; recoverable labels: SpecFile, Specification, Decisions, User]

Figure 8.10: Level 0 SPECEDIT Model

8.4.2 Implementation

The Specification Editing system, or SPECEDIT, was developed in order to facilitate
specification modification during the analysis phase of the AUTOSPEC system. The main
objective in constructing the SPECEDIT system was to provide a way of ensuring syntactic
correctness during the modification of a specification. This correctness is ensured in one of
two ways: by construction, or by verification. Correctness by construction is facilitated by
providing a mechanism whereby a user can click on various syntactic elements and replace
them with valid substitutions. For instance, Figure 8.12 shows the conjunctive formula
“p(x) ∧ Formula”. The italicized font for p(x) indicates that the term has been selected
using a mouse click. By double-clicking on the “Formula” conjunct in the upper window
of SPECEDIT, the user can choose to substitute the conjunct with either an atomic formula,
a conjunction, disjunction, implication, or quantification. Since replacements can only be
made with valid substitutions, the final specification is syntactically correct by construction.

[Level 1 data flow diagram; recoverable labels: SpecFile, Specification, Tokens, Abstract Syntax Tree, Decisions, User]

Figure 8.11: Level 1 SPECEDIT Model

The second way of ensuring syntactic correctness that is supported by SPECEDIT is
to verify correctness using a syntactic checker or parser. In the lower window of the
SPECEDIT interface there is a specification modification window that can be used by a
user to type in a desired specification. By clicking on the “OK” button, a user directs the
SPECEDIT system to run the syntactic checker on the specification in the lower window. If
the specification is syntactically correct, the specification is loaded into the upper window
and the user is free to modify the specification in either window.

Figure 8.12: SPECEDIT

8.5 Theorem Prover

Many of the interactions between a user and the AUTOSPEC system involve the
modification of a specification by a user and the re-introduction of the modified
specifications into the current analysis or program annotation. In order to verify that
the specification modifications made by a user are logically consistent with the system-generated
specifications, we have constructed a simple theorem prover. In this section we
describe the design and implementation of the theorem prover TPROVER.

8.5.1 Design

Figure 8.13 depicts the high-level design of the TPROVER system. Except for ﬁle

interactions and user guidance, the TPROVER system is entirely self-contained. The

TPROVER system takes as input a source file containing a first-order logic specification to
be proved. Using guidance provided by a user for reasoning about quantified expressions,
the TPROVER system determines whether or not the specification is valid.

[Level 0 data flow diagram; recoverable labels: SpecFile, Specification, Filename, Ground Terms, User]

Figure 8.13: Level 0 TPROVER Model

 

Figure 8.14 shows the internal structure of the TPROVER system. The primary
components of the system are: the parse component, the prover component, and the user
consult component. The parse component is a Lex and Yacc generated parser that is used to
translate first-order logic specifications into an abstract syntax tree. The prover component
uses the information in the abstract syntax tree to generate a proof tree based on the tableau
proof technique. For certain proof rules in the tableau method, ground terms must be
identified in order to continue processing. In these instances, the consult component is
used to interact with the user in order to identify an appropriate ground term for the proof
method.
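
The tableau technique itself can be sketched compactly for the propositional fragment, where no ground terms are needed and the procedure is automatic. The following C++ sketch uses a signed tableau: to show that a formula is valid, it is assumed false and every branch of the resulting tree must close on a contradiction. This is an illustrative reconstruction of the general method, not the TPROVER source; the type and function names (`Formula`, `closes`, `is_valid`, and the constructor helpers) are hypothetical, and the quantifier rules that require user consultation are omitted.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Formula;
using FormulaPtr = std::shared_ptr<const Formula>;

struct Formula {
    enum Kind { Atom, Not, And, Or, Implies } kind;
    std::string name;      // used by Atom only
    FormulaPtr lhs, rhs;   // rhs is unused by Not
};

FormulaPtr make(Formula::Kind k, const std::string& n, FormulaPtr l, FormulaPtr r) {
    return std::make_shared<Formula>(Formula{k, n, std::move(l), std::move(r)});
}
FormulaPtr atom(const std::string& n) { return make(Formula::Atom, n, nullptr, nullptr); }
FormulaPtr negate(FormulaPtr f) { return make(Formula::Not, "", std::move(f), nullptr); }
FormulaPtr conj(FormulaPtr a, FormulaPtr b) { return make(Formula::And, "", std::move(a), std::move(b)); }
FormulaPtr disj(FormulaPtr a, FormulaPtr b) { return make(Formula::Or, "", std::move(a), std::move(b)); }
FormulaPtr implies(FormulaPtr a, FormulaPtr b) { return make(Formula::Implies, "", std::move(a), std::move(b)); }

// A signed formula: true means "T" (assumed true), false means "F".
using Signed = std::pair<bool, FormulaPtr>;

// Returns true when every branch that extends the given entries closes.
bool closes(std::vector<Signed> todo, std::set<std::string> trueAtoms,
            std::set<std::string> falseAtoms) {
    while (!todo.empty()) {
        Signed s = todo.back();
        todo.pop_back();
        const bool sign = s.first;
        const Formula& f = *s.second;
        switch (f.kind) {
        case Formula::Atom:
            // An atom signed both T and F closes the branch.
            if ((sign ? falseAtoms : trueAtoms).count(f.name)) return true;
            (sign ? trueAtoms : falseAtoms).insert(f.name);
            break;
        case Formula::Not:
            todo.push_back(Signed(!sign, f.lhs));
            break;
        case Formula::And:
        case Formula::Or:
        case Formula::Implies: {
            // The antecedent of an implication takes the opposite sign.
            const bool leftSign = (f.kind == Formula::Implies) ? !sign : sign;
            // Non-branching cases: T And, F Or, F Implies.
            const bool linear = (f.kind == Formula::And) ? sign : !sign;
            if (linear) {
                todo.push_back(Signed(leftSign, f.lhs));
                todo.push_back(Signed(sign, f.rhs));
            } else {
                std::vector<Signed> left = todo, right = todo;
                left.push_back(Signed(leftSign, f.lhs));
                right.push_back(Signed(sign, f.rhs));
                return closes(left, trueAtoms, falseAtoms) &&
                       closes(right, trueAtoms, falseAtoms);
            }
            break;
        }
        }
    }
    return false;   // an open branch remains, so a countermodel exists
}

// A formula is valid when assuming it false closes every branch.
bool is_valid(const FormulaPtr& f) {
    std::vector<Signed> start;
    start.push_back(Signed(false, f));
    return closes(start, std::set<std::string>(), std::set<std::string>());
}
```

Extending the sketch to quantified formulas would require choosing ground terms for the instantiation rules, which is the point at which the consult component described above involves the user.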

[Level 1 data flow diagram; recoverable labels: SpecFile, Specification, Abstract Syntax Tree, Formulas, Tableau Entry, Filename, Ground Terms, Proof Tree, User]

Figure 8.14: Level 1 TPROVER Model

 

8.5.2 Implementation

The main proof engine in the TPROVER system was constructed using C++. The proof
engine construction was facilitated by the existence of the formula class library described
in Section 8.6. The graphical user interface was constructed using Tcl/Tk interconnected
with C++.

The TPROVER system takes as input the name of a ﬁle containing a logical expression.
Based on the logical rules described in Section 8.5.1, the TPROVER system will generate
a proof tree. For propositional logic and certain expressions of the ﬁrst order logic, the

theorem prover is automatic. For the remaining classes of valid input, user direction is

required. In the latter case, the TPROVER system acts as a tableau proof editor. In the
former case, the TPROVER system acts as a theorem prover.

Figure 8.15 shows the main window of the TPROVER system. The upper sub-window is
the main prover window. In this window the proof tree is constructed and displayed to the
user. The lower sub-window is the proof information window. In this window the user is
provided information about the current proof. Specifically, during times of user interaction,
the proof information window contains data about the current entry in the proof tree. To aid
in proof comprehension, the vertices in the proof trees are displayed with different colors,
each indicating a different state of processing.

8.6 Formula Class Library

Each of the support tools described in this chapter relies heavily upon a class
library that we have developed to facilitate the manipulation of first-order logic expressions.
This Formula class library is a collection of classes that organize logical expressions
according to an inductive definition of first-order logic [49].

Figure 8.16 shows the object model for the Formula class library. The superclass
“Formula” is the base class for the entire library. Each of the subclasses BinaryFormula,
NegatedFormula, QuantifiedFormula, and AtomicFormula represents one of the syntactic
elements of first-order logic. In addition, the AtomicFormula class has three subclasses
(i.e., BinaryAtomic, UnaryAtomic, and Literal).

The atomic classes in the Formula class library, with the exception of the Literal class,
are aggregations of Term objects. The Term objects correspond to terms in first-order logic.
The implementation of the Term classes includes specific classes to represent variables,
constants, and functions.

[Figure 8.15 screenshot: TPROVER main prover window and proof information window]

Figure 8.15: Example TPROVER Session

The Formula class library was implemented using the C++ programming language.
In addition, a standard formula parser was constructed using the Lex and Yacc parser
construction system. This parser is used by every tool in the AUTOSPEC tool suite as a
means for checking the syntax of file and user input.
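
The inductive organization described above can be sketched with tagged structures. This
is an illustration in C, not the C++ class library itself: the enumerators, field names, and
the helper count_terms are assumptions introduced here, and a function term is limited to
two arguments to keep the sketch short.

```c
#include <stddef.h>

/* Terms: variables, constants, and function applications. */
typedef enum { TERM_VARIABLE, TERM_CONSTANT, TERM_FUNCTION } TermKind;

typedef struct Term {
    TermKind kind;
    const char *name;             /* variable, constant, or function symbol */
    const struct Term *args[2];   /* TERM_FUNCTION only; at most two in this sketch */
} Term;

/* One tag per subclass of Formula named in the text. */
typedef enum { F_BINARY, F_NEGATED, F_QUANTIFIED, F_ATOMIC } FormulaKind;
typedef enum { A_BINARY, A_UNARY, A_LITERAL } AtomicKind;

typedef struct Formula {
    FormulaKind kind;
    const char *symbol;            /* connective, quantifier, or predicate symbol */
    const struct Formula *sub[2];  /* F_BINARY: both; F_NEGATED, F_QUANTIFIED: sub[0] */
    const char *bound;             /* F_QUANTIFIED: name of the bound variable */
    AtomicKind atomic;             /* F_ATOMIC only; ignored for other kinds */
    const Term *terms[2];          /* A_BINARY: two terms; A_UNARY: one; A_LITERAL: none */
} Formula;

/* Number of Term objects aggregated by an atomic formula of the given kind. */
static int atomic_arity(AtomicKind k)
{
    return k == A_BINARY ? 2 : (k == A_UNARY ? 1 : 0);
}

/* Counts the Term objects aggregated by the atomic subformulas of f,
   following the inductive structure of the formula. */
int count_terms(const Formula *f)
{
    switch (f->kind) {
    case F_BINARY:     return count_terms(f->sub[0]) + count_terms(f->sub[1]);
    case F_NEGATED:
    case F_QUANTIFIED: return count_terms(f->sub[0]);
    case F_ATOMIC:     return atomic_arity(f->atomic);
    }
    return 0;
}

/* Builds "forall x. (P(x) & ~(x < c))" and counts its terms. */
int demo_count_terms(void)
{
    static const Term x = { TERM_VARIABLE, "x", { NULL, NULL } };
    static const Term c = { TERM_CONSTANT, "c", { NULL, NULL } };
    static const Formula px   = { F_ATOMIC, "P", { NULL, NULL }, NULL, A_UNARY, { &x, NULL } };
    static const Formula lt   = { F_ATOMIC, "<", { NULL, NULL }, NULL, A_BINARY, { &x, &c } };
    static const Formula nlt  = { F_NEGATED, "~", { &lt, NULL }, NULL, A_LITERAL, { NULL, NULL } };
    static const Formula conj = { F_BINARY, "&", { &px, &nlt }, NULL, A_LITERAL, { NULL, NULL } };
    static const Formula all  = { F_QUANTIFIED, "forall", { &conj, NULL }, "x",
                                  A_LITERAL, { NULL, NULL } };
    return count_terms(&all);      /* one term from P(x) plus two from x < c */
}
```

A recursive traversal such as count_terms has one case per syntactic element, which is the
practical benefit of organizing expressions by an inductive definition.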

 

 

[Figure 8.16 class diagram; recoverable class names: Formula; BinaryFormula, UnaryFormula,
QuantifiedFormula, NegatedFormula, AtomicFormula; BinaryAtomic, UnaryAtomic, LiteralAtomic;
Term; TermVariable, TermConstant, TermFunction]

Figure 8.16: OMT model of the Formula class library

 


Chapter 9

Application of Reverse Engineering to
Support Software Reuse

One of the methods that is used to explicitly reduce the effort needed to develop
software is to reuse existing code. In the case of the Ariane 5, the software for the Ariane 4
was reused and resulted in catastrophic loss [2]. In this chapter we discuss the application
of reverse engineering to the area of software reuse in order to facilitate the construction of
software component libraries.

9.1 Overview

Historically, the use of software components-off-the-shelf (COTS) has been limited to
complete applications. The introduction of object-oriented programming, design patterns,
and other new development techniques has focused attention on the creation and reuse of
finer grained units such as software COTS, but the wide-scale use of such components in the
same manner as hardware integrated components has been limited. As a development
technique, software reuse is the process of constructing a software system from existing
software components. Formal approaches to software reuse rely heavily upon specification
matching criteria, where a query written as a formal specification is used to search a
library for components that can be reused. Jeng and Cheng [35, 31]
addressed the use of formal methods and component libraries to support software reuse, and
Chen and Cheng [52, 53] investigated the construction of software based on architectural
specifications. One of the primary difficulties with a formal approach to software reuse
that relies on formally specified components is that the creation of the formal specification
indices is not explicitly addressed.

In this chapter, we present an approach for combining software reverse engineering and
software reuse to support populating specification libraries for the purposes of software
reuse. In addition, we discuss the results of our preliminary investigations into the use of
tools to support an entire process of populating and using a specification library to construct
a software application.

9.2 A Software Reverse Engineering and Reuse Framework

Many software reuse approaches depend on the assumption that a library of reusable
components is available for use. There are two techniques for populating these libraries
with reusable code. The first technique is to construct components with the intention of
placing them into code repositories. Examples of these repositories are the Microsoft
Foundation Classes [54] as well as libraries for standard problem domains such as
mathematics, graphics, and networking. The second technique is to identify existing code
as a potential candidate for reuse and then package that code into a library. In either case,
a primary concern is the set of mechanisms used for indexing, identifying, and retrieving
the components from the libraries [31, 19, 20, 32].

This chapter describes how a formal approach to reverse engineering can be integrated
with a formal approach to software reuse in order to support after-the-fact construction and
use of reusable code libraries. Figure 9.1 gives an overview of the reverse engineering and
software reuse framework in the form of a data flow diagram. The diagram shows two
distinct components within the framework: the Reverse Engineering component, indicated
by the process circle labeled “RevEgr Suite”, and the Reuse component, indicated by the
process circle labeled “Reuse Suite”. Two integrating factors within this framework provide
the linkage between the two components: the User and the Specification Library. The user
is required to direct the reverse engineering and the reuse processes, and the specification
library is the common medium and repository between the RevEgr Suite and the Reuse
Suite.

 

 

[Figure 9.1 diagram; recoverable labels: User; Problem Definitions, Decisions; Assertions,
Decisions; Code; System; Module Specifications; Specification Library]

Figure 9.1: The Reverse Engineering and Reuse Framework

 

Within this framework, a user can analyze source code and construct formal
specifications that can be used to index the source for reuse purposes. This reverse
engineering and population activity facilitates the reuse part of the framework, namely
the search, identification, and packaging of components into new systems. We next discuss
the specific techniques and tools that are used to support the reverse engineering and reuse
aspects of the framework, respectively.

9.2.1 Reverse Engineering

Figure 9.2 contains a data flow diagram that depicts our investigations into reverse
engineering as a two-stage process, where the AUTOSPEC process bubble represents our
investigations with strongest postcondition, and the SPECGEN process bubble represents our
investigations into abstraction. During the entire process the User provides guidance and
decisions in order to reduce the complexity of the specifications and abstractions.

 

 

 

 

 

[Figure 9.2 diagram; recoverable label: Code]

Figure 9.2: Reverse Engineering Component

 

Within the context of software reuse, our reverse engineering technique provides a
means for constructing module specifications using pre- and postconditions.

9.2.2 Software Reuse

A software architecture is defined to be a configuration of components and connectors [55,
56, 57]. Software architectures describe the overall organization of a software system in
terms of its constituent elements, including computational units and their interrelationships.


Chen and Cheng [52, 53] have developed an approach to software reuse that is based
on the use of software architectures. The approach involves a three-stage process in which
a software system is specified, components are selected, and the selected components are
packaged into the final system. During the specification phase, a software architecture
specification is used to describe the target system. Given this specification, a user can
select components from a library of components that potentially satisfy the requirements of
the target system. Using specification and component matching, components that meet the
constraints of the target system are then validated and packaged to form the final target
system. One of the assumptions made by the approach is that a library of components is
available. In order to satisfy this assumption, we advocate the use of our reverse
engineering approach to populate component libraries with specifications of existing
program code.

Figure 9.3 depicts the framework for the reuse investigations described by Chen and
Cheng [52, 53]. In the diagram, the stages described above are represented by
the Arch Design, Select/Match, and Package processes. The Specification Library data
store represents the combination of specifications and associated program code. These
specifications can be constructed using the reverse engineering technique described in this
dissertation. The System data store represents the final output of the reuse framework and
consists of a packaged component. To support this framework, the Architecture Based Reuse
and Integration Environment (ABRIE) system has been developed [53].

9.3 Example

In this section we discuss an example that illustrates the use of the integrated reverse
engineering and reuse framework. First, we populate a library with component
specifications using reverse engineering techniques. Then we demonstrate the process of
specifying an application and searching the specification and component library for suitable
reusable code. Finally, we show the process of packaging components into the final
application.

[Figure 9.3 diagram; recoverable labels: User; ABRIE; Problem Descriptions; Architecture
Specification; System; Component Specification; Component Implementation; Specification
Library]

Figure 9.3: Software Reuse Framework

9.3.1 Populating the Library

Figure 9.4 shows the source code for an array implementation for a queue abstract data
type. The source code, written using the C programming language, implements a circular
queue so that the head and tail of the queue can “wrap” around the lowest and highest
indices of the array, as shown in Figure 9.5.

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */

typedef int QDATA;
#define MAXSIZE 100

struct queue {
    int head;
    int tail;
    QDATA data[MAXSIZE];
};
typedef struct queue Queue;

/* Operations */
int is_empty(const Queue);
QDATA head(const Queue);
QDATA dequeue(Queue *);
int enQueue(Queue *, QDATA *);
Queue *new_queue();
void printQ(Queue);

Queue *new_queue(){
    Queue *ner;
    ner = (Queue *)malloc(sizeof(Queue));
    ner->head = 0;
    ner->tail = 0;
    return ner;
}

int is_empty(const Queue q){
    return (q.head == q.tail);
}

int enQueue(Queue *q, QDATA *e){
    int tail;
    int head;

    if ((q->tail - q->head) == MAXSIZE)
    {
        printf("Full\n");
        return 0;
    } else {
        q->data[q->tail % MAXSIZE] = *e;
        q->tail = q->tail + 1;
        return 1;
    }
}

QDATA dequeue(Queue *q){
    int temp;

    if (!is_empty(*q)){
        temp = q->head % MAXSIZE;
        q->head = (q->head + 1);
        return q->data[temp];
    } else {
        return 0;
    }
}

QDATA head(const Queue q){
    return q.data[q.head % MAXSIZE];
}

Figure 9.4: Queue Source Code

Figure 9.5: Circular Queue Diagram

The queue data structure consists of three parts: an index to the front, or head, of the
queue; an index to the end, or tail, of the queue; and an array that is used to store the
elements of the queue. The queue source code contains several functions that correspond to
the operations typically associated with queues, including enQueue, dequeue, new_queue,
head, and is_empty. The abstract behavior of these operations is as expected: enQueue
adds a new element to the end of the queue, dequeue removes the element at the front of
the queue, new_queue creates a new queue, head returns the element at the front of the
queue, and is_empty checks whether the queue contains any elements.
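
The wrap-around behavior can be exercised with a small sketch. It reuses the index
arithmetic of Figure 9.4 but is otherwise an assumption-laden illustration: the capacity is
reduced to 4 so that wrapping occurs quickly, elements are passed by value, and the names
demo_enqueue, demo_dequeue, and demo_wrapped_slot are hypothetical.

```c
#include <assert.h>

#define MAXSIZE 4                 /* demo capacity; Figure 9.4 uses 100 */
typedef int QDATA;
typedef struct { int head; int tail; QDATA data[MAXSIZE]; } Queue;

/* Same index arithmetic as enQueue in Figure 9.4. */
static int demo_enqueue(Queue *q, QDATA e)
{
    if (q->tail - q->head == MAXSIZE)
        return 0;                          /* full */
    q->data[q->tail % MAXSIZE] = e;
    q->tail = q->tail + 1;
    return 1;
}

/* Same index arithmetic as dequeue in Figure 9.4. */
static QDATA demo_dequeue(Queue *q)
{
    if (q->head == q->tail)
        return 0;                          /* empty */
    QDATA e = q->data[q->head % MAXSIZE];
    q->head = q->head + 1;
    return e;
}

/* Fills the queue, frees two slots, then enqueues once more.
   Returns the array slot that received the last element. */
int demo_wrapped_slot(void)
{
    Queue q = { 0, 0, { 0 } };
    for (QDATA v = 1; v <= MAXSIZE; v++)
        demo_enqueue(&q, v);               /* slots 0..3 hold 1..4 */
    int refused = demo_enqueue(&q, 99);    /* queue is full, nothing is stored */
    assert(refused == 0);
    demo_dequeue(&q);                      /* removes 1 */
    demo_dequeue(&q);                      /* removes 2 */
    demo_enqueue(&q, 5);                   /* tail was 4, so slot 4 % 4 is used */
    assert(q.data[0] == 5);
    QDATA front = demo_dequeue(&q);        /* FIFO order is preserved */
    assert(front == 3);
    return (q.tail - 1) % MAXSIZE;
}
```

Note that in this scheme head and tail only ever increase and the modulus is applied at
each array access, so the difference tail - head is the current number of elements.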

As described in Section 9.2.1, construction of a specification that is suitable for
populating a component specification library involves two primary steps: 1) construction
of an as-built specification, and 2) derivation of a high-level abstraction. Using the sp
semantics of the C programming language [58], the complete as-built specifications of the
source code in Figure 9.4 can be constructed as shown in Appendix D (Figure D.1), where
the symbols ‘&&’ and ‘||’ are used to indicate the logical connectives and (∧) and or (∨),
respectively. In addition, the symbol ‘.>’ is the points-to operator [43] for reasoning about
pointers and pointer operations.

For the purposes of illustration, the remainder of this section focuses on the analysis
of the enQueue procedure. Figure 9.6 shows the as-built specification of the enQueue
procedure as derived by the AUTOSPEC tool. Informally, the as-built specification of the
enQueue procedure states that prior to the execution of the procedure, the pointer e points
to an object _param4, the value of _param4 is _pVal5, and the pointer q points to an
object _param3 such that the tail component of _param3 has the value _pVal4. In
addition, the specification states that after execution of the procedure, one of two conditions
is true. The first condition describes the behavior when the queue is full, in which case the
difference between the values _param3.tail.V and _param3.head.V is equal to the
maximum size of the queue. Here, the return value of the procedure is 0, indicating a
failure. The second condition describes the behavior when there is room to place an item
on the queue. In this case, the return value of the procedure is 1, the index to the tail is
incremented, and the data element is added to the queue data array.

 

spec int enQueue(Queue *q, QDATA *e)

requires
(((e .> _param4) &&
(_param4.V == _pVal5)) &&
((q .> _param3) &&
(_param3.tail.V == _pVal4)))

modifies
q (_param3)

ensures
((((((e .> _param4) && (_param4.V == _pVal5)) &&
((q .> _param3) && (_param3.tail.V == _pVal4))) &&
((_param3.tail.V - _param3.head.V) == MAXSIZE)) &&
(return.V = 0)) ||

(((((((e .> _param4) && (_param4.V == _pVal5)) &&
((q .> _param3) && (_param3.tail.V == _pVal4))) &&
(!((_pVal4 - _param3.head.V) == MAXSIZE))) &&
(_param3.data[(_pVal4 % MAXSIZE)].V = _param4.V)) &&
(_param3.tail.V = (_pVal4 + 1))) &&
(return.V = 1)))

Figure 9.6: Output generated by AUTOSPEC for the enQueue procedure

 

While the specification in Figure 9.6 is accurate with respect to the original source
program, the level of detail can inhibit high-level reasoning. As described in Section 6, one
technique that can be used to derive an abstraction of an as-built specification is to weaken
the postcondition by using the delete-a-conjunct strategy. For the enQueue example, we
must first put the ensures clause into a conjunctive form, as shown in Figure 9.7. In this
form, the enQueue specification has four conjuncts that specify that (a) e points to the
object _param4, (b) _param4.V has value _pVal5, (c) q points to _param3, and (d)
the disjunctive statement:

((((_param3.tail.V = _pVal4) &
((_param3.tail.V - _param3.head.V) = MAXSIZE)) &
(return.V = 0)) ||

(((((as_const9 = _pVal4) &
(!((_pVal4 - _param3.head.V) = MAXSIZE))) &
(_param3.data[as_const9 % MAXSIZE].V = _param4.V)) &
(_param3.tail.V = (as_const9 + 1))) &
(return.V = 1))).

Figure 9.8(a) shows the results generated by the SPECGEN tool when applied to the
specification in Figure 9.7. One of the operations that can be performed by the tool is
to generate all the possible abstractions of a specification based on preserving one or
more of the conjuncts in the specification. The representation of the specifications that
are generated by preserving conjuncts is called a focus graph. In our example, we are
interested in preserving the conjuncts (b) and (d) and deleting conjuncts (a) and (c) due
to the independence property stated in Section 6. Figure 9.8(b) shows the focus graph for
the example, where the vertex labeled “abcd” indicates that conjuncts (a), (b), (c), and
(d) are conjoined. This vertex corresponds to the original specification in Figure 9.7.
The remaining vertices in the graph represent the possible abstractions that are formed by
deleting conjunct (a), (c), or both.
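
The vertices of such a focus graph can be enumerated with a short sketch. SPECGEN's own
algorithm and the edge structure of the graph are not described here, so the code below is
only an assumption about one way to list the vertices: conjuncts are encoded as bits of a
mask, and the functions focus_vertices and vertex_label are hypothetical names.

```c
#include <assert.h>

/* Conjunct i (0 = a, 1 = b, ...) corresponds to bit i of a mask.
   Stores in out[] every subset of the n conjuncts that contains all of the
   conjuncts in keep, from the full specification down to keep itself.
   Returns the number of vertices found. */
int focus_vertices(int n, unsigned keep, unsigned out[], int cap)
{
    assert(n > 0 && n < 16);
    unsigned full = (1u << n) - 1u;
    assert((keep & ~full) == 0u);
    int count = 0;
    for (unsigned mask = full; ; mask--) {
        if ((mask & keep) == keep) {
            assert(count < cap);
            out[count++] = mask;
        }
        if (mask == 0u)
            break;
    }
    return count;
}

/* Writes the vertex label, for example "abd", into buf (at least n + 1 bytes). */
void vertex_label(unsigned mask, int n, char *buf)
{
    int len = 0;
    for (int i = 0; i < n; i++)
        if (mask & (1u << i))
            buf[len++] = (char)('a' + i);
    buf[len] = '\0';
}

/* The example from the text: four conjuncts, preserving (b) and (d). */
int demo_focus_graph(void)
{
    unsigned v[16];
    char label[5];
    unsigned keep = (1u << 1) | (1u << 3);
    int count = focus_vertices(4, keep, v, 16);
    vertex_label(v[0], 4, label);               /* original specification: "abcd" */
    assert(label[0] == 'a' && label[3] == 'd' && label[4] == '\0');
    vertex_label(v[count - 1], 4, label);       /* most abstract vertex: "bd" */
    assert(label[0] == 'b' && label[1] == 'd' && label[2] == '\0');
    return count;
}
```

For the enQueue example this yields the four vertices abcd, bcd, abd, and bd, matching the
description of deleting conjunct (a), (c), or both.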

 

(e .> _param4) && (_param4.V = _pVal5) && (q .> _param3) &&
((((_param3.tail.V = _pVal4) &
((_param3.tail.V - _param3.head.V) = MAXSIZE)) &
(return.V = 0)) ||
(((((as_const9 = _pVal4) &
(!((_pVal4 - _param3.head.V) = MAXSIZE))) &
(_param3.data[as_const9 % MAXSIZE].V = _param4.V)) &
(_param3.tail.V = (as_const9 + 1))) &
(return.V = 1)))

Figure 9.7: The enQueue ensures clause in a conjunctive form

 


 

 

[Figure 9.8 panels: (a) SpecGen, (b) Focus Graph]

Figure 9.8: SPECGEN Interface and Output

 

Using the information provided by SPECGEN, several transformations of the
specification in Figure 9.7 can be performed that simplify and introduce abstraction into
the postcondition. First, based on the focus graph in Figure 9.8(b), conjuncts (a) and (c)
are deleted due to the independence criterion. After the deletion of conjuncts (a) and (c),
we can perform a textual substitution of all references to the _param3 identifier with a
more descriptive symbol, like Q. Finally, given that the term as_const9 has the value
_pVal4 and that the precondition for the specification states _param3.tail.V == _pVal4,
we can replace as_const9 with the term Q.tail^, which represents the pre-value for
the tail component of the queue (i.e., the value of the tail component before execution of the
procedure). Given these transformations, the abstraction for the enQueue procedure can
be derived as shown in Figure 9.9. Informally, the specification states that after execution
of the procedure, the return value is set to 0 when the difference between the head and tail
of the queue is the maximum array size. When the difference between the head and tail is
not the maximum array size, then the element E is added to the array at the tail index, the
tail index is incremented, and the return value is set to 1.

 

spec int enQueue(Queue *q, QDATA E)

requires
((q .> Q) &&
(Q.tail == Q.tail^))

modifies
Q.tail, Q.data

ensures
((((Q.tail - Q.head) = MAXSIZE) &&
(return = 0)) ||

((((!((Q.tail^ - Q.head) = MAXSIZE)) &&
(Q.data[Q.tail^ % MAXSIZE] = E)) &&
(Q.tail' = (Q.tail^ + 1))) &&
(return = 1)))

Figure 9.9: The enQueue abstraction
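
As an informal cross-check, the ensures clause of the abstraction can be evaluated against
the queue code at run time. The sketch below is not part of AUTOSPEC or SPECGEN. It
assumes that unannotated and primed terms denote the post-state and that terms marked
as pre-values denote the state before the call; the function ensures_enQueue and the demo
functions are hypothetical, and enQueue is repeated from Figure 9.4 without its diagnostic
output so that the example is self-contained.

```c
#include <stdbool.h>

#define MAXSIZE 100
typedef int QDATA;
typedef struct queue { int head; int tail; QDATA data[MAXSIZE]; } Queue;

/* enQueue as in Figure 9.4, minus the diagnostic printf. */
int enQueue(Queue *q, QDATA *e)
{
    if ((q->tail - q->head) == MAXSIZE)
        return 0;
    q->data[q->tail % MAXSIZE] = *e;
    q->tail = q->tail + 1;
    return 1;
}

/* Evaluates the ensures clause of the abstraction for one call:
   pre is the queue before the call, post is the queue after it,
   e is the element passed in, and ret is the returned value. */
bool ensures_enQueue(const Queue *pre, const Queue *post, QDATA e, int ret)
{
    bool full_case = (post->tail - post->head) == MAXSIZE && ret == 0;
    bool room_case = !((pre->tail - post->head) == MAXSIZE)
                  && post->data[pre->tail % MAXSIZE] == e
                  && post->tail == pre->tail + 1
                  && ret == 1;
    return full_case || room_case;
}

/* Fills an empty queue and makes one extra call that hits the full case,
   checking the clause after every call. */
bool demo_check_enQueue(void)
{
    Queue q = { 0, 0, { 0 } };
    QDATA e = 7;
    bool ok = true;
    for (int i = 0; i <= MAXSIZE; i++) {
        Queue pre = q;
        int ret = enQueue(&q, &e);
        ok = ok && ensures_enQueue(&pre, &q, e, ret);
    }
    return ok;
}

/* The clause rejects a successful call that is reported as a failure. */
bool demo_rejects_bad_return(void)
{
    Queue pre = { 0, 0, { 0 } };
    Queue post = pre;
    QDATA e = 7;
    (void)enQueue(&post, &e);
    return !ensures_enQueue(&pre, &post, e, 0);
}
```

A check of this kind exercises only the sampled executions; it complements, and does not
replace, the derivation of the specification from the program text.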

 

A process similar to the one used for enQueue can be applied to derive abstractions
for the remainder of the queue as-built specifications. For the purposes of combining the
reverse engineering suite with the reuse suite, the resulting specifications must be translated
into the syntax for the ABRIE system. The reason for the differences between the syntax of
the AUTOSPEC and ABRIE specification languages is historical. Our initial investigations
for deriving specifications for programs focused on the analysis of the Dijkstra guarded
command language [6]. As such, a general Larch Interface Language variant [42] was
developed. The expansion of the AUTOSPEC tool to support the C programming language
has since prompted a need to modify the tools to generate Larch C (LCL) specifications,
an activity that we are currently performing. In contrast, ABRIE was not developed for a
specific programming language, but was intended to be tailorable to a given language. As
such, Chen and Cheng used a generic procedure-oriented syntax for the Larch Interface
Language. However, since the output formats for the AUTOSPEC and SPECGEN systems
and the input format for the ABRIE system are all based on the Larch Interface Language
(all contain header information and the requires, modifies, and ensures clauses), the actual
step of converting the procedure specifications generated by AUTOSPEC and SPECGEN
to the library format for the ABRIE system is straightforward and can be facilitated with
automated tools. Appendix D.2 contains the module specification in the ABRIE syntax that
was constructed for the example described in this section.

9.3.2 Specifying an Application

In the following discussion, we describe how a solution to the Josephus problem [59] can
be speciﬁed and assembled from reusable components in ABRIE. In particular, we show
how the formal speciﬁcations generated by the reverse engineering process can be used to
semantically determine the reusability.

The Josephus game can be described as follows: N people, numbered 1 to N, are sitting
in a circle. Starting at person 1, a hot potato is passed. After M passes, the person holding
the hot potato is eliminated, the circle closes ranks, and the game continues with the person
who was sitting after the eliminated person picking up the hot potato. The last remaining
person wins. Given N and M, the Josephus problem is to determine who will win.
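
The game as stated can be simulated directly with a queue. The following is a minimal
sketch under stated assumptions, not the master component used in the ABRIE example:
the helper names put and take are hypothetical, the queue uses the same index arithmetic
as Figure 9.4, and N is assumed to be at most MAXSIZE.

```c
#include <assert.h>

#define MAXSIZE 100
typedef struct { int head; int tail; int data[MAXSIZE]; } Queue;

/* Adds v at the tail; the caller guarantees that the queue is not full. */
static void put(Queue *q, int v)
{
    q->data[q->tail % MAXSIZE] = v;
    q->tail = q->tail + 1;
}

/* Removes and returns the front element; the caller guarantees non-empty. */
static int take(Queue *q)
{
    int v = q->data[q->head % MAXSIZE];
    q->head = q->head + 1;
    return v;
}

/* Returns the winner of the Josephus game for n people and m passes.
   The front of the queue is the person currently holding the potato. */
int josephus(int n, int m)
{
    assert(n >= 1 && n <= MAXSIZE && m >= 0);
    Queue q = { 0, 0, { 0 } };
    for (int p = 1; p <= n; p++)
        put(&q, p);
    while (q.tail - q.head > 1) {
        /* Passing around the whole circle changes nothing, so only the
           remainder of m by the current circle size has to be simulated. */
        int passes = m % (q.tail - q.head);
        for (int i = 0; i < passes; i++)
            put(&q, take(&q));     /* holder moves to the back: one pass */
        take(&q);                  /* current holder is eliminated */
    }
    return take(&q);
}
```

For example, with N = 5 and M = 1 the people are eliminated in the order 2, 4, 1, 5, and
person 3 wins; with M = 0 the people are eliminated in seating order and person 5 wins.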


Figure 9.10 shows the structure of a solution to the Josephus problem that uses a queue
to represent people sitting in a circle. The solution is specified in ABRIE. Component
master simulates the game and calls queue operations provided by component queue.
The two components are connected through three procedure-call connectors. As
shown in the “Component Property” window of Figure 9.10, component queue has three
ports, each of which defines and provides a queue operation. Figure 9.11 shows the
textual specification of the architecture. As shown in Figure 9.11, component master is
implemented using a C source file, jp_main.c. Component queue needs to be implemented
and will be the focus of the reuse activities. The required behaviors of its ports have been
specified. In the next subsection, we discuss how a library component can be selected based
on these behavioral specifications to implement the queue interface.

Figure 9.10: Architecture of a solution to Josephus problem


 

Architecture jp
  Components
    Module master
      Ports
        ProcDef main() { }
        ProcInvoc createQueue() return Queue* { }
        ProcInvoc addToQueue(Queue*,int) return Bool { }
        ProcInvoc delFromQueue(Queue*) return int { }
      Implementation
        source("/user/r02/chengb/chenyong/JP", "jp_main.c")
    End
    Module queue
      Ports
        ProcDef qCreate() return Queue* {
          uses auxTheories;
          ensures result.head=0 /\ result.tail=0;
        }
        ProcDef qInsert(Queue* q,int i) return Bool {
          uses auxTheories;
          modifies q;
          ensures (q.tail - q.head = MAXSIZE => result = false)
               /\ (q.tail^ - q.head^ ~= MAXSIZE
                   => ( result = true
                     /\ q.tail' = q.tail^ + 1
                     /\ q.data[mod(q.tail^, MAXSIZE)] = i));
        }
        ProcDef qDelete(Queue* q) return int {
          uses auxTheories;
          requires q.head^ ~= q.tail^;
          modifies q;
          ensures result=q.data[mod(q.head^, MAXSIZE)]
               /\ q.head'=q.head^+1;
        }
    End
  Connections
    CallProc pc1
      Roles
        Caller -> master . createQueue
        Definer -> queue . qCreate
    End
    CallProc pc2
      Roles
        Caller -> master . addToQueue
        Definer -> queue . qInsert
    End
    CallProc pc3
      Roles
        Caller -> master . delFromQueue
        Definer -> queue . qDelete
    End
End

Figure 9.11: Architecture speciﬁcation

 


9.3.3 Component Reuse

ABRIE incorporates a library manager for organizing and managing existing components.
Components are classified and retrieved based on their interfaces (i.e., types and ports).
When implementing an abstract component (interface) in an architecture, a single click on
the reuse button in the “Component Property” window (see Figure 9.10) triggers ABRIE
to search the current library (which is loaded through the library manager). All
components of the same type as the query interface are presented to the user. Based on
their specifications, the user selects one candidate for further evaluation. Figure 9.12 shows
the scenario of matching the library component circqueue for satisfying interface queue.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 9.12 screenshot: ABRIE component matching window pairing the target component
queue (type Module) with the library component CircularQueue (type Module) and listing
their ProcDef ports]

Figure 9.12: Component Matching

 

In order to determine the reusability of circqueue, we need to establish a mapping
from the ports of the target component queue to those of circqueue so that each operation
specified in queue can be implemented by a corresponding operation in circqueue. As
shown in Figure 9.12, we conjecture that qDelete can be matched (implemented) by
dequeue, qInsert by enQueue, and qCreate by new_queue. Given a match, specification-based
proof obligations may be generated to validate the matching.

As exhibited in the mapping between circqueue and queue, naming conflicts, such
as qDelete of queue and dequeue of circqueue, may exist between a query specification
and the reused component. Resolving these mismatches is one of the tasks of the
packaging process. Figure 9.13 shows the wrappers generated by ABRIE for resolving the
naming conflicts between circqueue and queue, where wrappers are generated based on the
established port mappings. The packaging process also checks connectors and generates
their implementation, as well as a system construction file (a makefile) that describes how
an executable system is produced.

// _circqueue_wrapper.cc
// Generated by ABRIE for wrapping component circqueue

#include "auxTypes.h"

extern int dequeue(Queue *);
int qDelete(Queue *q) {
    return dequeue(q);
}

extern Bool enQueue(Queue *, int);
Bool qInsert(Queue *q, int i) {
    return enQueue(q,i);
}

extern Queue* new_queue();
Queue *qCreate() {
    return new_queue();
}

Figure 9.13: Wrappers generated by ABRIE for resolving naming conflicts


Chapter 10

Related Work

10.1 Introduction

Software maintenance has long been recognized as one of the most costly phases in
software projects [9]. A software system is termed a legacy system if that system has a long
maintenance history. Many techniques have been suggested for the maintenance of legacy
software, as is clearly indicated by the number of surveys that have been used to catalog
these techniques [60, 61, 62]. Due to the increasing visibility of the Year 2000 (Y2K)
problem¹, many more tools have been suggested and subsequently catalogued [63].

¹ The Y2K problem refers to the potential failure of systems due to the use of a two-digit
encoding for the year field in software systems.

Given the large number of tools, identifying one appropriate for the goals of an
individual organization can be difficult. Currently, the information gathered on software
maintenance tools focuses on surface characteristics of the tools. That is, the
gathered information typically lists the languages that are supported and the type of
by-products (i.e., artifacts) generated from analyzing the input software with the particular
tool. For instance, Bellay and Gall [62] describe capabilities related to usability, parsing
speed, type of by-product, editing facilities, and report generation. Based on feedback and
interaction with industry, it is our claim that in addition to these surveys, it is also useful
to have an analysis of the actual by-products (e.g., function reports, call graphs, data flow
diagrams) in order to gain an understanding of their value.

In this chapter, we describe a framework for analyzing software reverse engineering
and design recovery tools and techniques. Within this framework we provide a context by
which software reverse engineering and design recovery tools can be classified according
to the underlying approach used to analyze software, and we define several criteria for
comparing and contrasting tools according to the semantic quality of their by-products.

10.2 Background

Chapter 2 described background information about the area of software maintenance
and reverse engineering. In the context of software maintenance, we define a structural
abstraction to be a description of a software system that is based on the syntactic properties
of a programming language. For example, encapsulation of a sequence of programming
statements into a module is a structural abstraction. We contrast structural abstraction with
the term functional abstraction. A functional abstraction is a description of a software
system that is based on the semantics of a program. That is, a functional abstraction
describes program behavior. For instance, if a sequence of statements is grouped into a
module, the high-level description of the function of that module is a functional abstraction.
Recent work in the area of reverse engineering has focused on the derivation of both
structural and functional abstractions from program code.


10.2.1 Evaluation of Software Technology

Brown and Wallnau [64] describe a framework for evaluating software technology that
is based on two primary goals: (1) understanding how the evaluated technology differs
from other technologies, and (2) understanding how these differences address the needs of
specific usage contexts. In order to achieve these goals, Brown and Wallnau suggest a
three-phase process for technology evaluation. These phases are:

1. Descriptive modeling

2. Experiment design, and
3. Experiment evaluation

The descriptive modeling phase is used to create a context for candidate technologies.
A descriptive model is a description of the assumptions concerning features and their
relationship to usage contexts [64]. Two types of descriptive models are the technology
genealogy and the problem domain habitat. The technology genealogy describes the
historical context for a given technology, and a problem habitat describes how the features
of a given technology can be used as well as what the beneﬁts of their use will be.

The experiment design phase involves three primary activities: (1) comparative feature
analysis, (2) hypothesis formulation, and (3) experiment design. In this phase, the goals are
to develop a set of hypotheses about the added value of a technology that can be established
by experiments, and to identify the experiments that are used to substantiate or refute the
hypotheses [64].

The ﬁnal phase, experiment evaluation, involves performing experiments to conﬁrm or
refute the hypotheses. Brown and Wallnau identify a few different classes of experiments
that can be useful in evaluating hypotheses [64]. These experiment categories include:

• Model problems: narrowly defined problems that are easily addressed by the
  candidate technologies. Model problems allow alternative technologies to be directly
  compared.

• Compatibility studies: experiments that study how well candidate technologies
  operate when combined

• Demonstrator studies: full scale trial applications of a technology

• Synthetic benchmarks: standard contrived problems that can be used to evaluate the
  differences between candidate technologies

In this chapter, we analyze several reverse engineering support tools using an
assessment technique that is similar to the Brown and Wallnau “Technology Delta
Framework”. Specifically, we present the results of the descriptive modeling phase,
where we describe a hierarchical genealogy of reverse engineering techniques. In
addition, we deﬁne several semantic dimensions that are used to qualitatively evaluate
some representative reverse engineering support tools, an activity that corresponds to
constructing a reference model in the experiment design phase in the technology delta
framework. Next, we describe the informal and formal techniques for reverse engineering,
respectively. Finally, we provide a comparative analysis of all of the techniques.

10.2.2 Previous Surveys

The Air Force Software Technology Support Center (STSC) published a two volume report
that compiles information about hundreds of tools that are available for reengineering
purposes [61]. While the report lists many tools, the descriptions of the tools are
often limited to high-level properties, such as supported languages and vendor contact
information. Similar surveys by Zvegintzov [60, 63] also collect descriptions of
reengineering and Y2K tools, with the same shortcomings as the STSC report. However,
the Y2K survey [63] does classify the tools based on their intended capabilities. For
instance, some of the categories used to group tools are based on whether the tools
support activities such as inventory analysis (e.g., identiﬁcation of the executable software
inventory), recovering source from object (e.g., analysis of binaries in the case that source
is not available), and conversion (e.g., identification of code and data structures that require
modiﬁcation).

A recent survey by Bellay and Gall [62] compares four reverse engineering tools using
several criteria that are used to analyze the effectiveness of the input parsers, the by-
product representations, the editing and browsing capabilities, and the general usability
of the tools. While the Bellay and Gall survey does provide a more in-depth view of tools
when compared to the previous surveys, it focuses primarily on tool properties as opposed
to the characteristics and qualities of the tool by-products.

Our approach to surveying and analyzing software reverse engineering and design
recovery tools and techniques is meant to provide a framework for assessing the quality
and usability of the by-products. As such, this survey provides a complementary approach
to the assessment and comparison of tools such as those contained in the surveys described
above.

10.3 Taxonomy

In order to classify automated and semi-automated reverse engineering techniques, we
have developed the hierarchical taxonomy shown in Figure 10.1. At the highest level, the
techniques can be subdivided into two classes: informal and formal. Informal approaches
are those methods that rely on pattern matching and user-driven clustering based on
the syntactic structure of code. The pattern matching and clustering approaches are
considered informal because either the representations that are constructed are informal, or
the consistency between the design speciﬁcation and the source code cannot be rigorously
veriﬁed. Nonetheless, the informal techniques do provide a means for deriving abstractions
about the general function of a program. The formal approaches are those techniques that
are based on using some type of formal analytical method for deriving a speciﬁcation from
source code. The formal techniques are grounded in mathematical logic so that
each step can be formally veriﬁed. The primary difference between the informal techniques
and the formal techniques is the use of formal speciﬁcation languages that have well-
deﬁned syntax and semantics. In addition, the formal techniques have associated inference
rules that can be used to construct proofs in order to rigorously verify the correctness of
each step of the reverse engineering process. The remainder of the section describes each
of the categories shown in Figure 10.1.

Various reverse engineering and program understanding techniques can be evaluated
and classiﬁed using the taxonomy in Figure 10.1. The utility of classifying tools using
this taxonomy is that it provides a means for determining the current trends in supporting
reverse engineering and design recovery, and aids in identifying the areas that require
further investigation. In Sections 10.5.1 and 10.5.2 we describe several representative
techniques and classify them accordingly. As a notational convention, a tag
follows the name of each approach to indicate the classification of the technique within the
taxonomy. For instance, a tool “foo” might be tagged “IPLC” to indicate that the technique is
an informal, plan-based, commercial tool. The annotations at the leaves of the classification
hierarchy in Figure 10.1 associate each tag with a location in the classification.
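
The tagging convention can be made concrete with a small lookup. The following Python fragment is only an illustrative sketch: the informal tags (IPLR, IPLC, IPAR, IPAC) are the ones used in the section headings of this chapter, while the spellings assumed for the formal tags (FTR, FTC, FXR, FXC), the table layout, and the function name decode_tag are assumptions of this example.

```python
# Illustrative sketch: decode a taxonomy tag into its path in the hierarchy.
# The spellings of the formal tags (FTR, FTC, FXR, FXC) are assumed here.
TAXONOMY = {
    "I": ("Informal", {"PL": "Plan-Based", "PA": "Parsing-Based"}),
    "F": ("Formal", {"T": "Transformation", "X": "Translation"}),
}
ORIGIN = {"R": "Research", "C": "Commercial"}


def decode_tag(tag):
    """Return (class, technique, origin) for a tag such as 'IPLC'."""
    if len(tag) < 3 or tag[0] not in TAXONOMY or tag[-1] not in ORIGIN:
        raise ValueError(f"unknown tag: {tag!r}")
    class_name, techniques = TAXONOMY[tag[0]]
    middle = tag[1:-1]  # technique code between class and origin letters
    if middle not in techniques:
        raise ValueError(f"unknown tag: {tag!r}")
    return class_name, techniques[middle], ORIGIN[tag[-1]]


print(decode_tag("IPLC"))
```

A tool listed under two tags, such as a system with both research and commercial versions, would simply be decoded once per tag.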

 

                                          Class
Techniques
    Informal
        Plan-Based
            Research                      IPLR
            Commercial                    IPLC
        Parsing-Based
            Research                      IPAR
            Commercial                    IPAC
    Formal
        Transformation
            Research                      FTR
            Commercial                    FTC
        Translation
            Research                      FXR
            Commercial                    FXC

Figure 10.1: A Taxonomy of Reverse Engineering Techniques

 

10.3.1 Informal Techniques

In the context of reverse engineering and program understanding, a technique is classiﬁed
as informal if the methods used to recover designs from source code are based on pattern
matching or analysis of syntactic structures as opposed to semantic structures. The
informal techniques can be decomposed into two additional subcategories: plan-based
and parsing-based. The plan-based techniques rely primarily on using pattern matching
to identify clichés or plans within source code and have been a major focus in both
research and commercial organizations. A program plan is a description of a computational
unit contained within a program where a computational unit performs some abstract
function [44]. A program plan can be localized or de-localized in the sense that the code
recognized as satisfying the plan can be located in contiguous (localized) or non-contiguous
(de-localized) sequences of code [65]. To date, most plan-based approaches have been
developed by research organizations [36, 66, 67], although some industrial adoption of this
approach is occurring [68].

A parsing-based approach is one in which a program is analyzed using the properties
of the syntactic structure of a language. In general, the parsing-based approach is used
to construct a high-level structural abstraction of the source code. These abstractions
typically come in the form of data ﬂow diagrams or some other graphical representation
of the design. A signiﬁcant number of commercial tools use a parsing-based technique for
supporting reverse engineering [69, 70], and research organizations continue to investigate
the use of advanced parsing-based approaches [37, 71].

10.3.2 Formal Techniques

Formal methods for software development are analytical techniques for assuring, by
construction, that a derived speciﬁcation is correct with respect to some other speciﬁcation.
A reverse engineering technique is formal if the steps of the method have a formal
mathematical basis. When applied to reverse engineering, a formal method takes as input a
source program (a low-level speciﬁcation) and derives a formal speciﬁcation. In the formal
context, reverse engineering techniques can be subdivided into two categories: techniques
that use a knowledge-base or transformation library to derive formal speciﬁcations from
code, and techniques that use derivation or translation to derive formal speciﬁcations from
code.

A transformation is a means for changing a specification from one form to another
while preserving the semantics of the specification. In the context of programs, a program
transformation is a means for changing a program from one form to another while
preserving the semantics of the program. Each program transformation is typically used to
change a group of programming statements at a time, where the group is determined by the
author of the particular transformation.

Transformation is contrasted with translation, where a translation is also a means for
changing a program from one form to another while preserving semantics but at an atomic
level of granularity. The primary difference between transformation and translation is the
degree to which high-level knowledge about a problem domain or programming language
is incorporated into the transformation or translation rules. In the case of transformation,
the rules typically involve transforming aggregations of programming statements into
simpler, equivalent sequences of statements (as is the case in restructuring transformations)
or concise formal speciﬁcations. In many cases, a large library of transformations is
required to capture the many different possible code constructions. Translation, in contrast,
involves much simpler rules that are based on single atomic statements such as assignments,
conditionals, and iteratives, thus requiring fewer rules. A program compiler can be
considered a translator since each program statement is translated into an equivalent binary
form. In the context of program reverse engineering, a translation technique is one that
translates a program into an equivalent formal speciﬁcation.
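
The difference in granularity can be sketched in a few lines of Python. This is a hypothetical illustration, not the rule format of any surveyed tool: the Assign type, the primed-variable notation, and the single "swap" library rule are assumptions made for the example. The translate function applies one fixed rule to each atomic statement, whereas transform consults a (one-entry) library of rules that match groups of statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    target: str  # variable being assigned
    expr: str    # right-hand side, kept as plain text in this sketch


def translate(stmts):
    """Translation: one fixed rule per atomic statement.

    A primed name denotes the value of the variable after the statement.
    """
    return [f"{s.target}' = {s.expr}" for s in stmts]


def transform(stmts):
    """Transformation: a library rule rewrites a group of statements.

    The only (hypothetical) library entry recognizes the three-assignment
    swap idiom  t := x; x := y; y := t  and abstracts it to swap(x, y).
    Statements that match no rule fall back to per-statement translation.
    """
    out, i = [], 0
    while i < len(stmts):
        w = stmts[i:i + 3]
        if (len(w) == 3
                and w[0].expr == w[1].target   # t := x
                and w[1].expr == w[2].target   # x := y
                and w[2].expr == w[0].target   # y := t
                and len({s.target for s in w}) == 3):
            out.append(f"swap({w[1].target}, {w[2].target})")
            i += 3
        else:
            out.extend(translate([stmts[i]]))
            i += 1
    return out


program = [Assign("t", "x"), Assign("x", "y"), Assign("y", "t")]
print(translate(program))
print(transform(program))
```

The translator needs only one rule per statement kind, while the transformer needs a new library entry for every idiom it is expected to abstract, which mirrors the trade-off described above.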

Research into the use of formal methods for reverse engineering has addressed both the
use of transformation [72, 73] and translation [6]. Industrial adoption of such techniques
has begun but is limited [39, 74].

10.4 Semantic Dimensions

A by-product is an artifact that is constructed by a reverse engineering tool as a result of
analyzing program code. One way to evaluate the by-products of a tool or technique is
to simply list the formats and representations that are produced by a particular tool. For
instance, one tool might produce reports about the data structure formats, as well as visual
representations such as callgraphs and data ﬂow diagrams. While this knowledge about a
tool is extremely helpful, it is of equal importance to understand the nature of these by-
products and to evaluate a tool based on this information. In order to analyze the value
of the by-products of the various tools, we deﬁne four semantic dimensions: distance,
accuracy, precision, and traceability. These measures enable a software maintainer to
evaluate a tool based on the level of importance placed on the consistency between an
abstract representation and a given implementation.

10.4.1 Semantic Distance

The semantic distance describes the number of levels of abstraction that separate an input
and an output of a particular technique. The semantic distance is a relative distance, since
no absolute measure of abstractness can reasonably be developed. Instead, a subjective
measure based on the level of algorithmic detail must be considered.

As a rule of thumb, the greater the semantic distance, the more abstract the by-product.
Suppose, for instance, we translated source code from FORTRAN to C. Since there is no
difference in the level of abstraction between the two representations, the semantic distance
is low or non-existent. On the other hand, if we reverse engineer source code from C into
a data-flow diagram representation, the semantic distance is higher. At the extreme, we
might reverse engineer source code from C into a description of the concept of the program,
a transformation that would result in the highest degree of semantic distance.

A concept related to the semantic distance is the inter-step distance that measures
the semantic distance between each intermediate step of a technique. For example, if
a reverse engineering technique is comprised of three steps, where each step produces
a representation that is more abstract than the previous step, the semantic distance that
separates each step in the technique is the inter-step distance.

10.4.2 Semantic Accuracy

The semantic accuracy describes the level of conﬁdence that a speciﬁcation is correct with
respect to the input (i.e., source code). By-products derived from an analysis
of syntactic information typically have a high semantic accuracy. That is, the information
that is recovered from the source code is accurate with a high degree of conﬁdence. In
contrast, the techniques that derive by-products based on semantic information may not be
as accurate. For instance, those techniques that are based on the plan abstraction approach
may rely on the assumption that plans are not interleaved [65], and, as such, may ignore
the effect of cancellation or composition in their description of a particular sequence of
software. That is, two or more program plans may be identiﬁed in the same sequence of
code, but their combined effects may not be well-understood and thus, the accuracy of the
design abstraction may be reduced.

One of the factors that impacts the semantic accuracy of a given technique is the number
of analysis stages and the inter-step distances between the stages. This is due to the fact that
an abstraction omits certain information that is embedded in a lower-level representation.
The composition of these stages results in an increased potential for a loss of accuracy.

10.4.3 Semantic Precision

Semantic precision describes the level of detail of a speciﬁcation and the degree that the
speciﬁcation is formal. Figure 10.2 depicts a precision hierarchy for a set of tool by-
products. A formal speciﬁcation is the most precise given the well-deﬁned syntax and
semantics associated with this form of description. The least precise by-product is natural
language due to its potential for ambiguity. A more precise specification is apt to be more
amenable to automated analytical processing, while a less precise specification is better suited
for discussions between programmers.

 

Formal Specification    (most precise)
Graphs/Diagrams
Pseudocode
Natural Language        (least precise)

Figure 10.2: Precision Hierarchy

 

10.4.4 Semantic Traceability

Semantic traceability describes the degree that a speciﬁcation can be used to reconstruct
an equivalent program. Semantic traceability depends heavily on the semantic accuracy
and semantic precision of the end by-product since the accuracy will contribute to the
degree to which the original program and the new program correspond semantically, and
the precision will contribute to the degree that the representation is free of ambiguity.
Furthermore, the semantic precision will impact the amount of semantic information that
can be used to construct the new program. For instance, a formal speciﬁcation might
have a high degree of semantic traceability while a graphical design has a low degree of
semantic traceability. The ability of a programmer to reproduce a working system varies
greatly between a formal speciﬁcation and a graphical design since semantic information
is contained in the formal specification while, in general, only syntactic information is
contained in a graphical design.

10.4.5 Discussion

Ideally, a design derived from program code has a balance between all of the semantic
dimensions. A large semantic distance may produce a more abstract speciﬁcation but if
that speciﬁcation lacks accuracy and precision, there is a low degree of conﬁdence that
the speciﬁcation captures the actual functionality of the source code. On the other hand, a
speciﬁcation with a high degree of precision and traceability that lacks a reasonably large
semantic distance may be difficult to understand. In the end, it is the software maintenance
programmer who must weigh the goals of a project against the relative advantages and
disadvantages offered by the by-products of the various techniques in order to make the
appropriate decision for a particular project or organization.

10.5 A Representative Tools Survey

In this section, we survey a number of tools that can be used to support reverse engineering
and design recovery. Several of the tools are commercially available systems while a
number of other tools are systems that are currently being developed as part of research
activities. There are far too many reverse engineering and design recovery systems
available to enumerate them all in this context. Instead, we have selected a number of
representative tools that exhibit several of the properties discussed earlier. The survey is
decomposed into two broad categories that correspond to the taxonomy in Section 10.3:
informal and formal techniques. Within each category, additional criteria are used to
describe the various techniques and tools.

10.5.1 Informal Techniques

This section describes different informal approaches that have been applied to reverse
engineering and design recovery. As a convention, the name of each approach is followed
by a tag corresponding to the classification hierarchy given in Figure 10.1.

Plan-Based Approaches (IPLR and IPLC Classes)

A program cliché is a commonly used sequence of code that performs some specific
function. The term plan is used to refer to the knowledge representation for describing
clichés [44]. Typically a plan contains an event section for describing the conditions that
must exist in order for an instance of the cliché to be present.

An example plan is shown in Figure 10.3 [36]. The plan has two sections: an event
section (consists of), and a constraints section (such that). The event section is as described
above. The constraints section provides additional conditions for evaluating the events
within a specific context of the program. Essentially, the plan in Figure 10.3 states that
if the events of reader, eof-test, and repeater are recognized with respect to the
constraints, then the concept READ-PROCESS-ALL-VALUES has been recognized.

 

READ-PROCESS-ALL-VALUES(value: ?value, PROCESS: ?body)

consists of
    reader:   FETCH-INPUT-VALUE(RESULT: ?inp-res, VALUES: ?value)
    eof-test: NOT-EQUAL(OP1: ?inp-res, OP2: EOF)
    repeater: LOOP(TEST: ?test, BODY: ?body)

such that
    contained-in(reader, ?test)
    contained-in(eof-test, ?test)
    data-dep(eof-test, reader, ?inp-res)

Figure 10.3: Example Plan

 

Many of the plan-based techniques use a three-step process that involves parsing the
program, identifying the events, and matching the events with the plans contained in a plan
library. The variations, to be described in the next few sections, are often related to the
methods used to construct the plans, either to make the techniques faster and more efficient or to
convey some other view of the design of a system.
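
The matching step can be sketched as follows for the READ-PROCESS-ALL-VALUES plan of Figure 10.3. This is a minimal illustration under stated assumptions, not the algorithm of any tool surveyed here: the Event record, its fields, and the brute-force search over candidate triples are all hypothetical, and the parsing and event identification steps are assumed to have already produced the event list.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str                          # event type, e.g. "LOOP"
    ident: int                         # hypothetical syntax-tree node id
    in_test_of: Optional[int] = None   # id of the loop whose test contains it
    defines: frozenset = frozenset()   # variables written by the event
    uses: frozenset = frozenset()      # variables read by the event


def of_kind(events, kind):
    return [e for e in events if e.kind == kind]


def match_read_process_all_values(events):
    """Return (reader, eof_test, repeater) triples satisfying the plan."""
    matches = []
    for reader, eof_test, repeater in product(
            of_kind(events, "FETCH-INPUT-VALUE"),
            of_kind(events, "NOT-EQUAL"),
            of_kind(events, "LOOP")):
        # contained-in(reader, ?test) and contained-in(eof-test, ?test)
        contained = (reader.in_test_of == repeater.ident
                     and eof_test.in_test_of == repeater.ident)
        # data-dep(eof-test, reader, ?inp-res): the test reads what was read
        data_dep = bool(reader.defines & eof_test.uses)
        if contained and data_dep:
            matches.append((reader, eof_test, repeater))
    return matches


# Events that might be identified in: while ((c = getchar()) != EOF) {...}
events = [
    Event("LOOP", 1),
    Event("FETCH-INPUT-VALUE", 2, in_test_of=1, defines=frozenset({"c"})),
    Event("NOT-EQUAL", 3, in_test_of=1, uses=frozenset({"c"})),
    Event("NOT-EQUAL", 4, uses=frozenset({"n"})),  # unrelated comparison
]
print(len(match_read_process_all_values(events)))
```

Enumerating every triple is the simplest possible strategy; the approaches described below differ largely in how they organize the plan library to avoid this kind of exhaustive search.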

Cobol/SRE (IPLR, IPLC). The Cobol System Renovation Environment, or Cobol/SRE,
is a toolset developed by the Andersen Consulting Center for Strategic Technology
Research [66, 75]. The approach used in Cobol/SRE is based on the use of program plans
with the intent of identifying abstract concepts in code. These abstract concepts can be
classified as programming concepts, architectural concepts, and domain concepts [75].
While programming concepts can be automatically determined by parsing, architectural
and domain concepts require knowledge about architecture and domains to be encoded
into a plan library.

In the Cobol/SRE approach, concepts in programs are recognized by decomposing
programs into their equivalent abstract syntax tree and performing syntactic pattern
matching against the cliché library. Higher-level concepts are recognized via a method of
substitution whereby constraints (or sub-concepts) for a high-level concept are instantiated
with previously recognized lower-level concepts. In addition, Cobol/SRE has features that
support flow analysis, slicing (a decomposition technique that extracts program statements
that are relevant to the scope of a particular computation from a program [76]), complexity
analysis, and anomaly detection.

Cobol/SRE allows users to determine which program segments to analyze and which
rules to use. Criteria for selection of segments include selection using condition-based
slicing [76], forward slicing [76], and ripple-effect analysis [76]. Upon completion of the
recognition process, a window lists which concepts were recognized. Other information,
including which rules were used in recognizing the concepts, is available for user analysis
purposes. The toolset has been applied to a commercial production control system
consisting of approximately 8000 COBOL modules.

 

 

 

 

 

COBOL/SRE Summary.

Name                COBOL/SRE
Class               Informal-Plan-Research/Commercial
By-Products         Code decomposed into commented segments
Language            COBOL
Operating System    Unix

DECODE (IPLR). DECODE uses a plan-based approach to provide an environment
for supporting the cooperative understanding of programs via the construction of an
object-oriented design from COBOL code [36]. The program understanding system is
used to recognize as much of the program as possible with the programmer ﬁlling in
where DECODE fails. Based on the COBOL/SRE approach, DECODE uses three major
components to support program understanding: an automated program recognizer (APU),
a knowledge base for storing information about a given program, and a design notebook for
allowing a user to edit retrieved designs as well as construct queries for answering questions
about the designs.

The DECODE technique has three primary steps: an automated understanding
step, a user-driven, machine-aided understanding step, and a query step. In the
automated understanding step, the APU is used to identify the existence of both low-level
(incremental) and design level or design-oriented concepts in a system. To support this
activity, plans are extended to have links to high-level conceptual design elements. In
addition, special associations such as specializations and implications are allowed.

In cases where the APU is only able to understand parts of a system, DECODE aids
the programmer in understanding the remainder by use of a structured notebook. This
activity works by allowing a programmer to browse through the code and the initial design.
Once the user recognizes new design concepts, those concepts can be added to the design
and the code can be linked to the new design element. Once an appropriate design has
been constructed, queries about the design and program can be made. DECODE supports
queries about the function of certain sections of code, the location of code corresponding
to the design, and the status of the design (e.g., has the design been completed?).

The graphical user interface of DECODE consists of many elements including a code
browser and a design-editor, which provides a graphical depiction of the extracted design.

 

 

 

 

 

DECODE Summary.

Name                DECODE
Class               Informal-Plan-Research
By-Products         Graphical Design Representation
Language            C, COBOL
Operating System    Unix

 

 

 

 

LANTeRN (IPLR). The “Loop ANalysis Tool for Recognizing Natural concepts” or
LANTeRN is an approach that uses a multi-step process to construct predicate logic
annotations for loops [67]. The analysis process translates and normalizes loop programs
into forms that are amenable to matching various components of loops. A knowledge base
or plan library is used to identify stereotypical loop events, where events are in the form of
basic events and augmentation events.

A basic event (BE) is a fragment of a loop that forms the control aspect of a loop. These
are typically made up of conditions, enumerations, and initializations, where a condition is
a clause of the loop guarding condition, the enumeration is the segment of code that ensures
that data flows into the condition, and the initializations are the statements responsible
for initializing the variables used in the loop condition. Augmentation events (AEs)
make up the remaining components of the loop body. The AEs are subdivided into two
subcategories: the body and the initialization. The initialization is the set of statements used
to initialize the variables contained in the loop body, while the body is all other statements
in the loop not associated with data flow into the loop conditions.
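
The decomposition of a loop into events can be illustrated with a small Python sketch. It is not LANTeRN's implementation: the Stmt record, the group names, and the data-flow approximation are assumptions of this example. In particular, the sketch treats every pre-loop assignment that does not feed the loop condition as an augmentation-event initialization, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    target: str        # variable assigned by the statement
    uses: frozenset    # variables read by the statement
    in_loop: bool      # False for statements that precede the loop


def classify(cond_vars, stmts):
    """Split assignments into basic-event and augmentation-event groups."""
    # Variables whose values flow (transitively) into the loop condition.
    control = set(cond_vars)
    changed = True
    while changed:
        changed = False
        for s in stmts:
            if s.in_loop and s.target in control and not s.uses <= control:
                control |= s.uses
                changed = True
    groups = {"BE enumeration": [], "BE initialization": [],
              "AE body": [], "AE initialization": []}
    for s in stmts:
        basic = s.target in control
        if s.in_loop:
            groups["BE enumeration" if basic else "AE body"].append(s.target)
        else:
            key = "BE initialization" if basic else "AE initialization"
            groups[key].append(s.target)
    return groups


# i := 0; sum := 0; while i < n do sum := sum + a[i]; i := i + 1 end
loop = [
    Stmt("i", frozenset(), in_loop=False),
    Stmt("sum", frozenset(), in_loop=False),
    Stmt("sum", frozenset({"sum", "a", "i"}), in_loop=True),
    Stmt("i", frozenset({"i"}), in_loop=True),
]
print(classify({"i", "n"}, loop))
```

For the summation loop in the example, the assignments to i form the basic events (initialization and enumeration), and the assignments to sum form the augmentation events.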

Upon analysis of the events contained within a loop, a pattern matching library is used
to characterize the loop. When a match of a rule antecedent occurs, the corresponding
formal speciﬁcation is constructed by using the consequent of the ﬁred rule. These
consequents have information such as preconditions, postconditions, and invariants.

The structure of the plan library is based on a classiﬁcation of stereotypical loops. This
classiﬁcation is based on the characterization of the structural forms of loop conditions
(i.e., single condition vs. multiple condition), the loop bodies, and the variables used to
determine the loop conditions. LANTeRN uses these characterizations in order to identify
the appropriate rules to apply to a loop. In order to improve efficiency, many rules
are abstracted and generalized into a hierarchy of plans.

The approach taken by the LANTeRN system moves in the direction of making plan-
based approaches more formal in that the ﬁnal product of the loop analysis activity is
the construction of a formal speciﬁcation. However, while the activity produces a formal
specification, there is no formal basis for the verification that the specification of the plan
matches the true semantics of a loop that is being analyzed.

 

 

 

 

 

LANTeRN Summary.

Name                LANTeRN
Class               Informal-Plan-Research
By-Products         Formal specification (axiomatic)
Language            Pascal
Operating System    NA

 

 

 

 

Xinotech (IPLC). Xinotech is an interactive environment that is based on the use of a
meta-language called the Xinotech Meta-Language, or XML [68]. The Xinotech approach
is based on the plan, or cliché, approach described earlier. The process used for analyzing
software is comprised of steps that are used to translate source code into the intermediate
XML representation, apply concept or plan recognition techniques, and then represent
the design of the system using multiple textual and graphical views. In addition, the
Xinotech system provides support for several different methodologies for analysis and code
transformations.

The Xinotech approach is characterized primarily by the extensive use of meta-
languages. These meta-languages are general purpose languages that make the Xinotech
tool applicable to many different source languages through translation into the XML
language. Plan-based transformations in Xinotech are speciﬁed using the Xinotech Plan
Abstraction Meta-Language (XPAL) and allow Xinotech to make transformations of code
into higher-level abstractions. Features of Xinotech include the ability to support many
views or models of a particular system.

 

 

 

 

 

Xinotech Summary.

Name                Xinotech
Class               Informal-Plan-Commercial
By-Products         Textual and graphical designs
Language            Several
Operating System    Unix, Windows

 

 

 

 

Parsing-Based Approaches (IPAR and IPAC Classes)

Many commercial tools have been developed to address software maintenance issues.
In general, these tools are parser-based. The by-products of these techniques
generally consist of call graphs and flow diagrams, although many other representations
exist and are produced by these systems. This section describes three commercially
available systems and two systems developed by research organizations.

Refine (IPAC). The Software Refinery and Refine Language Tools by Reasoning Systems
have been the basis for many reverse and re-engineering tools [77]. By supporting such
features as user extensions, Refine-based tools have grown in popularity.

The Reﬁne Language tools support reverse and re-engineering efforts for programs
written in various programming languages including Ada, C, COBOL, and FORTRAN.
Features of the Refine tools include interactive source code browsing and the generation of various
reports such as structure charts and identifier (i.e., variable) definitions. A major feature
of the Reﬁne tools is the open architecture that allows users to tailor Reﬁne to speciﬁc
language dialects.

Reﬁne Language tools, when combined with the Software Reﬁnery, provide an
environment for producing reverse and re-engineering applications through the use of a
three-part process of loading code into an object database, selecting code and operations to
be applied to the code, and executing the operations. The Software Reﬁnery is divided into
three tools that support this process and allow users to construct custom reengineering
tools: DIALECT, REFINE, and INTERVISTA. These tools are used to support parsing,
symbolic computation, and user interface construction, respectively.

The Refine-based tools have been used in a number of well-documented instances for
building reverse and reengineering applications [78]. In addition, the Refine-based tools
have been used to support many research-oriented activities [75].

Refine Summary.

Name                Refine
Class               Informal-Parsing-Commercial
By-Products         Graphical views
Language            Several
Operating System    Unix

 

 

 

McCabe Visual Reengineering Toolset (IPAC). The McCabe Visual Reengineering
Toolset (VRT) provides a graphical environment for supporting code analysis [70]. The
VRT tools combine metric, complexity, and static information to aid in many reengineering
tasks.

The key feature of the McCabe VRT is the production of graphical views of a
program including structure charts that are combined with information about various
complexity measures of a system (e.g., cyclomatic complexity). In addition, McCabe VRT supports the
identiﬁcation and elimination of redundant and dead code. Other features include testing
aids for determining logic and data complexity tests.
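
As a reminder of what the cyclomatic measure mentioned above counts, the following Python sketch computes the standard cyclomatic number V(G) = E - N + 2P of a control-flow graph, where E is the number of edges, N the number of nodes, and P the number of connected components. How McCabe VRT itself derives or presents the metric is not described here; the function name and the graph encoding are assumptions of this example.

```python
def cyclomatic_complexity(nodes, edges, components=1):
    """Cyclomatic number V(G) = E - N + 2P of a control-flow graph."""
    return len(edges) - len(nodes) + 2 * components


# Control-flow graph of a single if-then-else statement.
nodes = ["cond", "then", "else", "exit"]
edges = [("cond", "then"), ("cond", "else"),
         ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(nodes, edges))
```

For the if-then-else graph the result is 2, matching the two independent paths through the statement; straight-line code with no branches yields 1.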

One of the strongest characteristics of the McCabe VRT is the number of languages
(over 15) that are supported, including Ada, COBOL, C, C++, and ASM370. These tools
operate on numerous platforms and operating systems.

 

 

 

 

 

McCabe VRT Summary.

Name                McCabe VRT
Class               Informal-Parsing-Commercial
By-Products         Text and Graphical views
Language            Several
Operating System    Unix, Windows

Imagix 4D (IPAC). Imagix 4D is a graphical tool for supporting program understanding
through the use of multiple views of a system [69]. The main features include a
three-dimensional view of the software structure on xyz axes and support for hypertext browsing
of source code.

Imagix 4D uses a static syntactic analysis technique of various software sources
including code and makeﬁles to build a database of information about a subject system.
Structure charts, control flow, data usage, and inheritance information are used to aid the
user in the analysis process, and support for multiple views enables a user to analyze the
system based on data types, file dependencies, and function calls. Imagix 4D allows a user
to automatically construct documents from information gathered during analysis. Imagix
4D supports C and C++ source code and runs on Sun workstations.

 

 

 

 

 

Imagix 4D Summary.

Name                Imagix 4D
Class               Informal-Parsing-Commercial
By-Products         Text and Graphical views
Language            C, C++
Operating System    Unix

 

 

 

 

Rigi (IPAR). Rigi is a parsing-based tool that focuses on constructing structural
abstractions by facilitating the management of the complexity of a graph derived from
source code [37]. Rigi uses a three-step process to support program understanding. The first
step, parsing, constructs a representation suitable for proceeding to the second step, graph
construction and visualization. The initial graph (e.g., a call graph) can be passed through
filters that allow a user to select the subsystems of interest. The final step, an interactive and
iterative one, allows a user to reduce the complexity of the graphs by collapsing vertices in
the graph into functional groups, and supports the hierarchical browsing of the graph.
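
The collapsing step can be illustrated with a minimal sketch. The code below is not Rigi's implementation; it assumes a call graph stored as a dictionary of callee sets and a hypothetical, user-supplied assignment of functions to groups (the function and group names are invented for illustration). Edges internal to a group disappear, which is how the graph's complexity is reduced.

```python
from collections import defaultdict


def collapse(call_graph, groups):
    """Collapse call-graph vertices into named groups.

    call_graph: dict mapping caller -> set of callees.
    groups: dict mapping function name -> group name (hypothetical grouping).
    Functions without a group remain as their own vertices.
    """
    collapsed = defaultdict(set)
    for caller, callees in call_graph.items():
        src = groups.get(caller, caller)
        collapsed[src]  # make sure the vertex exists even with no outgoing edges
        for callee in callees:
            dst = groups.get(callee, callee)
            collapsed[dst]
            if src != dst:  # drop edges that are internal to one group
                collapsed[src].add(dst)
    return dict(collapsed)


# Hypothetical call graph and grouping, for illustration only.
call_graph = {
    "main": {"parse_args", "read_file", "report"},
    "read_file": {"open_file", "scan_line"},
    "scan_line": {"next_token"},
    "report": {"format_row"},
}
groups = {
    "read_file": "Input", "open_file": "Input",
    "scan_line": "Input", "next_token": "Input",
    "report": "Output", "format_row": "Output",
}
subsystems = collapse(call_graph, groups)
```

Here nine function-level vertices reduce to four vertices (main, parse_args, Input, Output), and only the edges that cross group boundaries survive.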

The underlying approach in Rigi for automatically constructing subsystems and
functional groups is a bottom-up technique. The strength of Rigi is the use of composition
operations based on well-established software engineering concepts such as coupling and
cohesion that aid in the construction of graphical specifications that depict either the calling
hierarchy of a system or some other view of that hierarchy, such as subsystems and abstract
data types. Rigi has been applied to a number of real projects including applications from
IBM, NASA, and a commercially available system called Doctor’s Practice Management
System [79].

Rigi Summary.
Name              Rigi
Class             Informal-Parsing-Research
By-Products       Graphically-oriented design
Language          C, C++, COBOL
Operating System  Unix, Windows, Linux

 

 

 

 

Reflexion Models (IPAR). A Software Reflexion Model is a model that is used to
represent differences between an engineer’s high-level model and a corresponding
low-level model of the original source code [71]. This approach consists of four major steps:

1. High-level model definition
2. Source (low-level) model extraction from the source
3. Definition of a declarative mapping between the high-level model and low-level
model
4. Computation of a reflexion model


Steps 1 and 3 are performed by the software maintenance programmer while automated
tools can be used to perform steps 2 and 4. The by-products of this approach consist
of high-level, low-level, and reﬂexion models. The representation used in these models
depends entirely upon the software maintenance programmer as well as the tools used
to construct the low-level source model. A reﬂexion model is represented by a graph
that closely resembles a high-level model provided by a user. The primary modules of
the high-level model are retained and arcs between the modules (typically represented by
rectangles or circles) indicate whether or not ﬂows between the modules are consistent (or
inconsistent) with the user-deﬁned mapping.
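
Step 4 can be sketched in a few lines. This is an illustrative approximation rather than RMTool's algorithm: it assumes both models are sets of directed edges, that the mapping is a plain dictionary from source entities to high-level modules, and that calls within a single module are ignored. The labels convergence, divergence, and absence follow the terminology used in the reflexion model literature for consistent, unexpected, and missing flows; all entity names are hypothetical.

```python
def reflexion_model(high_level, source_calls, mapping):
    """Compare an engineer's high-level model against a source model.

    high_level: set of (module, module) edges the engineer expects.
    source_calls: set of (function, function) edges extracted from code.
    mapping: dict from function name to high-level module (assumed total
             for the functions of interest; unmapped functions are skipped).
    """
    lifted = {
        (mapping[a], mapping[b])
        for a, b in source_calls
        if a in mapping and b in mapping and mapping[a] != mapping[b]
    }
    return {
        "convergence": high_level & lifted,  # expected and present in the source
        "divergence": lifted - high_level,   # present in the source, not expected
        "absence": high_level - lifted,      # expected, not found in the source
    }


# Hypothetical models, for illustration only.
high_level = {("UI", "Translator"), ("Translator", "Network"), ("UI", "Network")}
source_calls = {("menu", "translate"), ("translate", "send"), ("send", "log_ui")}
mapping = {"menu": "UI", "log_ui": "UI", "translate": "Translator", "send": "Network"}

model = reflexion_model(high_level, source_calls, mapping)
```

In this example the engineer's expected UI-to-Network flow is absent from the source, while an unexpected Network-to-UI flow shows up as a divergence.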

The primary value of the reflexion models is the capability to communicate to the
software maintenance programmer the differences between the perceived structure of the
system (i.e., the high-level model) and the actual structure of the system (i.e., the source
model). A tool to support this approach, called RMTool, has been used to analyze C and
C++ source code, but the tool (and approach) is not limited to these languages. The
programs analyzed vary in size, with the largest being an industrial system with over a
million lines of code [80].

RMTool Summary.
Name              RMTool
Class             Informal-Parsing-Research
By-Products       Graphically-oriented design, Reflexion model
Language          Primarily C. Easily retargeted.
Operating System  Unix, Windows NT

 

 

 

 


 

10.5.2 Formal Techniques

This section describes the different formal approaches that have been applied to reverse
engineering and design recovery.

Transformation (FTR and FTC Classes)

Program transformations have been used primarily for forward engineering and the
development of programs [81, 82, 83, 40]. A program transformation is a
semantics-preserving operation in which a part of a program is replaced with a semantically
equivalent construct. When applied in forward engineering, a transformation may replace a
high-level specification with an implementation of the specification. In reverse engineering,
transformations are generally aimed at replacing sequences of code with semantically
equivalent formal specifications.

In general, the theoretical foundations of transformations are based on proving the
equivalence of the components of a transformation. For instance, if a construct α is to
be replaced with some other construct β, the black box behavior of α and β must be proven
to be equivalent with respect to the initial and final states. Transformations can be at the
same level (as is typical with a restructuring transformation), refinements (commonly found
in program synthesis), or abstractions (which are appropriate for reverse engineering).

The main difference between a transformation and a program plan is that
transformations are semantics-preserving, meaning that the part being replaced during
a transformation is provably equivalent to the part that replaces it. A program plan,
on the other hand, is a knowledge representation and recognition rule whose claims
about correctness cannot be formally verified.

 

Maintainer’s Assistant (FTR). The Maintainer’s Assistant [84] is a reverse engineering
environment used for reverse engineering program code into formal specifications using
semantics-preserving transformations. The primary feature of the Maintainer’s Assistant
is the use of a formally defined wide spectrum language. A wide spectrum language
is a multi-purpose language that combines low-level programming constructs such as
assignment, alternation, and iteration with high-level formal specification constructs. In the
context of the wide spectrum language WSL, several transformations have been developed
for supporting software maintenance activities [84].

An example transformation, called a loop inversion [84], is as follows. Assuming that
the statements S1 and S2 have no exits, a code sequence “do S1; S2 od” can be inverted
to “S1; do S2; S1 od”. The library of semantics-preserving transformations used by the
Maintainer’s Assistant plays a major role in the reverse engineering process, where the
system keeps track of information about the applicability of a particular transformation.
The program transformation process is a user-driven activity where a programmer browses
through code with a graphical interface and chooses when to apply transformations. The
final by-product of the Maintainer’s Assistant is a formal specification written in
WSL [85]. As such, the specification can consist of several statements expressed at
different levels of abstraction and may retain the sequential style of
the original program.
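
The intuition behind loop inversion can be checked informally with a small sketch. The code below is not WSL and proves nothing; it only assumes that each statement can be represented by a label and compares the order in which the two loop forms would execute S1 and S2 over a finite prefix. The function names are hypothetical.

```python
from itertools import islice


def original_loop(s1, s2):
    """Execution order of 'do S1; S2 od' when S1 and S2 have no exits."""
    while True:
        yield s1
        yield s2


def inverted_loop(s1, s2):
    """Execution order of 'S1; do S2; S1 od'."""
    yield s1
    while True:
        yield s2
        yield s1


# Compare a finite prefix of both (infinite) execution sequences.
prefix_length = 9
before = list(islice(original_loop("S1", "S2"), prefix_length))
after = list(islice(inverted_loop("S1", "S2"), prefix_length))
same_order = before == after
```

Both forms execute the alternating sequence S1, S2, S1, S2, and so on, which is why the inversion does not change behavior when neither statement can exit the loop. A formal argument would rely on the semantics of the language rather than on finite prefixes.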

The Maintainer’s Assistant toolset was initially developed to support the IBM370
Assembler language and a subset of BASIC and has been applied to portions of the IBM
CICS product. In addition, the system has been expanded to support the reverse engineering
of concurrent programs [86].


 

Maintainer’s Assistant Summary.
Name              Maintainer’s Assistant
Class             Formal-Transformational-Research
By-Products       First-order logic (WSL) specification
Language          WSL, IBM370 Assembler
Operating System  Unix

 

 

 

Design Maintenance System (FTC). Baxter and Mehlich [39] describe an approach
to reverse engineering that is based on the idea that reverse engineering consists of the
“backwards” application of program transformations. In order to construct a library
of transformations, the approach records the transformations that are used to instantiate
program plans in a forward transformation system. The approach advocates design
maintenance as the primary means for maintaining a system, thus avoiding the need to
continually reverse engineer a system over its lifetime.

While the primary emphasis in this approach is the use of a transformational engine,
plan recognition technology is used extensively as a means for retrieving “clues” about
various aspects of the input code [39]. That is, program plans are used to guide the
transformation process. The approach, supported by a domain-based transformation system
called the Design Maintenance System (DMS), has been used to analyze source code
written in Motorola 6809 assembler.

 

 

 

 

 

DMS Summary.
Name              Design Maintenance System
Class             Formal-Transformational-Commercial
By-Products       NA
Language          NA
Operating System  NA

 

 

 

 


REDO (FTR). The REDO Project [73] produced tools for reverse engineering COBOL
program code into Z and Z++ specifications. The technique involves a three-step process
where:

• COBOL programs are translated into an intermediate language called UNIFORM,

• Functional abstractions are derived from UNIFORM code, and

• Simplifying transformations are applied to the functional abstractions and objects are
derived by combining functions.

Therefore, the REDO approach can be considered to be a hybrid between translation
and transformation techniques although the primary reverse engineering activity is
transformational.

In order to derive high-level abstractions from the UNIFORM code, a data-flow analysis
is performed to identify data variables and functions associated with various data
structures. The technique also attempts to find logically connected pieces of single-entry,
single-exit code as a means for identifying abstract functional units. Transformations
are then applied to these abstractions in order to derive object-oriented specifications using
Z++ [73].

REDO Summary.
Name              REDO
Class             Formal-Transformation/Translation-Research
By-Products       Z++ Specification
Language          COBOL
Operating System  Unix

 

 

 

 


Translation (FXR and FXC Classes)

Formal translation is the process of deriving semantically equivalent representations
of atomic programming constructs using the formal semantics of a language. The
primary difference between a translation and a transformation is the level of granularity.
A translation occurs at the atomic level and is associated directly
with programming constructs, whereas a transformation may involve longer sequences.
As a result, translation is more accurate and traceable (i.e., reproducible). However,
translation often produces by-products with a smaller semantic distance between code and
specification, resulting in a representation that contains an implementation bias.

Peritus Software Services (FXC). Peritus Software Services is an organization that
specializes in the support of software evolution activities. Many of their techniques
focus on the application of the weakest precondition to logical code analysis. The weakest
precondition predicate transformer wp(S, R) is defined as the set of all states in which the
statement S can begin execution and terminate with postcondition R true. That is,
given S and R, if the computation of S begins in a state satisfying wp(S, R), then S will
halt with condition R true.
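
The definition of wp can be made concrete with a small executable sketch. This is not the Peritus tooling; it assumes states are dictionaries, predicates are Boolean functions on states, and every statement terminates, and it uses the standard rules for assignment, sequencing, and alternation. All function names are hypothetical.

```python
def wp_assign(var, expr):
    """wp(var := expr, R): R evaluated in the state after the assignment."""
    return lambda post: (lambda s: post({**s, var: expr(s)}))


def wp_seq(first, second):
    """wp(S1; S2, R) = wp(S1, wp(S2, R))."""
    return lambda post: first(second(post))


def wp_if(cond, then_wp, else_wp):
    """wp(if B then S1 else S2 fi, R): pick the branch selected by B."""
    return lambda post: (
        lambda s: then_wp(post)(s) if cond(s) else else_wp(post)(s)
    )


# Program: if x < 0 then x := -x else x := x fi; y := x + 1
program = wp_seq(
    wp_if(
        lambda s: s["x"] < 0,
        wp_assign("x", lambda s: -s["x"]),
        wp_assign("x", lambda s: s["x"]),
    ),
    wp_assign("y", lambda s: s["x"] + 1),
)

# Postcondition R: y > 3. The resulting precondition holds exactly when |x| > 2.
precondition = program(lambda s: s["y"] > 3)
holds_for_minus_five = precondition({"x": -5, "y": 0})
holds_for_two = precondition({"x": 2, "y": 0})
```

Because the sketch evaluates the precondition on concrete states, it only tests membership in wp(S, R); a tool that derives specifications would manipulate the predicates symbolically instead.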

The Peritus Code Analyzer (PCA) is a tool that has been developed to support
the Peritus approach to logical code analysis [74]. The approach used by the PCA
system is a three-step process. First, the input source code is translated into the Peritus
Intermediate Language (PIL). Second, the PIL program is analyzed using several static
analysis techniques, including slicing; in addition, functions are highlighted and identified
for further processing. Finally, the code is analyzed using logical analysis techniques based
on the use of wp. The logical analysis step is the primary reverse engineering and design
recovery activity, where the analysis of the source is decomposed into four parts: the
analysis of (1) terminating, non-iterating code, (2) terminating, iterating code,
(3) non-terminating execution of independently terminating programs, and (4) multi-tasking code.

PCA Summary.
Name              Peritus Code Analyzer
Class             Formal-Translation-Commercial
By-Products       First-order logic specifications
Language          COBOL, C, RPG, PL/I
Operating System  NA

 

 

 

AUTOSPEC (FXR). The AUTOSPEC suite of tools uses a formal translation-based
approach to derive formal specifications. For a complete description of the AUTOSPEC
tools, please refer to Chapter 8.

AUTOSPEC Summary.
Name              AUTOSPEC
Class             Formal-Translation-Research
By-Products       Formal Specification, Graphically-oriented diagram
Language          Dijkstra, C
Operating System  Unix, Linux

 

 

10.6 Comparison

In this section we evaluate the different approaches by comparing them based on surface or
informational criteria as well as the semantic dimensions of the tool by-products.

10.6.1 Comparison Criteria

The criteria to be used in comparing the different approaches are subdivided into two
groups: informational and evaluational. The informational criteria are a high-level
list of surface characteristics, such as source language, platform, and technique. These
criteria provide a quick-glance index for the reader and a means for quickly
finding more information about a system if so desired. The evaluational criteria are a
list of detailed characteristics that allow a user to evaluate the differences between the
respective approaches. These characteristics include extensibility, high-level abstractions,
formal specifications, metrics, standard diagrams, precision level, and traceability level.

10.6.2 Informational Criteria

Informational criteria provide a quantitative means for measuring each of the tools
described in this chapter. That is, each of the criteria can be used as a feature “checkbox”
for a tool. In this chapter we use a small set of informational criteria consisting of Languages,
Platforms, and Techniques. The language criteria indicate the languages supported by a
particular tool; the languages supported by the various tools include C, C++, COBOL, ADA,
and FORTRAN. Platform criteria indicate the hardware platforms on which the tools can
execute, which include PC, Sun, IBM RS6000, HP, and Macintosh. The approach
(e.g., informal/formal, plan/parsing, etc.) is also used to further classify each technique.
The survey by Bellay and Gall [62] covers many more criteria that are informational in
nature.

10.6.3 Evaluational Criteria

The evaluational criteria provide a more in-depth means for categorizing different tools.
These criteria provide a means for differentiating tools according to their by-products. The
by-products include structure charts, flow diagrams, data dictionaries, metrics, complexity
measures, and formal specifications. Another evaluational criterion is the Open
Interface characteristic, which indicates whether a tool has an application programming
interface (API) that allows users to build applications. In addition, we compare the
characteristics of the by-products by indicating whether the tool produces structural or
functional abstractions. Finally, the evaluational criteria describe the by-products
using the four semantic dimensions described in Section 10.4 (i.e., distance, accuracy,
precision, and traceability).

10.6.4 A note about by-products

Tool by-products are the artifacts generated by tools as a result of program analysis. Using
the informational and evaluational criteria, different inferences can be made about the value
of a tool with respect to the by-products. For instance, a formal specification is a form of
by-product that has the properties of being precise and, in general, traceable. However, formal
specifications are not generally perceived to be user-friendly (that is, they may require
some specific background education). Additionally, structure charts are user-friendly,
precise, and traceable but lack high-level abstraction. In the remainder of this section
we evaluate the primary by-products of each tool. We also provide an evaluation of the
by-products along each of the semantic dimensions described in Section 10.4. Inferences
about usability, productivity, etc. are all dependent on the final end users.

10.6.5 Informational Comparison

In this section we compare a number of tools based on the informational criteria
listed above. In addition to evaluating the tools described earlier, we also include the
Logiscope [87], Ensemble [88], and PAT [89] toolsets in the comparison. An index of
tools is provided in Table 10.1. Tables 10.2 and 10.3 summarize the tools based on the
informational criteria.

 

Commercial Tools            Research Tools
SR = Software Refinery      PA = PAT
VR = McCabe VRT             CS = COBOL/SRE
4D = Imagix 4D              DE = DECODE
XI = Xinotech Research      LT = LANTRN
LS = Logiscope              MA = Maintainer’s Assistant
EN = Ensemble Software      RE = REDO Toolset
                            RI = Rigi
                            AS = AutoSpec

Table 10.1: Tool Index

 

Table 10.2 compares commercially available tools using the informational criteria.
This table shows that C and COBOL are the most widely supported languages among
commercial tools and that the McCabe VRT tool supports the largest number of languages.
Among platforms, Sun is the most widely supported, although in this comparison we make
no distinction between the Solaris and SunOS operating systems. Again, the McCabe
VRT tool supports the largest number of platforms. Among the techniques used,
parsing-based is the most popular. Of note is the fact that the Software Refinery supports
the use of transformations, although the built-in tools do not use formal transformation as
an analysis technique. Finally, of all the commercial tools, only the Xinotech tool uses a
plan-based approach.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Tools: SR VR 4D XI LS EN

Languages
  C               • • • • • •
  C++             • • •
  COBOL           • • • • •
  ADA             • • • •
  FORTRAN         • • • •
  Other           • • •
Platform
  PC              • •
  Sun             • • • • •
  IBM RS6000      • •
  HP              • •
  Macintosh
  Other           • •
Technique
  Plan-Based      •
  Parsing-Based   • • • • •
  Transformation  •
  Translation

Table 10.2: Comparison of Commercial Tools by informational criteria

 

Table 10.3 compares research tools using the informational criteria. This table shows
that, as with the commercial tools, C and COBOL are the most widely supported languages.
Of the research tools, Rigi supports the largest number of languages (COBOL, C, C++).
“Other” languages are also more widely supported than FORTRAN and ADA because
most research tools use source languages that resemble production languages, with
the caveat that translation between the production languages and the research languages is
theoretically possible. Among platforms, Sun is supported the most, with the Rigi system
supporting the largest number of platforms (Unix, Windows). The approaches used by the
research tools are divided mainly into two groups: those approaches that use plan-based
techniques, and those approaches that use some formal technique. Only the Rigi system
uses a parsing-based technique.

 

Tools: PA CS DE LT MA RE AS

Languages
  C               • • •
  C++
  COBOL           • • •
  FORTRAN
  Other           • • •
Platform
  Sun             • • • •
  IBM RS6000
  HP
  Macintosh       •
  Other
Technique
  Plan-Based      • • • •
  Parsing-Based   •
  Transformation  •
  Translation     •

Table 10.3: Comparison of Research Tools by informational criteria

 

Parsing-based techniques are the most widely used among the commercial
tools, which reflects the fact that the parsing techniques are more mature. The research
tools focus on the use of plans or formal methods, although the plan-based technique has
been adopted by the commercial tool offered by Xinotech. A possible conjecture is that the
plan-based approach is becoming more mature and is beginning to be adopted by industry.

10.6.6 Evaluational Comparison

Evaluational criteria provide a more qualitative means for comparing the various
approaches. Tables 10.4 and 10.5 summarize the by-products produced by each
tool, grouped by commercial tools and research tools, respectively. Tables 10.6 and
10.7 summarize the characteristics of the by-products using the criteria described in
Section 10.6.3. Again, these tables are grouped by commercial and research tools,
respectively.

Tools: SR VR 4D XI LS EN

Structure Charts       • • • • •
Flow Diagrams          • • •
Data Dictionaries      • • • •
Metrics                • • •
Complexity Measures    • • •
Formal Specifications
Other                  • • •
Open Interface         • •

Table 10.4: Comparison of Commercial Tools by By-products

 

Table 10.4 shows the by-products of the various commercial tools. Among commercial
tools, creation of structure charts is the most widely supported activity, and the McCabe
VRT and the Ensemble tools create the largest number of by-products. The Software
Refinery and Xinotech tools provide support for user-defined applications via their
programmer interfaces. Of all the commercial tools, none support the construction of
formal specifications, and only the Xinotech tool creates functional abstractions in the form
of recognized program plans.

 

 

 

 

 

 

 

 

 

Tools: PA CS DE LT MA RE RI AS

Structure Charts       • •
Flow Diagrams          •
Data Dictionaries
Metrics                • •
Complexity Measures
Formal Specifications  • • • •
Other                  • • •
Open Interface         •

Table 10.5: Comparison of Research Tools by By-products

 

Table 10.5 shows the by-products of the various research tools. Most research
approaches focus on the creation of either formal specifications or some other functional
abstraction, with only the Rigi tool supporting the creation of structural by-products and
abstractions.

Overall, the main difference between the commercial and the research tools is
the nature of the by-products. That is, the research by-products focus on creating
functional abstractions whereas the commercial by-products focus on generating structural
abstractions, as shown in Tables 10.6 and 10.7. Specifically, Table 10.6 shows that
only the Xinotech tool produces functional abstractions while Table 10.7 shows that
only the Rigi tool produces structural abstractions. Tables 10.6 and 10.7 also show the
difference between commercial and research tools with respect to the semantic dimensions
(i.e., distance, accuracy, precision, and traceability) of the by-products. In the tables,
“H” indicates high, “M” indicates medium, and “L” indicates low, so that an H in the
distance row for a tool means that the by-products have a high semantic distance. The
commercial by-products tend to have a low semantic distance but are very accurate in their
representations. On the other hand, the research by-products have a high degree of semantic
distance but the accuracy tends to suffer. A few of the research tools also focus on higher
precision, but few do well in terms of traceability and accuracy.

 

                          SR VR 4D XI LS EN
Semantic Dimensions
  Distance                L  L  L  H  L  L
  Accuracy                H  H  H  M  H  H
  Precision               L  L  L  L  L  L
  Traceability            L  L  L  L  L  L

Structural and functional rows (marks as extracted):
  Structural As-built     • • • • •
  Structural Abstraction
  Functional As-built
  Functional Abstraction

Table 10.6: Comparison of Commercial Tools by evaluational criteria

 


 

 

 

 

 

 

 

 

 

 

                          PA CS DE LT MA RE RI AS
Semantic Dimensions
  Distance                H  H  H  H  M  M  H  L
  Accuracy                M  M  M  M  M  M  M  H
  Precision               L  L  L  H  H  H  L  H
  Traceability            L  L  L  L  M  M  L  H

Structural and functional rows (marks as extracted):
  Structural As-built     • •
  Structural Abstraction  •
  Functional As-built     •
  Functional Abstraction  • • • • • •

Table 10.7: Comparison of Research Tools by evaluational criteria

 


 

Chapter 11

Case Study

Several of the examples that we have presented throughout this dissertation have been
self-contained entities that demonstrated a particular aspect of formal analysis. In this chapter
we present a case study that applies all of the methods for reverse engineering described in
this dissertation to a software system used by the NASA Jet Propulsion Laboratory.

11.1 Overview

In this section we provide an overview of the case study system and outline the objectives
of the analysis.

11.1.1 System Overview

The Command Subsystem provides access to, and facilitates the command and control of,
spacecraft via a user interface. The system supports the control of multiple spacecraft and
provides real-time feedback about the status of radiating commands at each operational
point during the uploading of commands to the spacecraft [90]. In addition, the Command
Subsystem supports command file reformatting and direct access to project databases.

The overall Command process is a six-step sequence as follows [91]:

1. A user accesses the Command Subsystem and prepares a mnemonic
command ﬁle.

2. The system translates the mnemonic ﬁle to binary format.

3. The binary ﬁle is converted into a format required by the Deep Space
Network (DSN) for radiation (i.e., transmission).

4. The Command ﬁle is transferred to the DSN for radiation to the
spacecraft.

5. The user can then control and monitor the command ﬁle from the
workstation.

6. Exit.

The command translation module of the command subsystem is responsible for two
of the items in the sequence, namely items 2 and 3. Figure 11.1, taken from the
Multimission Ground Data System User’s Guide for Workstation End Users [91], provides
a flowchart of the Command process. The overall size of the command translation
subsystem is approximately five thousand lines of code while the overall command system
is approximately fifty thousand lines of code. The command translation system has an
interesting history that motivates the analysis of the software. The system was originally
developed to support the control of a specific set of spacecraft. Every time a new mission
is developed (for example, the 1997 Cassini mission to Saturn), the software is updated
to handle the translation of spacecraft-specific mnemonics. Given the constant change
associated with the system, the command translation software is a good candidate for
study using reverse engineering techniques.

11.1.2 Analysis Objectives

In this chapter, we analyze the command translation subsystem in order to demonstrate the
use of a combined informal and formal technique for reverse engineering. The primary
objective of the case study is to illustrate how a formal method can be used along with
informal methods to derive information about the functionality of a software system.

[Flowchart: create/edit mnemonic file; CMD file ready?; connect to Communicator;
translate mnemonic file to binary; binary file converted to DSN format?; transfer CMD
file to DSN for radiation; monitor and control file radiation; exit.]

Figure 11.1: Steps for Preparing, Transferring, and Radiating a Command file

The application of formal methods to large systems has yet to be effectively
demonstrated. However, formal methods have been shown to yield the highest payoff when
applied to systems that are critical in nature. In the area of reverse engineering, the highest
payoff for formal methods occurs not when applied to an entire system, but rather when
applied to a critical part of a system. In this chapter, we apply our technique to a portion
of the command translation system that is responsible for processing user mnemonics
(messages). The failure of the translation system can have several impacts, including
erroneous messages being transmitted to spacecraft. Our objective is to investigate various
properties of the system such as termination and translation failure.

11.2 Project-Speciﬁc Process

Reengineering projects often have project-speciﬁc process models that are used to direct
the re-analysis and re-development of software [92]. For this case study we used a process
that involved the following steps:

1. High-level informal analysis

2. Low-level informal analysis

3. Formal analysis

This process is identical in every respect to the framework described in Chapter 7. At
the macroscopic level this process is not project-specific. However, at the microscopic level
this process has many elements that are specific to the project. For instance, in this case
study, the high-level informal analysis was facilitated by the existence of documentation
that was written by the original developers of the software. In many other projects, the
existence of documents as a resource for high-level analysis cannot be assumed.

 

One of the assumptions that was made concerning the existing documents was that
constant modiﬁcations to the software were not reﬂected in the documentation. Due to this
lack of document maintenance, it was assumed that only a certain level of detail from the
documents could be determined to be reliable.

The low-level informal analysis was based on the construction of source models (i.e.,
call graphs) in order to recover structural information about the system. Using some
standard visualization tools, the source models were used to determine potential points of
failure. In this context, we use the phrase point of failure to mean those parts of the source
model where there is a large difference between the in-degree and out-degree of a vertex
in the graph. The reason that these vertices of the graph are interesting is that the high
out-degree means that a procedure invokes many other procedures and thus is potentially
a critical procedure. High in-degree vertices in a graph indicate that a procedure is called
often and thus is also a potentially critical procedure.
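
The degree-based criterion can be sketched as follows. The code is not taken from the tools used in the case study; it assumes a call graph stored as a dictionary of callee sets and a hypothetical threshold for what counts as a large difference, and the procedure names are invented for illustration.

```python
from collections import Counter


def potential_points_of_failure(call_graph, threshold):
    """Flag vertices whose in-degree and out-degree differ by at least threshold.

    call_graph: dict mapping caller -> set of callees.
    Returns a dict mapping flagged vertex -> (in_degree, out_degree).
    """
    out_degree = Counter({f: len(callees) for f, callees in call_graph.items()})
    in_degree = Counter(
        callee for callees in call_graph.values() for callee in callees
    )
    flagged = {}
    for vertex in set(out_degree) | set(in_degree):
        # Counter returns 0 for vertices that never call or are never called.
        if abs(in_degree[vertex] - out_degree[vertex]) >= threshold:
            flagged[vertex] = (in_degree[vertex], out_degree[vertex])
    return flagged


# Hypothetical call graph, for illustration only.
call_graph = {
    "translate": {"lookup", "encode", "check", "emit"},
    "lookup": {"log"},
    "encode": {"log"},
    "check": {"log"},
    "emit": {"log"},
}
candidates = potential_points_of_failure(call_graph, threshold=3)
```

In this example the dispatcher-like procedure (high out-degree) and the widely called utility (high in-degree) are both flagged, while the intermediate procedures are not.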

The formal analysis follows a top-down, bottom-up approach as described in Chapter 7.
In the analysis we focused primarily on issues of mnemonic translation and spacecraft
message construction. Our intent was to examine properties of process termination
and process failure. For process termination, we were interested in determining what
conditions are required to ensure that the translation process terminates. For process
failure, we were interested in determining what conditions force the translation process to
fail.

11.3 High-Level Analysis

The first step in the process was to construct a high-level model that described the
overall functionality of the command translation software system. Our primary sources
of information for constructing the high-level models were the User’s Guide [91], the
Software Specifications Document [93], and the Detailed Capabilities and Adaptation
Guide [90]. The purposes of the User’s Guide and Software Specifications Document
are self-evident. The Detailed Capabilities and Adaptation Guide provided an executive
overview of the functionality of various parts of the command system.

One of our main assumptions in deriving high-level models from the documents listed
above was that the User’s Guide and Detailed Capabilities and Adaptation Guide were
sources of high-level information and, hence, could be viewed as reliable since conceptual
information rarely changes over the lifetime of a product. For the Software Specifications
Document we assumed that, contrary to the view held towards the User’s Guide and
Detailed Capabilities and Adaptation Guide, the documentation would progressively
become less accurate as more detailed implementation information was encountered. This
assumption is based on the fact that the software document had few revisions after the
initial writing, and so the correspondence between the document and the source code would
decrease as the models moved closer to the implementation.

11.3.1 Context Overview

Before any translation operations can occur during on-line commanding, a communicator
must be allocated and connected to a user at a command workstation. Allocation to a
communicator is performed by a member of the Data System Operations Team (DSOT)
and is restricted to users at specific workstations who are authorized to command specific
spacecraft. A Communicator is an abstract entity that relates a spacecraft to specific
radiation facilities. Figure 11.2 depicts an object model of the relationship between a
Communicator and various entities and concepts. In particular, a Communicator Table
is an aggregation of many Communicators. When a Communicator is allocated by a DSOT
member, the Command Control processor places that Communicator in the Communicator
Table.
[Figure 11.2 diagram: object model containing the Communicator Table, Communicator
(Available or Allocated), Spacecraft, Radiation Facility, Workstation, Project, and the
Logical and Physical connection types, joined by the associations allocated-to,
connects-to, and authorized-to-access.]

 

 

Figure 11.2: Communicator and Related Data Structures

 

Two abstract data objects depicted in Figure 11.2 are Available and Allocated. These are
used to model the fact that resources (i.e., communicators) are either available or allocated.
In the case of allocation, it is possible that an allocated communicator may not be present
in the Communicator table if there is not enough room in memory. Finally, the ternary
relationship between the Communicator, Spacecraft, and Radiation Facility shows that
these entities have some dependent relationship, namely that the Communicator is used

to allocate the resources for communicating to a Spacecraft via some Radiation Facility.


Another entity depicted in Figure 11.2 is the Workstation data object. The relationship
between Workstation and Radiation Facility models the fact that a Workstation connects-to
a Radiation Facility in order to send a command file for radiation to a spacecraft. This
connection is Logical when a Command Translation (defined later) is performed on-line
and Physical when it is performed off-line. A Workstation also has
a relation to a Project, where a Workstation can be designated as being project-specific or
multimission.

11.3.2 Command Translation

Figure 11.3 contains the context diagram for the command translation software subsystem.
This diagram contains one process (bubble) labeled “command translate”, an external
entity (rectangle) labeled “command control”, and three data stores (parallel lines) labeled
“MasterFile Table”, “Communicator Table”, and “Directive Table”. The collector (circle)

is used to abstract the inputs to command translate into one ﬂow (arc).

 

 

 

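[Figure 11.3 diagram: the "command translate" process, the "command control" external
entity, and the MasterFile Table, Communicator Table, and Directive Table data stores.]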

 

Figure 11.3: Command Translation Context Diagram

 


The Command Control process invokes and passes a communicator index to the
Command Translate process. The Command Translate process uses this index to determine
what operation is to be performed by indexing the Directive Table. The communicator
index is also used to access project ﬁles via indirect access through the Communicator Table
and the MasterFile Table. Once the appropriate project ﬁles are determined, Command
Translate will perform the desired operation as indicated by the Directive Table. If ﬁnal
output is written to new output ﬁles, the MasterFile Table is updated to reﬂect the creation
of the new ﬁles. Otherwise, no changes are made. Upon completion, Command Translate
writes a return code that is accessed by the Command Control process.

Figure 11.4 contains the data ﬂow diagram for the command translation software
subsystem. This diagram is a reﬁnement of Figure 11.3, where the dashed rectangle

represents the command translate process bubble of Figure 11.3.

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 11.4 diagram: refinement of the command translate process (dashed rectangle),
showing translate control, the project files, and the Communicator Table and Directive
Table data stores.]

Figure 11.4: Command Translation Data Flow Diagram

 


The Translate Control process uses the communicator index in two ways: ﬁrst,
Translate Control uses the communicator index to reference the Directive Table in order
to determine the operation to be performed, and second, Translate Control uses the
communicator index to resolve the ﬁle names of input ﬁles and conﬁguration ﬁles.

Once the project ﬁles have been located, translate control invokes the appropriate
process (either the interpret process or the convert process) for performing the desired
operation. The interpretation process, and similarly conversion, reads mnemonic input
and translates that input into appropriate binary commands. The translation is based on
formats speciﬁc to each project. Interpretation (conversion) of mnemonics (binary inputs)
proceeds until either all of the items within an input ﬁle have been processed, an error is
encountered, or the user issues a cancel.

Once the mnemonic or binary input has been processed and an output file has been
created and stored in the project directory, the MasterFile Table is updated to reflect the
change. If the processing resulted in an error or a cancel, no updates are made. In either
case, a return code is written and accessed by the Command Control process.

Figure 11.5 contains the object model for the command translation data structures. The
Communicator Table is an aggregation of many Communicator Entries. The qualified
relation Communicator Index between User and Communicator Table indicates that the
Communicator Table is accessed using the Communicator Index. This allows for access to
a Communicator Entry, which is used to access the MasterFile Table. A MasterFile Table
is an aggregation of many Project MasterFile entries, which are indexed via the qualified
association Project Id. The Project MasterFile entries are then used to access Project
Command Files through the qualified association Project Dir.
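
The chain of qualified associations described above can be read as a sequence of indexed
lookups. The following is a minimal sketch of that reading, not the actual data layout of
the command system: every type, field, and function name in it (communicator_entry,
project_masterfile, resolve_project_dir, and the table sizes) is hypothetical and chosen
only to mirror the associations Communicator Index, Project Id, and Project Dir.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes and layouts; the real tables are not shown in the text. */
#define MAX_COMMUNICATORS 4
#define MAX_PROJECTS      4

struct communicator_entry { int allocated; int project_id; };
struct project_masterfile { const char *project_dir; };

struct communicator_table { struct communicator_entry entry[MAX_COMMUNICATORS]; };
struct masterfile_table   { struct project_masterfile project[MAX_PROJECTS]; };

/* Follow Communicator Index -> Communicator Entry -> Project Id -> Project Dir.
   Returns NULL when the index is out of range, the communicator is not
   allocated, or the project id is out of range. */
const char *resolve_project_dir(const struct communicator_table *ct,
                                const struct masterfile_table *mt,
                                int comm_index)
{
    const struct communicator_entry *ce;

    assert(ct != NULL && mt != NULL);
    if (comm_index < 0 || comm_index >= MAX_COMMUNICATORS)
        return NULL;
    ce = &ct->entry[comm_index];
    if (!ce->allocated)
        return NULL;
    if (ce->project_id < 0 || ce->project_id >= MAX_PROJECTS)
        return NULL;
    return mt->project[ce->project_id].project_dir;
}
```

In this sketch a caller holding only a communicator index reaches the project directory
without naming the project directly, which is the indirect access the object model expresses.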


 

 

[Figure 11.5 diagram: object model relating Workstation, Communicator Table, Communicator
Entry, MasterFile Table, and Project Command Files.]

 

 

 

 

Figure 11.5: Major Data Structures for Command Translation

 

Figure 11.6 contains the object model of Project Command Files. This model is used
to describe the different kinds of ﬁles that may be located in a Project Directory. Included
in this model are the Translation Files that come in the form of input and output ﬁles. The
Command Files entry shows a multiple inheritance from input and output, thus indicating
that Command Files can be the output of mnemonic translation and the input to binary ﬁle

reformatting. The Project Translation entry is a ﬁle that is used during translation.

 

 

[Figure 11.6 diagram: object model of Project Command Files, including the Mnemonic,
Command Files, and Spacecraft Files entries.]

 

 

 

Figure 11.6: Project Files Model for Command Translation

 


At this point in the analysis of the command translation system we had determined
that the primary function of the command translation system is to perform interpretation
and conversion of mnemonic and command input ﬁles. In addition, we had determined
that the command translation system reads a directive ﬁle in order to determine the mode
of operation for the system. The latter fact provided an important clue regarding how to
proceed with the low-level analysis of the source code in that we could use this information

as a means for focusing our analysis effort to speciﬁc operating modes.

11.4 Low-Level Analysis

The software for the command system was organized into several directories that were
partitioned by subsystem. Accordingly, the command translation system resided in a single
directory. We began our analysis using a combination of tools ranging from call
graph browsers to source file editors. In addition, we used the Unix command “grep” to
perform keyword searches.

The ﬁrst step involved the construction of the call graph for the main procedure for
the command translation system. Figure 11.7 shows the call graph for the top level of the
command translation system. One of the cues that was used for identifying procedures to
be analyzed was procedure names. In the case of the command translation system, we were
interested in analyzing translation. As such, we focused our investigations on the translate
procedure, shown in the middle of the column to the right of main in Figure 11.7.

Figure 11.8 shows the source code for the translate routine of the command
translation subsystem. The corresponding call graph is given in Figure 11.9. During the

informal analysis, the source code and call graph were used in tandem in order to help


 

 

 

 

 

 

 

 

 

 

 

 

 

 
  
 

 

 

 

 

 

 

[Figure 11.7 diagram: call graph for the main procedure; translate appears in the column
of procedures to the right of main.]

Figure 11.7: Command Translation: Main source model

 

identify procedures that required further study. For instance, the source code in Figure 11.8
consists of a switch statement with three cases: (1) INIT, (2) XLT, and (3) CARG.
These cases correspond to different operating modes of the software: initialization,
translation, and command file copying, respectively. In our analysis we were interested
in the XLT or translation mode, and so two functions, process_mnemonic_input and
process_binary_output, were tagged as requiring further study.

The next step in the process was to generate and analyze the call graph for the
process_mnemonic_input procedure. The call graph, shown in Figure 11.10, led to
the observation that the process_msg procedure controls a majority of the mnemonic

 

struct msg *translate(op, args)
int op;
char *args;
{
    extern int dontoutput;
    static struct project_parameters *pp;
    struct msg *mp = NULL;

    switch (op)
    {
    case INIT: /* initialize the interpreter */
        pp = initialize_interpreter();
        break;

    case XLT: /* interpret a message */
        while (args[0] != '\0')
        {
            if (process_mnemonic_input(&args, pp))
            {
                if (mp == NULL)
                    mp = process_binary_output(pp);
                else
                {
                    mp->next = process_binary_output(pp);
                    mp = mp->next;
                }
            }
            else
                dontoutput = 1;
        }
        break;

    case CARG: /* set a value for a control argument */
        process_carg(&args, pp);
        break;

    default:
        inform_user("internal error: bad op in translate");
        end_cmdxlt(CMD_ERROR);
    }

    /* only translation returns a value;
       return NULL on error or no value */
    return(mp);
}

Figure 11.8: Translate source code

 

input processing. Specifically, because process_msg has a large difference between
its out-degree and in-degree, with the out-degree dominating, we identified
process_msg as a critical procedure.
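
The degree comparison used as a cue here is simple to compute from a call graph's edge
list. The sketch below is only an illustration of that heuristic under assumed names
(call_edge, count_degrees, most_fanout_node); it is not part of the tools used in the case
study, and it treats the node with the largest out-degree minus in-degree as the candidate
critical procedure.

```c
#include <assert.h>
#include <stddef.h>

/* One call-graph edge: caller -> callee, both given as node indices. */
struct call_edge { int caller; int callee; };

/* Fill out_deg[] and in_deg[] (each of length n_nodes) from the edge list. */
void count_degrees(const struct call_edge *edges, size_t n_edges,
                   int *out_deg, int *in_deg, size_t n_nodes)
{
    size_t i;

    for (i = 0; i < n_nodes; i++) {
        out_deg[i] = 0;
        in_deg[i] = 0;
    }
    for (i = 0; i < n_edges; i++) {
        assert(edges[i].caller >= 0 && (size_t)edges[i].caller < n_nodes);
        assert(edges[i].callee >= 0 && (size_t)edges[i].callee < n_nodes);
        out_deg[edges[i].caller]++;
        in_deg[edges[i].callee]++;
    }
}

/* Return the node whose out-degree exceeds its in-degree by the largest
   margin, or -1 if there are no nodes. Ties keep the lowest index. */
int most_fanout_node(const int *out_deg, const int *in_deg, size_t n_nodes)
{
    int best = -1;
    int best_gap = 0;
    size_t i;

    for (i = 0; i < n_nodes; i++) {
        int gap = out_deg[i] - in_deg[i];
        if (best < 0 || gap > best_gap) {
            best = (int)i;
            best_gap = gap;
        }
    }
    return best;
}
```

For a graph in which node 0 calls node 1 and node 1 calls three further procedures, the
heuristic selects node 1, which corresponds to the role process_msg plays in Figure 11.10.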

To simplify the analysis, we used the VCG [46] tool to aid in the visualization and
analysis of the call graphs. Specifically, the VCG tool allowed us to fold various
functions into entities grouped by source file, as shown in

 

     
 


Figure 11.9: Translate source model

 

Figure 11.11. By folding the graph in this manner, much of the visual complexity was
removed, thus providing a level of structural abstraction.

At this point in the study of the command translation system we were able to begin
formulating questions to be answered by the formal specification phase of the analysis. For
instance, a quick analysis of the process_msg procedure, shown in Figure 11.12, revealed
that a loop is executed until sp->msg_complete = 1. Using this
information, we were interested in determining when the value of sp->msg_complete

 

 

Figure 11.10: Process Mnemonic subgraph

 

changes from 0 to 1. In addition, given that the return value of the process_msg procedure
is the negation of sp->failed, we were also interested in determining what conditions
needed to be present in order for sp->msg_complete = 1 and either sp->failed = 1 or
sp->failed = 0. These cases would indicate that the message was syntactically correct
and that the processing either failed or succeeded. Specifically, we were interested in the
case where the message was constructed correctly but the processing still failed.

 

  


       

 

 

Figure 11.11: Alternative view of Process Mnemonic subgraph

 


 

 0. static void (*rtn[])() =
 1. { (void (*)())0, read_lookup_argument, read_numeric_argument,
 2.   read_argument_group, read_stem, read_any_stem, read_command,
 3.   begin_subroutine, end_argument_group_subroutine,
 4.   end_stem_subroutine, end_command_subroutine,
 5.   end_message_subroutine, begin_select, end_select,
 6.   begin_branch, end_branch, begin_repeat, end_repeat,
 7.   open_bracket, close_bracket, push_pointer
 8. };
 9.
10. int process_msg(ep, tp, sp, parms)
11. U16 *ep;
12. struct tokens *tp;
13. struct interp_state *sp;
14. struct project_parameters *parms;
15. {
16.     U16 code;
17.
18.     /* check to see if this message is excluded at this site */
19.     if (ep[1]&SITE_BIT)
20.     {
21.         sp->failed = 1;
22.         fail(EXCLUDED_MSG, tp, sp);
23.         return(0);
24.     }
25.
26.     P = ep + 3 + ep[2];   /* move P to input processing instrs */
27.     sp->msg_level = 1;    /* we are at the message level */
28.     sp->msg_complete = 0; /* the message isn't complete */
29.
30.     /* interpret instructions until we have a message */
31.     while (!sp->msg_complete)
32.     {
33.         /* on failure, a new value for P
34.            will be on top of the stack */
35.         if (sp->failed)
36.             P = (U16 *)STACK(0);
37.         code = *P++;
38.         (*(rtn[code]))(tp, sp, parms);
39.     }
40.
41.     return(!sp->failed);
42. }

Figure 11.12: process_msg source code

 


 

11.5 Formal Analysis

Prior to the formal analysis of the command translation system, the following details about
the functionality had been determined via the informal analysis of the source code:

• The sequence of calls originating from the translate
procedure and proceeding to process_mnemonic_input and finally to
process_msg constitutes a “critical path” of execution.

• The process_msg routine terminates only when the variable
sp->msg_complete is set to the value 1.

• The routine end_cmdxlt is invoked by every one of the begin_* routines
(among others), as shown in Figure 11.10. This led to the conjecture that
end_cmdxlt is a critical procedure.

Using this information, we formulated the following questions to be answered by the
formal analysis:

• What are the conditions for terminating the translation process?

• What are the routines that exhibit representative behavior for successful
and unsuccessful translation? That is, given the known termination
condition for the routine process_msg, what routines establish
sp->msg_complete = 1?

• Are there other terminating paths that bypass process_msg?

In an attempt to answer these questions, we analyzed several procedures
that potentially had an impact on the issues outlined above. That is, we
analyzed the process_mnemonic_input, process_msg, and end_cmdxlt procedures,
as well as the procedure named end_message_subroutine. We identified
end_message_subroutine as a routine of interest after using the grep command to locate
the places in the code where the variable sp->msg_complete was assigned a value of 1.

 

11.5.1 Analysis of process_mnemonic_input

Appendix E contains the source code for the process_mnemonic_input procedure. The
most important sequence of the procedure appears in Figure 11.13. In lines 2-11 the
do-while loop contains several assignment statements and a call to the process_msg
procedure. The call is used to guard a break statement that, in essence, provides another
termination condition for the loop. As such, the terminating condition for this loop is the
following:

(process_msg(ep, tp, sp, params) = 1) ∨ (ep = collection[249]).    (11.1)

Expression 11.1 states that either the process_msg routine returns 1, or the ep pointer
takes the value of the 250th element of the collection array. The significance of the
number 249 (or 250, depending on the point of view) is that the ep pointer is
used as a cursor to refer to the current position in a message. When the entire input has
been processed, the ep pointer is moved to the adjacent memory locations until, finally, it
refers to the next element in the collection array. The more interesting aspect of the
terminating condition in Expression 11.1 is the term process_msg(ep, tp, sp, params) = 1.
In this case, we need to analyze the process_msg procedure in order to determine when
process_msg returns 1.
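
The shape of this terminating condition, a guarded break combined with the loop test, can
be seen in isolation in the following sketch. It is an illustration only: scan_entries,
accept_even, and the stop_reason values are hypothetical names, an integer array stands in
for the message entries, and the end of the array stands in for collection[249].

```c
#include <assert.h>
#include <stddef.h>

enum stop_reason { STOP_ON_ACCEPT, STOP_ON_SENTINEL };

/* Hypothetical stand-in for process_msg: nonzero means the entry was accepted. */
int accept_even(int entry)
{
    return entry % 2 == 0;
}

/* Try each entry in turn and report which disjunct ended the loop.
   *tried receives the number of entries examined. */
enum stop_reason scan_entries(const int *entries, size_t n,
                              int (*try_entry)(int), size_t *tried)
{
    enum stop_reason why = STOP_ON_SENTINEL;
    size_t i = 0;

    assert(entries != NULL && tried != NULL && n > 0);
    *tried = 0;
    do {
        (*tried)++;
        if (try_entry(entries[i])) {
            why = STOP_ON_ACCEPT;   /* first disjunct: the guarded break */
            break;
        }
    } while (++i != n);             /* second disjunct: cursor reached the end */
    return why;
}
```

The loop exits as soon as either condition holds, so its exit condition is the disjunction
of the two, in the same way as Expression 11.1.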

11.5.2 Analysis of process_msg

Consider again Figure 11.12. At line 41, the statement return(!sp->failed) indicates
that the program returns the negated value of the sp->failed variable. Since line 6
of the program in Figure 11.13 states that sp->failed = 0, it is reasonable for us
to infer that (coset(sp).failed = 0) is a precondition for the process_msg procedure.

 

 0. ep = get_first_entry(248); /* 248 contains the message entries */
 1.
 2. do {
 3.     tp->token_index = tp->t;
 4.     Q = control_list;
 5.     sp->num_of_commands = 0;
 6.     sp->failed = 0;
 7.     sp->cmd_delimiter_deferred = 0;
 8.
 9.     if (process_msg(ep, tp, sp, parms))
10.         break;
11. } while ((ep = get_next_entry(ep)) != collection[249]);
12.
13. if (sp->failed)
14.     generate_error_msg(sp, tp);
15.
16. *Strp = S;
17. stem_entry = sp->stem_name;
18.
19. return(!sp->failed);

Figure 11.13: Source code sequence for process_mnemonic_input

 

Given this precondition, consider lines 26-39. Line 28 establishes the condition that
sp->msg_complete = 0, so in the initial iteration of the loop, (coset(sp).failed =
0) ∧ (coset(sp).msg_complete = 0). Using strongest postcondition, then, to formally
specify the loop, we obtain the following postcondition (as generated by AUTOSPEC):

/* AutoSpec:
"(((coset(sp).msg_complete.V == 1) /\
((((((R_i-1 /\
(coset(sp).failed.V != 0)) /\ (as_const8 = S[0])) \/
((R_i-1 /\
(!(coset(sp).failed.V != 0))) /\
(suif_tmp0 .> coset(P))) /\
(P.V = ((2 * 1) + suif_tmp0.V))) /\
(code.V = coset(suif_tmp0).V))) /\
sp(rtn[(int)code](tp, sp, parms), R_i))" */

where the term R_i is used to represent the ith iteration of the loop. The specification states
that message processing is complete and that either the message processing failed or it was
successful. The term (last line) sp(rtn[(int)code](tp, sp, parms), R_i) is the
specification of the various calls to the procedures listed in lines 1-7 in Figure 11.12. In
order to determine whether (coset(sp).msg_complete.V = 1) indeed holds after the loop is
executed, we must analyze the various procedures.

11.5.3 Analysis of end_message_subroutine

Figure 11.14 contains the annotated source code for the end_message_subroutine
procedure. After performing a grep search for the references to the variable
sp->msg_complete, it was determined that in only one location throughout the code is
the value of sp->msg_complete set to 1.

The original specification for the end_message_subroutine procedure as generated
by AUTOSPEC is as follows:

/* AutoSpec:
"(((((((((parms .> _param5) /\ (_param5.V == _pVal6)) /\
(((sp .> _param4) /\ (_param4.V == _pVal5)) /\
((tp .> _param3) /\ (_param3.V == _pVal4)))) /\
(S.V = ((4 * 1) + as_const4))) /\
(coset(sp).msg_complete.V = 1)) /\
(!(as_const6 != 0))) /\ (get_next_token(tp.V) != 0)) /\
(coset(sp).failed.V = 1)) \/
((R_1 /\ (!(coset(sp).failed.V != 0))) /\
(!(get_next_token(tp.V) != 0))))" */

Since we were interested in conditions related to message processing completion and
failure, we were able to use the SPECGEN system to derive an abstraction based on deleting
conjuncts. The resulting postcondition specification of the end_message_subroutine
procedure is as follows:

 

extern void end_message_subroutine(tp, sp, parms)
struct tokens *tp;
struct interp_state *sp;
struct project_parameters *parms;
{
    S = (unsigned int *)((char *)S + 4 * 1);
    sp->msg_complete = 1;

    /* AutoSpec:
       R_1: (((((parms .> _param5) /\ (_param5.V == _pVal6)) /\
       (((sp .> _param4) /\ (_param4.V == _pVal5)) /\
       ((tp .> _param3) /\ (_param3.V == _pVal4)))) /\
       (S.V = ((4 * 1) + as_const4))) /\ (coset(sp).msg_complete.V = 1))
    */
    if (sp->failed != 0) {
        return;
    }
    /* AutoSpec:
       "((R_1 /\ (coset(sp).failed.V != 0)) \/
       (R_1 /\ (!(coset(sp).failed.V != 0))))" */
    if (get_next_token(tp) != (void *)0) {
        sp->failed = 1;
        fail("End of message expected", tp, sp);
    }
    /* AutoSpec:
       "((((R_1 /\ (!(as_const6 != 0))) /\ (get_next_token(tp.V) != 0)) /\
       (coset(sp).failed.V = 1)) \/
       ((R_1 /\ (!(coset(sp).failed.V != 0))) /\
       (!(get_next_token(tp.V) != 0))))" */
    return;
    /* AutoSpec:
       "(R_1 /\ ((((!(as_const6 != 0)) /\
       (get_next_token( tp.V ) != 0)) /\ (coset(sp).failed.V = 1)) \/
       ((!(coset(sp).failed.V != 0)) /\ (!(get_next_token( tp.V ) != 0)))))" */
}

/* AutoSpec:
   "(coset(sp).msg_complete.V = 1) /\
   ((((!(as_const6 != 0)) /\
   (get_next_token( tp.V ) != 0)) /\
   (coset(sp).failed.V = 1)) \/
   ((!(coset(sp).failed.V != 0)) /\
   (!(get_next_token( tp.V ) != 0))))" */

Figure 11.14: Annotated source code for end_message_subroutine


/* AutoSpec:
"(coset(sp).msg_complete.V = 1) /\
((((!(as_const6 != 0)) /\
(get_next_token( tp.V ) != 0)) /\
(coset(sp).failed.V = 1)) \/
((coset(sp).failed.V = 0) /\
(!(get_next_token( tp.V ) != 0))))" */

This specification states that after executing this procedure, (coset(sp).msg_complete.V =
1) and that either (coset(sp).failed.V = 1) or (coset(sp).failed.V = 0). In the case that
(coset(sp).failed.V = 1), the get_next_token procedure returned a non-zero value, indicating
the message stream buffer was not empty. Conversely, in the case that (coset(sp).failed.V =
0), the message processing was successfully completed.
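
The two disjuncts of this specification can be traced to the usual strongest postcondition
rules for assignment and for a one-armed conditional. The derivation below is a sketch in
the textbook form of those rules, which may differ in notation from the definitions used
elsewhere in this dissertation. It rests on several assumptions: B abbreviates
get_next_token(tp.V) ≠ 0, Q abbreviates R_1 ∧ failed = 0 (the state in which the early
return was not taken), c is a fresh constant playing the role that as_const6 appears to
play in the generated specification, and the call to fail is assumed to leave the
variables of interest unchanged.

```latex
\begin{align*}
sp(x := e,\ Q) &= \exists c .\ Q[c/x] \wedge x = e[c/x] \\
sp(\mathbf{if}\ B\ \mathbf{then}\ S,\ Q) &= sp(S,\ B \wedge Q) \vee (\neg B \wedge Q) \\
sp(\mathbf{if}\ B\ \mathbf{then}\ \mathit{failed} := 1,\ Q)
  &= (R_1 \wedge c = 0 \wedge B \wedge \mathit{failed} = 1)
     \vee (R_1 \wedge \mathit{failed} = 0 \wedge \neg B)
\end{align*}
```

The first disjunct corresponds to the failure case, in which a token remained in the
stream, and the second to successful completion.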

The completion of the above specification allowed us to answer the question concerning
the conditions for the termination of the translation process. In doing so, it was determined
that in the event that end_message_subroutine never appears on the message stack, the
process_msg procedure can potentially run forever (or at least until there is a message stack
overflow).
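
This dependence of termination on a single routine can be reproduced with a small
table-driven interpreter. The sketch below is not the command translation code: the names
run_bounded, op_nop, op_end_message, and the opcode values are hypothetical, the
instruction stream wraps around instead of running off its end, and a step bound is added
so that the non-terminating case can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

struct interp_state { int msg_complete; int failed; };

typedef void (*op_fn)(struct interp_state *);

enum { OP_NOP = 0, OP_END_MESSAGE = 1 };

static void op_nop(struct interp_state *sp) { (void)sp; }

/* The only routine that sets msg_complete, as with end_message_subroutine. */
static void op_end_message(struct interp_state *sp) { sp->msg_complete = 1; }

static const op_fn dispatch[] = { op_nop, op_end_message };

/* Interpret opcodes until msg_complete is set. Returns the number of steps
   executed, or -1 if max_steps was reached first. The program counter wraps
   around, which is an assumption of this sketch. */
int run_bounded(const unsigned char *program, size_t len,
                struct interp_state *sp, int max_steps)
{
    size_t pc = 0;
    int steps = 0;

    assert(program != NULL && sp != NULL && len > 0);
    sp->msg_complete = 0;
    while (!sp->msg_complete) {
        unsigned char code;

        if (steps == max_steps)
            return -1;              /* bound hit: no opcode completed the message */
        code = program[pc];
        assert(code < sizeof dispatch / sizeof dispatch[0]);
        dispatch[code](sp);
        pc = (pc + 1) % len;
        steps++;
    }
    return steps;
}
```

A program that contains OP_END_MESSAGE finishes after a finite number of steps, while one
that lacks it stops only because of the artificial bound.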

11.5.4 Analysis of end_cmdxlt

Given our earlier observation about the termination of process_msg, we proceeded to analyze
whether or not other conditions can cause the process_msg procedure to terminate.

In the command translation source code there are several macros that are used to access
the message stack. One such macro is given in Figure 11.15. The code contained in this
macro, upon accessing the stack, will generate a failure condition and terminate the entire
program if the stack overflows. The importance of this macro is that several routines called
by process_msg utilize this stack macro. As such, if the failure conditions are met, then
the procedure end_cmdxlt will be called.

 

#define POPM(m) S+=(int)(m); \
    if (((U16 *)S<W) || (S>min_S)) \
    { \
        fail("stack overflow", NULL, NULL); \
        end_cmdxlt(-1); \
    }

Figure 11.15: The POPM Macro

 

The formal specification of end_cmdxlt is shown in Figure 11.16. The most
important aspect of this routine is that it terminates the entire program if invoked. As such,
the final specification of the program is “false”, indicating that the routine will never
reach the code that follows the call to exit. Given this fact, the command translation
system, specifically the process_msg procedure and subsequently the
process_mnemonic_input procedure,
will terminate either by a successful (or partially successful) completion of a message
translation, or by an eventual termination via the end_cmdxlt procedure.
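
The meaning of a “false” postcondition, namely that no state is ever reached after the
call, can be observed in a small experiment. The sketch below is an assumption-laden
model, not the original code: longjmp stands in for the process exit so that the effect
can be checked from within one program, an integer index replaces the stack pointer S, and
all names (fatal_exit, move_stack, try_move, STACK_WORDS) are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

#define STACK_WORDS 8            /* hypothetical stack capacity */

int top = 0;                     /* hypothetical stack index, stands in for S */
jmp_buf fatal_env;               /* stands in for leaving the program */
int reached_after_fatal = 0;     /* remains 0 if the code after the call is unreachable */

/* Stand-in for end_cmdxlt: control never returns to the caller. */
void fatal_exit(void)
{
    longjmp(fatal_env, 1);
}

/* Move the stack index by m words, in the spirit of the POPM macro. */
void move_stack(int m)
{
    int next = top + m;

    if (next < 0 || next > STACK_WORDS) {
        fatal_exit();
        reached_after_fatal = 1; /* postcondition "false": never executed */
    }
    top = next;
}

/* Returns 0 if the move completed, 1 if it ended in fatal_exit. */
int try_move(int m)
{
    if (setjmp(fatal_env) != 0)
        return 1;
    move_stack(m);
    return 0;
}
```

An in-range move updates the index, whereas an out-of-range move leaves both the index and
the reached_after_fatal flag untouched, since nothing after the fatal call runs.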

11.6 Discussion

In the process of performing the case study, several discoveries concerning the structure and
functionality of the command translation system were gathered. In addition to revealing
functional properties of the system software, the case study allowed us to discover several
non-functional properties regarding the code. In this section, we summarize the case study

analysis.


 

 0. extern void end_cmdxlt(int n) {
 1.     if (params->cmdcntl != 0) {
 2.         inform_user(1);
 3.         comm_tbl_ptr->alloc[comm_index].xlt_pid = 0;
 4.         xltdir->new = 0;
 5.         if (smclose("directive") == DTS_ERROR) {
 6.             fprintf(stderr,
 7.                 "translate: closing directives shared memory failed\n");
 8.         }
 9.
10.         /* AutoSpec:
11.         "((((n.V = _param0) /\ (coset(params).cmdcntl.V != 0)) /\
12.         (coset(comm_tbl_ptr).alloc[comm_index].V = 0)) /\
13.         (coset(xltdir).new.V = 0))" */
14.
15.         if (dereg_appl("SFOC CMD", "com_ws", SHM_ALLOC) == RES_ERROR) {
16.             fprintf(stderr,
17.                 "translate: deregistration with SMC failed: %s\n",
18.                 smc_errlist[smc_errno]);
19.         }
20.
21.         /* AutoSpec:
22.         "((((n.V = _param0) /\ (coset(params).cmdcntl.V != 0)) /\
23.         (coset(comm_tbl_ptr).alloc[comm_index].V = 0)) /\
24.         (coset(xltdir).new.V = 0))" */
25.
26.         if (master_detach_proj(-1) == -1) {
27.             fprintf(stderr,
28.                 "translate: cannot detach from masterfile: %s\n",
29.                 master_strerror(master_errno));
30.         }
31.
32.         /* AutoSpec:
33.         "((((n.V = _param0) /\ (coset(params).cmdcntl.V != 0)) /\
34.         (coset(comm_tbl_ptr).alloc[comm_index].V = 0)) /\
35.         (coset(xltdir).new.V = 0))" */
36.
37.     }
38.
39.     /* AutoSpec:
40.     "((((n.V = _param0) /\ (coset(params).cmdcntl.V != 0)) /\
41.     (coset(comm_tbl_ptr).alloc[comm_index].V = 0)) /\
42.     (coset(xltdir).new.V = 0)) \/
43.     ((n.V = _param0) /\ (!(coset(params).cmdcntl.V != 0)))" */
44.
45.     exit(n);
46.
47.     /* AutoSpec: false */
48.
49.     return;
50. }

Figure 11.16: Annotated source code for end_cmdxlt

 

Command translation. The command translation system provides two types of
command file interpretation: user mnemonic translation and command file conversion.
In addition, it was determined that the command translation system relies heavily upon
communicating with other Command subsystems via the use of system files. From a
low-level perspective, the process_mnemonic_input and process_msg procedures are two
of the most critical procedures in the system. These procedures either directly or indirectly
control the command translation process, and they constitute a critical path of execution.

Termination. The termination of the command translation process depends heavily upon
the termination of the process_msg procedure. The process_msg procedure terminates
in one of two ways: gracefully or by fault.

Global Variables and Macros. The command translation system relies heavily upon
the use of global variables and macros. While the source code is visually compact, the

functional complexity seemed to increase with each encounter of one of these constructs.

11.7 Lessons Learned

Several lessons about our reverse engineering approach were learned while performing the

case study described in this chapter. This section summarizes these lessons.

11.7.1 Combined Analysis Technique

The utilization of a combined informal and formal process enhanced the usefulness of both
the informal and formal techniques. The informal analysis provided a structured method
for early discovery and organization of the functionality of the system. During the low-level
analysis, the informal techniques provided valuable information and cues regarding where
to focus the formal analysis. The formal analysis facilitated the functional understanding
of the underlying logic embedded in many of the structural models derived during the

low-level analysis. In addition, given many of the questions that arose after the informal


analysis, the formal technique provided a method for understanding certain properties of

the code.

11.7.2 Tools

The availability of tools greatly facilitated the analysis process during both the informal
and the formal phases of analysis. However, while the tools were invaluable, they need
to mature with regard to the functionality that they provide, especially with regard to
user interface concerns.


Chapter 12

Conclusions and Future Investigations

Consider the following scenario:

Programmer X developed some software 6-18 months ago to handle activity Y.
In the process of developing the software, programmer X used some standard
semi-formal design notation until he felt he understood problem Y. Then he
wrote the software, adjusting the functionality of the various routines when new
sub-cases for problem Y were discovered. Today, programmer X has learned
that he needs to modify the system to incorporate new requirements. As he
traverses the code, he realizes that he does not recall the functionality for some
of the routines.

Most programmers can likely recall at least one similar experience. The
above scenario points out the widespread need for reverse
engineering and design recovery. The techniques that are available range from ad-hoc
to mathematically rigorous methods. In this chapter we summarize the results of our
investigations and suggest future investigations.

12.1 Summary of Contributions

In this section we summarize our contributions to the ﬁeld of software engineering and
software maintenance.


12.1.1 Strongest Postcondition

To date, the primary use of the strongest postcondition predicate transformer has been for
the study of issues related to the theories underlying the semantics of programming [16].
In this dissertation we demonstrated how the strongest postcondition can be applied to the
problems of reverse engineering and design recovery. In doing so, we have introduced the
use of a formal technique for reverse engineering that is based on a derivational approach
for program analysis. The technique incorporates the use of the strongest postcondition
to transform an operational speciﬁcation (i.e., a program) into a behavioral speciﬁcation
in terms of predicate logic expressions. In addition, we have applied the use of strongest
postcondition to the deﬁnition of the semantics of the C programming language in order to

demonstrate the applicability of such an approach to real languages and systems.

12.1.2 Abstraction

The construction of abstract speciﬁcations, or generalizations, from as-built speciﬁcations
has primarily been focused on the use of transformation [72]. Starting with as-built
formal speciﬁcations that are constructed from programs using the strongest postcondition,
our approach facilitates deriving abstractions based on translation and the preservation of
various ordering criteria. The end result is a speciﬁcation that is a logical abstraction of the
as-built speciﬁcation. As a result, our approach ensures consistency and retains traceability
between high-level abstractions and low-level as-built speciﬁcations. In addition, the

results of this research can be used to support program understanding.


12.1.3 Support for Reuse

Many formal software reuse approaches depend on the assumption that a library of reusable
components is available for use. This assumption, however, may not be reasonable under
many conditions since the techniques used to develop the components may not have been
based on formal methods. In this dissertation, we have demonstrated how a formal reverse
engineering technique can be used to generate speciﬁcation-based indices for existing

components in order to populate component libraries.

12.2 Future Investigations

Our future work will explore four major areas: Reverse Engineering, Software Reuse,
Software Reengineering, and Software Testing.

12.2.1 Reverse Engineering

One of the objectives of the research described in this dissertation was to explore the
feasibility of developing a rigorous approach to the problem of reverse engineering. Our
philosophy was based on a breadth approach in that the intent was to develop techniques
that could be used as part of an overall reverse engineering process. Along the way, several

different issues were identiﬁed that merit further study and investigation.

Loops. Abd-El-Haﬁz [67] describes a knowledge-based approach for constructing
speciﬁcations of looping constructs. Several other reverse engineering approaches make
no explicit mention of a formal or informal treatment of loops. Our approach to loops was
to provide a series of guidelines that can be applied during the loop speciﬁcation process.

In order to provide a more rigorous, and perhaps more automated, method for handling
loops, we intend to investigate the use of techniques such as abstract interpretation [94]
and approximation algorithms for loop analysis.
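As a small illustration of what a loop specification has to capture, the sketch below states a loop invariant and a postcondition as runtime assertions. The routine `sum_to` is a hypothetical example introduced here for illustration; it is not taken from the dissertation or from the programs it analyzes.

```c
#include <assert.h>

/* Hypothetical example: sum the integers 1..n. */
int sum_to(int n)
{
    int total = 0;
    int i;

    for (i = 1; i <= n; i++) {
        /* Loop invariant: total holds the sum of 1..(i-1). */
        assert(total == i * (i - 1) / 2);
        total += i;
    }
    /* Postcondition (for n >= 1): total = n(n+1)/2. */
    assert(n < 1 || total == n * (n + 1) / 2);
    return total;
}
```

The invariant plays the role that a loop specification plays in a formal treatment: it summarizes every iteration in a single predicate, so the postcondition can be derived without unrolling the loop.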

Pointers. Several approaches have been suggested for handling pointer variables [95].
Our approach assumes a single level of indirection, which is appropriate for a moderately-
sized class of programs, but requires extension to several levels of indirection in order to

be applicable to a wider class of programs.
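The difference between one and several levels of indirection can be sketched in C as follows. The function names are hypothetical, and the comments assume that the `.>` symbol used in the Appendix C annotations reads as "points to".

```c
/* One level of indirection: p .> obj, and the write goes to obj. */
void store_through(int *p, int value)
{
    *p = value;
}

/* Two levels: pp .> p and p .> obj. The write changes which object
 * p points to, so any predicate of the form (p .> obj) must be updated. */
void retarget(int **pp, int *new_target)
{
    *pp = new_target;
}
```

With a single level, only the value of the referenced object changes. With two levels, the points-to relation itself changes, which is the case the extension to several levels of indirection would need to track.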

Fully Integrated informal and formal approaches. While our approach incorporates
the use of both informal and formal methods, a fully integrated approach, in the sense
that formal specifications are hidden from users, has not yet been realized. For example,
it would be desirable to allow a user to construct a series of diagrams that describe the
structure of the system and then have the system construct a formal specification based on
those diagrams. Similar work has been developed for the area of software requirements
engineering and design [96]. Our intention is to investigate the feasibility of such an
approach in the area of software maintenance and reverse engineering.

Tool environments. One of the most valuable assets that a programmer can have is
access to a set of tools that support software maintenance. In addition to the tools that we
have described in this dissertation, we intend to investigate how the use of several classes
of tools such as those described in Chapter 10 can be combined into a single software
maintenance environment. The intent is to determine how the relative advantages of each
complementary reverse engineering approach can be used to provide a programmer with as

much information as possible during the software maintenance process.


12.2.2 Software Reuse

In Chapter 9, we described our initial investigations into the support of software reuse
via reverse engineering. Our future investigations in this area will focus on further
demonstration of the use of our formal reverse engineering technique as a means for
populating component libraries. Speciﬁcally, we intend to investigate how non-functional
architectural information (e.g., is the module a pipe, ﬁlter, client, server, etc.) can be
extracted from code in order to enrich the module speciﬁcation in such a way that enhances

the abilities of software reuse search engines.

12.2.3 Reengineering

Reverse engineering is the ﬁrst stage of the reengineering lifecycle. The existence
of formal speciﬁcations that have been recovered from code can be used to facilitate
several reengineering activities. For instance, in our previous investigations we presented
an approach for identifying and formally specifying objects that may be embedded in
imperative program code [97]. Other potential applications of reengineering that can
be facilitated by the results of a reverse engineering phase are system modification and
system re-implementation, where modification refers to changing a system to add new
functionality and re-implementation refers to preservation of functionality during activities
like system retargeting. In the case of system modification, a formal specification can
be used as a means for verifying that modiﬁcations to various parts of a system have no
adverse impact on the functionality of other parts of a system.

Our future investigations in the area of software reengineering will focus on addressing

several issues:


Object-Oriented Systems. The increasing popularity of programming languages such as
C++ and Java continues to force software organizations to make decisions regarding future
development. In order to support the transition of current systems that have been written
using imperative languages such as C and Fortran, we intend to investigate how the results
of the formal reverse engineering approach can be used to facilitate a paradigm shift to
object-oriented languages.

Impact analysis. Impact analysis is the study of the effects of software change on
systems [98]. One of the primary tenets of software reengineering is that some form of
change is imposed on a system to produce a new system. Impact analysis has
been used to determine how changes affect the remainder of the system. We intend to
investigate how formal specifications can be used to facilitate the impact analysis process.


APPENDICES


Appendix A

Semantics of C Expressions

This section describes the expression semantics of the C programming language using the
functions $\mathcal{A}$ and $\mathcal{V}$ defined in Section 5.1.

A.1 Assignment Operators

Let $v$ be a variable or an assignable expression¹ and $e$ be an expression.
Let $\gamma$ be an assignable object with an $n$-bit integer vector such that it has the following
bitwise evaluation:

\[
\mathcal{A}(\gamma) = \langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle \tag{A.1}
\]

where the components $\gamma_i$ take the value of 0 or 1 and let $m$ be some integer. The definitions
of the semantics of the bitwise assignment operators <<=, >>=, &=, ^=, and |= rely on the
use of the representation in Expression A.1. As was the case with the non-bitwise operative
assignment expressions:

\[
\mathcal{V}(v \odot e) =
\begin{cases}
T & \text{if } \mathcal{A}(v \odot e) \neq 0 \\
F & \text{if } \mathcal{A}(v \odot e) = 0
\end{cases}
\]

where $\odot$ is one of <<=, >>=, &=, ^=, and |=. Table A.1 defines the evaluation semantics
of the bitwise operative assignment expressions.

¹ In terms of the C grammar, an assignable expression is a unary-expression, postfix-expression, or
primary-expression.
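\[
\begin{array}{rcl}
\mathcal{A}(\gamma \mathbin{\texttt{<<=}} m) & = &
\begin{cases}
\langle \gamma_m, \gamma_{m+1}, \ldots, \gamma_n, 0_1, 0_2, \ldots, 0_m \rangle & \text{if } 0 < m \le n \\
\langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle & \text{if } m \le 0 \\
0 & \text{if } m > n
\end{cases} \\[2ex]
\mathcal{A}(\gamma \mathbin{\texttt{>>=}} m) & = &
\begin{cases}
\langle 0_1, 0_2, \ldots, 0_m, \gamma_0, \gamma_1, \ldots, \gamma_{m-1} \rangle & \text{if } 0 < m \le n \\
\langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle & \text{if } m \le 0 \\
0 & \text{if } m > n
\end{cases} \\[2ex]
\mathcal{A}(v \mathbin{\texttt{\&=}} \gamma) & = & \mathcal{A}(v) \mathbin{\texttt{\&}} \mathcal{A}(\gamma) \\
\mathcal{A}(v \mathbin{\texttt{\^{}=}} \gamma) & = & \mathcal{A}(v) \mathbin{\texttt{\^{}}} \mathcal{A}(\gamma) \\
\mathcal{A}(v \mathbin{\texttt{|=}} \gamma) & = & \mathcal{A}(v) \mathbin{\texttt{|}} \mathcal{A}(\gamma)
\end{array}
\]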

 

 

 

Table A.1: Bitwise Operative Assignment Operators

 

A.2 Logical Operators

Let a and H be expressions. The logical operators | | and && are used to form logical
expressions that are commonly used within the guards of conditional statements. Table A.2

describes both the evaluational and logical semantics of the operators.

A.3 Bitwise Operators

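Let $\gamma$ and $\psi$ be objects with integer values that have the following bitwise representations:

\[
\mathcal{A}(\gamma) = \langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle
\]

\[
\mathcal{A}(\psi) = \langle \psi_0, \psi_1, \psi_2, \ldots, \psi_n \rangle
\]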

 

 

 

[Table A.2 body illegible in this copy: two-case definitions of the evaluational ($\mathcal{A}$) and logical ($\mathcal{V}$) semantics of || and &&.]

 

 

Table A.2: Logical Operators

where the components $\gamma_i$ and $\psi_i$ take the value of 0 or 1. Table A.3 summarizes the
semantics of the bitwise operators.

 

 

 

 

 

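\[
\begin{array}{rcl}
\mathcal{V}(\gamma \mathbin{\texttt{|}} \psi) & = &
\begin{cases}
T & \text{if } \exists i : 0 \le i \le n : \gamma_i = 1 \lor \psi_i = 1 \\
F & \text{if } \forall i : 0 \le i \le n : \gamma_i = 0 \land \psi_i = 0
\end{cases} \\[2ex]
\mathcal{A}(\gamma \mathbin{\texttt{|}} \psi) & = & \mathcal{A}(\gamma) \mathbin{\texttt{|}} \mathcal{A}(\psi) \\[2ex]
\mathcal{V}(\gamma \mathbin{\texttt{\^{}}} \psi) & = &
\begin{cases}
T & \text{if } \exists i : 0 \le i \le n : \gamma_i \neq \psi_i \\
F & \text{if } \forall i : 0 \le i \le n : \gamma_i = \psi_i
\end{cases} \\[2ex]
\mathcal{A}(\gamma \mathbin{\texttt{\^{}}} \psi) & = & \mathcal{A}(\gamma) \mathbin{\texttt{\^{}}} \mathcal{A}(\psi) \\[2ex]
\mathcal{V}(\gamma \mathbin{\texttt{\&}} \psi) & = &
\begin{cases}
T & \text{if } \exists i : 0 \le i \le n : \gamma_i = \psi_i = 1 \\
F & \text{if } \forall i : 0 \le i \le n : (\gamma_i \neq \psi_i) \lor (\gamma_i = \psi_i = 0)
\end{cases} \\[2ex]
\mathcal{A}(\gamma \mathbin{\texttt{\&}} \psi) & = & \mathcal{A}(\gamma) \mathbin{\texttt{\&}} \mathcal{A}(\psi)
\end{array}
\]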

 

Table A.3: Bitwise Operators
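The logical rows of Table A.3 can be checked against C's own "non-zero means true" convention. The sketch below is a hypothetical check written for 8-bit operands: each `v_*_bits` function encodes the T case of one row as a search over bit positions, and `count_mismatches` compares it with the non-zero test on the corresponding C operator. The function names and the choice of 8 bits are assumptions made for illustration.

```c
#include <stdint.h>

/* T case of V(g & p): some position i has both bits equal to 1. */
int v_and_bits(uint8_t g, uint8_t p)
{
    for (int i = 0; i < 8; i++)
        if (((g >> i) & 1) == 1 && ((p >> i) & 1) == 1)
            return 1;
    return 0;
}

/* T case of V(g | p): some position i has a bit equal to 1 in g or in p. */
int v_or_bits(uint8_t g, uint8_t p)
{
    for (int i = 0; i < 8; i++)
        if (((g >> i) & 1) == 1 || ((p >> i) & 1) == 1)
            return 1;
    return 0;
}

/* T case of V(g ^ p): some position i where the two bits differ. */
int v_xor_bits(uint8_t g, uint8_t p)
{
    for (int i = 0; i < 8; i++)
        if (((g >> i) & 1) != ((p >> i) & 1))
            return 1;
    return 0;
}

/* Count operand pairs where a bit-level definition disagrees with the
 * non-zero test on the C operator, over all 8-bit operand pairs. */
int count_mismatches(void)
{
    int bad = 0;
    for (int g = 0; g < 256; g++)
        for (int p = 0; p < 256; p++) {
            if (v_and_bits((uint8_t)g, (uint8_t)p) != ((g & p) != 0)) bad++;
            if (v_or_bits((uint8_t)g, (uint8_t)p) != ((g | p) != 0)) bad++;
            if (v_xor_bits((uint8_t)g, (uint8_t)p) != ((g ^ p) != 0)) bad++;
        }
    return bad;
}
```

An exhaustive comparison is feasible here only because the operand width is small; for wider types the same argument would be made bit position by bit position.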

A.4 Equality and Relational Operators

Let $\alpha$ and $\beta$ be expressions. The logical evaluation of the equality and relational operators
has the following semantics:

\[
\mathcal{V}(\alpha \odot \beta) =
\begin{cases}
T & \text{if } \mathcal{A}(\alpha \odot \beta) \neq 0 \\
F & \text{if } \mathcal{A}(\alpha \odot \beta) = 0
\end{cases}
\]

where $\odot$ is one of ==, !=, >, <, >=, and <=. The equality and relational operators have the
semantics shown in Table A.4.

 

 

 

 

 

 

 

 

[Table A.4 body illegible in this copy: two-case definitions of the evaluational ($\mathcal{A}$) semantics of ==, !=, >, <, >=, and <=.]

 

Table A.4: Equality and Relational Operators

A.5 Shift Operators

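Let $\gamma$ be an object with an integer value such that it has the following bitwise representation:

\[
\mathcal{A}(\gamma) = \langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle
\]

where the components $\gamma_i$ take the value of 0 or 1 and let $m$ be some integer. The logical
evaluation of the shift operators has the following semantics:

\[
\mathcal{V}(\gamma \odot m) =
\begin{cases}
T & \text{if } \mathcal{A}(\gamma \odot m) \neq 0 \\
F & \text{if } \mathcal{A}(\gamma \odot m) = 0
\end{cases}
\]

where $\odot$ is one of << and >>. Table A.5 describes the semantics of the shift operators.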
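The three-case shape of the shift semantics can be sketched in C. The code below is an illustration under stated assumptions, not the dissertation's definition: it fixes the vector width at 8 bits (so n = 7, with index 0 taken as the most significant bit), uses hypothetical function names, and delegates the middle case to C's native shift. The guards are explicit because ISO C leaves a shift by a negative count, or by a count at least as large as the operand width, undefined.

```c
#include <stdint.h>

enum { N = 7 };  /* highest bit index of the 8-bit vector <g0, ..., gN> */

/* Value of (g << m), split into the three cases used for shifts. */
uint8_t shl_sem(uint8_t g, int m)
{
    if (m <= 0) return g;        /* m <= 0: vector unchanged */
    if (m > N)  return 0;        /* m > n: result is 0 */
    return (uint8_t)(g << m);    /* drop m leading bits, append m zeros */
}

/* Value of (g >> m), with the same case split. */
uint8_t shr_sem(uint8_t g, int m)
{
    if (m <= 0) return g;        /* m <= 0: vector unchanged */
    if (m > N)  return 0;        /* m > n: result is 0 */
    return (uint8_t)(g >> m);    /* prepend m zeros, drop m trailing bits */
}

/* Logical evaluation: true exactly when the value is non-zero. */
int logical(uint8_t value)
{
    return value != 0;
}
```

For example, `logical(shl_sem(0x80, 1))` is false because the only set bit is shifted out, which is the F case of the logical evaluation above.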

A.6 Additive and Multiplicative Operators

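Let $\alpha$ and $\beta$ be expressions. The logical evaluation of the additive and multiplicative
operators has the following semantics:

\[
\mathcal{V}(\alpha \odot \beta) =
\begin{cases}
T & \text{if } \mathcal{A}(\alpha \odot \beta) \neq 0 \\
F & \text{if } \mathcal{A}(\alpha \odot \beta) = 0
\end{cases}
\]

where $\odot$ is one of +, -, *, /, and %. Table A.6 gives the evaluation semantics of the additive
and multiplicative operators.

\[
\begin{array}{rcl}
\mathcal{A}(\gamma \mathbin{\texttt{<<}} m) & = &
\begin{cases}
\langle \gamma_m, \gamma_{m+1}, \ldots, \gamma_n, 0_1, 0_2, \ldots, 0_m \rangle & \text{if } 0 < m \le n \\
\langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle & \text{if } m \le 0 \\
0 & \text{if } m > n
\end{cases} \\[2ex]
\mathcal{A}(\gamma \mathbin{\texttt{>>}} m) & = &
\begin{cases}
\langle 0_1, 0_2, \ldots, 0_m, \gamma_0, \gamma_1, \ldots, \gamma_{m-1} \rangle & \text{if } 0 < m \le n \\
\langle \gamma_0, \gamma_1, \gamma_2, \ldots, \gamma_n \rangle & \text{if } m \le 0 \\
0 & \text{if } m > n
\end{cases}
\end{array}
\]

Table A.5: Shift Operators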

 

 

 

 

 

 

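\[
\begin{array}{rcl}
\mathcal{A}(\alpha + \beta) & = & \mathcal{A}(\alpha) + \mathcal{A}(\beta) \\
\mathcal{A}(\alpha - \beta) & = & \mathcal{A}(\alpha) - \mathcal{A}(\beta) \\
\mathcal{A}(\alpha * \beta) & = & \mathcal{A}(\alpha) \times \mathcal{A}(\beta) \\
\mathcal{A}(\alpha \mathbin{/} \beta) & = & \dfrac{\mathcal{A}(\alpha)}{\mathcal{A}(\beta)} \\
\mathcal{A}(\alpha \mathbin{\%} \beta) & = & \mathcal{A}(\alpha) \bmod \mathcal{A}(\beta)
\end{array}
\]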

 

 

Table A.6: Additive and Multiplicative Operators


Appendix B

Partial Order Lemmas

This appendix states and proves a number of lemmas regarding partial-order and weak
partial-order relations. These lemmas substantiate the notion that the abstraction match
operator is a partial-order relation. As such, the abstraction technique described in

Chapter 6 is well-founded.

B.1 Lemma 1

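The exact pre/post match is reflexive, symmetric, and transitive (i.e., the exact pre/post
match is an equivalence relation).

Proof. By definition, the exact pre/post relation with respect to two specifications $A$ and $B$
(denoted $A \preceq_{ex} B$) is $(A_{pre} \leftrightarrow B_{pre}) \land (A_{post} \leftrightarrow B_{post})$. The following shows that $\preceq_{ex}$ is
reflexive:

\[
\begin{array}{cl}
 & A \preceq_{ex} A \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & (A_{pre} \leftrightarrow A_{pre}) \land (A_{post} \leftrightarrow A_{post}) \\
 & \quad \langle (X \leftrightarrow X) \equiv true \rangle \\
\equiv & true \land true \\
 & \quad \langle (true \land X) \equiv X \rangle \\
\equiv & true
\end{array}
\]

It is also straightforward to show that $\preceq_{ex}$ is symmetric so that $(A \preceq_{ex} B) \rightarrow (B \preceq_{ex} A)$,
as shown below:

\[
\begin{array}{cl}
 & A \preceq_{ex} B \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & (A_{pre} \leftrightarrow B_{pre}) \land (A_{post} \leftrightarrow B_{post}) \\
 & \quad \langle \text{commutativity of } \leftrightarrow \rangle \\
\equiv & (B_{pre} \leftrightarrow A_{pre}) \land (B_{post} \leftrightarrow A_{post}) \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & B \preceq_{ex} A
\end{array}
\]

Finally, the proof for the transitivity of $\preceq_{ex}$ so that $((A \preceq_{ex} B) \land (B \preceq_{ex} C)) \rightarrow (A \preceq_{ex} C)$
is as follows:

\[
\begin{array}{cl}
 & (A \preceq_{ex} B) \land (B \preceq_{ex} C) \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & (A_{pre} \leftrightarrow B_{pre}) \land (A_{post} \leftrightarrow B_{post}) \land (B_{pre} \leftrightarrow C_{pre}) \land (B_{post} \leftrightarrow C_{post}) \\
 & \quad \langle \text{definition of } \leftrightarrow \text{, substitution of } B_{pre} \text{ with } A_{pre} \text{ and } B_{post} \text{ with } A_{post} \rangle \\
\equiv & (A_{pre} \leftrightarrow A_{pre}) \land (A_{post} \leftrightarrow A_{post}) \land (A_{pre} \leftrightarrow C_{pre}) \land (A_{post} \leftrightarrow C_{post}) \\
 & \quad \langle (X \leftrightarrow X) \equiv true,\ (true \land X) \equiv X \rangle \\
\equiv & (A_{pre} \leftrightarrow C_{pre}) \land (A_{post} \leftrightarrow C_{post}) \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & A \preceq_{ex} C
\end{array}
\]

Since $\preceq_{ex}$ is reflexive, symmetric, and transitive, $\preceq_{ex}$ is an equivalence relation. $\Box$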

B.2 Lemma 2

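The plug-in match is reflexive, anti-symmetric, and transitive (i.e., the plug-in match is a
partial order relation).

Proof. By definition, the plug-in relation with respect to two specifications $A$ and $B$
(denoted $B \preceq_{pi} A$) is $(A_{pre} \rightarrow B_{pre}) \land (B_{post} \rightarrow A_{post})$. The following shows that
$\preceq_{pi}$ is reflexive:

\[
\begin{array}{cl}
 & A \preceq_{pi} A \\
 & \quad \langle \text{definition of } \preceq_{pi} \rangle \\
\equiv & (A_{pre} \rightarrow A_{pre}) \land (A_{post} \rightarrow A_{post}) \\
 & \quad \langle (X \rightarrow X) \equiv true \rangle \\
\equiv & true \land true \\
 & \quad \langle (true \land X) \equiv X \rangle \\
\equiv & true
\end{array}
\]

The following proof shows that $\preceq_{pi}$ is antisymmetric so that $((B \preceq_{pi} A) \land (A \preceq_{pi} B)) \rightarrow A \preceq_{ex} B$:

\[
\begin{array}{cl}
 & (B \preceq_{pi} A) \land (A \preceq_{pi} B) \\
 & \quad \langle \text{definition of } \preceq_{pi} \rangle \\
\equiv & (A_{pre} \rightarrow B_{pre}) \land (B_{post} \rightarrow A_{post}) \land (B_{pre} \rightarrow A_{pre}) \land (A_{post} \rightarrow B_{post}) \\
 & \quad \langle \text{associativity of } \land \rangle \\
\equiv & (A_{pre} \rightarrow B_{pre}) \land (B_{pre} \rightarrow A_{pre}) \land (A_{post} \rightarrow B_{post}) \land (B_{post} \rightarrow A_{post}) \\
 & \quad \langle ((X \rightarrow Y) \land (Y \rightarrow X)) \equiv (X \leftrightarrow Y) \rangle \\
\equiv & (A_{pre} \leftrightarrow B_{pre}) \land (A_{post} \leftrightarrow B_{post}) \\
 & \quad \langle \text{definition of } \preceq_{ex} \rangle \\
\equiv & A \preceq_{ex} B
\end{array}
\]

Finally, the following proof shows that $\preceq_{pi}$ is transitive so that $((B \preceq_{pi} A) \land (C \preceq_{pi} B)) \rightarrow C \preceq_{pi} A$:

\[
\begin{array}{cl}
 & (B \preceq_{pi} A) \land (C \preceq_{pi} B) \\
 & \quad \langle \text{definition of } \preceq_{pi} \rangle \\
\equiv & (A_{pre} \rightarrow B_{pre}) \land (B_{post} \rightarrow A_{post}) \land (B_{pre} \rightarrow C_{pre}) \land (C_{post} \rightarrow B_{post}) \\
 & \quad \langle \text{associativity of } \land \rangle \\
\equiv & (A_{pre} \rightarrow B_{pre}) \land (B_{pre} \rightarrow C_{pre}) \land (C_{post} \rightarrow B_{post}) \land (B_{post} \rightarrow A_{post}) \\
 & \quad \langle ((X \rightarrow Y) \land (Y \rightarrow Z)) \rightarrow (X \rightarrow Z) \rangle \\
\Rightarrow & (A_{pre} \rightarrow C_{pre}) \land (C_{post} \rightarrow A_{post}) \\
 & \quad \langle \text{definition of } \preceq_{pi} \rangle \\
\equiv & C \preceq_{pi} A
\end{array}
\]

Since $\preceq_{pi}$ is reflexive, antisymmetric, and transitive, $\preceq_{pi}$ is a partial order relation. $\Box$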
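The plug-in and exact pre/post matches can be made concrete with a brute-force check over a small, finite set of states. The sketch below is illustrative only: it assumes each pre- and postcondition is a predicate over a single integer state, and the type `spec`, the functions `plug_in` and `exact`, and the sample predicates are all hypothetical names introduced here.

```c
#include <stdbool.h>

typedef bool (*pred)(int);

/* A specification reduced to a precondition and a postcondition. */
typedef struct {
    pred pre;
    pred post;
} spec;

static bool implies(bool x, bool y)
{
    return !x || y;
}

/* B plug-in matches A on states lo..hi:
 * (A.pre -> B.pre) and (B.post -> A.post) hold in every state. */
bool plug_in(spec b, spec a, int lo, int hi)
{
    for (int s = lo; s <= hi; s++)
        if (!implies(a.pre(s), b.pre(s)) || !implies(b.post(s), a.post(s)))
            return false;
    return true;
}

/* Exact pre/post match: both pairs of predicates agree in every state. */
bool exact(spec a, spec b, int lo, int hi)
{
    for (int s = lo; s <= hi; s++)
        if (a.pre(s) != b.pre(s) || a.post(s) != b.post(s))
            return false;
    return true;
}

/* Sample predicates (hypothetical). */
bool nonneg(int s)   { return s >= 0; }
bool positive(int s) { return s > 0; }

/* A requires more and promises less than B, so B can stand in for A. */
const spec SPEC_A = { positive, nonneg };
const spec SPEC_B = { nonneg, positive };
```

With these definitions, `plug_in(SPEC_B, SPEC_A, -5, 5)` holds while `plug_in(SPEC_A, SPEC_B, -5, 5)` fails at state 0, so the two specifications are ordered but not an exact match, consistent with antisymmetry.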

B.3 Lemma 3

The plug-in post match is reflexive, anti-symmetric, and transitive (i.e., the plug-in post
match is a partial order relation).

Proof. By definition, the plug-in post match with respect to specifications $A$ and $B$
(denoted $B \preceq_{pip} A$) is $(B_{post} \rightarrow A_{post})$. Since the logical operator $\rightarrow$ is reflexive,
antisymmetric, and transitive, $\preceq_{pip}$ is a partial order relation. $\Box$


Appendix C

Application Program

This section contains the source code for the example discussed in Section 6.4. The
application is a C program that is part of a mission control ground-based system used by
the NASA Jet Propulsion Laboratory. The system is responsible for the translation of user
commands into appropriate spacecraft mnemonics, enabling users to modify spacecraft
mission operations. This particular module takes a sequence of elements from a ﬁle and
returns an index to a subsequence of elements speciﬁed by begin and end indices. In our
previous investigations, we described the sp semantics for C [7] and pointers [43]. Those
semantics were used to construct the /*AS AS*/ annotations for the code contained in
this section.
/*
 * Inputs: file (file to read from)
 *         begin (first element to copy)
 *         end (last element to copy)
 * Outputs: none
 * Externally read: body_lineno (for errors)
 *                  sc (to translate to mission ID)
 * Externally modified: dontoutput (errors)
 * Returns: the elements copied (NULL on error)
 *
 * This routine does the actual work of opening and parsing the GCMD
 * file, finding the elements, and returning the appropriate ones.
 */
struct gcmd_elem *doGCMDCopy(char *file, int begin, int end)


{
    int fd;
    U16 L2;
    struct gcmd_hdr gcmd_hdr;
    int i;
    register j;
    struct gcmd_elem *orig_elem;
    struct gcmd_elem *elem;
    struct gcmd_elem *ep;
    extern int body_lineno;

    /* open the file */
    /*AS (begin = B0 & end = E0 & file .> F0) AS*/
    fd = open_copy_file(file, &L2, CMD_DSN);
    /*AS (fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    if (fd < 0)
    {
        /*AS (fd < 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        dontoutput = 1;
        /*AS (dontoutput = 1 &
              fd < 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        return(NULL);
        /*AS false AS*/
    }
    /*AS (fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    if (!skip_gcmd_sfdu(fd, L2))
    {
        /*AS (skip_gcmd_sfdu(fd, L2) = 0 &
              fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        inform_user("line %d: copy failed: bad SFDU header (%s)",
                    body_lineno, file);
        /*AS (skip_gcmd_sfdu(fd, L2) = 0 &
              fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        dontoutput = 1;
        /*AS (dontoutput = 1 & skip_gcmd_sfdu(fd, L2) = 0 &
              fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        close(fd);
        /*AS (closed(fd) & dontoutput = 1 & skip_gcmd_sfdu(fd, L2) = 0 &
              fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        if (params->cmdcntl) master_unlock();
        /*AS (params->cmdcntl != 0 & sp(master_unlock(),
                closed(fd) & dontoutput = 1
                & skip_gcmd_sfdu(fd, L2) = 0 & fd >= 0 & fd = FH0 &
                begin = B0 & end = E0 & file .> F0)) |
              (params->cmdcntl = 0 & closed(fd) & dontoutput = 1 &
               skip_gcmd_sfdu(fd, L2) = 0 &
               fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
        return(NULL);
        /*AS false AS*/


    }
    /*AS (skip_gcmd_sfdu(fd, L2) != 0 &
          fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    if (!get_gcmd_hdr(fd, &gcmd_hdr))
    {
        dontoutput = 1;
        close(fd);
        if (params->cmdcntl) master_unlock();
        return(NULL);
    }
    /*AS (get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
          fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    if (params->sc != gcmd_hdr.SC)
    {
        inform_user("line %d: copy: invalid spacecraft in GCMD file (%s)",
                    body_lineno, file);
        dontoutput = 1;
        close(fd);
        if (params->cmdcntl) master_unlock();
        return(NULL);
    }
    /*AS (params->sc = gcmd_hdr.SC &
          get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
          fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    /* make sure the file has enough elements */
    if (end == -1)
        end = gcmd_hdr.elem_count;
    else if (end > gcmd_hdr.elem_count)
    {
        inform_user("line %d: copy: not enough elements in GCMD file (%s)",
                    body_lineno, file);
        dontoutput = 1;
        close(fd);
        if (params->cmdcntl) master_unlock();
        return(NULL);
    }
    /*AS
       (E0 = -1 & end = gcmd_hdr.elem_count &
        params->sc = gcmd_hdr.SC &
        get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
        fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
       |
       (end <= gcmd_hdr.elem_count & end != -1 &
        params->sc = gcmd_hdr.SC &
        get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
        fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0) AS*/
    /* read in the elements */
    orig_elem = NULL;
    /*AS orig_elem .> NULL & ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    for (i = 1; i <= gcmd_hdr.elem_count; i++)


    {
        /*AS Qi AS*/
        if (orig_elem == NULL)
        {
            /*AS Qi & orig_elem .> NULL AS*/
            orig_elem = get_elem(fd);
            /*AS Qi & orig_elem .> Obj0 AS*/
            elem = orig_elem;
            /*AS Qi & orig_elem .> Obj0 & elem .> coset(orig_elem) AS*/
        }
        else
        {
            /*AS !(orig_elem .> NULL) & Qi AS*/
            elem->next = get_elem(fd);
            /*AS !(orig_elem .> NULL) & Qi & elem->next .> Obji AS*/
            elem = elem->next;
            /*AS !(orig_elem .> NULL) & Qi & elem->next .> Obji &
                 elem .> coset(elem->next) AS*/
        }
        /*AS (orig_elem .> Obj0 & Qi & elem .> coset(orig_elem)) |
             (!(orig_elem .> NULL) & Qi & elem->next .> Obji &
              elem .> coset(elem->next)) AS*/
        if (elem == NULL)
        {
            dontoutput = 1;
            close(fd);
            if (params->cmdcntl) master_unlock();
            for (elem = orig_elem; elem != NULL; elem = ep)
            {
                ep = elem->next;
                free(elem);
            }
            return(NULL);
        }
        /*AS !(elem .> NULL) &
             (orig_elem .> Obj0 & Qi & elem .> coset(orig_elem)) |
             (!(orig_elem .> NULL) & Qi & elem->next .> Obji &
              elem .> coset(elem->next)) AS*/
        /* make sure the data isn't corrupted */
        if (elem_chksum(elem) != elem->chksum)
        {
            inform_user("line %d: copy: checksum failed for element %d (%s)",
                        body_lineno, i, file);
            dontoutput = 1;
            close(fd);
            if (params->cmdcntl) master_unlock();
            for (elem = orig_elem; elem != NULL; elem = ep)
            {
                ep = elem->next;
                free(elem);
            }


            return(NULL);
        }
        /*AS (elem_chksum(elem) = elem->chksum) & !(elem .> NULL) &
             (orig_elem .> Obj0 & Qi & elem .> coset(orig_elem)) |
             (!(orig_elem .> NULL) & Qi & elem->next .> Obji &
              elem .> coset(elem->next)) AS*/
        /*
         * 0 fields not to be copied;
         * note: proj, SC, chksum, id, file, and elem_num are filled in
         * in collapse_elem_chain();
         */
        elem->remaining_rad_time = 0;
        elem->gsoc = 0;
        elem->chksum = 0;
        elem->elem_num = 0;
        for (j = 0; j < (sizeof(elem->mccc)/sizeof(elem->mccc[0])); j++)
            elem->mccc[j] = 0;
        /*AS zeroed(elem) &
             (elem_chksum(elem) = elem->chksum) & !(elem .> NULL) &
             (orig_elem .> Obj0 & Qi & elem .> coset(orig_elem)) |
             (!(orig_elem .> NULL) & Qi & elem->next .> Obji &
              elem .> coset(elem->next)) AS*/
    }
    /*AS (forall k : 1 <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         orig_elem .> Obj0 & elem_1 .> coset(orig_elem) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    elem->next = NULL;
    /*AS elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         orig_elem .> Obj0 & elem_1 .> coset(orig_elem) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    /* check the file checksum */
    if (!checksum_gcmd_chain(&gcmd_hdr, orig_elem))
    {
        inform_user("line %d: copy: checksum failed on GCMD file (%s)",
                    body_lineno, file);
        dontoutput = 1;
        close(fd);
        if (params->cmdcntl) master_unlock();
        for (elem = orig_elem; elem != NULL; elem = ep)
        {
            ep = elem->next;


            free(elem);
        }
        return(NULL);
    }
    /*AS checksum_gcmd_chain(gcmd_hdr, orig_elem.V) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         orig_elem .> Obj0 & elem_1 .> coset(orig_elem) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    /* free any initial unneeded elements */
    for (i = 1, elem = orig_elem; i < begin; i++)
    {
        ep = elem->next;
        free(elem);
        elem = ep;
    }
    /*AS checksum_gcmd_chain(gcmd_hdr, orig_elem.V) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         orig_elem .> NULL & elem_1 .> coset(orig_elem) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    orig_elem = elem;
    /*AS orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    /* zero out the first element copied, only */
    elem->delay = 0;
    while (i++ < end)
        elem = elem->next;


    ep = elem;
    elem = elem->next;
    ep->next = NULL;
    /*AS ep .> coset(elem_end) & elem .> coset(elem_end->next) &
         elem_end->next .> NULL & orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < gcmd_hdr.elem_count :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    /* free any terminal unneeded elements */
    while (i++ <= gcmd_hdr.elem_count)
    {
        ep = elem->next;
        free(elem);
        elem = ep;
    }
    /*AS (forall k : end < k < gcmd_hdr.elem_count : freed(elem_k)) &
         ep .> coset(elem_gcmd_hdr.elem_count) & elem .> coset(ep) &
         elem_end->next .> NULL & orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < end :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/
    close(fd);
    /*AS closed(fd) &
         (forall k : end < k < gcmd_hdr.elem_count : freed(elem_k)) &
         ep .> coset(elem_gcmd_hdr.elem_count) & elem .> coset(ep) &
         elem_end->next .> NULL & orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < end :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count & params->sc = gcmd_hdr.SC
           & get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0)) AS*/

    if (params->cmdcntl) master_unlock();

    /*AS
       (params->cmdcntl != 0 & sp(master_unlock(),
         closed(fd) &
         (forall k : end < k < gcmd_hdr.elem_count : freed(elem_k)) &
         ep .> coset(elem_gcmd_hdr.elem_count) & elem .> coset(ep) &
         elem_end->next .> NULL & orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < end :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0))))
       |
       (closed(fd) &
         (forall k : end < k < gcmd_hdr.elem_count : freed(elem_k)) &
         ep .> coset(elem_end) & elem .> coset(elem_end->next) &
         elem_end->next .> NULL & orig_elem .> coset(elem_begin) &
         checksum_gcmd_chain(gcmd_hdr, Obj0cnst1) = 0 &
         elem_gcmd_hdr.elem_count .> NULL &
         (forall k : 1 <= k < begin : freed(elem_k)) &
         (forall k : begin <= k < end :
            elem_k->next .> coset(elem_k+1) & zeroed(elem_k)) &
         ((E0 = -1 & end = gcmd_hdr.elem_count &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & E0 = E0 & file .> F0)
          |
          (end <= gcmd_hdr.elem_count & end != -1 &
           params->sc = gcmd_hdr.SC &
           get_gcmd_hdr(fd, gcmd_hdr) != 0 & skip_gcmd_sfdu(fd, L2) != 0 &
           fd >= 0 & fd = FH0 & begin = B0 & end = E0 & file .> F0))) AS*/

    return(orig_elem);
}
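The /*AS AS*/ annotations in this listing record, after each statement, a predicate that holds at that point, derived with the strongest postcondition (sp) semantics. The sketch below shows the same idea on a single conditional modeled on the `if (end == -1)` step of the listing. The function `clamp_end` and its parameter `count` are hypothetical, and the runtime `assert` is only a stand-in for the formal annotation. The rule assumed for a one-armed conditional is sp(if B then S, Q) = sp(S, B & Q) | (!B & Q).

```c
#include <assert.h>

/* Hypothetical routine: replace the sentinel -1 by count. */
int clamp_end(int end, int count)
{
    const int E0 = end;   /* initial value, playing the role of E0 above */

    /*AS (end = E0) AS*/
    if (end == -1)
        end = count;
    /*AS (E0 = -1 & end = count) | (E0 != -1 & end = E0) AS*/
    assert((E0 == -1 && end == count) || (E0 != -1 && end == E0));

    return end;
}
```

Each disjunct of the final annotation corresponds to one path through the conditional, which is why the annotations in the full listing grow by one disjunct at every branch.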


Appendix D

Software Reuse Speciﬁcations

This appendix contains the as-built speciﬁcations of the queue library and the

corresponding library speciﬁcation as presented in Section 9.3.

D.1 As-built speciﬁcation for the Queue source code

Figure D.1 shows the as-built specifications for the queue source code as they
were constructed by the AUTOSPEC system. The figure contains five specifications
corresponding to the dequeue, enQueue, new_queue, head, and is_empty
operations. The format for the specifications, based on the Larch interface language [42], is
shown in Figure 6.1.


 

spec QDATA dequeue(Queue *q)
locals
    int temp
requires
    (q .> _param2) &&
    (_param2.tail.V = _pVal3.tail) &&
    (_param2.head.V = _pVal3.head)
modifies
    q (_param2)
ensures
    (q .> _param2) &&
    (_param2.tail.V = _pVal3.tail) &&
    ((as_const2 == _pVal3.head) &&
     (is_empty(_param2.V) != 1) &&
     (temp.V == (as_const2 % MAXSIZE)) &&
     (_param2.head.V = (as_const2 + 1)) &&
     (return.V = _param2.data[temp.V]))
    ||
    ((_param2.head.V = _pVal3.head) &&
     (!(is_empty(_param2.V) != 1)) &&
     (return.V = 0))

spec QDATA head(const Queue q)
requires
    (q.V = _param1) &&
    (q.tail.V == _pVal3.tail) &&
    (q.head.V == _pVal3.head)
ensures
    (q.V = _param1) &&
    (q.tail.V = _pVal3.tail) &&
    (q.head.V = _pVal3.head) &&
    (return.V = q.data[(q.head.V % MAXSIZE)])

spec Queue *new_queue()
requires
    true
ensures
    (ner.V .> o) &&
    (o.head.V = 0) &&
    (o.tail.V = 0) &&
    (return.V = ner.V)

spec int enQueue(Queue *q, QDATA *e)
requires
    (((e .> _param4) &&
      (_param4.V == _pVal5)) &&
     ((q .> _param3) &&
      (_param3.tail.V == _pVal4)))
modifies
    q (_param3)
ensures
    ((((((e .> _param4) &&
         (_param4.V == _pVal5)) &&
        ((q .> _param3) &&
         (_param3.tail.V == _pVal4))) &&
       ((_param3.tail.V -
         _param3.head.V) == MAXSIZE)) &&
      (return.V = 0)) ||
     (((((((e .> _param4) &&
           (_param4.V == _pVal5)) &&
          ((q .> _param3) &&
           (_param3.tail.V == _pVal4))) &&
         (!((_pVal4 - _param3.head.V) ==
            MAXSIZE))) &&
        (_param3.data[(_pVal4 % MAXSIZE)].V =
         _param4.V)) &&
       (_param3.tail.V = (_pVal4 + 1))) &&
      (return.V = 1)))

spec int is_empty(const Queue q)
requires
    (q.V = _param0)
ensures
    (q.V = _param0) &&
    (return.V = (q.head.V == q.tail.V))

Figure D.1: AUTOSPEC output of as-built specifications for queue source code

Figure D.1: AUTOSPEC output of as-built speciﬁcations for queue source code

 


D.2 Circular Queue Library Speciﬁcation

Figure D.2 shows the library speciﬁcation for the queue source code. The format for the
speciﬁcations is based on the Larch interface language [42] and is used by the ABRIE

system as a means for storing speciﬁcations and references to supporting source code.

 

Module CircularQueue
Ports
ProcDef dequeue(Queue* q) return int {
uses auxTheories;
requires true;
modifies q.head;
ensures
(q.head‘ ”= q.tail‘ /\ q.head' = q.head‘ + 1 /\
result = q.data[mod(q.head‘,MAXSIZE)l)
\/
(q.head' = q.head‘ /\ q.head‘ = q.tail‘ /\ result = O);
}

ProcDef enQueue(Queue* q, int e) return int (
uses auxTheories;
requires true
modifies q.tail, q.data;

ensures
(q.tail - q.head = MAXSIZE) /\ result = 0)
\/
(q.tail‘ - q.head‘ ”= MAXSIZE /\
q.data'[mod(q.tail‘,MAXSIZE)l = e /\

q.tail' = q.tail‘ + 1 /\
result = 1);
)

ProcDef head(Queue q) return int (

uses auxTheories;

requires true;

ensures result = q.data[mod(q.head‘.MAXSIZE)l;
}

ProcDef is_emptY(Queue q) return Bool {
uses auxTheories;
requires true
ensures result = (q.head == q.tail);
l

ProcDef new_queue() return Queue* {
    uses auxTheories;
    requires true;
    ensures o.head = 0 /\ o.tail = 0 /\ result = o;
}

Implementation
source ("/user/r02/chengb/gannod/Research/Cichueue/queue/","queue.c")

End

Figure D.2: Circular Queue Library Speciﬁcation
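
One way to read the dequeue entry of Figure D.2 is as a predicate over the state before and after the call. The sketch below, in C, implements a dequeue that follows the two disjuncts of the ensures clause and a checker that evaluates the clause directly. The names dequeue_ensures and checked_dequeue, the value of MAXSIZE, and the struct layout are hypothetical; they are not part of the ABRIE system or of queue.c.

```c
#include <assert.h>

#define MAXSIZE 4                 /* assumed capacity */

typedef struct {
    int head;
    int tail;
    int data[MAXSIZE];
} Queue;

/* dequeue: if the queue is non-empty, return the element at
   head % MAXSIZE and advance head; otherwise leave head alone
   and return 0. */
int dequeue(Queue *q)
{
    int result;
    assert(q != 0);
    if (q->head != q->tail) {
        result = q->data[q->head % MAXSIZE];
        q->head = q->head + 1;
    } else {
        result = 0;
    }
    return result;
}

/* Evaluates the ensures clause: pre holds the pre-state values
   (written q.head^ in the figure), post the post-state values
   (written q.head').  Returns 1 when the clause holds. */
int dequeue_ensures(const Queue *pre, const Queue *post, int result)
{
    int nonempty_case = pre->head != pre->tail
                     && post->head == pre->head + 1
                     && result == pre->data[pre->head % MAXSIZE];
    int empty_case = post->head == pre->head
                  && pre->head == pre->tail
                  && result == 0;
    return nonempty_case || empty_case;
}

/* Runs dequeue and asserts that the specification was met. */
int checked_dequeue(Queue *q)
{
    Queue pre = *q;               /* snapshot of the pre-state */
    int result = dequeue(q);
    assert(dequeue_ensures(&pre, q, result));
    return result;
}
```

Taking a snapshot of the pre-state before the call is what makes the pre-state and post-state references of the specification expressible in ordinary C.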

 


Appendix E

process_mnemonic_input Source Code

This appendix contains the source code for the process_mnemonic_input procedure.
The purpose of this procedure is to parse an input stream and to invoke the process_msg
translation routine.

int process_mnemonic_input(strp, parms)
char **strp;
struct project_parameters *parms;
{
    char *s = *strp;
    struct tokens tokens;
    struct tokens *tp = &tokens;
    struct interp_state state;
    struct interp_state *sp = &state;
    int len;
    U16 *ep;

    /* set up token list */
    bzero(tp, sizeof(*tp));
    tp->end_token = tp->t - 1;

    /* copy special character list into token list */
    strcpy("", tp->special_chars);

    if (parms->field_delimiter != '*')    /* '*' indicates none specified */
        sprintf(tp->special_chars, "%s%c",
                tp->special_chars, parms->field_delimiter);

    if (parms->command_delimiter != '*')
        sprintf(tp->special_chars, "%s%c",
                tp->special_chars, parms->command_delimiter);

    if (parms->message_delimiter != '*')
        sprintf(tp->special_chars, "%s%c",
                tp->special_chars, parms->message_delimiter);

    if (parms->left_bracket != '*')
        sprintf(tp->special_chars, "%s%c",
                tp->special_chars, parms->left_bracket);

    if (parms->right_bracket != '*')
        sprintf(tp->special_chars, "%s%c",
                tp->special_chars, parms->right_bracket);

    /* initialize interpreter state */


    sp->fail_token = NULL;
    sp->fail_reason = NULL;
    sp->msg_entry = NULL;
    sp->stem_name = NULL;
    stem_entry = NULL;

    /* tokenize the input str */
    while (*s != '\0')
    {
        char *cp = s;
        char *delim;

        /* skip initial blanks */
        while (isspace(*cp))
            cp++;
        s = cp;

        /* find the end of the token */
        delim = find_delim(s, tp->special_chars);
        cp = delim;

        /* calculate length */
        if (cp == s)
            len = 0;
        else
        {
            while (isspace(*--cp))
                ;
            cp++;
            len = cp - s;
        }

        save_tok(s, len);

        if (*delim == '\0')
        {
            s = delim;
            break;
        }
        else if (*delim == parms->message_delimiter)
        {
            s = delim + 1;
            break;    /* complete msg */
        }
        else if (*delim == parms->field_delimiter)
        {
            s = delim + 1;
            if (*s == '\0')    /* last (default) argument */
            {
                save_tok(s, 0);
            }
        }
        else    /* command delim or bracket */
        {
            save_tok(delim, 1);
            s = delim + 1;
            if (*delim == parms->right_bracket)
            {
                if (*s == parms->command_delimiter)
                {
                    save_tok(s, 1);
                    s++;
                }
            }
        }
    }


    /* analyze the token stream */
    ep = get_first_entry(248);    /* 248 contains the message entries */

    do
    {
        /* set globals to initial values */
        tp->token_index = tp->t;
        Q = control_list;
        sp->num_of_commands = 0;
        sp->failed = 0;
        sp->cmd_delimiter_deferred = 0;

        if (process_msg(ep, tp, sp, parms))
            break;
    } while ((ep = get_next_entry(ep)) != collection[249]);

    /* if we didn't find any match in 248, generate error message */
    if (sp->failed)
        generate_error_msg(sp, tp);
#ifdef DEBUG2
    else
        if (strlen(*strp) > RESP_LN - 20)
            inform_user("parsed line: '%.*s...", RESP_LN-20, *strp);
        else
            inform_user("parsed line: '%s", *strp);
#endif

    *strp = s;

    /* save the stem name for the comment in the message output */
    stem_entry = sp->stem_name;

    return(!sp->failed);
}
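
The tokenizing loop of the listing can be exercised in isolation. The following C sketch extracts one token in the same three steps (skip leading blanks, locate the delimiter, trim trailing blanks). The body of find_delim is not given in this appendix, so the version here, built on the standard strcspn function, is an assumption; next_token is a hypothetical helper name, and the call to save_tok is omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed behavior of find_delim: return a pointer to the first
   character of s that occurs in special_chars, or to the
   terminating '\0' when there is none. */
static const char *find_delim(const char *s, const char *special_chars)
{
    return s + strcspn(s, special_chars);
}

/* Extracts one token from s.  On return *tok points at the first
   non-blank character, *delim at the delimiter (or '\0') that ended
   the token, and the result is the token length with trailing
   blanks removed. */
static int next_token(const char *s, const char *special_chars,
                      const char **tok, const char **delim)
{
    const char *cp = s;

    /* skip initial blanks */
    while (isspace((unsigned char)*cp))
        cp++;
    s = cp;
    *tok = s;

    /* find the end of the token */
    *delim = find_delim(s, special_chars);
    cp = *delim;

    /* calculate length; s[0] is non-blank here, so the backward
       scan cannot move past the start of the token */
    if (cp == s)
        return 0;
    while (isspace((unsigned char)*--cp))
        ;
    cp++;
    return (int)(cp - s);
}
```

The backward scan relies on the leading blanks having been skipped first: the first character of a non-empty token is never a blank, so it bounds the scan without an explicit check.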

BIBLIOGRAPHY


[1] N. G. Leveson and C. S. Turner, “An Investigation of the Therac-25 Accidents,” IEEE
Computer, vol. 26, pp. 18-41, July 1993.

[2] Report by the Inquiry Board, “ARIANE 5 Flight 501 Failure,” tech. rep., European
Space Agency, 1996. J. L. Lions, Chairman of the Board.

[3] E. J. Chikofsky and J. H. Cross, “Reverse Engineering and Design Recovery: A
Taxonomy,” IEEE Software, vol. 7, pp. 13-17, January 1990.

[4] R. Covington, ed., Formal Methods Specification and Verification Guidebook for
Software and Computer Systems; Volume I: Planning and Technology Insertion,
vol. NASA-GB-002-95. National Aeronautics and Space Administration, July 1995.

[5] J. Crow, ed., Formal Methods Specification and Analysis Guidebook for the
Verification of Software and Computer Systems; Volume 2: A Practitioner’s
Companion, vol. NASA-GB-001-97. National Aeronautics and Space
Administration, May 1997.

[6] G. C. Gannod and B. H. C. Cheng, “Strongest Postcondition as the Formal Basis for
Reverse Engineering,” Journal of Automated Software Engineering, vol. 3,
pp. 139-164, June 1996. A preliminary version appeared in the Proceedings for the IEEE
Second Working Conference on Reverse Engineering, July 1995.

[7] G. C. Gannod and B. H. C. Cheng, “Using Informal and Formal Methods for the
Reverse Engineering of C Programs,” in Proceedings of the 1996 International
Conference on Software Maintenance, pp. 265-274, IEEE, 1996. Also appears in
the Proceedings for the Third IEEE Working Conference on Reverse Engineering.

[8] B. H. C. Cheng, “Applying formal methods in automated software engineering,”
Journal of Computer and Software Engineering, vol. 2, no. 2, pp. 137-164, 1994.

[9] R. S. Pressman, Software Engineering: A Practitioner’s Approach. McGraw-Hill,
fourth ed., 1997.

[10] E. J. Byrne, “A Conceptual Foundation for Software Re-engineering,” in Proceedings
for the Conference on Software Maintenance, pp. 226-235, IEEE, 1992.

[11] E. J. Byrne and D. A. Gustafson, “A Software Re-engineering Process Model,” in
COMPSAC, ACM, 1992.


[12] E. Yourdon and L. Constantine, Structured Design: Fundamentals of a
Discipline of Computer Program and Systems Design. Yourdon Press, 1978.

[13] J. M. Wing, “A Specifier’s Introduction to Formal Methods,” IEEE Computer, vol. 23,
pp. 8-24, September 1990.

[14] J. Rushby, “Formal Methods and the Certiﬁcation of Critical Systems,” Tech. Rep.
SRI-CSL-93-7, Computer Science Laboratory, SRI International, Menlo Park, CA,
December 1993.

[15] C. A. R. Hoare, “An axiomatic basis for computer programming,” Communications
of the ACM, vol. 12, pp. 576-580, October 1969.

[16] E. W. Dijkstra and C. S. Scholten, Predicate Calculus and Program Semantics.
Springer-Verlag, 1990.

[17] B. H. C. Cheng and G. C. Gannod, “Abstraction of Formal Specifications from
Program Code,” in Proceedings for the IEEE 3rd International Conference on Tools
for Artificial Intelligence, pp. 125-128, IEEE, 1991.

[18] J. Jeng and B. H. C. Cheng, “Using Automated Reasoning Techniques to Determine
Software Reuse,” International Journal of Software Engineering and Knowledge
Engineering, vol. 2, pp. 523-546, December 1992.

[19] A. M. Zaremski and J. M. Wing, “Speciﬁcation Matching of Software Components,”
in Proceedings of the 3rd ACM SIGSOFT Symposium on the Foundations of Software
Engineering, 1995.

[20] J. Penix and P. Alexander, “Toward Automated Component Adaptation,” in
Proceedings of the 9th International Conference on Software Engineering and
Knowledge Engineering, June 1997.

[21] A. M. Zaremski and J. M. Wing, “Signature Matching: a Tool for Using Software
Libraries,” ACM Transactions on Software Engineering and Methodology, April
1995.

[22] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-Oriented
Modeling and Design. Englewood Cliffs, New Jersey: Prentice Hall, 1991.

[23] R. H. Bourdeau and B. H. C. Cheng, “A formal semantics of object models,” IEEE
Trans. on Software Engineering, vol. 21, pp. 799-821, October 1995.

[24] Y. Wang and B. H. C. Cheng, “Formalizing and integrating the functional model
within OMT,” in Proceedings of the International Conference on Software Engineering
and Knowledge Engineering, June 1998.

[25] E. W. Dijkstra, A Discipline of Programming. Prentice Hall, 1976.

[26] D. Gries, The Science of Programming. Springer-Verlag, 1981.


[27] S. Katz and Z. Manna, “Logical Analysis of Programs,” Communications of the ACM,
vol. 19, pp. 188-206, April 1976.

[28] G. C. Gannod and B. H. C. Cheng, “A Formal Automated Approach for Reverse
Engineering Programs with Pointers,” Tech. Rep. MSU-CPS-97-19, Michigan State
University, 1997.

[29] B. W. Kernighan and D. M. Ritchie, The C Programming Language. Englewood
Cliffs, New Jersey: Prentice Hall, 1988.

[30] W. Landi, “Undecidability of Static Analysis,” ACM Letters on Programming
Languages and Systems, vol. 1, no. 4, pp. 323-337, 1992.

[31] J.-J. Jeng and B. H. C. Cheng, “Specification Matching for Software Reuse: A
Foundation,” in Proceedings of the ACM Symposium on Software Reuse, pp. 97-105,
1995.

[32] B. Fischer, M. Kievernagel, and W. Struckmann, “VCR: A VDM-Based software
component retrieval tool,” in Proceedings of the ACM Symposium on Formal Methods
Application in Engineering Practice, 1995.

[33] B. Fischer, M. Kievernagel, and W. Struckmann, “Deduction-Based Software
Component Retrieval,” in Proceedings of IJCAI ’95 Workshop on Formal Approaches
to the Reuse of Plans, Proofs, and Programs, August 1995.

[34] A. Mili, R. Mili, and R. T. Mittermeir, “Storing and Retrieving Software Components:
A Refinement Based System,” IEEE Transactions on Software Engineering, vol. 23,
July 1997.

[35] J.-J. Jeng and B. H. C. Cheng, “Reusing Analogous Components,” IEEE Transactions
on Knowledge and Data Engineering, 1996.

[36] A. Quilici, “A Memory-Based Approach to Recognizing Program Plans,”
Communications of the ACM, vol. 37, pp. 84-93, May 1994.

[37] S. R. Tilley, K. Wong, M.-A. Storey, and H. A. Müller, “Programmable Reverse
Engineering,” The International Journal of Software Engineering and Knowledge
Engineering, vol. 4, no. 4, pp. 501-520, 1994.

[38] M. P. Ward and K. H. Bennett, “A Practical Solution to Reverse Engineering Legacy
Systems using Formal Methods,” in Proceedings of the Working Conference on
Reverse Engineering, 1993.

[39] I. D. Baxter and M. Mehlich, “Reverse Engineering is Reverse Forward Engineering,”
in Proceedings of the Fourth IEEE Working Conference on Reverse Engineering,
IEEE, October 1997.

[40] D. Smith, “KIDS: A Semi-automatic Program Development System,” Transactions
on Software Engineering, vol. 16, pp. 1024-1043, September 1990.


[41] L. L. Jilani, J. Desharnais, M. Frappier, R. Mili, and A. Mili, “Retrieving Software
Components That Minimize Adaptation Effort,” in Proceedings of the 12th Automated
Software Engineering Conference, pp. 255-262, Nov 1997.

[42] J. Guttag and J. Horning, Larch: Languages and Tools for Formal Speciﬁcation.
Springer-Verlag, 1993.

[43] G. C. Gannod and B. H. C. Cheng, “A Formal Automated Approach for
Reverse Engineering Programs with Pointers,” in Proceedings of the Twelfth IEEE
International Automated Software Engineering Conference, pp. 219-226, IEEE,
1997.

[44] C. Rich and R. C. Waters, The Programmer’s Apprentice. ACM-Press, 1990.

[45] M. W. Hall, J. M. Anderson, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao,
E. Bugnion, and M. S. Lam, “Maximizing multiprocessor performance with the SUIF
compiler,” IEEE Computer, December 1996.

[46] G. Sander, “Graph layout through the VCG tool,” in Proceedings of Graph Drawing,
DIMACS International Workshop GD ’94, Lecture Notes in Computer Science,
vol. 894, pp. 194-205, Springer-Verlag, 1995.

[47] A. Aho, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools.
Addison Wesley, 1986.

[48] J. Ousterhout, Tcl and the Tk Toolkit. Addison-Wesley, 1994.
[49] A. Nerode and R. A. Shore, Logic For Applications. Springer-Verlag, 1993.
[50] J. Levine, T. Mason, and D. Brown, lex & yacc. O’Reilly & Associates, 1992.

[51] J. Wielemaker, SWI-Prolog 3.0 Reference Manual. University of Amsterdam, July
1998.

[52] Y. Chen and B. H. C. Cheng, “Formalizing and automating component reuse,” in Proc.
of 9th IEEE Intl. Conference on Tools with Artiﬁcial Intelligence, November 1997.

[53] Y. Chen and B. H. C. Cheng, “Facilitating an automated approach to architecture-
based software reuse,” in Proceedings of the 12th International Conference on
Automated Software Engineering, 1997.

[54] Microsoft Corporation, Microsoft Visual C++ MFC Library Reference, Part 1 & 2,
1997.

[55] D. Garlan and D. Perry, “Introduction to the special issue on software architecture,”
IEEE Transactions on Software Engineering, vol. 21, April 1995.

[56] D. E. Perry and A. L. Wolf, “Foundations for the study of software architecture,” ACM
SIGSOFT Software Engineering Notes, vol. 17, October 1992.


 

[57] M. Shaw and D. Garlan, Software Architectures: Perspectives on an Emerging
Discipline. Prentice Hall, 1996.

[58] G. C. Gannod and B. H. C. Cheng, “A formal approach to reverse engineering C
programs,” Tech. Rep. MSUCPS-TR98-12, Michigan State University, April 1998.

[59] M. A. Weiss, Algorithms, Data Structures, and Problem Solving with C++.
Addison-Wesley Publishing Company, Inc., 1996.

[60] N. Zvegintzov, ed., Software Management Technology Reference Guide. Software
Management News Inc., 1994.

[61] M. R. Olsem and C. Sittenauer, “Reengineering technology report (vol. 1 and 2),”
tech. rep., Software Technology Support Center, Hill AFB, 1995.

[62] B. Bellay and H. Gall, “A Comparison of four Reverse Engineering Tools,” in
Proceedings for the Fourth Working Conference on Reverse Engineering, pp. 2-11,
IEEE, 1997.

[63] N. Zvegintzov, “A Resource Guide to Year 2000 Tools,” Computer, vol. 30,
pp. 58-63, March 1997.

[64] A. W. Brown and K. C. Wallnau, “A Framework for Evaluating Software
Technology,” Software, vol. 13, pp. 39-49, September 1996.

[65] S. Rugaber, K. Stirewalt, and L. Wills, “The Interleaving Problem in Program
Understanding,” in Proceedings for the Second Working Conference on Reverse
Engineering, IEEE, 1995.

[66] J. Q. Ning, A. Engberts, and W. Kozaczynski, “Automated Support for Legacy Code
Understanding,” Communications of the ACM, vol. 37, pp. 50-57, May 1994.

[67] S. K. Abd-El-Haﬁz and V. R. Basili, “A Knowledge-Based Approach to the Analysis
of Loops,” Transactions on Software Engineering, vol. 22, pp. 339-360, May 1996.

[68] “Xinotech.” [Online] Available http://www.xinotech.com/tech-overview.html.
[69] “Imagix 4D.” [Online] Available http://www.teleport.com/~imagix.

[70] “The McCabe Visual Reengineering Toolset.” [Online] Available
http://gate.mccabe.com/visual/reeng.html.

[71] G. C. Murphy, D. Notkin, and K. Sullivan, “Software Reﬂexion Models: Bridging
the Gap between Source and High-Level Models,” in Proceedings of the third ACM
SIGSOFT Symposium on the Foundations of Software Engineering, 1995.

[72] M. Ward, F. Calliss, and M. Munro, “The Maintainer’s Assistant,” in Proceedings for
the Conference on Software Maintenance, IEEE, 1989.


[73] J. Bowen, P. Breuer, and K. Lano, “The REDO Project: Final Report,” Tech. Rep.
PRG-TR-23-91, Oxford University, 1991.

[74] “Peritus Software Services.” [Online] Available http://www.peritus.com/.

[75] W. Kozaczynski and J. Q. Ning, “Automated Program Understanding by Concept
Recognition,” Automated Software Engineering, vol. 1, no. 1, pp. 61-78, 1994.

[76] D. W. Binkley and K. B. Gallagher, “Program Slicing,” in Advances in Computers
(M. Zelkowitz, ed.), vol. 43, Academic Press, 1996.

[77] L. Markosian, P. Newcomb, R. Brand, S. Burson, and T. Kitzmiller, “Using an
Enabling Technology to Reengineer Legacy Systems,” Communications of the ACM,
vol. 37, pp. 58-70, May 1994.

[78] P. Newcomb, “Reengineering Procedural Into Data Flow Programs,” in Proceedings
for the Second Working Conference on Reverse Engineering, pp. 32-38, IEEE, 1995.

[79] K. Wong, S. R. Tilley, H. A. Müller, and M.-A. D. Storey, “Structural
redocumentation: A case study,” IEEE Software, pp. 46-54, January 1995.

[80] G. C. Murphy and D. Notkin, “Reengineering with Reﬂexion Models: A Case Study,”
Computer, vol. 30, pp. 29-36, August 1997.

[81] F. L. Bauer, B. Möller, H. Partsch, and P. Pepper, “Formal Program Construction
by Transformations - Computer-Aided, Intuition-Guided Programming,” IEEE
Transactions on Software Engineering, May 1991.

[82] P. E. London and M. S. Feather, “Implementing specification freedoms,” in Readings
in Artificial Intelligence and Software Engineering (C. Rich and R. C. Waters, eds.),
pp. 285-305, Los Altos, CA: Morgan Kaufmann, 1986.

[83] D. S. Wile, “Local formalisms: Widening the spectrum of wide-spectrum languages,”
in Program Specification and Transformation, pp. 459-481, 1987.

[84] M. Ward, “Abstracting a Specification from Code,” Journal of Software Maintenance:
Research and Practice, vol. 5, pp. 101-122, 1993.

[85] T. Bull, “An Introduction to the WSL Program Transformer,” in Proceedings for the
Conference on Software Maintenance, pp. 242-250, IEEE, 1990.

[86] E. Younger, Z. Luo, K. Bennett, and T. Bull, “Reverse Engineering Concurrent
Programs using Formal Modelling and Analysis,” in Proceedings of the 1996
International Conference on Software Maintenance, pp. 255-264, IEEE, 1996.

[87] “Verilog logiscope.” [Online] Available http://www.verilogusa.com/log/logiscop.htm.

[88] “Cayenne ensemble.” [Online] Available
http://www.cayennesoft.com/products/datasheets/ensemsoft.html.


[89] M. T. Harandi and J. Q. Ning, “Knowledge-Based Program Analysis,” IEEE Software,
vol. 7, pp. 74-81, January 1990.

[90] R. A. Slusser, “Advanced Multimission Operations System (AMMOS) Detailed
Capabilities Catalog and Adaptation Guide,” Tech. Rep. JJ MOSOOOZO-OO-06(JPL
D-5140), Jet Propulsion Laboratory - California Institute of Technology, June 1995.
Internal JPL Document.

[91] D. F. Miller and L. A. Palkovic, “Space Flight Operations Center User’s Guide
for Workstation End Users,” Tech. Rep. U6 MOSOOOS8-OO-11-04(JPL D-6060), Jet
Propulsion Laboratory - California Institute of Technology, June 1994. Internal JPL
Document.

[92] E. J. Byrne, “Generating Project-Specific Reengineering Process Models,” in
Proceedings of the 6th Annual DoD Software Technology Conference, Department
of Defense, 1994.

[93] N. Dehghani, “Space Flight Operations Center Command Subsystem (CMD)
Software Speciﬁcations Document,” Tech. Rep. SCMD0007-00-02, Jet Propulsion
Laboratory - California Institute of Technology, January 1990. Internal JPL
Document.

[94] P. Cousot, “Abstract Interpretation,” ACM Computing Surveys, vol. 28, pp. 324-328,
June 1996.

[95] B. Steensgaard, “Points-to Analysis in Almost Linear Time,” in Proceedings of the
23rd ACM SIGPLAN Symposium on the Principles of Programming Languages, 1996.

[96] E. Y. Wang, Integrating Informal and Formal Approaches to Object-Oriented Analysis
and Design. PhD thesis, Michigan State University, Department of Computer Science,
May 1998.

[97] G. C. Gannod and B. H. C. Cheng, “A Two Phase Approach to Reverse Engineering
Using Formal Methods,” Lecture Notes in Computer Science: Formal Methods in
Programming and Their Applications, vol. 735, pp. 335-348, July 1993.

[98] R. S. Arnold and S. A. Bohner, “Impact Analysis - Towards a Framework for
Comparison,” in Proceedings of the Conference on Software Maintenance,
pp. 292-301, IEEE, 1993.
